28 May, 2016

10 commits

  • The register_page_bootmem_info_node() function needs to be marked __init
    in order to avoid a new warning introduced by commit f65e91df25aa ("mm:
    use early_pfn_to_nid in register_page_bootmem_info_node").

    Otherwise you'll get a warning about how a non-init function calls
    early_pfn_to_nid() (which is __meminit).
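
    For reference, the fix is just the section annotation on the function,
    roughly like this (a sketch; the definition lives in mm/memory_hotplug.c):

        void __init register_page_bootmem_info_node(struct pglist_data *pgdat);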

    Cc: Yang Shi
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • When we have !NO_BOOTMEM, the deferred page struct initialization
    doesn't work well because the pages reserved in bootmem are released to
    the page allocator unconditionally. It causes memory corruption and
    system crash eventually.

    As Mel suggested, bootmem is slowly being retired. We fix the issue by
    simply hiding DEFERRED_STRUCT_PAGE_INIT when bootmem is enabled.

    Link: http://lkml.kernel.org/r/1460602170-5821-1-git-send-email-gwshan@linux.vnet.ibm.com
    Signed-off-by: Gavin Shan
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gavin Shan
     
  • Move the comments for get_mctgt_type() to be before get_mctgt_type()
    implementation.

    Link: http://lkml.kernel.org/r/1463644638-7446-1-git-send-email-roy.qing.li@gmail.com
    Signed-off-by: Li RongQing
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li RongQing
     
  • mem_cgroup_margin() might return (memory.limit - memory_count) when the
    memsw.limit is in excess. This doesn't happen usually because we do not
    allow excess on hard limits and (memory.limit <= memsw.limit), but
    __GFP_NOFAIL charges can force the charge and cause the excess when no
    memory is really swappable (swap is full or no anonymous memory is left).
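
    A minimal sketch of the shape of the fix, assuming the existing locals of
    mem_cgroup_margin() in mm/memcontrol.c (count, limit, margin); not the
    verbatim patch:

        if (do_memsw_account()) {
                count = page_counter_read(&memcg->memsw);
                limit = READ_ONCE(memcg->memsw.limit);
                if (count <= limit)
                        margin = min(margin, limit - count);
                else
                        margin = 0;     /* memsw already in excess */
        }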
    Acked-by: Vladimir Davydov
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li RongQing
     
  • pageblock_order can be (at least) an unsigned int or an unsigned long
    depending on the kernel config and architecture, so use max_t(unsigned
    long, ...) when comparing it.

    This fixes these warnings:

    In file included from include/asm-generic/bug.h:13:0,
    from arch/powerpc/include/asm/bug.h:127,
    from include/linux/bug.h:4,
    from include/linux/mmdebug.h:4,
    from include/linux/mm.h:8,
    from include/linux/memblock.h:18,
    from mm/cma.c:28:
    mm/cma.c: In function 'cma_init_reserved_mem':
    include/linux/kernel.h:748:17: warning: comparison of distinct pointer types lacks a cast
      (void) (&_max1 == &_max2);
                     ^
    mm/cma.c:186:27: note: in expansion of macro 'max'
      alignment = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order);
                               ^
    mm/cma.c: In function 'cma_declare_contiguous':
    include/linux/kernel.h:748:17: warning: comparison of distinct pointer types lacks a cast
      (void) (&_max1 == &_max2);
                     ^
    include/linux/kernel.h:747:9: note: in definition of macro 'max'
      typeof(y) _max2 = (y);
              ^
    mm/cma.c:270:29: note: in expansion of macro 'max'
      (phys_addr_t)PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order));
                                ^
    include/linux/kernel.h:748:17: warning: comparison of distinct pointer types lacks a cast
      (void) (&_max1 == &_max2);
                     ^
    include/linux/kernel.h:747:21: note: in definition of macro 'max'
      typeof(y) _max2 = (y);
                        ^
    mm/cma.c:270:29: note: in expansion of macro 'max'
      (phys_addr_t)PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order));
                                ^
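
    The change amounts to using the type-explicit helper at the two call
    sites in mm/cma.c, roughly like this (a sketch, not the verbatim hunk):

        alignment = PAGE_SIZE << max_t(unsigned long, MAX_ORDER - 1,
                                       pageblock_order);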

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/20160526150748.5be38a4f@canb.auug.org.au
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Rothwell
     
  • If page_move_anon_rmap() is refiling a pmd-split THP mapped in a tail
    page from a pte, the "address" must be THP aligned in order for the
    page->index bugcheck to pass in the CONFIG_DEBUG_VM=y builds.

    Link: http://lkml.kernel.org/r/1464253620-106404-1-git-send-email-kirill.shutemov@linux.intel.com
    Fixes: 6d0a07edd17c ("mm: thp: calculate the mapcount correctly for THP pages during WP faults")
    Signed-off-by: Kirill A. Shutemov
    Reported-by: Mika Westerberg
    Tested-by: Mika Westerberg
    Reviewed-by: Andrea Arcangeli
    Cc: stable [4.5]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Tetsuo has reported:
    Out of memory: Kill process 443 (oleg's-test) score 855 or sacrifice child
    Killed process 443 (oleg's-test) total-vm:493248kB, anon-rss:423880kB, file-rss:4kB, shmem-rss:0kB
    sh invoked oom-killer: gfp_mask=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), order=0, oom_score_adj=0
    sh cpuset=/ mems_allowed=0
    CPU: 2 PID: 1 Comm: sh Not tainted 4.6.0-rc7+ #51
    Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
    Call Trace:
    dump_stack+0x85/0xc8
    dump_header+0x5b/0x394
    oom_reaper: reaped process 443 (oleg's-test), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

    In other words:

    __oom_reap_task                 exit_mm
      atomic_inc_not_zero
                                      tsk->mm = NULL
                                      mmput
                                        atomic_dec_and_test # > 0
                                      exit_oom_victim # New victim will be
                                                      # selected
                                    <OOM killer invoked>
                                      # no TIF_MEMDIE task so we can select a new one
      unmap_page_range # to release the memory

    The race exists even without the oom_reaper because anybody who pins the
    address space and gets preempted might race with exit_mm but oom_reaper
    made this race more probable.

    We can address the oom_reaper part by using oom_lock for __oom_reap_task
    because this would guarantee that a new oom victim will not be selected
    if the oom reaper might race with the exit path. This doesn't solve the
    original issue, though, because somebody else still might be pinning
    mm_users and so __mmput won't be called to release the memory but that
    is not really reliably solvable because the task will get away from the
    oom sight as soon as it is unhashed from the task_list and so we cannot
    guarantee a new victim won't be selected.
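
    A minimal sketch of the oom_reaper side, assuming the existing oom_lock
    mutex in mm/oom_kill.c (illustrative, not the verbatim patch):

        static bool __oom_reap_task(struct task_struct *tsk)
        {
                bool ret = true;

                /*
                 * Hold oom_lock so that out_of_memory() cannot select a new
                 * victim while the reaper races with the exit path.
                 */
                mutex_lock(&oom_lock);
                /* ... pin mm_users and unmap_page_range() the victim ... */
                mutex_unlock(&oom_lock);

                return ret;
        }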

    [akpm@linux-foundation.org: fix use of unused `mm', Per Stephen]
    [akpm@linux-foundation.org: coding-style fixes]
    Fixes: aac453635549 ("mm, oom: introduce oom reaper")
    Link: http://lkml.kernel.org/r/1464271493-20008-1-git-send-email-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Reported-by: Tetsuo Handa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • register_page_bootmem_info_node() is invoked in mem_init(), so it will
    be called before page_alloc_init_late() if DEFERRED_STRUCT_PAGE_INIT is
    enabled. But, pfn_to_nid() depends on memmap which won't be fully setup
    until page_alloc_init_late() is done, so replace pfn_to_nid() by
    early_pfn_to_nid().
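
    The change is essentially of this shape in register_page_bootmem_info_node()
    (a sketch; the exact context is in mm/memory_hotplug.c):

        if (pfn_valid(pfn) && (early_pfn_to_nid(pfn) == nid))
                register_page_bootmem_info_section(pfn);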

    Link: http://lkml.kernel.org/r/1464210007-30930-1-git-send-email-yang.shi@linaro.org
    Signed-off-by: Yang Shi
    Cc: Mel Gorman
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yang Shi
     
  • page_ext_init() checks suitable pages with pfn_to_nid(), but
    pfn_to_nid() depends on memmap which will not be setup fully until
    page_alloc_init_late() is done. Use early_pfn_to_nid() instead of
    pfn_to_nid() so that page extension can still be used early even when
    CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, and can still catch early
    page allocation call sites.

    Suggested by Joonsoo Kim [1], this fix basically undoes the change
    introduced by commit b8f1a75d61d840 ("mm: call page_ext_init() after all
    struct pages are initialized") and fixes the same problem with a better
    approach.

    [1] http://lkml.kernel.org/r/CAAmzW4OUmyPwQjvd7QUfc6W1Aic__TyAuH80MLRZNMxKy0-wPQ@mail.gmail.com

    Link: http://lkml.kernel.org/r/1464198689-23458-1-git-send-email-yang.shi@linaro.org
    Signed-off-by: Yang Shi
    Cc: Joonsoo Kim
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yang Shi
     
  • If the current process is exiting, we don't invoke oom killer, instead
    we give it access to memory reserves and try to reap its mm in case
    nobody is going to use it. There's a mistake in the code performing
    this check - we just ignore any process of the same thread group no
    matter if it is exiting or not - see try_oom_reaper. Fix it.

    Link: http://lkml.kernel.org/r/1464087628-7318-1-git-send-email-vdavydov@virtuozzo.com
    Fixes: 3ef22dfff239 ("oom, oom_reaper: try to reap tasks which skip regular OOM killer path")
    Signed-off-by: Vladimir Davydov
    Acked-by: Michal Hocko
    Cc: Tetsuo Handa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

27 May, 2016

6 commits

  • Merge fixes from Andrew Morton:
    "10 fixes"

    * emailed patches from Andrew Morton :
    drivers/pinctrl/intel/pinctrl-baytrail.c: fix build with gcc-4.4
    update "mm/zsmalloc: don't fail if can't create debugfs info"
    dma-debug: avoid spinlock recursion when disabling dma-debug
    mm: oom_reaper: remove some bloat
    memcg: fix mem_cgroup_out_of_memory() return value.
    ocfs2: fix improper handling of return errno
    mm: slub: remove unused virt_to_obj()
    mm: kasan: remove unused 'reserved' field from struct kasan_alloc_meta
    mm: make CONFIG_DEFERRED_STRUCT_PAGE_INIT depends on !FLATMEM explicitly
    seqlock: fix raw_read_seqcount_latch()

    Linus Torvalds
     
  • Pull DAX locking updates from Ross Zwisler:
    "Filesystem DAX locking for 4.7

    - We use a bit in an exceptional radix tree entry as a lock bit and
    use it similarly to how page lock is used for normal faults. This
    fixes races between hole instantiation and read faults of the same
    index.

    - Filesystem DAX PMD faults are disabled, and will be re-enabled when
    PMD locking is implemented"

    * tag 'dax-locking-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    dax: Remove i_mmap_lock protection
    dax: Use radix tree entry lock to protect cow faults
    dax: New fault locking
    dax: Allow DAX code to replace exceptional entries
    dax: Define DAX lock bit for radix tree exceptional entry
    dax: Make huge page handling depend of CONFIG_BROKEN
    dax: Fix condition for filling of PMD holes

    Linus Torvalds
     
  • Some updates to commit d34f615720d1 ("mm/zsmalloc: don't fail if can't
    create debugfs info"):

    - add pr_warn to all stat failure cases
    - do not prevent module loading on stat failure

    Link: http://lkml.kernel.org/r/1463671123-5479-1-git-send-email-ddstreet@ieee.org
    Signed-off-by: Dan Streetman
    Reviewed-by: Ganesh Mahendran
    Acked-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     
  • mem_cgroup_out_of_memory() returns "true" if it finds a TIF_MEMDIE task
    after an eligible task has been found, and "false" if it finds a
    TIF_MEMDIE task before an eligible task is found.

    This difference confuses memory_max_write() which checks the return
    value of mem_cgroup_out_of_memory(). Since memory_max_write() wants to
    continue looping, mem_cgroup_out_of_memory() should return "true" in
    this case.

    This patch sets a dummy pointer in order to return "true".

    Link: http://lkml.kernel.org/r/1463753327-5170-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
    Signed-off-by: Tetsuo Handa
    Acked-by: Michal Hocko
    Acked-by: Vladimir Davydov
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     
  • Commit cd11016e5f52 ("mm, kasan: stackdepot implementation. Enable
    stackdepot for SLAB") added 'reserved' field, but never used it.

    Link: http://lkml.kernel.org/r/1464021054-2307-1-git-send-email-aryabinin@virtuozzo.com
    Signed-off-by: Andrey Ryabinin
    Cc: Alexander Potapenko
    Cc: Dmitry Vyukov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     
  • Per the suggestion from Michal Hocko [1], DEFERRED_STRUCT_PAGE_INIT
    requires some ordering wrt other initialization operations, e.g.
    page_ext_init has to happen after the whole memmap is initialized
    properly.

    For SPARSEMEM this requires waiting for page_alloc_init_late(). Other
    memory models (e.g. flatmem) might have different initialization
    layouts (page_ext_init_flatmem). Currently DEFERRED_STRUCT_PAGE_INIT
    depends on MEMORY_HOTPLUG, which in turn
        depends on SPARSEMEM || X86_64_ACPI_NUMA
        depends on ARCH_ENABLE_MEMORY_HOTPLUG

    and X86_64_ACPI_NUMA depends on NUMA, which in turn disables the FLATMEM
    memory model:

    config ARCH_FLATMEM_ENABLE
            def_bool y
            depends on X86_32 && !NUMA

    so FLATMEM is ruled out via the dependency maze. Be explicit and disable
    FLATMEM for DEFERRED_STRUCT_PAGE_INIT so that we do not reintroduce
    subtle initialization bugs.

    [1] http://lkml.kernel.org/r/20160523073157.GD2278@dhcp22.suse.cz

    Link: http://lkml.kernel.org/r/1464027356-32282-1-git-send-email-yang.shi@linaro.org
    Signed-off-by: Yang Shi
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yang Shi
     

24 May, 2016

8 commits

  • Merge yet more updates from Andrew Morton:

    - Oleg's "wait/ptrace: assume __WALL if the child is traced". It's a
    kernel-based workaround for existing userspace issues.

    - A few hotfixes

    - befs cleanups

    - nilfs2 updates

    - sys_wait() changes

    - kexec updates

    - kdump

    - scripts/gdb updates

    - the last of the MM queue

    - a few other misc things

    * emailed patches from Andrew Morton : (84 commits)
    kgdb: depends on VT
    drm/amdgpu: make amdgpu_mn_get wait for mmap_sem killable
    drm/radeon: make radeon_mn_get wait for mmap_sem killable
    drm/i915: make i915_gem_mmap_ioctl wait for mmap_sem killable
    uprobes: wait for mmap_sem for write killable
    prctl: make PR_SET_THP_DISABLE wait for mmap_sem killable
    exec: make exec path waiting for mmap_sem killable
    aio: make aio_setup_ring killable
    coredump: make coredump_wait wait for mmap_sem for write killable
    vdso: make arch_setup_additional_pages wait for mmap_sem for write killable
    ipc, shm: make shmem attach/detach wait for mmap_sem killable
    mm, fork: make dup_mmap wait for mmap_sem for write killable
    mm, proc: make clear_refs killable
    mm: make vm_brk killable
    mm, elf: handle vm_brk error
    mm, aout: handle vm_brk failures
    mm: make vm_munmap killable
    mm: make vm_mmap killable
    mm: make mmap_sem for write waits killable for mm syscalls
    MAINTAINERS: add co-maintainer for scripts/gdb
    ...

    Linus Torvalds
     
  • Now that all the callers handle vm_brk failure we can change it to wait
    for mmap_sem in a killable fashion, to help oom_reaper not get blocked
    just because vm_brk gets blocked behind mmap_sem readers.

    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: "Kirill A. Shutemov"
    Cc: Oleg Nesterov
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Almost all current users of vm_munmap are ignoring the return value and
    so they do not handle potential error. This means that some VMAs might
    stay behind. This patch doesn't try to solve those potential problems.
    Quite the contrary, it adds a new failure mode by using down_write_killable
    in vm_munmap. This should be safer than other failure modes, though,
    because the process is guaranteed to die as soon as it leaves the kernel
    and exit_mmap will clean the whole address space.

    This will help in the OOM conditions when the oom victim might be stuck
    waiting for the mmap_sem for write which in turn can block oom_reaper
    which relies on the mmap_sem for read to make a forward progress and
    reclaim the address space of the victim.
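
    A sketch of what the killable variant ends up looking like (close to, but
    not necessarily, the exact upstream code):

        int vm_munmap(unsigned long start, size_t len)
        {
                int ret;
                struct mm_struct *mm = current->mm;

                if (down_write_killable(&mm->mmap_sem))
                        return -EINTR;

                ret = do_munmap(mm, start, len);
                up_write(&mm->mmap_sem);
                return ret;
        }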

    Signed-off-by: Michal Hocko
    Cc: Oleg Nesterov
    Cc: "Kirill A. Shutemov"
    Cc: Konstantin Khlebnikov
    Cc: Andrea Arcangeli
    Cc: Alexander Viro
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • All the callers of vm_mmap seem to check for failure already and bail
    out in one way or another on error, which means that we can change it
    to use the killable version of vm_mmap_pgoff and return -EINTR if the
    current task gets killed while waiting for mmap_sem. This also means
    that vm_mmap_pgoff can be killable by default and the additional
    parameter can be dropped.

    This will help in the OOM conditions when the oom victim might be stuck
    waiting for the mmap_sem for write which in turn can block oom_reaper
    which relies on the mmap_sem for read to make a forward progress and
    reclaim the address space of the victim.

    Please note that load_elf_binary is ignoring vm_mmap error for
    current->personality & MMAP_PAGE_ZERO case but that shouldn't be a
    problem because the address is not used anywhere and we never return to
    the userspace if we got killed.

    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: "Kirill A. Shutemov"
    Cc: Mel Gorman
    Cc: Oleg Nesterov
    Cc: Andrea Arcangeli
    Cc: Al Viro
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • This is a follow up work for oom_reaper [1]. As the async OOM killing
    depends on mmap_sem for read, we would really appreciate it if a holder
    for write didn't stand in the way. This patchset changes many of
    down_write calls to be killable to help those cases when the writer is
    blocked and waiting for readers to release the lock and so help
    __oom_reap_task to process the oom victim.

    Most of the patches are really trivial because the lock is held from
    shallow syscall paths where we can return EINTR trivially and allow the
    current task to die (note that EINTR will never get to userspace as the
    task has a fatal signal pending). Others seem easy as well because the
    callers are already handling fatal errors and bail out and return to
    userspace, which should be sufficient to handle the failure gracefully.
    I am not familiar with all those code paths so a deeper review is really
    appreciated.

    As this work is touching more areas which are not directly connected I
    have tried to keep the CC list as small as possible and people who I
    believed would be familiar are CCed only to the specific patches (all
    should have received the cover though).

    This patchset is based on linux-next and depends on down_write_killable
    for rw_semaphores, which was merged into the tip locking/rwsem branch
    and is included in the current linux-next tree. I guess it
    would be easiest to route these patches via mmotm because of the
    dependency on the tip tree but if respective maintainers prefer other
    way I have no objections.

    I haven't covered all the down_write(&mm->mmap_sem) instances here

    $ git grep "down_write(.*\)" next/master | wc -l
    98
    $ git grep "down_write(.*\)" | wc -l
    62

    I have tried to cover those which should be relatively easy to review in
    this series because this alone should be a nice improvement. Other
    places can be changed on top.

    [0] http://lkml.kernel.org/r/1456752417-9626-1-git-send-email-mhocko@kernel.org
    [1] http://lkml.kernel.org/r/1452094975-551-1-git-send-email-mhocko@kernel.org
    [2] http://lkml.kernel.org/r/1456750705-7141-1-git-send-email-mhocko@kernel.org

    This patch (of 18):

    This is the first step in making mmap_sem write waiters killable. It
    focuses on the trivial ones which are taking the lock early after
    entering the syscall and they are not changing state before.

    Therefore it is very easy to change them to use down_write_killable and
    immediately return with -EINTR. This will allow the waiter to pass away
    without blocking the mmap_sem which might be required to make a forward
    progress. E.g. the oom reaper will need the lock for reading to
    dismantle the OOM victim address space.

    The only tricky function in this patch is vm_mmap_pgoff which has many
    call sites via vm_mmap. To reduce the risk keep vm_mmap with the
    original non-killable semantic for now.

    vm_munmap callers do not bother checking the return value so open code
    it into the munmap syscall path for now for simplicity.
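
    The trivial conversions in the series follow this pattern right at syscall
    entry (an illustrative sketch, not any specific hunk):

        if (down_write_killable(&mm->mmap_sem))
                return -EINTR;
        /* ... modify the address space ... */
        up_write(&mm->mmap_sem);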

    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Mel Gorman
    Cc: "Kirill A. Shutemov"
    Cc: Konstantin Khlebnikov
    Cc: Hugh Dickins
    Cc: Andrea Arcangeli
    Cc: David Rientjes
    Cc: Dave Hansen
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • mem_cgroup_oom may be invoked multiple times while a process is handling
    a page fault, in which case current->memcg_in_oom will be overwritten
    leaking the previously taken css reference.

    Link: http://lkml.kernel.org/r/1464019330-7579-1-git-send-email-vdavydov@virtuozzo.com
    Signed-off-by: Vladimir Davydov
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • Pull drm updates from Dave Airlie:
    "Here's the main drm pull request for 4.7, it's been a busy one, and
    I've been a bit more distracted in real life this merge window. Lots
    more ARM drivers, not sure if it'll ever end. I think I've at least
    one more coming the next merge window.

    But changes are all over the place, support for AMD Polaris GPUs is in
    here, some missing GM108 support for nouveau (found in some Lenovos),
    a bunch of MST and skylake fixes.

    I've also noticed a few fixes from Arnd in my inbox, that I'll try and
    get in asap, but I didn't think they should hold this up.

    New drivers:
    - Hisilicon kirin display driver
    - Mediatek MT8173 display driver
    - ARC PGU - bitstreamer on Synopsys ARC SDP boards
    - Allwinner A13 initial RGB output driver
    - Analogix driver for DisplayPort IP found in exynos and rockchip

    DRM Core:
    - UAPI headers fixes and C++ safety
    - DRM connector reference counting
    - DisplayID mode parsing for Dell 5K monitors
    - Removal of struct_mutex from drivers
    - Connector registration cleanups
    - MST robustness fixes
    - MAINTAINERS updates
    - Lockless GEM object freeing
    - Generic fbdev deferred IO support

    panel:
    - Support for a bunch of new panels

    i915:
    - VBT refactoring
    - PLL computation cleanups
    - DSI support for BXT
    - Color manager support
    - More atomic patches
    - GEM improvements
    - GuC fw loading fixes
    - DP detection fixes
    - SKL GPU hang fixes
    - Lots of BXT fixes

    radeon/amdgpu:
    - Initial Polaris support
    - GPUVM/Scheduler/Clock/Power improvements
    - ASYNC pageflip support
    - New mesa feature support

    nouveau:
    - GM108 support
    - Power sensor support improvements
    - GR init + ucode fixes.
    - Use GPU provided topology information

    vmwgfx:
    - Add host messaging support

    gma500:
    - Some cleanups and fixes

    atmel:
    - Bridge support
    - Async atomic commit support

    fsl-dcu:
    - Timing controller for LCD support
    - Pixel clock polarity support

    rcar-du:
    - Misc fixes

    exynos:
    - Pipeline clock support
    - Exynos 5433 SoC support
    - HW trigger mode support
    - export HDMI_PHY clock
    - DECON5433 fixes
    - Use generic prime functions
    - use DMA mapping APIs

    rockchip:
    - Lots of little fixes

    vc4:
    - Render node support
    - Gamma ramp support
    - DPI output support

    msm:
    - Mostly cleanups and fixes
    - Conversion to generic struct fence

    etnaviv:
    - Fix for prime buffer handling
    - Allow hangcheck to be coalesced with other wakeups

    tegra:
    - Gamma table size fix"

    * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (1050 commits)
    drm/edid: add displayid detailed 1 timings to the modelist. (v1.1)
    drm/edid: move displayid validation to it's own function.
    drm/displayid: Iterate over all DisplayID blocks
    drm/edid: move displayid tiled block parsing into separate function.
    drm: Nuke ->vblank_disable_allowed
    drm/vmwgfx: Report vmwgfx version to vmware.log
    drm/vmwgfx: Add VMWare host messaging capability
    drm/vmwgfx: Kill some lockdep warnings
    drm/nouveau/gr/gf100-: fix race condition in fecs/gpccs ucode
    drm/nouveau/core: recognise GM108 chipsets
    drm/nouveau/gr/gm107-: fix touching non-existent ppcs in attrib cb setup
    drm/nouveau/gr/gk104-: share implementation of ppc exception init
    drm/nouveau/gr/gk104-: move rop_active_fbps init to nonctx
    drm/nouveau/bios/pll: check BIT table version before trying to parse it
    drm/nouveau/bios/pll: prevent oops when limits table can't be parsed
    drm/nouveau/volt/gk104: round up in gk104_volt_set
    drm/nouveau/fb/gm200: setup mmu debug buffer registers at init()
    drm/nouveau/fb/gk20a,gm20b: setup mmu debug buffer registers at init()
    drm/nouveau/fb/gf100-: allocate mmu debug buffers
    drm/nouveau/fb: allow chipset-specific actions for oneinit()
    ...

    Linus Torvalds
     
  • Pull libnvdimm updates from Dan Williams:
    "The bulk of this update was stabilized before the merge window and
    appeared in -next. The "device dax" implementation was revised this
    week in response to review feedback, and to address failures detected
    by the recently expanded ndctl unit test suite.

    Not included in this pull request are two dax topic branches (dax
    error handling, and dax radix-tree locking). These topics were
    deferred to get a few more days of -next integration testing, and to
    coordinate a branch baseline with Ted and the ext4 tree. Vishal and
    Ross will send the error handling and locking topics respectively in
    the next few days.

    This branch has received a positive build result from the kbuild robot
    across 226 configs.

    Summary:

    - Device DAX for persistent memory: Device DAX is the device-centric
    analogue of Filesystem DAX (CONFIG_FS_DAX). It allows memory
    ranges to be allocated and mapped without need of an intervening
    file system. Device DAX is strict, precise and predictable.
    Specifically this interface:

    a) Guarantees fault granularity with respect to a given page size
    (pte, pmd, or pud) set at configuration time.

    b) Enforces deterministic behavior by being strict about what
    fault scenarios are supported.

    Persistent memory is the first target, but the mechanism is also
    targeted for exclusive allocations of performance/feature
    differentiated memory ranges.

    - Support for the HPE DSM (device specific method) command formats.
    This enables management of these first generation devices until a
    unified DSM specification materializes.

    - Further ACPI 6.1 compliance with support for the common dimm
    identifier format.

    - Various fixes and cleanups across the subsystem"

    * tag 'libnvdimm-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (40 commits)
    libnvdimm, dax: fix deletion
    libnvdimm, dax: fix alignment validation
    libnvdimm, dax: autodetect support
    libnvdimm: release ida resources
    Revert "block: enable dax for raw block devices"
    /dev/dax, core: file operations and dax-mmap
    /dev/dax, pmem: direct access to persistent memory
    libnvdimm: stop requiring a driver ->remove() method
    libnvdimm, dax: record the specified alignment of a dax-device instance
    libnvdimm, dax: reserve space to store labels for device-dax
    libnvdimm, dax: introduce device-dax infrastructure
    nfit: add sysfs dimm 'family' and 'dsm_mask' attributes
    tools/testing/nvdimm: ND_CMD_CALL support
    nfit: disable vendor specific commands
    nfit: export subsystem ids as attributes
    nfit: fix format interface code byte order per ACPI6.1
    nfit, libnvdimm: limited/whitelisted dimm command marshaling mechanism
    nfit, libnvdimm: clarify "commands" vs "_DSMs"
    libnvdimm: increase max envelope size for ioctl
    acpi/nfit: Add sysfs "id" for NVDIMM ID
    ...

    Linus Torvalds
     

23 May, 2016

1 commit

  • I'm looking at trying to possibly merge the 32-bit and 64-bit versions
    of the x86 uaccess.h implementation, but first this needs to be cleaned
    up.

    For example, the 32-bit version of "__copy_from_user_inatomic()" is
    mostly the special cases for the constant size, and it's actually almost
    never relevant. Most users aren't actually using a constant size
    anyway, and the few cases that do small constant copies are better off
    just using __get_user() instead.

    So get rid of the unnecessary complexity.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

21 May, 2016

15 commits

  • The "Device DAX" core enables dax mappings of performance / feature
    differentiated memory. An open mapping or file handle keeps the backing
    struct device live, but new mappings are only possible while the device
    is enabled. Faults are handled under rcu_read_lock to synchronize
    with the enabled state of the device.

    Similar to the filesystem-dax case the backing memory may optionally
    have struct page entries. However, unlike fs-dax there is no support
    for private mappings, or mappings that are not backed by media (see
    use of zero-page in fs-dax).

    Mappings are always guaranteed to match the alignment of the dax_region.
    If the dax_region is configured to have a 2MB alignment, all mappings
    are guaranteed to be backed by a pmd entry. Contrast this determinism
    with the fs-dax case where pmd mappings are opportunistic. If userspace
    attempts to force a misaligned mapping, the driver will fail the mmap
    attempt. See dax_dev_check_vma() for other scenarios that are rejected,
    like MAP_PRIVATE mappings.

    Cc: Hannes Reinecke
    Cc: Jeff Moyer
    Cc: Christoph Hellwig
    Cc: Andrew Morton
    Cc: Dave Hansen
    Cc: Ross Zwisler
    Acked-by: "Paul E. McKenney"
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Dan Williams

    Dan Williams
     
  • In addition to replacing the entry, we also clear all associated tags.
    This is really a one-off special for page_cache_tree_delete() which had
    far too much detailed knowledge about how the radix tree works.

    For efficiency, factor node_tag_clear() out of radix_tree_tag_clear().
    It can be used by radix_tree_delete_item() as well as
    radix_tree_replace_clear_tags().

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • I've been receiving increasingly concerned notes from 0day about how
    much my recent changes have been bloating the radix tree. Make it
    happier by only including multiorder support if
    CONFIG_TRANSPARENT_HUGEPAGE is set.

    This is an independent Kconfig option, so other radix tree users can
    also set it if they have a need.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Change the return type of zs_pool_stat_create() to void, and remove the
    logic to abort pool creation if the stat debugfs dir/file could not be
    created.

    The debugfs stat file is for debugging/information only, and doesn't
    affect operation of zsmalloc; there is no reason to abort creating the
    pool if the stat file can't be created. This was seen with zswap, which
    used the same name for all pool creations, which caused zsmalloc to fail
    to create a second pool for zswap if CONFIG_ZSMALLOC_STAT was enabled.
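
    The result is a warn-and-continue pattern, roughly (a sketch; argument
    order and field names are illustrative, not necessarily the upstream ones):

        static void zs_pool_stat_create(struct zs_pool *pool, const char *name)
        {
                struct dentry *entry;

                if (!zs_stat_root)
                        return;

                entry = debugfs_create_dir(name, zs_stat_root);
                if (!entry) {
                        pr_warn("debugfs dir <%s> creation failed\n", name);
                        return;         /* stats are optional, keep the pool */
                }
                pool->stat_dentry = entry;
        }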

    Signed-off-by: Dan Streetman
    Reviewed-by: Sergey Senozhatsky
    Cc: Dan Streetman
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     
  • Add a work_struct to struct zswap_pool, and change __zswap_pool_empty to
    use the workqueue instead of using call_rcu().

    When zswap destroys a pool no longer in use, it uses call_rcu() to
    perform the destruction/freeing. Since that executes in softirq
    context, it must not sleep. However, actually destroying the pool
    involves freeing the per-cpu compressors (which requires locking the
    cpu_add_remove_lock mutex) and freeing the zpool, for which the
    implementation may sleep (e.g. zsmalloc calls kmem_cache_destroy, which
    locks the slab_mutex). So if either mutex is currently taken, or any
    other part of the compressor or zpool implementation sleeps, it will
    result in a BUG().

    It's not easy to reproduce this when changing zswap's params normally.
    In testing with a loaded system, this does not fail:

    $ cd /sys/module/zswap/parameters
    $ echo lz4 > compressor ; echo zsmalloc > zpool

    nor does this:

    $ while true ; do
    > echo lzo > compressor ; echo zbud > zpool
    > sleep 1
    > echo lz4 > compressor ; echo zsmalloc > zpool
    > sleep 1
    > done

    although it's still possible either of those might fail, depending on
    whether anything else besides zswap has locked the mutexes.

    However, changing a parameter with no delay immediately causes the
    "scheduling while atomic" BUG:

    $ while true ; do
    > echo lzo > compressor ; echo lz4 > compressor
    > done

    This is essentially the same as Yu Zhao's proposed patch to zsmalloc,
    but moved to zswap, to cover compressor and zpool freeing.
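
    A condensed sketch of the mechanism (names follow mm/zswap.c, but this is
    not the verbatim patch):

        static void __zswap_pool_release(struct work_struct *work)
        {
                struct zswap_pool *pool = container_of(work, typeof(*pool), work);

                /* process context now, so sleeping in the destroy path is fine */
                synchronize_rcu();
                zswap_pool_destroy(pool);
        }

        static void __zswap_pool_empty(struct kref *kref)
        {
                struct zswap_pool *pool = container_of(kref, typeof(*pool), kref);

                spin_lock(&zswap_pools_lock);
                list_del_rcu(&pool->list);
                INIT_WORK(&pool->work, __zswap_pool_release);
                schedule_work(&pool->work);     /* was call_rcu() */
                spin_unlock(&zswap_pools_lock);
        }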

    Fixes: f1c54846ee45 ("zswap: dynamic pool creation")
    Signed-off-by: Dan Streetman
    Reported-by: Yu Zhao
    Reviewed-by: Sergey Senozhatsky
    Cc: Minchan Kim
    Cc: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     
  • Pass GFP flags to zs_malloc() instead of using a fixed mask supplied to
    zs_create_pool(), so we can be more flexible, but, more importantly, we
    need this to switch zram to per-cpu compression streams -- zram will try
    to allocate handle with preemption disabled in a fast path and switch to
    a slow path (using different gfp mask) if the fast one has failed.

    Apart from that, this also aligns the zs_malloc() interface with zpool/zbud.
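
    With the mask passed per call, a caller such as zram can try a
    non-sleeping fast path first (hypothetical caller; flags chosen for
    illustration only):

        handle = zs_malloc(pool, size, GFP_NOWAIT | __GFP_HIGHMEM);
        if (!handle) {
                /* slow path: allow the allocation to sleep and retry */
                handle = zs_malloc(pool, size, GFP_NOIO | __GFP_HIGHMEM);
        }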

    [sergey.senozhatsky@gmail.com: pass GFP flags to zs_malloc() instead of using a fixed mask]
    Link: http://lkml.kernel.org/r/20160429150942.GA637@swordfish
    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • Let's remove the unused pool param in obj_free().

    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Clean up function parameter ordering so that the higher-level data
    structure comes first.

    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • There are many BUG_ONs in zsmalloc.c, which is not recommended, so
    replace them with alternatives.

    The normal rules are as follows:

    1. avoid BUG_ON if possible. Instead, use VM_BUG_ON or VM_BUG_ON_PAGE

    2. use VM_BUG_ON_PAGE if we need to see struct page's fields

    3. use those assertion in primitive functions so higher functions can
    rely on the assertion in the primitive function.

    4. Don't use assertion if following instruction can trigger Oops

    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Clean up function parameter "struct page". Many functions of zsmalloc
    expect that page paramter is "first_page" so use "first_page" rather
    than "page" for code readability.

    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Memory accesses coded in assembly won't be seen by KASAN, as the
    compiler can instrument only C code. Add a kasan_check_[read,write]()
    API which is going to be used to check a certain memory range.
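
    The API amounts to a pair of declarations along these lines (sketch):

        void kasan_check_read(const void *p, unsigned int size);
        void kasan_check_write(const void *p, unsigned int size);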

    Link: http://lkml.kernel.org/r/1462538722-1574-3-git-send-email-aryabinin@virtuozzo.com
    Signed-off-by: Andrey Ryabinin
    Acked-by: Alexander Potapenko
    Cc: Dmitry Vyukov
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     
  • When a bogus memory access happens in mem[set,cpy,move]() it's usually
    the caller's fault. So don't blame mem[set,cpy,move]() in the bug
    report; blame the caller instead.

    Before:
    BUG: KASAN: out-of-bounds access in memset+0x23/0x40 at <address>

    After:
    BUG: KASAN: out-of-bounds access in <caller> at <address>

    Link: http://lkml.kernel.org/r/1462538722-1574-2-git-send-email-aryabinin@virtuozzo.com
    Signed-off-by: Andrey Ryabinin
    Acked-by: Alexander Potapenko
    Cc: Dmitry Vyukov
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     
  • Instead of calling kasan_krealloc(), which replaces the memory
    allocation stack ID (if stack depot is used), just unpoison the whole
    memory chunk.

    Signed-off-by: Alexander Potapenko
    Acked-by: Andrey Ryabinin
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Christoph Lameter
    Cc: Konstantin Serebryany
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     
  • Quarantine isolates freed objects in a separate queue. The objects are
    returned to the allocator later, which helps to detect use-after-free
    errors.

    When the object is freed, its state changes from KASAN_STATE_ALLOC to
    KASAN_STATE_QUARANTINE. The object is poisoned and put into quarantine
    instead of being returned to the allocator, therefore every subsequent
    access to that object triggers a KASAN error, and the error handler is
    able to say where the object has been allocated and deallocated.

    When it's time for the object to leave quarantine, its state becomes
    KASAN_STATE_FREE and it's returned to the allocator. From now on the
    allocator may reuse it for another allocation. Before that happens,
    it's still possible to detect a use-after free on that object (it
    retains the allocation/deallocation stacks).

    When the allocator reuses this object, the shadow is unpoisoned and old
    allocation/deallocation stacks are wiped. Therefore a use of this
    object, even an incorrect one, won't trigger ASan warning.

    Without the quarantine, it's not guaranteed that the objects aren't
    reused immediately; that's why the probability of catching a
    use-after-free is lower than with quarantine in place.

    Freed objects are first added to per-cpu quarantine queues. When a
    cache is destroyed or memory shrinking is requested, the objects are
    moved into the global quarantine queue. Whenever a kmalloc call allows
    memory reclaiming, the oldest objects are popped out of the global queue
    until the total size of objects in quarantine is less than 3/4 of the
    maximum quarantine size (which is a fraction of installed physical
    memory).

    As long as an object remains in the quarantine, KASAN is able to report
    accesses to it, so the chance of reporting a use-after-free is
    increased. Once the object leaves quarantine, the allocator may reuse
    it, in which case the object is unpoisoned and KASAN can't detect
    incorrect accesses to it.

    Right now quarantine support is only enabled in SLAB allocator.
    Unification of KASAN features in SLAB and SLUB will be done later.
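
    A conceptual sketch of the flow described above (the helper names below
    are hypothetical shorthand, not the upstream functions):

        /* free path: poison the object and park it instead of freeing it */
        void quarantine_object(struct kmem_cache *cache, void *object)
        {
                set_state(object, KASAN_STATE_QUARANTINE);   /* hypothetical */
                poison_shadow(object, cache->object_size);   /* hypothetical */
                enqueue(&cpu_quarantine, object);            /* hypothetical */
        }

        /* allocation path: trim the queue back under the watermark */
        void quarantine_reduce(void)
        {
                while (quarantine_bytes > 3 * quarantine_max_bytes / 4) {
                        void *object = dequeue_oldest(&global_quarantine);

                        set_state(object, KASAN_STATE_FREE);
                        real_free(object);   /* finally return it to the allocator */
                }
        }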

    This patch is based on the "mm: kasan: quarantine" patch originally
    prepared by Dmitry Chernenkov. A number of improvements have been
    suggested by Andrey Ryabinin.

    [glider@google.com: v9]
    Link: http://lkml.kernel.org/r/1462987130-144092-1-git-send-email-glider@google.com
    Signed-off-by: Alexander Potapenko
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Steven Rostedt
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     
  • If page migration fails due to -ENOMEM, nr_failed should still be
    incremented for proper statistics.

    This was encountered recently when all page migration vmstats showed 0,
    and inferred that migrate_pages() was never called, although in reality
    the first page migration failed because compaction_alloc() failed to
    find a migration target.

    This patch increments nr_failed so the vmstat is properly accounted on
    ENOMEM.
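
    The change is essentially one line in the result handling of
    migrate_pages(), roughly:

        case -ENOMEM:
                nr_failed++;    /* previously we bailed out without counting */
                goto out;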

    Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1605191510230.32658@chino.kir.corp.google.com
    Signed-off-by: David Rientjes
    Cc: Vlastimil Babka
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes