11 Jul, 2017

40 commits

  • 'all_var' looks like a variable, but is actually a macro. Use
    IS_ENABLED(CONFIG_KALLSYMS_ALL) to make the intent clear (see the sketch
    after this entry).

    Link: http://lkml.kernel.org/r/1497577591-3434-1-git-send-email-yamada.masahiro@socionext.com
    Signed-off-by: Masahiro Yamada
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masahiro Yamada
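
    A minimal sketch of the pattern described above; the function and range
    arguments below are hypothetical, the actual patch touches the kallsyms
    code:

      #include <linux/types.h>
      #include <linux/kconfig.h>      /* IS_ENABLED() */

      static bool kallsyms_covers(unsigned long addr, unsigned long text_start,
                                  unsigned long text_end, unsigned long data_end)
      {
              /*
               * Before: "if (all_var)" read like a test of a runtime variable,
               * but 'all_var' is a macro derived from CONFIG_KALLSYMS_ALL.
               * After: the config dependency is visible at the call site.
               */
              if (IS_ENABLED(CONFIG_KALLSYMS_ALL))
                      return addr >= text_start && addr < data_end;
              return addr >= text_start && addr < text_end;
      }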
     
  • setgroups is not exactly a hot path, so we might as well use the library
    sort function instead of open-coding the sorting (see the sketch after
    this entry). Saves ~150 bytes.

    Link: http://lkml.kernel.org/r/1497301378-22739-1-git-send-email-linux@rasmusvillemoes.dk
    Signed-off-by: Rasmus Villemoes
    Cc: Matthew Wilcox
    Cc: Michal Hocko
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rasmus Villemoes
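
    Roughly how the open-coded sort can be replaced with the lib/sort.c
    helper; the struct and field names are paraphrased from kernel/groups.c
    and may not match the patch exactly:

      #include <linux/sort.h>
      #include <linux/cred.h>
      #include <linux/uidgid.h>

      static int gid_cmp(const void *_a, const void *_b)
      {
              kgid_t a = *(kgid_t *)_a;
              kgid_t b = *(kgid_t *)_b;

              return gid_gt(a, b) - gid_lt(a, b);
      }

      static void groups_sort(struct group_info *group_info)
      {
              /* One call instead of a hand-rolled insertion/shell sort. */
              sort(group_info->gid, group_info->ngroups,
                   sizeof(*group_info->gid), gid_cmp, NULL);
      }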
     
  • attribute_groups are not supposed to change at runtime. All functions
    working with attribute_groups provided by <linux/sysfs.h> work with
    const attribute_group, so mark the non-const structs as const (see the
    sketch after this entry).

    File size before:
    text data bss dec hex filename
    1120 544 16 1680 690 kernel/ksysfs.o

    File size After adding 'const':
    text data bss dec hex filename
    1160 480 16 1656 678 kernel/ksysfs.o

    Link: http://lkml.kernel.org/r/aa224b3cc923fdbb3edd0c41b2c639c85408c9e8.1498737347.git.arvind.yadav.cs@gmail.com
    Signed-off-by: Arvind Yadav
    Acked-by: Kees Cook
    Cc: Russell King
    Cc: Dave Young
    Cc: Hari Bathini
    Cc: Petr Tesarik
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arvind Yadav
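
    A minimal sketch of the change pattern with illustrative names (not the
    actual kernel/ksysfs.c symbols); sysfs_create_group() already takes a
    const pointer, so callers need no change:

      #include <linux/sysfs.h>

      static struct attribute *example_attrs[] = {
              &example_attr.attr,     /* assumes a kobj_attribute 'example_attr' */
              NULL,
      };

      /* Was "static struct attribute_group ..."; the object now lands in
       * read-only data, which is where the text/data size shift comes from. */
      static const struct attribute_group example_attr_group = {
              .attrs = example_attrs,
      };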
     
  • The global variable 'rd_size' is declared as 'int' in source file
    arch/arm/kernel/atags_parse.c and as 'unsigned long' in
    drivers/block/brd.c. Fix this inconsistency.

    Additionally, remove the declarations of rd_image_start, rd_prompt and
    rd_doload from parse_tag_ramdisk() since these duplicate existing
    declarations in <linux/initrd.h>.

    Link: http://lkml.kernel.org/r/20170627065024.12347-1-bart.vanassche@wdc.com
    Signed-off-by: Bart Van Assche
    Acked-by: Russell King
    Cc: Jens Axboe
    Cc: Jan Kara
    Cc: Jason Yan
    Cc: Zhaohongjiang
    Cc: Miao Xie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bart Van Assche
     
  • Including <linux/bug.h> pulls in a lot of bloat from <asm/bug.h> and
    <asm-generic/bug.h> that is not needed to call the BUILD_BUG() family of
    macros. Split them out into their own header, <linux/build_bug.h> (see
    the usage sketch after this entry).

    Also correct some checkpatch.pl errors for the BUILD_BUG_ON_ZERO() and
    BUILD_BUG_ON_NULL() macros by adding parentheses around the bitfield
    widths that begin with a minus sign.

    Link: http://lkml.kernel.org/r/20170525120316.24473-6-abbotti@mev.co.uk
    Signed-off-by: Ian Abbott
    Acked-by: Michal Nazarewicz
    Acked-by: Kees Cook
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Jakub Kicinski
    Cc: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ian Abbott
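
    A small usage sketch, assuming the new header: code that only needs the
    compile-time assertions can include <linux/build_bug.h> directly instead
    of the heavier <linux/bug.h>:

      #include <linux/types.h>
      #include <linux/build_bug.h>    /* BUILD_BUG_ON() and friends */

      struct wire_hdr {
              u8  type;
              u8  flags;
              u16 len;
      };

      static inline void wire_hdr_checks(void)
      {
              /* Purely a compile-time check; generates no object code. */
              BUILD_BUG_ON(sizeof(struct wire_hdr) != 4);
      }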
     
  • Correct these checkpatch.pl errors:

    |ERROR: space required before that '-' (ctx:OxO)
    |#37: FILE: include/linux/bug.h:37:
    |+#define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int:-!!(e); }))

    |ERROR: space required before that '-' (ctx:OxO)
    |#38: FILE: include/linux/bug.h:38:
    |+#define BUILD_BUG_ON_NULL(e) ((void *)sizeof(struct { int:-!!(e); }))

    I decided to wrap the bitfield expressions that begin with minus signs
    in parentheses rather than insert spaces before the minus signs; the
    resulting macros are reproduced after this entry.

    Link: http://lkml.kernel.org/r/20170525120316.24473-5-abbotti@mev.co.uk
    Signed-off-by: Ian Abbott
    Acked-by: Michal Nazarewicz
    Cc: Kees Cook
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Jakub Kicinski
    Cc: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ian Abbott
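
    Reconstructed from the checkpatch output quoted above, the two macros end
    up with the bitfield widths parenthesized rather than with a space
    inserted before the minus sign:

      /* include/linux/bug.h after the change (reconstructed) */
      #define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int:(-!!(e)); }))
      #define BUILD_BUG_ON_NULL(e) ((void *)sizeof(struct { int:(-!!(e)); }))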
     
  • Correct this checkpatch.pl error:

    |ERROR: "(foo*)" should be "(foo *)"
    |#19: FILE: include/linux/bug.h:19:
    |+#define BUILD_BUG_ON_NULL(e) ((void*)0)

    Link: http://lkml.kernel.org/r/20170525120316.24473-4-abbotti@mev.co.uk
    Signed-off-by: Ian Abbott
    Acked-by: Michal Nazarewicz
    Cc: Kees Cook
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Jakub Kicinski
    Cc: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ian Abbott
     
  • Correct these checkpatch.pl warnings:

    |WARNING: Block comments use * on subsequent lines
    |#34: FILE: include/linux/bug.h:34:
    |+/* Force a compilation error if condition is true, but also produce a
    |+ result (of value 0 and type size_t), so the expression can be used

    |WARNING: Block comments use a trailing */ on a separate line
    |#36: FILE: include/linux/bug.h:36:
    |+ aren't permitted). */

    Link: http://lkml.kernel.org/r/20170525120316.24473-3-abbotti@mev.co.uk
    Signed-off-by: Ian Abbott
    Acked-by: Michal Nazarewicz
    Cc: Kees Cook
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Jakub Kicinski
    Cc: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ian Abbott
     
  • This series of patches splits BUILD_BUG related macros out of
    "include/linux/bug.h" into new file "include/linux/build_bug.h" (patch
    5), and changes the pointer type checking in the `container_of()` macro
    to deal with pointers of array type better (patch 6). Patches 1 to 4
    are prerequisites.

    Patches 2, 3, 4, and 5 have been inserted since the previous version of
    this patch series. Patch 6 here corresponds to v3 and v4's patch 2.

    Patch 1 was a prerequisite in v3 of this series to avoid a lot of
    warnings when was included by . That is
    no longer relevant for v5 of the series, but I left it in because it was
    acked by Arnd Bergmann and Michal Nazarewicz.

    Patches 2, 3, and 4 are some checkpatch clean-ups on
    "include/linux/bug.h" before splitting out the BUILD_BUG stuff in patch
    5.

    Patch 5 splits the BUILD_BUG related macros out of "include/linux/bug.h"
    into the new file "include/linux/build_bug.h" because including
    <linux/bug.h> in "include/linux/kernel.h" would result in build failures
    due to circular dependencies.

    Patch 6 changes the pointer type checking by `container_of()` to avoid
    some incompatible pointer warnings when the dereferenced pointer has
    array type.

    1) asm-generic/bug.h: declare struct pt_regs; before function prototype
    2) linux/bug.h: correct formatting of block comment
    3) linux/bug.h: correct "(foo*)" should be "(foo *)"
    4) linux/bug.h: correct "space required before that '-'"
    5) bug: split BUILD_BUG stuff out into <linux/build_bug.h>
    6) kernel.h: handle pointers to arrays better in container_of()

    This patch (of 6):

    The declaration of `__warn()` has `struct pt_regs *regs` as one of its
    parameters. This can result in compiler warnings if `struct pt_regs` is
    not already declared. Add an empty declaration of `struct pt_regs` to
    avoid the warnings (see the sketch after this entry).

    Link: http://lkml.kernel.org/r/20170525120316.24473-2-abbotti@mev.co.uk
    Signed-off-by: Ian Abbott
    Acked-by: Arnd Bergmann
    Acked-by: Michal Nazarewicz
    Cc: Arnd Bergmann
    Cc: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ian Abbott
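
    A sketch of the patch 1 fix; the __warn() prototype below is paraphrased
    from memory and may not match the tree exactly:

      /* include/asm-generic/bug.h (sketch) */
      struct pt_regs;         /* empty declarations: pointers to incomplete */
      struct warn_args;       /* types are fine in a prototype              */

      extern void __warn(const char *file, int line, void *caller,
                         unsigned taint, struct pt_regs *regs,
                         struct warn_args *args);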
     
  • The code can be much simplified by switching to ida_simple_get/remove
    (see the sketch after this entry).

    Link: http://lkml.kernel.org/r/8d1cc9f7-5115-c9dc-028e-c0770b6bfe1f@gmail.com
    Signed-off-by: Heiner Kallweit
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiner Kallweit
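
    The general before/after pattern with hypothetical names (the entry does
    not say which subsystem the patch touches):

      #include <linux/idr.h>
      #include <linux/gfp.h>

      static DEFINE_IDA(example_ida);

      /* Replaces an open-coded ida_pre_get()/ida_get_new()/retry loop. */
      static int example_alloc_id(void)
      {
              /* end == 0 means "no upper limit" */
              return ida_simple_get(&example_ida, 0, 0, GFP_KERNEL);
      }

      static void example_free_id(int id)
      {
              ida_simple_remove(&example_ida, id);
      }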
     
  • FRV supports 64-bit cmpxchg, which is provided by the arch code as
    __cmpxchg_64 and subsequently used to implement atomic64_cmpxchg.

    This patch hooks up the generic cmpxchg64 API using the same function,
    which also provides default definitions of the relaxed, acquire and
    release variants. This fixes the build when COMPILE_TEST=y and
    IOMMU_IO_PGTABLE_LPAE=y.

    Link: http://lkml.kernel.org/r/1499084670-6996-1-git-send-email-will.deacon@arm.com
    Signed-off-by: Will Deacon
    Reported-by: kbuild test robot
    Cc: Joerg Roedel
    Cc: Robin Murphy
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Will Deacon
     
  • The arch uses a verbatim copy of the asm-generic version and does not
    add any implementations of its own to the header, so use asm-generic/fb.h
    instead of duplicating code.

    Link: http://lkml.kernel.org/r/20170517083307.1697-1-tklauser@distanz.ch
    Signed-off-by: Tobias Klauser
    Reviewed-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tobias Klauser
     
  • frv's asm/device.h is merely including asm-generic/device.h. Thus, the
    arch specific header can be omitted and the generic header can be used
    directly.

    Link: http://lkml.kernel.org/r/20170517124915.26904-1-tklauser@distanz.ch
    Signed-off-by: Tobias Klauser
    Reviewed-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tobias Klauser
     
  • The helper function get_wild_bug_type() does not need to be in global
    scope, so make it static.

    Cleans up sparse warning:

    "symbol 'get_wild_bug_type' was not declared. Should it be static?"

    Link: http://lkml.kernel.org/r/20170622090049.10658-1-colin.king@canonical.com
    Signed-off-by: Colin Ian King
    Acked-by: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Alexander Potapenko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Colin Ian King
     
  • They return a positive value, that is, true, if a non-zero value is
    found. Rename them to reduce confusion.

    Link: http://lkml.kernel.org/r/20170516012350.GA16015@js1304-desktop
    Signed-off-by: Joonsoo Kim
    Cc: Andrey Ryabinin
    Cc: Alexander Potapenko
    Cc: Dmitry Vyukov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • KASAN doesn't work with memory hotplug because hotplugged memory doesn't
    have any shadow memory, so any access to hotplugged memory would cause a
    crash on the shadow check.

    Use a memory hotplug notifier to allocate and map shadow memory when the
    hotplugged memory is going online, and free the shadow after the memory
    is offlined (see the skeleton after this entry).

    Link: http://lkml.kernel.org/r/20170601162338.23540-4-aryabinin@virtuozzo.com
    Signed-off-by: Andrey Ryabinin
    Cc: "H. Peter Anvin"
    Cc: Alexander Potapenko
    Cc: Catalin Marinas
    Cc: Dmitry Vyukov
    Cc: Ingo Molnar
    Cc: Ingo Molnar
    Cc: Mark Rutland
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
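
    A skeleton of the notifier-based approach described above; the shadow
    allocation and freeing details are omitted and the function names are
    paraphrased:

      #include <linux/init.h>
      #include <linux/memory.h>
      #include <linux/memory_hotplug.h>
      #include <linux/notifier.h>

      static int kasan_mem_notifier(struct notifier_block *nb,
                                    unsigned long action, void *data)
      {
              /* 'data' is a struct memory_notify * carrying start_pfn/nr_pages */
              switch (action) {
              case MEM_GOING_ONLINE:
                      /* allocate and map shadow for the hot-added range */
                      break;
              case MEM_OFFLINE:
                      /* unmap and free the shadow of the offlined range */
                      break;
              }
              return NOTIFY_OK;
      }

      static int __init kasan_memhotplug_init(void)
      {
              hotplug_memory_notifier(kasan_mem_notifier, 0);
              return 0;
      }
      core_initcall(kasan_memhotplug_init);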
     
  • We used to read several bytes of the shadow memory in advance, so
    additional shadow memory was mapped to prevent a crash if a speculative
    load happened near the end of the mapped shadow memory.

    Now we don't have such speculative loads, so we no longer need to map the
    additional shadow memory.

    Link: http://lkml.kernel.org/r/20170601162338.23540-3-aryabinin@virtuozzo.com
    Signed-off-by: Andrey Ryabinin
    Acked-by: Mark Rutland
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: "H. Peter Anvin"
    Cc: Alexander Potapenko
    Cc: Dmitry Vyukov
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     
  • We used to read several bytes of the shadow memory in advance, so
    additional shadow memory was mapped to prevent a crash if a speculative
    load happened near the end of the mapped shadow memory.

    Now we don't have such speculative loads, so we no longer need to map the
    additional shadow memory.

    Link: http://lkml.kernel.org/r/20170601162338.23540-2-aryabinin@virtuozzo.com
    Signed-off-by: Andrey Ryabinin
    Cc: Mark Rutland
    Cc: "H. Peter Anvin"
    Cc: Alexander Potapenko
    Cc: Catalin Marinas
    Cc: Dmitry Vyukov
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     
  • For some unaligned memory accesses we have to check an additional byte of
    the shadow memory. Currently we load that byte speculatively to have
    only a single load + branch on the optimistic fast path.

    However, this approach has some downsides:

    - It's an unaligned access, so this prevents porting KASAN to
    architectures which don't support unaligned accesses.

    - We have to map an additional shadow page to prevent a crash if the
    speculative load happens near the end of the mapped memory. This would
    significantly complicate the upcoming memory hotplug support.

    I wasn't able to notice any performance degradation with this patch, so
    these speculative loads are just pain with no gain; let's remove them.

    Link: http://lkml.kernel.org/r/20170601162338.23540-1-aryabinin@virtuozzo.com
    Signed-off-by: Andrey Ryabinin
    Acked-by: Dmitry Vyukov
    Cc: Alexander Potapenko
    Cc: Mark Rutland
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     
  • There is a missing optimization in zero_p4d_populate() that can save some
    memory when mapping the zero shadow. Implement it like the others.

    Link: http://lkml.kernel.org/r/1494829255-23946-1-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Joonsoo Kim
    Acked-by: Andrey Ryabinin
    Cc: "Kirill A . Shutemov"
    Cc: Alexander Potapenko
    Cc: Dmitry Vyukov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • Commit 40f9fb8cffc6 ("mm/zsmalloc: support allocating obj with size of
    ZS_MAX_ALLOC_SIZE") fixes a size calculation error that prevented
    zsmalloc from allocating an object of the maximal size
    (ZS_MAX_ALLOC_SIZE). I think, however, that the fix is needlessly
    complicated.

    This patch replaces the dynamic calculation of zs_size_classes at init
    time by a compile time calculation that uses the DIV_ROUND_UP() macro
    already used in get_size_class_index().

    [akpm@linux-foundation.org: use min_t]
    Link: http://lkml.kernel.org/r/20170630114859.1979-1-jmarchan@redhat.com
    Signed-off-by: Jerome Marchand
    Acked-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Mahendran Ganesh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jerome Marchand
     
  • attribute_groups are not supposed to change at runtime. All functions
    working with attribute_groups provided by <linux/sysfs.h> work with
    const attribute_group. So mark the non-const structs as const.

    File size before:
    text data bss dec hex filename
    8293 841 4 9138 23b2 drivers/block/zram/zram_drv.o

    File size After adding 'const':
    text data bss dec hex filename
    8357 777 4 9138 23b2 drivers/block/zram/zram_drv.o

    Link: http://lkml.kernel.org/r/65680c1c4d85818f7094cbfa31c91bf28185ba1b.1499061182.git.arvind.yadav.cs@gmail.com
    Signed-off-by: Arvind Yadav
    Acked-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arvind Yadav
     
  • early_pfn_to_nid will return node 0 if both HAVE_ARCH_EARLY_PFN_TO_NID
    and HAVE_MEMBLOCK_NODE_MAP are disabled. It seems we are safe now
    because all architectures which support NUMA define one of them (with an
    exception of alpha, which however has CONFIG_NUMA marked as broken), so
    this works as expected. It can get silently and subtly broken too
    easily, though. Make sure we fail the compilation if NUMA is enabled and
    there is no proper implementation for this function. If that ever
    happens, we know that the specific configuration is invalid and the fix
    should either disable NUMA or enable one of the above configs.

    Link: http://lkml.kernel.org/r/20170704075803.15979-1-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Joonsoo Kim
    Cc: Yang Shi
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Andrey reported a potential deadlock with the memory hotplug lock and
    the cpu hotplug lock.

    The reason is that memory hotplug takes the memory hotplug lock and then
    calls stop_machine(), which calls get_online_cpus(). That's the reverse
    lock order to get_online_cpus(); get_online_mems(); in mm/slab_common.c.

    The problem has been there forever. The reason why this was never
    reported is that the cpu hotplug locking had this homebrew recursive
    reader-writer semaphore construct which, due to the recursion, evaded
    full lockdep coverage. The memory hotplug code copied that construct
    verbatim and therefore has similar issues.

    Three steps to fix this:

    1) Convert the memory hotplug locking to a per cpu rwsem so the
    potential issues get reported properly by lockdep.

    2) Lock the online cpus in mem_hotplug_begin() before taking the memory
    hotplug rwsem and use stop_machine_cpuslocked() in the page_alloc
    code to avoid recursive locking.

    3) The cpu hotplug locking in #2 causes a recursive locking of the cpu
    hotplug lock via __offline_pages() -> lru_add_drain_all(). Solve this
    by invoking lru_add_drain_all_cpuslocked() instead.

    Link: http://lkml.kernel.org/r/20170704093421.506836322@linutronix.de
    Reported-by: Andrey Ryabinin
    Signed-off-by: Thomas Gleixner
    Acked-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Vladimir Davydov
    Cc: Peter Zijlstra
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • The rework of the cpu hotplug locking unearthed potential deadlocks with
    the memory hotplug locking code.

    The solution for these is to rework the memory hotplug locking code as
    well and take the cpu hotplug lock before the memory hotplug lock in
    mem_hotplug_begin(), but this will cause a recursive locking of the cpu
    hotplug lock when the memory hotplug code calls lru_add_drain_all().

    Split out the inner workings of lru_add_drain_all() into
    lru_add_drain_all_cpuslocked() so this function can be invoked from the
    memory hotplug code with the cpu hotplug lock held (see the sketch after
    this entry).

    Link: http://lkml.kernel.org/r/20170704093421.419329357@linutronix.de
    Signed-off-by: Thomas Gleixner
    Reported-by: Andrey Ryabinin
    Acked-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Vladimir Davydov
    Cc: Peter Zijlstra
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
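
    A sketch of the split, using the get_online_cpus()/put_online_cpus() API
    of that era; the drain body itself is unchanged and omitted here:

      #include <linux/cpu.h>

      void lru_add_drain_all_cpuslocked(void)
      {
              /* ... schedule and wait for the per-CPU drain work; the caller
               * must already hold the cpu hotplug lock ... */
      }

      void lru_add_drain_all(void)
      {
              get_online_cpus();
              lru_add_drain_all_cpuslocked();
              put_online_cpus();
      }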
     
  • Use the rlimit() helper instead of manually writing the whole chain from
    the current task to rlim_cur (see the sketch after this entry).

    Link: http://lkml.kernel.org/r/20170705172811.8027-1-k.opasiak@samsung.com
    Signed-off-by: Krzysztof Opasiak
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Krzysztof Opasiak
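
    A minimal sketch of the pattern; RLIMIT_MEMLOCK is only an example, the
    entry does not say which limit the patch reads:

      #include <linux/sched/signal.h>         /* rlimit(), task_rlimit() */

      static bool within_limit(unsigned long want)
      {
              /* Before: want <= current->signal->rlim[RLIMIT_MEMLOCK].rlim_cur */
              return want <= rlimit(RLIMIT_MEMLOCK);
      }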
     
  • __list_lru_walk_one() acquires the nlru spin lock (nlru->lock) for a long
    time when there are many items in the lru list. As per the current code,
    it can hold the spin lock while processing up to UINT_MAX entries at a
    time. So if there are many items in the lru list, "BUG: spinlock lockup
    suspected" is observed in the below path:

    spin_bug+0x90
    do_raw_spin_lock+0xfc
    _raw_spin_lock+0x28
    list_lru_add+0x28
    dput+0x1c8
    path_put+0x20
    terminate_walk+0x3c
    path_lookupat+0x100
    filename_lookup+0x6c
    user_path_at_empty+0x54
    SyS_faccessat+0xd0
    el0_svc_naked+0x24

    This nlru->lock is acquired by another CPU in this path -

    d_lru_shrink_move+0x34
    dentry_lru_isolate_shrink+0x48
    __list_lru_walk_one.isra.10+0x94
    list_lru_walk_node+0x40
    shrink_dcache_sb+0x60
    do_remount_sb+0xbc
    do_emergency_remount+0xb0
    process_one_work+0x228
    worker_thread+0x2e0
    kthread+0xf4
    ret_from_fork+0x10

    Fix this lockup by reducing the number of entries shrunk from the lru
    list to 1024 at once, and add cond_resched() before processing the lru
    list again (see the sketch after this entry).

    Link: http://marc.info/?t=149722864900001&r=1&w=2
    Link: http://lkml.kernel.org/r/1498707575-2472-1-git-send-email-stummala@codeaurora.org
    Signed-off-by: Sahitya Tummala
    Suggested-by: Jan Kara
    Suggested-by: Vladimir Davydov
    Acked-by: Vladimir Davydov
    Cc: Alexander Polakov
    Cc: Al Viro
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sahitya Tummala
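
    A sketch of the batching pattern, paraphrased from memory as a fragment
    of fs/dcache.c:shrink_dcache_sb(); exact names and the loop condition may
    differ from the patch:

      unsigned long freed;

      /* Walk and shrink at most 1024 entries per pass, so nlru->lock is
       * dropped between passes, and give the scheduler a chance to run. */
      do {
              LIST_HEAD(dispose);

              freed = list_lru_walk(&sb->s_dentry_lru,
                                    dentry_lru_isolate_shrink, &dispose, 1024);

              shrink_dentry_list(&dispose);
              cond_resched();
      } while (freed > 0);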
     
  • list_lru_count_node() iterates over all memcgs to get the total number of
    entries on the node, but it can race with memcg_drain_all_list_lrus(),
    which migrates the entries from a dead cgroup to another. This can cause
    list_lru_count_node() to return an incorrect number of entries.

    Fix this by keeping track of entries per node and simply returning that
    count from list_lru_count_node().

    Link: http://lkml.kernel.org/r/1498707555-30525-1-git-send-email-stummala@codeaurora.org
    Signed-off-by: Sahitya Tummala
    Acked-by: Vladimir Davydov
    Cc: Jan Kara
    Cc: Alexander Polakov
    Cc: Al Viro
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sahitya Tummala
     
  • expand_stack(vma) fails if address < stack_guard_gap even if there is no
    vma->vm_prev. I don't think this makes sense, and we didn't do this
    before the recent commit 1be7107fbe18 ("mm: larger stack guard gap,
    between vmas").

    We do not need a gap in this case; any address is fine as long as
    security_mmap_addr() doesn't object.

    This also simplifies the code, we know that address >= prev->vm_end and
    thus underflow is not possible.

    Link: http://lkml.kernel.org/r/20170628175258.GA24881@redhat.com
    Signed-off-by: Oleg Nesterov
    Acked-by: Michal Hocko
    Cc: Hugh Dickins
    Cc: Larry Woodman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Commit 1be7107fbe18 ("mm: larger stack guard gap, between vmas") has
    introduced a regression in some Rust and Java environments which are
    trying to implement their own stack guard page. They are punching a new
    MAP_FIXED mapping inside the existing stack VMA.

    This will confuse expand_{downwards,upwards} into thinking that the
    stack expansion would in fact get us too close to an existing non-stack
    vma, which is the correct behavior wrt safety. It is a real regression
    on the other hand.

    Let's work around the problem by considering a PROT_NONE mapping as part
    of the stack. This is a gross hack, but overflowing into such a mapping
    would trap anyway, and we can only hope that userspace knows what it is
    doing and handles it properly.

    Fixes: 1be7107fbe18 ("mm: larger stack guard gap, between vmas")
    Link: http://lkml.kernel.org/r/20170705182849.GA18027@dhcp22.suse.cz
    Signed-off-by: Michal Hocko
    Debugged-by: Vlastimil Babka
    Cc: Ben Hutchings
    Cc: Willy Tarreau
    Cc: Oleg Nesterov
    Cc: Rik van Riel
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Presently, pages in the balloon device hold random values, and these
    pages will be scanned by ksmd on the host. They usually cannot be
    merged. Enqueueing zeroed pages resolves this problem.

    Link: http://lkml.kernel.org/r/1498698637-26389-1-git-send-email-zhenwei.pi@youruncloud.com
    Signed-off-by: zhenwei.pi
    Cc: Gioh Kim
    Cc: Vlastimil Babka
    Cc: Minchan Kim
    Cc: Konstantin Khlebnikov
    Cc: Rafael Aquini
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    zhenwei.pi
     
  • The align_offset parameter is used by bitmap_find_next_zero_area_off()
    to represent the offset of map's base from the previous alignment
    boundary; the function ensures that the returned index, plus the
    align_offset, honors the specified align_mask.

    The logic introduced by commit b5be83e308f7 ("mm: cma: align to physical
    address, not CMA region position") has the cma driver calculate the
    offset to the *next* alignment boundary. In most cases, the base
    alignment is greater than that specified when making allocations,
    resulting in a zero offset whether we align up or down. In the example
    given with the commit, the base alignment (8MB) was half the requested
    alignment (16MB) so the math also happened to work since the offset is
    8MB in both directions. However, when requesting allocations with an
    alignment greater than twice that of the base, the returned index would
    not be correctly aligned (see the numeric sketch after this entry).

    Also, the align_order arguments of cma_bitmap_aligned_mask() and
    cma_bitmap_aligned_offset() should not be negative, so the argument type
    was made unsigned.

    Fixes: b5be83e308f7 ("mm: cma: align to physical address, not CMA region position")
    Link: http://lkml.kernel.org/r/20170628170742.2895-1-opendmb@gmail.com
    Signed-off-by: Angus Clark
    Signed-off-by: Doug Berger
    Acked-by: Gregory Fong
    Cc: Doug Berger
    Cc: Angus Clark
    Cc: Laura Abbott
    Cc: Vlastimil Babka
    Cc: Greg Kroah-Hartman
    Cc: Lucas Stach
    Cc: Catalin Marinas
    Cc: Shiraz Hashim
    Cc: Jaewon Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Doug Berger
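
    A purely numeric illustration of the offset math described above (not
    the cma driver code itself):

      /* Region base sits on an 8MB boundary; the allocation asks for 32MB
       * alignment, i.e. more than twice the base alignment. */
      unsigned long base  = 8UL  << 20;               /* 0x00800000 */
      unsigned long align = 32UL << 20;               /* 0x02000000 */

      unsigned long off_prev =  base & (align - 1);   /*  8MB: offset from the
                                                          previous boundary */
      unsigned long off_next = -base & (align - 1);   /* 24MB: offset to the
                                                          next boundary */

      /* For a 16MB alignment both offsets come out as 8MB, which is why the
       * "align to the next boundary" math happened to work; for 32MB they
       * differ, and using off_next yields a misaligned allocation. */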
     
  • __remove_zone() sets up zone_type, but never uses it for anything. This
    does not cause a warning, due to the (necessary) use of
    -Wno-unused-but-set-variable. However, it's noise, so just delete it.

    Link: http://lkml.kernel.org/r/20170624043421.24465-2-jhubbard@nvidia.com
    Signed-off-by: John Hubbard
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John Hubbard
     
  • It seems that there are still people using 32-bit kernels with a lot of
    memory, and IO tends to suck a lot for them by default, mostly because
    writers are throttled too when the lowmem is used. We have
    highmem_is_dirtyable to work around that issue, but it seems we never
    bothered to document it. Let's do it now, finally.

    Link: http://lkml.kernel.org/r/20170626093200.18958-1-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Johannes Weiner
    Cc: Alkis Georgopoulos
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • wb_stat_sum() disables interrupts and calls __wb_stat_sum(), which
    eventually calls __percpu_counter_sum(). However, the percpu routine is
    already irq-safe. Simplify the code a bit by making wb_stat_sum()
    directly call percpu_counter_sum_positive() without disabling interrupts.

    Also remove the now-unneeded __wb_stat_sum(), which was just a wrapper
    over percpu_counter_sum_positive() (see the sketch after this entry).

    Link: http://lkml.kernel.org/r/1498230681-29103-1-git-send-email-nborisov@suse.com
    Signed-off-by: Nikolay Borisov
    Acked-by: Peter Zijlstra
    Cc: Tejun Heo
    Cc: Jan Kara
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nikolay Borisov
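
    Roughly what the helper reduces to (paraphrased; the stat array lives in
    struct bdi_writeback in include/linux/backing-dev-defs.h):

      static inline s64 wb_stat_sum(struct bdi_writeback *wb,
                                    enum wb_stat_item item)
      {
              /* percpu_counter_sum_positive() is already irq-safe, so no
               * local_irq_save()/restore() and no __wb_stat_sum() wrapper. */
              return percpu_counter_sum_positive(&wb->stat[item]);
      }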
     
  • Currently pg_data_t is just a struct which describes a NUMA node memory
    layout. Let's keep the comment simple and remove ambiguity.

    Link: http://lkml.kernel.org/r/1498220534-22717-1-git-send-email-nborisov@suse.com
    Signed-off-by: Nikolay Borisov
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nikolay Borisov
     
  • get_cpu_var() disables preemption and returns the per-CPU version of the
    variable. Disabling preemption is useful to ensure atomic access to the
    variable within the critical section.

    In this case, however, ->free_lock is acquired after the per-CPU version
    of the variable is obtained, so the raw accessor can be used instead. The
    only caveat is that ->slots_ret should be re-tested under the lock,
    because without disabled preemption it can now be set to NULL in the
    meantime (see the sketch after this entry).

    This popped up during PREEMPT-RT testing because it tries to take
    spinlocks in a preempt disabled section. In RT, spinlocks can sleep.

    Link: http://lkml.kernel.org/r/20170623114755.2ebxdysacvgxzott@linutronix.de
    Signed-off-by: Sebastian Andrzej Siewior
    Acked-by: Michal Hocko
    Cc: Tim Chen
    Cc: Thomas Gleixner
    Cc: Ying Huang
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sebastian Andrzej Siewior
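
    The pattern described above, with names paraphrased from mm/swap_slots.c
    (treat them as illustrative):

      struct swap_slots_cache *cache;

      /* No get_cpu_var(): the spinlock, not disabled preemption, protects
       * the cache. We may migrate CPUs before taking the lock, which is
       * fine; we simply operate on whichever per-CPU cache we grabbed. */
      cache = raw_cpu_ptr(&swp_slots);

      spin_lock_irq(&cache->free_lock);
      if (cache->slots_ret) {
              /* Re-test under the lock: with preemption enabled this may
               * have changed since the check done before locking. */
              /* ... return the swap slot to the cache ... */
      }
      spin_unlock_irq(&cache->free_lock);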
     
  • Since current_order starts as MAX_ORDER-1 and is then only decremented,
    the second half of the loop condition seems superfluous. However, if
    order is 0, we may decrement current_order past 0, making it UINT_MAX.
    This is obviously too subtle ([1], [2]).

    Since we need to add some comment anyway, change the two variables to
    signed, making the counting-down for loop look more familiar, and
    apparently also making gcc generate slightly smaller code (see the
    sketch after this entry).

    [1] https://lkml.org/lkml/2016/6/20/493
    [2] https://lkml.org/lkml/2017/6/19/345

    [akpm@linux-foundation.org: fix up reject fixupping]
    Link: http://lkml.kernel.org/r/20170621185529.2265-1-linux@rasmusvillemoes.dk
    Signed-off-by: Rasmus Villemoes
    Reported-by: Hao Lee
    Acked-by: Wei Yang
    Acked-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rasmus Villemoes
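
    The resulting loop shape, simplified (not the literal mm/page_alloc.c
    hunk); both 'order' and 'current_order' are signed here:

      int current_order;

      /* Counting down with signed types: when order == 0 the loop stops at
       * current_order == -1 instead of wrapping to UINT_MAX, so the extra
       * "current_order <= MAX_ORDER - 1" guard is no longer needed. */
      for (current_order = MAX_ORDER - 1; current_order >= order;
           --current_order) {
              /* ... try to steal a pageblock of this order ... */
      }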
     
  • After commit 1be7107fbe18 ("mm: larger stack guard gap, between vmas")
    we no longer hide the stack guard page in /proc/<pid>/maps.

    Link: http://lkml.kernel.org/r/211f3c2a-f7ef-7c13-82bf-46fd426f6e1b@virtuozzo.com
    Signed-off-by: Vasily Averin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasily Averin
     
  • __register_one_node() initializes the local parameters "p_node" and
    "parent" for register_node().

    But register_node() does not use them.

    Remove the related "parent" node code and clean up __register_one_node()
    and register_node().

    Link: http://lkml.kernel.org/r/1498013846-20149-1-git-send-email-douly.fnst@cn.fujitsu.com
    Signed-off-by: Dou Liyang
    Acked-by: David Rientjes
    Acked-by: Michal Hocko
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dou Liyang