24 Aug, 2018

1 commit

  • [ Upstream commit 1e8e18f694a52d703665012ca486826f64bac29d ]

    There is a special case where the size is "(N << KASAN_SHADOW_SCALE_SHIFT)
    pages plus X", with X in [1, KASAN_SHADOW_SCALE_SIZE-1]. The operation
    "size >> KASAN_SHADOW_SCALE_SHIFT" drops X, and the subsequent roundup
    cannot recover the missing page. For example: with size=0x28006,
    PAGE_SIZE=0x1000 and KASAN_SHADOW_SCALE_SHIFT=3 we get shadow_size=0x5000,
    but we actually need 6 pages.

    shadow_size = round_up(size >> KASAN_SHADOW_SCALE_SHIFT, PAGE_SIZE);

    This can lead to a kernel crash when kasan is enabled and
    mod->core_layout.size or mod->init_layout.size has such a value, because
    the shadow memory covering X has not been allocated and mapped.

    move_module:
    ptr = module_alloc(mod->core_layout.size);
    ...
    memset(ptr, 0, mod->core_layout.size); //crashed

    Unable to handle kernel paging request at virtual address ffff0fffff97b000
    ......
    Call trace:
    __asan_storeN+0x174/0x1a8
    memset+0x24/0x48
    layout_and_allocate+0xcd8/0x1800
    load_module+0x190/0x23e8
    SyS_finit_module+0x148/0x180

    Link: http://lkml.kernel.org/r/1529659626-12660-1-git-send-email-thunder.leizhen@huawei.com
    Signed-off-by: Zhen Lei
    Reviewed-by: Dmitriy Vyukov
    Acked-by: Andrey Ryabinin
    Cc: Alexander Potapenko
    Cc: Hanjun Guo
    Cc: Libin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Zhen Lei
     

30 May, 2018

3 commits

  • commit 3f1959721558a976aaf9c2024d5bc884e6411bf7 upstream.

    Using module_init() is wrong: e.g. ACPI can add and online memory before
    our memory notifier gets registered.

    This makes sure that ACPI memory detected during boot will not result in
    a kernel crash.

    Easily reproducible with QEMU, just specify a DIMM when starting up.

    Link: http://lkml.kernel.org/r/20180522100756.18478-3-david@redhat.com
    Fixes: 786a8959912e ("kasan: disable memory hotplug")
    Signed-off-by: David Hildenbrand
    Acked-by: Andrey Ryabinin
    Cc: Alexander Potapenko
    Cc: Dmitry Vyukov
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    David Hildenbrand
     
  • commit ed1596f9ab958dd156a66c9ff1029d3761c1786a upstream.

    We have to free memory again when we cancel onlining, otherwise a later
    onlining attempt will fail.

    Link: http://lkml.kernel.org/r/20180522100756.18478-2-david@redhat.com
    Fixes: fa69b5989bb0 ("mm/kasan: add support for memory hotplug")
    Signed-off-by: David Hildenbrand
    Acked-by: Andrey Ryabinin
    Cc: Alexander Potapenko
    Cc: Dmitry Vyukov
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    David Hildenbrand
     
  • commit 0f901dcbc31f88ae41a2aaa365f7802b5d520a28 upstream.

    KASAN uses different routines to map shadow for hot-added memory and for
    memory obtained during boot. An attempt to offline memory onlined by the
    normal boot process leads to this:

    Trying to vfree() nonexistent vm area (000000005d3b34b9)
    WARNING: CPU: 2 PID: 13215 at mm/vmalloc.c:1525 __vunmap+0x147/0x190

    Call Trace:
    kasan_mem_notifier+0xad/0xb9
    notifier_call_chain+0x166/0x260
    __blocking_notifier_call_chain+0xdb/0x140
    __offline_pages+0x96a/0xb10
    memory_subsys_offline+0x76/0xc0
    device_offline+0xb8/0x120
    store_mem_state+0xfa/0x120
    kernfs_fop_write+0x1d5/0x320
    __vfs_write+0xd4/0x530
    vfs_write+0x105/0x340
    SyS_write+0xb0/0x140

    Obviously we can't call vfree() to free memory that wasn't allocated via
    vmalloc(). Use find_vm_area() to check whether we can call vfree().

    Unfortunately it's a bit tricky to properly unmap and free shadow
    allocated during boot, so we have to keep it. If the memory comes online
    again, that shadow will be reused.

    Matthew asked: how can you call vfree() on something that isn't a
    vmalloc address?

    vfree() is able to free any address returned by __vmalloc_node_range(),
    and __vmalloc_node_range() gives you any address you ask for. It doesn't
    have to be an address in the [VMALLOC_START, VMALLOC_END] range.

    That's also how module_alloc()/module_memfree() work on architectures
    that have a designated area for modules.

    [aryabinin@virtuozzo.com: improve comments]
    Link: http://lkml.kernel.org/r/dabee6ab-3a7a-51cd-3b86-5468718e0390@virtuozzo.com
    [akpm@linux-foundation.org: fix typos, reflow comment]
    Link: http://lkml.kernel.org/r/20180201163349.8700-1-aryabinin@virtuozzo.com
    Fixes: fa69b5989bb0 ("mm/kasan: add support for memory hotplug")
    Signed-off-by: Andrey Ryabinin
    Reported-by: Paul Menzel
    Cc: Alexander Potapenko
    Cc: Dmitry Vyukov
    Cc: Matthew Wilcox
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Andrey Ryabinin
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boilerplate text.
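
    For a C source file the identifier is a single comment on the first
    line; headers conventionally use the block-comment form:

```c
// SPDX-License-Identifier: GPL-2.0
/* SPDX-License-Identifier: GPL-2.0 */
```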

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to apply to a
    file was done in a spreadsheet of side-by-side results from the output of
    two independent scanners (ScanCode & Windriver) producing SPDX tag:value
    files, created by Philippe Ombredanne. Philippe prepared the base
    worksheet and did an initial spot review of a few thousand files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

10 Aug, 2017

1 commit


03 Aug, 2017

1 commit

  • gcc-7 produces this warning:

    mm/kasan/report.c: In function 'kasan_report':
    mm/kasan/report.c:351:3: error: 'info.first_bad_addr' may be used uninitialized in this function [-Werror=maybe-uninitialized]
    print_shadow_for_address(info->first_bad_addr);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    mm/kasan/report.c:360:27: note: 'info.first_bad_addr' was declared here

    The code seems fine: we only print info.first_bad_addr when there is a
    shadow, and we always initialize it in that case, but this is relatively
    hard for gcc to figure out after the latest rework.

    Adding an initialization to the most likely value together with the other
    struct members shuts up that warning.

    Fixes: b235b9808664 ("kasan: unify report headers")
    Link: https://patchwork.kernel.org/patch/9641417/
    Link: http://lkml.kernel.org/r/20170725152739.4176967-1-arnd@arndb.de
    Signed-off-by: Arnd Bergmann
    Suggested-by: Alexander Potapenko
    Suggested-by: Andrey Ryabinin
    Acked-by: Andrey Ryabinin
    Cc: Dmitry Vyukov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
     

26 Jul, 2017

1 commit

    Currently kasan_check_read/write() accept 'const void *'; make them
    accept 'const volatile void *'. This is required for instrumentation
    of atomic operations, and there is just no reason not to allow that.

    Signed-off-by: Dmitry Vyukov
    Reviewed-by: Andrey Ryabinin
    Acked-by: Mark Rutland
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: kasan-dev@googlegroups.com
    Cc: linux-mm@kvack.org
    Cc: will.deacon@arm.com
    Link: http://lkml.kernel.org/r/33e5ec275c1ee89299245b2ebbccd63709c6021f.1498140838.git.dvyukov@google.com
    Signed-off-by: Ingo Molnar

    Dmitry Vyukov
     

11 Jul, 2017

5 commits

  • The helper function get_wild_bug_type() does not need to be in global
    scope, so make it static.

    Cleans up sparse warning:

    "symbol 'get_wild_bug_type' was not declared. Should it be static?"

    Link: http://lkml.kernel.org/r/20170622090049.10658-1-colin.king@canonical.com
    Signed-off-by: Colin Ian King
    Acked-by: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Alexander Potapenko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Colin Ian King
     
    They return a positive value, that is, true, if a non-zero value is
    found. Rename them to reduce confusion.

    Link: http://lkml.kernel.org/r/20170516012350.GA16015@js1304-desktop
    Signed-off-by: Joonsoo Kim
    Cc: Andrey Ryabinin
    Cc: Alexander Potapenko
    Cc: Dmitry Vyukov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
    KASAN doesn't currently work with memory hotplug because hotplugged
    memory doesn't have any shadow memory, so any access to hotplugged
    memory would cause a crash on the shadow check.

    Use a memory hotplug notifier to allocate and map shadow memory when the
    hotplugged memory is going online, and free the shadow after the memory
    has been offlined.

    Link: http://lkml.kernel.org/r/20170601162338.23540-4-aryabinin@virtuozzo.com
    Signed-off-by: Andrey Ryabinin
    Cc: "H. Peter Anvin"
    Cc: Alexander Potapenko
    Cc: Catalin Marinas
    Cc: Dmitry Vyukov
    Cc: Ingo Molnar
    Cc: Ingo Molnar
    Cc: Mark Rutland
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     
  • For some unaligned memory accesses we have to check additional byte of
    the shadow memory. Currently we load that byte speculatively to have
    only single load + branch on the optimistic fast path.

    However, this approach has some downsides:

    - It's an unaligned access, which prevents porting KASAN to
    architectures that don't support unaligned accesses.

    - We have to map an additional shadow page to prevent a crash if the
    speculative load happens near the end of the mapped memory. This would
    significantly complicate upcoming memory hotplug support.

    I wasn't able to notice any performance degradation with this patch, so
    these speculative loads are just pain with no gain; let's remove them.

    Link: http://lkml.kernel.org/r/20170601162338.23540-1-aryabinin@virtuozzo.com
    Signed-off-by: Andrey Ryabinin
    Acked-by: Dmitry Vyukov
    Cc: Alexander Potapenko
    Cc: Mark Rutland
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     
    There is a missing optimization in zero_p4d_populate() that can save
    some memory when mapping the zero shadow. Implement it like the others.

    Link: http://lkml.kernel.org/r/1494829255-23946-1-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Joonsoo Kim
    Acked-by: Andrey Ryabinin
    Cc: "Kirill A . Shutemov"
    Cc: Alexander Potapenko
    Cc: Dmitry Vyukov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

11 May, 2017

1 commit

  • Pull RCU updates from Ingo Molnar:
    "The main changes are:

    - Debloat RCU headers

    - Parallelize SRCU callback handling (plus overlapping patches)

    - Improve the performance of Tree SRCU on a CPU-hotplug stress test

    - Documentation updates

    - Miscellaneous fixes"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
    rcu: Open-code the rcu_cblist_n_lazy_cbs() function
    rcu: Open-code the rcu_cblist_n_cbs() function
    rcu: Open-code the rcu_cblist_empty() function
    rcu: Separately compile large rcu_segcblist functions
    srcu: Debloat the header
    srcu: Adjust default auto-expediting holdoff
    srcu: Specify auto-expedite holdoff time
    srcu: Expedite first synchronize_srcu() when idle
    srcu: Expedited grace periods with reduced memory contention
    srcu: Make rcutorture writer stalls print SRCU GP state
    srcu: Exact tracking of srcu_data structures containing callbacks
    srcu: Make SRCU be built by default
    srcu: Fix Kconfig botch when SRCU not selected
    rcu: Make non-preemptive schedule be Tasks RCU quiescent state
    srcu: Expedite srcu_schedule_cbs_snp() callback invocation
    srcu: Parallelize callback handling
    kvm: Move srcu_struct fields to end of struct kvm
    rcu: Fix typo in PER_RCU_NODE_PERIOD header comment
    rcu: Use true/false in assignment to bool
    rcu: Use bool value directly
    ...

    Linus Torvalds
     

09 May, 2017

1 commit

    __vmalloc* allows users to provide gfp flags for the underlying
    allocation. This API is quite popular:

    $ git grep "=[[:space:]]__vmalloc\|return[[:space:]]*__vmalloc" | wc -l
    77

    The only problem is that many people are not aware that they really want
    to pass __GFP_HIGHMEM along with the other flags, because there is really
    no reason to consume precious lowmem on CONFIG_HIGHMEM systems for pages
    which are mapped into the kernel vmalloc space. About half of the users
    don't use this flag, though. This signals that we have made the API
    unnecessarily complex.

    This patch simply uses __GFP_HIGHMEM implicitly when allocating pages to
    be mapped to the vmalloc space. Current users which add __GFP_HIGHMEM
    are simplified and drop the flag.

    Link: http://lkml.kernel.org/r/20170307141020.29107-1-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Reviewed-by: Matthew Wilcox
    Cc: Al Viro
    Cc: Vlastimil Babka
    Cc: David Rientjes
    Cc: Cristopher Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

04 May, 2017

9 commits

  • Makes the report easier to read.

    Link: http://lkml.kernel.org/r/20170302134851.101218-10-andreyknvl@google.com
    Signed-off-by: Andrey Konovalov
    Acked-by: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Alexander Potapenko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Konovalov
     
  • Changes double-free report header from

    BUG: Double free or freeing an invalid pointer
    Unexpected shadow byte: 0xFB

    to

    BUG: KASAN: double-free or invalid-free in kmalloc_oob_left+0xe5/0xef

    This makes a bug uniquely identifiable by the first report line. To
    account for the removal of the unexpected shadow value, print shadow
    bytes at the end of the report, as in reports for other kinds of bugs.

    Link: http://lkml.kernel.org/r/20170302134851.101218-9-andreyknvl@google.com
    Signed-off-by: Andrey Konovalov
    Acked-by: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Alexander Potapenko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Konovalov
     
  • Moves page description after the stacks since it's less important.

    Link: http://lkml.kernel.org/r/20170302134851.101218-8-andreyknvl@google.com
    Signed-off-by: Andrey Konovalov
    Acked-by: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Alexander Potapenko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Konovalov
     
  • Changes slab object description from:

    Object at ffff880068388540, in cache kmalloc-128 size: 128

    to:

    The buggy address belongs to the object at ffff880068388540
    which belongs to the cache kmalloc-128 of size 128
    The buggy address is located 123 bytes inside of
    128-byte region [ffff880068388540, ffff8800683885c0)

    Makes it more explanatory and adds information about the offset of the
    accessed address relative to the start of the object.

    Link: http://lkml.kernel.org/r/20170302134851.101218-7-andreyknvl@google.com
    Signed-off-by: Andrey Konovalov
    Acked-by: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Alexander Potapenko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Konovalov
     
  • Change report header format from:

    BUG: KASAN: use-after-free in unwind_get_return_address+0x28a/0x2c0 at addr ffff880069437950
    Read of size 8 by task insmod/3925

    to:

    BUG: KASAN: use-after-free in unwind_get_return_address+0x28a/0x2c0
    Read of size 8 at addr ffff880069437950 by task insmod/3925

    The exact access address is not usually important, so move it to the
    second line. This also makes the header look visually balanced.

    Link: http://lkml.kernel.org/r/20170302134851.101218-6-andreyknvl@google.com
    Signed-off-by: Andrey Konovalov
    Acked-by: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Alexander Potapenko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Konovalov
     
  • Simplify logic for describing a memory address. Add addr_to_page()
    helper function.

    Makes the code easier to follow.

    Link: http://lkml.kernel.org/r/20170302134851.101218-5-andreyknvl@google.com
    Signed-off-by: Andrey Konovalov
    Acked-by: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Alexander Potapenko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Konovalov
     
  • Change stack traces headers from:

    Allocated:
    PID = 42

    to:

    Allocated by task 42:

    Makes the report one line shorter and look better.

    Link: http://lkml.kernel.org/r/20170302134851.101218-4-andreyknvl@google.com
    Signed-off-by: Andrey Konovalov
    Acked-by: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Alexander Potapenko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Konovalov
     
  • Unify KASAN report header format for different kinds of bad memory
    accesses. Makes the code simpler.

    Link: http://lkml.kernel.org/r/20170302134851.101218-3-andreyknvl@google.com
    Signed-off-by: Andrey Konovalov
    Acked-by: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Alexander Potapenko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Konovalov
     
  • Patch series "kasan: improve error reports", v2.

    This patchset improves KASAN reports by making them easier to read and a
    little more detailed. It also improves the readability of
    mm/kasan/report.c.

    Effectively changes a use-after-free report to:

    ==================================================================
    BUG: KASAN: use-after-free in kmalloc_uaf+0xaa/0xb6 [test_kasan]
    Write of size 1 at addr ffff88006aa59da8 by task insmod/3951

    CPU: 1 PID: 3951 Comm: insmod Tainted: G B 4.10.0+ #84
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    Call Trace:
    dump_stack+0x292/0x398
    print_address_description+0x73/0x280
    kasan_report.part.2+0x207/0x2f0
    __asan_report_store1_noabort+0x2c/0x30
    kmalloc_uaf+0xaa/0xb6 [test_kasan]
    kmalloc_tests_init+0x4f/0xa48 [test_kasan]
    do_one_initcall+0xf3/0x390
    do_init_module+0x215/0x5d0
    load_module+0x54de/0x82b0
    SYSC_init_module+0x3be/0x430
    SyS_init_module+0x9/0x10
    entry_SYSCALL_64_fastpath+0x1f/0xc2
    RIP: 0033:0x7f22cfd0b9da
    RSP: 002b:00007ffe69118a78 EFLAGS: 00000206 ORIG_RAX: 00000000000000af
    RAX: ffffffffffffffda RBX: 0000555671242090 RCX: 00007f22cfd0b9da
    RDX: 00007f22cffcaf88 RSI: 000000000004df7e RDI: 00007f22d0399000
    RBP: 00007f22cffcaf88 R08: 0000000000000003 R09: 0000000000000000
    R10: 00007f22cfd07d0a R11: 0000000000000206 R12: 0000555671243190
    R13: 000000000001fe81 R14: 0000000000000000 R15: 0000000000000004

    Allocated by task 3951:
    save_stack_trace+0x16/0x20
    save_stack+0x43/0xd0
    kasan_kmalloc+0xad/0xe0
    kmem_cache_alloc_trace+0x82/0x270
    kmalloc_uaf+0x56/0xb6 [test_kasan]
    kmalloc_tests_init+0x4f/0xa48 [test_kasan]
    do_one_initcall+0xf3/0x390
    do_init_module+0x215/0x5d0
    load_module+0x54de/0x82b0
    SYSC_init_module+0x3be/0x430
    SyS_init_module+0x9/0x10
    entry_SYSCALL_64_fastpath+0x1f/0xc2

    Freed by task 3951:
    save_stack_trace+0x16/0x20
    save_stack+0x43/0xd0
    kasan_slab_free+0x72/0xc0
    kfree+0xe8/0x2b0
    kmalloc_uaf+0x85/0xb6 [test_kasan]
    kmalloc_tests_init+0x4f/0xa48 [test_kasan]
    do_one_initcall+0xf3/0x390
    do_init_module+0x215/0x5d0
    load_module+0x54de/0x82b0
    SYSC_init_module+0x3be/0x430
    SyS_init_module+0x9/0x10
    entry_SYSCALL_64_fastpath+0x1f/0xc

    The buggy address belongs to the object at ffff88006aa59da0
    which belongs to the cache kmalloc-16 of size 16
    The buggy address is located 8 bytes inside of
    16-byte region [ffff88006aa59da0, ffff88006aa59db0)
    The buggy address belongs to the page:
    page:ffffea0001aa9640 count:1 mapcount:0 mapping: (null) index:0x0
    flags: 0x100000000000100(slab)
    raw: 0100000000000100 0000000000000000 0000000000000000 0000000180800080
    raw: ffffea0001abe380 0000000700000007 ffff88006c401b40 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
    ffff88006aa59c80: 00 00 fc fc 00 00 fc fc 00 00 fc fc 00 00 fc fc
    ffff88006aa59d00: 00 00 fc fc 00 00 fc fc 00 00 fc fc 00 00 fc fc
    >ffff88006aa59d80: fb fb fc fc fb fb fc fc fb fb fc fc fb fb fc fc
    ^
    ffff88006aa59e00: fb fb fc fc fb fb fc fc fb fb fc fc fb fb fc fc
    ffff88006aa59e80: fb fb fc fc 00 00 fc fc 00 00 fc fc 00 00 fc fc
    ==================================================================

    from:

    ==================================================================
    BUG: KASAN: use-after-free in kmalloc_uaf+0xaa/0xb6 [test_kasan] at addr ffff88006c4dcb28
    Write of size 1 by task insmod/3984
    CPU: 1 PID: 3984 Comm: insmod Tainted: G B 4.10.0+ #83
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    Call Trace:
    dump_stack+0x292/0x398
    kasan_object_err+0x1c/0x70
    kasan_report.part.1+0x20e/0x4e0
    __asan_report_store1_noabort+0x2c/0x30
    kmalloc_uaf+0xaa/0xb6 [test_kasan]
    kmalloc_tests_init+0x4f/0xa48 [test_kasan]
    do_one_initcall+0xf3/0x390
    do_init_module+0x215/0x5d0
    load_module+0x54de/0x82b0
    SYSC_init_module+0x3be/0x430
    SyS_init_module+0x9/0x10
    entry_SYSCALL_64_fastpath+0x1f/0xc2
    RIP: 0033:0x7feca0f779da
    RSP: 002b:00007ffdfeae5218 EFLAGS: 00000206 ORIG_RAX: 00000000000000af
    RAX: ffffffffffffffda RBX: 000055a064c13090 RCX: 00007feca0f779da
    RDX: 00007feca1236f88 RSI: 000000000004df7e RDI: 00007feca1605000
    RBP: 00007feca1236f88 R08: 0000000000000003 R09: 0000000000000000
    R10: 00007feca0f73d0a R11: 0000000000000206 R12: 000055a064c14190
    R13: 000000000001fe81 R14: 0000000000000000 R15: 0000000000000004
    Object at ffff88006c4dcb20, in cache kmalloc-16 size: 16
    Allocated:
    PID = 3984
    save_stack_trace+0x16/0x20
    save_stack+0x43/0xd0
    kasan_kmalloc+0xad/0xe0
    kmem_cache_alloc_trace+0x82/0x270
    kmalloc_uaf+0x56/0xb6 [test_kasan]
    kmalloc_tests_init+0x4f/0xa48 [test_kasan]
    do_one_initcall+0xf3/0x390
    do_init_module+0x215/0x5d0
    load_module+0x54de/0x82b0
    SYSC_init_module+0x3be/0x430
    SyS_init_module+0x9/0x10
    entry_SYSCALL_64_fastpath+0x1f/0xc2
    Freed:
    PID = 3984
    save_stack_trace+0x16/0x20
    save_stack+0x43/0xd0
    kasan_slab_free+0x73/0xc0
    kfree+0xe8/0x2b0
    kmalloc_uaf+0x85/0xb6 [test_kasan]
    kmalloc_tests_init+0x4f/0xa48 [test_kasan]
    do_one_initcall+0xf3/0x390
    do_init_module+0x215/0x5d0
    load_module+0x54de/0x82b0
    SYSC_init_module+0x3be/0x430
    SyS_init_module+0x9/0x10
    entry_SYSCALL_64_fastpath+0x1f/0xc2
    Memory state around the buggy address:
    ffff88006c4dca00: fb fb fc fc fb fb fc fc fb fb fc fc fb fb fc fc
    ffff88006c4dca80: fb fb fc fc fb fb fc fc fb fb fc fc fb fb fc fc
    >ffff88006c4dcb00: fb fb fc fc fb fb fc fc fb fb fc fc fb fb fc fc
    ^
    ffff88006c4dcb80: fb fb fc fc 00 00 fc fc fb fb fc fc fb fb fc fc
    ffff88006c4dcc00: fb fb fc fc fb fb fc fc fb fb fc fc fb fb fc fc
    ==================================================================

    This patch (of 9):

    Introduce get_shadow_bug_type() function, which determines bug type
    based on the shadow value for a particular kernel address. Introduce
    get_wild_bug_type() function, which determines bug type for addresses
    which don't have a corresponding shadow value.

    Link: http://lkml.kernel.org/r/20170302134851.101218-2-andreyknvl@google.com
    Signed-off-by: Andrey Konovalov
    Acked-by: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Alexander Potapenko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Konovalov
     

23 Apr, 2017

1 commit


19 Apr, 2017

1 commit

  • A group of Linux kernel hackers reported chasing a bug that resulted
    from their assumption that SLAB_DESTROY_BY_RCU provided an existence
    guarantee, that is, that no block from such a slab would be reallocated
    during an RCU read-side critical section. Of course, that is not the
    case. Instead, SLAB_DESTROY_BY_RCU only prevents freeing of an entire
    slab of blocks.

    However, there is a phrase for this, namely "type safety". This commit
    therefore renames SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU in order
    to avoid future instances of this sort of confusion.

    Signed-off-by: Paul E. McKenney
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrew Morton
    Cc:
    Acked-by: Johannes Weiner
    Acked-by: Vlastimil Babka
    [ paulmck: Add comments mentioning the old name, as requested by Eric
    Dumazet, in order to help people familiar with the old name find
    the new one. ]
    Acked-by: David Rientjes

    Paul E. McKenney
     

01 Apr, 2017

1 commit

  • Disable kasan after the first report. There are several reasons for
    this:

    - A single bug quite often causes multiple invalid memory accesses,
    creating a storm in the dmesg.

    - A write OOB access might corrupt metadata, so the next report will
    print bogus alloc/free stacktraces.

    - Reports after the first could easily be not bugs themselves but just
    side effects of the first one.

    Given that multiple reports usually only do harm, it makes sense to
    disable kasan after the first one. If the user wants to see all the
    reports, the boot-time parameter kasan_multi_shot must be used.

    [aryabinin@virtuozzo.com: wrote changelog and doc, added missing include]
    Link: http://lkml.kernel.org/r/20170323154416.30257-1-aryabinin@virtuozzo.com
    Signed-off-by: Mark Rutland
    Signed-off-by: Andrey Ryabinin
    Cc: Andrey Konovalov
    Cc: Alexander Potapenko
    Cc: Dmitry Vyukov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mark Rutland
     

11 Mar, 2017

1 commit

  • Merge 5-level page table prep from Kirill Shutemov:
    "Here's relatively low-risk part of 5-level paging patchset. Merging it
    now will make x86 5-level paging enabling in v4.12 easier.

    The first patch is actually x86-specific: detect 5-level paging
    support. It boils down to single define.

    The rest of patchset converts Linux MMU abstraction from 4- to 5-level
    paging.

    Enabling of new abstraction in most cases requires adding single line
    of code in arch-specific code. The rest is taken care by asm-generic/.

    Changes to mm/ code are mostly mechanical: add support for new page
    table level -- p4d_t -- where we deal with pud_t now.

    v2:
    - fix build on microblaze (Michal);
    - comment for __ARCH_HAS_5LEVEL_HACK in kasan_populate_zero_shadow();
    - acks from Michal"

    * emailed patches from Kirill A Shutemov :
    mm: introduce __p4d_alloc()
    mm: convert generic code to 5-level paging
    asm-generic: introduce
    arch, mm: convert all architectures to use 5level-fixup.h
    asm-generic: introduce __ARCH_USE_5LEVEL_HACK
    asm-generic: introduce 5level-fixup.h
    x86/cpufeature: Add 5-level paging detection

    Linus Torvalds
     

10 Mar, 2017

3 commits

    quarantine_remove_cache() frees all pending objects that belong to the
    cache before we destroy the cache itself. However, there are currently
    two ways it can fail to do so.

    First, another thread can hold some of the objects from the cache in its
    temp list in quarantine_put(). quarantine_put() has a window of enabled
    interrupts, and on_each_cpu() in quarantine_remove_cache() can finish
    right in that window. These objects will later be freed into the
    destroyed cache.

    Then, quarantine_reduce() has the same problem. It grabs a batch of
    objects from the global quarantine, then unlocks quarantine_lock and
    then frees the batch. quarantine_remove_cache() can finish while some
    objects from the cache are still in the local to_free list in
    quarantine_reduce().

    Fix the race with quarantine_put() by disabling interrupts for the whole
    duration of quarantine_put(). In combination with on_each_cpu() in
    quarantine_remove_cache() it ensures that quarantine_remove_cache()
    either sees the objects in the per-cpu list or in the global list.

    Fix the race with quarantine_reduce() by protecting quarantine_reduce()
    with an srcu critical section and then doing synchronize_srcu() at the
    end of quarantine_remove_cache().

    I've done some assessment of how well synchronize_srcu() works in this
    case: on a 4-CPU VM I see that it blocks waiting for pending read-side
    critical sections in about 2-3% of cases, which looks good to me.

    I suspect that these races are the root cause of some GPFs that I
    episodically hit. Previously I did not have any explanation for them.

    BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8
    IP: qlist_free_all+0x2e/0xc0 mm/kasan/quarantine.c:155
    PGD 6aeea067
    PUD 60ed7067
    PMD 0
    Oops: 0000 [#1] SMP KASAN
    Dumping ftrace buffer:
    (ftrace buffer empty)
    Modules linked in:
    CPU: 0 PID: 13667 Comm: syz-executor2 Not tainted 4.10.0+ #60
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    task: ffff88005f948040 task.stack: ffff880069818000
    RIP: 0010:qlist_free_all+0x2e/0xc0 mm/kasan/quarantine.c:155
    RSP: 0018:ffff88006981f298 EFLAGS: 00010246
    RAX: ffffea0000ffff00 RBX: 0000000000000000 RCX: ffffea0000ffff1f
    RDX: 0000000000000000 RSI: ffff88003fffc3e0 RDI: 0000000000000000
    RBP: ffff88006981f2c0 R08: ffff88002fed7bd8 R09: 00000001001f000d
    R10: 00000000001f000d R11: ffff88006981f000 R12: ffff88003fffc3e0
    R13: ffff88006981f2d0 R14: ffffffff81877fae R15: 0000000080000000
    FS: 00007fb911a2d700(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000000000c8 CR3: 0000000060ed6000 CR4: 00000000000006f0
    Call Trace:
    quarantine_reduce+0x10e/0x120 mm/kasan/quarantine.c:239
    kasan_kmalloc+0xca/0xe0 mm/kasan/kasan.c:590
    kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:544
    slab_post_alloc_hook mm/slab.h:456 [inline]
    slab_alloc_node mm/slub.c:2718 [inline]
    kmem_cache_alloc_node+0x1d3/0x280 mm/slub.c:2754
    __alloc_skb+0x10f/0x770 net/core/skbuff.c:219
    alloc_skb include/linux/skbuff.h:932 [inline]
    _sctp_make_chunk+0x3b/0x260 net/sctp/sm_make_chunk.c:1388
    sctp_make_data net/sctp/sm_make_chunk.c:1420 [inline]
    sctp_make_datafrag_empty+0x208/0x360 net/sctp/sm_make_chunk.c:746
    sctp_datamsg_from_user+0x7e8/0x11d0 net/sctp/chunk.c:266
    sctp_sendmsg+0x2611/0x3970 net/sctp/socket.c:1962
    inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:761
    sock_sendmsg_nosec net/socket.c:633 [inline]
    sock_sendmsg+0xca/0x110 net/socket.c:643
    SYSC_sendto+0x660/0x810 net/socket.c:1685
    SyS_sendto+0x40/0x50 net/socket.c:1653

    I am not sure about backporting. The bug is quite hard to trigger; I've
    seen it a few times during our massive continuous testing (however, it
    could be the cause of some other episodic stray crashes, as it leads to
    memory corruption...). If it is triggered, the consequences are very
    bad -- almost certain memory corruption. The fix is non-trivial and has
    a chance of introducing new bugs. I am also not sure how actively
    people use KASAN on older releases.

    [dvyukov@google.com: sorted includes]
    Link: http://lkml.kernel.org/r/20170309094028.51088-1-dvyukov@google.com
    Link: http://lkml.kernel.org/r/20170308151532.5070-1-dvyukov@google.com
    Signed-off-by: Dmitry Vyukov
    Acked-by: Andrey Ryabinin
    Cc: Greg Thelen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Vyukov
     
  • We see reported stalls/lockups in quarantine_remove_cache() on machines
    with large amounts of RAM. quarantine_remove_cache() needs to scan the
    whole quarantine in order to take out all objects belonging to the
    cache. The quarantine is currently 1/32nd of RAM; e.g. on a machine
    with 256GB of memory that is 8GB. Moreover, quarantine scanning is a
    walk over an uncached linked list, which is slow.

    Add cond_resched() after the scan of each non-empty batch of objects.
    Batches are specifically kept at a reasonable size for quarantine_put().
    On a machine with 256GB of RAM we should have ~512 non-empty batches,
    each with 16MB of objects.

    Link: http://lkml.kernel.org/r/20170308154239.25440-1-dvyukov@google.com
    Signed-off-by: Dmitry Vyukov
    Acked-by: Andrey Ryabinin
    Cc: Greg Thelen
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Vyukov
     
  • Convert all non-architecture-specific code to 5-level paging.

    It's mostly mechanical: add handling of one more page-table level in
    places where we deal with pud_t.

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

02 Mar, 2017

2 commits

  • We are going to split a new header out of <linux/sched.h>, which will
    have to be picked up from other headers and a couple of .c files.

    Create a trivial placeholder file that just maps back to
    <linux/sched.h> to make this patch obviously correct and bisectable.

    Include the new header in the files that are going to need it.

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • <linux/kasan.h> is a low level header that is included early in
    affected kernel headers. But it includes <linux/sched.h>, which
    complicates the cleanup of sched.h dependencies.

    But kasan.h has almost no need for sched.h: its only use of
    scheduler functionality is in two inline functions which are
    not used very frequently - so uninline kasan_enable_current()
    and kasan_disable_current().

    Also add a <linux/sched.h> dependency to a .c file that depended on
    kasan.h including it.

    This paves the way to remove the <linux/sched.h> include from kasan.h.

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

25 Feb, 2017

1 commit

  • Per memcg slab accounting and kasan have a problem with kmem_cache
    destruction.
    - kmem_cache_create() allocates a kmem_cache, which is used for
    allocations from processes running in root (top) memcg.
    - Processes running in non root memcg and allocating with either
    __GFP_ACCOUNT or from a SLAB_ACCOUNT cache use a per memcg
    kmem_cache.
    - Kasan catches use-after-free by having kfree() and kmem_cache_free()
    defer freeing of objects. Objects are placed in a quarantine.
    - kmem_cache_destroy() destroys root and non root kmem_caches. It takes
    care to drain the quarantine of objects from the root memcg's
    kmem_cache, but ignores objects associated with non root memcg. This
    causes leaks because quarantined per memcg objects refer to per memcg
    kmem cache being destroyed.

    To see the problem:

    1) create a slab cache with kmem_cache_create(,,,SLAB_ACCOUNT,)
    2) from non root memcg, allocate and free a few objects from cache
    3) dispose of the cache with kmem_cache_destroy() kmem_cache_destroy()
    will trigger a "Slab cache still has objects" warning indicating
    that the per memcg kmem_cache structure was leaked.

    Fix the leak by draining kasan quarantined objects allocated from non
    root memcg.

    Racing memcg deletion is tricky, but handled. kmem_cache_destroy() =>
    shutdown_memcg_caches() => __shutdown_memcg_cache() => shutdown_cache()
    flushes per memcg quarantined objects, even if that memcg has been
    rmdir'd and gone through memcg_deactivate_kmem_caches().

    This leak only affects destroyed SLAB_ACCOUNT kmem caches when kasan is
    enabled. So I don't think it's worth patching stable kernels.

    Link: http://lkml.kernel.org/r/1482257462-36948-1-git-send-email-gthelen@google.com
    Signed-off-by: Greg Thelen
    Reviewed-by: Vladimir Davydov
    Acked-by: Andrey Ryabinin
    Cc: Alexander Potapenko
    Cc: Dmitry Vyukov
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Thelen
     

23 Feb, 2017

1 commit

  • Pull arm64 updates from Will Deacon:
    - Errata workarounds for Qualcomm's Falkor CPU
    - Qualcomm L2 Cache PMU driver
    - Qualcomm SMCCC firmware quirk
    - Support for DEBUG_VIRTUAL
    - CPU feature detection for userspace via MRS emulation
    - Preliminary work for the Statistical Profiling Extension
    - Misc cleanups and non-critical fixes

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (74 commits)
    arm64/kprobes: consistently handle MRS/MSR with XZR
    arm64: cpufeature: correctly handle MRS to XZR
    arm64: traps: correctly handle MRS/MSR with XZR
    arm64: ptrace: add XZR-safe regs accessors
    arm64: include asm/assembler.h in entry-ftrace.S
    arm64: fix warning about swapper_pg_dir overflow
    arm64: Work around Falkor erratum 1003
    arm64: head.S: Enable EL1 (host) access to SPE when entered at EL2
    arm64: arch_timer: document Hisilicon erratum 161010101
    arm64: use is_vmalloc_addr
    arm64: use linux/sizes.h for constants
    arm64: uaccess: consistently check object sizes
    perf: add qcom l2 cache perf events driver
    arm64: remove wrong CONFIG_PROC_SYSCTL ifdef
    ARM: smccc: Update HVC comment to describe new quirk parameter
    arm64: do not trace atomic operations
    ACPI/IORT: Fix the error return code in iort_add_smmu_platform_device()
    ACPI/IORT: Fix iort_node_get_id() mapping entries indexing
    arm64: mm: enable CONFIG_HOLES_IN_ZONE for NUMA
    perf: xgene: Include module.h
    ...

    Linus Torvalds
     

04 Feb, 2017

1 commit

  • After much waiting I finally reproduced a KASAN issue, only to find my
    trace-buffer empty of useful information because it got spooled out :/

    Make kasan_report honour the /proc/sys/kernel/traceoff_on_warning
    interface.

    Link: http://lkml.kernel.org/r/20170125164106.3514-1-aryabinin@virtuozzo.com
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Andrey Ryabinin
    Acked-by: Alexander Potapenko
    Cc: Dmitry Vyukov
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

11 Jan, 2017

1 commit

  • __pa_symbol is the correct API to find the physical address of symbols.
    Switch to it to allow for debugging APIs to work correctly. Other
    functions such as p*d_populate may call __pa internally. Ensure that the
    address passed is in the linear region by calling lm_alias.

    Reviewed-by: Mark Rutland
    Tested-by: Mark Rutland
    Signed-off-by: Laura Abbott
    Signed-off-by: Will Deacon

    Laura Abbott
     

14 Dec, 2016

1 commit

  • Pull power management updates from Rafael Wysocki:
    "Again, cpufreq gets more changes than the other parts this time (one
    new driver, one old driver less, a bunch of enhancements of the
    existing code, new CPU IDs, fixes, cleanups)

    There also are some changes in cpuidle (idle injection rework, a
    couple of new CPU IDs, online/offline rework in intel_idle, fixes and
    cleanups), in the generic power domains framework (mostly related to
    supporting power domains containing CPUs), and in the Operating
    Performance Points (OPP) library (mostly related to supporting devices
    with multiple voltage regulators)

    In addition to that, the system sleep state selection interface is
    modified to make it easier for distributions with unchanged user space
    to support suspend-to-idle as the default system suspend method, some
    issues are fixed in the PM core, the latency tolerance PM QoS
    framework is improved a bit, the Intel RAPL power capping driver is
    cleaned up and there are some fixes and cleanups in the devfreq
    subsystem

    Specifics:

    - New cpufreq driver for Broadcom STB SoCs and a Device Tree binding
    for it (Markus Mayer)

    - Support for ARM Integrator/AP and Integrator/CP in the generic DT
    cpufreq driver and elimination of the old Integrator cpufreq driver
    (Linus Walleij)

    - Support for the zx296718, r8a7743 and r8a7745, Socionext UniPhier,
    and PXA SoCs in the generic DT cpufreq driver (Baoyou Xie,
    Geert Uytterhoeven, Masahiro Yamada, Robert Jarzmik)

    - cpufreq core fix to eliminate races that may lead to using inactive
    policy objects and related cleanups (Rafael Wysocki)

    - cpufreq schedutil governor update to make it use SCHED_FIFO kernel
    threads (instead of regular workqueues) for doing delayed work (to
    reduce the response latency in some cases) and related cleanups
    (Viresh Kumar)

    - New cpufreq sysfs attribute for resetting statistics (Markus Mayer)

    - cpufreq governors fixes and cleanups (Chen Yu, Stratos Karafotis,
    Viresh Kumar)

    - Support for using generic cpufreq governors in the intel_pstate
    driver (Rafael Wysocki)

    - Support for per-logical-CPU P-state limits and the EPP/EPB (Energy
    Performance Preference/Energy Performance Bias) knobs in the
    intel_pstate driver (Srinivas Pandruvada)

    - New CPU ID for Knights Mill in intel_pstate (Piotr Luc)

    - intel_pstate driver modification to use the P-state selection
    algorithm based on CPU load on platforms with the system profile in
    the ACPI tables set to "mobile" (Srinivas Pandruvada)

    - intel_pstate driver cleanups (Arnd Bergmann, Rafael Wysocki,
    Srinivas Pandruvada)

    - cpufreq powernv driver updates including fast switching support
    (for the schedutil governor), fixes and cleanups (Akshay Adiga,
    Andrew Donnellan, Denis Kirjanov)

    - acpi-cpufreq driver rework to switch it over to the new CPU
    offline/online state machine (Sebastian Andrzej Siewior)

    - Assorted cleanups in cpufreq drivers (Wei Yongjun, Prashanth
    Prakash)

    - Idle injection rework (to make it use the regular idle path instead
    of a home-grown custom one) and related powerclamp thermal driver
    updates (Peter Zijlstra, Jacob Pan, Petr Mladek, Sebastian Andrzej
    Siewior)

    - New CPU IDs for Atom Z34xx and Knights Mill in intel_idle (Andy
    Shevchenko, Piotr Luc)

    - intel_idle driver cleanups and switch over to using the new CPU
    offline/online state machine (Anna-Maria Gleixner, Sebastian
    Andrzej Siewior)

    - cpuidle DT driver update to support suspend-to-idle properly
    (Sudeep Holla)

    - cpuidle core cleanups and misc updates (Daniel Lezcano, Pan Bian,
    Rafael Wysocki)

    - Preliminary support for power domains including CPUs in the generic
    power domains (genpd) framework and related DT bindings (Lina Iyer)

    - Assorted fixes and cleanups in the generic power domains (genpd)
    framework (Colin Ian King, Dan Carpenter, Geert Uytterhoeven)

    - Preliminary support for devices with multiple voltage regulators
    and related fixes and cleanups in the Operating Performance Points
    (OPP) library (Viresh Kumar, Masahiro Yamada, Stephen Boyd)

    - System sleep state selection interface rework to make it easier to
    support suspend-to-idle as the default system suspend method
    (Rafael Wysocki)

    - PM core fixes and cleanups, mostly related to the interactions
    between the system suspend and runtime PM frameworks (Ulf Hansson,
    Sahitya Tummala, Tony Lindgren)

    - Latency tolerance PM QoS framework improvements (Andrew Lutomirski)

    - New Knights Mill CPU ID for the Intel RAPL power capping driver
    (Piotr Luc)

    - Intel RAPL power capping driver fixes, cleanups and switch over to
    using the new CPU offline/online state machine (Jacob Pan, Thomas
    Gleixner, Sebastian Andrzej Siewior)

    - Fixes and cleanups in the exynos-ppmu, exynos-nocp, rk3399_dmc,
    rockchip-dfi devfreq drivers and the devfreq core (Axel Lin,
    Chanwoo Choi, Javier Martinez Canillas, MyungJoo Ham, Viresh Kumar)

    - Fix for false-positive KASAN warnings during resume from ACPI S3
    (suspend-to-RAM) on x86 (Josh Poimboeuf)

    - Memory map verification during resume from hibernation on x86 to
    ensure a consistent address space layout (Chen Yu)

    - Wakeup sources debugging enhancement (Xing Wei)

    - rockchip-io AVS driver cleanup (Shawn Lin)"

    * tag 'pm-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (127 commits)
    devfreq: rk3399_dmc: Don't use OPP structures outside of RCU locks
    devfreq: rk3399_dmc: Remove dangling rcu_read_unlock()
    devfreq: exynos: Don't use OPP structures outside of RCU locks
    Documentation: intel_pstate: Document HWP energy/performance hints
    cpufreq: intel_pstate: Support for energy performance hints with HWP
    cpufreq: intel_pstate: Add locking around HWP requests
    PM / sleep: Print active wakeup sources when blocking on wakeup_count reads
    PM / core: Fix bug in the error handling of async suspend
    PM / wakeirq: Fix dedicated wakeirq for drivers not using autosuspend
    PM / Domains: Fix compatible for domain idle state
    PM / OPP: Don't WARN on multiple calls to dev_pm_opp_set_regulators()
    PM / OPP: Allow platform specific custom set_opp() callbacks
    PM / OPP: Separate out _generic_set_opp()
    PM / OPP: Add infrastructure to manage multiple regulators
    PM / OPP: Pass struct dev_pm_opp_supply to _set_opp_voltage()
    PM / OPP: Manage supply's voltage/current in a separate structure
    PM / OPP: Don't use OPP structure outside of rcu protected section
    PM / OPP: Reword binding supporting multiple regulators per device
    PM / OPP: Fix incorrect cpu-supply property in binding
    cpuidle: Add a kerneldoc comment to cpuidle_use_deepest_state()
    ...

    Linus Torvalds
     

13 Dec, 2016

2 commits

  • Currently we dedicate 1/32 of RAM for quarantine and then reduce it by
    1/4 of total quarantine size. This can be a significant amount of
    memory. For example, with 4GB of RAM total quarantine size is 128MB and
    it is reduced by 32MB at a time. With 128GB of RAM total quarantine
    size is 4GB and it is reduced by 1GB. This leads to several problems:

    - freeing 1GB can take tens of seconds, causes rcu stall warnings and
    just introduces unexpected long delays at random places
    - if kmalloc() is called under a mutex, other threads stall on that
    mutex while a thread reduces quarantine
    - threads wait on quarantine_lock while one thread grabs a large batch
    of objects to evict
    - we walk the uncached list of objects to free twice, which makes all
    of the above worse
    - when a thread frees objects, they are already not accounted against
    global_quarantine.bytes; as a result we can have quarantine_size
    bytes in quarantine + an unbounded amount of memory in large batches
    in threads that are in the process of freeing

    Reduce the size of the quarantine in smaller batches to reduce the
    delays. The only reason to reduce it in batches is amortization of
    overheads; the new batch size of 1MB should be enough to amortize the
    spinlock lock/unlock and a few function calls.

    Plus, organize the quarantine as a FIFO array of batches. This allows
    us to avoid walking the list in quarantine_reduce() under
    quarantine_lock, which in turn reduces contention and is just faster.

    This improves performance of heavy load (syzkaller fuzzing) by ~20% with
    4 CPUs and 32GB of RAM. Also this eliminates frequent (every 5 sec)
    drops of CPU consumption from ~400% to ~100% (one thread reduces
    quarantine while others are waiting on a mutex).

    Some reference numbers:
    1. Machine with 4 CPUs and 4GB of memory. Quarantine size 128MB.
    Currently we free 32MB at a time.
    With the new code we free 1MB at a time (1024 batches, ~128 are used).
    2. Machine with 32 CPUs and 128GB of memory. Quarantine size 4GB.
    Currently we free 1GB at a time.
    With the new code we free 8MB at a time (1024 batches, ~512 are used).
    3. Machine with 4096 CPUs and 1TB of memory. Quarantine size 32GB.
    Currently we free 8GB at a time.
    With the new code we free 4MB at a time (16K batches, ~8K are used).

    Link: http://lkml.kernel.org/r/1478756952-18695-1-git-send-email-dvyukov@google.com
    Signed-off-by: Dmitry Vyukov
    Cc: Eric Dumazet
    Cc: Greg Thelen
    Cc: Alexander Potapenko
    Cc: Andrey Ryabinin
    Cc: Andrey Konovalov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Vyukov
     
  • If the user sets panic_on_warn, they want the kernel to panic if there
    is anything even barely wrong with it. KASAN-detected errors are
    definitely no less benign than an arbitrary kernel WARNING.

    Panic after KASAN errors if panic_on_warn is set.

    We use this for continuous fuzzing, where we want the kernel to stop
    and reboot on any error.

    Link: http://lkml.kernel.org/r/1476694764-31986-1-git-send-email-dvyukov@google.com
    Signed-off-by: Dmitry Vyukov
    Acked-by: Andrey Ryabinin
    Cc: Alexander Potapenko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Vyukov