10 Mar, 2017

2 commits

  • quarantine_remove_cache() frees all pending objects that belong to the
    cache, before we destroy the cache itself. However there are currently
    two possibilities how it can fail to do so.

    First, another thread can hold some of the objects from the cache in
    temp list in quarantine_put(). quarantine_put() has a windows of
    enabled interrupts, and on_each_cpu() in quarantine_remove_cache() can
    finish right in that window. These objects will be later freed into the
    destroyed cache.

    Then, quarantine_reduce() has the same problem. It grabs a batch of
    objects from the global quarantine, then unlocks quarantine_lock and
    then frees the batch. quarantine_remove_cache() can finish while some
    objects from the cache are still in the local to_free list in
    quarantine_reduce().

    Fix the race with quarantine_put() by disabling interrupts for the whole
    duration of quarantine_put(). In combination with on_each_cpu() in
    quarantine_remove_cache() it ensures that quarantine_remove_cache()
    either sees the objects in the per-cpu list or in the global list.

    Fix the race with quarantine_reduce() by protecting quarantine_reduce()
    with srcu critical section and then doing synchronize_srcu() at the end
    of quarantine_remove_cache().

    I've done some assessment of how good synchronize_srcu() works in this
    case. And on a 4 CPU VM I see that it blocks waiting for pending read
    critical sections in about 2-3% of cases. Which looks good to me.

    I suspect that these races are the root cause of some GPFs that I
    episodically hit. Previously I did not have any explanation for them.

    BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8
    IP: qlist_free_all+0x2e/0xc0 mm/kasan/quarantine.c:155
    PGD 6aeea067
    PUD 60ed7067
    PMD 0
    Oops: 0000 [#1] SMP KASAN
    Dumping ftrace buffer:
    (ftrace buffer empty)
    Modules linked in:
    CPU: 0 PID: 13667 Comm: syz-executor2 Not tainted 4.10.0+ #60
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    task: ffff88005f948040 task.stack: ffff880069818000
    RIP: 0010:qlist_free_all+0x2e/0xc0 mm/kasan/quarantine.c:155
    RSP: 0018:ffff88006981f298 EFLAGS: 00010246
    RAX: ffffea0000ffff00 RBX: 0000000000000000 RCX: ffffea0000ffff1f
    RDX: 0000000000000000 RSI: ffff88003fffc3e0 RDI: 0000000000000000
    RBP: ffff88006981f2c0 R08: ffff88002fed7bd8 R09: 00000001001f000d
    R10: 00000000001f000d R11: ffff88006981f000 R12: ffff88003fffc3e0
    R13: ffff88006981f2d0 R14: ffffffff81877fae R15: 0000000080000000
    FS: 00007fb911a2d700(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000000000c8 CR3: 0000000060ed6000 CR4: 00000000000006f0
    Call Trace:
    quarantine_reduce+0x10e/0x120 mm/kasan/quarantine.c:239
    kasan_kmalloc+0xca/0xe0 mm/kasan/kasan.c:590
    kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:544
    slab_post_alloc_hook mm/slab.h:456 [inline]
    slab_alloc_node mm/slub.c:2718 [inline]
    kmem_cache_alloc_node+0x1d3/0x280 mm/slub.c:2754
    __alloc_skb+0x10f/0x770 net/core/skbuff.c:219
    alloc_skb include/linux/skbuff.h:932 [inline]
    _sctp_make_chunk+0x3b/0x260 net/sctp/sm_make_chunk.c:1388
    sctp_make_data net/sctp/sm_make_chunk.c:1420 [inline]
    sctp_make_datafrag_empty+0x208/0x360 net/sctp/sm_make_chunk.c:746
    sctp_datamsg_from_user+0x7e8/0x11d0 net/sctp/chunk.c:266
    sctp_sendmsg+0x2611/0x3970 net/sctp/socket.c:1962
    inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:761
    sock_sendmsg_nosec net/socket.c:633 [inline]
    sock_sendmsg+0xca/0x110 net/socket.c:643
    SYSC_sendto+0x660/0x810 net/socket.c:1685
    SyS_sendto+0x40/0x50 net/socket.c:1653

    I am not sure about backporting. The bug is quite hard to trigger, I've
    seen it few times during our massive continuous testing (however, it
    could be cause of some other episodic stray crashes as it leads to
    memory corruption...). If it is triggered, the consequences are very
    bad -- almost definite bad memory corruption. The fix is non trivial
    and has chances of introducing new bugs. I am also not sure how
    actively people use KASAN on older releases.

    [dvyukov@google.com: - sorted includes[
    Link: http://lkml.kernel.org/r/20170309094028.51088-1-dvyukov@google.com
    Link: http://lkml.kernel.org/r/20170308151532.5070-1-dvyukov@google.com
    Signed-off-by: Dmitry Vyukov
    Acked-by: Andrey Ryabinin
    Cc: Greg Thelen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Vyukov
     
  • We see reported stalls/lockups in quarantine_remove_cache() on machines
    with large amounts of RAM. quarantine_remove_cache() needs to scan
    whole quarantine in order to take out all objects belonging to the
    cache. Quarantine is currently 1/32-th of RAM, e.g. on a machine with
    256GB of memory that will be 8GB. Moreover quarantine scanning is a
    walk over uncached linked list, which is slow.

    Add cond_resched() after scanning of each non-empty batch of objects.
    Batches are specifically kept of reasonable size for quarantine_put().
    On a machine with 256GB of RAM we should have ~512 non-empty batches,
    each with 16MB of objects.

    Link: http://lkml.kernel.org/r/20170308154239.25440-1-dvyukov@google.com
    Signed-off-by: Dmitry Vyukov
    Acked-by: Andrey Ryabinin
    Cc: Greg Thelen
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Vyukov
     

25 Feb, 2017

1 commit

  • Per memcg slab accounting and kasan have a problem with kmem_cache
    destruction.
    - kmem_cache_create() allocates a kmem_cache, which is used for
    allocations from processes running in root (top) memcg.
    - Processes running in non root memcg and allocating with either
    __GFP_ACCOUNT or from a SLAB_ACCOUNT cache use a per memcg
    kmem_cache.
    - Kasan catches use-after-free by having kfree() and kmem_cache_free()
    defer freeing of objects. Objects are placed in a quarantine.
    - kmem_cache_destroy() destroys root and non root kmem_caches. It takes
    care to drain the quarantine of objects from the root memcg's
    kmem_cache, but ignores objects associated with non root memcg. This
    causes leaks because quarantined per memcg objects refer to per memcg
    kmem cache being destroyed.

    To see the problem:

    1) create a slab cache with kmem_cache_create(,,,SLAB_ACCOUNT,)
    2) from non root memcg, allocate and free a few objects from cache
    3) dispose of the cache with kmem_cache_destroy() kmem_cache_destroy()
    will trigger a "Slab cache still has objects" warning indicating
    that the per memcg kmem_cache structure was leaked.

    Fix the leak by draining kasan quarantined objects allocated from non
    root memcg.

    Racing memcg deletion is tricky, but handled. kmem_cache_destroy() =>
    shutdown_memcg_caches() => __shutdown_memcg_cache() => shutdown_cache()
    flushes per memcg quarantined objects, even if that memcg has been
    rmdir'd and gone through memcg_deactivate_kmem_caches().

    This leak only affects destroyed SLAB_ACCOUNT kmem caches when kasan is
    enabled. So I don't think it's worth patching stable kernels.

    Link: http://lkml.kernel.org/r/1482257462-36948-1-git-send-email-gthelen@google.com
    Signed-off-by: Greg Thelen
    Reviewed-by: Vladimir Davydov
    Acked-by: Andrey Ryabinin
    Cc: Alexander Potapenko
    Cc: Dmitry Vyukov
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Thelen
     

13 Dec, 2016

1 commit

  • Currently we dedicate 1/32 of RAM for quarantine and then reduce it by
    1/4 of total quarantine size. This can be a significant amount of
    memory. For example, with 4GB of RAM total quarantine size is 128MB and
    it is reduced by 32MB at a time. With 128GB of RAM total quarantine
    size is 4GB and it is reduced by 1GB. This leads to several problems:

    - freeing 1GB can take tens of seconds, causes rcu stall warnings and
    just introduces unexpected long delays at random places
    - if kmalloc() is called under a mutex, other threads stall on that
    mutex while a thread reduces quarantine
    - threads wait on quarantine_lock while one thread grabs a large batch
    of objects to evict
    - we walk the uncached list of object to free twice which makes all of
    the above worse
    - when a thread frees objects, they are already not accounted against
    global_quarantine.bytes; as the result we can have quarantine_size
    bytes in quarantine + unbounded amount of memory in large batches in
    threads that are in process of freeing

    Reduce size of quarantine in smaller batches to reduce the delays. The
    only reason to reduce it in batches is amortization of overheads, the
    new batch size of 1MB should be well enough to amortize spinlock
    lock/unlock and few function calls.

    Plus organize quarantine as a FIFO array of batches. This allows to not
    walk the list in quarantine_reduce() under quarantine_lock, which in
    turn reduces contention and is just faster.

    This improves performance of heavy load (syzkaller fuzzing) by ~20% with
    4 CPUs and 32GB of RAM. Also this eliminates frequent (every 5 sec)
    drops of CPU consumption from ~400% to ~100% (one thread reduces
    quarantine while others are waiting on a mutex).

    Some reference numbers:
    1. Machine with 4 CPUs and 4GB of memory. Quarantine size 128MB.
    Currently we free 32MB at at time.
    With new code we free 1MB at a time (1024 batches, ~128 are used).
    2. Machine with 32 CPUs and 128GB of memory. Quarantine size 4GB.
    Currently we free 1GB at at time.
    With new code we free 8MB at a time (1024 batches, ~512 are used).
    3. Machine with 4096 CPUs and 1TB of memory. Quarantine size 32GB.
    Currently we free 8GB at at time.
    With new code we free 4MB at a time (16K batches, ~8K are used).

    Link: http://lkml.kernel.org/r/1478756952-18695-1-git-send-email-dvyukov@google.com
    Signed-off-by: Dmitry Vyukov
    Cc: Eric Dumazet
    Cc: Greg Thelen
    Cc: Alexander Potapenko
    Cc: Andrey Ryabinin
    Cc: Andrey Konovalov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Vyukov
     

12 Aug, 2016

1 commit

  • It's quite unlikely that the user will so little memory that the per-CPU
    quarantines won't fit into the given fraction of the available memory.
    Even in that case he won't be able to do anything with the information
    given in the warning.

    Link: http://lkml.kernel.org/r/1470929182-101413-1-git-send-email-glider@google.com
    Signed-off-by: Alexander Potapenko
    Acked-by: Andrey Ryabinin
    Cc: Dmitry Vyukov
    Cc: Andrey Konovalov
    Cc: Christoph Lameter
    Cc: Joonsoo Kim
    Cc: Kuthonuzo Luruo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     

03 Aug, 2016

3 commits

  • If the total amount of memory assigned to quarantine is less than the
    amount of memory assigned to per-cpu quarantines, |new_quarantine_size|
    may overflow. Instead, set it to zero.

    [akpm@linux-foundation.org: cleanup: use WARN_ONCE return value]
    Link: http://lkml.kernel.org/r/1470063563-96266-1-git-send-email-glider@google.com
    Fixes: 55834c59098d ("mm: kasan: initial memory quarantine implementation")
    Signed-off-by: Alexander Potapenko
    Reported-by: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     
  • The state of object currently tracked in two places - shadow memory, and
    the ->state field in struct kasan_alloc_meta. We can get rid of the
    latter. The will save us a little bit of memory. Also, this allow us
    to move free stack into struct kasan_alloc_meta, without increasing
    memory consumption. So now we should always know when the last time the
    object was freed. This may be useful for long delayed use-after-free
    bugs.

    As a side effect this fixes following UBSAN warning:
    UBSAN: Undefined behaviour in mm/kasan/quarantine.c:102:13
    member access within misaligned address ffff88000d1efebc for type 'struct qlist_node'
    which requires 8 byte alignment

    Link: http://lkml.kernel.org/r/1470062715-14077-5-git-send-email-aryabinin@virtuozzo.com
    Reported-by: kernel test robot
    Signed-off-by: Andrey Ryabinin
    Cc: Alexander Potapenko
    Cc: Dmitry Vyukov
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     
  • SLUB doesn't require disabled interrupts to call ___cache_free().

    Link: http://lkml.kernel.org/r/1470062715-14077-3-git-send-email-aryabinin@virtuozzo.com
    Signed-off-by: Andrey Ryabinin
    Acked-by: Alexander Potapenko
    Cc: Dmitry Vyukov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     

15 Jul, 2016

1 commit

  • There are two bugs on qlist_move_cache(). One is that qlist's tail
    isn't set properly. curr->next can be NULL since it is singly linked
    list and NULL value on tail is invalid if there is one item on qlist.
    Another one is that if cache is matched, qlist_put() is called and it
    will set curr->next to NULL. It would cause to stop the loop
    prematurely.

    These problems come from complicated implementation so I'd like to
    re-implement it completely. Implementation in this patch is really
    simple. Iterate all qlist_nodes and put them to appropriate list.

    Unfortunately, I got this bug sometime ago and lose oops message. But,
    the bug looks trivial and no need to attach oops.

    Fixes: 55834c59098d ("mm: kasan: initial memory quarantine implementation")
    Link: http://lkml.kernel.org/r/1467766348-22419-1-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Joonsoo Kim
    Reviewed-by: Dmitry Vyukov
    Acked-by: Andrey Ryabinin
    Acked-by: Alexander Potapenko
    Cc: Kuthonuzo Luruo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

21 May, 2016

1 commit

  • Quarantine isolates freed objects in a separate queue. The objects are
    returned to the allocator later, which helps to detect use-after-free
    errors.

    When the object is freed, its state changes from KASAN_STATE_ALLOC to
    KASAN_STATE_QUARANTINE. The object is poisoned and put into quarantine
    instead of being returned to the allocator, therefore every subsequent
    access to that object triggers a KASAN error, and the error handler is
    able to say where the object has been allocated and deallocated.

    When it's time for the object to leave quarantine, its state becomes
    KASAN_STATE_FREE and it's returned to the allocator. From now on the
    allocator may reuse it for another allocation. Before that happens,
    it's still possible to detect a use-after free on that object (it
    retains the allocation/deallocation stacks).

    When the allocator reuses this object, the shadow is unpoisoned and old
    allocation/deallocation stacks are wiped. Therefore a use of this
    object, even an incorrect one, won't trigger ASan warning.

    Without the quarantine, it's not guaranteed that the objects aren't
    reused immediately, that's why the probability of catching a
    use-after-free is lower than with quarantine in place.

    Quarantine isolates freed objects in a separate queue. The objects are
    returned to the allocator later, which helps to detect use-after-free
    errors.

    Freed objects are first added to per-cpu quarantine queues. When a
    cache is destroyed or memory shrinking is requested, the objects are
    moved into the global quarantine queue. Whenever a kmalloc call allows
    memory reclaiming, the oldest objects are popped out of the global queue
    until the total size of objects in quarantine is less than 3/4 of the
    maximum quarantine size (which is a fraction of installed physical
    memory).

    As long as an object remains in the quarantine, KASAN is able to report
    accesses to it, so the chance of reporting a use-after-free is
    increased. Once the object leaves quarantine, the allocator may reuse
    it, in which case the object is unpoisoned and KASAN can't detect
    incorrect accesses to it.

    Right now quarantine support is only enabled in SLAB allocator.
    Unification of KASAN features in SLAB and SLUB will be done later.

    This patch is based on the "mm: kasan: quarantine" patch originally
    prepared by Dmitry Chernenkov. A number of improvements have been
    suggested by Andrey Ryabinin.

    [glider@google.com: v9]
    Link: http://lkml.kernel.org/r/1462987130-144092-1-git-send-email-glider@google.com
    Signed-off-by: Alexander Potapenko
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Steven Rostedt
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko