03 Apr, 2019

1 commit

  • commit 6d6ea1e967a246f12cfe2f5fb743b70b2e608d4a upstream.

    Patch series "iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables",
    v6.

    This is a followup to the discussion in [1], [2].

    IOMMUs using ARMv7 short-descriptor format require page tables (level 1
    and 2) to be allocated within the first 4GB of RAM, even on 64-bit
    systems.

    For L1 tables that are bigger than a page, we can just use
    __get_free_pages with GFP_DMA32 (on arm64 systems only, arm would still
    use GFP_DMA).

    For L2 tables that only take 1KB, it would be a waste to allocate a full
    page, so we considered 3 approaches:
    1. This series, adding support for GFP_DMA32 slab caches.
    2. genalloc, which requires pre-allocating the maximum number of L2 page
    tables (4096, so 4MB of memory).
    3. page_frag, which is not very memory-efficient as it is unable to reuse
    freed fragments until the whole page is freed. [3]

    This series is the most memory-efficient approach.

    stable@ note:
    We confirmed that this is a regression, and IOMMU errors happen on 4.19
    and linux-next/master on MT8173 (elm, Acer Chromebook R13). The issue
    most likely starts from commit ad67f5a6545f ("arm64: replace ZONE_DMA
    with ZONE_DMA32"), i.e. 4.15, and presumably breaks a number of Mediatek
    platforms (and maybe others?).

    [1] https://lists.linuxfoundation.org/pipermail/iommu/2018-November/030876.html
    [2] https://lists.linuxfoundation.org/pipermail/iommu/2018-December/031696.html
    [3] https://patchwork.codeaurora.org/patch/671639/

    This patch (of 3):

    IOMMUs using ARMv7 short-descriptor format require page tables to be
    allocated within the first 4GB of RAM, even on 64-bit systems. On arm64,
    this is done by passing GFP_DMA32 flag to memory allocation functions.

    For IOMMU L2 tables that only take 1KB, it would be a waste to allocate
    a full page using get_free_pages, so we considered 3 approaches:
    1. This patch, adding support for GFP_DMA32 slab caches.
    2. genalloc, which requires pre-allocating the maximum number of L2
    page tables (4096, so 4MB of memory).
    3. page_frag, which is not very memory-efficient as it is unable
    to reuse freed fragments until the whole page is freed.

    This change makes it possible to create a custom cache in DMA32 zone using
    kmem_cache_create, then allocate memory using kmem_cache_alloc.

    We do not create a DMA32 kmalloc cache array, as there are currently no
    users of kmalloc(..., GFP_DMA32). These calls will continue to trigger a
    warning, as we keep GFP_DMA32 in GFP_SLAB_BUG_MASK.

    This implies that calls to kmem_cache_*alloc on a SLAB_CACHE_DMA32
    kmem_cache must _not_ use GFP_DMA32 (it is anyway redundant and
    unnecessary).
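
    For illustration, a user such as the ARMv7 short-descriptor io-pgtable code
    could set this up roughly as follows. This is a minimal sketch only: the
    cache name "io-pgtable-l2", L2_TABLE_SIZE and l2_table are placeholder names,
    not taken from the patch.

    static struct kmem_cache *l2_cache;

    /* Create a cache whose slabs are allocated from ZONE_DMA32. */
    l2_cache = kmem_cache_create("io-pgtable-l2", L2_TABLE_SIZE,
                                 L2_TABLE_SIZE, SLAB_CACHE_DMA32, NULL);
    if (!l2_cache)
            return -ENOMEM;

    /*
     * Do NOT pass GFP_DMA32 here: the cache flag already guarantees the
     * placement, and GFP_DMA32 remains in GFP_SLAB_BUG_MASK.
     */
    l2_table = kmem_cache_zalloc(l2_cache, GFP_KERNEL);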

    Link: http://lkml.kernel.org/r/20181210011504.122604-2-drinkcat@chromium.org
    Signed-off-by: Nicolas Boichat
    Acked-by: Vlastimil Babka
    Acked-by: Will Deacon
    Cc: Robin Murphy
    Cc: Joerg Roedel
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Cc: Mel Gorman
    Cc: Sasha Levin
    Cc: Huaisheng Ye
    Cc: Mike Rapoport
    Cc: Yong Wu
    Cc: Matthias Brugger
    Cc: Tomasz Figa
    Cc: Yingjoe Chen
    Cc: Christoph Hellwig
    Cc: Matthew Wilcox
    Cc: Hsin-Yi Wang
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Nicolas Boichat
     


18 Aug, 2018

1 commit

  • In SLUB, prefetch_freepointer() is used when allocating an object from
    cache's freelist, to make sure the next object in the list is cache-hot,
    since it's probable it will be allocated soon.

    Commit 2482ddec670f ("mm: add SLUB free list pointer obfuscation")
    unintentionally turned the prefetch into a real fetch: the next pointer
    is now dereferenced, and only the next->next pointer is prefetched.
    In case there is not a stream of allocations that would benefit from
    prefetching, the extra real fetch might add a useless cache miss to the
    allocation. Restore the previous behavior.
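
    The restored behaviour boils down to prefetching the location of the next
    object's free pointer instead of dereferencing it. A sketch of the idea
    (not necessarily the literal diff):

    static void prefetch_freepointer(const struct kmem_cache *s, void *object)
    {
            /*
             * Only give the CPU a prefetch hint for the next object's free
             * pointer slot; do not read (and de-obfuscate) the pointer itself.
             */
            prefetch(object + s->offset);
    }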

    Link: http://lkml.kernel.org/r/20180809085245.22448-1-vbabka@suse.cz
    Fixes: 2482ddec670f ("mm: add SLUB free list pointer obfuscation")
    Signed-off-by: Vlastimil Babka
    Acked-by: Kees Cook
    Cc: Daniel Micay
    Cc: Eric Dumazet
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     

29 Jun, 2018

1 commit

  • In kernel 4.17 I removed some code from dm-bufio that did slab cache
    merging (commit 21bb13276768: "dm bufio: remove code that merges slab
    caches") - both slab and slub support merging caches with identical
    attributes, so dm-bufio now just calls kmem_cache_create and relies on
    implicit merging.

    This uncovered a bug in the slub subsystem - if we delete a cache and
    immediately create another cache with the same attributes, it fails
    because of duplicate filename in /sys/kernel/slab/. The slub subsystem
    offloads freeing the cache to a workqueue - and if we create the new
    cache before the workqueue runs, it complains because of duplicate
    filename in sysfs.

    This patch fixes the bug by moving the call of kobject_del from
    sysfs_slab_remove_workfn to shutdown_cache. kobject_del must be called
    while we hold slab_mutex - so that the sysfs entry is deleted before a
    cache with the same attributes could be created.
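
    Schematically, the fix has the following shape (a sketch under the
    assumption that the final kobject_put() stays deferred on the workqueue;
    this is not the literal upstream diff):

    /* mm/slub.c */
    void sysfs_slab_unlink(struct kmem_cache *s)
    {
            if (slab_state >= FULL)
                    kobject_del(&s->kobj);
    }

    /* mm/slab_common.c, called with slab_mutex held */
    static int shutdown_cache(struct kmem_cache *s)
    {
            if (__kmem_cache_shutdown(s) != 0)
                    return -EBUSY;

            list_del(&s->list);
            /*
             * Remove the sysfs entry now, while slab_mutex is held, so a new
             * cache with the same attributes cannot collide with the deferred
             * release of the old kobject.
             */
            sysfs_slab_unlink(s);
            sysfs_slab_remove(s);   /* the kobject_put() is still deferred */
            return 0;
    }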

    Running device-mapper-test-suite with:

    dmtest run --suite thin-provisioning -n /commit_failure_causes_fallback/

    triggered:

    Buffer I/O error on dev dm-0, logical block 1572848, async page read
    device-mapper: thin: 253:1: metadata operation 'dm_pool_alloc_data_block' failed: error = -5
    device-mapper: thin: 253:1: aborting current metadata transaction
    sysfs: cannot create duplicate filename '/kernel/slab/:a-0000144'
    CPU: 2 PID: 1037 Comm: kworker/u48:1 Not tainted 4.17.0.snitm+ #25
    Hardware name: Supermicro SYS-1029P-WTR/X11DDW-L, BIOS 2.0a 12/06/2017
    Workqueue: dm-thin do_worker [dm_thin_pool]
    Call Trace:
    dump_stack+0x5a/0x73
    sysfs_warn_dup+0x58/0x70
    sysfs_create_dir_ns+0x77/0x80
    kobject_add_internal+0xba/0x2e0
    kobject_init_and_add+0x70/0xb0
    sysfs_slab_add+0xb1/0x250
    __kmem_cache_create+0x116/0x150
    create_cache+0xd9/0x1f0
    kmem_cache_create_usercopy+0x1c1/0x250
    kmem_cache_create+0x18/0x20
    dm_bufio_client_create+0x1ae/0x410 [dm_bufio]
    dm_block_manager_create+0x5e/0x90 [dm_persistent_data]
    __create_persistent_data_objects+0x38/0x940 [dm_thin_pool]
    dm_pool_abort_metadata+0x64/0x90 [dm_thin_pool]
    metadata_operation_failed+0x59/0x100 [dm_thin_pool]
    alloc_data_block.isra.53+0x86/0x180 [dm_thin_pool]
    process_cell+0x2a3/0x550 [dm_thin_pool]
    do_worker+0x28d/0x8f0 [dm_thin_pool]
    process_one_work+0x171/0x370
    worker_thread+0x49/0x3f0
    kthread+0xf8/0x130
    ret_from_fork+0x35/0x40
    kobject_add_internal failed for :a-0000144 with -EEXIST, don't try to register things with the same name in the same directory.
    kmem_cache_create(dm_bufio_buffer-16) failed with error -17

    Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1806151817130.6333@file01.intranet.prod.int.rdu2.redhat.com
    Signed-off-by: Mikulas Patocka
    Reported-by: Mike Snitzer
    Tested-by: Mike Snitzer
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mikulas Patocka
     

13 Jun, 2018

2 commits

  • The kzalloc() function has a 2-factor argument form, kcalloc(). This
    patch replaces cases of:

    kzalloc(a * b, gfp)

    with:
    kcalloc(a, b, gfp)

    as well as handling cases of:

    kzalloc(a * b * c, gfp)

    with:

    kzalloc(array3_size(a, b, c), gfp)

    as it's slightly less ugly than:

    kcalloc(array_size(a, b), c, gfp)

    This does, however, attempt to ignore constant size factors like:

    kzalloc(4 * 1024, gfp)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.
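
    For example, a typical conversion looks like this (hypothetical struct and
    variable names, shown only to illustrate the transformation):

    /* before: open-coded multiplication can overflow silently */
    ptrs = kzalloc(nr_entries * sizeof(struct foo *), GFP_KERNEL);

    /* after: kcalloc() checks the multiplication for overflow */
    ptrs = kcalloc(nr_entries, sizeof(struct foo *), GFP_KERNEL);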

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    kzalloc(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    kzalloc(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    kzalloc(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    - kzalloc
    + kcalloc
    (
    - sizeof(TYPE) * (COUNT_ID)
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(TYPE) * COUNT_ID
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(TYPE) * (COUNT_CONST)
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(TYPE) * COUNT_CONST
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(THING) * (COUNT_ID)
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(THING) * COUNT_ID
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(THING) * (COUNT_CONST)
    + COUNT_CONST, sizeof(THING)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(THING) * COUNT_CONST
    + COUNT_CONST, sizeof(THING)
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    - kzalloc
    + kcalloc
    (
    - SIZE * COUNT
    + COUNT, SIZE
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    kzalloc(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kzalloc(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kzalloc(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kzalloc(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kzalloc(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kzalloc(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kzalloc(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kzalloc(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(variable), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    kzalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kzalloc(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kzalloc(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    kzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    kzalloc(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products,
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    kzalloc(C1 * C2 * C3, ...)
    |
    kzalloc(
    - (E1) * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kzalloc(
    - (E1) * (E2) * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kzalloc(
    - (E1) * (E2) * (E3)
    + array3_size(E1, E2, E3)
    , ...)
    |
    kzalloc(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2 factors products when they're not all constants,
    // keeping sizeof() as the second factor argument.
    @@
    expression THING, E1, E2;
    type TYPE;
    constant C1, C2, C3;
    @@

    (
    kzalloc(sizeof(THING) * C2, ...)
    |
    kzalloc(sizeof(TYPE) * C2, ...)
    |
    kzalloc(C1 * C2 * C3, ...)
    |
    kzalloc(C1 * C2, ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(TYPE) * (E2)
    + E2, sizeof(TYPE)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(TYPE) * E2
    + E2, sizeof(TYPE)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(THING) * (E2)
    + E2, sizeof(THING)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(THING) * E2
    + E2, sizeof(THING)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - (E1) * E2
    + E1, E2
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - (E1) * (E2)
    + E1, E2
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - E1 * E2
    + E1, E2
    , ...)
    )

    Signed-off-by: Kees Cook

    Kees Cook
     
  • The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
    patch replaces cases of:

    kmalloc(a * b, gfp)

    with:
    kmalloc_array(a, b, gfp)

    as well as handling cases of:

    kmalloc(a * b * c, gfp)

    with:

    kmalloc(array3_size(a, b, c), gfp)

    as it's slightly less ugly than:

    kmalloc_array(array_size(a, b), c, gfp)

    This does, however, attempt to ignore constant size factors like:

    kmalloc(4 * 1024, gfp)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.

    The tools/ directory was manually excluded, since it has its own
    implementation of kmalloc().
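
    As an illustration (hypothetical identifiers nr, rows, cols and buf), the
    2-factor and 3-factor rewrites look like:

    /* 2-factor product */
    buf = kmalloc(nr * sizeof(*buf), GFP_KERNEL);               /* before */
    buf = kmalloc_array(nr, sizeof(*buf), GFP_KERNEL);          /* after  */

    /* 3-factor product */
    buf = kmalloc(rows * cols * sizeof(*buf), GFP_KERNEL);             /* before */
    buf = kmalloc(array3_size(rows, cols, sizeof(*buf)), GFP_KERNEL);  /* after  */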

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    kmalloc(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    kmalloc(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    kmalloc(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (COUNT_ID)
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * COUNT_ID
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (COUNT_CONST)
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * COUNT_CONST
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (COUNT_ID)
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * COUNT_ID
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (COUNT_CONST)
    + COUNT_CONST, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * COUNT_CONST
    + COUNT_CONST, sizeof(THING)
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    - kmalloc
    + kmalloc_array
    (
    - SIZE * COUNT
    + COUNT, SIZE
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    kmalloc(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(variable), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    kmalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kmalloc(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    kmalloc(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products,
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    kmalloc(C1 * C2 * C3, ...)
    |
    kmalloc(
    - (E1) * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - (E1) * (E2) * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - (E1) * (E2) * (E3)
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2 factors products when they're not all constants,
    // keeping sizeof() as the second factor argument.
    @@
    expression THING, E1, E2;
    type TYPE;
    constant C1, C2, C3;
    @@

    (
    kmalloc(sizeof(THING) * C2, ...)
    |
    kmalloc(sizeof(TYPE) * C2, ...)
    |
    kmalloc(C1 * C2 * C3, ...)
    |
    kmalloc(C1 * C2, ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (E2)
    + E2, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * E2
    + E2, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (E2)
    + E2, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * E2
    + E2, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - (E1) * E2
    + E1, E2
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - (E1) * (E2)
    + E1, E2
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - E1 * E2
    + E1, E2
    , ...)
    )

    Signed-off-by: Kees Cook

    Kees Cook
     

08 Jun, 2018

9 commits

  • Christoph doubts anyone was using the 'reserved' file in sysfs, so remove
    it.

    Link: http://lkml.kernel.org/r/20180518194519.3820-17-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Acked-by: Christoph Lameter
    Cc: Dave Hansen
    Cc: Jérôme Glisse
    Cc: "Kirill A . Shutemov"
    Cc: Lai Jiangshan
    Cc: Martin Schwidefsky
    Cc: Pekka Enberg
    Cc: Randy Dunlap
    Cc: Vlastimil Babka
    Cc: Andrey Ryabinin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • The reserved field was only used for embedding an rcu_head in the data
    structure. With the previous commit, we no longer need it. That lets us
    remove the 'reserved' argument to a lot of functions.

    Link: http://lkml.kernel.org/r/20180518194519.3820-16-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Acked-by: Christoph Lameter
    Cc: Dave Hansen
    Cc: Jérôme Glisse
    Cc: "Kirill A . Shutemov"
    Cc: Lai Jiangshan
    Cc: Martin Schwidefsky
    Cc: Pekka Enberg
    Cc: Randy Dunlap
    Cc: Vlastimil Babka
    Cc: Andrey Ryabinin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • rcu_head may now grow larger than list_head without affecting slab or
    slub.

    Link: http://lkml.kernel.org/r/20180518194519.3820-15-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Acked-by: Christoph Lameter
    Acked-by: Vlastimil Babka
    Cc: Dave Hansen
    Cc: Jérôme Glisse
    Cc: "Kirill A . Shutemov"
    Cc: Lai Jiangshan
    Cc: Martin Schwidefsky
    Cc: Pekka Enberg
    Cc: Randy Dunlap
    Cc: Andrey Ryabinin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Since the LRU is two words, this does not affect the double-word alignment
    of SLUB's freelist.

    Link: http://lkml.kernel.org/r/20180518194519.3820-10-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Acked-by: Vlastimil Babka
    Acked-by: Kirill A. Shutemov
    Cc: Christoph Lameter
    Cc: Dave Hansen
    Cc: Jérôme Glisse
    Cc: Lai Jiangshan
    Cc: Martin Schwidefsky
    Cc: Pekka Enberg
    Cc: Randy Dunlap
    Cc: Andrey Ryabinin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • By moving page->private to the fourth word of struct page, we can put the
    SLUB counters in the same word as SLAB's s_mem and still do the
    cmpxchg_double trick. Now the SLUB counters no longer overlap with the
    mapcount or refcount so we can drop the call to page_mapcount_reset() and
    simplify set_page_slub_counters() to a single line.

    Link: http://lkml.kernel.org/r/20180518194519.3820-6-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Acked-by: Vlastimil Babka
    Acked-by: Kirill A. Shutemov
    Cc: Christoph Lameter
    Cc: Dave Hansen
    Cc: Jérôme Glisse
    Cc: Lai Jiangshan
    Cc: Martin Schwidefsky
    Cc: Pekka Enberg
    Cc: Randy Dunlap
    Cc: Andrey Ryabinin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • This will allow us to store slub's counters in the same bits as slab's
    s_mem. slub now needs to set page->mapping to NULL as it frees the page,
    just like slab does.

    Link: http://lkml.kernel.org/r/20180518194519.3820-5-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Acked-by: Christoph Lameter
    Acked-by: Vlastimil Babka
    Cc: Dave Hansen
    Cc: Jérôme Glisse
    Cc: "Kirill A . Shutemov"
    Cc: Lai Jiangshan
    Cc: Martin Schwidefsky
    Cc: Pekka Enberg
    Cc: Randy Dunlap
    Cc: Andrey Ryabinin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • The obsolete comment removed in this patch was introduced by
    51df1142816e4 ("slub: Dynamically size kmalloc cache allocations").

    I paste the related modification from that commit below:

    +#ifdef CONFIG_NUMA
    + /*
    + * Allocate kmem_cache_node properly from the kmem_cache slab.
    + * kmem_cache_node is separately allocated so no need to
    + * update any list pointers.
    + */
    + temp_kmem_cache_node = kmem_cache_node;

    + kmem_cache_node = kmem_cache_alloc(kmem_cache, GFP_NOWAIT);
    + memcpy(kmem_cache_node, temp_kmem_cache_node, kmem_size);
    +
    + kmem_cache_bootstrap_fixup(kmem_cache_node);
    +
    + caches++;
    +#else
    + /*
    + * kmem_cache has kmem_cache_node embedded and we moved it!
    + * Update the list heads
    + */
    + INIT_LIST_HEAD(&kmem_cache->local_node.partial);
    + list_splice(&temp_kmem_cache->local_node.partial, &kmem_cache->local_node.partial);
    +#ifdef CONFIG_SLUB_DEBUG
    + INIT_LIST_HEAD(&kmem_cache->local_node.full);
    + list_splice(&temp_kmem_cache->local_node.full, &kmem_cache->local_node.full);
    +#endif

    As we can see, these comments were used to distinguish the different
    handling between NUMA and non-NUMA configurations in the original commit.
    I think they no longer make sense in the current implementation, where the
    comment sits above kmem_cache_node = bootstrap(&boot_kmem_cache_node); so
    it is better to remove them now.

    Link: http://lkml.kernel.org/r/5af26f58.1c69fb81.1be0e.c520SMTPIN_ADDED_BROKEN@mx.google.com
    Signed-off-by: Canjiang Lu
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Canjiang Lu
     
  • __printf is useful to verify format and arguments. Remove the following
    warning (with W=1):

    mm/slub.c:721:2: warning: function might be possible candidate for `gnu_printf' format attribute [-Wsuggest-attribute=format]
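
    The fix is simply annotating the printf-style helper. In general the
    attribute is applied like this (generic illustration; slab_msg is a made-up
    name, the exact slub.c function is not reproduced here):

    /* Argument 2 is a printf format string; variadic arguments start at 3. */
    __printf(2, 3)
    void slab_msg(struct kmem_cache *s, const char *fmt, ...);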

    Link: http://lkml.kernel.org/r/20180505200706.19986-1-malat@debian.org
    Signed-off-by: Mathieu Malaterre
    Reviewed-by: Andrew Morton
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathieu Malaterre
     
  • __GFP_ZERO requests that the object be initialised to all-zeroes, while
    the purpose of a constructor is to initialise an object to a particular
    pattern. We cannot do both. Add a warning to catch any users who
    mistakenly pass a __GFP_ZERO flag when allocating a slab with a
    constructor.
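
    The essence of the check is a single warning on the allocation path (the
    exact placement in the allocators is not reproduced here; this is a
    sketch):

    /* Cannot honour both a constructor and __GFP_ZERO for the same object. */
    WARN_ON_ONCE(s->ctor && (flags & __GFP_ZERO));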

    Link: http://lkml.kernel.org/r/20180412191322.GA21205@bombadil.infradead.org
    Fixes: d07dbea46405 ("Slab allocators: support __GFP_ZERO in all allocators")
    Signed-off-by: Matthew Wilcox
    Acked-by: Johannes Weiner
    Acked-by: Vlastimil Babka
    Acked-by: Michal Hocko
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

12 Apr, 2018

1 commit

  • The kasan_slab_free hook's return value denotes whether the reuse of a
    slab object must be delayed (e.g. when the object is put into memory
    quarantine).

    The current way SLUB handles this hook is by ignoring its return value
    and hardcoding checks similar (but not exactly the same) to the ones
    performed in kasan_slab_free, which is prone to making mistakes.

    The main difference between the hardcoded checks and the ones in
    kasan_slab_free is whether we want to perform a free in case when an
    invalid-free or a double-free was detected (we don't).

    This patch changes the way SLUB handles this by:
    1. taking into account the return value of kasan_slab_free for each of
    the objects that are being freed;
    2. reconstructing the freelist of objects to exclude the ones whose
    reuse must be delayed (see the sketch after this list).
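
    A sketch of what such freelist reconstruction can look like, using the
    existing slub helpers get_freepointer()/set_freepointer(); this conveys the
    spirit of the change rather than the literal diff:

    static inline bool slab_free_freelist_hook(struct kmem_cache *s,
                                               void **head, void **tail)
    {
            void *object, *next = *head;
            void *old_tail = *tail ? *tail : *head;

            /*
             * Rebuild the freelist, keeping only objects whose free hook
             * (ultimately kasan_slab_free) allows immediate reuse.
             */
            *head = NULL;
            *tail = NULL;

            do {
                    object = next;
                    next = get_freepointer(s, object);
                    if (!slab_free_hook(s, object)) {
                            set_freepointer(s, object, *head);
                            *head = object;
                            if (!*tail)
                                    *tail = object;
                    }
            } while (object != old_tail);

            /* Tell the caller whether anything is left to actually free. */
            return *head != NULL;
    }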

    [andreyknvl@google.com: eliminate unnecessary branch in slab_free]
    Link: http://lkml.kernel.org/r/a62759a2545fddf69b0c034547212ca1eb1b3ce2.1520359686.git.andreyknvl@google.com
    Link: http://lkml.kernel.org/r/083f58501e54731203801d899632d76175868e97.1519400992.git.andreyknvl@google.com
    Signed-off-by: Andrey Konovalov
    Acked-by: Andrey Ryabinin
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Alexander Potapenko
    Cc: Dmitry Vyukov
    Cc: Kostya Serebryany
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Konovalov
     

06 Apr, 2018

16 commits

  • The kasan quarantine is designed to delay freeing slab objects to catch
    use-after-free. The quarantine can be large (several percent of machine
    memory size). When kmem_caches are deleted, related objects are flushed
    from the quarantine, but this requires scanning the entire quarantine,
    which can be very slow. We have seen the kernel busily working on this
    while holding slab_mutex and badly affecting cache_reaper, slabinfo
    readers and memcg kmem cache creations.

    It can easily be reproduced by the following script:

    yes . | head -1000000 | xargs stat > /dev/null
    for i in `seq 1 10`; do
    seq 500 | (cd /cg/memory && xargs mkdir)
    seq 500 | xargs -I{} sh -c 'echo $BASHPID > \
    /cg/memory/{}/tasks && exec stat .' > /dev/null
    seq 500 | (cd /cg/memory && xargs rmdir)
    done

    The busy stack:
    kasan_cache_shutdown
    shutdown_cache
    memcg_destroy_kmem_caches
    mem_cgroup_css_free
    css_free_rwork_fn
    process_one_work
    worker_thread
    kthread
    ret_from_fork

    This patch is based on the observation that if the kmem_cache to be
    destroyed is empty then there should not be any objects of this cache in
    the quarantine.

    Without the patch the script got stuck for a couple of hours. With the
    patch the script completed within a second.
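
    The observation translates into something like the following (hedged
    sketch; __kmem_cache_empty() stands for the per-allocator emptiness check
    the patch relies on):

    void kasan_cache_shutdown(struct kmem_cache *cache)
    {
            /*
             * An empty cache cannot have objects sitting in the quarantine,
             * so skip the expensive full-quarantine scan in that case.
             */
            if (!__kmem_cache_empty(cache))
                    quarantine_remove_cache(cache);
    }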

    Link: http://lkml.kernel.org/r/20180327230603.54721-1-shakeelb@google.com
    Signed-off-by: Shakeel Butt
    Reviewed-by: Andrew Morton
    Acked-by: Andrey Ryabinin
    Acked-by: Christoph Lameter
    Cc: Vladimir Davydov
    Cc: Alexander Potapenko
    Cc: Greg Thelen
    Cc: Dmitry Vyukov
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shakeel Butt
     
    The function returns the size of the object without the red zone, which
    can't be negative.

    Link: http://lkml.kernel.org/r/20180305200730.15812-24-adobriyan@gmail.com
    Signed-off-by: Alexey Dobriyan
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • struct kmem_cache_order_objects is for mixing order and number of
    objects, and orders aren't big enough to warrant 64-bit width.

    Propagate unsignedness down so that everything fits.

    !!! Patch assumes that "PAGE_SIZE << order" doesn't overflow. !!!

    Link: http://lkml.kernel.org/r/20180305200730.15812-23-adobriyan@gmail.com
    Signed-off-by: Alexey Dobriyan
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • slab_index() returns index of an object within a slab which is at most
    u15 (or u16?).

    Iterators additionally guarantee that "p >= addr".

    Link: http://lkml.kernel.org/r/20180305200730.15812-22-adobriyan@gmail.com
    Signed-off-by: Alexey Dobriyan
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
    If kmem cache sizes are 32-bit, then the usercopy region should be too.

    Link: http://lkml.kernel.org/r/20180305200730.15812-21-adobriyan@gmail.com
    Signed-off-by: Alexey Dobriyan
    Cc: David Miller
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
    If SLAB doesn't support 4GB+ kmem caches (it never did), KASAN should not
    either.

    Link: http://lkml.kernel.org/r/20180305200730.15812-20-adobriyan@gmail.com
    Signed-off-by: Alexey Dobriyan
    Cc: Andrey Ryabinin
    Cc: Alexander Potapenko
    Cc: Dmitry Vyukov
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Now that all sizes are properly typed, propagate "unsigned int" down the
    callgraph.

    Link: http://lkml.kernel.org/r/20180305200730.15812-19-adobriyan@gmail.com
    Signed-off-by: Alexey Dobriyan
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Linux doesn't support negative length objects (including meta data).

    Link: http://lkml.kernel.org/r/20180305200730.15812-18-adobriyan@gmail.com
    Signed-off-by: Alexey Dobriyan
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Linux doesn't support negative length objects.

    Link: http://lkml.kernel.org/r/20180305200730.15812-17-adobriyan@gmail.com
    Signed-off-by: Alexey Dobriyan
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • /*
    * cpu_partial determined the maximum number of objects
    * kept in the per cpu partial lists of a processor.
    */

    Can't be negative.

    Link: http://lkml.kernel.org/r/20180305200730.15812-15-adobriyan@gmail.com
    Signed-off-by: Alexey Dobriyan
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • ->inuse is "the number of bytes in actual use by the object",
    can't be negative.

    Link: http://lkml.kernel.org/r/20180305200730.15812-14-adobriyan@gmail.com
    Signed-off-by: Alexey Dobriyan
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Kmem cache alignment can't be negative.

    Link: http://lkml.kernel.org/r/20180305200730.15812-13-adobriyan@gmail.com
    Signed-off-by: Alexey Dobriyan
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • ->reserved is either 0 or sizeof(struct rcu_head), can't be negative.

    Link: http://lkml.kernel.org/r/20180305200730.15812-12-adobriyan@gmail.com
    Signed-off-by: Alexey Dobriyan
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • ->remote_node_defrag_ratio is in range 0..1000.

    This also adds a check and modifies the behavior to return an error
    code. Before this patch invalid values were ignored.
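
    In sysfs-store terms this amounts to roughly the following (sketch only;
    the internal scaling by 10 and the exact error code are assumptions here,
    not quoted from the patch):

    static ssize_t remote_node_defrag_ratio_store(struct kmem_cache *s,
                                                  const char *buf, size_t length)
    {
            unsigned int ratio;
            int err;

            err = kstrtouint(buf, 10, &ratio);
            if (err)
                    return err;
            if (ratio > 100)                /* reject out-of-range input */
                    return -ERANGE;

            s->remote_node_defrag_ratio = ratio * 10;
            return length;
    }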

    Link: http://lkml.kernel.org/r/20180305200730.15812-9-adobriyan@gmail.com
    Signed-off-by: Alexey Dobriyan
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • struct kmem_cache::size and ::align were always 32-bit.

    Out of curiosity I created 4GB kmem_cache, it oopsed with division by 0.
    kmem_cache_create(1UL<
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • When SLUB_DEBUG catches some issues, it prints all the required debug
    info. However, in a few cases where allocation and free of the object
    has happened in a very short time, 'age' might be misleading. See the
    example below:

    =============================================================================
    BUG kmalloc-256 (Tainted: G W O ): Poison overwritten
    -----------------------------------------------------------------------------
    ...
    INFO: Allocated in binder_transaction+0x4b0/0x2448 age=731 cpu=3 pid=5314
    ...
    INFO: Freed in binder_free_transaction+0x2c/0x58 age=735 cpu=6 pid=2079
    ...
    Object fffffff14956a870: 6b 6b 6b 6b 6b 6b 6b 6b 67 6b 6b 6b 6b 6b 6b a5 kkkkkkkkgkkkk

    In this case, the object was actually freed later, but 'age' suggests
    otherwise. This could be because, while printing this info, we print the
    allocation traces first and the free traces afterwards. In between, if we
    get scheduled out or jiffies increments, (jiffies - t->when) becomes
    meaningless.

    Use a jitter-free reference to calculate age.

    The new output will be exactly the same; 'age' now uses a single jiffies
    reference in both prints.
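
    Concretely, that means sampling jiffies once and using the snapshot for
    both the alloc and the free trace, along these lines (sketch of the idea):

    static void print_track(const char *s, struct track *t, unsigned long pr_time)
    {
            if (!t->addr)
                    return;
            /* Both prints now age against the same snapshot of jiffies. */
            pr_err("INFO: %s in %pS age=%lu cpu=%u pid=%d\n",
                   s, (void *)t->addr, pr_time - t->when, t->cpu, t->pid);
    }

    static void print_tracking(struct kmem_cache *s, void *object)
    {
            unsigned long pr_time = jiffies;        /* single reference point */

            if (!(s->flags & SLAB_STORE_USER))
                    return;

            print_track("Allocated in", get_track(s, object, TRACK_ALLOC), pr_time);
            print_track("Freed in", get_track(s, object, TRACK_FREE), pr_time);
    }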

    Change-Id: I0846565807a4229748649bbecb1ffb743d71fcd8
    Link: http://lkml.kernel.org/r/1520492010-19389-1-git-send-email-cpandya@codeaurora.org
    Signed-off-by: Chintan Pandya
    Acked-by: Christoph Lameter
    Reviewed-by: Andrew Morton
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chintan Pandya
     

07 Feb, 2018

2 commits

  • __builtin_return_address(1) is unreliable without frame pointers.
    With defconfig on kmalloc_pagealloc_invalid_free test I am getting:

    BUG: KASAN: double-free or invalid-free in (null)

    Pass caller PC from callers explicitly.

    Link: http://lkml.kernel.org/r/9b01bc2d237a4df74ff8472a3bf6b7635908de01.1514378558.git.dvyukov@google.com
    Signed-off-by: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Vyukov
     
  • Patch series "kasan: detect invalid frees".

    KASAN detects double-frees, but does not detect invalid-frees (when a
    pointer into the middle of a heap object is passed to free). We recently
    had a very unpleasant case in crypto code which freed an inner object
    inside of a heap allocation. This went unnoticed during the free, but
    totally corrupted the heap and later led to a bunch of random crashes all
    over kernel code.

    Detect invalid frees.

    This patch (of 5):

    Detect frees of pointers into middle of large heap objects.

    I dropped const from kasan_kfree_large() because it starts propagating
    through a bunch of functions in kasan_report.c, slab/slub nearest_obj(),
    all of their local variables, fixup_red_left(), etc.

    Link: http://lkml.kernel.org/r/1b45b4fe1d20fc0de1329aab674c1dd973fee723.1514378558.git.dvyukov@google.com
    Signed-off-by: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Vyukov
     

04 Feb, 2018

1 commit

  • Pull hardened usercopy whitelisting from Kees Cook:
    "Currently, hardened usercopy performs dynamic bounds checking on slab
    cache objects. This is good, but still leaves a lot of kernel memory
    available to be copied to/from userspace in the face of bugs.

    To further restrict what memory is available for copying, this creates
    a way to whitelist specific areas of a given slab cache object for
    copying to/from userspace, allowing much finer granularity of access
    control.

    Slab caches that are never exposed to userspace can declare no
    whitelist for their objects, thereby keeping them unavailable to
    userspace via dynamic copy operations. (Note, an implicit form of
    whitelisting is the use of constant sizes in usercopy operations and
    get_user()/put_user(); these bypass all hardened usercopy checks since
    these sizes cannot change at runtime.)

    This new check is WARN-by-default, so any mistakes can be found over
    the next several releases without breaking anyone's system.

    The series has roughly the following sections:
    - remove %p and improve reporting with offset
    - prepare infrastructure and whitelist kmalloc
    - update VFS subsystem with whitelists
    - update SCSI subsystem with whitelists
    - update network subsystem with whitelists
    - update process memory with whitelists
    - update per-architecture thread_struct with whitelists
    - update KVM with whitelists and fix ioctl bug
    - mark all other allocations as not whitelisted
    - update lkdtm for more sensible test overage"

    * tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (38 commits)
    lkdtm: Update usercopy tests for whitelisting
    usercopy: Restrict non-usercopy caches to size 0
    kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl
    kvm: whitelist struct kvm_vcpu_arch
    arm: Implement thread_struct whitelist for hardened usercopy
    arm64: Implement thread_struct whitelist for hardened usercopy
    x86: Implement thread_struct whitelist for hardened usercopy
    fork: Provide usercopy whitelisting for task_struct
    fork: Define usercopy region in thread_stack slab caches
    fork: Define usercopy region in mm_struct slab caches
    net: Restrict unwhitelisted proto caches to size 0
    sctp: Copy struct sctp_sock.autoclose to userspace using put_user()
    sctp: Define usercopy region in SCTP proto slab cache
    caif: Define usercopy region in caif proto slab cache
    ip: Define usercopy region in IP proto slab cache
    net: Define usercopy region in struct proto slab cache
    scsi: Define usercopy region in scsi_sense_cache slab cache
    cifs: Define usercopy region in cifs_request slab cache
    vxfs: Define usercopy region in vxfs_inode slab cache
    ufs: Define usercopy region in ufs_inode_cache slab cache
    ...

    Linus Torvalds
     

01 Feb, 2018

2 commits

  • Commit d6e0b7fa1186 ("slub: make dead caches discard free slabs
    immediately") makes put_cpu_partial() run with preemption disabled and
    interrupts disabled when calling unfreeze_partials().

    The comment: "put_cpu_partial() is done without interrupts disabled and
    without preemption disabled" looks obsolete, so remove it.

    Link: http://lkml.kernel.org/r/1516968550-1520-1-git-send-email-miles.chen@mediatek.com
    Signed-off-by: Miles Chen
    Reviewed-by: Andrew Morton
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miles Chen
     
    The start address calculated for slab padding restoration was wrong. The
    wrong address would point to some section before the padding and could
    cause corruption.

    Link: http://lkml.kernel.org/r/1516604578-4577-1-git-send-email-balasubramani_vivekanandan@mentor.com
    Signed-off-by: Balasubramani Vivekanandan
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balasubramani Vivekanandan
     

16 Jan, 2018

3 commits

  • This introduces CONFIG_HARDENED_USERCOPY_FALLBACK to control the
    behavior of hardened usercopy whitelist violations. By default, whitelist
    violations will continue to WARN() so that any bad or missing usercopy
    whitelists can be discovered without being too disruptive.

    If this config is disabled at build time or a system is booted with
    "slab_common.usercopy_fallback=0", usercopy whitelists will BUG() instead
    of WARN(). This is useful for admins that want to use usercopy whitelists
    immediately.

    Suggested-by: Matthew Garrett
    Signed-off-by: Kees Cook

    Kees Cook
     
  • This patch adds checking of usercopy cache whitelisting, and is modified
    from Brad Spengler/PaX Team's PAX_USERCOPY whitelisting code in the
    last public patch of grsecurity/PaX based on my understanding of the
    code. Changes or omissions from the original code are mine and don't
    reflect the original grsecurity/PaX code.

    The SLAB and SLUB allocators are modified to WARN() on all copy operations
    in which the kernel heap memory being modified falls outside of the cache's
    defined usercopy region.

    Based on an earlier patch from David Windsor.

    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrew Morton
    Cc: Laura Abbott
    Cc: Ingo Molnar
    Cc: Mark Rutland
    Cc: linux-mm@kvack.org
    Cc: linux-xfs@vger.kernel.org
    Signed-off-by: Kees Cook

    Kees Cook
     
  • This patch prepares the slab allocator to handle caches having annotations
    (useroffset and usersize) defining usercopy regions.

    This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
    whitelisting code in the last public patch of grsecurity/PaX based on
    my understanding of the code. Changes or omissions from the original
    code are mine and don't reflect the original grsecurity/PaX code.

    Currently, hardened usercopy performs dynamic bounds checking on slab
    cache objects. This is good, but still leaves a lot of kernel memory
    available to be copied to/from userspace in the face of bugs. To further
    restrict what memory is available for copying, this creates a way to
    whitelist specific areas of a given slab cache object for copying to/from
    userspace, allowing much finer granularity of access control. Slab caches
    that are never exposed to userspace can declare no whitelist for their
    objects, thereby keeping them unavailable to userspace via dynamic copy
    operations. (Note, an implicit form of whitelisting is the use of constant
    sizes in usercopy operations and get_user()/put_user(); these bypass
    hardened usercopy checks since these sizes cannot change at runtime.)

    To support this whitelist annotation, usercopy region offset and size
    members are added to struct kmem_cache. The slab allocator receives a
    new function, kmem_cache_create_usercopy(), that creates a new cache
    with a usercopy region defined, suitable for declaring spans of fields
    within the objects that get copied to/from userspace.

    In this patch, the default kmem_cache_create() marks the entire allocation
    as whitelisted, leaving it semantically unchanged. Once all fine-grained
    whitelists have been added (in subsequent patches), this will be changed
    to a usersize of 0, making caches created with kmem_cache_create() not
    copyable to/from userspace.
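
    For instance, a cache that only ever copies one embedded field to/from
    userspace can be declared like this (illustrative struct and field names,
    not taken from the series):

    struct session {
            spinlock_t lock;
            char name[32];          /* the only part copied to/from userspace */
            void *private;
    };

    static struct kmem_cache *session_cache;

    session_cache = kmem_cache_create_usercopy("session_cache",
                            sizeof(struct session), 0, SLAB_HWCACHE_ALIGN,
                            offsetof(struct session, name),
                            sizeof(((struct session *)NULL)->name),
                            NULL);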

    After the entire usercopy whitelist series is applied, less than 15%
    of the slab cache memory remains exposed to potential usercopy bugs
    after a fresh boot:

    Total Slab Memory: 48074720
    Usercopyable Memory: 6367532 13.2%
    task_struct 0.2% 4480/1630720
    RAW 0.3% 300/96000
    RAWv6 2.1% 1408/64768
    ext4_inode_cache 3.0% 269760/8740224
    dentry 11.1% 585984/5273856
    mm_struct 29.1% 54912/188448
    kmalloc-8 100.0% 24576/24576
    kmalloc-16 100.0% 28672/28672
    kmalloc-32 100.0% 81920/81920
    kmalloc-192 100.0% 96768/96768
    kmalloc-128 100.0% 143360/143360
    names_cache 100.0% 163840/163840
    kmalloc-64 100.0% 167936/167936
    kmalloc-256 100.0% 339968/339968
    kmalloc-512 100.0% 350720/350720
    kmalloc-96 100.0% 455616/455616
    kmalloc-8192 100.0% 655360/655360
    kmalloc-1024 100.0% 812032/812032
    kmalloc-4096 100.0% 819200/819200
    kmalloc-2048 100.0% 1310720/1310720

    After some kernel build workloads, the percentage (mainly driven by
    dentry and inode caches expanding) drops under 10%:

    Total Slab Memory: 95516184
    Usercopyable Memory: 8497452 8.8%
    task_struct 0.2% 4000/1456000
    RAW 0.3% 300/96000
    RAWv6 2.1% 1408/64768
    ext4_inode_cache 3.0% 1217280/39439872
    dentry 11.1% 1623200/14608800
    mm_struct 29.1% 73216/251264
    kmalloc-8 100.0% 24576/24576
    kmalloc-16 100.0% 28672/28672
    kmalloc-32 100.0% 94208/94208
    kmalloc-192 100.0% 96768/96768
    kmalloc-128 100.0% 143360/143360
    names_cache 100.0% 163840/163840
    kmalloc-64 100.0% 245760/245760
    kmalloc-256 100.0% 339968/339968
    kmalloc-512 100.0% 350720/350720
    kmalloc-96 100.0% 563520/563520
    kmalloc-8192 100.0% 655360/655360
    kmalloc-1024 100.0% 794624/794624
    kmalloc-4096 100.0% 819200/819200
    kmalloc-2048 100.0% 1257472/1257472

    Signed-off-by: David Windsor
    [kees: adjust commit log, split out a few extra kmalloc hunks]
    [kees: add field names to function declarations]
    [kees: convert BUGs to WARNs and fail closed]
    [kees: add attack surface reduction analysis to commit log]
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrew Morton
    Cc: linux-mm@kvack.org
    Cc: linux-xfs@vger.kernel.org
    Signed-off-by: Kees Cook
    Acked-by: Christoph Lameter

    David Windsor