03 Mar, 2017

1 commit

  • …sors into <linux/sched/signal.h>

    task_struct::signal and task_struct::sighand are pointers, which would normally make it
    straightforward to not define those types in sched.h.

    That is not so, because the types are accompanied by a myriad of APIs (macros and inline
    functions) that dereference them.

    Split the types and the APIs out of sched.h and move them into a new header, <linux/sched/signal.h>.

    With this change sched.h does not know about 'struct signal' and 'struct sighand' anymore,
    trying to put accessors into sched.h as a test fails the following way:

    ./include/linux/sched.h: In function ‘test_signal_types’:
    ./include/linux/sched.h:2461:18: error: dereferencing pointer to incomplete type ‘struct signal_struct’
    ^

    This reduces the size and complexity of sched.h significantly.

    Update all headers and .c code that relied on getting the signal handling
    functionality from <linux/sched.h> to include <linux/sched/signal.h>.

    The list of affected files in the preparatory patch was partly generated by
    grepping for the APIs, and partly by doing coverage build testing, both
    all[yes|mod|def|no]config builds on 64-bit and 32-bit x86, and an array of
    cross-architecture builds.

    Nevertheless some (trivial) build breakage is still expected related to rare
    Kconfig combinations and in-flight patches to various kernel code, but most
    of it should be handled by this patch.

    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

02 Mar, 2017

17 commits


01 Mar, 2017

1 commit

  • Pull IDR rewrite from Matthew Wilcox:
    "The most significant part of the following is the patch to rewrite the
    IDR & IDA to be clients of the radix tree. But there's much more,
    including an enhancement of the IDA to be significantly more space
    efficient, an IDR & IDA test suite, some improvements to the IDR API
    (and driver changes to take advantage of those improvements), several
    improvements to the radix tree test suite and RCU annotations.

    The IDR & IDA rewrite had a good spin in linux-next and Andrew's tree
    for most of the last cycle. Coupled with the IDR test suite, I feel
    pretty confident that any remaining bugs are quite hard to hit. 0-day
    did a great job of watching my git tree and pointing out problems; as
    it hit them, I added new test-cases to be sure not to be caught the
    same way twice"

    Willy goes on to expand a bit on the IDR rewrite rationale:
    "The radix tree and the IDR use very similar data structures.

    Merging the two codebases lets us share the memory allocation pools,
    and results in a net deletion of 500 lines of code. It also opens up
    the possibility of exposing more of the features of the radix tree to
    users of the IDR (and I have some interesting patches along those
    lines waiting for 4.12)

    It also shrinks the size of the 'struct idr' from 40 bytes to 24 which
    will shrink a fair few data structures that embed an IDR"

    * 'idr-4.11' of git://git.infradead.org/users/willy/linux-dax: (32 commits)
    radix tree test suite: Add config option for map shift
    idr: Add missing __rcu annotations
    radix-tree: Fix __rcu annotations
    radix-tree: Add rcu_dereference and rcu_assign_pointer calls
    radix tree test suite: Run iteration tests for longer
    radix tree test suite: Fix split/join memory leaks
    radix tree test suite: Fix leaks in regression2.c
    radix tree test suite: Fix leaky tests
    radix tree test suite: Enable address sanitizer
    radix_tree_iter_resume: Fix out of bounds error
    radix-tree: Store a pointer to the root in each node
    radix-tree: Chain preallocated nodes through ->parent
    radix tree test suite: Dial down verbosity with -v
    radix tree test suite: Introduce kmalloc_verbose
    idr: Return the deleted entry from idr_remove
    radix tree test suite: Build separate binaries for some tests
    ida: Use exceptional entries for small IDAs
    ida: Move ida_bitmap to a percpu variable
    Reimplement IDR and IDA using the radix tree
    radix-tree: Add radix_tree_iter_delete
    ...

    Linus Torvalds
     

28 Feb, 2017

12 commits

  • This patch makes arch-independent testcases for RODATA. Both x86 and
    x86_64 already have testcases for RODATA, But they are arch-specific
    because using inline assembly directly.

    And cacheflush.h is not a suitable location for rodata-test related
    things. Since they were in cacheflush.h, If someone change the state of
    CONFIG_DEBUG_RODATA_TEST, It cause overhead of kernel build.

    To solve the above issues, write arch-independent testcases and move it
    to shared location.

    [jinb.park7@gmail.com: fix config dependency]
    Link: http://lkml.kernel.org/r/20170209131625.GA16954@pjb1027-Latitude-E5410
    Link: http://lkml.kernel.org/r/20170129105436.GA9303@pjb1027-Latitude-E5410
    Signed-off-by: Jinbum Park
    Acked-by: Kees Cook
    Cc: Ingo Molnar
    Cc: H. Peter Anvin
    Cc: Arjan van de Ven
    Cc: Laura Abbott
    Cc: Russell King
    Cc: Valentin Rothberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jinbum Park
     
  • We already have the helper, we can convert the rest of the kernel
    mechanically using:

    git grep -l 'atomic_inc_not_zero.*mm_users' | xargs sed -i 's/atomic_inc_not_zero(&\(.*\)->mm_users)/mmget_not_zero\(\1\)/'

    This is needed for a later patch that hooks into the helper, but might
    be a worthwhile cleanup on its own.

    Link: http://lkml.kernel.org/r/20161218123229.22952-3-vegard.nossum@oracle.com
    Signed-off-by: Vegard Nossum
    Acked-by: Michal Hocko
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vegard Nossum
     
  • Apart from adding the helper function itself, the rest of the kernel is
    converted mechanically using:

    git grep -l 'atomic_inc.*mm_users' | xargs sed -i 's/atomic_inc(&\(.*\)->mm_users);/mmget\(\1\);/'
    git grep -l 'atomic_inc.*mm_users' | xargs sed -i 's/atomic_inc(&\(.*\)\.mm_users);/mmget\(\&\1\);/'

    This is needed for a later patch that hooks into the helper, but might
    be a worthwhile cleanup on its own.

    (Michal Hocko provided most of the kerneldoc comment.)

    Link: http://lkml.kernel.org/r/20161218123229.22952-2-vegard.nossum@oracle.com
    Signed-off-by: Vegard Nossum
    Acked-by: Michal Hocko
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vegard Nossum
     
  • Apart from adding the helper function itself, the rest of the kernel is
    converted mechanically using:

    git grep -l 'atomic_inc.*mm_count' | xargs sed -i 's/atomic_inc(&\(.*\)->mm_count);/mmgrab\(\1\);/'
    git grep -l 'atomic_inc.*mm_count' | xargs sed -i 's/atomic_inc(&\(.*\)\.mm_count);/mmgrab\(\&\1\);/'

    This is needed for a later patch that hooks into the helper, but might
    be a worthwhile cleanup on its own.

    (Michal Hocko provided most of the kerneldoc comment.)

    Link: http://lkml.kernel.org/r/20161218123229.22952-1-vegard.nossum@oracle.com
    Signed-off-by: Vegard Nossum
    Acked-by: Michal Hocko
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vegard Nossum
     
  • Now that %z is standartised in C99 there is no reason to support %Z.
    Unlike %L it doesn't even make format strings smaller.

    Use BUILD_BUG_ON in a couple ATM drivers.

    In case anyone didn't notice lib/vsprintf.o is about half of SLUB which
    is in my opinion is quite an achievement. Hopefully this patch inspires
    someone else to trim vsprintf.c more.

    Link: http://lkml.kernel.org/r/20170103230126.GA30170@avx2
    Signed-off-by: Alexey Dobriyan
    Cc: Andy Shevchenko
    Cc: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Fix typos and add the following to the scripts/spelling.txt:

    followings||following

    While we are here, add a missing colon in the boilerplate in DT binding
    documents. The "you SoC" in allwinner,sunxi-pinctrl.txt was fixed as
    well.

    I reworded "as the followings:" to "as follows:" for
    drivers/usb/gadget/udc/renesas_usb3.c.

    Link: http://lkml.kernel.org/r/1481573103-11329-32-git-send-email-yamada.masahiro@socionext.com
    Signed-off-by: Masahiro Yamada
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masahiro Yamada
     
  • Fix typos and add the following to the scripts/spelling.txt:

    comsume||consume
    comsumer||consumer
    comsuming||consuming

    I see some variable names with this pattern, but this commit is only
    touching comment blocks to avoid unexpected impact.

    Link: http://lkml.kernel.org/r/1481573103-11329-19-git-send-email-yamada.masahiro@socionext.com
    Signed-off-by: Masahiro Yamada
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masahiro Yamada
     
  • Fix typos and add the following to the scripts/spelling.txt:

    algined||aligned

    While we are here, fix the "appplication" in the touched line in
    drivers/block/loop.c. Also, fix the "may not naturally ..." to
    "may not be naturally ..." in the touched line in mm/page_alloc.

    Link: http://lkml.kernel.org/r/1481573103-11329-9-git-send-email-yamada.masahiro@socionext.com
    Signed-off-by: Masahiro Yamada
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masahiro Yamada
     
  • Replace all 1 << inode->i_blkbits and (1 << inode->i_blkbits) in fs
    branch.

    This patch also fixes multiple checkpatch warnings: WARNING: Prefer
    'unsigned int' to bare use of 'unsigned'

    Thanks to Andrew Morton for suggesting more appropriate function instead
    of macro.

    [geliangtang@gmail.com: truncate: use i_blocksize()]
    Link: http://lkml.kernel.org/r/9c8b2cd83c8f5653805d43debde9fa8817e02fc4.1484895804.git.geliangtang@gmail.com
    Link: http://lkml.kernel.org/r/1481319905-10126-1-git-send-email-fabf@skynet.be
    Signed-off-by: Fabian Frederick
    Signed-off-by: Geliang Tang
    Cc: Alexander Viro
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • Change the zpool/compressor param callback function to release the
    zswap_pools_lock spinlock before calling param_set_charp, since that
    function may sleep when it calls kmalloc with GFP_KERNEL.

    While this problem has existed for a while, I wasn't able to trigger it
    using a tight loop changing either/both the zpool and compressor params; I
    think it's very unlikely to be an issue on the stable kernels, especially
    since most zswap users will change the compressor and/or zpool from sysfs
    only one time each boot - or zero times, if they add the params to the
    kernel boot.

    Fixes: c99b42c3529e ("zswap: use charp for zswap param strings")
    Link: http://lkml.kernel.org/r/20170126155821.4545-1-ddstreet@ieee.org
    Signed-off-by: Dan Streetman
    Reported-by: Sergey Senozhatsky
    Cc: Michal Hocko
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     
  • If either the compressor and/or zpool param are invalid at boot, and
    their default value is also invalid, set the param to the empty string
    to indicate there is no compressor and/or zpool configured. This allows
    users to check the sysfs interface to see which param needs changing.

    Link: http://lkml.kernel.org/r/20170124200259.16191-4-ddstreet@ieee.org
    Signed-off-by: Dan Streetman
    Cc: Seth Jennings
    Cc: Michal Hocko
    Cc: Sergey Senozhatsky
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     
  • Allow zswap to initialize at boot even if it can't create its pool due
    to a failure to create a zpool and/or compressor. Allow those to be
    created later, from the sysfs module param interface.

    Link: http://lkml.kernel.org/r/20170124200259.16191-3-ddstreet@ieee.org
    Signed-off-by: Dan Streetman
    Cc: Seth Jennings
    Cc: Michal Hocko
    Cc: Sergey Senozhatsky
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     

25 Feb, 2017

9 commits

  • Per memcg slab accounting and kasan have a problem with kmem_cache
    destruction.
    - kmem_cache_create() allocates a kmem_cache, which is used for
    allocations from processes running in root (top) memcg.
    - Processes running in non root memcg and allocating with either
    __GFP_ACCOUNT or from a SLAB_ACCOUNT cache use a per memcg
    kmem_cache.
    - Kasan catches use-after-free by having kfree() and kmem_cache_free()
    defer freeing of objects. Objects are placed in a quarantine.
    - kmem_cache_destroy() destroys root and non root kmem_caches. It takes
    care to drain the quarantine of objects from the root memcg's
    kmem_cache, but ignores objects associated with non root memcg. This
    causes leaks because quarantined per memcg objects refer to per memcg
    kmem cache being destroyed.

    To see the problem:

    1) create a slab cache with kmem_cache_create(,,,SLAB_ACCOUNT,)
    2) from non root memcg, allocate and free a few objects from cache
    3) dispose of the cache with kmem_cache_destroy() kmem_cache_destroy()
    will trigger a "Slab cache still has objects" warning indicating
    that the per memcg kmem_cache structure was leaked.

    Fix the leak by draining kasan quarantined objects allocated from non
    root memcg.

    Racing memcg deletion is tricky, but handled. kmem_cache_destroy() =>
    shutdown_memcg_caches() => __shutdown_memcg_cache() => shutdown_cache()
    flushes per memcg quarantined objects, even if that memcg has been
    rmdir'd and gone through memcg_deactivate_kmem_caches().

    This leak only affects destroyed SLAB_ACCOUNT kmem caches when kasan is
    enabled. So I don't think it's worth patching stable kernels.

    Link: http://lkml.kernel.org/r/1482257462-36948-1-git-send-email-gthelen@google.com
    Signed-off-by: Greg Thelen
    Reviewed-by: Vladimir Davydov
    Acked-by: Andrey Ryabinin
    Cc: Alexander Potapenko
    Cc: Dmitry Vyukov
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Thelen
     
  • Commit 31bc3858ea3e ("add automatic onlining policy for the newly added
    memory") provides the capability to have added memory automatically
    onlined during add, but this appears to be slightly broken.

    The current implementation uses walk_memory_range() to call
    online_memory_block, which uses memory_block_change_state() to online
    the memory. Instead, we should be calling device_online() for the
    memory block in online_memory_block(). This would online the memory
    (the memory bus online routine memory_subsys_online() called from
    device_online calls memory_block_change_state()) and properly update the
    device struct offline flag.

    As a result of the current implementation, attempting to remove a memory
    block after adding it using auto online fails. This is because doing a
    remove, for instance

    echo offline > /sys/devices/system/memory/memoryXXX/state

    uses device_offline() which checks the dev->offline flag.

    Link: http://lkml.kernel.org/r/20170222220744.8119.19687.stgit@ltcalpine2-lp14.aus.stglabs.ibm.com
    Signed-off-by: Nathan Fontenot
    Cc: Michael Ellerman
    Cc: Michael Roth
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nathan Fontenot
     
  • With rw_page, page_endio is used for completing IO on a page and it
    propagates write error to the address space if the IO fails. The
    problem is it accesses page->mapping directly which might be okay for
    file-backed pages but it shouldn't for anonymous page. Otherwise, it
    can corrupt one of field from anon_vma under us and system goes panic
    randomly.

    swap_writepage
    bdev_writepage
    ops->rw_page

    I encountered the BUG during developing new zram feature and it was
    really hard to figure it out because it made random crash, somtime
    mmap_sem lockdep, sometime other places where places never related to
    zram/zsmalloc, and not reproducible with some configuration.

    When I consider how that bug is subtle and people do fast-swap test with
    brd, it's worth to add stable mark, I think.

    Fixes: dd6bd0d9c7db ("swap: use bdev_read_page() / bdev_write_page()")
    Signed-off-by: Minchan Kim
    Acked-by: Michal Hocko
    Cc: Matthew Wilcox
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • We are using the wrong flag value in task_numa_falt function. This can
    result in us doing wrong numa fault statistics update, because we update
    num_pages_migrate and numa_fault_locality etc based on the flag argument
    passed.

    Fixes: bae473a423 ("mm: introduce fault_env")
    Link: http://lkml.kernel.org/r/1487498395-9544-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
    Signed-off-by: Aneesh Kumar K.V
    Acked-by: Hillf Danton
    Acked-by: Kirill A. Shutemov
    Cc: Rik van Riel
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
     
  • Do the prot_none/FOLL_NUMA check after we are sure this is a THP pte.
    Archs can implement prot_none such that it can return true for regular
    pmd entries.

    Link: http://lkml.kernel.org/r/1487498326-8734-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
    Signed-off-by: Aneesh Kumar K.V
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: Hillf Danton
    Cc: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
     
  • cleanup rest of dma_addr_t and phys_addr_t type casting in mm
    use %pad for dma_addr_t
    use %pa for phys_addr_t

    Link: http://lkml.kernel.org/r/1486618489-13912-1-git-send-email-miles.chen@mediatek.com
    Signed-off-by: Miles Chen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miles Chen
     
  • The class index and fullness group are not encoded in
    (first)page->mapping any more, after commit 3783689a1aa8 ("zsmalloc:
    introduce zspage structure"). Instead, they are store in struct zspage.

    Just delete this unneeded comment.

    Link: http://lkml.kernel.org/r/1486620822-36826-1-git-send-email-xieyisheng1@huawei.com
    Signed-off-by: Yisheng Xie
    Suggested-by: Sergey Senozhatsky
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Cc: Nitin Gupta
    Cc: Hanjun Guo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yisheng Xie
     
  • arch_zone_lowest/highest_possible_pfn[] is set to 0 and [ZONE_MOVABLE]
    is skipped in the loop. No need to reset them to 0 again.

    This patch just removes the redundant code.

    Link: http://lkml.kernel.org/r/20170209141731.60208-1-richard.weiyang@gmail.com
    Signed-off-by: Wei Yang
    Cc: Anshuman Khandual
    Cc: Mel Gorman
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wei Yang
     
  • We had used page->lru to link the component pages (except the first
    page) of a zspage, and used INIT_LIST_HEAD(&page->lru) to init it.
    Therefore, to get the last page's next page, which is NULL, we had to
    use page flag PG_Private_2 to identify it.

    But now, we use page->freelist to link all of the pages in zspage and
    init the page->freelist as NULL for last page, so no need to use
    PG_Private_2 anymore.

    This remove redundant SetPagePrivate2 in create_page_chain and
    ClearPagePrivate2 in reset_page(). Save a few cycles for migration of
    zsmalloc page :)

    Link: http://lkml.kernel.org/r/1487076509-49270-1-git-send-email-xieyisheng1@huawei.com
    Signed-off-by: Yisheng Xie
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yisheng Xie