28 Jun, 2019

2 commits


21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

10 May, 2019

1 commit

  • Pull workqueue updates from Tejun Heo:
    "Only three commits, of which two are trivial.

    The non-trivial change is Thomas's patch to switch workqueue from
    sched RCU to regular one. The use of sched RCU is mostly historic and
    doesn't really buy us anything noticeable"

    * 'for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: Use normal rcu
    kernel/workqueue: Document wq_worker_last_func() argument
    kernel/workqueue: Use __printf markup to silence compiler in function 'alloc_workqueue'

    Linus Torvalds
     

08 May, 2019

1 commit

  • Pull printk updates from Petr Mladek:

    - Allow state reset of printk_once() calls.

    - Prevent crashes when dereferencing invalid pointers in vsprintf().
    Only the first byte is checked for simplicity.

    - Make vsprintf warnings consistent and inlined.

    - Treewide conversion of obsolete %pf, %pF to %ps, %pS printf
    modifiers.

    - Some clean up of vsprintf and test_printf code.

    * tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
    lib/vsprintf: Make function pointer_string static
    vsprintf: Limit the length of inlined error messages
    vsprintf: Avoid confusion between invalid address and value
    vsprintf: Prevent crash when dereferencing invalid pointers
    vsprintf: Consolidate handling of unknown pointer specifiers
    vsprintf: Factor out %pO handler as kobject_string()
    vsprintf: Factor out %pV handler as va_format()
    vsprintf: Factor out %p[iI] handler as ip_addr_string()
    vsprintf: Do not check address of well-known strings
    vsprintf: Consistent %pK handling for kptr_restrict == 0
    vsprintf: Shuffle restricted_pointer()
    printk: Tie printk_once / printk_deferred_once into .data.once for reset
    treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively
    lib/test_printf: Switch to bitmap_zalloc()

    Linus Torvalds
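The printk_once() reset item works by placing every "once" flag in one common section, so all flags can be cleared together. Below is a minimal user-space sketch of that idea. It assumes GCC or Clang with GNU ld or LLD on ELF. `print_once()`, `print_once_reset()`, `demo_warning()` and the `data_once` section name are hypothetical stand-ins, not the kernel's macros or its `.data.once` section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Counts messages actually emitted, so the behaviour can be checked. */
static int once_printed;

/*
 * Each expansion gets its own static flag, and every flag is placed in the
 * "data_once" section. The linker defines __start_data_once and
 * __stop_data_once only for sections whose names are valid C identifiers,
 * which is why the name has no leading dot.
 */
#define print_once(msg)						\
	do {							\
		static bool done					\
			__attribute__((section("data_once")));	\
		if (!done) {					\
			done = true;				\
			once_printed++;				\
			puts(msg);				\
		}						\
	} while (0)

extern bool __start_data_once[], __stop_data_once[];

/* Re-arm every print_once() call site with a single memset(). */
static void print_once_reset(void)
{
	assert(__stop_data_once >= __start_data_once);
	memset(__start_data_once, 0,
	       (size_t)((char *)__stop_data_once - (char *)__start_data_once));
}

static void demo_warning(void)
{
	print_once("demo: printed once until reset");
}
```

Calling `demo_warning()` twice prints a single line. After `print_once_reset()`, the next call prints again.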
     

16 Apr, 2019

1 commit

  • The worker accounting for CPU bound workers is plugged into the core
    scheduler code and the wakeup code. This is not a hard requirement and
    can be avoided by keeping track of the state in the workqueue code
    itself.

    Keep track of the sleeping state in the worker itself and call the
    notifier before entering the core scheduler. There might be false
    positives when the task is woken between that call and actually
    scheduling, but that's not really different from scheduling and being
    woken immediately after switching away. Updating nr_running when the
    task returns from schedule() happens later than doing it from ttwu()
    would.

    [ bigeasy: preempt_disable() around wq_worker_sleeping() by Daniel Bristot de Oliveira ]

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Tejun Heo
    Cc: Daniel Bristot de Oliveira
    Cc: Lai Jiangshan
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/ad2b29b5715f970bffc1a7026cabd6ff0b24076a.1532952814.git.bristot@redhat.com
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
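The accounting described above can be modelled in user space. The sketch below is a simplification loosely based on wq_worker_sleeping(). It models a single pool with no pool lock and no real wakeups. `worker_sleeping()`, `worker_running()`, `struct pool` and the `not_running` flag are hypothetical names. The sketch shows how a per-worker `sleeping` flag keeps each decrement of nr_running paired with exactly one increment, without help from the scheduler core.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct pool {
	atomic_int nr_running;	/* runnable workers counted for concurrency */
	int nr_pending;		/* queued work items */
};

struct worker {
	struct pool *pool;
	bool sleeping;		/* sleep state tracked in the worker itself */
	bool not_running;	/* models the WORKER_NOT_RUNNING flags */
};

/*
 * Called before entering the scheduler. Returns true if an idle worker
 * should be woken so that pending work keeps flowing.
 */
static bool worker_sleeping(struct worker *w)
{
	int old;

	if (w->not_running || w->sleeping)
		return false;
	w->sleeping = true;
	old = atomic_fetch_sub(&w->pool->nr_running, 1);
	assert(old > 0);
	/* old == 1 means this was the last running worker */
	return old == 1 && w->pool->nr_pending > 0;
}

/* Called when the task returns from schedule(). */
static void worker_running(struct worker *w)
{
	if (!w->sleeping)
		return;		/* worker_sleeping() did not account this one */
	if (!w->not_running)
		atomic_fetch_add(&w->pool->nr_running, 1);
	w->sleeping = false;
}
```

A `true` result from `worker_sleeping()` stands for "wake an idle worker". A second call while the worker is already asleep is ignored, so nr_running cannot be decremented twice.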
     

09 Apr, 2019

2 commits

  • %pF and %pf are functionally equivalent to %pS and %ps conversion
    specifiers. The former are deprecated, therefore switch the current users
    to use the preferred variant.

    The changes have been produced by the following command:

    git grep -l '%p[fF]' | grep -v '^\(tools\|Documentation\)/' | \
    while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done

    The result was then verified.

    Link: http://lkml.kernel.org/r/20190325193229.23390-1-sakari.ailus@linux.intel.com
    Cc: Andy Shevchenko
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: sparclinux@vger.kernel.org
    Cc: linux-um@lists.infradead.org
    Cc: xen-devel@lists.xenproject.org
    Cc: linux-acpi@vger.kernel.org
    Cc: linux-pm@vger.kernel.org
    Cc: drbd-dev@lists.linbit.com
    Cc: linux-block@vger.kernel.org
    Cc: linux-mmc@vger.kernel.org
    Cc: linux-nvdimm@lists.01.org
    Cc: linux-pci@vger.kernel.org
    Cc: linux-scsi@vger.kernel.org
    Cc: linux-btrfs@vger.kernel.org
    Cc: linux-f2fs-devel@lists.sourceforge.net
    Cc: linux-mm@kvack.org
    Cc: ceph-devel@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Signed-off-by: Sakari Ailus
    Acked-by: David Sterba (for btrfs)
    Acked-by: Mike Rapoport (for mm/memblock.c)
    Acked-by: Bjorn Helgaas (for drivers/pci)
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Petr Mladek

    Sakari Ailus
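The perl one-liner performs two substitutions that keep the string the same length. Below is the same rewrite as a hypothetical C helper, `convert_pf()`, that works on one string in place. It mirrors the perl exactly, so like the perl it would also rewrite a `%pf` that follows a literal `%%`.

```c
#include <assert.h>
#include <string.h>

/*
 * In-place equivalent of s/%pf/%ps/g; s/%pF/%pS/g on one NUL-terminated
 * string. Returns the number of specifiers changed.
 */
static int convert_pf(char *s)
{
	int changes = 0;
	char *p;

	assert(s != NULL);
	for (p = strstr(s, "%p"); p; p = strstr(p + 2, "%p")) {
		if (p[2] == 'f') {
			p[2] = 's';		/* %pf -> %ps */
			changes++;
		} else if (p[2] == 'F') {
			p[2] = 'S';		/* %pF -> %pS */
			changes++;
		}
	}
	return changes;
}
```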
     
  • There is no need for sched_rcu. The undocumented reason why sched_rcu
    is used is to avoid a few explicit rcu_read_lock()/unlock() pairs by
    the fact that sched_rcu reader side critical sections are also protected
    by preempt or irq disabled regions.

    Replace rcu_read_lock_sched with rcu_read_lock and acquire the RCU lock
    where it is not yet explicitly acquired. Replace local_irq_disable() with
    rcu_read_lock(). Update asserts.

    Signed-off-by: Thomas Gleixner
    [bigeasy: mangle changelog a little]
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Tejun Heo

    Thomas Gleixner
     

21 Mar, 2019

1 commit

  • The recent change to prevent use after free and a memory leak introduced an
    unconditional call to wq_unregister_lockdep() in the error handling
    path. If the lockdep key had not been registered yet, then the lockdep core
    emits a warning.

    Only call wq_unregister_lockdep() if wq_register_lockdep() has been
    called first.

    Fixes: 009bb421b6ce ("workqueue, lockdep: Fix an alloc_workqueue() error path")
    Reported-by: syzbot+be0c198232f86389c3dd@syzkaller.appspotmail.com
    Signed-off-by: Bart Van Assche
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Tejun Heo
    Cc: Qian Cai
    Link: https://lkml.kernel.org/r/20190311230255.176081-1-bvanassche@acm.org

    Bart Van Assche
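This fix, combined with the ordering from the use-after-free fix it references (009bb421b6ce), follows the usual goto-based unwinding pattern: each error label undoes only what was set up before the jump. Below is a toy sketch. It assumes a hypothetical `struct wq` and stand-in register/unregister functions that count a would-be lockdep warning instead of emitting one. The real alloc_workqueue() error path has more steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static int keys_registered;	/* live keys in the toy registry */
static int bogus_unregisters;	/* cases where lockdep would warn */

struct wq {
	bool key_registered;
};

static void register_key(struct wq *wq)
{
	wq->key_registered = true;
	keys_registered++;
}

static void unregister_key(struct wq *wq)
{
	if (!wq->key_registered) {
		bogus_unregisters++;	/* the reported warning */
		return;
	}
	assert(keys_registered > 0);
	wq->key_registered = false;
	keys_registered--;
}

/* fail_at: 0 = succeed, 1 = fail before registering, 2 = fail after. */
static struct wq *alloc_wq(int fail_at)
{
	struct wq *wq = calloc(1, sizeof(*wq));

	if (!wq)
		return NULL;
	if (fail_at == 1)
		goto err_free_wq;	/* nothing registered yet */
	register_key(wq);
	if (fail_at == 2)
		goto err_unreg_key;
	return wq;

err_unreg_key:
	unregister_key(wq);	/* before free(), so no stale key survives */
err_free_wq:
	free(wq);
	return NULL;
}
```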
     

20 Mar, 2019

1 commit


15 Mar, 2019

1 commit


11 Mar, 2019

1 commit

  • Pull locking fixes from Thomas Gleixner:
    "A few fixes for lockdep:

    - initialize lockdep internal RCU head after initializing RCU

    - prevent use after free in an alloc_workqueue() error handling path

    - plug a memory leak in the workqueue core which fails to free a
    dynamically allocated lock name.

    - make Clang happy"

    * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    workqueue, lockdep: Fix a memory leak in wq->lock_name
    workqueue, lockdep: Fix an alloc_workqueue() error path
    locking/lockdep: Only call init_rcu_head() after RCU has been initialized
    locking/lockdep: Avoid a Clang warning

    Linus Torvalds
     

09 Mar, 2019

2 commits

  • The following commit:

    669de8bda87b ("kernel/workqueue: Use dynamic lockdep keys for workqueues")

    introduced a memory leak as wq_free_lockdep() calls kfree(wq->lock_name),
    but wq_init_lockdep() does not point wq->lock_name to the newly allocated
    slab object.

    This can be reproduced by running LTP fallocate04 followed by oom01 tests:

    unreferenced object 0xc0000005876384d8 (size 64):
    comm "fallocate04", pid 26972, jiffies 4297139141 (age 40370.480s)
    hex dump (first 32 bytes):
    28 77 71 5f 63 6f 6d 70 6c 65 74 69 6f 6e 29 65 (wq_completion)e
    78 74 34 2d 72 73 76 2d 63 6f 6e 76 65 72 73 69 xt4-rsv-conversi
    backtrace:
    kvasprintf+0x6c/0xe0
    kasprintf+0x34/0x60
    alloc_workqueue+0x1f8/0x6ac
    ext4_fill_super+0x23d4/0x3c80 [ext4]
    mount_bdev+0x25c/0x290
    ext4_mount+0x28/0x50 [ext4]
    legacy_get_tree+0x4c/0xb0
    vfs_get_tree+0x6c/0x190
    do_mount+0xb9c/0x1100
    ksys_mount+0x158/0x180
    sys_mount+0x20/0x30
    system_call+0x5c/0x70

    Signed-off-by: Qian Cai
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Bart Van Assche
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: catalin.marinas@arm.com
    Cc: jiangshanlai@gmail.com
    Cc: tj@kernel.org
    Fixes: 669de8bda87b ("kernel/workqueue: Use dynamic lockdep keys for workqueues")
    Link: https://lkml.kernel.org/r/20190307002731.47371-1-cai@lca.pw
    Signed-off-by: Ingo Molnar

    Qian Cai
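The fix rests on an ownership rule: whatever the init path allocates must be stored where the free path looks for it, and the free path must never free the fallback name. Below is a user-space sketch of that rule. The struct and function names are hypothetical, and malloc()/snprintf() stand in for kasprintf(). The name format follows the `(wq_completion)` prefix visible in the hex dump above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct wq {
	const char *name;
	const char *lock_name;	/* heap string, or name itself as fallback */
};

static void init_lock_name(struct wq *wq)
{
	static const char prefix[] = "(wq_completion)";
	size_t len;
	char *buf;

	assert(wq->name != NULL);
	len = strlen(prefix) + strlen(wq->name) + 1;
	buf = malloc(len);
	if (buf)
		snprintf(buf, len, "%s%s", prefix, wq->name);
	/* Storing the pointer is the step whose absence caused the leak. */
	wq->lock_name = buf ? buf : wq->name;
}

static void free_lock_name(struct wq *wq)
{
	/* Free only what init_lock_name() allocated, never the fallback. */
	if (wq->lock_name != wq->name)
		free((void *)wq->lock_name);
	wq->lock_name = NULL;
}
```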
     
  • This patch fixes a use-after-free and a memory leak in an alloc_workqueue()
    error path.

    Reported by syzkaller and KASAN:

    BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:197 [inline]
    BUG: KASAN: use-after-free in lockdep_register_key+0x3b9/0x490 kernel/locking/lockdep.c:1023
    Read of size 8 at addr ffff888090fc2698 by task syz-executor134/7858

    CPU: 1 PID: 7858 Comm: syz-executor134 Not tainted 5.0.0-rc8-next-20190301 #1
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x172/0x1f0 lib/dump_stack.c:113
    print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
    kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
    __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
    __read_once_size include/linux/compiler.h:197 [inline]
    lockdep_register_key+0x3b9/0x490 kernel/locking/lockdep.c:1023
    wq_init_lockdep kernel/workqueue.c:3444 [inline]
    alloc_workqueue+0x427/0xe70 kernel/workqueue.c:4263
    ucma_open+0x76/0x290 drivers/infiniband/core/ucma.c:1732
    misc_open+0x398/0x4c0 drivers/char/misc.c:141
    chrdev_open+0x247/0x6b0 fs/char_dev.c:417
    do_dentry_open+0x488/0x1160 fs/open.c:771
    vfs_open+0xa0/0xd0 fs/open.c:880
    do_last fs/namei.c:3416 [inline]
    path_openat+0x10e9/0x46e0 fs/namei.c:3533
    do_filp_open+0x1a1/0x280 fs/namei.c:3563
    do_sys_open+0x3fe/0x5d0 fs/open.c:1063
    __do_sys_openat fs/open.c:1090 [inline]
    __se_sys_openat fs/open.c:1084 [inline]
    __x64_sys_openat+0x9d/0x100 fs/open.c:1084
    do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Allocated by task 7789:
    save_stack+0x45/0xd0 mm/kasan/common.c:75
    set_track mm/kasan/common.c:87 [inline]
    __kasan_kmalloc mm/kasan/common.c:497 [inline]
    __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:470
    kasan_kmalloc+0x9/0x10 mm/kasan/common.c:511
    __do_kmalloc mm/slab.c:3726 [inline]
    __kmalloc+0x15c/0x740 mm/slab.c:3735
    kmalloc include/linux/slab.h:553 [inline]
    kzalloc include/linux/slab.h:743 [inline]
    alloc_workqueue+0x13c/0xe70 kernel/workqueue.c:4236
    ucma_open+0x76/0x290 drivers/infiniband/core/ucma.c:1732
    misc_open+0x398/0x4c0 drivers/char/misc.c:141
    chrdev_open+0x247/0x6b0 fs/char_dev.c:417
    do_dentry_open+0x488/0x1160 fs/open.c:771
    vfs_open+0xa0/0xd0 fs/open.c:880
    do_last fs/namei.c:3416 [inline]
    path_openat+0x10e9/0x46e0 fs/namei.c:3533
    do_filp_open+0x1a1/0x280 fs/namei.c:3563
    do_sys_open+0x3fe/0x5d0 fs/open.c:1063
    __do_sys_openat fs/open.c:1090 [inline]
    __se_sys_openat fs/open.c:1084 [inline]
    __x64_sys_openat+0x9d/0x100 fs/open.c:1084
    do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Freed by task 7789:
    save_stack+0x45/0xd0 mm/kasan/common.c:75
    set_track mm/kasan/common.c:87 [inline]
    __kasan_slab_free+0x102/0x150 mm/kasan/common.c:459
    kasan_slab_free+0xe/0x10 mm/kasan/common.c:467
    __cache_free mm/slab.c:3498 [inline]
    kfree+0xcf/0x230 mm/slab.c:3821
    alloc_workqueue+0xc3e/0xe70 kernel/workqueue.c:4295
    ucma_open+0x76/0x290 drivers/infiniband/core/ucma.c:1732
    misc_open+0x398/0x4c0 drivers/char/misc.c:141
    chrdev_open+0x247/0x6b0 fs/char_dev.c:417
    do_dentry_open+0x488/0x1160 fs/open.c:771
    vfs_open+0xa0/0xd0 fs/open.c:880
    do_last fs/namei.c:3416 [inline]
    path_openat+0x10e9/0x46e0 fs/namei.c:3533
    do_filp_open+0x1a1/0x280 fs/namei.c:3563
    do_sys_open+0x3fe/0x5d0 fs/open.c:1063
    __do_sys_openat fs/open.c:1090 [inline]
    __se_sys_openat fs/open.c:1084 [inline]
    __x64_sys_openat+0x9d/0x100 fs/open.c:1084
    do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    The buggy address belongs to the object at ffff888090fc2580
    which belongs to the cache kmalloc-512 of size 512
    The buggy address is located 280 bytes inside of
    512-byte region [ffff888090fc2580, ffff888090fc2780)

    Reported-by: syzbot+17335689e239ce135d8b@syzkaller.appspotmail.com
    Signed-off-by: Bart Van Assche
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Fixes: 669de8bda87b ("kernel/workqueue: Use dynamic lockdep keys for workqueues")
    Link: https://lkml.kernel.org/r/20190303220046.29448-1-bvanassche@acm.org
    Signed-off-by: Ingo Molnar

    Bart Van Assche
     

08 Mar, 2019

3 commits

  • Merge more updates from Andrew Morton:

    - some of the rest of MM

    - various misc things

    - dynamic-debug updates

    - checkpatch

    - some epoll speedups

    - autofs

    - rapidio

    - lib/, lib/lzo/ updates

    * emailed patches from Andrew Morton: (83 commits)
    samples/mic/mpssd/mpssd.h: remove duplicate header
    kernel/fork.c: remove duplicated include
    include/linux/relay.h: fix percpu annotation in struct rchan
    arch/nios2/mm/fault.c: remove duplicate include
    unicore32: stop printing the virtual memory layout
    MAINTAINERS: fix GTA02 entry and mark as orphan
    mm: create the new vm_fault_t type
    arm, s390, unicore32: remove oneliner wrappers for memblock_alloc()
    arch: simplify several early memory allocations
    openrisc: simplify pte_alloc_one_kernel()
    sh: prefer memblock APIs returning virtual address
    microblaze: prefer memblock API returning virtual address
    powerpc: prefer memblock APIs returning virtual address
    lib/lzo: separate lzo-rle from lzo
    lib/lzo: implement run-length encoding
    lib/lzo: fast 8-byte copy on arm64
    lib/lzo: 64-bit CTZ on arm64
    lib/lzo: tidy-up ifdefs
    ipc/sem.c: replace kvmalloc/memset with kvzalloc and use struct_size
    ipc: annotate implicit fall through
    ...

    Linus Torvalds
     
  • This function can only be called safely from very specific scheduler
    contexts. Document those.

    Link: http://lkml.kernel.org/r/20190206150528.31198-1-hannes@cmpxchg.org
    Signed-off-by: Johannes Weiner
    Suggested-by: Andrew Morton
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Pull workqueue updates from Tejun Heo:
    "All trivial. Two comment updates and one more initialization sanity
    check in flush_work()"

    * 'for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: Fix spelling in source code comments
    workqueue: fix typo in comment
    workqueue: Try to catch flush_work() without INIT_WORK().

    Linus Torvalds
     

07 Mar, 2019

1 commit

  • Pull driver core updates from Greg KH:
    "Here is the big driver core patchset for 5.1-rc1

    More patches than "normal" here this merge window, due to some work in
    the driver core by Alexander Duyck to rework the async probe
    functionality to work better for a number of devices, and independent
    work from Rafael for the device link functionality to make it work
    "correctly".

    Also in here is:

    - lots of BUS_ATTR() removals, the macro is about to go away

    - firmware test fixups

    - ihex fixups and simplification

    - component additions (also includes i915 patches)

    - lots of minor coding style fixups and cleanups.

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'driver-core-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (65 commits)
    driver core: platform: remove misleading err_alloc label
    platform: set of_node in platform_device_register_full()
    firmware: hardcode the debug message for -ENOENT
    driver core: Add missing description of new struct device_link field
    driver core: Fix PM-runtime for links added during consumer probe
    drivers/component: kerneldoc polish
    async: Add cmdline option to specify drivers to be async probed
    driver core: Fix possible supplier PM-usage counter imbalance
    PM-runtime: Fix __pm_runtime_set_status() race with runtime resume
    driver: platform: Support parsing GpioInt 0 in platform_get_irq()
    selftests: firmware: fix verify_reqs() return value
    Revert "selftests: firmware: remove use of non-standard diff -Z option"
    Revert "selftests: firmware: add CONFIG_FW_LOADER_USER_HELPER_FALLBACK to config"
    device: Fix comment for driver_data in struct device
    kernfs: Allocating memory for kernfs_iattrs with kmem_cache.
    sysfs: remove unused include of kernfs-internal.h
    driver core: Postpone DMA tear-down until after devres release
    driver core: Document limitation related to DL_FLAG_RPM_ACTIVE
    PM-runtime: Take suppliers into account in __pm_runtime_set_status()
    device.h: Add __cold to dev_ logging functions
    ...

    Linus Torvalds
     

05 Mar, 2019

1 commit


28 Feb, 2019

1 commit

  • The following commit:

    87915adc3f0a ("workqueue: re-add lockdep dependencies for flushing")

    improved deadlock checking in the workqueue implementation. Unfortunately
    that patch also introduced a few false positive lockdep complaints.

    This patch suppresses these false positives by allocating the workqueue mutex
    lockdep key dynamically.

    An example of a false positive lockdep complaint suppressed by this patch
    can be found below. The root cause of the lockdep complaint shown below
    is that the direct I/O code can call alloc_workqueue() from inside a work
    item created by another alloc_workqueue() call and that both workqueues
    share the same lockdep key. This patch prevents that lockdep complaint
    from being triggered by allocating the workqueue lockdep keys dynamically.

    In other words, this patch guarantees that a unique lockdep key is
    associated with each work queue mutex.

    ======================================================
    WARNING: possible circular locking dependency detected
    4.19.0-dbg+ #1 Not tainted
    fio/4129 is trying to acquire lock:
    00000000a01cfe1a ((wq_completion)"dio/%s"sb->s_id){+.+.}, at: flush_workqueue+0xd0/0x970

    but task is already holding lock:
    00000000a0acecf9 (&sb->s_type->i_mutex_key#14){+.+.}, at: ext4_file_write_iter+0x154/0x710

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #2 (&sb->s_type->i_mutex_key#14){+.+.}:
    down_write+0x3d/0x80
    __generic_file_fsync+0x77/0xf0
    ext4_sync_file+0x3c9/0x780
    vfs_fsync_range+0x66/0x100
    dio_complete+0x2f5/0x360
    dio_aio_complete_work+0x1c/0x20
    process_one_work+0x481/0x9f0
    worker_thread+0x63/0x5a0
    kthread+0x1cf/0x1f0
    ret_from_fork+0x24/0x30

    -> #1 ((work_completion)(&dio->complete_work)){+.+.}:
    process_one_work+0x447/0x9f0
    worker_thread+0x63/0x5a0
    kthread+0x1cf/0x1f0
    ret_from_fork+0x24/0x30

    -> #0 ((wq_completion)"dio/%s"sb->s_id){+.+.}:
    lock_acquire+0xc5/0x200
    flush_workqueue+0xf3/0x970
    drain_workqueue+0xec/0x220
    destroy_workqueue+0x23/0x350
    sb_init_dio_done_wq+0x6a/0x80
    do_blockdev_direct_IO+0x1f33/0x4be0
    __blockdev_direct_IO+0x79/0x86
    ext4_direct_IO+0x5df/0xbb0
    generic_file_direct_write+0x119/0x220
    __generic_file_write_iter+0x131/0x2d0
    ext4_file_write_iter+0x3fa/0x710
    aio_write+0x235/0x330
    io_submit_one+0x510/0xeb0
    __x64_sys_io_submit+0x122/0x340
    do_syscall_64+0x71/0x220
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    other info that might help us debug this:

    Chain exists of:
    (wq_completion)"dio/%s"sb->s_id --> (work_completion)(&dio->complete_work) --> &sb->s_type->i_mutex_key#14

    Possible unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(&sb->s_type->i_mutex_key#14);
                                   lock((work_completion)(&dio->complete_work));
                                   lock(&sb->s_type->i_mutex_key#14);
      lock((wq_completion)"dio/%s"sb->s_id);

    *** DEADLOCK ***

    1 lock held by fio/4129:
    #0: 00000000a0acecf9 (&sb->s_type->i_mutex_key#14){+.+.}, at: ext4_file_write_iter+0x154/0x710

    stack backtrace:
    CPU: 3 PID: 4129 Comm: fio Not tainted 4.19.0-dbg+ #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
    Call Trace:
    dump_stack+0x86/0xc5
    print_circular_bug.isra.32+0x20a/0x218
    __lock_acquire+0x1c68/0x1cf0
    lock_acquire+0xc5/0x200
    flush_workqueue+0xf3/0x970
    drain_workqueue+0xec/0x220
    destroy_workqueue+0x23/0x350
    sb_init_dio_done_wq+0x6a/0x80
    do_blockdev_direct_IO+0x1f33/0x4be0
    __blockdev_direct_IO+0x79/0x86
    ext4_direct_IO+0x5df/0xbb0
    generic_file_direct_write+0x119/0x220
    __generic_file_write_iter+0x131/0x2d0
    ext4_file_write_iter+0x3fa/0x710
    aio_write+0x235/0x330
    io_submit_one+0x510/0xeb0
    __x64_sys_io_submit+0x122/0x340
    do_syscall_64+0x71/0x220
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Signed-off-by: Bart Van Assche
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Johannes Berg
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/20190214230058.196511-20-bvanassche@acm.org
    [ Reworked the changelog a bit. ]
    Signed-off-by: Ingo Molnar

    Bart Van Assche
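Lockdep sorts locks into classes by the address of their key. The fix therefore comes down to which key a workqueue's lockdep map points at. Below is a toy sketch under that assumption. `struct wq`, `alloc_wq()` and `same_lockdep_class()` are hypothetical, and no dependency tracking is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct lock_class_key { char unused; };

struct wq {
	struct lock_class_key key;		/* per-instance key */
	const struct lock_class_key *class;	/* key lockdep classifies by */
};

/* One static key per allocation site, shared by every instance. */
static struct lock_class_key site_key;

static struct wq *alloc_wq(bool dynamic_key)
{
	struct wq *wq = calloc(1, sizeof(*wq));

	if (wq)
		wq->class = dynamic_key ? &wq->key : &site_key;
	return wq;
}

/* Locks in the same class share one node in lockdep's dependency graph. */
static bool same_lockdep_class(const struct wq *a, const struct wq *b)
{
	assert(a != NULL && b != NULL);
	return a->class == b->class;
}
```

With the shared key, dependencies recorded for one dio workqueue are roughly attributed to every other workqueue from the same call site. That is how the chain in the report closes into a cycle. With per-instance keys, each workqueue is tracked separately.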
     

22 Feb, 2019

1 commit


11 Feb, 2019

1 commit


02 Feb, 2019

1 commit

  • psi has provisions to shut off the periodic aggregation worker when
    there is a period of no task activity - and thus no data that needs
    aggregating. However, while developing psi monitoring, Suren noticed
    that the aggregation clock currently won't stay shut off for good.

    Debugging this revealed a flaw in the idle design: an aggregation run
    will see no task activity and decide to go to sleep; shortly thereafter,
    the kworker thread that executed the aggregation will go idle and cause
    a scheduling change, during which the psi callback will kick the
    !pending worker again. This will ping-pong forever, and is equivalent
    to having no shut-off logic at all (but with more code!)

    Fix this by exempting aggregation workers from psi's clock waking logic
    when the state change is them going to sleep. To do this, tag workers
    with the last work function they executed, and if in psi we see a worker
    going to sleep after aggregating psi data, we will not reschedule the
    aggregation work item.

    What if the worker is also executing other items before or after?

    Any psi state times that were incurred by work items preceding the
    aggregation work will have been collected from the per-cpu buckets
    during the aggregation itself. If there are work items following the
    aggregation work, the worker's last_func tag will be overwritten and the
    aggregator will be kept alive to process this genuine new activity.

    If the aggregation work is the last thing the worker does, and we decide
    to go idle, the brief period of non-idle time incurred between the
    aggregation run and the kworker's dequeue will be stranded in the
    per-cpu buckets until the clock is woken by later activity. But that
    should not be a problem. The buckets can hold 4s worth of time, and
    future activity will wake the clock with a 2s delay, giving us 2s worth
    of data we can leave behind when disabling aggregation. If it takes a
    worker more than two seconds to go idle after it finishes its last work
    item, we likely have bigger problems in the system, and won't notice one
    sample that was averaged with a bogus per-CPU weight.

    Link: http://lkml.kernel.org/r/20190116193501.1910-1-hannes@cmpxchg.org
    Fixes: eb414681d5a0 ("psi: pressure stall information for CPU, memory, and IO")
    Signed-off-by: Johannes Weiner
    Reported-by: Suren Baghdasaryan
    Acked-by: Tejun Heo
    Cc: Peter Zijlstra
    Cc: Lai Jiangshan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
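The tagging scheme can be sketched in user space as follows. This is a simplification, not the kernel code. `struct worker` is reduced to two function pointers. `psi_aggregate()`, `other_work()` and `psi_should_wake_clock()` are hypothetical names for, respectively, the aggregation work, some unrelated work, and the check psi makes when a worker goes to sleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct work;
typedef void (*work_func_t)(struct work *);

struct work {
	work_func_t func;
};

struct worker {
	work_func_t current_func;
	work_func_t last_func;	/* tag: last work function executed */
};

static void psi_aggregate(struct work *w)
{
	(void)w;	/* would fold per-cpu buckets into averages */
}

static void other_work(struct work *w)
{
	(void)w;
}

static void process_one_work(struct worker *worker, struct work *work)
{
	assert(work->func != NULL);
	worker->current_func = work->func;
	work->func(work);
	/* tag the worker so the sleep path can tell what it last ran */
	worker->last_func = worker->current_func;
	worker->current_func = NULL;
}

/* On a worker going to sleep: should psi wake the aggregation clock? */
static bool psi_should_wake_clock(const struct worker *sleeper)
{
	/* not if the sleeper is the aggregator itself: that ping-pongs */
	return sleeper->last_func != psi_aggregate;
}
```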
     

31 Jan, 2019

1 commit

    Provide a new function, queue_work_node, which is meant to schedule work on
    a "random" CPU of the requested NUMA node. The main motivation for this is
    to help asynchronous init improve boot times for devices that are local
    to a specific node.

    For now we just default to the first CPU that is in the intersection of the
    cpumask of the node and the online cpumask. The only exception is when the
    current CPU is local to the node, in which case we just use it. This should work
    for our purposes as we are currently only using this for unbound work so
    the CPU will be translated to a node anyway instead of being directly used.

    As we are only using the first CPU to represent the NUMA node for now I am
    limiting the scope of the function so that it can only be used with unbound
    workqueues.

    Acked-by: Tejun Heo
    Reviewed-by: Bart Van Assche
    Acked-by: Dan Williams
    Signed-off-by: Alexander Duyck
    Signed-off-by: Greg Kroah-Hartman

    Alexander Duyck
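The CPU selection described above can be sketched with CPU masks as 64-bit bitmaps, where bit n stands for CPU n. `select_cpu_near()` and `CPU_UNBOUND` are hypothetical stand-ins. Falling back to "unbound" when the node has no online CPU is an assumption; the commit message does not state it.

```c
#include <assert.h>
#include <stdint.h>

#define CPU_UNBOUND (-1)	/* hypothetical stand-in for WORK_CPU_UNBOUND */

static int select_cpu_near(int cur_cpu, uint64_t node_mask,
			   uint64_t online_mask)
{
	uint64_t candidates = node_mask & online_mask;

	assert(cur_cpu >= 0 && cur_cpu < 64);
	/* Already on the requested node: use the current CPU. */
	if (node_mask & (UINT64_C(1) << cur_cpu))
		return cur_cpu;
	/* Otherwise the "random" CPU is simply the first online one. */
	for (int cpu = 0; cpu < 64; cpu++)
		if (candidates & (UINT64_C(1) << cpu))
			return cpu;
	return CPU_UNBOUND;	/* assumption: defer if no online CPU on node */
}
```

For example, with node 1 covering CPUs 4-7 (`0xF0`) and CPU 4 offline (`online_mask = 0xEF`), a caller on CPU 1 gets CPU 5, and a caller already on CPU 6 stays on CPU 6.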
     

25 Jan, 2019

1 commit

  • syzbot found a flush_work() caller who forgot to call INIT_WORK()
    because that work_struct was allocated by kzalloc() [1]. But the message

    INFO: trying to register non-static key.
    the code is fine but needs lockdep annotation.
    turning off the locking correctness validator.

    printed by lock_map_acquire() does not reveal that INIT_WORK() is missing.

    Since flush_work() without INIT_WORK() is a bug, and INIT_WORK() should
    set ->func field to non-zero, let's warn if ->func field is zero.

    [1] https://syzkaller.appspot.com/bug?id=a5954455fcfa51c29ca2ab55b203076337e1c770

    Signed-off-by: Tetsuo Handa
    Signed-off-by: Tejun Heo

    Tetsuo Handa
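Below is a minimal sketch of the check. It assumes a stripped-down `struct work_struct` that carries only `func`. `init_work()` and this `flush_work()` are hypothetical user-space stand-ins for INIT_WORK() and the real flush_work(). The real function also does the actual waiting, and its return value carries more meaning than it does here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct work_struct;
typedef void (*work_func_t)(struct work_struct *);

struct work_struct {
	work_func_t func;	/* the real struct has more fields */
};

static void init_work(struct work_struct *work, work_func_t fn)
{
	assert(fn != NULL);	/* initialized work always has a handler */
	work->func = fn;
}

/* Simplified: false = rejected, true = would have waited for the item. */
static bool flush_work(struct work_struct *work)
{
	if (work->func == NULL) {
		/* zeroed memory (kzalloc()/calloc()) leaves func NULL */
		fprintf(stderr, "WARNING: flush_work() without INIT_WORK()\n");
		return false;
	}
	/* ... the real function would wait for the item here ... */
	return true;
}

static void noop_work(struct work_struct *work)
{
	(void)work;
}
```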
     

28 Nov, 2018

1 commit

  • Now that call_rcu()'s callback is not invoked until after all
    preempt-disable regions of code have completed (in addition to explicitly
    marked RCU read-side critical sections), call_rcu() can be used in place
    of call_rcu_sched(). This commit therefore makes that change.

    Signed-off-by: Paul E. McKenney
    Cc: Lai Jiangshan
    Acked-by: Tejun Heo

    Paul E. McKenney
     

30 Aug, 2018

1 commit

  • Some architectures need to use stop_machine() to patch functions for
    ftrace, and the assumption is that the stopped CPUs do not make function
    calls to traceable functions when they are in the stopped state.

    Commit ce4f06dcbb5d ("stop_machine: Touch_nmi_watchdog() after
    MULTI_STOP_PREPARE") added calls to the watchdog touch functions from
    the stopped CPUs and those functions lack notrace annotations. This
    leads to crashes when enabling/disabling ftrace on ARM kernels built
    with the Thumb-2 instruction set.

    Fix it by adding the necessary notrace annotations.

    Fixes: ce4f06dcbb5d ("stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPARE")
    Signed-off-by: Vincent Whitchurch
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: oleg@redhat.com
    Cc: tj@kernel.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20180821152507.18313-1-vincent.whitchurch@axis.com

    Vincent Whitchurch
     

25 Aug, 2018

1 commit


22 Aug, 2018

2 commits

  • In flush_work(), we need to create a lockdep dependency so that
    the following scenario is appropriately tagged as a problem:

    work_function()
    {
            mutex_lock(&mutex);
            ...
    }

    other_function()
    {
            mutex_lock(&mutex);
            flush_work(&work); // or cancel_work_sync(&work);
    }

    This is a problem since the work might be running and be blocked
    on trying to acquire the mutex.

    Similarly, in flush_workqueue().

    These were removed after cross-release partially caught these
    problems, but now cross-release was reverted anyway. IMHO the
    removal was erroneous anyway though, since lockdep should be
    able to catch potential problems, not just actual ones, and
    cross-release would only have caught the problem when actually
    invoking wait_for_completion().

    Fixes: fd1a5b04dfb8 ("workqueue: Remove now redundant lock acquisitions wrt. workqueue flushes")
    Signed-off-by: Johannes Berg
    Signed-off-by: Tejun Heo

    Johannes Berg
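Lockdep can flag potential deadlocks, not just observed ones, because it records "acquired while holding" edges and looks for cycles. Below is a toy graph with only two classes: the mutex and the work item's pseudo-lock. The names and the graph representation are illustrative assumptions and are far simpler than lockdep itself.

```c
#include <assert.h>
#include <stdbool.h>

enum { MUTEX, WORK, NR_CLASSES };	/* WORK = the work item's pseudo-lock */

/* dep[a][b]: b was acquired while a was held */
static bool dep[NR_CLASSES][NR_CLASSES];

static void record(int held, int acquired)
{
	assert(held >= 0 && held < NR_CLASSES);
	assert(acquired >= 0 && acquired < NR_CLASSES);
	dep[held][acquired] = true;
}

/* Depth-first search: is there a path from 'from' to 'to'? */
static bool reaches(int from, int to, bool seen[NR_CLASSES])
{
	if (from == to)
		return true;
	if (seen[from])
		return false;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (dep[from][next] && reaches(next, to, seen))
			return true;
	return false;
}

/* A cycle is a potential deadlock, whether or not it ever happened. */
static bool has_cycle(void)
{
	for (int a = 0; a < NR_CLASSES; a++)
		for (int b = 0; b < NR_CLASSES; b++) {
			bool seen[NR_CLASSES] = { false };

			if (dep[a][b] && reaches(b, a, seen))
				return true;
		}
	return false;
}
```

Recording `WORK -> MUTEX` for work_function() and `MUTEX -> WORK` for flush_work() under the mutex produces a cycle. No run ever has to deadlock for the cycle to be reported.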
     
  • In cancel_work_sync(), we can only have one of two cases, even
    with an ordered workqueue:
    * the work isn't running, just cancelled before it started
    * the work is running, but then nothing else can be on the
      workqueue before it

    Thus, we need to skip the lockdep workqueue dependency handling,
    otherwise we get false positive reports from lockdep saying that
    we have a potential deadlock when the workqueue also has other
    work items with locking, e.g.

    work1_function() { mutex_lock(&mutex); ... }
    work2_function() { /* nothing */ }

    other_function() {
            queue_work(ordered_wq, &work1);
            queue_work(ordered_wq, &work2);
            mutex_lock(&mutex);
            cancel_work_sync(&work2);
    }

    As described above, this isn't a problem, but lockdep will
    currently flag it as if cancel_work_sync() was flush_work(),
    which *is* a problem.

    Signed-off-by: Johannes Berg
    Signed-off-by: Tejun Heo

    Johannes Berg
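The decision this commit changes can be written as a predicate. The sketch below assumes that the workqueue-wide dependency is only wanted for queues that run one item at a time or have a rescuer. That condition comes from outside this commit message. `struct wq_model` and `wants_wq_dependency()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

struct wq_model {
	int max_active;		/* 1 for an ordered workqueue */
	bool has_rescuer;
};

/*
 * Should waiting on one work item also record a dependency on the whole
 * workqueue? Not from cancel_work_sync(): the item either never started
 * or is already running, so nothing queued before it can delay it.
 */
static bool wants_wq_dependency(const struct wq_model *wq, bool from_cancel)
{
	assert(wq->max_active >= 1);
	if (from_cancel)
		return false;
	return wq->max_active == 1 || wq->has_rescuer;
}
```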
     

13 Jun, 2018

1 commit

  • The kzalloc() function has a 2-factor argument form, kcalloc(). This
    patch replaces cases of:

    kzalloc(a * b, gfp)

    with:

    kcalloc(a, b, gfp)

    as well as handling cases of:

    kzalloc(a * b * c, gfp)

    with:

    kzalloc(array3_size(a, b, c), gfp)

    as it's slightly less ugly than:

    kcalloc(array_size(a, b), c, gfp)

    This does, however, attempt to ignore constant size factors like:

    kzalloc(4 * 1024, gfp)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.
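These conversions guard against overflow: an open-coded `a * b` can wrap around and produce a small allocation. Below is a user-space sketch of saturating size helpers in the spirit of the kernel's array_size()/array3_size(). It assumes GCC or Clang for `__builtin_mul_overflow()`. On overflow the helpers return SIZE_MAX, so the subsequent allocation fails instead of succeeding with a short buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* a * b, or SIZE_MAX if the product does not fit in size_t. */
static size_t array_size(size_t a, size_t b)
{
	size_t bytes;

	if (__builtin_mul_overflow(a, b, &bytes))
		return SIZE_MAX;
	return bytes;
}

/* a * b * c, or SIZE_MAX if either multiplication overflows. */
static size_t array3_size(size_t a, size_t b, size_t c)
{
	size_t bytes;

	if (__builtin_mul_overflow(a, b, &bytes) ||
	    __builtin_mul_overflow(bytes, c, &bytes))
		return SIZE_MAX;
	return bytes;
}
```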

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    kzalloc(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    kzalloc(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    kzalloc(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    - kzalloc
    + kcalloc
    (
    - sizeof(TYPE) * (COUNT_ID)
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(TYPE) * COUNT_ID
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(TYPE) * (COUNT_CONST)
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(TYPE) * COUNT_CONST
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(THING) * (COUNT_ID)
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(THING) * COUNT_ID
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(THING) * (COUNT_CONST)
    + COUNT_CONST, sizeof(THING)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(THING) * COUNT_CONST
    + COUNT_CONST, sizeof(THING)
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    - kzalloc
    + kcalloc
    (
    - SIZE * COUNT
    + COUNT, SIZE
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    kzalloc(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kzalloc(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kzalloc(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kzalloc(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kzalloc(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kzalloc(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kzalloc(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kzalloc(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(type) or sizeof(expression), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    kzalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kzalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kzalloc(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kzalloc(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    kzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    kzalloc(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products,
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    kzalloc(C1 * C2 * C3, ...)
    |
    kzalloc(
    - (E1) * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kzalloc(
    - (E1) * (E2) * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kzalloc(
    - (E1) * (E2) * (E3)
    + array3_size(E1, E2, E3)
    , ...)
    |
    kzalloc(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2-factor products when they're not all constants,
    // keeping sizeof() as the second factor argument.
    @@
    expression THING, E1, E2;
    type TYPE;
    constant C1, C2, C3;
    @@

    (
    kzalloc(sizeof(THING) * C2, ...)
    |
    kzalloc(sizeof(TYPE) * C2, ...)
    |
    kzalloc(C1 * C2 * C3, ...)
    |
    kzalloc(C1 * C2, ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(TYPE) * (E2)
    + E2, sizeof(TYPE)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(TYPE) * E2
    + E2, sizeof(TYPE)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(THING) * (E2)
    + E2, sizeof(THING)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(THING) * E2
    + E2, sizeof(THING)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - (E1) * E2
    + E1, E2
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - (E1) * (E2)
    + E1, E2
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - E1 * E2
    + E1, E2
    , ...)
    )

    Signed-off-by: Kees Cook

    Kees Cook
     

11 Jun, 2018

1 commit

  • Pull SCSI updates from James Bottomley:
    "This is mostly updates to the usual drivers: ufs, qedf, mpt3sas, lpfc,
    zfcp, hisi_sas, cxlflash, qla2xxx.

    In the absence of Nic, we're also taking target updates which are
    mostly minor except for the tcmu refactor.

    The only real core change to worry about is the removal of high page
    bouncing (in sas, storvsc and iscsi). This has been well tested and no
    problems have shown up so far"

    * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (268 commits)
    scsi: lpfc: update driver version to 12.0.0.4
    scsi: lpfc: Fix port initialization failure.
    scsi: lpfc: Fix 16gb hbas failing cq create.
    scsi: lpfc: Fix crash in blk_mq layer when executing modprobe -r lpfc
    scsi: lpfc: correct oversubscription of nvme io requests for an adapter
    scsi: lpfc: Fix MDS diagnostics failure (Rx < Tx)
    scsi: hisi_sas: Mark PHY as in reset for nexus reset
    scsi: hisi_sas: Fix return value when get_free_slot() failed
    scsi: hisi_sas: Terminate STP reject quickly for v2 hw
    scsi: hisi_sas: Add v2 hw force PHY function for internal ATA command
    scsi: hisi_sas: Include TMF elements in struct hisi_sas_slot
    scsi: hisi_sas: Try wait commands before controller reset
    scsi: hisi_sas: Init disks after controller reset
    scsi: hisi_sas: Create a scsi_host_template per HW module
    scsi: hisi_sas: Reset disks when discovered
    scsi: hisi_sas: Add LED feature for v3 hw
    scsi: hisi_sas: Change common allocation mode of device id
    scsi: hisi_sas: change slot index allocation mode
    scsi: hisi_sas: Introduce hisi_sas_phy_set_linkrate()
    scsi: hisi_sas: fix a typo in hisi_sas_task_prep()
    ...

    Linus Torvalds
     

07 Jun, 2018

2 commits

  • Pull overflow updates from Kees Cook:
    "This adds the new overflow checking helpers and adds them to the
    2-factor argument allocators. And this adds the saturating size
    helpers and does a treewide replacement for the struct_size() usage.
    Additionally this adds the overflow testing modules to make sure
    everything works.

    I'm still working on the treewide replacements for allocators with
    "simple" multiplied arguments:

    *alloc(a * b, ...) -> *alloc_array(a, b, ...)

    and

    *zalloc(a * b, ...) -> *calloc(a, b, ...)

    as well as the more complex cases, but that's separable from this
    portion of the series. I expect to have the rest sent before -rc1
    closes; there are a lot of messy cases to clean up.

    Summary:

    - Introduce arithmetic overflow test helper functions (Rasmus)

    - Use overflow helpers in 2-factor allocators (Kees, Rasmus)

    - Introduce overflow test module (Rasmus, Kees)

    - Introduce saturating size helper functions (Matthew, Kees)

    - Treewide use of struct_size() for allocators (Kees)"
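    The check_*_overflow() helpers report overflow rather than hiding it. Here is a minimal sketch, assuming GCC or Clang. The sketch_check_*_overflow() macros simply forward to __builtin_add_overflow() and __builtin_mul_overflow(). Both return true when the exact result does not fit, and both store the wrapped result either way. The kernel helpers follow the same return convention, but these macros and sketch_alloc_bytes() are illustrative. They are not the kernel definitions, which also carry fallback code for compilers without the builtins.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: true on overflow, wrapped result stored in *d. */
#define sketch_check_add_overflow(a, b, d) __builtin_add_overflow(a, b, d)
#define sketch_check_mul_overflow(a, b, d) __builtin_mul_overflow(a, b, d)

/*
 * Typical allocator-side use: compute n * size + header, refusing to
 * produce a size at all if any step overflows.
 */
static bool sketch_alloc_bytes(size_t n, size_t size, size_t header,
			       size_t *out)
{
	size_t bytes;

	if (sketch_check_mul_overflow(n, size, &bytes))
		return false;
	if (sketch_check_add_overflow(bytes, header, &bytes))
		return false;
	*out = bytes;
	return true;
}
```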

    * tag 'overflow-v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    treewide: Use struct_size() for devm_kmalloc() and friends
    treewide: Use struct_size() for vmalloc()-family
    treewide: Use struct_size() for kmalloc()-family
    device: Use overflow helpers for devm_kmalloc()
    mm: Use overflow helpers in kvmalloc()
    mm: Use overflow helpers in kmalloc_array*()
    test_overflow: Add memory allocation overflow tests
    overflow.h: Add allocation size calculation helpers
    test_overflow: Report test failures
    test_overflow: macrofy some more, do more tests for free
    lib: add runtime test of check_*_overflow functions
    compiler.h: enable builtin overflow checkers and add fallback code

    Linus Torvalds
     
  • One of the more common cases of allocation size calculations is finding
    the size of a structure that has a zero-sized array at the end, along
    with memory for some number of elements for that array. For example:

    struct foo {
        int stuff;
        void *entry[];
    };

    instance = kmalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);

    Instead of leaving these open-coded and prone to type mistakes, we can
    now use the new struct_size() helper:

    instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);
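    Here is a userspace sketch of what struct_size() computes, assuming GCC or Clang. The size is sizeof(*p) plus n elements of p->member. It saturates to SIZE_MAX if either the multiplication or the addition overflows, like the saturating size helpers. SKETCH_STRUCT_SIZE() and sketch_struct_size() are hypothetical stand-ins, not the kernel macro. Because only sizeof is applied to p, p is never dereferenced at run time, so it may even be uninitialized or NULL.

```c
#include <stddef.h>
#include <stdint.h>

struct foo {
	int stuff;
	void *entry[];		/* flexible array member */
};

/* base + n * elem, saturating to SIZE_MAX on any overflow. */
static size_t sketch_struct_size(size_t base, size_t elem, size_t n)
{
	size_t bytes;

	if (__builtin_mul_overflow(elem, n, &bytes) ||
	    __builtin_add_overflow(bytes, base, &bytes))
		return SIZE_MAX;
	return bytes;
}

/* Hypothetical macro form: element size is taken from p->member itself. */
#define SKETCH_STRUCT_SIZE(p, member, n) \
	sketch_struct_size(sizeof(*(p)), sizeof(*(p)->member), (n))
```

    Because the element size comes from the member itself, changing the type of entry[] cannot leave a stale sizeof(void *) behind in the allocation.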

    This patch makes the changes for kmalloc()-family (and kvmalloc()-family)
    uses. It was done via automatic conversion with manual review for the
    "CHECKME" non-standard cases noted below, using the following Coccinelle
    script:

    // pkey_cache = kmalloc(sizeof *pkey_cache + tprops->pkey_tbl_len *
    // sizeof *pkey_cache->table, GFP_KERNEL);
    @@
    identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
    expression GFP;
    identifier VAR, ELEMENT;
    expression COUNT;
    @@

    - alloc(sizeof(*VAR) + COUNT * sizeof(*VAR->ELEMENT), GFP)
    + alloc(struct_size(VAR, ELEMENT, COUNT), GFP)

    // mr = kzalloc(sizeof(*mr) + m * sizeof(mr->map[0]), GFP_KERNEL);
    @@
    identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
    expression GFP;
    identifier VAR, ELEMENT;
    expression COUNT;
    @@

    - alloc(sizeof(*VAR) + COUNT * sizeof(VAR->ELEMENT[0]), GFP)
    + alloc(struct_size(VAR, ELEMENT, COUNT), GFP)

    // Same pattern, but can't trivially locate the trailing element name,
    // or variable name.
    @@
    identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
    expression GFP;
    expression SOMETHING, COUNT, ELEMENT;
    @@

    - alloc(sizeof(SOMETHING) + COUNT * sizeof(ELEMENT), GFP)
    + alloc(CHECKME_struct_size(&SOMETHING, ELEMENT, COUNT), GFP)

    Signed-off-by: Kees Cook

    Kees Cook
     

24 May, 2018

1 commit

  • In commit 7ee681b25284 ("workqueue: Convert to state machine callbacks"),
    three new function definitions were added: ‘workqueue_prepare_cpu’,
    ‘workqueue_online_cpu’ and ‘workqueue_offline_cpu’.

    Move these function definitions within a CONFIG_SMP block since they are
    not used outside of it. This will match the function declarations in
    the header and silence the following gcc warning (W=1):

    kernel/workqueue.c:4743:5: warning: no previous prototype for ‘workqueue_prepare_cpu’ [-Wmissing-prototypes]
    kernel/workqueue.c:4756:5: warning: no previous prototype for ‘workqueue_online_cpu’ [-Wmissing-prototypes]
    kernel/workqueue.c:4783:5: warning: no previous prototype for ‘workqueue_offline_cpu’ [-Wmissing-prototypes]
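    The fix follows a simple guard-matching pattern. The sketch below is compile-only and assumes the header declares these three functions under the same CONFIG_SMP guard. Every definition that gets compiled then has a previous prototype, and -Wmissing-prototypes stays quiet. CONFIG_SMP is defined by hand to simulate an SMP build. The function bodies are placeholders, not the real hotplug callbacks.

```c
#define CONFIG_SMP 1	/* simulate an SMP build for this sketch */

/* Header part: prototypes exist only when CONFIG_SMP is set. */
#ifdef CONFIG_SMP
int workqueue_prepare_cpu(unsigned int cpu);
int workqueue_online_cpu(unsigned int cpu);
int workqueue_offline_cpu(unsigned int cpu);
#endif

/* Source part: definitions sit behind the same guard (placeholder bodies). */
#ifdef CONFIG_SMP
int workqueue_prepare_cpu(unsigned int cpu) { (void)cpu; return 0; }
int workqueue_online_cpu(unsigned int cpu) { (void)cpu; return 0; }
int workqueue_offline_cpu(unsigned int cpu) { (void)cpu; return 0; }
#endif
```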

    Signed-off-by: Mathieu Malaterre
    Signed-off-by: Tejun Heo

    Mathieu Malaterre
     

21 May, 2018

1 commit

  • The worker struct could already be freed when wq_worker_comm() tries
    to access it for reporting. This patch protects PF_WQ_WORKER
    modifications with wq_pool_attach_mutex and makes wq_worker_comm()
    test the flag before dereferencing worker from kthread_data(), which
    ensures that it only dereferences when the worker struct is valid.
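    The locking rule can be illustrated with a pthreads analogy. Everything in the sketch is hypothetical. The toy_* names are invented, a plain bool stands in for the PF_WQ_WORKER task flag, and a pointer stands in for kthread_data(). It shows only the ordering that matters. The flag is cleared under the mutex before the worker goes away, and the reader tests the flag under that same mutex before dereferencing.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct toy_worker { char desc[24]; };

struct toy_task {
	bool is_worker;			/* stands in for PF_WQ_WORKER */
	struct toy_worker *worker;	/* stands in for kthread_data() */
};

/* Every change to is_worker happens under this mutex. */
static pthread_mutex_t toy_attach_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Clear the flag under the mutex before the worker struct is released. */
static void toy_worker_exit(struct toy_task *t)
{
	pthread_mutex_lock(&toy_attach_mutex);
	t->is_worker = false;
	pthread_mutex_unlock(&toy_attach_mutex);
	/* From here on the worker struct may be freed. */
}

/*
 * Report the worker's description, or an empty string once the task is no
 * longer a worker. Testing the flag under the same mutex guarantees that
 * t->worker is only dereferenced while it is still valid.
 */
static void toy_worker_comm(struct toy_task *t, char *buf, size_t len)
{
	pthread_mutex_lock(&toy_attach_mutex);
	if (t->is_worker)
		snprintf(buf, len, "%s", t->worker->desc);
	else if (len)
		buf[0] = '\0';
	pthread_mutex_unlock(&toy_attach_mutex);
}
```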

    Signed-off-by: Tejun Heo
    Reported-by: Lai Jiangshan
    Fixes: 6b59808bfe48 ("workqueue: Show the latest workqueue name in /proc/PID/{comm,stat,status}")

    Tejun Heo
     

18 May, 2018

4 commits

  • There can be a lot of workqueue workers and they all show up with the
    cryptic kworker/* names making it difficult to understand which is
    doing what and how they came to be.

    # ps -ef | grep kworker
    root 4 2 0 Feb25 ? 00:00:00 [kworker/0:0H]
    root 6 2 0 Feb25 ? 00:00:00 [kworker/u112:0]
    root 19 2 0 Feb25 ? 00:00:00 [kworker/1:0H]
    root 25 2 0 Feb25 ? 00:00:00 [kworker/2:0H]
    root 31 2 0 Feb25 ? 00:00:00 [kworker/3:0H]
    ...

    This patch makes workqueue workers report the latest workqueue it was
    executing for through /proc/PID/{comm,stat,status}. The extra
    information is appended to the kthread name with intervening '+' if
    currently executing, otherwise '-'.

    # cat /proc/25/comm
    kworker/2:0-events_power_efficient
    # cat /proc/25/stat
    25 (kworker/2:0-events_power_efficient) I 2 0 0 0 -1 69238880 0 0...
    # grep Name /proc/25/status
    Name: kworker/2:0-events_power_efficient

    Unfortunately, ps(1) truncates comm to 15 characters,

    # ps 25
    PID TTY STAT TIME COMMAND
    25 ? I 0:00 [kworker/2:0-eve]

    making it a lot less useful; however, this should be an easy fix on
    the ps(1) side.
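    The naming rule and the ps(1) cut-off can be reproduced with plain string formatting. This is a sketch only. toy_worker_name() and toy_ps_view() are hypothetical helpers, not the kernel's implementation, and the inputs are taken from the example above.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: kthread name, then '+' while a work item from the
 * workqueue is executing or '-' afterwards, then the workqueue name.
 */
static void toy_worker_name(char *buf, size_t len, const char *kthread,
			    bool executing, const char *wq_name)
{
	snprintf(buf, len, "%s%c%s", kthread, executing ? '+' : '-', wq_name);
}

/* Emulate ps(1) showing at most 15 characters of comm. */
static void toy_ps_view(char out[16], const char *comm)
{
	strncpy(out, comm, 15);
	out[15] = '\0';
}
```

    Formatting "kworker/2:0" with the idle flag and "events_power_efficient" yields the full /proc comm string, and the 15-character view reduces it to "kworker/2:0-eve", matching the ps(1) output shown.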

    Signed-off-by: Tejun Heo
    Suggested-by: Linus Torvalds
    Cc: Craig Small

    Tejun Heo
     
  • Work functions can use set_worker_desc() to improve the visibility of
    what the worker task is doing. Currently, the desc field is unset at
    the beginning of each execution and there is a separate field to track
    whether it has been set during the current execution.

    Instead of leaving it empty until desc is set, worker->desc can be used
    to remember the last workqueue the worker worked on by default, and
    users of set_worker_desc() can override it with something more
    informative as necessary.

    This simplifies desc handling and helps track the last workqueue
    that the worker executed on to improve visibility.
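    The new default can be modelled in a few lines. In this sketch, which uses hypothetical toy_* names and a hypothetical 24-byte buffer, desc is reset to the workqueue name before every work item runs. A work function may then override it, as with set_worker_desc(). The example override string "flush-8:0" is invented for illustration.

```c
#include <stdio.h>

#define TOY_DESC_LEN 24	/* hypothetical buffer size */

struct toy_worker { char desc[TOY_DESC_LEN]; };

/* Optional override, analogous to set_worker_desc(). */
static void toy_set_worker_desc(struct toy_worker *w, const char *desc)
{
	snprintf(w->desc, sizeof(w->desc), "%s", desc);
}

/*
 * Before each work item runs, desc defaults to the workqueue name; the work
 * function may then override it. Either way desc describes the last thing
 * the worker executed, so no separate "was desc set?" field is needed.
 */
static void toy_process_one_work(struct toy_worker *w, const char *wq_name,
				 void (*fn)(struct toy_worker *))
{
	toy_set_worker_desc(w, wq_name);
	fn(w);
}

/* Two hypothetical work functions: one silent, one self-describing. */
static void toy_plain_work(struct toy_worker *w) { (void)w; }

static void toy_described_work(struct toy_worker *w)
{
	toy_set_worker_desc(w, "flush-8:0");
}
```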

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • For historical reasons, the worker attach/detach functions don't
    currently manage worker->pool and the callers are manually and
    inconsistently updating it.

    This patch moves worker->pool updates into the worker attach/detach
    functions. This makes worker->pool consistent and clearly defines how
    worker->pool updates are synchronized.

    This will help later workqueue visibility improvements by allowing
    safe access to workqueue information from worker->task.
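    Combined with the single global attach mutex described in the next entry, the rule becomes easy to state. Attach and detach are the only places that change worker->pool, and both do so under that one mutex. This pthreads sketch uses hypothetical toy_* names and omits everything else the real functions do, such as linking the worker into the pool's worker list. It shows that a reader starting from the worker alone can take the global mutex and see a stable pool.

```c
#include <pthread.h>
#include <stddef.h>

struct toy_pool { int id; };
struct toy_worker { struct toy_pool *pool; };

/* One global mutex for all pools, playing the role of wq_pool_attach_mutex. */
static pthread_mutex_t toy_pool_attach_mutex = PTHREAD_MUTEX_INITIALIZER;

/* The only place worker->pool becomes non-NULL. */
static void toy_attach(struct toy_worker *w, struct toy_pool *pool)
{
	pthread_mutex_lock(&toy_pool_attach_mutex);
	w->pool = pool;
	pthread_mutex_unlock(&toy_pool_attach_mutex);
}

/* The only place worker->pool is cleared. */
static void toy_detach(struct toy_worker *w)
{
	pthread_mutex_lock(&toy_pool_attach_mutex);
	w->pool = NULL;
	pthread_mutex_unlock(&toy_pool_attach_mutex);
}

/*
 * Starting from the worker alone (no pool known in advance), take the
 * global mutex and read a stable pool association; -1 means detached.
 */
static int toy_worker_pool_id(struct toy_worker *w)
{
	int id;

	pthread_mutex_lock(&toy_pool_attach_mutex);
	id = w->pool ? w->pool->id : -1;
	pthread_mutex_unlock(&toy_pool_attach_mutex);
	return id;
}
```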

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • To improve workqueue visibility, we want to be able to access
    workqueue information from worker tasks. The per-pool attach mutex
    makes that difficult because there's no way of stabilizing task ->
    worker pool association without knowing the pool first.

    Worker attach/detach is a slow path and there's no need for different
    pools to be able to perform them concurrently. This patch replaces
    the per-pool attach_mutex with global wq_pool_attach_mutex to prepare
    for visibility improvement changes.

    Signed-off-by: Tejun Heo

    Tejun Heo