12 Oct, 2016

40 commits

  • This patch makes it possible to create a freezable kthread worker via a
    new @flags parameter. It allows avoiding an init work in some kthreads.

    It currently does not affect the behavior of kthread_worker_fn(),
    but it might enable some optimizations or fixes eventually.

    I currently do not know of any other use for the @flags
    parameter, but I believe that we will want more flags
    in the future.

    Finally, I hope that it will not cause confusion with the @flags member
    in struct kthread. Well, I guess that we will want to rework the
    basic kthreads implementation once all kthreads are converted into
    kthread workers or workqueues. It is possible that we will merge
    the two structures.
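
    As a rough sketch of how the new parameter might be used (assuming the
    KTW_FREEZABLE flag and the kthread_create_worker() signature from later
    in this series):

        struct kthread_worker *worker;

        /* Create a freezable worker; no explicit init work is needed. */
        worker = kthread_create_worker(KTW_FREEZABLE, "my_worker");
        if (IS_ERR(worker))
                return PTR_ERR(worker);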

    Link: http://lkml.kernel.org/r/1470754545-17632-12-git-send-email-pmladek@suse.com
    Signed-off-by: Petr Mladek
    Acked-by: Tejun Heo
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Thomas Gleixner
    Cc: Jiri Kosina
    Cc: Borislav Petkov
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • There are situations when we need to modify the delay of a delayed
    kthread work, for example, when the work depends on an event and the
    initial delay acts as a timeout; then we want to queue the work
    immediately when the event happens.

    This patch implements kthread_mod_delayed_work(), as inspired by workqueues.
    It cancels the timer, removes the work from any worker list and queues it
    again with the given timeout.

    A very special case is when the work is being canceled at the same time.
    It might happen because of the regular kthread_cancel_delayed_work_sync()
    or by another kthread_mod_delayed_work(). In this case, we do nothing and
    let the other operation win. This should not normally happen as the caller
    is supposed to synchronize these operations in a reasonable way.
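
    A sketch of the timeout pattern described above (the worker and delayed
    work variables are hypothetical):

        /* Arm a 5 second timeout when the request is submitted ... */
        kthread_queue_delayed_work(worker, &timeout_dwork, 5 * HZ);

        /* ... and run the work immediately once the event arrives,
         * instead of waiting for the full timeout: */
        kthread_mod_delayed_work(worker, &timeout_dwork, 0);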

    Link: http://lkml.kernel.org/r/1470754545-17632-11-git-send-email-pmladek@suse.com
    Signed-off-by: Petr Mladek
    Acked-by: Tejun Heo
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Thomas Gleixner
    Cc: Jiri Kosina
    Cc: Borislav Petkov
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • We are going to use kthread workers more widely and sometimes we will need
    to make sure that the work is neither pending nor running.

    This patch implements cancel_*_sync() operations as inspired by
    workqueues. We are synchronized against the other operations via the
    worker lock, we use del_timer_sync(), and a counter counts parallel
    cancel operations; therefore the implementation can be simpler.

    First, we check if a worker is assigned. If not, the work has never been
    queued after it was initialized.

    Second, we take the worker lock. It must be the right one: the work must
    not be assigned to another worker unless it has been reinitialized in
    between.

    Third, we try to cancel the timer if it exists. The timer is deleted
    synchronously to make sure that the timer callback is not running. We
    need to temporarily release worker->lock to avoid a possible deadlock
    with the callback. In the meantime, we set the work->canceling counter
    to block any queuing.

    Fourth, we try to remove the work from a worker list. It might be
    the list of either normal or delayed works.

    Fifth, if the work is running, we call kthread_flush_work(). It might
    take an arbitrary time. We need to release worker->lock again. In the
    meantime, we again block any queuing via the canceling counter.

    As already mentioned, the check for a pending kthread work is done under
    a lock. Compared with workqueues, we do not need to fight for a single
    PENDING bit to block other operations, so we do not suffer from the
    thundering herd problem, and all parallel canceling jobs can use
    kthread_flush_work(). Any queuing is blocked until the counter reaches
    zero.
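
    Usage mirrors the workqueue counterparts; a minimal sketch (the work and
    dwork variables are assumed to have been initialized and queued earlier):

        /* Make sure the work is neither pending nor running. */
        kthread_cancel_work_sync(&work);

        /* Same for a delayed work; a pending timer is canceled too. */
        kthread_cancel_delayed_work_sync(&dwork);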

    Link: http://lkml.kernel.org/r/1470754545-17632-10-git-send-email-pmladek@suse.com
    Signed-off-by: Petr Mladek
    Acked-by: Tejun Heo
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Thomas Gleixner
    Cc: Jiri Kosina
    Cc: Borislav Petkov
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • We are going to use kthread_worker more widely and delayed works
    will be pretty useful.

    The implementation is inspired by workqueues. It uses a timer to queue
    the work after the requested delay. If the delay is zero, the work is
    queued immediately.

    Compared with workqueues, each work is associated with a single worker
    (kthread), so the implementation can be much simpler. In particular, we
    use worker->lock to synchronize all the operations with the work. We do
    not need any atomic operations on a flags variable.

    In fact, we do not need any state variable at all. Instead, we add a
    list of delayed works to the worker. A pending work is then listed
    either in the list of queued works or in the list of delayed works, and
    the existing check for pending works covers the delayed ones as well.

    A work must not be assigned to another worker unless reinitialized.
    Therefore the timer handler can expect that dwork->work.worker is valid
    and simply take the lock. We just add some sanity checks to help with
    debugging potential misuse.
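
    A minimal sketch of defining and queuing a delayed kthread work
    (assuming kthread_init_delayed_work() mirrors kthread_init_work()):

        static void my_timeout_fn(struct kthread_work *work)
        {
                /* runs in the worker kthread after the delay */
        }

        static struct kthread_delayed_work dwork;

        /* in some setup code: */
        kthread_init_delayed_work(&dwork, my_timeout_fn);
        kthread_queue_delayed_work(worker, &dwork, msecs_to_jiffies(100));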

    Link: http://lkml.kernel.org/r/1470754545-17632-9-git-send-email-pmladek@suse.com
    Signed-off-by: Petr Mladek
    Acked-by: Tejun Heo
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Thomas Gleixner
    Cc: Jiri Kosina
    Cc: Borislav Petkov
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • Nothing currently prevents a work from being queued to one kthread
    worker while it is already running on another one. This means that the
    work might run in parallel on more than one worker. Also, some
    operations, e.g. flush, are not reliable.

    This problem will be even more visible after we add the
    kthread_cancel_work() function. It will take only "work" as a parameter
    and will use worker->lock to synchronize with others.

    Well, normally this is not a problem because the API users are sane.
    But bugs might happen and users also might be crazy.

    This patch adds a warning when we try to insert the work for another
    worker. It does not fully prevent the misuse, because that would make
    the code much more complicated without a big benefit. The same warning
    is also added to kthread_flush_work(), replacing the repeated attempts
    to get the right lock.

    A side effect is that one needs to explicitly reinitialize the work if it
    must be queued into another worker. This is needed, for example, when the
    worker is stopped and started again. It is a bit inconvenient, but it
    looks like a good compromise between stability and complexity.

    I have double checked all existing users of the kthread worker API and
    they all seem to initialize the work after the worker is started.

    Just for completeness, the patch adds a check that the work is not already
    in a queue.

    The patch also puts all the checks into a separate function. It will be
    reused when implementing delayed works.
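
    For illustration, re-using a work with a different worker after a
    restart might look like this sketch (function and variable names are
    hypothetical):

        /* worker A was stopped; queue the same work on worker B */
        kthread_init_work(&work, my_work_fn);   /* explicit reinit */
        kthread_queue_work(worker_b, &work);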

    Link: http://lkml.kernel.org/r/1470754545-17632-8-git-send-email-pmladek@suse.com
    Signed-off-by: Petr Mladek
    Cc: Oleg Nesterov
    Cc: Tejun Heo
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Thomas Gleixner
    Cc: Jiri Kosina
    Cc: Borislav Petkov
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • The current kthread worker users call flush() and stop() explicitly.
    The new kthread_destroy_worker() function does the same, plus it frees
    the kthread_worker struct, in one call.

    It is supposed to be used together with kthread_create_worker*() that
    allocates struct kthread_worker.
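
    A sketch of the intended pairing (the @flags argument assumes the final
    signature from this series; error handling elided):

        struct kthread_worker *worker;

        worker = kthread_create_worker(0, "my_worker");
        /* ... queue and flush works ... */
        kthread_destroy_worker(worker);         /* flush + stop + free */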

    Link: http://lkml.kernel.org/r/1470754545-17632-7-git-send-email-pmladek@suse.com
    Signed-off-by: Petr Mladek
    Cc: Oleg Nesterov
    Cc: Tejun Heo
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Thomas Gleixner
    Cc: Jiri Kosina
    Cc: Borislav Petkov
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • Kthread workers are currently created using the classic kthread API,
    namely kthread_run(). kthread_worker_fn() is passed as the @threadfn
    parameter.

    This patch defines kthread_create_worker() and
    kthread_create_worker_on_cpu() functions that hide implementation details.

    They enforce using kthread_worker_fn() for the main thread. But I doubt
    that there are any plans to create an alternative. In fact, I think that
    we do not want any alternative main thread, because it would be hard to
    keep it consistent with the rest of the kthread worker API.

    The naming and function of kthread_create_worker() is inspired by the
    workqueues API like the rest of the kthread worker API.

    The kthread_create_worker_on_cpu() variant is motivated by the original
    kthread_create_on_cpu(). Note that per-CPU kthread workers need to be
    bound already when they are created; it makes life easier.
    kthread_bind() cannot be used later on an already running worker.

    This patch does _not_ convert existing kthread workers. The kthread
    worker API needs more improvements first, e.g. a function to destroy the
    worker.

    IMPORTANT:

    kthread_create_worker_on_cpu() allows any format of the worker name, in
    contrast to kthread_create_on_cpu(). The good thing is that it is more
    generic. The bad thing is that most users will need to pass the CPU
    number in two parameters, e.g. kthread_create_worker_on_cpu(cpu,
    "helper/%d", cpu).

    To be honest, the main motivation was to avoid the need for an empty
    va_list. The only legal way was to create a helper function that would be
    called with an empty list. Other attempts caused compilation warnings or
    even errors on different architectures.

    There were also other alternatives, for example, using a #define or
    splitting __kthread_create_worker(). The chosen solution looked the
    least ugly.
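
    For example (a sketch; the extra @flags argument assumes the final
    signature from this series):

        struct kthread_worker *worker;

        worker = kthread_create_worker_on_cpu(cpu, 0, "helper/%d", cpu);
        if (IS_ERR(worker))
                return PTR_ERR(worker);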

    Link: http://lkml.kernel.org/r/1470754545-17632-6-git-send-email-pmladek@suse.com
    Signed-off-by: Petr Mladek
    Acked-by: Tejun Heo
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Thomas Gleixner
    Cc: Jiri Kosina
    Cc: Borislav Petkov
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • kthread_create_on_node() implements a bunch of logic to create the
    kthread. It is already called by kthread_create_on_cpu().

    We are going to extend the kthread worker API and will need to call
    kthread_create_on_node() with va_list args there.

    This patch is only a refactoring and does not modify the existing
    behavior.

    Link: http://lkml.kernel.org/r/1470754545-17632-5-git-send-email-pmladek@suse.com
    Signed-off-by: Petr Mladek
    Acked-by: Tejun Heo
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Thomas Gleixner
    Cc: Jiri Kosina
    Cc: Borislav Petkov
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • kthread_create_on_cpu() was added by the commit 2a1d446019f9a5983e
    ("kthread: Implement park/unpark facility"). It is currently used only
    when enabling a new CPU. For this purpose, the newly created kthread has to
    be parked.

    The CPU binding is a bit tricky: the kthread is parked when the CPU has
    not been allowed yet, and the CPU is bound when the kthread is unparked.

    The function would be useful for more per-CPU kthreads, e.g.
    bnx2fc_thread, fcoethread. For this purpose, the newly created kthread
    should stay in the uninterruptible state.

    This patch moves the parking into smpboot code and binds the thread
    already when it is created. The function can then be used universally,
    and the behavior is consistent with kthread_create() and
    kthread_create_on_node().

    Link: http://lkml.kernel.org/r/1470754545-17632-4-git-send-email-pmladek@suse.com
    Signed-off-by: Petr Mladek
    Reviewed-by: Thomas Gleixner
    Cc: Oleg Nesterov
    Cc: Tejun Heo
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Jiri Kosina
    Cc: Borislav Petkov
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • It is good practice to prefix function names with the name of the
    subsystem.

    The kthread worker API is a mix of classic kthreads and workqueues. Each
    worker has a dedicated kthread. It runs a generic function that processes
    queued works. It is implemented as part of the kthread subsystem.

    This patch renames the existing kthread worker API to use
    the corresponding name from the workqueues API prefixed by
    kthread_:

    __init_kthread_worker() -> __kthread_init_worker()
    init_kthread_worker() -> kthread_init_worker()
    init_kthread_work() -> kthread_init_work()
    insert_kthread_work() -> kthread_insert_work()
    queue_kthread_work() -> kthread_queue_work()
    flush_kthread_work() -> kthread_flush_work()
    flush_kthread_worker() -> kthread_flush_worker()

    Note that the names of DEFINE_KTHREAD_WORK*() macros stay
    as they are. It is common that the "DEFINE_" prefix has
    precedence over the subsystem names.

    Note that the INIT() macros and the init() functions use different
    naming schemes. There is no perfect solution; there are several reasons
    for the chosen one:

    + "init" in the function names stands for the verb "initialize"
    aka "initialize worker". While "INIT" in the macro names
    stands for the noun "INITIALIZER" aka "worker initializer".

    + INIT() macros are used only in DEFINE() macros

    + init() functions are used close to the other kthread()
    functions. It looks much better if all the functions
    use the same scheme.

    + There will also be kthread_destroy_worker(), which will be used close
    to kthread_cancel_work(). It is related to the init() function. Again,
    it looks better if all functions use the same naming scheme.

    + There are several precedents for such init() function names, e.g.
    amd_iommu_init_device(), free_area_init_node(), jump_label_init_type(),
    regmap_init_mmio_clk().

    + It is not a strong argument, but the naming was inconsistent even
    before.
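
    At a call site, the rename is purely mechanical, e.g.:

        /* before */
        init_kthread_worker(&worker);
        queue_kthread_work(&worker, &work);

        /* after */
        kthread_init_worker(&worker);
        kthread_queue_work(&worker, &work);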

    [arnd@arndb.de: fix linux-next merge conflict]
    Link: http://lkml.kernel.org/r/20160908135724.1311726-1-arnd@arndb.de
    Link: http://lkml.kernel.org/r/1470754545-17632-3-git-send-email-pmladek@suse.com
    Suggested-by: Andrew Morton
    Signed-off-by: Petr Mladek
    Cc: Oleg Nesterov
    Cc: Tejun Heo
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Thomas Gleixner
    Cc: Jiri Kosina
    Cc: Borislav Petkov
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • Patch series "kthread: Kthread worker API improvements"

    The intention of this patchset is to make it easier to manipulate and
    maintain kthreads. In particular, I want to replace all the custom main
    loops with a generic one. I also want to make the kthreads sleep in a
    consistent state, in a common place, when there is no work.

    This patch (of 11):

    It is good practice to prefix function names with the name of the
    subsystem.

    This patch fixes the name of probe_kthread_data(). The other wrong
    function names are part of the kthread worker API and will be fixed
    separately.

    Link: http://lkml.kernel.org/r/1470754545-17632-2-git-send-email-pmladek@suse.com
    Signed-off-by: Petr Mladek
    Suggested-by: Andrew Morton
    Acked-by: Tejun Heo
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Thomas Gleixner
    Cc: Jiri Kosina
    Cc: Borislav Petkov
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • Vim, with the omnicppcomplete (#1) plugin, can do code completion using
    information built by ctags. Add the flags needed by omnicppcomplete (#2)
    to get completion on structure members.

    1: https://github.com/vim-scripts/omnicppcomplete
    2: https://github.com/vim-scripts/OmniCppComplete/blob/master/doc/omnicppcomplete.txt#L93

    Link: http://lkml.kernel.org/r/20160830191546.4469-1-mathieu.maret@gmail.com
    Signed-off-by: Mathieu Maret
    Cc: Michal Marek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathieu Maret
     
  • Some of the kmemleak_*() callbacks in memblock, bootmem, CMA convert a
    physical address to a virtual one using __va(). However, such physical
    addresses may sometimes be located in highmem and using __va() is
    incorrect, leading to inconsistent object tracking in kmemleak.

    The following functions have been added to the kmemleak API and they take
    a physical address as the object pointer. They only perform the
    corresponding action if the address has a lowmem mapping:

    kmemleak_alloc_phys
    kmemleak_free_part_phys
    kmemleak_not_leak_phys
    kmemleak_ignore_phys

    The affected calling places have been updated to use the new kmemleak
    API.
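
    A sketch of a converted call site (the surrounding context is
    hypothetical; the signatures mirror the virtual-address variants):

        /* e.g. early allocator code where 'base' may be in highmem */
        kmemleak_alloc_phys(base, size, 0, GFP_KERNEL);

        /* later, when part of the range is handed back */
        kmemleak_free_part_phys(base, freed_size);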

    Link: http://lkml.kernel.org/r/1471531432-16503-1-git-send-email-catalin.marinas@arm.com
    Signed-off-by: Catalin Marinas
    Reported-by: Vignesh R
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Catalin Marinas
     
  • KASLR memory randomization can randomize the base of the physical
    memory mapping (PAGE_OFFSET), vmalloc (VMALLOC_START) and vmemmap
    (VMEMMAP_START). Add these variables to VMCOREINFO so that tools can
    easily identify the base of each memory section.
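
    A sketch of what the export might look like in
    arch_crash_save_vmcoreinfo(), assuming the generic VMCOREINFO_NUMBER()
    helper (the actual patch may format the values differently):

        VMCOREINFO_NUMBER(PAGE_OFFSET);
        VMCOREINFO_NUMBER(VMALLOC_START);
        VMCOREINFO_NUMBER(VMEMMAP_START);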

    Link: http://lkml.kernel.org/r/1471531632-23003-1-git-send-email-thgarnie@google.com
    Signed-off-by: Thomas Garnier
    Acked-by: Baoquan He
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H . Peter Anvin"
    Cc: Eric Biederman
    Cc: Xunlei Pang
    Cc: HATAYAMA Daisuke
    Cc: Kees Cook
    Cc: Eugene Surovegin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Garnier
     
  • In a CONFIG_PREEMPT=n kernel, a softlockup was observed while executing
    the for loop in exit_sem(). Apparently it is possible for the loop to
    take quite a long time, and it does not contain a scheduling point.
    Since the code executes under an RCU read-side section, this may also
    cause RCU stalls, which in turn block synchronize_rcu() operations and
    more or less destabilise the whole system.

    Fix this by introducing a cond_resched() at the beginning of the loop.
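
    A minimal sketch of the fix in exit_sem() (ipc/sem.c); the loop body is
    elided:

        for (;;) {
                /* Undo lists can be very long; without this, a
                 * CONFIG_PREEMPT=n kernel can soft lock up here. */
                cond_resched();

                rcu_read_lock();
                /* ... look up and process the next undo entry,
                 *     breaking out when the list is empty ... */
                rcu_read_unlock();
        }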

    So this patch fixes the following:

    NMI watchdog: BUG: soft lockup - CPU#10 stuck for 23s! [httpd:18119]
    CPU: 10 PID: 18119 Comm: httpd Tainted: G O 4.4.20-clouder2 #6
    Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1 04/14/2015
    task: ffff88348d695280 ti: ffff881c95550000 task.ti: ffff881c95550000
    RIP: 0010:[] [] _raw_spin_lock+0x17/0x30
    RSP: 0018:ffff881c95553e40 EFLAGS: 00000246
    RAX: 0000000000000000 RBX: ffff883161b1eea8 RCX: 000000000000000d
    RDX: 0000000000000001 RSI: 000000000000000e RDI: ffff883161b1eea4
    RBP: ffff881c95553ea0 R08: ffff881c95553e68 R09: ffff883fef376f88
    R10: ffff881fffb58c20 R11: ffffea0072556600 R12: ffff883161b1eea0
    R13: ffff88348d695280 R14: ffff883dec427000 R15: ffff8831621672a0
    FS: 0000000000000000(0000) GS:ffff881fffb40000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f3b3723e020 CR3: 0000000001c0a000 CR4: 00000000001406e0
    Call Trace:
    ? exit_sem+0x7c/0x280
    do_exit+0x338/0xb40
    do_group_exit+0x43/0xd0
    SyS_exit_group+0x14/0x20
    entry_SYSCALL_64_fastpath+0x16/0x6e

    Link: http://lkml.kernel.org/r/1475154992-6363-1-git-send-email-kernel@kyup.com
    Signed-off-by: Nikolay Borisov
    Cc: Herton R. Krzesinski
    Cc: Fabian Frederick
    Cc: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nikolay Borisov
     
  • Blocked tasks queued in q_senders waiting for their message to fit in the
    queue are blindly awoken every time we think there's a remote chance this
    might happen. This could cause numerous (and expensive -- thundering
    herd-ish) bogus wakeups if the queue is still really full. Adding to the
    scheduling cost/overhead, there's also the fact that we need to take the
    ipc object lock and requeue ourselves in the q_senders list.

    By keeping track of the blocked sender's message size, we can know in
    advance whether the wakeup ought to occur or not. Otherwise, to maintain
    the current wakeup order, we just move it to the tail. This is exactly
    what occurs right now if the sender needs to go back to sleep.

    The case of EIDRM is left completely untouched, as we need to wakeup all
    the tasks, and shouldn't be playing games in the first place.

    This patch was seen to save ~15% in context switches on the 'msgctl10'
    LTP testcase (average of ten runs). Although these tests are really
    about functionality (as opposed to performance), it does show the direct
    benefits of the optimization.
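
    A sketch of the idea (field and helper names are assumptions):

        struct msg_sender {
                struct list_head        list;
                struct task_struct      *tsk;
                size_t                  msgsz;  /* size of the blocked msg */
        };

        /* wakeup-path sketch: only wake a sender whose message now
         * fits; otherwise it is requeued at the tail */
        if (!msg_fits_inqueue(msq, msr->msgsz))
                continue;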

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/1469748819-19484-6-git-send-email-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Acked-by: Peter Zijlstra (Intel)
    Cc: Manfred Spraul
    Cc: Sebastian Andrzej Siewior
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • ... 'tis annoying.

    Link: http://lkml.kernel.org/r/1469748819-19484-4-git-send-email-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Acked-by: Peter Zijlstra (Intel)
    Cc: Manfred Spraul
    Cc: Sebastian Andrzej Siewior
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Currently, wake_qs in sysv msg queues are used only for the receiver
    tasks that are blocked on the queue. But sender tasks blocked due to
    queue size constraints are still awoken with the ipc object lock held,
    which can be a problem, particularly for small queues, and is far from
    gracious for -rt (just like it was for the receiver side).

    The paths that actually wake up a sender are obviously related to either
    getting rid of the queue or to (some) space being freed up after a
    receiver takes a msg (msgrcv). Furthermore, with the exception of
    msgrcv, we can always piggy-back on expunge_all, which has its own tasks
    of msgrcv, we can always piggy-back on expunge_all that has its own tasks
    lined-up for waking. Finally, upon unlinking the message, it should be no
    problem delaying the wakeups a bit until after we've released the lock.
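
    The deferred-wakeup pattern is the standard lockless wake_q one; a
    minimal sketch (lock and task variables are placeholders; in v4.8 the
    init macro is WAKE_Q(), later renamed DEFINE_WAKE_Q()):

        WAKE_Q(wake_q);

        spin_lock(&msq->q_perm.lock);
        /* ... unlink the message, decide which senders can proceed ... */
        wake_q_add(&wake_q, blocked_task);
        spin_unlock(&msq->q_perm.lock);

        wake_up_q(&wake_q);     /* the actual wakeups, outside the lock */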

    Link: http://lkml.kernel.org/r/1469748819-19484-3-git-send-email-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Acked-by: Peter Zijlstra (Intel)
    Cc: Manfred Spraul
    Cc: Sebastian Andrzej Siewior
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • This patch moves the wake_up_process() invocation out from under the
    ipc global lock by making use of a lockless wake_q. With this change,
    the waiter is woken up once the message has been assigned, and it does
    not need to loop on SMP if the message points to NULL. In the signal
    case we still need to check the pointer under the lock to verify the
    state.

    This change should also avoid the introduction of preempt_disable() in
    -RT, which would otherwise be needed to avoid a busy loop polling for
    the NULL -> !NULL change when the waiter has a higher priority than the
    waker.

    By making use of wake_qs, the logic of sysv msg queues is greatly
    simplified (and very well suited as we can batch lockless wakeups),
    particularly around the lockless receive algorithm.

    This has been tested with Manfred's pmsg-shared tool on an "AMD A10-7800
    Radeon R7, 12 Compute Cores 4C+8G":

    test | before | after | diff
    -----------------|------------|------------|----------
    pmsg-shared 8 60 | 19,347,422 | 30,442,191 | + ~57.34 %
    pmsg-shared 4 60 | 21,367,197 | 35,743,458 | + ~67.28 %
    pmsg-shared 2 60 | 22,884,224 | 24,278,200 | + ~6.09 %
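
    A heavily simplified sketch of the receiver side after the change
    (variable names are assumptions; surrounding details elided):

        /* after sleeping: the sender publishes the message pointer
         * before issuing the wake_q wakeup, so no busy-loop on NULL
         * is needed */
        msg = READ_ONCE(msr_d.r_msg);
        if (msg != ERR_PTR(-EAGAIN))
                return msg;     /* message was pipelined to us */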

    Link: http://lkml.kernel.org/r/1469748819-19484-2-git-send-email-dave@stgolabs.net
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Davidlohr Bueso
    Acked-by: Peter Zijlstra (Intel)
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sebastian Andrzej Siewior
     
  • Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a
    race:

    sem_lock has a fast path that allows parallel simple operations.
    There are two reasons why a simple operation cannot run in parallel:
    - a non-simple operation is ongoing (sma->sem_perm.lock held)
    - a complex operation is sleeping (sma->complex_count != 0)

    As both facts are stored independently, a thread can bypass the current
    checks by sleeping in the right positions. See below for more details
    (or kernel bugzilla 105651).

    The patch fixes that by creating one variable (complex_mode)
    that tracks both reasons why parallel operations are not possible.

    The patch also updates stale documentation regarding the locking.

    With regards to stable kernels:
    The patch is required for all kernels that include the
    commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") (3.10?)

    The alternative is to revert the patch that introduced the race.

    The patch is safe for backporting, i.e. it makes no assumptions
    about memory barriers in spin_unlock_wait().

    Background:
    Here is the race of the current implementation:

    Thread A: (simple op)
    - does the first "sma->complex_count == 0" test

    Thread B: (complex op)
    - does sem_lock(): This includes an array scan. But the scan can't
    find Thread A, because Thread A does not own sem->lock yet.
    - the thread does the operation, increases complex_count,
    drops sem_lock, sleeps

    Thread A:
    - spin_lock(&sem->lock), spin_is_locked(sma->sem_perm.lock)
    - sleeps before the complex_count test

    Thread C: (complex op)
    - does sem_lock (no array scan, complex_count==1)
    - wakes up Thread B.
    - decrements complex_count

    Thread A:
    - does the complex_count test

    Bug:
    Now both thread A and thread C operate on the same array, without
    any synchronization.
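
    A condensed sketch of the fix (based on the description above; the
    actual sem_lock() dance has more detail):

        /* in struct sem_array: one flag now tracks both reasons why
         * the per-semaphore fast path must not be used */
        bool complex_mode;      /* set while complex ops run or sleep */

        /* simple ops in sem_lock(): */
        if (!READ_ONCE(sma->complex_mode)) {
                spin_lock(&sem->lock);
                if (!smp_load_acquire(&sma->complex_mode))
                        return sops->sem_num;   /* fast path succeeded */
                spin_unlock(&sem->lock);        /* raced, fall back */
        }
        /* slow path: take the global sma->sem_perm.lock */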

    Fixes: 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()")
    Link: http://lkml.kernel.org/r/1469123695-5661-1-git-send-email-manfred@colorfullife.com
    Reported-by:
    Cc: "H. Peter Anvin"
    Cc: Peter Zijlstra
    Cc: Davidlohr Bueso
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc:
    Cc: [3.10+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • There's no point in collecting coverage from lib/stackdepot.c, as it is
    not a function of syscall inputs. Disabling kcov instrumentation for that
    file will reduce the coverage noise level.

    Link: http://lkml.kernel.org/r/1474640972-104131-1-git-send-email-glider@google.com
    Signed-off-by: Alexander Potapenko
    Acked-by: Dmitry Vyukov
    Cc: Kostya Serebryany
    Cc: Andrey Konovalov
    Cc: syzkaller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     
  • As of Android N, SECCOMP is required. Without it, we will get a
    mediaextractor error:

    E /system/bin/mediaextractor: libminijail: prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER): Invalid argument

    Link: http://lkml.kernel.org/r/20160908185934.18098-3-robh@kernel.org
    Signed-off-by: Rob Herring
    Acked-by: John Stultz
    Cc: Amit Pundir
    Cc: Dmitry Shmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rob Herring
     
  • Android won't boot without SELinux enabled, so make it the default.

    Link: http://lkml.kernel.org/r/20160908185934.18098-2-robh@kernel.org
    Signed-off-by: Rob Herring
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rob Herring
     
  • CONFIG_MD is in the recommended fragment, but dependent options like
    DM_CRYPT and DM_VERITY are in base. The result is that the options in
    base don't get enabled when applying both the base and recommended
    fragments. Move all the options to recommended.

    Link: http://lkml.kernel.org/r/20160908185934.18098-1-robh@kernel.org
    Signed-off-by: Rob Herring
    Acked-by: John Stultz
    Cc: Amit Pundir
    Cc: Dmitry Shmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rob Herring
     
  • Option is long gone, see commit 5d9efa7ee99e ("ipv6: Remove privacy
    config option.")

    Link: http://lkml.kernel.org/r/20160811170340.9859-1-bp@alien8.de
    Signed-off-by: Borislav Petkov
    Cc: Rob Herring
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Borislav Petkov
     
  • Relay avoids calling wake_up_interruptible(), to wake readers/consumers
    waiting for the generation of new data, from the context of the process
    which produced the data. This is apparently done to prevent a possible
    deadlock in case the scheduler itself is generating data for the relay
    after acquiring rq->lock.

    The following patch used a timer (scheduled at the next jiffy) to
    delegate the wakeup to another context:
    commit 7c9cb38302e78d24e37f7d8a2ea7eed4ae5f2fa7
    Author: Tom Zanussi
    Date: Wed May 9 02:34:01 2007 -0700

    relay: use plain timer instead of delayed work

    relay doesn't need to use schedule_delayed_work() for waking readers
    when a simple timer will do.

    Scheduling a plain timer at the next jiffy boundary to do the wakeup
    causes a significant wakeup latency for the userspace client, which
    makes relay less suitable for high-frequency low-payload use cases where
    the data is generated at a very high rate, like multiple sub-buffers
    getting filled within a millisecond. Moreover, the timer is re-scheduled
    on every newly produced sub-buffer, so the timer keeps getting pushed
    out if sub-buffers are filled in very quick succession (less than a
    jiffy between the filling of two sub-buffers). As a result, relay runs
    out of sub-buffers to store the new data.

    By using irq_work, the wakeup of the userspace client blocked in the
    poll call is done at the earliest opportunity (through a self IPI or the
    next timer tick), enabling it to always consume the data in time. This
    also makes relay consistent with the printk and (trace) ring buffers, as
    they too use irq_work for the deferred wakeup of readers.

    [arnd@arndb.de: select CONFIG_IRQ_WORK]
    Link: http://lkml.kernel.org/r/20160912154035.3222156-1-arnd@arndb.de
    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/1472906487-1559-1-git-send-email-akash.goel@intel.com
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Akash Goel
    Cc: Tom Zanussi
    Cc: Chris Wilson
    Cc: Tvrtko Ursulin
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • CONFIG_NO_HZ currently only sets the default value of the dynticks
    config, so if the PPS kernel consumer needs periodic timer ticks it
    should depend on !CONFIG_NO_HZ_COMMON instead of !CONFIG_NO_HZ.

    Otherwise it is possible to enable it even on a tickless system which
    has CONFIG_NO_HZ not set and CONFIG_NO_HZ_IDLE (or CONFIG_NO_HZ_FULL)
    set.

    Link: http://lkml.kernel.org/r/57E2B769.50202@maciej.szmigiero.name
    Signed-off-by: Maciej S. Szmigiero
    Acked-by: Rodolfo Giometti
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Maciej S. Szmigiero
     
  • Daniel Walker reported problems which happen when the
    crash_kexec_post_notifiers kernel option is enabled
    (https://lkml.org/lkml/2015/6/24/44).

    In that case, smp_send_stop() is called before entering kdump routines
    which assume other CPUs are still online. As a result, kdump routines
    fail to save the other CPUs' registers. Additionally, on MIPS OCTEON,
    it fails to stop the watchdog timer.

    To fix this problem, call a new kdump-friendly function,
    crash_smp_send_stop(), instead of smp_send_stop() when
    crash_kexec_post_notifiers is enabled. crash_smp_send_stop() is a weak
    function that by default just calls smp_send_stop(); architecture code
    should override it so that kdump can work appropriately. This patch
    provides the MIPS version.

    Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option)
    Link: http://lkml.kernel.org/r/20160810080950.11028.28000.stgit@sysi4-13.yrl.intra.hitachi.co.jp
    Signed-off-by: Hidehiro Kawai
    Reported-by: Daniel Walker
    Cc: Dave Young
    Cc: Baoquan He
    Cc: Vivek Goyal
    Cc: Eric Biederman
    Cc: Masami Hiramatsu
    Cc: Daniel Walker
    Cc: Xunlei Pang
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Borislav Petkov
    Cc: David Vrabel
    Cc: Toshi Kani
    Cc: Ralf Baechle
    Cc: David Daney
    Cc: Aaro Koskinen
    Cc: "Steven J. Hill"
    Cc: Corey Minyard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hidehiro Kawai
     
  • Daniel Walker reported problems which happen when the
    crash_kexec_post_notifiers kernel option is enabled
    (https://lkml.org/lkml/2015/6/24/44).

    In that case, smp_send_stop() is called before entering kdump routines
    which assume other CPUs are still online. As a result, on x86, kdump
    routines fail to save the other CPUs' registers and to disable
    virtualization extensions.

    To fix this problem, call a new kdump-friendly function,
    crash_smp_send_stop(), instead of smp_send_stop() when
    crash_kexec_post_notifiers is enabled. crash_smp_send_stop() is a weak
    function that by default just calls smp_send_stop(); architecture code
    should override it so that kdump can work appropriately. This patch
    only provides the x86-specific version.

    For Xen's PV kernel, just keep the current behavior.

    NOTES:

    - The right solution would be to place crash_smp_send_stop() before the
    __crash_kexec() invocation in all cases and remove smp_send_stop(), but
    we can't do that until all architectures implement their own
    crash_smp_send_stop()

    - crash_smp_send_stop()-like work is still needed by
    machine_crash_shutdown() because crash_kexec() can be called without
    entering panic()
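
    The generic weak default is tiny; a sketch close to what such a
    fallback looks like in kernel/panic.c:

        /* Overridden by architectures that can stop CPUs in a
         * kdump-friendly way; the default keeps the old behavior. */
        void __weak crash_smp_send_stop(void)
        {
                static int cpus_stopped;

                if (cpus_stopped)
                        return;

                smp_send_stop();
                cpus_stopped = 1;
        }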

    Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option)
    Link: http://lkml.kernel.org/r/20160810080948.11028.15344.stgit@sysi4-13.yrl.intra.hitachi.co.jp
    Signed-off-by: Hidehiro Kawai
    Reported-by: Daniel Walker
    Cc: Dave Young
    Cc: Baoquan He
    Cc: Vivek Goyal
    Cc: Eric Biederman
    Cc: Masami Hiramatsu
    Cc: Daniel Walker
    Cc: Xunlei Pang
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Borislav Petkov
    Cc: David Vrabel
    Cc: Toshi Kani
    Cc: Ralf Baechle
    Cc: David Daney
    Cc: Aaro Koskinen
    Cc: "Steven J. Hill"
    Cc: Corey Minyard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hidehiro Kawai
     
  • Use the DMA_ATTR_NO_WARN attribute for the dma_map_sg() call in the
    nvme driver on the path that returns BLK_MQ_RQ_QUEUE_BUSY (not for the
    BLK_MQ_RQ_QUEUE_ERROR path).

    Link: http://lkml.kernel.org/r/1470092390-25451-4-git-send-email-mauricfo@linux.vnet.ibm.com
    Signed-off-by: Mauricio Faria de Oliveira
    Reviewed-by: Gabriel Krisman Bertazi
    Cc: Keith Busch
    Cc: Jens Axboe
    Cc: Benjamin Herrenschmidt
    Cc: Michael Ellerman
    Cc: Krzysztof Kozlowski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mauricio Faria de Oliveira
     
  • Add support for the DMA_ATTR_NO_WARN attribute in the powerpc iommu
    code.

    Link: http://lkml.kernel.org/r/1470092390-25451-3-git-send-email-mauricfo@linux.vnet.ibm.com
    Signed-off-by: Mauricio Faria de Oliveira
    Acked-by: Michael Ellerman
    Cc: Keith Busch
    Cc: Jens Axboe
    Cc: Benjamin Herrenschmidt
    Cc: Krzysztof Kozlowski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mauricio Faria de Oliveira
     
  • Introduce the DMA_ATTR_NO_WARN attribute, and document it.
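
    A sketch of a call site that opts out of allocation-failure warnings
    because it has its own retry path (the driver context is hypothetical):

        nents = dma_map_sg_attrs(dev, sgl, count, DMA_TO_DEVICE,
                                 DMA_ATTR_NO_WARN);
        if (!nents)
                return -EBUSY;  /* the caller will retry later */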

    Link: http://lkml.kernel.org/r/1470092390-25451-2-git-send-email-mauricfo@linux.vnet.ibm.com
    Signed-off-by: Mauricio Faria de Oliveira
    Cc: Keith Busch
    Cc: Jens Axboe
    Cc: Benjamin Herrenschmidt
    Cc: Michael Ellerman
    Cc: Krzysztof Kozlowski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mauricio Faria de Oliveira
     
  • All call sites of randomize_range() have been updated to use the much
    simpler and more robust randomize_addr(). Remove the now-unnecessary
    code.

    Link: http://lkml.kernel.org/r/20160803233913.32511-8-jason@lakedaemon.net
    Signed-off-by: Jason Cooper
    Acked-by: Kees Cook
    Cc: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Cooper
     
  • Currently, all callers to randomize_range() set the length to 0 and
    calculate end by adding a constant to the start address. We can simplify
    the API to remove a bunch of needless checks and variables.

    Use the new randomize_addr(start, range) call to set the requested
    address.
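
    The typical conversion looks like this sketch (RANGE stands in for each
    caller's constant):

        /* before */
        addr = randomize_range(start, start + RANGE, 0) ? : start;

        /* after: an address in [start, start + RANGE), or start
         * itself on failure */
        addr = randomize_addr(start, RANGE);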

    Link: http://lkml.kernel.org/r/20160803233913.32511-7-jason@lakedaemon.net
    Signed-off-by: Jason Cooper
    Acked-by: Kees Cook
    Cc: "Theodore Ts'o"
    Cc: Guan Xuetao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Cooper
     
  • Currently, all callers to randomize_range() set the length to 0 and
    calculate end by adding a constant to the start address. We can simplify
    the API to remove a bunch of needless checks and variables.

    Use the new randomize_addr(start, range) call to set the requested
    address.

    Link: http://lkml.kernel.org/r/20160803233913.32511-6-jason@lakedaemon.net
    Signed-off-by: Jason Cooper
    Acked-by: Kees Cook
    Cc: "Theodore Ts'o"
    Cc: Chris Metcalf
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Cooper
     
  • Currently, all callers to randomize_range() set the length to 0 and
    calculate end by adding a constant to the start address. We can simplify
    the API to remove a bunch of needless checks and variables.

    Use the new randomize_addr(start, range) call to set the requested
    address.

    Link: http://lkml.kernel.org/r/20160803233913.32511-5-jason@lakedaemon.net
    Signed-off-by: Jason Cooper
    Acked-by: Will Deacon
    Acked-by: Kees Cook
    Cc: "Russell King - ARM Linux"
    Cc: Catalin Marinas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Cooper
     
  • Currently, all callers to randomize_range() set the length to 0 and
    calculate end by adding a constant to the start address. We can simplify
    the API to remove a bunch of needless checks and variables.

    Use the new randomize_addr(start, range) call to set the requested
    address.

    Link: http://lkml.kernel.org/r/20160803233913.32511-4-jason@lakedaemon.net
    Signed-off-by: Jason Cooper
    Acked-by: Kees Cook
    Cc: "Russell King - ARM Linux"
    Cc: "Theodore Ts'o"
    Cc: Catalin Marinas
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Cooper
     
  • Currently, all callers to randomize_range() set the length to 0 and
    calculate end by adding a constant to the start address. We can simplify
    the API to remove a bunch of needless checks and variables.

    Use the new randomize_addr(start, range) call to set the requested
    address.

    Link: http://lkml.kernel.org/r/20160803233913.32511-3-jason@lakedaemon.net
    Signed-off-by: Jason Cooper
    Acked-by: Kees Cook
    Cc: "Theodore Ts'o"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H . Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Cooper
     
  • To date, all callers of randomize_range() have set the length to 0, and
    check for a zero return value. For the current callers, the only way to
    get zero returned is if end <= start. Since they all add a constant to
    the start address, this is unnecessary. We can simplify the API to
    remove a bunch of needless checks and variables by returning an address
    in the range [start, start + range).

    Cc: Nick Kralevich
    Cc: Jeffrey Vander Stoep
    Cc: Daniel Cashman
    Cc: Chris Metcalf
    Cc: Guan Xuetao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Cooper
     
  • Fix a coccinelle warning about duplicating the existing memdup_user()
    function.
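
    memdup_user() replaces the open-coded allocate-and-copy pattern; a
    sketch:

        /* instead of kmalloc() + copy_from_user() + error unwinding: */
        buf = memdup_user(uptr, len);
        if (IS_ERR(buf))
                return PTR_ERR(buf);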

    Link: http://lkml.kernel.org/r/20160811151737.20140-1-alexandre.bounine@idt.com
    Link: https://lkml.org/lkml/2016/8/11/29
    Signed-off-by: Alexandre Bounine
    Reported-by: kbuild test robot
    Cc: Matt Porter
    Cc: Andre van Herk
    Cc: Barry Wood
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandre Bounine