12 Mar, 2015

1 commit

  • RCU ignores offlined CPUs, so they cannot safely run RCU read-side code.
    (They -can- use SRCU, but not RCU.) This means that any use of RCU
    during or after the call to arch_cpu_idle_dead(). Unfortunately,
    commit 2ed53c0d6cc99 added a complete() call, which will contain RCU
    read-side critical sections if there is a task waiting to be awakened.

    Which, as it turns out, there almost never is. In my qemu/KVM testing,
    the to-be-awakened task is not yet asleep more than 99.5% of the time.
    In current mainline, failure is even harder to reproduce, requiring a
    virtualized environment that delays the outgoing CPU by at least three
    jiffies between the time it exits its stop_machine() task at CPU_DYING
    time and the time it calls arch_cpu_idle_dead() from the idle loop.
    However, this problem really can occur, especially in virtualized
    environments, and therefore really does need to be fixed

    This suggests moving back to the polling loop, but using a much shorter
    wait, with gentle exponential backoff instead of the old 100-millisecond
    wait. Most of the time, the loop will exit without waiting at all,
    and almost all of the remaining uses will wait only five microseconds.
    If the outgoing CPU is preempted, a loop will wait one jiffy, then
    increase the wait by a factor of 11/10ths, rounding up. As before, there
    is a five-second timeout.

    This commit therefore provides common-code infrastructure to do the
    dying-to-surviving CPU handoff in a safe manner. This code also
    provides an indication at CPU-online of whether the CPU to be onlined
    previously timed out on offline. The new cpu_check_up_prepare() function
    returns -EBUSY if this CPU previously took more than five seconds to
    go offline, or -EAGAIN if it has not yet managed to go offline. The
    rationale for -EAGAIN is that it might still be preempted, so an additional
    wait might well find it correctly offlined. Architecture-specific code
    can decide how to handle these conditions. Systems in which CPUs take
    themselves completely offline might respond to an -EBUSY return as if
    it was a zero (success) return. Systems in which the surviving CPU must
    take some action might take it at this time, or might simply mark the
    other CPU as unusable.

    Note that architectures that take the easy way out and simply pass the
    -EBUSY and -EAGAIN upwards will change the sysfs API.

    Signed-off-by: Paul E. McKenney
    Cc:
    Cc:
    [ paulmck: Fixed state machine for architectures that don't check earlier
    CPU-hotplug results as suggested by James Hogan. ]

    Paul E. McKenney
     

23 Jan, 2015

1 commit

  • The following race exists in the smpboot percpu threads management:

    CPU0 CPU1
    cpu_up(2)
    get_online_cpus();
    smpboot_create_threads(2);
    smpboot_register_percpu_thread();
    for_each_online_cpu();
    __smpboot_create_thread();
    __cpu_up(2);

    This results in a missing per cpu thread for the newly onlined cpu2 and
    in a NULL pointer dereference on a consecutive offline of that cpu.

    Proctect smpboot_register_percpu_thread() with get_online_cpus() to
    prevent that.

    [ tglx: Massaged changelog and removed the change in
    smpboot_unregister_percpu_thread() because that's an
    optimization and therefor not stable material. ]

    Signed-off-by: Lai Jiangshan
    Cc: Thomas Gleixner
    Cc: Rusty Russell
    Cc: Peter Zijlstra
    Cc: Srivatsa S. Bhat
    Cc: David Rientjes
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/1406777421-12830-1-git-send-email-laijs@cn.fujitsu.com
    Signed-off-by: Thomas Gleixner

    Lai Jiangshan
     

28 Oct, 2014

1 commit

  • smp_hotplug_thread::{setup,unpark} functions can sleep too, so be
    consistent and do the same for all callbacks.

    __might_sleep+0x74/0x80
    kmem_cache_alloc_trace+0x4e/0x1c0
    perf_event_alloc+0x55/0x450
    perf_event_create_kernel_counter+0x2f/0x100
    watchdog_nmi_enable+0x8d/0x160
    watchdog_enable+0x45/0x90
    smpboot_thread_fn+0xec/0x2b0
    kthread+0xe4/0x100
    ret_from_fork+0x7c/0xb0

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: tglx@linutronix.de
    Cc: ilya.dryomov@inktank.com
    Cc: umgwanakikbuti@gmail.com
    Cc: oleg@redhat.com
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/20140924082242.392279328@infradead.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

15 Jul, 2013

1 commit

  • The __cpuinit type of throwaway sections might have made sense
    some time ago when RAM was more constrained, but now the savings
    do not offset the cost and complications. For example, the fix in
    commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
    is a good example of the nasty type of bugs that can be created
    with improper use of the various __init prefixes.

    After a discussion on LKML[1] it was decided that cpuinit should go
    the way of devinit and be phased out. Once all the users are gone,
    we can then finally remove the macros themselves from linux/init.h.

    This removes all the uses of the __cpuinit macros from C files in
    the core kernel directories (kernel, init, lib, mm, and include)
    that don't really have a specific maintainer.

    [1] https://lkml.org/lkml/2013/5/20/589

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

12 Apr, 2013

1 commit

  • The smpboot threads rely on the park/unpark mechanism which binds per
    cpu threads on a particular core. Though the functionality is racy:

    CPU0 CPU1 CPU2
    unpark(T) wake_up_process(T)
    clear(SHOULD_PARK) T runs
    leave parkme() due to !SHOULD_PARK
    bind_to(CPU2) BUG_ON(wrong CPU)

    We cannot let the tasks move themself to the target CPU as one of
    those tasks is actually the migration thread itself, which requires
    that it starts running on the target cpu right away.

    The solution to this problem is to prevent wakeups in park mode which
    are not from unpark(). That way we can guarantee that the association
    of the task to the target cpu is working correctly.

    Add a new task state (TASK_PARKED) which prevents other wakeups and
    use this state explicitly for the unpark wakeup.

    Peter noticed: Also, since the task state is visible to userspace and
    all the parked tasks are still in the PID space, its a good hint in ps
    and friends that these tasks aren't really there for the moment.

    The migration thread has another related issue.

    CPU0 CPU1
    Bring up CPU2
    create_thread(T)
    park(T)
    wait_for_completion()
    parkme()
    complete()
    sched_set_stop_task()
    schedule(TASK_PARKED)

    The sched_set_stop_task() call is issued while the task is on the
    runqueue of CPU1 and that confuses the hell out of the stop_task class
    on that cpu. So we need the same synchronizaion before
    sched_set_stop_task().

    Reported-by: Dave Jones
    Reported-and-tested-by: Dave Hansen
    Reported-and-tested-by: Borislav Petkov
    Acked-by: Peter Ziljstra
    Cc: Srivatsa S. Bhat
    Cc: dhillf@gmail.com
    Cc: Ingo Molnar
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304091635430.21884@ionos
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

09 Mar, 2013

1 commit

  • Commit b67bfe0d42ca ("hlist: drop the node parameter from iterators")
    did a lot of nice changes but also contains two small hunks that seem to
    have slipped in accidentally and have no apparent connection to the
    intent of the patch.

    This reverts the two extraneous changes.

    Signed-off-by: Arnd Bergmann
    Cc: Peter Senna Tschudin
    Cc: Paul E. McKenney
    Cc: Sasha Levin
    Cc: Thomas Gleixner
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
     

06 Mar, 2013

1 commit


28 Feb, 2013

1 commit

  • I'm not sure why, but the hlist for each entry iterators were conceived

    list_for_each_entry(pos, head, member)

    The hlist ones were greedy and wanted an extra parameter:

    hlist_for_each_entry(tpos, pos, head, member)

    Why did they need an extra pos parameter? I'm not quite sure. Not only
    they don't really need it, it also prevents the iterator from looking
    exactly like the list iterator, which is unfortunate.

    Besides the semantic patch, there was some manual work required:

    - Fix up the actual hlist iterators in linux/list.h
    - Fix up the declaration of other iterators based on the hlist ones.
    - A very small amount of places were using the 'node' parameter, this
    was modified to use 'obj->member' instead.
    - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
    properly, so those had to be fixed up manually.

    The semantic patch which is mostly the work of Peter Senna Tschudin is here:

    @@
    iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

    type T;
    expression a,c,d,e;
    identifier b;
    statement S;
    @@

    -T b;

    [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
    [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foudnation.org: redo intrusive kvm changes]
    Tested-by: Peter Senna Tschudin
    Acked-by: Paul E. McKenney
    Signed-off-by: Sasha Levin
    Cc: Wu Fengguang
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     

27 Feb, 2013

1 commit

  • commit 14e568e78 (stop_machine: Use smpboot threads) introduced the
    following regression:

    Before this commit the stopper enabled bit was set in the online
    notifier.

    CPU0 CPU1
    cpu_up
    cpu online
    hotplug_notifier(ONLINE)
    stopper(CPU1)->enabled = true;
    ...
    stop_machine()

    The conversion to smpboot threads moved the enablement to the wakeup
    path of the parked thread. The majority of users seem to have the
    following working order:

    CPU0 CPU1
    cpu_up
    cpu online
    unpark_threads()
    wakeup(stopper[CPU1])
    ....
    stopper thread runs
    stopper(CPU1)->enabled = true;
    stop_machine()

    But Konrad and Sander have observed:

    CPU0 CPU1
    cpu_up
    cpu online
    unpark_threads()
    wakeup(stopper[CPU1])
    ....
    stop_machine()
    stopper thread runs
    stopper(CPU1)->enabled = true;

    Now the stop machinery kicks CPU0 into the stop loop, where it gets
    stuck forever because the queue code saw stopper(CPU1)->enabled ==
    false, so CPU0 waits for CPU1 to enter stomp_machine, but the CPU1
    stopper work got discarded due to enabled == false.

    Add a pre_unpark function to the smpboot thread descriptor and call it
    before waking the thread.

    This fixes the problem at hand, but the stop_machine code should be
    more robust. The stopper->enabled flag smells fishy at best.

    Thanks to Konrad for going through a loop of debug patches and
    providing the information to decode this issue.

    Reported-and-tested-by: Konrad Rzeszutek Wilk
    Reported-and-tested-by: Sander Eikelenboom
    Cc: Srivatsa S. Bhat
    Cc: Rusty Russell
    Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1302261843240.22263@ionos
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

14 Feb, 2013

1 commit

  • The stop machine threads are still killed when a cpu goes offline. The
    reason is that the thread is used to bring the cpu down, so it can't
    be parked along with the other per cpu threads.

    Allow a per cpu thread to be excluded from automatic parking, so it
    can park itself once it's done

    Add a create callback function as well.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Rusty Russell
    Cc: Paul McKenney
    Cc: Srivatsa S. Bhat
    Cc: Arjan van de Veen
    Cc: Paul Turner
    Cc: Richard Weinberger
    Cc: Magnus Damm
    Link: http://lkml.kernel.org/r/20130131120741.553993267@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

13 Aug, 2012

2 commits

  • Because kernel subsystems need their per-CPU kthreads on UP systems as
    well as on SMP systems, the smpboot hotplug kthread functions must be
    provided in UP builds as well as in SMP builds. This commit therefore
    adds smpboot.c to UP builds and excludes irrelevant code via #ifdef.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Thomas Gleixner

    Paul E. McKenney
     
  • Provide a generic interface for setting up and tearing down percpu
    threads.

    On registration the threads for already online cpus are created and
    started. On deregistration (modules) the threads are stoppped.

    During hotplug operations the threads are created, started, parked and
    unparked. The datastructure for registration provides a pointer to
    percpu storage space and optional setup, cleanup, park, unpark
    functions. These functions are called when the thread state changes.

    Each implementation has to provide a function which is queried and
    returns whether the thread should run and the thread function itself.

    The core code handles all state transitions and avoids duplicated code
    in the call sites.

    [ paulmck: Preemption leak fix ]

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Reviewed-by: Srivatsa S. Bhat
    Cc: Rusty Russell
    Reviewed-by: Paul E. McKenney
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/20120716103948.352501068@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

25 May, 2012

2 commits

  • The comment over idle_threads_init() really talks about the functionality
    of idle_init(). Move that comment to idle_init(), and add a suitable
    comment over idle_threads_init().

    Signed-off-by: Srivatsa S. Bhat
    Cc: suresh.b.siddha@intel.com
    Cc: venki@google.com
    Cc: nikunj@linux.vnet.ibm.com
    Link: http://lkml.kernel.org/r/20120524151100.2549.66501.stgit@srivatsabhat.in.ibm.com
    Signed-off-by: Thomas Gleixner

    Srivatsa S. Bhat
     
  • While trying to initialize idle threads for all cpus, idle_threads_init()
    calls smp_processor_id() in a loop, which is unnecessary. The intent
    is to initialize idle threads for all non-boot cpus. So just use a variable
    to note the boot cpu and use it in the loop.

    Signed-off-by: Srivatsa S. Bhat
    Cc: suresh.b.siddha@intel.com
    Cc: venki@google.com
    Cc: nikunj@linux.vnet.ibm.com
    Link: http://lkml.kernel.org/r/20120524151055.2549.64309.stgit@srivatsabhat.in.ibm.com
    Signed-off-by: Thomas Gleixner

    Srivatsa S. Bhat
     

04 May, 2012

1 commit

  • percpu areas are already allocated during boot for each possible cpu.
    percpu idle threads can be considered as an extension of the percpu areas,
    and allocate them for each possible cpu during boot.

    This will eliminate the need for workqueue based idle thread allocation.
    In future we can move the idle thread area into the percpu area too.

    [ tglx: Moved the loop into smpboot.c and added an error check when
    the init code failed to allocate an idle thread for a cpu which
    should be onlined ]

    Signed-off-by: Suresh Siddha
    Cc: Peter Zijlstra
    Cc: Rusty Russell
    Cc: Paul E. McKenney
    Cc: Srivatsa S. Bhat
    Cc: Tejun Heo
    Cc: David Rientjes
    Cc: venki@google.com
    Link: http://lkml.kernel.org/r/1334966930.28674.245.camel@sbsiddha-desk.sc.intel.com
    Signed-off-by: Thomas Gleixner

    Suresh Siddha
     

26 Apr, 2012

2 commits

  • All SMP architectures have magic to fork the idle task and to store it
    for reusage when cpu hotplug is enabled. Provide a generic
    infrastructure for it.

    Create/reinit the idle thread for the cpu which is brought up in the
    generic code and hand the thread pointer to the architecture code via
    __cpu_up().

    Note, that fork_idle() is called via a workqueue, because this
    guarantees that the idle thread does not get a reference to a user
    space VM. This can happen when the boot process did not bring up all
    possible cpus and a later cpu_up() is initiated via the sysfs
    interface. In that case fork_idle() would be called in the context of
    the user space task and take a reference on the user space VM.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Rusty Russell
    Cc: Paul E. McKenney
    Cc: Srivatsa S. Bhat
    Cc: Matt Turner
    Cc: Russell King
    Cc: Mike Frysinger
    Cc: Jesper Nilsson
    Cc: Richard Kuo
    Cc: Tony Luck
    Cc: Hirokazu Takata
    Cc: Ralf Baechle
    Cc: David Howells
    Cc: James E.J. Bottomley
    Cc: Benjamin Herrenschmidt
    Cc: Martin Schwidefsky
    Cc: Paul Mundt
    Cc: David S. Miller
    Cc: Chris Metcalf
    Cc: Richard Weinberger
    Cc: x86@kernel.org
    Acked-by: Venkatesh Pallipadi
    Link: http://lkml.kernel.org/r/20120420124557.102478630@linutronix.de

    Thomas Gleixner
     
  • Start a new file, which will hold SMP and CPU hotplug related generic
    infrastructure.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Rusty Russell
    Cc: Paul E. McKenney
    Cc: Srivatsa S. Bhat
    Cc: Matt Turner
    Cc: Russell King
    Cc: Mike Frysinger
    Cc: Jesper Nilsson
    Cc: Richard Kuo
    Cc: Tony Luck
    Cc: Hirokazu Takata
    Cc: Ralf Baechle
    Cc: David Howells
    Cc: James E.J. Bottomley
    Cc: Benjamin Herrenschmidt
    Cc: Martin Schwidefsky
    Cc: Paul Mundt
    Cc: David S. Miller
    Cc: Chris Metcalf
    Cc: Richard Weinberger
    Cc: x86@kernel.org
    Link: http://lkml.kernel.org/r/20120420124557.035417523@linutronix.de

    Thomas Gleixner