25 Sep, 2013

5 commits

  • Remove the bloat of the C calling convention from the
    preempt_enable() sites by creating an ASM wrapper that allows us to
    do an asm("call ___preempt_schedule") instead.

    calling.h bits by Andi Kleen

    Suggested-by: Linus Torvalds
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-tk7xdi1cvvxewixzke8t8le1@git.kernel.org
    [ Fixed build error. ]
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
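
    A minimal sketch of what such a wrapper call site can look like on
    x86; the symbol name comes from the commit text, while the clobber
    list and qualifiers are assumptions rather than the exact mainline
    code:

    #ifdef CONFIG_PREEMPT
    /*
     * Emit a bare CALL to an assembly thunk instead of a normal C call,
     * so the compiler does not have to treat every preempt_enable() site
     * as a full function call that clobbers the caller-saved registers.
     * The thunk itself saves/restores those registers (the calling.h
     * bits) before invoking the real C preempt_schedule().
     */
    #define __preempt_schedule() \
            asm volatile ("call ___preempt_schedule" : : : "memory")
    #endif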
     
  • Rewrite the preempt_count macros in order to extract the 3 basic
    preempt_count value modifiers:

    __preempt_count_add()
    __preempt_count_sub()

    and the new:

    __preempt_count_dec_and_test()

    And since we're at it anyway, replace the unconventional
    $op_preempt_count names with the more conventional preempt_count_$op.

    Since these basic operators are equivalent to the previous _notrace()
    variants, do away with the _notrace() versions.

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-ewbpdbupy9xpsjhg960zwbv8@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
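
    A sketch of generic versions of the three modifiers, assuming the
    count is reached through a preempt_count_ptr() helper and that the
    dec-and-test variant also folds in the need-resched check:

    static __always_inline void __preempt_count_add(int val)
    {
            *preempt_count_ptr() += val;
    }

    static __always_inline void __preempt_count_sub(int val)
    {
            *preempt_count_ptr() -= val;
    }

    /* True when the count drops to zero and a reschedule is pending. */
    static __always_inline bool __preempt_count_dec_and_test(void)
    {
            return !--*preempt_count_ptr() && tif_need_resched();
    }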
     
  • In order to prepare for per-arch implementations of preempt_count,
    move the required bits into an asm-generic header and use this for
    all archs.

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-h5j0c1r3e3fk015m30h8f1zx@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • In order to combine the preemption and need_resched test we need to
    fold the need_resched information into the preempt_count value.

    Since the NEED_RESCHED flag is set across CPUs this needs to be an
    atomic operation; however, we very much want to avoid making
    preempt_count atomic. Therefore we keep the existing TIF_NEED_RESCHED
    infrastructure in place but test it at 3 sites and fold its value into
    preempt_count, namely:

    - resched_task() when setting TIF_NEED_RESCHED on the current task
    - scheduler_ipi() when resched_task() sets TIF_NEED_RESCHED on a
      remote task, since it follows that up with a reschedule IPI and we
      can modify the CPU-local preempt_count from there
    - cpu_idle_loop() for when resched_task() found tsk_is_polling()

    We use an inverted bitmask to indicate need_resched so that a 0 means
    both need_resched and !atomic.

    Also remove the barrier() in preempt_enable() between
    preempt_enable_no_resched() and preempt_check_resched() to avoid
    having to reload the preemption value and to allow the compiler to use
    the flags of the previous decrement. I couldn't come up with any sane
    reason for this barrier() to be there, as preempt_enable_no_resched()
    already has a barrier() before doing the decrement.

    Suggested-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-7a7m5qqbn5pmwnd4wko9u6da@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
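
    A sketch of the inverted-bit scheme; the bit position and helper
    names are assumptions, not necessarily the exact mainline choices:

    /*
     * Reserve one bit of preempt_count for the inverted need-resched
     * flag. With the bit *cleared* meaning "resched needed", a
     * preempt_count of 0 means "not atomic AND resched pending", which
     * a single decrement-and-test can detect.
     */
    #define PREEMPT_NEED_RESCHED    0x80000000

    static __always_inline void set_preempt_need_resched(void)
    {
            *preempt_count_ptr() &= ~PREEMPT_NEED_RESCHED; /* clear == needed */
    }

    static __always_inline void clear_preempt_need_resched(void)
    {
            *preempt_count_ptr() |= PREEMPT_NEED_RESCHED;
    }

    static __always_inline bool should_resched(void)
    {
            return unlikely(!*preempt_count_ptr());
    }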
     
  • Replace the single preempt_count() 'function' that's an lvalue with
    two proper functions:

    preempt_count() - returns the preempt_count value as an rvalue
    preempt_count_set() - allows setting the preempt-count value

    Also provide preempt_count_ptr() as a convenience wrapper to implement
    all modifying operations.

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-orxrbycjozopqfhb4dxdkdvb@git.kernel.org
    [ Fixed build failure. ]
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
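
    A sketch of the three accessors for the generic, thread_info-based
    case; the exact field location and qualifiers are assumptions:

    static __always_inline int preempt_count(void)
    {
            return current_thread_info()->preempt_count;
    }

    /* Convenience wrapper used by all modifying operations. */
    static __always_inline volatile int *preempt_count_ptr(void)
    {
            return &current_thread_info()->preempt_count;
    }

    static __always_inline void preempt_count_set(int pc)
    {
            *preempt_count_ptr() = pc;
    }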
     

19 Jun, 2013

1 commit

  • Dave Jones hit the following bug report:

    ===============================
    [ INFO: suspicious RCU usage. ]
    3.10.0-rc2+ #1 Not tainted
    -------------------------------
    include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
    other info that might help us debug this:
    RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0
    RCU used illegally from extended quiescent state!
    2 locks held by cc1/63645:
    #0: (&rq->lock){-.-.-.}, at: [] __schedule+0xed/0x9b0
    #1: (rcu_read_lock){.+.+..}, at: [] cpuacct_charge+0x5/0x1f0

    CPU: 1 PID: 63645 Comm: cc1 Not tainted 3.10.0-rc2+ #1 [loadavg: 40.57 27.55 13.39 25/277 64369]
    Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
    0000000000000000 ffff88010f78fcf8 ffffffff816ae383 ffff88010f78fd28
    ffffffff810b698d ffff88011c092548 000000000023d073 ffff88011c092500
    0000000000000001 ffff88010f78fd60 ffffffff8109d7c5 ffffffff8109d645
    Call Trace:
    [] dump_stack+0x19/0x1b
    [] lockdep_rcu_suspicious+0xfd/0x130
    [] cpuacct_charge+0x185/0x1f0
    [] ? cpuacct_charge+0x5/0x1f0
    [] update_curr+0xec/0x240
    [] put_prev_task_fair+0x228/0x480
    [] __schedule+0x161/0x9b0
    [] preempt_schedule+0x51/0x80
    [] ? __cond_resched_softirq+0x60/0x60
    [] ? retint_careful+0x12/0x2e
    [] ftrace_ops_control_func+0x1dc/0x210
    [] ftrace_call+0x5/0x2f
    [] ? retint_careful+0xb/0x2e
    [] ? schedule_user+0x5/0x70
    [] ? schedule_user+0x5/0x70
    [] ? retint_careful+0x12/0x2e
    ------------[ cut here ]------------

    What happened was that the function tracer traced the schedule_user() code
    that tells RCU that the system is coming back from userspace and that
    the CPU should be added back to RCU monitoring.

    Because the function tracer uses preempt_disable/enable_notrace() calls,
    the preempt_enable_notrace() checks the NEED_RESCHED flag. If it is set,
    then preempt_schedule() is called. But this happens before the user_exit()
    function can inform the kernel that the CPU is no longer in user mode and
    needs to be accounted for by RCU.

    The fix is to create a new preempt_schedule_context() that checks if
    the kernel is still in user mode and, if so, switches it to kernel mode
    before calling schedule. It also switches back to user mode when coming
    back from schedule, if need be.

    The only user of this currently is preempt_enable_notrace(), which is
    only used by the tracing subsystem.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1369423420.6828.226.camel@gandalf.local.home
    Signed-off-by: Ingo Molnar

    Steven Rostedt
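
    A sketch of the shape of preempt_schedule_context(); the
    exception_enter()/exception_exit() context-tracking helpers are
    assumed here:

    asmlinkage void __sched notrace preempt_schedule_context(void)
    {
            enum ctx_state prev_ctx;

            if (likely(!preemptible()))
                    return;

            /*
             * Tell context tracking (and thus RCU) that we are back in
             * kernel mode before scheduling, and restore the previous
             * (possibly user) context afterwards.
             */
            prev_ctx = exception_enter();
            preempt_schedule();
            exception_exit(prev_ctx);
    }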
     

10 Apr, 2013

1 commit

  • In UP and non-preempt respectively, the spinlocks and preemption
    disable/enable points are stubbed out entirely, because there is no
    regular code that can ever hit the kind of concurrency they are meant to
    protect against.

    However, while there is no regular code that can cause scheduling, we
    _do_ end up having some exceptional (literally!) code that can do so,
    and that we need to make sure does not ever get moved into the critical
    region by the compiler.

    In particular, get_user() and put_user() are generally implemented as
    inline asm statements (even if the inline asm may then make a call
    instruction to call out-of-line), and can obviously cause a page fault
    and IO as a result. If that inline asm has been scheduled into the
    middle of a preemption-safe (or spinlock-protected) code region, we
    obviously lose.

    Now, admittedly this is *very* unlikely to actually ever happen, and
    we've not seen examples of actual bugs related to this. But partly
    exactly because it's so hard to trigger and the resulting bug is so
    subtle, we should be extra careful to get this right.

    So make sure that even when preemption is disabled, and we don't have to
    generate any actual *code* to explicitly tell the system that we are in
    a preemption-disabled region, we at least tell the compiler not to move
    things around the critical region.

    This patch grew out of the same discussion that caused commits
    79e5f05edcbf ("ARC: Add implicit compiler barrier to raw_local_irq*
    functions") and 3e2e0d2c222b ("tile: comment assumption about
    __insn_mtspr for ") to come about.

    Note for stable: use discretion when/if applying this. As mentioned,
    this bug may never have actually bitten anybody, and gcc may never have
    done the required code motion for it to possibly ever trigger in
    practice.

    Cc: stable@vger.kernel.org
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Signed-off-by: Linus Torvalds

    Linus Torvalds
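
    A sketch of the resulting !CONFIG_PREEMPT_COUNT stubs; the idea is
    simply to turn the former empty statements into compiler barriers
    (the exact set of macros covered is assumed):

    #ifndef CONFIG_PREEMPT_COUNT
    /*
     * No preempt_count to maintain, but still emit a compiler barrier so
     * gcc cannot move e.g. a page-faulting get_user() into or out of the
     * would-be critical region.
     */
    #define preempt_disable()               barrier()
    #define preempt_enable_no_resched()     barrier()
    #define preempt_enable()                barrier()
    #endif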
     

01 Mar, 2012

1 commit

  • Create a distinction between scheduler related preempt_enable_no_resched()
    calls and the nearly one hundred other places in the kernel that do not
    want to reschedule, for one reason or another.

    This distinction matters for -rt, where the scheduler and the non-scheduler
    preempt models (and checks) are different. For upstream it's purely a
    documentation distinction.

    Signed-off-by: Thomas Gleixner
    Link: http://lkml.kernel.org/n/tip-gs88fvx2mdv5psnzxnv575ke@git.kernel.org
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
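
    A sketch of the split; upstream the two variants expand to the same
    thing, so the distinction is purely an annotation (the helper name of
    the era is assumed):

    /* Scheduler-internal sites that really must not reschedule here. */
    #define sched_preempt_enable_no_resched() \
    do { \
            barrier(); \
            dec_preempt_count(); \
    } while (0)

    /* Everybody else; identical upstream, but distinct on -rt. */
    #define preempt_enable_no_resched()     sched_preempt_enable_no_resched()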
     

10 Jun, 2011

1 commit

  • Create a new CONFIG_PREEMPT_COUNT that handles the inc/dec
    of the preempt count offset independently, so that the offset
    can be updated by preempt_disable() and preempt_enable()
    even without CONFIG_PREEMPT being set.

    This prepares for making CONFIG_DEBUG_SPINLOCK_SLEEP work
    with !CONFIG_PREEMPT, where it currently doesn't detect
    code that sleeps inside explicitly preemption-disabled
    sections.

    Signed-off-by: Frederic Weisbecker
    Acked-by: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Peter Zijlstra

    Frederic Weisbecker
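
    A sketch of the resulting structure in preempt.h, keyed on the new
    symbol rather than on CONFIG_PREEMPT (helper names of the era are
    assumed):

    #ifdef CONFIG_PREEMPT_COUNT
    /* The count is maintained even if kernel preemption is disabled. */
    #define preempt_disable() \
    do { \
            inc_preempt_count(); \
            barrier(); \
    } while (0)

    #define preempt_enable_no_resched() \
    do { \
            barrier(); \
            dec_preempt_count(); \
    } while (0)
    #else
    #define preempt_disable()               do { } while (0)
    #define preempt_enable_no_resched()     do { } while (0)
    #endif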
     

02 Dec, 2009

1 commit

  • 498657a478c60be092208422fefa9c7b248729c2 incorrectly assumed
    that preempt wasn't disabled around context_switch() and thus
    was fixing an imaginary problem. It also broke KVM, because it
    depended on ->sched_in() being called with irqs enabled so that
    it could do smp calls from there.

    Revert the incorrect commit and add a comment describing the
    different contexts under which the two callbacks are invoked.

    Avi spotted transposed in/out in the added comment.

    Signed-off-by: Tejun Heo
    Acked-by: Avi Kivity
    Cc: peterz@infradead.org
    Cc: efault@gmx.de
    Cc: rusty@rustcorp.com.au
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Tejun Heo
     

24 May, 2008

2 commits

  • Add preempt off timings. A lot of kernel core code is taken from the RT patch
    latency trace that was written by Ingo Molnar.

    This adds "preemptoff" and "preemptirqsoff" to /debugfs/tracing/available_tracers

    Now instead of just tracing irqs off, preemption off can be selected
    to be recorded.

    When this is selected, it shares the same files as the irqs off
    timings. One can trace the time preemption is off, the time irqs are
    off, or the total time either of the two is off.

    By echoing "preemptoff" into /debugfs/tracing/current_tracer, recording
    of preempt off only is performed. "irqsoff" will only record the time
    irqs are disabled, but "preemptirqsoff" will take the total time irqs
    or preemption are disabled. Runtime switching of these options is now
    supported by simply echoing the appropriate trace name into
    /debugfs/tracing/current_tracer.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Steven Rostedt
     
  • The tracer may need to call the preempt_enable and disable functions
    for timekeeping and such. The trace gets ugly when we see these
    functions show up in all traces. To make the output cleaner,
    this patch adds preempt_enable_notrace and preempt_disable_notrace
    to be used by tracer (and debugging) functions.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Steven Rostedt
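
    A sketch of the _notrace variants; the inc/dec helpers are assumed to
    be versions that the function tracer does not hook:

    #define preempt_disable_notrace() \
    do { \
            inc_preempt_count_notrace(); \
            barrier(); \
    } while (0)

    #define preempt_enable_notrace() \
    do { \
            barrier(); \
            dec_preempt_count_notrace(); \
            preempt_check_resched(); \
    } while (0)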
     

26 Jul, 2007

1 commit

  • This adds a general mechanism whereby a task can request the scheduler to
    notify it whenever it is preempted or scheduled back in. This allows the
    task to swap any special-purpose registers like the fpu or Intel's VT
    registers.

    Signed-off-by: Avi Kivity
    [ mingo@elte.hu: fixes, cleanups ]
    Signed-off-by: Ingo Molnar

    Avi Kivity
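
    A sketch of the notifier interface this introduces; the field and
    function names follow the obvious pattern but should be read as an
    approximation, not the exact header:

    struct preempt_notifier;

    struct preempt_ops {
            /* called when the registering task is scheduled back in */
            void (*sched_in)(struct preempt_notifier *notifier, int cpu);
            /* called when the registering task is preempted/scheduled out */
            void (*sched_out)(struct preempt_notifier *notifier,
                              struct task_struct *next);
    };

    struct preempt_notifier {
            struct hlist_node link;
            struct preempt_ops *ops;
    };

    void preempt_notifier_register(struct preempt_notifier *notifier);
    void preempt_notifier_unregister(struct preempt_notifier *notifier);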
     

23 Dec, 2005

1 commit

  • Currently a simple

    void foo(void) { preempt_enable(); }

    produces the following code on ARM:

    foo:
    bic r3, sp, #8128
    bic r3, r3, #63
    ldr r2, [r3, #4]
    ldr r1, [r3, #0]
    sub r2, r2, #1
    tst r1, #4
    str r2, [r3, #4]
    blne preempt_schedule
    mov pc, lr

    The problem is that the TIF_NEED_RESCHED flag is loaded _before_ the
    preemption count is stored back, hence any interrupt coming within that
    3 instruction window causing TIF_NEED_RESCHED to be set won't be
    seen and scheduling won't happen as it should.

    Nothing currently prevents gcc from performing that reordering. There
    is already a barrier() before the decrement of the preemption count, but
    another one is needed between this and the TIF_NEED_RESCHED flag test
    for proper code ordering.

    Signed-off-by: Nicolas Pitre
    Acked-by: Nick Piggin
    Signed-off-by: Linus Torvalds

    Nicolas Pitre
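
    A sketch of the fix: a second barrier() between the count decrement
    and the flag test, so the compiler cannot hoist the TIF_NEED_RESCHED
    load above the store of the new count:

    #define preempt_enable() \
    do { \
            preempt_enable_no_resched(); \
            barrier();              /* order count store vs flag load */ \
            preempt_check_resched(); \
    } while (0)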
     

14 Nov, 2005

1 commit

  • a) in smp_lock.h #include of sched.h and spinlock.h moved under #ifdef
    CONFIG_LOCK_KERNEL.

    b) interrupt.h now explicitly pulls sched.h (not via smp_lock.h from
    hardirq.h as it used to)

    c) in three more places we need changes to compensate for (a) - one place
    in arch/sparc needs string.h now, hardirq.h needs a forward declaration of
    task_struct, and preempt.h needs a direct include of thread_info.h.

    d) thread_info-related helpers in sched.h and thread_info.h put under
    ifndef __HAVE_THREAD_FUNCTIONS. Obviously safe.

    Signed-off-by: Al Viro
    Signed-off-by: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds