21 Mar, 2012

3 commits

  • Pull timer changes for v3.4 from Ingo Molnar

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
    ntp: Fix integer overflow when setting time
    math: Introduce div64_long
    cs5535-clockevt: Allow the MFGPT IRQ to be shared
    cs5535-clockevt: Don't ignore MFGPT on SMP-capable kernels
    x86/time: Eliminate unused irq0_irqs counter
    clocksource: scx200_hrt: Fix the build
    x86/tsc: Reduce the TSC sync check time for core-siblings
    timer: Fix bad idle check on irq entry
    nohz: Remove ts->inidle checks before restarting the tick
    nohz: Remove update_ts_time_stat from tick_nohz_start_idle
    clockevents: Leave the broadcast device in shutdown mode when not needed
    clocksource: Load the ACPI PM clocksource asynchronously
    clocksource: scx200_hrt: Convert scx200 to use clocksource_register_hz
    clocksource: Get rid of clocksource_calc_mult_shift()
    clocksource: dbx500: convert to clocksource_register_hz()
    clocksource: scx200_hrt: use pr_ instead of printk
    time: Move common updates to a function
    time: Reorder so the hot data is together
    time: Remove most of xtime_lock usage in timekeeping.c
    ntp: Add ntp_lock to replace xtime_locking
    ...

    Linus Torvalds
     
  • Pull scheduler changes for v3.4 from Ingo Molnar

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
    printk: Make it compile with !CONFIG_PRINTK
    sched/x86: Fix overflow in cyc2ns_offset
    sched: Fix nohz load accounting -- again!
    sched: Update yield() docs
    printk/sched: Introduce special printk_sched() for those awkward moments
    sched/nohz: Correctly initialize 'next_balance' in 'nohz' idle balancer
    sched: Cleanup cpu_active madness
    sched: Fix load-balance wreckage
    sched: Clean up parameter passing of proc_sched_autogroup_set_nice()
    sched: Ditch per cgroup task lists for load-balancing
    sched: Rename load-balancing fields
    sched: Move load-balancing arguments into helper struct
    sched/rt: Do not submit new work when PI-blocked
    sched/rt: Prevent idle task boosting
    sched/wait: Add __wake_up_all_locked() API
    sched/rt: Document scheduler related skip-resched-check sites
    sched/rt: Use schedule_preempt_disabled()
    sched/rt: Add schedule_preempt_disabled()
    sched/rt: Do not throttle when PI boosting
    sched/rt: Keep period timer ticking when rt throttling is active
    ...

    Linus Torvalds
     
  • Pull perf events changes for v3.4 from Ingo Molnar:

    - New "hardware based branch profiling" feature both on the kernel and
    the tooling side, on CPUs that support it. (modern x86 Intel CPUs
    with the 'LBR' hardware feature currently.)

    This new feature is basically a sophisticated 'magnifying glass' for
    branch execution - something that is pretty difficult to extract from
    regular, function histogram centric profiles.

    The simplest mode is activated via 'perf record -b', and the result
    looks like this in perf report:

    $ perf record -b any_call,u -e cycles:u branchy

    $ perf report -b --sort=symbol
    52.34% [.] main [.] f1
    24.04% [.] f1 [.] f3
    23.60% [.] f1 [.] f2
    0.01% [k] _IO_new_file_xsputn [k] _IO_file_overflow
    0.01% [k] _IO_vfprintf_internal [k] _IO_new_file_xsputn
    0.01% [k] _IO_vfprintf_internal [k] strchrnul
    0.01% [k] __printf [k] _IO_vfprintf_internal
    0.01% [k] main [k] __printf

    This output shows from/to branch columns and shows the highest
    percentage (from,to) jump combinations - i.e. the most likely taken
    branches in the system. "branches" can also include function calls
    and any other synchronous and asynchronous transitions of the
    instruction pointer that are not 'next instruction' - such as system
    calls, traps, interrupts, etc.

    This feature comes with (hopefully intuitive) flat ascii and TUI
    support in perf report.

    - Various 'perf annotate' visual improvements for us assembly junkies.
    It will now recognize function calls in the TUI and by hitting enter
    you can follow the call (recursively) and back, amongst other
    improvements.

    - Multiple threads/processes recording support in perf record, perf
    stat, perf top - which is activated via a comma-list of PIDs:

    perf top -p 21483,21485
    perf stat -p 21483,21485 -ddd
    perf record -p 21483,21485

    - Support for per UID views, via the --uid parameter to perf top, perf
    report, etc. For example 'perf top --uid mingo' will only show the
    tasks that I am running, excluding other users, root, etc.

    - Jump label restructurings and improvements - this includes the
    factoring out of the (hopefully much clearer) include/linux/static_key.h
    generic facility:

    struct static_key key = STATIC_KEY_INIT_FALSE;

    ...

    if (static_key_false(&key))
            do unlikely code
    else
            do likely code

    ...
    static_key_slow_inc(&key);
    ...
    static_key_slow_dec(&key);
    ...

    The static_key_false() branch will be generated into the code with as
    little impact to the likely code path as possible. The static_key_slow_*()
    APIs flip the branch via live kernel code patching (a fuller sketch of the
    pattern appears at the end of this entry).

    This facility can now be used more widely within the kernel to
    micro-optimize hot branches whose likelihood matches the static-key
    usage and fast/slow cost patterns.

    - SW function tracer improvements: perf support and filtering support.

    - Various hardenings of the perf.data ABI, to make older perf.data's
    smoother on newer tool versions, to make new features integrate more
    smoothly, to support cross-endian recording/analyzing workflows
    better, etc.

    - Restructuring of the kprobes code, the splitting out of 'optprobes',
    and a corner case bugfix.

    - Allow the tracing of kernel console output (printk).

    - Improvements/fixes to user-space RDPMC support, allowing user-space
    self-profiling code to extract PMU counts without performing any
    system calls, while playing nice with the kernel side.

    - 'perf bench' improvements

    - ... and lots of internal restructurings, cleanups and fixes that made
    these features possible. And, as usual, this list is incomplete, as there
    were also lots of other improvements.

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (120 commits)
    perf report: Fix annotate double quit issue in branch view mode
    perf report: Remove duplicate annotate choice in branch view mode
    perf/x86: Prettify pmu config literals
    perf report: Enable TUI in branch view mode
    perf report: Auto-detect branch stack sampling mode
    perf record: Add HEADER_BRANCH_STACK tag
    perf record: Provide default branch stack sampling mode option
    perf tools: Make perf able to read files from older ABIs
    perf tools: Fix ABI compatibility bug in print_event_desc()
    perf tools: Enable reading of perf.data files from different ABI rev
    perf: Add ABI reference sizes
    perf report: Add support for taken branch sampling
    perf record: Add support for sampling taken branch
    perf tools: Add code to support PERF_SAMPLE_BRANCH_STACK
    x86/kprobes: Split out optprobe related code to kprobes-opt.c
    x86/kprobes: Fix a bug which can modify kernel code permanently
    x86/kprobes: Fix instruction recovery on optimized path
    perf: Add callback to flush branch_stack on context switch
    perf: Disable PERF_SAMPLE_BRANCH_* when not supported
    perf/x86: Add LBR software filter support for Intel CPUs
    ...

    Linus Torvalds
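
    To make the static_key pattern above concrete, here is a minimal sketch of
    how a kernel-side user might wire it up. The tracing_enabled key and the
    do_tracing()/do_fast_path() helpers are illustrative placeholders, not code
    from this pull; only the static_key API calls are taken from the text above.

        #include <linux/types.h>
        #include <linux/static_key.h>

        static struct static_key tracing_enabled = STATIC_KEY_INIT_FALSE;

        static void hot_path(void)
        {
                if (static_key_false(&tracing_enabled))
                        do_tracing();           /* unlikely branch, patched in at runtime */
                else
                        do_fast_path();         /* likely branch, falls straight through */
        }

        /* A slow path (e.g. a sysctl or debugfs write) flips the branch by
         * live-patching the jump instruction: */
        static void set_tracing(bool on)
        {
                if (on)
                        static_key_slow_inc(&tracing_enabled);
                else
                        static_key_slow_dec(&tracing_enabled);
        }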
     

06 Mar, 2012

1 commit


01 Mar, 2012

2 commits

  • Create a distinction between scheduler related preempt_enable_no_resched()
    calls and the nearly one hundred other places in the kernel that do not
    want to reschedule, for one reason or another.

    This distinction matters for -rt, where the scheduler and the non-scheduler
    preempt models (and checks) are different. For upstream it's purely
    documentational.

    Signed-off-by: Thomas Gleixner
    Link: http://lkml.kernel.org/n/tip-gs88fvx2mdv5psnzxnv575ke@git.kernel.org
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Coccinelle based conversion.

    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-24swm5zut3h9c4a6s46x8rws@git.kernel.org
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
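
    The two entries above are easier to follow with the helper spelled out. A
    sketch of the open-coded pattern the Coccinelle conversion replaces, and of
    schedule_preempt_disabled() itself as described in the changelog (the exact
    in-tree implementation may differ in detail):

        /* Open-coded pattern before the conversion: */
        preempt_enable_no_resched();
        schedule();
        preempt_disable();

        /* After the conversion, call sites collapse to: */
        schedule_preempt_disabled();

        /* ...which is essentially: */
        void __sched schedule_preempt_disabled(void)
        {
                sched_preempt_enable_no_resched();      /* the scheduler-related variant */
                schedule();
                preempt_disable();
        }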
     

15 Feb, 2012

1 commit

  • idle_cpu() is called on irq entry to guess if we need to call
    tick_check_idle(). This way we can catch up with jiffies if the tick
    was stopped, stop accounting idle time during the interrupt and
    maintain the sched clock if it is unstable.

    But if we are going to exit the idle loop to schedule a new task (ie:
    if we have a task in the runqueue or a remotely enqueued ttwu to
    perform), the idle_cpu() check will return 0 such that we miss the
    call to tick_check_idle() for all interrupts happening before we
    schedule the new task.

    As a result these interrupts and the softirqs coming along may deal
    with stale jiffies values, bad sched clock values, and won't subtract
    their time from the idle time accounting.

    Fix this by using is_idle_task() instead, which strictly checks that we
    are running the idle task, without caring about the fact that we are
    going to schedule a task soon.

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: John Stultz
    Cc: Ingo Molnar
    Link: http://lkml.kernel.org/r/1327427984-23282-3-git-send-email-fweisbec@gmail.com
    Signed-off-by: Thomas Gleixner

    Frederic Weisbecker
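
    A sketch of the resulting check in irq_enter(), per the description above
    (abridged; the surrounding code and comments are approximate, not quoted
    from the patch):

        void irq_enter(void)
        {
                int cpu = smp_processor_id();

                rcu_irq_enter();
                if (is_idle_task(current) && !in_interrupt()) {
                        /* was: idle_cpu(cpu), which already reads 0 once a task
                         * is enqueued, even though we are still the idle task */
                        local_bh_disable();
                        tick_check_idle(cpu);
                        _local_bh_enable();
                }

                __irq_enter();
        }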
     

03 Feb, 2012

1 commit

    The __raise_softirq_irqoff() function contains a tracepoint. As tracepoints
    in headers can cause issues, not to mention bloat the kernel when they live
    in a static inline, it is best to move the function that contains the
    tracepoint out of the header and into softirq.c.

    Link: http://lkml.kernel.org/r/20120118120711.GB14863@elte.hu

    Suggested-by: Ingo Molnar
    Signed-off-by: Steven Rostedt

    Steven Rostedt
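
    The moved function is tiny; a sketch of what now lives out of line in
    kernel/softirq.c rather than in the header:

        void __raise_softirq_irqoff(unsigned int nr)
        {
                trace_softirq_raise(nr);
                or_softirq_pending(1UL << nr);
        }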
     

12 Dec, 2011

2 commits

    On the irq exit path, tick_nohz_irq_exit() may raise a softirq, which
    leads to the wakeup path and to select_task_rq_fair(), which uses RCU
    to iterate over the scheduler domains.

    This is an illegal use of RCU because we may be in RCU
    extended quiescent state if we interrupted an RCU-idle
    window in the idle loop:

    [ 132.978883] ===============================
    [ 132.978883] [ INFO: suspicious RCU usage. ]
    [ 132.978883] -------------------------------
    [ 132.978883] kernel/sched_fair.c:1707 suspicious rcu_dereference_check() usage!
    [ 132.978883]
    [ 132.978883] other info that might help us debug this:
    [ 132.978883]
    [ 132.978883]
    [ 132.978883] rcu_scheduler_active = 1, debug_locks = 0
    [ 132.978883] RCU used illegally from extended quiescent state!
    [ 132.978883] 2 locks held by swapper/0:
    [ 132.978883] #0: (&p->pi_lock){-.-.-.}, at: [] try_to_wake_up+0x39/0x2f0
    [ 132.978883] #1: (rcu_read_lock){.+.+..}, at: [] select_task_rq_fair+0x6a/0xec0
    [ 132.978883]
    [ 132.978883] stack backtrace:
    [ 132.978883] Pid: 0, comm: swapper Tainted: G W 3.0.0+ #178
    [ 132.978883] Call Trace:
    [ 132.978883] [] lockdep_rcu_suspicious+0xe6/0x100
    [ 132.978883] [] select_task_rq_fair+0x749/0xec0
    [ 132.978883] [] ? select_task_rq_fair+0x6a/0xec0
    [ 132.978883] [] ? do_raw_spin_lock+0x54/0x150
    [ 132.978883] [] ? trace_hardirqs_on+0xd/0x10
    [ 132.978883] [] try_to_wake_up+0xd3/0x2f0
    [ 132.978883] [] ? ktime_get+0x68/0xf0
    [ 132.978883] [] wake_up_process+0x15/0x20
    [ 132.978883] [] raise_softirq_irqoff+0x65/0x110
    [ 132.978883] [] __hrtimer_start_range_ns+0x415/0x5a0
    [ 132.978883] [] ? do_raw_spin_unlock+0x5e/0xb0
    [ 132.978883] [] hrtimer_start+0x18/0x20
    [ 132.978883] [] tick_nohz_stop_sched_tick+0x393/0x450
    [ 132.978883] [] irq_exit+0xd2/0x100
    [ 132.978883] [] do_IRQ+0x66/0xe0
    [ 132.978883] [] common_interrupt+0x13/0x13
    [ 132.978883] [] ? native_safe_halt+0xb/0x10
    [ 132.978883] [] ? trace_hardirqs_on+0xd/0x10
    [ 132.978883] [] default_idle+0xba/0x370
    [ 132.978883] [] amd_e400_idle+0x5e/0x130
    [ 132.978883] [] cpu_idle+0xb6/0x120
    [ 132.978883] [] rest_init+0xef/0x150
    [ 132.978883] [] ? rest_init+0x52/0x150
    [ 132.978883] [] start_kernel+0x3da/0x3e5
    [ 132.978883] [] x86_64_start_reservations+0x131/0x135
    [ 132.978883] [] x86_64_start_kernel+0x103/0x112

    Fix this by calling rcu_idle_enter() after tick_nohz_irq_exit().

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Frederic Weisbecker
     
  • The tick_nohz_stop_sched_tick() function, which tries to delay
    the next timer tick as long as possible, can be called from two
    places:

    - From the idle loop to start the dyntick idle mode
    - From interrupt exit if we have interrupted the dyntick
    idle mode, so that we reprogram the next tick event in
    case the irq changed some internal state that requires this
    action.

    There are only a few minor differences between the two, handled by that
    function and driven by the per-cpu ts->inidle variable and the inidle
    parameter. Together these guarantee that we only update the dyntick mode
    on irq exit if we actually interrupted the dyntick idle mode, and that we
    enter the RCU extended quiescent state from idle loop entry only.

    Split this function into:

    - tick_nohz_idle_enter(), which sets ts->inidle to 1, enters
    dynticks idle mode unconditionally if it can, and enters into RCU
    extended quiescent state.

    - tick_nohz_irq_exit() which only updates the dynticks idle mode
    when ts->inidle is set (ie: if tick_nohz_idle_enter() has been called).

    To maintain symmetry, tick_nohz_restart_sched_tick() has been renamed
    into tick_nohz_idle_exit().

    This simplifies the code and micro-optimizes the irq exit path (no need
    for local_irq_save there). It also prepares for the split between the
    dynticks and RCU extended quiescent state logic, which we'll need to
    further fix illegal uses of RCU in extended quiescent states in the idle
    loop (the resulting idle-loop calling convention is sketched at the end
    of this entry).

    Signed-off-by: Frederic Weisbecker
    Cc: Mike Frysinger
    Cc: Guan Xuetao
    Cc: David Miller
    Cc: Chris Metcalf
    Cc: Hans-Christian Egtvedt
    Cc: Ralf Baechle
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: H. Peter Anvin
    Cc: Russell King
    Cc: Paul Mackerras
    Cc: Heiko Carstens
    Cc: Paul Mundt
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Frederic Weisbecker
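
    After this split, an arch idle loop is expected to pair the calls roughly
    as sketched below. This is a sketch of the calling convention described
    above, not quoted code: arch loops differ, arch_safe_halt() stands in for
    the arch idle instruction, and whether tick_nohz_idle_enter() itself enters
    the RCU extended quiescent state depends on the follow-up split mentioned
    in the entry.

        /* arch cpu_idle() loop, sketched: */
        while (1) {
                tick_nohz_idle_enter();         /* sets ts->inidle, stops the tick if possible */
                rcu_idle_enter();
                while (!need_resched())
                        arch_safe_halt();       /* stand-in for the arch idle instruction */
                rcu_idle_exit();
                tick_nohz_idle_exit();          /* restarts the tick, accounts idle time */
                schedule();
        }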
     

31 Oct, 2011

1 commit

  • The changed files were only including linux/module.h for the
    EXPORT_SYMBOL infrastructure, and nothing else. Revector them
    onto the isolated export header for faster compile times.

    Nothing to see here but a whole lot of instances of:

    -#include <linux/module.h>
    +#include <linux/export.h>

    This commit is only changing the kernel dir; next targets
    will probably be mm, fs, the arch dirs, etc.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

21 Jul, 2011

1 commit

    The rcu_read_unlock_special() function relies on in_irq() to exclude
    scheduler activity from interrupt level. This fails because irq_exit()
    can invoke the scheduler after clearing the preempt_count() bits that
    in_irq() uses to determine that it is at interrupt level. This situation
    can result in failures as follows:

    $task                       IRQ                     SoftIRQ

    rcu_read_lock()

    /* do stuff */

    <preempt> |= UNLOCK_BLOCKED

    rcu_read_unlock()
      --t->rcu_read_lock_nesting

                                irq_enter();
                                /* do stuff, don't use RCU */
                                irq_exit();
                                  sub_preempt_count(IRQ_EXIT_OFFSET);
                                invoke_softirq()

                                                        ttwu();
                                                          spin_lock_irq(&pi->lock)
                                                          rcu_read_lock();
                                                          /* do stuff */
                                                          rcu_read_unlock();
                                                            rcu_read_unlock_special()
                                                              rcu_report_exp_rnp()
                                                                ttwu()
                                                                  spin_lock_irq(&pi->lock) /* deadlock */

      rcu_read_unlock_special(t);

    Ed can trigger this 'easily' because invoke_softirq() immediately
    does a ttwu() of ksoftirqd/# instead of doing the in-place softirq stuff
    first, but even without that the above happens.

    Cure this by also excluding softirqs from the
    rcu_read_unlock_special() handler and ensuring the force_irqthreads
    ksoftirqd/# wakeup is done from full softirq context.

    [ Alternatively, delaying the ->rcu_read_lock_nesting decrement
    until after the special handling would make the thing more robust
    in the face of interrupts as well. And there is a separate patch
    for that. ]

    Cc: Thomas Gleixner
    Reported-and-tested-by: Ed Tomlinson
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Paul E. McKenney

    Peter Zijlstra
     

15 Jun, 2011

1 commit

    Commit a26ac2455ffcf3 (rcu: move TREE_RCU from softirq to kthread)
    introduced a performance regression. In an AIM7 test, this commit degraded
    performance by about 40%.

    The commit runs rcu callbacks in a kthread instead of softirq. We observed
    a high rate of context switching caused by this. Our test system has
    64 CPUs and HZ is 1000, so we saw more than 64k context switches per
    second caused by RCU's per-CPU kthread. A trace showed that most of
    the time the RCU per-CPU kthread doesn't actually handle any callbacks,
    but instead just does a very small amount of work handling grace periods.
    This means that RCU's per-CPU kthreads are making the scheduler do quite
    a bit of work in order to allow a very small amount of RCU-related
    processing to be done.

    Alex Shi's analysis determined that this slowdown is due to lock
    contention within the scheduler. Unfortunately, as Peter Zijlstra points
    out, the scheduler's real-time semantics require global action, which
    means that this contention is inherent in real-time scheduling. (Yes,
    perhaps someone will come up with a workaround -- otherwise, -rt is not
    going to do well on large SMP systems -- but this patch will work around
    this issue in the meantime. And "the meantime" might well be forever.)

    This patch therefore re-introduces softirq processing to RCU, but only
    for core RCU work. RCU callbacks are still executed in kthread context,
    so that only a small amount of RCU work runs in softirq context in the
    common case. This should minimize ksoftirqd execution, allowing us to
    skip boosting of ksoftirqd for CONFIG_RCU_BOOST=y kernels.

    Signed-off-by: Shaohua Li
    Tested-by: "Alex,Shi"
    Signed-off-by: Paul E. McKenney

    Shaohua Li
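
    The mechanism is small: core grace-period work is handed back to softirq
    while the callbacks themselves stay in the kthread. A sketch of the helper
    this change introduces, as described above (exact in-tree code may differ):

        /* Kick the RCU core machinery in softirq context instead of waking
         * the per-CPU kthread and paying a context switch for it: */
        static void invoke_rcu_core(void)
        {
                raise_softirq(RCU_SOFTIRQ);
        }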
     

06 May, 2011

1 commit

  • If RCU priority boosting is to be meaningful, callback invocation must
    be boosted in addition to preempted RCU readers. Otherwise, in presence
    of CPU real-time threads, the grace period ends, but the callbacks don't
    get invoked. If the callbacks don't get invoked, the associated memory
    doesn't get freed, so the system is still subject to OOM.

    But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit
    moves the callback invocations to a kthread, which can be boosted easily.

    Also add comments and properly synchronize all accesses to
    rcu_cpu_kthread_task, as suggested by Lai Jiangshan.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     

31 Mar, 2011

1 commit


23 Mar, 2011

1 commit

  • ksoftirqd, kworker, migration, and pktgend kthreads can be created with
    kthread_create_on_node(), to get proper NUMA affinities for their stack and
    task_struct.

    Signed-off-by: Eric Dumazet
    Acked-by: David S. Miller
    Reviewed-by: Andi Kleen
    Acked-by: Rusty Russell
    Acked-by: Tejun Heo
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: David Howells
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
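
    For ksoftirqd this amounts to creating the thread on the node of the CPU it
    will serve; a sketch of the call (argument names approximate, taken from
    the CPU-hotplug callback context, not quoted verbatim):

        struct task_struct *p;

        p = kthread_create_on_node(run_ksoftirqd, hcpu,
                                   cpu_to_node(hotcpu),
                                   "ksoftirqd/%d", hotcpu);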
     

16 Mar, 2011

1 commit

  • * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (116 commits)
    x86: Enable forced interrupt threading support
    x86: Mark low level interrupts IRQF_NO_THREAD
    x86: Use generic show_interrupts
    x86: ioapic: Avoid redundant lookup of irq_cfg
    x86: ioapic: Use new move_irq functions
    x86: Use the proper accessors in fixup_irqs()
    x86: ioapic: Use irq_data->state
    x86: ioapic: Simplify irq chip and handler setup
    x86: Cleanup the genirq name space
    genirq: Add chip flag to force mask on suspend
    genirq: Add desc->irq_data accessor
    genirq: Add comments to Kconfig switches
    genirq: Fixup fasteoi handler for oneshot mode
    genirq: Provide forced interrupt threading
    sched: Switch wait_task_inactive to schedule_hrtimeout()
    genirq: Add IRQF_NO_THREAD
    genirq: Allow shared oneshot interrupts
    genirq: Prepare the handling of shared oneshot interrupts
    genirq: Make warning in handle_percpu_event useful
    x86: ioapic: Move trigger defines to io_apic.h
    ...

    Fix up trivial(?) conflicts in arch/x86/pci/xen.c due to genirq name
    space changes clashing with the Xen cleanups. The set_irq_msi() had
    moved to xen_bind_pirq_msi_to_irq().

    Linus Torvalds
     

26 Feb, 2011

1 commit

  • Add a commandline parameter "threadirqs" which forces all interrupts except
    those marked IRQF_NO_THREAD to run threaded. That's mostly a debug option to
    allow retrieving better debug data from crashing interrupt handlers. If
    "threadirqs" is not enabled on the kernel command line, then there is no
    impact in the interrupt hotpath.

    Architecture code needs to select CONFIG_IRQ_FORCED_THREADING after
    marking the interrupts which can't be threaded IRQF_NO_THREAD. All
    interrupts which have IRQF_TIMER set are implicitly marked
    IRQF_NO_THREAD. Also all PER_CPU interrupts are excluded.

    Forced threading hard interrupts also forces all soft interrupt
    handling into thread context.

    When enabled it might slow down things a bit, but for debugging problems in
    interrupt code it's a reasonable penalty as it does not immediately
    crash and burn the machine when an interrupt handler is buggy.

    Some test results on a Core2Duo machine:

    Cache cold run of:
    # time git grep irq_desc

                 non-threaded        threaded
    real         1m18.741s           1m19.061s
    user         0m1.874s            0m1.757s
    sys          0m5.843s            0m5.427s

    # iperf -c server
    non-threaded
    [ 3] 0.0-10.0 sec 1.09 GBytes 933 Mbits/sec
    [ 3] 0.0-10.0 sec 1.09 GBytes 934 Mbits/sec
    [ 3] 0.0-10.0 sec 1.09 GBytes 933 Mbits/sec
    threaded
    [ 3] 0.0-10.0 sec 1.09 GBytes 939 Mbits/sec
    [ 3] 0.0-10.0 sec 1.09 GBytes 934 Mbits/sec
    [ 3] 0.0-10.0 sec 1.09 GBytes 937 Mbits/sec

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    LKML-Reference:

    Thomas Gleixner
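
    Booting with threadirqs on the kernel command line enables the forced
    mode; a driver whose handler must never run in a thread opts out when
    requesting the line. A sketch (the handler, name and device pointer are
    illustrative, not from this commit):

        /* A timer-tick style interrupt that must stay in hard irq context
         * even when "threadirqs" is given: */
        ret = request_irq(irq, my_timer_isr, IRQF_TIMER | IRQF_NO_THREAD,
                          "my-timer", dev);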
     

09 Feb, 2011

1 commit

  • ksoftirqd() calls do_softirq() which switches stacks on several
    architectures. That makes no sense at all. ksoftirqd's stack is
    sufficient.

    Call __do_softirq() directly.

    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra
    Cc: Benjamin Herrenschmidt
    Cc: Heiko Carstens
    Acked-by: David Miller
    Cc: Paul Mundt
    Reviewed-by: Frank Rowand
    LKML-Reference:

    Thomas Gleixner
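
    The change itself is a single call site inside run_ksoftirqd()'s main
    loop, roughly:

        /* in run_ksoftirqd(), sketched: */
        if (local_softirq_pending())
                __do_softirq();         /* was do_softirq(), which may switch stacks */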
     

26 Jan, 2011

1 commit


14 Jan, 2011

1 commit

    An arch which needs USE_GENERIC_SMP_HELPERS has to select
    USE_GENERIC_SMP_HELPERS rather than leaving the choice to the user,
    since such arches don't provide their own implementations.

    Also, move on_each_cpu() to kernel/smp.c; it is strange to put it in
    kernel/softirq.c.

    For an arch which doesn't use USE_GENERIC_SMP_HELPERS, e.g. blackfin,
    only on_each_cpu() is compiled.

    Signed-off-by: Amerigo Wang
    Cc: David Howells
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Yinghai Lu
    Cc: Peter Zijlstra
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Amerigo Wang
     

08 Jan, 2011

1 commit

  • * 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (30 commits)
    gameport: use this_cpu_read instead of lookup
    x86: udelay: Use this_cpu_read to avoid address calculation
    x86: Use this_cpu_inc_return for nmi counter
    x86: Replace uses of current_cpu_data with this_cpu ops
    x86: Use this_cpu_ops to optimize code
    vmstat: User per cpu atomics to avoid interrupt disable / enable
    irq_work: Use per cpu atomics instead of regular atomics
    cpuops: Use cmpxchg for xchg to avoid lock semantics
    x86: this_cpu_cmpxchg and this_cpu_xchg operations
    percpu: Generic this_cpu_cmpxchg() and this_cpu_xchg support
    percpu,x86: relocate this_cpu_add_return() and friends
    connector: Use this_cpu operations
    xen: Use this_cpu_inc_return
    taskstats: Use this_cpu_ops
    random: Use this_cpu_inc_return
    fs: Use this_cpu_inc_return in buffer.c
    highmem: Use this_cpu_xx_return() operations
    vmstat: Use this_cpu_inc_return for vm statistics
    x86: Support for this_cpu_add, sub, dec, inc_return
    percpu: Generic support for this_cpu_add, sub, dec, inc_return
    ...

    Fixed up conflicts: in arch/x86/kernel/{apic/nmi.c, apic/x2apic_uv_x.c, process.c}
    as per Tejun.

    Linus Torvalds
     

07 Jan, 2011

1 commit


17 Dec, 2010

1 commit

  • __get_cpu_var() can be replaced with this_cpu_read and will then use a
    single read instruction with implied address calculation to access the
    correct per cpu instance.

    However, the address of a per cpu variable passed to __this_cpu_read()
    cannot be determined (since it's an implied address conversion through
    segment prefixes). Therefore apply this only to uses of __get_cpu_var
    where the address of the variable is not used.

    Cc: Pekka Enberg
    Cc: Hugh Dickins
    Cc: Thomas Gleixner
    Acked-by: H. Peter Anvin
    Signed-off-by: Christoph Lameter
    Signed-off-by: Tejun Heo

    Christoph Lameter
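
    The conversion pattern described above, using the per-cpu ksoftirqd task
    pointer from kernel/softirq.c as an example (the surrounding function is
    abridged; only the changed line is shown):

        /* before: */
        struct task_struct *tsk = __get_cpu_var(ksoftirqd);

        /* after: a single percpu-addressed read, no address computation */
        struct task_struct *tsk = __this_cpu_read(ksoftirqd);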
     

18 Nov, 2010

1 commit


28 Oct, 2010

1 commit

  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (50 commits)
    perf python scripting: Add futex-contention script
    perf python scripting: Fixup cut'n'paste error in sctop script
    perf scripting: Shut up 'perf record' final status
    perf record: Remove newline character from perror() argument
    perf python scripting: Support fedora 11 (audit 1.7.17)
    perf python scripting: Improve the syscalls-by-pid script
    perf python scripting: print the syscall name on sctop
    perf python scripting: Improve the syscalls-counts script
    perf python scripting: Improve the failed-syscalls-by-pid script
    kprobes: Remove redundant text_mutex lock in optimize
    x86/oprofile: Fix uninitialized variable use in debug printk
    tracing: Fix 'faild' -> 'failed' typo
    perf probe: Fix format specified for Dwarf_Off parameter
    perf trace: Fix detection of script extension
    perf trace: Use $PERF_EXEC_PATH in canned report scripts
    perf tools: Document event modifiers
    perf tools: Remove direct slang.h include
    perf_events: Fix for transaction recovery in group_sched_in()
    perf_events: Revert: Fix transaction recovery in group_sched_in()
    perf, x86: Use NUMA aware allocations for PEBS/BTS/DS allocations
    ...

    Linus Torvalds
     

24 Oct, 2010

1 commit


23 Oct, 2010

2 commits

    Andrew Morton pointed out that almost all sched_setscheduler() callers
    use fixed parameters and can be converted to static. It reduces runtime
    memory use a little (the resulting caller pattern is sketched at the end
    of this list).

    Signed-off-by: KOSAKI Motohiro
    Reported-by: Andrew Morton
    Acked-by: James Morris
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    KOSAKI Motohiro
     
  • … 'x86-quirks-for-linus', 'x86-setup-for-linus', 'x86-uv-for-linus' and 'x86-vm86-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'softirq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    softirqs: Make wakeup_softirqd static

    * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, asm: Restore parentheses around one pushl_cfi argument
    x86, asm: Fix ancient-GAS workaround
    x86, asm: Fix CFI macro invocations to deal with shortcomings in gas

    * 'x86-numa-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, numa: Assign CPUs to nodes in round-robin manner on fake NUMA

    * 'x86-quirks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: HPET force enable for CX700 / VIA Epia LT

    * 'x86-setup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, setup: Use string copy operation to optimze copy in kernel compression

    * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, UV: Use allocated buffer in tlb_uv.c:tunables_read()

    * 'x86-vm86-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, vm86: Fix preemption bug for int1 debug and int3 breakpoint handlers.

    Linus Torvalds
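
    Returning to the sched_setscheduler() entry above, the resulting caller
    pattern looks roughly like this (the priority value is illustrative):

        static const struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };

        sched_setscheduler(p, SCHED_FIFO, &param);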
     

22 Oct, 2010

1 commit

  • * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (96 commits)
    apic, x86: Use BIOS settings for IBS and MCE threshold interrupt LVT offsets
    apic, x86: Check if EILVT APIC registers are available (AMD only)
    x86: ioapic: Call free_irte only if interrupt remapping enabled
    arm: Use ARCH_IRQ_INIT_FLAGS
    genirq, ARM: Fix boot on ARM platforms
    genirq: Fix CONFIG_GENIRQ_NO_DEPRECATED=y build
    x86: Switch sparse_irq allocations to GFP_KERNEL
    genirq: Switch sparse_irq allocator to GFP_KERNEL
    genirq: Make sparse_lock a mutex
    x86: lguest: Use new irq allocator
    genirq: Remove the now unused sparse irq leftovers
    genirq: Sanitize dynamic irq handling
    genirq: Remove arch_init_chip_data()
    x86: xen: Sanitise sparse_irq handling
    x86: Use sane enumeration
    x86: uv: Clean up the direct access to irq_desc
    x86: Make io_apic.c local functions static
    genirq: Remove irq_2_iommu
    x86: Speed up the irq_remapped check in hot pathes
    intr_remap: Simplify the code further
    ...

    Fix up trivial conflicts in arch/x86/Kconfig

    Linus Torvalds
     

21 Oct, 2010

1 commit

  • With the addition of trace_softirq_raise() the softirq tracepoint got
    even more convoluted. Why the tracepoints take two pointers to assign
    an integer is beyond my comprehension.

    But adding an extra case which treats the first pointer as an unsigned
    long when the second pointer is NULL including the back and forth
    type casting is just horrible.

    Convert the softirq tracepoints to take a single unsigned int argument
    for the softirq vector number and fix the call sites.

    Signed-off-by: Thomas Gleixner
    LKML-Reference:
    Acked-by: Peter Zijlstra
    Acked-by: mathieu.desnoyers@efficios.com
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt

    Thomas Gleixner
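
    After the conversion, the call sites in __do_softirq() pass the vector
    number directly, roughly as sketched below (vec_nr is the index of the
    handler h in the softirq vector table; details approximate):

        unsigned int vec_nr = h - softirq_vec;

        trace_softirq_entry(vec_nr);
        h->action(h);
        trace_softirq_exit(vec_nr);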
     

19 Oct, 2010

3 commits

    When the CPU is idle, irq_enter() calls tick_check_idle() on the first
    interrupt to note the interruption from idle. But there is a problem if
    this call is done after __irq_enter(), as all routines in __irq_enter()
    may then see stale time because tick_check_idle() has not run yet.

    Specifically, this affects trace calls in __irq_enter() that use the
    global clock, and also the account_system_vtime() change in this patch,
    which wants to use sched_clock_cpu() to do proper irq timing.

    But tick_check_idle() was intentionally moved after __irq_enter() to
    prevent unneeded ksoftirqd wakeups, by commit ee5f80a:

    irq: call __irq_enter() before calling the tick_idle_check
    Impact: avoid spurious ksoftirqd wakeups

    Moving tick_check_idle() before __irq_enter() and wrapping it with
    local_bh_disable/enable solves both problems.

    Fixed-by: Yong Zhang
    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Venkatesh Pallipadi
     
    To account softirq time cleanly in the scheduler, we need to identify
    whether a softirq is invoked in ksoftirqd context or at the tail of a
    hardirq. Add PF_KSOFTIRQD for that purpose.

    As all PF flag bits are currently taken, create space by moving one of the
    infrequently used bits (PF_THREAD_BOUND) down in task_struct to be along
    with some other state fields.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Venkatesh Pallipadi
     
  • Peter Zijlstra found a bug in the way softirq time is accounted in
    VIRT_CPU_ACCOUNTING on this thread:

    http://lkml.indiana.edu/hypermail//linux/kernel/1009.2/01366.html

    The problem is, softirq processing uses local_bh_disable internally. There
    is no way, later in the flow, to differentiate between softirq actually
    being processed and bh merely having been disabled. So a hardirq arriving
    while bh is disabled results in time being wrongly accounted as softirq.

    Looking at the code a bit more, the problem exists in !VIRT_CPU_ACCOUNTING
    as well, since account_system_time() in normal tick-based accounting also
    uses softirq_count, which will be set even when not in softirq but with bh
    disabled.

    Peter also suggested the solution of using 2*SOFTIRQ_OFFSET as the irq
    count for local_bh_{disable,enable} and just SOFTIRQ_OFFSET while
    processing softirqs. The patch below does that and adds the API
    in_serving_softirq(), which returns whether we are currently processing a
    softirq or not (the resulting convention is sketched at the end of this
    entry).

    It also changes one of the usages of softirq_count in
    net/sched/cls_cgroup.c to in_serving_softirq.

    Looks like many usages of in_softirq really want in_serving_softirq. Those
    changes can be made individually on a case by case basis.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Venkatesh Pallipadi
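
    A sketch of the resulting preempt-count convention described above (the
    exact macro layout in include/linux/hardirq.h is assumed from the
    description, not quoted):

        /* local_bh_disable()/enable() step the softirq count by 2*SOFTIRQ_OFFSET,
         * while actual softirq processing adds SOFTIRQ_OFFSET itself, so the low
         * softirq bit distinguishes "serving a softirq" from "bh merely disabled": */
        #define SOFTIRQ_DISABLE_OFFSET  (2 * SOFTIRQ_OFFSET)
        #define in_serving_softirq()    (softirq_count() & SOFTIRQ_OFFSET)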
     

12 Oct, 2010

2 commits


22 Sep, 2010

1 commit


05 Jun, 2010

1 commit

  • The commit 80b5184cc537718122e036afe7e62d202b70d077 ("kernel/: convert cpu
    notifier to return encapsulate errno value") changed the return value of
    cpu notifier callbacks.

    Those callbacks don't return NOTIFY_BAD on failures anymore. But there
    are a few callbacks which are called directly at init time and check
    the return value.

    I forgot to change the BUG_ON() checks done by those direct callers in
    that commit.

    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
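
    The kind of fix this describes, sketched for one direct init-time caller
    in kernel/softirq.c; the exact form of the updated check is my assumption,
    not quoted from the commit:

        err = cpu_callback(&cpu_nfb, CPU_UP_PREPARE, cpu);
        BUG_ON(err != NOTIFY_OK);       /* was BUG_ON(err == NOTIFY_BAD), which can no
                                           longer trigger with errno-encoded returns */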
     

28 May, 2010

1 commit


11 May, 2010

1 commit

  • The addition of preemptible RCU to treercu resulted in a bit of
    confusion and inefficiency surrounding the handling of context switches
    for RCU-sched and for RCU-preempt. For RCU-sched, a context switch
    is a quiescent state, pure and simple, just like it always has been.
    For RCU-preempt, a context switch is in no way a quiescent state, but
    special handling is required when a task blocks in an RCU read-side
    critical section.

    However, the callout from the scheduler and the outer loop in ksoftirqd
    still call something named rcu_sched_qs(), whose name is no longer
    accurate. Furthermore, when rcu_check_callbacks() notes an RCU-sched
    quiescent state, it ends up unnecessarily (though harmlessly, aside
    from the performance hit) enqueuing the current task if it happens to
    be running in an RCU-preempt read-side critical section. This not only
    increases the maximum latency of scheduler_tick(), it also needlessly
    increases the overhead of the next outermost rcu_read_unlock() invocation.

    This patch addresses this situation by separating the notion of RCU's
    context-switch handling from that of RCU-sched's quiescent states.
    The context-switch handling is covered by rcu_note_context_switch() in
    general and by rcu_preempt_note_context_switch() for preemptible RCU.
    This permits rcu_sched_qs() to handle quiescent states and only quiescent
    states. It also reduces the maximum latency of scheduler_tick(), though
    probably by much less than a microsecond. Finally, it means that tasks
    within preemptible-RCU read-side critical sections avoid incurring the
    overhead of queuing unless there really is a context switch.

    Suggested-by: Lai Jiangshan
    Acked-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Peter Zijlstra

    Paul E. McKenney
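
    The separation is easiest to see as the pair of entry points this change
    creates; a sketch matching the description above (details approximate):

        void rcu_note_context_switch(int cpu)
        {
                rcu_sched_qs(cpu);                      /* RCU-sched: a context switch is a QS */
                rcu_preempt_note_context_switch(cpu);   /* RCU-preempt: only blocked-reader handling */
        }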