10 Sep, 2010

9 commits

  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    tracing: t_start: reset FTRACE_ITER_HASH in case of seek/pread
    perf symbols: Fix multiple initialization of symbol system
    perf: Fix CPU hotplug
    perf, trace: Fix module leak
    tracing/kprobe: Fix handling of C-unlike argument names
    tracing/kprobes: Fix handling of argument names
    perf probe: Fix handling of argument names
    perf probe: Fix return probe support
    tracing/kprobe: Fix a memory leak in error case
    tracing: Do not allow llseek to set_ftrace_filter

    Linus Torvalds
     
  • Be sure to avoid entering t_show() with FTRACE_ITER_HASH set without
    having properly started the iterator to iterate the hash. This case
    is degenerate and, as discovered by Robert Swiecki, can cause
    t_hash_show() to misuse a pointer, resulting in a NULL pointer
    dereference with possible security implications. Tracked as
    CVE-2010-3079.
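
    The core of the fix is a single flag reset at the top of the
    iterator start path; a simplified sketch (not the verbatim diff,
    details elided):

    static void *t_start(struct seq_file *m, loff_t *pos)
    {
            struct ftrace_iterator *iter = m->private;

            /*
             * A seek/pread can land here with FTRACE_ITER_HASH still
             * set from an earlier read; clear it so t_show() never
             * sees the flag without the hash walk having been started.
             */
            iter->flags &= ~FTRACE_ITER_HASH;
            ...
    }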

    Cc: Robert Swiecki
    Cc: Eugene Teo
    Cc:
    Signed-off-by: Chris Wright
    Signed-off-by: Steven Rostedt

    Chris Wright
     
  • Please revert 2.6.36-rc commit d2997b1042ec150616c1963b5e5e919ffd0b0ebf
    "hibernation: freeze swap at hibernation". It complicated matters by
    adding a second swap allocation path, just for hibernation; without in any
    way fixing the issue that it was intended to address - page reclaim after
    fixing the hibernation image might free swap from a page already imaged as
    swapcache, letting its swap be reallocated to store a different page of
    the image: resulting in data corruption if the imaged page were freed as
    clean then swapped back in. Pages freed to si->swap_map were still in
    danger of being reallocated by the alternative allocation path.

    I guess it inadvertently fixed slow SSD swap allocation for hibernation,
    as reported by Nigel Cunningham: by missing out the discards that occur on
    the usual swap allocation path; but that was unintentional, and needs a
    separate fix.

    Signed-off-by: Hugh Dickins
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: "Rafael J. Wysocki"
    Cc: Ondrej Zary
    Cc: Andrea Gelmini
    Cc: Balbir Singh
    Cc: Andrea Arcangeli
    Cc: Nigel Cunningham
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • gid_t is an unsigned int. If group_info contains a gid greater than
    INT_MAX, the groups_search() function may look on the wrong side of
    the search tree.

    This solves some unfair "permission denied" problems.
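
    The underlying pitfall is easy to reproduce in plain C: comparing
    two unsigned values via a signed difference goes wrong once the gids
    get large. A minimal stand-alone illustration (not the kernel code
    itself; values chosen to trigger the wrap):

    #include <stdio.h>

    int main(void)
    {
            unsigned int grp = 0x90000000u;  /* a gid above INT_MAX */
            unsigned int mid = 5u;

            /* broken: the subtraction wraps and the cast turns a large
             * positive difference into a negative number */
            int cmp = (int)(grp - mid);
            printf("signed diff:    %s\n",
                   cmp > 0 ? "grp > mid" : "grp <= mid");

            /* correct: compare the unsigned values directly */
            printf("direct compare: %s\n",
                   grp > mid ? "grp > mid" : "grp <= mid");
            return 0;
    }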

    Signed-off-by: Jerome Marchand
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jerome Marchand
     
  • Add cgroup_attach_task_all()

    The existing cgroup_attach_task_current_cg() API is called by a thread to
    attach another thread to all of its cgroups; this is unsuitable for cases
    where a privileged task wants to attach itself to the cgroups of a less
    privileged one, since the call must be made from the context of the target
    task.

    This patch adds a more generic cgroup_attach_task_all() API that allows
    both the source task and to-be-moved task to be specified.
    cgroup_attach_task_current_cg() becomes a specialization of the more
    generic new function.
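
    From the description, the new entry point looks roughly like this (a
    sketch, not a verbatim copy of the patch):

    /* attach @tsk to all cgroups that @from belongs to */
    int cgroup_attach_task_all(struct task_struct *from,
                               struct task_struct *tsk)
    {
            struct cgroupfs_root *root;
            int retval = 0;

            cgroup_lock();
            for_each_active_root(root) {
                    struct cgroup *cg = task_cgroup_from_root(from, root);

                    retval = cgroup_attach_task(cg, tsk);
                    if (retval)
                            break;
            }
            cgroup_unlock();
            return retval;
    }

    /* the old API becomes a thin wrapper */
    int cgroup_attach_task_current_cg(struct task_struct *tsk)
    {
            return cgroup_attach_task_all(current, tsk);
    }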

    [menage@google.com: rewrote changelog]
    [akpm@linux-foundation.org: address reviewer comments]
    Signed-off-by: Michael S. Tsirkin
    Tested-by: Alex Williamson
    Acked-by: Paul Menage
    Cc: Li Zefan
    Cc: Ben Blum
    Cc: Sridhar Samudrala
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael S. Tsirkin
     
  • The gcov-kernel infrastructure expects that each object file is loaded
    only once. This may not be true, e.g. when loading multiple kernel
    modules which are linked to the same object file. As a result, loading
    such kernel modules will result in incorrect gcov results while unloading
    will cause a null-pointer dereference.

    This patch fixes these problems by changing the gcov-kernel infrastructure
    so that multiple profiling data sets can be associated with one debugfs
    entry. It applies to 2.6.36-rc1.
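
    Conceptually, each debugfs node goes from referencing a single
    gcov_info to referencing every loaded instance of the object file; a
    rough sketch of the data-structure change (field names illustrative,
    not necessarily the upstream ones):

    struct gcov_node {
            ...
            /* was: struct gcov_info *info; -- one data set only */
            struct gcov_info **loaded_info;  /* one per loaded instance */
            unsigned int num_loaded;
            ...
    };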

    Signed-off-by: Peter Oberparleiter
    Reported-by: Werner Spies
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Oberparleiter
     
  • Currently sched_avg_update() (which updates the rt_avg stats in the
    rq) is called from scale_rt_power() (in the load-balance context),
    which doesn't take rq->lock.

    Fix it by moving the sched_avg_update() call to the more appropriate
    update_cpu_load(), where the CFS load gets updated as well.
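
    The move is mechanical; in sketch form (not the verbatim diff):

    static void update_cpu_load(struct rq *this_rq)
    {
            /* runs with this_rq->lock held */
            ...
            /*
             * Update rt_avg here, under the lock, instead of from
             * scale_rt_power() in the lockless load-balance path.
             */
            sched_avg_update(this_rq);
    }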

    Signed-off-by: Suresh Siddha
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Suresh Siddha
     
  • Since we have UP_PREPARE, we should also have UP_CANCELED.
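
    In hotplug-notifier terms: whatever CPU_UP_PREPARE sets up must be
    torn down again if the bring-up is aborted. A sketch of the shape
    (simplified):

    static int perf_cpu_notify(struct notifier_block *self,
                               unsigned long action, void *hcpu)
    {
            unsigned int cpu = (long)hcpu;

            switch (action & ~CPU_TASKS_FROZEN) {
            case CPU_UP_PREPARE:
                    perf_event_init_cpu(cpu);
                    break;
            case CPU_UP_CANCELED:  /* the missing case: undo UP_PREPARE */
            case CPU_DOWN_PREPARE:
                    perf_event_exit_cpu(cpu);
                    break;
            }
            return NOTIFY_OK;
    }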

    Signed-off-by: Peter Zijlstra
    Cc: paulus
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Commit 1c024eca (perf, trace: Optimize tracepoints by using
    per-tracepoint-per-cpu hlist to track events) caused a module
    refcount leak.

    Reported-And-Tested-by: Avi Kivity
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
     

09 Sep, 2010

3 commits

  • …/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    gcc-4.6: kernel/*: Fix unused but set warnings
    mutex: Fix annotations to include it in kernel-locking docbook
    pid: make setpgid() system call use RCU read-side critical section
    MAINTAINERS: Add RCU's public git tree

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf, x86: Try to handle unknown nmis with an enabled PMU
    perf, x86: Fix handle_irq return values
    perf, x86: Fix accidentally ack'ing a second event on intel perf counter
    oprofile, x86: fix init_sysfs() function stub
    lockup_detector: Sync touch_*_watchdog back to old semantics
    tracing: Fix a race in function profile
    oprofile, x86: fix init_sysfs error handling
    perf_events: Fix time tracking for events with pid != -1 and cpu != -1
    perf: Initialize callchain roots' children hits
    oprofile: fix crash when accessing freed task structs

    Linus Torvalds
     
  • Reading the file set_ftrace_filter does three things.

    1) shows whether or not filters are set for the function tracer
    2) shows what functions are set for the function tracer
    3) shows what triggers are set on any functions

    Item 3 is independent of items 1 and 2.

    This file currently works as a state machine and, as you read it, it
    may change state. But this assumption breaks when you use lseek() on
    the file: the state machine gets out of sync and t_show() may use
    the wrong pointer, causing a kernel oops.

    Luckily, this will only kill the app that does the lseek, but the app
    dies while holding a mutex. This prevents anyone else from using the
    set_ftrace_filter file (or any other function tracing file for that matter).

    A real fix for this is to rewrite the code, but that is too much for
    a -rc release or stable. This patch simply disables llseek on the
    set_ftrace_filter file for now; the proper fix can go into the next
    major release.
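
    The stopgap amounts to one line in the file_operations; roughly:

    static const struct file_operations ftrace_filter_fops = {
            .open    = ftrace_filter_open,
            .read    = seq_read,
            .write   = ftrace_filter_write,
            .llseek  = no_llseek,           /* was ftrace_regex_lseek */
            .release = ftrace_filter_release,
    };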

    Reported-by: Robert Swiecki
    Cc: Chris Wright
    Cc: Tavis Ormandy
    Cc: Eugene Teo
    Cc: vendor-sec@lst.de
    Cc:
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

08 Sep, 2010

4 commits

  • Check whether the argument name is invalid (i.e., not a C-like
    symbol name). This keeps the event format simple.
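
    Such a validator is a few lines of C; a sketch (close in spirit to
    the upstream helper):

    /* accept only [A-Za-z_][A-Za-z0-9_]* as an argument name */
    static int is_good_name(const char *name)
    {
            if (!isalpha(*name) && *name != '_')
                    return 0;
            while (*++name != '\0') {
                    if (!isalpha(*name) && !isdigit(*name) && *name != '_')
                            return 0;
            }
            return 1;
    }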

    Reported-by: Srikar Dronamraju
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mathieu Desnoyers
    LKML-Reference:
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Arnaldo Carvalho de Melo

    Masami Hiramatsu
     
  • Set "argN" name for each argument automatically if it has no specified name.
    Since dynamic trace event(kprobe_events) accepts special characters for its
    argument, its format can show those special characters (e.g. '$', '%', '+').
    However, perf can't parse those format because of the character (especially
    '%') mess up the format. This sets "argX" name for those arguments if user
    omitted the argument names.

    E.g.
    # echo 'p do_fork %ax IP=%ip $stack' > tracing/kprobe_events
    # cat tracing/kprobe_events
    p:kprobes/p_do_fork_0 do_fork arg1=%ax IP=%ip arg3=$stack

    Reported-by: Srikar Dronamraju
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mathieu Desnoyers
    LKML-Reference:
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Arnaldo Carvalho de Melo

    Masami Hiramatsu
     
  • Fix a memory leak which happens when a field name conflicts with
    another. In the error case, free_trace_probe() frees all arguments
    up to nr_args, so this increments nr_args at the beginning of the
    loop instead of at the end.
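
    The pattern generalizes: when an error path frees "everything up to
    count", the count must already cover the element being initialized,
    or a half-initialized element leaks. A stand-alone illustration of
    the same idiom (hypothetical names):

    #include <stdlib.h>
    #include <string.h>

    struct probe { int nr_args; char *args[8]; };

    static void free_probe(struct probe *p)
    {
            /* mirrors free_trace_probe(): frees all slots up to nr_args */
            for (int i = 0; i < p->nr_args; i++)
                    free(p->args[i]);       /* free(NULL) is a no-op */
    }

    static int parse_args(struct probe *p, char **names, int n)
    {
            for (int i = 0; i < n; i++) {
                    p->nr_args++;           /* count first ... */
                    p->args[i] = strdup(names[i]);
                    if (!p->args[i])
                            return -1;      /* ... so cleanup sees this slot */
            }
            return 0;
    }

    int main(void)
    {
            char *names[] = { "arg1", "arg2" };
            struct probe p = { 0 };

            if (parse_args(&p, names, 2) < 0)
                    ;       /* caller still cleans up the partial probe */
            free_probe(&p);
            return 0;
    }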

    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mathieu Desnoyers
    LKML-Reference:
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Arnaldo Carvalho de Melo

    Masami Hiramatsu
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: use zalloc_cpumask_var() for gcwq->mayday_mask
    workqueue: fix GCWQ_DISASSOCIATED initialization
    workqueue: Add a workqueue chapter to the tracepoint docbook
    workqueue: fix cwq->nr_active underflow
    workqueue: improve destroy_workqueue() debuggability
    workqueue: mark lock acquisition on worker_maybe_bind_and_lock()
    workqueue: annotate lock context change
    workqueue: free rescuer on destroy_workqueue

    Linus Torvalds
     

01 Sep, 2010

3 commits

  • During my rewrite, the semantics of touch_nmi_watchdog and
    touch_softlockup_watchdog changed enough to break some drivers
    (mostly over preemptable regions).

    These are cases where long delays on one CPU (due to
    print_delay for example) can cause long delays on other
    CPUs - so we must 'touch' the nmi_watchdog flag of those
    other CPUs as well.

    This change brings those touch_*_watchdog() functions back in line
    with how they used to work.
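
    In sketch form, a touch now reaches every CPU's flag rather than
    only the local one (per-cpu variable names illustrative):

    void touch_nmi_watchdog(void)
    {
            unsigned int cpu;

            /*
             * A long delay on this CPU can stall the others too, so
             * clear the pending-warning state for all of them.
             */
            for_each_present_cpu(cpu)
                    per_cpu(watchdog_nmi_touch, cpu) = true;

            touch_softlockup_watchdog();
    }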

    Signed-off-by: Don Zickus
    Acked-by: Cyrill Gorcunov
    Cc: peterz@infradead.org
    Cc: fweisbec@gmail.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Don Zickus
     
  • [ 23.584719]
    [ 23.584720] ===================================================
    [ 23.585059] [ INFO: suspicious rcu_dereference_check() usage. ]
    [ 23.585176] ---------------------------------------------------
    [ 23.585176] kernel/pid.c:419 invoked rcu_dereference_check() without protection!
    [ 23.585176]
    [ 23.585176] other info that might help us debug this:
    [ 23.585176]
    [ 23.585176]
    [ 23.585176] rcu_scheduler_active = 1, debug_locks = 1
    [ 23.585176] 1 lock held by rc.sysinit/728:
    [ 23.585176] #0: (tasklist_lock){.+.+..}, at: [] sys_setpgid+0x5f/0x193
    [ 23.585176]
    [ 23.585176] stack backtrace:
    [ 23.585176] Pid: 728, comm: rc.sysinit Not tainted 2.6.36-rc2 #2
    [ 23.585176] Call Trace:
    [ 23.585176] [] lockdep_rcu_dereference+0x99/0xa2
    [ 23.585176] [] find_task_by_pid_ns+0x50/0x6a
    [ 23.585176] [] find_task_by_vpid+0x1d/0x1f
    [ 23.585176] [] sys_setpgid+0x67/0x193
    [ 23.585176] [] system_call_fastpath+0x16/0x1b
    [ 24.959669] type=1400 audit(1282938522.956:4): avc: denied { module_request } for pid=766 comm="hwclock" kmod="char-major-10-135" scontext=system_u:system_r:hwclock_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclas

    It turns out that the setpgid() system call fails to enter an RCU
    read-side critical section before doing a PID-to-task_struct translation.
    This commit therefore does rcu_read_lock() before the translation, and
    also does rcu_read_unlock() after the last use of the returned pointer.
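
    The shape of the fix, with most of the syscall body elided (a
    sketch):

    SYSCALL_DEFINE2(setpgid, pid_t, pid, pid_t, pgid)
    {
            struct task_struct *p;
            int err;
            ...
            rcu_read_lock();        /* protects the PID -> task lookup */
            p = find_task_by_vpid(pid);
            ...
            /* last use of p happens before the unlock */
            rcu_read_unlock();
            return err;
    }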

    Reported-by: Andrew Morton
    Signed-off-by: Paul E. McKenney
    Acked-by: David Howells

    Paul E. McKenney
     
  • If someone disables function_profile while we are reading
    trace_stat/functionX, we can trigger this:

    divide error: 0000 [#1] PREEMPT SMP
    ...
    EIP is at function_stat_show+0x90/0x230
    ...

    This fix just takes the ftrace_profile_lock and checks if
    rec->counter is 0. If it's 0, we know the profile buffer
    has been reset.
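
    Roughly, the guarded section in function_stat_show() becomes (a
    sketch of the relevant hunk):

            mutex_lock(&ftrace_profile_lock);

            /* we raced with function_profile_reset() */
            if (unlikely(rec->counter == 0)) {
                    ret = -EBUSY;
                    goto out;
            }

            avg = rec->time;
            do_div(avg, rec->counter);      /* safe: counter != 0 */
            ...
    out:
            mutex_unlock(&ftrace_profile_lock);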

    Signed-off-by: Li Zefan
    Cc: stable@kernel.org
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Li Zefan
     

31 Aug, 2010

2 commits

  • alloc_mayday_mask() was using alloc_cpumask_var(), leaving
    gcwq->mayday_mask containing garbage after initialization on
    CONFIG_CPUMASK_OFFSTACK=y configurations. Combined with the
    previously fixed GCWQ_DISASSOCIATED initialization bug, this could
    make rescuers fall into an infinite loop trying to bind to an
    offline cpu.
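
    Conceptually the fix is a one-word switch to the zeroing allocator,
    so that an offstack mask starts out all-clear:

    /* before: with CONFIG_CPUMASK_OFFSTACK=y the bits are uninitialized */
    alloc_cpumask_var(&gcwq->mayday_mask, GFP_KERNEL);

    /* after: the allocated mask is guaranteed to be zeroed */
    zalloc_cpumask_var(&gcwq->mayday_mask, GFP_KERNEL);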

    Signed-off-by: Tejun Heo
    Reported-by: CAI Qian

    Tejun Heo
     
  • init_workqueues() incorrectly marks the gcwqs of all possible CPUs
    as associated. Combined with the mayday_mask initialization bug,
    this can make rescuers keep trying to bind to an offline gcwq
    indefinitely. Fix init_workqueues() so that GCWQ_DISASSOCIATED is
    cleared only on the gcwqs of online CPUs.

    Signed-off-by: Tejun Heo
    Reported-by: CAI Qian

    Tejun Heo
     

30 Aug, 2010

1 commit

  • Per-thread events with a cpu filter, i.e., cpu != -1, were not
    reporting correct timings when the thread never ran on the
    monitored cpu. The time enabled was reported as a negative
    value.

    This patch fixes the problem by updating tstamp_stopped,
    tstamp_running in event_sched_out() for events with filters and
    which are marked as INACTIVE.

    The function group_sched_out() is modified to systematically call
    into event_sched_out(), avoiding duplication of the timing
    adjustment code.

    With the patch, I now get:

    $ task_cpu -i -e unhalted_core_cycles,unhalted_core_cycles
    noploop 2 noploop for 2 seconds
    CPU0 0 unhalted_core_cycles (ena=1,991,136,594, run=0)
    CPU0 0 unhalted_core_cycles (ena=1,991,136,594, run=0)

    CPU1 0 unhalted_core_cycles (ena=1,991,136,594, run=0)
    CPU1 0 unhalted_core_cycles (ena=1,991,136,594, run=0)

    CPU2 0 unhalted_core_cycles (ena=1,991,136,594, run=0)
    CPU2 0 unhalted_core_cycles (ena=1,991,136,594, run=0)

    CPU3 4,747,990,931 unhalted_core_cycles (ena=1,991,136,594, run=1,991,136,594)
    CPU3 4,747,990,931 unhalted_core_cycles (ena=1,991,136,594, run=1,991,136,594)

    Signed-off-by: Stephane Eranian
    Acked-by: Peter Zijlstra
    Cc: paulus@samba.org
    Cc: davem@davemloft.net
    Cc: fweisbec@gmail.com
    Cc: perfmon2-devel@lists.sf.net
    Cc: eranian@google.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     

25 Aug, 2010

7 commits

  • …l/git/tip/linux-2.6-tip

    * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep states
    sched: Fix rq->clock synchronization when migrating tasks

    Linus Torvalds
     
  • save_stack_trace() stores the instruction pointer, not the
    function descriptor. On ppc64 the trace stack code currently
    dereferences the instruction pointer and shows 8 bytes of
    instructions in our backtraces:

    # cat /sys/kernel/debug/tracing/stack_trace
    Depth Size Location (26 entries)
    ----- ---- --------
    0) 5424 112 0x6000000048000004
    1) 5312 160 0x60000000ebad01b0
    2) 5152 160 0x2c23000041c20030
    3) 4992 240 0x600000007c781b79
    4) 4752 160 0xe84100284800000c
    5) 4592 192 0x600000002fa30000
    6) 4400 256 0x7f1800347b7407e0
    7) 4144 208 0xe89f0108f87f0070
    8) 3936 272 0xe84100282fa30000

    Since we aren't dealing with function descriptors, use %pS
    instead of %pF to fix it:

    # cat /sys/kernel/debug/tracing/stack_trace
    Depth Size Location (26 entries)
    ----- ---- --------
    0) 5424 112 ftrace_call+0x4/0x8
    1) 5312 160 .current_io_context+0x28/0x74
    2) 5152 160 .get_io_context+0x48/0xa0
    3) 4992 240 .cfq_set_request+0x94/0x4c4
    4) 4752 160 .elv_set_request+0x60/0x84
    5) 4592 192 .get_request+0x2d4/0x468
    6) 4400 256 .get_request_wait+0x7c/0x258
    7) 4144 208 .__make_request+0x49c/0x610
    8) 3936 272 .generic_make_request+0x390/0x434

    Signed-off-by: Anton Blanchard
    Cc: rostedt@goodmis.org
    Cc: fweisbec@gmail.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Anton Blanchard
     
  • cwq->nr_active is used to keep track of how many work items are active
    for the cpu workqueue, where 'active' is defined as either pending on
    global worklist or executing. This is used to implement the
    max_active limit and workqueue freezing. If a work item is queued
    after nr_active has already reached max_active, the work item doesn't
    increment nr_active and is put on the delayed queue and gets activated
    later as previous active work items retire.

    try_to_grab_pending(), which is used in the cancellation path,
    unconditionally decremented nr_active whether the work item being
    cancelled was currently active or delayed, so cancelling a delayed
    work item made nr_active underflow. This breaks max_active
    enforcement and triggers a BUG_ON() in destroy_workqueue() later on.

    This patch fixes the bug by adding a flag, WORK_STRUCT_DELAYED,
    which is set while a work item is on the delayed list, and by making
    try_to_grab_pending() decrement nr_active only if the work item is
    currently active.

    The addition of the flag enlarges the cwq alignment to 256 bytes,
    which is getting a bit too large. It's scheduled to be reduced back
    to 128 bytes by merging WORK_STRUCT_PENDING and WORK_STRUCT_CWQ in
    the next devel cycle.
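
    Simplified, the cancellation path then decides like this (a sketch,
    not the verbatim diff):

    /* in try_to_grab_pending(), after stealing the pending work item */
    if (!(*work_data_bits(work) & WORK_STRUCT_DELAYED))
            cwq->nr_active--;       /* only active items were counted */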

    Signed-off-by: Tejun Heo
    Reported-by: Johannes Berg

    Tejun Heo
     
  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    watchdog: Don't throttle the watchdog
    tracing: Fix timer tracing

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    mutex: Improve the scalability of optimistic spinning

    Linus Torvalds
     
  • sparse spotted that the kzalloc() in pm_qos_power_open() in Linus'
    current git tree had its parameters swapped. Fix this.
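
    For reference, kzalloc() takes the size first and the GFP flags
    second; the broken and fixed calls look like this (variable name
    illustrative):

    /* wrong: arguments swapped */
    req = kzalloc(GFP_KERNEL, sizeof(*req));

    /* right: kzalloc(size, gfp_flags) */
    req = kzalloc(sizeof(*req), GFP_KERNEL);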

    Signed-off-by: David Alan Gilbert
    Acked-by: mark gross
    Signed-off-by: Rafael J. Wysocki

    David Alan Gilbert
     
  • Now that the worklist is global, having works pending after wq
    destruction can easily lead to an oops, and destroy_workqueue() has
    several BUG_ON()s to catch these cases. Unfortunately, BUG_ON()
    doesn't tell much about how the work became pending after the final
    flush_workqueue().

    This patch adds WQ_DYING which is set before the final flush begins.
    If a work is requested to be queued on a dying workqueue,
    WARN_ON_ONCE() is triggered and the request is ignored. This clearly
    indicates which caller is trying to queue a work on a dying workqueue
    and keeps the system working in most cases.

    The locking rule comment is updated so that the 'I' rule includes
    modifying the field from the destruction path.
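
    The mechanism in sketch form (condensed from the description):

    /* destruction path: mark the workqueue before the final flush */
    void destroy_workqueue(struct workqueue_struct *wq)
    {
            wq->flags |= WQ_DYING;
            flush_workqueue(wq);
            ...
    }

    /* queueing path: complain once, ignore the request */
    static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
                             struct work_struct *work)
    {
            if (WARN_ON_ONCE(wq->flags & WQ_DYING))
                    return;
            ...
    }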

    Signed-off-by: Tejun Heo

    Tejun Heo
     

23 Aug, 2010

4 commits

  • worker_maybe_bind_and_lock() actually grabs gcwq->lock but was
    missing the proper annotation. Add it; this removes the following
    sparse warnings:

    kernel/workqueue.c:1214:13: warning: context imbalance in 'worker_maybe_bind_and_lock' - wrong count at exit
    arch/x86/include/asm/irqflags.h:44:9: warning: context imbalance in 'worker_rebind_fn' - unexpected unlock
    kernel/workqueue.c:1991:17: warning: context imbalance in 'rescuer_thread' - unexpected unlock
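
    The annotation itself is the standard sparse context marker on a
    function that returns with the lock held; roughly:

    static bool worker_maybe_bind_and_lock(struct worker *worker)
    __acquires(&gcwq->lock)
    {
            ...
    }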

    Signed-off-by: Namhyung Kim
    Signed-off-by: Tejun Heo

    Namhyung Kim
     
  • Some internal functions called within gcwq->lock context release
    and re-grab the lock but were missing proper annotations. Add them.

    Signed-off-by: Namhyung Kim
    Signed-off-by: Tejun Heo

    Namhyung Kim
     
  • There is a scalability issue in the current implementation of
    optimistic mutex spinning in the kernel. It was found on an 8-node,
    64-core Nehalem-EX system (HT mode).

    The intention of the optimistic mutex spin is to busy-wait on a
    mutex while its owner is running, in the hope that the mutex will be
    released soon and can be acquired without the acquiring thread going
    to sleep. However, with a large number of threads contending for the
    mutex, the mutex can be grabbed by one thread, then another, and so
    on, while we keep spinning, wasting cpu cycles and adding to the
    contention. One possible fix is to quit spinning and put the current
    thread on the wait-list if the mutex switches to a new owner while
    we spin, indicating heavy contention (see the patch included).
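
    In sketch form, the heuristic looks like this (simplified from the
    description; owner_is_running() is a stand-in helper, not the
    kernel's name):

    /*
     * Spin only while the same task owns the mutex. If ownership
     * moves to another task, report failure so the caller stops
     * spinning and queues itself on the wait-list instead.
     */
    static int mutex_spin_on_owner(struct mutex *lock,
                                   struct task_struct *owner)
    {
            while (lock->owner == owner) {
                    if (!owner_is_running(owner))
                            return 0;
                    cpu_relax();
            }
            /* owner changed: only worth continuing if the lock is free */
            return lock->owner == NULL;
    }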

    I did some testing on an 8-socket Nehalem-EX system with a total of
    64 cores. Using Ingo's test-mutex program that creates/deletes files
    with 256 threads (http://lkml.org/lkml/2006/1/8/50), I see the
    following speed up after putting in the mutex spin fix:

    ./mutex-test V 256 10
    Ops/sec
    2.6.34 62864
    With fix 197200

    Repeating the test with the Aim7 fserver workload, again there is a
    speed up with the fix:

    Jobs/min
    2.6.34 91657
    With fix 149325

    To look at the impact on the distribution of mutex acquisition time,
    I collected the mutex acquisition time on the Aim7 fserver workload
    with some instrumentation. The average acquisition time is reduced
    by 48% and the number of contentions is reduced by 32%.

    #contentions Time to acquire mutex (cycles)
    2.6.34 72973 44765791
    With fix 49210 23067129

    The histogram of mutex acquisition time is listed below. The
    acquisition time is in 2^bin cycles. We see that without the fix,
    the acquisition time is mostly around 2^26 cycles. With the fix, the
    distribution spreads out a lot more towards the lower cycles,
    starting from 2^13. However, there is an increase in the tail of the
    distribution with the fix, at 2^28 and 2^29 cycles. That seems a
    small price to pay for the reduced average acquisition time and for
    getting the cpu to do useful work.

    Mutex acquisition time distribution (acq time = 2^bin cycles):
    2.6.34 With Fix
    bin #occurrence % #occurrence %
    11 2 0.00% 120 0.24%
    12 10 0.01% 790 1.61%
    13 14 0.02% 2058 4.18%
    14 86 0.12% 3378 6.86%
    15 393 0.54% 4831 9.82%
    16 710 0.97% 4893 9.94%
    17 815 1.12% 4667 9.48%
    18 790 1.08% 5147 10.46%
    19 580 0.80% 6250 12.70%
    20 429 0.59% 6870 13.96%
    21 311 0.43% 1809 3.68%
    22 255 0.35% 2305 4.68%
    23 317 0.44% 916 1.86%
    24 610 0.84% 233 0.47%
    25 3128 4.29% 95 0.19%
    26 63902 87.69% 122 0.25%
    27 619 0.85% 286 0.58%
    28 0 0.00% 3536 7.19%
    29 0 0.00% 903 1.83%
    30 0 0.00% 0 0.00%

    I've done similar experiments with the 2.6.35 kernel on smaller
    boxes as well. One is a dual-socket Westmere box (12 cores total,
    with HT). The other is an old dual-socket Core 2 box (4 cores total,
    no HT).

    On the 12-core Westmere box, I see a 250% increase for Ingo's mutex-test
    program with my mutex patch but no significant difference in aim7's
    fserver workload.

    On the 4-core Core 2 box, the differences with the patch for both
    mutex-test and the aim7 fserver workload are negligible.

    So far, it seems like the patch has not caused regression on smaller
    systems.

    Signed-off-by: Tim Chen
    Acked-by: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Thomas Gleixner
    Cc: Frederic Weisbecker
    Cc: # .35.x
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Tim Chen
     
  • Stephane reported that when the machine locks up, the regular ticks,
    which are responsible for resetting the throttle count, stop too.

    Hence the NMI watchdog can end up being throttled before it reports
    on the locked-up state, and we end up being sad.

    Cure this by having the watchdog overflow reset its own throttle count.
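
    The cure is a single line at the top of the overflow handler;
    roughly:

    static void watchdog_overflow_callback(struct perf_event *event,
                                           int nmi,
                                           struct perf_sample_data *data,
                                           struct pt_regs *regs)
    {
            /*
             * The ticks that would normally reset the throttle count
             * may themselves be stuck, so never let this event throttle.
             */
            event->hw.interrupts = 0;
            ...
    }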

    Reported-by: Stephane Eranian
    Tested-by: Stephane Eranian
    Cc: Don Zickus
    Cc: Frederic Weisbecker
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra