15 Dec, 2009

4 commits

  • Further name space cleanup. No functional change

    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra
    Acked-by: David S. Miller
    Acked-by: Ingo Molnar
    Cc: linux-arch@vger.kernel.org

    Thomas Gleixner
     
  • The raw_spin* namespace was taken by lockdep for the architecture
    specific implementations. raw_spin_* would be the ideal name space for
    the spinlocks which are not converted to sleeping locks in preempt-rt.

    Linus suggested converting the raw_ locks to arch_ locks and cleaning
    up the name space instead of using an artificial name like core_spin,
    atomic_spin or whatever.

    No functional change.

    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra
    Acked-by: David S. Miller
    Acked-by: Ingo Molnar
    Cc: linux-arch@vger.kernel.org

    Thomas Gleixner
     
  • Separate spin_lock and rw_lock functions. Preempt-RT needs to exclude
    the rw_lock functions from being compiled. The reordering makes that
    possible with a single #ifdef.

    No functional change.

    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra
    Acked-by: Ingo Molnar

    Thomas Gleixner
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
    m68k: rename global variable vmalloc_end to m68k_vmalloc_end
    percpu: add missing per_cpu_ptr_to_phys() definition for UP
    percpu: Fix kdump failure if booted with percpu_alloc=page
    percpu: make misc percpu symbols unique
    percpu: make percpu symbols in ia64 unique
    percpu: make percpu symbols in powerpc unique
    percpu: make percpu symbols in x86 unique
    percpu: make percpu symbols in xen unique
    percpu: make percpu symbols in cpufreq unique
    percpu: make percpu symbols in oprofile unique
    percpu: make percpu symbols in tracer unique
    percpu: make percpu symbols under kernel/ and mm/ unique
    percpu: remove some sparse warnings
    percpu: make alloc_percpu() handle array types
    vmalloc: fix use of non-existent percpu variable in put_cpu_var()
    this_cpu: Use this_cpu_xx in trace_functions_graph.c
    this_cpu: Use this_cpu_xx for ftrace
    this_cpu: Use this_cpu_xx in nmi handling
    this_cpu: Use this_cpu operations in RCU
    this_cpu: Use this_cpu ops for VM statistics
    ...

    Fix up trivial (famous last words) global per-cpu naming conflicts in
    arch/x86/kvm/svm.c
    mm/slab.c

    Linus Torvalds
     

13 Dec, 2009

1 commit

  • …l/git/tip/linux-2.6-tip

    * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (21 commits)
    sched: Remove forced2_migrations stats
    sched: Fix memory leak in two error corner cases
    sched: Fix build warning in get_update_sysctl_factor()
    sched: Update normalized values on user updates via proc
    sched: Make tunable scaling style configurable
    sched: Fix missing sched tunable recalculation on cpu add/remove
    sched: Fix task priority bug
    sched: cgroup: Implement different treatment for idle shares
    sched: Remove unnecessary RCU exclusion
    sched: Discard some old bits
    sched: Clean up check_preempt_wakeup()
    sched: Move update_curr() in check_preempt_wakeup() to avoid redundant call
    sched: Sanitize fork() handling
    sched: Clean up ttwu() rq locking
    sched: Remove rq->clock coupling from set_task_cpu()
    sched: Consolidate select_task_rq() callers
    sched: Remove sysctl.sched_features
    sched: Protect sched_rr_get_param() access to task->sched_class
    sched: Protect task->cpus_allowed access in sched_getaffinity()
    sched: Fix balance vs hotplug race
    ...

    Fixed up conflicts in kernel/sysctl.c (due to sysctl cleanup)

    Linus Torvalds
     

12 Dec, 2009

8 commits

  • …el/git/tip/linux-2.6-tip

    * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    itimer: Fix the itimer trace print format
    hrtimer: move timer stats helper functions to hrtimer.c
    hrtimer: Tune hrtimer_interrupt hang logic

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    lockdep: Avoid out of bounds array reference in save_trace()
    futex: Take mmap_sem for get_user_pages in fault_in_user_writeable
    lockstat: Add usage info to Documentation/lockstat.txt
    lockstat: Fix min, max times in /proc/lock_stats

    Linus Torvalds
     
  • …nel/git/tip/linux-2.6-tip

    * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    tracing: Remove comparing of NULL to va_list in trace_array_vprintk()
    tracing: Fix function graph trace_pipe to properly display failed entries
    tracing: Add full state to trace_seq
    tracing: Buffer the output of seq_file in case of filled buffer
    tracing: Only call pipe_close if pipe_close is defined
    tracing: Add pipe_close interface

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (57 commits)
    x86, perf events: Check if we have APIC enabled
    perf_event: Fix variable initialization in other codepaths
    perf kmem: Fix unused argument build warning
    perf symbols: perf_header__read_build_ids() offset'n'size should be u64
    perf symbols: dsos__read_build_ids() should read both user and kernel buildids
    perf tools: Align long options which have no short forms
    perf kmem: Show usage if no option is specified
    sched: Mark sched_clock() as notrace
    perf sched: Add max delay time snapshot
    perf tools: Correct size given to memset
    perf_event: Fix perf_swevent_hrtimer() variable initialization
    perf sched: Fix for getting task's execution time
    tracing/kprobes: Fix field creation's bad error handling
    perf_event: Cleanup for cpu_clock_perf_event_update()
    perf_event: Allocate children's perf_event_ctxp at the right time
    perf_event: Clean up __perf_event_init_context()
    hw-breakpoints: Modify breakpoints without unregistering them
    perf probe: Update perf-probe document
    perf probe: Support --del option
    trace-kprobe: Support delete probe syntax
    ...

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (58 commits)
    tty: split the lock up a bit further
    tty: Move the leader test in disassociate
    tty: Push the bkl down a bit in the hangup code
    tty: Push the lock down further into the ldisc code
    tty: push the BKL down into the handlers a bit
    tty: moxa: split open lock
    tty: moxa: Kill the use of lock_kernel
    tty: moxa: Fix modem op locking
    tty: moxa: Kill off the throttle method
    tty: moxa: Locking clean up
    tty: moxa: rework the locking a bit
    tty: moxa: Use more tty_port ops
    tty: isicom: fix deadlock on shutdown
    tty: mxser: Use the new locking rules to fix setserial properly
    tty: mxser: use the tty_port_open method
    tty: isicom: sort out the board init logic
    tty: isicom: switch to the new tty_port_open helper
    tty: tty_port: Add a kref object to the tty port
    tty: istallion: tty port open/close methods
    tty: stallion: Convert to the tty_port_open/close methods
    ...

    Linus Torvalds
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
    kgdb: Always process the whole breakpoint list on activate or deactivate
    kgdb: continue and warn on signal passing from gdb
    kgdb,x86: do not set kgdb_single_step on x86
    kgdb: allow for cpu switch when single stepping
    kgdb,i386: Fix corner case access to ss with NMI watch dog exception
    kgdb: Replace strstr() by strchr() for single-character needles
    kgdbts: Read buffer overflow
    kgdb: Read buffer overflow
    kgdb,x86: remove redundant test

    Linus Torvalds
     
  • There are two call points, and both want to check that
    tty->signal->leader is set. Move the test into disassociate_ctty(), as
    that will make locking changes easier in a bit.

    Signed-off-by: Alan Cox
    Signed-off-by: Greg Kroah-Hartman

    Alan Cox
     
  • * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (109 commits)
    PCI: fix coding style issue in pci_save_state()
    PCI: add pci_request_acs
    PCI: fix BUG_ON triggered by logical PCIe root port removal
    PCI: remove ifdefed pci_cleanup_aer_correct_error_status
    PCI: unconditionally clear AER uncorr status register during cleanup
    x86/PCI: claim SR-IOV BARs in pcibios_allocate_resource
    PCI: portdrv: remove redundant definitions
    PCI: portdrv: remove unnecessary struct pcie_port_data
    PCI: portdrv: minor cleanup for pcie_port_device_register
    PCI: portdrv: add missing irq cleanup
    PCI: portdrv: enable device before irq initialization
    PCI: portdrv: cleanup service irqs initialization
    PCI: portdrv: check capabilities first
    PCI: portdrv: move PME capability check
    PCI: portdrv: remove redundant pcie type calculation
    PCI: portdrv: cleanup pcie_device registration
    PCI: portdrv: remove redundant pcie_port_device_probe
    PCI: Always set prefetchable base/limit upper32 registers
    PCI: read-modify-write the pcie device control register when initiating pcie flr
    PCI: show dma_mask bits in /sys
    ...

    Fixed up conflicts in:
    arch/x86/kernel/amd_iommu_init.c
    drivers/pci/dmar.c
    drivers/pci/hotplug/acpiphp_glue.c

    Linus Torvalds
     

11 Dec, 2009

7 commits

  • This patch fixes 2 edge cases in using kgdb in conjunction with gdb.

    1) kgdb_deactivate_sw_breakpoints() should process the entire array of
    breakpoints. The failure to do so results in breakpoints that you
    cannot remove, because a break point can only be removed if its
    state flag is set to BP_SET.

    The easy way to duplicate this problem is to plant a break point in
    a kernel module and then unload the kernel module.

    2) kgdb_activate_sw_breakpoints() should process the entire array of
    breakpoints. The failure to do so results in missed breakpoints
    when a breakpoint cannot be activated.
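
    The "process the entire array" idea in both cases above can be
    sketched as follows. This is a minimal user-space model, not the kgdb
    code: struct sw_breakpoint, remove_one() and deactivate_all() are
    hypothetical names, and an address of 0 merely simulates memory that
    has gone away (for example an unloaded module).

```c
#include <stddef.h>

/* Simplified breakpoint states; only BP_SET and BP_ACTIVE matter here. */
enum bp_state { BP_UNDEFINED, BP_REMOVED, BP_SET, BP_ACTIVE };

struct sw_breakpoint {
    unsigned long addr;
    enum bp_state state;
};

/* Hypothetical stand-in for the per-breakpoint removal hook.
 * Address 0 simulates memory that can no longer be written. */
static int remove_one(const struct sw_breakpoint *bp)
{
    return bp->addr == 0 ? -1 : 0;
}

/* Walk the whole array. A failure is recorded but does not end the
 * loop, so every active entry still ends up in state BP_SET and can
 * be removed later. */
static int deactivate_all(struct sw_breakpoint *bps, size_t n)
{
    int ret = 0;

    for (size_t i = 0; i < n; i++) {
        if (bps[i].state != BP_ACTIVE)
            continue;
        int err = remove_one(&bps[i]);
        if (err)
            ret = err;          /* remember the error, keep going */
        bps[i].state = BP_SET;
    }
    return ret;
}
```

    Returning on the first error instead would leave the later entries
    stuck in BP_ACTIVE, which is the unremovable-breakpoint symptom
    described in case 1.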

    Signed-off-by: Jason Wessel

    Jason Wessel
     
  • On some architectures for the segv trap, gdb wants to pass the signal
    back on continue. For kgdb this is not the default behavior, because
    it can cause the kernel to crash if you arbitrarily pass back an
    exception outside of kgdb.

    Instead of causing instability, pass a message back to gdb about the
    supported kgdb signal passing and execute a standard kgdb continue
    operation.

    Signed-off-by: Jason Wessel

    Jason Wessel
     
  • The kgdb core should not assume that a single step operation of a
    kernel thread will complete on the same CPU. The single step flag is
    set at the "thread" level and it is possible in a multi cpu system
    that a kernel thread can get scheduled on another cpu the next time it
    is run.

    As a further safety net in case a slave cpu is hung, the debug master
    cpu will try 100 times before giving up and assuming control of the
    slave cpus is no longer possible. It is more useful to be able to get
    some information out of kgdb instead of spinning forever.

    Signed-off-by: Jason Wessel

    Jason Wessel
     
  • Roel Kluin reported an error found with Parfait: we want to ensure
    that kgdb_info[-1] never gets accessed.

    Also check to ensure any negative tid does not exceed the size of the
    shadow CPU array, else report critical debug context because it is an
    internal kgdb failure.

    Reported-by: Roel Kluin
    Signed-off-by: Jason Wessel

    Jason Wessel
     
  • This build warning:

    kernel/sched.c: In function 'set_task_cpu':
    kernel/sched.c:2070: warning: unused variable 'old_rq'

    Made me realize that the forced2_migrations stat looks pretty
    pointless (and a misnomer) - remove it.

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: Add debugobjects support

    Linus Torvalds
     
  • Signed-off-by: Xiao Guangrong
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Xiao Guangrong
     

10 Dec, 2009

14 commits

  • If the second in each of these pairs of allocations fails, then the
    first one will not be freed in the error route out.

    Found by a static code analysis tool.
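
    The pattern being fixed can be sketched in user space as follows.
    struct pair and pair_init() are hypothetical names, and the
    simulate_failure flag exists only so the error path can be exercised;
    the scheduler code itself uses kernel allocators.

```c
#include <stdlib.h>

struct pair {
    int *first;
    int *second;
};

/* Allocate both buffers or neither. If the second allocation fails,
 * the first one is released before returning. */
static int pair_init(struct pair *p, size_t n, int simulate_failure)
{
    p->first = NULL;
    p->second = NULL;

    p->first = malloc(n * sizeof(*p->first));
    if (!p->first)
        goto err;

    /* simulate_failure is an illustration aid, not part of the pattern */
    p->second = simulate_failure ? NULL : malloc(n * sizeof(*p->second));
    if (!p->second)
        goto err_free_first;

    return 0;

err_free_first:
    free(p->first);             /* the step the leaky version skipped */
    p->first = NULL;
err:
    return -1;
}
```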

    Signed-off-by: Phil Carmody
    Acked-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Phil Carmody
     
  • There is no reason to make timer_stats_hrtimer_set_start_info and
    friends visible to the rest of the kernel. So move all of them to
    hrtimer.c. Also make timer_stats_hrtimer_set_start_info a static
    inline function so it gets inlined and we avoid another function call.
    Based on a patch by Thomas Gleixner.

    Signed-off-by: Heiko Carstens
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Heiko Carstens
     
  • The hrtimer_interrupt hang logic adjusts min_delta_ns based on the
    execution time of the hrtimer callbacks.

    This is error-prone for virtual machines, where a guest vcpu can be
    scheduled out during the execution of the callbacks (and the callbacks
    themselves can do operations that translate to blocking operations in
    the hypervisor), which in turn can lead to a large min_delta_ns,
    rendering the system unusable.

    Replace the current heuristics with something more reliable. Allow the
    interrupt code to try 3 times to catch up with the lost time. If that
    fails use the total time spent in the interrupt handler to defer the
    next timer interrupt so the system can catch up with other things
    which got delayed. Limit that deferment to 100ms.

    The retry events and the maximum time spent in the interrupt handler
    are recorded and exposed via /proc/timer_list
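
    The retry-then-defer policy described above might look roughly like
    this user-space model. All names (hang_stats, timer_interrupt,
    countdown_pass) are hypothetical, and the pass callback stands in for
    one run over the expired timers; only the limits of 3 tries and
    100ms come from the text.

```c
#include <stdint.h>

#define MAX_TRIES    3
#define MAX_DEFER_NS (100ULL * 1000 * 1000)   /* 100 ms */

struct hang_stats {
    unsigned int retries;     /* retry events */
    unsigned int hangs;       /* times all tries were used up */
    uint64_t max_hang_ns;     /* longest time spent in the handler */
};

/* Model of one pass: ctx points to an int saying how many more passes
 * will still find expired timers. Returns nonzero while still behind. */
static int countdown_pass(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return 1;
    }
    return 0;
}

/* Returns 0 if the handler caught up, otherwise the number of
 * nanoseconds by which to defer the next timer interrupt. */
static uint64_t timer_interrupt(int (*run_pass)(void *), void *ctx,
                                uint64_t spent_ns, struct hang_stats *st)
{
    for (int attempt = 0; ; attempt++) {
        if (!run_pass(ctx))
            return 0;                       /* caught up */
        if (attempt + 1 == MAX_TRIES)
            break;                          /* give up after 3 tries */
        st->retries++;
    }

    st->hangs++;
    if (spent_ns > st->max_hang_ns)
        st->max_hang_ns = spent_ns;

    /* Defer by the time spent in the handler, capped at 100 ms. */
    return spent_ns > MAX_DEFER_NS ? MAX_DEFER_NS : spent_ns;
}
```

    Because the deferral is bounded and not fed back into min_delta_ns, a
    vcpu that was scheduled out for a long time costs at most one 100ms
    delay in this model.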

    Inspired by a patch from Marcelo.

    Reported-by: Michael Tokarev
    Signed-off-by: Thomas Gleixner
    Tested-by: Marcelo Tosatti
    Cc: kvm@vger.kernel.org

    Thomas Gleixner
     
  • Signed-off-by: Mike Galbraith
    Acked-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar
    LKML-Reference:

    Mike Galbraith
     
  • ia64 found this the hard way (because we currently have a stub
    for save_stack_trace() that does nothing). But it would be a
    good idea to be cautious in case a real save_stack_trace()
    bailed out with an error before it set trace->nr_entries.
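
    A minimal sketch of the kind of guard this implies, assuming the
    out-of-bounds access was a look at the last recorded entry when
    nr_entries is still 0. The struct, the function name and the
    ULONG_MAX end marker are assumptions made for illustration and are
    not taken from the commit message.

```c
#include <limits.h>

/* Hypothetical, simplified model of a saved stack trace. */
struct stack_trace_model {
    unsigned int nr_entries;
    unsigned long *entries;
};

/* Drop a trailing end marker, but only if there is an entry to look
 * at. Without the nr_entries check, an empty trace would index
 * entries[-1]. */
static void trim_terminator(struct stack_trace_model *t)
{
    if (t->nr_entries != 0 &&
        t->entries[t->nr_entries - 1] == ULONG_MAX)
        t->nr_entries--;
}
```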

    Signed-off-by: Tony Luck
    Acked-by: Peter Zijlstra
    Cc: luming.yu@intel.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Luck, Tony
     
  • …t/rostedt/linux-2.6-trace into tracing/core

    Ingo Molnar
     
  • fix:

    [] ? printk+0x1d/0x24
    [] ? perf_prepare_sample+0x269/0x280
    [] warn_slowpath_common+0x71/0xd0
    [] ? perf_prepare_sample+0x269/0x280
    [] warn_slowpath_null+0x1a/0x20
    [] perf_prepare_sample+0x269/0x280
    [] ? cpu_clock+0x53/0x90
    [] __perf_event_overflow+0x2a8/0x300
    [] perf_event_overflow+0x1b/0x30
    [] perf_swevent_hrtimer+0x7f/0x120

    This is because the 'data.raw' variable is not initialized.

    Signed-off-by: Xiao Guangrong
    Acked-by: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Xiao Guangrong
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits)
    tree-wide: fix misspelling of "definition" in comments
    reiserfs: fix misspelling of "journaled"
    doc: Fix a typo in slub.txt.
    inotify: remove superfluous return code check
    hdlc: spelling fix in find_pvc() comment
    doc: fix regulator docs cut-and-pasteism
    mtd: Fix comment in Kconfig
    doc: Fix IRQ chip docs
    tree-wide: fix assorted typos all over the place
    drivers/ata/libata-sff.c: comment spelling fixes
    fix typos/grammos in Documentation/edac.txt
    sysctl: add missing comments
    fs/debugfs/inode.c: fix comment typos
    sgivwfb: Make use of ARRAY_SIZE.
    sky2: fix sky2_link_down copy/paste comment error
    tree-wide: fix typos "couter" -> "counter"
    tree-wide: fix typos "offest" -> "offset"
    fix kerneldoc for set_irq_msi()
    spidev: fix double "of of" in comment
    comment typo fix: sybsystem -> subsystem
    ...

    Linus Torvalds
     
  • Olof Johansson stated the following:

    Comparing a va_list with NULL is bogus. It's supposed to be treated like
    an opaque type and only be manipulated with va_* accessors.

    Olof noticed that this code broke the ARM builds:

    kernel/trace/trace.c: In function 'trace_array_vprintk':
    kernel/trace/trace.c:1364: error: invalid operands to binary == (have 'va_list' and 'void *')
    kernel/trace/trace.c: In function 'tracing_mark_write':
    kernel/trace/trace.c:3349: error: incompatible type for argument 3 of 'trace_vprintk'

    This patch partly reverts c13d2f7c3231e873f30db92b96c8caa48f100f33 and
    re-installs the original mark_printk() mechanism.
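
    As an illustration of the portable approach, the sketch below routes
    pre-formatted text through a variadic wrapper with a "%s" format, so
    no code path ever needs a "NULL va_list". The log_* names are
    hypothetical user-space stand-ins, not the tracing functions.

```c
#include <stdarg.h>
#include <stdio.h>

/* The v-variant only ever touches ap through the va_* machinery. */
static int log_vprintf(char *buf, size_t size, const char *fmt, va_list ap)
{
    return vsnprintf(buf, size, fmt, ap);
}

/* Variadic front end: owns va_start()/va_end(). */
static int log_printf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = log_vprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}

/* Pre-formatted text goes through "%s" instead of passing a fake
 * argument list, so '%' characters in the text are copied verbatim. */
static int log_mark(char *buf, size_t size, const char *text)
{
    return log_printf(buf, size, "%s", text);
}
```

    va_list may be an array or struct type on some architectures, which
    is why comparing it with NULL compiles on one target and fails on
    another.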

    Reported-by: Olof Johansson
    Signed-off-by: Carsten Emde
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Carsten Emde
     
  • There is a case where the graph tracer might get confused and omit
    displaying a single record. This applies mostly to trace_pipe, since
    it is unlikely that the trace_seq buffer will overflow with the
    trace file.

    As the function_graph tracer goes through the trace entries keeping a
    pointer to the current record:

    current -> func1 ENTRY
    func2 ENTRY
    func2 RETURN
    func1 RETURN

    When a function ENTRY is encountered, it moves the pointer to the
    next entry to check if the function is a nested or leaf function.

    func1 ENTRY
    current -> func2 ENTRY
    func2 RETURN
    func1 RETURN

    If the rest of the writing of the function fills the trace_seq buffer,
    then the trace_pipe read will ignore this entry. The next read will
    now start at the current location, but the first entry (func1) will
    be discarded.

    This patch keeps a copy of the current entry in the iterator private
    storage and will keep track of when the trace_seq buffer fills. When
    the trace_seq buffer fills, it will reuse the copy of the entry in the
    next iteration.

    [
    This patch has been largely modified by Steven Rostedt in order to
    clean it up and simplify it. The original idea and concept was from
    Jirka and for that, this patch will go under his name to give him
    the credit he deserves. But because this was modified by Steven Rostedt
    anything wrong with the patch should be blamed on Steven.
    ]

    Signed-off-by: Jiri Olsa
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Jiri Olsa
     
  • The trace_seq buffer might fill up, and right now one needs to check the
    return value of each printf into the buffer to check for that.

    Instead, have the buffer keep track of whether it is full or not, and
    reject more input if it is full or would have overflowed with an input
    that wasn't added.
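
    A minimal user-space sketch of a buffer that tracks its own "full"
    state, assuming a small fixed size. struct seq_buf_model and
    seq_printf_model() are hypothetical names and not the trace_seq API.

```c
#include <stdarg.h>
#include <stdio.h>

#define SEQ_SIZE 16

/* Must start zero-initialized: empty string, len 0, not full. */
struct seq_buf_model {
    char buf[SEQ_SIZE];
    size_t len;
    int full;
};

/* Returns 1 if the text was added, 0 if it was rejected. Once one
 * write does not fit, the buffer is marked full and every later
 * write is rejected as well, so callers can check once at the end. */
static int seq_printf_model(struct seq_buf_model *s, const char *fmt, ...)
{
    va_list ap;
    size_t room;
    int n;

    if (s->full)
        return 0;

    room = SEQ_SIZE - s->len;       /* always >= 1, see below */
    va_start(ap, fmt);
    n = vsnprintf(s->buf + s->len, room, fmt, ap);
    va_end(ap);

    if (n < 0 || (size_t)n >= room) {
        s->buf[s->len] = '\0';      /* discard the partial write */
        s->full = 1;
        return 0;
    }
    s->len += (size_t)n;            /* len stays below SEQ_SIZE */
    return 1;
}
```

    Rejecting everything after the first overflow keeps the contents a
    clean prefix of what was requested, with no later short write
    slipping in behind a dropped one.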

    Cc: Lai Jiangshan
    Signed-off-by: Johannes Berg
    Signed-off-by: Steven Rostedt

    Johannes Berg
     
  • If the seq_read fills the buffer it will call s_start again on the next
    iteration with the same position. This causes a problem with the
    function_graph tracer because it consumes the iteration in order to
    determine leaf functions.

    What happens is that the iterator stores the entry, and the function
    graph plugin will look at the next entry. If that next entry is a return
    of the same function and task, then the function is a leaf and the
    function_graph plugin calls ring_buffer_read which moves the ring buffer
    iterator forward (the trace iterator still points to the function start
    entry).

    The copying of the trace_seq to the seq_file buffer will fail if the
    seq_file buffer is full. The seq_read will not show this entry.
    The next read by userspace will cause seq_read to again call s_start
    which will reuse the trace iterator entry (the function start entry).
    But the function return entry was already consumed. The function graph
    plugin will think that this entry is a nested function and not a leaf.

    To solve this, the trace code now checks the return status of the
    seq_printf (trace_print_seq). If the writing to the seq_file buffer
    fails, we set a flag in the iterator (leftover) and we do not reset
    the trace_seq buffer. On the next call to s_start, we check the leftover
    flag, and if it is set, we just reuse the trace_seq buffer and do not
    call into the plugin print functions.
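
    The leftover mechanism can be modeled in a few lines of user-space C.
    This is only a sketch under simplifying assumptions: struct
    iter_model, copy_out() and show_entry() are hypothetical names, the
    "formatted" counter stands in for calls into the plugin print
    functions, and a plain string stands in for the trace_seq buffer.

```c
#include <stdio.h>
#include <string.h>

struct iter_model {
    char seq[32];     /* formatted entry (stand-in for trace_seq) */
    int leftover;     /* set when seq could not be copied out */
    int formatted;    /* how often the formatter was invoked */
};

/* Copy seq to dst if it fits; 0 on success, -1 if dst lacks room. */
static int copy_out(const char *seq, char *dst, size_t dst_room)
{
    size_t n = strlen(seq);

    if (n >= dst_room)
        return -1;
    memcpy(dst, seq, n + 1);
    return 0;
}

static int show_entry(struct iter_model *it, const char *entry,
                      char *dst, size_t dst_room)
{
    if (!it->leftover) {
        /* Stand-in for the plugin print function, which may consume
         * iterator state and therefore must not run twice. */
        snprintf(it->seq, sizeof(it->seq), "%s", entry);
        it->formatted++;
    }

    if (copy_out(it->seq, dst, dst_room) < 0) {
        it->leftover = 1;       /* keep seq for the next call */
        return -1;
    }

    it->leftover = 0;
    it->seq[0] = '\0';          /* reset only after a successful copy */
    return 0;
}
```

    On the retry the entry argument is ignored and the saved text is
    reused, so the formatter runs exactly once per entry.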

    Before this patch:

    2) | fput() {
    2) | __fput() {
    2) 0.550 us | inotify_inode_queue_event();
    2) | __fsnotify_parent() {
    2) 0.540 us | inotify_dentry_parent_queue_event();

    After the patch:

    2) | fput() {
    2) | __fput() {
    2) 0.550 us | inotify_inode_queue_event();
    2) 0.548 us | __fsnotify_parent();
    2) 0.540 us | inotify_dentry_parent_queue_event();

    [
    Updated the patch to fix a missing return 0 from the trace_print_seq()
    stub when CONFIG_TRACING is disabled.

    Reported-by: Ingo Molnar
    ]

    Reported-by: Jiri Olsa
    Cc: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • This fixes a cut and paste error that had pipe_close get called
    if pipe_open was defined (not pipe_close).
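
    The corrected check, sketched with a hypothetical ops table (struct
    tracer_ops, release_pipe() and the mark_* helpers are illustration
    names, not the tracer interface):

```c
#include <stddef.h>

struct tracer_ops {
    void (*pipe_open)(int *state);
    void (*pipe_close)(int *state);
};

/* Trivial callbacks used only to observe which hook ran. */
static void mark_open(int *state)   { *state = 1; }
static void mark_closed(int *state) { *state = 2; }

static void release_pipe(const struct tracer_ops *ops, int *state)
{
    /* Test the pointer that is about to be called. The cut and paste
     * error tested pipe_open here, which calls through NULL for a
     * tracer that defines pipe_open but not pipe_close. */
    if (ops->pipe_close)
        ops->pipe_close(state);
}
```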

    Reported-by: Kosaki Motohiro
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • * 'bkl-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    sys: Remove BKL from sys_reboot
    pm_qos: clean up racy global "name" variable
    pm_qos: remove BKL

    Linus Torvalds
     

09 Dec, 2009

6 commits

  • When we define the common event fields in kprobe, we invert the error
    handling and return immediately in case of success. Then we omit
    to define specific kprobes fields (ip and nargs), and specific
    kretprobes fields (func, ret_ip, nargs). And we only define them
    when we fail to create common fields.

    The most visible consequence is that we can't create filter for
    k(ret)probes specific fields.

    This patch re-inverts the success/error handling to fix it.
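
    In schematic form, assuming a helper that returns 0 on success, the
    corrected control flow looks like this. define_common() and
    define_fields() are hypothetical names, and the flag stands in for
    defining ip, nargs and the other probe-specific fields.

```c
/* Hypothetical stand-in for defining the common event fields;
 * returns 0 on success, negative on failure. */
static int define_common(int fail)
{
    return fail ? -1 : 0;
}

static int define_fields(int common_fails, int *specific_defined)
{
    int ret = define_common(common_fails);

    if (ret)            /* the buggy version bailed out on !ret */
        return ret;

    /* Reached only on success: define the probe-specific fields. */
    *specific_defined = 1;
    return 0;
}
```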

    Reported-by: Lai Jiangshan
    Signed-off-by: Frederic Weisbecker
    Acked-by: Masami Hiramatsu
    Cc: Steven Rostedt
    Cc: Li Zefan
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • The normalized values are also recalculated in case the scaling factor
    changes.

    This patch updates the internally used scheduler tuning values that are
    normalized to one cpu in case a user sets new values via sysfs.

    Together with patch 2 of this series, this lets user-configured
    values scale (or not) with cpu add/remove events that take place later.

    Signed-off-by: Christian Ehrhardt
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    [ v2: fix warning ]
    Signed-off-by: Ingo Molnar

    Christian Ehrhardt
     
  • As scaling now takes place on all kinds of cpu add/remove events, a
    user who configures values via proc should be able to choose whether
    those values are still rescaled or kept whatever happens.

    As the comments state that log2 was just a second guess that worked,
    the interface is not just designed for on/off, but to choose a
    scaling type. Currently this allows none, log and linear, but more
    importantly it allows us to keep the interface even if someone has an
    even better idea how to scale the values.
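
    One way the three scaling types could map to a multiplier is
    sketched below. The enum and function names are hypothetical, the
    "1 + log2(cpus)" form of the log variant is an assumption, and the
    cap of eight cpus is borrowed from the related tunable-recalculation
    change listed in this log.

```c
enum tunable_scaling { SCALING_NONE, SCALING_LOG, SCALING_LINEAR };

/* Integer log2 for v >= 1. */
static unsigned int ilog2_u(unsigned int v)
{
    unsigned int r = 0;

    while (v >>= 1)
        r++;
    return r;
}

/* Factor by which the one-cpu (normalized) tunables are multiplied. */
static unsigned int scaling_factor(enum tunable_scaling mode,
                                   unsigned int online_cpus)
{
    unsigned int cpus = online_cpus > 8 ? 8 : online_cpus;

    if (cpus == 0)
        cpus = 1;                   /* defensive, for this model only */

    switch (mode) {
    case SCALING_NONE:
        return 1;                   /* keep user values as they are */
    case SCALING_LINEAR:
        return cpus;
    case SCALING_LOG:
    default:
        return 1 + ilog2_u(cpus);   /* assumed form of the log variant */
    }
}
```

    Keeping the selector an enumeration rather than a boolean is what
    leaves room for further scaling types without changing the
    interface.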

    Signed-off-by: Christian Ehrhardt
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Christian Ehrhardt
     
  • Based on Peter Zijlstra's patch suggestion, this enables recalculation
    of the scheduler tunables in response to a change in the number of
    cpus. It also adds a max of eight cpus that are considered in that
    scaling.

    Signed-off-by: Christian Ehrhardt
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Christian Ehrhardt
     
  • 83f9ac removed a call to effective_prio() in wake_up_new_task(), which
    leads to tasks running at MAX_PRIO.

    This is caused by the idle thread being set to MAX_PRIO before forking
    off init. O(1) used that to make sure idle was always preempted; CFS
    uses check_preempt_curr_idle() for that, so we can safely remove this
    bit of legacy code.

    Reported-by: Mike Galbraith
    Tested-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • When setting the weight for a per-cpu task-group, we have to put in a
    phantom weight when there is no work on that cpu, otherwise we'll not
    service that cpu when new work gets placed there until we again update
    the per-cpu weights.

    We used to add these phantom weights to the total, so that the idle
    per-cpu shares don't get inflated. This, however, causes the non-idle
    parts to get deflated, causing unexpected weight distributions.

    Reverse this, so that the non-idle shares are correct but the idle
    shares are inflated.

    Reported-by: Yasunori Goto
    Tested-by: Yasunori Goto
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra