28 Jan, 2011

1 commit

  • In the current rtmutex code, the pending owner may be boosted by
    tasks on the rtmutex's wait list when the pending owner is deboosted
    or a task on the wait list is boosted. This boosting is unwarranted,
    because the pending owner has not actually taken the rtmutex yet,
    so it is not reasonable.

    Example.

    time1:
    A (high prio) owns the rtmutex.
    B (mid prio) and C (low prio) are on the wait list.

    time2:
    A releases the lock; B becomes the pending owner.
    A (or another high-prio task) continues to run. B's prio is lower
    than A's, so B just sits on the runqueue.

    time3:
    A (or the other high-prio task) sleeps, and some time has passed.
    B's and C's priorities were changed in the interval (time2 ~ time3)
    due to boosting or deboosting, and now C has a higher priority
    than B. Is it reasonable that C has to boost B and help B to
    get the rtmutex?

    No. This is unneeded boosting before B really owns the rtmutex.
    We should give C a chance to beat B and win the rtmutex.

    This is the motivation for this patch, which *ensures* that only
    the top waiter or a higher-priority task can take the lock.

    How?
    1) We don't dequeue the top waiter on unlock; if the top waiter
    changes, the old top waiter will fail to acquire and go to sleep again.
    2) When acquiring the lock, a task gets it only if the lock is not
    taken and there is no waiter, OR it has higher priority than the
    waiters, OR it is the top waiter.
    3) Whenever the top waiter changes, the new top waiter is woken up.

    The algorithm is much simpler than before: there is no pending owner
    and no boosting of a pending owner. (A rough sketch of the resulting
    acquisition rule follows below.)
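
    As a rough illustration of acquisition rule 2) above, here is a minimal
    userspace sketch of the predicate; the names (may_take_lock,
    struct fake_rtmutex) are illustrative and not the kernel's rt_mutex code:

    #include <stdbool.h>
    #include <stddef.h>

    struct waiter {
            int prio;                       /* lower value = higher priority */
    };

    struct fake_rtmutex {
            bool taken;
            struct waiter *top_waiter;      /* NULL when there are no waiters */
    };

    /* The lock may be taken only when it is free and the caller either
     * faces no waiters, outranks the top waiter, or is the top waiter. */
    static bool may_take_lock(struct fake_rtmutex *lock, struct waiter *me)
    {
            if (lock->taken)
                    return false;
            if (!lock->top_waiter)
                    return true;
            if (me->prio < lock->top_waiter->prio)
                    return true;
            return me == lock->top_waiter;
    }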

    Other advantages of this patch:
    1) The states of an rtmutex are reduced by half, making the code
    easier to read.
    2) The code becomes shorter.
    3) The top waiter is not dequeued until it really takes the lock,
    so waiters retain FIFO order when the lock is stolen.

    Neither advantage nor disadvantage:
    1) Even though we may wake up multiple waiters (any time the top waiter
    changes), we hardly cause a "thundering herd";
    the number of woken tasks is likely one or very few.
    2) Two APIs change behavior.
    rt_mutex_owner() no longer returns the pending owner; it returns NULL when
    the top waiter is about to take the lock.
    rt_mutex_next_owner() always returns the top waiter
    and will not return NULL while there are waiters,
    because the top waiter is not dequeued.

    I have fixed the code that uses these APIs.

    Needs updating after this patch is accepted:
    1) Documentation/*
    2) the test case scripts/rt-tester/t4-l2-pi-deboost.tst

    Signed-off-by: Lai Jiangshan
    LKML-Reference:
    Reviewed-by: Steven Rostedt
    Signed-off-by: Steven Rostedt

    Lai Jiangshan
     

26 Jan, 2011

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
    Input: wacom - pass touch resolution to clients through input_absinfo
    Input: wacom - add 2 Bamboo Pen and touch models
    Input: sysrq - ensure sysrq_enabled and __sysrq_enabled are consistent
    Input: sparse-keymap - fix KEY_VSW handling in sparse_keymap_setup
    Input: tegra-kbc - add tegra keyboard driver
    Input: gpio_keys - switch to using request_any_context_irq
    Input: serio - allow registered drivers to get status flag
    Input: ct82710c - return proper error code for ct82c710_open
    Input: bu21013_ts - added regulator support
    Input: bu21013_ts - remove duplicate resolution parameters
    Input: tnetv107x-ts - don't treat NULL clk as an error
    Input: tnetv107x-keypad - don't treat NULL clk as an error

    Fix up trivial conflicts in drivers/input/keyboard/Makefile due to
    additions of tc3589x/Tegra drivers

    Linus Torvalds
     
  • The -rt patches change the console_semaphore to console_mutex. As a
    result, quite a large chunk of the patches changes all
    acquire/release_console_sem() calls to acquire/release_console_mutex().

    This commit switches to more neutral function names which don't make
    implications about the underlying lock.

    The only real change is the return value of console_trylock(), which is
    inverted from that of try_acquire_console_sem() (illustrated further below).

    This patch also paves the way to switching console_sem from a semaphore to
    a mutex.
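
    As a hedged illustration of the inverted convention (userspace stubs stand
    in for the kernel functions named above; the console_unlock() name is
    assumed to be the renamed release path):

    #include <stdio.h>

    /* stubs standing in for the kernel APIs, for illustration only */
    static int try_acquire_console_sem(void) { return 0; }  /* 0 == success */
    static void release_console_sem(void) { }
    static int console_trylock(void) { return 1; }          /* 1 == success */
    static void console_unlock(void) { }

    int main(void)
    {
            /* old convention: zero return means the semaphore was taken */
            if (try_acquire_console_sem() == 0) {
                    puts("old API: acquired");
                    release_console_sem();
            }

            /* new convention: non-zero return means the lock was taken */
            if (console_trylock()) {
                    puts("new API: acquired");
                    console_unlock();
            }
            return 0;
    }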

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: make console_trylock return 1 on success, per Geert]
    Signed-off-by: Torben Hohn
    Cc: Thomas Gleixner
    Cc: Greg KH
    Cc: Ingo Molnar
    Cc: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Torben Hohn
     

25 Jan, 2011

4 commits

  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf tools: Fix time function double declaration with glibc
    perf tools: Fix build by checking if extra warnings are supported
    perf tools: Fix build when using gcc 3.4.6
    perf tools: Add missing header, fixes build
    perf tools: Fix 64 bit integer format strings
    perf test: Fix build on older glibcs
    perf: perf_event_exit_task_context: s/rcu_dereference/rcu_dereference_raw/
    perf test: Use cpu_map->[cpu] when setting affinity
    perf symbols: Fix annotation of thumb code
    perf: Annotate cpuctx->ctx.mutex to avoid a lockdep splat
    powerpc, perf: Fix frequency calculation for overflowing counters (FSL version)
    perf: Fix perf_event_init_task()/perf_event_free_task() interaction
    perf: Fix find_get_context() vs perf_event_exit_task() race

    Linus Torvalds
     
  • …el/git/tip/linux-2.6-tip

    * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    RTC: Remove Kconfig symbol for UIE emulation
    RTC: Properly handle rtc_read_alarm error propagation and fix bug
    RTC: Propagate error handling via rtc_timer_enqueue properly
    acpi_pm: Clear pmtmr_ioport if acpi_pm initialization fails
    rtc: Cleanup removed UIE emulation declaration
    hrtimers: Notify hrtimer users of switches to NOHZ mode

    Linus Torvalds
     
  • …l/git/tip/linux-2.6-tip

    * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    sched: Fix poor interactivity on UP systems due to group scheduler nice tune bug

    Linus Torvalds
     
  • Currently sysrq_enabled and __sysrq_enabled are initialised separately
    and inconsistently, leading to sysrq being actually enabled but reported
    as not enabled in sysfs. The first change to the sysfs configurable
    synchronises these two:

    static int __read_mostly sysrq_enabled = 1;
    static int __sysrq_enabled;

    Add a common define to carry the default for these, preventing them from
    getting out of sync again. Default it to 1 to mirror the previous
    behaviour.
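
    A minimal sketch of that approach; the macro name SYSRQ_DEFAULT_ENABLE is
    illustrative, not necessarily the identifier used in the actual patch:

    /* shared default so the two variables cannot diverge again */
    #define SYSRQ_DEFAULT_ENABLE    1

    static int __read_mostly sysrq_enabled = SYSRQ_DEFAULT_ENABLE;
    static int __sysrq_enabled = SYSRQ_DEFAULT_ENABLE;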

    Signed-off-by: Andy Whitcroft
    Cc: stable@kernel.org
    Signed-off-by: Dmitry Torokhov

    Andy Whitcroft
     

24 Jan, 2011

2 commits

  • Michael Witten and Christian Kujau reported that the autogroup
    scheduling feature hurts interactivity on their UP systems.

    It turns out that this is an older bug in the group scheduling code,
    and the wider appeal provided by the autogroup feature exposed it
    more prominently.

    When on UP with FAIR_GROUP_SCHED enabled, tuning shares
    only affects tg->shares but is not reflected in
    tg->se->load. The reason is that update_cfs_shares()
    does nothing on UP.

    So introduce update_cfs_shares() for UP && FAIR_GROUP_SCHED.

    This issue was found when autogroup scheduling was enabled,
    but it is an older bug that also exists with cgroup.cpu on UP.

    Reported-and-Tested-by: Michael Witten
    Reported-and-Tested-by: Christian Kujau
    Signed-off-by: Yong Zhang
    Acked-by: Pekka Enberg
    Acked-by: Mike Galbraith
    Acked-by: Peter Zijlstra
    Cc: Linus Torvalds
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Yong Zhang
     
  • Currently only drivers that are built as modules have their versions
    shown in /sys/module/<module_name>/version, but this information might
    also be useful for built-in drivers. This is especially important
    for drivers that do not define any parameters -- such drivers, if
    built in, are completely invisible from userspace.

    This patch changes the MODULE_VERSION() macro so that when the module is
    built into the kernel, the version information is stored in a separate
    section. The kernel then uses this data to create a 'version' sysfs
    attribute in the same fashion it creates attributes for module parameters.
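
    A rough sketch of the idea; the section name and the built-in branch of
    the macro below are assumptions for illustration, not the literal
    implementation:

    #ifdef MODULE
    #define MODULE_VERSION(ver) MODULE_INFO(version, ver)
    #else
    /* built in: record the version in a dedicated section the kernel can
     * walk at boot to create the 'version' sysfs attribute */
    #define MODULE_VERSION(ver)                                     \
            static const char __modver_version[]                    \
            __attribute__((section("__modver"), used)) = ver
    #endif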

    Signed-off-by: Dmitry Torokhov
    Signed-off-by: Rusty Russell

    Dmitry Torokhov
     

22 Jan, 2011

2 commits

  • * 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: note the nested NOT_RUNNING test in worker_clr_flags() isn't a noop
    workqueue: relax lockdep annotation on flush_work()

    Linus Torvalds
     
  • In theory, almost every user of task->child->perf_event_ctxp[]
    is wrong. find_get_context() can install the new context at any
    moment, we need read_barrier_depends().

    dbe08d82ce3967ccdf459f7951d02589cf967300 "perf: Fix
    find_get_context() vs perf_event_exit_task() race" added
    rcu_dereference() into perf_event_exit_task_context() to make
    the precedent, but this makes __rcu_dereference_check() unhappy.
    Use rcu_dereference_raw() to shut up the warning.

    Reported-by: Ingo Molnar
    Signed-off-by: Oleg Nesterov
    Cc: acme@redhat.com
    Cc: paulus@samba.org
    Cc: stern@rowland.harvard.edu
    Cc: a.p.zijlstra@chello.nl
    Cc: fweisbec@gmail.com
    Cc: roland@redhat.com
    Cc: prasad@linux.vnet.ibm.com
    Cc: Paul E. McKenney
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     

21 Jan, 2011

7 commits

  • Lockdep spotted:

    loop_1b_instruc/1899 is trying to acquire lock:
    (event_mutex){+.+.+.}, at: [] perf_trace_init+0x3b/0x2f7

    but task is already holding lock:
    (&ctx->mutex){+.+.+.}, at: [] perf_event_init_context+0xc0/0x218

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #3 (&ctx->mutex){+.+.+.}:
    -> #2 (cpu_hotplug.lock){+.+.+.}:
    -> #1 (module_mutex){+.+...}:
    -> #0 (event_mutex){+.+.+.}:

    But because the deadlock would be cpu-hotplug (cpu-event) vs fork
    (task-event) it cannot, in fact, happen. We can annotate this by giving
    the perf_event_context used for the cpuctx a different lock class from
    those used by tasks.
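
    A minimal sketch of such a lockdep annotation (assumed shape, not
    necessarily the literal patch):

    static struct lock_class_key cpuctx_mutex;

    /* at cpuctx initialisation: put the per-cpu context's mutex in its own
     * lockdep class so it is not conflated with task contexts */
    lockdep_set_class(&cpuctx->ctx.mutex, &cpuctx_mutex);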

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • All architectures are finally converted. Remove the cruft.

    Signed-off-by: Thomas Gleixner
    Cc: Richard Henderson
    Cc: Mike Frysinger
    Cc: David Howells
    Cc: Tony Luck
    Cc: Greg Ungerer
    Cc: Michal Simek
    Acked-by: David Howells
    Cc: Kyle McMartin
    Acked-by: Benjamin Herrenschmidt
    Cc: Chen Liqin
    Cc: "David S. Miller"
    Cc: Chris Metcalf
    Cc: Jeff Dike

    Thomas Gleixner
     
  • …/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    smp: Allow on_each_cpu() to be called while early_boot_irqs_disabled
    lockdep: Move early boot local IRQ enable/disable status to init/main.c

    Linus Torvalds
     
  • * akpm:
    kernel/smp.c: consolidate writes in smp_call_function_interrupt()
    kernel/smp.c: fix smp_call_function_many() SMP race
    memcg: correctly order reading PCG_USED and pc->mem_cgroup
    backlight: fix 88pm860x_bl macro collision
    drivers/leds/ledtrig-gpio.c: make output match input, tighten input checking
    MAINTAINERS: update Atmel AT91 entry
    mm: fix truncate_setsize() comment
    memcg: fix rmdir, force_empty with THP
    memcg: fix LRU accounting with THP
    memcg: fix USED bit handling at uncharge in THP
    memcg: modify accounting function for supporting THP better
    fs/direct-io.c: don't try to allocate more than BIO_MAX_PAGES in a bio
    mm: compaction: prevent division-by-zero during user-requested compaction
    mm/vmscan.c: remove duplicate include of compaction.h
    memblock: fix memblock_is_region_memory()
    thp: keep highpte mapped until it is no longer needed
    kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT

    Linus Torvalds
     
  • We have to test the cpu mask in the interrupt handler before checking the
    refs, otherwise we can start to follow an entry before it is deleted and
    find it partially initialized for the next trip. Presently we also clear
    the cpumask bit before executing the called function, which implies
    getting write access to the line. After the function is called we then
    decrement refs, and if they go to zero we then unlock the structure.

    However, this implies getting write access to the call function data
    before and after the function is called. If we can assert that no
    smp_call_function execution function is allowed to enable interrupts,
    then we can move both writes to after the function is called, hopefully
    allowing both writes with one cache line bounce.
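
    Roughly, the before/after ordering described above looks like this (an
    illustrative sketch, not the literal diff):

    /* before: the bit is cleared before the call, refs decremented after */
    if (!cpumask_test_and_clear_cpu(cpu, data->cpumask))   /* write #1 */
            continue;
    data->csd.func(data->csd.info);
    refs = atomic_dec_return(&data->refs);                  /* write #2 */

    /* after: only test (read) the bit up front; do both writes once the
     * function has run, so they can share one cache line bounce */
    if (!cpumask_test_cpu(cpu, data->cpumask))
            continue;
    data->csd.func(data->csd.info);
    cpumask_clear_cpu(cpu, data->cpumask);                  /* write #1 */
    refs = atomic_dec_return(&data->refs);                  /* write #2 */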

    On a 256 thread system with a kernel compiled for 1024 threads, the time
    to execute testcase in the "smp_call_function_many race" changelog was
    reduced by about 30-40ms out of about 545 ms.

    I decided to keep this as a WARN because it's now a buggy function, even
    though the stack trace is of no value -- a simple printk would give us the
    information needed.

    Raw data:

    Without patch:
    ipi_test startup took 1219366ns complete 539819014ns total 541038380ns
    ipi_test startup took 1695754ns complete 543439872ns total 545135626ns
    ipi_test startup took 7513568ns complete 539606362ns total 547119930ns
    ipi_test startup took 13304064ns complete 533898562ns total 547202626ns
    ipi_test startup took 8668192ns complete 544264074ns total 552932266ns
    ipi_test startup took 4977626ns complete 548862684ns total 553840310ns
    ipi_test startup took 2144486ns complete 541292318ns total 543436804ns
    ipi_test startup took 21245824ns complete 530280180ns total 551526004ns

    With patch:
    ipi_test startup took 5961748ns complete 500859628ns total 506821376ns
    ipi_test startup took 8975996ns complete 495098924ns total 504074920ns
    ipi_test startup took 19797750ns complete 492204740ns total 512002490ns
    ipi_test startup took 14824796ns complete 487495878ns total 502320674ns
    ipi_test startup took 11514882ns complete 494439372ns total 505954254ns
    ipi_test startup took 8288084ns complete 502570774ns total 510858858ns
    ipi_test startup took 6789954ns complete 493388112ns total 500178066ns

    #include <linux/module.h>
    #include <linux/workqueue.h>
    #include <linux/smp.h>
    #include <linux/sched.h> /* sched clock */

    #define ITERATIONS 100

    static void do_nothing_ipi(void *dummy)
    {
    }

    static void do_ipis(struct work_struct *dummy)
    {
            int i;

            for (i = 0; i < ITERATIONS; i++)
                    smp_call_function(do_nothing_ipi, NULL, 1);

            printk(KERN_DEBUG "cpu %d finished\n", smp_processor_id());
    }

    static struct work_struct work[NR_CPUS];

    static int __init testcase_init(void)
    {
            int cpu;
            u64 start, started, done;

            start = local_clock();
            for_each_online_cpu(cpu) {
                    INIT_WORK(&work[cpu], do_ipis);
                    schedule_work_on(cpu, &work[cpu]);
            }
            started = local_clock();
            for_each_online_cpu(cpu)
                    flush_work(&work[cpu]);
            done = local_clock();
            pr_info("ipi_test startup took %lldns complete %lldns total %lldns\n",
                    started - start, done - started, done - start);

            return 0;
    }

    static void __exit testcase_exit(void)
    {
    }

    module_init(testcase_init)
    module_exit(testcase_exit)
    MODULE_LICENSE("GPL");
    MODULE_AUTHOR("Anton Blanchard");

    Signed-off-by: Milton Miller
    Cc: Anton Blanchard
    Cc: Ingo Molnar
    Cc: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Milton Miller
     
  • I noticed a failure where we hit the following WARN_ON in
    generic_smp_call_function_interrupt:

    if (!cpumask_test_and_clear_cpu(cpu, data->cpumask))
            continue;

    data->csd.func(data->csd.info);

    refs = atomic_dec_return(&data->refs);
    WARN_ON(refs < 0);

    The racy interleaving (CPU A is initialising *data, CPU B takes the
    interrupt and walks the queue):

    CPU A                           CPU B
    set data->cpumask
                                    sees and clears bit in cpumask
                                    might be using old or new fn!
                                    decrements refs below 0
    set data->refs (too late!)

    The important thing to note is since the interrupt handler walks a
    potentially stale call_function.queue without any locking, then another
    cpu can view the percpu *data structure at any time, even when the owner
    is in the process of initialising it.

    The following test case hits the WARN_ON 100% of the time on my PowerPC
    box (having 128 threads does help :)

    #include <linux/module.h>
    #include <linux/workqueue.h>
    #include <linux/smp.h>

    #define ITERATIONS 100

    static void do_nothing_ipi(void *dummy)
    {
    }

    static void do_ipis(struct work_struct *dummy)
    {
            int i;

            for (i = 0; i < ITERATIONS; i++)
                    smp_call_function(do_nothing_ipi, NULL, 1);

            printk(KERN_DEBUG "cpu %d finished\n", smp_processor_id());
    }

    static struct work_struct work[NR_CPUS];

    static int __init testcase_init(void)
    {
            int cpu;

            for_each_online_cpu(cpu) {
                    INIT_WORK(&work[cpu], do_ipis);
                    schedule_work_on(cpu, &work[cpu]);
            }

            return 0;
    }

    static void __exit testcase_exit(void)
    {
    }

    module_init(testcase_init)
    module_exit(testcase_exit)
    MODULE_LICENSE("GPL");
    MODULE_AUTHOR("Anton Blanchard");

    I tried to fix it by ordering the read and the write of ->cpumask and
    ->refs. In doing so I missed a critical case, but Paul McKenney was able
    to spot my bug thankfully :) To ensure we aren't viewing previous
    iterations, the interrupt handler needs to read ->refs, then ->cpumask,
    then ->refs _again_.
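
    Taken directly from that description, the handler-side ordering looks
    roughly like this (an illustrative sketch, not the committed code):

    if (!atomic_read(&data->refs))
            continue;               /* previous iteration already finished */
    smp_rmb();
    if (!cpumask_test_cpu(cpu, data->cpumask))
            continue;               /* not (or no longer) targeted at us */
    smp_rmb();
    if (!atomic_read(&data->refs))
            continue;               /* re-check: entry was torn down meanwhile */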

    Thanks to Milton Miller and Paul McKenney for helping to debug this issue.

    [miltonm@bga.com: add WARN_ON and BUG_ON, remove extra read of refs before initial read of mask that doesn't help (also noted by Peter Zijlstra), adjust comments, hopefully clarify scenario ]
    [miltonm@bga.com: remove excess tests]
    Signed-off-by: Anton Blanchard
    Signed-off-by: Milton Miller
    Cc: Ingo Molnar
    Cc: "Paul E. McKenney"
    Cc: [2.6.32+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anton Blanchard
     
  • …l/git/tip/linux-2.6-tip

    * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    sched, cgroup: Use exit hook to avoid use-after-free crash
    sched: Fix signed unsigned comparison in check_preempt_tick()
    sched: Replace rq->bkl_count with rq->rq_sched_info.bkl_count
    sched, autogroup: Fix CONFIG_RT_GROUP_SCHED sched_setscheduler() failure
    sched: Display autogroup names in /proc/sched_debug
    sched: Reinstate group names in /proc/sched_debug
    sched: Update effective_load() to use global share weights

    Linus Torvalds
     

20 Jan, 2011

5 commits

  • percpu may end up calling vfree() during early boot, which in
    turn may call on_each_cpu() for TLB flushes. The work done by
    on_each_cpu() can be performed safely while IRQs are disabled during
    early boot, but it assumed that it is always called
    with local IRQs enabled, which ended up enabling local IRQs
    prematurely during boot and triggering a couple of warnings.

    This patch updates on_each_cpu() and smp_call_function_many()
    such that on_each_cpu() can be used safely while
    early_boot_irqs_disabled is set.
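
    A minimal sketch of the idea for on_each_cpu() (assumed shape, not the
    exact patch): use local_irq_save()/restore() around the local invocation
    so that a caller with IRQs already disabled does not get them re-enabled:

    int on_each_cpu(void (*func)(void *), void *info, int wait)
    {
            unsigned long flags;
            int ret;

            preempt_disable();
            ret = smp_call_function(func, info, wait);
            local_irq_save(flags);          /* was: local_irq_disable() */
            func(info);
            local_irq_restore(flags);       /* was: local_irq_enable()  */
            preempt_enable();
            return ret;
    }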

    Signed-off-by: Tejun Heo
    Acked-by: Peter Zijlstra
    Acked-by: Pekka Enberg
    Cc: Linus Torvalds
    LKML-Reference:
    Signed-off-by: Ingo Molnar
    Reported-by: Ingo Molnar

    Tejun Heo
     
  • During early boot, local IRQs are disabled until the IRQ subsystem is
    properly initialized. During this time, no one should enable
    local IRQs, and some operations which usually are not allowed with
    IRQs disabled, e.g. operations which might sleep or require
    communication with other processors, are allowed.

    lockdep tracked this with the early_boot_irqs_off/on() callbacks.
    As other subsystems need this information too, move it to
    init/main.c and make it generally available. While at it,
    flip the boolean to early_boot_irqs_disabled instead of
    enabled, so that it can be initialized with %false and %true
    indicates the exceptional condition.
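
    A sketch of the resulting flag and its use in init/main.c (assumed shape,
    for illustration only):

    /* init/main.c */
    bool early_boot_irqs_disabled __read_mostly;

    asmlinkage void __init start_kernel(void)
    {
            local_irq_disable();
            early_boot_irqs_disabled = true;
            /* ... early setup that must not enable IRQs ... */
            early_boot_irqs_disabled = false;
            local_irq_enable();
            /* ... rest of boot ... */
    }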

    Signed-off-by: Tejun Heo
    Acked-by: Peter Zijlstra
    Acked-by: Pekka Enberg
    Cc: Linus Torvalds
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Tejun Heo
     
  • When NOHZ=y and high-res timers are disabled (via cmdline or
    Kconfig), tick_nohz_switch_to_nohz() will notify the user about
    switching into NOHZ mode. Nothing is printed for the case where
    HIGH_RES_TIMERS=y. Fix the HIGH_RES_TIMERS=y case by
    duplicating the printk from the low-res NOHZ path in the high-res
    NOHZ path.

    This confused me, since I was thinking 'dmesg | grep -i NOHZ' would
    tell me if NOHZ was enabled, but with hrtimers enabled there is
    nothing.

    Signed-off-by: Stephen Boyd
    Acked-by: Thomas Gleixner
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Stephen Boyd
     
  • perf_event_init_task() should clear child->perf_event_ctxp[]
    before anything else. Otherwise, if
    perf_event_init_context(perf_hw_context) fails,
    perf_event_free_task() can free perf_event_ctxp[perf_sw_context]
    copied from parent->perf_event_ctxp[] by dup_task_struct().

    Also move the initialization of perf_event_mutex and
    perf_event_list from perf_event_init_context() to
    perf_event_init_task().
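
    A sketch of the resulting ordering in perf_event_init_task(), based on the
    description above (illustrative rather than a literal quote of the patch):

    int perf_event_init_task(struct task_struct *child)
    {
            int ctxn, ret;

            /* clear the pointers copied from the parent by dup_task_struct()
             * before anything can fail, so perf_event_free_task() only sees
             * contexts this task really owns */
            memset(child->perf_event_ctxp, 0, sizeof(child->perf_event_ctxp));
            mutex_init(&child->perf_event_mutex);
            INIT_LIST_HEAD(&child->perf_event_list);

            for_each_task_context_nr(ctxn) {
                    ret = perf_event_init_context(child, ctxn);
                    if (ret)
                            return ret;
            }
            return 0;
    }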

    Signed-off-by: Oleg Nesterov
    Acked-by: Peter Zijlstra
    Cc: Alan Stern
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Prasad
    Cc: Roland McGrath
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • find_get_context() must not install the new perf_event_context
    if the task has already passed perf_event_exit_task().

    If nothing else, this means a memory leak. Initially
    ctx->refcount == 2; it is expected that
    perf_event_exit_task_context() will participate and do the
    necessary put_ctx().

    find_lively_task_by_vpid() checks PF_EXITING, but this buys
    nothing: by the time we call find_get_context() the task can
    already be dead. To the point, cmpxchg() can succeed after the task
    has already done its last schedule().

    Change find_get_context() to populate task->perf_event_ctxp[]
    under task->perf_event_mutex; this way we can trust PF_EXITING
    because perf_event_exit_task() takes the same mutex.
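
    A sketch of the install path described above (a code fragment for
    illustration; the error codes are assumptions):

    err = 0;
    mutex_lock(&task->perf_event_mutex);
    if (task->flags & PF_EXITING)
            err = -ESRCH;           /* perf_event_exit_task() already ran */
    else if (task->perf_event_ctxp[ctxn])
            err = -EAGAIN;          /* somebody else installed a context */
    else
            rcu_assign_pointer(task->perf_event_ctxp[ctxn], ctx);
    mutex_unlock(&task->perf_event_mutex);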

    Also, change perf_event_exit_task_context() to use
    rcu_dereference(). Probably this is not strictly needed, but
    with or without this change find_get_context() can race with
    setup_new_exec()->perf_event_exit_task(), rcu_dereference()
    looks better.

    Signed-off-by: Oleg Nesterov
    Acked-by: Peter Zijlstra
    Cc: Alan Stern
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Prasad
    Cc: Roland McGrath
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     

19 Jan, 2011

4 commits

  • By not notifying the controller of the on-exit move back to
    init_css_set, we fail to move the task out of the previous
    cgroup's cfs_rq. This leads to an opportunity for a
    cgroup-destroy to come in and free the cgroup (there are no
    active tasks left in it after all) to which the not-quite dead
    task is still enqueued.

    Reported-by: Miklos Vajna
    Fixed-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra
    Cc:
    Cc: Mike Galbraith
    Signed-off-by: Ingo Molnar
    LKML-Reference:

    Peter Zijlstra
     
  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf: Validate cpu early in perf_event_alloc()
    perf: Find_get_context: fix the per-cpu-counter check
    perf: Fix contexted inheritance

    Linus Torvalds
     
  • Starting from perf_event_alloc()->perf_init_event(), the kernel
    assumes that event->cpu is either -1 or a valid CPU number.

    Change perf_event_alloc() to validate this argument early. This
    also means we can remove the similar check in
    find_get_context().
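
    An illustrative version of such an early check (the exact form in the
    patch may differ):

    /* event->cpu must be either -1 or a valid CPU number */
    if (cpu != -1 && (unsigned int)cpu >= nr_cpu_ids)
            return ERR_PTR(-EINVAL);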

    Signed-off-by: Oleg Nesterov
    Acked-by: Peter Zijlstra
    Cc: Alan Stern
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Prasad
    Cc: Roland McGrath
    Cc: gregkh@suse.de
    Cc: stable@kernel.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • If task == NULL, find_get_context() should always check that cpu
    is correct.

    Afaics, the bug was introduced by 38a81da2 "perf events: Clean
    up pid passing", but even before that commit "&& cpu != -1" was
    not exactly right, -ESRCH from find_task_by_vpid() is not
    accurate.

    Signed-off-by: Oleg Nesterov
    Acked-by: Peter Zijlstra
    Cc: Alan Stern
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Prasad
    Cc: Roland McGrath
    Cc: gregkh@suse.de
    Cc: stable@kernel.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     

18 Jan, 2011

7 commits

  • Linus reported that the RCU lockdep annotation bits triggered for this
    rcu_dereference() because we're not holding rcu_read_lock().

    Going over the code I cannot convince myself it's correct:

    - holding a ref on the parent_ctx doesn't prevent it from being uncloned
    concurrently (as the comment says), so we can race with a free.

    - holding parent_ctx->mutex doesn't prevent the above free from taking
    place either; it would at best prevent parent_ctx itself from being freed.

    I.e. the warning is correct. To fix the bug, serialize against the
    unclone_ctx() call by extending the reach of the parent_ctx->lock.

    Reported-by: Linus Torvalds
    Signed-off-by: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Paul E. McKenney
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • A signed/unsigned comparison may lead to a superfluous resched if the
    leftmost entity is to the right of the current task, wasting a few
    cycles and inadvertently _lengthening_ the current task's slice.
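
    A sketch of the fix (assumed shape): compute the vruntime difference as a
    signed quantity and bail out when the leftmost entity is to the right of
    current, instead of letting the unsigned subtraction wrap:

    struct sched_entity *se = __pick_next_entity(cfs_rq);
    s64 delta = curr->vruntime - se->vruntime;

    if (delta < 0)
            return;                 /* leftmost is to our right: no resched */

    if (delta > ideal_runtime)
            resched_task(rq_of(cfs_rq)->curr);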

    Reported-by: Venkatesh Pallipadi
    Signed-off-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     
  • Now that rq->rq_sched_info.bkl_count is not otherwise used for rq, fold
    rq->bkl_count into it. This saves some space in rq.

    Signed-off-by: Yong Zhang
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Yong Zhang
     
  • If CONFIG_RT_GROUP_SCHED is set, __sched_setscheduler() fails due to autogroup
    not allocating rt_runtime. Free unused/unusable rt_se and rt_rq, redirect RT
    tasks to the root task group, and tell __sched_setscheduler() that it's ok.

    Reported-and-tested-by: Bharata B Rao
    Signed-off-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     
  • Add autogroup name to cfs_rq and tasks information to /proc/sched_debug.

    Signed-off-by: Bharata B Rao
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Bharata B Rao
     
  • Displaying of group names in /proc/sched_debug was dropped in autogroup
    patches. Add group names while displaying cfs_rq and tasks information.

    Signed-off-by: Bharata B Rao
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Bharata B Rao
     
  • Previously effective_load would approximate the global load weight present on
    a group taking advantage of:

    entity_weight = tg->shares * (lw / global_lw), where entity_weight was
    provided by tg_shares_up.

    This worked (approximately) for an 'empty' (at tg level) cpu since we would
    place boost load representative of what a newly woken task would receive.

    However, now that load is instantaneously updated this assumption is no longer
    true and the load calculation is rather incorrect in this case.

    Fix this (and improve the general case) by re-writing effective_load to take
    advantage of the new shares distribution code.

    Signed-off-by: Paul Turner
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul Turner
     

16 Jan, 2011

1 commit

  • …linus' and 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    rcu: avoid pointless blocked-task warnings
    rcu: demote SRCU_SYNCHRONIZE_DELAY from kernel-parameter status
    rtmutex: Fix comment about why new_owner can be NULL in wake_futex_pi()

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, olpc: Add missing Kconfig dependencies
    x86, mrst: Set correct APB timer IRQ affinity for secondary cpu
    x86: tsc: Fix calibration refinement conditionals to avoid divide by zero
    x86, ia64, acpi: Clean up x86-ism in drivers/acpi/numa.c

    * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    timekeeping: Make local variables static
    time: Rename misnamed minsec argument of clocks_calc_mult_shift()

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    tracing: Remove syscall_exit_fields
    tracing: Only process module tracepoints once
    perf record: Add "nodelay" mode, disabled by default
    perf sched: Fix list of events, dropping unsupported ':r' modifier
    Revert "perf tools: Emit clearer message for sys_perf_event_open ENOENT return"
    perf top: Fix annotate segv
    perf evsel: Fix order of event list deletion

    Linus Torvalds
     

15 Jan, 2011

3 commits

  • There is no need for syscall_exit_fields as the syscall
    exit event class can already host the fields in its structure,
    like most other trace events do by default. Use that
    default behavior instead.

    Following this scheme, we no longer need to override the
    get_fields() callback of the syscall exit event class either.

    Hence both syscall_exit_fields and syscall_get_exit_fields() can
    be removed.

    Also changed some indentation to keep the following under 80
    characters:

    ".fields = LIST_HEAD_INIT(event_class_syscall_exit.fields),"

    Acked-by: Frederic Weisbecker
    Signed-off-by: Lai Jiangshan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Lai Jiangshan
     
  • …t/npiggin/linux-npiggin

    * 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin:
    kernel: fix hlist_bl again
    cgroups: Fix a lockdep warning at cgroup removal
    fs: namei fix ->put_link on wrong inode in do_filp_open

    Linus Torvalds
     
  • cgroup can't use simple_lookup(), since that'd override its desired ->d_op.

    Tested-by: Li Zefan
    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

14 Jan, 2011

2 commits