28 Jul, 2009

2 commits

  • Commit 63706172f332fd3f6e7458ebfb35fa6de9c21dc5 ("kthreads: rework
    kthread_stop()") removed the limitation that the thread function must
    not call do_exit() itself, but forgot to update the comment.

    Since that commit it is OK to use kthread_stop() even if kthread can
    exit itself.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Rusty Russell
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • The check_modstruct_version() needs to look up the symbol "module_layout"
    in the kernel, but it does so literally and not by a C identifier. The
    trouble is that it does not include a symbol prefix for those ports that
    need it (like the Blackfin and H8300 port). So make sure we tack on the
    MODULE_SYMBOL_PREFIX define to the front of it.

    Signed-off-by: Mike Frysinger
    Signed-off-by: Rusty Russell
    Signed-off-by: Linus Torvalds

    Mike Frysinger
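
The prefixing described above can be illustrated outside the kernel. This is only a sketch: the "_" prefix value, the demo_symtab table, and the demo_* helpers are assumptions made for illustration, not the kernel's own code. It relies on adjacent C string literals being concatenated at compile time.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: a port whose toolchain prepends "_" to C symbols.
 * On ports without a prefix this would be the empty string. */
#define MODULE_SYMBOL_PREFIX "_"

/* Hypothetical stand-in for the kernel's exported symbol table. */
static const char *const demo_symtab[] = { "_module_layout", "_printk" };

/* Returns 1 if the literal name is present in demo_symtab, else 0. */
static int demo_find_symbol(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(demo_symtab) / sizeof(demo_symtab[0]); i++)
        if (strcmp(demo_symtab[i], name) == 0)
            return 1;
    return 0;
}

/* Adjacent string literals are joined by the compiler, so the lookup
 * string becomes "_module_layout" on this hypothetical port. */
static int demo_find_module_layout(void)
{
    return demo_find_symbol(MODULE_SYMBOL_PREFIX "module_layout");
}
```

A literal lookup of "module_layout" misses in this table, while the prefixed lookup succeeds.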
     

23 Jul, 2009

12 commits

  • Since genirq: Delegate irq affinity setting to the irq thread
    (591d2fb02ea80472d846c0b8507007806bdd69cc) compilation with
    CONFIG_SMP=n fails with the following error:

    /usr/src/linux-2.6/kernel/irq/manage.c:
    In function 'irq_thread_check_affinity':
    /usr/src/linux-2.6/kernel/irq/manage.c:475:
    error: 'struct irq_desc' has no member named 'affinity'
    make[4]: *** [kernel/irq/manage.o] Error 1

    That commit adds a new function irq_thread_check_affinity() which
    uses struct irq_desc.affinity which is only available for CONFIG_SMP=y.
    Move that function under #ifdef CONFIG_SMP.

    [ tglx@brownpaperbag: compile and boot tested on UP and SMP ]

    Signed-off-by: Bruno Premont
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Bruno Premont
     
  • …nel/git/peterz/linux-2.6-perf

    * 'perf-counters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-perf: (31 commits)
    perf_counter tools: Give perf top inherit option
    perf_counter tools: Fix vmlinux symbol generation breakage
    perf_counter: Detect debugfs location
    perf_counter: Add tracepoint support to perf list, perf stat
    perf symbol: C++ demangling
    perf: avoid structure size confusion by using a fixed size
    perf_counter: Fix throttle/unthrottle event logging
    perf_counter: Improve perf stat and perf record option parsing
    perf_counter: PERF_SAMPLE_ID and inherited counters
    perf_counter: Plug more stack leaks
    perf: Fix stack data leak
    perf_counter: Remove unused variables
    perf_counter: Make call graph option consistent
    perf_counter: Add perf record option to log addresses
    perf_counter: Log vfork as a fork event
    perf_counter: Synthesize VDSO mmap event
    perf_counter: Make sure we dont leak kernel memory to userspace
    perf_counter tools: Fix index boundary check
    perf_counter: Fix the tracepoint channel to perfcounters
    perf_counter, x86: Extend perf_counter Pentium M support
    ...

    Linus Torvalds
     
  • …el/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    softirq: introduce tasklet_hrtimer infrastructure

    Linus Torvalds
     
  • …el/git/tip/linux-2.6-tip

    * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    clocksource: Prevent NULL pointer dereference
    timer: Avoid reading uninitialized data

    Linus Torvalds
     
  • …git/tip/linux-2.6-tip

    * 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    genirq: Delegate irq affinity setting to the irq thread

    Linus Torvalds
     
  • …l/git/tip/linux-2.6-tip

    * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    sched: fix nr_uninterruptible accounting of frozen tasks really
    sched: fix load average accounting vs. cpu hotplug
    sched: Account for vruntime wrapping

    Linus Torvalds
     
  • the "reserved" field was not initialized to zero, resulting in 4 bytes
    of stack data leaking to userspace....

    Signed-off-by: Arjan van de Ven
    Acked-by: Peter Zijlstra
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
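
A minimal user-space sketch of this class of fix, assuming a hypothetical record layout (struct demo_event and demo_fill_event are invented here; only the "reserved" field name comes from the commit text). Zeroing the whole structure before filling it means no field is left holding old stack contents.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical record that would be copied out to userspace. */
struct demo_event {
    uint32_t pid;
    uint32_t reserved;  /* must never carry stale stack bytes */
};

/* Zero the whole record first, then set the fields that are used.
 * Anything not explicitly assigned is left as zero. */
static void demo_fill_event(struct demo_event *ev, uint32_t pid)
{
    memset(ev, 0, sizeof(*ev));
    ev->pid = pid;
}
```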
     
  • Right now we only print PERF_EVENT_THROTTLE + 1 (ie PERF_EVENT_UNTHROTTLE).
    Fix this to print both a throttle and unthrottle event.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Peter Zijlstra
    LKML-Reference:

    Anton Blanchard
     
  • Anton noted that for inherited counters the counter-id as provided by
    PERF_SAMPLE_ID isn't mappable to the id found through PERF_RECORD_ID
    because each inherited counter gets its own id.

    His suggestion was to always return the parent counter id, since that
    is the primary counter id as exposed. However, these inherited
    counters have a unique identifier so that events like
    PERF_EVENT_PERIOD and PERF_EVENT_THROTTLE can be specific about which
    counter gets modified, which is important when trying to normalize the
    sample streams.

    This patch removes PERF_EVENT_PERIOD in favour of PERF_SAMPLE_PERIOD,
    which is more useful anyway, since changing periods became a lot more
    common than initially thought -- rendering PERF_EVENT_PERIOD the less
    useful solution (also, PERF_SAMPLE_PERIOD reports the more accurate
    value, since it reports the value used to trigger the overflow,
    whereas PERF_EVENT_PERIOD simply reports the requested period changed,
    which might only take effect on the next cycle).

    This still leaves us PERF_EVENT_THROTTLE to consider, but since that
    _should_ be a rare occurrence, and linking it to a primary id is the
    most useful bit to diagnose the problem, we introduce a
    PERF_SAMPLE_STREAM_ID, for those few cases where the full
    reconstruction is important.

    [Does change the ABI a little, but I see no other way out]

    Suggested-by: Anton Blanchard
    Signed-off-by: Peter Zijlstra
    LKML-Reference:

    Peter Zijlstra
     
  • Per example of Arjan's patch, I went through and found a few more.

    Signed-off-by: Peter Zijlstra

    Peter Zijlstra
     
  • the "reserved" field was not initialized to zero, resulting in 4 bytes
    of stack data leaking to userspace....

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Peter Zijlstra

    Arjan van de Ven
     
  • Peter Zijlstra
     

22 Jul, 2009

1 commit

  • commit ca109491f (hrtimer: removing all ur callback modes) moved all
    hrtimer callbacks into hard interrupt context when high resolution
    timers are active. That breaks code which relied on the assumption
    that the callback happens in softirq context.

    Provide a generic infrastructure which combines tasklets and hrtimers
    together to provide an in-softirq hrtimer experience.

    Signed-off-by: Peter Zijlstra
    Cc: torvalds@linux-foundation.org
    Cc: kaber@trash.net
    Cc: David Miller
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Peter Zijlstra
     

21 Jul, 2009

1 commit

  • irq_set_thread_affinity() calls set_cpus_allowed_ptr() which might
    sleep, but irq_set_thread_affinity() is called with desc->lock held
    and can be called from hard interrupt context as well. The code has
    another bug as it does not hold a ref on the task struct as required
    by set_cpus_allowed_ptr().

    Just set the IRQTF_AFFINITY bit in action->thread_flags. The next time
    the thread runs it migrates itself. Solves all of the above problems
    nicely.

    Add kerneldoc to irq_set_thread_affinity() while at it.

    Signed-off-by: Thomas Gleixner
    LKML-Reference:

    Thomas Gleixner
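
The deferred approach can be sketched in user space as follows. All names here (DEMO_AFFINITY_BIT, struct demo_action, the demo_* functions) are hypothetical stand-ins, and the sketch uses C11 atomics rather than the kernel's bit operations. The caller only records the request; the thread applies it later from a context where migrating is allowed.

```c
#include <stdatomic.h>

#define DEMO_AFFINITY_BIT 0x1u  /* hypothetical stand-in for IRQTF_AFFINITY */

struct demo_action {
    atomic_uint thread_flags;
    int requested_cpu;  /* CPU the interrupt is now routed to */
    int thread_cpu;     /* CPU the handler thread currently uses */
};

/* Caller side: may run in atomic context, so it only sets a flag.
 * Real code would also need to protect requested_cpu; this sketch
 * is single-threaded. */
static void demo_set_thread_affinity(struct demo_action *a, int cpu)
{
    a->requested_cpu = cpu;
    atomic_fetch_or(&a->thread_flags, DEMO_AFFINITY_BIT);
}

/* Thread side: clears the flag and, if it was set, "migrates" itself. */
static void demo_thread_check_affinity(struct demo_action *a)
{
    unsigned int old = atomic_fetch_and(&a->thread_flags, ~DEMO_AFFINITY_BIT);

    if (old & DEMO_AFFINITY_BIT)
        a->thread_cpu = a->requested_cpu;
}
```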
     

19 Jul, 2009

2 commits


18 Jul, 2009

5 commits

  • commit e3c8ca8336 (sched: do not count frozen tasks toward load) broke
    the nr_uninterruptible accounting on freeze/thaw. On freeze the task
    is excluded from accounting with a check for (task->flags &
    PF_FROZEN), but that flag is cleared before the task is thawed. So
    while we prevent the task with state TASK_UNINTERRUPTIBLE from being
    accounted to nr_uninterruptible on freeze, we still decrement
    nr_uninterruptible on thaw.

    Use a separate flag which is handled by the freezing task itself. Set
    it before calling the scheduler with TASK_UNINTERRUPTIBLE state and
    clear it after we return from frozen state.

    Cc:
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • The new load average code clears rq->calc_load_active on
    CPU_ONLINE. That's wrong as the newly onlined CPU might have got a
    scheduler tick already and accounted the delta to the stale value of
    the time we offlined the CPU.

    Clear the value when we cleanup the dead CPU instead.

    Also move the update of the calc_load_update time for the newly online
    CPU to CPU_UP_PREPARE to avoid that the CPU plays catch up with the
    stale update time value.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Right now we don't output vfork events. Even though we should
    always see an exec after a vfork, we may get perfcounter
    samples between the vfork and exec. These samples can lead to
    some confusion when parsing perfcounter data.

    To keep things consistent we should always log a fork event. It
    will result in a little more log data, but is less confusing to
    trace parsing tools.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Anton Blanchard
     
  • There are a few places we are leaking tiny amounts of kernel
    memory to userspace. This happens when writing out strings
    because we always align the end to 64 bits.

    To avoid this we should always use an appropriately sized
    temporary buffer and ensure it is zeroed.

    Since d_path assembles the string from the end of the buffer
    backwards, we need to add 64 bits after the buffer to allow for
    alignment.

    We also need to copy arch_vma_name to the temporary buffer,
    because if we use it directly we may end up copying to
    userspace a number of bytes after the end of the string
    constant.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Anton Blanchard
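
One way to picture the padding problem is a helper that copies a string into a caller-supplied buffer and zero-fills up to the next 64-bit boundary. This is a user-space sketch under assumptions: DEMO_ALIGN8 and demo_copy_padded are invented names, and it does not model d_path filling the buffer from the end.

```c
#include <stddef.h>
#include <string.h>

/* Round n up to the next multiple of 8 bytes (64 bits). */
#define DEMO_ALIGN8(n) (((n) + 7u) & ~(size_t)7u)

/* Copies src, including its NUL, into dst and zero-fills the padding
 * up to the next 8-byte boundary. Returns the padded length, or 0 if
 * dst is too small to hold it. */
static size_t demo_copy_padded(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src) + 1;
    size_t padded = DEMO_ALIGN8(len);

    if (padded > dst_size)
        return 0;
    memcpy(dst, src, len);
    memset(dst + len, 0, padded - len);
    return padded;
}
```

Only the bytes inside the returned length are meant to be written out, and all of them are either string data or explicit zeroes.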
     
  • I spotted two sites that didn't take vruntime wrap-around into
    account. Fix these by creating a comparison helper that does do
    so.

    Signed-off-by: Fabio Checconi
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Fabio Checconi
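
A wrap-safe comparison of this kind is commonly written by subtracting in unsigned arithmetic and testing the sign of the result. The helper name below is hypothetical, and the sketch assumes the usual two's-complement conversion from uint64_t to int64_t.

```c
#include <stdint.h>

/* Returns nonzero if a is "before" b when the 64-bit counter is
 * treated as circular. Valid while the two values are less than
 * 2^63 apart. */
static int demo_vruntime_before(uint64_t a, uint64_t b)
{
    return (int64_t)(a - b) < 0;
}
```

A plain `a < b` gives the wrong answer once one value has wrapped past zero and the other has not; the signed difference does not.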
     

17 Jul, 2009

2 commits

  • ftrace_trace_onoff_callback() will return an error even if we do the
    right operation, for example:

    # echo _spin_*:traceon:10 > set_ftrace_filter
    -bash: echo: write error: Invalid argument
    # cat set_ftrace_filter
    #### all functions enabled ####
    _spin_trylock_bh:traceon:count=10
    _spin_unlock_irq:traceon:count=10
    _spin_unlock_bh:traceon:count=10
    _spin_lock_irq:traceon:count=10
    _spin_unlock:traceon:count=10
    _spin_trylock:traceon:count=10
    _spin_unlock_irqrestore:traceon:count=10
    _spin_lock_irqsave:traceon:count=10
    _spin_lock_bh:traceon:count=10
    _spin_lock:traceon:count=10

    We want to write _spin_*:traceon:10 to set_ftrace_filter. The write
    complains with "Invalid argument", but the operation is successful.

    This is because ftrace_process_regex() returns the number of functions that
    matched the pattern. If the number is not 0, this value is returned
    by ftrace_regex_write() whereas we want to return the number of bytes
    virtually written.
    Also the file offset pointer is not updated in this case.

    If the number of matched functions is lower than the number of bytes written
    by the user, this results in a reprocessing of the string given by the user with
    a lower size, leading to a malformed ftrace regex and then a -EINVAL returned.

    So, this patch fixes it by returning 0 if no error occurred.
    The fix also applies to 2.6.30.

    Signed-off-by: Xiao Guangrong
    Reviewed-by: Li Zefan
    Cc: stable@kernel.org
    Signed-off-by: Frederic Weisbecker

    Xiao Guangrong
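
The return-value mix-up can be modelled with a small sketch. Every name below is hypothetical, and the match count of 10 is simply taken from the example output above. The write handler passes any nonzero value from the processing step straight back, so a positive match count is misread by the caller as a short write.

```c
#include <stddef.h>

/* Buggy shape: reports how many functions matched (10 in this sketch). */
static int demo_process_buggy(const char *buf, size_t len)
{
    (void)buf;
    (void)len;
    return 10;
}

/* Fixed shape: negative means error, anything else collapses to 0. */
static int demo_process_fixed(const char *buf, size_t len)
{
    int matched = demo_process_buggy(buf, len);

    return matched < 0 ? matched : 0;
}

/* Write handler: nonzero from the processing step is returned as is,
 * otherwise the full byte count is reported as consumed. */
static long demo_regex_write(const char *buf, size_t len,
                             int (*process)(const char *, size_t))
{
    int ret = process(buf, len);

    if (ret)
        return ret;
    return (long)len;
}
```

With the buggy callback a 19-byte write appears to consume only 10 bytes, so the caller retries with the leftover tail of the string.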
     
  • …l/git/peterz/linux-2.6-sched

    * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-sched:
    sched: Fix bug in SCHED_IDLE interaction with group scheduling
    sched: Fix rt_rq->pushable_tasks initialization in init_rt_rq()
    sched: Reset sched stats on fork()
    sched_rt: Fix overload bug on rt group scheduling
    sched: Documentation/sched-rt-group: Fix style issues & bump version

    Linus Torvalds
     

15 Jul, 2009

2 commits


13 Jul, 2009

4 commits

  • The per cpu variable stat is freed if we fail to allocate a name
    on start up. This was due to stat at first being allocated in the
    initial design. But since then, it has become a static per cpu variable
    but the free on error was not removed.

    Also added __init annotation to the function that this is in.

    [ Impact: prevent possible memory corruption on low mem at boot up ]

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • Fix a missed rename in EVENT_PROFILE support so that it gets
    built and allows tracepoint tracing from the 'perf' tool.

    Fix a typo in the (never before built & enabled) portion in
    perf_counter.c as well, and update that code to the
    attr.config changes as well.

    Signed-off-by: Chris Wilson
    Cc: Ben Gamari
    Cc: Jason Baron
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Chris Wilson
     
  • * 'kmemleak' of git://linux-arm.org/linux-2.6:
    kmemleak: Remove alloc_bootmem annotations introduced in the past
    kmemleak: Add callbacks to the bootmem allocator
    kmemleak: Allow partial freeing of memory blocks
    kmemleak: Trace the kmalloc_large* functions in slub
    kmemleak: Scan objects allocated during a scanning episode
    kmemleak: Do not acquire scan_mutex in kmemleak_open()
    kmemleak: Remove the reported leaks number limitation
    kmemleak: Add more cond_resched() calls in the scanning thread
    kmemleak: Renice the scanning thread to +10

    Linus Torvalds
     
  • * Remove smp_lock.h from files which don't need it (including some headers!)
    * Add smp_lock.h to files which do need it
    * Make smp_lock.h include conditional in hardirq.h
    It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT

    This will make hardirq.h inclusion cheaper for every PREEMPT=n config
    (which includes allmodconfig/allyesconfig, BTW)

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

11 Jul, 2009

5 commits

  • get_futex_key() can infinitely loop if it is called on a
    virtual address that is within a huge page but not aligned to
    the beginning of that page. The call to get_user_pages_fast
    will return the struct page for a sub-page within the huge page
    and the check for page->mapping will always fail.

    The fix is to call compound_head on the page before checking
    that it's mapped.

    Signed-off-by: Sonny Rao
    Acked-by: Thomas Gleixner
    Cc: stable@kernel.org
    Cc: anton@samba.org
    Cc: rajamony@us.ibm.com
    Cc: speight@us.ibm.com
    Cc: mstephen@us.ibm.com
    Cc: grimm@us.ibm.com
    Cc: mikey@ozlabs.au.ibm.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Sonny Rao
     
  • One of the isolation modifications for SCHED_IDLE is the
    unitization of sleeper credit. However the check for this
    assumes that the sched_entity we're placing always belongs to a
    task.

    This is potentially not true with group scheduling and leaves
    us rummaging randomly when we try to pull the policy.

    Signed-off-by: Paul Turner
    Cc: peterz@infradead.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul Turner
     
  • …el/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    dma-debug: Fix the overlap() function to be correct and readable
    oprofile: reset bt_lost_no_mapping with other stats
    x86/oprofile: rename kernel parameter for architectural perfmon to arch_perfmon
    signals: declare sys_rt_tgsigqueueinfo in syscalls.h
    rcu: Mark Hierarchical RCU no longer experimental
    dma-debug: Put all hash-chain locks into the same lock class
    dma-debug: fix off-by-one error in overlap function

    Linus Torvalds
     
  • Optimize cond_resched() by removing one conditional.

    Currently cond_resched() checks system_state ==
    SYSTEM_RUNNING in order to avoid scheduling before the
    scheduler is running.

    We can however, as per suggestion of Matt, use
    PREEMPT_ACTIVE to accomplish that very same.

    Suggested-by: Matt Mackall
    Signed-off-by: Peter Zijlstra
    Acked-by: Matt Mackall
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • …nel/git/tip/linux-2.6-tip

    * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    tracing: Fix trace_print_seq()
    kprobes: No need to unlock kprobe_insn_mutex
    tracing/fastboot: Document the need of initcall_debug
    trace_export: Repair missed fields
    tracing: Fix stack tracer sysctl handling

    Linus Torvalds
     

10 Jul, 2009

4 commits

  • The timer migration expiry check should prevent the migration of a
    timer to another CPU when the timer expires before the next event is
    scheduled on the other CPU. Migrating the timer might delay it because
    we can not reprogram the clock event device on the other CPU. But the
    code implementing that check has two flaws:

    - for !HIGHRES the check compares the expiry value with the clock
    events device expiry value which is wrong for CLOCK_REALTIME based
    timers.

    - the check is racy. It holds the hrtimer base lock of the target CPU,
    but the clock event device expiry value can be modified
    nevertheless, e.g. by a timer interrupt firing.

    The !HIGHRES case is easy to fix as we can enqueue the timer on the
    cpu which was selected by the load balancer. It runs the idle
    balancing code once per jiffy anyway. So the maximum delay for the
    timer is the same as when we keep the tick on the current cpu going.

    In the HIGHRES case we can get the next expiry value from the hrtimer
    cpu_base of the target CPU and serialize the update with the cpu_base
    lock. This moves the lock section in hrtimer_interrupt() so we can set
    next_event to KTIME_MAX while we are handling the expired timers and
    set it to the next expiry value after we handled the timers under the
    base lock. While the expired timers are processed timer migration is
    blocked because the expiry time of the timer is always

    Thomas Gleixner
     
  • The timer migration code needs to check whether the expiry time of the
    timer is before the programmed clock event expiry time when the timer
    is enqueued on another CPU because we can not reprogram the timer
    device on the other CPU. The current logic checks the expiry time even
    if we enqueue on the current CPU when nohz_get_load_balancer() returns
    current CPU. This might lead to an endless loop in the expiry check
    code when the expiry time of the timer is before the current
    programmed next event.

    Check whether nohz_get_load_balancer() returns current CPU and skip
    the expiry check if this is the case.

    The bug was triggered from the networking code. The patch fixes the
    regression http://bugzilla.kernel.org/show_bug.cgi?id=13738
    (Soft-Lockup/Race in networking in 2.6.31-rc1+195)

    Cc: Arun Bharadwaj
    Tested-by: Andres Freund
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
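
The ordering of the two checks can be sketched as a pure function. The function name and all four parameters are hypothetical stand-ins for state the kernel reads from per-CPU structures, and the sketch shows only the decision order, not the loop that the bug caused.

```c
#include <stdint.h>

/* Picks the CPU on which to enqueue a timer. */
static int demo_pick_timer_cpu(int this_cpu, int balancer_cpu,
                               uint64_t timer_expiry,
                               uint64_t target_next_event)
{
    /* Enqueueing on the current CPU never needs the remote expiry check. */
    if (balancer_cpu == this_cpu)
        return this_cpu;

    /* The remote clock event device cannot be reprogrammed from here,
     * so a timer that expires before the remote CPU's next event
     * stays on the current CPU. */
    if (timer_expiry < target_next_event)
        return this_cpu;

    return balancer_cpu;
}
```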
     
  • init_rt_rq() initializes only rq->rt.pushable_tasks, and not the
    pushable_tasks field of the passed rt_rq. The plist is not used
    uninitialized since the only pushable_tasks plists used are the
    ones of root rt_rqs; anyway reinitializing the list on every group
    creation corrupts the root plist, losing its previous contents.

    Signed-off-by: Fabio Checconi
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    CC: Gregory Haskins
    Signed-off-by: Ingo Molnar

    Fabio Checconi
     
  • The sched_stat fields are currently not reset upon fork.
    Ingo's recent commit 6c594c21fcb02c662f11c97be4d7d2b73060a205
    did reset nr_migrations, but it didn't reset any of the
    others.

    This patch resets all sched_stat fields on fork.

    Signed-off-by: Lucas De Marchi
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Lucas De Marchi