29 Dec, 2008

3 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next: (25 commits)
    allow stripping of generated symbols under CONFIG_KALLSYMS_ALL
    kbuild: strip generated symbols from *.ko
    kbuild: simplify use of genksyms
    kernel-doc: check for extra kernel-doc notations
    kbuild: add headerdep used to detect inclusion cycles in header files
    kbuild: fix string equality testing in tags.sh
    kbuild: fix make tags/cscope
    kbuild: fix make incompatibility
    kbuild: remove TAR_IGNORE
    setlocalversion: add git-svn support
    setlocalversion: print correct subversion revision
    scripts: improve the decodecode script
    scripts/package: allow custom options to rpm
    genksyms: allow to ignore symbol checksum changes
    genksyms: track symbol checksum changes
    tags and cscope support really belongs in a shell script
    kconfig: fix options to check-lxdialog.sh
    kbuild: gen_init_cpio expands shell variables in file names
    remove bashisms from scripts/extract-ikconfig
    kbuild: teach mkmakfile to be silent
    ...

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
    sched: fix warning in fs/proc/base.c
    schedstat: consolidate per-task cpu runtime stats
    sched: use RCU variant of list traversal in for_each_leaf_rt_rq()
    sched, cpuacct: export percpu cpuacct cgroup stats
    sched, cpuacct: refactoring cpuusage_read / cpuusage_write
    sched: optimize update_curr()
    sched: fix wakeup preemption clock
    sched: add missing arch_update_cpu_topology() call
    sched: let arch_update_cpu_topology indicate if topology changed
    sched: idle_balance() does not call load_balance_newidle()
    sched: fix sd_parent_degenerate on non-numa smp machine
    sched: add uid information to sched_debug for CONFIG_USER_SCHED
    sched: move double_unlock_balance() higher
    sched: update comment for move_task_off_dead_cpu
    sched: fix inconsistency when redistribute per-cpu tg->cfs_rq shares
    sched/rt: removed unneeded defintion
    sched: add hierarchical accounting to cpu accounting controller
    sched: include group statistics in /proc/sched_debug
    sched: rename SCHED_NO_NO_OMIT_FRAME_POINTER => SCHED_OMIT_FRAME_POINTER
    sched: clean up SCHED_CPUMASK_ALLOC
    ...

    Linus Torvalds
     
  • …el/git/tip/linux-2.6-tip

    * 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (241 commits)
    sched, trace: update trace_sched_wakeup()
    tracing/ftrace: don't trace on early stage of a secondary cpu boot, v3
    Revert "x86: disable X86_PTRACE_BTS"
    ring-buffer: prevent false positive warning
    ring-buffer: fix dangling commit race
    ftrace: enable format arguments checking
    x86, bts: memory accounting
    x86, bts: add fork and exit handling
    ftrace: introduce tracing_reset_online_cpus() helper
    tracing: fix warnings in kernel/trace/trace_sched_switch.c
    tracing: fix warning in kernel/trace/trace.c
    tracing/ring-buffer: remove unused ring_buffer size
    trace: fix task state printout
    ftrace: add not to regex on filtering functions
    trace: better use of stack_trace_enabled for boot up code
    trace: add a way to enable or disable the stack tracer
    x86: entry_64 - introduce FTRACE_ frame macro v2
    tracing/ftrace: add the printk-msg-only option
    tracing/ftrace: use preempt_enable_no_resched_notrace in ring_buffer_time_stamp()
    x86, bts: correctly report invalid bts records
    ...

    Fixed up trivial conflict in scripts/recordmcount.pl due to SH bits
    being already partly merged by the SH merge.

    Linus Torvalds
     

25 Dec, 2008

5 commits


24 Dec, 2008

4 commits

  • If cgroup_get_rootdir() fails, free_cg_links() is called in the
    failure path, but tmp_cg_links has not been initialized at that point.

    I introduced this bug in the 2.6.27 merge window.

    Signed-off-by: Li Zefan
    Acked-by: Serge Hallyn
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
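
The failure mode described above can be sketched in user-space C. This is only an illustration under stated assumptions: toy_list, toy_free_links() and toy_get_sb() are hypothetical stand-ins for the kernel's list_head, free_cg_links() and the mount path, and the sketch does not claim to show how the actual patch was written. The point is that the list head is initialised before the first statement that can jump to the cleanup code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled loosely on list_head. */
struct toy_list {
    struct toy_list *next, *prev;
};

static void toy_list_init(struct toy_list *head)
{
    head->next = head;
    head->prev = head;
}

static void toy_list_add(struct toy_list *node, struct toy_list *head)
{
    node->next = head->next;
    node->prev = head;
    head->next->prev = node;
    head->next = node;
}

/*
 * Unlink every entry and report how many were dropped.
 * Walking the list is only safe if the head was initialised first.
 */
static int toy_free_links(struct toy_list *head)
{
    int freed = 0;

    while (head->next != head) {
        struct toy_list *node = head->next;

        head->next = node->next;
        node->next->prev = head;
        freed++;
    }
    return freed;
}

/* Returns 0 on success, -1 on failure; *freed gets the cleanup count. */
static int toy_get_sb(bool rootdir_fails, int *freed)
{
    struct toy_list tmp_links;
    struct toy_list link;
    int ret = 0;

    /* Initialise before anything that can branch to the cleanup label. */
    toy_list_init(&tmp_links);

    if (rootdir_fails) {
        ret = -1;
        goto out;
    }
    toy_list_add(&link, &tmp_links);
out:
    *freed = toy_free_links(&tmp_links);
    return ret;
}
```

Calling toy_get_sb(true, &n) takes the failure path and cleans up an empty but valid list; without the early toy_list_init() call the cleanup loop would follow uninitialised pointers.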
     
  • Remove spurious warning messages that are thrown onto the console during
    cgroup operations.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Sharyathi Nagesh
    Acked-by: Serge E. Hallyn
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sharyathi Nagesh
     
  • Impact: eliminate false WARN_ON message

    If an interrupt goes off after the setting of the local variable
    tail_page and before incrementing the write index of that page,
    the interrupt could push the commit forward to the next page.

    Later a check is made to see if interrupts pushed the buffer around
    the entire ring buffer by comparing the next page to the last committed
    page. This can produce a false positive if the interrupt had pushed
    the commit page forward as stated above.

    Thanks to Jiaying Zhang for finding this race.

    Reported-by: Jiaying Zhang
    Signed-off-by: Steven Rostedt
    Cc:
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • Impact: fix stuck trace-buffers

    If an interrupt comes in during rb_set_commit_to_write and
    pushes the tail page forward at just the right time, the commit
    updates will miss the data added by the interrupt. This will
    cause the commit pointer to stop moving forward.

    Thanks to Jiaying Zhang for finding this race.

    Reported-by: Jiaying Zhang
    Signed-off-by: Steven Rostedt
    Cc:
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     

21 Dec, 2008

1 commit

  • Impact: Prevent kernel crash with posix timer clockid CLOCK_MONOTONIC_RAW

    commit 2d42244ae71d6c7b0884b5664cf2eda30fb2ae68 (clocksource:
    introduce CLOCK_MONOTONIC_RAW) introduced a new clockid, which is only
    available to read out the raw, non-NTP-adjusted system time.

    The above commit did not prevent a posix timer from being created
    with that clockid. The timer_create() syscall succeeds and initializes
    the timer to a nonexistent hrtimer base. When the timer is deleted,
    either by timer_delete() or by the exit() cleanup, the kernel crashes.

    Prevent the creation of timers for CLOCK_MONOTONIC_RAW by setting the
    posix clock function to no_timer_create which returns an error code.

    Reported-and-tested-by: Eric Sesterhenn
    Signed-off-by: Thomas Gleixner
    Acked-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
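
The idea of rejecting timer creation per clock can be sketched as a small dispatch table. Everything here is a toy model: the toy_ names, the table layout and the choice of -EINVAL as the error code are assumptions made for illustration (the commit message only says that no_timer_create returns an error code).

```c
#include <errno.h>
#include <stddef.h>

struct toy_timer {
    int clock_id;
};

/* Per-clock operations; only timer creation is modelled here. */
struct toy_clock {
    int (*timer_create)(struct toy_timer *timer);
};

static int toy_common_timer_create(struct toy_timer *timer)
{
    (void)timer;
    return 0;
}

/* Assumed error code; the source only says "an error code". */
static int toy_no_timer_create(struct toy_timer *timer)
{
    (void)timer;
    return -EINVAL;
}

enum {
    TOY_CLOCK_MONOTONIC,
    TOY_CLOCK_MONOTONIC_RAW,
    TOY_CLOCK_COUNT
};

static const struct toy_clock toy_clocks[TOY_CLOCK_COUNT] = {
    [TOY_CLOCK_MONOTONIC]     = { .timer_create = toy_common_timer_create },
    /* Readable clock, but timers on it are refused up front. */
    [TOY_CLOCK_MONOTONIC_RAW] = { .timer_create = toy_no_timer_create },
};

static int toy_timer_create(int clock_id, struct toy_timer *timer)
{
    int ret;

    if (clock_id < 0 || clock_id >= TOY_CLOCK_COUNT)
        return -EINVAL;

    ret = toy_clocks[clock_id].timer_create(timer);
    if (ret == 0)
        timer->clock_id = clock_id;
    return ret;
}
```

Because the raw clock refuses creation, no timer object ever exists that a later delete or exit cleanup would have to tear down.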
     

20 Dec, 2008

2 commits

  • Impact: introduce new ptrace facility

    Add arch_ptrace_untrace() function that is called when the tracer
    detaches (either voluntarily or when the tracing task dies);
    ptrace_disable() is only called on a voluntary detach.

    Add ptrace_fork() and arch_ptrace_fork(). They are called when a
    traced task is forked.

    Clear DS and BTS related fields on fork.

    Release DS resources and reclaim memory in ptrace_untrace(). This
    releases resources as soon as the tracing task dies. Previously this
    was done only when the traced task died.

    Signed-off-by: Markus Metzger
    Signed-off-by: Ingo Molnar

    Markus Metzger
     
  • Building upon parts of the module stripping patch, this patch
    introduces similar stripping for vmlinux when CONFIG_KALLSYMS_ALL=y.
    Using CONFIG_KALLSYMS_STRIP_GENERATED reduces the overhead of
    CONFIG_KALLSYMS_ALL from 245k/310k to 65k/80k for the (i386/x86-64)
    kernels I tested with.

    The patch also does away with the need to special case the kallsyms-
    internal symbols by making them available even in the first linking
    stage.

    While it is a generated file, the patch includes the changes to
    scripts/genksyms/keywords.c_shipped, as I'm unsure what the procedure
    here is.

    Signed-off-by: Jan Beulich
    Signed-off-by: Sam Ravnborg

    Jan Beulich
     

19 Dec, 2008

4 commits


18 Dec, 2008

6 commits

  • Impact: simplify code

    When we turn on CONFIG_SCHEDSTATS, per-task cpu runtime is accumulated
    twice. Once in task->se.sum_exec_runtime and once in sched_info.cpu_time.
    These two stats are exactly the same.

    Given that task->se.sum_exec_runtime is always accumulated by the core
    scheduler, sched_info can reuse that data instead of duplicating the accounting.

    Signed-off-by: Ken Chen
    Acked-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Ken Chen
     
  • Impact: remove dead code

    struct ring_buffer.size is not set after ring_buffer is initialized
    or resized. It is always 0.

    We can use "buffer->pages * PAGE_SIZE" to get the ring_buffer's size.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Ingo Molnar

    Lai Jiangshan
     
  • Impact: fix occasionally incorrect trace output

    The tracing code has interesting varieties of printing out task state.

    Unfortunately only one of the instances is correct, as it copies the
    code from sched.c:sched_show_task(). The others are plain wrong, as
    they treat the bitfield as an integer offset into the character
    array. Also the size check of the character array is wrong, as it
    includes the trailing \0.

    Use a common state decoder inline which does the Right Thing.

    Signed-off-by: Thomas Gleixner
    Acked-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
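
A minimal user-space sketch of such a state decoder, assuming one letter per state bit with index 0 meaning "running". The letter string and the name toy_task_state_char() are assumptions for illustration, not the kernel's definitions. It shows the two points made above: the bitfield is converted to an index via its lowest set bit, and the bounds check excludes the trailing \0.

```c
#include <stddef.h>

/* Assumed letters: index 0 is "running", index n+1 belongs to bit n. */
static const char toy_state_letters[] = "RSDTtZX";

/* Hypothetical helper: map a task-state bitmask to a display character. */
static char toy_task_state_char(unsigned long state)
{
    size_t idx = 0;

    /* Index is 1 + position of the lowest set bit, or 0 if no bit is set. */
    if (state != 0) {
        idx = 1;
        while ((state & 1UL) == 0) {
            state >>= 1;
            idx++;
        }
    }

    /* sizeof counts the trailing '\0', so subtract 1 in the bounds check. */
    return idx < sizeof(toy_state_letters) - 1 ? toy_state_letters[idx] : '?';
}
```

Using the raw bitfield value as the array offset would, for example, index position 4 for the third state bit instead of position 3, which is the kind of error described above.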
     
  • Impact: enhancement

    Ingo Molnar has asked about a way to remove items from the filter
    lists. Currently, you can only add or replace items. The way
    items are added to the list is through opening one of the list
    files (set_ftrace_filter or set_ftrace_notrace) via append.
    If the file is opened for truncate, the list is cleared.

    echo spin_lock > /debug/tracing/set_ftrace_filter

    The above will replace the list with only spin_lock

    echo spin_lock >> /debug/tracing/set_ftrace_filter

    The above will add spin_lock to the list.

    Now this patch adds:

    echo '!spin_lock' >> /debug/tracing/set_ftrace_filter

    This will remove spin_lock from the list.

    The limited glob features of these lists can also be negated.

    echo '!spin_*' >> /debug/tracing/set_ftrace_filter

    This will remove all functions that start with 'spin_'

    Note:

    echo '!spin_*' > /debug/tracing/set_ftrace_filter

    will simply clear out the list (notice the '>' instead of '>>')

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt
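
The semantics described above (truncate versus append, and '!' for removal) can be modelled with a small user-space sketch. This is not the ftrace implementation: toy_filter, toy_filter_write() and the fixed-size array are hypothetical, and only a trailing '*' glob is modelled.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_FUNCS 16

struct toy_filter {
    const char *names[TOY_MAX_FUNCS];
    int count;
};

/* Pattern is either a literal name or a prefix followed by '*'. */
static bool toy_pattern_matches(const char *pattern, const char *name)
{
    size_t len = strlen(pattern);

    if (len > 0 && pattern[len - 1] == '*')
        return strncmp(pattern, name, len - 1) == 0;
    return strcmp(pattern, name) == 0;
}

static bool toy_filter_contains(const struct toy_filter *f, const char *name)
{
    for (int i = 0; i < f->count; i++)
        if (strcmp(f->names[i], name) == 0)
            return true;
    return false;
}

/*
 * Model one write to the filter file. truncate=true stands for '>',
 * truncate=false for '>>'. 'all' lists every traceable function.
 */
static void toy_filter_write(struct toy_filter *f, const char *const *all,
                             int nall, const char *pattern, bool truncate)
{
    if (truncate)
        f->count = 0;           /* opening for truncate clears the list */

    if (pattern[0] == '!') {    /* negated pattern: remove matches */
        int kept = 0;

        pattern++;
        for (int i = 0; i < f->count; i++)
            if (!toy_pattern_matches(pattern, f->names[i]))
                f->names[kept++] = f->names[i];
        f->count = kept;
        return;
    }

    for (int i = 0; i < nall && f->count < TOY_MAX_FUNCS; i++)
        if (toy_pattern_matches(pattern, all[i]) &&
            !toy_filter_contains(f, all[i]))
            f->names[f->count++] = all[i];
}
```

In this model a negated pattern written with truncate set first empties the list and then has nothing left to remove, which mirrors the note above about '>' simply clearing the list.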
     
  • Impact: clean up

    Andrew Morton suggested using the stack_tracer_enabled variable
    to decide whether or not to start stack tracing on bootup.
    This lets us remove the start_stack_trace variable.

    Reported-by: Andrew Morton
    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • Impact: enhancement to stack tracer

    The stack tracer is currently either on when configured in or
    off when it is not. It cannot be disabled when it is configured in
    (other than by disabling the function tracer that it uses).

    This patch adds a way to enable or disable the stack tracer at
    run time. It defaults off on bootup, but a kernel parameter 'stacktrace'
    has been added to enable it on bootup.

    A new sysctl has been added "kernel.stack_tracer_enabled" to let
    the user enable or disable the stack tracer at run time.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     

17 Dec, 2008

3 commits

  • Impact: display ftrace_printk messages "as is"

    By default, ftrace_printk() messages are printed together with other
    information such as pid, caller, ...
    Sometimes a developer just wants to have the ftrace_printk output left "as is",
    without the other information.

    This is done by providing a default-off option called printk-msg-only.
    To enable it, just do `echo printk-msg-only > /debugfs/tracing/trace_options`

    Before the patch:

    -2739 [000] 145.692153: __might_sleep: I'm an ftrace_printk msg in __might_sleep
    -2739 [000] 145.692155: __might_sleep: I'm another ftrace_printk msg in __might_sleep

    After the patch and the printk-msg-only option enabled:

    I'm an ftrace_printk msg in __might_sleep
    I'm another ftrace_printk msg in __might_sleep

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Impact: prevent a trace recursion

    After some tests with the function graph tracer under x86-32, I saw some recursions
    caused by ring_buffer_time_stamp(), which calls preempt_enable_notrace(), which
    calls preempt_schedule(), which is itself traced.

    This patch re-enables preemption without rescheduling.

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Impact: fix potential of rare crash

    for_each_leaf_rt_rq() walks an RCU protected list (rq->leaf_rt_rq_list),
    but doesn't use list_for_each_entry_rcu(). Fix this.

    Signed-off-by: Bharata B Rao
    Cc: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Bharata B Rao
     

16 Dec, 2008

6 commits

  • This patch exports per-cpu CPU cycle usage for a given cpuacct cgroup.
    There is a need for a user space monitor daemon to track group CPU
    usage on a per-cpu basis. It is also useful for monitoring CFS load
    balancer behavior by tracking per-CPU group usage.

    Signed-off-by: Ken Chen
    Reviewed-by: Li Zefan
    Reviewed-by: Andrew Morton
    Signed-off-by: Ingo Molnar

    Ken Chen
     
  • Impact: micro-optimize the code on 64-bit architectures

    In the thread regarding 'export percpu cpuacct cgroup stats'
    http://lkml.org/lkml/2008/12/7/13

    akpm pointed out that the current cpuacct code is inefficient. This patch
    refactors the following:

    * take the cpu_rq lock only on 32-bit
    * change iterator to each_present_cpu instead of each_possible_cpu to
      make it hotplug friendly.

    It's a bit of code churn, but I was rewarded with 160 byte code size saving
    on x86-64 arch and zero code size change on i386.

    Signed-off-by: Ken Chen
    Cc: Paul Menage
    Cc: Li Zefan
    Signed-off-by: Ingo Molnar

    Ken Chen
     
  • …cer' and 'tracing/hw-branch-tracing' into tracing/core

    Ingo Molnar
     
  • Impact: micro-optimization

    Skip the hard work when there is none.

    Signed-off-by: Peter Zijlstra
    Acked-by: Mike Galbraith
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Impact: sharpen the wakeup-granularity to always be against current scheduler time

    It was possible to do the preemption check against an old time stamp.

    Signed-off-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     
  • When a cgroup is removed, it's unlinked from its parent's children list,
    but not actually freed until the last dentry on it is released (at which
    point cgrp->root->number_of_cgroups is decremented).

    Currently rebind_subsystems checks for the top cgroup's child list being
    empty in order to rebind subsystems into or out of a hierarchy - this can
    result in the set of subsystems bound to a hierarchy being changed while
    the hierarchy still contains a removed-but-not-freed cgroup.

    The simplest fix for this is to forbid remounts that change the set of
    subsystems on a hierarchy that has removed-but-not-freed cgroups. This
    bug can be reproduced via:

    mkdir /mnt/cg
    mount -t cgroup -o ns,freezer cgroup /mnt/cg
    mkdir /mnt/cg/foo
    sleep 1h < /mnt/cg/foo &
    rmdir /mnt/cg/foo
    mount -t cgroup -o remount,ns,devices,freezer cgroup /mnt/cg
    kill $!

    The above will cause an oops only in -mm, not in mainline, but the bug
    can cause a memory leak in mainline (and even an oops).

    Signed-off-by: Paul Menage
    Reviewed-by: Li Zefan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
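
The difference between the two checks can be sketched as follows. The structure and function names are hypothetical, and returning -EBUSY is an assumption; the sketch only shows why counting cgroups catches a removed-but-not-freed cgroup that an empty child list misses.

```c
#include <errno.h>

/* Toy model of a hierarchy root; not the kernel's cgroupfs_root. */
struct toy_root {
    int number_of_cgroups;     /* top cgroup plus any not yet freed */
    int top_children;          /* entries still linked under the top cgroup */
    unsigned long subsys_bits; /* currently bound subsystems */
};

/* Old-style check: only looks at the top cgroup's child list. */
static int toy_rebind_old(struct toy_root *root, unsigned long new_bits)
{
    if (root->top_children != 0)
        return -EBUSY;
    root->subsys_bits = new_bits;
    return 0;
}

/*
 * Stricter check: refuse to change the subsystem set while any cgroup
 * other than the top one still exists, even if already unlinked.
 */
static int toy_rebind_fixed(struct toy_root *root, unsigned long new_bits)
{
    if (new_bits != root->subsys_bits && root->number_of_cgroups > 1)
        return -EBUSY;
    root->subsys_bits = new_bits;
    return 0;
}
```

After the rmdir in the reproducer above, the toy state would be top_children == 0 with number_of_cgroups == 2: the old check lets the remount through, the stricter one rejects it.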
     

15 Dec, 2008

1 commit

  • This reverts commit 5b7dba4ff834259a5623e03a565748704a8fe449, which
    caused a regression in hibernate, reported by and bisected by Fabio
    Comolli.

    This revert fixes

    http://bugzilla.kernel.org/show_bug.cgi?id=12155
    http://bugzilla.kernel.org/show_bug.cgi?id=12149

    Bisected-by: Fabio Comolli
    Requested-by: Rafael J. Wysocki
    Acked-by: Dave Kleikamp
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

12 Dec, 2008

5 commits

  • arch_reinit_sched_domains() used to call arch_update_cpu_topology()
    via arch_init_sched_domains(). This call got lost with
    e761b7725234276a802322549cee5255305a0930 ("cpu hotplug, sched: Introduce
    cpu_active_map and redo sched domain managment (take 2)").

    So we might end up with outdated and missing cpus in the cpu core
    maps (architectures used to call arch_reinit_sched_domains if the cpu
    topology changed).

    This adds a call to arch_update_cpu_topology in partition_sched_domains,
    which gets called whenever scheduling domains get updated, which is
    what is supposed to happen when the cpu topology changes.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Ingo Molnar

    Heiko Carstens
     
  • Change arch_update_cpu_topology so it returns 1 if the cpu topology changed
    and 0 if it didn't change. This will be useful for the next patch which adds
    a call to this function in partition_sched_domains.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Ingo Molnar

    Heiko Carstens
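
A toy sketch of the "return 1 if changed, 0 otherwise" contract, and of one plausible way a caller might use it. The toy_ names and the rebuild policy in the caller are assumptions for illustration; the commit message does not describe how partition_sched_domains consumes the return value.

```c
/* Toy topology state: the map hardware reports vs. the map last published. */
struct toy_topology {
    int current_map;
    int published_map;
};

/* Returns 1 if the published topology had to be updated, 0 otherwise. */
static int toy_arch_update_cpu_topology(struct toy_topology *topo)
{
    if (topo->published_map == topo->current_map)
        return 0;
    topo->published_map = topo->current_map;
    return 1;
}

/*
 * Assumed caller policy: when the topology changed, rebuild every domain;
 * otherwise rebuild only the domains whose cpu set changed.
 * Returns the number of domains rebuilt.
 */
static int toy_partition_sched_domains(struct toy_topology *topo,
                                       int ndoms, int ndoms_changed)
{
    int topology_changed = toy_arch_update_cpu_topology(topo);

    return topology_changed ? ndoms : ndoms_changed;
}
```

A second call with an unchanged map returns 0 from the update function, so the caller falls back to the cheaper partial rebuild.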
     
  • The trace point only caught one of many places where a task changes cpu;
    put it in the right place so we get all of them.

    Change the signature while we're at it.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Impact: make hardirq calls more obvious in the output

    When a hardirq is triggered inside the code flow on output, we now
    have two arrows that indicate the entry and return of the hardirq.

    0) | bit_waitqueue() {
    0) 0.880 us | __phys_addr();
    0) 2.699 us | }
    0) | __wake_up_bit() {
    0) ==========> | smp_apic_timer_interrupt() {
    0) 0.797 us | native_apic_mem_write();
    0) 0.715 us | exit_idle();
    0) | irq_enter() {
    0) 0.722 us | idle_cpu();
    0) 5.519 us | }
    0) | hrtimer_interrupt() {
    0) | ktime_get() {
    0) | ktime_get_ts() {
    0) 0.805 us | getnstimeofday();

    [...]

    0) ! 108.528 us | }
    0) | irq_exit() {
    0) | do_softirq() {
    0) | __do_softirq() {
    0) 0.895 us | __local_bh_disable();
    0) | run_timer_softirq() {
    0) 0.827 us | hrtimer_run_pending();
    0) 1.226 us | _spin_lock_irq();
    0) | _spin_unlock_irq() {
    0) 6.550 us | }
    0) 0.924 us | _local_bh_enable();
    0) + 12.129 us | }
    0) + 13.911 us | }
    0) 0.707 us | idle_cpu();
    0) + 17.009 us | }
    0) ! 137.419 us | }
    0) <========== |

    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Ingo Molnar