15 Sep, 2009

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1623 commits)
    netxen: update copyright
    netxen: fix tx timeout recovery
    netxen: fix file firmware leak
    netxen: improve pci memory access
    netxen: change firmware write size
    tg3: Fix return ring size breakage
    netxen: build fix for INET=n
    cdc-phonet: autoconfigure Phonet address
    Phonet: back-end for autoconfigured addresses
    Phonet: fix netlink address dump error handling
    ipv6: Add IFA_F_DADFAILED flag
    net: Add DEVTYPE support for Ethernet based devices
    mv643xx_eth.c: remove unused txq_set_wrr()
    ucc_geth: Fix hangs after switching from full to half duplex
    ucc_geth: Rearrange some code to avoid forward declarations
    phy/marvell: Make non-aneg speed/duplex forcing work for 88E1111 PHYs
    drivers/net/phy: introduce missing kfree
    drivers/net/wan: introduce missing kfree
    net: force bridge module(s) to be GPL
    appletalk: Fix skb leak when ipddp interface is not loaded
    ...

    Fixed up trivial conflicts:

    - arch/x86/include/asm/socket.h

    converted in the x86 tree to use the generic header, which has the
    same new #defines, so that works out fine.

    - drivers/net/tun.c

    fix conflict between 89f56d1e9 ("tun: reuse struct sock fields") that
    switched over to using 'tun->socket.sk' instead of the redundantly
    available (and thus removed) 'tun->sk', and 2b980dbd ("lsm: Add hooks
    to the TUN driver") which added a new 'tun->sk' use.

    Noted in 'next' by Stephen Rothwell.

    Linus Torvalds
     

12 Sep, 2009

12 commits

  • * 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (105 commits)
    ring-buffer: only enable ring_buffer_swap_cpu when needed
    ring-buffer: check for swapped buffers in start of committing
    tracing: report error in trace if we fail to swap latency buffer
    tracing: add trace_array_printk for internal tracers to use
    tracing: pass around ring buffer instead of tracer
    tracing: make tracing_reset safe for external use
    tracing: use timestamp to determine start of latency traces
    tracing: Remove mentioning of legacy latency_trace file from documentation
    tracing/filters: Defer pred allocation, fix memory leak
    tracing: remove users of tracing_reset
    tracing: disable buffers and synchronize_sched before resetting
    tracing: disable update max tracer while reading trace
    tracing: print out start and stop in latency traces
    ring-buffer: disable all cpu buffers when one finds a problem
    ring-buffer: do not count discarded events
    ring-buffer: remove ring_buffer_event_discard
    ring-buffer: fix ring_buffer_read crossing pages
    ring-buffer: remove unnecessary cpu_relax
    ring-buffer: do not swap buffers during a commit
    ring-buffer: do not reset while in a commit
    ...

    Linus Torvalds
     
  • * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (64 commits)
    sched: Fix sched::sched_stat_wait tracepoint field
    sched: Disable NEW_FAIR_SLEEPERS for now
    sched: Keep kthreads at default priority
    sched: Re-tune the scheduler latency defaults to decrease worst-case latencies
    sched: Turn off child_runs_first
    sched: Ensure that a child can't gain time over its parent after fork()
    sched: enable SD_WAKE_IDLE
    sched: Deal with low-load in wake_affine()
    sched: Remove short cut from select_task_rq_fair()
    sched: Turn on SD_BALANCE_NEWIDLE
    sched: Clean up topology.h
    sched: Fix dynamic power-balancing crash
    sched: Remove reciprocal for cpu_power
    sched: Try to deal with low capacity, fix update_sd_power_savings_stats()
    sched: Try to deal with low capacity
    sched: Scale down cpu_power due to RT tasks
    sched: Implement dynamic cpu_power
    sched: Add smt_gain
    sched: Update the cpu_power sum during load-balance
    sched: Add SD_PREFER_SIBLING
    ...

    Linus Torvalds
     
  • * 'perfcounters-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (60 commits)
    perf tools: Avoid unnecessary work in directory lookups
    perf stat: Clean up statistics calculations a bit more
    perf stat: More advanced variance computation
    perf stat: Use stddev_mean instead of stddev
    perf stat: Remove the limit on repeat
    perf stat: Change noise calculation to use stddev
    x86, perf_counter, bts: Do not allow kernel BTS tracing for now
    x86, perf_counter, bts: Correct pointer-to-u64 casts
    x86, perf_counter, bts: Fail if BTS is not available
    perf_counter: Fix output-sharing error path
    perf trace: Fix read_string()
    perf trace: Print out in nanoseconds
    perf tools: Seek to the end of the header area
    perf trace: Fix parsing of perf.data
    perf trace: Sample timestamps as well
    perf_counter: Introduce new (non-)paranoia level to allow raw tracepoint access
    perf trace: Sample the CPU too
    perf tools: Work around strict aliasing related warnings
    perf tools: Clean up warnings list in the Makefile
    perf tools: Complete support for dynamic strings
    ...

    Linus Torvalds
     
  • * 'irq-threaded-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    genirq: Do not mask oneshot edge type interrupts
    genirq: Support nested threaded irq handling
    genirq: Add buslock support
    genirq: Add oneshot support

    Linus Torvalds
     
  • * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    pci/intr_remapping: Allocate irq_iommu on node
    irq: Add irq_node() primitive
    irq: Make sure irq_desc for legacy irq get correct node setting
    genirq: Add prototype for handle_nested_irq()
    irq: Remove superfluous NULL pointer check in check_irq_resend()
    irq: Clean up by removing irqfixup MODULE_PARM_DESC()
    genirq: Fix comment describing suspend_device_irqs()
    genirq: Remove obsolete defines and typedefs

    Linus Torvalds
     
  • * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (28 commits)
    rcu: Move end of special early-boot RCU operation earlier
    rcu: Changes from reviews: avoid casts, fix/add warnings, improve comments
    rcu: Create rcutree plugins to handle hotplug CPU for multi-level trees
    rcu: Remove lockdep annotations from RCU's _notrace() API members
    rcu: Add #ifdef to suppress __rcu_offline_cpu() warning in !HOTPLUG_CPU builds
    rcu: Add CPU-offline processing for single-node configurations
    rcu: Add "notrace" to RCU function headers used by ftrace
    rcu: Remove CONFIG_PREEMPT_RCU
    rcu: Merge preemptable-RCU functionality into hierarchical RCU
    rcu: Simplify rcu_pending()/rcu_check_callbacks() API
    rcu: Use debugfs_remove_recursive() to simplify code.
    rcu: Merge per-RCU-flavor initialization into pre-existing macro
    rcu: Fix online/offline indication for rcudata.csv trace file
    rcu: Consolidate sparse and lockdep declarations in include/linux/rcupdate.h
    rcu: Renamings to increase RCU clarity
    rcu: Move private definitions from include/linux/rcutree.h to kernel/rcutree.h
    rcu: Expunge lingering references to CONFIG_CLASSIC_RCU, optimize on !SMP
    rcu: Delay rcu_barrier() wait until beginning of next CPU-hotunplug operation.
    rcu: Fix typo in rcu_irq_exit() comment header
    rcu: Make rcupreempt_trace.c look at offline CPUs
    ...

    Linus Torvalds
     
  • * 'core-printk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    printk: Fix "printk: Enable the use of more than one CON_BOOT (early console)"
    printk: Restore previous console_loglevel when re-enabling logging
    printk: Ensure that "console enabled" messages are printed on the console
    printk: Enable the use of more than one CON_BOOT (early console)

    Linus Torvalds
     
  • * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (32 commits)
    locking, m68k/asm-offsets: Rename signal defines
    locking: Inline spinlock code for all locking variants on s390
    locking: Simplify spinlock inlining
    locking: Allow arch-inlined spinlocks
    locking: Move spinlock function bodies to header file
    locking, m68k: Calculate thread_info offset with asm offset
    locking, m68k/asm-offsets: Rename pt_regs offset defines
    locking, sparc: Rename __spin_try_lock() and friends
    locking, powerpc: Rename __spin_try_lock() and friends
    lockdep: Remove recursion statistics
    lockdep: Simplify lock_stat seqfile code
    lockdep: Simplify lockdep_chains seqfile code
    lockdep: Simplify lockdep seqfile code
    lockdep: Fix missing entries in /proc/lock_chains
    lockdep: Fix missing entry in /proc/lock_stat
    lockdep: Fix memory usage info of BFS
    lockdep: Reintroduce generation count to make BFS faster
    lockdep: Deal with many similar locks
    lockdep: Introduce lockdep_assert_held()
    lockdep: Fix style nits
    ...

    Linus Torvalds
     
  • * 'core-futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    futex: Detect mismatched requeue targets
    futex: Correct futex_wait_requeue_pi() commentary

    Linus Torvalds
     
  • * 'core-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    debug lockups: Improve lockup detection, fix generic arch fallback
    debug lockups: Improve lockup detection

    Linus Torvalds
     
  • * 'core-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    workqueues: Improve schedule_work() documentation

    Linus Torvalds
     
  • * 'writeback' of git://git.kernel.dk/linux-2.6-block:
    writeback: check for registered bdi in flusher add and inode dirty
    writeback: add name to backing_dev_info
    writeback: add some debug inode list counters to bdi stats
    writeback: get rid of pdflush completely
    writeback: switch to per-bdi threads for flushing data
    writeback: move dirty inodes from super_block to backing_dev_info
    writeback: get rid of generic_sync_sb_inodes() export

    Linus Torvalds
     

11 Sep, 2009

4 commits

  • This enables us to track who does what and print info. Its main use
    is catching dirty inodes on the default_backing_dev_info, so we can
    fix that up.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • James Morris
     
  • This weird perf trace output:

    cc1-9943 [001] 2802.059479616: sched_stat_wait: task: as:9944 wait: 2801938766276 [ns]

    Is caused by setting one component field of the delta to zero
    a bit too early. Move it to later.

    ( Note, this does not affect the NEW_FAIR_SLEEPERS interactivity bug,
    it's just a reporting bug in essence. )

    Acked-by: Peter Zijlstra
    Cc: Nikos Chantziaras
    Cc: Jens Axboe
    Cc: Mike Galbraith
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
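    The reported wait of 2801938766276 ns (about 2802 s) is close to the trace timestamp itself, which fits a delta computed against a start time that had already been zeroed. A minimal C sketch of that ordering problem, using a hypothetical struct wait_stats rather than the real scheduler structures:

```c
#include <stdint.h>

/* Hypothetical, simplified wait statistics for one task (not kernel code). */
struct wait_stats {
	uint64_t wait_start; /* clock value when the task started waiting (ns) */
	uint64_t wait_sum;   /* total time spent waiting (ns) */
};

/* Buggy order: wait_start is cleared before the reported delta is
 * computed, so the "wait" handed to the tracepoint is clock - 0. */
static uint64_t wait_end_buggy(struct wait_stats *ws, uint64_t clock)
{
	ws->wait_sum += clock - ws->wait_start;
	ws->wait_start = 0;            /* cleared too early */
	return clock - ws->wait_start; /* reported value */
}

/* Fixed order: compute and report the delta, then clear wait_start. */
static uint64_t wait_end_fixed(struct wait_stats *ws, uint64_t clock)
{
	uint64_t delta = clock - ws->wait_start;

	ws->wait_sum += delta;
	ws->wait_start = 0;            /* cleared only after use */
	return delta;                  /* reported value */
}
```

    For example, with wait_start = 2801938000000 and clock = 2801938766276, the buggy version reports 2801938766276 while the fixed one reports 766276.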
     
  • Nikos Chantziaras and Jens Axboe reported that turning off
    NEW_FAIR_SLEEPERS improves desktop interactivity visibly.

    Nikos described his experiences the following way:

    " With this setting, I can do "nice -n 19 make -j20" and
    still have a very smooth desktop and watch a movie at
    the same time. Various other annoyances (like the
    "logout/shutdown/restart" dialog of KDE not appearing
    at all until the background fade-out effect has finished)
    are also gone. So this seems to be the single most
    important setting that vastly improves desktop behavior,
    at least here. "

    Jens described it the following way, referring to a 10-second
    xmodmap scheduling delay he was trying to debug:

    " Then I tried switching NO_NEW_FAIR_SLEEPERS on, and then
    I get:

    Performance counter stats for 'xmodmap .xmodmap-carl':

    9.009137 task-clock-msecs # 0.447 CPUs
    18 context-switches # 0.002 M/sec
    1 CPU-migrations # 0.000 M/sec
    315 page-faults # 0.035 M/sec

    0.020167093 seconds time elapsed

    Woot! "

    So disable it for now. In perf trace output I can see weird
    delta timestamps:

    cc1-9943 [001] 2802.059479616: sched_stat_wait: task: as:9944 wait: 2801938766276 [ns]

    That nsec field is not supposed to be that large. More digging
    is needed, but let's turn it off while the real bug is found.

    Reported-by: Nikos Chantziaras
    Tested-by: Nikos Chantziaras
    Reported-by: Jens Axboe
    Tested-by: Jens Axboe
    Acked-by: Peter Zijlstra
    Cc: Mike Galbraith
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

09 Sep, 2009

3 commits

    Remove the kthread/workqueue priority boosts; they increase worst-case
    desktop latencies.

    Signed-off-by: Mike Galbraith
    Acked-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     
  • Reduce the latency target from 20 msecs to 5 msecs.

    Why? Larger latencies increase spread, which is good for scaling,
    but bad for worst-case latency.

    We still have the ilog(nr_cpus) rule to scale up on bigger
    server boxes.

    Signed-off-by: Mike Galbraith
    Acked-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mike Galbraith
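    A sketch of the resulting target, assuming the "ilog(nr_cpus) rule" means multiplying the 5 ms base by 1 + floor(log2(nr_cpus)); the real kernel code may also clamp the result or scale other granularity values, and the function names here are hypothetical.

```c
#include <stdint.h>

/* Assumed base latency target after this change: 5 ms. */
#define BASE_LATENCY_NS 5000000ULL

/* Integer floor(log2(n)) for n >= 1. */
static unsigned int ilog2_u32(uint32_t n)
{
	unsigned int r = 0;

	while (n >>= 1)
		r++;
	return r;
}

/* Hypothetical scaling: base * (1 + ilog2(nr_cpus)), no upper clamp. */
static uint64_t scaled_latency_ns(uint32_t nr_cpus)
{
	if (nr_cpus == 0)
		nr_cpus = 1; /* treat a bogus count as a single CPU */
	return BASE_LATENCY_NS * (1 + ilog2_u32(nr_cpus));
}
```

    Under these assumptions, 1 CPU gives 5 ms, 4 CPUs give 15 ms and 8 CPUs give 20 ms, so small desktops get the tighter target while bigger boxes still scale up.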
     
  • Set child_runs_first default to off.

    It hurts 'optimal' make -j workloads as make jobs
    get preempted by child tasks, reducing parallelism.

    Note, this patch might make existing races in user
    applications more prominent than before - so breakages
    might be bisected to this commit.

    Child-runs-first is broken on SMP to begin with, and we
    already had it off briefly in v2.6.23, so most of the
    offenders ought to be fixed. It would be nice not to revert
    this commit, but to finally fix those apps instead.

    Signed-off-by: Mike Galbraith
    Acked-by: Peter Zijlstra
    LKML-Reference:
    [ made the sysctl independent of CONFIG_SCHED_DEBUG, in case
    people want to work around broken apps. ]
    Signed-off-by: Ingo Molnar

    Mike Galbraith
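    Applications that rely on the child running first after fork() should synchronize explicitly instead. A minimal POSIX sketch of one such fix, with a hypothetical function name: the parent waits for a byte that the child writes to a pipe once its setup is done.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if the child signalled readiness and exited cleanly, else -1. */
static int fork_and_wait_for_child_setup(void)
{
	int fds[2];
	char token;
	pid_t pid;
	int status;

	if (pipe(fds) != 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child: perform its setup, then tell the parent it is ready. */
		close(fds[0]);
		/* ... child-side setup would go here ... */
		if (write(fds[1], "r", 1) != 1)
			_exit(1);
		close(fds[1]);
		_exit(0);
	}

	/* Parent: block until the child reports readiness, whoever ran first. */
	close(fds[1]);
	if (read(fds[0], &token, 1) != 1 || token != 'r') {
		close(fds[0]);
		waitpid(pid, NULL, 0);
		return -1;
	}
	close(fds[0]);

	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

    The parent blocks in read() until the child has written its byte, so the outcome no longer depends on which task the scheduler runs first after fork().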
     

08 Sep, 2009

3 commits

  • A fork/exec load is usually "pass the baton", so the child
    should never be placed behind the parent. With START_DEBIT we
    make room for the new task, but with child_runs_first, that
    room comes out of the _parent's_ hide. There's nothing to say
    that the parent wasn't ahead of min_vruntime at fork() time,
    which means that the "baton carrier", who is essentially the
    parent in drag, can gain time and increase scheduling latencies
    for waiters.

    With NEW_FAIR_SLEEPERS + START_DEBIT + child_runs_first
    enabled, we essentially pass the sleeper fairness off to the
    child, which is fine, but if we don't base placement on the
    parent's updated vruntime, we can end up compounding latency
    woes if the child itself then does fork/exec. The debit
    incurred at fork doesn't hurt the parent who is then going to
    sleep and maybe exit, but the child who acquires the error
    harms all comers.

    This improves latencies of make -j kernel build workloads.

    Reported-by: Jens Axboe
    Signed-off-by: Mike Galbraith
    Acked-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mike Galbraith
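    A simplified model of the placement rule the message describes: the child starts from the parent's updated vruntime, is never placed earlier than either the parent or min_vruntime plus the START_DEBIT slice, and with child_runs_first the two values are swapped so the debit comes out of the parent. The names are illustrative; the real scheduler code also handles sleeper credit and other details omitted here.

```c
#include <stdint.h>

/* Hypothetical scheduling entity: only vruntime is modelled. */
struct entity {
	uint64_t vruntime;
};

/* Wrap-safe "a runs before b" comparison on vruntime. */
static int entity_before(const struct entity *a, const struct entity *b)
{
	return (int64_t)(a->vruntime - b->vruntime) < 0;
}

/* Wrap-safe maximum of two vruntime values. */
static uint64_t max_vruntime(uint64_t a, uint64_t b)
{
	return (int64_t)(b - a) > 0 ? b : a;
}

/* Place a freshly forked child.
 * debit: the START_DEBIT slice charged to the new task.
 * The child is based on the parent's vruntime and never moved backwards,
 * so it cannot gain time over the parent. */
static void place_forked_child(uint64_t min_vruntime, uint64_t debit,
			       int child_runs_first,
			       struct entity *parent, struct entity *child)
{
	child->vruntime = max_vruntime(parent->vruntime, min_vruntime + debit);

	/* child_runs_first: swap so the child runs first and the parent
	 * carries the later (debited) vruntime instead. */
	if (child_runs_first && entity_before(parent, child)) {
		uint64_t tmp = parent->vruntime;

		parent->vruntime = child->vruntime;
		child->vruntime = tmp;
	}
}
```

    For example, with min_vruntime = 50 and a debit of 30, a parent at 60 with child_runs_first enabled ends up at 80 while the child takes 60; a parent at 100 places its child at 100 rather than 80, so the child gains no time over it.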
     
  • wake_affine() would always fail under low-load situations where
    both prev and this were idle, because adding a single task will
    always be a significant imbalance, even if there's nothing
    around that could balance it.

    Deal with this by allowing imbalance when there's nothing you
    can do about it.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
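    The idea can be sketched as a stand-alone predicate. This is a hypothetical simplification, not the kernel's wake_affine(): the loads are plain numbers and imbalance_pct is passed in directly (e.g. 112 for a 12% tolerance).

```c
#include <stdbool.h>

/* this_load / prev_load: load on the waking CPU and the task's previous CPU.
 * task_load: load the wakee would add.
 * imbalance_pct: tolerated imbalance in percent. */
static bool affine_wakeup_balanced(unsigned long this_load,
				   unsigned long prev_load,
				   unsigned long task_load,
				   unsigned int imbalance_pct)
{
	/* Low-load case: the waking CPU is idle, so the imbalance from one
	 * task is something nobody could balance anyway; allow it. */
	if (this_load == 0)
		return true;

	return 100UL * (this_load + task_load) <=
	       (unsigned long)imbalance_pct * prev_load;
}
```

    Without the early return, two idle CPUs would evaluate 100 * (0 + 1024) <= 112 * 0, which is false, so the affine wakeup would always be refused.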
     
    select_task_rq_fair() incorrectly skips the wake_affine()
    logic; remove this shortcut.

    When prev_cpu == this_cpu, the code jumps straight to the
    wake_idle() logic, which doesn't give the wake_affine() logic
    the chance to pin the task to this cpu.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

07 Sep, 2009

1 commit


06 Sep, 2009

3 commits


05 Sep, 2009

10 commits

  • Since the ability to swap the cpu buffers adds a small overhead to
    the recording of a trace, we only want to add it when needed.

    Only the irqsoff and preemptoff tracers use this feature, and neither is
    recommended for production kernels. This patch disables its use
    when neither irqsoff nor preemptoff is configured.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
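    Compile-time gating of this kind can be sketched as below. SKETCH_ALLOW_SWAP is a hypothetical stand-in for the configuration switch that the irqsoff and preemptoff tracers would enable, and the structure and function names are illustrative; with the switch undefined, swapping is compiled out and callers get an error.

```c
#include <errno.h>

/* Hypothetical per-CPU buffer. */
struct cpu_buf {
	int id;
};

#ifdef SKETCH_ALLOW_SWAP
/* Swap support built in: exchange the two per-CPU buffer pointers. */
static int buffer_swap_cpu(struct cpu_buf **a, struct cpu_buf **b)
{
	struct cpu_buf *tmp = *a;

	*a = *b;
	*b = tmp;
	return 0;
}
#else
/* Swap support compiled out: callers get an error. Per the description,
 * the recording-side overhead of supporting swaps is avoided as well. */
static int buffer_swap_cpu(struct cpu_buf **a, struct cpu_buf **b)
{
	(void)a;
	(void)b;
	return -ENODEV;
}
#endif
```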
     
    Because the irqsoff tracer can swap an internal CPU buffer, it is possible
    that a swap happens after the start of the write but before the committing
    bit is set (once set, the committing bit disables swapping).

    This patch adds a check for this case and fails the write if it detects a swap.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
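    A single-threaded model of the check described above: the writer marks the per-CPU buffer as committing and then re-reads the top-level pointer; if it changed, a swap slipped in first and the write is abandoned. The structures are hypothetical, and a real implementation also needs compiler and memory barriers between the two steps, which this sketch omits.

```c
#include <stddef.h>

#define SKETCH_NR_CPUS 4

/* Hypothetical per-CPU buffer with a "committing" counter. */
struct per_cpu_buf {
	int committing; /* nonzero while a write is being committed */
};

/* Top-level buffer: one per-CPU pointer per CPU, swappable by a tracer. */
struct top_buf {
	struct per_cpu_buf *cpus[SKETCH_NR_CPUS];
};

/* Start a write on @cpu. Returns the buffer to write into, or NULL if a
 * swap happened before the committing mark took effect. */
static struct per_cpu_buf *start_commit(struct top_buf *top, int cpu)
{
	struct per_cpu_buf *cb = top->cpus[cpu];

	cb->committing++;            /* from here on, swaps are blocked */
	if (cb != top->cpus[cpu]) {  /* swapped before we were marked */
		cb->committing--;
		return NULL;         /* fail the write */
	}
	return cb;
}

static void end_commit(struct per_cpu_buf *cb)
{
	cb->committing--;
}
```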
     
  • The irqsoff tracer will fail to swap the cpu buffer with the max
    buffer if it preempts a commit. Instead of ignoring this, this patch
    makes the tracer report it if the last max latency failed due to preempting
    a current commit.

    The output of the latency tracer will look like this:

    # tracer: irqsoff
    #
    # irqsoff latency trace v1.1.5 on 2.6.31-rc5
    # --------------------------------------------------------------------
    # latency: 112 us, #1/1, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
    # -----------------
    # | task: -4281 (uid:0 nice:0 policy:0 rt_prio:0)
    # -----------------
    # => started at: save_args
    # => ended at: __do_softirq
    #
    #
    #                  _------=> CPU#
    #                 / _-----=> irqs-off
    #                | / _----=> need-resched
    #                || / _---=> hardirq/softirq
    #                ||| / _--=> preempt-depth
    #                |||| /
    #                |||||     delay
    #  cmd     pid   ||||| time  |   caller
    #     \   /      |||||   \   |   /
    bash-4281 1d.s6 265us : update_max_tr_single: Failed to swap buffers due to commit in progress

    Note the latency time and the functions that disabled the irqs or preemption
    will still be listed.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
    This patch adds trace_array_printk() to allow a tracer to use
    trace_printk() on its own trace array.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • The latency tracers (irqsoff and wakeup) can swap trace buffers
    on the fly. If an event is happening and has reserved data on one of
    the buffers, and the latency tracer swaps the global buffer with the
    max buffer, the result is that the event may commit the data to the
    wrong buffer.

    This patch changes the trace recording API to receive the buffer
    that was used to reserve a commit, so that this same buffer can be
    passed in to the commit.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
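    A minimal model of the API change, with hypothetical names: the reserve step returns the buffer it used, and the commit step writes to that buffer even if the global pointer has been swapped in between.

```c
/* Hypothetical trace buffer with simple counters. */
struct buf {
	int reserved;
	int committed;
};

/* The "global" trace buffer; a latency tracer may swap this pointer. */
static struct buf *global_buffer;

/* Reserve space and report which buffer the space came from. */
static struct buf *reserve_event(void)
{
	struct buf *b = global_buffer;

	b->reserved++;
	return b;
}

/* Commit into the buffer returned by reserve_event(), not whatever
 * global_buffer happens to point to now. */
static void commit_event(struct buf *b)
{
	b->committed++;
}
```

    If global_buffer is swapped to another buffer between reserve_event() and commit_event(), the commit still lands in the buffer that holds the reserved data.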
     
    Resetting the trace buffer without first disabling it and
    waiting for any writers to complete can corrupt the ring buffer.

    This patch makes the external version of tracing_reset safe from
    corruption by disabling the ring buffer and calling synchronize_sched.

    This version can no longer be called from interrupt context, but all
    such callers have been removed.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
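    The disable, wait, reset sequence can be sketched in C11 as follows. This is a model under stated assumptions, not kernel code: an atomic count of in-flight writers stands in for synchronize_sched(), and all names are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical trace buffer. */
struct trace_buf {
	atomic_int record_disabled; /* > 0: new writes are refused */
	atomic_int writers;         /* writers currently inside a write */
	atomic_int entries;         /* stand-in for buffered data */
};

static void buf_init(struct trace_buf *tb)
{
	atomic_init(&tb->record_disabled, 0);
	atomic_init(&tb->writers, 0);
	atomic_init(&tb->entries, 0);
}

/* Returns 0 on success, -1 if recording is disabled. */
static int buf_write(struct trace_buf *tb)
{
	atomic_fetch_add(&tb->writers, 1);
	if (atomic_load(&tb->record_disabled) > 0) {
		atomic_fetch_sub(&tb->writers, 1);
		return -1;
	}
	atomic_fetch_add(&tb->entries, 1);
	atomic_fetch_sub(&tb->writers, 1);
	return 0;
}

/* Disable recording, wait for in-flight writers, reset, re-enable. */
static void buf_reset_safe(struct trace_buf *tb)
{
	atomic_fetch_add(&tb->record_disabled, 1);
	while (atomic_load(&tb->writers) > 0)
		; /* busy-wait: stand-in for synchronize_sched() */
	atomic_store(&tb->entries, 0);
	atomic_fetch_sub(&tb->record_disabled, 1);
}
```

    Because buf_reset_safe() waits for writers, it is unsuitable for contexts that cannot wait, which parallels the note that the safe tracing_reset can no longer be called from interrupt context.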
     
    Currently the latency tracers reset the ring buffer. Unfortunately,
    if a commit is in progress (due to a trace event), this can corrupt
    the ring buffer. When this happens, the ring buffer will detect
    the corruption and then permanently disable the ring buffer.

    The bug does not crash the system, but it does prevent further tracing
    after the bug is hit.

    Instead of resetting the trace buffers, the timestamp of the start of
    the trace is used. The buffers still contain the previous data, but
    the output does not include any data recorded before the trace's
    start timestamp.

    Note, this only affects the static trace output (trace) and not the
    runtime trace output (trace_pipe). The runtime trace output does not
    make sense for the latency tracers anyway.

    Reported-by: Arnaldo Carvalho de Melo
    Signed-off-by: Steven Rostedt

    Steven Rostedt
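    The timestamp approach amounts to filtering at output time instead of clearing at trace start. A minimal sketch with a hypothetical event layout and function name:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical recorded event. */
struct event {
	uint64_t ts;  /* event timestamp (ns) */
	int payload;
};

/* Copy events with ts >= trace_start into out[]; return how many.
 * Older data stays in the buffer but is skipped in the output. */
static size_t events_since(const struct event *ev, size_t n,
			   uint64_t trace_start, struct event *out)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++)
		if (ev[i].ts >= trace_start)
			out[kept++] = ev[i];
	return kept;
}
```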
     
  • The predicates of an event and their filter structure are allocated
    when we create an event filter for the first time.

    These objects should be created only once, but each time a new
    filter is set, the code overwrites any pre-existing allocation, leaking it.

    Thus, this patch checks if the filter has already been allocated
    before going ahead.

    Spotted-by: Frederic Weisbecker
    Signed-off-by: Li Zefan
    Cc: Steven Rostedt
    Cc: Tom Zanussi
    Cc: Masami Hiramatsu
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Li Zefan
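    The fix follows the usual allocate-once pattern. A minimal sketch with hypothetical structure names and a hypothetical MAX_PREDS limit; the real event filter code differs in detail:

```c
#include <stdlib.h>

#define MAX_PREDS 32 /* hypothetical predicate limit */

struct pred {
	int op;
};

struct event_filter {
	struct pred *preds;
	int n_preds;
};

struct event_call {
	struct event_filter *filter; /* NULL until the first filter is set */
};

/* Allocate the filter and its predicates on first use only; later calls
 * reuse the existing allocation instead of overwriting (and leaking) it. */
static int init_preds(struct event_call *call)
{
	struct event_filter *f;

	if (call->filter)
		return 0; /* already allocated */

	f = calloc(1, sizeof(*f));
	if (!f)
		return -1;
	f->preds = calloc(MAX_PREDS, sizeof(*f->preds));
	if (!f->preds) {
		free(f);
		return -1;
	}
	call->filter = f;
	return 0;
}

static void free_preds(struct event_call *call)
{
	if (!call->filter)
		return;
	free(call->filter->preds);
	free(call->filter);
	call->filter = NULL;
}
```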
     
    The function tracing_reset is deprecated for use outside of trace.c.

    The new function to reset the buffers is tracing_reset_online_cpus.

    The reason for this is that resetting the buffers while the event
    trace points are active can corrupt the buffers, because they may
    be writing at the time of reset. The tracing_reset_online_cpus disables
    writes and waits for current writers to finish.

    This patch replaces all users of tracing_reset except for the latency
    tracers. Those users require more work and will be removed in the
    following patches.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Resetting the ring buffers while traces are happening can corrupt
    the ring buffer and disable it (no kernel crash to worry about).

    The safest thing to do is disable the ring buffers, call synchronize_sched()
    to wait for all current writers to finish and then reset the buffer.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

04 Sep, 2009

3 commits

    While the trace file is being read, updating the max latency
    may corrupt the output. This patch disables the tracing of the max
    latency while the trace file is read.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • During development of the tracer, we would copy information from
    the live tracer to the max tracer with one memcpy. Since then we
    added a generic ring buffer and we handle the copies differently now.
    Unfortunately, we never copied the critical section information, and
    we lost the output:

    # => started at: kmem_cache_alloc
    # => ended at: kmem_cache_alloc

    This patch adds back the critical start and end copying as well as
    removes the unused "trace_idx" and "overrun" fields of the
    trace_array_cpu structure.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
    Currently, RB_WARN_ON disables either the current CPU buffer or all
    CPU buffers, depending on whether a ring_buffer_per_cpu or a
    ring_buffer struct was passed into the macro.

    Most users of RB_WARN_ON pass in the CPU buffer, so only the one
    CPU buffer gets disabled but the rest are still active. This may
    confuse users even though a warning is sent to the console.

    This patch changes the macro to disable the entire buffer even if
    the CPU buffer is passed in.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
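    One way to express "disable the whole buffer regardless of which struct was passed" in portable C11 is to map either pointer type to the top-level buffer with _Generic. This is a hypothetical sketch: the kernel macro uses its own type check, and RB_WARN_ON_SKETCH and the other names here are illustrative.

```c
#include <stdio.h>

/* Hypothetical top-level and per-CPU ring buffers. */
struct ring_buf {
	int record_disabled;
};

struct ring_buf_cpu {
	struct ring_buf *buffer; /* owning top-level buffer */
	int record_disabled;
};

static struct ring_buf *top_of_buf(struct ring_buf *b)
{
	return b;
}

static struct ring_buf *top_of_cpu(struct ring_buf_cpu *cb)
{
	return cb->buffer;
}

/* Select the right accessor by pointer type, then call it. */
#define RB_TOP(b) _Generic((b),				\
	struct ring_buf *: top_of_buf,			\
	struct ring_buf_cpu *: top_of_cpu)(b)

static int rb_warn(struct ring_buf *top, int cond, const char *what)
{
	if (cond) {
		top->record_disabled++; /* disable all CPUs, not just one */
		fprintf(stderr, "RB_WARN_ON: %s\n", what);
	}
	return cond;
}

/* Evaluates to nonzero if cond was true. */
#define RB_WARN_ON_SKETCH(b, cond) rb_warn(RB_TOP(b), !!(cond), #cond)
```

    With this shape, a warning raised through a per-CPU buffer increments the top-level record_disabled counter, so every CPU stops recording rather than just the one that hit the problem.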