09 Apr, 2009

1 commit

  • Freezing tasks via the cgroup freezer causes the load average to climb
    because the freezer's current implementation puts frozen tasks in
    uninterruptible sleep (D state).

    Some applications which perform job-scheduling functions consult the
    load average when making decisions. If a cgroup is frozen, the load
    average does not provide a useful measure of the system's utilization
    to such applications. This is especially inconvenient if the job
    scheduler employs the cgroup freezer as a mechanism for preempting low
    priority jobs. Contrast this with using SIGSTOP for the same purpose:
    the stopped tasks do not count toward system load.

    Change task_contributes_to_load() to return false if the task is
    frozen. This results in /proc/loadavg behavior that better meets
    users' expectations.

    Signed-off-by: Nathan Lynch
    Acked-by: Andrew Morton
    Acked-by: Nigel Cunningham
    Tested-by: Nigel Cunningham
    Cc:
    Cc: containers@lists.linux-foundation.org
    Cc: linux-pm@lists.linux-foundation.org
    Cc: Matt Helsley
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Nathan Lynch
     

08 Apr, 2009

1 commit

  • * 'core/softlockup' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    softlockup: make DETECT_HUNG_TASK default depend on DETECT_SOFTLOCKUP
    softlockup: move 'one' to the softlockup section in sysctl.c
    softlockup: ensure the task has been switched out once
    softlockup: remove timestamp checking from hung_task
    softlockup: convert read_lock in hung_task to rcu_read_lock
    softlockup: check all tasks in hung_task
    softlockup: remove unused definition for spawn_softlockup_task
    softlockup: fix potential race in hung_task when resetting timeout
    softlockup: fix to allow compiling with !DETECT_HUNG_TASK
    softlockup: decouple hung tasks check from softlockup detection

    Linus Torvalds
     

07 Apr, 2009

1 commit


06 Apr, 2009

2 commits

  • Conflicts:
    include/linux/irq.h
    kernel/irq/handle.c

    Ingo Molnar
     
  • * 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (413 commits)
    tracing, net: fix net tree and tracing tree merge interaction
    tracing, powerpc: fix powerpc tree and tracing tree interaction
    ring-buffer: do not remove reader page from list on ring buffer free
    function-graph: allow unregistering twice
    trace: make argument 'mem' of trace_seq_putmem() const
    tracing: add missing 'extern' keywords to trace_output.h
    tracing: provide trace_seq_reserve()
    blktrace: print out BLK_TN_MESSAGE properly
    blktrace: extract duplidate code
    blktrace: fix memory leak when freeing struct blk_io_trace
    blktrace: fix blk_probes_ref chaos
    blktrace: make classic output more classic
    blktrace: fix off-by-one bug
    blktrace: fix the original blktrace
    blktrace: fix a race when creating blk_tree_root in debugfs
    blktrace: fix timestamp in binary output
    tracing, Text Edit Lock: cleanup
    tracing: filter fix for TRACE_EVENT_FORMAT events
    ftrace: Using FTRACE_WARN_ON() to check "freed record" in ftrace_release()
    x86: kretprobe-booster interrupt emulation code fix
    ...

    Fix up trivial conflicts in
    arch/parisc/include/asm/ftrace.h
    include/linux/memory.h
    kernel/extable.c
    kernel/module.c

    Linus Torvalds
     

03 Apr, 2009

6 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    Remove two unneeded exports and make two symbols static in fs/mpage.c
    Cleanup after commit 585d3bc06f4ca57f975a5a1f698f65a45ea66225
    Trim includes of fdtable.h
    Don't crap into descriptor table in binfmt_som
    Trim includes in binfmt_elf
    Don't mess with descriptor table in load_elf_binary()
    Get rid of indirect include of fs_struct.h
    New helper - current_umask()
    check_unsafe_exec() doesn't care about signal handlers sharing
    New locking/refcounting for fs_struct
    Take fs_struct handling to new file (fs/fs_struct.c)
    Get rid of bumping fs_struct refcount in pivot_root(2)
    Kill unsharing fs_struct in __set_personality()

    Linus Torvalds
     
  • We are wasting 2 words in signal_struct without any reason to implement
    task_pgrp_nr() and task_session_nr().

    task_session_nr() has no callers since
    2e2ba22ea4fd4bb85f0fa37c521066db6775cbef, we can remove it.

    task_pgrp_nr() is still (I believe wrongly) used in fs/autofsX and
    fs/coda.

    This patch reimplements task_pgrp_nr() via task_pgrp_nr_ns(), and kills
    __pgrp/__session and the related helpers.

    The change in drivers/char/tty_io.c is cosmetic, but hopefully makes sense
    anyway.

    Signed-off-by: Oleg Nesterov
    Acked-by: Alan Cox [tty parts]
    Cc: Cedric Le Goater
    Cc: Dave Hansen
    Cc: Eric Biederman
    Cc: Pavel Emelyanov
    Cc: Serge Hallyn
    Cc: Sukadev Bhattiprolu
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Inho, the safety rules for vnr/nr_ns helpers are horrible and buggy.

    task_pid_nr_ns(task) needs rcu/tasklist depending on task == current.

    As for "special" pids, vnr/nr_ns helpers always need rcu. However, if
    task != current, they are unsafe even under rcu lock, we can't trust
    task->group_leader without the special checks.

    And almost every helper has a callsite which needs a fix.

    Also, it is a bit annoying that the implementations of, say,
    task_pgrp_vnr() and task_pgrp_nr_ns() are not "symmetrical".

    This patch introduces the new helper, __task_pid_nr_ns(), which is always
    safe to use, and turns all other helpers into the trivial wrappers.

    After this I'll send another patch which converts task_tgid_xxx() as well,
    they're are a bit special.

    Signed-off-by: Oleg Nesterov
    Cc: Louis Rilling
    Cc: "Eric W. Biederman"
    Cc: Pavel Emelyanov
    Cc: Sukadev Bhattiprolu
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Even if task == current, it is not safe to dereference the result of
    task_pgrp/task_session. We can race with another thread which changes the
    special pid via setpgid/setsid.

    Document this. The next 2 patches give an example of the unsafe usage, we
    have more bad users.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Oleg Nesterov
    Cc: Louis Rilling
    Cc: "Eric W. Biederman"
    Cc: Pavel Emelyanov
    Cc: Sukadev Bhattiprolu
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • By discussion with Roland.

    - Rename ptrace_exit() to exit_ptrace(), and change it to do all the
    necessary work with ->ptraced list by its own.

    - Move this code from exit.c to ptrace.c

    - Update the comment in ptrace_detach() to explain the rechecking of
    the child->ptrace.

    Signed-off-by: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Cc: "Metzger, Markus T"
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • First argument unused since 2.3.11.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Alexey Dobriyan
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

02 Apr, 2009

1 commit


01 Apr, 2009

2 commits


31 Mar, 2009

1 commit

  • * 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (33 commits)
    lockdep: fix deadlock in lockdep_trace_alloc
    lockdep: annotate reclaim context (__GFP_NOFS), fix SLOB
    lockdep: annotate reclaim context (__GFP_NOFS), fix
    lockdep: build fix for !PROVE_LOCKING
    lockstat: warn about disabled lock debugging
    lockdep: use stringify.h
    lockdep: simplify check_prev_add_irq()
    lockdep: get_user_chars() redo
    lockdep: simplify get_user_chars()
    lockdep: add comments to mark_lock_irq()
    lockdep: remove macro usage from mark_held_locks()
    lockdep: fully reduce mark_lock_irq()
    lockdep: merge the !_READ mark_lock_irq() helpers
    lockdep: merge the _READ mark_lock_irq() helpers
    lockdep: simplify mark_lock_irq() helpers #3
    lockdep: further simplify mark_lock_irq() helpers
    lockdep: simplify the mark_lock_irq() helpers
    lockdep: split up mark_lock_irq()
    lockdep: generate usage strings
    lockdep: generate the state bit definitions
    ...

    Linus Torvalds
     

28 Mar, 2009

1 commit


27 Mar, 2009

1 commit

  • * 'sched-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (46 commits)
    sched: Add comments to find_busiest_group() function
    sched: Refactor the power savings balance code
    sched: Optimize the !power_savings_balance during fbg()
    sched: Create a helper function to calculate imbalance
    sched: Create helper to calculate small_imbalance in fbg()
    sched: Create a helper function to calculate sched_domain stats for fbg()
    sched: Define structure to store the sched_domain statistics for fbg()
    sched: Create a helper function to calculate sched_group stats for fbg()
    sched: Define structure to store the sched_group statistics for fbg()
    sched: Fix indentations in find_busiest_group() using gotos
    sched: Simple helper functions for find_busiest_group()
    sched: remove unused fields from struct rq
    sched: jiffies not printed per CPU
    sched: small optimisation of can_migrate_task()
    sched: fix typos in documentation
    sched: add avg_overlap decay
    x86, sched_clock(): mark variables read-mostly
    sched: optimize ttwu vs group scheduling
    sched: TIF_NEED_RESCHED -> need_reshed() cleanup
    sched: don't rebalance if attached on NULL domain
    ...

    Linus Torvalds
     

24 Mar, 2009

3 commits

  • Impact: more accurate timings

    The current method of function graph tracing does not take into
    account the time spent when a task is not running. This shows functions
    that call schedule have increased costs:

    3) + 18.664 us | }
    ------------------------------------------
    3) -0 => kblockd-123
    ------------------------------------------

    3) | finish_task_switch() {
    3) 1.441 us | _spin_unlock_irq();
    3) 3.966 us | }
    3) ! 2959.433 us | }
    3) ! 2961.465 us | }

    This patch uses the tracepoint in the scheduling context switch to
    account for time that has elapsed while a task is scheduled out.
    Now we see:

    ------------------------------------------
    3) -0 => edac-po-1067
    ------------------------------------------

    3) | finish_task_switch() {
    3) 0.685 us | _spin_unlock_irq();
    3) 2.331 us | }
    3) + 41.439 us | }
    3) + 42.663 us | }

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Add support for threaded interrupt handlers:

    A device driver can request that its main interrupt handler runs in a
    thread. To achive this the device driver requests the interrupt with
    request_threaded_irq() and provides additionally to the handler a
    thread function. The handler function is called in hard interrupt
    context and needs to check whether the interrupt originated from the
    device. If the interrupt originated from the device then the handler
    can either return IRQ_HANDLED or IRQ_WAKE_THREAD. IRQ_HANDLED is
    returned when no further action is required. IRQ_WAKE_THREAD causes
    the genirq code to invoke the threaded (main) handler. When
    IRQ_WAKE_THREAD is returned handler must have disabled the interrupt
    on the device level. This is mandatory for shared interrupt handlers,
    but we need to do it as well for obscure x86 hardware where disabling
    an interrupt on the IO_APIC level redirects the interrupt to the
    legacy PIC interrupt lines.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Ingo Molnar

    Thomas Gleixner
     
  • James Morris
     

13 Mar, 2009

4 commits


12 Mar, 2009

1 commit


06 Mar, 2009

2 commits


05 Mar, 2009

2 commits


04 Mar, 2009

1 commit


02 Mar, 2009

1 commit


27 Feb, 2009

3 commits


16 Feb, 2009

1 commit


15 Feb, 2009

1 commit

  • Here is another version, with the incremental patch rolled up, and
    added reclaim context annotation to kswapd, and allocation tracing
    to slab allocators (which may only ever reach the page allocator
    in rare cases, so it is good to put annotations here too).

    Haven't tested this version as such, but it should be getting closer
    to merge worthy ;)

    --
    After noticing some code in mm/filemap.c accidentally perform a __GFP_FS
    allocation when it should not have been, I thought it might be a good idea to
    try to catch this kind of thing with lockdep.

    I coded up a little idea that seems to work. Unfortunately the system has to
    actually be in __GFP_FS page reclaim, then take the lock, before it will mark
    it. But at least that might still be some orders of magnitude more common
    (and more debuggable) than an actual deadlock condition, so we have some
    improvement I hope (the concept is no less complete than discovery of a lock's
    interrupt contexts).

    I guess we could even do the same thing with __GFP_IO (normal reclaim), and
    even GFP_NOIO locks too... but filesystems will have the most locks and fiddly
    code paths, so let's start there and see how it goes.

    It *seems* to work. I did a quick test.

    =================================
    [ INFO: inconsistent lock state ]
    2.6.28-rc6-00007-ged31348-dirty #26
    ---------------------------------
    inconsistent {in-reclaim-W} -> {ov-reclaim-W} usage.
    modprobe/8526 [HC0[0]:SC0[0]:HE1:SE1] takes:
    (testlock){--..}, at: [] brd_init+0x55/0x216 [brd]
    {in-reclaim-W} state was registered at:
    [] __lock_acquire+0x75b/0x1a60
    [] lock_acquire+0x91/0xc0
    [] mutex_lock_nested+0xb1/0x310
    [] brd_init+0x2b/0x216 [brd]
    [] _stext+0x3b/0x170
    [] sys_init_module+0xaf/0x1e0
    [] system_call_fastpath+0x16/0x1b
    [] 0xffffffffffffffff
    irq event stamp: 3929
    hardirqs last enabled at (3929): [] mutex_lock_nested+0x285/0x310
    hardirqs last disabled at (3928): [] mutex_lock_nested+0x59/0x310
    softirqs last enabled at (3732): [] sk_filter+0x83/0xe0
    softirqs last disabled at (3730): [] sk_filter+0x16/0xe0

    other info that might help us debug this:
    1 lock held by modprobe/8526:
    #0: (testlock){--..}, at: [] brd_init+0x55/0x216 [brd]

    stack backtrace:
    Pid: 8526, comm: modprobe Not tainted 2.6.28-rc6-00007-ged31348-dirty #26
    Call Trace:
    [] print_usage_bug+0x193/0x1d0
    [] mark_lock+0xaf0/0xca0
    [] mark_held_locks+0x55/0xc0
    [] ? brd_init+0x0/0x216 [brd]
    [] trace_reclaim_fs+0x2a/0x60
    [] __alloc_pages_internal+0x475/0x580
    [] ? mutex_lock_nested+0x26e/0x310
    [] ? brd_init+0x0/0x216 [brd]
    [] brd_init+0x6a/0x216 [brd]
    [] ? brd_init+0x0/0x216 [brd]
    [] _stext+0x3b/0x170
    [] ? mutex_unlock+0x9/0x10
    [] ? __mutex_unlock_slowpath+0x10d/0x180
    [] ? trace_hardirqs_on_caller+0x12c/0x190
    [] sys_init_module+0xaf/0x1e0
    [] system_call_fastpath+0x16/0x1b

    Signed-off-by: Nick Piggin
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Nick Piggin
     

13 Feb, 2009

2 commits


12 Feb, 2009

2 commits