20 Aug, 2008

1 commit

  • I outwitted myself again in commit 2b2a1ff64afbadac842bbc58c5166962cf4f7664,
    and broke the SA_NOCLDWAIT behavior so it leaks zombies. This fixes it.

    Reported-by: Andi Kleen
    Signed-off-by: Roland McGrath

    Roland McGrath
     

18 Aug, 2008

1 commit


17 Aug, 2008

2 commits


16 Aug, 2008

3 commits


15 Aug, 2008

8 commits

  • m68k fails to build with these functions inlined in completion.h. Move
    them out of line into sched.c and export them to avoid this problem.

    Signed-off-by: Dave Chinner
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Chinner
     
  • Functionally the same, but more conventional.

    Cc: Huang Ying
    Tested-by: Vivek Goyal
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Ftrace depends on some processor state that is destroyed during kexec and
    restored by restore_processor_state(). So save_processor_state() and
    restore_processor_state() are moved into machine_kexec(), and ftrace is
    restored after restore_processor_state().

    Signed-off-by: Huang Ying
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: "Eric W. Biederman"
    Cc: Vivek Goyal
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Ying
     
  • Add device_pm_lock() and device_pm_unlock() in kernel_kexec() to stay in
    sync with the current hibernation implementation.

    Signed-off-by: Huang Ying
    Acked-by: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: "Eric W. Biederman"
    Cc: Vivek Goyal
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Ying
     
  • Call kernel_restart_prepare() in kernel_kexec() instead of duplicating the
    code.

    Signed-off-by: Huang Ying
    Acked-by: Pavel Machek
    Acked-by: Vivek Goyal
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: "Eric W. Biederman"
    Cc: Vivek Goyal
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Ying
     
  • Rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE, because on some
    platforms the control page holds more than code. For example, in kexec
    jump it is used for data and stack too.

    [akpm@linux-foundation.org: unbreak powerpc and arm, finish conversion]
    Signed-off-by: Huang Ying
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: "Eric W. Biederman"
    Cc: Vivek Goyal
    Cc: Ingo Molnar
    Cc: Russell King
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Ying
     
  • Move if (kexec_image->preserve_context) { ... } into #ifdef
    CONFIG_KEXEC_JUMP to make the code look cleaner.

    Fix the no-longer-correct comments in kernel_kexec().

    Signed-off-by: Huang Ying
    Acked-by: Vivek Goyal
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: "Eric W. Biederman"
    Cc: Vivek Goyal
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Ying
     
  • kernel/kexec.c: In function 'kernel_kexec':
    kernel/kexec.c:1506: warning: value computed is not used

    Signed-off-by: Huang Ying
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Ying
     

14 Aug, 2008

4 commits

  • When we hot-unplug a cpu and rebuild the sched-domain, all cpus will be
    detached. Alex observed the case where a runqueue was stealing bandwidth
    from an already disabled runqueue to satisfy its own needs.

    Stop this by skipping over already disabled runqueues.

    Reported-by: Alex Nixon
    Signed-off-by: Peter Zijlstra
    Tested-by: Alex Nixon
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
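
A minimal user-space sketch of the idea, not the scheduler code itself. The struct `rt_rq_sketch`, the function `borrow_runtime`, and the use of `RUNTIME_INF` as the "disabled" marker are assumptions made for illustration. A runqueue that wants more runtime walks its peers and skips any that is already disabled.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: a disabled runqueue carries this marker. */
#define RUNTIME_INF UINT64_MAX

struct rt_rq_sketch {
    uint64_t rt_runtime;    /* budget currently assigned to this runqueue */
};

/* Borrow up to `want` units from peers, skipping self and disabled peers. */
static uint64_t borrow_runtime(struct rt_rq_sketch *self,
                               struct rt_rq_sketch *peers, size_t n,
                               uint64_t want)
{
    uint64_t got = 0;

    for (size_t i = 0; i < n && got < want; i++) {
        struct rt_rq_sketch *iter = &peers[i];
        uint64_t take;

        /* The point of the fix: never take from a disabled runqueue. */
        if (iter == self || iter->rt_runtime == RUNTIME_INF)
            continue;

        take = iter->rt_runtime < want - got ? iter->rt_runtime : want - got;
        iter->rt_runtime -= take;
        got += take;
    }
    self->rt_runtime += got;
    return got;
}
```

Without the `RUNTIME_INF` check, the loop would treat the marker value as an enormous budget and drain it, which is roughly the symptom described above.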
     
  • Fix the setting of PF_SUPERPRIV by __capable(), as it could corrupt the flags
    of the target process if that is not the current process and it is trying to
    change its own flags in a different way at the same time.

    __capable() is using neither atomic ops nor locking to protect t->flags. This
    patch removes __capable() and introduces has_capability() that doesn't set
    PF_SUPERPRIV on the process being queried.

    This patch further splits security_ptrace() in two:

    (1) security_ptrace_may_access(). This passes judgement on whether one
    process may access another only (PTRACE_MODE_ATTACH for ptrace() and
    PTRACE_MODE_READ for /proc), and takes a pointer to the child process.
    current is the parent.

    (2) security_ptrace_traceme(). This passes judgement on PTRACE_TRACEME only,
    and takes only a pointer to the parent process. current is the child.

    In Smack and commoncap, this uses has_capability() to determine whether
    the parent will be permitted to use PTRACE_ATTACH if normal checks fail.
    This does not set PF_SUPERPRIV.

    Two of the instances of __capable() actually only act on current, and so have
    been changed to calls to capable().

    Of the places that were using __capable():

    (1) The OOM killer calls __capable() thrice when weighing the killability of a
    process. All of these now use has_capability().

    (2) cap_ptrace() and smack_ptrace() were using __capable() to check to see
    whether the parent was allowed to trace any process. As mentioned above,
    these have been split. For PTRACE_ATTACH and /proc, capable() is now
    used, and for PTRACE_TRACEME, has_capability() is used.

    (3) cap_safe_nice() only ever saw current, so now uses capable().

    (4) smack_setprocattr() rejected accesses to tasks other than current just
    after calling __capable(), so the order of these two tests has been
    switched and capable() is used instead.

    (5) In smack_file_send_sigiotask(), we need to allow privileged processes to
    receive SIGIO on files they're manipulating.

    (6) In smack_task_wait(), we let a process wait for a privileged process,
    whether or not the process doing the waiting is privileged.

    I've tested this with the LTP SELinux and syscalls testscripts.

    Signed-off-by: David Howells
    Acked-by: Serge Hallyn
    Acked-by: Casey Schaufler
    Acked-by: Andrew G. Morgan
    Acked-by: Al Viro
    Signed-off-by: James Morris

    David Howells
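
The split between a side-effect-free query and a check that records privilege use can be sketched in user space as follows. This is an illustration only: `task_sketch`, `current_task`, and the bitmask representation of capabilities are assumptions, and the real kernel functions take a `struct task_struct` and go through the security hooks.

```c
#include <stdbool.h>

#define PF_SUPERPRIV 0x00000100u    /* "used super-user privileges" flag */

struct task_sketch {
    unsigned int flags;     /* only ever written by the task itself */
    unsigned int cap_mask;  /* bit n set => capability n is held */
};

/* Stand-in for the kernel's `current` pointer. */
static struct task_sketch *current_task;

/* Pure query: safe to call on any task because it never writes t->flags. */
static bool has_capability(const struct task_sketch *t, int cap)
{
    return (t->cap_mask >> cap) & 1u;
}

/* Acts on the current task only, so updating its flags needs no locking. */
static bool capable(int cap)
{
    if (has_capability(current_task, cap)) {
        current_task->flags |= PF_SUPERPRIV;
        return true;
    }
    return false;
}
```

Because `has_capability()` takes a const pointer, a caller inspecting another task (as the OOM killer does) cannot race with that task's own updates to its flags.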
     
  • With 2.6.27-rc3, I hit a kernel panic when running volanoMark on my
    new x86_64 machine. I also hit it with other 2.6.27-rc kernels.
    See the log below.

    Basically, walk_tg_tree() and sched_create_group() race between
    reading and initializing tg->children. This patch fixes it
    by initializing tg->children before linking tg->siblings
    into parent->children.

    {----------------panic log------------}

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    IP: [] walk_tg_tree+0x45/0x7f
    PGD 1be1c4067 PUD 1bdd8d067 PMD 0
    Oops: 0000 [1] SMP
    CPU 11
    Modules linked in: igb
    Pid: 22979, comm: java Not tainted 2.6.27-rc3 #1
    RIP: 0010:[] [] walk_tg_tree+0x45/0x7f
    RSP: 0018:ffff8801bfbbbd18 EFLAGS: 00010083
    RAX: 0000000000000000 RBX: ffff8800be0dce40 RCX: ffffffffffffffc0
    RDX: ffff880102c43740 RSI: 0000000000000000 RDI: ffff8800be0dce40
    RBP: ffff8801bfbbbd48 R08: ffff8800ba437bc8 R09: 0000000000001f40
    R10: ffff8801be812100 R11: ffffffff805fdf44 R12: ffff880102c43740
    R13: 0000000000000000 R14: ffffffff8022cf0f R15: ffffffff8022749f
    FS: 00000000568ac950(0063) GS:ffff8801bfa26d00(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000000000 CR3: 00000001bd848000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process java (pid: 22979, threadinfo ffff8801b145a000, task ffff8801bf18e450)
    Stack: 0000000000000001 ffff8800ba5c8d60 0000000000000001 0000000000000001
    ffff8800bad1ccb8 0000000000000000 ffff8801bfbbbd98 ffffffff8022ed37
    0000000000000001 0000000000000286 ffff8801bd5ee180 ffff8800ba437bc8
    Call Trace:
    [] try_to_wake_up+0x71/0x24c
    [] autoremove_wake_function+0x9/0x2e
    [] ? __wake_up_common+0x46/0x76
    [] __wake_up+0x38/0x4f
    [] tcp_v4_rcv+0x380/0x62e

    Signed-off-by: Zhang Yanmin
    Acked-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Zhang, Yanmin
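
The ordering that the fix establishes can be sketched with a tiny list implementation. This is a single-threaded user-space illustration: `tg_sketch`, `create_group`, and `count_children` are hypothetical names, and the sketch ignores the RCU list primitives and memory barriers that the real code relies on.

```c
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

struct tg_sketch {
    struct list_head siblings;  /* link in the parent's children list */
    struct list_head children;  /* head of this group's own children */
};

static void create_group(struct tg_sketch *tg, struct tg_sketch *parent)
{
    /* Initialize the list head first ... */
    INIT_LIST_HEAD(&tg->children);
    /* ... and only then make the group reachable from its parent. */
    list_add(&tg->siblings, &parent->children);
}

/* What a tree walker does on each node: follow children.next until it wraps. */
static int count_children(const struct tg_sketch *tg)
{
    int n = 0;

    for (const struct list_head *p = tg->children.next;
         p != &tg->children; p = p->next)
        n++;
    return n;
}
```

With the two statements in `create_group()` swapped, a concurrent walker could reach the new group while `children.next` is still NULL, which matches the NULL dereference in the panic log.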
     
  • Use WARN() instead of a printk+WARN_ON() pair; this way the message
    becomes part of the warning section for better reporting/collection.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton

    Arjan van de Ven
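
A rough user-space stand-in for the pattern, assuming nothing about the kernel macro beyond what the message says: the condition and the formatted message travel together, so the text is emitted as part of the warning rather than by a separate print call. The function `warn_sketch` is hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical helper: if `condition` is true, print a warning banner
 * followed by the formatted message. Returns 1 when the warning fired,
 * 0 otherwise, so it can be used inside an if () like WARN_ON().
 */
static int warn_sketch(int condition, const char *fmt, ...)
{
    if (condition) {
        va_list ap;

        fprintf(stderr, "------------[ cut here ]------------\nWARNING: ");
        va_start(ap, fmt);
        vfprintf(stderr, fmt, ap);
        va_end(ap);
    }
    return condition != 0;
}
```

A caller that previously wrote a print statement followed by a bare check would collapse both into one call, for example `if (warn_sketch(len > max, "len %d exceeds %d\n", len, max)) return -1;`.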
     

13 Aug, 2008

5 commits


12 Aug, 2008

11 commits

  • …el/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    generic-ipi: fix stack and rcu interaction bug in smp_call_function_mask(), fix

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
    fix spinlock recursion in hvc_console
    stop_machine: remove unused variable
    modules: extend initcall_debug functionality to the module loader
    export virtio_rng.h
    lguest: use get_user_pages_fast() instead of get_user_pages()
    mm: Make generic weak get_user_pages_fast and EXPORT_GPL it
    lguest: don't set MAC address for guest unless specified

    Linus Torvalds
     
  • > > Nick Piggin (1):
    > > generic-ipi: fix stack and rcu interaction bug in
    > > smp_call_function_mask()
    >
    > I'm still not 100% sure that I have this patch right... I might have seen
    > a lockup trace implicating the smp call function path... which may have
    > been due to some other problem or a different bug in the new call function
    > code, but if some more people can take a look at it before merging?

    OK indeed it did have a couple of bugs. Firstly, I wasn't freeing the
    data properly in the alloc && wait case. Secondly, I wasn't resetting
    CSD_FLAG_WAIT in the for each cpu loop (so only the first CPU would
    wait).

    After those fixes, the patch boots and runs with the kmalloc commented
    out (so it always executes the slowpath).

    Signed-off-by: Ingo Molnar

    Nick Piggin
     
  • Signed-off-by: Li Zefan
    Signed-off-by: Rusty Russell

    Li Zefan
     
  • The kernel has this really nice facility where if you put "initcall_debug"
    on the kernel command line, it will print which function it is about to
    execute just before calling an initcall, and then after the call completes
    it will

    1) print the error code, if there was one,

    2) check for a few simple bugs (like leaving irqs off), and

    3) print how long the initcall took in milliseconds.

    While trying to optimize the boot speed of my laptop, I have been loving
    number 3 to figure out what to optimize... and then I wished that
    the same thing was done for module loading.

    This patch makes the module loader use this exact same functionality; it's
    a logical extension in my view (since modules are just sort of late
    binding initcalls anyway) and so far I've found it quite useful in finding
    where things are too slow in my boot.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Rusty Russell

    Arjan van de Ven
     
  • …l/git/tip/linux-2.6-tip

    * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    sched, cpu hotplug: fix set_cpus_allowed() use in hotplug callbacks
    sched: fix mysql+oltp regression
    sched_clock: delay using sched_clock()
    sched clock: couple local and remote clocks
    sched clock: simplify __update_sched_clock()
    sched: eliminate scd->prev_raw
    sched clock: clean up sched_clock_cpu()
    sched clock: revert various sched_clock() changes
    sched: move sched_clock before first use
    sched: test runtime rather than period in global_rt_runtime()
    sched: fix SCHED_HRTICK dependency
    sched: fix warning in hrtick_start_fair()

    Linus Torvalds
     
  • …el/git/tip/linux-2.6-tip

    * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    posix-timers: fix posix_timer_event() vs dequeue_signal() race
    posix-timers: do_schedule_next_timer: fix the setting of ->si_overrun

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    lockdep: fix debug_lock_alloc
    lockdep: increase MAX_LOCKDEP_KEYS
    generic-ipi: fix stack and rcu interaction bug in smp_call_function_mask()
    lockdep: fix overflow in the hlock shrinkage code
    lockdep: rename map_[acquire|release]() => lock_map_[acquire|release]()
    lockdep: handle chains involving classes defined in modules
    mm: fix mm_take_all_locks() locking order
    lockdep: annotate mm_take_all_locks()
    lockdep: spin_lock_nest_lock()
    lockdep: lock protection locks
    lockdep: map_acquire
    lockdep: shrink held_lock structure
    lockdep: re-annotate scheduler runqueues
    lockdep: lock_set_subclass - reset a held lock's subclass
    lockdep: change scheduler annotation
    debug_locks: set oops_in_progress if we will log messages.
    lockdep: fix combinatorial explosion in lock subgraph traversal

    Linus Torvalds
     
  • Ingo Molnar
     
  • Ingo Molnar
     
  • When we enable DEBUG_LOCK_ALLOC but do not enable PROVE_LOCKING and/or
    LOCK_STAT, lock_alloc() and lock_release() turn into nops, even though
    we should be doing hlock checking (check=1).

    This causes a false warning and a lockdep self-disable.

    Rectify this.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

11 Aug, 2008

5 commits

  • Mark Langsdorf reported:

    > One of my co-workers noticed that the powernow-k8
    > driver no longer restarts when a CPU core is
    > hot-disabled and then hot-enabled on AMD quad-core
    > systems.
    >
    > The following commands work fine on 2.6.26 and fail
    > on 2.6.27-rc1:
    >
    > echo 0 > /sys/devices/system/cpu/cpu3/online
    > echo 1 > /sys/devices/system/cpu/cpu3/online
    > find /sys -name cpufreq
    >
    > For 2.6.26, the find will return a cpufreq
    > directory for each processor. In 2.6.27-rc1,
    > the cpu3 directory is missing.
    >
    > After digging through the code, the following
    > logic is failing when the core is hot-enabled
    > at runtime. The code works during the boot
    > sequence.
    >
    > cpumask_t = current->cpus_allowed;
    > set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
    > if (smp_processor_id() != cpu)
    > return -ENODEV;

    So set the CPU active before calling the CPU_ONLINE notifier chain,
    there are a handful of notifiers that use set_cpus_allowed().

    This fix also solves the problem with x86-microcode. I've sent
    alternative patches for microcode, but as this "rely on
    set_cpus_allowed_ptr() being workable in cpu-hotplug(CPU_ONLINE, ...)"
    assumption seems to be more broad than what we thought, perhaps this fix
    should be applied.

    With this patch we define that by the moment CPU_ONLINE is being sent,
    a 'cpu' is online and ready for tasks to be migrated onto it.

    Signed-off-by: Dmitry Adamushko
    Reported-by: Mark Langsdorf
    Tested-by: Mark Langsdorf
    Signed-off-by: Ingo Molnar

    Dmitry Adamushko
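
The ordering problem can be modeled in a few lines. This is a deliberately simplified user-space sketch: `cpu_active_bits`, `move_current_to`, `online_notifier`, and `cpu_up_sketch` are hypothetical stand-ins, and the assumption that migration is refused for a CPU that is not yet active is taken from the description above rather than from the scheduler source.

```c
#include <errno.h>

/* Bit n set => tasks may be migrated onto CPU n. */
static unsigned long cpu_active_bits;

/* Stand-in for set_cpus_allowed_ptr(): refuses a CPU that is not active. */
static int move_current_to(int cpu)
{
    return ((cpu_active_bits >> cpu) & 1UL) ? 0 : -EINVAL;
}

/* Stand-in for a CPU_ONLINE notifier that must run on the new CPU. */
static int online_notifier(int cpu)
{
    return move_current_to(cpu) == 0 ? 0 : -ENODEV;
}

/*
 * Bring a CPU up. With activate_first == 0 the notifier runs before the
 * CPU is marked active (the failing order); with 1 it runs after.
 */
static int cpu_up_sketch(int cpu, int activate_first)
{
    int ret;

    if (activate_first)
        cpu_active_bits |= 1UL << cpu;
    ret = online_notifier(cpu);
    cpu_active_bits |= 1UL << cpu;
    return ret;
}
```

In this model the old order makes the notifier fail with -ENODEV, which corresponds to the missing cpufreq directory in the report.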
     
  • * Venki Pallipadi wrote:

    > Found an OOPS on a big SMP box during an overnight reboot test with
    > upstream git.
    >
    > Suresh and I looked at the oops and looks like the root cause is in
    > generic_smp_call_function_interrupt() and smp_call_function_mask() with
    > wait parameter.
    >
    > The actual oops looked like
    >
    > [ 11.277260] BUG: unable to handle kernel paging request at ffff8802ffffffff
    > [ 11.277815] IP: [] 0xffff8802ffffffff
    > [ 11.278155] PGD 202063 PUD 0
    > [ 11.278576] Oops: 0010 [1] SMP
    > [ 11.279006] CPU 5
    > [ 11.279336] Modules linked in:
    > [ 11.279752] Pid: 0, comm: swapper Not tainted 2.6.27-rc2-00020-g685d87f #290
    > [ 11.280039] RIP: 0010:[] [] 0xffff8802ffffffff
    > [ 11.280692] RSP: 0018:ffff88027f1f7f70 EFLAGS: 00010086
    > [ 11.280976] RAX: 00000000ffffffff RBX: 0000000000000000 RCX: 0000000000000000
    > [ 11.281264] RDX: 0000000000004f4e RSI: 0000000000000001 RDI: 0000000000000000
    > [ 11.281624] RBP: ffff88027f1f7f98 R08: 0000000000000001 R09: ffffffff802509af
    > [ 11.281925] R10: ffff8800280c2780 R11: 0000000000000000 R12: ffff88027f097d48
    > [ 11.282214] R13: ffff88027f097d70 R14: 0000000000000005 R15: ffff88027e571000
    > [ 11.282502] FS: 0000000000000000(0000) GS:ffff88027f1c3340(0000) knlGS:0000000000000000
    > [ 11.283096] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
    > [ 11.283382] CR2: ffff8802ffffffff CR3: 0000000000201000 CR4: 00000000000006e0
    > [ 11.283760] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    > [ 11.284048] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    > [ 11.284337] Process swapper (pid: 0, threadinfo ffff88027f1f2000, task ffff88027f1f0640)
    > [ 11.284936] Stack: ffffffff80250963 0000000000000212 0000000000ee8c78 0000000000ee8a66
    > [ 11.285802] ffff88027e571550 ffff88027f1f7fa8 ffffffff8021adb5 ffff88027f1f3e40
    > [ 11.286599] ffffffff8020bdd6 ffff88027f1f3e40 ffff88027f1f3ef8 0000000000000000
    > [ 11.287120] Call Trace:
    > [ 11.287768] [] ? generic_smp_call_function_interrupt+0x61/0x12c
    > [ 11.288354] [] smp_call_function_interrupt+0x17/0x27
    > [ 11.288744] [] call_function_interrupt+0x66/0x70
    > [ 11.289030] [] ? clockevents_notify+0x19/0x73
    > [ 11.289380] [] ? acpi_idle_enter_simple+0x18b/0x1fa
    > [ 11.289760] [] ? acpi_idle_enter_simple+0x181/0x1fa
    > [ 11.290051] [] ? cpuidle_idle_call+0x70/0xa2
    > [ 11.290338] [] ? cpu_idle+0x5f/0x7d
    > [ 11.290723] [] ? start_secondary+0x14d/0x152
    > [ 11.291010]
    > [ 11.291287]
    > [ 11.291654] Code: Bad RIP value.
    > [ 11.292041] RIP [] 0xffff8802ffffffff
    > [ 11.292380] RSP
    > [ 11.292741] CR2: ffff8802ffffffff
    > [ 11.310951] ---[ end trace 137c54d525305f1c ]---
    >
    > The problem is with the following sequence of events:
    >
    > - CPU A calls smp_call_function_mask() for CPU B with wait parameter
    > - CPU A sets up the call_function_data on the stack and does an rcu add to
    > call_function_queue
    > - CPU A waits until the WAIT flag is cleared
    > - CPU B gets the call function interrupt and starts going through the
    > call_function_queue
    > - CPU C also gets some other call function interrupt and starts going through
    > the call_function_queue
    > - CPU C, which is also going through the call_function_queue, starts referencing
    > CPU A's stack, as that element is still in call_function_queue
    > - CPU B finishes the function call that CPU A set up and as there are no other
    > references to it, rcu deletes the call_function_data (which was from CPU A
    > stack)
    > - CPU B sees the wait flag and just clears the flag (no call_rcu to free)
    > - CPU A which was waiting on the flag continues executing and the stack
    > contents change
    >
    > - CPU C, still in an rcu_read section and accessing CPU A's stack, sees
    > inconsistent call_function_data and can try to execute a
    > function with some random pointer, causing stack corruption for A
    > (by clearing the bits in the mask field) and an oops.

    Nice debugging work.

    I'd suggest something like the attached (boot tested) patch as the simple
    fix for now.

    I expect the benefits from the less synchronized, multiple-in-flight-data
    global queue will still outweigh the costs of dynamic allocations. But
    if worst comes to worst then we just go back to a globally synchronous
    one-at-a-time implementation, but that would be pretty sad!

    Signed-off-by: Ingo Molnar

    Nick Piggin
     
  • Defer commit 6d299f1b53b84e2665f402d9bcc494800aba6386 to the next release.

    Testing of the tip/sched/clock tree revealed a mysql+oltp regression
    which bisection eventually traced back to this commit in mainline.

    Pertinent test results: Three run sysbench averages, throughput units
    in read/write requests/sec.

    clients        1      2      4      8     16     32     64
    6e0534f     9646  17876  34774  33868  32230  30767  29441
    2.6.26.1    9112  17936  34652  33383  31929  30665  29232
    6d299f1     9112  14637  28370  33339  32038  30762  29204

    Note: subsequent commits hide the majority of this regression until you
    apply the clock fixes, at which time it reemerges at full magnitude.

    We cannot see anything bad about the change itself so we defer it to the
    next release until this problem is fully analysed.

    Signed-off-by: Mike Galbraith
    Acked-by: Peter Zijlstra
    Cc: Gregory Haskins
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     
  • Ingo Molnar
     
  • the names were too generic:

    drivers/uio/uio.c:87: error: expected identifier or '(' before 'do'
    drivers/uio/uio.c:87: error: expected identifier or '(' before 'while'
    drivers/uio/uio.c:113: error: 'map_release' undeclared here (not in a function)

    Signed-off-by: Ingo Molnar

    Ingo Molnar