06 Dec, 2006

1 commit


05 Dec, 2006

1 commit


04 Dec, 2006

2 commits


03 Dec, 2006

3 commits


02 Dec, 2006

1 commit

  • Show the drivers, which belong to the module:
    $ ls -l /sys/module/usbcore/drivers/
    hub -> ../../../bus/usb/drivers/hub
    usb -> ../../../bus/usb/drivers/usb
    usbfs -> ../../../bus/usb/drivers/usbfs

    Signed-off-by: Kay Sievers
    Signed-off-by: Greg Kroah-Hartman

    Kay Sievers
     

29 Nov, 2006

3 commits


26 Nov, 2006

2 commits


23 Nov, 2006

1 commit

  • This reverts commit f72fa707604c015a6625e80f269506032d5430dc, and solves
    the problem that it tried to fix by simply making "__do_IRQ()" call the
    note_interrupt() function without the lock held, the way everybody else
    does.

    It should be noted that all interrupt handling code must never allow the
    descriptor actors to be entered "recursively" (that's why we do all the
    magic IRQ_PENDING stuff in the first place), so there actually is
    exclusion at that much higher level, even in the absence of locking.

    Acked-by: Vivek Goyal
    Acked-by: Pavel Emelianov
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Cc: Adrian Bunk
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

22 Nov, 2006

5 commits

  • Fix up for make allyesconfig.

    Signed-Off-By: David Howells

    David Howells
     
  • Pass the work_struct pointer to the work function rather than context data.
    The work function can use container_of() to work out the data.

    For the cases where the container of the work_struct may go away the moment the
    pending bit is cleared, it is made possible to defer the release of the
    structure by deferring the clearing of the pending bit.

    To make this work, an extra flag is introduced into the management side of the
    work_struct. This governs auto-release of the structure upon execution.

    Ordinarily, the work queue executor would release the work_struct for further
    scheduling or deallocation by clearing the pending bit prior to jumping to the
    work function. This means that, unless the driver makes some guarantee itself
    that the work_struct won't go away, the work function may not access anything
    else in the work_struct or its container lest they be deallocated. This is a
    problem if the auxiliary data is taken away (as done by the last patch).

    However, if the pending bit is *not* cleared before jumping to the work
    function, then the work function *may* access the work_struct and its container
    with no problems. But then the work function must itself release the
    work_struct by calling work_release().

    In most cases, automatic release is fine, so this is the default. Special
    initiators exist for the non-auto-release case (ending in _NAR).

    Signed-Off-By: David Howells

    David Howells
     
  • Reclaim a word from the size of the work_struct by folding the pending bit and
    the wq_data pointer together. This shouldn't cause misalignment problems as
    all pointers should be at least 4-byte aligned.

    Signed-Off-By: David Howells

    David Howells
     
  • Define a type for the work function prototype. It's not only kept in the
    work_struct struct, it's also passed as an argument to several functions.

    This makes it easier to change it.

    Signed-Off-By: David Howells

    David Howells
     
  • Separate delayable work items from non-delayable work items by splitting them
    into a separate structure (delayed_work), which incorporates a work_struct and
    the timer_list removed from work_struct.

    The work_struct struct is huge, and this limits its usefulness. On a 64-bit
    architecture it's nearly 100 bytes in size. This reduces that by half for the
    non-delayable type of event.

    Signed-Off-By: David Howells

    David Howells
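
The work_struct changes described in this group of commits (a typed work function that receives the work_struct itself, container_of() to reach the surrounding data, and a pending bit folded into a pointer word) can be illustrated with a small userspace sketch. This is not the kernel's implementation: `struct my_device`, `work_set_queue()`, `work_get_queue()`, `work_is_pending()` and `run_work()` are hypothetical names, and the field layout is an assumption made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for container_of(): recover a pointer to the
 * enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Assumption: bit 0 of the folded word marks "pending"; pointers stored
 * in it are at least 4-byte aligned, so the low two bits are free. */
#define WORK_PENDING   ((uintptr_t)0x1)
#define WORK_FLAG_MASK ((uintptr_t)0x3)

struct work_struct;
typedef void (*work_func_t)(struct work_struct *work);

struct work_struct {
	uintptr_t data;		/* queue pointer and pending bit in one word */
	work_func_t func;
};

/* Hypothetical driver structure that embeds its work item. */
struct my_device {
	int events_handled;
	struct work_struct work;
};

/* Store a queue pointer without disturbing the pending bit. */
static void work_set_queue(struct work_struct *work, void *wq)
{
	assert(((uintptr_t)wq & WORK_FLAG_MASK) == 0);
	work->data = (uintptr_t)wq | (work->data & WORK_PENDING);
}

/* Mask the flag bits off again to get the pointer back. */
static void *work_get_queue(const struct work_struct *work)
{
	return (void *)(work->data & ~WORK_FLAG_MASK);
}

static int work_is_pending(const struct work_struct *work)
{
	return (work->data & WORK_PENDING) != 0;
}

/* The work function gets the work_struct, not separate context data,
 * and uses container_of() to find the structure that embeds it. */
static void my_device_work(struct work_struct *work)
{
	struct my_device *dev = container_of(work, struct my_device, work);

	dev->events_handled++;
}

/* Mimics the auto-release case: clear pending, then call the function. */
static void run_work(struct work_struct *work)
{
	work->data &= ~WORK_PENDING;
	work->func(work);
}
```

In the non-auto-release case described above, the executor would leave the pending bit set and the work function itself would clear it once it no longer needs the structure.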
     

18 Nov, 2006

1 commit

  • lockdep got confused by certain locks in modules:

    INFO: trying to register non-static key.
    the code is fine but needs lockdep annotation.
    turning off the locking correctness validator.

    Call Trace:
    [] dump_trace+0xaa/0x3f2
    [] show_trace+0x3a/0x60
    [] dump_stack+0x15/0x17
    [] __lock_acquire+0x724/0x9bb
    [] lock_acquire+0x4d/0x67
    [] rt_spin_lock+0x3d/0x41
    [] :ip_conntrack:__ip_ct_refresh_acct+0x131/0x174
    [] :ip_conntrack:udp_packet+0xbf/0xcf
    [] :ip_conntrack:ip_conntrack_in+0x394/0x4a7
    [] nf_iterate+0x41/0x7f
    [] nf_hook_slow+0x64/0xd5
    [] ip_rcv+0x24e/0x506
    [...]

    Steven Rostedt found the bug: static_obj() check did not take
    PERCPU_ENOUGH_ROOM into account, so in-module DEFINE_PER_CPU-area locks
    were triggering this message.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Steven Rostedt
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     

17 Nov, 2006

1 commit

  • I got an oops when booting 2.6.19-rc5-mm1 on my ia64 machine.

    Below is the log.

    Oops 11012296146944 [1]
    Modules linked in: binfmt_misc dm_mirror dm_multipath dm_mod thermal processor fan container button sg eepro100 e100 mii

    Pid: 0, CPU 0, comm: swapper
    psr : 0000121008022038 ifs : 800000000000040b ip : [] Not tainted
    ip is at __do_IRQ+0x371/0x3e0
    unat: 0000000000000000 pfs : 000000000000040b rsc : 0000000000000003
    rnat: 656960155aa56aa5 bsps: a00000010058b890 pr : 656960155aa55a65
    ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
    csd : 0000000000000000 ssd : 0000000000000000
    b0 : a0000001000e1390 b6 : a0000001005beac0 b7 : e00000007f01aa00
    f6 : 000000000000000000000 f7 : 0ffe69090000000000000
    f8 : 1000a9090000000000000 f9 : 0ffff8000000000000000
    f10 : 1000a908ffffff6f70000 f11 : 1003e0000000000000909
    r1 : a000000100fbbff0 r2 : 0000000000010002 r3 : 0000000000010001
    r8 : fffffffffffbffff r9 : a000000100bd8060 r10 : a000000100dd83b8
    r11 : fffffffffffeffff r12 : a000000100bcbbb0 r13 : a000000100bc4000
    r14 : 0000000000010000 r15 : 0000000000010000 r16 : a000000100c01aa8
    r17 : a000000100d2c350 r18 : 0000000000000000 r19 : a000000100d2c300
    r20 : a000000100c01a88 r21 : 0000000080010100 r22 : a000000100c01ac0
    r23 : a0000001000108e0 r24 : e000000477980004 r25 : 0000000000000000
    r26 : 0000000000000000 r27 : e00000000913400c r28 : e0000004799ee51c
    r29 : e0000004778b87f0 r30 : a000000100d2c300 r31 : a00000010005c7e0

    Call Trace:
    [] show_stack+0x40/0xa0
    sp=a000000100bcb760 bsp=a000000100bc4f40
    [] show_regs+0x840/0x880
    sp=a000000100bcb930 bsp=a000000100bc4ee8
    [] die+0x250/0x320
    sp=a000000100bcb930 bsp=a000000100bc4ea0
    [] ia64_do_page_fault+0x8d0/0xa20
    sp=a000000100bcb950 bsp=a000000100bc4e50
    [] ia64_leave_kernel+0x0/0x290
    sp=a000000100bcb9e0 bsp=a000000100bc4e50
    [] __do_IRQ+0x370/0x3e0
    sp=a000000100bcbbb0 bsp=a000000100bc4df0
    [] ia64_handle_irq+0x170/0x220
    sp=a000000100bcbbb0 bsp=a000000100bc4dc0
    [] ia64_leave_kernel+0x0/0x290
    sp=a000000100bcbbb0 bsp=a000000100bc4dc0
    [] ia64_pal_call_static+0x90/0xc0
    sp=a000000100bcbd80 bsp=a000000100bc4d78
    [] default_idle+0x90/0x160
    sp=a000000100bcbd80 bsp=a000000100bc4d58
    [] cpu_idle+0x1f0/0x440
    sp=a000000100bcbe20 bsp=a000000100bc4d18
    [] rest_init+0xc0/0xe0
    sp=a000000100bcbe20 bsp=a000000100bc4d00
    [] start_kernel+0x6a0/0x6c0
    sp=a000000100bcbe20 bsp=a000000100bc4ca0
    [] __end_ivt_text+0x6d0/0x6f0
    sp=a000000100bcbe30 bsp=a000000100bc4c00
    Kernel panic - not syncing: Aiee, killing interrupt handler!

    The root cause is that some irq_chip variables, notably ia64_msi_chip,
    initialize their "end" member to NULL. __do_IRQ doesn't check whether
    irq_chip->end is NULL and simply calls it after processing the interrupt.

    Because irq_chip->end is called in many places, I fix this by
    reinitializing irq_chip->end to dummy_irq_chip.end, i.e. a no-op function.

    Signed-off-by: Zhang Yanmin
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "Luck, Tony"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang, Yanmin
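
The fix described above (replace a NULL callback with a no-op once, instead of adding NULL checks at every call site) can be sketched in plain C. This is a minimal illustration under stated assumptions, not the kernel code: `struct irq_chip_sketch`, `noop_end()`, `chip_fixup()`, `finish_irq()` and the `noop_calls` counter are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for an interrupt chip descriptor. */
struct irq_chip_sketch {
	const char *name;
	void (*end)(unsigned int irq);
};

/* Counter used only so the sketch can show the no-op was reached. */
static unsigned int noop_calls;

/* Does nothing useful; exists so callers never see a NULL pointer. */
static void noop_end(unsigned int irq)
{
	(void)irq;
	noop_calls++;
}

/* One-time fixup: a chip that left "end" unset gets the no-op. */
static void chip_fixup(struct irq_chip_sketch *chip)
{
	if (chip->end == NULL)
		chip->end = noop_end;
}

/* Call site: after the fixup it can call ->end unconditionally. */
static void finish_irq(struct irq_chip_sketch *chip, unsigned int irq)
{
	assert(chip->end != NULL);
	chip->end(irq);
}
```

The design choice is that one fixup at registration time covers every call site, which matters when the callback is invoked from many places.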
     

15 Nov, 2006

2 commits

  • This reverts commit 0130b0b32ee53dc7add773fcea984f6a26ef1da3.

    Sergey Vlasov points out (and Vadim Lobanov concurs) that the bug it was
    supposed to fix must be some unrelated memory corruption, and the "fix"
    actually causes more problems:

    "However, the new code does not look safe in all cases. If some other
    task has opened more files while dup_fd() released oldf->file_lock, the
    new code will update open_files to the new larger value. But newf was
    allocated with the old smaller value of open_files, therefore subsequent
    accesses to newf may try to write into unallocated memory."

    so revert it.

    Cc: Sharyathi Nagesh
    Cc: Sergey Vlasov
    Cc: Vadim Lobanov
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • When we get a mismatch between handlers on the same IRQ, all we get is "IRQ
    handler type mismatch for IRQ n". Let's print the name of the
    presently-registered handler with which we got the mismatch.

    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

13 Nov, 2006

2 commits

  • While testing the kernel on a machine with the "irqpoll" option, I caught
    the following lockup:

    __do_IRQ()
        spin_lock(&desc->lock);
        desc->chip->ack(); /* IRQ is ACKed */
        note_interrupt()
            misrouted_irq()
                handle_IRQ_event()
                    if (...)
                        local_irq_enable_in_hardirq();
                    /* interrupts are enabled from now */
                    ...
                    __do_IRQ() /* same IRQ we've started from */
                        spin_lock(&desc->lock); /* LOCKUP */

    Looking at misrouted_irq() code I've found that a potential deadlock like
    this can also take place:

    1CPU:
    __do_IRQ()
        spin_lock(&desc->lock); /* irq = A */
        misrouted_irq()
            for (i = 1; i < NR_IRQS; i++) {
                spin_lock(&desc->lock); /* irq = B */
                if (desc->status & IRQ_INPROGRESS) {

    2CPU:
    __do_IRQ()
        spin_lock(&desc->lock); /* irq = B */
        misrouted_irq()
            for (i = 1; i < NR_IRQS; i++) {
                spin_lock(&desc->lock); /* irq = A */
                if (desc->status & IRQ_INPROGRESS) {

    Because the second lock on each CPU is taken before checking whether that
    irq is being handled on another processor, this may cause a deadlock. This
    issue is only theoretical.

    I propose the attached patch to fix both problems: the active desc->lock
    may be unlocked while trying to handle a misrouted IRQ.

    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelianov
     
  • After running the stress test on a machine for more than 72 hours, the
    following error message was observed.

    0:mon> e
    cpu 0x0: Vector: 300 (Data Access) at [c00000007ce2f7f0]
    pc: c000000000060d90: .dup_fd+0x240/0x39c
    lr: c000000000060d6c: .dup_fd+0x21c/0x39c
    sp: c00000007ce2fa70
    msr: 800000000000b032
    dar: ffffffff00000028
    dsisr: 40000000
    current = 0xc000000074950980
    paca = 0xc000000000454500
    pid = 27330, comm = bash

    0:mon> t
    [c00000007ce2fa70] c000000000060d28 .dup_fd+0x1d8/0x39c (unreliable)
    [c00000007ce2fb30] c000000000060f48 .copy_files+0x5c/0x88
    [c00000007ce2fbd0] c000000000061f5c .copy_process+0x574/0x1520
    [c00000007ce2fcd0] c000000000062f88 .do_fork+0x80/0x1c4
    [c00000007ce2fdc0] c000000000011790 .sys_clone+0x5c/0x74
    [c00000007ce2fe30] c000000000008950 .ppc_clone+0x8/0xc

    The problem is caused by a race window. When the if (expand) block is
    executed in dup_fd(), unlocking oldf->file_lock gives a window for the
    fdtable in oldf to be modified, so the actual open_files in oldf may no
    longer match the open_files variable.

    Cc: Vadim Lobanov
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sharyathi Nagesh
     

06 Nov, 2006

4 commits

  • Since it is becoming clear that there are just enough users of the binary
    sysctl interface that completely removing the binary interface from the kernel
    will not be an option for the foreseeable future, we need to find a way to address
    the sysctl maintenance issues.

    The basic problem is that sysctl requires one central authority to allocate
    sysctl numbers, or else conflicts and ABI breakage occur. The proc interface
    to sysctl does not have that problem, as names are not densely allocated.

    By not terminating a sysctl table until I have neither a ctl_name nor a
    procname, it becomes simple to add sysctl entries that don't show up in the
    binary sysctl interface, which allows people to avoid allocating a binary
    sysctl value when not needed.

    I have audited the kernel code and in my reading I have not found a single
    sysctl table that wasn't terminated by a completely zero filled entry. So
    this change in behavior should not affect anything.

    I think this mechanism eases the pain enough that, combined with a little
    discipline, we can solve the recurring sysctl ABI breakage.

    Signed-off-by: Eric W. Biederman
    Acked-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • Don't warn about libpthread's access to kernel.version. When it receives
    -ENOSYS it will read /proc/sys/kernel/version.

    If anything else shows up print the sysctl number string.

    Signed-off-by: Eric W. Biederman
    Cc: Cal Peake
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • Make the delayacct lock irqsave; this avoids the possible deadlock where
    an interrupt is taken while holding the delayacct lock which needs to
    take the delayacct lock.

    Signed-off-by: Peter Zijlstra
    Acked-by: Oleg Nesterov
    Cc: Balbir Singh
    Cc: Shailabh Nagar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Cpu-hotplug locking has a minor race caused by setting the variable
    "recursive" to NULL *after* releasing the cpu_bitmask_lock in the
    function unlock_cpu_hotplug, instead of doing so before releasing the
    cpu_bitmask_lock.

    This was the cause of most of the recent spurious lock_cpu_unlock
    warnings.

    This should fix the problem reported by Martin Lorenz in
    http://lkml.org/lkml/2006/10/29/127.

    Thanks to Srinivasa DS for pointing it out.

    Signed-off-by: Gautham R Shenoy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gautham R Shenoy
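
The sysctl table termination rule from the first commit in this group (a table ends only at an entry with neither a ctl_name nor a procname) can be sketched as follows. This is a simplified userspace illustration, not the kernel's ctl_table: `struct ctl_entry` and the three counting helpers are hypothetical, and the "old rule" shown for contrast (stop at the first entry without a ctl_name) is an assumption inferred from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down table entry. */
struct ctl_entry {
	int ctl_name;		/* binary sysctl number, 0 if none */
	const char *procname;	/* name under /proc/sys, NULL if none */
};

/* New rule: only a completely empty entry terminates the table. */
static int entry_is_terminator(const struct ctl_entry *e)
{
	return e->ctl_name == 0 && e->procname == NULL;
}

/* Count every entry up to the terminator. */
static size_t count_entries(const struct ctl_entry *table)
{
	size_t n = 0;

	assert(table != NULL);
	for (; !entry_is_terminator(table); table++)
		n++;
	return n;
}

/* Entries without a ctl_name stay invisible to the binary interface. */
static size_t count_binary_entries(const struct ctl_entry *table)
{
	size_t n = 0;

	assert(table != NULL);
	for (; !entry_is_terminator(table); table++)
		if (table->ctl_name != 0)
			n++;
	return n;
}

/* Assumed old rule, for contrast: stop at the first missing ctl_name. */
static size_t count_entries_old_rule(const struct ctl_entry *table)
{
	size_t n = 0;

	assert(table != NULL);
	for (; table->ctl_name != 0; table++)
		n++;
	return n;
}
```

With a table holding one proc-only entry between two numbered ones, the new rule walks all three entries while the assumed old rule would stop after the first.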
     

05 Nov, 2006

2 commits

  • The previous commit (45c18b0bb579b5c1b89f8c99f1b6ffa4c586ba08, aka "Fix
    unlikely (but possible) race condition on task->user access") fixed a
    potential oops due to __sigqueue_alloc() getting its "user" pointer out
    of sync with switch_user(), and accessing a user pointer that had been
    de-allocated on another CPU.

    It still left another (much less serious) problem, where a concurrent
    __sigqueue_alloc and switch_user could cause sigqueue_alloc to do signal
    pending reference counting for a _different_ user than the one it then
    actually ended up using. No oops, but we'd end up with the wrong signal
    accounting.

    Another case of Oleg's eagle-eyes picking up the problem.

    This is trivially fixed by just making sure we load whichever "user"
    structure we decide to use (it doesn't matter _which_ one we pick, we
    just need to pick one) just once.

    Acked-by: Oleg Nesterov
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • There's a possible race condition when doing a "switch_uid()" from one
    user to another, which could race with another thread doing a signal
    allocation and looking at the old thread ->user pointer as it is freed.

    This explains an oops reported by Lukasz Trabinski:
    http://permalink.gmane.org/gmane.linux.kernel/462241

    We fix this by delaying the (reference-counted) freeing of the user
    structure until the thread signal handler lock has been released, so
    that we know that the signal allocation has either seen the new value or
    has properly incremented the reference count of the old one.

    Race identified by Oleg Nesterov.

    Cc: Lukasz Trabinski
    Cc: Oleg Nesterov
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

04 Nov, 2006

4 commits

  • This is needed on bigendian 64bit architectures.

    Signed-off-by: Stephen Rothwell
    Acked-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Rothwell
     
  • Add a swsusp debugging mode. This does everything that's needed for a suspend
    except for actually suspending. So we can look in the log messages and work
    out a) what code is being slow and b) which drivers are misbehaving.

    (1)
    # echo testproc > /sys/power/disk
    # echo disk > /sys/power/state

    This should turn off the non-boot CPU, freeze all processes, wait for 5
    seconds and then thaw the processes and the CPU.

    (2)
    # echo test > /sys/power/disk
    # echo disk > /sys/power/state

    This should turn off the non-boot CPU, freeze all processes, shrink
    memory, suspend all devices, wait for 5 seconds, resume the devices etc.

    Cc: Pavel Machek
    Cc: Stefan Seyfried
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • Apparently FUTEX_FD is unfixably racy and nothing uses it (or if it does, it
    shouldn't).

    Add a warning printk, give any remaining users six months to migrate off it.

    Cc: Ulrich Drepper
    Cc: Ingo Molnar
    Acked-by: Thomas Gleixner
    Cc: Rusty Russell
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • printk_ratelimit() has global state which makes it not useful for callers
    which wish to perform ratelimiting at a particular frequency.

    Add a printk_timed_ratelimit() which utilises caller-provided state storage to
    permit more flexibility.

    This function can in fact be used for things other than printk ratelimiting
    and is perhaps poorly named.

    Cc: Ulrich Drepper
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
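
The idea behind printk_timed_ratelimit() in the last commit above (rate limiting whose state is supplied by the caller rather than kept globally) can be sketched in userspace C. This is an illustration only: `timed_ratelimit()` is a hypothetical helper, the current time is passed in explicitly instead of being read from jiffies, and counter wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if the caller may act at time "now".
 * *next_allowed is state owned by the caller; 0 means "never used yet".
 * Each call site that keeps its own variable gets its own frequency.
 */
static bool timed_ratelimit(unsigned long *next_allowed,
			    unsigned long now, unsigned long interval)
{
	assert(next_allowed != NULL);

	/* Still inside the quiet period that this caller started. */
	if (*next_allowed != 0 && now < *next_allowed)
		return false;

	/* Allowed: open a new quiet period of "interval" ticks. */
	*next_allowed = now + interval;
	return true;
}
```

Because the state lives with the caller, two call sites with different intervals do not interfere with each other, which is the flexibility the commit message describes.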
     

01 Nov, 2006

1 commit

  • If there are no listeners, taskstats_exit_send() just returns because
    taskstats_exit_alloc() didn't allocate *tidstats. This is wrong: each
    sub-thread should do fill_tgid_exit() on exit, otherwise its ->delays is
    not recorded in ->signal->stats and is lost.

    Q: We don't send TASKSTATS_TYPE_AGGR_TGID when a single-threaded process
    exits. Is that good? How can the listener figure out that it was actually
    a process exit, not a sub-thread exit?

    Signed-off-by: Oleg Nesterov
    Cc: Balbir Singh
    Acked-by: Shailabh Nagar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

31 Oct, 2006

2 commits


30 Oct, 2006

2 commits

  • prepare_reply() adds GENL_HDRLEN to the payload (genlmsg_total_size()),
    but then it does genlmsg_put()->nlmsg_put(). This means we forget to
    reserve room for 'struct nlmsghdr'.

    Signed-off-by: Oleg Nesterov
    Cc: Thomas Graf
    Cc: Andrew Morton
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Jay Lan
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • 'return genlmsg_cancel()' in taskstats_user_cmd/taskstats_exit_send
    potentially leaks an skb. Unless we pass 'rep_skb' to the netlink layer
    we own the sk_buff. This means we should always do kfree_skb() on failure.

    [ Thomas acked and pointed out missing return value in original version ]

    Signed-off-by: Oleg Nesterov
    Acked-by: Thomas Graf
    Cc: Andrew Morton
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Jay Lan
    Signed-off-by: Linus Torvalds

    Oleg Nesterov