08 Dec, 2006

40 commits

  • do_exit:
    taskstats_exit_alloc()
    ...
    taskstats_exit_send()
    taskstats_exit_free()

    I think this is not good; let it be a single function, taskstats_exit(),
    exported to the core kernel, which does alloc + send + free itself.

    Signed-off-by: Oleg Nesterov
    Cc: Balbir Singh
    Cc: Shailabh Nagar
    Cc: Jay Lan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • If there are no listeners, every task does an unneeded kmem_cache
    alloc/free on exit. We don't need listeners->sem for the
    'if (!list_empty())' check. Yes, we may get a false positive, but this
    doesn't differ from the case where the listener is unregistered after we
    drop the semaphore. So we don't need to do the allocation beforehand.

    Signed-off-by: Oleg Nesterov
    Cc: Balbir Singh
    Acked-by: Shailabh Nagar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
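A minimal user-space sketch of the exit path these two changes produce. The names `struct listener_list`, `listener_register()`, `exit_stats_send()` and `exit_stats_exit()` are hypothetical, and this is not the taskstats code. One function does alloc + send + free itself. It also checks for listeners without taking the lock, so an exiting task skips the allocation entirely when nobody is listening. A stale "non-empty" answer costs at most one wasted allocation, which matches the case of a listener unregistering after the lock is dropped.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the taskstats listener list. */
struct listener_list {
	pthread_mutex_t lock;   /* plays the role of listeners->sem */
	int nr_listeners;       /* changed under lock, read atomically */
	int nr_sent;            /* records delivered, for illustration */
	int last_pid;
};

/* Hypothetical per-exit payload. */
struct exit_stats {
	int pid;
};

static void listener_register(struct listener_list *l)
{
	pthread_mutex_lock(&l->lock);
	__atomic_add_fetch(&l->nr_listeners, 1, __ATOMIC_RELAXED);
	pthread_mutex_unlock(&l->lock);
}

/* Deliver one record to every current listener; only here is the lock taken. */
static void exit_stats_send(struct listener_list *l, const struct exit_stats *s)
{
	pthread_mutex_lock(&l->lock);
	l->nr_sent += __atomic_load_n(&l->nr_listeners, __ATOMIC_RELAXED);
	l->last_pid = s->pid;
	pthread_mutex_unlock(&l->lock);
}

/*
 * Single hook called from the exit path: alloc + send + free in one place.
 * Returns 0 if nothing was allocated, 1 if a record was handed to
 * exit_stats_send(), -1 if the allocation failed.
 */
static int exit_stats_exit(struct listener_list *l, int pid)
{
	/* Unlocked check: a false positive only wastes one allocation. */
	if (__atomic_load_n(&l->nr_listeners, __ATOMIC_RELAXED) == 0)
		return 0;

	struct exit_stats *s = malloc(sizeof(*s));
	if (!s)
		return -1;
	s->pid = pid;
	exit_stats_send(l, s);
	free(s);
	return 1;
}
```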
     
  • Use put_pages_list() instead of opencoding it.

    Signed-off-by: OGAWA Hirofumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    OGAWA Hirofumi
     
  • This patch makes module init return a proper value instead of -1 (-EPERM).

    Cc: Tim Waugh
    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • probe_kernel_address() purports to be generic, only it forgot to select
    KERNEL_DS, so it presently won't work right on all architectures.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Add support for the parallel port (implemented as separate PCI function) on
    the Oxford Semiconductor OX16PCI952.

    Signed-off-by: Ryan Underwood
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryan Underwood
     
  • Make PRINTK_TIME depend on PRINTK. Only display/offer it if PRINTK is
    enabled.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Signed-off-by: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     
  • Signed-off-by: Heiko Carstens
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     
  • Signed-off-by: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     
  • The CLONE_CHILD_CLEARTID flag is used by NPTL to have its threads
    communicate via memory/futex when they exit, so pthread_join can
    synchronize using a simple futex wait. The word of user memory where NPTL
    stores a thread's own TID is what it passes; this gets reset to zero at
    thread exit.

    It is not desirable to touch this user memory when threads are dying due
    to a fatal signal. A core dump is more usefully representative of the
    dying program state if the threads alive at the time of the crash have
    their NPTL data structures unperturbed. The userland expectation of
    CLONE_CHILD_CLEARTID has only ever been that it works for a thread making
    an _exit system call.

    This problem was identified by Ernie Petrides.

    Signed-off-by: Roland McGrath
    Cc: Ernie Petrides
    Cc: Jakub Jelinek
    Acked-by: Ingo Molnar
    Cc: Ulrich Drepper
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
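The mechanism in the first paragraph above can be observed from user space. The sketch below is Linux/glibc-only, and `spawn_and_join()` is a hypothetical name; this is not NPTL's code. A thread created with `CLONE_CHILD_CLEARTID` has its TID word zeroed and futex-woken by the kernel when it exits through the exit system call, so the creator can join it with a plain `FUTEX_WAIT` loop. `CLONE_PARENT_SETTID` is assumed here so the word already holds the TID before the new thread can run. Because no `CLONE_SETTLS` is passed, the new thread shares the caller's TLS, so the child function only stores to a global. The sketch exercises only the normal exit path, which the change leaves as it was.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <sched.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static volatile int child_ran;
static pid_t tid_word;   /* kernel stores the TID here, then clears it at exit */

static int child_fn(void *arg)
{
	(void)arg;
	child_ran = 1;   /* touch no TLS: this thread shares the caller's */
	return 0;        /* glibc's clone() wrapper then makes the exit syscall */
}

/* Returns 0 once the thread has exited and the kernel has cleared tid_word. */
static int spawn_and_join(void)
{
	const size_t stack_size = 64 * 1024;
	char *stack = malloc(stack_size);
	if (!stack)
		return -1;

	int flags = CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND |
		    CLONE_THREAD | CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID;
	pid_t tid = clone(child_fn, stack + stack_size, flags, NULL,
			  &tid_word, NULL, &tid_word);
	if (tid == -1) {
		free(stack);
		return -1;
	}

	/* pthread_join-style wait: sleep until the kernel zeroes the word. */
	pid_t seen;
	while ((seen = __atomic_load_n(&tid_word, __ATOMIC_ACQUIRE)) != 0)
		syscall(SYS_futex, &tid_word, FUTEX_WAIT, seen, NULL, NULL, 0);

	free(stack);   /* the word is cleared only after the thread left user space */
	return 0;
}
```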
     
  • The function v9fs_get_idpool() returns int, not u32: it returns -1 on
    errors, and its two callers check whether the value is smaller than 0,
    which can never be true for a u32 and was caught by gcc with extra warning
    flags. Compile tested only, but it should be OK, as the value computed in
    v9fs_get_idpool() is also int.

    Signed-off-by: Mika Kukkonen
    Cc: Eric Van Hensbergen
    Cc: Ron Minnich
    Cc: Latchesar Ionkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mika Kukkonen
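The reason the callers' `< 0` test could never fire is shown below with hypothetical helpers that stand in for `v9fs_get_idpool()`; they are not the 9p code. Converting -1 to a 32-bit unsigned type yields its maximum value, so a signed-style error check on it is always false. gcc's extra warnings flag exactly this kind of comparison.

```c
#include <stdint.h>

/* Hypothetical allocator that reports failure as -1, like the int helper. */
static int get_id(int fail)
{
	return fail ? -1 : 7;
}

/* Old shape: the result travels through a u32, so -1 becomes 0xffffffff. */
static int caller_sees_error_u32(int fail)
{
	uint32_t id = (uint32_t)get_id(fail);
	return id < 0;   /* always 0: an unsigned value is never < 0 */
}

/* Fixed shape: keep the int, and the error check works. */
static int caller_sees_error_int(int fail)
{
	int id = get_id(fail);
	return id < 0;
}
```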
     
  • Fix a sparse NULL warning:
    drivers/misc/tifm_core.c:223:17: warning: Using plain integer as NULL pointer

    Fix style while there.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • I've been using Steve Grubb's purely evil "fsfuzzer" tool, at
    http://people.redhat.com/sgrubb/files/fsfuzzer-0.4.tar.gz

    Basically it makes a filesystem, splats some random bits over it, then
    tries to mount it and do some simple filesystem actions.

    At best, the filesystem catches the corruption gracefully. At worst,
    things spin out of control.

    As you might guess, we found a couple places in ext4 where things spin out
    of control :)

    First, we had a corrupted directory that was never checked for
    consistency... it was corrupt, and pointed to another bad "entry" of
    length 0. The for() loop looped forever, since the length of
    ext4_next_entry(de) was 0, and we kept looking at the same pointer over and
    over and over and over... I modeled this check and subsequent action on
    what is done for other directory types in ext4_readdir...

    (adding this check adds some computational expense; I am testing a followup
    patch to reduce the number of times we check and re-check these directory
    entries, in all cases. Thanks for the idea, Andreas).

    Next we had a root directory inode which had a corrupted size, claimed to
    be > 200M on a 4M filesystem. There was only really 1 block in the
    directory, but because the size was so large, readdir kept coming back for
    more, spewing thousands of printk's along the way.

    Per Andreas' suggestion, if we're in this read error condition and we're
    trying to read an offset which is greater than i_blocks worth of bytes,
    stop trying, and break out of the loop.

    With these two changes the fsfuzz test survives quite well on ext4.

    Signed-off-by: Eric Sandeen
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
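The first fix can be sketched in simplified form: validate each directory entry's record length before advancing, instead of trusting it. The struct is a hypothetical, cut-down entry header held in host byte order (on disk these fields are little-endian). `walk_dir_block()` is illustrative and is not `ext4_check_dir_entry()`, but it applies the same kind of sanity test. It rejects a record shorter than the smallest possible entry, one too short for its own name, one that is not 4-byte aligned, and one that runs past the end of the block. A zero length therefore cannot pin the loop on the same pointer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified directory entry header (host byte order). */
struct dir_entry_hdr {
	uint32_t inode;
	uint16_t rec_len;
	uint8_t name_len;
	uint8_t file_type;
};

/* Space an entry with this name length needs, rounded up to 4 bytes. */
#define DIR_REC_LEN(name_len) (((name_len) + 8 + 3) & ~3)

/*
 * Walk the entries in one directory block. Returns the number of entries
 * seen, or -1 as soon as a corrupt record length is found.
 */
static int walk_dir_block(const unsigned char *block, size_t block_size)
{
	size_t offset = 0;
	int count = 0;

	while (offset < block_size) {
		struct dir_entry_hdr de;

		if (block_size - offset < sizeof(de))
			return -1;   /* header would overrun the block */
		memcpy(&de, block + offset, sizeof(de));

		if (de.rec_len < DIR_REC_LEN(1) ||   /* also catches rec_len == 0 */
		    de.rec_len % 4 != 0 ||
		    de.rec_len < DIR_REC_LEN(de.name_len) ||
		    de.rec_len > block_size - offset)
			return -1;   /* bail out instead of spinning */

		count++;
		offset += de.rec_len;   /* always advances by at least 12 bytes */
	}
	return count;
}
```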
     
  • I've been using Steve Grubb's purely evil "fsfuzzer" tool, at
    http://people.redhat.com/sgrubb/files/fsfuzzer-0.4.tar.gz

    Basically it makes a filesystem, splats some random bits over it, then
    tries to mount it and do some simple filesystem actions.

    At best, the filesystem catches the corruption gracefully. At worst,
    things spin out of control.

    As you might guess, we found a couple places in ext3 where things spin out
    of control :)

    First, we had a corrupted directory that was never checked for
    consistency... it was corrupt, and pointed to another bad "entry" of
    length 0. The for() loop looped forever, since the length of
    ext3_next_entry(de) was 0, and we kept looking at the same pointer over and
    over and over and over... I modeled this check and subsequent action on
    what is done for other directory types in ext3_readdir...

    (adding this check adds some computational expense; I am testing a followup
    patch to reduce the number of times we check and re-check these directory
    entries, in all cases. Thanks for the idea, Andreas).

    Next we had a root directory inode which had a corrupted size, claimed to
    be > 200M on a 4M filesystem. There was only really 1 block in the
    directory, but because the size was so large, readdir kept coming back for
    more, spewing thousands of printk's along the way.

    Per Andreas' suggestion, if we're in this read error condition and we're
    trying to read an offset which is greater than i_blocks worth of bytes,
    stop trying, and break out of the loop.

    With these two changes the fsfuzz test survives quite well on ext3.

    Signed-off-by: Eric Sandeen
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
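The second fix is sketched below, assuming that `i_blocks` counts 512-byte units, as the in-memory inode's `i_blocks` does. The helper names are hypothetical. The corrupted root directory from the message is modelled as a one-block directory (`i_blocks` = 8 with 4 KiB blocks) whose `i_size` claims 200M. With the check in place, the simulated readdir gives up after three attempts instead of walking the whole claimed size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bytes actually allocated to the inode: i_blocks counts 512-byte units. */
static uint64_t allocated_bytes(uint64_t i_blocks)
{
	return i_blocks << 9;
}

/*
 * After a failed block read, stop once the offset is beyond the allocated
 * bytes instead of trusting a (possibly corrupt) i_size.
 */
static bool should_stop_readdir(bool read_failed, uint64_t pos, uint64_t i_blocks)
{
	return read_failed && pos > allocated_bytes(i_blocks);
}

/* Simulate readdir on a directory whose i_size claims far more than exists. */
static unsigned long simulate_readdir(uint64_t i_size, uint64_t i_blocks,
				      uint64_t blocksize)
{
	unsigned long attempts = 0;

	for (uint64_t pos = 0; pos < i_size; pos += blocksize) {
		/* Only blocks that really exist can be read successfully. */
		bool read_failed = pos >= allocated_bytes(i_blocks);

		attempts++;
		if (should_stop_readdir(read_failed, pos, i_blocks))
			break;
	}
	return attempts;
}
```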
     
  • Randomizes -pie compiled binaries from 64k (0x10000) up to ELF_ET_DYN_BASE.

    0 -> 64k is excluded to allow NULL ptr accesses to fail.

    Signed-off-by: Marcus Meissner
    Cc: Ingo Molnar
    Cc: Dave Jones
    Cc: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Marcus Meissner
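A sketch of the placement rule, under stated assumptions: 4 KiB pages, an illustrative `ELF_ET_DYN_BASE_SKETCH` value (the real `ELF_ET_DYN_BASE` differs per architecture), and a caller-supplied random value instead of the kernel's entropy source. `pick_pie_base()` is hypothetical. Every result is page-aligned and lies in [0x10000, ELF_ET_DYN_BASE_SKETCH), so the first 64k is never chosen as the executable's load address.

```c
#define PAGE_SIZE_SKETCH        0x1000UL      /* assumed 4 KiB pages */
#define PIE_MIN_BASE            0x10000UL     /* keep 0..64k for NULL-pointer faults */
#define ELF_ET_DYN_BASE_SKETCH  0x08000000UL  /* illustrative value only */

/* Map an arbitrary random value to a page-aligned base in [min, max). */
static unsigned long pick_pie_base(unsigned long random_value)
{
	unsigned long pages =
		(ELF_ET_DYN_BASE_SKETCH - PIE_MIN_BASE) / PAGE_SIZE_SKETCH;

	return PIE_MIN_BASE + (random_value % pages) * PAGE_SIZE_SKETCH;
}
```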
     
  • - numeric string size replaced with a constant in print_lock_name and
    print_lockdep_cache,

    - return on a NULL pointer in print_lock_dependencies,

    - one more lockdep "return 0" path, in mark_lock, fixed to unlock first.

    Signed-off-by: Jarek Poplawski
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jarek Poplawski
     
  • These are mainly lockdep "return 0" paths fixed to unlock before returning.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Jarek Poplawski
     
  • paride_register() returns 1 on success and 0 on failure, so module init
    code looks like

    static int __init foo_init(void)
    {
        return paride_register(&foo) - 1;
    }

    which is not what one is used to. Convert it to the usual 0/-E convention.

    In the case of the kbic driver, also unwind the registration on failure.
    It was just

        return (paride_register(&k951)||paride_register(&k971))-1;

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
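The converted convention is sketched below with hypothetical stand-ins (`struct proto`, `proto_register()`, `proto_unregister()`, `kbic_init_sketch()`); they are not the paride API. Registration returns 0 or a negative errno. A driver that registers two protocols, as kbic does, unregisters the first if the second fails, so a failed module load leaves nothing registered.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical protocol descriptor and registry, for illustration only. */
struct proto {
	const char *name;
	bool registered;
};

static int registry_slots = 2;   /* capacity left in the fake registry */

/* 0 on success, negative errno on failure (the usual kernel convention). */
static int proto_register(struct proto *p)
{
	if (registry_slots == 0)
		return -ENOSPC;
	registry_slots--;
	p->registered = true;
	return 0;
}

static void proto_unregister(struct proto *p)
{
	if (p->registered) {
		p->registered = false;
		registry_slots++;
	}
}

static struct proto k951 = { "k951", false };
static struct proto k971 = { "k971", false };

/* Module init: register both, unwinding the first if the second fails. */
static int kbic_init_sketch(void)
{
	int ret = proto_register(&k951);

	if (ret)
		return ret;
	ret = proto_register(&k971);
	if (ret) {
		proto_unregister(&k951);
		return ret;
	}
	return 0;
}
```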
     
  • We're about to change the semantics of pi_register()'s return value, so
    rename it to something else first, so that any unconverted code reliably
    breaks.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • In order for spi_busnum_to_master() to work, SPI master devices must be
    linked into the spi_master_class.subsys.kset list. At the moment the
    default class_obj_subsys.kset is used, so we can't enumerate the master
    devices.

    Signed-off-by: Hans-Christian Egtvedt
    Cc: David Brownell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hans-Christian Egtvedt
     
  • Correct the following in drivers/spi/spi.c, function spi_busnum_to_master():

    * it must allow bus_num 0; the if is really not needed.
    * correct the name buffer, which is too small for bus_num >= 10000. It
      should be 9 bytes big, not 8.

    Signed-off-by: Hans-Christian Egtvedt
    Cc: David Brownell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hans-Christian Egtvedt
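The buffer arithmetic can be checked directly. This sketch assumes the name is formatted as `"spi%u"` and that bus numbers fit in 16 bits; both are assumptions made here, not quoted from the patch. A five-digit bus number needs 3 + 5 characters plus the terminating NUL, which is 9 bytes. An 8-byte buffer silently truncates `spi10000` to `spi1000`, so the lookup compares against the wrong name. The helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Bytes needed to hold "spi<bus_num>", including the terminating NUL. */
static int spi_name_bytes(unsigned bus_num)
{
	return snprintf(NULL, 0, "spi%u", bus_num) + 1;
}

/* Format into a fixed buffer; returns 1 if the name was truncated. */
static int spi_name_truncated(char *buf, size_t size, unsigned bus_num)
{
	int needed = snprintf(buf, size, "spi%u", bus_num);

	return needed >= 0 && (size_t)needed >= size;
}
```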
     
  • lock_super() is unnecessary for setting super-block feature flags. Use the
    provided *_SET_COMPAT_FEATURE() macros as well.

    Signed-off-by: Andreas Gruenbacher
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andreas Gruenbacher
     
  • A couple of minor code simplifications to the kernel/cpuset.c code. No
    functional change. Just a little less code and a little more readable.

    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
  • linux/cdev.h uses struct kobject and other structs and should therefore
    include the headers that define them. Currently, a module either needs to
    add the missing includes itself or, if it already includes other headers,
    needs to put <linux/cdev.h> last, which goes against an alphabetically
    sorted include list.

    Signed-off-by: Jan Engelhardt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Engelhardt
     
  • rmmod/3080 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
    (proc_subdir_lock){--..}, at: [] remove_proc_entry+0x40/0x191

    and this task is already holding:
    (ide_lock){++..}, at: [] ide_unregister_subdriver+0x39/0xc8
    which would create a new lock dependency:
    (ide_lock){++..} -> (proc_subdir_lock){--..}

    but this new dependency connects a hard-irq-safe lock:
    (ide_lock){++..}
    ... which became hard-irq-safe at:
    [] lock_acquire+0x4b/0x6b
    [] _spin_lock_irqsave+0x22/0x32
    [] ide_intr+0x17/0x1a9
    [] handle_IRQ_event+0x20/0x4d
    [] __do_IRQ+0x94/0xef
    [] do_IRQ+0x9e/0xbd

    to a hard-irq-unsafe lock:
    (proc_subdir_lock){--..}
    ... which became hard-irq-unsafe at:
    ... [] lock_acquire+0x4b/0x6b
    [] _spin_lock+0x19/0x28
    [] xlate_proc_name+0x1b/0x99
    [] proc_create+0x46/0xdf
    [] create_proc_entry+0x62/0xa5
    [] proc_misc_init+0x1c/0x1d2
    [] proc_root_init+0x4c/0xe9
    [] start_kernel+0x294/0x3b3

    Move ide_remove_proc_entries() out from under ide_lock; there is nothing
    that indicates that this is needed.

    Specifically, the call to ide_add_proc_entries() is unprotected, and there
    is nothing else in the file using the respective ->proc fields. The lock
    order around destroy_proc_ide_interface() also suggests this.

    Alan sayeth:

    proc_ide_write_settings walks the setting list under ide_setting_sem, read
    ditto. remove_proc_entry is doing proc side housekeeping.

    Looks fine to me, although that old code is such a mess anything could be
    going on.

    Signed-off-by: Peter Zijlstra
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Jeff noted that the via driver returned an error to an unsigned int in a
    case where errors are not permitted. Move the check earlier so we can
    handle it properly. Not as pretty, but it works this way and avoids
    hacking up ugly stuff in the legacy IDE core.

    Signed-off-by: Alan Cox
    Cc: Jeff Garzik
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • One of our test team hit a reiserfs_panic while running fsstress tests on
    2.6.19-rc1. The message looked like this:

    REISERFS: panic(device Null superblock):
    reiserfs[5676]: assertion !(p->path_length != 1 ) failed at
    fs/reiserfs/stree.c:397:reiserfs_check_path: path not properly relsed.

    The backtrace looked like this:

    kernel BUG in reiserfs_panic at fs/reiserfs/prints.c:361!
    .reiserfs_check_path+0x58/0x74
    .reiserfs_get_block+0x1444/0x1508
    .__block_prepare_write+0x1c8/0x558
    .block_prepare_write+0x34/0x64
    .reiserfs_prepare_write+0x118/0x1d0
    .generic_file_buffered_write+0x314/0x82c
    .__generic_file_aio_write_nolock+0x350/0x3e0
    .__generic_file_write_nolock+0x78/0xb0
    .generic_file_write+0x60/0xf0
    .reiserfs_file_write+0x198/0x2038
    .vfs_write+0xd0/0x1b4
    .sys_write+0x4c/0x8c
    syscall_exit+0x0/0x4

    Upon debugging I found that the restart_transaction was not releasing
    the path if the th->refcount was > 1.

    /*static*/
    int restart_transaction(struct reiserfs_transaction_handle *th,
    struct inode *inode, struct path *path)
    {
    [...]

    /* we cannot restart while nested */
    if (th->t_refcount > 1) { <<i_sb)->j_next_async_flush = 1;

    -->> retval = restart_transaction(th, inode, &path); <refcount is > 1, the path is still valid. And,

    if (retval)
        goto failure;
    repeat =
        _allocate_block(th, block, inode,
                        &allocated_block_nr, NULL, create);

    If the above _allocate_block() fails with NO_DISK_SPACE or QUOTA_EXCEEDED,
    we are left with a path which has not been released.

    if (repeat != NO_DISK_SPACE && repeat != QUOTA_EXCEEDED) {
        goto research;
    }
    if (repeat == QUOTA_EXCEEDED)
        retval = -EDQUOT;
    else
        retval = -ENOSPC;
    goto failure;
    [...]

    failure:
    [...]
    reiserfs_check_path(&path); << Panics here!

    Attached here is a patch which could fix the issue.

    fix reiserfs/inode.c : restart_transaction() to release the path in all
    cases.

    The restart_transaction() doesn't release the path when the journal
    handle has a refcount > 1. This would trigger a reiserfs_panic() if we
    encounter an -ENOSPC / -EDQUOT in reiserfs_get_block().

    Signed-off-by: Suzuki K P
    Cc: "Vladimir V. Saveliev"
    Cc:
    Cc: Jeff Mahoney
    Acked-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Suzuki K P
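The ordering of the fix is sketched below with hypothetical types (`struct path_sketch`, `struct handle_sketch`) and a hypothetical `restart_transaction_sketch()`; none of them is the reiserfs code. The path is released before checking whether the handle is nested, so every return leaves it released. "Released" is modelled as a path length of 1, which is the state the `reiserfs_check_path()` assertion quoted in the log expects. The check then also holds on the -ENOSPC/-EDQUOT failure route.

```c
/* Hypothetical stand-ins for the search path and journal handle. */
struct path_sketch {
	int length;     /* > 1 while buffers are still held */
};

struct handle_sketch {
	int refcount;   /* > 1 when the transaction is nested */
	int restarts;
};

static void release_path(struct path_sketch *p)
{
	p->length = 1;  /* the "released" state the assertion checks for */
}

/* Fixed ordering: drop the path first, then decide whether to restart. */
static int restart_transaction_sketch(struct handle_sketch *th,
				      struct path_sketch *path)
{
	release_path(path);
	if (th->refcount > 1)
		return 0;   /* cannot restart while nested; path already released */
	th->restarts++;
	return 0;
}
```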
     
  • The new shared APM emulation, just like its ARM and MIPS predecessors,
    uses pm_suspend(), which was only exported on SH. Move the export close to
    its definition, where it really should be anyway.

    Signed-off-by: Ralf Baechle
    Cc: Russell King
    Cc: Paul Mundt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ralf Baechle
     
  • Documentation update, adding references to CFQ scheduler and to another
    document about selecting IO Schedulers.

    Signed-off-by: Filipe Lautert
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Filipe
     
  • free_fdtable_rc() schedules a timer to reschedule fddef->wq if
    schedule_work() on it returns 0. However, schedule_work() guarantees that
    the target work is executed at least once after the scheduling, regardless
    of its return value. A 0 return simply means that the work was already
    pending, so no further action is required.

    Another problem is that it used the constant 5 as the @expires argument to
    mod_timer(), although @expires is an absolute time in jiffies.

    Kill unnecessary fddef->timer.

    Signed-off-by: Tejun Heo
    Cc: Dipankar Sarma
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
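The guarantee the message relies on is shown in a toy single-threaded model. `struct work_sketch`, `queue_work_sketch()` and `run_pending_work()` are hypothetical and are not the workqueue API. Queueing a work item that is already pending returns 0 and does nothing more. The pending instance still runs and sees everything queued before it started, so a 0 return needs no retry timer.

```c
#include <stdbool.h>

struct work_sketch {
	bool pending;
	int runs;
	int items_queued;   /* e.g. fdtables waiting to be freed */
	int items_seen;     /* items present when the work actually ran */
};

/* Returns 1 if newly queued, 0 if it was already pending. */
static int queue_work_sketch(struct work_sketch *w)
{
	if (w->pending)
		return 0;
	w->pending = true;
	return 1;
}

/* Worker: clear pending before running, then process everything queued so far. */
static void run_pending_work(struct work_sketch *w)
{
	if (!w->pending)
		return;
	w->pending = false;
	w->runs++;
	w->items_seen = w->items_queued;
}
```

In the usage below, the second item is queued while the work is pending and gets the 0 return, yet it is still handled by the single run.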
     
  • Add SysRq-X support: show blocked (TASK_UNINTERRUPTIBLE) tasks only.

    Useful for debugging IO stalls.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Randy Dunlap wrote:
    > Should FUSE depend on BLOCK? Without that and with BLOCK=n, I get:
    >
    > inode.c:(.text+0x3acc5): undefined reference to `sb_set_blocksize'
    > inode.c:(.text+0x3a393): undefined reference to `get_sb_bdev'
    > fs/built-in.o:(.data+0xd718): undefined reference to `kill_block_super

    Most fuse filesystems work fine without block device support, so I
    think a better solution is to disable the 'fuseblk' filesystem type if
    BLOCK=n.

    Signed-off-by: Miklos Szeredi
    Acked-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Add a DESTROY operation for block device based filesystems. With the help of
    this operation, such a filesystem can flush dirty data to the device
    synchronously before the umount returns.

    This is needed in situations where the filesystem is assumed to be clean
    immediately after unmount (e.g. ejecting removable media).

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Add support for the BMAP operation for block device based filesystems. This
    is needed to support swap-files and lilo.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Add 'blksize' option for block device based filesystems. During
    initialization this is used to set the block size on the device and the super
    block. The default block size is 512 bytes.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • I never intended this, but people started using fuse to implement block device
    based "real" filesystems (ntfs-3g, zfs).

    The following four patches add better support for these kinds of filesystems.
    Unlike "normal" fuse filesystems, using this feature should require superuser
    privileges (enforced by the fusermount utility).

    Thanks to Szabolcs Szakacsits for the input and testing.

    This patch adds a 'fuseblk' filesystem type, which is only different from the
    'fuse' filesystem type in how the 'dev_name' mount argument is interpreted.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Remove unneeded code from fuse_dentry_revalidate(). This made some sense
    while the validity time could wrap around, but now it's a very obvious no-op.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Add a flag to the RELEASE message which specifies that a FLUSH operation
    should be performed as well. This interface update is needed for the FreeBSD
    port, and doesn't actually touch the Linux implementation at all.

    Also rename the unused 'flush_flags' in the FLUSH message to 'unused'.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
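This hedged sketch shows how a userspace filesystem server might honour the new flag. The struct is a simplified stand-in, not the real `fuse_release_in` from `<linux/fuse.h>`. `RELEASE_FLUSH_SKETCH` is a hypothetical constant playing the role of the new flag, and the handler names are also hypothetical. When the bit is set, the server flushes before releasing, so a port that sends no separate FLUSH still gets its data flushed.

```c
#include <stdint.h>

#define RELEASE_FLUSH_SKETCH (1u << 0)   /* hypothetical flag value */

/* Simplified stand-in for the RELEASE request body. */
struct release_in_sketch {
	uint64_t fh;
	uint32_t flags;           /* open flags */
	uint32_t release_flags;   /* carries the "flush as well" request */
};

struct file_state {
	int dirty;
	int flushed;
	int released;
};

static void do_flush(struct file_state *f)
{
	if (f->dirty) {
		f->dirty = 0;
		f->flushed++;
	}
}

/* Handle RELEASE: flush first if asked to, then drop the handle. */
static void handle_release(const struct release_in_sketch *in,
			   struct file_state *f)
{
	if (in->release_flags & RELEASE_FLUSH_SKETCH)
		do_flush(f);
	f->released = 1;
}
```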