03 Nov, 2009

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
    9p: fix readdir corner cases
    9p: fix readlink
    9p: fix a small bug in readdir for long directories

    Linus Torvalds
     
  • This reverts commit d0646f7b636d067d715fab52a2ba9c6f0f46b0d7, as
    requested by Eric Sandeen.

    It can basically cause an ext4 filesystem to miss recovery (and thus get
    mounted with errors) if the journal checksum does not match.

    Quoth Eric:

    "My hand-wavy hunch about what is happening is that we're finding a
    bad checksum on the last partially-written transaction, which is
    not surprising, but if we have a wrapped log and we're doing the
    initial scan for head/tail, and we abort scanning on that bad
    checksum, then we are essentially running an unrecovered filesystem.

    But that's hand-wavy and I need to go look at the code.

    We lived without journal checksums on by default until now, and at
    this point they're doing more harm than good, so we should revert
    the default-changing commit until we can fix it and do some good
    power-fail testing with the fixes in place."

    See

    http://bugzilla.kernel.org/show_bug.cgi?id=14354

    for all the gory details.

    Requested-by: Eric Sandeen
    Cc: Theodore Tso
    Cc: Alexey Fisher
    Cc: Maxim Levitsky
    Cc: Aneesh Kumar K.V
    Cc: Mathias Burén
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

02 Nov, 2009

3 commits

  • The patch below also addresses a couple of other corner cases in readdir
    seen with a large (e.g. 64k) msize. I'm not sure what people think of
    my co-opting of fid->aux here. I'd be happy to rework if there's a better
    way.

    When the size of the user supplied buffer passed to readdir is smaller
    than the data returned in one go by the 9P read request, v9fs_dir_readdir()
    currently discards extra data so that, on the next call, a 9P read
    request will be issued with offset < previous offset + bytes returned,
    which violates the constraint described in paragraph 3 of the read(5) description.
    This patch preserves the leftover data in fid->aux for use in the next call.

    Signed-off-by: Jim Garlick
    Signed-off-by: Eric Van Hensbergen

    Eric Van Hensbergen
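
The leftover-data handling described in the commit above can be sketched in user space. This is only an illustration under stated assumptions, not the v9fs code: `struct dir_state`, `server_read()`, `dir_read()` and `MSIZE` are hypothetical names, the "server" is a plain byte array, and the sketch deals in raw bytes rather than whole directory entries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MSIZE 16 /* hypothetical per-request payload size */

/* Hypothetical stand-in for the state the patch keeps in fid->aux. */
struct dir_state {
    char   buf[MSIZE]; /* data returned by the last server read */
    size_t head;       /* first unconsumed byte in buf */
    size_t tail;       /* one past the last valid byte in buf */
    size_t offset;     /* server offset for the next read request */
};

/* Simulated server: returns at most MSIZE bytes of src starting at offset. */
static size_t server_read(const char *src, size_t src_len, size_t offset,
                          char *dst)
{
    size_t n;

    if (offset >= src_len)
        return 0;
    n = src_len - offset;
    if (n > MSIZE)
        n = MSIZE;
    memcpy(dst, src + offset, n);
    return n;
}

/*
 * Copy up to len bytes into out. The server is asked for more data only
 * once the leftover from the previous request has been drained, so the
 * request offset always advances by exactly the number of bytes returned.
 */
static size_t dir_read(struct dir_state *st, const char *src, size_t src_len,
                       char *out, size_t len)
{
    size_t n;

    assert(st->head <= st->tail);
    if (st->head == st->tail) {
        st->tail = server_read(src, src_len, st->offset, st->buf);
        st->head = 0;
        st->offset += st->tail;
    }
    n = st->tail - st->head;
    if (n > len)
        n = len;
    memcpy(out, st->buf + st->head, n);
    st->head += n; /* keep the rest for the next call */
    return n;
}
```

With a 5-byte caller buffer, the second call is served from the saved 11 leftover bytes and no new request is issued, so the server never sees an offset smaller than the previous offset plus the bytes it returned.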
     
  • I do not know if you've looked at the patch, but unfortunately it is
    incorrect. A suggested better version is in this email (the old
    version didn't work when the user-provided buffer was not long
    enough: it incorrectly wrote a null byte at the position of the last
    char, and thus broke the contract of the readlink method). However, I'm
    still not sure this is the 100% correct thing to do. I think readlink is
    supposed to return the buffer without a trailing null byte in all cases,
    but we do return one (even in the old version). On the other hand, it
    is likely unspecified what is in the remaining part of the buffer, so a
    null character may be fine there ;):

    Signed-off-by: Martin Stava
    Signed-off-by: Eric Van Hensbergen

    Martin Stava
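
The readlink contract discussed in the commit above (return the number of bytes placed in the buffer, truncate silently, append no null byte) can be sketched as follows. `copy_link()` is a hypothetical helper written for illustration; it is not the v9fs function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a link target into buf the way readlink(2) reports it: at most
 * buflen bytes, no terminating null byte, and the byte count as the
 * return value. Bytes of buf beyond the returned count are left untouched.
 */
static size_t copy_link(const char *target, char *buf, size_t buflen)
{
    size_t n = strlen(target);

    assert(buf != NULL);
    if (n > buflen)
        n = buflen; /* truncate silently when the buffer is too short */
    memcpy(buf, target, n);
    return n;
}
```

A caller that wants a C string has to reserve one extra byte and terminate the result itself, using the returned length.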
     
  • Here is a proposed patch for a bug in readdir. Listing directories with
    many files fails without this patch.

    Signed-off-by: Martin Stava
    Signed-off-by: Eric Van Hensbergen

    Martin Stava
     

01 Nov, 2009

1 commit


30 Oct, 2009

6 commits

  • xfs_quota returns ENOSYS when the remove command is executed.
    Reproducible with the following steps.

    # mount -t xfs -o uquota /dev/sda7 /mnt/mp1
    # xfs_quota -x -c off -c remove
    XFS_QUOTARM: Function not implemented.

    The remove command is allowed during quotaoff, but xfs_fs_set_xstate()
    checks whether quota is running, and it leads to ENOSYS.

    To solve this problem, add a check for X_QUOTARM.

    Signed-off-by: Ryota Yamauchi
    Signed-off-by: Utako Kusaka
    Signed-off-by: Christoph Hellwig

    Ryota Yamauchi
     
  • Commit bd169565993b39b9b4b102cdac8b13e0a259ce2f seems
    to have a slight regression where this code path:

    if (!--searchdistance) {
        /*
         * Not in range - save last search
         * location and allocate a new inode
         */
        ...
        goto newino;
    }

    doesn't free the temporary cursor (tcur) that got dup'd in
    this function.

    This leaks an item in the xfs_btree_cur zone, and it's caught
    on module unload:

    ===========================================================
    BUG xfs_btree_cur: Objects remaining on kmem_cache_close()
    -----------------------------------------------------------

    It seems like maybe a single free at the end of the function might
    be cleaner, but for now put a del_cursor right in this code block
    similar to the handling in the rest of the function.

    Signed-off-by: Eric Sandeen
    Signed-off-by: Christoph Hellwig

    Eric Sandeen
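
The leak pattern in the commit above (a duplicated cursor that one early-exit path forgets to release) can be reproduced in a small user-space sketch. Everything here is hypothetical: `dup_cursor()`, `del_cursor()` and `search_nearby()` only imitate the shape of the kernel code, and `live_cursors` stands in for the object count of the xfs_btree_cur zone.

```c
#include <assert.h>
#include <stdlib.h>

static int live_cursors; /* stand-in for the zone's outstanding objects */

struct cursor { int pos; };

static struct cursor *dup_cursor(int pos)
{
    struct cursor *c = malloc(sizeof(*c));

    assert(c != NULL);
    c->pos = pos;
    live_cursors++;
    return c;
}

static void del_cursor(struct cursor *c)
{
    free(c);
    live_cursors--;
}

/* Returns 1 if want is reached within searchdistance steps, else 0. */
static int search_nearby(int start, int want, int searchdistance)
{
    struct cursor *tcur = dup_cursor(start);

    assert(searchdistance > 0);
    for (;;) {
        if (tcur->pos == want) {
            del_cursor(tcur);
            return 1;
        }
        tcur->pos++;
        if (!--searchdistance) {
            /* Not in range: release the temporary cursor before leaving. */
            del_cursor(tcur);
            goto newino;
        }
    }
newino:
    return 0;
}
```

Removing the `del_cursor()` call in the `!--searchdistance` branch leaves `live_cursors` at 1 after an out-of-range search, which is the user-space analogue of the "Objects remaining" report at module unload.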
     
  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    backing-dev: ensure that a removed bdi no longer has super_block referencing it
    block: use after free bug in __blkdev_get
    block: silently error unsupported empty barriers too

    Linus Torvalds
     
  • * 'sh/for-2.6.32' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
    sh: Fix hugetlbfs dependencies for SH-3 && MMU configurations.
    sh: Document uImage.bin target in archhelp.
    sh: add uImage.bin target
    sh: rsk7203 CONFIG_MTD=n fix
    sh: Check for return_to_handler when unwinding the stack
    sh: Build fix: define more __movmem* symbols
    sh: __irq_entry annotate do_IRQ().

    Fix up sh/powerpc conflicts in fs/Kconfig

    Linus Torvalds
     
  • * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
    NFSv4: The link() operation should return any delegation on the file
    NFSv4: Fix two unbalanced put_rpccred() issues.
    NFSv4: Fix a bug when the server returns NFS4ERR_RESOURCE
    nfs: Panic when commit fails

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
    [CIFS] Fixing to avoid invalid kfree() in cifs_get_tcp_session()

    Linus Torvalds
     

29 Oct, 2009

5 commits

  • * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
    powerpc/ppc64: Use preempt_schedule_irq instead of preempt_schedule
    powerpc: Minor cleanup to lib/Kconfig.debug
    powerpc: Minor cleanup to sound/ppc/Kconfig
    powerpc: Minor cleanup to init/Kconfig
    powerpc: Limit memory hotplug support to PPC64 Book-3S machines
    powerpc: Limit hugetlbfs support to PPC64 Book-3S machines
    powerpc: Fix compile errors found by new ppc64e_defconfig
    powerpc: Add a Book-3E 64-bit defconfig
    powerpc/booke: Fix xmon single step on PowerPC Book-E
    powerpc: Align vDSO base address
    powerpc: Fix segment mapping in vdso32
    powerpc/iseries: Remove compiler version dependent hack
    powerpc/perf_events: Fix priority of MSR HV vs PR bits
    powerpc/5200: Update defconfigs
    drivers/serial/mpc52xx_uart.c: Use UPIO_MEM rather than SERIAL_IO_MEM
    powerpc/boot/dts: drop obsolete 'fsl5200-clocking'
    of: Remove nested function
    mpc5200: support for the MAN mpc5200 based board mucmc52
    mpc5200: support for the MAN mpc5200 based board uc101

    Linus Torvalds
     
  • * 'for-linus' of git://oss.sgi.com/xfs/xfs:
    xfs: fix double IRELE in xfs_dqrele_inode

    Linus Torvalds
     
  • A particular fsfuzzer run caused an hfs file system to crash on mount.
    This is due to a corrupted MDB extent record causing a miscalculation of
    HFS_I(inode)->first_blocks for the extent tree. If the extent records are
    zeroed out, it won't trigger the first_blocks special case. Instead it
    falls through to the extent code which we're still in the middle of
    initializing.

    This patch catches the 0 size extent records, reports the corruption, and
    fails the mount.

    Reported-by: Ramon de Carvalho Valle
    Signed-off-by: Jeff Mahoney
    Cc: Valdis Kletnieks
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Mahoney
     
  • As found in , hfsplus is using type u32
    rather than sector_t for some sector number calculations.

    In particular, hfsplus_get_block() does:

    u32 ablock, dblock, mask;
    ...
    map_bh(bh_result, sb, (dblock << HFSPLUS_SB(sb).fs_shift) + HFSPLUS_SB(sb).blockoffset + (iblock & mask));

    I am not confident that I can find and fix all cases where a sector number
    may be truncated. For now, avoid data loss by refusing to mount HFS+
    volumes with more than 2^32 sectors (2TB).

    [akpm@linux-foundation.org: fix 32 and 64-bit issues]
    Signed-off-by: Ben Hutchings
    Cc: Eric Sesterhenn
    Cc: Roman Zippel
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Hutchings
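
The truncation described in the commit above is easy to demonstrate outside the kernel. The sketch below assumes a 64-bit sector type and uses hypothetical helper names (`sector_u32()`, `sector_u64()`); it mirrors the shape of the expression in hfsplus_get_block() but is not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Sector number computed entirely in 32 bits: high bits are lost. */
static uint32_t sector_u32(uint32_t dblock, unsigned shift,
                           uint32_t blockoffset)
{
    assert(shift < 32);
    return (uint32_t)(dblock << shift) + blockoffset;
}

/* Same computation widened to 64 bits before shifting. */
static uint64_t sector_u64(uint32_t dblock, unsigned shift,
                           uint32_t blockoffset)
{
    assert(shift < 32);
    return ((uint64_t)dblock << shift) + blockoffset;
}
```

For example, block 2^29 with a shift of 3 is sector 2^32: the 64-bit version returns 4294967296, while the 32-bit version wraps to 0 and would address the start of the volume. With 512-byte sectors, 2^32 sectors is the 2TB limit mentioned in the commit message.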
     
  • Given such a long name, the kB count in /proc/meminfo's HardwareCorrupted
    line is being shown too far right (it does align with x86_64's VmallocChunk
    above, but I hope nobody will ever have that much corrupted!). Align it.

    Signed-off-by: Hugh Dickins
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

27 Oct, 2009

2 commits

  • Signed-off-by: Kumar Gala
    Signed-off-by: Benjamin Herrenschmidt

    Kumar Gala
     
  • The hugetlb dependencies presently depend on SUPERH && MMU while the
    hugetlb page size definitions depend on CPU_SH4 or CPU_SH5. This
    unfortunately allows SH-3 + MMU configurations to enable hugetlbfs
    without a corresponding HPAGE_SHIFT definition, resulting in the build
    blowing up.

    As SH-3 doesn't support variable page sizes, we tighten up the
    dependencies a bit to prevent hugetlbfs from being enabled. These days
    we also have a shiny new SYS_SUPPORTS_HUGETLBFS, so switch to using
    that rather than adding to the list of corner cases in fs/Kconfig.

    Reported-by: Kristoffer Ericson
    Signed-off-by: Paul Mundt

    Paul Mundt
     

26 Oct, 2009

3 commits

  • commit 0762b8bde9729f10f8e6249809660ff2ec3ad735
    (from 14 months ago) introduced a use-after-free bug which has just
    recently started manifesting in my md testing.
    I tried git bisect to find out what caused the bug to start
    manifesting, and it could have been the recent change to
    blk_unregister_queue (48c0d4d4c04) but the results were inconclusive.

    This patch certainly fixes my symptoms and looks correct as the two
    calls are now in the same order as elsewhere in that function.

    Signed-off-by: NeilBrown
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Neil Brown
     
  • Otherwise, we have to wait for the server to recall it.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Commits 29fba38b (nfs41: lease renewal) and fc01cea9 (nfs41: sequence
    operation) introduce a couple of put_rpccred() calls on credentials for
    which there is no corresponding get_rpccred().

    See http://bugzilla.kernel.org/show_bug.cgi?id=14249

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

24 Oct, 2009

2 commits

  • RFC 3530 states that when we receive the error NFS4ERR_RESOURCE, we are not
    supposed to bump the sequence number on OPEN, LOCK, LOCKU, CLOSE, etc
    operations. The problem is that we map that error into EREMOTEIO in the XDR
    layer, and so the NFSv4 middle-layer routines like seqid_mutating_err(),
    and nfs_increment_seqid() don't recognise it.

    The fix is to defer the mapping until after the middle layers have
    processed the error.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Actually pass the NFS_FILE_SYNC option to the server to avoid a
    Panic in nfs_direct_write_complete() when a commit fails.

    At the end of an nfs write, if the nfs commit fails, all the writes
    will be rescheduled. They are supposed to be rescheduled as NFS_FILE_SYNC
    writes, but the rpc_task structure is not completely initialized and so
    the option is not passed. When the rescheduled writes complete, the
    return indicates that they are NFS_UNSTABLE and we try to do another
    commit. This leads to a Panic because the commit data structure pointer
    was set to null in the initial (failed) commit attempt.

    Signed-off-by: Terry Loftin
    Signed-off-by: Trond Myklebust

    Terry Loftin
     

22 Oct, 2009

3 commits

  • * 'for-linus' of git://git.infradead.org/users/eparis/notify:
    dnotify: ignore FS_EVENT_ON_CHILD
    inotify: fix coalesce duplicate events into a single event in special case
    inotify: deprecate the inotify kernel interface
    fsnotify: do not set group for a mark before it is on the i_list

    Linus Torvalds
     
  • Fix a (small) memory leak in one of the error paths of the NFS mount
    options parsing code.

    Regression introduced in 2.6.30 by commit a67d18f (NFS: load the
    rpc/rdma transport module automatically).

    Reported-by: Yinghai Lu
    Reported-by: Pekka Enberg
    Signed-off-by: Ingo Molnar
    Signed-off-by: Trond Myklebust
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     
  • This patch fixes a null pointer exception in pipe_rdwr_open() which
    generates the stack trace:

    > Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP:
    > [] pipe_rdwr_open+0x35/0x70
    > [] __dentry_open+0x13c/0x230
    > [] do_filp_open+0x2d/0x40
    > [] do_sys_open+0x5a/0x100
    > [] sysenter_do_call+0x1b/0x67

    The failure mode is triggered by an attempt to open an anonymous
    pipe via /proc/pid/fd/* as exemplified by this script:

    =============================================================
    while : ; do
        { echo y ; sleep 1 ; } | { while read ; do echo z$REPLY; done ; } &
        PID=$!
        OUT=$(ps -efl | grep 'sleep 1' | grep -v grep |
              { read PID REST ; echo $PID; } )
        OUT="${OUT%% *}"
        DELAY=$((RANDOM * 1000 / 32768))
        usleep $((DELAY * 1000 + RANDOM % 1000 ))
        echo n > /proc/$OUT/fd/1 # Trigger defect
    done
    =============================================================

    Note that the failure window is quite small and I could only
    reliably reproduce the defect by inserting a small delay
    in pipe_rdwr_open(). For example:

    static int
    pipe_rdwr_open(struct inode *inode, struct file *filp)
    {
        msleep(100);
        mutex_lock(&inode->i_mutex);

    Although the defect was observed in pipe_rdwr_open(), I think it
    makes sense to replicate the change through all the pipe_*_open()
    functions.

    The core of the change is to verify that inode->i_pipe has not
    been released before attempting to manipulate it. If inode->i_pipe
    is no longer present, return ENOENT to indicate so.

    The comment about potentially using atomic_t for i_pipe->readers
    and i_pipe->writers has also been removed because it is no longer
    relevant in this context. The inode->i_mutex lock must be used so
    that inode->i_pipe can be dealt with correctly.

    Signed-off-by: Earl Chew
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Earl Chew
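
The core of the change described in the commit above (check that the pipe is still attached before touching it, otherwise return ENOENT) can be sketched as below. The types `struct pipe_info` and `struct inode_sketch` and the function name are hypothetical, and the locking is only indicated in comments because this is a single-threaded user-space illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct pipe_info { int readers, writers; };

/* Hypothetical stand-in for the inode; i_pipe is NULL once released. */
struct inode_sketch { struct pipe_info *i_pipe; };

static int pipe_rdwr_open_sketch(struct inode_sketch *inode)
{
    int ret = -ENOENT;

    assert(inode != NULL);
    /* In the kernel, inode->i_mutex is taken here ... */
    if (inode->i_pipe) {
        ret = 0;
        inode->i_pipe->readers++;
        inode->i_pipe->writers++;
    }
    /* ... and dropped here, so the check and the update are atomic. */
    return ret;
}
```

Opening a still-live pipe bumps both counters and returns 0, while an inode whose pipe has already gone away yields -ENOENT instead of a NULL dereference.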
     

21 Oct, 2009

1 commit

  • Mask off FS_EVENT_ON_CHILD in dnotify_handle_event(). Otherwise, when there
    is more than one watch on a directory and dnotify_should_send_event()
    succeeds, events with FS_EVENT_ON_CHILD set will trigger all watches and cause
    spurious events.

    This case was overlooked in commit e42e2773.

    #define _GNU_SOURCE

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <signal.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/stat.h>

    static void create_event(int s, siginfo_t* si, void* p)
    {
        printf("create\n");
    }

    static void delete_event(int s, siginfo_t* si, void* p)
    {
        printf("delete\n");
    }

    int main (void) {
        struct sigaction action;
        char *tmpdir, *file;
        int fd1, fd2;

        sigemptyset (&action.sa_mask);
        action.sa_flags = SA_SIGINFO;

        action.sa_sigaction = create_event;
        sigaction (SIGRTMIN + 0, &action, NULL);

        action.sa_sigaction = delete_event;
        sigaction (SIGRTMIN + 1, &action, NULL);

    # define TMPDIR "/tmp/test.XXXXXX"
        tmpdir = malloc(strlen(TMPDIR) + 1);
        strcpy(tmpdir, TMPDIR);
        mkdtemp(tmpdir);

    # define TMPFILE "/file"
        file = malloc(strlen(tmpdir) + strlen(TMPFILE) + 1);
        /* TMPFILE already starts with '/', so no extra separator is needed */
        sprintf(file, "%s%s", tmpdir, TMPFILE);

        fd1 = open (tmpdir, O_RDONLY);
        fcntl(fd1, F_SETSIG, SIGRTMIN);
        fcntl(fd1, F_NOTIFY, DN_MULTISHOT | DN_CREATE);

        fd2 = open (tmpdir, O_RDONLY);
        fcntl(fd2, F_SETSIG, SIGRTMIN + 1);
        fcntl(fd2, F_NOTIFY, DN_MULTISHOT | DN_DELETE);

        if (fork()) {
            /* This triggers a create event */
            creat(file, 0600);
            /* This triggers a create and delete event (!) */
            unlink(file);
        } else {
            sleep(1);
            rmdir(tmpdir);
        }

        return 0;
    }

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Eric Paris

    Andreas Gruenbacher
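
The effect of masking off FS_EVENT_ON_CHILD, as described in the commit above, can be modelled with plain bit masks. This is a sketch under assumptions: the flag values are quoted from memory of the fsnotify headers and should be treated as illustrative, each watch mask is assumed to carry FS_EVENT_ON_CHILD so that it receives events for children, and `count_notified()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values (assumed, not verified against the headers). */
#define FS_CREATE         0x00000100u
#define FS_DELETE         0x00000200u
#define FS_EVENT_ON_CHILD 0x08000000u

/*
 * Count how many watches an event would trigger. With mask_off_child
 * set, the FS_EVENT_ON_CHILD bit is cleared from the event first, so a
 * watch matches only on the actual event type.
 */
static size_t count_notified(const unsigned *watch_masks, size_t n,
                             unsigned event_mask, int mask_off_child)
{
    size_t i, hits = 0;

    assert(watch_masks != NULL);
    if (mask_off_child)
        event_mask &= ~FS_EVENT_ON_CHILD;
    for (i = 0; i < n; i++)
        if (watch_masks[i] & event_mask)
            hits++;
    return hits;
}
```

With one create watch and one delete watch on the same directory, a delete event on a child matches both watches through the shared FS_EVENT_ON_CHILD bit unless that bit is cleared first, which corresponds to the spurious "create" signal in the test program.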
     

19 Oct, 2009

2 commits

  • If we rename a directory entry like this:

    rename("/tmp/ino7UrgoJ.rename1", "/tmp/ino7UrgoJ.rename2")
    rename("/tmp/ino7UrgoJ.rename2", "/tmp/ino7UrgoJ")

    The duplicate events should be coalesced into a single event. But those two
    events are not coalesced, due to a bad check in event_compare(): it cannot
    match the two NULL inodes as the same event.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Eric Paris

    Wei Yongjun
     
  • fsnotify_add_mark is supposed to add a mark to the g_list and i_list and to
    set the group and inode for the mark. fsnotify_destroy_mark_by_entry uses
    the fact that ->group != NULL to know if this group should be destroyed or
    if it's already been done.

    But fsnotify_add_mark sets the group and inode before it actually adds the
    mark to the i_list and g_list. This can result in a race in inotify that
    requires 3 threads.

    Thread 1: sys_inotify_add_watch("file")
    Thread 2: sys_inotify_add_watch("file")
    Thread 3: sys_inotify_rm_watch([a])

    [1] inotify_update_watch()
    [1] inotify_new_watch()
    [1] inotify_add_to_idr()
        ^--- returns wd = [a]
    [2] inotify_update_watch()
    [2] inotify_new_watch()
    [2] inotify_add_to_idr()
    [2] fsnotify_add_mark()
        ^--- returns wd = [b]
    [2] returns to userspace;
    [3] inotify_idr_find([a])
        ^--- gives us the pointer from task 1
    [1] fsnotify_add_mark()
        ^--- this is going to set the mark->group and mark->inode fields, but will
             return -EEXIST because of the race with [b].
    [3] fsnotify_destroy_mark()
        ^--- since ->group != NULL we call back
             into inotify_freeing_mark() which calls
             inotify_remove_from_idr([a])

    since fsnotify_add_mark() failed we call:
    [1] inotify_remove_from_idr([a])
        ^--- but thread 3 has already removed [a] from the idr

    The fix is to not set the mark's group until we are sure the mark is
    on the inode and fsnotify_add_mark will return success.

    Signed-off-by: Eric Paris

    Eric Paris
     

16 Oct, 2009

2 commits


15 Oct, 2009

2 commits

  • sysfs_notify_dirent is a simple atomic operation that can be used to
    alert user-space that new data can be read from a sysfs attribute.

    Unfortunately it cannot currently be called from non-process context
    because of its use of spin_lock which is sometimes taken with
    interrupts enabled.

    So change all lockers of sysfs_open_dirent_lock to disable interrupts,
    thus making sysfs_notify_dirent safe to be called from non-process
    context (as drivers/md does in md_safemode_timeout).

    sysfs_get_open_dirent is (documented as being) only called from
    process context, so it uses spin_lock_irq. Other places
    use spin_lock_irqsave.

    The usage for sysfs_notify_dirent in md_safemode_timeout was
    introduced in 2.6.28, so this patch is suitable for that and more
    recent kernels.

    Reported-by: Joel Andres Granados
    Signed-off-by: NeilBrown
    Signed-off-by: Dan Williams
    Cc: stable
    Signed-off-by: Greg Kroah-Hartman

    Neil Brown
     
  • As device_move() and kobject_move() both handle a NULL destination,
    sysfs_move_dir() should do this as well (again) and fall back to
    sysfs_root in that case.

    Signed-off-by: Cornelia Huck
    Cc: Phil Carmody
    Signed-off-by: Greg Kroah-Hartman

    Cornelia Huck
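
The NULL-destination fallback described in the commit above amounts to one conditional. The sketch below uses hypothetical types and names (`struct node`, `root_node`, `move_node()`) and is not the sysfs implementation.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    const char  *name;
    struct node *parent;
};

/* Hypothetical stand-in for sysfs_root. */
static struct node root_node = { "/", NULL };

/* Re-parent n; a NULL destination means "move to the root". */
static void move_node(struct node *n, struct node *new_parent)
{
    assert(n != NULL);
    n->parent = new_parent ? new_parent : &root_node;
}
```

Callers can then pass NULL in the same way they already do for device_move() and kobject_move(), without special-casing the root themselves.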
     

14 Oct, 2009

6 commits