01 May, 2010

1 commit

  • Building with CONFIG_INOTIFY_USER defined but CONFIG_ANON_INODES undefined
    results in the following build failure:

    LD vmlinux
    fs/built-in.o: In function `sys_inotify_init1':
    (.text.sys_inotify_init1+0x22c): undefined reference to `anon_inode_getfd'
    fs/built-in.o: In function `sys_inotify_init1':
    (.text.sys_inotify_init1+0x22c): relocation truncated to fit: R_MIPS_26 against `anon_inode_getfd'
    make[2]: *** [vmlinux] Error 1
    make[1]: *** [sub-make] Error 2
    make: *** [all] Error 2

    Signed-off-by: Ralf Baechle
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Ralf Baechle
     

30 Apr, 2010

6 commits


29 Apr, 2010

5 commits

  • The pktcdvd driver uses proper locking and does not need the BKL in the
    ioctl and llseek functions of the character device, so kill both.

    Moving the compat_ioctl handling from common code into the driver itself
    fixes build problems when CONFIG_BLOCK is disabled.

    Acked-by: Randy Dunlap
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
     
  • Commit b3d0ab7e60d1865bb6f6a79a77aaba22f2543236 ("exofs: add bdi backing
    to mount session") has a bug in the placement of the bdi member at
    struct exofs_sb_info. The layout member must be kept last.

    Signed-off-by: Boaz Harrosh
    Acked-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Boaz Harrosh
     
  • If a dentry found stale happens to be the root of a disconnected tree, we
    can't d_drop() it; its d_hash is actually part of s_anon and d_drop()
    would simply hide it from shrink_dcache_for_umount(), leading to
    all sorts of fun, including busy inodes on umount and oopsen after
    that.

    Bug had been there since at least 2006 (commit c636eb already has it),
    so it's definitely -stable fodder.

    Signed-off-by: Al Viro
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • With CONFIG_NFS_V4 and data version 4, nfs_get_sb will allocate memory for
    export_path in nfs4_validate_text_mount_data, so we need to free it then.
    This fixes the leak shown in the following kmemleak report:

    unreferenced object 0xffff88016bf48a50 (size 16):
    comm "mount.nfs", pid 22567, jiffies 4651574704 (age 175471.200s)
    hex dump (first 16 bytes):
    2f 6f 70 74 2f 77 6f 72 6b 00 6b 6b 6b 6b 6b a5 /opt/work.kkkkk.
    backtrace:
    [] kmemleak_alloc+0x60/0xa7
    [] kmemleak_alloc_recursive.clone.5+0x1b/0x1d
    [] __kmalloc_track_caller+0x18f/0x1b7
    [] kstrndup+0x37/0x54
    [] nfs_parse_devname+0x152/0x204 [nfs]
    [] nfs4_validate_text_mount_data+0xd0/0xdc [nfs]
    [] nfs_get_sb+0x325/0x736 [nfs]
    [] vfs_kern_mount+0xbd/0x17c
    [] do_kern_mount+0x4d/0xed
    [] do_mount+0x787/0x7fe
    [] sys_mount+0x88/0xc2
    [] system_call_fastpath+0x16/0x1b

    Signed-off-by: Xiaotian Feng
    Cc: Trond Myklebust
    Cc: Chuck Lever
    Cc: Benny Halevy
    Cc: Al Viro
    Cc: Andy Adamson
    Signed-off-by: Trond Myklebust

    Xiaotian Feng
     
  • The original code passed an ERR_PTR() to rpc_put_task() and instead of
    returning zero on success it returned -ENOMEM.

    Signed-off-by: Dan Carpenter
    Signed-off-by: Trond Myklebust

    Dan Carpenter
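
The pattern described in this message can be sketched in userspace C. This is a minimal illustration, not the actual patch: ERR_PTR(), IS_ERR() and PTR_ERR() below are simplified stand-ins for the kernel helpers of the same names, while struct task, run_task(), put_task() and do_call() are hypothetical names invented for the example. The point is that an error pointer is checked before it can reach the release function, and that the success path returns zero.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel's error-pointer helpers (simplified). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical reference-counted task; not the real struct rpc_task. */
struct task { int refcount; };

static int tasks_released;      /* counts objects freed by put_task() */

/* Returns a new task, or an error pointer when asked to fail. */
static struct task *run_task(int fail)
{
    struct task *t;

    if (fail)
        return ERR_PTR(-ENOMEM);
    t = malloc(sizeof(*t));
    if (!t)
        return ERR_PTR(-ENOMEM);
    t->refcount = 1;
    return t;
}

/* Drops a reference; must never be handed an error pointer. */
static void put_task(struct task *t)
{
    assert(!IS_ERR(t));
    if (--t->refcount == 0) {
        free(t);
        tasks_released++;
    }
}

/* Checks for failure first, releases only real objects, returns 0 on success. */
static int do_call(int fail)
{
    struct task *t = run_task(fail);

    if (IS_ERR(t))
        return (int)PTR_ERR(t);  /* propagate the error, nothing to release */
    put_task(t);
    return 0;
}
```

Calling do_call(0) releases one task and returns 0, while do_call(1) returns -ENOMEM without touching put_task().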
     

28 Apr, 2010

5 commits

  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    coda: move backing-dev.h kernel include inside __KERNEL__
    mtd: ensure that bdi entries are properly initialized and registered
    Move mtd_bdi_*mappable to mtdcore.c
    btrfs: convert to using bdi_setup_and_register()
    Catch filesystems lacking s_bdi
    drbd: Terminate a connection early if sending the protocol fails
    drbd: fix memory leak
    Fix JFFS2 sync silent failure
    smbfs: add bdi backing to mount session
    ncpfs: add bdi backing to mount session
    exofs: add bdi backing to mount session
    ecryptfs: add bdi backing to mount session
    coda: add bdi backing to mount session
    cifs: add bdi backing to mount session
    afs: add bdi backing to mount session.
    9p: add bdi backing to mount session
    bdi: add helper function for doing init and register of a bdi for a file system
    block: ensure jiffies wrap is handled correctly in blk_rq_timed_out_timer

    Linus Torvalds
     
  • * 'for-2.6.34' of git://linux-nfs.org/~bfields/linux:
    nfsd4: bug in read_buf

    Linus Torvalds
     
  • Correct the file_operations struct in fdinfo entry of tid_base_stuff[].

    Presently /proc/*/task/*/fdinfo contains symlinks to opened files like
    /proc/*/fd/.

    Signed-off-by: Jerome Marchand
    Cc: Alexander Viro
    Cc: Miklos Szeredi
    Cc: Alexey Dobriyan
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jerome Marchand
     
  • Neil Brown reports that he is seeing the BUG_ON(ret == 0) trigger in
    nfs_page_async_flush. According to the trace in
    https://bugzilla.novell.com/show_bug.cgi?id=599628
    the problem appears to be due to nfs_wb_page() not waiting for the
    PG_writeback flag to clear.

    The same problem exists in nfs_wb_page_cancel().

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • The checks for CONFIG_MMU at this location are duplicated as all the code is
    located inside a #ifndef CONFIG_MMU block. So the first conditional block will
    always be included while the second never will.

    Signed-off-by: Christoph Egger
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    Christoph Egger
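
Why the inner checks are redundant can be shown with a small preprocessor sketch. The layout below is an assumption about the general shape of the code (the message does not quote it), and first_block_built() and second_block_built() are hypothetical names. CONFIG_MMU is deliberately left undefined, as on a no-MMU build.

```c
#include <assert.h>

/* CONFIG_MMU is intentionally not defined here. */

#ifndef CONFIG_MMU
/* Everything in this region is compiled only when CONFIG_MMU is undefined. */

#ifndef CONFIG_MMU              /* redundant: always true inside this region */
static int first_block_built(void) { return 1; }
static int second_block_built(void) { return 0; }
#else                           /* dead: can never be selected here */
static int first_block_built(void) { return 0; }
static int second_block_built(void) { return 1; }
#endif

#endif /* !CONFIG_MMU */

/* Confirms which branch the preprocessor actually kept. */
static void self_check(void)
{
    assert(first_block_built() == 1);
    assert(second_block_built() == 0);
}
```

Because the outer #ifndef already guarantees the condition, the inner test can be dropped and the dead branch deleted without changing what gets compiled.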
     

27 Apr, 2010

3 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
    squashfs: fix potential buffer over-run on 4K block file systems
    squashfs: add missing buffer free
    squashfs: fix warn_on when root inode is corrupted
    squashfs: fix locking bug in zlib wrapper

    Linus Torvalds
     
  • When read_buf is called to move over to the next page in the pagelist
    of an NFSv4 request, it sets argp->end to essentially a random
    number, certainly not an address within the page which argp->p now
    points to. So subsequent calls to READ_BUF will think there is much
    more than a page of spare space (the cast to u32 ensures an unsigned
    comparison) so we can expect to fall off the end of the second
    page.

    We never encountered this in testing because typically the only
    operations which use more than two pages are write-like operations,
    which have their own decoding logic. Something like a getattr after a
    write may cross a page boundary, but it would be very unusual for it to
    cross another boundary after that.

    Cc: stable@kernel.org
    Signed-off-by: J. Bruce Fields

    Neil Brown
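
The invariant at stake is that the end pointer must always be derived from the page the read pointer currently sits in. The following is a much simplified userspace model, not the nfsd code: struct decoder, decoder_init(), read_word() and the tiny PAGE_WORDS value are all hypothetical, and the real read_buf also handles data that straddles a page boundary, which this sketch does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_WORDS 4    /* tiny "pages" of 32-bit words, for illustration only */

struct decoder {
    const uint32_t *p;        /* next word to decode */
    const uint32_t *end;      /* one past the last word of the current page */
    const uint32_t **pages;   /* pages not yet entered */
    size_t npages;            /* number of pages not yet entered */
};

static void decoder_init(struct decoder *d, const uint32_t **pages, size_t npages)
{
    assert(npages > 0);
    d->p = pages[0];
    d->end = d->p + PAGE_WORDS;
    d->pages = pages + 1;
    d->npages = npages - 1;
}

/* Stores the next word and returns 0, or returns -1 when the data runs out. */
static int read_word(struct decoder *d, uint32_t *out)
{
    if (d->p == d->end) {
        if (d->npages == 0)
            return -1;
        d->p = *d->pages++;
        d->npages--;
        /* Recompute end from the new p, never from a stale pointer. */
        d->end = d->p + PAGE_WORDS;
    }
    *out = *d->p++;
    return 0;
}
```

If end were computed from anything other than the new p, the bounds check in read_word() would compare against an unrelated address and reads could run past the second page.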
     
  • A new xfsqa test (226) with a prototype xfs_fsr change to try to
    handle dynamic fork offsets better triggers an assertion failure
    where the inode data fork is in btree format, yet there is room in
    the inode for it to be in extent format. The two inodes look like:

    before: ino 0x101 (target), num_extents 11, Max in-fork extents 6, broot size 40, fork offset 96
    before: ino 0x115 (temp), num_extents 5, Max in-fork extents 3, broot size 40, fork offset 56
    after: ino 0x101 (target), num_extents 5, Max in-fork extents 6, broot size 40, fork offset 96
    after: ino 0x115 (temp), num_extents 11, Max in-fork extents 3, broot size 40, fork offset 56

    Basically the target inode ends up with 5 extents in btree format,
    but it had space for 6 extents in extent format, so ends up
    incorrect. Notably here the broot size is the same, and that is
    where the kernel code is going wrong - the btree root will fit, so
    it lets the swap go ahead.

    The check should not allow the swap to take place if the number of
    extents while in btree format is less than the number of extents
    that can fit in the inode in extent format. Adding that check will
    prevent this swap and corruption from occurring.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
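
The check described above can be modelled as follows. This is a sketch under stated assumptions rather than the XFS implementation: struct fork_info and fork_fits_after_swap() are hypothetical names, the fields mirror the values printed in the message, and the comparison follows the message's wording ("less than"). Only the btree case is modelled.

```c
#include <assert.h>

enum fork_format { FMT_EXTENTS, FMT_BTREE };

/* Hypothetical summary of one inode's data fork, mirroring the debug output. */
struct fork_info {
    enum fork_format format;
    unsigned num_extents;         /* extents currently in the fork */
    unsigned max_infork_extents;  /* extents that fit in extent format */
    unsigned broot_size;          /* size of the btree root, in bytes */
    unsigned fork_offset;         /* data fork space before the attr fork */
};

/*
 * Would the data fork of src be valid once moved into dst?
 * Returns 1 if the swap may proceed, 0 if it must be refused.
 */
static int fork_fits_after_swap(const struct fork_info *src,
                                const struct fork_info *dst)
{
    assert(src && dst);
    if (src->format != FMT_BTREE)
        return 1;               /* other formats are not modelled here */

    /* Pre-existing style of check: the btree root must fit. */
    if (dst->fork_offset && src->broot_size > dst->fork_offset)
        return 0;

    /* Added check: a btree fork with so few extents should be in extent format. */
    if (src->num_extents < dst->max_infork_extents)
        return 0;

    return 1;
}
```

With the numbers from the message, moving the 5-extent btree fork of inode 0x115 into inode 0x101 (room for 6 in-fork extents) is refused, even though the 40-byte root fits within the fork offset of 96.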
     

26 Apr, 2010

2 commits


25 Apr, 2010

7 commits

  • noop_backing_dev_info is used only as a flag to mark filesystems that
    don't have any backing store, like tmpfs, procfs, spufs, etc.

    Signed-off-by: Joern Engel

    Changed the BUG_ON() to a WARN_ON(). Note that adding dirty inodes
    to the noop_backing_dev_info is not legal and will not result in
    them being flushed, but we already catch this condition in
    __mark_inode_dirty() when checking for a registered bdi.

    Signed-off-by: Jens Axboe

    Jörn Engel
     
  • Sizing the buffer based on block size is incorrect, leading
    to a potential buffer over-run on 4K block size file systems
    (because the metadata block size is always 8K). This bug
    doesn't seem to have triggered because 4K block size file systems
    are not default, and also because metadata blocks after
    compression tend to be less than 4K.

    Signed-off-by: Phillip Lougher

    Phillip Lougher
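
One way to size such a buffer safely is to take the larger of the file system block size and the metadata block size. Whether the actual patch does exactly this is an assumption; METADATA_SIZE and read_buffer_size() are hypothetical names, and the 8K figure comes from the message.

```c
#include <assert.h>
#include <stddef.h>

#define METADATA_SIZE 8192  /* metadata block size stated in the message */

/* Size a buffer that must hold either a data block or a metadata block. */
static size_t read_buffer_size(size_t block_size)
{
    assert(block_size > 0);
    return block_size > METADATA_SIZE ? block_size : METADATA_SIZE;
}
```

For a 4K block size this yields 8192 bytes rather than 4096, so an uncompressed metadata block no longer overruns the buffer.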
     
  • Signed-off-by: Phillip Lougher

    Phillip Lougher
     
  • Fix warn_on triggered by mounting a fsfuzzer corrupted file system, where
    the root inode has been corrupted.

    Signed-off-by: Phillip Lougher
    Reported-by: Steve Grubb

    Phillip Lougher
     
  • We are seeing a large regression in database performance on recent
    kernels. The database opens a block device with O_DIRECT|O_SYNC and a
    number of threads write to different regions of the file at the same time.

    A simple test case is below. I haven't defined DEVICE since getting it
    wrong will destroy your data :) On a 3 disk LVM with a 64k chunk size we
    see about 17MB/sec and only a few threads in IO wait:

    procs -----io---- -system-- -----cpu------
     r  b    bi    bo   in   cs us sy id wa st
     0  3     0 16170  656 2259  0  0 86 14  0
     0  2     0 16704  695 2408  0  0 92  8  0
     0  2     0 17308  744 2653  0  0 86 14  0
     0  2     0 17933  759 2777  0  0 89 10  0

    Most threads are blocking in vfs_fsync_range, which has:

        mutex_lock(&mapping->host->i_mutex);
        err = fop->fsync(file, dentry, datasync);
        if (!ret)
            ret = err;
        mutex_unlock(&mapping->host->i_mutex);

    commit 148f948ba877f4d3cdef036b1ff6d9f68986706a (vfs: Introduce new
    helpers for syncing after writing to O_SYNC file or IS_SYNC inode) offers
    some explanation of what is going on:

    Use these new helpers for syncing from generic VFS functions. This makes
    O_SYNC writes to block devices acquire i_mutex for syncing. If we really
    care about this, we can make block_fsync() drop the i_mutex and reacquire
    it before it returns.

    Thanks Jan for such a good commit message! As well as dropping i_mutex,
    Christoph suggests we should remove the call to sync_blockdev():

    > sync_blockdev is an overcomplicated alias for filemap_write_and_wait on
    > the block device inode, which is exactly what we did just before calling
    > into ->fsync

    The patch below incorporates both suggestions. With it the testcase improves
    from 17MB/sec to 68MB/sec:

    procs -----io---- -system-- -----cpu------
     r  b    bi    bo   in   cs us sy id wa st
     0  7     0 65536 1000 3878  0  0 70 30  0
     0 34     0 69632 1016 3921  0  1 46 53  0
     0 57     0 69632 1000 3921  0  0 55 45  0
     0 53     0 69640  754 4111  0  0 81 19  0

    Testcase:

    #define _GNU_SOURCE     /* needed for O_DIRECT */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <pthread.h>
    #include <sys/types.h>
    #include <sys/stat.h>

    #define NR_THREADS 64
    #define BUFSIZE (64 * 1024)

    #define DEVICE "/dev/mapper/XXXXXX"

    #define ALIGN(VAL, SIZE) (((VAL)+(SIZE)-1) & ~((SIZE)-1))

    static int fd;

    /* Each thread rewrites its own BUFSIZE region of the device forever. */
    static void *doit(void *arg)
    {
        unsigned long offset = (long)arg;
        char *b, *buf;

        /* Over-allocate so the buffer can be aligned for O_DIRECT. */
        b = malloc(BUFSIZE + 1024);
        buf = (char *)ALIGN((unsigned long)b, 1024);
        memset(buf, 0, BUFSIZE);

        while (1)
            pwrite(fd, buf, BUFSIZE, offset);
    }

    int main(int argc, char *argv[])
    {
        int flags = O_RDWR|O_DIRECT;
        int i;
        unsigned long offset = 0;

        /* Pass "O_SYNC" as the first argument to add O_SYNC to the open flags. */
        if (argc > 1 && !strcmp(argv[1], "O_SYNC"))
            flags |= O_SYNC;

        fd = open(DEVICE, flags);
        if (fd == -1) {
            perror("open");
            exit(1);
        }

        /* Start NR_THREADS-1 writers; the main thread acts as the last one. */
        for (i = 0; i < NR_THREADS-1; i++) {
            pthread_t tid;
            pthread_create(&tid, NULL, doit, (void *)offset);
            offset += BUFSIZE;
        }
        doit((void *)offset);

        return 0;
    }

    Signed-off-by: Anton Blanchard
    Acked-by: Jan Kara
    Cc: Christoph Hellwig
    Cc: Alexander Viro
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anton Blanchard
     
  • Commit 48b32a3553a54740d236b79a90f20147a25875e3 ("reiserfs: use generic
    xattr handlers") introduced a problem that causes corruption when extended
    attributes are replaced with a smaller value.

    The issue is that the reiserfs_setattr to shrink the xattr file was moved
    from before the write to after the write.

    The root issue has always been in the reiserfs xattr code, but was papered
    over by the fact that in the shrink case, the file would just be expanded
    again while the xattr was written.

    The end result is that the last 8 bytes of xattr data are lost.

    This patch fixes it to use new_size.

    Addresses https://bugzilla.kernel.org/show_bug.cgi?id=14826

    Signed-off-by: Jeff Mahoney
    Reported-by: Christian Kujau
    Tested-by: Christian Kujau
    Cc: Edward Shishkin
    Cc: Jethro Beekman
    Cc: Greg Surbey
    Cc: Marco Gatti
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Mahoney
     
  • Commit 677c9b2e393a0cd203bd54e9c18b012b2c73305a ("reiserfs: remove
    privroot hiding in lookup") removed the magic from the lookup code to hide
    the .reiserfs_priv directory since it was getting loaded at mount-time
    instead. The intent was that the entry would be hidden from the user via
    a poisoned d_compare, but this was faulty.

    This introduced a security issue where unprivileged users could access and
    modify extended attributes or ACLs belonging to other users, including
    root.

    This patch resolves the issue by properly hiding .reiserfs_priv. This was
    the intent of the xattr poisoning code, but it appears to have never
    worked as expected. This is fixed by using d_revalidate instead of
    d_compare.

    This patch makes -oexpose_privroot a no-op. I'm fine leaving it this way.
    The effort involved in working out the corner cases wrt permissions and
    caching outweighs the benefit of the feature.

    Signed-off-by: Jeff Mahoney
    Acked-by: Edward Shishkin
    Reported-by: Matt McCutchen
    Tested-by: Matt McCutchen
    Cc: Frederic Weisbecker
    Cc: Al Viro
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Mahoney
     

24 Apr, 2010

1 commit

  • This cleans up a few of the complaints about __generic_block_fiemap. I've
    fixed all the typing stuff, used inline functions instead of macros,
    gotten rid of a couple of variables, and made sure the size and block
    requests are all block aligned. It also fixes a problem where sometimes
    FIEMAP_EXTENT_LAST wasn't being set properly.

    Signed-off-by: Josef Bacik
    Signed-off-by: Linus Torvalds

    Josef Bacik
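
Two of the points in this message, using inline functions rather than macros and keeping requests block aligned, can be illustrated with a short userspace sketch. It is not the kernel code: the helpers take a block-size shift directly instead of an inode, and align_range() is a hypothetical name introduced for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset to block number; unlike a macro, the arguments are type checked. */
static inline uint64_t logical_to_blk(unsigned blkbits, uint64_t offset)
{
    assert(blkbits < 64);
    return offset >> blkbits;
}

/* Block number back to the byte offset of the block's first byte. */
static inline uint64_t blk_to_logical(unsigned blkbits, uint64_t blk)
{
    assert(blkbits < 64);
    return blk << blkbits;
}

/* Expand the byte range [start, start + len) so it covers whole blocks. */
static void align_range(unsigned blkbits, uint64_t start, uint64_t len,
                        uint64_t *aligned_start, uint64_t *aligned_len)
{
    uint64_t first, last;

    assert(len > 0);
    first = logical_to_blk(blkbits, start);
    last = logical_to_blk(blkbits, start + len - 1);
    *aligned_start = blk_to_logical(blkbits, first);
    *aligned_len = blk_to_logical(blkbits, last - first + 1);
}
```

With 4096-byte blocks (a shift of 12), a request for 200 bytes starting at offset 4000 touches blocks 0 and 1, so it expands to start 0 and length 8192.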
     

23 Apr, 2010

5 commits


22 Apr, 2010

5 commits