20 Sep, 2011

1 commit

  • * 'for-linus' of git://github.com/chrismason/linux:
    Btrfs: only clear the need lookup flag after the dentry is setup
    BTRFS: Fix lseek return value for error
    Btrfs: don't change inode flag of the dest clone file
    Btrfs: don't make a file partly checksummed through file clone
    Btrfs: fix pages truncation in btrfs_ioctl_clone()
    btrfs: fix d_off in the first dirent

    Linus Torvalds
     

18 Sep, 2011

7 commits

  • We can race with readdir and the RCU path walking stuff. This is because we
    clear the need lookup flag before actually instantiating the inode. This will
    lead the RCU path walk stuff to find a dentry it thinks is valid without a
    d_inode attached. So instead unhash the dentry when we first start the lookup,
    and then clear the flag after we've instantiated the dentry so we're guaranteed
    to either try the slow lookup, or have the d_inode set properly.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • The recent reworking of btrfs' lseek led to incorrect
    values being returned. This adds checks for seeking
    beyond EOF in SEEK_HOLE and makes sure the error
    values come back correct.

    Andi Kleen also sent in similar patches.

    Signed-off-by: Jie Liu
    Reported-by: Andi Kleen
    Signed-off-by: Chris Mason

    Jeff Liu
     
  • Chris Mason
     
  • The dst file will have the same inode flags as the src file after
    a file clone, and I think that's unexpected.

    For example, the dst file will suddenly become immutable after
    sharing some data with the src file, if the src is immutable.

    Signed-off-by: Li Zefan
    Signed-off-by: Chris Mason

    Li Zefan
     
  • To reproduce the bug:

    # mount /dev/sda7 /mnt
    # dd if=/dev/zero of=/mnt/src bs=4K count=1
    # umount /mnt

    # mount -o nodatasum /dev/sda7 /mnt
    # dd if=/dev/zero of=/mnt/dst bs=4K count=1
    # clone_range -s 4K -l 4K /mnt/src /mnt/dst

    # echo 3 > /proc/sys/vm/drop_caches
    # cat /mnt/dst
    # dmesg
    ...
    btrfs no csum found for inode 258 start 0
    btrfs csum failed ino 258 off 0 csum 2566472073 private 0

    It's because part of the file is checksummed and the other part is not,
    and then btrfs will complain checksum is not found when we read the file.

    Disallow file clone if the src and dst files have different checksum flags,
    so we ensure a file is either completely checksummed or completely unchecksummed.

    Signed-off-by: Li Zefan
    Signed-off-by: Chris Mason

    Li Zefan
     
  • It's a bug in commit f81c9cdc567cd3160ff9e64868d9a1a7ee226480
    (Btrfs: truncate pages from clone ioctl target range)

    We should pass the dest range, not the src range, to the truncate
    function.

    Also move the call so that it happens before locking the extent state.

    Signed-off-by: Li Zefan
    Signed-off-by: Chris Mason

    Li Zefan
     
  • Since the d_off in the first dirent for "." (that originates from
    the 4th argument "offset" of filldir() for the 2nd dirent for "..")
    is wrongly assigned in btrfs_real_readdir(), telldir returns the same
    offset for different locations.

    | # mkfs.btrfs /dev/sdb1
    | # mount /dev/sdb1 fs0
    | # cd fs0
    | # touch file0 file1
    | # ../test
    | telldir: 0
    | readdir: d_off = 2, d_name = "."
    | telldir: 2
    | readdir: d_off = 2, d_name = ".."
    | telldir: 2
    | readdir: d_off = 3, d_name = "file0"
    | telldir: 3
    | readdir: d_off = 2147483647, d_name = "file1"
    | telldir: 2147483647

    To fix this problem, pass filp->f_pos (which is loff_t) instead.

    | # ../test
    | telldir: 0
    | readdir: d_off = 1, d_name = "."
    | telldir: 1
    | readdir: d_off = 2, d_name = ".."
    | telldir: 2
    | readdir: d_off = 3, d_name = "file0"
    :

    At the moment the "offset" for "." is unused because there is no
    preceding dirent; however, it is better to pass filp->f_pos to follow
    the intended usage.

    Signed-off-by: Hidetoshi Seto
    Signed-off-by: Chris Mason

    Hidetoshi Seto
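
The duplicate offsets in the first trace above can be checked mechanically. The following is a minimal Python sketch, not part of the patch: `duplicate_offsets` is a hypothetical helper, and the two lists simply transcribe the d_off/d_name pairs printed in the commit message (the second trace is truncated there, so only its first three entries appear).

```python
from collections import defaultdict


def duplicate_offsets(entries):
    """Return {d_off: [names]} for every offset shared by more than one entry."""
    seen = defaultdict(list)
    for d_off, name in entries:
        seen[d_off].append(name)
    return {off: names for off, names in seen.items() if len(names) > 1}


# d_off/d_name pairs transcribed from the trace before the fix.
before = [(2, "."), (2, ".."), (3, "file0"), (2147483647, "file1")]
# Pairs transcribed from the (truncated) trace after the fix.
after = [(1, "."), (2, ".."), (3, "file0")]

print(duplicate_offsets(before))
print(duplicate_offsets(after))
```

Before the fix, "." and ".." report the same offset, so a position saved with telldir cannot tell the two locations apart. In the trace after the fix, each listed entry has its own offset.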
     

16 Sep, 2011

3 commits

  • * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    nfs: Do not allow multiple mounts on same mountpoint when using -o noac
    NFS: Fix a typo in nfs_flush_multi
    NFSv4: renewd needs to be able to handle the NFS4ERR_CB_PATH_DOWN error
    NFSv4: The NFSv4.0 client must send RENEW calls if it holds a delegation
    NFSv4: nfs4_proc_renew should be declared static
    NFSv4: nfs4_proc_async_renew should use a GFP_NOFS allocation

    Linus Torvalds
     
  • generic_check_addressable can't deal with hfsplus's larger-than-page-size
    allocation blocks, so simply open-code the checks that we actually
    need in hfsplus_fill_super.

    Signed-off-by: Christoph Hellwig
    Reported-by: Pavel Ivanov
    Tested-by: Pavel Ivanov
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Commit 6596528e391a ("hfsplus: ensure bio requests are not smaller than
    the hardware sectors") changed the pointers used for volume header
    allocations but failed to free the correct pointers in the error
    path of hfsplus_fill_super() and hfsplus_read_wrapper().

    The second hunk came from a separate patch by Pavel Ivanov.

    Reported-by: Pavel Ivanov
    Signed-off-by: Seth Forshee
    Signed-off-by: Christoph Hellwig
    Cc:
    Signed-off-by: Linus Torvalds

    Seth Forshee
     

15 Sep, 2011

2 commits

  • * 'for-linus' of git://oss.sgi.com/xfs/xfs:
    xfs: fix a use after free in xfs_end_io_direct_write

    Linus Torvalds
     
  • We used to get the victim pinned by dentry_unhash() prior to commit
    64252c75a219 ("vfs: remove dget() from dentry_unhash()") and ->rmdir()
    and ->rename() instances relied on that; most of them don't care, but
    ones that used d_delete() themselves do. As a result, we are getting
    rmdir() oopses on NFS now.

    Just grab the reference before locking the victim and drop it explicitly
    after unlocking, same as vfs_rename_other() does.

    Signed-off-by: Al Viro
    Tested-by: Simon Kirby
    Cc: stable@kernel.org (3.0.x)
    Signed-off-by: Linus Torvalds

    Al Viro
     

14 Sep, 2011

3 commits

  • There is a window in which the ioend that we call inode_dio_wake on
    in xfs_end_io_direct_write is already free. Fix this by storing
    the inode pointer in a local variable.

    This is a fix for the regression introduced in 3.1-rc by
    "fs: move inode_dio_done to the end_io handler".

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • Do not allow multiple mounts on same mountpoint when using -o noac

    Normally, when you attempt to mount a share twice on the same mountpoint,
    a check in do_add_mount causes it to return an error:

    # mount localhost:/nfsv3 /mnt
    # mount localhost:/nfsv3 /mnt
    mount.nfs: /mnt is already mounted or busy

    However when using the option 'noac', the user is able to mount the same
    share on the same mountpoint multiple times. This happens because a
    share mounted with the noac option is automatically assigned the 'sync'
    flag MS_SYNCHRONOUS in nfs_initialise_sb(). This flag is set after the
    check for already existing superblocks is done in sget(). The check for
    the mount flags in nfs_compare_mount_options() does not take into
    account the 'sync' flag applied later on in the code path. This means
    that when using 'noac', a new superblock structure is assigned for every
    new mount of the same share and multiple shares on the same mountpoint
    are allowed.

    i.e.
    # mount -onoac localhost:/nfsv3 /mnt
    can be run multiple times.

    The patch checks for noac and assigns the sync flag before sget() is
    called to obtain an already existing superblock structure.

    Signed-off-by: Sachin Prabhu
    Reviewed-by: Jeff Layton
    Signed-off-by: Trond Myklebust

    Sachin Prabhu
     
  • Fix a typo which causes an Oops in the RPC layer, when using wsize < 4k.

    Signed-off-by: Trond Myklebust
    Tested-by: Sricharan R

    Trond Myklebust
     

13 Sep, 2011

3 commits

  • * 'for-linus' of git://github.com/chrismason/linux:
    Btrfs: add dummy extent if dst offset excceeds file end in
    Btrfs: calc file extent num_bytes correctly in file clone
    btrfs: xattr: fix attribute removal
    Btrfs: fix wrong nbytes information of the inode
    Btrfs: fix the file extent gap when doing direct IO
    Btrfs: fix unclosed transaction handle in btrfs_cont_expand
    Btrfs: fix misuse of trans block rsv
    Btrfs: reset to appropriate block rsv after orphan operations
    Btrfs: skip locking if searching the commit root in csum lookup
    btrfs: fix warning in iput for bad-inode
    Btrfs: fix an oops when deleting snapshots

    Linus Torvalds
     
  • kmemleak is reporting that 32 bytes are being leaked by FUSE:

    unreferenced object 0xe373b270 (size 32):
    comm "fusermount", pid 1207, jiffies 4294707026 (age 2675.187s)
    hex dump (first 32 bytes):
    01 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    backtrace:
    [] kmemleak_alloc+0x27/0x50
    [] kmem_cache_alloc+0xc5/0x180
    [] fuse_alloc_forget+0x1e/0x20
    [] fuse_alloc_inode+0xb0/0xd0
    [] alloc_inode+0x1c/0x80
    [] iget5_locked+0x8f/0x1a0
    [] fuse_iget+0x72/0x1a0
    [] fuse_get_root_inode+0x8a/0x90
    [] fuse_fill_super+0x3ef/0x590
    [] mount_nodev+0x3f/0x90
    [] fuse_mount+0x15/0x20
    [] mount_fs+0x1c/0xc0
    [] vfs_kern_mount+0x41/0x90
    [] do_kern_mount+0x39/0xd0
    [] do_mount+0x2e5/0x660
    [] sys_mount+0x66/0xa0

    This leak report is consistent and happens once per boot on
    3.1.0-rc5-dirty.

    This happens if a FORGET request is queued after the fuse device was
    released.

    Reported-by: Sitsofe Wheeler
    Signed-off-by: Miklos Szeredi
    Tested-by: Sitsofe Wheeler
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Commit 37fb3a30b4 ("fuse: fix flock") added in 3.1-rc4 caused flock() to
    fail with ENOSYS with the kernel ABI version 7.16 or earlier.

    Fix by falling back to testing FUSE_POSIX_LOCKS for ABI versions 7.16
    and earlier.

    Reported-by: Martin Ziegler
    Signed-off-by: Miklos Szeredi
    Tested-by: Martin Ziegler
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
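
The fallback described above can be sketched as a small decision function. This is an illustrative Python model, not the kernel code: `flock_supported` is a hypothetical name, and the two flag values are assumed to mirror FUSE_POSIX_LOCKS and FUSE_FLOCK_LOCKS from the kernel's fuse.h header (the commit message itself names only FUSE_POSIX_LOCKS).

```python
# Assumed to mirror the INIT capability flags in the kernel's fuse.h header;
# treat these values as illustrative rather than authoritative.
FUSE_POSIX_LOCKS = 1 << 1
FUSE_FLOCK_LOCKS = 1 << 10


def flock_supported(minor: int, flags: int) -> bool:
    """Model whether flock() requests would be passed to the filesystem.

    minor is the minor number of the negotiated 7.x ABI version, and flags
    is the capability bitmask the userspace filesystem sent in its INIT reply.
    """
    if minor >= 17:
        # Newer ABI: the filesystem advertises flock support explicitly.
        return bool(flags & FUSE_FLOCK_LOCKS)
    # ABI 7.16 and earlier: fall back to testing the POSIX locking flag.
    return bool(flags & FUSE_POSIX_LOCKS)


print(flock_supported(16, FUSE_POSIX_LOCKS))  # True
print(flock_supported(17, FUSE_POSIX_LOCKS))  # False
```

Under this model, an older filesystem that only advertises POSIX locking keeps working with flock(), which is the case the commit says was failing with ENOSYS.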
     

11 Sep, 2011

12 commits

  • You can see there's no file extent with range [0, 4096]. Check this by
    btrfsck:

    # btrfsck /dev/sda7
    root 5 inode 258 errors 100
    ...

    Signed-off-by: Li Zefan
    Signed-off-by: Chris Mason

    Li Zefan
     
  • num_bytes should be 4096 not 12288.

    Signed-off-by: Li Zefan
    Signed-off-by: Chris Mason

    Li Zefan
     
  • An attribute is not removed by 'setfattr -x attr file' and remains
    visible in attr list. This makes xfstests/062 pass again.

    Signed-off-by: David Sterba
    Signed-off-by: Chris Mason

    David Sterba
     
  • If we write some data into a hole in the file (with no preallocation for this
    hole), Btrfs will allocate some disk space and update nbytes of the inode, but
    the other element, disk_i_size, need not be updated. In this case, we must still
    update the inode metadata even though disk_i_size is unchanged
    (btrfs_ordered_update_i_size() returns 1).

    # mkfs.btrfs /dev/sdb1
    # mount /dev/sdb1 /mnt
    # touch /mnt/a
    # truncate -s 856002 /mnt/a
    # dd if=/dev/zero of=/mnt/a bs=4K count=1 conv=nocreat,notrunc
    # umount /mnt
    # btrfsck /dev/sdb1
    root 5 inode 257 errors 400
    found 32768 bytes used err is 1

    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie
     
  • When we write some data beyond the end of the file in direct I/O mode,
    a data hole is created, and Btrfs should insert a file extent item that
    points to this hole into the fs tree. Unfortunately, Btrfs forgets to
    do so.

    The following is a simple way to reproduce it:
    # mkfs.btrfs /dev/sdc2
    # mount /dev/sdc2 /test4
    # touch /test4/a
    # dd if=/dev/zero of=/test4/a seek=8 count=1 bs=4K oflag=direct conv=nocreat,notrunc
    # umount /test4
    # btrfsck /dev/sdc2
    root 5 inode 257 errors 100

    Reported-by: Tsutomu Itoh
    Signed-off-by: Miao Xie
    Tested-by: Tsutomu Itoh
    Signed-off-by: Chris Mason

    Miao Xie
     
  • The function btrfs_cont_expand() forgot to close the transaction handle before
    jumping out of the while loop. Fix it.

    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie
     
  • At the beginning of create_pending_snapshot, trans->block_rsv is set
    to pending->block_rsv and is used for the snapshot work; however, when
    that is done, we do not restore it.

    Signed-off-by: Liu Bo
    Signed-off-by: Chris Mason

    Liu Bo
     
  • While truncating the free space cache, we forget to change trans->block_rsv
    back to the original one and leave it set to the orphan_block_rsv. With
    the inode_cache option enabled, this leads to countless warnings from
    btrfs_alloc_free_block and btrfs_orphan_commit_root:

    WARNING: at fs/btrfs/extent-tree.c:5711 btrfs_alloc_free_block+0x180/0x350 [btrfs]()
    ...
    WARNING: at fs/btrfs/inode.c:2193 btrfs_orphan_commit_root+0xb0/0xc0 [btrfs]()

    Signed-off-by: Liu Bo
    Signed-off-by: Chris Mason

    Liu Bo
     
  • It's not enough to just search the commit root, since we could be cow'ing the
    very block we need to search through, which would mean that it's locked and we'll
    still deadlock. So use path->skip_locking as well. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • iput() shouldn't be called for inodes in I_NEW state.
    We need to mark inode as constructed first.

    WARNING: at fs/inode.c:1309 iput+0x20b/0x210()
    Call Trace:
    [] warn_slowpath_common+0x7a/0xb0
    [] warn_slowpath_null+0x15/0x20
    [] iput+0x20b/0x210
    [] btrfs_iget+0x1eb/0x4a0
    [] btrfs_run_defrag_inodes+0x136/0x210
    [] cleaner_kthread+0x17f/0x1a0
    [] ? sub_preempt_count+0x9d/0xd0
    [] ? transaction_kthread+0x280/0x280
    [] kthread+0x96/0xa0
    [] kernel_thread_helper+0x4/0x10
    [] ? kthread_worker_fn+0x190/0x190
    [] ? gs_change+0xb/0xb

    Signed-off-by: Sergei Trofimovich
    CC: Konstantin Khlebnikov
    Tested-by: David Sterba
    CC: Josef Bacik
    CC: Chris Mason
    Signed-off-by: Chris Mason

    Sergei Trofimovich
     
  • We can reproduce this oops via the following steps:

    $ mkfs.btrfs /dev/sdb7
    $ mount /dev/sdb7 /mnt/btrfs
    $ for ((i=0; i[...]

    [...] i_ino
    to BTRFS_EMPTY_SUBVOL_DIR_OBJECTID instead of BTRFS_FIRST_FREE_OBJECTID,
    while the snapshot's location.objectid remains unchanged.

    However, btrfs_ino() does not take this into account, and returns a wrong ino,
    and causes the oops.

    Signed-off-by: Liu Bo
    Signed-off-by: Chris Mason

    Liu Bo
     
  • * 'for-linus' of git://neil.brown.name/md:
    md: Fix handling for devices from 2TB to 4TB in 0.90 metadata.
    md/raid1,10: Remove use-after-free bug in make_request.
    md/raid10: unify handling of write completion.
    Avoid dereferencing a 'request_queue' after last close.

    Linus Torvalds
     

10 Sep, 2011

3 commits

  • On the last close of an 'md' device which has been stopped, the device
    is destroyed and in particular the request_queue is freed. The free
    is done in a separate thread so it might happen a short time later.

    __blkdev_put calls bdev_inode_switch_bdi *after* ->release has been
    called.

    Since commit f758eeabeb96f878c860e8f110f94ec8820822a9
    bdev_inode_switch_bdi will dereference the 'old' bdi, which lives
    inside a request_queue, to get a spin lock. This causes the last
    close on an md device to sometimes take a spin_lock which lives in
    freed memory - which results in an oops.

    So move the call to bdev_inode_switch_bdi before the call to
    ->release.

    Cc: Christoph Hellwig
    Cc: Hugh Dickins
    Cc: Andrew Morton
    Cc: Wu Fengguang
    Acked-by: Wu Fengguang
    Cc: stable@kernel.org
    Signed-off-by: NeilBrown

    NeilBrown
     
  • * 'for-linus' of git://ceph.newdream.net/git/ceph-client:
    libceph: fix leak of osd structs during shutdown
    ceph: fix memory leak
    ceph: fix encoding of ino only (not relative) paths
    libceph: fix msgpool

    Linus Torvalds
     
  • Prior to 2.6.38 automount would not trigger on either stat(2) or
    lstat(2) on the automount point.

    After 2.6.38, with the introduction of the ->d_automount()
    infrastructure, stat(2) and others would start triggering automount
    while lstat(2), etc. still would not. This is a regression and a
    userspace ABI change.

    Problem originally reported here:

    http://thread.gmane.org/gmane.linux.kernel.autofs/6098

    It appears that there was an attempt at fixing various userspace tools
    to not trigger the automount. But since the stat system call is
    rather common it is impossible to "fix" all userspace.

    This patch reverts to the original behavior, which is to not trigger on
    stat(2) and other symlink-following syscalls.

    [ It's not really clear what the right behavior is. Apparently Solaris
    does the "automount on stat, leave alone on lstat". And some programs
    can get unhappy when "stat+open+fstat" ends up giving a different
    result from the fstat than from the initial stat.

    But the change in 2.6.38 resulted in problems for some people, so
    we're going back to old behavior. Maybe we can re-visit this
    discussion at some future date - Linus ]

    Reported-by: Leonardo Chiquitto
    Signed-off-by: Miklos Szeredi
    Acked-by: Ian Kent
    Cc: David Howells
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
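
For readers unfamiliar with the distinction the patch relies on, stat(2) follows a trailing symlink while lstat(2) does not. The sketch below shows that difference on an ordinary symlink in a temporary directory, using Python's `os.stat` and `os.lstat`. It does not exercise an automount point, and the helper name and file names are made up for illustration.

```python
import os
import stat
import tempfile


def link_view(path):
    """Return (stat_sees_link, lstat_sees_link) for the given path."""
    followed = os.stat(path)       # follows a trailing symlink, like stat(2)
    not_followed = os.lstat(path)  # examines the link itself, like lstat(2)
    return (stat.S_ISLNK(followed.st_mode), stat.S_ISLNK(not_followed.st_mode))


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "target")
    link = os.path.join(tmp, "link")
    open(target, "w").close()  # create an empty regular file
    os.symlink(target, link)   # point a symlink at it
    result = link_view(link)
    print(result)
```

Only the lstat-style call reports a symlink here. The commit concerns whether calls in the stat(2) family should additionally trigger an automount, which this sketch does not attempt to reproduce.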
     

08 Sep, 2011

1 commit


06 Sep, 2011

5 commits