15 Apr, 2013

1 commit

  • Pull one more btrfs fix from Chris Mason:
    "This has a recent fix from Josef for our tree log replay code. It
    fixes problems where the inode counter for the number of bytes in the
    file wasn't getting updated properly during fsync replay.

    The commit did get rebased this morning, but it was only to clean up
    the subject line. The code hasn't changed."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: make sure nbytes are right after log replay

    Linus Torvalds
     

14 Apr, 2013

1 commit

  • Revert commit 62a3ddef6181 ("vfs: fix spinning prevention in prune_icache_sb").

    This commit doesn't look right: since we are looking at the tail of the
    list (sb->s_inode_lru.prev) if we want to skip an inode, we should put
    it back at the head of the list instead of the tail, otherwise we will
    keep spinning on it.

    Discovered when investigating why prune_icache_sb came top in perf
    reports of a swapping load.

    Signed-off-by: Suleiman Souhlal
    Signed-off-by: Hugh Dickins
    Cc: stable@vger.kernel.org # v3.2+
    Signed-off-by: Linus Torvalds

    Suleiman Souhlal
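
The spinning described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the LRU is a fixed array rather than a kernel list, every entry is treated as busy and skipped, and the names `scan_distinct` and `move_tail_to_head` are hypothetical, not kernel functions.

```c
#include <assert.h>
#include <string.h>

#define LRU_LEN 4

/* Move the tail element (last index) to the head (index 0). */
static void move_tail_to_head(int lru[LRU_LEN])
{
    int tail = lru[LRU_LEN - 1];

    memmove(&lru[1], &lru[0], (LRU_LEN - 1) * sizeof(lru[0]));
    lru[0] = tail;
}

/*
 * Examine `budget` entries, always taking the one at the tail and
 * skipping it. Returns how many distinct entries were examined.
 * requeue_at_head != 0: a skipped entry goes back to the head.
 * requeue_at_head == 0: a skipped entry goes back to the tail,
 *                       which leaves the list unchanged.
 */
int scan_distinct(int requeue_at_head, int budget)
{
    int lru[LRU_LEN] = {0, 1, 2, 3};
    int seen[LRU_LEN] = {0};
    int distinct = 0;

    assert(budget >= 0);
    for (int i = 0; i < budget; i++) {
        int id = lru[LRU_LEN - 1];

        if (!seen[id]) {
            seen[id] = 1;
            distinct++;
        }
        if (requeue_at_head)
            move_tail_to_head(lru);
    }
    return distinct;
}
```

With a budget of four, requeueing at the head reaches all four entries, while requeueing at the tail examines the same entry four times.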
     

13 Apr, 2013

2 commits

  • While trying to track down a tree log replay bug I noticed that fsck was always
    complaining about nbytes not being right for our fsynced file. That is because
    the new fsync code doesn't wait for ordered extents to complete, so the inode's
    nbytes is not necessarily up to date when we log it. To fix this we set nbytes
    to whatever it is on the inode that is on disk, so that when we replay the
    extents we can just add the bytes of each extent as we replay it. This works
    both when the logged nbytes is wrong and when we logged everything and nbytes
    is actually correct. With this I'm no longer getting nbytes errors out of
    btrfsck.

    Cc: stable@vger.kernel.org
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
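
The accounting rule in this entry can be sketched as follows. This is a simplified model, not the btrfs code: `replay_nbytes` and its parameters are hypothetical names, and it assumes each replayed extent simply adds its length to the count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of nbytes handling during log replay.
 * on_disk_nbytes: value in the inode already on disk
 * logged_nbytes:  value recorded in the log (may be stale)
 * extent_len:     lengths of the extents replayed from the log
 */
uint64_t replay_nbytes(uint64_t on_disk_nbytes, uint64_t logged_nbytes,
                       const uint64_t *extent_len, size_t n)
{
    uint64_t nbytes = on_disk_nbytes;   /* start from the on-disk value */

    (void)logged_nbytes;                /* deliberately not trusted */
    assert(extent_len != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        nbytes += extent_len[i];        /* account each replayed extent */
    return nbytes;
}
```

Because the logged value is never used as the starting point, a stale count in the log cannot carry over into the replayed inode.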
     
  • Pull CIFS fix from Steve French:
    "Fixes a regression in cifs in which a password which begins with a
    comma is parsed incorrectly as a blank password"

    * 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
    cifs: Allow passwords which begin with a delimitor

    Linus Torvalds
     

11 Apr, 2013

4 commits

  • Fixes a regression in cifs_parse_mount_options where a password
    that begins with a delimiter is incorrectly parsed as a blank
    password.

    Signed-off-by: Sachin Prabhu
    Acked-by: Jeff Layton
    Cc:
    Signed-off-by: Steve French

    Sachin Prabhu
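
A rough sketch of how a value that starts with the delimiter can be read without coming out blank. It assumes the convention that a doubled delimiter inside the value stands for one literal delimiter and a single delimiter ends the value. `parse_password` is a hypothetical helper, not the function in fs/cifs.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy the password value starting at `src` into `dst`.
 * Assumed convention: "<delim><delim>" is one literal delimiter;
 * a single delimiter or the end of the string ends the value.
 * Returns the password length, or -1 if `dst` is too small.
 */
int parse_password(const char *src, char delim, char *dst, size_t dst_len)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL && dst_len > 0);
    while (*src) {
        if (*src == delim) {
            if (src[1] != delim)
                break;          /* single delimiter: end of value */
            src++;              /* doubled: skip one, copy the other */
        }
        if (n + 1 >= dst_len)
            return -1;
        dst[n++] = *src++;
    }
    dst[n] = '\0';
    return (int)n;
}
```

Under that assumption, the option text `,,abc,user=x` yields the password `,abc` rather than an empty string.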
     
  • Pull another nfs fixlet from Trond Myklebust:
    "I suddenly noticed that a one-line issue that I _thought_ I had fixed
    with the nfs41_walk_client_list patch was apparently still there in
    the pull request I sent earlier today. I'm very sorry for not
    catching that in time.

    - Fix a brain fart in nfs41_walk_client_list"

    * tag 'nfs-for-3.9-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    NFSv4: Doh! Typo in the fix to nfs41_walk_client_list

    Linus Torvalds
     
  • Make sure that we set the status to 0 on success. Missed in testing
    because it never appears when doing multiple mounts to _different_
    servers.

    Signed-off-by: Trond Myklebust
    Cc: # 3.7.x: 7b1f1fd: NFSv4/4.1: Fix bugs in nfs4[01]_walk_client_list

    Trond Myklebust
     
  • Pull NFS client bugfixes from Trond Myklebust:
    - fix for memory corruption issues in nfs4[01]_walk_client_list (stable)
    - fix for an Oopsable bug in rpc_clone_client (stable)
    - another state manager deadlock in the NFSv4 open code
    - memory leaks in nfs4_discover_server_trunking and rpc_new_client

    * tag 'nfs-for-3.9-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    NFSv4: Fix another potential state manager deadlock
    SUNRPC: Fix a potential memory leak in rpc_new_client
    NFSv4/4.1: Fix bugs in nfs4[01]_walk_client_list
    NFSv4: Fix a memory leak in nfs4_discover_server_trunking
    SUNRPC: Remove extra xprt_put()

    Linus Torvalds
     

10 Apr, 2013

5 commits

  • Pull vfs fixes from Al Viro:
    "A nasty bug in fs/namespace.c caught by Andrey + a couple of less
    serious unpleasantness - ecryptfs misc device playing hopeless games
    with try_module_get() and palinfo procfs support being... not quite
    correctly done, to be polite."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    mnt: release locks on error path in do_loopback
    palinfo fixes
    procfs: add proc_remove_subtree()
    ecryptfs: close rmmod race

    Linus Torvalds
     
  • do_loopback calls lock_mount(path) but forgets to call unlock_mount
    if clone_mnt or copy_mnt fails.

    [ 77.661566] ================================================
    [ 77.662939] [ BUG: lock held when returning to user space! ]
    [ 77.664104] 3.9.0-rc5+ #17 Not tainted
    [ 77.664982] ------------------------------------------------
    [ 77.666488] mount/514 is leaving the kernel with locks still held!
    [ 77.668027] 2 locks held by mount/514:
    [ 77.668817] #0: (&sb->s_type->i_mutex_key#7){+.+.+.}, at: [] lock_mount+0x32/0xe0
    [ 77.671755] #1: (&namespace_sem){+++++.}, at: [] lock_mount+0x4a/0xe0

    Signed-off-by: Andrey Vagin
    Signed-off-by: Al Viro

    Andrey Vagin
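
The shape of the fix is the usual single-exit error path. The sketch below is a user-space model with hypothetical names (`do_loopback_model`, `lock_depth`); a counter stands in for the two locks taken by lock_mount.

```c
#include <errno.h>

static int lock_depth;  /* models the locks taken by lock_mount() */

static void lock_mount_model(void)   { lock_depth++; }
static void unlock_mount_model(void) { lock_depth--; }

/* Number of locks still held; should be 0 when returning to the caller. */
int locks_held(void)
{
    return lock_depth;
}

/*
 * Model of a function that takes a lock and then performs a step
 * that can fail. Every exit after the lock goes through `out`, so
 * the lock is released on the error path as well.
 */
int do_loopback_model(int clone_fails)
{
    int err = 0;

    lock_mount_model();
    if (clone_fails) {
        err = -ENOMEM;
        goto out;       /* the buggy shape would return here directly */
    }
    /* ... graft the new mount ... */
out:
    unlock_mount_model();
    return err;
}
```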
     
  • proc_remove_subtree() does just what it sounds like. Use it only on
    procfs subtrees you've created; doing that to something shared with
    another driver is not only antisocial, but might cause interesting
    races with proc_create() and its ilk.

    Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Don't hold the NFSv4 sequence id while we check for open permission.
    The call to ACCESS may block due to reboot recovery.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

06 Apr, 2013

4 commits

  • It is unsafe to use list_for_each_entry_safe() here, because
    when we drop the nn->nfs_client_lock, we pin the _current_ list
    entry and ensure that it stays in the list, but we don't do the
    same for the _next_ list entry. Use of list_for_each_entry() is
    therefore the correct thing to do.

    Also fix the refcounting in nfs41_walk_client_list().

    Finally, ensure that the nfs_client has finished being initialised
    and, in the case of NFSv4.1, that the session is set up.

    Signed-off-by: Trond Myklebust
    Cc: Chuck Lever
    Cc: Bryan Schumaker
    Cc: stable@vger.kernel.org [>= 3.7]

    Trond Myklebust
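
Why caching the next entry is unsafe here can be shown with a small single-threaded model. It is a sketch only: the list is singly linked, the concurrent removal is simulated inline, a `removed` flag stands in for a freed entry, and all names are hypothetical.

```c
#include <stddef.h>

struct client {
    int id;
    int removed;            /* set on unlink; stands in for "freed" */
    struct client *next;
};

/* Unlink the node after `cur`, as another thread might while the
 * lock is dropped. */
static void remove_next(struct client *cur)
{
    struct client *victim = cur->next;

    if (victim) {
        cur->next = victim->next;
        victim->removed = 1;
    }
}

/*
 * Walk the list. At the node with id `drop_at` the lock is "dropped"
 * and the following node is removed. cache_next != 0 mimics the
 * _safe iterator, which saves the next pointer before the loop body.
 * Returns how many removed nodes were visited.
 */
int walk(struct client *head, int drop_at, int cache_next)
{
    int stale_visits = 0;
    struct client *cur = head;

    while (cur) {
        struct client *cached = cur->next;

        if (cur->removed)
            stale_visits++;
        if (cur->id == drop_at)
            remove_next(cur);
        cur = cache_next ? cached : cur->next;
    }
    return stale_visits;
}

/* Three-node list 1 -> 2 -> 3; node 2 disappears while node 1 is pinned. */
int demo(int cache_next)
{
    struct client c = {3, 0, NULL};
    struct client b = {2, 0, &c};
    struct client a = {1, 0, &b};

    return walk(&a, 1, cache_next);
}
```

The current entry is pinned and stays linked, so re-reading its next pointer after the lock is retaken is safe, while the pointer saved beforehand may refer to an entry that is gone.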
     
  • When we assign a new rpc_client to clp->cl_rpcclient, we need to destroy
    the old one.

    Signed-off-by: Trond Myklebust
    Cc: Chuck Lever
    Cc: stable@vger.kernel.org [>=3.7]

    Trond Myklebust
     
  • Pull GFS2 fixes from Steven Whitehouse:
    "There are two patches which fix up a couple of minor issues in the DLM
    interface code, a missing error path in gfs2_rs_alloc(), one patch
    which fixes a problem during "withdraw" and a fix for discards/FITRIM
    when using 4k sector sized devices."

    * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes:
    GFS2: Issue discards in 512b sectors
    GFS2: Fix unlock of fcntl locks during withdrawn state
    GFS2: return error if malloc failed in gfs2_rs_alloc()
    GFS2: use memchr_inv
    GFS2: use kmalloc for lvb bitmap

    Linus Torvalds
     
  • This patch changes GFS2's discard issuing code so that it calls
    function sb_issue_discard rather than blkdev_issue_discard. The
    code was calling blkdev_issue_discard and specifying the correct
    sector offset and sector size, but blkdev_issue_discard expects
    these values to be in terms of 512 byte sectors, even if the native
    sector size for the device is different. Calling sb_issue_discard
    with the BLOCK size instead ensures the correct block-to-512b-sector
    translation. I verified that "minlen" is specified in blocks, so
    comparing it to a number of blocks is correct.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
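
The block-to-512b-sector translation mentioned above amounts to a shift by the difference between the block size and 512 bytes, both expressed as powers of two. A minimal sketch, with a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a filesystem block number (or block count) to 512-byte
 * sectors. blocksize_bits is log2 of the filesystem block size,
 * e.g. 12 for 4096-byte blocks.
 */
uint64_t blocks_to_512b_sectors(uint64_t blocks, unsigned int blocksize_bits)
{
    assert(blocksize_bits >= 9 && blocksize_bits < 64);
    return blocks << (blocksize_bits - 9);
}
```

For 4096-byte blocks, block 10 starts at 512-byte sector 80, regardless of the native sector size of the device.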
     

04 Apr, 2013

8 commits


02 Apr, 2013

2 commits

  • Pull nfsd bugfix from J Bruce Fields:
    "An xdr decoding error--thanks, Toralf Förster, and Trinity!"

    * 'for-3.9' of git://linux-nfs.org/~bfields/linux:
    nfsd4: reject "negative" acl lengths

    Linus Torvalds
     
  • The lifecycle of struct block_device is defined by its inode (see
    fs/block_dev.c): the block_device is allocated the first time we access
    /dev/loopXX and deallocated in bdev_destroy_inode. When we create the
    device with "losetup /dev/loopXX afile", we want that block_device to stay
    alive until we destroy the loop device with "losetup -d".

    But because we do not hold a reference on the /dev/loopXX inode, its
    reference count drops to 0 and the inode/bdev can be destroyed at any
    moment. This usually happens under memory pressure or when the user drops
    the inode cache (as in the test below). When loop_clr_fd() later tries to
    use the bdev, we get a use-after-free error with the following stack:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000280
    bd_set_size+0x10/0xa0
    loop_clr_fd+0x1f8/0x420 [loop]
    lo_ioctl+0x200/0x7e0 [loop]
    lo_compat_ioctl+0x47/0xe0 [loop]
    compat_blkdev_ioctl+0x341/0x1290
    do_filp_open+0x42/0xa0
    compat_sys_ioctl+0xc1/0xf20
    do_sys_open+0x16e/0x1d0
    sysenter_dispatch+0x7/0x1a

    To prevent use-after-free we need to grab the device in loop_set_fd()
    and put it later in loop_clr_fd().

    The issue is reproducible on current Linus head and v3.3. Here is the test:

    dd if=/dev/zero of=loop.file bs=1M count=1
    while [ true ]; do
        losetup /dev/loop0 loop.file
        echo 2 > /proc/sys/vm/drop_caches
        losetup -d /dev/loop0
    done

    [ Doing bdgrab/bdput in loop_set_fd/loop_clr_fd is safe, because every
    time we call loop_set_fd() we check that loop_device->lo_state is
    Lo_unbound and set it to Lo_bound. If somebody tries to set_fd again,
    they will get EBUSY. And if we try to loop_clr_fd() on an unbound loop
    device we'll get ENXIO.

    loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under
    loop_device->lo_ctl_mutex. ]

    Signed-off-by: Anatol Pomozov
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Anatol Pomozov
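
The pairing argument in the bracketed note can be modelled with a state flag and a reference counter. This is a user-space sketch with hypothetical names; the counter stands in for the reference taken by bdgrab() and released by bdput(), and the locking is left out.

```c
#include <errno.h>

enum lo_state { LO_UNBOUND, LO_BOUND };

struct loop_model {
    enum lo_state state;
    int bdev_refs;      /* references this loop device holds on its bdev */
};

/* Bind: allowed only from the unbound state, takes one reference. */
int loop_set_fd_model(struct loop_model *lo)
{
    if (lo->state != LO_UNBOUND)
        return -EBUSY;
    lo->bdev_refs++;            /* stands in for bdgrab() */
    lo->state = LO_BOUND;
    return 0;
}

/* Unbind: allowed only from the bound state, drops that reference. */
int loop_clr_fd_model(struct loop_model *lo)
{
    if (lo->state != LO_BOUND)
        return -ENXIO;
    lo->state = LO_UNBOUND;
    lo->bdev_refs--;            /* stands in for bdput(), after last use */
    return 0;
}
```

Because the state check rejects a second bind and an unbind of an unbound device, each grab is matched by exactly one put.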
     

30 Mar, 2013

2 commits

  • Pull btrfs fixes from Chris Mason:
    "We've had a busy two weeks of bug fixing. The biggest patches in here
    are some long standing early-enospc problems (Josef) and a very old
    race where compression and mmap combine forces to lose writes (me).
    I'm fairly sure the mmap bug goes all the way back to the introduction
    of the compression code, which is proof that fsx doesn't trigger every
    possible mmap corner after all.

    I'm sure you'll notice one of these is from this morning, it's a small
    and isolated use-after-free fix in our scrub error reporting. I
    double checked it here."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: don't drop path when printing out tree errors in scrub
    Btrfs: fix wrong return value of btrfs_lookup_csum()
    Btrfs: fix wrong reservation of csums
    Btrfs: fix double free in the btrfs_qgroup_account_ref()
    Btrfs: limit the global reserve to 512mb
    Btrfs: hold the ordered operations mutex when waiting on ordered extents
    Btrfs: fix space accounting for unlink and rename
    Btrfs: fix space leak when we fail to reserve metadata space
    Btrfs: fix EIO from btrfs send in is_extent_unchanged for punched holes
    Btrfs: fix race between mmap writes and compression
    Btrfs: fix memory leak in btrfs_create_tree()
    Btrfs: fix locking on ROOT_REPLACE operations in tree mod log
    Btrfs: fix missing qgroup reservation before fallocating
    Btrfs: handle a bogus chunk tree nicely
    Btrfs: update to use fs_state bit

    Linus Torvalds
     
  • After commit 21d8a15a (lookup_one_len: don't accept . and ..) reiserfs
    started failing to delete xattrs from inodes. This was due to a buggy
    test for '.' and '..' in fill_with_dentries(), which resulted in passing
    '.' and '..' entries to lookup_one_len() in some cases. That returned an
    error, so we failed to iterate over all xattrs of an inode.

    Fix the test in fill_with_dentries() along the lines of the one in
    lookup_one_len().

    Reported-by: Pawel Zawora
    CC: stable@vger.kernel.org
    Signed-off-by: Jan Kara

    Jan Kara
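
One way to write a length-aware test for '.' and '..' is sketched below. It is not the reiserfs code: `is_dot_or_dotdot` is a hypothetical helper, and it assumes the name arrives with an explicit length and may not be NUL-terminated.

```c
#include <stddef.h>

/*
 * Return 1 if the name of length `len` is exactly "." or "..",
 * 0 otherwise. Only the first `len` bytes of `name` are read.
 */
int is_dot_or_dotdot(const char *name, size_t len)
{
    if (len == 0 || name[0] != '.')
        return 0;
    return len == 1 || (len == 2 && name[1] == '.');
}
```

Names that merely start with a dot, such as ".a" or "...", are not matched.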
     

29 Mar, 2013

3 commits

  • A user reported a panic somewhere in tree_backref_for_extent, called from
    scrub_print_warning. He only captured the trace, but looking at
    scrub_print_warning we drop the path right before we use the extent
    buffer to print out a bunch of stuff, which isn't right. So fix this by
    dropping the path after we use the eb if we need to. Thanks,

    Cc: stable@vger.kernel.org
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • Pull sysfs fixes from Greg Kroah-Hartman:
    "Here are two fixes for sysfs that resolve issues that have been found
    by the Trinity fuzz tool, causing oopses in sysfs. They both have
    been in linux-next for a while to ensure that they do not cause any
    other problems."

    * tag 'driver-core-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
    sysfs: handle failure path correctly for readdir()
    sysfs: fix race between readdir and lseek

    Linus Torvalds
     
  • Pull userns fixes from Eric W Biederman:
    "The bulk of the changes are fixing the worst consequences of the user
    namespace design oversight in not considering what happens when one
    namespace starts off as a clone of another namespace, as happens with
    the mount namespace.

    The rest of the changes are just plain bug fixes.

    Many thanks to Andy Lutomirski for pointing out many of these issues."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    userns: Restrict when proc and sysfs can be mounted
    ipc: Restrict mounting the mqueue filesystem
    vfs: Carefully propogate mounts across user namespaces
    vfs: Add a mount flag to lock read only bind mounts
    userns: Don't allow creation if the user is chrooted
    yama: Better permission check for ptraceme
    pid: Handle the exit of a multi-threaded init.
    scm: Require CAP_SYS_ADMIN over the current pidns to spoof pids.

    Linus Torvalds
     

28 Mar, 2013

8 commits

  • If we don't find the expected csum item but do find a csum item that is
    adjacent to the specified extent, we should return -EFBIG; otherwise we
    should return -ENOENT. But btrfs_lookup_csum() returns -EFBIG even when
    the csum item is not adjacent to the specified extent. Fix it.

    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie
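
The three outcomes described in this entry can be sketched as a pure function. The names and parameters are hypothetical and the real lookup works on b-tree items; the sketch only assumes that a csum item covers a contiguous run of blocks starting at `item_start`.

```c
#include <errno.h>
#include <stdint.h>

/*
 * item_start:     byte offset covered by the first csum in the item
 * csums_in_item:  number of checksums stored in the item
 * bytenr:         byte offset whose checksum is wanted
 * blocksize_bits: log2 of the block size covered by one checksum
 *
 * Returns 0 if the item holds the checksum, -EFBIG if bytenr is the
 * block right after the item (adjacent, so the item could be
 * extended), and -ENOENT otherwise.
 */
int lookup_csum_status(uint64_t item_start, uint32_t csums_in_item,
                       uint64_t bytenr, unsigned int blocksize_bits)
{
    uint64_t csum_offset;

    if (bytenr < item_start)
        return -ENOENT;
    csum_offset = (bytenr - item_start) >> blocksize_bits;
    if (csum_offset < csums_in_item)
        return 0;
    if (csum_offset == csums_in_item)
        return -EFBIG;
    return -ENOENT;
}
```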
     
  • We reserve space for csums only when we write data into a file; in the
    other cases, such as tree logging and log replay, we make no reservation.
    So we can use the transaction handle's reservation only for the former,
    and for the latter we should use the tree's own reservation. But
    btrfs_csum_file_blocks() didn't differentiate between these two cases.
    Fix it.

    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie
     
  • The function btrfs_find_all_roots is responsible for allocating
    memory for 'roots' and for freeing it if an error occurs, so the caller
    should not free it again.

    Besides, 'tmp' is allocated after the call to btrfs_find_all_roots, so
    we can return directly if btrfs_find_all_roots() fails.

    Signed-off-by: Wang Shilong
    Reviewed-by: Miao Xie
    Reviewed-by: Jan Schmidt
    Signed-off-by: Josef Bacik

    Wang Shilong
     
  • A user reported a problem where he was getting early ENOSPC with hundreds of
    gigs of free data space and 6 gigs of free metadata space. This is because the
    global block reserve was taking up the entire free metadata space. This is
    ridiculous: we have infrastructure in place to throttle if we start using too
    much of the global reserve, so instead of letting it get this huge, just limit
    it to 512MB so that users can still get work done. This allowed the user to
    complete his rsync without issues. Thanks

    Cc: stable@vger.kernel.org
    Reported-and-tested-by: Stefan Priebe
    Signed-off-by: Josef Bacik

    Josef Bacik
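
The limit itself is a plain clamp. In the sketch below the cap is taken to be 512 MiB and `cap_global_reserve` is a hypothetical helper; how the uncapped size is computed is outside its scope.

```c
#include <stdint.h>

#define GLOBAL_RSV_CAP (512ULL * 1024 * 1024)   /* assumed: 512 MiB */

/* Clamp the computed global reserve size to the cap. */
uint64_t cap_global_reserve(uint64_t computed)
{
    return computed < GLOBAL_RSV_CAP ? computed : GLOBAL_RSV_CAP;
}
```

A computed reserve of 6 GiB is cut down to the cap, while smaller values pass through unchanged.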
     
  • We need to hold the ordered_operations mutex while waiting on ordered extents
    since we splice and run the ordered extents list. We need to make sure anybody
    else who wants to wait on ordered extents does actually wait for them to be
    completed. This will keep us from bailing out of flushing in case somebody is
    already waiting on ordered extents to complete. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • We are way over-reserving for unlink and rename. Rename is just some random
    huge number and unlink accounts for tree log operations that don't actually
    happen during unlink, not to mention the tree log doesn't take from the trans
    block rsv anyway so it's completely useless. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • Dave reported a warning when running xfstest 275. We have been leaking delalloc
    metadata space when our reservations fail. This is because we were improperly
    calculating how much space to free for our checksum reservations. The problem
    is we would sometimes free up space that had already been freed in another
    thread and we would end up with negative usage for the delalloc space. This
    patch fixes the problem by calculating how much space the other threads would
    have already freed, then calculating how much space we would need to free had
    we not done the reservation at all, and then freeing any excess space. With
    this, xfstests 275 no longer leaks space. Thanks

    Cc: stable@vger.kernel.org
    Reported-by: David Sterba
    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • When you take a snapshot, punch a hole where there has been data, then take
    another snapshot and try to send an incremental stream, btrfs send would
    give you EIO. That is because is_extent_unchanged had no support for holes
    being punched. With this patch, instead of returning EIO we just return
    0 (== the extent is not unchanged) and we're good.

    Signed-off-by: Jan Schmidt
    Cc: Alexander Block
    Signed-off-by: Josef Bacik

    Jan Schmidt