05 Jan, 2012

1 commit

  • Bitmap size sanity checks should be done *before* allocating ->s_root;
    there, their cleanup on failure would be correct. As it is, we do iput()
    on the root inode, but leak the root dentry...

    Signed-off-by: Al Viro
    Acked-by: Josh Boyer
    Signed-off-by: Linus Torvalds

    Al Viro
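A minimal userspace sketch of the ordering described in the 05 Jan fix. All names here (struct sb, check_bitmap_size(), fill_super()) are hypothetical, and malloc() stands in for allocating the root dentry. Because every sanity check runs before anything is allocated, a failed check has nothing to release.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced superblock: only the fields this sketch needs. */
struct sb {
    void *s_root;                  /* allocated last, after all checks pass */
    unsigned long bitmap_blocks;   /* bitmap size as read from disk */
    unsigned long max_bitmap_blocks;
};

/* Hypothetical sanity check: 0 if the bitmap fits, -1 otherwise. */
static int check_bitmap_size(const struct sb *sb)
{
    return sb->bitmap_blocks <= sb->max_bitmap_blocks ? 0 : -1;
}

/* Run every check before allocating s_root, so a failed check
 * returns with nothing allocated and nothing to leak. */
static int fill_super(struct sb *sb)
{
    sb->s_root = NULL;
    if (check_bitmap_size(sb) != 0)
        return -1;                 /* nothing allocated yet */

    sb->s_root = malloc(64);       /* stand-in for the root dentry */
    if (!sb->s_root)
        return -1;
    return 0;
}
```

On success the caller owns s_root and releases it at unmount; on failure there is nothing to undo.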
     

04 Jan, 2012

2 commits

  • It turned out that the ntlmv2 (default security authentication)
    upgrade was harder to test than expected, and we ran
    out of time to test against Apple and a few other servers
    that we wanted to. Delay the upgrade of the default security
    from ntlm to ntlmv2 (on mount) to 3.3. Specifying it
    explicitly via "sec=ntlmv2" still works, so this
    should be fine.

    Acked-by: Jeff Layton
    Signed-off-by: Steve French

    Steve French
     
  • The current check looks to see if the RFC1002 length is larger than
    CIFSMaxBufSize, and fails if it is. The buffer is actually larger than
    that by MAX_CIFS_HDR_SIZE.

    This bug has been around for a long time, but the fact that we used to
    cap the client's MaxBufferSize at the same level as the server tended
    to paper over it. Commit c974befa changed that, however, and caused this
    bug to bite in more cases.

    Reported-and-Tested-by: Konstantinos Skarlatos
    Tested-by: Shirish Pargaonkar
    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
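A simplified sketch of the CIFS length check described in the 04 Jan fix. SKETCH_MAX_BUF_SIZE and SKETCH_MAX_HDR_SIZE are hypothetical stand-ins for CIFSMaxBufSize and MAX_CIFS_HDR_SIZE with made-up values, and the sketch ignores the framing details of the real receive path. It only shows why comparing against the data area alone rejects frames the buffer could actually hold.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sizes for illustration only, not the real CIFS values. */
#define SKETCH_MAX_BUF_SIZE 16384u   /* stand-in for CIFSMaxBufSize */
#define SKETCH_MAX_HDR_SIZE 128u     /* stand-in for MAX_CIFS_HDR_SIZE */

/* Old check: reject any frame longer than the data area alone,
 * even though the receive buffer also has room for the header. */
static bool frame_fits_old(unsigned int rfc1002_len)
{
    return rfc1002_len <= SKETCH_MAX_BUF_SIZE;
}

/* Corrected check: compare against the full buffer,
 * i.e. the data area plus the header allowance. */
static bool frame_fits_new(unsigned int rfc1002_len)
{
    return rfc1002_len <= SKETCH_MAX_BUF_SIZE + SKETCH_MAX_HDR_SIZE;
}
```

Frames whose length falls between the two limits are exactly the ones the old check refused.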
     

31 Dec, 2011

1 commit


30 Dec, 2011

3 commits

  • * 'for-linus' of git://oss.sgi.com/xfs/xfs:
    xfs: log all dirty inodes in xfs_fs_sync_fs
    xfs: log the inode in ->write_inode calls for kupdate

    Linus Torvalds
     
  • Commit 2a95ea6c0d129b4 ("procfs: do not overflow get_{idle,iowait}_time
    for nohz") did not take into account that on some architectures jiffies
    and cputime use different units.

    This causes get_idle_time() to return numbers in the wrong units, making
    the idle time fields in /proc/stat wrong.

    Instead of converting the usec value returned by
    get_cpu_{idle,iowait}_time_us to units of jiffies, use the new function
    usecs_to_cputime64 to convert it to the correct unit of cputime64_t.

    Signed-off-by: Andreas Schwab
    Acked-by: Michal Hocko
    Cc: Arnd Bergmann
    Cc: "Artem S. Tashkinov"
    Cc: Dave Jones
    Cc: Alexey Dobriyan
    Cc: Thomas Gleixner
    Cc: "Luck, Tony"
    Cc: Benjamin Herrenschmidt
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andreas Schwab
     
  • Ceph attempts to use the dcache to satisfy negative lookups and readdir
    when the entire directory contents are in cache. Disable this behavior
    until lingering bugs in this code are shaken out; we'll re-enable these
    hooks once things are fully stable.

    Signed-off-by: Sage Weil

    Sage Weil
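The unit mix-up in the 30 Dec procfs fix can be illustrated with one hypothetical conversion helper, usecs_to_units(), which turns microseconds into ticks of a clock running at units_per_sec ticks per second. It is not the kernel's usecs_to_cputime64; it only shows why converting to jiffies and then treating the result as cputime goes wrong whenever the two clocks tick at different rates.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert microseconds into ticks of a clock
 * running at units_per_sec. With units_per_sec == HZ this plays the
 * role of a usecs-to-jiffies conversion; with the architecture's
 * cputime rate it plays the role of a usecs-to-cputime conversion.
 * Overflow for very large inputs is ignored in this sketch. */
static uint64_t usecs_to_units(uint64_t usecs, uint64_t units_per_sec)
{
    return usecs * units_per_sec / 1000000u;
}
```

For example, with HZ = 100 and a cputime unit of one microsecond, two seconds of idle time is 200 jiffies but 2,000,000 cputime units, so a jiffies value stored as cputime would make the idle time appear 10,000 times smaller.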
     

27 Dec, 2011

1 commit

  • Bruce Fields notes that commit 778fc546f749 ("locks: fix tracking of
    inprogress lease breaks") introduced a possible error pointer
    dereference on failure to allocate memory. locks_conflict() will
    dereference the passed-in new lease lock structure that may be an error pointer.

    This means an open (without O_NONBLOCK set) on a file with a lease
    applied (generally only done when Samba or nfsd (with v4) is running)
    could crash if a kmalloc() fails.

    So instead of playing games with IS_ERR() all over the place, just
    check the allocation failure early. That makes the code more
    straightforward, and avoids this possible bad pointer dereference.

    Based-on-patch-by: J. Bruce Fields
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
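A userspace sketch of the "check the allocation failure early" approach from the 27 Dec fix, assuming a hypothetical struct lease and helper functions. The kernel code deals with error pointers; this sketch uses NULL for the failed allocation but follows the same idea: test once, right after allocating, and return -ENOMEM before anything can dereference the result.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical lease structure standing in for the kernel's lock struct. */
struct lease { int type; };

/* Hypothetical allocator; fail != 0 simulates a kmalloc() failure. */
static struct lease *lease_alloc(int type, int fail)
{
    if (fail)
        return NULL;
    struct lease *l = malloc(sizeof(*l));
    if (l)
        l->type = type;
    return l;
}

/* Hypothetical conflict test that dereferences its argument,
 * the way the description says locks_conflict() does. */
static int lease_conflicts(const struct lease *new_fl, int existing_type)
{
    return new_fl->type != existing_type;
}

/* Check the allocation once, up front. Returns 1 on conflict,
 * 0 if there is none, -ENOMEM if the allocation failed. */
static int break_lease_sketch(int type, int existing_type, int fail_alloc)
{
    struct lease *new_fl = lease_alloc(type, fail_alloc);
    if (!new_fl)
        return -ENOMEM;          /* bail out before any use */

    int ret = lease_conflicts(new_fl, existing_type);
    free(new_fl);
    return ret;
}
```

With the failure handled at the single allocation site, none of the later code needs to know that allocation can fail.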
     

24 Dec, 2011

4 commits

  • for linus: writeback reason binary tracing format fix

    * tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
    writeback: show writeback reason with __print_symbolic

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: call d_instantiate after all ops are setup
    Btrfs: fix worker lock misuse in find_worker

    Linus Torvalds
     
  • Since Linux 2.6.36 the writeback code has introduced various measures for
    livelock prevention during sync(). Unfortunately some of these are
    actively harmful for the XFS model, where the inode gets marked dirty for
    metadata from the data I/O handler.

    The older_than_this checks have been enforced more strictly since

    writeback: avoid livelocking WB_SYNC_ALL writeback

    which only calls into __writeback_inodes_sb and thus samples the
    current cut-off time only once. But on a slow enough device the previous
    asynchronous sync pass might not have fully completed yet, and thus XFS
    might mark metadata dirty only after that sampling of the cut-off time for
    the blocking pass already happened. I have not reproduced this
    myself on a real system, but by introducing artificial delay into the
    XFS I/O completion workqueues it can be reproduced easily.

    Fix this by iterating over all XFS inodes in ->sync_fs and log all that
    are dirty. This might log inodes that only got redirtied after the
    previous pass, but given how cheap delayed logging of inodes is it
    isn't a major concern for performance.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Tested-by: Mark Tinguely
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • If the writeback code writes back an inode because it has expired, we currently
    use the non-blocking ->write_inode path. This means any inode that is pinned
    is skipped. With delayed logging and a workload that has very little log
    traffic otherwise, it is very likely that an inode that gets constantly
    written to is always pinned, and thus we keep refusing to write it. The VM
    writeback code at that point redirties it and doesn't try to write it again
    for another 30 seconds. This means that under certain scenarios time-based
    metadata writeback never happens.

    Fix this by calling into xfs_log_inode for kupdate in addition to data
    integrity syncs, and thus transfer the inode to the log ASAP.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Tested-by: Mark Tinguely
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Christoph Hellwig
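The 24 Dec ->write_inode change amounts to widening one condition. A sketch with a hypothetical, reduced writeback-control structure (struct wb_ctl and its field names are chosen for illustration and are not the kernel's definitions):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the writeback control fields involved. */
enum sync_mode { SYNC_NONE, SYNC_ALL };

struct wb_ctl {
    enum sync_mode sync_mode;   /* SYNC_ALL: data integrity sync */
    bool for_kupdate;           /* periodic, expiry-driven writeback */
};

/* Before: only integrity syncs pushed the inode into the log; a pinned
 * inode on the non-blocking path was simply skipped. */
static bool logs_inode_old(const struct wb_ctl *wbc)
{
    return wbc->sync_mode == SYNC_ALL;
}

/* After: kupdate writeback also logs the inode, so expired metadata
 * reaches the log even if the inode stays pinned. */
static bool logs_inode_new(const struct wb_ctl *wbc)
{
    return wbc->sync_mode == SYNC_ALL || wbc->for_kupdate;
}
```

Background writeback that is neither an integrity sync nor kupdate keeps the old non-blocking behaviour in this sketch.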
     

23 Dec, 2011

2 commits


21 Dec, 2011

3 commits

  • * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    NFS: Fix a regression in nfs_file_llseek()
    NFSv4: Do not accept delegated opens when a delegation recall is in effect
    NFSv4: Ensure correct locking when accessing the 'lock_states' list
    NFSv4.1: Ensure that we handle _all_ SEQUENCE status bits.
    NFSv4: Don't error if we handled it in nfs4_recovery_handle_error
    SUNRPC: Ensure we always bump the backlog queue in xprt_free_slot
    SUNRPC: Fix the execution time statistics in the face of RPC restarts

    Linus Torvalds
     
  • There is a potential integer overflow in nilfs_ioctl_clean_segments().
    When a large argv[n].v_nmembs is passed from the userspace, the subsequent
    call to vmalloc() will allocate a buffer smaller than expected, which
    leads to out-of-bounds access in nilfs_ioctl_move_blocks() and
    lfs_clean_segments().

    The following check does not prevent the overflow because nsegs is also
    controlled by the userspace and could be very large.

        if (argv[n].v_nmembs > nsegs * nilfs->ns_blocks_per_segment)
                goto out_free;

    This patch clamps argv[n].v_nmembs to UINT_MAX / argv[n].v_size, and
    returns -EINVAL on overflow.

    Signed-off-by: Haogang Chen
    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Haogang Chen
     
  • Commit 828b1c50ae ("nilfs2: add compat ioctl") inadvertently broke all
    other NILFS compat ioctls. Make them work again.

    Signed-off-by: Thomas Meyer
    Signed-off-by: Ryusuke Konishi
    Tested-by: Ryusuke Konishi
    Cc: [3.0+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Meyer
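A sketch of the overflow clamp from the 21 Dec nilfs2 fix, using a hypothetical struct sketch_argv in place of the real ioctl argument. Dividing UINT_MAX by v_size before comparing avoids ever computing a product that could overflow. The v_size == 0 guard is an addition of this sketch, not something the description mentions.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Hypothetical stand-in for one argv[] entry of the ioctl. */
struct sketch_argv {
    unsigned int v_nmembs;   /* number of members, from userspace */
    unsigned int v_size;     /* size of each member, from userspace */
};

/* Compute v_nmembs * v_size for the buffer allocation, refusing any
 * request whose product would not fit in an unsigned int. */
static int sketch_buf_size(const struct sketch_argv *a, unsigned int *out)
{
    if (a->v_size == 0)
        return -EINVAL;                 /* sketch-only: avoid dividing by zero */
    if (a->v_nmembs > UINT_MAX / a->v_size)
        return -EINVAL;                 /* product would overflow */
    *out = a->v_nmembs * a->v_size;
    return 0;
}
```

The largest accepted v_nmembs is exactly UINT_MAX / v_size, whose product with v_size still fits.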
     

18 Dec, 2011

1 commit


17 Dec, 2011

2 commits

  • …inux/kernel/git/mason/linux-btrfs

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: unplug every once and a while
    Btrfs: deal with NULL srv_rsv in the delalloc inode reservation code
    Btrfs: only set cache_generation if we setup the block group
    Btrfs: don't panic if orphan item already exists
    Btrfs: fix leaked space in truncate
    Btrfs: fix how we do delalloc reservations and how we free reservations on error
    Btrfs: deal with enospc from dirtying inodes properly
    Btrfs: fix num_workers_starting bug and other bugs in async thread
    BTRFS: Establish i_ops before calling d_instantiate
    Btrfs: add a cond_resched() into the worker loop
    Btrfs: fix ctime update of on-disk inode
    btrfs: keep orphans for subvolume deletion
    Btrfs: fix inaccurate available space on raid0 profile
    Btrfs: fix wrong disk space information of the files
    Btrfs: fix wrong i_size when truncating a file to a larger size
    Btrfs: fix btrfs_end_bio to deal with write errors to a single mirror

    * 'for-linus-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    btrfs: lower the dirty balance poll interval

    Linus Torvalds
     
  • Tests show that the original large intervals can easily let the dirty
    limit be exceeded with 100 concurrent dd's. So adapt the poll interval to be
    no larger than the next check point selected by the dirty throttling algorithm.

    Signed-off-by: Wu Fengguang
    Signed-off-by: Chris Mason

    Wu Fengguang
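The 17 Dec poll-interval change can be read as capping the number of pages dirtied between dirty-balance polls by the distance to the throttling algorithm's next check point. A hypothetical sketch; the function and parameter names are illustrative, not the btrfs code:

```c
#include <assert.h>

/* Hypothetical: how many pages a writer may dirty before it next calls
 * into dirty balancing. nr_dirtied counts pages dirtied since the last
 * check; nr_dirtied_pause is the next check point chosen by throttling. */
static unsigned long next_poll_batch(unsigned long max_batch,
                                     unsigned long nr_dirtied,
                                     unsigned long nr_dirtied_pause)
{
    /* Pages left before the check point; at least one so the
     * writer still makes progress when it is already past it. */
    unsigned long room = nr_dirtied_pause > nr_dirtied
                             ? nr_dirtied_pause - nr_dirtied
                             : 1;
    return room < max_batch ? room : max_batch;
}
```

The fixed maximum batch still applies; the check point only ever shrinks the interval.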
     

16 Dec, 2011

10 commits

  • After commit 06222e491e663dac939f04b125c9dc52126a75c4 (fs: handle
    SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek)
    the behaviour of llseek() was changed so that it always revalidates
    the file size. The bug appears to be due to a logic error in the
    aforementioned commit: its condition always evaluates to 'true'.

    Reported-by: Roel Kluin
    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org [>=3.1]

    Trond Myklebust
     
  • The btrfs io submission threads can build up massive plug lists. This
    keeps things more reasonable so we don't hand over huge dumps of IO at
    once.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • …/btrfs-work into integration

    Conflicts:
    fs/btrfs/inode.c

    Signed-off-by: Chris Mason <chris.mason@oracle.com>

    Chris Mason
     
  • btrfs_update_inode is sometimes called with a null reservation.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • A user reported a problem booting into a new kernel with the old format inodes.
    He was panicking in cow_file_range while writing out the inode cache. This is
    because if the block group is not cached we'll just skip writing out the cache;
    however, if it gets dirtied again in the same transaction and has finished caching,
    we'd go ahead and write it out. But since we set cache_generation to the transid,
    we think we've already truncated it and will just carry on, running into
    cow_file_range and blowing up. We need to make sure we only set
    cache_generation if we've done the truncate. The user tested this patch and
    verified that the panic no longer occurred. Thanks,

    Reported-and-Tested-by: Klaus Bitto
    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • I've been hitting this BUG_ON() in btrfs_orphan_add when running xfstest 269 in
    a loop. This is because we will add an orphan item, do the truncate, the
    truncate will fail for whatever reason (*cough*ENOSPC*cough*) and then we're
    left with an orphan item still in the fs. Then we come back later to do another
    truncate and it blows up because we already have an orphan item. This is OK, so
    just fix the BUG_ON() to only BUG() if ret is not EEXIST. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • We were occasionally leaking space when running xfstest 269. This is because if
    we failed to start the transaction in the truncate loop we'd just goto out, but
    we need to break so that the inode is removed from the orphan list and the space
    is properly freed. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • While running xfstest 269 with some tracing, my scripts kept spitting out errors about
    releasing bytes that we didn't actually have reserved. This took me down a huge
    rabbit hole and it turns out the way we deal with reserved_extents is wrong,
    we need to only be setting it if the reservation succeeds, otherwise the free()
    method will come in and unreserve space that isn't actually reserved yet, which
    can lead to other warnings and such. The math was all working out right in the
    end, but it caused all sorts of other issues in addition to making my scripts
    yell and scream and generally make it impossible for me to track down the
    original issue I was looking for. The other problem is with our error handling
    in the reservation code. There are two cases that we need to deal with

    1) We raced with free. In this case free won't free anything because csum_bytes
    is modified before we drop the lock in our reservation path, so free rightly
    doesn't release any space because the reservation code may be depending on that
    reservation. However if we fail, we need the reservation side to do the free at
    that point since that space is no longer in use. So as it stands the code was
    doing this fine and it worked out, except in case #2

    2) We don't race with free. Nobody comes in and changes anything, and our
    reservation fails. In this case we didn't reserve anything anyway and we just
    need to clean up csum_bytes but not free anything. So we keep track of
    csum_bytes before we drop the lock and if it hasn't changed we know we can just
    decrement csum_bytes and carry on.

    Because we can race with free() (we have to drop our spin_lock to do the
    reservation), I'm going to serialize all reservations with the i_mutex. We
    already get this for free in the heavy use paths: truncate and file write
    both hold the i_mutex, so I just needed to add it to page_mkwrite and
    various ioctl/balance things. With this patch my space leak scripts no longer
    scream bloody murder. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • Now that we're properly keeping track of delayed inode space we've been getting
    a lot of warnings out of btrfs_dirty_inode() when running xfstest 83. This is
    because a bunch of people call mark_inode_dirty, which is void so we can't
    return ENOSPC. This needs to be fixed in a few areas:

    1) file_update_time - this updates the mtime and such when writing to a file,
    which will call mark_inode_dirty. So copy file_update_time into btrfs so we can
    call btrfs_dirty_inode directly and return an error if we get one appropriately.

    2) fix symlinks to use btrfs_setattr for ->setattr. For some reason we weren't
    setting ->setattr for symlinks, even though we should have been. This catches
    one of the cases where we were getting errors in mark_inode_dirty.

    3) Fix btrfs_setattr and btrfs_setsize to call btrfs_dirty_inode directly
    instead of mark_inode_dirty. This lets us return errors properly for truncate
    and chown/anything related to setattr.

    4) Add a new btrfs_fs_dirty_inode which will just call btrfs_dirty_inode and
    print an error if we have one. The only remaining user we can't control for
    this is touch_atime(), but we don't really want to keep people from walking
    down the tree if we don't have space to save the atime update, so just complain
    but don't worry about it.

    With this patch xfstests 83 complains a handful of times instead of hundreds of
    times. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • Al pointed out we have some random problems with the way we account for
    num_workers_starting in the async thread stuff. First of all we need to make
    sure to decrement num_workers_starting if we fail to start the worker, so make
    __btrfs_start_workers do this. Also fix __btrfs_start_workers so that it
    doesn't call btrfs_stop_workers(); there is no point in stopping everybody if we
    failed to create a worker. Also check_pending_worker_creates needs to call
    __btrfs_start_work in its work function since it already increments
    num_workers_starting.

    People only start one worker at a time, so get rid of the num_workers argument
    everywhere, and make btrfs_queue_worker a void since it will always succeed.
    Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
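The 16 Dec NFS llseek description does not show the faulty expression. As an illustration only, here is one common way a test ends up always true: joining two inequality tests with || instead of &&. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Buggy form: whence cannot equal both values at once, so at least one
 * comparison is always true and the size is always revalidated. */
static bool needs_revalidate_buggy(int whence)
{
    return whence != SEEK_SET || whence != SEEK_CUR;
}

/* Intended form: revalidate only for origins that depend on the file
 * size, i.e. anything other than SEEK_SET and SEEK_CUR. */
static bool needs_revalidate_fixed(int whence)
{
    return whence != SEEK_SET && whence != SEEK_CUR;
}
```

Some compilers can flag the buggy form (for example GCC with -Wlogical-op), which is a cheap way to catch this class of mistake.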
     

15 Dec, 2011

10 commits

  • The Smack LSM hook for security_d_instantiate checks
    the inode's i_op->getxattr value to determine if the
    containing filesystem supports extended attributes.
    The BTRFS filesystem sets the inode's i_op value only
    after it has instantiated the inode. This results in
    Smack incorrectly giving new BTRFS inodes attributes
    from the filesystem defaults on the assumption that
    values can't be stored on the filesystem. This patch
    moves the assignment of inode operation vectors ahead
    of the calls to d_instantiate, letting Smack know that
    the filesystem supports extended attributes. There
    should be no impact on the performance or behavior of
    BTRFS.

    Signed-off-by: Casey Schaufler
    Signed-off-by: Chris Mason

    Casey Schaufler
     
  • If we have a constant stream of end_io completions or crc work,
    we can hit softlockup messages from the async helper threads. This
    adds a cond_resched() into the loop to avoid them.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • To reproduce the bug:

    # touch /mnt/tmp
    # stat /mnt/tmp | grep Change
    Change: 2011-12-09 09:32:23.412105981 +0800
    # chattr +i /mnt/tmp
    # stat /mnt/tmp | grep Change
    Change: 2011-12-09 09:32:43.198105295 +0800
    # umount /mnt
    # mount /dev/loop1 /mnt
    # stat /mnt/tmp | grep Change
    Change: 2011-12-09 09:32:23.412105981 +0800

    We should update the ctime of the in-memory inode before calling
    btrfs_update_inode().

    Signed-off-by: Li Zefan
    Signed-off-by: Chris Mason

    Li Zefan
     
  • Since we have the free space caches, btrfs_orphan_cleanup also runs for
    the tree_root. Unfortunately this also cleans up the orphans used to mark
    subvol deletions in progress.

    Currently if a subvol deletion gets interrupted twice by umount/mount, the
    deletion will not be continued and the space permanently lost, though it
    would be possible to write a tool to recover those lost subvol deletions.
    This patch checks if the orphan belongs to a subvol (dead root) and skips
    the deletion.

    Signed-off-by: Arne Jansen
    Signed-off-by: Chris Mason

    Arne Jansen
     
  • When we use raid0 as the data profile, df command may show us a very
    inaccurate value of the available space, which may be much less than the
    real one. This may puzzle users. Fix it by changing the calculation
    of the available space to more closely resemble a fake chunk
    allocation.

    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie
     
  • Btrfsck reports errors after case 83 of xfstests is run. The error
    number is 400, which means the used disk space of the file is wrong.

    The reason for this bug is:
    The file truncation may fail when the file system does not have enough
    space, leaving some file extents whose offsets are beyond the end of the file.
    When we want to expand such a file, we drop those file extents and
    put in dummy file extents, and then we should update the i-node. But btrfs
    forgets to do it.

    This patch adds the forgotten i-node update.

    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie
     
  • Btrfsck reports error 100 after case 83 of xfstests is run, which means
    the i_size of the file is wrong.

    The reason for this bug is:
    Btrfs increased the i_size of the file at the beginning, but it failed to expand
    the file, and then failed to restore the old i_size because there was not
    enough space in the file system, so we found a wrong i_size.

    This patch fixes this bug by updating the i_size only after the file
    expansion succeeds and there is enough space to update the i-node.

    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie
     
  • * tag 'tytso-for-linus-20111214' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: handle EOF correctly in ext4_bio_write_page()
    ext4: remove a wrong BUG_ON in ext4_ext_convert_to_initialized
    ext4: correctly handle pages w/o buffers in ext4_discard_partial_buffers()
    ext4: avoid potential hang in mpage_submit_io() when blocksize < pagesize
    ext4: avoid hangs in ext4_da_should_update_i_disksize()
    ext4: display the correct mount option in /proc/mounts for [no]init_itable
    ext4: Fix crash due to getting bogus eh_depth value on big-endian systems
    ext4: fix ext4_end_io_dio() racing against fsync()

    .. using the new signed tag merge of git that now verifies the gpg
    signature automatically. Yay. The branchname was just 'dev', which is
    prettier. I'll tell Ted to use nicer tag names for future cases.

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    fuse: llseek fix race
    fuse: fix llseek bug
    fuse: fix fuse_retrieve

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs/ncpfs: fix error paths and goto statements in ncp_fill_super()
    configfs: register_filesystem() called too early
    fuse: register_filesystem() called too early
    ubifs: too early register_filesystem()
    ... and the same kind of leak for mqueue
    procfs: fix a vfsmount longterm reference leak

    Linus Torvalds
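A reduced sketch of the ordering problem in the 15 Dec Smack/BTRFS fix, with hypothetical structures: a hook run at instantiate time decides whether extended attributes are supported by looking at i_op->getxattr, so it only sees the right answer if the operations are assigned first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced inode and operations table. */
struct inode_ops { int (*getxattr)(void); };
struct sk_inode { const struct inode_ops *i_op; bool xattr_labels; };

static int dummy_getxattr(void) { return 0; }
static const struct inode_ops file_ops = { dummy_getxattr };  /* supports xattrs */
static const struct inode_ops empty_ops = { NULL };           /* default ops */

/* Stand-in for a security hook run at instantiate time: it infers
 * xattr support from i_op->getxattr, as the description says Smack does. */
static void instantiate(struct sk_inode *inode)
{
    inode->xattr_labels = inode->i_op && inode->i_op->getxattr;
}

/* Buggy order: instantiate first, set the real operations afterwards. */
static void create_buggy(struct sk_inode *inode)
{
    instantiate(inode);
    inode->i_op = &file_ops;
}

/* Fixed order: set the operations, then instantiate. */
static void create_fixed(struct sk_inode *inode)
{
    inode->i_op = &file_ops;
    instantiate(inode);
}
```

Both functions end with the same i_op; only the hook's view at instantiate time differs.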