19 Feb, 2011

1 commit


18 Feb, 2011

1 commit


17 Feb, 2011

5 commits

  • * 'for-2.6.38' of git://linux-nfs.org/~bfields/linux:
    nfsd: correctly handle return value from nfsd_map_name_to_*

    Linus Torvalds
     
  • This reverts commit 75f1dc0d076d ("block: check bdev_read_only() from
    blkdev_get()"). That commit added stricter checking to make sure
    devices that were being used read-only were actually opened in that
    mode.

    It turns out that the change breaks a bunch of kernel code that opens
    block devices. Affected systems include dm, md, and the loop device.
    Because strict checking for read-only opens of block devices was not
    done before this, the code that opens the devices was opening them
    read-write even if they were being used read-only. Auditing all that
    code will take time, and new userspace packages for dm, mdadm, etc.
    will also be required.

    Signed-off-by: Chuck Ebbert
    Signed-off-by: Linus Torvalds

    Chuck Ebbert
     
  • These functions return an nfs status, not a host_err. So don't
    try to convert before returning.

    This is a regression introduced by
    3c726023402a2f3b28f49b9d90ebf9e71151157d; I fixed up two of the callers,
    but missed these two.
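
    A minimal userspace sketch of this bug class follows. All names and status
    values (map_name_to_id(), errno_to_status(), ST_*) are hypothetical and do
    not reflect the real nfsd code. It shows that running a value that is
    already a protocol status through an errno-to-status converter loses the
    real error.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical protocol status codes; the values are illustrative only. */
typedef enum { ST_OK = 0, ST_IO = 1, ST_BADOWNER = 2 } proto_status;

/* Hypothetical converter from a host error (0 or a negative errno) to a
 * protocol status. Anything it does not recognise becomes ST_IO. */
static proto_status errno_to_status(int host_err)
{
    switch (host_err) {
    case 0:
        return ST_OK;
    case -EINVAL:
        return ST_BADOWNER;
    default:
        return ST_IO;
    }
}

/* Hypothetical name mapper that ALREADY returns a protocol status. */
static proto_status map_name_to_id(const char *name, unsigned *id)
{
    if (name[0] == '\0')
        return ST_BADOWNER;
    *id = 1000;
    return ST_OK;
}

/* Buggy caller: feeds a protocol status into the errno converter, so
 * ST_BADOWNER (2) is not recognised and comes back as ST_IO. */
static proto_status set_owner_buggy(const char *name, unsigned *id)
{
    return errno_to_status((int)map_name_to_id(name, id));
}

/* Fixed caller: the status is returned as-is. */
static proto_status set_owner_fixed(const char *name, unsigned *id)
{
    return map_name_to_id(name, id);
}
```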

    Cc: stable@kernel.org
    Reported-by: Herbert Poetzl
    Signed-off-by: NeilBrown
    Signed-off-by: J. Bruce Fields

    NeilBrown
     
  • When Al moved the nameidata_dentry_drop_rcu_maybe() call into the
    do_follow_link function in commit 844a391799c2 ("nothing in
    do_follow_link() is going to see RCU"), he mistakenly left the

    BUG_ON(inode != path->dentry->d_inode);

    behind. Which would otherwise be ok, but that BUG_ON() really needs to
    be _after_ dropping RCU, since the dentry isn't necessarily stable
    otherwise.

    So complete the code movement in that commit, and move the BUG_ON() into
    do_follow_link() too. This means that we need to pass in 'inode' as an
    argument (just for this one use), but that's a small thing. And
    eventually we may be confident enough in our path lookup that we can
    just remove the BUG_ON() and the unnecessary inode argument.

    Reported-and-tested-by: Eric Dumazet
    Acked-by: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • There are two spellings in use for 'freeze' + 'able' - 'freezable' and
    'freezeable'. The former is the more prominent one. The latter is
    mostly used by workqueue and in a few other odd places. Unify the
    spelling to 'freezable'.

    Signed-off-by: Tejun Heo
    Reported-by: Alan Stern
    Acked-by: "Rafael J. Wysocki"
    Acked-by: Greg Kroah-Hartman
    Acked-by: Dmitry Torokhov
    Cc: David Woodhouse
    Cc: Alex Dubov
    Cc: "David S. Miller"
    Cc: Steven Whitehouse

    Tejun Heo
     

16 Feb, 2011

3 commits

  • * 'for-2.6.38' of git://linux-nfs.org/~bfields/linux:
    nfsd: break lease on unlink due to rename
    nfsd4: acquire only one lease per file
    nfsd4: modify fi_delegations under recall_lock
    nfsd4: remove unused deleg dprintk's.
    nfsd4: split lease setting into separate function
    nfsd4: fix leak on allocation error
    nfsd4: add helper function for lease setup
    nfsd4: split up nfsd_break_deleg_cb
    NFSD: memory corruption due to writing beyond the stat array
    NFSD: use nfserr for status after decode_cb_op_status
    nfsd: don't leak dentry count on mnt_want_write failure

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    get rid of nameidata_dentry_drop_rcu() calling nameidata_drop_rcu()
    drop out of RCU in return_reval
    split do_revalidate() into RCU and non-RCU cases
    in do_lookup() split RCU and non-RCU cases of need_revalidate
    nothing in do_follow_link() is going to see RCU

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
    Btrfs: check return value of alloc_extent_map()
    Btrfs - Fix memory leak in btrfs_init_new_device()
    btrfs: prevent heap corruption in btrfs_ioctl_space_info()
    Btrfs: Fix balance panic
    Btrfs: don't release pages when we can't clear the uptodate bits
    Btrfs: fix page->private races

    Linus Torvalds
     

15 Feb, 2011

12 commits

  • task_show_regs used to be a debugging aid in the early bringup days
    of Linux on s390. /proc//status is a world readable file, it
    is not a good idea to show the registers of a process. The only
    correct fix is to remove task_show_regs.

    Reported-by: Al Viro
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Linus Torvalds

    Martin Schwidefsky
     
  • can't happen anymore and didn't work right anyway

    Signed-off-by: Al Viro

    Al Viro
     
  • ... thus killing the need to handle drop-from-RCU in d_revalidate()

    Signed-off-by: Al Viro

    Al Viro
     
  • fixing oopsen in lookup_one_len()

    Signed-off-by: Al Viro

    Al Viro
     
  • and use unlikely() instead of gotos, for fsck sake...

    Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
    I added checks on the return value of alloc_extent_map() in several places.
    In addition, alloc_extent_map() returns only a valid address or NULL, so
    checking with IS_ERR() is unnecessary; I removed the IS_ERR() checks.
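
    Why IS_ERR() is the wrong test for a NULL-returning allocator can be
    sketched in userspace. err_ptr()/is_err() below re-create the kernel's
    ERR_PTR convention (errno values encoded in the top MAX_ERRNO addresses).
    alloc_em() is a hypothetical stand-in for alloc_extent_map() that, as the
    commit describes, returns a pointer or NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Userspace re-creation of the ERR_PTR convention: the highest MAX_ERRNO
 * addresses encode negative errno values. NULL is NOT in that range. */
#define MAX_ERRNO 4095

static inline void *err_ptr(long error)
{
    return (void *)error;
}

static inline int is_err(const void *ptr)
{
    return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

struct extent_map { unsigned long start, len; };

/* Hypothetical allocator: returns a valid pointer or NULL, never an ERR_PTR. */
static struct extent_map *alloc_em(int simulate_failure)
{
    if (simulate_failure)
        return NULL;
    return calloc(1, sizeof(struct extent_map));
}

/* The check has to be against NULL: is_err(NULL) is false, so an
 * IS_ERR()-style test would let the failed allocation through. */
static int lookup_with_check(int simulate_failure)
{
    struct extent_map *em = alloc_em(simulate_failure);
    if (!em)
        return -ENOMEM;
    free(em);
    return 0;
}
```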

    Signed-off-by: Tsutomu Itoh
    Signed-off-by: Chris Mason

    Tsutomu Itoh
     
  • Memory allocated by calling kstrdup() should be freed.

    Signed-off-by: Ilya Dryomov
    Signed-off-by: Chris Mason

    Ilya Dryomov
     
  • Commit bf5fc093c5b625e4259203f1cee7ca73488a5620 refactored
    btrfs_ioctl_space_info() and introduced several security issues.

    space_args.space_slots is an unsigned 64-bit type controlled by a
    possibly unprivileged caller. The comparison as a signed int type
    allows providing values that are treated as negative and cause the
    subsequent allocation size calculation to wrap, or be truncated to 0.
    By providing a size that's truncated to 0, kmalloc() will return
    ZERO_SIZE_PTR. It's also possible to provide a value smaller than the
    slot count. The subsequent loop ignores the allocation size when
    copying data in, resulting in a heap overflow or write to ZERO_SIZE_PTR.

    The fix changes the slot count type and comparison typecast to u64,
    which prevents truncation or signedness errors, and also ensures that we
    don't copy more data than we've allocated in the subsequent loop. Note
    that zero-size allocations are no longer possible since there is already
    an explicit check for space_args.space_slots being 0 and truncation of
    this value is no longer an issue.
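
    The two patterns can be contrasted in a small userspace sketch. The record
    type, the 4096-byte cap, and both function names are hypothetical; this is
    not the btrfs code. The buggy helper does the size arithmetic in 32 bits so
    a caller-chosen count wraps to 0. The fixed helper stays in u64, clamps to
    the number of records actually available, and bounds the copy loop by that
    clamped count.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record: three u64 fields, 24 bytes. */
struct space_rec { uint64_t flags, total_bytes, used_bytes; };

#define MAX_ALLOC 4096u /* assumed cap, standing in for a one-page limit */

/* Buggy pattern: size computed from a caller-controlled u64, then truncated
 * to 32 bits. (1ULL << 32) slots * 24 bytes wraps to exactly 0. */
static uint32_t alloc_size_buggy(uint64_t user_slots)
{
    return (uint32_t)(user_slots * sizeof(struct space_rec));
}

/* Fixed pattern: keep the count in u64, clamp it to what we really have,
 * size the buffer from the clamped count, and copy at most that many. */
static struct space_rec *copy_space_info(uint64_t user_slots,
                                         const struct space_rec *src,
                                         uint64_t have, uint64_t *out_n)
{
    uint64_t n = user_slots < have ? user_slots : have;
    uint64_t size = n * sizeof(struct space_rec);

    if (n == 0 || size > MAX_ALLOC)
        return NULL;
    struct space_rec *dest = malloc((size_t)size);
    if (!dest)
        return NULL;
    for (uint64_t i = 0; i < n; i++)   /* bound == allocated count */
        dest[i] = src[i];
    *out_n = n;
    return dest;
}
```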

    Signed-off-by: Dan Rosenberg
    Signed-off-by: Josef Bacik
    Reviewed-by: Josef Bacik
    Signed-off-by: Chris Mason

    Dan Rosenberg
     
  • Mark the cloned backref_node as checked in clone_backref_node()

    Signed-off-by: Yan, Zheng
    Signed-off-by: Chris Mason

    Yan, Zheng
     
  • Btrfs tracks uptodate state in an rbtree as well as in the
    page bits. This is supposed to enable us to use block sizes other than
    the page size, but there are a few parts still missing before that
    completely works.

    But, our readpage routine trusts this additional range based tracking
    of uptodateness, much in the same way the buffer head up to date bits
    are trusted for the other filesystems.

    The problem is that sometimes we need to allocate memory in order to
    split records in the rbtree, even when we are just clearing bits. This
    can be difficult when our clearing function is called with GFP_ATOMIC, which
    can happen in the releasepage path.

    So, what happens today looks like this:

    releasepage called with GFP_ATOMIC
    btrfs_releasepage calls clear_extent_bit
    clear_extent_bit fails to allocate memory, leaving the up-to-date bit set
    btrfs_releasepage returns success

    The end result is the page being gone, but btrfs thinking the range is
    up to date. Later on if someone tries to read that same page, the
    btrfs readpage code will return immediately thinking the page is already
    up to date.

    This commit fixes things to fail the releasepage when we can't clear the
    extent state bits. It covers both data pages and metadata tree blocks.
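
    The failure sequence and the fix can be modelled in a few lines. Everything
    below (page_model, clear_uptodate(), the releasepage_* helpers) is a
    hypothetical userspace model, not the btrfs code. It follows the
    convention that a nonzero releasepage return means the page may be freed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: up-to-date state is tracked in a separate range tree
 * as well as by whether the page is still cached. */
struct page_model {
    bool in_cache;       /* page still attached to the page cache */
    bool tree_uptodate;  /* range-tracking state, as in the extent tree */
};

/* Clearing may need an allocation; under GFP_ATOMIC that can fail.
 * Returns 0 on success, -1 on failure (state left untouched). */
static int clear_uptodate(struct page_model *p, bool alloc_fails)
{
    if (alloc_fails)
        return -1;
    p->tree_uptodate = false;
    return 0;
}

/* Old behaviour: the clear failure is ignored and the page is dropped. */
static int releasepage_buggy(struct page_model *p, bool alloc_fails)
{
    (void)clear_uptodate(p, alloc_fails);
    p->in_cache = false;
    return 1;
}

/* Fixed behaviour: refuse the release if the tracking state stays set. */
static int releasepage_fixed(struct page_model *p, bool alloc_fails)
{
    if (clear_uptodate(p, alloc_fails) != 0)
        return 0;
    p->in_cache = false;
    return 1;
}

/* readpage trusts the range tree and skips the read if it says up to date. */
static bool readpage_would_skip(const struct page_model *p)
{
    return p->tree_uptodate;
}
```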

    Signed-off-by: Chris Mason

    Chris Mason
     
  • There is a race where btrfs_releasepage can drop the
    page->private contents just as alloc_extent_buffer is setting
    up pages for metadata. Because of how the Btrfs page flags work,
    this results in us skipping the crc on the page during IO.

    This patch solves the race by waiting until after the extent buffer
    is inserted into the radix tree before it sets page private.

    Signed-off-by: Chris Mason

    Chris Mason
     

14 Feb, 2011

11 commits


13 Feb, 2011

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    jbd2: call __jbd2_log_start_commit with j_state_lock write locked
    ext4: serialize unaligned asynchronous DIO
    ext4: make grpinfo slab cache names static
    ext4: Fix data corruption with multi-block writepages support
    ext4: fix up ext4 error handling
    ext4: unregister features interface on module unload
    ext4: fix panic on module unload when stopping lazyinit thread

    Linus Torvalds
     

12 Feb, 2011

6 commits

  • On an SMP ARM system running ext4, I've received a report that the
    first J_ASSERT in jbd2_journal_commit_transaction has been triggering:

    J_ASSERT(journal->j_running_transaction != NULL);

    While investigating possible causes for this problem, I noticed that
    __jbd2_log_start_commit() is getting called with j_state_lock only
    read-locked, even though it can modify
    j_commit_request. Fix this by grabbing the necessary information so
    we can test to see if we need to start a new transaction before
    dropping the read lock, and then calling jbd2_log_start_commit() which
    will grab the write lock.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • ext4 has a data corruption case when doing non-block-aligned
    asynchronous direct IO into a sparse file, as demonstrated
    by xfstest 240.

    The root cause is that while ext4 preallocates space in the
    hole, mappings of that space still look "new" and
    dio_zero_block() will zero out the unwritten portions. When
    more than one AIO thread is going, they both find this "new"
    block and race to zero out their portion; this is uncoordinated
    and causes data corruption.

    Dave Chinner fixed this for xfs by simply serializing all
    unaligned asynchronous direct IO. I've done the same here.
    The difference is that we only wait on conversions, not all IO.
    This is a very big hammer, and I'm not very pleased with
    stuffing this into ext4_file_write(). But since ext4 is
    DIO_LOCKING, we need to serialize it at this high level.

    I tried to move this into ext4_ext_direct_IO, but by then
    we have the i_mutex already, and we will wait on the
    work queue to do conversions - which must also take the
    i_mutex. So that won't work.

    This was originally exposed by qemu-kvm installing to
    a raw disk image with a normal sector-63 alignment. I've
    tested a backport of this patch with qemu, and it does
    avoid the corruption. It is also quite a lot slower
    (14 min for package installs, vs. 8 min for well-aligned)
    but I'll take slow correctness over fast corruption any day.

    Mingming suggested that we can track outstanding
    conversions, and wait on those so that non-sparse
    files won't be affected, and I've implemented that here;
    unaligned AIO to nonsparse files won't take a perf hit.

    [tytso@mit.edu: Keep the mutex as a hashed array instead
    of bloating the ext4 inode]

    [tytso@mit.edu: Fix up namespace issues so that global
    variables are protected with an "ext4_" prefix.]
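
    The hashed-mutex idea and the alignment test can be sketched in userspace.
    The table size (37), the names, and the exact alignment rule (start or end
    not on a block boundary) are assumptions for illustration, not ext4's
    code. A sector-63 offset (63 * 512 = 32256 bytes) is not 4 KiB aligned, so
    AIO there takes the serialized path.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define AIO_MUTEX_HASH_SZ 37   /* assumed table size, a small prime */

/* One shared array of mutexes instead of a mutex in every inode. */
static pthread_mutex_t aio_mutex_table[AIO_MUTEX_HASH_SZ];

static void aio_mutex_table_init(void)
{
    for (int i = 0; i < AIO_MUTEX_HASH_SZ; i++)
        pthread_mutex_init(&aio_mutex_table[i], NULL);
}

/* Hash an inode (here: its address) to a shared mutex. Two inodes may
 * collide; that only adds serialization, never breaks correctness. */
static pthread_mutex_t *aio_mutex_for(const void *inode)
{
    return &aio_mutex_table[(uintptr_t)inode % AIO_MUTEX_HASH_SZ];
}

/* Unaligned if start or end is off a block boundary (blocksize: power of 2). */
static bool is_unaligned_io(uint64_t pos, uint64_t len, uint64_t blocksize)
{
    uint64_t mask = blocksize - 1;
    return (pos & mask) != 0 || ((pos + len) & mask) != 0;
}

/* Write-path sketch: only unaligned async IO takes the hashed mutex. */
static void write_path(const void *inode, uint64_t pos, uint64_t len,
                       bool is_async)
{
    pthread_mutex_t *m = NULL;

    if (is_async && is_unaligned_io(pos, len, 4096))
        m = aio_mutex_for(inode);
    if (m)
        pthread_mutex_lock(m);
    /* ... wait for outstanding conversions, then submit the direct IO ... */
    if (m)
        pthread_mutex_unlock(m);
}
```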

    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"

    Eric Sandeen
     
  • In 2.6.37 I was running into oopses with repeated module
    loads & unloads. I tracked this down to:

    fb1813f4 ext4: use dedicated slab caches for group_info structures

    (this was in addition to the features advert unload problem)

    The kstrdup & subsequent kfree of the cache name was causing
    a double free. In slub, at least, if I read it right it allocates
    & frees the name itself, slab seems to do something different...
    so in slub I think we were leaking -our- cachep->name, and double
    freeing the one allocated by slub.

    After getting lost in slab/slub/slob a bit, I just looked at other
    sized-caches that get allocated. jbd2, biovec, sgpool all do it
    more or less the way jbd2 does. Below patch follows the jbd2
    method of dynamically allocating a cache at mount time from
    a list of static names.

    (This might also possibly fix a race creating the caches with
    parallel mounts running).

    [Folded in a fix from Dan Carpenter which fixed an off-by-one error in
    the original patch]
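
    The "static names" approach can be sketched as a lookup into a fixed
    table. The names, the 1 KiB minimum, and the seven-entry range are
    hypothetical, and the commit does not show Dan Carpenter's exact fix. The
    point is that the names live in static storage, so nothing has to be
    kstrdup()'d or freed, and the index is bounds-checked with ">=" so it can
    never equal the array length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MIN_BLOCK_LOG_SIZE 10  /* assumed smallest block size: 1 KiB */
#define NR_NAME_SLOTS 7        /* 1k .. 64k */

/* Static storage: no allocation, no free, no double free. */
static const char *const cache_names[NR_NAME_SLOTS] = {
    "groupinfo_1k",  "groupinfo_2k",  "groupinfo_4k", "groupinfo_8k",
    "groupinfo_16k", "groupinfo_32k", "groupinfo_64k",
};

/* Map log2(blocksize) to a cache name, or NULL if out of range. */
static const char *get_cache_name(unsigned blocksize_bits)
{
    if (blocksize_bits < MIN_BLOCK_LOG_SIZE)
        return NULL;
    unsigned idx = blocksize_bits - MIN_BLOCK_LOG_SIZE;
    if (idx >= NR_NAME_SLOTS)  /* ">" here would read one past the end */
        return NULL;
    return cache_names[idx];
}
```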

    Cc: stable@kernel.org
    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"

    Eric Sandeen
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm:
    dlm: use single thread workqueues

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
    cifs: don't always drop malformed replies on the floor (try #3)
    cifs: clean up checks in cifs_echo_request
    [CIFS] Do not send SMBEcho requests on new sockets until SMBNegotiate

    Linus Torvalds
     
    In commit fa0d7e3de6d6 ("fs: icache RCU free inodes"), inodes are freed
    via RCU instead of directly. This causes a crash when we rmmod
    immediately after unmounting the volume[1].

    So we need to call rcu_barrier after we kill_sb so that the inode is
    freed before we do rmmod. The idea is inspired by Aneesh Kumar.
    rcu_barrier will wait for all callbacks to end before proceeding. The
    original patch was done by Tao Ma, but synchronize_rcu() is not enough
    here.
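
    The difference matters because synchronize_rcu() waits only for a grace
    period, while rcu_barrier() waits until every callback already queued with
    call_rcu() has finished running. The toy model below is not RCU and has no
    grace periods. It uses hypothetical names (call_deferred(),
    deferred_barrier()) and only illustrates the barrier half: callbacks run
    later on a worker thread, and the barrier returns once everything queued
    before it has completed. That is what module unload needs before the
    callback code goes away.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct cb { void (*fn)(void *); void *arg; struct cb *next; };

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_cond = PTHREAD_COND_INITIALIZER;
static struct cb *q_head, *q_tail;
static unsigned long n_queued, n_done;
static int stopping;
static pthread_t worker_thread;

/* Worker: runs queued callbacks in order, outside the lock. */
static void *worker(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&q_lock);
    for (;;) {
        while (!q_head && !stopping)
            pthread_cond_wait(&q_cond, &q_lock);
        if (!q_head)
            break;                       /* stopping and queue drained */
        struct cb *c = q_head;
        q_head = c->next;
        if (!q_head)
            q_tail = NULL;
        pthread_mutex_unlock(&q_lock);
        c->fn(c->arg);
        free(c);
        pthread_mutex_lock(&q_lock);
        n_done++;
        pthread_cond_broadcast(&q_cond); /* wake barrier waiters */
    }
    pthread_mutex_unlock(&q_lock);
    return NULL;
}

/* Queue a callback to run later (the role call_rcu() plays here). */
static void call_deferred(void (*fn)(void *), void *arg)
{
    struct cb *c = malloc(sizeof *c);
    if (!c)
        abort();
    c->fn = fn;
    c->arg = arg;
    c->next = NULL;
    pthread_mutex_lock(&q_lock);
    if (q_tail)
        q_tail->next = c;
    else
        q_head = c;
    q_tail = c;
    n_queued++;
    pthread_cond_broadcast(&q_cond);
    pthread_mutex_unlock(&q_lock);
}

/* Barrier: wait until every callback queued so far has completed. */
static void deferred_barrier(void)
{
    pthread_mutex_lock(&q_lock);
    unsigned long target = n_queued;
    while (n_done < target)
        pthread_cond_wait(&q_cond, &q_lock);
    pthread_mutex_unlock(&q_lock);
}

static void deferred_start(void)
{
    pthread_create(&worker_thread, NULL, worker, NULL);
}

static void deferred_stop(void)
{
    pthread_mutex_lock(&q_lock);
    stopping = 1;
    pthread_cond_broadcast(&q_cond);
    pthread_mutex_unlock(&q_lock);
    pthread_join(worker_thread, NULL);
}

/* Example callback: frees an "inode"; read freed_count only after a barrier. */
static int freed_count;
static void free_inode_cb(void *p)
{
    free(p);
    freed_count++;
}
```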

    1. http://marc.info/?l=linux-fsdevel&m=129680863330185&w=2

    Tested-by: Tao Ma
    Signed-off-by: Boaz Harrosh
    Cc: Nick Piggin
    Cc: Al Viro
    Cc: Chris Mason
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Boaz Harrosh