23 Aug, 2012

3 commits

  • Pull ceph fixes from Sage Weil:
    "Jim's fix closes a narrow race introduced with the msgr changes. One
    fix resolves problems with debugfs initialization that Yan found when
    multiple client instances are created (e.g., two clusters mounted, or
    rbd + cephfs), another one fixes problems with mounting a nonexistent
    server subdirectory, and the last one fixes a divide by zero error
    from unsanitized ioctl input that Dan Carpenter found."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    ceph: avoid divide by zero in __validate_layout()
    libceph: avoid truncation due to racing banners
    ceph: tolerate (and warn on) extraneous dentry from mds
    libceph: delay debugfs initialization until we learn global_id

    Linus Torvalds
     
  • Pull NFS client bugfixes from Trond Myklebust:
    - NFSv3 mounts need to fail if the FSINFO rpc call fails
    - Ensure that the NFS commit cache gets torn down when we unload the
    NFS module.
    - Fix memory scribble issues when interrupting a LAYOUTGET rpc call
    - Fix NFSv4 legacy idmapper regressions
    - Fix issues with the NFSv4 getacl command
    - Fix a regression when using the legacy "mount -t nfs4"

    * tag 'nfs-for-3.6-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    NFSv3: Ensure that do_proc_get_root() reports errors correctly
    NFSv4: Ensure that nfs4_alloc_client cleans up on error.
    NFS: return -ENOKEY when the upcall fails to map the name
    NFS: Clear key construction data if the idmap upcall fails
    NFSv4: Don't use private xdr_stream fields in decode_getacl
    NFSv4: Fix the acl cache size calculation
    NFSv4: Fix pointer arithmetic in decode_getacl
    NFS: Alias the nfs module to nfs4
    NFS: Fix a regression when loading the NFS v4 module
    NFSv4.1: Remove a bogus BUG_ON() in nfs4_layoutreturn_done
    pnfs-obj: Better IO pattern in case of unaligned offset
    NFS41: add pg_layout_private to nfs_pageio_descriptor
    pnfs: nfs4_proc_layoutget returns void
    pnfs: defer release of pages in layoutget
    nfs: tear down caches in nfs_init_writepagecache when allocation fails

    Linus Torvalds
     
  • Pull assorted fixes - mostly vfs - from Al Viro:
    "Assorted fixes, with an unexpected detour into vfio refcounting logics
    (fell out when digging in an analog of eventpoll race in there)."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    task_work: add a scheduling point in task_work_run()
    fs: fix fs/namei.c kernel-doc warnings
    eventpoll: use-after-possible-free in epoll_create1()
    vfio: grab vfio_device reference *before* exposing the sucker via fd_install()
    vfio: get rid of vfio_device_put()/vfio_group_get_device* races
    vfio: get rid of open-coding kref_put_mutex
    introduce kref_put_mutex()
    vfio: don't dereference after kfree...
    mqueue: lift mnt_want_write() outside ->i_mutex, clean up a bit

    Linus Torvalds
     

22 Aug, 2012

4 commits


21 Aug, 2012

4 commits


19 Aug, 2012

1 commit

  • Pull vfs fixes from Miklos Szeredi.

    This mainly fixes some confusion about whether the open 'mode' variable
    passed around should contain the full file type (S_IFREG etc)
    information or just the permission mode. In particular, the lack of
    proper file type information had confused fuse.

    * 'vfs-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    vfs: fix propagation of atomic_open create error on negative dentry
    fuse: check create mode in atomic open
    vfs: pass right create mode to may_o_create()
    vfs: atomic_open(): fix create mode usage
    vfs: canonicalize create mode in build_open_flags()

    Linus Torvalds
     

17 Aug, 2012

15 commits

  • Pull ext4 bug fixes from Ted Ts'o:
    "The following are all bug fixes and regressions. The most notable are
    the ones which cause problems for ext4 on RAID --- a performance
    problem when mounting very large filesystems, and a kernel OOPS when
    doing an rm -rf on large directory hierarchies on fast devices."

    * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: fix kernel BUG on large-scale rm -rf commands
    ext4: fix long mount times on very big file systems
    ext4: don't call ext4_error while block group is locked
    ext4: avoid kmemcheck complaint from reading uninitialized memory
    ext4: make sure the journal sb is written in ext4_clear_journal_err()

    Linus Torvalds
     
  • In some cases when an autofs indirect mount is contained in a file
    system that is marked as shared (such as when systemd does the
    equivalent of "mount --make-rshared /" early in the boot), mounts
    stop expiring.

    When this happens the first expiry check on a mountpoint dentry in
    autofs_expire_indirect() sees a mountpoint dentry with a higher
    than minimal reference count. Consequently the dentry is considered
    busy and the actual expiry check is never done.

    This particular check was originally meant as an optimisation to
    detect a path walk in progress but with the addition of rcu-walk
    it can be ineffective anyway.

    Removing the test allows automounts to expire again since the
    actual expire check doesn't rely on the dentry reference count.

    Signed-off-by: Ian Kent
    Signed-off-by: Linus Torvalds

    Ian Kent
     
  • Commit 968dee7722: "ext4: fix hole punch failure when depth is greater
    than 0" introduced a regression in v3.5.1/v3.6-rc1 which caused kernel
    crashes when users ran "rm -rf" on a large directory hierarchy on
    ext4 filesystems on RAID devices:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000028

    Process rm (pid: 18229, threadinfo ffff8801276bc000, task ffff880123631710)
    Call Trace:
    [] ? __ext4_handle_dirty_metadata+0x83/0x110
    [] ext4_ext_truncate+0x193/0x1d0
    [] ? ext4_mark_inode_dirty+0x7f/0x1f0
    [] ext4_truncate+0xf5/0x100
    [] ext4_evict_inode+0x461/0x490
    [] evict+0xa2/0x1a0
    [] iput+0x103/0x1f0
    [] do_unlinkat+0x154/0x1c0
    [] ? sys_newfstatat+0x2a/0x40
    [] sys_unlinkat+0x1b/0x50
    [] system_call_fastpath+0x16/0x1b
    Code: 8b 4d 20 0f b7 41 02 48 8d 04 40 48 8d 04 81 49 89 45 18 0f b7 49 02 48 83 c1 01 49 89 4d 00 e9 ae f8 ff ff 0f 1f 00 49 8b 45 28 8b 40 28 49 89 45 20 e9 85 f8 ff ff 0f 1f 80 00 00 00

    RIP [] ext4_ext_remove_space+0xa34/0xdf0

    This could be reproduced as follows:

    The problem in commit 968dee7722 was that it caused the variable 'i' to
    be left uninitialized if the truncate required more space than was
    available in the journal. This resulted in the function
    ext4_ext_truncate_extend_restart() returning -EAGAIN, which caused
    ext4_ext_remove_space() to restart the truncate operation after
    starting a new jbd2 handle.

    Reported-by: Maciej Żenczykowski
    Reported-by: Marti Raudsepp
    Tested-by: Fengguang Wu
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@vger.kernel.org

    Theodore Ts'o
     
  • Commit 8aeb00ff85a: "ext4: fix overhead calculation used by
    ext4_statfs()" introduced an O(n**2) calculation which makes very large
    file systems take forever to mount. Fix this with an optimization for
    non-bigalloc file systems. (For bigalloc file systems the overhead
    needs to be set in the superblock.)

    Signed-off-by: "Theodore Ts'o"
    Cc: stable@vger.kernel.org

    Theodore Ts'o
     
  • While in ext4_validate_block_bitmap(), if a block allocation bitmap
    is found to be invalid, we call ext4_error() while the block group is
    still locked. This causes ext4_commit_super() to call a function
    which might sleep while in an atomic context.

    There's no need to keep the block group locked at this point, so hoist
    the ext4_error() call up to ext4_validate_block_bitmap() and release
    the block group spinlock before calling ext4_error().

    The reported stack trace can be found at:

    http://article.gmane.org/gmane.comp.file-systems.ext4/33731

    Reported-by: Dave Jones
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@vger.kernel.org

    Theodore Ts'o
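
    The pattern behind this fix (record the result under the lock, drop the
    lock, and only then call the function that might sleep) can be sketched
    in userspace C. Everything here is a stand-in: struct toy_lock merely
    tracks whether it is held, report_error() plays the role of ext4_error(),
    and none of the names below come from the ext4 sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for the block group spinlock: it only tracks its state. */
struct toy_lock {
    bool held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
    assert(!l->held);
    l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* Hypothetical block group: a lock, a validity flag and an error counter. */
struct block_group {
    struct toy_lock lock;
    bool bitmap_valid;
    int errors;
};

/* Stand-in for a reporting call that may sleep: must run unlocked. */
static void report_error(struct block_group *g, const char *msg)
{
    assert(!g->lock.held);
    g->errors++;
    fprintf(stderr, "error: %s\n", msg);
}

/* Check the bitmap under the lock, but report any problem after unlocking. */
static bool validate_bitmap(struct block_group *g)
{
    bool ok;

    toy_lock_acquire(&g->lock);
    ok = g->bitmap_valid;        /* only the cheap check happens locked */
    toy_lock_release(&g->lock);

    if (!ok)
        report_error(g, "invalid block bitmap");
    return ok;
}
```

    The assert() in report_error() makes the sketch fail loudly if someone
    later moves the call back inside the locked region.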
     
  • This allows the normal error-paths to handle the error, rather than
    making a special call to complete_request_key() just for this instance.

    Signed-off-by: Bryan Schumaker
    Tested-by: William Dauchy
    Cc: stable@vger.kernel.org [>= 3.4]
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • idmap_pipe_downcall already clears this field if the upcall succeeds,
    but if it fails (rpc.idmapd isn't running) the field will still be set
    on the next call triggering a BUG_ON(). This patch tries to handle all
    possible ways that the upcall could fail and clear the idmap key data
    for each one.

    Signed-off-by: Bryan Schumaker
    Tested-by: William Dauchy
    Cc: stable@vger.kernel.org [>= 3.4]
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • Instead of using the private field xdr->p from struct xdr_stream,
    use the public xdr_stream_pos().

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Currently, we do not take into account the size of the 16 byte
    struct nfs4_cached_acl header, when deciding whether or not we should
    cache the acl data. Consequently, we will end up allocating an
    8k buffer in order to fit a maximum size 4k acl.

    This patch adjusts the calculation so that we limit the cache size
    to 4k for the acl header+data.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
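
    The arithmetic is small enough to sketch. The 16-byte header and the 4k
    limit come from the description above; the struct layout, the constant
    name and both function names are invented for illustration and are not
    taken from the NFS client sources.

```c
#include <stdbool.h>
#include <stddef.h>

#define CACHE_LIMIT 4096u   /* the 4k limit mentioned in the text */

/* Hypothetical 16-byte header; the real struct layout is not shown here. */
struct cached_acl_header {
    unsigned char bytes[16];
};

/* Behaviour described as the problem: only the ACL data is compared. */
static bool fits_ignoring_header(size_t acl_len)
{
    return acl_len <= CACHE_LIMIT;
}

/* Adjusted check: header plus data together must stay within the limit. */
static bool fits_with_header(size_t acl_len)
{
    return acl_len <= CACHE_LIMIT - sizeof(struct cached_acl_header);
}
```

    Under the first check a 4096-byte ACL is accepted, so header plus data
    need 4112 bytes and no longer fit a 4k buffer. Under the second check the
    largest cacheable ACL is 4080 bytes.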
     
  • Resetting the cursor xdr->p to a previous value is not a safe
    practice: if the xdr_stream has crossed out of the initial iovec,
    then a bunch of other fields would need to be reset too.

    Fix this issue by using xdr_enter_page() so that the buffer gets
    page aligned at the bitmap _before_ we decode it.

    Also fix the confusion of the ACL length with the page buffer length
    by not adding the base offset to the ACL length...

    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org

    Trond Myklebust
     
  • This allows distros to remove the line from their modprobe
    configuration.

    Signed-off-by: Bryan Schumaker
    Cc: stable@vger.kernel.org
    Signed-off-by: Trond Myklebust

    bjschuma@gmail.com
     
  • Some systems have a modprobe.d/nfs.conf file that sets an nfs4 alias
    pointing to nfs.ko, rather than nfs4.ko. This can prevent the v4 module
    from loading on mount, since the kernel sees that something named "nfs4"
    has already been loaded. To work around this, I've renamed the modules
    to "nfsv2.ko" "nfsv3.ko" and "nfsv4.ko".

    I also had to move the nfs4_fs_type back to nfs.ko to ensure that `mount
    -t nfs4` still works.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    bjschuma@gmail.com
     
  • Following a report of a crash during an automount expire I found that
    the locking in fs/autofs4/expire.c:get_next_positive_subdir() was wrong.
    Not only is the locking wrong but the function is more complex than it
    needs to be.

    The function is meant to calculate (and dget) the next entry in the list
    of directories contained in the root of an autofs mount point (an autofs
    indirect mount to be precise). The main problem was that the d_lock of
    the owner of the list was not being taken when walking the list, which
    led to list corruption under load. The only other lock that needs to
    be taken is against the next dentry candidate so it can be checked for
    usability.

    Signed-off-by: Ian Kent
    Signed-off-by: Linus Torvalds

    Ian Kent
     
  • Pull fuse updates from Miklos Szeredi.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    fuse: verify all ioctl retry iov elements
    fuse: add missing INIT flag descriptions
    fuse: add missing INIT flags
    fuse: update attributes on aio_read
    fuse: invalidate inode mapping if mtime changes
    fuse: add FUSE_AUTO_INVAL_DATA init flag

    Linus Torvalds
     
  • If ->atomic_open() returns -ENOENT, we take care to return the create
    error (e.g., EACCES), if any. Do the same when ->atomic_open() returns 1
    and provides a negative dentry.

    This fixes a regression where an unprivileged open O_CREAT fails with
    ENOENT instead of EACCES, introduced with the new atomic_open code. It
    is tested by the open/08.t test in the pjd posix test suite, and was
    observed on top of fuse (backed by ceph-fuse).

    Signed-off-by: Sage Weil
    Signed-off-by: Miklos Szeredi

    Sage Weil
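
    The rule in this message can be written as a small decision function.
    This is only a sketch of the stated behaviour: resolve_open_error() and
    its parameters are hypothetical, and the real lookup path carries far
    more state than three arguments.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper.
 *   open_ret      result of the atomic open step: a negative errno,
 *                 0 on success, or 1 meaning "not opened, caller goes on"
 *   negative      true if the dentry handed back refers to no file
 *   create_error  0, or the negative errno that forbids creating the file
 *
 * If the file turns out not to exist and creation was not allowed, the
 * creation error is the more useful answer and replaces the result.
 */
static int resolve_open_error(int open_ret, bool negative, int create_error)
{
    bool missing = (open_ret == -ENOENT) || (open_ret == 1 && negative);

    if (missing && create_error)
        return create_error;
    return open_ret;
}
```

    For the regression described above, resolve_open_error(1, true, -EACCES)
    yields -EACCES rather than letting the lookup end in ENOENT.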
     

15 Aug, 2012

4 commits


13 Aug, 2012

1 commit


09 Aug, 2012

2 commits

  • We got a recursive lock in mksubvol because the caller already held
    a lock. I think we got into this due to a merge error. Commit a874a63
    removed the mnt_want_write call from btrfs_mksubvol and added a
    replacement call to mnt_want_write_file in btrfs_ioctl_snap_create_transid.
    Commit e7848683 however tried to move all calls to mnt_want_write above
    i_mutex. So somewhere while merging this, it got mixed up. The
    solution is to remove the mnt_want_write call completely from
    mksubvol.

    Reported-by: David Sterba
    Signed-off-by: Alexander Block
    Signed-off-by: Chris Mason

    Alexander Block
     
  • Ever since commit 0a57cdac3f (NFSv4.1 send layoutreturn to fence
    disconnected data server) we've been sending layoutreturn calls
    while there is potentially still outstanding I/O to the data
    servers. The reason we do this is to avoid races between replayed
    writes to the MDS and the original writes to the DS.

    When this happens, the BUG_ON() in nfs4_layoutreturn_done can
    be triggered because it assumes that we would never call
    layoutreturn without knowing that all I/O to the DS is
    finished. The fix is to remove the BUG_ON() now that the
    assumptions behind the test are obsolete.

    Reported-by: Boaz Harrosh
    Reported-by: Tigran Mkrtchyan
    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org [>=3.5]

    Trond Myklebust
     

07 Aug, 2012

1 commit

  • Commit 7572777eef78ebdee1ecb7c258c0ef94d35bad16 attempted to verify that
    the total iovec from the client doesn't overflow iov_length() but it
    only checked the first element. The iovec could still overflow by
    starting with a small element. The obvious fix is to check all the
    elements.

    The overflow case doesn't look dangerous to the kernel as the copy is
    limited by the length after the overflow. This fix restores the
    intention of returning an error instead of successfully copying less
    than the iovec represented.

    I found this by code inspection. I built it but don't have a test case.
    I'm cc:ing stable because the initial commit did as well.

    Signed-off-by: Zach Brown
    Signed-off-by: Miklos Szeredi
    CC: [2.6.37+]

    Zach Brown
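
    Checking every element rather than just the first can be sketched in
    userspace C as follows. This is not the fuse code: iov_total_ok() and
    its max_total parameter are names chosen for the illustration. The
    comparison is written as a subtraction so that the running total itself
    can never wrap.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: return true only if the lengths of all `count`
 * elements add up to at most max_total.
 */
static bool iov_total_ok(const struct iovec *iov, size_t count,
                         size_t max_total)
{
    size_t total = 0;

    for (size_t i = 0; i < count; i++) {
        /* total <= max_total always holds here, so this cannot underflow. */
        if (iov[i].iov_len > max_total - total)
            return false;
        total += iov[i].iov_len;
    }
    return true;
}
```

    An array that starts with a 1-byte element followed by a huge one passes
    a first-element-only check but is rejected here.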
     

06 Aug, 2012

2 commits

  • Commit 03179fe923 introduced a kmemcheck complaint in
    ext4_da_get_block_prep() because we save and restore
    ei->i_da_metadata_calc_last_lblock even though it is left
    uninitialized in the case where i_da_metadata_calc_len is zero.

    This doesn't hurt anything, but silencing the kmemcheck complaint
    makes it easier for people to find real bugs.

    Addresses https://bugzilla.kernel.org/show_bug.cgi?id=45631
    (which is marked as a regression).

    Signed-off-by: "Theodore Ts'o"
    Cc: stable@vger.kernel.org

    Theodore Ts'o
     
  • After we set the EXT4_ERROR_FS bit in the file system
    superblock, it's not enough to call jbd2_journal_clear_err() to clear
    the error indication from journal superblock --- we need to call
    jbd2_journal_update_sb_errno() as well. Otherwise, when the root file
    system is mounted read-only, the journal is replayed, and the error
    indicator is transferred to the superblock --- but the s_errno field
    in the jbd2 superblock is left set (since although we cleared it in
    memory, we never flushed it out to disk).

    This can end up confusing e2fsck. We should make e2fsck more robust
    in this case, but the kernel shouldn't be leaving things in this
    confused state, either.

    Signed-off-by: "Theodore Ts'o"
    Cc: stable@kernel.org

    Theodore Ts'o
     

04 Aug, 2012

3 commits