28 Jun, 2011

4 commits

  • Under heavy memory and filesystem load, users have seen the assertion
    mapping->nrpages == 0 in end_writeback() fire. This can happen when page
    reclaim removes the last page from a mapping in the following race:

    CPU0                                CPU1
    ...
    shrink_page_list()
      __remove_mapping()
        __delete_from_page_cache()
          radix_tree_delete()
                                        evict_inode()
                                          truncate_inode_pages()
                                            truncate_inode_pages_range()
                                              pagevec_lookup() - finds nothing
                                          end_writeback()
                                            mapping->nrpages != 0 -> BUG
          page->mapping = NULL
          mapping->nrpages--

    Fix the problem by doing a reliable check of mapping->nrpages under
    mapping->tree_lock in end_writeback().

    Analyzed by Jay, lost in LKML, and dug out by Miklos Szeredi.

    Cc: Jay
    Cc: Miklos Szeredi
    Signed-off-by: Jan Kara
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
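
The race above can be replayed deterministically in a small userspace model. This is only a sketch: `struct mapping` here is a toy with a boolean standing in for mapping->tree_lock, and the helper names (`reclaim_begin`, `reclaim_finish`, `unlocked_check_passes`, `locked_check`) are hypothetical, not kernel functions. It illustrates why a check made without the lock can see the page gone from the tree while nrpages is still nonzero.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of an address_space; field names mirror the commit text. */
struct mapping {
    bool tree_lock_held;   /* stands in for mapping->tree_lock */
    int  pages_in_tree;    /* what a radix-tree lookup would find */
    int  nrpages;          /* the counter the assertion inspects */
};

/* Reclaim, first half: take the lock and unlink the page from the tree. */
static void reclaim_begin(struct mapping *m)
{
    m->tree_lock_held = true;
    m->pages_in_tree--;
}

/* Reclaim, second half: drop the counter, then release the lock. */
static void reclaim_finish(struct mapping *m)
{
    assert(m->tree_lock_held);  /* the counter only changes under the lock */
    m->nrpages--;
    m->tree_lock_held = false;
}

/* Unlocked check: may observe the counter in the middle of reclaim. */
static bool unlocked_check_passes(const struct mapping *m)
{
    return m->nrpages == 0;
}

/*
 * Locked check: returns false while the lock is busy (a real spinlock
 * would spin here instead); otherwise stores the result in *ok.
 */
static bool locked_check(const struct mapping *m, bool *ok)
{
    if (m->tree_lock_held)
        return false;
    *ok = (m->nrpages == 0);
    return true;
}
```

Calling `reclaim_begin()`, then the unlocked check, reproduces the window in the diagram; the locked variant cannot produce a result until `reclaim_finish()` has run.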
     
  • romfs_get_unmapped_area() checks the argument `len' without taking
    PAGE_ALIGN into account, which causes do_mmap_pgoff() to return -EINVAL
    after commit f67d9b1576c ("nommu: add page_align to mmap").

    Fix the check by changing it in the same way that
    ramfs_nommu_get_unmapped_area() was changed in ramfs/file-nommu.c.

    Signed-off-by: Bob Liu
    Cc: David Howells
    Cc: Paul Mundt
    Acked-by: Greg Ungerer
    Cc: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
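
As a rough illustration of why a length check must allow for page alignment, the sketch below contrasts a byte-granular bounds check with a page-granular one. It is not the romfs code: the 4 KiB page size and the helper names `fits_bytes` and `fits_pages` are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT 12                      /* assumed: 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

/* Byte-granular check: rejects a length that was rounded past end of file. */
static bool fits_bytes(unsigned long isize, unsigned long pgoff,
                       unsigned long len)
{
    unsigned long offset = pgoff << PAGE_SHIFT;

    return offset < isize && len <= isize - offset;
}

/* Page-granular check: compares page counts, so a rounded length still fits. */
static bool fits_pages(unsigned long isize, unsigned long pgoff,
                       unsigned long len)
{
    unsigned long lpages   = (len + PAGE_SIZE - 1) >> PAGE_SHIFT;
    unsigned long maxpages = (isize + PAGE_SIZE - 1) >> PAGE_SHIFT;

    return pgoff < maxpages && maxpages - pgoff >= lpages;
}
```

For a 100-byte file mapped with a length already rounded up by PAGE_ALIGN, the byte-granular check refuses the request while the page-granular one accepts it.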
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
    btrfs: fix inconsonant inode information
    Btrfs: make sure to update total_bitmaps when freeing cache V3
    Btrfs: fix type mismatch in find_free_extent()
    Btrfs: make sure to record the transid in new inodes

    Linus Torvalds
     
  • * 'for-linus' of git://oss.sgi.com/xfs/xfs:
    xfs: prevent bogus assert when trying to remove non-existent attribute
    xfs: clear XFS_IDIRTY_RELEASE on truncate down
    xfs: reset inode per-lifetime state when recycling it

    Linus Torvalds
     

27 Jun, 2011

3 commits

  • When an inode is iput, we may leave its delayed node behind if it still
    has delayed items that have not been dealt with. So when the inode is read
    again, we must look up the corresponding delayed node and use the
    information in it to initialize the inode. Otherwise we get inconsistent
    inode information, which can cause the same directory index number to be
    allocated again and hit the following oops:

    [ 5447.554187] err add delayed dir index item(name: pglog_0.965_0) into the
    insertion tree of the delayed node(root id: 262, inode id: 258, errno: -17)
    [ 5447.569766] ------------[ cut here ]------------
    [ 5447.575361] kernel BUG at fs/btrfs/delayed-inode.c:1301!
    [SNIP]
    [ 5447.790721] Call Trace:
    [ 5447.793191] [] btrfs_insert_dir_item+0x189/0x1bb [btrfs]
    [ 5447.800156] [] btrfs_add_link+0x12b/0x191 [btrfs]
    [ 5447.806517] [] btrfs_add_nondir+0x31/0x58 [btrfs]
    [ 5447.812876] [] btrfs_create+0xf9/0x197 [btrfs]
    [ 5447.818961] [] vfs_create+0x72/0x92
    [ 5447.824090] [] do_last+0x22c/0x40b
    [ 5447.829133] [] path_openat+0xc0/0x2ef
    [ 5447.834438] [] ? __perf_event_task_sched_out+0x24/0x44
    [ 5447.841216] [] ? perf_event_task_sched_out+0x59/0x67
    [ 5447.847846] [] do_filp_open+0x3d/0x87
    [ 5447.853156] [] ? strncpy_from_user+0x43/0x4d
    [ 5447.859072] [] ? getname_flags+0x2e/0x80
    [ 5447.864636] [] ? do_getname+0x14b/0x173
    [ 5447.870112] [] ? audit_getname+0x16/0x26
    [ 5447.875682] [] ? spin_lock+0xe/0x10
    [ 5447.880882] [] do_sys_open+0x69/0xae
    [ 5447.886153] [] sys_open+0x20/0x22
    [ 5447.891114] [] system_call_fastpath+0x16/0x1b

    Fix it by reusing the old delayed node.

    Reported-by: Jim Schutt
    Signed-off-by: Miao Xie
    Tested-by: Jim Schutt
    Signed-off-by: Chris Mason

    Miao Xie
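
The idea of reusing the old delayed node when an inode is read again can be sketched as a lookup that takes precedence over the on-disk value. Everything here is hypothetical: the fixed-size `delayed_nodes` table, the `index_cnt` field (assumed to be the next directory index to hand out), and both function names are illustrative and are not the btrfs data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory record of pending (delayed) updates for one inode. */
struct delayed_node {
    bool          in_use;
    unsigned long ino;
    unsigned long index_cnt;   /* assumed: next directory index to allocate */
};

#define MAX_DELAYED_NODES 4
static struct delayed_node delayed_nodes[MAX_DELAYED_NODES];

/* Return the delayed node left behind for this inode, or NULL if none. */
static struct delayed_node *find_delayed_node(unsigned long ino)
{
    for (size_t i = 0; i < MAX_DELAYED_NODES; i++)
        if (delayed_nodes[i].in_use && delayed_nodes[i].ino == ino)
            return &delayed_nodes[i];
    return NULL;
}

/*
 * When re-reading an inode, prefer the delayed node's counter: the on-disk
 * value may be stale because the delayed items have not been applied yet.
 */
static unsigned long read_inode_index_cnt(unsigned long ino,
                                          unsigned long on_disk_index_cnt)
{
    const struct delayed_node *node = find_delayed_node(ino);

    return node ? node->index_cnt : on_disk_index_cnt;
}
```

Initializing from the stale on-disk counter instead would hand out an index that a pending delayed item already uses, which matches the errno -17 (EEXIST) in the log above.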
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
    cifs: mark CONFIG_CIFS_NFSD_EXPORT as BROKEN
    cifs: free blkcipher in smbhash

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    cifs: propagate errors from cifs_get_root() to mount(2)
    cifs: tidy cifs_do_mount() up a bit
    cifs: more breakage on mount failures
    cifs: close sget() races
    cifs: pull freeing mountdata/dropping nls/freeing cifs_sb into cifs_umount()
    cifs: move cifs_umount() call into ->kill_sb()
    cifs: pull cifs_mount() call up
    sanitize cifs_umount() prototype
    cifs: initialize ->tlink_tree in cifs_setup_cifs_sb()
    cifs: allocate mountdata earlier
    cifs: leak on mount if we share superblock
    cifs: don't pass superblock to cifs_mount()
    cifs: don't leak nls on mount failure
    cifs: double free on mount failure
    take bdi setup/destruction into cifs_mount/cifs_umount

    Acked-by: Steve French

    Linus Torvalds
     

25 Jun, 2011

20 commits


24 Jun, 2011

7 commits

  • * 'for-linus' of git://git.kernel.dk/linux-block:
    block: add REQ_SECURE to REQ_COMMON_MASK
    block: use the passed in @bdev when claiming if partno is zero
    block: Add __attribute__((format(printf...) and fix fallout
    block: make disk_block_events() properly wait for work cancellation
    block: remove non-syncing __disk_block_events() and fold it into disk_block_events()
    block: don't use non-syncing event blocking in disk_check_events()
    cfq-iosched: fix locking around ioc->ioc_data assignment

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
    cifs: fix wsize negotiation to respect max buffer size and active signing (try #4)
    CIFS: Fix problem with 3.0-rc1 null user mount failure

    Linus Torvalds
     
  • It was pointed out by 'make versioncheck' that some includes of
    linux/version.h were not needed in fs/ (fs/btrfs/ctree.h and
    fs/omfs/file.c).

    This patch removes them.

    Signed-off-by: Jesper Juhl
    Acked-by: Bob Copeland
    Signed-off-by: Linus Torvalds

    Jesper Juhl
     
  • If the attribute fork on an inode is in btree format and has
    multiple levels (i.e. node format rather than leaf format), then a
    lookup failure will trigger an assert failure in xfs_da_path_shift
    if the flag XFS_DA_OP_OKNOENT is not set. This flag is used to
    indicate to the directory btree code that not finding an entry is
    not a fatal error. In the case of doing a lookup for a directory
    name removal, this is valid as a user cannot insert an arbitrary
    name to remove from the directory btree.

    However, in the case of the attribute tree, a user has direct
    control over the attribute name and can ask for any random name to
    be removed without any validation. In this case, fsstress is asking
    for a non-existent user.selinux attribute to be removed, and that is
    causing xfs_da_path_shift() to fall off the bottom of the tree where
    it asserts that a lookup failure is allowed. Because the flag is not
    set, we die a horrible death on a debug-enabled kernel.

    Prevent this assert from firing on attribute removes by adding the
    op_flag XFS_DA_OP_OKNOENT to attribute removal operations.

    Discovered when testing on a SELinux enabled system by fsstress in
    test 070 by trying to remove a non-existent user.selinux attribute.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
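
The role of an "OK if not found" flag can be shown with a minimal sketch. `OP_OKNOENT` is a hypothetical stand-in for XFS_DA_OP_OKNOENT, and `lookup()` is a plain array search, not the XFS btree code; the point is only that a miss is treated as a bug unless the caller has declared it acceptable.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define OP_OKNOENT 0x1   /* hypothetical stand-in for XFS_DA_OP_OKNOENT */

/*
 * Return the index of `name`, or -ENOENT on a miss. A miss trips the
 * assertion unless the caller passed OP_OKNOENT.
 */
static int lookup(const char *const *names, int count, const char *name,
                  int op_flags)
{
    for (int i = 0; i < count; i++)
        if (strcmp(names[i], name) == 0)
            return i;

    assert(op_flags & OP_OKNOENT);   /* miss must have been allowed */
    return -ENOENT;
}
```

A removal path that accepts arbitrary user-supplied names would pass `OP_OKNOENT`, so asking for a non-existent name yields -ENOENT rather than an assertion failure.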
     
  • When an inode is truncated down, speculative preallocation is
    removed from the inode. This should also reset the state bits for
    controlling whether preallocation is subsequently removed when the
    file is next closed. The flag is not being cleared, so repeated
    operations on a file that first involve a truncate (e.g. multiple
    repeated dd invocations on a file) give different file layouts for
    the second and subsequent invocations.

    Fix this by clearing the XFS_IDIRTY_RELEASE state bit when the
    XFS_ITRUNCATED bit is detected in xfs_release() and hence ensure
    that speculative delalloc is removed on files that have been
    truncated down.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
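
A simplified model of the flag handling described above, under stated assumptions: the bit values, `struct inode_state`, the `dirty` parameter and `release_trims_prealloc()` are all hypothetical, and the real xfs_release() does considerably more. The sketch shows only that clearing the "dirty release" bit when the "truncated" bit is seen makes the next close trim preallocation again.

```c
#include <assert.h>
#include <stdbool.h>

#define IDIRTY_RELEASE 0x1u   /* hypothetical bit values */
#define ITRUNCATED     0x2u

struct inode_state {
    unsigned int flags;
};

/*
 * Model of a close: returns true if speculative preallocation would be
 * trimmed on this release.
 */
static bool release_trims_prealloc(struct inode_state *ip, bool dirty)
{
    /* A truncate resets the heuristic, so preallocation is trimmed again. */
    if (ip->flags & ITRUNCATED)
        ip->flags &= ~(ITRUNCATED | IDIRTY_RELEASE);

    /* A previous dirty close means preallocation is kept this time. */
    if (ip->flags & IDIRTY_RELEASE)
        return false;

    if (dirty)
        ip->flags |= IDIRTY_RELEASE;
    return true;
}
```

Without the reset in the first branch, a file that was truncated and rewritten would keep the stale bit and be laid out differently on the second and later runs.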
     
  • XFS inodes have several per-lifetime state fields that determine the
    behaviour of the inode. These state fields are not all reset when an
    inode is reused from the reclaimable state.

    This can lead to unexpected behaviour of the new inode such as
    speculative preallocation not being truncated away in the expected
    manner for local files until the inode is subsequently truncated,
    freed or cycles out of the cache. It can also lead to an inode being
    considered to be a filestream inode or having been truncated when
    that is not the case.

    Rework the reinitialisation of the inode when it is recycled to
    ensure that it is pristine before it is reused. While there, also
    fix the resetting of state flags in the recycling error paths so the
    inode does not become unreclaimable.

    Signed-off-by: Dave Chinner
    Signed-off-by: Alex Elder

    Dave Chinner
     
  • Hopefully last version. Base signing check on CAP_UNIX instead of
    tcon->unix_ext, also clean up the comments a bit more.

    According to Hongwei Sun's blog posting here:

    http://blogs.msdn.com/b/openspecification/archive/2009/04/10/smb-maximum-transmit-buffer-size-and-performance-tuning.aspx

    CAP_LARGE_WRITEX is ignored when signing is active. Also, the maximum
    size for a write without CAP_LARGE_WRITEX should be the maxBuf that
    the server sent in the NEGOTIATE response.

    Fix the wsize negotiation to take this into account. While we're at it,
    alter the other wsize definitions to use sizeof(WRITE_REQ) to allow for
    slightly larger amounts of data to potentially be written per request.

    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
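
The negotiation rule described above might be sketched as follows. This is an interpretation rather than the CIFS code: the capability bit values and `WRITE_REQ_SIZE` (standing in for sizeof(WRITE_REQ)) are made-up numbers, `negotiate_wsize()` is a hypothetical helper, and treating CAP_UNIX as exempting the signing restriction is an assumption drawn from the first sentence of the message.

```c
#include <assert.h>
#include <stdbool.h>

#define CAP_LARGE_WRITEX 0x1u   /* hypothetical bit values */
#define CAP_UNIX         0x2u
#define WRITE_REQ_SIZE   64u    /* hypothetical stand-in for sizeof(WRITE_REQ) */

/*
 * Pick a write size: large writes are honoured only if the server offers
 * CAP_LARGE_WRITEX and signing does not disable it; otherwise the write
 * must fit in the server's maxBuf alongside the request header.
 */
static unsigned int negotiate_wsize(unsigned int requested, unsigned int caps,
                                    bool signing, unsigned int max_buf)
{
    bool large_ok = (caps & CAP_LARGE_WRITEX) &&
                    (!signing || (caps & CAP_UNIX));
    unsigned int limit;

    if (large_ok)
        return requested;

    assert(max_buf > WRITE_REQ_SIZE);
    limit = max_buf - WRITE_REQ_SIZE;
    return requested < limit ? requested : limit;
}
```

With signing active and no CAP_UNIX, a 64 KiB request against a 16644-byte maxBuf is clamped to the space left after the header.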
     

23 Jun, 2011

2 commits


22 Jun, 2011

2 commits

  • * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
    NFS: Fix decode_secinfo_maxsz
    NFSv4.1: Fix an off-by-one error in pnfs_generic_pg_test
    NFSv4.1: Fix some issues with pnfs_generic_pg_test
    NFSv4.1: file layout must consider pg_bsize for coalescing
    pnfs-obj: No longer needed to take an extra ref at add_device
    SUNRPC: Ensure the RPC client only quits on fatal signals
    NFSv4: Fix a readdir regression
    nfs4.1: mark layout as bad on error path in _pnfs_return_layout
    nfs4.1: prevent race that allowed use of freed layout in _pnfs_return_layout
    NFSv4.1: need to put_layout_hdr on _pnfs_return_layout error path
    NFS: (d)printks should use %zd for ssize_t arguments
    NFSv4.1: fix break condition in pnfs_find_lseg
    nfs4.1: fix several problems with _pnfs_return_layout
    NFSv4.1: allow zero fh array in filelayout decode layout
    NFSv4.1: allow nfs_fhget to succeed with mounted on fileid
    NFSv4.1: Fix a refcounting issue in the pNFS device id cache
    NFSv4.1: deprecate headerpadsz in CREATE_SESSION
    NFS41: do not update isize if inode needs layoutcommit
    NLM: Don't hang forever on NLM unlock requests
    NFS: fix umount of pnfs filesystems

    Linus Torvalds
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    jbd2: Fix oops in jbd2_journal_remove_journal_head()
    jbd2: Remove obsolete parameters in the comments for some jbd2 functions
    ext4: fixed tracepoints cleanup
    ext4: use FIEMAP_EXTENT_LAST flag for last extent in fiemap
    ext4: Fix max file size and logical block counting of extent format file
    ext4: correct comments for ext4_free_blocks()

    Linus Torvalds
     

21 Jun, 2011

2 commits