24 Nov, 2011

2 commits

  • Dirty pages weren't being written back when an mmap'ed eCryptfs file was
    closed before the mapping was unmapped. Since f_ops->flush() is not
    called by the munmap() path, the lower file was simply being released.
    This patch flushes the eCryptfs file in the vm_ops->close() path.

    https://launchpad.net/bugs/870326

    Signed-off-by: Tyler Hicks
    Cc: stable@kernel.org [2.6.39+]

    Tyler Hicks
     
  • The file creation path prematurely called d_instantiate() and
    unlock_new_inode() before the eCryptfs inode info was fully
    allocated and initialized and before the eCryptfs metadata was written
    to the lower file.

    This could result in race conditions in subsequent file and inode
    operations leading to unexpected error conditions or a null pointer
    dereference while attempting to use the unallocated memory.

    https://launchpad.net/bugs/813146

    Signed-off-by: Tyler Hicks
    Cc: stable@kernel.org

    Tyler Hicks
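
The access pattern behind the first fix in this group (an mmap'ed file that is closed before the mapping is unmapped) can be sketched from user space as follows. This only illustrates the sequence of calls that exposed the bug, not the eCryptfs patch itself. The helper name `mmap_close_then_unmap`, the temporary path, and the 4096-byte size are assumptions made for the example.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: dirty a page through a shared mapping after the
 * file descriptor has been closed, unmap, then read the file back.
 * Returns 0 if the data reached the file, -1 otherwise.
 */
int mmap_close_then_unmap(void)
{
    char path[] = "/tmp/mmap-demo-XXXXXX";   /* assumed scratch location */
    const char msg[] = "dirty page";
    char back[sizeof(msg)] = {0};
    int ok = -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, 4096) != 0) {
        close(fd);
        unlink(path);
        return -1;
    }

    char *map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        unlink(path);
        return -1;
    }

    close(fd);                      /* the file is closed first ...      */
    memcpy(map, msg, sizeof(msg));  /* ... a page is dirtied afterwards  */
    munmap(map, 4096);              /* ... and the mapping goes last     */

    fd = open(path, O_RDONLY);
    if (fd >= 0) {
        if (read(fd, back, sizeof(back)) == (ssize_t)sizeof(back) &&
            memcmp(back, msg, sizeof(msg)) == 0)
            ok = 0;
        close(fd);
    }
    unlink(path);
    return ok;
}
```

On a file system that handles this sequence correctly, the helper should return 0. Running it on a stacked file system such as eCryptfs would require pointing the scratch path at a mount of that type.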
     

19 Nov, 2011

2 commits


18 Nov, 2011

1 commit

  • * 'for-linus' of git://git.kernel.dk/linux-block:
    block: add missed trace_block_plug
    paride: fix potential information leak in pg_read()
    bio: change some signed vars to unsigned
    block: avoid unnecessary plug list flush
    cciss: auto engage SCSI mid layer at driver load time
    loop: cleanup set_status interface
    include/linux/bio.h: use a static inline function for bio_integrity_clone()
    loop: prevent information leak after failed read
    block: Always check length of all iov entries in blk_rq_map_user_iov()
    The Windows driver .inf disables ASPM on all cciss devices. Do the same.
    backing-dev: ensure wakeup_timer is deleted
    block: Revert "[SCSI] genhd: add a new attribute "alias" in gendisk"

    Linus Torvalds
     

17 Nov, 2011

3 commits


16 Nov, 2011

3 commits

  • This is just a cleanup patch to silence a static checker warning.

    The problem is that we cap "nr_iovecs" so it can't be larger than
    "UIO_MAXIOV" but we don't check for negative values. It turns out this is
    prevented at other layers, but logically it doesn't make sense to have
    negative nr_iovecs so making it unsigned is nicer.

    Signed-off-by: Dan Carpenter
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Dan Carpenter
     
  • The doalloc arg in xfs_qm_dqattach_one() is a flag that indicates
    whether a new area to handle quota information will be allocated
    if needed. Originally it was passed to xfs_qm_dqget(), but it was
    removed by the following commit (probably by mistake):

    commit 8e9b6e7fa4544ea8a0e030c8987b918509c8ff47
    Author: Christoph Hellwig
    Date: Sun Feb 8 21:51:42 2009 +0100

    xfs: remove the unused XFS_QMOPT_DQLOCK flag

    As a result, xfs_qm_dqget() called from xfs_qm_dqattach_one()
    never allocates the new area even if it is needed.

    This patch gives the doalloc arg to xfs_qm_dqget() in
    xfs_qm_dqattach_one() to fix this problem.

    Signed-off-by: Mitsuo Hayasaka
    Cc: Alex Elder
    Cc: Christoph Hellwig
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Mitsuo Hayasaka
     
  • On a corrupted file system the ->len field could be wrong leading to
    a buffer overflow.

    Reported-and-acked-by: Clement LECIGNE
    Signed-off-by: Dan Carpenter
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Dan Carpenter
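
For the nr_iovecs cleanup in the first commit of this group, the difference between capping a signed and an unsigned count can be sketched as below. The function names are hypothetical and `MAX_IOV` is a stand-in for UIO_MAXIOV with an assumed value; this is not the kernel code.

```c
#define MAX_IOV 1024  /* stand-in for UIO_MAXIOV; value assumed for the sketch */

/* Signed count: a negative value slips past the upper-bound cap. */
int cap_signed(int nr)
{
    if (nr > MAX_IOV)
        nr = MAX_IOV;
    return nr;
}

/* Unsigned count: the same single check covers every representable value. */
unsigned int cap_unsigned(unsigned int nr)
{
    if (nr > MAX_IOV)
        nr = MAX_IOV;
    return nr;
}
```

With the signed version a caller passing -1 gets -1 back, so a separate check is needed elsewhere. With the unsigned version the same bit pattern is a very large number and is capped like any other oversized request.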
     

12 Nov, 2011

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    btrfs: rename the option to nospace_cache
    Btrfs: handle bio_add_page failure gracefully in scrub
    Btrfs: fix deadlock caused by the race between relocation
    Btrfs: only map pages if we know we need them when reading the space cache
    Btrfs: fix orphan backref nodes
    Btrfs: Abstract similar code for btrfs_block_rsv_add{, _noflush}
    Btrfs: fix unreleased path in btrfs_orphan_cleanup()
    Btrfs: fix no reserved space for writing out inode cache
    Btrfs: fix nocow when deleting the item
    Btrfs: tweak the delayed inode reservations again
    Btrfs: rework error handling in btrfs_mount()
    Btrfs: close devices on all error paths in open_ctree()
    Btrfs: avoid null dereference and leaks when bailing from open_ctree()
    Btrfs: fix subvol_name leak on error in btrfs_mount()
    Btrfs: fix memory leak in btrfs_parse_early_options()
    Btrfs: fix our reservations for updating an inode when completing io
    Btrfs: fix oops on NULL trans handle in btrfs_truncate
    btrfs: fix double-free 'tree_root' in 'btrfs_mount()'

    Linus Torvalds
     
  • * 'for-linus' of git://oss.sgi.com/xfs/xfs:
    xfs: fix force shutdown handling in xfs_end_io
    xfs: constify xfs_item_ops
    xfs: Fix possible memory corruption in xfs_readlink

    Linus Torvalds
     

11 Nov, 2011

11 commits

  • Rename no_space_cache option to nospace_cache to be more consistent with
    the rest, where the simple prefix 'no' is used to negate an option.

    The option was introduced during the -rc1 cycle and has not been
    widely used, so the rename is safe.

    Signed-off-by: David Sterba
    Signed-off-by: Chris Mason

    David Sterba
     
    Currently scrub fails with ENOMEM when bio_add_page fails. Unfortunately
    dm based targets accept only one page per bio, which makes scrub always
    fail. This patch submits the current bio when an error is encountered
    and starts a new one.

    Signed-off-by: Arne Jansen
    Signed-off-by: Chris Mason

    Arne Jansen
     
    We cannot do a flushable reservation for relocation when we create a snapshot,
    because the transaction commit task and the flush task may end up waiting for
    each other, which results in a deadlock.

    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie
     
  • People have been running into a warning when loading space cache because the
    page is already mapped when trying to read in a bitmap. The way we read in
    entries and pages is kind of convoluted, so fix it so that io_ctl_read_entry
    maps the entries if it needs to, and if it hits the end of the page it simply
    unmaps the page. That way we can unconditionally unmap the io_ctl before
    reading in the bitmap and we should stop hitting these warnings. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
    If the root node of a fs/file tree is in the block group that is
    being relocated, but the other nodes are not, and we create a snapshot
    of this tree after the relocation tree creation ends but before
    ->create_reloc_tree is set to 0, Btrfs will create some backref nodes
    that are the lowest nodes of the backref cache. But we forget to add
    them to the ->leaves list of the backref cache and deal with them,
    and eventually they trigger BUG_ON().

    kernel BUG at fs/btrfs/relocation.c:239!

    This patch fixes it by adding them into ->leaves list of backref cache.

    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie
     
  • btrfs_block_rsv_add{, _noflush}() have similar code, so abstract that code.

    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie
     
    When we ran a stress test of space relocation, a deadlock happened.
    By debugging, we found that it was caused by forgetting to release
    the read lock of the extent buffers in btrfs_orphan_cleanup() before
    ending the transaction handle. The transaction commit task waited for
    the task that called btrfs_orphan_cleanup() to unlock the extent buffer,
    while that task waited for the commit task to finish the transaction
    commit, so the two deadlocked. Fix it.

    Signed-off-by: Miao Xie

    Signed-off-by: Chris Mason

    Miao Xie
     
    The inode cache forgets to reserve space when it is written out. When
    we run a stress test such as synctest, this triggers WARN_ON() in
    use_block_rsv().

    WARNING: at fs/btrfs/extent-tree.c:5718 btrfs_alloc_free_block+0xbf/0x281 [btrfs]()
    ...
    Call Trace:
    [] warn_slowpath_common+0x80/0x98
    [] warn_slowpath_null+0x15/0x17
    [] btrfs_alloc_free_block+0xbf/0x281 [btrfs]
    [] ? __set_page_dirty_nobuffers+0xfe/0x108
    [] __btrfs_cow_block+0x118/0x3b5 [btrfs]
    [] btrfs_cow_block+0x103/0x14e [btrfs]
    [] btrfs_search_slot+0x249/0x6a4 [btrfs]
    [] btrfs_lookup_inode+0x2a/0x8a [btrfs]
    [] btrfs_update_inode+0xaa/0x141 [btrfs]
    [] btrfs_save_ino_cache+0xea/0x202 [btrfs]
    [] ? btrfs_update_reloc_root+0x17e/0x197 [btrfs]
    [] commit_fs_roots+0xaa/0x158 [btrfs]
    [] btrfs_commit_transaction+0x405/0x731 [btrfs]
    [] ? wake_up_bit+0x25/0x25
    [] ? btrfs_log_dentry_safe+0x43/0x51 [btrfs]
    [] btrfs_sync_file+0x16a/0x198 [btrfs]
    [] ? mntput+0x21/0x23
    [] vfs_fsync_range+0x18/0x21
    [] vfs_fsync+0x17/0x19
    [] do_fsync+0x29/0x3e
    [] sys_fsync+0xb/0xf
    [] system_call_fastpath+0x16/0x1b

    Sometimes it also triggers BUG_ON() in the reservation code of the
    delayed inode.

    So we must reserve enough space for inode cache.

    Note: If we cannot reserve enough space for the inode cache, we
    give up writing it out.

    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie
     
    btrfs_previous_item() only searches the b+ tree; it does not COW the nodes
    or leaves, so if we modify its result, the metadata will be corrupted. Fix it.

    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie
     
  • Chris Mason
     
  • Josef sent along an incremental to the inode reservation
    code to make sure we try and fall back to directly updating
    the inode item if things go horribly wrong.

    This reworks that patch slightly, adding a fallback function
    that will always try to update the inode item directly without
    going through the delayed_inode code.

    Signed-off-by: Chris Mason

    Chris Mason
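
The approach described in the scrub commit of this group (when adding a page to the current bio fails, submit that bio and start a new one instead of returning ENOMEM) can be sketched in user space as below. The `struct batch` type, the helper names, and the one-page capacity are hypothetical stand-ins for a bio and a dm based target; this is not the btrfs code.

```c
#define BATCH_CAP 1   /* hypothetical: the target accepts one page per batch */

struct batch {
    int pages;      /* pages queued in the batch being built */
    int submitted;  /* number of batches submitted so far */
};

/* Try to add a page; returns 0 on success, -1 if the batch is full. */
static int batch_add_page(struct batch *b)
{
    if (b->pages >= BATCH_CAP)
        return -1;
    b->pages++;
    return 0;
}

/* Submit whatever is queued and start an empty batch. */
static void batch_submit(struct batch *b)
{
    if (b->pages > 0) {
        b->submitted++;
        b->pages = 0;
    }
}

/* Queue a page; on a full batch, submit and retry instead of failing. */
int queue_page(struct batch *b)
{
    if (batch_add_page(b) == 0)
        return 0;
    batch_submit(b);
    return batch_add_page(b);
}
```

Queueing three pages with a one-page capacity submits two batches and leaves the third page pending, rather than failing on the second page.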
     

10 Nov, 2011

6 commits

  • This reverts commit aa6afca5bcaba8101f3ea09d5c3e4100b2b9f0e5.

    It escalates some of the google-chrome SELinux problems with ptrace
    ("Check failed: pid_ > 0. Did not find zygote process"), and Andrew
    says that it is also causing mystery lockdep reports.

    Reported-by: Alex Villacís Lasso
    Requested-by: James Morris
    Requested-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Commits 6c41761f and 45ea6095 introduced the possibility of NULL pointer
    dereference on error paths, also we would leave all devices busy and
    leak fs_info with all sub-structures on error when trying to mount an
    already mounted fs to a different directory.

    Fix this by doing all allocations before trying to open any of the
    devices, adjust error path for mount-already-mounted-fs case.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • Fix a bug introduced by 7e662854 where we would leave devices busy on
    certain error paths in open_ctree(). fs_info is guaranteed to be
    non-NULL now so it's safe to dereference it on all error paths.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • Fix bugs introduced by 6c41761f. Firstly, after failing to allocate any
    of the tree roots (first 'goto fail' in open_ctree()) we would
    dereference a NULL fs_info pointer in free_fs_info(). Secondly, after
    failures from init_srcu_struct(), setup_bdi() and new_inode() we would
    leak all earlier allocated roots: fs_info fields haven't been
    initialized yet so free_fs_info() is rendered useless.

    Fix this by initializing fs_info pointer and fs_info fields before any
    allocations happen.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • btrfs_parse_early_options() can fail due to error while scanning devices
    (-o device= option), but still strdup() subvol_name string:

    mount -o subvol=SUBV,device=BAD_DEVICE

    So free subvol_name string on error.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • Don't leak subvol_name string in case multiple subvol= options are
    given. "The latest option is effective" behavior (consistent with
    subvolid= and subvolrootid= options) is preserved.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     

09 Nov, 2011

5 commits

  • People have been reporting ENOSPC crashes in finish_ordered_io. This is because
    we try to steal from the delalloc block rsv to satisfy a reservation to update
    the inode. The problem with this is we don't explicitly save space for updating
    the inode when doing delalloc. This is kind of a problem and we've gotten away
    with this because way back when we just stole from the delalloc reserve without
    any questions, and this worked out fine because generally speaking the leaf had
    been modified either by the mtime update when we did the original write or
    because we just updated the leaf when we inserted the file extent item, only on
    rare occasions had the leaf not actually been modified, and that was still ok
    because we'd just use a block or two out of the over-reservation that is
    delalloc.

    Then came the delayed inode stuff. This is amazing, except it wants a full
    reservation for updating the inode since it may do it at some point down the
    road after we've written the blocks and we have to recow everything again. This
    worked out because the delayed inode stuff just stole from the global reserve,
    that is until recently when I changed that because it caused other problems.

    So here we are, we're doing everything right and being screwed for it. So take
    an extra reservation for the inode at delalloc reservation time and carry it
    through the life of the delalloc reservation. If we need it we can steal it in
    the delayed inode stuff. If we have already stolen it try and do a normal
    metadata reservation. If that fails try to steal from the delalloc reservation.
    If _that_ fails we'll get a WARN_ON() so I can start thinking of a better way to
    solve this and in the meantime we'll steal from the global reserve.

    With this patch I ran xfstests 13 in a loop for a couple of hours and didn't see
    any problems.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • If we fail to reserve space in the transaction during truncate, we can
    error out with a NULL trans handle. The cleanup code needs an extra
    check to make sure we aren't trying to use the bad handle.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • Ensure ioend->io_error gets propagated back to e.g. AIO completions.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Alex Elder

    Christoph Hellwig
     
    The log item ops aren't necessarily the biggest exploit vector, but marking
    them const is easy enough. Also remove the unused xfs_item_ops_t typedef
    while we're at it.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Alex Elder

    Christoph Hellwig
     
  • Fixes a possible memory corruption when the link is larger than
    MAXPATHLEN and XFS_DEBUG is not enabled. This also removes the
    S_ISLNK assert, since the inode mode is checked previously in
    xfs_readlink_by_handle() and via VFS.

    Updated to address concerns raised by Ben Hutchings about the loose
    attention paid to 32- vs 64-bit values, and the lack of handling a
    potentially negative pathlen value:
    - Changed type of "pathlen" to be xfs_fsize_t, to match that of
    ip->i_d.di_size
    - Added checking for a negative pathlen to the too-long pathlen
    test, and generalized the message that gets reported in that case
    to reflect the change
    As a result, if a negative pathlen were encountered, this function
    would return EFSCORRUPTED (and would fail an assertion for a debug
    build)--just as would a too-long pathlen.

    Signed-off-by: Alex Elder
    Signed-off-by: Carlos Maiolino
    Reviewed-by: Christoph Hellwig

    Carlos Maiolino
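
The pathlen handling described in the xfs_readlink commit above (a 64-bit signed length, with negative and too-long values both reported as corruption) can be sketched as below. The function name and both constants are hypothetical stand-ins with assumed values, and `int64_t` stands in for xfs_fsize_t; this is not the XFS code.

```c
#include <stdint.h>

#define DEMO_MAXPATHLEN   1024  /* assumed value for the sketch */
#define DEMO_EFSCORRUPTED 117   /* hypothetical error code for the sketch */

/*
 * Validate a symlink length held in a signed 64-bit type. Keeping the
 * full width matters: a value such as 2^32 would look like 0 if it were
 * truncated to 32 bits before the test.
 */
int check_pathlen(int64_t pathlen)
{
    if (pathlen < 0 || pathlen > DEMO_MAXPATHLEN)
        return DEMO_EFSCORRUPTED;
    return 0;
}
```

Folding the negative case into the too-long test means one branch handles every out-of-range length before any buffer is touched.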
     

08 Nov, 2011

5 commits

  • Mountpoint crossing is similar to following procfs symlinks - we do
    not get ->d_revalidate() called for dentry we have arrived at, with
    unpleasant consequences for NFS4.

    Simple way to reproduce the problem in mainline:

    cat >/tmp/a.c <<'EOF'
    #include <fcntl.h>
    #include <stdio.h>
    int main(void)
    {
            /* request a read lock on the first byte of stdin */
            struct flock fl = {.l_type = F_RDLCK, .l_whence = SEEK_SET, .l_len = 1};
            if (fcntl(0, F_SETLK, &fl))
                    perror("setlk");
            return 0;
    }
    EOF
    cc /tmp/a.c -o /tmp/test

    then on nfs4:

    mount --bind file1 file2
    /tmp/test < file1 # ok
    /tmp/test < file2 # spews "setlk: No locks available"...

    The cause is the missing ->d_revalidate() call after mountpoint
    crossing; that call is where NFS4 would issue the OPEN request to the server.

    The fix is simple - treat mountpoint crossing the same way we deal with
    following procfs-style symlinks. I.e. set LOOKUP_JUMPED...

    Cc: stable@kernel.org
    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     
    On the error path 'tree_root' is freed in 'free_fs_info()'.
    No need to free it explicitly. Noticed by SLUB in debug mode:

    Complete reproducer under usermode linux (discovered on real
    machine):

    bdev=/dev/ubda
    btr_root=/btr
    /mkfs.btrfs $bdev
    mount $bdev $btr_root
    mkdir $btr_root/subvols/
    cd $btr_root/subvols/
    /btrfs su cr foo
    /btrfs su cr bar
    mount $bdev -osubvol=subvols/foo $btr_root/subvols/bar
    umount $btr_root/subvols/bar

    which gives

    device fsid 4d55aa28-45b1-474b-b4ec-da912322195e devid 1 transid 7 /dev/ubda
    =============================================================================
    BUG kmalloc-2048: Object already free
    -----------------------------------------------------------------------------

    INFO: Allocated in btrfs_mount+0x389/0x7f0 age=0 cpu=0 pid=277
    INFO: Freed in btrfs_mount+0x51c/0x7f0 age=0 cpu=0 pid=277
    INFO: Slab 0x0000000062886200 objects=15 used=9 fp=0x0000000070b4d2d0 flags=0x4081
    INFO: Object 0x0000000070b4d2d0 @offset=21200 fp=0x0000000070b4a968
    ...
    Call Trace:
    70b31948: [] print_trailer+0xe2/0x130
    70b31978: [] object_err+0x3a/0x50
    70b319a8: [] free_debug_processing+0x142/0x2a0
    70b319e0: [] btrfs_mount+0x55f/0x7f0
    70b319f8: [] __slab_free+0x221/0x2d0

    Signed-off-by: Sergei Trofimovich
    Cc: Arne Jansen
    Cc: Chris Mason
    Cc: David Sterba
    Signed-off-by: Chris Mason

    slyich@gmail.com
     
  • Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • * git://git.samba.org/sfrench/cifs-2.6:
    CIFS: Cleanup byte-range locking code style
    CIFS: Simplify setlk error handling for mandatory locking

    Linus Torvalds
     
  • * git://git.infradead.org/mtd-2.6: (226 commits)
    mtd: tests: annotate as DANGEROUS in Kconfig
    mtd: tests: don't use mtd0 as a default
    mtd: clean up usage of MTD_DOCPROBE_ADDRESS
    jffs2: add compr=lzo and compr=zlib options
    jffs2: implement mount option parsing and compression overriding
    mtd: nand: initialize ops.mode
    mtd: provide an alias for the redboot module name
    mtd: m25p80: don't probe device which has status of 'disabled'
    mtd: nand_h1900 never worked
    mtd: Add DiskOnChip G3 support
    mtd: m25p80: add EON flash EN25Q32B into spi flash id table
    mtd: mark block device queue as non-rotational
    mtd: r852: make r852_pm_ops static
    mtd: m25p80: add support for at25df321a spi data flash
    mtd: mxc_nand: preset_v1_v2: unlock all NAND flash blocks
    mtd: nand: switch `check_pattern()' to standard `memcmp()'
    mtd: nand: invalidate cache on unaligned reads
    mtd: nand: do not scan bad blocks with NAND_BBT_NO_OOB set
    mtd: nand: wait to set BBT version
    mtd: nand: scrub BBT on ECC errors
    ...

    Fix up trivial conflicts:
    - arch/arm/mach-at91/board-usb-a9260.c
    Merged into board-usb-a926x.c
    - drivers/mtd/maps/lantiq-flash.c
    add_mtd_partitions -> mtd_device_register vs changed to use
    mtd_device_parse_register.

    Linus Torvalds