20 Jul, 2010

4 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
    Btrfs: fix checks in BTRFS_IOC_CLONE_RANGE
    Btrfs: fix CLONE ioctl destination file size expansion to block boundary
    Btrfs: fix split_leaf double split corner case

    Linus Torvalds
     
  • 1. The BTRFS_IOC_CLONE and BTRFS_IOC_CLONE_RANGE ioctls should check
    whether the donor file is append-only before writing to it.

    2. The BTRFS_IOC_CLONE_RANGE ioctl appears to have an integer
    overflow that allows a user to specify an out-of-bounds range to copy
    from the source file (if off + len wraps around). I haven't been able
    to successfully exploit this, but I'd imagine that a clever attacker
    could use this to read things he shouldn't. Even if it's not
    exploitable, it couldn't hurt to be safe.

    Signed-off-by: Dan Rosenberg
    cc: stable@kernel.org
    Signed-off-by: Chris Mason

    Dan Rosenberg
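
The wrap-around described in point 2 can be avoided by never computing off + len before it is known to fit. The following is a minimal user-space sketch of that idea, not the patch itself; the function names and the plain uint64_t parameters are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when [off, off + len) lies entirely inside a source file of
 * src_size bytes. off + len is never computed, so nothing can wrap around.
 * (Hypothetical helper, not a kernel function.)
 */
static bool clone_range_in_bounds(uint64_t off, uint64_t len, uint64_t src_size)
{
    if (off > src_size)
        return false;
    /* src_size - off cannot underflow because off <= src_size here. */
    return len <= src_size - off;
}

/* Naive variant, shown only for comparison: off + len may wrap to a small value. */
static bool clone_range_in_bounds_naive(uint64_t off, uint64_t len, uint64_t src_size)
{
    return off + len <= src_size;
}
```

With off = 4096 and len = UINT64_MAX - 4095 the sum wraps to 0, so the naive variant accepts the range while the subtraction-based check rejects it.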
     
  • The CLONE and CLONE_RANGE ioctls round up the range of extents being
    cloned to the block size when the range to clone extends to the end of file
    (this is always the case with CLONE). The code then used that rounded-up offset
    when extending the destination file's i_size. Fix this by not setting i_size
    beyond the originally requested ending offset.

    This bug was introduced by a22285a6 (2.6.35-rc1).

    Signed-off-by: Sage Weil
    Signed-off-by: Chris Mason

    Sage Weil
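
The difference between the rounded-up extent range and the requested end offset can be sketched as follows. This is an illustrative user-space model only; both helper names are hypothetical and the block size is assumed to be a power of two.

```c
#include <stdint.h>

/* Round x up to the next multiple of blocksize (assumed to be a power of two). */
static uint64_t round_up_block(uint64_t x, uint64_t blocksize)
{
    return (x + blocksize - 1) & ~(blocksize - 1);
}

/*
 * Destination size after cloning len bytes to destoff. The extents copied
 * may cover up to round_up_block(destoff + len, blocksize), but the file
 * size only grows to the end offset that was actually requested.
 */
static uint64_t dest_size_after_clone(uint64_t old_size, uint64_t destoff,
                                      uint64_t len)
{
    uint64_t requested_end = destoff + len;

    return requested_end > old_size ? requested_end : old_size;
}
```

Cloning 5000 bytes into an empty file with 4096-byte blocks copies extents up to offset 8192, while the resulting file size in this model stays at 5000.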
     
  • split_leaf was not properly balancing leaves when it was forced to
    split a leaf twice. This commit adds an extra push left and right
    before forcing the double split in hopes of getting the slot where
    we want to insert at either the start or end of the leaf.

    If the extra pushes do work, then we are able to avoid splitting twice
    and we keep the tree properly balanced.

    Signed-off-by: Chris Mason

    Chris Mason
     

06 Jul, 2010

1 commit


12 Jun, 2010

14 commits


11 Jun, 2010

4 commits

  • When we use remap_file_pages() to remap a file, remap_file_pages() always returns
    an error. This is because btrfs did not set VM_CAN_NONLINEAR for the vma.

    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie
     
  • refs can be used with uninitialized data if btrfs_lookup_extent_info()
    fails on the first pass through the loop. In the original code, if that
    happens, check_path_shared() probably returns 1; this patch
    makes it always return 1 in that case, for safety.

    Signed-off-by: Dan Carpenter
    Signed-off-by: Chris Mason

    Dan Carpenter
     
  • Seems that when btrfs_fallocate was converted to use the new ENOSPC stuff we
    dropped passing the mode to the function that actually does the preallocation.
    This breaks anybody who wants to use FALLOC_FL_KEEP_SIZE. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
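
Why the mode has to reach the preallocation code can be shown with a small model of the file-size rule from fallocate(2): without FALLOC_FL_KEEP_SIZE the size grows to offset + len when that is larger, and with the flag it is left alone. The helper below is a hypothetical user-space sketch, not the btrfs function.

```c
#include <stdint.h>
#include <linux/falloc.h>   /* FALLOC_FL_KEEP_SIZE */

/*
 * File size after preallocating [offset, offset + len) with the given mode.
 * If the mode is dropped on the way down (treated as 0), the KEEP_SIZE
 * branch is never taken and the file grows unexpectedly.
 */
static uint64_t size_after_prealloc(uint64_t old_size, uint64_t offset,
                                    uint64_t len, int mode)
{
    uint64_t end = offset + len;

    if (mode & FALLOC_FL_KEEP_SIZE)
        return old_size;
    return end > old_size ? end : old_size;
}
```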
     
  • We cannot use a loop device which has been connected to a file in btrfs.

    The steps to reproduce are as follows:
    # dd if=/dev/zero of=vdev0 bs=1M count=1024
    # losetup /dev/loop0 vdev0
    # mkfs.btrfs /dev/loop0
    ...
    failed to zero device start -5

    The reason is that btrfs doesn't implement either ->write_begin or ->write of
    the VFS API, so we fix it by setting ->write to do_sync_write().

    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie
     

28 May, 2010

2 commits

  • Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (27 commits)
    Btrfs: add more error checking to btrfs_dirty_inode
    Btrfs: allow unaligned DIO
    Btrfs: drop verbose enospc printk
    Btrfs: Fix block generation verification race
    Btrfs: fix preallocation and nodatacow checks in O_DIRECT
    Btrfs: avoid ENOSPC errors in btrfs_dirty_inode
    Btrfs: move O_DIRECT space reservation to btrfs_direct_IO
    Btrfs: rework O_DIRECT enospc handling
    Btrfs: use async helpers for DIO write checksumming
    Btrfs: don't walk around with task->state != TASK_RUNNING
    Btrfs: do aio_write instead of write
    Btrfs: add basic DIO read/write support
    direct-io: do not merge logically non-contiguous requests
    direct-io: add a hook for the fs to provide its own submit_bio function
    fs: allow short direct-io reads to be completed via buffered IO
    Btrfs: Metadata ENOSPC handling for balance
    Btrfs: Pre-allocate space for data relocation
    Btrfs: Metadata ENOSPC handling for tree log
    Btrfs: Metadata reservation for orphan inodes
    Btrfs: Introduce global metadata reservation
    ...

    Linus Torvalds
     

27 May, 2010

5 commits


26 May, 2010

4 commits

  • btrfs_dirty_inode tries to sneak in without much waiting or
    space reservation, mostly for performance reasons. This
    usually works well but can cause problems when there are
    many many writers.

    When btrfs_update_inode fails with ENOSPC, we fall back
    to a slower btrfs_start_transaction call that will reserve
    some space.

    Signed-off-by: Chris Mason

    Chris Mason
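
The fast-path-then-fallback pattern described above can be modelled in a few lines. Everything here is a hypothetical stand-in: struct fake_inode and the three functions are invented for illustration and only mimic the control flow, not the real btrfs code.

```c
#include <errno.h>

/* Toy inode: records whether the cheap path can succeed and how often we fell back. */
struct fake_inode {
    int unreserved_space_left;  /* nonzero: the fast path finds space */
    int slow_path_taken;        /* number of times the fallback ran */
};

/* Fast path: no waiting and no reservation, so it may fail with -ENOSPC. */
static int update_inode_fast(struct fake_inode *inode)
{
    return inode->unreserved_space_left ? 0 : -ENOSPC;
}

/* Slow path: stands in for starting a transaction that reserves space first. */
static int update_inode_reserved(struct fake_inode *inode)
{
    inode->slow_path_taken++;
    return 0;
}

/* Try the cheap update; retry through the reserving path only on -ENOSPC. */
static int dirty_inode(struct fake_inode *inode)
{
    int ret = update_inode_fast(inode);

    if (ret == -ENOSPC)
        ret = update_inode_reserved(inode);
    return ret;
}
```

Other errors from the fast path are returned unchanged in this sketch; only -ENOSPC triggers the slower call.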
     
  • This moves the delalloc space reservation done for O_DIRECT
    into btrfs_direct_IO. This way we don't leak reserved space
    if the generic O_DIRECT write code errors out before it
    calls into btrfs_direct_IO.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • This changes O_DIRECT write code to mark extents as delalloc
    while it is processing them. Yan Zheng has reworked the
    enospc accounting based on tracking delalloc extents and
    this makes it much easier to track enospc in the O_DIRECT code.

    There are a few special cases with the O_DIRECT code though:
    it only sets the EXTENT_DELALLOC bits, instead of doing
    EXTENT_DELALLOC | EXTENT_DIRTY | EXTENT_UPTODATE, because
    we don't want to mess with clearing the dirty and uptodate
    bits when things go wrong. This is important because there
    are no pages in the page cache, so any extent state structs
    that we put in the tree won't get freed by releasepage. We have
    to clear them ourselves as the DIO ends.

    With this commit, we reserve space in btrfs_file_aio_write,
    and then as each btrfs_direct_IO call progresses it sets
    EXTENT_DELALLOC on the range.

    btrfs_get_blocks_direct is responsible for clearing the delalloc
    at the same time it drops the extent lock.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • This adds:
    alias: devname:
    to some common kernel modules, which will allow the on-demand loading
    of the kernel module when the device node is accessed.

    Ideally all these modules would be compiled-in, but distros seem so
    much in love with their modularization that we need to cover the common
    cases with this new facility. It will allow us to remove a bunch of pretty
    useless init scripts and modprobes from init scripts.

    The static device node aliases will be carried in the module itself. The
    program depmod will extract this information to a file in the module directory:
    $ cat /lib/modules/2.6.34-00650-g537b60d-dirty/modules.devname
    # Device nodes to trigger on-demand module loading.
    microcode cpu/microcode c10:184
    fuse fuse c10:229
    ppp_generic ppp c108:0
    tun net/tun c10:200
    dm_mod mapper/control c10:235

    Udev will pick up the depmod created file on startup and create all the
    static device nodes which the kernel modules specify, so that these modules
    get automatically loaded when the device node is accessed:
    $ /sbin/udevd --debug
    ...
    static_dev_create_from_modules: mknod '/dev/cpu/microcode' c10:184
    static_dev_create_from_modules: mknod '/dev/fuse' c10:229
    static_dev_create_from_modules: mknod '/dev/ppp' c108:0
    static_dev_create_from_modules: mknod '/dev/net/tun' c10:200
    static_dev_create_from_modules: mknod '/dev/mapper/control' c10:235
    udev_rules_apply_static_dev_perms: chmod '/dev/net/tun' 0666
    udev_rules_apply_static_dev_perms: chmod '/dev/fuse' 0666

    A few device nodes are switched to statically allocated numbers, to allow
    the static nodes to work. This might also be useful for systems which still run
    a plain static /dev, which is completely unsafe to use with any dynamic minor
    numbers.

    Note:
    The devname aliases must be limited to the *common* and *single*instance*
    device nodes, like the misc devices, and never be used for conceptually limited
    systems like the loop devices, which should rather get fixed properly and get a
    control node for losetup to talk to, instead of creating a random number of
    device nodes in advance, regardless if they are ever used.

    This facility is to hide the mess distros are creating with too modularized
    kernels, and just to hide that these modules are not compiled-in, and not to
    paper-over broken concepts. Thanks! :)

    Cc: Greg Kroah-Hartman
    Cc: David S. Miller
    Cc: Miklos Szeredi
    Cc: Chris Mason
    Cc: Alasdair G Kergon
    Cc: Tigran Aivazian
    Cc: Ian Kent
    Signed-Off-By: Kay Sievers
    Signed-off-by: Greg Kroah-Hartman

    Kay Sievers
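
The modules.devname lines shown in this commit message have the form "module node-path cMAJOR:MINOR", with "#" starting a comment. A minimal parser for that layout might look like the sketch below. The struct, the function name and the buffer sizes are assumptions for illustration, and accepting 'b' for block devices is also an assumption, since the listing above only contains character devices.

```c
#include <stdio.h>

/* One parsed modules.devname entry (hypothetical layout for this sketch). */
struct devname_entry {
    char module[64];     /* kernel module name, e.g. "tun" */
    char node[128];      /* device node path relative to /dev, e.g. "net/tun" */
    char type;           /* 'c' (seen in the listing) or 'b' (assumed) */
    unsigned int major;
    unsigned int minor;
};

/* Parse one line. Returns 0 on success, -1 for comments, blank or malformed lines. */
static int parse_devname_line(const char *line, struct devname_entry *e)
{
    if (line[0] == '#' || line[0] == '\n' || line[0] == '\0')
        return -1;
    /* Field widths keep sscanf from overrunning the fixed-size buffers. */
    if (sscanf(line, "%63s %127s %c%u:%u",
               e->module, e->node, &e->type, &e->major, &e->minor) != 5)
        return -1;
    if (e->type != 'c' && e->type != 'b')
        return -1;
    return 0;
}
```

For the line "tun net/tun c10:200" this yields module "tun", node "net/tun", type 'c', major 10 and minor 200, which is the information udev needs for the mknod calls in the debug output above.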
     

25 May, 2010

6 commits

  • The async helper threads offload crc work onto all the
    CPUs, and make streaming writes much faster. This
    changes the O_DIRECT write code to use them. The only
    small complication was that we need to pass in the
    logical offset in the file for each bio, because we can't
    find it in the bio's pages.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • Yan Zheng noticed two places we were doing a lot of work
    without task->state set to TASK_RUNNING. This sets the state
    properly after we get ready to sleep but decide not to.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • In order for AIO to work, we need to implement aio_write. This patch converts
    our btrfs_file_write to btrfs_aio_write. I've tested this with xfstests and
    nothing broke, and the AIO stuff magically started working. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • This provides basic DIO support for reading and writing. It does not do the
    work to recover from mismatching checksums; that will come later. A few design
    changes have been made from Jim's code (sorry Jim!)

    1) Use the generic direct-io code. Jim originally re-wrote all the generic DIO
    code in order to account for all of BTRFS's oddities, but thanks to that work it
    seems like the best bet is to just ignore compression and such and just opt to
    fall back on buffered IO.

    2) Fall back on buffered IO for compressed or inline extents. Jim's code did
    its own buffering to make dio with compressed extents work. Now we just
    fall back onto normal buffered IO.

    3) Use ordered extents for the writes so that all of the

    lock_extent()
    lookup_ordered()

    type checks continue to work.

    4) Do the lock_extent() lookup_ordered() loop in readpage so we don't race with
    DIO writes.

    I've tested this with fsx and everything works great. This patch depends on my
    dio and filemap.c patches to work. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • This patch adds metadata ENOSPC handling for the balance code.
    It consists of the following major changes:

    1. Avoid COWing tree leaves in the tree-merging phase.

    2. Handle interaction with snapshot creation.

    3. Allow the backref cache to live across transactions.

    Signed-off-by: Yan Zheng
    Signed-off-by: Chris Mason

    Yan, Zheng
     
  • Pre-allocate space for data relocation. This can detect an ENOSPC
    condition caused by fragmentation of free space.

    Signed-off-by: Yan Zheng
    Signed-off-by: Chris Mason

    Yan, Zheng