18 Jan, 2011

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (25 commits)
    Btrfs: forced readonly mounts on errors
    btrfs: Require CAP_SYS_ADMIN for filesystem rebalance
    Btrfs: don't warn if we get ENOSPC in btrfs_block_rsv_check
    btrfs: Fix memory leak in btrfs_read_fs_root_no_radix()
    btrfs: check NULL or not
    btrfs: Don't pass NULL ptr to func that may deref it.
    btrfs: mount failure return value fix
    btrfs: Mem leak in btrfs_get_acl()
    btrfs: fix wrong free space information of btrfs
    btrfs: make the chunk allocator utilize the devices better
    btrfs: restructure find_free_dev_extent()
    btrfs: fix wrong calculation of stripe size
    btrfs: try to reclaim some space when chunk allocation fails
    btrfs: fix wrong data space statistics
    fs/btrfs: Fix build of ctree
    Btrfs: fix off by one while setting block groups readonly
    Btrfs: Add BTRFS_IOC_SUBVOL_GETFLAGS/SETFLAGS ioctls
    Btrfs: Add readonly snapshots support
    Btrfs: Refactor btrfs_ioctl_snap_create()
    btrfs: Extract duplicate decompress code
    ...

    Linus Torvalds
     
  • This patch comes from "Forced readonly mounts on errors" ideas.

    As we know, this is the first step in being more fault tolerant of disk
    corruptions instead of just using BUG() statements.

    The major content:
    - add a framework for generating errors that should result in filesystems
    going readonly.
    - keep FS state in disk super block.
    - make sure that all resources will be freed and released at umount time.
    - make sure that after FS is forced readonly on error, there will be no more
    disk change before FS is corrected. For this, we should stop write operations.

    After this patch is applied, the conversion from BUG() to such a framework can
    happen incrementally.
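
The idea listed above can be sketched in a few lines. This is only an illustrative user-space model, not the kernel code: the class `SketchFs`, its attributes, and `handle_fatal_error()` are hypothetical names invented for this example.

```python
import errno


class SketchFs:
    """Toy model of a filesystem that goes read-only on a fatal error."""

    def __init__(self):
        self.read_only = False         # in-memory "mounted read-only" state
        self.super_error_flag = False  # stands in for FS state kept in the super block
        self.blocks = {}

    def handle_fatal_error(self, code):
        # Instead of crashing (the BUG() approach), record the error state
        # and force the filesystem read-only.
        self.super_error_flag = True
        self.read_only = True
        return code

    def write(self, block, data):
        # Once forced read-only, no further change may reach the "disk".
        if self.read_only:
            raise OSError(errno.EROFS, "filesystem forced read-only after error")
        self.blocks[block] = data
```

In this model a caller that used to hit BUG() would call `handle_fatal_error(errno.EIO)` instead, and every later `write()` fails with EROFS until the filesystem is repaired.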

    Signed-off-by: Liu Bo
    Signed-off-by: Chris Mason

    liubo
     

17 Jan, 2011

2 commits

  • When we store data with a RAID profile in btrfs on two or more disks of
    different sizes, the df command shows some free space in the filesystem even
    though the user cannot actually write any data. In other words, df shows the
    wrong free space information for btrfs.

    # mkfs.btrfs -d raid1 /dev/sda9 /dev/sda10
    # btrfs-show
    Label: none uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64
    Total devices 2 FS bytes used 28.00KB
    devid 1 size 5.01GB used 2.03GB path /dev/sda9
    devid 2 size 10.00GB used 2.01GB path /dev/sda10
    # btrfs device scan /dev/sda9 /dev/sda10
    # mount /dev/sda9 /mnt
    # dd if=/dev/zero of=tmpfile0 bs=4K count=9999999999
    (fill the filesystem)
    # sync
    # df -TH
    Filesystem Type Size Used Avail Use% Mounted on
    /dev/sda9 btrfs 17G 8.6G 5.4G 62% /mnt
    # btrfs-show
    Label: none uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64
    Total devices 2 FS bytes used 3.99GB
    devid 1 size 5.01GB used 5.01GB path /dev/sda9
    devid 2 size 10.00GB used 4.99GB path /dev/sda10

    This happens because btrfs cannot allocate chunks when one of the paired disks
    has no space left. The free space on the other disks can never be used and
    should be subtracted from the total space, but btrfs doesn't subtract it.
    This is confusing to the user.

    This patch fixes it by calculating the free space that can be used to allocate
    chunks.

    Implementation:
    1. get the free space of all the devices, and align it by stripe length.
    2. sort the devices by free space.
    3. check the free space of each device,
    3.1. if it is not zero, check the number of devices that have more free
    space than this device,
    if that number is beyond the min stripe number, the free space can be
    used, and is added to the total free space.
    if that number is below the min stripe number, we cannot use the free
    space, and the check ends.
    3.2. if the free space is zero, check the next device, goto 3.1

    This implementation is essentially a simulated chunk allocation.
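
The steps above can be sketched as a small simulation. This is a simplified model under stated assumptions, not the kernel implementation: the function name `usable_free_space`, the 64 KiB default stripe length, and the choice to count raw (not mirrored) bytes are all assumptions made for illustration.

```python
def usable_free_space(free_bytes, min_stripes, stripe_len=64 * 1024):
    """Estimate raw space that could still be allocated as chunks (sketch)."""
    # Step 1: round each device's free space down to a stripe boundary.
    avail = [free - free % stripe_len for free in free_bytes]
    # Step 2: sort so the device with the most free space comes first.
    avail.sort(reverse=True)

    total = 0
    remaining = len(avail)
    # Step 3: look at the smallest remaining device. Its space is usable only
    # if at least min_stripes devices (itself included) are still candidates.
    while remaining >= min_stripes:
        smallest = avail[remaining - 1]
        if smallest > 0:
            # Pretend to allocate `smallest` bytes on min_stripes devices.
            total += smallest * min_stripes
            for j in range(remaining - min_stripes, remaining):
                avail[j] -= smallest
        # Drop the smallest device and check the next one.
        remaining -= 1
    return total
```

With the full filesystem from the transcript (one device with no free space, one with about 5 GB left) and `min_stripes=2`, this returns 0, which agrees with the corrected df output.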

    After applying this patch, df can show correct space information:
    # df -TH
    Filesystem Type Size Used Avail Use% Mounted on
    /dev/sda9 btrfs 17G 8.6G 0 100% /mnt

    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie
     
  • Josef has implemented mixed data/metadata chunks, we must add those chunks'
    space just like data chunks.

    Signed-off-by: Miao Xie
    Reviewed-by: Josef Bacik
    Signed-off-by: Chris Mason

    Miao Xie
     

13 Jan, 2011

1 commit


22 Dec, 2010

2 commits

  • LZO is a much faster compression algorithm than gzip, so it would allow
    more users to enable transparent compression, and users can choose
    between compression ratio and speed for different applications.

    Usage:

    # mount -t btrfs -o compress[=] dev /mnt
    or
    # mount -t btrfs -o compress-force[=] dev /mnt

    "-o compress" without argument is still allowed for compatibility.

    Compatibility:

    If we mount a filesystem with lzo compression, it will not be able to be
    mounted in old kernels. One reason is that otherwise btrfs would directly
    dump compressed data, which sits in inline extents, to the user.

    Performance:

    The test copied a linux source tarball (~400M) from an ext4 partition
    to the btrfs partition, and then extracted it.

    (time in seconds)
                 lzo     zlib    nocompress
    copy:        10.6    21.7    14.9
    extract:     70.1    94.4    66.6

    (data size in MB)
                 lzo     zlib    nocompress
    copy:        185.87  108.69  394.49
    extract:     193.80  132.36  381.21
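
To make the trade-off in these numbers easier to read, the short sketch below derives size ratios and relative copy times from the "copy" rows. The dictionary and function names are made up for this example; only the figures come from the tables above.

```python
# Figures for the "copy" test, taken from the tables in the commit message.
copy_seconds = {"lzo": 10.6, "zlib": 21.7, "nocompress": 14.9}
copy_size_mb = {"lzo": 185.87, "zlib": 108.69, "nocompress": 394.49}


def size_ratio(algorithm):
    """On-disk size relative to the uncompressed copy (smaller is better)."""
    return copy_size_mb[algorithm] / copy_size_mb["nocompress"]


def time_ratio(algorithm):
    """Copy time relative to the uncompressed copy (smaller is faster)."""
    return copy_seconds[algorithm] / copy_seconds["nocompress"]


for name in ("lzo", "zlib"):
    print(f"{name}: size {size_ratio(name):.1%}, time {time_ratio(name):.2f}x")
```

In this test, lzo stores the data in about 47% of the uncompressed size while copying faster than the uncompressed case; zlib gets down to about 28% but takes longer than no compression at all.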

    Changelog:

    v1 -> v2:
    - Select LZO_COMPRESS and LZO_DECOMPRESS in btrfs Kconfig.
    - Add incompatibility flag.
    - Fix error handling in compress code.

    Signed-off-by: Li Zefan

    Li Zefan
     
  • Make the code aware of compression type, instead of always assuming
    zlib compression.

    Also make the zlib workspace function as common code for all
    compression types.

    Signed-off-by: Li Zefan

    Li Zefan
     

15 Dec, 2010

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
    Btrfs: prevent RAID level downgrades when space is low
    Btrfs: account for missing devices in RAID allocation profiles
    Btrfs: EIO when we fail to read tree roots
    Btrfs: fix compiler warnings
    Btrfs: Make async snapshot ioctl more generic
    Btrfs: pwrite blocked when writing from the mmaped buffer of the same page
    Btrfs: Fix a crash when mounting a subvolume
    Btrfs: fix sync subvol/snapshot creation
    Btrfs: Fix page leak in compressed writeback path
    Btrfs: do not BUG if we fail to remove the orphan item for dead snapshots
    Btrfs: fixup return code for btrfs_del_orphan_item
    Btrfs: do not do fast caching if we are allocating blocks for tree_root
    Btrfs: deal with space cache errors better
    Btrfs: fix use after free in O_DIRECT

    Linus Torvalds
     

11 Dec, 2010

1 commit

  • We should drop dentry before deactivating the superblock, otherwise
    we can hit this bug:

    BUG: Dentry f349a690{i=100,n=/} still in use (1) [unmount of btrfs loop1]
    ...

    Steps to reproduce the bug:

    # mount /dev/loop1 /mnt
    # mkdir save
    # btrfs subvolume snapshot /mnt save/snap1
    # umount /mnt
    # mount -o subvol=save/snap1 /dev/loop1 /mnt
    (crash)

    Reported-by: Michael Niederle
    Signed-off-by: Li Zefan
    Signed-off-by: Chris Mason

    Li Zefan
     

30 Nov, 2010

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (24 commits)
    Btrfs: don't use migrate page without CONFIG_MIGRATION
    Btrfs: deal with DIO bios that span more than one ordered extent
    Btrfs: setup blank root and fs_info for mount time
    Btrfs: fix fiemap
    Btrfs - fix race between btrfs_get_sb() and umount
    Btrfs: update inode ctime when using links
    Btrfs: make sure new inode size is ok in fallocate
    Btrfs: fix typo in fallocate to make it honor actual size
    Btrfs: avoid NULL pointer deref in try_release_extent_buffer
    Btrfs: make btrfs_add_nondir take parent inode as an argument
    Btrfs: hold i_mutex when calling btrfs_log_dentry_safe
    Btrfs: use dget_parent where we can UPDATED
    Btrfs: fix more ESTALE problems with NFS
    Btrfs: handle NFS lookups properly
    btrfs: make 1-bit signed fileds unsigned
    btrfs: Show device attr correctly for symlinks
    btrfs: Set file size correctly in file clone
    btrfs: Check if dest_offset is block-size aligned before cloning file
    Btrfs: handle the space_cache option properly
    btrfs: Fix early enospc because 'unused' calculated with wrong sign.
    ...

    Linus Torvalds
     

28 Nov, 2010

2 commits

  • There is a problem with how we use sget, it searches through the list of supers
    attached to the fs_type looking for a super with the same fs_devices as what
    we're trying to mount. This depends on sb->s_fs_info being filled, but we don't
    fill that in until we get to btrfs_fill_super, so we could hit supers on the
    fs_type super list that have a null s_fs_info. In order to fix that we need to
    go ahead and setup a blank root with a blank fs_info to hold fs_devices, that
    way our test will work out right and then we can set s_fs_info in
    btrfs_set_super, and then open_ctree will simply use our pre-allocated root and
    fs_info when setting everything up. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • When mounting a btrfs file system btrfs_test_super() may attempt to
    use sb->s_fs_info, the btrfs root, of a super block that is going away
    and that has had the btrfs root set to NULL in its ->put_super(). But
    if the super block is going away it cannot be an existing super block
    so we can return false in this case.
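
A rough model of the check described above is given below. It is a sketch only: `FsInfo`, `SuperBlock`, and `super_matches` are hypothetical stand-ins for the kernel structures and for btrfs_test_super(), and a plain string stands in for the fs_devices pointer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FsInfo:
    fs_devices: str  # stand-in for the fs_devices pointer


@dataclass
class SuperBlock:
    s_fs_info: Optional[FsInfo]  # None models a super block that is going away


def super_matches(sb, wanted_fs_devices):
    """Return True only if sb is a live super block for the wanted devices."""
    # A super block whose s_fs_info was already cleared cannot be an existing
    # mount of this filesystem, so report "no match" instead of dereferencing.
    if sb.s_fs_info is None:
        return False
    return sb.s_fs_info.fs_devices == wanted_fs_devices
```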

    Signed-off-by: Ian Kent
    Signed-off-by: Chris Mason

    Ian Kent
     

22 Nov, 2010

1 commit

  • When I added the clear_cache option I screwed up and took the break out of
    the space_cache case statement, so whenever you mount with space_cache you also
    get clear_cache, which does you no good if you say set space_cache in fstab so
    it always gets set. This patch adds the break back in properly.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     

31 Oct, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (39 commits)
    Btrfs: deal with errors from updating the tree log
    Btrfs: allow subvol deletion by unprivileged user with -o user_subvol_rm_allowed
    Btrfs: make SNAP_DESTROY async
    Btrfs: add SNAP_CREATE_ASYNC ioctl
    Btrfs: add START_SYNC, WAIT_SYNC ioctls
    Btrfs: async transaction commit
    Btrfs: fix deadlock in btrfs_commit_transaction
    Btrfs: fix lockdep warning on clone ioctl
    Btrfs: fix clone ioctl where range is adjacent to extent
    Btrfs: fix delalloc checks in clone ioctl
    Btrfs: drop unused variable in block_alloc_rsv
    Btrfs: cleanup warnings from gcc 4.6 (nonbugs)
    Btrfs: Fix variables set but not read (bugs found by gcc 4.6)
    Btrfs: Use ERR_CAST helpers
    Btrfs: use memdup_user helpers
    Btrfs: fix raid code for removing missing drives
    Btrfs: Switch the extent buffer rbtree into a radix tree
    Btrfs: restructure try_release_extent_buffer()
    Btrfs: use the flusher threads for delalloc throttling
    Btrfs: tune the chunk allocation to 5% of the FS as metadata
    ...

    Fix up trivial conflicts in fs/btrfs/super.c and fs/fs-writeback.c, and
    remove use of INIT_RCU_HEAD in fs/btrfs/extent_io.c (that init macro was
    useless and removed in commit 5e8067adfdba: "rcu head remove init")

    Linus Torvalds
     

30 Oct, 2010

3 commits

  • Add a mount option user_subvol_rm_allowed that allows users to delete a
    (potentially non-empty!) subvol when they would otherwise be allowed to do
    an rmdir(2). We duplicate the may_delete() checks from the core VFS code
    to implement identical security checks (minus the directory size check).
    We additionally require that the user has write+exec permission on the
    subvol root inode.

    Signed-off-by: Sage Weil
    Signed-off-by: Chris Mason

    Sage Weil
     
  • These are all the cases where a variable is set, but not read which are
    not bugs as far as I can see, but simply leftovers.

    Still needs more review.

    Found by gcc 4.6's new warnings

    Signed-off-by: Andi Kleen
    Cc: Chris Mason
    Signed-off-by: Andrew Morton
    Signed-off-by: Chris Mason

    Andi Kleen
     
  • Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)). The former makes the
    purpose of the operation clearer; the latter otherwise looks like a
    no-op.

    The semantic patch that makes this change is as follows:
    (http://coccinelle.lip6.fr/)

    //
    @@
    type T;
    T x;
    identifier f;
    @@

    T f (...) { }

    @@
    expression x;
    @@

    - ERR_PTR(PTR_ERR(x))
    + ERR_CAST(x)
    //

    Signed-off-by: Julia Lawall
    Cc: Chris Mason
    Signed-off-by: Andrew Morton
    Signed-off-by: Chris Mason

    Julia Lawall
     

29 Oct, 2010

4 commits

  • Conflicts:
    fs/btrfs/extent-tree.c

    Signed-off-by: Chris Mason

    Chris Mason
     
  • If something goes wrong with the free space cache we need a way to make sure
    it's not loaded on mount and that it's cleared for everybody. When you pass the
    clear_cache option it will make it so all block groups are setup to be cleared,
    which keeps them from being loaded and then they will be truncated when the
    transaction is committed. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • In order to save free space cache, we need an inode to hold the data, and we
    need a special item to point at the right inode for the right block group. So
    first, create a special item that will point to the right inode, and the number
    of extent entries we will have and the number of bitmaps we will have. We
    truncate and pre-allocate space every time to make sure it's up to date.

    This feature will be turned on as soon as you mount with -o space_cache, however
    it is safe to boot into old kernels, they will just generate the cache the
    old-fashioned way. When you boot back into a newer kernel we will notice that
    the filesystem was modified but the cache was not, and automatically discard
    the cache.

    Signed-off-by: Josef Bacik

    Josef Bacik
     

23 Oct, 2010

2 commits

  • If we failed to find the root subvol id, or the subvol=, we would
    deactivate the locked super and close the devices. The problem is at this point
    we have gotten the SB all setup, which includes setting super_operations, so
    when we'd deactivate the super, we'd do a close_ctree() which closes the
    devices, so we'd end up closing the devices twice. So if you do something like
    this

    mount /dev/sda1 /mnt/test1
    mount /dev/sda1 /mnt/test2 -o subvol=xxx
    umount /mnt/test1

    it would blow up (if subvol xxx doesn't exist). This patch fixes that problem.
    Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • The new ENOSPC stuff breaks out the raid types which breaks the way we were
    reporting df to the system. This fixes it back so that Available is the total
    space available to data and used is the actual bytes used by the filesystem.
    This means that Available is Total - data used - all of the metadata space.
    Thanks,
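
The accounting described above amounts to one subtraction. The helper below is only an illustration of that formula; the function and parameter names are invented for this example and do not come from the patch.

```python
def df_available(total_bytes, data_used_bytes, metadata_space_bytes):
    """Available = Total - data used - all of the metadata space."""
    return total_bytes - data_used_bytes - metadata_space_bytes
```

For example, a 100-unit filesystem with 30 units of data in use and 10 units set aside for metadata would report 60 units available under this rule.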

    Signed-off-by: Josef Bacik

    Josef Bacik
     

15 Oct, 2010

1 commit

  • All file_operations should get a .llseek operation so we can make
    nonseekable_open the default for future file operations without a
    .llseek pointer.

    The three cases that we can automatically detect are no_llseek, seq_lseek
    and default_llseek. For cases where we can automatically prove that
    the file offset is always ignored, we use noop_llseek, which maintains
    the current behavior of not returning an error from a seek.

    New drivers should normally not use noop_llseek but instead use no_llseek
    and call nonseekable_open at open time. Existing drivers can be converted
    to do the same when the maintainer knows for certain that no user code
    relies on calling seek on the device file.

    The generated code is often incorrectly indented and right now contains
    comments that clarify for each added line why a specific variant was
    chosen. In the version that gets submitted upstream, the comments will
    be gone and I will manually fix the indentation, because there does not
    seem to be a way to do that using coccinelle.

    Some amount of new code is currently sitting in linux-next that should get
    the same modifications, which I will do at the end of the merge window.

    Many thanks to Julia Lawall for helping me learn to write a semantic
    patch that does all this.

    ===== begin semantic patch =====
    // This adds an llseek= method to all file operations,
    // as a preparation for making no_llseek the default.
    //
    // The rules are
    // - use no_llseek explicitly if we do nonseekable_open
    // - use seq_lseek for sequential files
    // - use default_llseek if we know we access f_pos
    // - use noop_llseek if we know we don't access f_pos,
    // but we still want to allow users to call lseek
    //
    @ open1 exists @
    identifier nested_open;
    @@
    nested_open(...)
    {

    }

    @ open exists@
    identifier open_f;
    identifier i, f;
    identifier open1.nested_open;
    @@
    int open_f(struct inode *i, struct file *f)
    {

    }

    @ read disable optional_qualifier exists @
    identifier read_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    expression E;
    identifier func;
    @@
    ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
    {

    }

    @ read_no_fpos disable optional_qualifier exists @
    identifier read_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    @@
    ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
    {
    ... when != off
    }

    @ write @
    identifier write_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    expression E;
    identifier func;
    @@
    ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
    {

    }

    @ write_no_fpos @
    identifier write_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    @@
    ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
    {
    ... when != off
    }

    @ fops0 @
    identifier fops;
    @@
    struct file_operations fops = {
    ...
    };

    @ has_llseek depends on fops0 @
    identifier fops0.fops;
    identifier llseek_f;
    @@
    struct file_operations fops = {
    ...
    .llseek = llseek_f,
    ...
    };

    @ has_read depends on fops0 @
    identifier fops0.fops;
    identifier read_f;
    @@
    struct file_operations fops = {
    ...
    .read = read_f,
    ...
    };

    @ has_write depends on fops0 @
    identifier fops0.fops;
    identifier write_f;
    @@
    struct file_operations fops = {
    ...
    .write = write_f,
    ...
    };

    @ has_open depends on fops0 @
    identifier fops0.fops;
    identifier open_f;
    @@
    struct file_operations fops = {
    ...
    .open = open_f,
    ...
    };

    // use no_llseek if we call nonseekable_open
    ////////////////////////////////////////////
    @ nonseekable1 depends on !has_llseek && has_open @
    identifier fops0.fops;
    identifier nso ~= "nonseekable_open";
    @@
    struct file_operations fops = {
    ... .open = nso, ...
    +.llseek = no_llseek, /* nonseekable */
    };

    @ nonseekable2 depends on !has_llseek @
    identifier fops0.fops;
    identifier open.open_f;
    @@
    struct file_operations fops = {
    ... .open = open_f, ...
    +.llseek = no_llseek, /* open uses nonseekable */
    };

    // use seq_lseek for sequential files
    /////////////////////////////////////
    @ seq depends on !has_llseek @
    identifier fops0.fops;
    identifier sr ~= "seq_read";
    @@
    struct file_operations fops = {
    ... .read = sr, ...
    +.llseek = seq_lseek, /* we have seq_read */
    };

    // use default_llseek if there is a readdir
    ///////////////////////////////////////////
    @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier readdir_e;
    @@
    // any other fop is used that changes pos
    struct file_operations fops = {
    ... .readdir = readdir_e, ...
    +.llseek = default_llseek, /* readdir is present */
    };

    // use default_llseek if at least one of read/write touches f_pos
    /////////////////////////////////////////////////////////////////
    @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read.read_f;
    @@
    // read fops use offset
    struct file_operations fops = {
    ... .read = read_f, ...
    +.llseek = default_llseek, /* read accesses f_pos */
    };

    @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier write.write_f;
    @@
    // write fops use offset
    struct file_operations fops = {
    ... .write = write_f, ...
    + .llseek = default_llseek, /* write accesses f_pos */
    };

    // Use noop_llseek if neither read nor write accesses f_pos
    ///////////////////////////////////////////////////////////

    @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read_no_fpos.read_f;
    identifier write_no_fpos.write_f;
    @@
    // write fops use offset
    struct file_operations fops = {
    ...
    .write = write_f,
    .read = read_f,
    ...
    +.llseek = noop_llseek, /* read and write both use no f_pos */
    };

    @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier write_no_fpos.write_f;
    @@
    struct file_operations fops = {
    ... .write = write_f, ...
    +.llseek = noop_llseek, /* write uses no f_pos */
    };

    @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read_no_fpos.read_f;
    @@
    struct file_operations fops = {
    ... .read = read_f, ...
    +.llseek = noop_llseek, /* read uses no f_pos */
    };

    @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    @@
    struct file_operations fops = {
    ...
    +.llseek = noop_llseek, /* no read or write fn */
    };
    ===== End semantic patch =====
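
The rule order used by the semantic patch can be summarized as a small decision function. This is a simplified reading of the rules, not a replacement for the patch: the dictionary keys (`has_llseek`, `calls_nonseekable_open`, and so on) are hypothetical flags that a caller would have to compute for each file_operations structure.

```python
def choose_llseek(fops):
    """Pick the .llseek value to add for a file_operations description.

    `fops` is a dict of boolean flags; missing keys count as False.
    Returns None when the structure already sets .llseek.
    """
    if fops.get("has_llseek"):
        return None  # nothing to add
    if fops.get("calls_nonseekable_open"):
        return "no_llseek"  # open uses nonseekable_open
    if fops.get("uses_seq_read"):
        return "seq_lseek"  # sequential file
    if fops.get("has_readdir"):
        return "default_llseek"  # readdir is present
    if fops.get("read_uses_f_pos") or fops.get("write_uses_f_pos"):
        return "default_llseek"  # read or write accesses f_pos
    # Neither read nor write touches f_pos (or there is no read/write at all):
    # keep the current behavior of seeks succeeding without effect.
    return "noop_llseek"
```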

    Signed-off-by: Arnd Bergmann
    Cc: Julia Lawall
    Cc: Christoph Hellwig

    Arnd Bergmann
     

10 Aug, 2010

1 commit

  • NB: do we want btrfs_wait_ordered_range() on eviction of
    inodes with positive i_nlink on subvolume with zero root_refs?
    If not, btrfs_evict_inode() can be simplified by unconditionally
    bailing out in case of i_nlink > 0 in the very beginning...

    Signed-off-by: Al Viro

    Al Viro
     

12 Jun, 2010

3 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
    Btrfs: The file argument for fsync() is never null
    Btrfs: handle ERR_PTR from posix_acl_from_xattr()
    Btrfs: avoid BUG when dropping root and reference in same transaction
    Btrfs: prohibit a operation of changing acl's mask when noacl mount option used
    Btrfs: should add a permission check for setfacl
    Btrfs: btrfs_lookup_dir_item() can return ERR_PTR
    Btrfs: btrfs_read_fs_root_no_name() returns ERR_PTRs
    Btrfs: unwind after btrfs_start_transaction() errors
    Btrfs: btrfs_iget() returns ERR_PTR
    Btrfs: handle kzalloc() failure in open_ctree()
    Btrfs: handle error returns from btrfs_lookup_dir_item()
    Btrfs: Fix BUG_ON for fs converted from extN
    Btrfs: Fix null dereference in relocation.c
    Btrfs: fix remap_file_pages error
    Btrfs: uninitialized data is check_path_shared()
    Btrfs: fix fallocate regression
    Btrfs: fix loop device on top of btrfs

    Linus Torvalds
     
  • btrfs_iget() returns an ERR_PTR() on failure and not null.

    Signed-off-by: Dan Carpenter
    Signed-off-by: Chris Mason

    Dan Carpenter
     
  • If btrfs_lookup_dir_item() fails, we can just let the mount fail
    with an error.

    Signed-off-by: Dan Carpenter
    Signed-off-by: Chris Mason

    Dan Carpenter
     

28 May, 2010

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (27 commits)
    Btrfs: add more error checking to btrfs_dirty_inode
    Btrfs: allow unaligned DIO
    Btrfs: drop verbose enospc printk
    Btrfs: Fix block generation verification race
    Btrfs: fix preallocation and nodatacow checks in O_DIRECT
    Btrfs: avoid ENOSPC errors in btrfs_dirty_inode
    Btrfs: move O_DIRECT space reservation to btrfs_direct_IO
    Btrfs: rework O_DIRECT enospc handling
    Btrfs: use async helpers for DIO write checksumming
    Btrfs: don't walk around with task->state != TASK_RUNNING
    Btrfs: do aio_write instead of write
    Btrfs: add basic DIO read/write support
    direct-io: do not merge logically non-contiguous requests
    direct-io: add a hook for the fs to provide its own submit_bio function
    fs: allow short direct-io reads to be completed via buffered IO
    Btrfs: Metadata ENOSPC handling for balance
    Btrfs: Pre-allocate space for data relocation
    Btrfs: Metadata ENOSPC handling for tree log
    Btrfs: Metadata reservation for orphan inodes
    Btrfs: Introduce global metadata reservation
    ...

    Linus Torvalds
     

26 May, 2010

1 commit

  • This adds:
    alias: devname:
    to some common kernel modules, which will allow the on-demand loading
    of the kernel module when the device node is accessed.

    Ideally all these modules would be compiled-in, but distros seem so much
    in love with their modularization that we need to cover the common
    cases with this new facility. It will allow us to remove a bunch of pretty
    useless init scripts and modprobes from init scripts.

    The static device node aliases will be carried in the module itself. The
    program depmod will extract this information to a file in the module directory:
    $ cat /lib/modules/2.6.34-00650-g537b60d-dirty/modules.devname
    # Device nodes to trigger on-demand module loading.
    microcode cpu/microcode c10:184
    fuse fuse c10:229
    ppp_generic ppp c108:0
    tun net/tun c10:200
    dm_mod mapper/control c10:235

    Udev will pick up the depmod created file on startup and create all the
    static device nodes which the kernel modules specify, so that these modules
    get automatically loaded when the device node is accessed:
    $ /sbin/udevd --debug
    ...
    static_dev_create_from_modules: mknod '/dev/cpu/microcode' c10:184
    static_dev_create_from_modules: mknod '/dev/fuse' c10:229
    static_dev_create_from_modules: mknod '/dev/ppp' c108:0
    static_dev_create_from_modules: mknod '/dev/net/tun' c10:200
    static_dev_create_from_modules: mknod '/dev/mapper/control' c10:235
    udev_rules_apply_static_dev_perms: chmod '/dev/net/tun' 0666
    udev_rules_apply_static_dev_perms: chmod '/dev/fuse' 0666
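
As an illustration of the file format shown above, the sketch below parses one modules.devname line into its parts. It assumes each entry has exactly the three-field layout seen in the sample (module name, device name, and a type letter followed by major:minor); the names `DevnameEntry` and `parse_devname_line` are invented for this example and are not part of depmod or udev.

```python
import re
from typing import NamedTuple, Optional


class DevnameEntry(NamedTuple):
    module: str    # kernel module to load on access
    path: str      # device node path under /dev
    dev_type: str  # "c" for character, "b" for block
    major: int
    minor: int


_LINE = re.compile(r"^(\S+)\s+(\S+)\s+([cb])(\d+):(\d+)$")


def parse_devname_line(line: str) -> Optional[DevnameEntry]:
    """Parse one line; return None for blank lines and '#' comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = _LINE.match(line)
    if match is None:
        raise ValueError(f"unrecognized modules.devname line: {line!r}")
    module, name, dev_type, major, minor = match.groups()
    return DevnameEntry(module, "/dev/" + name, dev_type, int(major), int(minor))
```

Applied to the sample line for tun, this yields the module `tun`, the path `/dev/net/tun`, and character device numbers 10:200, which matches the mknod call in the udevd debug output.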

    A few device nodes are switched to statically allocated numbers, to allow
    the static nodes to work. This might also be useful for systems which still run
    a plain static /dev, which is completely unsafe to use with any dynamic minor
    numbers.

    Note:
    The devname aliases must be limited to the *common* and *single*instance*
    device nodes, like the misc devices, and never be used for conceptually limited
    systems like the loop devices, which should rather get fixed properly and get a
    control node for losetup to talk to, instead of creating a random number of
    device nodes in advance, regardless if they are ever used.

    This facility is to hide the mess distros are creating with too modularized
    kernels, and just to hide that these modules are not compiled-in, and not to
    paper-over broken concepts. Thanks! :)

    Cc: Greg Kroah-Hartman
    Cc: David S. Miller
    Cc: Miklos Szeredi
    Cc: Chris Mason
    Cc: Alasdair G Kergon
    Cc: Tigran Aivazian
    Cc: Ian Kent
    Signed-Off-By: Kay Sievers
    Signed-off-by: Greg Kroah-Hartman

    Kay Sievers
     

25 May, 2010

3 commits


06 Apr, 2010

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
    Btrfs: add check for changed leaves in setup_leaf_for_split
    Btrfs: create snapshot references in same commit as snapshot
    Btrfs: fix small race with delalloc flushing waitqueue's
    Btrfs: use add_to_page_cache_lru, use __page_cache_alloc
    Btrfs: fix chunk allocate size calculation
    Btrfs: kill max_extent mount option
    Btrfs: fail to mount if we have problems reading the block groups
    Btrfs: check btrfs_get_extent return for IS_ERR()
    Btrfs: handle kmalloc() failure in inode lookup ioctl
    Btrfs: dereferencing freed memory
    Btrfs: Simplify num_stripes's calculation logical for __btrfs_alloc_chunk()
    Btrfs: Add error handle for btrfs_search_slot() in btrfs_read_chunk_tree()
    Btrfs: Remove unnecessary finish_wait() in wait_current_trans()
    Btrfs: add NULL check for do_walk_down()
    Btrfs: remove duplicate include in ioctl.c

    Fix trivial conflict in fs/btrfs/compression.c due to slab.h include
    cleanups.

    Linus Torvalds
     

31 Mar, 2010

1 commit

  • As Yan pointed out, there's not much reason for all this complicated math to
    account for file extents being split up into max_extent chunks, since they are
    likely to all end up in the same leaf anyway. Since there isn't much reason to
    use max_extent, just remove the option altogether so we have one less thing we
    need to test.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    The percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch a large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.
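
The first bullet's rule (gfp.h if only gfp is used, slab.h if slab is used) can be sketched as follows. This is not the slabh-sweep.py script: the function name and the two regular expressions are assumptions chosen for illustration, and a real scan would need to recognize many more identifiers.

```python
import re
from typing import Optional

# Hypothetical, deliberately small detection patterns.
SLAB_USE = re.compile(r"\b(kmalloc|kzalloc|kfree)\s*\(")
GFP_USE = re.compile(r"\bGFP_[A-Z]+\b")


def required_include(source: str) -> Optional[str]:
    """Return the single header a .c file needs, or None if neither is used."""
    if SLAB_USE.search(source):
        # slab.h itself includes gfp.h, so it covers both kinds of usage.
        return "linux/slab.h"
    if GFP_USE.search(source):
        return "linux/gfp.h"
    return None
```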

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

15 Mar, 2010

4 commits

  • Use memparse() instead of its own private implementation.
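
    For readers unfamiliar with the helper, the behaviour being relied on
    is "parse a number, then scale it by an optional size suffix". The
    sketch below models that in Python under simplifying assumptions:
    decimal input only and just the K, M and G suffixes. It is an
    illustration of the semantics, not the kernel function.

```python
def memparse(text):
    """Parse a leading decimal number with an optional K/M/G suffix.

    Returns (value_in_bytes, unparsed_remainder). Simplified model only:
    other number bases and further suffixes are not handled here.
    """
    i = 0
    while i < len(text) and text[i] in "0123456789":
        i += 1
    value = int(text[:i]) if i else 0

    shifts = {"K": 10, "M": 20, "G": 30}
    if i < len(text) and text[i].upper() in shifts:
        value <<= shifts[text[i].upper()]
        i += 1
    return value, text[i:]
```

    Returning the unparsed remainder mirrors the way a caller can check
    whether anything unexpected follows the size.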

    Signed-off-by: Akinobu Mita
    Cc: Chris Mason
    Cc: linux-btrfs@vger.kernel.org
    Signed-off-by: Chris Mason

    Akinobu Mita
     
  • The way we report df usage is way confusing for everybody, including some other
    utilities (bacula for one). So this patch makes df a little bit more
    understandable. First we make used actually count the total amount of used
    space in all space info's. This will give us a real view of how much disk space
    is in use. Second, for blocks available, only count data space. This makes
    things like bacula work because it says 0 when you can no longer write any more
    data to the disk. I think this is a nice compromise, since you will end up with
    something like the following:

    [root@alpha ~]# df -h
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/mapper/VolGroup-lv_root
                          148G   30G  111G  21% /
    /dev/sda1             194M  116M   68M  64% /boot
    tmpfs                 985M   12K  985M   1% /dev/shm
    /dev/mapper/VolGroup-LogVol02
                          145G  140G     0 100% /mnt/btrfs-test

    Compare this with btrfsctl -i output

    [root@alpha btrfs-progs-unstable]# ./btrfsctl -i /mnt/btrfs-test/
    Metadata, DUP: total=4.62GB, used=2.46GB
    System, DUP: total=8.00MB, used=24.00KB
    Data: total=134.80GB, used=134.80GB
    Metadata: total=8.00MB, used=0.00
    System: total=4.00MB, used=0.00
    operation complete

    This way we show that there is no more data space to be used, but we have
    another 5GB of space left for metadata. Thanks,
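
    The two rules can be sketched as below, using the figures from the
    btrfsctl -i listing. This is only a model of the accounting described
    in this message, not the kernel code; the tuple layout and the
    df_numbers() helper are made up for illustration, and DUP replication
    is ignored, so the sum will not match the df line exactly.

```python
GB = 1024 ** 3
MB = 1024 ** 2
KB = 1024

# (type, total_bytes, used_bytes) per space info, from the listing above.
# Replication (DUP) is not modelled, which is a simplification.
space_infos = [
    ("metadata", round(4.62 * GB), round(2.46 * GB)),
    ("system", 8 * MB, 24 * KB),
    ("data", round(134.80 * GB), round(134.80 * GB)),
    ("metadata", 8 * MB, 0),
    ("system", 4 * MB, 0),
]


def df_numbers(infos):
    """Return (used, avail) following the two rules in the message."""
    # Rule 1: "used" counts used space in every space info.
    used = sum(u for _, _, u in infos)
    # Rule 2: "available" only counts free space in data space infos.
    avail = sum(t - u for kind, t, u in infos if kind == "data")
    return used, avail


used, avail = df_numbers(space_infos)
```

    With data space exhausted, avail comes out as zero even though the
    metadata space infos still have room.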

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • Since there's no good way to make sure the user sees the original default root
    tree id, not to mention that it's 5 and so is way different from any other volume,
    just make subvol=0 mount the original default root. This makes it a bit easier
    for users to handle in the long run. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • This work is in preparation for being able to set a different root as the
    default mounting root.

    There is currently a problem with how we mount subvolumes. We cannot currently
    mount a subvolume of a subvolume; you can only mount subvolumes/snapshots of the
    default subvolume. So say you take a snapshot of the default subvolume and call
    it snap1, and then take a snapshot of snap1 and call it snap2, so now you have

    /
    /snap1
    /snap1/snap2

    as your available volumes. Currently you can only mount / and /snap1;
    you cannot mount /snap1/snap2. To fix this problem, instead of passing
    subvolid=<name> you must pass in subvolid=<treeid>, where <treeid> is
    the tree id that gets spit out via the subvolume listing you get from
    the subvolume listing patches (btrfs filesystem list). This allows us
    to mount /, /snap1 and /snap1/snap2 as the root volume.

    In addition to the above, we also now read the default dir item in the
    tree root to get the root key that it points to. For now this just
    points at what has always been the default subvolume, but later on I plan
    to change it to point at whatever root you want to be the new default
    root, so you can just set the default mount and not have to mount with
    -o subvolid=<treeid>. I tested this out with the above scenario and it
    worked perfectly. Thanks,

    mount -o subvol operates inside the selected subvolid. For example:

    mount -o subvol=snap1,subvolid=256 /dev/xxx /mnt

    /mnt will have the snap1 directory for the subvolume with id
    256.

    mount -o subvol=snap /dev/xxx /mnt

    /mnt will be the snap directory of whatever the default subvolume
    is.
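
    How the two options combine can be modelled roughly as below. This is
    a sketch, not the kernel's mount path: the tree ids in CHILDREN, the
    treatment of every name as a child subvolume, and both helper
    functions are assumptions made for illustration.

```python
def parse_mount_options(opts):
    """Split 'a=b,c=d' into a dict; flag-only options map to None."""
    parsed = {}
    for item in opts.split(","):
        if not item:
            continue
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else None
    return parsed


# Hypothetical layout for the / -> snap1 -> snap2 scenario:
# (parent tree id, name) -> tree id. The ids are illustrative only.
CHILDREN = {(5, "snap1"): 256, (256, "snap2"): 257}


def resolve_root(opts, default_id, children):
    """Pick the starting tree by subvolid, then look subvol up inside it."""
    start = int(opts["subvolid"]) if "subvolid" in opts else default_id
    name = opts.get("subvol")
    if name:
        return children[(start, name)]
    return start
```

    Under these assumptions, subvolid=257 reaches snap2 directly, while
    subvol=snap1 on its own is resolved inside the default subvolume.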

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik