07 Jan, 2012

1 commit


04 Jan, 2012

1 commit


17 Dec, 2011

1 commit

  • …inux/kernel/git/mason/linux-btrfs

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: unplug every once and a while
    Btrfs: deal with NULL srv_rsv in the delalloc inode reservation code
    Btrfs: only set cache_generation if we setup the block group
    Btrfs: don't panic if orphan item already exists
    Btrfs: fix leaked space in truncate
    Btrfs: fix how we do delalloc reservations and how we free reservations on error
    Btrfs: deal with enospc from dirtying inodes properly
    Btrfs: fix num_workers_starting bug and other bugs in async thread
    BTRFS: Establish i_ops before calling d_instantiate
    Btrfs: add a cond_resched() into the worker loop
    Btrfs: fix ctime update of on-disk inode
    btrfs: keep orphans for subvolume deletion
    Btrfs: fix inaccurate available space on raid0 profile
    Btrfs: fix wrong disk space information of the files
    Btrfs: fix wrong i_size when truncating a file to a larger size
    Btrfs: fix btrfs_end_bio to deal with write errors to a single mirror

    * 'for-linus-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    btrfs: lower the dirty balance poll interval

    Linus Torvalds
     

16 Dec, 2011

2 commits

  • …/btrfs-work into integration

    Conflicts:
    fs/btrfs/inode.c

    Signed-off-by: Chris Mason <chris.mason@oracle.com>

    Chris Mason
     
    Now that we're properly keeping track of delayed inode space we've been getting
    a lot of warnings out of btrfs_dirty_inode() when running xfstest 83. This is
    because many callers use mark_inode_dirty(), which returns void, so we can't
    return ENOSPC. This needs to be fixed in a few areas:

    1) file_update_time - this updates the mtime and related fields when writing to
    a file, and it calls mark_inode_dirty. So copy file_update_time into btrfs so we
    can call btrfs_dirty_inode directly and return any error it reports.

    2) fix symlinks to use btrfs_setattr for ->setattr. For some reason we weren't
    setting ->setattr for symlinks, even though we should have been. This catches
    one of the cases where we were getting errors in mark_inode_dirty.

    3) Fix btrfs_setattr and btrfs_setsize to call btrfs_dirty_inode directly
    instead of mark_inode_dirty. This lets us return errors properly for truncate
    and chown/anything related to setattr.

    4) Add a new btrfs_fs_dirty_inode which will just call btrfs_dirty_inode and
    print an error if we have one. The only remaining user we can't control for
    this is touch_atime(), but we don't really want to keep people from walking
    down the tree if we don't have space to save the atime update, so just complain
    but don't worry about it.
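    A minimal userspace sketch of the split described above, assuming a toy
    counter stands in for metadata space: an int-returning helper lets the
    write/setattr paths see ENOSPC, while the void hook behind mark_inode_dirty
    can only report it. All names here (toy_inode, toy_dirty_inode,
    toy_update_time, toy_fs_dirty_inode) are hypothetical, not btrfs code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for an inode; not the kernel's struct inode. */
struct toy_inode {
    long mtime;
    int dirty;
};

/* Hypothetical model of free metadata space: inode updates we can still reserve. */
static int reservable_updates = 1;

/* Int-returning helper (the role btrfs_dirty_inode plays): callers see ENOSPC. */
int toy_dirty_inode(struct toy_inode *inode)
{
    assert(inode != NULL);
    if (reservable_updates <= 0)
        return -ENOSPC;
    reservable_updates--;
    inode->dirty = 1;
    return 0;
}

/* Private copy of a file_update_time-style helper: the error now propagates. */
int toy_update_time(struct toy_inode *inode, long now)
{
    inode->mtime = now;
    return toy_dirty_inode(inode);
}

/* Void hook (the role of the callback behind mark_inode_dirty):
 * it has no way to return the error, so it can only report it. */
void toy_fs_dirty_inode(struct toy_inode *inode)
{
    int ret = toy_dirty_inode(inode);

    if (ret)
        fprintf(stderr, "toy_fs_dirty_inode: error %d\n", ret);
}
```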

    With this patch xfstests 83 complains a handful of times instead of hundreds of
    times. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     

15 Dec, 2011

1 commit

  • When we use raid0 as the data profile, the df command may show a very
    inaccurate value for the available space, which may be much less than the
    real one. This can confuse users. Fix it by changing the calculation
    of the available space so that it more closely resembles a simulated chunk
    allocation.
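    A rough sketch of the "fake chunk allocation" idea, assuming raid0 needs at
    least two stripes and ignoring minimum stripe sizes and metadata: sort the
    devices by free space, stripe the smallest remaining amount across every
    device that still has room, drop that device, and repeat. The function name
    and units are hypothetical; the kernel's calculation may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator: sort free-space values in descending order. */
static int cmp_desc(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;

    return (x < y) - (x > y);
}

/*
 * Estimate usable raid0 data space by simulating chunk allocation:
 * stripe the smallest remaining free space across every device that still
 * has room, then drop that device, until fewer than min_stripes remain.
 * free_bytes is modified in place.
 */
uint64_t raid0_avail(uint64_t *free_bytes, size_t n, size_t min_stripes)
{
    uint64_t avail = 0;

    assert(free_bytes != NULL || n == 0);
    if (n > 0)
        qsort(free_bytes, n, sizeof(*free_bytes), cmp_desc);
    /* The first n entries are the live devices; the smallest is at n - 1. */
    while (n >= min_stripes && n > 0) {
        uint64_t chunk = free_bytes[n - 1];

        avail += chunk * n;              /* chunk bytes on each of n stripes */
        for (size_t j = 0; j < n; j++)
            free_bytes[j] -= chunk;      /* equal subtraction keeps the order */
        n--;
    }
    return avail;
}
```

    With 10, 4 and 1 units free, this reports 9 (1 unit over three devices,
    then 3 units over two); the last 6 units on the largest device cannot form
    a two-device stripe and are not counted.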

    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie
     

02 Dec, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: fix meta data raid-repair merge problem
    Btrfs: skip allocation attempt from empty cluster
    Btrfs: skip block groups without enough space for a cluster
    Btrfs: start search for new cluster at the beginning
    Btrfs: reset cluster's max_size when creating bitmap
    Btrfs: initialize new bitmaps' list
    Btrfs: fix oops when calling statfs on readonly device
    Btrfs: Don't error on resizing FS to same size
    Btrfs: fix deadlock on metadata reservation when evicting a inode
    Fix URL of btrfs-progs git repository in docs
    btrfs scrub: handle -ENOMEM from init_ipath()

    Linus Torvalds
     

01 Dec, 2011

1 commit

  • To reproduce this bug:

    # dd if=/dev/zero of=img bs=1M count=256
    # mkfs.btrfs img
    # losetup -r /dev/loop1 img
    # mount /dev/loop1 /mnt
    OOPS!!

    It triggered BUG_ON(!nr_devices) in btrfs_calc_avail_data_space().

    To fix this, instead of checking only writable devices, we check all open
    devices:

    # df -h /dev/loop1
    Filesystem Size Used Avail Use% Mounted on
    /dev/loop1 250M 28K 238M 1% /mnt
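    A toy illustration of why the old count broke, using a hypothetical device
    record: on a read-only loop device nothing is writable, so counting writable
    devices yields 0, the value BUG_ON(!nr_devices) rejects, while counting open
    devices yields 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified device record. */
struct toy_device {
    bool open;       /* the block device is open (read-only or read-write) */
    bool writeable;  /* the device accepts new allocations */
};

/* Old approach: count only writable devices. */
size_t count_writeable(const struct toy_device *devs, size_t n)
{
    size_t nr = 0;

    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (devs[i].writeable)
            nr++;
    return nr;
}

/* Fixed approach: count every open device. */
size_t count_open(const struct toy_device *devs, size_t n)
{
    size_t nr = 0;

    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (devs[i].open)
            nr++;
    return nr;
}
```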

    Signed-off-by: Li Zefan

    Li Zefan
     

17 Nov, 2011

3 commits


11 Nov, 2011

1 commit

  • Rename the no_space_cache option to nospace_cache to be more consistent with
    the other options, where the simple prefix 'no' is used to negate an option.

    The option was introduced during the -rc1 cycle and has not been
    widely used, so the rename is safe.

    Signed-off-by: David Sterba
    Signed-off-by: Chris Mason

    David Sterba
     

10 Nov, 2011

3 commits

  • Commits 6c41761f and 45ea6095 introduced possible NULL pointer
    dereferences on error paths. In addition, when trying to mount an
    already mounted fs to a different directory, we would leave all devices
    busy and leak fs_info with all its sub-structures.

    Fix this by doing all allocations before trying to open any of the
    devices, and adjust the error path for the mount-already-mounted-fs case.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • btrfs_parse_early_options() can fail due to an error while scanning devices
    (-o device= option) after it has already strdup()ed the subvol_name string:

    mount -o subvol=SUBV,device=BAD_DEVICE

    So free the subvol_name string on error.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • Don't leak the subvol_name string when multiple subvol= options are
    given. The "last option wins" behavior (consistent with the
    subvolid= and subvolrootid= options) is preserved.
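    Both subvol_name fixes above follow one ownership rule, sketched here in
    userspace under stated assumptions: options arrive already split, dup_str()
    replaces strdup(), and scan_device() is a hypothetical check that only
    accepts paths under /dev/. A repeated subvol= frees the previous copy, and
    a failing device= frees whatever subvol= copy exists before returning.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Portable strdup substitute, so the sketch builds under strict ISO C. */
static char *dup_str(const char *s)
{
    size_t len = strlen(s) + 1;
    char *p = malloc(len);

    if (p)
        memcpy(p, s, len);
    return p;
}

/* Hypothetical device check: pretend only paths under /dev/ can be scanned. */
static int scan_device(const char *path)
{
    return strncmp(path, "/dev/", 5) == 0 ? 0 : -ENOENT;
}

/*
 * Parse pre-split mount options. On success *subvol_name owns the value of
 * the last subvol= option (or is NULL). On failure nothing is leaked.
 */
int parse_early_options(const char *const *opts, size_t n, char **subvol_name)
{
    char *name = NULL;

    assert(subvol_name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strncmp(opts[i], "subvol=", 7) == 0) {
            free(name);                  /* last option wins: drop the earlier copy */
            name = dup_str(opts[i] + 7);
            if (!name)
                return -ENOMEM;
        } else if (strncmp(opts[i], "device=", 7) == 0) {
            int ret = scan_device(opts[i] + 7);

            if (ret) {
                free(name);              /* error path: don't leak subvol_name */
                return ret;
            }
        }
    }
    *subvol_name = name;
    return 0;
}
```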

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     

08 Nov, 2011

1 commit

  • On the error path 'tree_root' is freed in 'free_fs_info()', so there is
    no need to free it explicitly. Noticed by SLUB in debug mode:

    Complete reproducer under usermode linux (discovered on real
    machine):

    bdev=/dev/ubda
    btr_root=/btr
    /mkfs.btrfs $bdev
    mount $bdev $btr_root
    mkdir $btr_root/subvols/
    cd $btr_root/subvols/
    /btrfs su cr foo
    /btrfs su cr bar
    mount $bdev -osubvol=subvols/foo $btr_root/subvols/bar
    umount $btr_root/subvols/bar

    which gives

    device fsid 4d55aa28-45b1-474b-b4ec-da912322195e devid 1 transid 7 /dev/ubda
    =============================================================================
    BUG kmalloc-2048: Object already free
    -----------------------------------------------------------------------------

    INFO: Allocated in btrfs_mount+0x389/0x7f0 age=0 cpu=0 pid=277
    INFO: Freed in btrfs_mount+0x51c/0x7f0 age=0 cpu=0 pid=277
    INFO: Slab 0x0000000062886200 objects=15 used=9 fp=0x0000000070b4d2d0 flags=0x4081
    INFO: Object 0x0000000070b4d2d0 @offset=21200 fp=0x0000000070b4a968
    ...
    Call Trace:
    70b31948: [] print_trailer+0xe2/0x130
    70b31978: [] object_err+0x3a/0x50
    70b319a8: [] free_debug_processing+0x142/0x2a0
    70b319e0: [] btrfs_mount+0x55f/0x7f0
    70b319f8: [] __slab_free+0x221/0x2d0

    Signed-off-by: Sergei Trofimovich
    Cc: Arne Jansen
    Cc: Chris Mason
    Cc: David Sterba
    Signed-off-by: Chris Mason

    slyich@gmail.com
     

06 Nov, 2011

2 commits

  • This takes some of the free space in the btrfs super block
    to record information about most of the roots in the last four
    commits.

    It also adds a -o recovery mount option that uses the root history log when
    we're not able to read the tree of tree roots, the extent
    tree root, the device tree root or the csum root.
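    A simplified sketch of a four-slot root history, assuming each slot keeps a
    single tree root and a generation (the real super block slots record several
    roots per commit). Each commit overwrites the oldest slot; a recovery mount
    walks the slots newest-first and takes the first root that is still readable.
    All names, including the unreadable-root list used to model read failures,
    are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BACKUP_ROOTS 4  /* "the last four commits" */

/* Hypothetical backup slot; the real one records several tree roots. */
struct root_backup {
    uint64_t generation;  /* 0: slot never written */
    uint64_t tree_root;   /* location of the tree of tree roots */
};

struct toy_super {
    struct root_backup backups[NUM_BACKUP_ROOTS];
    unsigned next;        /* slot the next commit overwrites (the oldest) */
};

/* On every commit, record the new roots in the oldest slot. */
void record_commit(struct toy_super *sb, uint64_t gen, uint64_t tree_root)
{
    assert(sb != NULL && gen != 0);
    sb->backups[sb->next] = (struct root_backup){ gen, tree_root };
    sb->next = (sb->next + 1) % NUM_BACKUP_ROOTS;
}

static bool in_list(uint64_t v, const uint64_t *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (list[i] == v)
            return true;
    return false;
}

/* Recovery: walk the backups newest-first and pick the first one whose tree
 * root is not known to be unreadable. */
bool find_usable_root(const struct toy_super *sb, const uint64_t *unreadable,
                      size_t n_unreadable, struct root_backup *out)
{
    assert(sb != NULL && out != NULL);
    for (unsigned i = 1; i <= NUM_BACKUP_ROOTS; i++) {
        unsigned slot = (sb->next + NUM_BACKUP_ROOTS - i) % NUM_BACKUP_ROOTS;
        const struct root_backup *b = &sb->backups[slot];

        if (b->generation != 0 &&
            !in_list(b->tree_root, unreadable, n_unreadable)) {
            *out = *b;
            return true;
        }
    }
    return false;
}
```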

    Signed-off-by: Chris Mason

    Chris Mason
     
  • fs_info is now ~9KB, more than fits into one page. This will cause
    mount failures when memory is too fragmented. The top space consumers are
    the super block structures super_copy and super_for_commit, ~2.8KB each.
    Allocate them dynamically, which shrinks fs_info to ~3.5KB (measured on x86_64).

    Add a wrapper for freeing fs_info and all of its dynamically allocated
    members.
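    A userspace sketch of the wrapper pattern, with hypothetical toy_* types and
    a 2800-byte stand-in for the super block copies: the large members become
    separate allocations, and a single free_fs_info() releases fs_info together
    with everything it owns, including after a partial allocation failure.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096
#define TOY_SUPER_SIZE 2800  /* roughly the ~2.8KB quoted above */

struct toy_super_block { unsigned char bytes[TOY_SUPER_SIZE]; };
struct toy_root { int level; };

/* The big members are pointers, so the struct itself stays small. */
struct toy_fs_info {
    struct toy_super_block *super_copy;
    struct toy_super_block *super_for_commit;
    struct toy_root *tree_root;
};

/* The one place that frees fs_info together with everything it owns. */
void free_fs_info(struct toy_fs_info *fs_info)
{
    if (!fs_info)
        return;
    free(fs_info->super_copy);
    free(fs_info->super_for_commit);
    free(fs_info->tree_root);
    free(fs_info);
}

struct toy_fs_info *alloc_fs_info(void)
{
    struct toy_fs_info *fs_info = calloc(1, sizeof(*fs_info));

    assert(sizeof(*fs_info) <= TOY_PAGE_SIZE);  /* the point of the change */
    if (!fs_info)
        return NULL;
    fs_info->super_copy = calloc(1, sizeof(*fs_info->super_copy));
    fs_info->super_for_commit = calloc(1, sizeof(*fs_info->super_for_commit));
    fs_info->tree_root = calloc(1, sizeof(*fs_info->tree_root));
    if (!fs_info->super_copy || !fs_info->super_for_commit || !fs_info->tree_root) {
        free_fs_info(fs_info);  /* frees whatever was allocated; NULLs are fine */
        return NULL;
    }
    return fs_info;
}
```

    Because free_fs_info() also frees tree_root, an error path must not free
    tree_root itself beforehand; that is the double free SLUB reported in the
    08 Nov, 2011 entry of this log.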

    Signed-off-by: David Sterba

    David Sterba
     

24 Oct, 2011

1 commit

  • There's a missing test of whether the path passed to the subvol=path option
    during mount is a real subvolume, allowing any directory located in the
    default subvolume to be passed and accepted for mount.

    (current btrfs progs prevent this early)
    $ btrfs subvol snapshot . p1-snap
    ERROR: '.' is not a subvolume

    (with "is subvolume?" test bypassed)
    $ btrfs subvol snapshot . p1-snap
    Create a snapshot of '.' in './p1-snap'

    $ btrfs subvol list -p .
    ID 258 parent 5 top level 5 path subvol
    ID 259 parent 5 top level 5 path subvol1
    ID 260 parent 5 top level 5 path default-subvol1
    ID 262 parent 5 top level 5 path p1/p1-snapshot
    ID 263 parent 259 top level 5 path subvol1/subvol1-snap

    The problem I see is that this gives the false impression of snapshotting the
    given subvolume, when in fact it snapshots the default one: a user expects an
    outcome like ID 263 but in fact gets ID 262.

    This patch makes mount fail with EINVAL with a message in syslog.
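    A sketch of the added check, assuming that a subvolume's root directory
    always has inode number 256 (BTRFS_FIRST_FREE_OBJECTID), which is the rule
    btrfs-progs' own "is subvolume?" test is understood to rely on. The
    function name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: every btrfs subvolume root directory has inode number 256. */
#define SUBVOL_ROOT_INO 256

/* Return 0 if the resolved subvol= target is a subvolume root, else -EINVAL.
 * In userspace, is_dir and ino could come from S_ISDIR(st.st_mode) and
 * st.st_ino after stat(2). */
int check_subvol_target(bool is_dir, uint64_t ino)
{
    if (!is_dir || ino != SUBVOL_ROOT_INO)
        return -EINVAL;
    return 0;
}
```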

    Signed-off-by: David Sterba

    David Sterba
     

21 Oct, 2011

2 commits


20 Oct, 2011

3 commits

  • Some users have requested this, and I've found I needed a way to disable cache
    loading without actually clearing the cache, so introduce the no_space_cache
    option. Previously, we checked the super block's cache generation field and, if
    it was populated, always turned space caching on. Now we check this and set the
    space cache option on, and then parse the mount options so that if we want it
    off it gets turned off. Then we check the mount option in all the places where
    we do the caching work instead of checking the super's cache generation. This
    makes things more consistent and lets us turn space caching off. Thanks,
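    A sketch of the ordering described above, with a hypothetical options
    struct and pre-split option tokens: the super block's cache generation only
    sets the default, and space_cache / no_space_cache (the spelling this
    commit introduces) can then override it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mount-option state; later code checks this, not the super. */
struct toy_mount_opts {
    bool space_cache;
};

/* Step 1: default from the super block. Step 2: mount options override it. */
void setup_space_cache_opt(struct toy_mount_opts *opts, uint64_t cache_generation,
                           const char *const *tokens, size_t n)
{
    assert(opts != NULL);
    opts->space_cache = cache_generation != 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tokens[i], "space_cache") == 0)
            opts->space_cache = true;
        else if (strcmp(tokens[i], "no_space_cache") == 0)
            opts->space_cache = false;
    }
}
```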

    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • We've only been able to mount with subvol=<name> where <name> was a subvol
    within whatever root we had as the default. This allows us to mount -o
    subvol=path/to/subvol/you/want relative to the normal fs_tree root. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • Currently what we do is just wrong. We either

    1) Alloc a new "root" dentry with sb->s_root as its parent, which is just wrong
    as we could walk into this subvol later on via another path and hilarity could
    ensue. Also we don't check the return value of d_splice_alias, which isn't good
    either.

    or

    2) Do a d_find_alias(), which may find nothing because our dentry could have
    been dropped from the cache by this point.

    So use d_obtain_alias(). In the case that we already have the inode/dentry in
    cache we will get the correct dentry. If not, we will get a disconnected dentry
    tree, so if we walk into it later on everything will be connected up properly.
    Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     

09 Jul, 2011

1 commit


07 Jul, 2011

1 commit


08 Jun, 2011

1 commit


05 Jun, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (25 commits)
    btrfs: fix uninitialized variable warning
    btrfs: add helper for fs_info->closing
    Btrfs: add mount -o inode_cache
    btrfs: scrub: add explicit plugging
    btrfs: use btrfs_ino to access inode number
    Btrfs: don't save the inode cache if we are deleting this root
    btrfs: false BUG_ON when degraded
    Btrfs: don't save the inode cache in non-FS roots
    Btrfs: make sure we don't overflow the free space cache crc page
    Btrfs: fix uninit variable in the delayed inode code
    btrfs: scrub: don't reuse bios and pages
    Btrfs: leave spinning on lookup and map the leaf
    Btrfs: check for duplicate entries in the free space cache
    Btrfs: don't try to allocate from a block group that doesn't have enough space
    Btrfs: don't always do readahead
    Btrfs: try not to sleep as much when doing slow caching
    Btrfs: kill BTRFS_I(inode)->block_group
    Btrfs: don't look at the extent buffer level 3 times in a row
    Btrfs: map the node block when looking for readahead targets
    Btrfs: set range_start to the right start in count_range_bits
    ...

    Linus Torvalds
     

04 Jun, 2011

2 commits

  • This makes the inode map cache default to off until we
    fix the overflow problem when the free space crcs don't fit
    inside a single page.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • Caching "we have already removed suid/caps" was overenthusiastic as merged.
    On network filesystems we might have had suid/caps set on another client,
    silently picked up by this client on revalidate, all of that *without* clearing
    the S_NOSEC flag.

    AFAICS, the only reasonably sane way to deal with that is
    * new superblock flag; unless set, S_NOSEC is not going to be set.
    * local block filesystems set it in their ->mount() (more accurately,
    mount_bdev() does, so does btrfs ->mount(), users of mount_bdev() other than
    local block ones clear it)
    * if any network filesystem (or a cluster one) wants to use S_NOSEC,
    it'll need to set MS_NOSEC in sb->s_flags *AND* take care to clear S_NOSEC when
    inode attribute changes are picked from other clients.

    It's not an earth-shattering hole (anybody that can set suid on another client
    will almost certainly be able to write to the file before doing that anyway),
    but it's a bug that needs fixing.

    Signed-off-by: Al Viro

    Al Viro
     

28 May, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (58 commits)
    Btrfs: use the device_list_mutex during write_dev_supers
    Btrfs: setup free ino caching in a more asynchronous way
    btrfs scrub: don't coalesce pages that are logically discontiguous
    Btrfs: return -ENOMEM in clear_extent_bit
    Btrfs: add mount -o auto_defrag
    Btrfs: using rcu lock in the reader side of devices list
    Btrfs: drop unnecessary device lock
    Btrfs: fix the race between remove dev and alloc chunk
    Btrfs: fix the race between reading and updating devices
    Btrfs: fix bh leak on __btrfs_open_devices path
    Btrfs: fix unsafe usage of merge_state
    Btrfs: allocate extent state and check the result properly
    fs/btrfs: Add missing btrfs_free_path
    Btrfs: check return value of btrfs_inc_extent_ref()
    Btrfs: return error to caller if read_one_inode() fails
    Btrfs: BUG_ON is deleted from the caller of btrfs_truncate_item & btrfs_extend_item
    Btrfs: return error code to caller when btrfs_del_item fails
    Btrfs: return error code to caller when btrfs_previous_item fails
    btrfs: fix typo 'testeing' -> 'testing'
    btrfs: typo: 'btrfS' -> 'btrfs'
    ...

    Linus Torvalds
     

27 May, 2011

2 commits

  • This will detect small random writes into files and
    queue them up for an auto defrag process. It isn't well suited to
    database workloads yet, but works for smaller files such as rpm, sqlite
    or bdb databases.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • This sixth patch of eight in this cleancache series "opts-in"
    cleancache for btrfs. Filesystems must explicitly enable
    cleancache by calling cleancache_init_fs anytime an instance
    of the filesystem is mounted. Btrfs uses its own readpage
    which must be hooked, but all other cleancache hooks are in
    the VFS layer including the matching cleancache_flush_fs hook
    which must be called on unmount.

    Details and a FAQ can be found in Documentation/vm/cleancache.txt

    [v6-v8: no changes]
    [v5: jeremy@goop.org: simplify init hook and any future fs init changes]
    Signed-off-by: Dan Magenheimer
    Signed-off-by: Chris Mason
    Reviewed-by: Jeremy Fitzhardinge
    Reviewed-by: Konrad Rzeszutek Wilk
    Cc: Andrew Morton
    Cc: Al Viro
    Cc: Matthew Wilcox
    Cc: Nick Piggin
    Cc: Mel Gorman
    Cc: Rik Van Riel
    Cc: Jan Beulich
    Cc: Andreas Dilger
    Cc: Ted Ts'o
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Nitin Gupta

    Dan Magenheimer
     

24 May, 2011

2 commits

  • Conflicts:
    fs/btrfs/tree-log.c
    fs/btrfs/volumes.c

    Signed-off-by: Chris Mason

    Chris Mason
     
  • Btrfs_alloc_path should be matched with btrfs_free_path in error-handling code.

    A simplified version of the semantic match that finds this problem is as
    follows: (http://coccinelle.lip6.fr/)

    //
    @r exists@
    local idexpression struct btrfs_path * x;
    expression ra,rb;
    position p1,p2;
    @@

    x = btrfs_alloc_path@p1(...)
    ... when != btrfs_free_path(x,...)
    when != if (...) { ... btrfs_free_path(x,...) ...}
    when != x = ra
    if(...) { ... when != x = rb
    when forall
    when != btrfs_free_path(x,...)
    \(return ; \| return@p2...; \) }

    @script:python@
    p1 << r.p1;
    p2 << r.p2;
    @@

    cocci.print_main("alloc",p1)
    cocci.print_secs("return",p2)
    //
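    The pattern this semantic match enforces, sketched with hypothetical
    alloc_path()/free_path() stand-ins and a counter that exposes leaks: every
    exit after a successful allocation funnels through one free.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for btrfs_alloc_path() / btrfs_free_path(). */
struct toy_path { int slot; };
static int live_paths;  /* outstanding allocations; nonzero afterwards means a leak */

struct toy_path *alloc_path(void)
{
    struct toy_path *p = calloc(1, sizeof(*p));

    if (p)
        live_paths++;
    return p;
}

void free_path(struct toy_path *p)
{
    if (!p)
        return;
    assert(live_paths > 0);
    live_paths--;
    free(p);
}

/* Hypothetical search that fails for negative keys. */
static int search(struct toy_path *p, int key)
{
    if (key < 0)
        return -ENOENT;
    p->slot = key;
    return 0;
}

/* Every exit after a successful alloc_path() goes through free_path(); a bare
 * "return ret;" in the error branch is the kind of exit the match reports. */
int lookup_item(int key)
{
    struct toy_path *path = alloc_path();
    int ret;

    if (!path)
        return -ENOMEM;
    ret = search(path, key);
    if (ret)
        goto out;
    /* ... use path->slot here ... */
out:
    free_path(path);
    return ret;
}
```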

    Signed-off-by: Julia Lawall
    Signed-off-by: Chris Mason

    Julia Lawall
     

23 May, 2011

2 commits


21 May, 2011

1 commit

  • Changelog V5 -> V6:
    - Fix oom when the memory load is high, by storing the delayed nodes into the
    root's radix tree, and letting btrfs inodes go.

    Changelog V4 -> V5:
    - Fix the race on adding the delayed node to the inode, which was spotted by
    Chris Mason.
    - Merge Chris Mason's incremental patch into this patch.
    - Fix a deadlock between readdir() and memory fault, which was reported by
    Itaru Kitayama.

    Changelog V3 -> V4:
    - Fix a nested lock, which was reported by Itaru Kitayama, by updating the space
    cache inode in time.

    Changelog V2 -> V3:
    - Fix the race between the delayed worker and the task which does delayed items
    balance, which was reported by Tsutomu Itoh.
    - Modify the patch to address David Sterba's comment.
    - Fix the bug of the cpu recursion spinlock, reported by Chris Mason.

    Changelog V1 -> V2:
    - Break up the global rb-tree and use a list to manage the delayed nodes,
    which are created for every directory and file, and used to manage the
    delayed directory name index items and the delayed inode item.
    - Introduce a worker to deal with the delayed nodes.

    Compared with Ext3/4, the performance of file creation and deletion on btrfs
    is very poor. The reason is that btrfs must do a lot of b+ tree insertions,
    such as the inode item, directory name item, directory name index and so on.

    If we can delay some b+ tree insertions and deletions, we can improve the
    performance, so this patch implements delayed directory name
    index insertion/deletion and delayed inode update.

    Implementation:
    - Introduce a delayed root object into the filesystem, which uses two lists to
    manage the delayed nodes that are created for every file/directory.
    One is used to manage all the delayed nodes that have delayed items, and the
    other is used to manage the delayed nodes which are waiting to be dealt with
    by the work thread.
    - Every delayed node has two rb-trees: one is used to manage the directory name
    indexes which are going to be inserted into the b+ tree, and the other is used
    to manage the directory name indexes which are going to be deleted from the
    b+ tree.
    - Introduce a worker to deal with the delayed operations. This worker handles
    the delayed directory name index item insertions and deletions and the
    delayed inode updates.
    When the number of delayed items exceeds the lower limit, we create works for
    some delayed nodes, insert them into the work queue of the worker, and then
    go back.
    When the number of delayed items exceeds the upper bound, we create works for
    all the delayed nodes that haven't been dealt with, insert them into the work
    queue of the worker, and then wait until the number of unprocessed items
    falls below some threshold value.
    - When we want to insert a directory name index into the b+ tree, we just add
    the information into the delayed inserting rb-tree.
    Then we check the number of delayed items and do delayed items
    balance. (The balance policy is described above.)
    - When we want to delete a directory name index from the b+ tree, we first
    search for it in the inserting rb-tree. If we find it, we just drop it. If
    not, we add its key to the delayed deleting rb-tree.
    As with the delayed inserting rb-tree, we also check the number of
    delayed items and do delayed items balance
    (the same as for insertion).
    - When we want to update the metadata of some inode, we cache the data of the
    inode in the delayed node. The worker will flush it into the b+ tree after
    dealing with the delayed insertions and deletions.
    - We move the delayed node to the tail of the list after we access it. This
    way, we can cache more delayed items and merge more inode updates.
    - When we commit a transaction, we deal with all the delayed nodes.
    - The delayed node is freed when we free the btrfs inode.
    - Before we log the inode items, we commit all the directory name index items
    and the delayed inode update.
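    A compact sketch of two rules from the list above, with arrays standing in
    for the two rb-trees and made-up thresholds (the commit gives no values):
    a delete cancels a pending insert of the same key, and the balance policy
    picks between doing nothing, queueing some nodes, or queueing all nodes and
    waiting. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical thresholds; the commit does not give the real values. */
#define DELAYED_LOWER 50
#define DELAYED_UPPER 300
#define MAX_KEYS 64

/* Simplified delayed node: pending inserts and deletes of dir-index keys.
 * Plain arrays stand in for the two rb-trees. */
struct delayed_node {
    int ins[MAX_KEYS];
    size_t n_ins;
    int del[MAX_KEYS];
    size_t n_del;
};

enum balance_action { BALANCE_NONE, BALANCE_QUEUE_SOME, BALANCE_FLUSH_ALL_AND_WAIT };

/* Above the lower limit queue work for some nodes; above the upper limit
 * queue all nodes and wait for the backlog to shrink. */
enum balance_action balance_policy(size_t total_items)
{
    if (total_items > DELAYED_UPPER)
        return BALANCE_FLUSH_ALL_AND_WAIT;
    if (total_items > DELAYED_LOWER)
        return BALANCE_QUEUE_SOME;
    return BALANCE_NONE;
}

void delay_insert(struct delayed_node *n, int key)
{
    assert(n->n_ins < MAX_KEYS);
    n->ins[n->n_ins++] = key;
}

/* A delete first cancels a pending insert of the same key; only if none is
 * pending is the key recorded for deletion from the b+ tree. */
void delay_delete(struct delayed_node *n, int key)
{
    for (size_t i = 0; i < n->n_ins; i++) {
        if (n->ins[i] == key) {
            n->ins[i] = n->ins[--n->n_ins];  /* drop it; order not preserved */
            return;
        }
    }
    assert(n->n_del < MAX_KEYS);
    n->del[n->n_del++] = key;
}
```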

    I did a quick test with the benchmark tool [1] and found that we can improve the
    performance of file creation by ~15%, and file deletion by ~20%.

    Before applying this patch:
    Create files:
    Total files: 50000
    Total time: 1.096108
    Average time: 0.000022
    Delete files:
    Total files: 50000
    Total time: 1.510403
    Average time: 0.000030

    After applying this patch:
    Create files:
    Total files: 50000
    Total time: 0.932899
    Average time: 0.000019
    Delete files:
    Total files: 50000
    Total time: 1.215732
    Average time: 0.000024
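    The quoted improvements follow from the totals above (the per-file averages
    are the totals divided by 50000):

```latex
\begin{aligned}
\text{create:}\quad \frac{1.096108 - 0.932899}{1.096108} &= \frac{0.163209}{1.096108} \approx 14.9\%\\
\text{delete:}\quad \frac{1.510403 - 1.215732}{1.510403} &= \frac{0.294671}{1.510403} \approx 19.5\%
\end{aligned}
```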

    [1] http://marc.info/?l=linux-btrfs&m=128212635122920&q=p3

    Many thanks to Kitayama-san for the help!

    Signed-off-by: Miao Xie
    Reviewed-by: David Sterba
    Tested-by: Tsutomu Itoh
    Tested-by: Itaru Kitayama
    Signed-off-by: Chris Mason

    Miao Xie
     

13 May, 2011

1 commit


02 May, 2011

1 commit