23 Sep, 2013

1 commit

  • Pull btrfs fixes from Chris Mason:
    "These are mostly bug fixes and two small performance fixes. The
    most important of the bunch are Josef's fix for a snapshotting
    regression and Mark's update to fix compile problems on arm"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (25 commits)
    Btrfs: create the uuid tree on remount rw
    btrfs: change extent-same to copy entire argument struct
    Btrfs: dir_inode_operations should use btrfs_update_time also
    btrfs: Add btrfs: prefix to kernel log output
    btrfs: refuse to remount read-write after abort
    Btrfs: btrfs_ioctl_default_subvol: Revert back to toplevel subvolume when arg is 0
    Btrfs: don't leak transaction in btrfs_sync_file()
    Btrfs: add the missing mutex unlock in write_all_supers()
    Btrfs: iput inode on allocation failure
    Btrfs: remove space_info->reservation_progress
    Btrfs: kill delay_iput arg to the wait_ordered functions
    Btrfs: fix worst case calculator for space usage
    Revert "Btrfs: rework the overcommit logic to be based on the total size"
    Btrfs: improve replacing nocow extents
    Btrfs: drop dir i_size when adding new names on replay
    Btrfs: replay dir_index items before other items
    Btrfs: check roots last log commit when checking if an inode has been logged
    Btrfs: actually log directory we are fsync()'ing
    Btrfs: actually limit the size of delalloc range
    Btrfs: allocate the free space by the existed max extent size when ENOSPC
    ...

    Linus Torvalds
     

21 Sep, 2013

1 commit

  • With the current code, if the requested size is very large and all the
    extents in the free space cache are small, we waste a lot of CPU time
    cutting the requested size in half and searching the cache again and again
    until it gets down to a size the allocator can return. In fact, we know the
    max extent size in the cache after the first search, so we needn't halve
    the size repeatedly and can just use the max extent size directly. This
    saves a lot of CPU time and improves performance when there are only
    fragments in the free space cache.

    According to my test, if there are only 4KB free space extents in the fs,
    and the total size of those extents is 256MB, we can reduce the execution
    time of the following test from 5.4s to 1.4s.
    dd if=/dev/zero of= bs=1MB count=1 oflag=sync

    Changelog v2 -> v3:
    - fix the problem that we skip the block group with the space which is
    less than we need.

    Changelog v1 -> v2:
    - address the problem that we return a wrong start position when searching
    the free space in a bitmap.

    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Miao Xie
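
    The difference between the two search strategies described in this commit
    can be sketched in user space. This is a toy model, not the kernel code:
    the "cache" is a plain array of extent sizes, and search_cache(),
    alloc_by_halving() and alloc_by_max_extent() are hypothetical names
    introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how many times the (hypothetical) cache was scanned. */
struct cache_stats {
    int searches;
};

/*
 * Scan the cache once. Returns 1 if some extent holds at least `want` bytes.
 * Otherwise returns 0 and stores the largest extent seen in *max_extent.
 */
static int search_cache(const unsigned long long *extents, size_t n,
                        unsigned long long want,
                        unsigned long long *max_extent,
                        struct cache_stats *stats)
{
    size_t i;

    assert(stats != NULL && max_extent != NULL);
    stats->searches++;
    *max_extent = 0;
    for (i = 0; i < n; i++) {
        if (extents[i] >= want)
            return 1;
        if (extents[i] > *max_extent)
            *max_extent = extents[i];
    }
    return 0;
}

/* Old strategy: halve the request until a search succeeds. */
unsigned long long alloc_by_halving(const unsigned long long *extents,
                                    size_t n, unsigned long long want,
                                    struct cache_stats *stats)
{
    unsigned long long max;

    while (want > 0) {
        if (search_cache(extents, n, want, &max, stats))
            return want;
        want /= 2;
    }
    return 0;
}

/* New strategy: on failure, retry once with the max extent size. */
unsigned long long alloc_by_max_extent(const unsigned long long *extents,
                                       size_t n, unsigned long long want,
                                       struct cache_stats *stats)
{
    unsigned long long max, unused;

    if (search_cache(extents, n, want, &max, stats))
        return want;
    if (max == 0)
        return 0;               /* the cache is empty */
    search_cache(extents, n, max, &unused, stats);
    return max;
}
```

    With only 4KB extents and a 1MiB request, the halving variant scans the
    array once per power of two between 1MiB and 4KB, while the second
    variant scans it twice.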
     

13 Sep, 2013

2 commits

  • Merge more patches from Andrew Morton:
    "The rest of MM. Plus one misc cleanup"

    * emailed patches from Andrew Morton: (35 commits)
    mm/Kconfig: add MMU dependency for MIGRATION.
    kernel: replace strict_strto*() with kstrto*()
    mm, thp: count thp_fault_fallback anytime thp fault fails
    thp: consolidate code between handle_mm_fault() and do_huge_pmd_anonymous_page()
    thp: do_huge_pmd_anonymous_page() cleanup
    thp: move maybe_pmd_mkwrite() out of mk_huge_pmd()
    mm: cleanup add_to_page_cache_locked()
    thp: account anon transparent huge pages into NR_ANON_PAGES
    truncate: drop 'oldsize' truncate_pagecache() parameter
    mm: make lru_add_drain_all() selective
    memcg: document cgroup dirty/writeback memory statistics
    memcg: add per cgroup writeback pages accounting
    memcg: check for proper lock held in mem_cgroup_update_page_stat
    memcg: remove MEMCG_NR_FILE_MAPPED
    memcg: reduce function dereference
    memcg: avoid overflow caused by PAGE_ALIGN
    memcg: rename RESOURCE_MAX to RES_COUNTER_MAX
    memcg: correct RESOURCE_MAX to ULLONG_MAX
    mm: memcg: do not trap chargers with full callstack on OOM
    mm: memcg: rework and document OOM waiting and wakeup
    ...

    Linus Torvalds
     
  • truncate_pagecache() doesn't care about old size since commit
    cedabed49b39 ("vfs: Fix vmtruncate() regression"). Let's drop it.

    Signed-off-by: Kirill A. Shutemov
    Cc: OGAWA Hirofumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

01 Sep, 2013

4 commits

  • All of these are logic checks to make sure we're not breaking anything, so
    convert them over to ASSERT(). Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • u64 is "unsigned long long" on all architectures now, so there's no need to
    cast it when formatting it using the "ll" length modifier.

    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Geert Uytterhoeven
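
    A small user-space sketch of what this commit relies on. The typedef
    below is an assumption made for the example, mirroring the statement that
    u64 is "unsigned long long"; format_bytes() is a hypothetical helper, not
    a kernel function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption for this sketch: u64 defined as the commit message describes. */
typedef unsigned long long u64;

/*
 * Format a u64 with the "ll" length modifier and no cast.
 * Returns whatever snprintf() returns.
 */
int format_bytes(char *buf, size_t size, u64 bytes)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%llu bytes", bytes);
}
```

    Because the type already matches the format specifier, a cast such as
    (unsigned long long)bytes adds nothing.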
     
  • The plan is to have a bunch of unit tests that run when btrfs is loaded when you
    build with the appropriate config option. My ultimate goal is to have a test
    for every non-static function we have, but at first I'm going to focus on the
    things that cause us the most problems. To start out with this just adds a
    tests/ directory and moves the existing free space cache tests into that
    directory and sets up all of the infrastructure. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • I noticed while looking at a deadlock that we are always starting a transaction
    in cow_file_range(). This isn't really needed since we only need a transaction
    if we are doing an inline extent, or if the allocator needs to allocate a chunk.
    So push down all the transaction start stuff to be closer to where we actually
    need a transaction in all of these cases. This will hopefully reduce our write
    latency when we are committing often. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     

10 Jul, 2013

1 commit

  • Pull btrfs update from Chris Mason:
    "These are the usual mixture of bugs, cleanups and performance fixes.
    Miao has some really nice tuning of our crc code as well as our
    transaction commits.

    Josef is peeling off more and more problems related to early enospc,
    and has a number of important bug fixes in here too"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (81 commits)
    Btrfs: wait ordered range before doing direct io
    Btrfs: only do the tree_mod_log_free_eb if this is our last ref
    Btrfs: hold the tree mod lock in __tree_mod_log_rewind
    Btrfs: make backref walking code handle skinny metadata
    Btrfs: fix crash regarding to ulist_add_merge
    Btrfs: fix several potential problems in copy_nocow_pages_for_inode
    Btrfs: cleanup the code of copy_nocow_pages_for_inode()
    Btrfs: fix oops when recovering the file data by scrub function
    Btrfs: make the chunk allocator completely tree lockless
    Btrfs: cleanup orphaned root orphan item
    Btrfs: fix wrong mirror number tuning
    Btrfs: cleanup redundant code in btrfs_submit_direct()
    Btrfs: remove btrfs_sector_sum structure
    Btrfs: check if we can nocow if we don't have data space
    Btrfs: stop using try_to_writeback_inodes_sb_nr to flush delalloc
    Btrfs: use a percpu to keep track of possibly pinned bytes
    Btrfs: check for actual acls rather than just xattrs when caching no acl
    Btrfs: move btrfs_truncate_page to btrfs_cont_expand instead of btrfs_truncate
    Btrfs: optimize reada_for_balance
    Btrfs: optimize read_block_for_search
    ...

    Linus Torvalds
     

14 Jun, 2013

3 commits


28 May, 2013

1 commit


18 May, 2013

2 commits


07 May, 2013

5 commits

  • Big patch, but all it does is add statics to functions which
    are in fact static, then remove the associated dead-code fallout.

    removed functions:

    btrfs_iref_to_path()
    __btrfs_lookup_delayed_deletion_item()
    __btrfs_search_delayed_insertion_item()
    __btrfs_search_delayed_deletion_item()
    find_eb_for_page()
    btrfs_find_block_group()
    range_straddles_pages()
    extent_range_uptodate()
    btrfs_file_extent_length()
    btrfs_scrub_cancel_devid()
    btrfs_start_transaction_lflush()

    btrfs_print_tree() is left because it is used for debugging.
    btrfs_start_transaction_lflush() and btrfs_reada_detach() are
    left for symmetry.

    ulist.c functions are left, another patch will take care of those.

    Signed-off-by: Eric Sandeen
    Signed-off-by: Josef Bacik

    Eric Sandeen
     
  • So everybody who got hit by my fsync bug will still continue to hit this
    BUG_ON() in the free space cache, which is pretty heavy handed. So I took a
    file system that had this bug and fixed up all the BUG_ON()'s and leaks that
    popped up when I tried to mount a broken file system like this. With this patch
    we just fail to mount instead of panicking. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • With more than one btrfs volume mounted, it can be very difficult to find
    out which volume is hitting an error. btrfs_error() will print this, but
    it is currently rigged as more of a fatal error handler, while many of
    the printk()s are currently for debugging and yet-unhandled cases.

    This patch just changes the functions where the device information is
    already available. Some cases remain where the root or fs_info is not
    passed to the function emitting the error.

    This may introduce some confusion with volumes backed by multiple devices
    emitting errors referring to the primary device in the set instead of the
    one on which the error occurred.

    Use btrfs_printk(fs_info, format, ...) rather than writing the device
    string every time, and introduce macro wrappers ala XFS for brevity.
    Since the function already cannot be used for continuations, print a
    newline as part of the btrfs_printk() message rather than at each caller.

    Signed-off-by: Simon Kirby
    Reviewed-by: David Sterba
    Signed-off-by: Josef Bacik

    Simon Kirby
     
  • Argument 'root' is no longer used in btrfs_csum_data().

    Signed-off-by: Liu Bo
    Signed-off-by: Josef Bacik

    Liu Bo
     
  • We keep hitting bugs in the tree log replay because btrfs_remove_free_space
    doesn't account for some corner case. So add a bunch of tests to try and fully
    test btrfs_remove_free_space since the only time it is called is during tree log
    replay. These tests all finish successfully, so as we find more of these bugs
    we need to add to these tests to make sure we don't regress in fixing things.
    I've hidden the tests behind a Kconfig option, but they take no time to run so
    all btrfs developers should have this turned on all the time. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     

21 Feb, 2013

2 commits

  • Signed-off-by: Chris Mason

    Conflicts:
    fs/btrfs/ctree.h
    fs/btrfs/extent-tree.c
    fs/btrfs/inode.c
    fs/btrfs/volumes.c

    Chris Mason
     
  • Dave pointed out that xfstests 273 will tell you that it failed to load the
    space cache for a block group when it remounts. This is because we run out
    of space writing out the block group cache. This is ok and is working as it
    should, but let's try to be a bit nicer. This happens because the block
    group was 100MB, but bitmap entries cover 128MB, so we were only getting
    extent entries for this block group, which ended up being too many to fit in
    the free space cache. So relax the bitmap size requirements to block groups
    that are at least half the size a bitmap will cover or larger, that way we
    can still keep the amount of space used in the free space cache low enough
    to be able to write it out. With this patch I no longer fail to write out
    the free space cache. Thanks,

    Reported-by: David Sterba
    Signed-off-by: Josef Bacik

    Josef Bacik
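
    The size check described in this commit can be sketched as follows. The
    128MB coverage figure comes from the commit message; the two function
    names and the exact form of the strict rule are assumptions made for
    illustration, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

#define MIB (1024ULL * 1024ULL)

/* From the commit message: one bitmap entry covers 128MB. */
#define BYTES_PER_BITMAP (128ULL * MIB)

/* Assumed strict rule: the block group must span a full bitmap. */
bool bitmap_allowed_strict(unsigned long long block_group_size)
{
    assert(block_group_size > 0);
    return block_group_size >= BYTES_PER_BITMAP;
}

/* Relaxed rule: at least half of what a bitmap covers is enough. */
bool bitmap_allowed_relaxed(unsigned long long block_group_size)
{
    assert(block_group_size > 0);
    return block_group_size >= BYTES_PER_BITMAP / 2;
}
```

    Under these assumptions a 100MB block group is rejected by the strict
    rule but accepted by the relaxed one, since 100MB is above the 64MB
    threshold.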
     

05 Feb, 2013

1 commit


02 Feb, 2013

1 commit

  • This builds on David Woodhouse's original Btrfs raid5/6 implementation.
    The code has changed quite a bit, blame Chris Mason for any bugs.

    Read/modify/write is done after the higher levels of the filesystem have
    prepared a given bio. This means the higher layers are not responsible
    for building full stripes, and they don't need to query for the topology
    of the extents that may get allocated during delayed allocation runs.
    It also means different files can easily share the same stripe.

    But, it does expose us to incorrect parity if we crash or lose power
    while doing a read/modify/write cycle. This will be addressed in a
    later commit.

    Scrub is unable to repair crc errors on raid5/6 chunks.

    Discard does not work on raid5/6 (yet)

    The stripe size is fixed at 64KiB per disk. This will be tunable
    in a later commit.

    Signed-off-by: Chris Mason

    David Woodhouse
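
    The read/modify/write cycle mentioned in this commit can be illustrated
    with plain XOR parity, as used for raid5. This is a generic sketch, not
    the btrfs implementation: the flat buffer layout and the function names
    are assumptions, and the second raid6 parity block is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Compute parity from scratch. `data` holds ndisks blocks of `len` bytes
 * each, laid out back to back (an assumption made for this sketch).
 */
void compute_parity(const unsigned char *data, size_t ndisks, size_t len,
                    unsigned char *parity)
{
    size_t d, i;

    assert(data != NULL && parity != NULL);
    for (i = 0; i < len; i++) {
        parity[i] = 0;
        for (d = 0; d < ndisks; d++)
            parity[i] ^= data[d * len + i];
    }
}

/*
 * Read/modify/write update: fold the old block out of the parity and the
 * new block in, without touching the other data blocks.
 */
void rmw_update_parity(unsigned char *parity, const unsigned char *old_data,
                       const unsigned char *new_data, size_t len)
{
    size_t i;

    assert(parity != NULL && old_data != NULL && new_data != NULL);
    for (i = 0; i < len; i++)
        parity[i] ^= old_data[i] ^ new_data[i];
}
```

    The data block and the parity block are written separately, so a crash
    between the two writes leaves them inconsistent. That is the exposure the
    commit message refers to.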
     

25 Jan, 2013

1 commit

  • A user reported a BUG_ON(ret) that occurred during tree log replay. Ret was
    -EAGAIN, so what I think happened is that we removed an extent that covered
    a bitmap entry and an extent entry. We remove the part from the bitmap and
    return -EAGAIN and then search for the next piece we want to remove, which
    happens to be an entire extent entry, so we just free the sucker and return.
    The problem is ret is still set to -EAGAIN so we trip the BUG_ON(). The
    user used btrfs-zero-log so I'm not 100% sure this is what happened so I've
    added a WARN_ON() to catch the other possibility. Thanks,

    Reported-by: Jan Steffens
    Signed-off-by: Josef Bacik

    Josef Bacik
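
    The stale return value described in this commit can be reproduced in a
    toy model. Nothing below is kernel code: the entry layout, remove_range()
    and its reset_ret switch are hypothetical and exist only to show how an
    -EAGAIN from the bitmap step can survive a later, successful extent
    removal.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum entry_kind { ENTRY_BITMAP, ENTRY_EXTENT };

struct entry {
    enum entry_kind kind;
    unsigned long long start;
    unsigned long long len;
};

/*
 * Remove [offset, offset + bytes) from entries sorted by start.
 * reset_ret == 0 models the bug: ret keeps the -EAGAIN set by the bitmap
 * step. reset_ret != 0 models a fix that clears it after freeing from an
 * extent entry.
 */
int remove_range(const struct entry *entries, size_t n,
                 unsigned long long offset, unsigned long long bytes,
                 int reset_ret)
{
    int ret = 0;
    size_t i;

    assert(entries != NULL);
    for (i = 0; i < n && bytes > 0; i++) {
        const struct entry *e = &entries[i];
        unsigned long long end = e->start + e->len;
        unsigned long long chunk;

        if (offset < e->start || offset >= end)
            continue;           /* entry does not contain the offset */

        chunk = end - offset;
        if (chunk > bytes)
            chunk = bytes;
        offset += chunk;
        bytes -= chunk;

        if (e->kind == ENTRY_BITMAP)
            ret = bytes > 0 ? -EAGAIN : 0;  /* more to remove elsewhere */
        else if (reset_ret)
            ret = 0;            /* extent step succeeded */
    }
    return ret;
}
```

    With a bitmap entry covering [0, 100) followed by an extent entry
    covering [100, 150), removing 90 bytes at offset 60 finishes all the work
    in both variants, but only the variant that resets ret reports success.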
     

17 Dec, 2012

2 commits


12 Dec, 2012

1 commit

  • When we find a bitmap free space entry, we may check whether the previous
    extent entry covers the offset. But if that previous entry is also a bitmap
    entry, we keep checking the entry before it in a while loop. This is
    unnecessary because an extent entry in front of a bitmap entry cannot
    cover the offset of the entry after that bitmap entry.

    Signed-off-by: Miao Xie
    Reviewed-by: Liu Bo
    Signed-off-by: Chris Mason

    Miao Xie
     

09 Oct, 2012

1 commit

  • Every time we write out dirty pages we search for an offset in the tree,
    convert the bits in the state, and then when we wait we search for the
    offset again and clear the bits. So for every dirty range in the io tree we
    are doing 4 rb searches, which is suboptimal. With this patch we are only
    doing 2 searches for every cycle (modulo weird things happening). Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     

04 Oct, 2012

1 commit


24 Jul, 2012

1 commit


06 Jul, 2012

1 commit

  • Pull btrfs updates from Chris Mason:
    "I held off on my rc5 pull because I hit an oops during log recovery
    after a crash. I wanted to make sure it wasn't a regression because
    we have some logging fixes in here.

    It turns out that a commit during the merge window just made it much
    more likely to trigger directory logging instead of full commits,
    which exposed an old bug.

    The new backref walking code got some additional fixes. This should
    be the final set of them.

    Josef fixed up a corner where our O_DIRECT writes and buffered reads
    could expose old file contents (not stale, just not the most recent).
    He and Liu Bo fixed crashes during tree log recovery as well.

    Ilya fixed errors while we resume disk balancing operations on
    readonly mounts."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: run delayed directory updates during log replay
    Btrfs: hold a ref on the inode during writepages
    Btrfs: fix tree log remove space corner case
    Btrfs: fix wrong check during log recovery
    Btrfs: use _IOR for BTRFS_IOC_SUBVOL_GETFLAGS
    Btrfs: resume balance on rw (re)mounts properly
    Btrfs: restore restriper state on all mounts
    Btrfs: fix dio write vs buffered read race
    Btrfs: don't count I/O statistic read errors for missing devices
    Btrfs: resolve tree mod log locking issue in btrfs_next_leaf
    Btrfs: fix tree mod log rewind of ADD operations
    Btrfs: leave critical region in btrfs_find_all_roots as soon as possible
    Btrfs: always put insert_ptr modifications into the tree mod log
    Btrfs: fix tree mod log for root replacements at leaf level
    Btrfs: support root level changes in __resolve_indirect_ref
    Btrfs: avoid waiting for delayed refs when we must not

    Linus Torvalds
     

03 Jul, 2012

1 commit

  • The tree log stuff can have allocated space that we end up having split
    across a bitmap and a real extent. The free space code does not deal with
    this, it assumes that if it finds an extent or bitmap entry that the entire
    range must fall within the entry it finds. This isn't necessarily the case,
    so rework the remove function so it can handle this case properly. This
    fixed two panics the user hit, first in the case where the space was
    initially in a bitmap and then in an extent entry, and then the reverse
    case. Thanks,

    Reported-and-tested-by: Shaun Reich
    Signed-off-by: Josef Bacik

    Josef Bacik
     

02 Jun, 2012

1 commit

  • Pull vfs changes from Al Viro.
    "A lot of misc stuff. The obvious groups:
    * Miklos' atomic_open series; kills the damn abuse of
    ->d_revalidate() by NFS, which was the major stumbling block for
    all work in that area.
    * ripping security_file_mmap() and dealing with deadlocks in the
    area; sanitizing the neighborhood of vm_mmap()/vm_munmap() in
    general.
    * ->encode_fh() switched to saner API; insane fake dentry in
    mm/cleancache.c gone.
    * assorted annotations in fs (endianness, __user)
    * parts of Artem's ->s_dirty work (jffs2 and reiserfs parts)
    * ->update_time() work from Josef.
    * other bits and pieces all over the place.

    Normally it would've been in two or three pull requests, but
    signal.git stuff had eaten a lot of time during this cycle ;-/"

    Fix up trivial conflicts in Documentation/filesystems/vfs.txt (the
    'truncate_range' inode method was removed by the VM changes, the VFS
    update adds an 'update_time()' method), and in fs/btrfs/ulist.[ch] (due
    to sparse fix added twice, with other changes nearby).

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (95 commits)
    nfs: don't open in ->d_revalidate
    vfs: retry last component if opening stale dentry
    vfs: nameidata_to_filp(): don't throw away file on error
    vfs: nameidata_to_filp(): inline __dentry_open()
    vfs: do_dentry_open(): don't put filp
    vfs: split __dentry_open()
    vfs: do_last() common post lookup
    vfs: do_last(): add audit_inode before open
    vfs: do_last(): only return EISDIR for O_CREAT
    vfs: do_last(): check LOOKUP_DIRECTORY
    vfs: do_last(): make ENOENT exit RCU safe
    vfs: make follow_link check RCU safe
    vfs: do_last(): use inode variable
    vfs: do_last(): inline walk_component()
    vfs: do_last(): make exit RCU safe
    vfs: split do_lookup()
    Btrfs: move over to use ->update_time
    fs: introduce inode operation ->update_time
    reiserfs: get rid of resierfs_sync_super
    reiserfs: mark the superblock as dirty a bit later
    ...

    Linus Torvalds
     

30 May, 2012

3 commits

  • When we write out the free space cache we will write out everything that is
    in our in memory tree, and then we will just walk the pinned extents tree
    and write anything we see there. The problem with this is that during
    normal operations the pinned extents will be merged back into the free space
    tree normally, and then we can allocate space from the merged areas and
    commit them to the tree log. If we crash and replay the tree log we will
    crash again because the tree log will try to free up space from what looks
    like 2 separate but contiguous entries, since one entry is from the original
    free space cache and the other was a pinned extent that was merged back. To
    fix this we just need to walk the free space tree after we load it and merge
    contiguous entries back together. This will keep the tree log stuff from
    breaking and it will make the allocator behave more nicely. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
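
    The merge pass described in this commit (walk the free space entries
    after loading and join contiguous ones) can be sketched on a sorted
    array. This is an illustration under stated assumptions, not the kernel
    code: struct free_extent and merge_contiguous() are hypothetical, the
    input must be sorted by offset and non-overlapping, and bitmap entries
    are not modelled.

```c
#include <assert.h>
#include <stddef.h>

struct free_extent {
    unsigned long long offset;
    unsigned long long bytes;
};

/*
 * Merge entries that touch each other, in place.
 * Returns the number of entries left after merging.
 */
size_t merge_contiguous(struct free_extent *e, size_t n)
{
    size_t out = 0;
    size_t i;

    if (n == 0)
        return 0;
    for (i = 1; i < n; i++) {
        /* Precondition: sorted by offset and non-overlapping. */
        assert(e[i].offset >= e[out].offset + e[out].bytes);
        if (e[out].offset + e[out].bytes == e[i].offset)
            e[out].bytes += e[i].bytes;     /* contiguous: extend */
        else
            e[++out] = e[i];                /* gap: start a new entry */
    }
    return out + 1;
}
```

    After such a pass, a range that spans what used to be two adjacent
    entries is found inside a single entry.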
     
  • We noticed that the ordered extent completion doesn't really rely on having
    a page and that it could be done independently of ending the writeback on a
    page. This patch makes us not do the threaded endio stuff for normal
    buffered writes and direct writes so we can end page writeback as soon as
    possible (in irq context) and only start threads to do the ordered work when
    it is actually done. Compression needs to be reworked some to take
    advantage of this as well, but atm it has to do a find_get_page in its endio
    handler so it must be done in its own thread. This makes direct writes
    quite a bit faster. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • Signed-off-by: Al Viro

    Al Viro
     

14 Apr, 2012

1 commit

  • Pull the minimal btrfs branch from Chris Mason:
    "We have a use-after-free in there, along with errors when mount -o
    discard is enabled, and a BUG_ON (we should compile with UP more
    often)."

    * 'for-linus-min' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: use commit root when loading free space cache
    Btrfs: fix use-after-free in __btrfs_end_transaction
    Btrfs: check return value of bio_alloc() properly
    Btrfs: remove lock assert from get_restripe_target()
    Btrfs: fix eof while discarding extents
    Btrfs: fix uninit variable in repair_eb_io_failure
    Revert "Btrfs: increase the global block reserve estimates"

    Linus Torvalds
     

13 Apr, 2012

1 commit

  • A user reported that booting his box up with btrfs root on 3.4 was way
    slower than on 3.3 because I removed the ideal caching code. It turns out
    that we don't load the free space cache if we're in a commit for deadlock
    reasons, but since we're reading the cache and it hasn't changed yet we are
    safe reading the inode and free space item from the commit root, so do that
    and remove all of the deadlock checks so we don't unnecessarily skip loading
    the free space cache. The user reported this fixed the slowness. Thanks,

    Tested-by: Calvin Walton
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     

31 Mar, 2012

1 commit

  • Pull btrfs fixes and features from Chris Mason:
    "We've merged in the error handling patches from SuSE. These are
    already shipping in the sles kernel, and they give btrfs the ability
    to abort transactions and go readonly on errors. It involves a lot of
    churn as they clarify BUG_ONs, and remove the ones we now properly
    deal with.

    Josef reworked the way our metadata interacts with the page cache.
    page->private now points to the btrfs extent_buffer object, which
    makes everything faster. He changed it so we write a whole extent
    buffer at a time instead of allowing individual pages to go down,
    which will be important for the raid5/6 code (for the 3.5 merge
    window ;)

    Josef also made us more aggressive about dropping pages for metadata
    blocks that were freed due to COW. Overall, our metadata caching is
    much faster now.

    We've integrated my patch for metadata bigger than the page size.
    This allows metadata blocks up to 64KB in size. In practice 16K and
    32K seem to work best. For workloads with lots of metadata, this cuts
    down the size of the extent allocation tree dramatically and fragments
    much less.

    Scrub was updated to support the larger block sizes, which ended up
    being a fairly large change (thanks Stefan Behrens).

    We also have an assortment of fixes and updates, especially to the
    balancing code (Ilya Dryomov), the back ref walker (Jan Schmidt) and
    the defragging code (Liu Bo)."

    Fixed up trivial conflicts in fs/btrfs/scrub.c that were just due to
    removal of the second argument to k[un]map_atomic() in commit
    7ac687d9e047.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (75 commits)
    Btrfs: update the checks for mixed block groups with big metadata blocks
    Btrfs: update to the right index of defragment
    Btrfs: do not bother to defrag an extent if it is a big real extent
    Btrfs: add a check to decide if we should defrag the range
    Btrfs: fix recursive defragment with autodefrag option
    Btrfs: fix the mismatch of page->mapping
    Btrfs: fix race between direct io and autodefrag
    Btrfs: fix deadlock during allocating chunks
    Btrfs: show useful info in space reservation tracepoint
    Btrfs: don't use crc items bigger than 4KB
    Btrfs: flush out and clean up any block device pages during mount
    btrfs: disallow unequal data/metadata blocksize for mixed block groups
    Btrfs: enhance superblock sanity checks
    Btrfs: change scrub to support big blocks
    Btrfs: minor cleanup in scrub
    Btrfs: introduce common define for max number of mirrors
    Btrfs: fix infinite loop in btrfs_shrink_device()
    Btrfs: fix memory leak in resolver code
    Btrfs: allow dup for data chunks in mixed mode
    Btrfs: validate target profiles only if we are going to use them
    ...

    Linus Torvalds