25 Feb, 2012

1 commit

  • Quoth Chris:
    "This is later than I wanted because I got backed up running through
    btrfs bugs from the Oracle QA teams. But they are all bug fixes that
    we've queued and tested since rc1.

    Nothing in particular stands out, this just reflects bug fixing and QA
    done in parallel by all the btrfs developers. The most user visible
    of these is:

    Btrfs: clear the extent uptodate bits during parent transid failures

    Because that helps deal with out of date drives (say an iscsi disk
    that has gone away and come back). The old code wasn't always
    properly retrying the other mirror for this type of failure."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (24 commits)
    Btrfs: fix compiler warnings on 32 bit systems
    Btrfs: increase the global block reserve estimates
    Btrfs: clear the extent uptodate bits during parent transid failures
    Btrfs: add extra sanity checks on the path names in btrfs_mksubvol
    Btrfs: make sure we update latest_bdev
    Btrfs: improve error handling for btrfs_insert_dir_item callers
    Btrfs: be less strict on finding next node in clear_extent_bit
    Btrfs: fix a bug on overcommit stuff
    Btrfs: kick out redundant stuff in convert_extent_bit
    Btrfs: skip states when they does not contain bits to clear
    Btrfs: check return value of lookup_extent_mapping() correctly
    Btrfs: fix deadlock on page lock when doing auto-defragment
    Btrfs: fix return value check of extent_io_ops
    btrfs: honor umask when creating subvol root
    btrfs: silence warning in raid array setup
    btrfs: fix structs where bitfields and spinlock/atomic share 8B word
    btrfs: delalloc for page dirtied out-of-band in fixup worker
    Btrfs: fix memory leak in load_free_space_cache()
    btrfs: don't check DUP chunks twice
    Btrfs: fix trim 0 bytes after a device delete
    ...

    Linus Torvalds
     

15 Feb, 2012

1 commit


01 Feb, 2012

1 commit


18 Jan, 2012

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (62 commits)
    Btrfs: use larger system chunks
    Btrfs: add a delalloc mutex to inodes for delalloc reservations
    Btrfs: space leak tracepoints
    Btrfs: protect orphan block rsv with spin_lock
    Btrfs: add allocator tracepoints
    Btrfs: don't call btrfs_throttle in file write
    Btrfs: release space on error in page_mkwrite
    Btrfs: fix btrfsck error 400 when truncating a compressed
    Btrfs: do not use btrfs_end_transaction_throttle everywhere
    Btrfs: add balance progress reporting
    Btrfs: allow for resuming restriper after it was paused
    Btrfs: allow for canceling restriper
    Btrfs: allow for pausing restriper
    Btrfs: add skip_balance mount option
    Btrfs: recover balance on mount
    Btrfs: save balance parameters to disk
    Btrfs: soft profile changing mode (aka soft convert)
    Btrfs: implement online profile changing
    Btrfs: do not reduce profile in do_chunk_alloc()
    Btrfs: virtual address space subset filter
    ...

    Fix up trivial conflict in fs/btrfs/ioctl.c due to the use of the new
    mnt_drop_write_file() helper.

    Linus Torvalds
     

17 Jan, 2012

2 commits


11 Jan, 2012

2 commits

  • * 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
    writeback: move MIN_WRITEBACK_PAGES to fs-writeback.c
    writeback: balanced_rate cannot exceed write bandwidth
    writeback: do strict bdi dirty_exceeded
    writeback: avoid tiny dirty poll intervals
    writeback: max, min and target dirty pause time
    writeback: dirty ratelimit - think time compensation
    btrfs: fix dirtied pages accounting on sub-page writes
    writeback: fix dirtied pages accounting on redirty
    writeback: fix dirtied pages accounting on sub-page writes
    writeback: charge leaked page dirties to active tasks
    writeback: Include all dirty inodes in background writeback

    Linus Torvalds
     
  • Tell the page allocator that pages allocated for a buffered write are
    expected to become dirty soon.

    Signed-off-by: Johannes Weiner
    Reviewed-by: Rik van Riel
    Acked-by: Mel Gorman
    Cc: Minchan Kim
    Cc: Michal Hocko
    Cc: KAMEZAWA Hiroyuki
    Cc: Christoph Hellwig
    Cc: Wu Fengguang
    Cc: Dave Chinner
    Cc: Jan Kara
    Cc: Shaohua Li
    Cc: Chris Mason
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

22 Dec, 2011

1 commit

  • Add a for_cow parameter to add_delayed_*_ref and pass the appropriate value
    from every call site. The for_cow parameter will later on be used to
    determine if a ref will change anything with respect to qgroups.

    Delayed refs coming from relocation are always counted as for_cow, as they
    don't change subvol quota.

    Also pass in the fs_info for later use.

    btrfs_find_all_roots() will use this as an optimization, as changes that are
    for_cow will not change anything with respect to which root points to a
    certain leaf. Thus, we don't need to add the current sequence number to
    those delayed refs.

    Signed-off-by: Arne Jansen
    Signed-off-by: Jan Schmidt

    Arne Jansen
     

18 Dec, 2011

1 commit

  • When doing 1KB sequential writes to the same page,
    balance_dirty_pages_ratelimited_nr() should be called once instead of 4
    times, the latter makes the dirtier tasks be throttled much too heavy.

    Fix it with proper de-accounting on clear_page_dirty_for_io().

    CC: Chris Mason
    Signed-off-by: Wu Fengguang

    Wu Fengguang
     

17 Dec, 2011

2 commits

  • …inux/kernel/git/mason/linux-btrfs

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: unplug every once and a while
    Btrfs: deal with NULL srv_rsv in the delalloc inode reservation code
    Btrfs: only set cache_generation if we setup the block group
    Btrfs: don't panic if orphan item already exists
    Btrfs: fix leaked space in truncate
    Btrfs: fix how we do delalloc reservations and how we free reservations on error
    Btrfs: deal with enospc from dirtying inodes properly
    Btrfs: fix num_workers_starting bug and other bugs in async thread
    BTRFS: Establish i_ops before calling d_instantiate
    Btrfs: add a cond_resched() into the worker loop
    Btrfs: fix ctime update of on-disk inode
    btrfs: keep orphans for subvolume deletion
    Btrfs: fix inaccurate available space on raid0 profile
    Btrfs: fix wrong disk space information of the files
    Btrfs: fix wrong i_size when truncating a file to a larger size
    Btrfs: fix btrfs_end_bio to deal with write errors to a single mirror

    * 'for-linus-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    btrfs: lower the dirty balance poll interval

    Linus Torvalds
     
  • Tests show that the original large intervals can easily make the dirty
    limit exceeded on 100 concurrent dd's. So adapt to as large as the
    next check point selected by the dirty throttling algorithm.

    Signed-off-by: Wu Fengguang
    Signed-off-by: Chris Mason

    Wu Fengguang
     

16 Dec, 2011

1 commit

  • Now that we're properly keeping track of delayed inode space we've been getting
    a lot of warnings out of btrfs_dirty_inode() when running xfstest 83. This is
    because a bunch of people call mark_inode_dirty, which is void so we can't
    return ENOSPC. This needs to be fixed in a few areas

    1) file_update_time - this updates the mtime and such when writing to a file,
    which will call mark_inode_dirty. So copy file_update_time into btrfs so we can
    call btrfs_dirty_inode directly and return an error if we get one appropriately.

    2) fix symlinks to use btrfs_setattr for ->setattr. For some reason we weren't
    setting ->setattr for symlinks, even though we should have been. This catches
    one of the cases where we were getting errors in mark_inode_dirty.

    3) Fix btrfs_setattr and btrfs_setsize to call btrfs_dirty_inode directly
    instead of mark_inode_dirty. This lets us return errors properly for truncate
    and chown/anything related to setattr.

    4) Add a new btrfs_fs_dirty_inode which will just call btrfs_dirty_inode and
    print an error if we have one. The only remaining user we can't control for
    this is touch_atime(), but we don't really want to keep people from walking
    down the tree if we don't have space to save the atime update, so just complain
    but don't worry about it.

    With this patch xfstests 83 complains a handful of times instead of hundreds of
    times. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     

07 Nov, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (114 commits)
    Btrfs: check for a null fs root when writing to the backup root log
    Btrfs: fix race during transaction joins
    Btrfs: fix a potential btrfs_bio leak on scrub fixups
    Btrfs: rename btrfs_bio multi -> bbio for consistency
    Btrfs: stop leaking btrfs_bios on readahead
    Btrfs: stop the readahead threads on failed mount
    Btrfs: fix extent_buffer leak in the metadata IO error handling
    Btrfs: fix the new inspection ioctls for 32 bit compat
    Btrfs: fix delayed insertion reservation
    Btrfs: ClearPageError during writepage and clean_tree_block
    Btrfs: be smarter about committing the transaction in reserve_metadata_bytes
    Btrfs: make a delayed_block_rsv for the delayed item insertion
    Btrfs: add a log of past tree roots
    btrfs: separate superblock items out of fs_info
    Btrfs: use the global reserve when truncating the free space cache inode
    Btrfs: release metadata from global reserve if we have to fallback for unlink
    Btrfs: make sure to flush queued bios if write_cache_pages waits
    Btrfs: fix extent pinning bugs in the tree log
    Btrfs: make sure btrfs_remove_free_space doesn't leak EAGAIN
    Btrfs: don't wait as long for more batches during SSD log commit
    ...

    Linus Torvalds
     

28 Oct, 2011

1 commit

  • The i_mutex lock use of generic _file_llseek hurts. Independent processes
    accessing the same file synchronize over a single lock, even though
    they have no need for synchronization at all.

    Under high utilization this can cause llseek to scale very poorly on larger
    systems.

    This patch does some rethinking of the llseek locking model:

    First the 64bit f_pos is not necessarily atomic without locks
    on 32bit systems. This can already cause races with read() today.
    This was discussed on linux-kernel in the past and deemed acceptable.
    The patch does not change that.

    Let's look at the different seek variants:

    SEEK_SET: Doesn't really need any locking.
    If there's a race one writer wins, the other loses.

    For 32bit the non atomic update races against read()
    stay the same. Without a lock they can also happen
    against write() now. The read() race was deemed
    acceptable in past discussions, and I think if it's
    ok for read it's ok for write too.

    => Don't need a lock.

    SEEK_END: This behaves like SEEK_SET plus it reads
    the maximum size too. Reading the maximum size would have the
    32bit atomic problem. But luckily we already have a way to read
    the maximum size without locking (i_size_read), so we
    can just use that instead.

    Without i_mutex there is no synchronization with write() anymore,
    however since the write() update is atomic on 64bit it just behaves
    like another racy SEEK_SET. On non atomic 32bit it's the same
    as SEEK_SET.

    => Don't need a lock, but need to use i_size_read()

    SEEK_CUR: This has a read-modify-write race window
    on the same file. One could argue that any application
    doing unsynchronized seeks on the same file is already broken.
    But for the sake of not adding a regression here I'm
    using the file->f_lock to synchronize this. Using this
    lock is much better than the inode mutex because it doesn't
    synchronize between processes.

    => So still need a lock, but can use a f_lock.

    This patch implements this new scheme in generic_file_llseek.
    I dropped generic_file_llseek_unlocked and changed all callers.

    Signed-off-by: Andi Kleen
    Signed-off-by: Christoph Hellwig

    Andi Kleen
     

20 Oct, 2011

2 commits

  • Johannes pointed out we were allocating only kernel pages for doing writes,
    which is kind of a big deal if you are on 32bit and have more than a gig of ram.
    So fix our allocations to use the mapping's gfp but still clear __GFP_FS so we
    don't re-enter. Thanks,

    Reported-by: Johannes Weiner
    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • Lukas found a problem where if he tries to fallocate over the same region twice
    and the first fallocate took up all the space we would fail with ENOSPC. This
    is because we reserve the total space we want to use for fallocate, regardless
    of wether or not we will have to actually preallocate. So instead move the
    check into the loop where we actually have to do the preallocate. Thanks,

    Tested-by: Lukas Czerner
    Signed-off-by: Josef Bacik

    Josef Bacik
     

04 Oct, 2011

1 commit


01 Oct, 2011

1 commit

  • A user reported a problem where ceph was getting into 100% cpu usage while doing
    some writing. It turns out it's because we were doing a short write on a not
    uptodate page, which means we'd fall back at one page at a time and fault the
    page in. The problem is our position is on the page boundary, so our fault in
    logic wasn't actually reading the page, so we'd just spin forever or until the
    page got read in by somebody else. This will force a readpage if we end up
    doing a short copy. Alexandre could reproduce this easily with ceph and reports
    it fixes his problem. I also wrote a reproducer that no longer hangs my box
    with this patch. Thanks,

    Reported-and-tested-by: Alexandre Oliva
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     

18 Sep, 2011

1 commit

  • The recent reworking of btrfs' lseek lead to incorrect
    values being returned. This adds checks for seeking
    beyond EOF in SEEK_HOLE and makes sure the error
    values come back correct.

    Andi Kleen also sent in similar patches.

    Signed-off-by: Jie Liu
    Reported-by: Andi Kleen
    Signed-off-by: Chris Mason

    Jeff Liu
     

13 Sep, 2011

1 commit

  • * 'for-linus' of git://github.com/chrismason/linux:
    Btrfs: add dummy extent if dst offset excceeds file end in
    Btrfs: calc file extent num_bytes correctly in file clone
    btrfs: xattr: fix attribute removal
    Btrfs: fix wrong nbytes information of the inode
    Btrfs: fix the file extent gap when doing direct IO
    Btrfs: fix unclosed transaction handle in btrfs_cont_expand
    Btrfs: fix misuse of trans block rsv
    Btrfs: reset to appropriate block rsv after orphan operations
    Btrfs: skip locking if searching the commit root in csum lookup
    btrfs: fix warning in iput for bad-inode
    Btrfs: fix an oops when deleting snapshots

    Linus Torvalds
     

11 Sep, 2011

1 commit

  • When we write some data to the place that is beyond the end of the file
    in direct I/O mode, a data hole will be created. And Btrfs should insert
    a file extent item that point to this hole into the fs tree. But unfortunately
    Btrfs forgets doing it.

    The following is a simple way to reproduce it:
    # mkfs.btrfs /dev/sdc2
    # mount /dev/sdc2 /test4
    # touch /test4/a
    # dd if=/dev/zero of=/test4/a seek=8 count=1 bs=4K oflag=direct conv=nocreat,notrunc
    # umount /test4
    # btrfsck /dev/sdc2
    root 5 inode 257 errors 100

    Reported-by: Tsutomu Itoh
    Signed-off-by: Miao Xie
    Tested-by: Tsutomu Itoh
    Signed-off-by: Chris Mason

    Miao Xie
     

18 Aug, 2011

3 commits


17 Aug, 2011

1 commit


03 Aug, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (31 commits)
    Btrfs: don't call writepages from within write_full_page
    Btrfs: Remove unused variable 'last_index' in file.c
    Btrfs: clean up for find_first_extent_bit()
    Btrfs: clean up for wait_extent_bit()
    Btrfs: clean up for insert_state()
    Btrfs: remove unused members from struct extent_state
    Btrfs: clean up code for merging extent maps
    Btrfs: clean up code for extent_map lookup
    Btrfs: clean up search_extent_mapping()
    Btrfs: remove redundant code for dir item lookup
    Btrfs: make acl functions really no-op if acl is not enabled
    Btrfs: remove remaining ref-cache code
    Btrfs: remove a BUG_ON() in btrfs_commit_transaction()
    Btrfs: use wait_event()
    Btrfs: check the nodatasum flag when writing compressed files
    Btrfs: copy string correctly in INO_LOOKUP ioctl
    Btrfs: don't print the leaf if we had an error
    btrfs: make btrfs_set_root_node void
    Btrfs: fix oops while writing data to SSD partitions
    Btrfs: Protect the readonly flag of block group
    ...

    Fix up trivial conflicts (due to acl and writeback cleanups) in
    - fs/btrfs/acl.c
    - fs/btrfs/ctree.h
    - fs/btrfs/extent_io.c

    Linus Torvalds
     

02 Aug, 2011

3 commits


28 Jul, 2011

3 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
    Btrfs: make sure reserve_metadata_bytes doesn't leak out strange errors
    Btrfs: use the commit_root for reading free_space_inode crcs
    Btrfs: reduce extent_state lock contention for metadata
    Btrfs: remove lockdep magic from btrfs_next_leaf
    Btrfs: make a lockdep class for each root
    Btrfs: switch the btrfs tree locks to reader/writer
    Btrfs: fix deadlock when throttling transactions
    Btrfs: stop using highmem for extent_buffers
    Btrfs: fix BUG_ON() caused by ENOSPC when relocating space
    Btrfs: tag pages for writeback in sync
    Btrfs: fix enospc problems with delalloc
    Btrfs: don't flush delalloc arbitrarily
    Btrfs: use find_or_create_page instead of grab_cache_page
    Btrfs: use a worker thread to do caching
    Btrfs: fix how we merge extent states and deal with cached states
    Btrfs: use the normal checksumming infrastructure for free space cache
    Btrfs: serialize flushers in reserve_metadata_bytes
    Btrfs: do transaction space reservation before joining the transaction
    Btrfs: try to only do one btrfs_search_slot in do_setxattr

    Linus Torvalds
     
  • So I had this brilliant idea to use atomic counters for outstanding and reserved
    extents, but this turned out to be a bad idea. Consider this where we have 1
    outstanding extent and 1 reserved extent

    Reserver Releaser
    atomic_dec(outstanding) now 0
    atomic_read(outstanding)+1 get 1
    atomic_read(reserved) get 1
    don't actually reserve anything because
    they are the same
    atomic_cmpxchg(reserved, 1, 0)
    atomic_inc(outstanding)
    atomic_add(0, reserved)
    free reserved space for 1 extent

    Then the reserver now has no actual space reserved for it, and when it goes to
    finish the ordered IO it won't have enough space to do it's allocation and you
    get those lovely warnings.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • grab_cache_page will use mapping_gfp_mask(), which for all inodes is set to
    GFP_HIGHUSER_MOVABLE. So instead use find_or_create_page in all cases where we
    need GFP_NOFS so we don't deadlock. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     

21 Jul, 2011

2 commits

  • Btrfs needs to be able to control how filemap_write_and_wait_range() is called
    in fsync to make it less of a painful operation, so push down taking i_mutex and
    the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some
    file systems can drop taking the i_mutex altogether it seems, like ext3 and
    ocfs2. For correctness sake I just pushed everything down in all cases to make
    sure that we keep the current behavior the same for everybody, and then each
    individual fs maintainer can make up their mind about what to do from there.
    Thanks,

    Acked-by: Jan Kara
    Signed-off-by: Josef Bacik
    Signed-off-by: Al Viro

    Josef Bacik
     
  • In order to handle SEEK_HOLE/SEEK_DATA we need to implement our own llseek.
    Basically for the normal SEEK_*'s we will just defer to the generic helper, and
    for SEEK_HOLE/SEEK_DATA we will use our fiemap helper to figure out the nearest
    hole or data. Currently this helper doesn't check for delalloc bytes for
    prealloc space, so for now treat prealloc as data until that is fixed. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Al Viro

    Josef Bacik
     

15 Jul, 2011

1 commit

  • This patch fixes many callers of btrfs_alloc_path() which BUG_ON allocation
    failure. All the sites that are fixed in this patch were checked by me to
    be fairly trivial to fix because of at least one of two criteria:

    - Callers of the function catch errors from it already so bubbling the
    error up will be handled.
    - Callers of the function might BUG_ON any nonzero return code in which
    case there is no behavior changed (but we still got to remove a BUG_ON)

    The following functions were updated:

    btrfs_lookup_extent, alloc_reserved_tree_block, btrfs_remove_block_group,
    btrfs_lookup_csums_range, btrfs_csum_file_blocks, btrfs_mark_extent_written,
    btrfs_inode_by_name, btrfs_new_inode, btrfs_symlink,
    insert_reserved_file_extent, and run_delalloc_nocow

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     

04 Jun, 2011

2 commits


28 May, 2011

1 commit

  • git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work into for-linus

    Conflicts:
    fs/btrfs/disk-io.c
    fs/btrfs/extent-tree.c
    fs/btrfs/free-space-cache.c
    fs/btrfs/inode.c
    fs/btrfs/transaction.c

    Signed-off-by: Chris Mason

    Chris Mason
     

27 May, 2011

1 commit

  • This will detect small random writes into files and
    queue the up for an auto defrag process. It isn't well suited to
    database workloads yet, but works for smaller files such as rpm, sqlite
    or bdb databases.

    Signed-off-by: Chris Mason

    Chris Mason