03 Aug, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (31 commits)
    Btrfs: don't call writepages from within write_full_page
    Btrfs: Remove unused variable 'last_index' in file.c
    Btrfs: clean up for find_first_extent_bit()
    Btrfs: clean up for wait_extent_bit()
    Btrfs: clean up for insert_state()
    Btrfs: remove unused members from struct extent_state
    Btrfs: clean up code for merging extent maps
    Btrfs: clean up code for extent_map lookup
    Btrfs: clean up search_extent_mapping()
    Btrfs: remove redundant code for dir item lookup
    Btrfs: make acl functions really no-op if acl is not enabled
    Btrfs: remove remaining ref-cache code
    Btrfs: remove a BUG_ON() in btrfs_commit_transaction()
    Btrfs: use wait_event()
    Btrfs: check the nodatasum flag when writing compressed files
    Btrfs: copy string correctly in INO_LOOKUP ioctl
    Btrfs: don't print the leaf if we had an error
    btrfs: make btrfs_set_root_node void
    Btrfs: fix oops while writing data to SSD partitions
    Btrfs: Protect the readonly flag of block group
    ...

    Fix up trivial conflicts (due to acl and writeback cleanups) in
    - fs/btrfs/acl.c
    - fs/btrfs/ctree.h
    - fs/btrfs/extent_io.c

    Linus Torvalds
     

02 Aug, 2011

5 commits


28 Jul, 2011

6 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
    Btrfs: make sure reserve_metadata_bytes doesn't leak out strange errors
    Btrfs: use the commit_root for reading free_space_inode crcs
    Btrfs: reduce extent_state lock contention for metadata
    Btrfs: remove lockdep magic from btrfs_next_leaf
    Btrfs: make a lockdep class for each root
    Btrfs: switch the btrfs tree locks to reader/writer
    Btrfs: fix deadlock when throttling transactions
    Btrfs: stop using highmem for extent_buffers
    Btrfs: fix BUG_ON() caused by ENOSPC when relocating space
    Btrfs: tag pages for writeback in sync
    Btrfs: fix enospc problems with delalloc
    Btrfs: don't flush delalloc arbitrarily
    Btrfs: use find_or_create_page instead of grab_cache_page
    Btrfs: use a worker thread to do caching
    Btrfs: fix how we merge extent states and deal with cached states
    Btrfs: use the normal checksumming infrastructure for free space cache
    Btrfs: serialize flushers in reserve_metadata_bytes
    Btrfs: do transaction space reservation before joining the transaction
    Btrfs: try to only do one btrfs_search_slot in do_setxattr

    Linus Torvalds
     
  • Chris Mason
     
  • For metadata buffers that don't straddle pages (all of them), btrfs
    can safely use the page uptodate bits and extent_buffer uptodate bit
    instead of needing to use the extent_state tree.

    This greatly reduces contention on the state tree lock.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • The btrfs metadata btree is the source of significant
    lock contention, especially in the root node. This
    commit changes our locking to use a reader/writer
    lock.

    The lock is built on top of rw spinlocks, and it
    extends the lock tracking to remember if we have a
    read lock or a write lock when we go to blocking. Atomics
    count the number of blocking readers or writers at any
    given time.

    It removes all of the adaptive spinning from the old code
    and uses only the spinning/blocking hints inside of btrfs
    to decide when it should continue spinning.

    In read heavy workloads this is dramatically faster. In write
    heavy workloads we're still faster because of less contention
    on the root node lock.

    We suffer slightly in dbench because we schedule more often
    during write locks, but all other benchmarks so far are improved.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • The extent_buffers have a very complex interface where
    we use HIGHMEM for metadata and try to cache a kmap mapping
    to access the memory.

    The next commit adds reader/writer locks, and concurrent use
    of this kmap cache would make it even more complex.

    This commit drops the ability to use HIGHMEM with extent buffers,
    and rips out all of the related code.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • Everybody else does this; we need to do it too. If we're syncing, we need to
    tag the pages we're going to write for writeback so we don't end up writing the
    same stuff over and over again if somebody is constantly redirtying our file.
    This will keep us from seeing long latencies with heavy sync workloads. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
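The "tag pages for writeback in sync" entry above describes a two-phase pattern: first mark every page that is dirty at the start of the sync, then write only marked pages. A minimal userspace sketch follows, assuming a flat array of pages with two flags; struct fake_page, sync_tagged() and redirty_all() are hypothetical. In the kernel, write_cache_pages() does the equivalent with tag_pages_for_writeback() and the PAGECACHE_TAG_TOWRITE tag, but this model is not that code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page model: just two flags per page. */
struct fake_page {
	bool dirty;
	bool towrite; /* stands in for the page cache TOWRITE tag */
};

/* Called after each page is written, so a caller can simulate a
 * process that keeps redirtying the file while sync runs. */
typedef void (*redirty_fn)(struct fake_page *pages, size_t n);

/* Example hook: a writer that redirties every page of the file. */
static void redirty_all(struct fake_page *pages, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		pages[i].dirty = true;
}

/* Sync-style writeback: phase 1 tags every page that is dirty *now*,
 * phase 2 writes only tagged pages. Pages dirtied after the tagging
 * pass are left for the next sync, so the pass always terminates. */
static size_t sync_tagged(struct fake_page *pages, size_t n, redirty_fn hook)
{
	size_t i, written = 0;

	for (i = 0; i < n; i++)                 /* phase 1: tag */
		pages[i].towrite = pages[i].dirty;

	for (i = 0; i < n; i++) {               /* phase 2: write tagged */
		if (!pages[i].towrite)
			continue;
		pages[i].towrite = false;
		pages[i].dirty = false;             /* "write" the page */
		written++;
		if (hook)
			hook(pages, n);
	}
	return written;
}
```

With two pages dirty at the start and a hook that redirties all four pages after every write, the pass writes exactly two pages and leaves the newly dirtied ones for the next sync instead of chasing them.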
     

11 Jul, 2011

1 commit

  • First, we can sometimes free the state we're merging, which means anybody who
    calls merge_state() may have the state it passed in freed. This is
    problematic because we could end up caching the state, which makes caching
    useless as the state will no longer be part of the tree. So instead of freeing
    the state we passed into merge_state(), set its end to other->end and free
    the other state. This way we are sure to cache the correct state. Also, because
    we can merge states together, instead of only using the cached state if its
    start == the start we are looking for, go ahead and use it if the start we are
    looking for is within the range of the cached state. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
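The merge rule described above can be sketched in userspace C. This is a minimal model, not the extent_io.c code: struct xstate, drop_state() and cached_covers() are hypothetical, and a sorted doubly linked list stands in for the rbtree. It shows the two points of the commit: merge_state() keeps the caller's state alive by growing it (start or end) and freeing the neighbor, and a cached state is accepted whenever it covers the start offset being looked up.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified extent state: an inclusive byte range
 * [start, end] plus a bitmask, kept in a sorted doubly linked list
 * (the kernel uses an rbtree instead). */
struct xstate {
	uint64_t start, end;
	unsigned bits;
	struct xstate *prev, *next;
};

/* Unlink @victim from the list and free it. */
static void drop_state(struct xstate *victim)
{
	if (victim->prev)
		victim->prev->next = victim->next;
	if (victim->next)
		victim->next->prev = victim->prev;
	free(victim);
}

/* Merge @s with adjacent neighbors that carry identical bits.
 * The caller's pointer @s always survives: we grow @s and free the
 * neighbor, so @s is still safe to cache afterwards. */
static void merge_state(struct xstate *s)
{
	struct xstate *other = s->prev;

	if (other && other->end + 1 == s->start && other->bits == s->bits) {
		s->start = other->start;
		drop_state(other);
	}
	other = s->next;
	if (other && s->end + 1 == other->start && other->bits == s->bits) {
		s->end = other->end;
		drop_state(other);
	}
}

/* After merging, a cached state is usable whenever it covers @start,
 * not only when its start matches exactly. */
static int cached_covers(const struct xstate *cached, uint64_t start)
{
	return cached && cached->start <= start && start <= cached->end;
}
```

Merging the middle of three adjacent 4KB states with equal bits leaves a single state [0, 12287] at the caller's original pointer, which can then serve lookups for any offset inside it.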
     

10 Jul, 2011

1 commit

  • Pass struct wb_writeback_work all the way down to writeback_sb_inodes(),
    and initialize the struct writeback_control there.

    struct writeback_control is basically designed to control writeback of a
    single file, but we keep abusing it for writing multiple files in
    writeback_sb_inodes() and its callers.

    It immediately cleans things up, e.g. suddenly wbc.nr_to_write vs
    work->nr_pages starts to make sense, and instead of saving and restoring
    pages_skipped in writeback_sb_inodes it can always start with a clean
    zero value.

    It also makes a neat IO pattern change: large dirty files are now
    written in the full 4MB writeback chunk size, rather than whatever
    quota remained in wbc->nr_to_write.

    Acked-by: Jan Kara
    Proposed-by: Christoph Hellwig
    Signed-off-by: Wu Fengguang

    Wu Fengguang
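The IO pattern change in this entry can be illustrated with a small arithmetic model. The 1024-page chunk (4MB with 4KB pages) and both helper functions are assumptions for illustration; they are not the fs-writeback.c code. With a shared budget, a large file that follows a small one only gets the leftover quota; with a per-file chunk it gets the full chunk, capped by the work's remaining nr_pages.

```c
#include <assert.h>

/* Assumed chunk size: 1024 pages, i.e. 4MB with 4KB pages. */
#define CHUNK_PAGES 1024L

static long min_long(long a, long b)
{
	return a < b ? a : b;
}

/* Old style (model): one nr_to_write budget is shared by every file in
 * the pass, so a later file only gets what earlier files left over. */
static long old_pages_for(long budget_left, long file_dirty)
{
	return min_long(budget_left, file_dirty);
}

/* New style (model): each file starts from a fresh chunk, capped by the
 * total work->nr_pages still outstanding and by its own dirty pages. */
static long new_pages_for(long work_nr_pages, long file_dirty)
{
	return min_long(min_long(CHUNK_PAGES, work_nr_pages), file_dirty);
}
```

For example, after a 100-page file, the shared-budget model leaves a 5000-page file only 924 pages, while the per-file model writes a full 1024-page chunk.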
     

05 Jun, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (25 commits)
    btrfs: fix uninitialized variable warning
    btrfs: add helper for fs_info->closing
    Btrfs: add mount -o inode_cache
    btrfs: scrub: add explicit plugging
    btrfs: use btrfs_ino to access inode number
    Btrfs: don't save the inode cache if we are deleting this root
    btrfs: false BUG_ON when degraded
    Btrfs: don't save the inode cache in non-FS roots
    Btrfs: make sure we don't overflow the free space cache crc page
    Btrfs: fix uninit variable in the delayed inode code
    btrfs: scrub: don't reuse bios and pages
    Btrfs: leave spinning on lookup and map the leaf
    Btrfs: check for duplicate entries in the free space cache
    Btrfs: don't try to allocate from a block group that doesn't have enough space
    Btrfs: don't always do readahead
    Btrfs: try not to sleep as much when doing slow caching
    Btrfs: kill BTRFS_I(inode)->block_group
    Btrfs: don't look at the extent buffer level 3 times in a row
    Btrfs: map the node block when looking for readahead targets
    Btrfs: set range_start to the right start in count_range_bits
    ...

    Linus Torvalds
     

28 May, 2011

2 commits

  • git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work into for-linus

    Conflicts:
    fs/btrfs/disk-io.c
    fs/btrfs/extent-tree.c
    fs/btrfs/free-space-cache.c
    fs/btrfs/inode.c
    fs/btrfs/transaction.c

    Signed-off-by: Chris Mason

    Chris Mason
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (58 commits)
    Btrfs: use the device_list_mutex during write_dev_supers
    Btrfs: setup free ino caching in a more asynchronous way
    btrfs scrub: don't coalesce pages that are logically discontiguous
    Btrfs: return -ENOMEM in clear_extent_bit
    Btrfs: add mount -o auto_defrag
    Btrfs: using rcu lock in the reader side of devices list
    Btrfs: drop unnecessary device lock
    Btrfs: fix the race between remove dev and alloc chunk
    Btrfs: fix the race between reading and updating devices
    Btrfs: fix bh leak on __btrfs_open_devices path
    Btrfs: fix unsafe usage of merge_state
    Btrfs: allocate extent state and check the result properly
    fs/btrfs: Add missing btrfs_free_path
    Btrfs: check return value of btrfs_inc_extent_ref()
    Btrfs: return error to caller if read_one_inode() fails
    Btrfs: BUG_ON is deleted from the caller of btrfs_truncate_item & btrfs_extend_item
    Btrfs: return error code to caller when btrfs_del_item fails
    Btrfs: return error code to caller when btrfs_previous_item fails
    btrfs: fix typo 'testeing' -> 'testing'
    btrfs: typo: 'btrfS' -> 'btrfs'
    ...

    Linus Torvalds
     

27 May, 2011

3 commits

  • The btrfs releasepage function depends on ENOMEM coming
    back when it is called in atomic context.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/djm/tmem:
    xen: cleancache shim to Xen Transcendent Memory
    ocfs2: add cleancache support
    ext4: add cleancache support
    btrfs: add cleancache support
    ext3: add cleancache support
    mm/fs: add hooks to support cleancache
    mm: cleancache core ops functions and config
    fs: add field to superblock to support cleancache
    mm/fs: cleancache documentation

    Fix up trivial conflict in fs/btrfs/extent_io.c due to includes

    Linus Torvalds
     
  • This sixth patch of eight in this cleancache series "opts-in"
    cleancache for btrfs. Filesystems must explicitly enable
    cleancache by calling cleancache_init_fs anytime an instance
    of the filesystem is mounted. Btrfs uses its own readpage
    which must be hooked, but all other cleancache hooks are in
    the VFS layer including the matching cleancache_flush_fs hook
    which must be called on unmount.

    Details and a FAQ can be found in Documentation/vm/cleancache.txt

    [v6-v8: no changes]
    [v5: jeremy@goop.org: simplify init hook and any future fs init changes]
    Signed-off-by: Dan Magenheimer
    Signed-off-by: Chris Mason
    Reviewed-by: Jeremy Fitzhardinge
    Reviewed-by: Konrad Rzeszutek Wilk
    Cc: Andrew Morton
    Cc: Al Viro
    Cc: Matthew Wilcox
    Cc: Nick Piggin
    Cc: Mel Gorman
    Cc: Rik Van Riel
    Cc: Jan Beulich
    Cc: Andreas Dilger
    Cc: Ted Ts'o
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Nitin Gupta

    Dan Magenheimer
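The dependency in the releasepage entry of this day (ENOMEM coming back when called in atomic context) can be sketched as follows. Everything here is hypothetical: atomic_alloc_fails injects an allocation failure, clear_range_atomic() stands in for a bit-clearing operation that must split a state, and try_release() plays the releasepage caller. The sketch only shows the control flow: an atomic allocation failure is reported as -ENOMEM, and the caller treats it as "do not release this page" instead of crashing.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical failure injection for the atomic allocation path. */
static bool atomic_alloc_fails;

struct xstate {
	unsigned long start, end;
};

/* Stand-in for an allocation that may not sleep (GFP_ATOMIC-like). */
static struct xstate *alloc_state_atomic(void)
{
	return atomic_alloc_fails ? NULL : malloc(sizeof(struct xstate));
}

/* Clearing bits in the middle of a state requires splitting it, which
 * needs a new state. A failed atomic allocation is reported as -ENOMEM
 * rather than hitting a BUG_ON(). */
static int clear_range_atomic(struct xstate **split_out)
{
	struct xstate *prealloc = alloc_state_atomic();

	if (!prealloc)
		return -ENOMEM;
	*split_out = prealloc; /* the real code would insert it into the tree */
	return 0;
}

/* releasepage-style caller: the page may only be released if the
 * extent state could be cleared; -ENOMEM means "keep the page". */
static bool try_release(void)
{
	struct xstate *split = NULL;

	if (clear_range_atomic(&split) == -ENOMEM)
		return false;
	free(split);
	return true;
}
```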
     

24 May, 2011

4 commits

  • Conflicts:
    fs/btrfs/tree-log.c
    fs/btrfs/volumes.c

    Signed-off-by: Chris Mason

    Chris Mason
     
  • merge_state can free the current state if it can be merged with the next node,
    but in set_extent_bit(), after merge_state, we still use the current state to
    get the next node and cache it into cached_state.

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Chris Mason

    Xiao Guangrong
     
  • It doesn't allocate extent_state or check the result properly:
    - in set_extent_bit, it doesn't allocate an extent_state if the path is not
    allowed to wait

    - in clear_extent_bit, it doesn't check the result after allocating atomically;
    we trigger BUG_ON() if it fails

    - if the allocation fails, we trigger BUG_ON instead of returning -ENOMEM, since
    the return value of clear_extent_bit() is ignored by many callers

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Chris Mason

    Xiao Guangrong
     
  • In count_range_bits we are adjusting total_bytes based on the range we are
    searching for, but we don't adjust the range start according to the range we are
    searching for, which makes for weird results. For example, if the range

    [0-8192]

    is set DELALLOC, but I search for 4096-8192, I will get back 4096 for the number
    of bytes found, but the range_start will be 0, which makes it look like the
    range is [0-4096]. So instead set range_start = max(cur_start, state->start).
    This makes everything come out right. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
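The corrected range_start computation from the count_range_bits entry can be sketched as a standalone C function. It assumes inclusive end offsets, as extent_io ranges use, so an 8KB delalloc range is [0, 8191]; the flat sorted array, struct xstate and BIT_DELALLOC are hypothetical simplifications of the real extent state tree. Both the byte count and the reported start are clipped to the searched range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BIT_DELALLOC 0x1u /* hypothetical bit value */

struct xstate {
	uint64_t start, end; /* inclusive range */
	unsigned bits;
};

static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }
static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/* Count bytes in [cur_start, search_end] whose state has all of @bits
 * set, and report where the first matching byte is. @states must be
 * sorted by start and non-overlapping. */
static uint64_t count_range_bits(const struct xstate *states, size_t n,
				 uint64_t cur_start, uint64_t search_end,
				 unsigned bits, uint64_t *range_start)
{
	uint64_t total = 0;
	int found = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct xstate *s = &states[i];

		if (s->start > search_end)
			break;
		if (s->end < cur_start || (s->bits & bits) != bits)
			continue;
		/* clip the state to the searched range on both ends */
		total += min_u64(search_end, s->end) + 1 -
			 max_u64(cur_start, s->start);
		if (!found) {
			/* the fix: clip the reported start as well */
			*range_start = max_u64(cur_start, s->start);
			found = 1;
		}
	}
	return total;
}
```

Searching [4096, 8191] over a single delalloc state [0, 8191] returns 4096 bytes with range_start = 4096, the result the commit message asks for.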
     

23 May, 2011

1 commit


21 May, 2011

2 commits

  • Conflicts:
    fs/btrfs/free-space-cache.c

    Signed-off-by: Chris Mason

    Chris Mason
     
  • Commit e66eed651fd1 ("list: remove prefetching from regular list
    iterators") removed the include of prefetch.h from list.h, which
    uncovered several cases that had apparently relied on that rather
    obscure header file dependency.

    So this fixes things up a bit, using

    grep -L linux/prefetch.h $(git grep -l '[^a-z_]prefetchw*(' -- '*.[ch]')
    grep -L 'prefetchw*(' $(git grep -l 'linux/prefetch.h' -- '*.[ch]')

    to guide us in finding files that either need
    inclusion, or have it despite not needing it.

    There are more of them around (mostly network drivers), but this gets
    many core ones.

    Reported-by: Stephen Rothwell
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

06 May, 2011

1 commit

  • Remove static and global declarations and/or definitions. Reduces size
    of btrfs.ko by ~3.4kB.

       text    data     bss     dec     hex filename
     402081    7464     200  409745   64091 btrfs.ko.base
     398620    7144     200  405964   631cc btrfs.ko.remove-all

    Signed-off-by: David Sterba

    David Sterba
     

02 May, 2011

5 commits


26 Apr, 2011

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
    Btrfs: cleanup error handling in inode.c
    Btrfs: put the right bio if we have an error
    Btrfs: free bitmaps properly when evicting the cache
    Btrfs: Free free_space item properly in btrfs_trim_block_group()
    btrfs: add missing spin_unlock to a rare exit path
    Btrfs: check return value of kmalloc()
    btrfs: fix wrong allocating flag when reading page
    Btrfs: fix missing mutex_unlock in btrfs_del_dir_entries_in_log()

    Linus Torvalds
     
  • The space cache uses extent_readpages() to read free space information,
    so we cannot use the GFP_KERNEL flag to allocate memory, or it may lead
    to deadlock.

    Signed-off-by: Itaru Kitayama
    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Itaru Kitayama
     

25 Apr, 2011

1 commit

  • There's a potential problem on 32-bit systems when we exhaust 32-bit inode
    numbers and start to allocate big inode numbers, because btrfs uses
    inode->i_ino in many places.

    So here we always use BTRFS_I(inode)->location.objectid, which is a
    u64 variable.

    There are 2 exceptions where BTRFS_I(inode)->location.objectid !=
    inode->i_ino: the btree inode (0 vs 1) and empty subvol dirs (256 vs 2),
    and inode->i_ino will be used in those cases.

    Another reason to make this change is I'm going to use a special inode
    to save free ino cache, and the inode number must be > (u64)-256.

    Signed-off-by: Li Zefan

    Li Zefan
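The 32-bit problem in this entry comes down to integer truncation, which a short C sketch makes concrete. The 05 Jun shortlog names the real helper btrfs_ino(); the struct, the special flag and sketch_btrfs_ino() below are hypothetical stand-ins, and the flag replaces whatever test the real helper uses to detect the two exceptions the commit lists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down inode as seen on a 32-bit kernel. */
struct sketch_inode {
	uint32_t i_ino;             /* unsigned long is 32 bits here */
	uint64_t location_objectid; /* the on-disk objectid, always 64 bits */
	bool special;               /* hypothetical: btree inode or empty subvol dir */
};

/* What a 32-bit i_ino would hold for a large objectid: the high
 * 32 bits are simply dropped by the conversion. */
static uint32_t truncated_i_ino(uint64_t objectid)
{
	return (uint32_t)objectid;
}

/* Prefer the full 64-bit objectid; fall back to i_ino only for the
 * special inodes whose numbers intentionally differ. */
static uint64_t sketch_btrfs_ino(const struct sketch_inode *inode)
{
	if (inode->special)
		return inode->i_ino;
	return inode->location_objectid;
}
```

An objectid of 2^32 + 300 shows up as 300 in a 32-bit i_ino, colliding with a genuine inode 300, while the 64-bit objectid keeps them distinct.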
     

19 Apr, 2011

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (24 commits)
    Btrfs: fix free space cache leak
    Btrfs: avoid taking the chunk_mutex in do_chunk_alloc
    Btrfs end_bio_extent_readpage should look for locked bits
    Btrfs: don't force chunk allocation in find_free_extent
    Btrfs: Check validity before setting an acl
    Btrfs: Fix incorrect inode nlink in btrfs_link()
    Btrfs: Check if btrfs_next_leaf() returns error in btrfs_real_readdir()
    Btrfs: Check if btrfs_next_leaf() returns error in btrfs_listxattr()
    Btrfs: make uncache_state unconditional
    btrfs: using cached extent_state in set/unlock combinations
    Btrfs: avoid taking the trans_mutex in btrfs_end_transaction
    Btrfs: fix subvolume mount by name problem when default mount subvolume is set
    fix user annotation in ioctl.c
    Btrfs: check for duplicate iov_base's when doing dio reads
    btrfs: properly handle overlapping areas in memmove_extent_buffer
    Btrfs: fix memory leaks in btrfs_new_inode()
    Btrfs: check for duplicate iov_base's when doing dio reads
    Btrfs: reuse the extent_map we found when calling btrfs_get_extent
    Btrfs: do not use async submit for small DIO io's
    Btrfs: don't split dio bios if we don't have to
    ...

    Linus Torvalds
     

16 Apr, 2011

1 commit


13 Apr, 2011

1 commit

  • The extent_io code can take cached pointers into the extent state trees,
    and these can make lookups much faster in common operations. The
    caching only happens when specific bits are set that prevent merging
    and splitting of the extent state.

    A helper function was added to uncache the state, and it was testing
    the same set of conditionals. This can leak in very strange corner
    cases where the lock bit goes away unexpectedly.

    The uncaching should be unconditional. Once we have a ref on the
    extent we should always give it up.

    Signed-off-by: Chris Mason

    Chris Mason
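The leak described in this entry can be reproduced with a tiny refcount model. The names and bit values are hypothetical (SK_LOCKED and SK_BOUNDARY stand in for the bits that prevent merging and splitting); the sketch is not the extent_io.c code. Caching takes a reference only when those bits are set; the conditional uncache re-tests the bits and leaks that reference if the lock bit has gone away, while the unconditional version always drops it.

```c
#include <assert.h>
#include <stddef.h>

#define SK_LOCKED   0x1u /* hypothetical bit values */
#define SK_BOUNDARY 0x2u

struct xstate {
	unsigned bits;
	int refs;
};

/* Only states whose bits prevent merging/splitting may be cached;
 * caching takes a reference. */
static void cache_state(struct xstate *s, struct xstate **cached)
{
	if (cached && !*cached && (s->bits & (SK_LOCKED | SK_BOUNDARY))) {
		*cached = s;
		s->refs++;
	}
}

/* Buggy pattern: re-tests the bits, so if LOCKED was cleared in the
 * meantime the reference taken by cache_state() is never dropped. */
static void uncache_state_conditional(struct xstate **cached)
{
	if (cached && *cached &&
	    ((*cached)->bits & (SK_LOCKED | SK_BOUNDARY))) {
		(*cached)->refs--;
		*cached = NULL;
	}
}

/* Fixed pattern: a cached pointer always holds a ref, so always drop it. */
static void uncache_state(struct xstate **cached)
{
	if (cached && *cached) {
		(*cached)->refs--;
		*cached = NULL;
	}
}
```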
     

12 Apr, 2011

1 commit

  • In several places the sequence (set_extent_uptodate, unlock_extent) is used.
    This leads to a duplicate lookup of the extent state. This patch lets
    set_extent_uptodate return a cached extent_state which can be passed to
    unlock_extent_cached.
    The occurrences of the above sequence are updated to use the cache. Only
    end_bio_extent_readpage is updated so that it first gets a cached state to
    pass to readpage_end_io_hook, as the prototype requested, and later uses
    it for set/unlock.

    Signed-off-by: Arne Jansen
    Signed-off-by: Chris Mason

    Arne Jansen
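The saving described in this last entry (set_extent_uptodate handing back a cached state for unlock_extent_cached) can be sketched by counting tree searches. All names below are hypothetical, a linear array stands in for the rbtree, and the kernel's reference counting on cached states is omitted for brevity. Without a cache the set/unlock pair searches twice; with the cached pointer passed along it searches once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_UPTODATE 0x1u /* hypothetical bit values */
#define SK_LOCKED   0x2u

struct xstate {
	uint64_t start, end; /* inclusive range */
	unsigned bits;
};

struct xtree {
	struct xstate *states;
	size_t n;
	unsigned lookups; /* counts full searches, for illustration */
};

/* Full search: an rbtree walk in the kernel, linear here. */
static struct xstate *tree_search(struct xtree *t, uint64_t start)
{
	size_t i;

	t->lookups++;
	for (i = 0; i < t->n; i++)
		if (t->states[i].start <= start && start <= t->states[i].end)
			return &t->states[i];
	return NULL;
}

/* Reuse *cached when it covers @start, otherwise search the tree. */
static struct xstate *find_state(struct xtree *t, uint64_t start,
				 struct xstate **cached)
{
	if (cached && *cached && (*cached)->start <= start &&
	    start <= (*cached)->end)
		return *cached;
	return tree_search(t, start);
}

/* Set the uptodate bit and hand the state back through @cached. */
static void set_uptodate(struct xtree *t, uint64_t start,
			 struct xstate **cached)
{
	struct xstate *s = find_state(t, start, cached);

	if (s) {
		s->bits |= SK_UPTODATE;
		if (cached)
			*cached = s;
	}
}

/* Clear the lock bit, reusing the cached state when possible. */
static void unlock_range(struct xtree *t, uint64_t start,
			 struct xstate **cached)
{
	struct xstate *s = find_state(t, start, cached);

	if (s)
		s->bits &= ~SK_LOCKED;
}
```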