12 Nov, 2013

1 commit

  • fs/btrfs/compat.h only contained trivial macro wrappers of drop_nlink()
    and inc_nlink(). This doesn't belong in mainline.

    Signed-off-by: Zach Brown
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Zach Brown
     

01 Sep, 2013

2 commits


18 May, 2013

1 commit

  • Btrfs has been pointer tagging bi_private and using bi_bdev
    to store the stripe index and mirror number of failed IOs.

    As bios bubble back up through the call chain, we use these
    to decide if and how to retry our IOs. They are also used
    to count IO failures on a per device basis.

    A recently added bio tracepoint led to crashes because
    we were abusing bi_bdev.

    This commit adds a btrfs bioset, and creates explicit fields
    for the mirror number and stripe index. The plan is to
    extend this structure for all of the fields currently in
    struct btrfs_bio, which will mean one less kmalloc in
    our IO path.
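
    The idea of keeping explicit fields next to the bio, instead of hiding
    them in bi_private or bi_bdev, can be sketched in userspace C as below.
    This is only an illustration: struct bio here is a one-field stand-in,
    and io_bio_sketch, to_io_bio(), io_bio_alloc() and io_bio_free() are
    hypothetical names, not the ones used by the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel's struct bio; only one field is modelled. */
struct bio {
    void *bi_private;
};

/* Hypothetical wrapper: explicit fields live next to the embedded bio. */
struct io_bio_sketch {
    unsigned int mirror_num;
    unsigned int stripe_index;
    struct bio bio;            /* embedded, not pointed to */
};

/* Recover the wrapper from a pointer to its embedded bio. */
static struct io_bio_sketch *to_io_bio(struct bio *bio)
{
    assert(bio != NULL);
    return (struct io_bio_sketch *)((char *)bio -
                                    offsetof(struct io_bio_sketch, bio));
}

/* One allocation yields both the bio and the bookkeeping fields. */
static struct bio *io_bio_alloc(unsigned int mirror_num,
                                unsigned int stripe_index)
{
    struct io_bio_sketch *io = calloc(1, sizeof(*io));

    if (!io)
        return NULL;
    io->mirror_num = mirror_num;
    io->stripe_index = stripe_index;
    return &io->bio;
}

static void io_bio_free(struct bio *bio)
{
    free(to_io_bio(bio));
}
```

    Code that only sees the struct bio pointer can still reach the mirror
    number and stripe index through to_io_bio(), so no pointer bits or
    unrelated fields need to be borrowed.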

    Signed-off-by: Chris Mason
    Reported-by: Tejun Heo

    Chris Mason
     

07 May, 2013

1 commit

  • Big patch, but all it does is add statics to functions which
    are in fact static, then remove the associated dead-code fallout.

    removed functions:

    btrfs_iref_to_path()
    __btrfs_lookup_delayed_deletion_item()
    __btrfs_search_delayed_insertion_item()
    __btrfs_search_delayed_deletion_item()
    find_eb_for_page()
    btrfs_find_block_group()
    range_straddles_pages()
    extent_range_uptodate()
    btrfs_file_extent_length()
    btrfs_scrub_cancel_devid()
    btrfs_start_transaction_lflush()

    btrfs_print_tree() is left because it is used for debugging.
    btrfs_start_transaction_lflush() and btrfs_reada_detach() are
    left for symmetry.

    ulist.c functions are left, another patch will take care of those.

    Signed-off-by: Eric Sandeen
    Signed-off-by: Josef Bacik

    Eric Sandeen
     

03 Mar, 2013

1 commit

  • tilegx_defconfig:

    fs/btrfs/raid56.c: In function 'btrfs_alloc_stripe_hash_table':
    fs/btrfs/raid56.c:206:3: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration]
    fs/btrfs/raid56.c:206:9: warning: assignment makes pointer from integer without a cast [enabled by default]
    fs/btrfs/raid56.c:226:4: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]

    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Chris Mason

    Geert Uytterhoeven
     

01 Mar, 2013

1 commit

  • The stripe hash table is large: its allocation starts at order 4 and can go
    as high as order 7 when lock debugging is turned on and structure padding
    happens.
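
    For scale, an order-n allocation is 2^n physically contiguous pages.
    The sketch below computes the order needed for a given size, assuming
    4 KiB pages (an assumption; the log does not state the page size).
    alloc_order() and SKETCH_PAGE_SIZE are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumption: 4 KiB pages */

/* Smallest order such that (page size << order) >= size. */
static unsigned int alloc_order(size_t size)
{
    unsigned int order = 0;
    size_t span = SKETCH_PAGE_SIZE;

    assert(size > 0);
    while (span < size) {
        span <<= 1;
        order++;
    }
    return order;
}
```

    Under that assumption, order 4 is 64 KiB and order 7 is 512 KiB of
    contiguous memory, which is hard to find on a fragmented system.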

    Observed mount failure:

    mount: page allocation failure: order:7, mode:0x200050
    Pid: 8234, comm: mount Tainted: G W 3.8.0-default+ #267
    Call Trace:
    warn_alloc_failed+0xf3/0x140
    ? __alloc_pages_direct_compact+0x92/0x250
    __alloc_pages_nodemask+0x733/0x9d0
    ? cache_alloc_refill+0x3f8/0x840
    cache_alloc_refill+0x43c/0x840
    ? is_kernel_percpu_address+0x4b/0x90
    ? btrfs_alloc_stripe_hash_table+0x5c/0x130 [btrfs]
    kmem_cache_alloc_trace+0x247/0x270
    btrfs_alloc_stripe_hash_table+0x5c/0x130 [btrfs]
    open_ctree+0xb2f/0x1f90 [btrfs]
    ? string+0x49/0xe0
    ? vsnprintf+0x443/0x5d0
    btrfs_mount+0x526/0x600 [btrfs]
    ? cache_alloc_debugcheck_after+0x4c/0x200
    mount_fs+0x20/0xe0
    vfs_kern_mount+0x76/0x120
    do_mount+0x386/0x980
    ? strndup_user+0x5b/0x80
    sys_mount+0x90/0xe0
    system_call_fastpath+0x16/0x1b

    Signed-off-by: David Sterba
    Signed-off-by: Josef Bacik

    David Sterba
     

02 Feb, 2013

3 commits

  • Buffered writes and DIRECT_IO writes will often break up
    big contiguous changes to the file into sub-stripe writes.

    This adds a plugging callback to gather those smaller writes into full
    stripe writes.

    Example on flash:

    fio job to do 64K writes in batches of 3 (which makes a full stripe):

    With plugging: 450MB/s
    Without plugging: 220MB/s
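
    A rough sketch of the gathering step, assuming three data stripes of
    64K each as in the fio example. plug_sketch, plug_add() and
    plug_count_full() are hypothetical names, and the real callback works
    on bios rather than bare offsets.

```c
#include <assert.h>
#include <stdint.h>

#define STRIPE_LEN   (64u * 1024u)   /* per-disk stripe size from the log */
#define DATA_STRIPES 3u              /* assumption: matches the fio example */
#define FULL_STRIPE  (STRIPE_LEN * DATA_STRIPES)
#define MAX_PENDING  16u             /* hypothetical plug capacity */

struct pending_stripe {
    uint64_t full_stripe_nr;  /* which full stripe this entry covers */
    unsigned int covered;     /* bitmask of data stripes queued so far */
};

struct plug_sketch {
    struct pending_stripe pending[MAX_PENDING];
    unsigned int count;
};

/* Queue one stripe-aligned 64K write, merging into an existing entry. */
static int plug_add(struct plug_sketch *plug, uint64_t offset)
{
    uint64_t nr = offset / FULL_STRIPE;
    unsigned int bit = 1u << ((offset % FULL_STRIPE) / STRIPE_LEN);
    unsigned int i;

    assert(offset % STRIPE_LEN == 0);
    for (i = 0; i < plug->count; i++) {
        if (plug->pending[i].full_stripe_nr == nr) {
            plug->pending[i].covered |= bit;
            return 0;
        }
    }
    if (plug->count == MAX_PENDING)
        return -1;  /* a caller would unplug and retry */
    plug->pending[plug->count].full_stripe_nr = nr;
    plug->pending[plug->count].covered = bit;
    plug->count++;
    return 0;
}

/* At unplug time, count entries that can skip read/modify/write. */
static unsigned int plug_count_full(const struct plug_sketch *plug)
{
    unsigned int full = 0, i;

    for (i = 0; i < plug->count; i++)
        if (plug->pending[i].covered == (1u << DATA_STRIPES) - 1u)
            full++;
    return full;
}
```

    Entries whose mask covers every data stripe can be written with freshly
    computed parity; the rest still need a read/modify/write cycle.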

    Signed-off-by: Chris Mason

    Chris Mason
     
  • The stripe cache allows us to avoid extra read/modify/write cycles
    by caching the pages we read off the disk. Pages are cached when:

    * They are read in during a read/modify/write cycle

    * They are written during a read/modify/write cycle

    * They are involved in a parity rebuild

    Pages are not cached if we're doing a full stripe write. We're
    assuming that a full stripe write won't be followed by another
    partial stripe write any time soon.

    This provides a substantial boost in performance for workloads that
    synchronously modify adjacent offsets in the file, and for the parity
    rebuild use case in general.

    The size of the stripe cache isn't tunable (yet) and is set at 1024
    entries.

    Example on flash: dd if=/dev/zero of=/mnt/xxx bs=4K oflag=direct

    Without the stripe cache: 2.1MB/s
    With the stripe cache: 21MB/s

    Signed-off-by: Chris Mason

    Chris Mason
     
  • This builds on David Woodhouse's original Btrfs raid5/6 implementation.
    The code has changed quite a bit; blame Chris Mason for any bugs.

    Read/modify/write is done after the higher levels of the filesystem have
    prepared a given bio. This means the higher layers are not responsible
    for building full stripes, and they don't need to query for the topology
    of the extents that may get allocated during delayed allocation runs.
    It also means different files can easily share the same stripe.

    But, it does expose us to incorrect parity if we crash or lose power
    while doing a read/modify/write cycle. This will be addressed in a
    later commit.
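
    To make the read/modify/write cycle concrete, here is a minimal sketch
    of the parity arithmetic for the single-parity (XOR) case only; the
    second raid6 parity is computed differently and is not shown.
    parity_full() and parity_rmw() are illustrative names, not functions
    from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Full recompute: parity is the XOR of every data block in the stripe. */
static void parity_full(uint8_t *parity, const uint8_t *const *data,
                        size_t ndata, size_t len)
{
    size_t d, i;

    for (i = 0; i < len; i++) {
        parity[i] = 0;
        for (d = 0; d < ndata; d++)
            parity[i] ^= data[d][i];
    }
}

/* Read/modify/write: fold the old data out of the parity, the new data in. */
static void parity_rmw(uint8_t *parity, const uint8_t *old_data,
                       const uint8_t *new_data, size_t len)
{
    size_t i;

    assert(parity && old_data && new_data);
    for (i = 0; i < len; i++)
        parity[i] ^= old_data[i] ^ new_data[i];
}
```

    Both paths give the same parity, but the read/modify/write path updates
    data and parity as two separate writes. If only one of them reaches the
    disk before a crash, the stripe's parity no longer matches its data.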

    Scrub is unable to repair crc errors on raid5/6 chunks.

    Discard does not work on raid5/6 (yet).

    The stripe size is fixed at 64KiB per disk. This will be tunable
    in a later commit.

    Signed-off-by: Chris Mason

    David Woodhouse