16 Dec, 2011

1 commit


10 Dec, 2011

1 commit

  • btrfs_end_bio checks the number of errors on a bio against the max
    number of errors allowed before sending any EIOs up to the higher
    levels.

    If we got enough copies of the bio done for a given raid level, it is
    supposed to clear the bio error flag and return success.

    We have pointers to the original bio sent down by the higher layers and
    pointers to any cloned bios we made for raid purposes. If the original
    bio happens to be the one that got an io error, but not the last one to
    finish, it might not have the BIO_UPTODATE bit set.

    Then, when the last bio does finish, we'll call bio_end_io on the
    original bio. It won't have the uptodate bit set and we'll end up
    sending EIO to the higher layers.

    We already had a check for this, but it was conditional on getting the
    IO error on the very last bio to finish. Make the check unconditional
    so we eat the EIOs properly.

    Signed-off-by: Chris Mason

    Chris Mason
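    As a rough userspace sketch of the logic described above (the names are
    illustrative, not the kernel's), the unconditional check decides success
    from the total error count and forces the uptodate flag on the original
    bio even when that bio was the one that errored:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the fix: when the last cloned bio
 * completes, decide success from the total error count against the
 * maximum allowed for the raid level -- unconditionally, not only when
 * the final bio to finish was the one that got the IO error. */
struct demo_bio_group {
    int errors;         /* number of cloned bios that failed */
    int max_errors;     /* copies we can lose at this raid level */
    bool orig_uptodate; /* BIO_UPTODATE-like flag on the original bio */
};

/* Returns true if the completion should be reported as success. */
static bool demo_end_bio(struct demo_bio_group *g)
{
    if (g->errors > g->max_errors)
        return false; /* genuine EIO: too many copies failed */
    /* Enough copies succeeded: force the uptodate flag on the
     * original bio even if it was the copy that got the IO error. */
    g->orig_uptodate = true;
    return true;
}
```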
     

08 Dec, 2011

1 commit

  • If we call ioctl(BTRFS_IOC_ADD_DEV) directly, we'll succeed in adding
    a read-only device to a btrfs filesystem, and btrfs will then write to
    that device, emitting kernel errors:

    [ 3109.833692] lost page write due to I/O error on loop2
    [ 3109.833720] lost page write due to I/O error on loop2
    ...

    Signed-off-by: Li Zefan
    Signed-off-by: Chris Mason

    Li Zefan
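    A minimal userspace sketch of the check the fix implies (the struct and
    function names are hypothetical, not the kernel's): reject the device up
    front instead of failing writes later.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: refuse to add a block device that is read-only,
 * rather than adding it and later losing page writes with I/O errors. */
struct demo_device {
    bool read_only;
};

static int demo_add_device(const struct demo_device *dev)
{
    if (dev->read_only)
        return -EROFS; /* reject at add time, before any writes happen */
    return 0;
}
```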
     

11 Nov, 2011

1 commit


06 Nov, 2011

3 commits


21 Oct, 2011

1 commit

  • Fix a bug introduced by 20b45077. We have to return EINVAL on mount
    failure, but doing that too early in the sequence leaves all of the
    devices opened exclusively. This also fixes an issue where under some
    scenarios only a second mount -o degraded command would
    succeed.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     

20 Oct, 2011

1 commit

  • One of the things that kills us is the fact that our ENOSPC reservations
    are horribly over the top in most normal cases. There isn't too much that
    can be done about this, because when we are completely full we really
    need them to work like this so we don't under-reserve. However, if there
    are plenty of unallocated chunks on the disk we can use that to gauge how
    much we can overcommit. So this patch adds chunk free space accounting so
    we always know how much unallocated space we have. Then, if we fail to
    make a reservation within our allocated space, we check to see if we can
    overcommit. In the normal flushing case (like with delalloc metadata
    reservations) we'll take the free space, divide it by 2 if our metadata
    profile is set up for DUP or any of those, and then divide it by 8 to
    make sure we don't overcommit too much. Then, if we're in a non-flushing
    case (we really need this reservation now!) we only limit ourselves to
    half of the free space. This makes this fio test

    [torrent]
    filename=torrent-test
    rw=randwrite
    size=4g
    ioengine=sync
    directory=/mnt/btrfs-test

    go from taking around 45 minutes to 10 seconds on my freshly formatted 3 TiB
    file system. This doesn't seem to break my other enospc tests, but could really
    use some more testing as this is a super scary change. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
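    The divide-by-2 / divide-by-8 heuristic above can be sketched in a few
    lines of userspace C (the function and parameter names are illustrative,
    not the kernel's):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the overcommit heuristic: given the unallocated chunk
 * space, decide whether a reservation that missed in allocated space
 * may overcommit. */
static bool demo_can_overcommit(uint64_t unallocated, uint64_t to_reserve,
                                bool profile_dup, bool flushing)
{
    uint64_t avail = unallocated;

    if (flushing) {
        /* normal flushing case, e.g. delalloc metadata reservations */
        if (profile_dup)
            avail /= 2; /* DUP and friends keep two copies */
        avail /= 8;     /* be conservative about overcommitting */
    } else {
        avail /= 2;     /* urgent case: allow up to half the free space */
    }
    return to_reserve <= avail;
}
```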
     

02 Oct, 2011

1 commit


29 Sep, 2011

2 commits

  • The error correction code wants to make sure that only the bad mirror is
    rewritten. Thus, we need to know which mirror is the bad one. I did not
    find a more appropriate field than bi_bdev. But I think using this is
    fine, because it is modified by the block layer anyway, and should not
    be read after the bio has returned.

    Signed-off-by: Jan Schmidt

    Jan Schmidt
     
  • btrfs_bio is a bio abstraction that is able to split and that does not
    complete until the last split bio has returned (like the old
    btrfs_multi_bio). Additionally, btrfs_bio tracks the mirror_num used to
    read data, which can be used for error correction purposes.

    Signed-off-by: Jan Schmidt

    Jan Schmidt
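    An illustrative sketch of that abstraction (the field and function names
    are approximations, not the kernel's struct btrfs_bio): the container
    completes only once every split bio has returned, and it remembers which
    mirror served the read so the error-correction code can target the bad
    copy.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the described abstraction. */
struct demo_btrfs_bio {
    int stripes_pending; /* outstanding split bios; complete at zero */
    int mirror_num;      /* which mirror this read was served from */
    int errors;          /* accumulated IO errors across the splits */
};

/* Called as each split bio returns; true once the whole container is
 * done and may complete toward the higher layers. */
static bool demo_stripe_done(struct demo_btrfs_bio *b)
{
    return --b->stripes_pending == 0;
}
```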
     

17 Aug, 2011

3 commits

  • sync_pending is uninitialized before it is used; fix it.

    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie
     
  • When balancing, we'll first try to shrink devices for some space,
    but if it is working on a full multi-disk partition with raid protection,
    we may encounter a bug: while shrinking, total_bytes may become less
    than bytes_used, and btrfs may allocate a dev extent that reaches beyond
    the device's bounds.

    Then we will not be able to write or read the data stored at the end
    of the device, and we get errors like the following:

    device fsid 0939f071-7ea3-46c8-95df-f176d773bfb6 devid 1 transid 10 /dev/sdb5
    Btrfs detected SSD devices, enabling SSD mode
    btrfs: relocating block group 476315648 flags 9
    btrfs: found 4 extents
    attempt to access beyond end of device
    sdb5: rw=145, want=546176, limit=546147
    attempt to access beyond end of device
    sdb5: rw=145, want=546304, limit=546147
    attempt to access beyond end of device
    sdb5: rw=145, want=546432, limit=546147
    attempt to access beyond end of device
    sdb5: rw=145, want=546560, limit=546147
    attempt to access beyond end of device

    Signed-off-by: Liu Bo
    Signed-off-by: Chris Mason

    liubo
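    The invariant the fix enforces can be sketched in userspace C (the names
    are illustrative, not the kernel's): a dev extent must end within the
    device, and shrinking must not push total_bytes below bytes_used.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of the bounds the fix enforces. */
struct demo_shrink_dev {
    uint64_t total_bytes; /* current device size */
    uint64_t bytes_used;  /* space already consumed by extents */
};

/* A dev extent must lie entirely within the device. */
static int demo_alloc_dev_extent(const struct demo_shrink_dev *d,
                                 uint64_t start, uint64_t len)
{
    if (start + len > d->total_bytes)
        return -ENOSPC; /* would access beyond end of device */
    return 0;
}

/* Shrinking must never leave total_bytes below bytes_used. */
static int demo_shrink_device(struct demo_shrink_dev *d, uint64_t new_size)
{
    if (new_size < d->bytes_used)
        return -ENOSPC;
    d->total_bytes = new_size;
    return 0;
}
```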
     
  • We have a problem where if a user specifies discard but the device
    doesn't actually support it, we will return EOPNOTSUPP from
    btrfs_discard_extent. This is a problem because this gets called (in a
    fashion) from the tree log recovery code, which has a nice little
    BUG_ON(ret) after it, which causes us to fail the tree log replay. So
    instead detect whether our devices support discard when we're adding
    them, and then don't issue discards if we know that the device doesn't
    support it. And just for good measure set ret = 0 in btrfs_issue_discard
    in case we still get EOPNOTSUPP, so we don't screw anybody up like this
    again. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
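    Both halves of the fix can be modeled with a short userspace sketch (the
    names are illustrative, not the kernel's): a per-device flag set at add
    time, a skip on unsupporting devices, and a belt-and-braces squashing of
    EOPNOTSUPP.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the two-part fix described above. */
struct demo_discard_dev {
    bool can_discard;    /* detected when the device is added */
    int discards_issued; /* counts real discards for illustration */
};

/* Stand-in for the low-level discard call. */
static int demo_issue_discard(struct demo_discard_dev *dev)
{
    int ret;

    if (dev->can_discard) {
        dev->discards_issued++;
        ret = 0;
    } else {
        ret = -EOPNOTSUPP; /* what a non-supporting device would give */
    }
    if (ret == -EOPNOTSUPP)
        ret = 0; /* never leak EOPNOTSUPP to BUG_ON(ret) callers */
    return ret;
}

static int demo_discard_extent(struct demo_discard_dev *dev)
{
    if (!dev->can_discard)
        return 0; /* skip devices known not to support discard */
    return demo_issue_discard(dev);
}
```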
     

06 Aug, 2011

1 commit

  • Btrfs does bio submissions from a worker thread, and each device
    has a list of high priority bios and regular priority bios.

    Synchronous writes go to the high priority thread, while async writes
    go to the regular list. This commit brings back an explicit unplug
    any time we switch from high to regular priority, which makes it
    easier for the block layer to give us low latencies.

    Signed-off-by: Chris Mason

    Chris Mason
     

02 Aug, 2011

1 commit


28 Jul, 2011

1 commit

  • This patch was originally from Tejun Heo. lockdep complains about the btrfs
    locking because we sometimes take btree locks from two different trees at the
    same time. The current classes are based only on level in the btree, which
    isn't enough information for lockdep to figure out if the lock is safe.

    This patch makes a class for each type of tree, and lumps all the FS trees that
    actually have files and directories into the same class.

    Signed-off-by: Chris Mason

    Chris Mason
     

26 Jul, 2011

1 commit

  • I also removed the BUG_ON from the error return of find_next_chunk() in
    init_first_rw_device(). It turns out that the only caller of
    init_first_rw_device() also BUGs on any nonzero return, so no actual
    behavior change has occurred here.

    do_chunk_alloc() also needed an update since it calls btrfs_alloc_chunk()
    which can now return -ENOMEM. Instead of setting space_info->full on any
    error from btrfs_alloc_chunk() I catch and return every error value _except_
    -ENOSPC. Thanks goes to Tsutomu Itoh for pointing that issue out.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
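    The error filtering in do_chunk_alloc() described above can be sketched
    as follows (the names are illustrative, not the kernel's): only -ENOSPC
    marks the space_info full; everything else, such as -ENOMEM, is returned
    to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the do_chunk_alloc() error handling. */
struct demo_space_info {
    bool full;
};

static int demo_handle_alloc_ret(struct demo_space_info *si, int ret)
{
    if (ret == -ENOSPC) {
        si->full = true; /* out of space: remember it, but don't fail */
        return 0;
    }
    return ret; /* propagate real errors such as -ENOMEM */
}
```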
     

15 Jul, 2011

1 commit

  • Dealing with this seems trivial - the only caller of btrfs_balance() is
    btrfs_ioctl() which passes the error code directly back to userspace. There
    also isn't much state to unwind (if I'm wrong about this point, we can
    always safely move the allocation to the top of btrfs_balance() anyway).

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     

07 Jul, 2011

1 commit


11 Jun, 2011

1 commit


04 Jun, 2011

1 commit


24 May, 2011

8 commits


23 May, 2011

2 commits


13 May, 2011

3 commits

  • In a multi device setup, the chunk allocator currently always allocates
    chunks on the devices in the same order. This leads to a very uneven
    distribution, especially with RAID1 or RAID10 and an uneven number of
    devices.

    This patch always sorts the devices before allocating, and allocates the
    stripes on the devices with the most available space, as long as there
    is enough space available. In a low space situation, it first tries to
    maximize striping.

    The patch also simplifies the allocator and reduces the checks for
    corner cases. The simplification is done by several means. First, it
    defines the properties of each RAID type upfront. These properties are
    used afterwards instead of differentiating cases in several places.

    Second, the old allocator defined a minimum stripe size for each block
    group type, tried to find a large enough chunk, and if this failed just
    allocated a smaller one. This is now done in one step. The largest
    possible chunk (up to max_chunk_size) is searched for and allocated.

    Because we now have only one pass, the allocation of the map (struct
    map_lookup) is moved down to the point where the number of stripes is
    already known. This way we avoid reallocation of the map.

    We still avoid allocating stripes that are not a multiple of STRIPE_SIZE.

    Arne Jansen
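    The sort-by-available-space step can be sketched in userspace C (the
    struct and function names are illustrative, not the kernel's): order the
    devices by free space, descending, then place stripes starting from the
    devices with the most room.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of the allocator's device ordering. */
struct demo_alloc_dev {
    uint64_t avail; /* unallocated bytes on this device */
};

static int demo_cmp_avail(const void *a, const void *b)
{
    uint64_t x = ((const struct demo_alloc_dev *)a)->avail;
    uint64_t y = ((const struct demo_alloc_dev *)b)->avail;
    return (x < y) - (x > y); /* descending by available space */
}

static void demo_sort_devices(struct demo_alloc_dev *devs, size_t n)
{
    qsort(devs, n, sizeof(*devs), demo_cmp_avail);
}
```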
     
  • Currently alloc_start is disregarded if the requested
    chunk size is bigger than (device size - alloc_start),
    but smaller than the device size.
    The only situation where I see this could have made sense
    was when a chunk equal to the size of the device was
    requested. This was possible because the allocator failed
    to take alloc_start into account when calculating the
    requested chunk size. As this gets fixed by this patch,
    the workaround is not necessary anymore.

    Arne Jansen
     
  • This function won't be used here anymore, so move it to super.c, where
    it is used for the df calculation.

    Arne Jansen
     

12 May, 2011

1 commit

  • This adds an initial implementation for scrub. It works quite
    straightforwardly. Usermode issues an ioctl for each device in the
    fs. For each device, it enumerates the allocated device chunks. For
    each chunk, the contained extents are enumerated and the data checksums
    fetched. The extents are read sequentially and the checksums verified.
    If an error occurs (checksum or EIO), a good copy is searched for. If
    one is found, the bad copy is rewritten.
    All enumerations happen from the commit roots. During a transaction
    commit, the scrubs get paused and afterwards continue from the new
    roots.

    This commit is based on the series originally posted to linux-btrfs
    with some improvements that resulted from comments from David Sterba,
    Ilya Dryomov and Jan Schmidt.

    Signed-off-by: Arne Jansen

    Arne Jansen
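    The verify-and-repair step of that flow can be modeled with a short
    userspace sketch (the struct and function names are hypothetical, and
    every mismatch is assumed repairable from a good mirror):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of scrub's per-extent work: compare the data read
 * from disk against the stored checksum; on mismatch, rewrite the bad
 * copy from a good mirror. */
struct demo_extent {
    uint32_t data_csum;   /* checksum of the data as read */
    uint32_t stored_csum; /* checksum fetched from the csum tree */
};

/* Returns the number of extents repaired. */
static int demo_scrub_extents(struct demo_extent *ext, size_t n)
{
    int repaired = 0;

    for (size_t i = 0; i < n; i++) {
        if (ext[i].data_csum != ext[i].stored_csum) {
            /* bad copy found: rewrite it from a good mirror */
            ext[i].data_csum = ext[i].stored_csum;
            repaired++;
        }
    }
    return repaired;
}
```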
     

06 May, 2011

1 commit

  • Remove static and global declarations and/or definitions. Reduces size
    of btrfs.ko by ~3.4kB.

    text data bss dec hex filename
    402081 7464 200 409745 64091 btrfs.ko.base
    398620 7144 200 405964 631cc btrfs.ko.remove-all

    Signed-off-by: David Sterba

    David Sterba
     

02 May, 2011

2 commits