16 Dec, 2011

1 commit

  • Al pointed out we have some random problems with the way we account for
    num_workers_starting in the async thread stuff. First of all we need to make
    sure to decrement num_workers_starting if we fail to start the worker, so make
    __btrfs_start_workers do this. Also fix __btrfs_start_workers so that it
    doesn't call btrfs_stop_workers(); there is no point in stopping everybody if we
    failed to create a worker. Also check_pending_worker_creates needs to call
    __btrfs_start_workers in its work function since it already increments
    num_workers_starting.

    People only start one worker at a time, so get rid of the num_workers argument
    everywhere, and make btrfs_queue_worker a void since it will always succeed.
    Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
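
The accounting rule described in this commit can be sketched in plain C. This is only an illustration under stated assumptions: `struct worker_pool`, `reserve_worker_slot` and `start_reserved_worker` are hypothetical names, not the kernel's, and the locking that the real code needs around these counters is left out.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the worker pool (not the kernel struct). */
struct worker_pool {
    int num_workers;          /* workers that are running */
    int num_workers_starting; /* slots reserved for workers not yet running */
    int max_workers;
};

/* Reserve a slot before the worker is created; this is where the
 * "starting" counter is incremented. */
static bool reserve_worker_slot(struct worker_pool *pool)
{
    if (pool->num_workers + pool->num_workers_starting >= pool->max_workers)
        return false;
    pool->num_workers_starting++;
    return true;
}

/*
 * Start exactly one worker for a slot that was already reserved.
 * create_ok stands in for the result of creating the thread.
 * The reservation is dropped on both paths; on failure the workers
 * that are already running are left alone.
 */
static int start_reserved_worker(struct worker_pool *pool, bool create_ok)
{
    pool->num_workers_starting--;
    if (!create_ok)
        return -1;
    pool->num_workers++;
    return 0;
}
```

The point of the sketch is that a failed start leaves both counters exactly where they were before the reservation, so later callers do not see a phantom worker that is forever "starting".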
     

01 Dec, 2011

1 commit


20 Nov, 2011

1 commit

  • This patch casts to unsigned long before casting to a pointer and fixes
    the following warnings:
    fs/btrfs/extent_io.c:2289:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
    fs/btrfs/ioctl.c:2933:37: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
    fs/btrfs/ioctl.c:2937:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    fs/btrfs/ioctl.c:3020:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    fs/btrfs/scrub.c:275:4: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    fs/btrfs/backref.c:686:27: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]

    Signed-off-by: Jeff Mahoney
    Signed-off-by: Chris Mason

    Jeff Mahoney
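
A minimal sketch of the casting pattern this patch describes, assuming a 64-bit integer field that carries a pointer. The helper names `ptr_to_u64` and `u64_to_ptr` are hypothetical. The intermediate `unsigned long` matches pointer width on the platforms Linux supports; portable userland code would normally use `uintptr_t` instead.

```c
#include <stdint.h>

/* Pointer -> 64-bit integer. Casting through unsigned long first avoids
 * -Wpointer-to-int-cast on targets where pointers are 32 bits wide. */
static uint64_t ptr_to_u64(const void *p)
{
    return (uint64_t)(unsigned long)p;
}

/* 64-bit integer -> pointer. Casting through unsigned long first avoids
 * -Wint-to-pointer-cast on targets where pointers are 32 bits wide. */
static void *u64_to_ptr(uint64_t v)
{
    return (void *)(unsigned long)v;
}
```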
     

11 Nov, 2011

1 commit

  • Currently scrub fails with ENOMEM when bio_add_page fails. Unfortunately
    dm based targets accept only one page per bio, thus making scrub always
    fail. This patch just submits the current bio when an error is encountered
    and starts a new one.

    Signed-off-by: Arne Jansen
    Signed-off-by: Chris Mason

    Arne Jansen
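
The "submit the current bio and start a new one" behaviour can be modelled in userland. The sketch below is a toy under stated assumptions: `struct toy_bio`, `toy_bio_add_page`, `toy_submit` and `toy_scrub_page` are hypothetical stand-ins, not block-layer functions, and a `page_limit` of 1 imitates a target that accepts only one page per bio.

```c
#define MAX_SUBMITS 16

/* Hypothetical stand-in for a bio that accepts a limited number of pages. */
struct toy_bio {
    int page_count;
    int page_limit;
};

struct toy_scrub {
    struct toy_bio cur;
    int submitted[MAX_SUBMITS]; /* pages carried by each submitted bio */
    int nr_submitted;
};

/* Returns 1 if the page was added, 0 if the bio is full. */
static int toy_bio_add_page(struct toy_bio *bio)
{
    if (bio->page_count >= bio->page_limit)
        return 0;
    bio->page_count++;
    return 1;
}

/* Record the current bio as submitted and start an empty one. */
static int toy_submit(struct toy_scrub *s)
{
    if (s->cur.page_count == 0)
        return 0;
    if (s->nr_submitted >= MAX_SUBMITS)
        return -1;
    s->submitted[s->nr_submitted++] = s->cur.page_count;
    s->cur.page_count = 0;
    return 0;
}

/* Queue one page. A full bio is no longer an error: it is submitted
 * and the page goes into a fresh bio. */
static int toy_scrub_page(struct toy_scrub *s)
{
    if (toy_bio_add_page(&s->cur))
        return 0;
    if (toy_submit(s) < 0)
        return -1;
    return toy_bio_add_page(&s->cur) ? 0 : -1;
}
```

With a limit of one page, three queued pages end up as three one-page submissions instead of an error after the first page.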
     

06 Nov, 2011

5 commits


02 Oct, 2011

1 commit

  • Scrub uses a simple tree-enumeration to bring the relevant portions
    of the extent- and csum-tree into the page cache before starting the
    scrub-I/O. This is now replaced by using the new readahead-API.
    During readahead the scrub is being accounted as paused, so it won't
    hold off transaction commits.

    This change raises the average disk bandwidth utilisation on my test
    volume from 70% to 90%. On another volume, the time for a test run
    went down from 89s to 43s.

    Changes v5:
    - reada1/2 are now of type struct reada_control *

    Signed-off-by: Arne Jansen

    Arne Jansen
     

29 Sep, 2011

7 commits

  • This ties nodatasum fixup in scrub together with raid repair patches. While
    both series are working fine alone, scrub will report uncorrectable errors
    if they occur in a nodatasum extent *and* the page is in the page cache.

    Previously, we would have triggered readpage to find good data and do the
    repair. However, readpage wouldn't read anything in the case where the page
    is up to date in the cache. So, we simply take that good data we have and
    call repair_io_failure directly (unless the page in the cache is dirty).

    Signed-off-by: Jan Schmidt

    Jan Schmidt
     
  • btrfs_bio is a bio abstraction able to split and not complete after the last
    bio has returned (like the old btrfs_multi_bio). Additionally, btrfs_bio
    tracks the mirror_num used to read data which can be used for error
    correction purposes.

    Signed-off-by: Jan Schmidt

    Jan Schmidt
     
  • This removes a FIXME comment and introduces the first part of nodatasum
    fixup: It gets the corresponding inode for a logical address and triggers a
    regular readpage for the corrupted sector.

    Once we have on-the-fly error correction our error will be automatically
    corrected. The correction code is expected to clear the newly introduced
    EXTENT_DAMAGED flag, making scrub report that error as "corrected" instead
    of "uncorrectable" eventually.

    Signed-off-by: Jan Schmidt

    Jan Schmidt
     
  • the rest of the code uses int mirror_num, and so should scrub

    Signed-off-by: Jan Schmidt

    Jan Schmidt
     
  • Fix the mirror_num determination in scrub_stripe. The rest of the scrub code
    did not use mirror_num for anything important and that error went unnoticed.
    The nodatasum fixup patch of this set depends on a correct mirror_num.

    Signed-off-by: Jan Schmidt

    Jan Schmidt
     
  • While scrubbing, we may encounter various errors. Previously, a logical
    address was printed to the log only. Now, all paths belonging to that
    address are resolved and printed separately. That should work for hardlinks
    as well as reflinks.

    Signed-off-by: Jan Schmidt

    Jan Schmidt
     
  • In normal operation, scrub is reading data sequentially in large portions.
    In case of an i/o error, we try to find the corrupted area(s) by issuing
    page sized read requests. With this commit we increment the
    unverified_errors counter if all of the small size requests succeed.

    Userland patches carrying such conspicuous events to the administrator should
    already be around.

    Signed-off-by: Jan Schmidt

    Jan Schmidt
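
The counting rule from the last commit above (a large read fails, but every page-sized re-read succeeds) might be sketched as follows. `struct toy_stats` and `toy_recheck` are hypothetical names, and counting each failed page as one read error is an assumption of this sketch, not something the commit message states.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counters; only unverified_errors is named in the commit. */
struct toy_stats {
    unsigned long read_errors;
    unsigned long unverified_errors;
};

/*
 * Called after a large read has failed. page_ok[i] says whether the
 * page-sized re-read of page i succeeded.
 */
static void toy_recheck(const bool *page_ok, size_t nr_pages,
                        struct toy_stats *st)
{
    size_t failed = 0;

    for (size_t i = 0; i < nr_pages; i++)
        if (!page_ok[i])
            failed++;

    if (failed == 0)
        st->unverified_errors++;   /* the error could not be pinned down */
    else
        st->read_errors += failed; /* assumption: one error per bad page */
}
```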
     

10 Jun, 2011

3 commits


04 Jun, 2011

3 commits

  • wrap checking of filesystem 'closing' flag and fix a few missing memory
    barriers.

    Signed-off-by: David Sterba

    David Sterba
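
As a rough userland analogue of wrapping the 'closing' check, the sketch below hides the flag behind accessor functions so that every reader gets the same ordering guarantees. It uses C11 atomics rather than kernel memory barriers, and `struct toy_fs_info`, `toy_fs_closing` and `toy_fs_set_closing` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the filesystem state holding the flag. */
struct toy_fs_info {
    atomic_int closing;
};

/* Single place where the flag is read. The default sequentially
 * consistent load plays the role of "barrier, then read". */
static bool toy_fs_closing(struct toy_fs_info *fs)
{
    return atomic_load(&fs->closing) != 0;
}

/* Single place where the flag is set. */
static void toy_fs_set_closing(struct toy_fs_info *fs)
{
    atomic_store(&fs->closing, 1);
}
```

Routing all checks through one helper means a missing barrier has to be fixed in only one spot.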
     
  • With the removal of the implicit plugging scrub ends up doing more and
    smaller I/O than necessary. This patch adds explicit plugging per chunk.

    Signed-off-by: Arne Jansen
    Signed-off-by: Chris Mason

    Arne Jansen
     
  • The current scrub implementation reuses bios and pages as often as possible,
    allocating them only on start and releasing them when finished. This leads
    to more problems with the block layer than it's worth. The elevator gets
    confused when there are more pages added to the bio than bi_size suggests.
    This patch completely rips out the reuse of bios and pages and allocates
    them freshly for each submit.

    Signed-off-by: Arne Jansen
    Signed-off-by: Chris Mason

    Arne Jansen
     

27 May, 2011

1 commit


23 May, 2011

1 commit


12 May, 2011

3 commits

  • setting the readonly flag prevents writes in case an error is detected

    Signed-off-by: Arne Jansen

    Arne Jansen
     
  • btrfs scrub - make fixups sync, don't reuse fixup bios

    Fixups are already sync for csum failures, this patch makes them sync
    for EIO case as well.

    Fixups are now sharing pages with the parent sbio - instead of
    allocating a separate page to do a fixup we grab the page from the sbio
    buffer.

    Fixup bios are no longer reused.

    struct fixup is no longer needed, instead pass [sbio pointer, index].

    Originally this was added to look at the possibility of sharing the code
    between drive swap and scrub, but it actually fixes a serious bug in
    scrub code where errors that could be corrected were ignored and
    reported as uncorrectable.

    btrfs scrub - restore bios properly after media errors

    The current code reallocates a bio after a media error. This is a
    temporary measure introduced in v3 after a serious problem related to
    bio reuse was found in v2 of scrub patchset.

    Basically we did not reset bv_offset and bv_len fields of the bio_vec
    structure. They are changed in case I/O error happens, for example, at
    offset 512 or 1024 into the page. Also bi_flags field wasn't properly
    setup before reusing the bio.

    Signed-off-by: Arne Jansen

    Ilya Dryomov
     
  • This adds an initial implementation for scrub. It is quite
    straightforward. Usermode issues an ioctl for each device in the
    fs. For each device, it enumerates the allocated device chunks. For
    each chunk, the contained extents are enumerated and the data checksums
    fetched. The extents are read sequentially and the checksums verified.
    If an error occurs (checksum or EIO), a good copy is searched for. If
    one is found, the bad copy will be rewritten.
    All enumerations happen from the commit roots. During a transaction
    commit, the scrubs get paused and afterwards continue from the new
    roots.

    This commit is based on the series originally posted to linux-btrfs
    with some improvements that resulted from comments from David Sterba,
    Ilya Dryomov and Jan Schmidt.

    Signed-off-by: Arne Jansen

    Arne Jansen
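
The verify-and-repair step of the initial scrub implementation can be illustrated with a small in-memory model. Everything here is a sketch under assumptions: the mirrors are plain arrays, `toy_csum` is a toy polynomial checksum rather than the checksum btrfs uses, and `toy_scrub_block` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE  8
#define NUM_MIRRORS 2

enum toy_result { TOY_OK, TOY_CORRECTED, TOY_UNCORRECTABLE };

/* Toy checksum for illustration only. */
static uint32_t toy_csum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < len; i++)
        sum = sum * 31 + data[i];
    return sum;
}

/*
 * Verify one block on the given mirror against its stored checksum.
 * On a mismatch, search the other mirrors for a good copy and rewrite
 * the bad copy from it.
 */
static enum toy_result toy_scrub_block(uint8_t mirrors[NUM_MIRRORS][BLOCK_SIZE],
                                       int mirror, uint32_t expected)
{
    if (toy_csum(mirrors[mirror], BLOCK_SIZE) == expected)
        return TOY_OK;

    for (int other = 0; other < NUM_MIRRORS; other++) {
        if (other == mirror)
            continue;
        if (toy_csum(mirrors[other], BLOCK_SIZE) == expected) {
            memcpy(mirrors[mirror], mirrors[other], BLOCK_SIZE);
            return TOY_CORRECTED;
        }
    }
    return TOY_UNCORRECTABLE;
}
```

The sketch leaves out the enumeration of chunks and extents from the commit roots and the pausing during transaction commits; it covers only what happens once a block and its expected checksum are in hand.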