29 Jun, 2014

1 commit


10 Jun, 2014

1 commit

  • The btrfs compression wrappers translated errors from workspace
    allocation to either -ENOMEM or -1. The compression type workspace
    allocators are already returning an ERR_PTR(-ENOMEM). Just return that
    and get rid of the magical -1.

    This helps a future patch return errors from the compression wrappers.

    Signed-off-by: Zach Brown
    Reviewed-by: David Sterba
    Signed-off-by: Chris Mason

    Zach Brown
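
The error propagation described in this commit can be sketched in userspace C. This is a minimal illustration, not the btrfs code: the ERR_PTR helpers below are simplified stand-ins for the kernel's, and `alloc_workspace()` and `compress_wrapper()` are hypothetical names.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for the kernel's ERR_PTR helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Error codes occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical per-type allocator: reports failure as ERR_PTR(-ENOMEM). */
static void *alloc_workspace(size_t size, int simulate_failure)
{
	void *ws = simulate_failure ? NULL : malloc(size);

	if (!ws)
		return ERR_PTR(-ENOMEM);
	return ws;
}

/* Hypothetical wrapper: forwards the allocator's error code unchanged. */
static int compress_wrapper(size_t size, int simulate_failure)
{
	void *ws = alloc_workspace(size, simulate_failure);

	if (IS_ERR(ws))
		return (int)PTR_ERR(ws);	/* -ENOMEM rather than a magic -1 */
	free(ws);
	return 0;
}
```

Because the wrapper returns whatever the allocator encoded, callers further up can act on the real errno value.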
     

04 Apr, 2014

1 commit

  • shmem mappings already contain exceptional entries where swap slot
    information is remembered.

    To be able to store eviction information for regular page cache, prepare
    every site dealing with the radix trees directly to handle entries other
    than pages.

    The common lookup functions will filter out non-page entries and return
    NULL for page cache holes, just as before. But provide a raw version of
    the API which returns non-page entries as well, and switch shmem over to
    use it.

    Signed-off-by: Johannes Weiner
    Reviewed-by: Rik van Riel
    Reviewed-by: Minchan Kim
    Cc: Andrea Arcangeli
    Cc: Bob Liu
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Greg Thelen
    Cc: Hugh Dickins
    Cc: Jan Kara
    Cc: KOSAKI Motohiro
    Cc: Luigi Semenzato
    Cc: Mel Gorman
    Cc: Metin Doslu
    Cc: Michel Lespinasse
    Cc: Ozgun Erdogan
    Cc: Peter Zijlstra
    Cc: Roman Gushchin
    Cc: Ryan Mallon
    Cc: Tejun Heo
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
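
The split between a filtering lookup and a raw lookup can be illustrated with a small sketch. It assumes that non-page entries are marked by a low pointer bit; the tag value, the fixed slot array standing in for the radix tree, and all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define EXCEPTIONAL_TAG 0x2UL	/* assumption: a low pointer bit marks non-page entries */
#define NR_SLOTS 8

struct page { int id; };

static void *slots[NR_SLOTS];	/* stand-in for the mapping's radix tree */

/* Encode a value (e.g. a swap slot) as a tagged, non-page entry. */
static void *make_exceptional(unsigned long value)
{
	return (void *)((value << 2) | EXCEPTIONAL_TAG);
}

static int is_exceptional(const void *entry)
{
	return ((uintptr_t)entry & EXCEPTIONAL_TAG) != 0;
}

/* Raw lookup: returns whatever is stored, page or not. */
static void *lookup_entry(size_t index)
{
	return index < NR_SLOTS ? slots[index] : NULL;
}

/* Common lookup: filters non-page entries, so callers see NULL for holes. */
static struct page *lookup_page(size_t index)
{
	void *entry = lookup_entry(index);

	return is_exceptional(entry) ? NULL : entry;
}
```

Callers that only understand pages keep using `lookup_page()`, while a shmem-like user would call `lookup_entry()` and decode the tagged value itself.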
     

10 Feb, 2014

1 commit

  • Pull btrfs fixes from Chris Mason:
    "This is a small collection of fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: fix data corruption when reading/updating compressed extents
    Btrfs: don't loop forever if we can't run because of the tree mod log
    btrfs: reserve no transaction units in btrfs_ioctl_set_features
    btrfs: commit transaction after setting label and features
    Btrfs: fix assert screwup for the pending move stuff

    Linus Torvalds
     

09 Feb, 2014

1 commit

  • When using a mix of compressed file extents and prealloc extents, it
    is possible to fill a page of a file with random, garbage data from
    some unrelated previous use of the page, instead of a sequence of zeroes.

    A simple sequence of steps to get into such a case, taken from the test
    case I made for xfstests, is:

    _scratch_mkfs
    _scratch_mount "-o compress-force=lzo"
    $XFS_IO_PROG -f -c "pwrite -S 0x06 -b 18670 266978 18670" $SCRATCH_MNT/foobar
    $XFS_IO_PROG -c "falloc 26450 665194" $SCRATCH_MNT/foobar
    $XFS_IO_PROG -c "truncate 542872" $SCRATCH_MNT/foobar
    $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar

    This results in the following file items in the fs tree:

    item 4 key (257 INODE_ITEM 0) itemoff 15879 itemsize 160
        inode generation 6 transid 6 size 542872 block group 0 mode 100600
    item 5 key (257 INODE_REF 256) itemoff 15863 itemsize 16
        inode ref index 2 namelen 6 name: foobar
    item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53
        extent data disk byte 0 nr 0 gen 6
        extent data offset 0 nr 24576 ram 266240
        extent compression 0
    item 7 key (257 EXTENT_DATA 24576) itemoff 15757 itemsize 53
        prealloc data disk byte 12849152 nr 241664 gen 6
        prealloc data offset 0 nr 241664
    item 8 key (257 EXTENT_DATA 266240) itemoff 15704 itemsize 53
        extent data disk byte 12845056 nr 4096 gen 6
        extent data offset 0 nr 20480 ram 20480
        extent compression 2
    item 9 key (257 EXTENT_DATA 286720) itemoff 15651 itemsize 53
        prealloc data disk byte 13090816 nr 405504 gen 6
        prealloc data offset 0 nr 258048

    The on disk extent at offset 266240 (which corresponds to 1 single disk block),
    contains 5 compressed chunks of file data. Each of the first 4 compress 4096
    bytes of file data, while the last one only compresses 3024 bytes of file data.
    Therefore a read into the file region [285648 ; 286720[ (length = 4096 - 3024 =
    1072 bytes) should always return zeroes (our next extent is a prealloc one).

    The solution here is for the compression code path to zero the remaining (untouched)
    bytes of the last page it uncompressed data into, as the information about how
    much space the file data consumes in the last page is not known in the upper layer
    fs/btrfs/extent_io.c:__do_readpage(). In __do_readpage we were correctly zeroing
    the remainder of the page but only if it corresponds to the last page of the inode
    and if the inode's size is not a multiple of the page size.

    This would cause not only returning random data on reads, but also permanently
    storing random data when updating parts of the region that should be zeroed.
    For the example above, it means updating a single byte in the region [285648 ; 286720[
    would store that byte correctly but also store random data on disk.

    A test case for xfstests follows soon.

    Signed-off-by: Filipe David Borba Manana
    Signed-off-by: Chris Mason

    Filipe David Borba Manana
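
The fix amounts to zeroing the unused tail of the last page that received decompressed data. A minimal sketch of that arithmetic follows, assuming 4096-byte pages laid out contiguously in one buffer; `zero_page_tail()` is a hypothetical helper, not the function the patch touches.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096u

/*
 * After decompressing data_len bytes into a run of pages stored back to
 * back in buf, zero the untouched remainder of the last page.
 * Returns the number of bytes zeroed.
 */
static size_t zero_page_tail(unsigned char *buf, size_t data_len)
{
	size_t used = data_len % PAGE_SIZE;	/* bytes used in the last page */
	size_t tail;

	if (used == 0)
		return 0;			/* last page is completely filled */
	tail = PAGE_SIZE - used;
	memset(buf + data_len, 0, tail);
	return tail;
}
```

For the extent in the example, the data length is 4 * 4096 + 3024 = 19408 bytes, so the helper zeroes 4096 - 3024 = 1072 bytes, which corresponds to the file range starting at 266240 + 19408 = 285648 and ending at 286720.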
     

31 Jan, 2014

1 commit

  • Pull btrfs updates from Chris Mason:
    "This is a pretty big pull, and most of these changes have been
    floating in btrfs-next for a long time. Filipe's properties work is a
    cool building block for inheriting attributes like compression down on
    a per inode basis.

    Jeff Mahoney kicked in code to export filesystem info into sysfs.

    Otherwise, lots of performance improvements, cleanups and bug fixes.

    Looks like there are still a few other small pending incrementals, but
    I wanted to get the bulk of this in first"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (149 commits)
    Btrfs: fix spin_unlock in check_ref_cleanup
    Btrfs: setup inode location during btrfs_init_inode_locked
    Btrfs: don't use ram_bytes for uncompressed inline items
    Btrfs: fix btrfs_search_slot_for_read backwards iteration
    Btrfs: do not export ulist functions
    Btrfs: rework ulist with list+rb_tree
    Btrfs: fix memory leaks on walking backrefs failure
    Btrfs: fix send file hole detection leading to data corruption
    Btrfs: add a reschedule point in btrfs_find_all_roots()
    Btrfs: make send's file extent item search more efficient
    Btrfs: fix to catch all errors when resolving indirect ref
    Btrfs: fix protection between walking backrefs and root deletion
    btrfs: fix warning while merging two adjacent extents
    Btrfs: fix infinite path build loops in incremental send
    btrfs: undo sysfs when open_ctree() fails
    Btrfs: fix snprintf usage by send's gen_unique_name
    btrfs: fix defrag 32-bit integer overflow
    btrfs: sysfs: list the NO_HOLES feature
    btrfs: sysfs: don't show reserved incompat feature
    btrfs: call permission checks earlier in ioctls and return EPERM
    ...

    Linus Torvalds
     

29 Jan, 2014

1 commit


24 Nov, 2013

2 commits

  • Immutable biovecs are going to require an explicit iterator. To
    implement immutable bvecs, a later patch is going to add a bi_bvec_done
    member to this struct; for now, this patch effectively just renames
    things.

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Geert Uytterhoeven
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "Ed L. Cashin"
    Cc: Nick Piggin
    Cc: Lars Ellenberg
    Cc: Jiri Kosina
    Cc: Matthew Wilcox
    Cc: Geoff Levand
    Cc: Yehuda Sadeh
    Cc: Sage Weil
    Cc: Alex Elder
    Cc: ceph-devel@vger.kernel.org
    Cc: Joshua Morris
    Cc: Philip Kelleher
    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Konrad Rzeszutek Wilk
    Cc: Jeremy Fitzhardinge
    Cc: Neil Brown
    Cc: Alasdair Kergon
    Cc: Mike Snitzer
    Cc: dm-devel@redhat.com
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: linux390@de.ibm.com
    Cc: Boaz Harrosh
    Cc: Benny Halevy
    Cc: "James E.J. Bottomley"
    Cc: Greg Kroah-Hartman
    Cc: "Nicholas A. Bellinger"
    Cc: Alexander Viro
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Andreas Dilger
    Cc: Jaegeuk Kim
    Cc: Steven Whitehouse
    Cc: Dave Kleikamp
    Cc: Joern Engel
    Cc: Prasad Joshi
    Cc: Trond Myklebust
    Cc: KONISHI Ryusuke
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Ben Myers
    Cc: xfs@oss.sgi.com
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Len Brown
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: Herton Ronaldo Krzesinski
    Cc: Ben Hutchings
    Cc: Andrew Morton
    Cc: Guo Chao
    Cc: Tejun Heo
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Wei Yongjun
    Cc: "Roger Pau Monné"
    Cc: Jan Beulich
    Cc: Stefano Stabellini
    Cc: Ian Campbell
    Cc: Sebastian Ott
    Cc: Christian Borntraeger
    Cc: Minchan Kim
    Cc: Jiang Liu
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Joe Perches
    Cc: Peng Tao
    Cc: Andy Adamson
    Cc: fanchaoting
    Cc: Jie Liu
    Cc: Sunil Mushran
    Cc: "Martin K. Petersen"
    Cc: Namjae Jeon
    Cc: Pankaj Kumar
    Cc: Dan Magenheimer
    Cc: Mel Gorman

    Kent Overstreet
     
  • With immutable biovecs we don't want code accessing bi_io_vec directly -
    the uses this patch changes weren't incorrect since they all own the
    bio, but it makes the code harder to audit for no good reason - also,
    this will help with multipage bvecs later.

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Alexander Viro
    Cc: Chris Mason
    Cc: Jaegeuk Kim
    Cc: Joern Engel
    Cc: Prasad Joshi
    Cc: Trond Myklebust

    Kent Overstreet
     

12 Nov, 2013

2 commits


01 Sep, 2013

2 commits

  • u64 is "unsigned long long" on all architectures now, so there's no need to
    cast it when formatting it using the "ll" length modifier.

    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Geert Uytterhoeven
     
  • We want this for btrfs_extent_same. Basically readpage and friends do their
    own extent locking but for the purposes of dedupe, we want to have both
    files locked down across a set of readpage operations (so that we can
    compare data). Introduce this variant and a flag which can be set for
    extent_read_full_page() to indicate that we are already locked.

    Partial credit for this patch goes to Gabriel de Perthuis
    as I have included a fix from him to the original patch which avoids a
    deadlock on compressed extents.

    Signed-off-by: Mark Fasheh
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Mark Fasheh
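
For the u64 formatting cleanup listed first under this date, a short userspace sketch shows why the cast is unnecessary. The typedef mirrors the definition the commit message describes, and `format_bytenr()` is a hypothetical helper.

```c
#include <stdio.h>

/* As the commit message states, u64 is unsigned long long everywhere. */
typedef unsigned long long u64;

/* Format a u64 with the "ll" length modifier; no cast is required. */
static int format_bytenr(char *out, size_t len, u64 bytenr)
{
	return snprintf(out, len, "bytenr %llu", bytenr);
}
```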
     

07 May, 2013

2 commits

  • Big patch, but all it does is add statics to functions which
    are in fact static, then remove the associated dead-code fallout.

    removed functions:

    btrfs_iref_to_path()
    __btrfs_lookup_delayed_deletion_item()
    __btrfs_search_delayed_insertion_item()
    __btrfs_search_delayed_deletion_item()
    find_eb_for_page()
    btrfs_find_block_group()
    range_straddles_pages()
    extent_range_uptodate()
    btrfs_file_extent_length()
    btrfs_scrub_cancel_devid()
    btrfs_start_transaction_lflush()

    btrfs_print_tree() is left because it is used for debugging.
    btrfs_start_transaction_lflush() and btrfs_reada_detach() are
    left for symmetry.

    ulist.c functions are left, another patch will take care of those.

    Signed-off-by: Eric Sandeen
    Signed-off-by: Josef Bacik

    Eric Sandeen
     
  • Argument 'root' is no longer used in btrfs_csum_data().

    Signed-off-by: Liu Bo
    Signed-off-by: Josef Bacik

    Liu Bo
     

02 Feb, 2013

1 commit


13 Dec, 2012

1 commit

  • With the addition of the device replace procedure, it is possible
    for btrfs_map_bio(READ) to report an error. This happens when the
    specific mirror that is requested is located on the target disk
    and the copy operation has not yet copied this block. The block
    cannot be read, and this error state is indicated by returning EIO.

    Some background: a new mirror is added while the device replace
    procedure is running. btrfs_get_num_copies() returns one more, and
    btrfs_map_bio(GET_READ_MIRROR) adds one more mirror if a disk
    location is involved that was already handled by the device
    replace copy operation. The assigned mirror num is the highest
    mirror number, e.g. the value 3 in case of RAID1.

    If btrfs_map_bio() is invoked with mirror_num == 0 (i.e., select
    any mirror), the copy on the target drive is never selected,
    because that disk should be able to perform the write requests as
    quickly as possible; parallel read requests would only slow down
    the disk copy procedure.

    The second case is that btrfs_map_bio() is called with
    mirror_num > 0. This is done from the repair code only. In this
    case, the highest mirror num is assigned to the target disk, since
    it is used last. When this mirror is not available because the copy
    procedure has not yet handled this area, an error is returned.
    Handling of such errors is now added everywhere in the code.

    Signed-off-by: Stefan Behrens
    Signed-off-by: Chris Mason

    Stefan Behrens
     

09 Oct, 2012

1 commit


29 Aug, 2012

1 commit

  • We need a barrier before calling waitqueue_active, otherwise we will miss
    wakeups. So in places that do atomic_dec() and then atomic_read(), use
    atomic_dec_return(), which implies a memory barrier (see memory-barriers.txt),
    and add an explicit memory barrier everywhere else that needs one.
    Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
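
The ordering requirement can be approximated in userspace with C11 atomics. This is only an analogy under stated assumptions: a sequentially consistent `atomic_fetch_sub()` stands in for the kernel's atomic_dec_return(), a plain flag stands in for waitqueue_active(), and `struct waiter_state` and `finish_one()` are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct waiter_state {
	atomic_int pending;	/* outstanding work items */
	atomic_bool has_waiter;	/* stand-in for waitqueue_active() */
};

/* Drop one pending item; returns true if the caller should wake a waiter. */
static bool finish_one(struct waiter_state *s)
{
	/*
	 * The default memory order is sequentially consistent, so the
	 * decrement is ordered before the waiter check below, much like
	 * atomic_dec_return() followed by waitqueue_active().
	 */
	int remaining = atomic_fetch_sub(&s->pending, 1) - 1;

	return remaining == 0 && atomic_load(&s->has_waiter);
}
```

With a separate non-returning decrement followed by a relaxed read, the check for a waiter could be reordered before the decrement becomes visible, which is the missed-wakeup window the commit closes.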
     

14 Apr, 2012

1 commit

  • Pull the minimal btrfs branch from Chris Mason:
    "We have a use-after-free in there, along with errors when mount -o
    discard is enabled, and a BUG_ON (we should compile with UP more
    often)."

    * 'for-linus-min' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: use commit root when loading free space cache
    Btrfs: fix use-after-free in __btrfs_end_transaction
    Btrfs: check return value of bio_alloc() properly
    Btrfs: remove lock assert from get_restripe_target()
    Btrfs: fix eof while discarding extents
    Btrfs: fix uninit variable in repair_eb_io_failure
    Revert "Btrfs: increase the global block reserve estimates"

    Linus Torvalds
     

13 Apr, 2012

1 commit


31 Mar, 2012

1 commit

  • Pull btrfs fixes and features from Chris Mason:
    "We've merged in the error handling patches from SuSE. These are
    already shipping in the sles kernel, and they give btrfs the ability
    to abort transactions and go readonly on errors. It involves a lot of
    churn as they clarify BUG_ONs, and remove the ones we now properly
    deal with.

    Josef reworked the way our metadata interacts with the page cache.
    page->private now points to the btrfs extent_buffer object, which
    makes everything faster. He changed it so we write a whole extent
    buffer at a time instead of allowing individual pages to go down,
    which will be important for the raid5/6 code (for the 3.5 merge
    window ;)

    Josef also made us more aggressive about dropping pages for metadata
    blocks that were freed due to COW. Overall, our metadata caching is
    much faster now.

    We've integrated my patch for metadata bigger than the page size.
    This allows metadata blocks up to 64KB in size. In practice 16K and
    32K seem to work best. For workloads with lots of metadata, this cuts
    down the size of the extent allocation tree dramatically and fragments
    much less.

    Scrub was updated to support the larger block sizes, which ended up
    being a fairly large change (thanks Stefan Behrens).

    We also have an assortment of fixes and updates, especially to the
    balancing code (Ilya Dryomov), the back ref walker (Jan Schmidt) and
    the defragging code (Liu Bo)."

    Fixed up trivial conflicts in fs/btrfs/scrub.c that were just due to
    removal of the second argument to k[un]map_atomic() in commit
    7ac687d9e047.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (75 commits)
    Btrfs: update the checks for mixed block groups with big metadata blocks
    Btrfs: update to the right index of defragment
    Btrfs: do not bother to defrag an extent if it is a big real extent
    Btrfs: add a check to decide if we should defrag the range
    Btrfs: fix recursive defragment with autodefrag option
    Btrfs: fix the mismatch of page->mapping
    Btrfs: fix race between direct io and autodefrag
    Btrfs: fix deadlock during allocating chunks
    Btrfs: show useful info in space reservation tracepoint
    Btrfs: don't use crc items bigger than 4KB
    Btrfs: flush out and clean up any block device pages during mount
    btrfs: disallow unequal data/metadata blocksize for mixed block groups
    Btrfs: enhance superblock sanity checks
    Btrfs: change scrub to support big blocks
    Btrfs: minor cleanup in scrub
    Btrfs: introduce common define for max number of mirrors
    Btrfs: fix infinite loop in btrfs_shrink_device()
    Btrfs: fix memory leak in resolver code
    Btrfs: allow dup for data chunks in mixed mode
    Btrfs: validate target profiles only if we are going to use them
    ...

    Linus Torvalds
     

22 Mar, 2012

3 commits


20 Mar, 2012

1 commit


17 Feb, 2012

1 commit


06 Nov, 2011

1 commit

  • fs_info is now ~9kb, more than fits into one page. This will cause
    mount failure when memory is too fragmented. Top space consumers are
    super block structures super_copy and super_for_commit, ~2.8kb each.
    Allocate them dynamically. fs_info will be ~3.5kb. (measured on x86_64)

    Add a wrapper for freeing fs_info and all of its dynamically allocated
    members.

    Signed-off-by: David Sterba

    David Sterba
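
The allocation pattern described here can be sketched as follows. The struct layouts are placeholders, not the real btrfs definitions: only the ~2.8kb member size comes from the commit message, and `alloc_fs_info()` and `free_fs_info()` are hypothetical names.

```c
#include <stdlib.h>

/* Placeholder for a large super block copy (~2.8kb per the commit message). */
struct super_block_copy { unsigned char raw[2800]; };

/* The large members are held by pointer, so the container stays small. */
struct fs_info {
	struct super_block_copy *super_copy;
	struct super_block_copy *super_for_commit;
};

/* Single wrapper that frees fs_info and all dynamically allocated members. */
static void free_fs_info(struct fs_info *fs_info)
{
	if (!fs_info)
		return;
	free(fs_info->super_copy);	/* free(NULL) is a no-op */
	free(fs_info->super_for_commit);
	free(fs_info);
}

static struct fs_info *alloc_fs_info(void)
{
	struct fs_info *fs_info = calloc(1, sizeof(*fs_info));

	if (!fs_info)
		return NULL;
	fs_info->super_copy = calloc(1, sizeof(*fs_info->super_copy));
	fs_info->super_for_commit = calloc(1, sizeof(*fs_info->super_for_commit));
	if (!fs_info->super_copy || !fs_info->super_for_commit) {
		free_fs_info(fs_info);	/* one cleanup path for partial setups */
		return NULL;
	}
	return fs_info;
}
```

Routing every exit through the one free wrapper keeps partially initialized structures from leaking their members.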
     

02 Aug, 2011

1 commit


23 May, 2011

1 commit


02 May, 2011

1 commit


25 Apr, 2011

1 commit

  • There's a potential problem on 32-bit systems when we exhaust 32-bit inode
    numbers and start to allocate big inode numbers, because btrfs uses
    inode->i_ino in many places.

    So here we always use BTRFS_I(inode)->location.objectid, which is a
    u64 variable.

    There are 2 exceptions that BTRFS_I(inode)->location.objectid !=
    inode->i_ino: the btree inode (0 vs 1) and empty subvol dirs (256 vs 2),
    and inode->i_ino will be used in those cases.

    Another reason to make this change is I'm going to use a special inode
    to save free ino cache, and the inode number must be > (u64)-256.

    Signed-off-by: Li Zefan

    Li Zefan
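
The truncation problem and the accessor approach can be sketched as below. The types are simplified stand-ins: `i_ino` is modelled as a 32-bit field to mimic unsigned long on a 32-bit system, and `demo_ino()` and the `use_vfs_ino` flag (covering the two exceptions) are hypothetical.

```c
#include <stdint.h>

struct inode_key { uint64_t objectid; };

struct demo_inode {
	uint32_t i_ino;			/* models unsigned long on a 32-bit system */
	struct inode_key location;	/* always carries the full 64-bit number */
	int use_vfs_ino;		/* hypothetical flag for the two exceptions */
};

/* Return the 64-bit inode number, falling back to i_ino for the exceptions. */
static uint64_t demo_ino(const struct demo_inode *inode)
{
	if (inode->use_vfs_ino)
		return inode->i_ino;
	return inode->location.objectid;
}
```

An inode numbered 0x100000005 would be stored in the 32-bit field as 5, while the accessor still returns the full value.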
     

28 Mar, 2011

2 commits


06 Feb, 2011

1 commit


29 Jan, 2011

1 commit


22 Dec, 2010

3 commits

  • Add a common function to copy decompressed data from working buffer
    to bio pages.

    Signed-off-by: Li Zefan

    Li Zefan
     
  • Lzo is a much faster compression algorithm than gzip, so it would allow
    more users to enable transparent compression, and users can trade off
    compression ratio against speed for different applications.

    Usage:

    # mount -t btrfs -o compress[=] dev /mnt
    or
    # mount -t btrfs -o compress-force[=] dev /mnt

    "-o compress" without argument is still allowed for compatibility.

    Compatibility:

    If we mount a filesystem with lzo compression, it can no longer be
    mounted by old kernels. One reason is that btrfs would otherwise dump
    the compressed data stored in inline extents directly to the user.

    Performance:

    The test copied a linux source tarball (~400M) from an ext4 partition
    to the btrfs partition, and then extracted it.

    (time in seconds)
                 lzo       zlib      nocompress
    copy:        10.6      21.7      14.9
    extract:     70.1      94.4      66.6

    (data size in MB)
                 lzo       zlib      nocompress
    copy:        185.87    108.69    394.49
    extract:     193.80    132.36    381.21

    Changelog:

    v1 -> v2:
    - Select LZO_COMPRESS and LZO_DECOMPRESS in btrfs Kconfig.
    - Add incompatibility flag.
    - Fix error handling in compress code.

    Signed-off-by: Li Zefan

    Li Zefan
     
  • Make the code aware of compression type, instead of always assuming
    zlib compression.

    Also make the zlib workspace function as common code for all
    compression types.

    Signed-off-by: Li Zefan

    Li Zefan
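
Making the code aware of the compression type, as the last commit in this group does, is commonly done with a per-type operations table. The sketch below illustrates that idea only: the enum values, `struct compress_op`, the placeholder workspace sizes, and `find_op()` are assumptions, not the definitions used by btrfs.

```c
#include <stddef.h>

/* Hypothetical type identifiers; 0 means no compression. */
enum compress_type {
	COMPRESS_NONE = 0,
	COMPRESS_ZLIB = 1,
	COMPRESS_LZO  = 2,
};

/* Per-type operations; common code calls through this table. */
struct compress_op {
	const char *name;
	size_t (*workspace_size)(void);
};

static size_t zlib_ws_size(void) { return 4096; }	/* placeholder size */
static size_t lzo_ws_size(void)  { return 8192; }	/* placeholder size */

static const struct compress_op compress_ops[] = {
	[COMPRESS_ZLIB] = { "zlib", zlib_ws_size },
	[COMPRESS_LZO]  = { "lzo",  lzo_ws_size },
};

/* Look up the operations for a type; NULL for none or unknown types. */
static const struct compress_op *find_op(int type)
{
	if (type < COMPRESS_ZLIB || type > COMPRESS_LZO)
		return NULL;
	return &compress_ops[type];
}
```

With this shape, workspace handling can live in common code that only calls `find_op(type)`, and adding another algorithm means adding one table entry.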
     

22 Nov, 2010

1 commit