09 Oct, 2012

1 commit


29 Aug, 2012

1 commit

  • We need a barrir before calling waitqueue_active otherwise we will miss
    wakeups. So in places that do atomic_dec(); then atomic_read() use
    atomic_dec_return() which imply a memory barrier (see memory-barriers.txt)
    and then add an explicit memory barrier everywhere else that need them.
    Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     

14 Apr, 2012

1 commit

  • Pull the minimal btrfs branch from Chris Mason:
    "We have a use-after-free in there, along with errors when mount -o
    discard is enabled, and a BUG_ON(we should compile with UP more
    often)."

    * 'for-linus-min' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: use commit root when loading free space cache
    Btrfs: fix use-after-free in __btrfs_end_transaction
    Btrfs: check return value of bio_alloc() properly
    Btrfs: remove lock assert from get_restripe_target()
    Btrfs: fix eof while discarding extents
    Btrfs: fix uninit variable in repair_eb_io_failure
    Revert "Btrfs: increase the global block reserve estimates"

    Linus Torvalds
     

13 Apr, 2012

1 commit


31 Mar, 2012

1 commit

  • Pull btrfs fixes and features from Chris Mason:
    "We've merged in the error handling patches from SuSE. These are
    already shipping in the sles kernel, and they give btrfs the ability
    to abort transactions and go readonly on errors. It involves a lot of
    churn as they clarify BUG_ONs, and remove the ones we now properly
    deal with.

    Josef reworked the way our metadata interacts with the page cache.
    page->private now points to the btrfs extent_buffer object, which
    makes everything faster. He changed it so we write an whole extent
    buffer at a time instead of allowing individual pages to go down,,
    which will be important for the raid5/6 code (for the 3.5 merge
    window ;)

    Josef also made us more aggressive about dropping pages for metadata
    blocks that were freed due to COW. Overall, our metadata caching is
    much faster now.

    We've integrated my patch for metadata bigger than the page size.
    This allows metadata blocks up to 64KB in size. In practice 16K and
    32K seem to work best. For workloads with lots of metadata, this cuts
    down the size of the extent allocation tree dramatically and fragments
    much less.

    Scrub was updated to support the larger block sizes, which ended up
    being a fairly large change (thanks Stefan Behrens).

    We also have an assortment of fixes and updates, especially to the
    balancing code (Ilya Dryomov), the back ref walker (Jan Schmidt) and
    the defragging code (Liu Bo)."

    Fixed up trivial conflicts in fs/btrfs/scrub.c that were just due to
    removal of the second argument to k[un]map_atomic() in commit
    7ac687d9e047.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (75 commits)
    Btrfs: update the checks for mixed block groups with big metadata blocks
    Btrfs: update to the right index of defragment
    Btrfs: do not bother to defrag an extent if it is a big real extent
    Btrfs: add a check to decide if we should defrag the range
    Btrfs: fix recursive defragment with autodefrag option
    Btrfs: fix the mismatch of page->mapping
    Btrfs: fix race between direct io and autodefrag
    Btrfs: fix deadlock during allocating chunks
    Btrfs: show useful info in space reservation tracepoint
    Btrfs: don't use crc items bigger than 4KB
    Btrfs: flush out and clean up any block device pages during mount
    btrfs: disallow unequal data/metadata blocksize for mixed block groups
    Btrfs: enhance superblock sanity checks
    Btrfs: change scrub to support big blocks
    Btrfs: minor cleanup in scrub
    Btrfs: introduce common define for max number of mirrors
    Btrfs: fix infinite loop in btrfs_shrink_device()
    Btrfs: fix memory leak in resolver code
    Btrfs: allow dup for data chunks in mixed mode
    Btrfs: validate target profiles only if we are going to use them
    ...

    Linus Torvalds
     

22 Mar, 2012

3 commits


20 Mar, 2012

1 commit


17 Feb, 2012

1 commit


06 Nov, 2011

1 commit

  • fs_info has now ~9kb, more than fits into one page. This will cause
    mount failure when memory is too fragmented. Top space consumers are
    super block structures super_copy and super_for_commit, ~2.8kb each.
    Allocate them dynamically. fs_info will be ~3.5kb. (measured on x86_64)

    Add a wrapper for freeing fs_info and all of it's dynamically allocated
    members.

    Signed-off-by: David Sterba

    David Sterba
     

02 Aug, 2011

1 commit


23 May, 2011

1 commit


02 May, 2011

1 commit


25 Apr, 2011

1 commit

  • There's a potential problem in 32bit system when we exhaust 32bit inode
    numbers and start to allocate big inode numbers, because btrfs uses
    inode->i_ino in many places.

    So here we always use BTRFS_I(inode)->location.objectid, which is an
    u64 variable.

    There are 2 exceptions that BTRFS_I(inode)->location.objectid !=
    inode->i_ino: the btree inode (0 vs 1) and empty subvol dirs (256 vs 2),
    and inode->i_ino will be used in those cases.

    Another reason to make this change is I'm going to use a special inode
    to save free ino cache, and the inode number must be > (u64)-256.

    Signed-off-by: Li Zefan

    Li Zefan
     

28 Mar, 2011

2 commits


06 Feb, 2011

1 commit


29 Jan, 2011

1 commit


22 Dec, 2010

3 commits

  • Add a common function to copy decompressed data from working buffer
    to bio pages.

    Signed-off-by: Li Zefan

    Li Zefan
     
  • Lzo is a much faster compression algorithm than gzib, so would allow
    more users to enable transparent compression, and some users can
    choose from compression ratio and speed for different applications

    Usage:

    # mount -t btrfs -o compress[=] dev /mnt
    or
    # mount -t btrfs -o compress-force[=] dev /mnt

    "-o compress" without argument is still allowed for compatability.

    Compatibility:

    If we mount a filesystem with lzo compression, it will not be able be
    mounted in old kernels. One reason is, otherwise btrfs will directly
    dump compressed data, which sits in inline extent, to user.

    Performance:

    The test copied a linux source tarball (~400M) from an ext4 partition
    to the btrfs partition, and then extracted it.

    (time in second)
    lzo zlib nocompress
    copy: 10.6 21.7 14.9
    extract: 70.1 94.4 66.6

    (data size in MB)
    lzo zlib nocompress
    copy: 185.87 108.69 394.49
    extract: 193.80 132.36 381.21

    Changelog:

    v1 -> v2:
    - Select LZO_COMPRESS and LZO_DECOMPRESS in btrfs Kconfig.
    - Add incompability flag.
    - Fix error handling in compress code.

    Signed-off-by: Li Zefan

    Li Zefan
     
  • Make the code aware of compression type, instead of always assuming
    zlib compression.

    Also make the zlib workspace function as common code for all
    compression types.

    Signed-off-by: Li Zefan

    Li Zefan
     

22 Nov, 2010

1 commit


30 Oct, 2010

1 commit

  • These are all the cases where a variable is set, but not read which are
    not bugs as far as I can see, but simply leftovers.

    Still needs more review.

    Found by gcc 4.6's new warnings

    Signed-off-by: Andi Kleen
    Cc: Chris Mason
    Signed-off-by: Andrew Morton
    Signed-off-by: Chris Mason

    Andi Kleen
     

06 Apr, 2010

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
    Btrfs: add check for changed leaves in setup_leaf_for_split
    Btrfs: create snapshot references in same commit as snapshot
    Btrfs: fix small race with delalloc flushing waitqueue's
    Btrfs: use add_to_page_cache_lru, use __page_cache_alloc
    Btrfs: fix chunk allocate size calculation
    Btrfs: kill max_extent mount option
    Btrfs: fail to mount if we have problems reading the block groups
    Btrfs: check btrfs_get_extent return for IS_ERR()
    Btrfs: handle kmalloc() failure in inode lookup ioctl
    Btrfs: dereferencing freed memory
    Btrfs: Simplify num_stripes's calculation logical for __btrfs_alloc_chunk()
    Btrfs: Add error handle for btrfs_search_slot() in btrfs_read_chunk_tree()
    Btrfs: Remove unnecessary finish_wait() in wait_current_trans()
    Btrfs: add NULL check for do_walk_down()
    Btrfs: remove duplicate include in ioctl.c

    Fix trivial conflict in fs/btrfs/compression.c due to slab.h include
    cleanups.

    Linus Torvalds
     
  • Pagecache pages should be allocated with __page_cache_alloc, so they
    obey pagecache memory policies.

    add_to_page_cache_lru is exported, so it should be used. Benefits over
    using a private pagevec: neater code, 128 bytes fewer stack used, percpu
    lru ordering is preserved, and finally don't need to flush pagevec
    before returning so batching may be shared with other LRU insertions.

    Signed-off-by: Nick Piggin :
    Signed-off-by: Chris Mason

    Nick Piggin
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

15 Mar, 2010

1 commit


12 Sep, 2009

2 commits


13 Jul, 2009

1 commit

  • * Remove smp_lock.h from files which don't need it (including some headers!)
    * Add smp_lock.h to files which do need it
    * Make smp_lock.h include conditional in hardirq.h
    It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT

    This will make hardirq.h inclusion cheaper for every PREEMPT=n config
    (which includes allmodconfig/allyesconfig, BTW)

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

10 Jun, 2009

1 commit

  • Add support for the standard attributes set via chattr and read via
    lsattr. Currently we store the attributes in the flags value in
    the btrfs inode, but I wonder whether we should split it into two so
    that we don't have to keep converting between the two formats.

    Remove the btrfs_clear_flag/btrfs_set_flag/btrfs_test_flag macros
    as they were confusing the existing code and got in the way of the
    new additions.

    Also add the FS_IOC_GETVERSION ioctl for getting i_generation as it's
    trivial.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Chris Mason

    Christoph Hellwig
     

21 Jan, 2009

1 commit


06 Jan, 2009

1 commit


12 Dec, 2008

1 commit

  • Checksums on data can be disabled by mount option, so it's
    possible some data extents don't have checksums or have
    invalid checksums. This causes trouble for data relocation.
    This patch contains following things to make data relocation
    work.

    1) make nodatasum/nodatacow mount option only affects new
    files. Checksums and COW on data are only controlled by the
    inode flags.

    2) check the existence of checksum in the nodatacow checker.
    If checksums exist, force COW the data extent. This ensure that
    checksum for a given block is either valid or does not exist.

    3) update data relocation code to properly handle the case
    of checksum missing.

    Signed-off-by: Yan Zheng

    Yan Zheng
     

09 Dec, 2008

1 commit

  • Btrfs stores checksums for each data block. Until now, they have
    been stored in the subvolume trees, indexed by the inode that is
    referencing the data block. This means that when we read the inode,
    we've probably read in at least some checksums as well.

    But, this has a few problems:

    * The checksums are indexed by logical offset in the file. When
    compression is on, this means we have to do the expensive checksumming
    on the uncompressed data. It would be faster if we could checksum
    the compressed data instead.

    * If we implement encryption, we'll be checksumming the plain text and
    storing that on disk. This is significantly less secure.

    * For either compression or encryption, we have to get the plain text
    back before we can verify the checksum as correct. This makes the raid
    layer balancing and extent moving much more expensive.

    * It makes the front end caching code more complex, as we have touch
    the subvolume and inodes as we cache extents.

    * There is potentitally one copy of the checksum in each subvolume
    referencing an extent.

    The solution used here is to store the extent checksums in a dedicated
    tree. This allows us to index the checksums by phyiscal extent
    start and length. It means:

    * The checksum is against the data stored on disk, after any compression
    or encryption is done.

    * The checksum is stored in a central location, and can be verified without
    following back references, or reading inodes.

    This makes compression significantly faster by reducing the amount of
    data that needs to be checksummed. It will also allow much faster
    raid management code in general.

    The checksums are indexed by a key with a fixed objectid (a magic value
    in ctree.h) and offset set to the starting byte of the extent. This
    allows us to copy the checksum items into the fsync log tree directly (or
    any other tree), without having to invent a second format for them.

    Signed-off-by: Chris Mason

    Chris Mason
     

20 Nov, 2008

2 commits


11 Nov, 2008

2 commits