25 May, 2020

1 commit

  • The inode lookup starting at btrfs_iget takes the full location key,
    while only the objectid is used to match the inode, because the lookup
    happens inside the given root, where the inode number is unique.
    The entire location key is properly set up in btrfs_init_locked_inode.

    Simplify the helpers and pass only the inode number, renaming the
    parameter to 'ino' instead of 'objectid'. This allows removing the
    temporary key variables, saving some stack space.

    Signed-off-by: David Sterba

    David Sterba
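    A minimal user-space sketch of why the inode number alone is enough:
    inside one root the location key is always (ino, INODE_ITEM, 0), so it
    can be rebuilt from ino when the inode is initialized. All names here
    (struct toy_root, toy_iget, init_location) are hypothetical stand-ins,
    and TOY_INODE_ITEM_KEY assumes BTRFS_INODE_ITEM_KEY is 1; this is not
    the kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed value of BTRFS_INODE_ITEM_KEY in the on-disk format. */
#define TOY_INODE_ITEM_KEY 1

/* Simplified stand-in for struct btrfs_key. */
struct loc_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Simplified in-memory inode: only the location key. */
struct toy_inode {
	struct loc_key location;
};

/* Hypothetical root holding a small table of cached inodes. */
struct toy_root {
	struct toy_inode *inodes;
	size_t count;
};

/* Like btrfs_init_locked_inode: the full key is derived from ino alone. */
static void init_location(struct toy_inode *inode, uint64_t ino)
{
	inode->location.objectid = ino;
	inode->location.type = TOY_INODE_ITEM_KEY;
	inode->location.offset = 0;
}

/*
 * Lookup inside a single root: the inode number is unique there, so
 * matching on objectid is enough and no temporary key is needed.
 */
static struct toy_inode *toy_iget(struct toy_root *root, uint64_t ino)
{
	for (size_t i = 0; i < root->count; i++)
		if (root->inodes[i].location.objectid == ino)
			return &root->inodes[i];
	return NULL;
}
```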
     

24 Mar, 2020

1 commit

  • Currently the non-prefixed version is a simple wrapper used to hide
    the 4th argument of the prefixed version. This doesn't bring much value
    in practice and only makes the code harder to follow by adding another
    level of indirection. Rectify this by removing the __ prefix and
    having only one public function to release bytes from a block reservation.
    No semantic changes.

    Signed-off-by: Nikolay Borisov
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Nikolay Borisov
     

18 Nov, 2019

2 commits


09 Sep, 2019

1 commit

  • btrfs_calc_trunc_metadata_size differs from btrfs_calc_trans_metadata_size
    in that it doesn't take into account any splitting at the levels, because
    truncate will never split nodes. However, truncating _and_ changing items
    will never split nodes, so rename btrfs_calc_trunc_metadata_size to
    btrfs_calc_metadata_size. Also btrfs_calc_trans_metadata_size is purely
    for inserting items, so rename this to btrfs_calc_insert_metadata_size.
    Making these clearer will help when I start using them differently in
    upcoming patches.

    Reviewed-by: Nikolay Borisov
    Signed-off-by: Josef Bacik
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Josef Bacik
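    The two formulas can be sketched as plain arithmetic. This assumes the
    helpers of that era compute nodesize * BTRFS_MAX_LEVEL * num_items,
    doubled for insertions to cover a possible split at every level, with
    BTRFS_MAX_LEVEL taken to be 8. The function names are simplified
    stand-ins that drop the fs_info argument.

```c
#include <stdint.h>

/* Assumed: BTRFS_MAX_LEVEL, the maximum b-tree height in btrfs, is 8. */
#define MAX_LEVEL 8

/*
 * Changing or truncating an item never splits nodes, so reserve one
 * node per level along the path for each item.
 */
static uint64_t calc_metadata_size(uint32_t nodesize, unsigned num_items)
{
	return (uint64_t)nodesize * MAX_LEVEL * num_items;
}

/*
 * Inserting an item may split a node at every level, so the
 * reservation is doubled.
 */
static uint64_t calc_insert_metadata_size(uint32_t nodesize, unsigned num_items)
{
	return (uint64_t)nodesize * MAX_LEVEL * 2 * num_items;
}
```

    With a 16 KiB nodesize, one item then costs 128 KiB of reservation for a
    change and 256 KiB for an insertion.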
     

02 Jul, 2019

2 commits

  • gcc sometimes can't determine whether a variable has been initialized
    when both the initialization and the use are conditional:

    fs/btrfs/props.c: In function 'inherit_props':
    fs/btrfs/props.c:389:4: error: 'num_bytes' may be used uninitialized in this function [-Werror=maybe-uninitialized]
    btrfs_block_rsv_release(fs_info, trans->block_rsv,

    This code is fine. Unfortunately, I cannot think of a good way to
    rephrase it in a way that makes gcc understand this, so I add a bogus
    initialization the way one should not.

    Signed-off-by: Arnd Bergmann
    Reviewed-by: David Sterba
    [ gcc 8 and 9 don't emit the warning ]
    Signed-off-by: David Sterba

    Arnd Bergmann
     
  • Nikolay reported the following KASAN splat when running btrfs/048:

    [ 1843.470920] ==================================================================
    [ 1843.471971] BUG: KASAN: slab-out-of-bounds in strncmp+0x66/0xb0
    [ 1843.472775] Read of size 1 at addr ffff888111e369e2 by task btrfs/3979

    [ 1843.473904] CPU: 3 PID: 3979 Comm: btrfs Not tainted 5.2.0-rc3-default #536
    [ 1843.475009] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
    [ 1843.476322] Call Trace:
    [ 1843.476674] dump_stack+0x7c/0xbb
    [ 1843.477132] ? strncmp+0x66/0xb0
    [ 1843.477587] print_address_description+0x114/0x320
    [ 1843.478256] ? strncmp+0x66/0xb0
    [ 1843.478740] ? strncmp+0x66/0xb0
    [ 1843.479185] __kasan_report+0x14e/0x192
    [ 1843.479759] ? strncmp+0x66/0xb0
    [ 1843.480209] kasan_report+0xe/0x20
    [ 1843.480679] strncmp+0x66/0xb0
    [ 1843.481105] prop_compression_validate+0x24/0x70
    [ 1843.481798] btrfs_xattr_handler_set_prop+0x65/0x160
    [ 1843.482509] __vfs_setxattr+0x71/0x90
    [ 1843.483012] __vfs_setxattr_noperm+0x84/0x130
    [ 1843.483606] vfs_setxattr+0xac/0xb0
    [ 1843.484085] setxattr+0x18c/0x230
    [ 1843.484546] ? vfs_setxattr+0xb0/0xb0
    [ 1843.485048] ? __mod_node_page_state+0x1f/0xa0
    [ 1843.485672] ? _raw_spin_unlock+0x24/0x40
    [ 1843.486233] ? __handle_mm_fault+0x988/0x1290
    [ 1843.486823] ? lock_acquire+0xb4/0x1e0
    [ 1843.487330] ? lock_acquire+0xb4/0x1e0
    [ 1843.487842] ? mnt_want_write_file+0x3c/0x80
    [ 1843.488442] ? debug_lockdep_rcu_enabled+0x22/0x40
    [ 1843.489089] ? rcu_sync_lockdep_assert+0xe/0x70
    [ 1843.489707] ? __sb_start_write+0x158/0x200
    [ 1843.490278] ? mnt_want_write_file+0x3c/0x80
    [ 1843.490855] ? __mnt_want_write+0x98/0xe0
    [ 1843.491397] __x64_sys_fsetxattr+0xba/0xe0
    [ 1843.492201] ? trace_hardirqs_off_thunk+0x1a/0x1c
    [ 1843.493201] do_syscall_64+0x6c/0x230
    [ 1843.493988] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 1843.495041] RIP: 0033:0x7fa7a8a7707a
    [ 1843.495819] Code: 48 8b 0d 21 de 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 be 00 00 00 0f 05 3d 01 f0 ff ff 73 01 c3 48 8b 0d ee dd 2b 00 f7 d8 64 89 01 48
    [ 1843.499203] RSP: 002b:00007ffcb73bca38 EFLAGS: 00000202 ORIG_RAX: 00000000000000be
    [ 1843.500210] RAX: ffffffffffffffda RBX: 00007ffcb73bda9d RCX: 00007fa7a8a7707a
    [ 1843.501170] RDX: 00007ffcb73bda9d RSI: 00000000006dc050 RDI: 0000000000000003
    [ 1843.502152] RBP: 00000000006dc050 R08: 0000000000000000 R09: 0000000000000000
    [ 1843.503109] R10: 0000000000000002 R11: 0000000000000202 R12: 00007ffcb73bda91
    [ 1843.504055] R13: 0000000000000003 R14: 00007ffcb73bda82 R15: ffffffffffffffff

    [ 1843.505268] Allocated by task 3979:
    [ 1843.505771] save_stack+0x19/0x80
    [ 1843.506211] __kasan_kmalloc.constprop.5+0xa0/0xd0
    [ 1843.506836] setxattr+0xeb/0x230
    [ 1843.507264] __x64_sys_fsetxattr+0xba/0xe0
    [ 1843.507886] do_syscall_64+0x6c/0x230
    [ 1843.508429] entry_SYSCALL_64_after_hwframe+0x49/0xbe

    [ 1843.509558] Freed by task 0:
    [ 1843.510188] (stack is not available)

    [ 1843.511309] The buggy address belongs to the object at ffff888111e369e0
    which belongs to the cache kmalloc-8 of size 8
    [ 1843.514095] The buggy address is located 2 bytes inside of
    8-byte region [ffff888111e369e0, ffff888111e369e8)
    [ 1843.516524] The buggy address belongs to the page:
    [ 1843.517561] page:ffff88813f478d80 refcount:1 mapcount:0 mapping:ffff88811940c300 index:0xffff888111e373b8 compound_mapcount: 0
    [ 1843.519993] flags: 0x4404000010200(slab|head)
    [ 1843.520951] raw: 0004404000010200 ffff88813f48b008 ffff888119403d50 ffff88811940c300
    [ 1843.522616] raw: ffff888111e373b8 000000000016000f 00000001ffffffff 0000000000000000
    [ 1843.524281] page dumped because: kasan: bad access detected

    [ 1843.525936] Memory state around the buggy address:
    [ 1843.526975] ffff888111e36880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    [ 1843.528479] ffff888111e36900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    [ 1843.530138] >ffff888111e36980: fc fc fc fc fc fc fc fc fc fc fc fc 02 fc fc fc
    [ 1843.531877] ^
    [ 1843.533287] ffff888111e36a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    [ 1843.534874] ffff888111e36a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    [ 1843.536468] ==================================================================

    This is caused by supplying a too-short compression value ('lz') in the
    test case and comparing it to 'lzo' with strncmp() and a length of 3.
    The value buffer is only 2 bytes long and not NUL-terminated, so
    strncmp() reads past the 'lz' when looking for the 'o' and thus causes
    an out-of-bounds read.

    Introduce a new check 'btrfs_compress_is_valid_type()' which not only
    checks the user-supplied value against known compression types, but also
    rejects values that are too short.

    Reported-by: Nikolay Borisov
    Fixes: 272e5326c783 ("btrfs: prop: fix vanished compression property after failed set")
    CC: stable@vger.kernel.org # 5.1+
    Reviewed-by: Nikolay Borisov
    Signed-off-by: Johannes Thumshirn
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Johannes Thumshirn
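    A user-space sketch of the kind of check described above, modeled on
    (but not copied from) btrfs_compress_is_valid_type(). The type table and
    the function name compress_is_valid_type are assumptions for
    illustration. The key point is that only len bytes may be read, because
    the xattr value is not NUL-terminated, so any known name longer than the
    value is skipped before strncmp() runs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Index 0 means "no compression"; the rest are the accepted names. */
static const char *const compress_types[] = { "", "zlib", "lzo", "zstd" };

/*
 * Validate a value of len bytes that is NOT NUL-terminated. A suffix
 * after a known name is allowed (prefix match), but a value shorter
 * than the name is rejected before strncmp() could over-read it.
 */
static bool compress_is_valid_type(const char *str, size_t len)
{
	size_t n = sizeof(compress_types) / sizeof(compress_types[0]);

	for (size_t i = 1; i < n; i++) {
		size_t comp_len = strlen(compress_types[i]);

		if (len < comp_len)
			continue;	/* too short: comparing would over-read */
		if (!strncmp(compress_types[i], str, comp_len))
			return true;
	}
	return false;
}
```

    With this check, "lz" (len 2) is rejected without touching a third byte,
    while a value such as "zlib:3" still passes the prefix match.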
     

09 May, 2019

1 commit

  • We're now reserving an extra item's worth of space for property
    inheritance. We only have one property at the moment so this covers us,
    but if we add more in the future this will allow us to not get bitten by
    the extra space reservation. If we do add more properties in the future,
    we should revisit how we calculate the space reservation needed by the
    callers.

    Reviewed-by: Filipe Manana
    Signed-off-by: Josef Bacik
    [ refreshed on top of prop/xattr cleanups ]
    Signed-off-by: David Sterba

    Josef Bacik
     

30 Apr, 2019

11 commits

  • Since the trans argument of btrfs_set_prop is now never NULL, there is
    no need to check it. Delete the check and call btrfs_setxattr, which
    relies on that.

    Signed-off-by: Anand Jain
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Anand Jain
     
  • The last consumer of btrfs_set_prop_trans() was taken away by the patch
    ("btrfs: start transaction in xattr_handler_set_prop") so now this
    function can be deleted.

    Signed-off-by: Anand Jain
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Anand Jain
     
  • Make btrfs_set_prop() a non-static function, so that it can be called
    from btrfs_ioctl_setflags(). We need btrfs_set_prop() instead of
    btrfs_set_prop_trans() so that we can use the transaction which is
    already started in the current thread.

    Signed-off-by: Anand Jain
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Anand Jain
     
  • In preparation to merge multiple transactions when setting the
    compression flags, split the validation part out of btrfs_set_props().

    Signed-off-by: Anand Jain
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Anand Jain
     
  • The previous patch made sure that btrfs_setxattr_trans() is called only
    when the transaction is NULL. Clean up btrfs_setxattr_trans() and drop
    the parameter.

    Signed-off-by: Anand Jain
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Anand Jain
     
  • When the caller has already created the transaction handle,
    btrfs_setxattr() will use it. Also add an assert in btrfs_setxattr().

    Signed-off-by: Anand Jain
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Anand Jain
     
  • Rename btrfs_setxattr() to btrfs_setxattr_trans(), so that do_setxattr()
    can be renamed to btrfs_setxattr().
    Preparatory patch, no functional changes.

    Signed-off-by: Anand Jain
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Anand Jain
     
  • When an inode inherits a property from its parent, we call btrfs_set_prop().
    btrfs_set_prop() does elaborate checks, which are not required in the
    context of inheriting a property. Instead, open-code only the required
    items from btrfs_set_prop() and then call btrfs_setxattr() directly. So
    now the only user of btrfs_set_prop() is gone (except for the wrapper
    function btrfs_set_prop_trans()).

    Reviewed-by: Nikolay Borisov
    Signed-off-by: Anand Jain
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Anand Jain
     
  • btrfs_set_prop() takes a transaction pointer as the first argument;
    however, in ioctl.c, for the purpose of setting the compression property,
    we call btrfs_set_prop() with a NULL transaction pointer. Down in the
    call chain, btrfs_setxattr() starts a transaction to update the
    attribute and also to update the inode.

    So for clarity, create btrfs_set_prop_trans() without a transaction
    pointer argument, in preparation to start the transaction here instead of
    doing it down the call chain in btrfs_setxattr().

    Also, btrfs_set_prop() is now a static function.

    Signed-off-by: Anand Jain
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Anand Jain
     
  • Drop the forward declarations of the functions:

    - prop_compression_validate
    - prop_compression_apply
    - prop_compression_extract

    No functional changes.

    Reviewed-by: Nikolay Borisov
    Signed-off-by: Anand Jain
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Anand Jain
     
  • btrfs_set_prop() is a redirect to __btrfs_set_prop() with the
    transaction handle equal to NULL. __btrfs_set_prop() in turn passes
    this to do_setxattr(), where the transaction is actually created.

    Instead, merge __btrfs_set_prop() into btrfs_set_prop(), and update the
    caller to pass a NULL argument.

    Signed-off-by: Anand Jain
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Anand Jain
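    The end state of this series is a common kernel pattern: one function
    that requires a caller-supplied transaction (btrfs_setxattr(), with an
    assert) and a thin wrapper that starts and ends its own
    (btrfs_setxattr_trans()). Below is a hypothetical user-space sketch of
    that split; the handle type, the counters and all function names are
    invented for illustration. It shows why the split helps: a caller such
    as the setflags ioctl can make several changes inside one transaction
    instead of one transaction per change.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a transaction handle. */
struct trans_handle {
	int items_changed;
};

static struct trans_handle toy_trans;
static int open_transactions;

static struct trans_handle *start_transaction(void)
{
	open_transactions++;
	toy_trans.items_changed = 0;
	return &toy_trans;
}

static void end_transaction(struct trans_handle *trans)
{
	(void)trans;
	open_transactions--;
}

/* Worker: always runs inside a transaction supplied by the caller. */
static int setxattr_in(struct trans_handle *trans, const char *name)
{
	assert(trans != NULL);
	(void)name;
	trans->items_changed += 2;	/* the xattr item plus the inode update */
	return 0;
}

/* Entry point for callers without a transaction: start, work, end. */
static int setxattr_trans(const char *name)
{
	struct trans_handle *trans = start_transaction();
	int ret = setxattr_in(trans, name);

	end_transaction(trans);
	return ret;
}
```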
     

04 Apr, 2019

2 commits

  • The compression property resets to NULL instead of keeping the old value
    if we fail to set the new compression parameter.

    $ btrfs prop get /btrfs compression
    compression=lzo
    $ btrfs prop set /btrfs compression zli
    ERROR: failed to set compression for /btrfs: Invalid argument
    $ btrfs prop get /btrfs compression

    This is because the compression property ->validate() succeeds for
    'zli', as the strncmp() used the length passed from userspace.

    Fix it by using the expected string length in strncmp().

    Fixes: 63541927c8d1 ("Btrfs: add support for inode properties")
    Fixes: 5c1aab1dd544 ("btrfs: Add zstd support")
    CC: stable@vger.kernel.org # 4.14+
    Reviewed-by: Nikolay Borisov
    Signed-off-by: Anand Jain
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Anand Jain
     
  • We let a zstd compression parameter pass even if it is not fully valid.
    For example:

    $ btrfs prop set /btrfs compression zst
    $ btrfs prop get /btrfs compression
    compression=zst

    zlib and lzo are fine.

    Fix it by checking the correct prefix length.

    Fixes: 5c1aab1dd544 ("btrfs: Add zstd support")
    CC: stable@vger.kernel.org # 4.14+
    Reviewed-by: Nikolay Borisov
    Signed-off-by: Anand Jain
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Anand Jain
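    The difference between the two fixes above can be seen in a few lines of
    user-space C. Both functions are hypothetical reconstructions of the
    pattern, not the kernel code: the first uses the caller's length, so
    every prefix of a known name (including "zli" and "zst") compares equal;
    the second compares each name's full length, which rejects those values
    but still assumes value has that many readable bytes. That remaining
    assumption is what the KASAN report in the 02 Jul, 2019 entry ran into.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Pre-fix pattern: the comparison length comes from the caller, so any
 * prefix of a known name ("zli", "zst", even "") compares equal.
 */
static bool validate_with_user_len(const char *value, size_t len)
{
	return !strncmp("lzo", value, len) ||
	       !strncmp("zlib", value, len) ||
	       !strncmp("zstd", value, len);
}

/*
 * Fixed pattern: compare each expected name's full length. len is
 * ignored, which is the gap a later fix closes with a length check.
 */
static bool validate_with_name_len(const char *value, size_t len)
{
	(void)len;
	return !strncmp("lzo", value, 3) ||
	       !strncmp("zlib", value, 4) ||
	       !strncmp("zstd", value, 4);
}
```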
     

17 May, 2018

1 commit

  • Incompat flag of LZO/ZSTD compression should be set at:

    1. mount time (-o compress/compress-force)
    2. when defrag is done
    3. when property is set

    Currently 3. is missing; this commit adds it.

    This could lead to a filesystem that uses ZSTD but is not marked as
    such. If a kernel without ZSTD support encounters a ZSTD-compressed
    extent, it will handle that, but this could be confusing to the user.

    Typically the filesystem is mounted with the ZSTD option, but the
    discrepancy can arise when a filesystem is never mounted with ZSTD and
    then the property on some file is set (and some new extents are
    written). A simple mount with -o compress=zstd will fix that up on an
    unpatched kernel.

    Same goes for LZO, but this has been around for a very long time
    (2.6.37) so it's unlikely that a pre-LZO kernel would be used.

    Fixes: 5c1aab1dd544 ("btrfs: Add zstd support")
    CC: stable@vger.kernel.org # 4.14+
    Signed-off-by: Tomohiro Misono
    Reviewed-by: Anand Jain
    Reviewed-by: David Sterba
    [ add user visible impact ]
    Signed-off-by: David Sterba

    Misono Tomohiro
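    A minimal sketch of case 3, marking the superblock when the property
    selects LZO or ZSTD. The bit positions (3 for COMPRESS_LZO, 4 for
    COMPRESS_ZSTD) are given as assumptions about the on-disk incompat flags,
    and struct toy_super and apply_compression_prop are hypothetical; in the
    kernel this goes through btrfs_set_fs_incompat() on fs_info.

```c
#include <stdint.h>

/* Assumed bit positions of the btrfs incompat feature flags. */
#define INCOMPAT_COMPRESS_LZO  (1ULL << 3)
#define INCOMPAT_COMPRESS_ZSTD (1ULL << 4)

enum compress_type { COMPRESS_NONE, COMPRESS_ZLIB, COMPRESS_LZO, COMPRESS_ZSTD };

/* Hypothetical superblock stand-in. */
struct toy_super {
	uint64_t incompat_flags;
};

/*
 * Called when the compression property is applied: mark the filesystem
 * so kernels know LZO/ZSTD extents may appear. zlib needs no bit, since
 * it has been supported since btrfs compression was introduced.
 */
static void apply_compression_prop(struct toy_super *sb, enum compress_type type)
{
	if (type == COMPRESS_LZO)
		sb->incompat_flags |= INCOMPAT_COMPRESS_LZO;
	else if (type == COMPRESS_ZSTD)
		sb->incompat_flags |= INCOMPAT_COMPRESS_ZSTD;
}
```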
     

12 Apr, 2018

1 commit


26 Mar, 2018

2 commits

  • Reviewed-by: Nikolay Borisov
    Signed-off-by: David Sterba

    David Sterba
     
  • The custom crc32 init code was introduced in
    14a958e678cd ("Btrfs: fix btrfs boot when compiled as built-in") to
    enable using btrfs as a built-in. However, later as pointed out by
    60efa5eb2e88 ("Btrfs: use late_initcall instead of module_init") this
    wasn't enough and finally btrfs was switched to late_initcall which
    comes after the generic crc32c implementation is initialised. The
    latter commit superseded the former. Now that we don't have to
    maintain our own code let's just remove it and switch to using the
    generic implementation.

    Despite touching a lot of files the patch is really simple. Here is the gist of
    the changes:

    1. Select LIBCRC32C rather than the low-level modules.
    2. s/btrfs_crc32c/crc32c/g
    3. replace hash.h with linux/crc32c.h
    4. Move the btrfs namehash funcs to ctree.h and change the tree accordingly.

    I've tested this with btrfs being both a module and a built-in and xfstest
    doesn't complain.

    This does seem to fix the longstanding problem of the crc32c module not
    being selected automatically when btrfs is used. Possibly there is a
    workaround in dracut.

    The modinfo confirms that now all the module dependencies are there:

    before:
    depends: zstd_compress,zstd_decompress,raid6_pq,xor,zlib_deflate

    after:
    depends: libcrc32c,zstd_compress,zstd_decompress,raid6_pq,xor,zlib_deflate

    Signed-off-by: Nikolay Borisov
    Reviewed-by: David Sterba
    [ add more info to changelog from mails ]
    Signed-off-by: David Sterba

    Nikolay Borisov
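    For reference, a bitwise CRC-32C and the name hash built on it. The
    polynomial and the "123456789" check value are standard CRC-32C facts;
    name_hash assumes btrfs_name_hash() is crc32c((u32)~1, name, len) as in
    ctree.h of that era. The table-free loop is chosen for brevity rather
    than speed (the kernel's libcrc32c may use hardware instructions).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
 * Like the kernel's crc32c(), it uses the given seed as the running
 * value and does not invert the result; callers invert if needed.
 */
static uint32_t crc32c(uint32_t crc, const void *data, size_t len)
{
	const uint8_t *p = data;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
	}
	return crc;
}

/* Directory/xattr name hash, assumed to match btrfs_name_hash(). */
static uint64_t name_hash(const char *name, size_t len)
{
	return crc32c((uint32_t)~1, name, len);
}
```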
     

22 Jan, 2018

2 commits


15 Sep, 2017

1 commit

  • Pull zstd support from Chris Mason:
    "Nick Terrell's patch series to add zstd support to the kernel has been
    floating around for a while. After talking with Dave Sterba, Herbert
    and Phillip, we decided to send the whole thing in as one pull
    request.

    zstd is a big win in speed over zlib and in compression ratio over
    lzo, and the compression team here at FB has gotten great results
    using it in production. Nick will continue to update the kernel side
    with new improvements from the open source zstd userland code.

    Nick has a number of benchmarks for the main zstd code in his lib/zstd
    commit:

    I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB
    of RAM. The VM is running on a MacBook Pro with a 3.1 GHz Intel
    Core i7 processor, 16 GB of RAM, and an SSD. I benchmarked using
    `silesia.tar` [3], which is 211,988,480 B large. Run the following
    commands for the benchmark:

    sudo modprobe zstd_compress_test
    sudo mknod zstd_compress_test c 245 0
    sudo cp silesia.tar zstd_compress_test

    The time is reported by the time of the userland `cp`.
    The MB/s is computed with

    1,536,217,008 B / time(buffer size, hash)

    which includes the time to copy from userland.
    The Adjusted MB/s is computed with

    1,536,217,088 B / (time(buffer size, hash) - time(buffer size, none)).

    The memory reported is the amount of memory the compressor
    requests.

    | Method | Size (B) | Time (s) | Ratio | MB/s | Adj MB/s | Mem (MB) |
    |----------|----------|----------|-------|---------|----------|----------|
    | none | 211988480 | 0.100 | 1 | 2119.88 | - | - |
    | zstd -1 | 73645762 | 1.044 | 2.878 | 203.05 | 224.56 | 1.23 |
    | zstd -3 | 66988878 | 1.761 | 3.165 | 120.38 | 127.63 | 2.47 |
    | zstd -5 | 65001259 | 2.563 | 3.261 | 82.71 | 86.07 | 2.86 |
    | zstd -10 | 60165346 | 13.242 | 3.523 | 16.01 | 16.13 | 13.22 |
    | zstd -15 | 58009756 | 47.601 | 3.654 | 4.45 | 4.46 | 21.61 |
    | zstd -19 | 54014593 | 102.835 | 3.925 | 2.06 | 2.06 | 60.15 |
    | zlib -1 | 77260026 | 2.895 | 2.744 | 73.23 | 75.85 | 0.27 |
    | zlib -3 | 72972206 | 4.116 | 2.905 | 51.50 | 52.79 | 0.27 |
    | zlib -6 | 68190360 | 9.633 | 3.109 | 22.01 | 22.24 | 0.27 |
    | zlib -9 | 67613382 | 22.554 | 3.135 | 9.40 | 9.44 | 0.27 |

    I benchmarked zstd decompression using the same method on the same
    machine. The benchmark file is located in the upstream zstd repo
    under `contrib/linux-kernel/zstd_decompress_test.c` [4]. The
    memory reported is the amount of memory required to decompress
    data compressed with the given compression level. If you know the
    maximum size of your input, you can reduce the memory usage of
    decompression irrespective of the compression level.

    | Method | Time (s) | MB/s | Adjusted MB/s | Memory (MB) |
    |----------|----------|---------|---------------|-------------|
    | none | 0.025 | 8479.54 | - | - |
    | zstd -1 | 0.358 | 592.15 | 636.60 | 0.84 |
    | zstd -3 | 0.396 | 535.32 | 571.40 | 1.46 |
    | zstd -5 | 0.396 | 535.32 | 571.40 | 1.46 |
    | zstd -10 | 0.374 | 566.81 | 607.42 | 2.51 |
    | zstd -15 | 0.379 | 559.34 | 598.84 | 4.61 |
    | zstd -19 | 0.412 | 514.54 | 547.77 | 8.80 |
    | zlib -1 | 0.940 | 225.52 | 231.68 | 0.04 |
    | zlib -3 | 0.883 | 240.08 | 247.07 | 0.04 |
    | zlib -6 | 0.844 | 251.17 | 258.84 | 0.04 |
    | zlib -9 | 0.837 | 253.27 | 287.64 | 0.04 |

    I ran a long series of tests and benchmarks on the btrfs side and the
    gains are very similar to the core benchmarks Nick ran"

    * 'zstd-minimal' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    squashfs: Add zstd support
    btrfs: Add zstd support
    lib: Add zstd modules
    lib: Add xxhash module

    Linus Torvalds
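    The quoted MB/s columns can be reproduced from the tables themselves:
    dividing the 211,988,480 B silesia.tar input by each row's time matches
    the reported figures when 1 MB is taken as 10^6 bytes. A small sketch of
    both formulas (the function names are hypothetical):

```c
/* Throughput in MB/s (1 MB = 10^6 bytes), including the copy overhead. */
static double mb_per_s(double bytes, double seconds)
{
	return bytes / seconds / 1e6;
}

/*
 * "Adjusted" throughput: subtract the time of the uncompressed ("none")
 * run so that only the compressor's own time is counted.
 */
static double adjusted_mb_per_s(double bytes, double seconds, double none_seconds)
{
	return bytes / (seconds - none_seconds) / 1e6;
}
```

    For the zstd -1 compression row, 211,988,480 B over 1.044 s gives
    203.05 MB/s, and over 1.044 s - 0.100 s gives the adjusted 224.56 MB/s.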
     

16 Aug, 2017

3 commits

  • This is a minimal patch intended to be backported to older kernels.
    We're going to extend the string specifying the compression method and
    this would fail on kernels before that change (the string is compared
    exactly).

    Relax the string matching only to the prefix, i.e. ignoring anything that
    goes after "zlib" or "lzo", regardless of the format extension we decide
    to use. This applies to the mount options and properties.

    That way, patched old kernels could be booted on systems already
    utilizing the new compression spec.

    Applicable since commit 63541927c8d11, v3.14.

    Signed-off-by: David Sterba

    David Sterba
     
  • This is preparatory for separating inode compression requested by defrag
    and set via properties. This will fix a usability bug where defrag
    resets the compression type to NONE. If the file has compression set via
    a property, it will not apply anymore (until the next mount or a reset
    through the command line).

    We're going to fix that by adding another variable just for the defrag
    call and won't touch the property. The defrag will have higher priority
    when deciding whether to compress the data.

    Signed-off-by: David Sterba

    David Sterba
     
  • Add zstd compression and decompression support to BtrFS. zstd at its
    fastest level compresses almost as well as zlib, while offering much
    faster compression and decompression, approaching lzo speeds.

    I benchmarked btrfs with zstd compression against no compression, lzo
    compression, and zlib compression. I benchmarked two scenarios. Copying
    a set of files to btrfs, and then reading the files. Copying a tarball
    to btrfs, extracting it to btrfs, and then reading the extracted files.
    After every operation, I call `sync` and include the sync time.
    Between every pair of operations I unmount and remount the filesystem
    to avoid caching. The benchmark files can be found in the upstream
    zstd source repository under
    `contrib/linux-kernel/{btrfs-benchmark.sh,btrfs-extract-benchmark.sh}`
    [1] [2].

    I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
    The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
    16 GB of RAM, and an SSD.

    The first compression benchmark is copying 10 copies of the unzipped
    Silesia corpus [3] into a BtrFS filesystem mounted with
    `-o compress-force=Method`. The decompression benchmark times how long
    it takes to `tar` all 10 copies into `/dev/null`. The compression ratio is
    measured by comparing the output of `df` and `du`. See the benchmark file
    [1] for details. I benchmarked multiple zstd compression levels, although
    the patch uses zstd level 1.

    | Method | Ratio | Compression MB/s | Decompression speed |
    |---------|-------|------------------|---------------------|
    | None | 0.99 | 504 | 686 |
    | lzo | 1.66 | 398 | 442 |
    | zlib | 2.58 | 65 | 241 |
    | zstd 1 | 2.57 | 260 | 383 |
    | zstd 3 | 2.71 | 174 | 408 |
    | zstd 6 | 2.87 | 70 | 398 |
    | zstd 9 | 2.92 | 43 | 406 |
    | zstd 12 | 2.93 | 21 | 408 |
    | zstd 15 | 3.01 | 11 | 354 |

    The next benchmark first copies `linux-4.11.6.tar` [4] to btrfs. Then it
    measures the compression ratio, extracts the tar, and deletes the tar.
    Then it measures the compression ratio again, and `tar`s the extracted
    files into `/dev/null`. See the benchmark file [2] for details.

    | Method | Tar Ratio | Extract Ratio | Copy (s) | Extract (s)| Read (s) |
    |--------|-----------|---------------|----------|------------|----------|
    | None | 0.97 | 0.78 | 0.981 | 5.501 | 8.807 |
    | lzo | 2.06 | 1.38 | 1.631 | 8.458 | 8.585 |
    | zlib | 3.40 | 1.86 | 7.750 | 21.544 | 11.744 |
    | zstd 1 | 3.57 | 1.85 | 2.579 | 11.479 | 9.389 |

    [1] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-benchmark.sh
    [2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-extract-benchmark.sh
    [3] http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
    [4] https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.11.6.tar.xz

    zstd source repository: https://github.com/facebook/zstd

    Signed-off-by: Nick Terrell
    Signed-off-by: Chris Mason

    Nick Terrell
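    The priority rule from the defrag/property commit above can be sketched
    as follows. The field names defrag_compress and prop_compress are
    modeled on that description, and the struct and function are
    hypothetical simplifications; falling back to the mount option when
    neither is set is an assumption about the surrounding logic.

```c
enum compress_type { COMPRESS_NONE, COMPRESS_ZLIB, COMPRESS_LZO, COMPRESS_ZSTD };

/* Hypothetical per-inode state with two separate fields. */
struct toy_inode {
	enum compress_type defrag_compress;	/* set only by a defrag request */
	enum compress_type prop_compress;	/* set via the property */
};

/*
 * Defrag wins, then the property, then the mount option. Because defrag
 * writes its own field, it no longer clobbers the property.
 */
static enum compress_type pick_compress_type(const struct toy_inode *inode,
					     enum compress_type mount_type)
{
	if (inode->defrag_compress != COMPRESS_NONE)
		return inode->defrag_compress;
	if (inode->prop_compress != COMPRESS_NONE)
		return inode->prop_compress;
	return mount_type;
}
```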
     

22 Jun, 2017

1 commit


14 Feb, 2017

1 commit

  • Currently btrfs_ino takes a struct inode and this causes a lot of
    internal btrfs functions which consume this ino to take a VFS inode,
    rather than btrfs' own struct btrfs_inode. In order to fix this "leak"
    of VFS structs into the internals of btrfs, it's first necessary to
    eliminate all uses of struct inode for the purpose of getting the inode
    number. This patch does that by using BTRFS_I to convert an inode to a
    btrfs_inode. With
    this problem eliminated subsequent patches will start eliminating the
    passing of struct inode altogether, eventually resulting in a lot cleaner
    code.

    Signed-off-by: Nikolay Borisov
    [ fix btrfs_get_extent tracepoint prototype ]
    Signed-off-by: David Sterba

    Nikolay Borisov
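    The conversion relies on struct btrfs_inode embedding the VFS inode, so
    BTRFS_I() is a container_of() back to the outer struct. The structs
    below are heavily trimmed stand-ins (the real btrfs_ino() also falls
    back to the VFS i_ino in a couple of special cases), meant only to show
    the pointer arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space version of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct inode {			/* trimmed stand-in for the VFS inode */
	unsigned long i_ino;
};

struct btrfs_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

struct btrfs_inode {		/* btrfs-private inode embedding the VFS one */
	struct btrfs_key location;
	struct inode vfs_inode;
};

static inline struct btrfs_inode *BTRFS_I(const struct inode *inode)
{
	return container_of(inode, struct btrfs_inode, vfs_inode);
}

/* After the change: take the btrfs inode, not the VFS inode. */
static inline uint64_t btrfs_ino(const struct btrfs_inode *inode)
{
	return inode->location.objectid;
}
```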
     

06 Dec, 2016

2 commits


26 Jul, 2016

1 commit

  • We just need a superblock, but we look it up using two different
    roots depending on the call site. Let's just use a superblock
    pointer initialized at the outset.

    This is mostly for Coccinelle not to choke on my root push up set.

    Signed-off-by: Jeff Mahoney
    Signed-off-by: David Sterba

    Jeff Mahoney
     

12 Mar, 2016

1 commit


22 Oct, 2015

1 commit

  • This patch eliminates the last (sentinel) item of the prop_handlers array,
    which was used to detect the end of the array, and uses the ARRAY_SIZE
    macro instead. Though this is a very tiny optimization, using the
    ARRAY_SIZE macro is good practice when iterating over an array.

    Reviewed-by: David Sterba
    Signed-off-by: Byongho Lee
    Signed-off-by: David Sterba

    Byongho Lee
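    A sketch of the sentinel-free iteration. struct prop_handler is cut down
    to two fields, and this ARRAY_SIZE omits the kernel's extra type check
    that rejects plain pointers; only the loop shape is the point.

```c
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Simplified handler entry (the real one also has callbacks). */
struct prop_handler {
	const char *xattr_name;
	int inheritable;
};

/* No terminating { NULL } entry is needed to find the end. */
static const struct prop_handler prop_handlers[] = {
	{ .xattr_name = "btrfs.compression", .inheritable = 1 },
};

static const struct prop_handler *find_prop_handler(const char *name)
{
	for (size_t i = 0; i < ARRAY_SIZE(prop_handlers); i++)
		if (!strcmp(prop_handlers[i].xattr_name, name))
			return &prop_handlers[i];
	return NULL;
}
```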
     

17 Feb, 2015

1 commit


29 Jan, 2014

1 commit

  • This change adds infrastructure to allow for generic properties for
    inodes. Properties are name/value pairs that can be associated with
    inodes for different purposes. They are stored as xattrs with the
    prefix "btrfs."

    Properties can be inherited - this means when a directory inode has
    inheritable properties set, these are added to new inodes created
    under that directory. Further, subvolumes can also have properties
    associated with them, and they can be inherited from their parent
    subvolume. Naturally, directory properties have priority over subvolume
    properties (in practice a subvolume property is just a regular
    property associated with the root inode, objectid 256, of the
    subvolume's fs tree).

    This change also adds one specific property implementation, named
    "compression", whose values can be "lzo" or "zlib" and it's an
    inheritable property.

    The corresponding changes to btrfs-progs were also implemented.
    A patch with xfstests for this feature will follow once there's
    agreement on this change/feature.

    Further, the script at the bottom of this commit message was used to
    do some benchmarks to measure any performance penalties of this feature.

    Basically the tests correspond to:

    Test 1 - create a filesystem and mount it with compress-force=lzo,
    then sequentially create N files of 64 KiB each, measure how long it took
    to create the files, unmount the filesystem, mount the filesystem and
    perform an 'ls -lha' against the test directory holding the N files, and
    report the time the command took.

    Test 2 - create a filesystem and don't use any compression option when
    mounting it - instead set the compression property of the subvolume's
    root to 'lzo'. Then create N files of 64 KiB, and report the time it took.
    Then unmount the filesystem, mount it again and perform an 'ls -lha' like
    in the former test. This means every single file ends up with a property
    (xattr) associated with it.

    Test 3 - same as test 2, but uses 4 properties - 3 of them are duplicates
    of the compression property and have no real effect other than adding
    more work when inheriting properties and taking more btree leaf space.

    Test 4 - same as test 3 but with 10 properties per file.

    Results (in seconds, and averages of 5 runs each), for different N
    numbers of files follow.

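    * Without properties (test 1)

    | Files     | File creation time (s) | ls -lha time (s) |
    |-----------|------------------------|------------------|
    | 10 000    | 3.49                   | 0.76             |
    | 100 000   | 47.19                  | 8.37             |
    | 1 000 000 | 518.51                 | 107.06           |

    * With 1 property (compression property set to lzo - test 2)

    | Files     | File creation time (s) | ls -lha time (s) |
    |-----------|------------------------|------------------|
    | 10 000    | 3.63                   | 0.93             |
    | 100 000   | 48.56                  | 9.74             |
    | 1 000 000 | 537.72                 | 125.11           |

    * With 4 properties (test 3)

    | Files     | File creation time (s) | ls -lha time (s) |
    |-----------|------------------------|------------------|
    | 10 000    | 3.94                   | 1.20             |
    | 100 000   | 52.14                  | 11.48            |
    | 1 000 000 | 572.70                 | 142.13           |

    * With 10 properties (test 4)

    | Files     | File creation time (s) | ls -lha time (s) |
    |-----------|------------------------|------------------|
    | 10 000    | 4.61                   | 1.35             |
    | 100 000   | 58.86                  | 13.83            |
    | 1 000 000 | 656.01                 | 177.61           |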

    The increased latencies with properties are essentially because of:

    *) When creating an inode, we now synchronously write 1 more item
    (an xattr item) for each property inherited from the parent dir
    (or subvolume). This could be done in an asynchronous way such
    as we do for dir index items (delayed-inode.c), which could help
    reduce the file creation latency;

    *) With properties, we now have larger fs trees. For this particular
    test each xattr item uses 75 bytes of leaf space in the fs tree.
    This could be less by using a new item for xattr items, instead of
    the current btrfs_dir_item, since we could cut the 'location' and
    'type' fields (saving 18 bytes) and maybe 'transid' too (saving a
    total of 26 bytes per xattr item) from the btrfs_dir_item type.

    Also tried batching the xattr insertions (ignoring proper hash
    collision handling, since it didn't exist) when creating files that
    inherit properties from their parent inode/subvolume, but the end
    results were (surprisingly) essentially the same.

    Test script:

    $ cat test.pl
    #!/usr/bin/perl -w

    use strict;
    use IO::Handle;
    use Time::HiRes qw(time);
    use constant NUM_FILES => 10_000;
    use constant FILE_SIZES => (64 * 1024);
    use constant DEV => '/dev/sdb4';
    use constant MNT_POINT => '/home/fdmanana/btrfs-tests/dev';
    use constant TEST_DIR => (MNT_POINT . '/testdir');

    system("mkfs.btrfs", "-l", "16384", "-f", DEV) == 0 or die "mkfs.btrfs failed!";

    # following line for testing without properties
    #system("mount", "-o", "compress-force=lzo", DEV, MNT_POINT) == 0 or die "mount failed!";

    # following 2 lines for testing with properties
    system("mount", DEV, MNT_POINT) == 0 or die "mount failed!";
    system("btrfs", "prop", "set", MNT_POINT, "compression", "lzo") == 0 or die "set prop failed!";

    system("mkdir", TEST_DIR) == 0 or die "mkdir failed!";
    my ($t1, $t2);

    $t1 = time();
    for (my $i = 1; $i <= NUM_FILES; $i++) {
        # Hypothetical file naming; the original open() line was lost.
        my $path = TEST_DIR . "/file_$i";
        open(my $f, '>', $path) or die "Error opening file!";
        $f->autoflush(1);
        for (my $j = 0; $j < FILE_SIZES; $j += 4096) {
            print $f ('A' x 4096) or die "Error writing to file!";
        }
        close($f);
    }
    $t2 = time();
    print "Time to create " . NUM_FILES . ": " . ($t2 - $t1) . " seconds.\n";
    system("umount", DEV) == 0 or die "umount failed!";
    system("mount", DEV, MNT_POINT) == 0 or die "mount failed!";

    $t1 = time();
    system("bash -c 'ls -lha " . TEST_DIR . " > /dev/null'") == 0 or die "ls failed!";
    $t2 = time();
    print "Time to ls -lha all files: " . ($t2 - $t1) . " seconds.\n";
    system("umount", DEV) == 0 or die "umount failed!";

    Signed-off-by: Filipe David Borba Manana
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Filipe David Borba Manana
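    The naming and inheritance rules described at the start of this commit
    can be sketched in a few lines. Everything here (struct toy_inode,
    set_prop, inherit_props, the fixed-size arrays) is hypothetical; in
    btrfs the properties live as xattr items in the fs tree, not in an
    in-memory array. The sketch only shows that names get the "btrfs."
    prefix and that a new inode copies each inheritable property of its
    parent, one extra item per property.

```c
#include <stdio.h>
#include <string.h>

#define XATTR_BTRFS_PREFIX "btrfs."
#define MAX_PROPS 4

struct prop {
	char name[64];		/* full xattr name, e.g. "btrfs.compression" */
	char value[16];
	int inheritable;
};

struct toy_inode {
	struct prop props[MAX_PROPS];
	int nr_props;
};

/* Properties are stored as xattrs whose names carry the "btrfs." prefix. */
static int set_prop(struct toy_inode *inode, const char *name,
		    const char *value, int inheritable)
{
	if (inode->nr_props >= MAX_PROPS)
		return -1;
	struct prop *p = &inode->props[inode->nr_props++];
	snprintf(p->name, sizeof(p->name), "%s%s", XATTR_BTRFS_PREFIX, name);
	snprintf(p->value, sizeof(p->value), "%s", value);
	p->inheritable = inheritable;
	return 0;
}

/* On creation, copy each inheritable property of the parent directory. */
static void inherit_props(struct toy_inode *child, const struct toy_inode *parent)
{
	for (int i = 0; i < parent->nr_props; i++) {
		const struct prop *p = &parent->props[i];

		if (!p->inheritable || child->nr_props >= MAX_PROPS)
			continue;
		child->props[child->nr_props++] = *p;
	}
}
```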