22 Jun, 2017

1 commit

  • Originally, verify_dir_item checked the name_len of a dir_item only
    against fixed limits, not against the item boundary.
    If a corrupted name_len did not exceed the fixed limit, for example
    255, the function treated the dir_item as valid, and the subsequent
    read beyond the item boundary caused a crash.

    Example:
    1. Corrupt one dir_item name_len to be 255.
    2. Run 'ls -lar /mnt/test/ > /dev/null'
    dmesg:
    [ 48.451449] BTRFS info (device vdb1): disk space caching is enabled
    [ 48.451453] BTRFS info (device vdb1): has skinny extents
    [ 48.489420] general protection fault: 0000 [#1] SMP
    [ 48.489571] Modules linked in: ext4 jbd2 mbcache btrfs xor raid6_pq
    [ 48.489716] CPU: 1 PID: 2710 Comm: ls Not tainted 4.10.0-rc1 #5
    [ 48.489853] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
    [ 48.490008] task: ffff880035df1bc0 task.stack: ffffc90004800000
    [ 48.490008] RIP: 0010:read_extent_buffer+0xd2/0x190 [btrfs]
    [ 48.490008] RSP: 0018:ffffc90004803d98 EFLAGS: 00010202
    [ 48.490008] RAX: 000000000000001b RBX: 000000000000001b RCX: 0000000000000000
    [ 48.490008] RDX: ffff880079dbf36c RSI: 0005080000000000 RDI: ffff880079dbf368
    [ 48.490008] RBP: ffffc90004803dc8 R08: ffff880078e8cc48 R09: ffff880000000000
    [ 48.490008] R10: 0000160000000000 R11: 0000000000001000 R12: ffff880079dbf288
    [ 48.490008] R13: ffff880078e8ca88 R14: 0000000000000003 R15: ffffc90004803e20
    [ 48.490008] FS: 00007fef50c60800(0000) GS:ffff88007d400000(0000) knlGS:0000000000000000
    [ 48.490008] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 48.490008] CR2: 000055f335ac2ff8 CR3: 000000007356d000 CR4: 00000000001406e0
    [ 48.490008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 48.490008] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 48.490008] Call Trace:
    [ 48.490008] btrfs_real_readdir+0x3b7/0x4a0 [btrfs]
    [ 48.490008] iterate_dir+0x181/0x1b0
    [ 48.490008] SyS_getdents+0xa7/0x150
    [ 48.490008] ? fillonedir+0x150/0x150
    [ 48.490008] entry_SYSCALL_64_fastpath+0x18/0xad
    [ 48.490008] RIP: 0033:0x7fef5032546b
    [ 48.490008] RSP: 002b:00007ffeafcdb830 EFLAGS: 00000206 ORIG_RAX: 000000000000004e
    [ 48.490008] RAX: ffffffffffffffda RBX: 00007fef5061db38 RCX: 00007fef5032546b
    [ 48.490008] RDX: 0000000000008000 RSI: 000055f335abaff0 RDI: 0000000000000003
    [ 48.490008] RBP: 00007fef5061dae0 R08: 00007fef5061db48 R09: 0000000000000000
    [ 48.490008] R10: 000055f335abafc0 R11: 0000000000000206 R12: 00007fef5061db38
    [ 48.490008] R13: 0000000000008040 R14: 00007fef5061db38 R15: 000000000000270e
    [ 48.490008] RIP: read_extent_buffer+0xd2/0x190 [btrfs] RSP: ffffc90004803d98
    [ 48.499455] ---[ end trace 321920d8e8339505 ]---

    Fix it by adding a parameter @slot and checking name_len against the
    item boundary by calling btrfs_is_name_len_valid.

    Signed-off-by: Su Yue
    Signed-off-by: David Sterba

    Su Yue
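
The boundary check described in the commit above can be illustrated with a small user-space model. This is only a sketch in Python, not the kernel code: the packed 30-byte header layout is assumed from the on-disk btrfs_dir_item format, and name_len_fits() and make_dir_item() are hypothetical helpers that do not reproduce the signature or logic of btrfs_is_name_len_valid.

```python
import struct

# Assumed model of the packed, little-endian btrfs_dir_item header:
# location key (u64 objectid, u8 type, u64 offset), u64 transid,
# u16 data_len, u16 name_len, u8 type.
DIR_ITEM_HEADER = struct.Struct("<QBQQHHB")


def name_len_fits(item: bytes, dir_item_offset: int = 0) -> bool:
    """Return True if the name described by the header lies inside the item."""
    header_end = dir_item_offset + DIR_ITEM_HEADER.size
    if header_end > len(item):
        return False  # not even the fixed-size header fits
    fields = DIR_ITEM_HEADER.unpack_from(item, dir_item_offset)
    name_len = fields[5]
    return header_end + name_len <= len(item)


def make_dir_item(name: bytes, claimed_name_len: int) -> bytes:
    """Hypothetical helper: build an item whose header claims claimed_name_len."""
    header = DIR_ITEM_HEADER.pack(257, 1, 0, 7, 0, claimed_name_len, 1)
    return header + name


good = make_dir_item(b"hello", 5)
corrupt = make_dir_item(b"hello", 255)  # would pass a fixed "<= 255" check
print(name_len_fits(good), name_len_fits(corrupt))
```

The corrupt item is rejected because its claimed name would end past the end of the item, which is the case a fixed upper limit alone cannot catch.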
     

14 Feb, 2017

2 commits

  • This goes as a separate patch because fixing that inside the patches
    caused too many conflicts.

    Signed-off-by: David Sterba

    David Sterba
     
  • Currently btrfs_ino takes a struct inode and this causes a lot of
    internal btrfs functions which consume this ino to take a VFS inode,
    rather than btrfs' own struct btrfs_inode. In order to fix this "leak"
    of VFS structs into the internals of btrfs, it's first necessary to
    eliminate all uses of struct inode for the purpose of obtaining the
    inode number. This patch does that by using BTRFS_I to convert an inode
    to btrfs_inode. With this problem eliminated, subsequent patches will
    start eliminating the passing of struct inode altogether, eventually
    resulting in much cleaner code.

    Signed-off-by: Nikolay Borisov
    [ fix btrfs_get_extent tracepoint prototype ]
    Signed-off-by: David Sterba

    Nikolay Borisov
     

06 Dec, 2016

3 commits


28 Sep, 2016

1 commit

  • current_fs_time() uses struct super_block* as an argument.
    As per Linus's suggestion, this is changed to take struct
    inode* as a parameter instead. This is because the function
    is primarily meant for vfs inode timestamps.
    Also the function was renamed as per Arnd's suggestion.

    Change all calls to current_fs_time() to use the new
    current_time() function instead. current_fs_time() will be
    deleted.

    Signed-off-by: Deepa Dinamani
    Signed-off-by: Al Viro

    Deepa Dinamani
     

28 May, 2016

1 commit


18 May, 2016

1 commit

  • The btrfs_{set,remove}xattr inode operations check for a read-only root
    (btrfs_root_readonly) before calling into generic_{set,remove}xattr. If
    this check is moved into __btrfs_setxattr, we can get rid of
    btrfs_{set,remove}xattr.

    This patch applies to mainline, I would like to keep it together with
    the other xattr cleanups if possible, though. Could you please review?

    Thanks,
    Andreas

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     

11 Apr, 2016

1 commit


02 Mar, 2016

1 commit

  • In the listxattrs handler, we were not listing all the xattrs that are
    packed in the same btree item, which happens when multiple xattrs have
    names that, when hashed with crc32c, produce the same checksum value.

    Fix this by processing them all.

    The following test case for xfstests reproduces the issue:

    seq=`basename $0`
    seqres=$RESULT_DIR/$seq
    echo "QA output created by $seq"
    tmp=/tmp/$$
    status=1 # failure is the default!
    trap "_cleanup; exit \$status" 0 1 2 3 15

    _cleanup()
    {
        cd /
        rm -f $tmp.*
    }

    # get standard environment, filters and checks
    . ./common/rc
    . ./common/filter
    . ./common/attr

    # real QA test starts here
    _supported_fs generic
    _supported_os Linux
    _require_scratch
    _require_attrs

    rm -f $seqres.full

    _scratch_mkfs >>$seqres.full 2>&1
    _scratch_mount

    # Create our test file with a few xattrs. The first 3 xattrs have a name
    # that when given as input to a crc32c function result in the same checksum.
    # This made btrfs list only one of the xattrs through listxattrs system call
    # (because it packs xattrs with the same name checksum into the same btree
    # item).
    touch $SCRATCH_MNT/testfile
    $SETFATTR_PROG -n user.foobar -v 123 $SCRATCH_MNT/testfile
    $SETFATTR_PROG -n user.WvG1c1Td -v qwerty $SCRATCH_MNT/testfile
    $SETFATTR_PROG -n user.J3__T_Km3dVsW_ -v hello $SCRATCH_MNT/testfile
    $SETFATTR_PROG -n user.something -v pizza $SCRATCH_MNT/testfile
    $SETFATTR_PROG -n user.ping -v pong $SCRATCH_MNT/testfile

    # Now call getfattr with --dump, which calls the listxattrs system call.
    # It should list all the xattrs we have set before.
    $GETFATTR_PROG --absolute-names --dump $SCRATCH_MNT/testfile | _filter_scratch

    status=0
    exit

    Signed-off-by: Filipe Manana
    Signed-off-by: Chris Mason

    Filipe Manana
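
The idea of "processing them all" in the commit above can be sketched with a toy model of one btree item that holds several xattr entries back to back. This is an illustration in Python, not the kernel handler: the entry layout (30-byte header followed by name and value) is assumed from the on-disk btrfs_dir_item format, and pack_entry() and list_names() are hypothetical helpers. No crc32c values are computed here; the entries are simply packed into one item.

```python
import struct
from typing import List

# Assumed packed header: location key, transid, data_len, name_len, type.
DIR_ITEM_HEADER = struct.Struct("<QBQQHHB")
FT_XATTR = 8  # file type value used for xattr entries


def pack_entry(name: bytes, value: bytes) -> bytes:
    """Hypothetical helper: one xattr entry as header + name + value."""
    header = DIR_ITEM_HEADER.pack(0, 0, 0, 0, len(value), len(name), FT_XATTR)
    return header + name + value


def list_names(item: bytes) -> List[str]:
    """Walk every entry in the item instead of stopping after the first."""
    names = []
    cur = 0
    while cur < len(item):
        fields = DIR_ITEM_HEADER.unpack_from(item, cur)
        data_len, name_len = fields[4], fields[5]
        start = cur + DIR_ITEM_HEADER.size
        names.append(item[start:start + name_len].decode())
        cur = start + name_len + data_len  # advance to the next packed entry
    return names


item = (pack_entry(b"user.foobar", b"123")
        + pack_entry(b"user.WvG1c1Td", b"qwerty")
        + pack_entry(b"user.J3__T_Km3dVsW_", b"hello"))
print(list_names(item))
```

A handler that reads only the first entry of the item would report user.foobar alone; walking the item to its end reports all three names.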
     

18 Feb, 2016

1 commit

  • CURRENT_TIME macro is not appropriate for filesystems as it
    doesn't use the right granularity for filesystem timestamps.
    Use current_fs_time() instead.

    Signed-off-by: Deepa Dinamani
    Cc: Chris Mason
    Cc: Josef Bacik
    Cc: linux-btrfs@vger.kernel.org
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Deepa Dinamani
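
What "the right granularity" means can be shown with a short sketch. This is a Python model under assumptions, not the kernel helper: it assumes the filesystem advertises a timestamp granularity in nanoseconds (the super block has an s_time_gran field for this), and truncate_timestamp() is a hypothetical name.

```python
NSEC_PER_SEC = 1_000_000_000


def truncate_timestamp(sec: int, nsec: int, gran_ns: int) -> tuple:
    """Round a (sec, nsec) timestamp down to a multiple of gran_ns."""
    if not 1 <= gran_ns <= NSEC_PER_SEC:
        raise ValueError("granularity must be between 1 ns and 1 s")
    return sec, nsec - nsec % gran_ns


now = (1455753600, 123456789)
print(truncate_timestamp(*now, 1))             # nanosecond granularity
print(truncate_timestamp(*now, 1000))          # microsecond granularity
print(truncate_timestamp(*now, NSEC_PER_SEC))  # one-second granularity
```

A timestamp taken without this truncation can carry more precision than the filesystem is able to store, so the in-memory and on-disk values may disagree after the inode is written and read back.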
     

23 Jan, 2016

1 commit

  • parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
    inode_foo(inode) being mutex_foo(&inode->i_mutex).

    Please, use those for access to ->i_mutex; over the coming cycle
    ->i_mutex will become rwsem, with ->lookup() done with it held
    only shared.

    Signed-off-by: Al Viro

    Al Viro
     

19 Jan, 2016

1 commit

  • Pull btrfs updates from Chris Mason:
    "This has our usual assortment of fixes and cleanups, but the biggest
    change included is Omar Sandoval's free space tree. It's not the
    default yet, mounting -o space_cache=v2 enables it and sets a readonly
    compat bit. The tree can actually be deleted and regenerated if there
    are any problems, but it has held up really well in testing so far.

    For very large filesystems (30T+) our existing free space caching code
    can end up taking a huge amount of time during commits. The new tree
    based code is faster and less work overall to update as the commit
    progresses.

    Omar worked on this during the summer and we'll hammer on it in
    production here at FB over the next few months"

    * 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (73 commits)
    Btrfs: fix fitrim discarding device area reserved for boot loader's use
    Btrfs: Check metadata redundancy on balance
    btrfs: statfs: report zero available if metadata are exhausted
    btrfs: preallocate path for snapshot creation at ioctl time
    btrfs: allocate root item at snapshot ioctl time
    btrfs: do an allocation earlier during snapshot creation
    btrfs: use smaller type for btrfs_path locks
    btrfs: use smaller type for btrfs_path lowest_level
    btrfs: use smaller type for btrfs_path reada
    btrfs: cleanup, use enum values for btrfs_path reada
    btrfs: constify static arrays
    btrfs: constify remaining structs with function pointers
    btrfs tests: replace whole ops structure for free space tests
    btrfs: use list_for_each_entry* in backref.c
    btrfs: use list_for_each_entry_safe in free-space-cache.c
    btrfs: use list_for_each_entry* in check-integrity.c
    Btrfs: use linux/sizes.h to represent constants
    btrfs: cleanup, remove stray return statements
    btrfs: zero out delayed node upon allocation
    btrfs: pass proper enum type to start_transaction()
    ...

    Linus Torvalds
     

11 Jan, 2016

1 commit


07 Jan, 2016

1 commit


07 Dec, 2015

1 commit


03 Dec, 2015

1 commit


10 Nov, 2015

1 commit

  • When listing an inode's xattrs we have a time window where we race against
    a concurrent operation for adding a new hard link for our inode that makes
    us not return any xattr to user space. In order for this to happen, the
    first xattr of our inode needs to be at slot 0 of a leaf and the previous
    leaf must still have room for an inode ref (or extref) item, and this can
    happen because an inode's listxattrs callback does not lock the inode's
    i_mutex (nor does the VFS do it for us), but adding a hard link to an
    inode makes the VFS lock the inode's i_mutex before calling the inode's
    link callback.

    If we have the following leaves:

    Leaf X (has N items)                             Leaf Y

    [ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ]   [ (257 XATTR_ITEM 12345), ... ]
          slot N - 2         slot N - 1                slot 0

    The race illustrated by the following sequence diagram is possible:

                CPU 1                                       CPU 2

    btrfs_listxattr()

      searches for key (257 XATTR_ITEM 0)

      gets path with path->nodes[0] == leaf X
      and path->slots[0] == N

      because path->slots[0] is >=
      btrfs_header_nritems(leaf X), it calls
      btrfs_next_leaf()

      btrfs_next_leaf()
        releases the path

                                                    adds key (257 INODE_REF 666)
                                                    to the end of leaf X (slot N),
                                                    and leaf X now has N + 1 items

        searches for the key (257 INODE_REF 256),
        with path->keep_locks == 1, because that
        is the last key it saw in leaf X before
        releasing the path

        ends up at leaf X again and it verifies
        that the key (257 INODE_REF 256) is no
        longer the last key in leaf X, so it
        returns with path->nodes[0] == leaf X
        and path->slots[0] == N, pointing to
        the new item with key (257 INODE_REF 666)

      btrfs_listxattr's loop iteration sees that
      the type of the key pointed by the path is
      different from the type BTRFS_XATTR_ITEM_KEY
      and so it breaks the loop and stops looking
      for more xattr items
        --> the application doesn't get any xattr
            listed for our inode

    So fix this by breaking the loop only if the key's type is greater than
    BTRFS_XATTR_ITEM_KEY and skip the current key if its type is smaller.

    Cc: stable@vger.kernel.org
    Signed-off-by: Filipe Manana

    Filipe Manana
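
The loop change described in the commit above can be modelled in a few lines. This is a Python sketch, not the kernel loop: keys are plain (objectid, type, offset) tuples in tree order, the numeric key types are the btrfs on-disk values, and list_xattr_offsets() with its fixed flag is a hypothetical function used only to contrast the old and the patched behaviour.

```python
INODE_ITEM_KEY = 1
INODE_REF_KEY = 12
XATTR_ITEM_KEY = 24
EXTENT_DATA_KEY = 108


def list_xattr_offsets(keys, ino, start, fixed=True):
    """Collect the offsets of xattr items for inode ino, scanning from start."""
    found = []
    for objectid, key_type, offset in keys[start:]:
        if objectid != ino:
            break
        if fixed:
            if key_type > XATTR_ITEM_KEY:
                break      # past all xattr items of this inode
            if key_type < XATTR_ITEM_KEY:
                continue   # e.g. a freshly added INODE_REF: skip it
        elif key_type != XATTR_ITEM_KEY:
            break          # old behaviour: give up on any other key type
        found.append(offset)
    return found


keys = [
    (257, INODE_ITEM_KEY, 0),
    (257, INODE_REF_KEY, 256),
    (257, INODE_REF_KEY, 666),    # inserted by the concurrent link operation
    (257, XATTR_ITEM_KEY, 12345),
    (257, EXTENT_DATA_KEY, 0),
]
# After the race, the search position points at the new INODE_REF (index 2).
print(list_xattr_offsets(keys, 257, 2, fixed=False))
print(list_xattr_offsets(keys, 257, 2, fixed=True))
```

With the old condition the scan stops at the new INODE_REF key and reports nothing; with the patched condition it skips that key and still finds the xattr item.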
     

27 Apr, 2015

1 commit

  • Pull fourth vfs update from Al Viro:
    "d_inode() annotations from David Howells (sat in for-next since before
    the beginning of merge window) + four assorted fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    RCU pathwalk breakage when running into a symlink overmounting something
    fix I_DIO_WAKEUP definition
    direct-io: only inc/dec inode->i_dio_count for file systems
    fs/9p: fix readdir()
    VFS: assorted d_backing_inode() annotations
    VFS: fs/inode.c helpers: d_inode() annotations
    VFS: fs/cachefiles: d_backing_inode() annotations
    VFS: fs library helpers: d_inode() annotations
    VFS: assorted weird filesystems: d_inode() annotations
    VFS: normal filesystems (and lustre): d_inode() annotations
    VFS: security/: d_inode() annotations
    VFS: security/: d_backing_inode() annotations
    VFS: net/: d_inode() annotations
    VFS: net/unix: d_backing_inode() annotations
    VFS: kernel/: d_inode() annotations
    VFS: audit: d_backing_inode() annotations
    VFS: Fix up some ->d_inode accesses in the chelsio driver
    VFS: Cachefiles should perform fs modifications on the top layer only
    VFS: AF_UNIX sockets should call mknod on the top layer only

    Linus Torvalds
     

16 Apr, 2015

1 commit


27 Mar, 2015

1 commit

  • Due to an insufficient check in btrfs_is_valid_xattr, this unexpectedly
    works:

    $ touch file
    $ setfattr -n user. -v 1 file
    $ getfattr -d file
    user.="1"

    i.e. the attribute name is missing after the namespace.

    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=94291
    Reported-by: William Douglas
    CC: # 2.6.29+
    Signed-off-by: David Sterba
    Signed-off-by: Chris Mason

    David Sterba
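
The stricter check can be sketched as follows. This is a Python illustration, not the body of btrfs_is_valid_xattr: the prefix list is an assumption chosen for the example, and is_valid_xattr_name() is a hypothetical name.

```python
# Assumed namespace prefixes, for illustration only.
XATTR_PREFIXES = ("user.", "trusted.", "security.", "system.", "btrfs.")


def is_valid_xattr_name(name: str) -> bool:
    """Accept a name only if a known prefix is followed by at least one character."""
    for prefix in XATTR_PREFIXES:
        if name.startswith(prefix):
            return len(name) > len(prefix)
    return False


print(is_valid_xattr_name("user.comment"))  # namespace plus a name
print(is_valid_xattr_name("user."))         # namespace only
```

A check that only compares the prefix accepts "user." as a complete name, which is the behaviour shown in the setfattr example above.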
     

03 Mar, 2015

1 commit


21 Nov, 2014

1 commit

  • Replacing an xattr consists of looking up its existing value, deleting
    the current value from the respective leaf, releasing the search path and then
    finally inserting the new value. This leaves a time window where readers (getxattr,
    listxattrs) won't see any value for the xattr. Xattrs are used to store ACLs,
    so this has security implications.

    This change also fixes 2 other existing issues which were:

    *) Deleting the old xattr value without verifying first if the new xattr will
    fit in the existing leaf item (in case multiple xattrs are packed in the
    same item due to name hash collision);

    *) Returning -EEXIST when the flag XATTR_CREATE is given and the xattr doesn't
    exist but we have an existing item that packs multiple xattrs with
    the same name hash as the input xattr. In this case we should return -ENOSPC.

    A test case for xfstests follows soon.

    Thanks to Alexandre Oliva for reporting the non-atomicity of the xattr replace
    implementation.

    Reported-by: Alexandre Oliva
    Signed-off-by: Filipe Manana
    Signed-off-by: Chris Mason

    Filipe Manana
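
The intended semantics from the commit above can be summarised with a toy model. This is a Python sketch under assumptions, not the btrfs implementation: XattrItem is a hypothetical class standing in for one leaf item with a fixed byte capacity, per-entry header overhead is ignored, and only the flag values are taken from <linux/xattr.h>.

```python
import errno

XATTR_CREATE = 0x1   # values from <linux/xattr.h>
XATTR_REPLACE = 0x2


class XattrItem:
    """Toy model of one leaf item holding xattrs that share a name hash."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = {}

    def used(self) -> int:
        return sum(len(n) + len(v) for n, v in self.entries.items())

    def set(self, name: bytes, value: bytes, flags: int = 0) -> None:
        exists = name in self.entries
        if exists and flags & XATTR_CREATE:
            raise OSError(errno.EEXIST, "xattr already exists")
        if not exists and flags & XATTR_REPLACE:
            raise OSError(errno.ENODATA, "xattr does not exist")
        old_size = len(name) + len(self.entries[name]) if exists else 0
        new_used = self.used() - old_size + len(name) + len(value)
        if new_used > self.capacity:
            # Checked before the old value is touched, so a failed replace
            # leaves the existing xattr intact.
            raise OSError(errno.ENOSPC, "no room left in the item")
        self.entries[name] = value  # one update: readers never see a gap


item = XattrItem(capacity=32)
item.set(b"user.a", b"1234567890")
try:
    # user.b does not exist yet, but the shared item has no room for it.
    item.set(b"user.b", b"x" * 20, XATTR_CREATE)
except OSError as err:
    print(errno.errorcode[err.errno])
```

The design point is the ordering: the fit check happens before anything is removed, and the replacement is a single update instead of a delete followed by an insert.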
     

18 Sep, 2014

1 commit


31 Jan, 2014

1 commit

  • Pull btrfs updates from Chris Mason:
    "This is a pretty big pull, and most of these changes have been
    floating in btrfs-next for a long time. Filipe's properties work is a
    cool building block for inheriting attributes like compression down on
    a per inode basis.

    Jeff Mahoney kicked in code to export filesystem info into sysfs.

    Otherwise, lots of performance improvements, cleanups and bug fixes.

    Looks like there are still a few other small pending incrementals, but
    I wanted to get the bulk of this in first"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (149 commits)
    Btrfs: fix spin_unlock in check_ref_cleanup
    Btrfs: setup inode location during btrfs_init_inode_locked
    Btrfs: don't use ram_bytes for uncompressed inline items
    Btrfs: fix btrfs_search_slot_for_read backwards iteration
    Btrfs: do not export ulist functions
    Btrfs: rework ulist with list+rb_tree
    Btrfs: fix memory leaks on walking backrefs failure
    Btrfs: fix send file hole detection leading to data corruption
    Btrfs: add a reschedule point in btrfs_find_all_roots()
    Btrfs: make send's file extent item search more efficient
    Btrfs: fix to catch all errors when resolving indirect ref
    Btrfs: fix protection between walking backrefs and root deletion
    btrfs: fix warning while merging two adjacent extents
    Btrfs: fix infinite path build loops in incremental send
    btrfs: undo sysfs when open_ctree() fails
    Btrfs: fix snprintf usage by send's gen_unique_name
    btrfs: fix defrag 32-bit integer overflow
    btrfs: sysfs: list the NO_HOLES feature
    btrfs: sysfs: don't show reserved incompat feature
    btrfs: call permission checks earlier in ioctls and return EPERM
    ...

    Linus Torvalds
     

29 Jan, 2014

1 commit

  • This change adds infrastructure to allow for generic properties for
    inodes. Properties are name/value pairs that can be associated with
    inodes for different purposes. They are stored as xattrs with the
    prefix "btrfs."

    Properties can be inherited - this means when a directory inode has
    inheritable properties set, these are added to new inodes created
    under that directory. Further, subvolumes can also have properties
    associated with them, and they can be inherited from their parent
    subvolume. Naturally, directory properties have priority over subvolume
    properties (in practice a subvolume property is just a regular
    property associated with the root inode, objectid 256, of the
    subvolume's fs tree).

    This change also adds one specific property implementation, named
    "compression", whose values can be "lzo" or "zlib" and it's an
    inheritable property.
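
    One simplified way to read the priority rule above is sketched below.
    This is a Python model, not the kernel code path: properties are plain
    dictionaries keyed by xattr name, and inherited_props() is a
    hypothetical function that assumes subvolume properties act as a
    fallback when the parent directory does not set the same property.

```python
def inherited_props(parent_dir_props: dict, subvol_root_props: dict) -> dict:
    """Properties a new inode would start with; the directory wins on conflict."""
    merged = dict(subvol_root_props)
    merged.update(parent_dir_props)
    return merged


subvol = {"btrfs.compression": "zlib"}
plain_dir = {}
lzo_dir = {"btrfs.compression": "lzo"}

print(inherited_props(plain_dir, subvol))  # falls back to the subvolume value
print(inherited_props(lzo_dir, subvol))    # directory value takes priority
```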

    The corresponding changes to btrfs-progs were also implemented.
    A patch with xfstests for this feature will follow once there's
    agreement on this change/feature.

    Further, the script at the bottom of this commit message was used to
    do some benchmarks to measure any performance penalties of this feature.

    Basically the tests correspond to:

    Test 1 - create a filesystem and mount it with compress-force=lzo,
    then sequentially create N files of 64 KiB each, measure how long it took
    to create the files, unmount the filesystem, mount the filesystem and
    perform an 'ls -lha' against the test directory holding the N files, and
    report the time the command took.

    Test 2 - create a filesystem and don't use any compression option when
    mounting it - instead set the compression property of the subvolume's
    root to 'lzo'. Then create N files of 64 KiB, and report the time it took.
    Then unmount the filesystem, mount it again and perform an 'ls -lha' like
    in the former test. This means every single file ends up with a property
    (xattr) associated to it.

    Test 3 - same as test 2, but uses 4 properties - 3 are duplicates of the
    compression property, have no real effect other than adding more work
    when inheriting properties and taking more btree leaf space.

    Test 4 - same as test 3 but with 10 properties per file.

    Results (in seconds, and averages of 5 runs each), for different N
    numbers of files follow.

    * Without properties (test 1)

                        file creation time   ls -lha time
       10 000 files                   3.49           0.76
      100 000 files                  47.19           8.37
    1 000 000 files                 518.51         107.06

    * With 1 property (compression property set to lzo - test 2)

                        file creation time   ls -lha time
       10 000 files                   3.63           0.93
      100 000 files                  48.56           9.74
    1 000 000 files                 537.72         125.11

    * With 4 properties (test 3)

                        file creation time   ls -lha time
       10 000 files                   3.94           1.20
      100 000 files                  52.14          11.48
    1 000 000 files                 572.70         142.13

    * With 10 properties (test 4)

                        file creation time   ls -lha time
       10 000 files                   4.61           1.35
      100 000 files                  58.86          13.83
    1 000 000 files                 656.01         177.61

    The increased latencies with properties are essentially because of:

    *) When creating an inode, we now synchronously write 1 more item
    (an xattr item) for each property inherited from the parent dir
    (or subvolume). This could be done in an asynchronous way such
    as we do for dir index items (delayed-inode.c), which could help
    reduce the file creation latency;

    *) With properties, we now have larger fs trees. For this particular
    test each xattr item uses 75 bytes of leaf space in the fs tree.
    This could be less by using a new item for xattr items, instead of
    the current btrfs_dir_item, since we could cut the 'location' and
    'type' fields (saving 18 bytes) and maybe 'transid' too (saving a
    total of 26 bytes per xattr item) from the btrfs_dir_item type.
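
    The byte counts above can be reproduced with a little arithmetic. The
    sketch below is in Python and rests on assumptions: the struct layouts
    follow the packed on-disk definitions of btrfs_disk_key, btrfs_item
    and btrfs_dir_item, and the 75 bytes are assumed to include the leaf
    item header, the dir item header, the name "btrfs.compression" and
    the value "lzo". The commit message does not spell out this breakdown.

```python
import struct

# Packed little-endian layouts (assumed from the on-disk format).
DISK_KEY = struct.Struct("<QBQ")      # objectid, type, offset
ITEM_HEADER = struct.Struct("<QBQII")  # key, u32 data offset, u32 data size
DIR_ITEM = struct.Struct("<QBQQHHB")   # location, transid, data_len, name_len, type

name = b"btrfs.compression"
value = b"lzo"

total = ITEM_HEADER.size + DIR_ITEM.size + len(name) + len(value)
location_and_type = DISK_KEY.size + 1  # 'location' key plus the u8 'type'
with_transid = location_and_type + 8   # plus the u64 'transid'

print(total, location_and_type, with_transid)
```

    Under these assumptions the three printed numbers match the 75, 18
    and 26 bytes quoted above.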

    Also tried batching the xattr insertions (ignoring proper hash
    collision handling, since it didn't exist) when creating files that
    inherit properties from their parent inode/subvolume, but the end
    results were (surprisingly) essentially the same.

    Test script:

    $ cat test.pl
    #!/usr/bin/perl -w

    use strict;
    use Time::HiRes qw(time);
    use constant NUM_FILES => 10_000;
    use constant FILE_SIZES => (64 * 1024);
    use constant DEV => '/dev/sdb4';
    use constant MNT_POINT => '/home/fdmanana/btrfs-tests/dev';
    use constant TEST_DIR => (MNT_POINT . '/testdir');

    system("mkfs.btrfs", "-l", "16384", "-f", DEV) == 0 or die "mkfs.btrfs failed!";

    # following line for testing without properties
    #system("mount", "-o", "compress-force=lzo", DEV, MNT_POINT) == 0 or die "mount failed!";

    # following 2 lines for testing with properties
    system("mount", DEV, MNT_POINT) == 0 or die "mount failed!";
    system("btrfs", "prop", "set", MNT_POINT, "compression", "lzo") == 0 or die "set prop failed!";

    system("mkdir", TEST_DIR) == 0 or die "mkdir failed!";
    my ($t1, $t2);

    $t1 = time();
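    for (my $i = 1; $i <= NUM_FILES; $i++) {
        # Assumed reconstruction: the loop bound and the open() call were lost
        # in extraction; the file name pattern is hypothetical.
        my $p = TEST_DIR . '/file_' . $i;
        open(my $f, '>', $p) or die "Error opening file!";
        $f->autoflush(1);
        for (my $j = 0; $j < FILE_SIZES; $j += 4096) {
            print $f ('A' x 4096) or die "Error writing to file!";
        }
        close($f);
    }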
    $t2 = time();
    print "Time to create " . NUM_FILES . ": " . ($t2 - $t1) . " seconds.\n";
    system("umount", DEV) == 0 or die "umount failed!";
    system("mount", DEV, MNT_POINT) == 0 or die "mount failed!";

    $t1 = time();
    system("bash -c 'ls -lha " . TEST_DIR . " > /dev/null'") == 0 or die "ls failed!";
    $t2 = time();
    print "Time to ls -lha all files: " . ($t2 - $t1) . " seconds.\n";
    system("umount", DEV) == 0 or die "umount failed!";

    Signed-off-by: Filipe David Borba Manana
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Filipe David Borba Manana
     

26 Jan, 2014

1 commit


07 May, 2013

1 commit

  • Big patch, but all it does is add statics to functions which
    are in fact static, then remove the associated dead-code fallout.

    removed functions:

    btrfs_iref_to_path()
    __btrfs_lookup_delayed_deletion_item()
    __btrfs_search_delayed_insertion_item()
    __btrfs_search_delayed_deletion_item()
    find_eb_for_page()
    btrfs_find_block_group()
    range_straddles_pages()
    extent_range_uptodate()
    btrfs_file_extent_length()
    btrfs_scrub_cancel_devid()
    btrfs_start_transaction_lflush()

    btrfs_print_tree() is left because it is used for debugging.
    btrfs_start_transaction_lflush() and btrfs_reada_detach() are
    left for symmetry.

    ulist.c functions are left, another patch will take care of those.

    Signed-off-by: Eric Sandeen
    Signed-off-by: Josef Bacik

    Eric Sandeen
     

17 Dec, 2012

3 commits


30 May, 2012

1 commit

  • We've been keeping around the inode sequence number in hopes that somebody
    would use it, but nobody uses it and people actually use i_version which
    serves the same purpose, so use i_version where we used the incore inode's
    sequence number and that way the sequence is updated properly across the
    board, and not just in file write. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     

17 Jan, 2012

1 commit

  • A user reported a problem where things like open with O_CREAT would take up to
    30 seconds when he had nfs activity on the same mount. This is because all of
    our quick metadata operations, like create, symlink etc all do
    btrfs_end_transaction_throttle, which if the transaction is blocked will wait
    for the commit to complete before it returns. This adds a ridiculous amount of
    latency and isn't really needed. The normal btrfs_end_transaction will mark the
    transaction as blocked and wake the transaction kthread up if it thinks the
    transaction needs to end (this being in the running out of global reserve space
    scenario), and this is all that is really needed since we've already done
    everything we're going to do, we just need to return. This should help people
    with the latency they were seeing when using synchronous heavy workloads.
    Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     

07 Nov, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (114 commits)
    Btrfs: check for a null fs root when writing to the backup root log
    Btrfs: fix race during transaction joins
    Btrfs: fix a potential btrfs_bio leak on scrub fixups
    Btrfs: rename btrfs_bio multi -> bbio for consistency
    Btrfs: stop leaking btrfs_bios on readahead
    Btrfs: stop the readahead threads on failed mount
    Btrfs: fix extent_buffer leak in the metadata IO error handling
    Btrfs: fix the new inspection ioctls for 32 bit compat
    Btrfs: fix delayed insertion reservation
    Btrfs: ClearPageError during writepage and clean_tree_block
    Btrfs: be smarter about committing the transaction in reserve_metadata_bytes
    Btrfs: make a delayed_block_rsv for the delayed item insertion
    Btrfs: add a log of past tree roots
    btrfs: separate superblock items out of fs_info
    Btrfs: use the global reserve when truncating the free space cache inode
    Btrfs: release metadata from global reserve if we have to fallback for unlink
    Btrfs: make sure to flush queued bios if write_cache_pages waits
    Btrfs: fix extent pinning bugs in the tree log
    Btrfs: make sure btrfs_remove_free_space doesn't leak EAGAIN
    Btrfs: don't wait as long for more batches during SSD log commit
    ...

    Linus Torvalds
     

25 Oct, 2011

1 commit

  • * 'next' of git://selinuxproject.org/~jmorris/linux-security: (95 commits)
    TOMOYO: Fix incomplete read after seek.
    Smack: allow to access /smack/access as normal user
    TOMOYO: Fix unused kernel config option.
    Smack: fix: invalid length set for the result of /smack/access
    Smack: compilation fix
    Smack: fix for /smack/access output, use string instead of byte
    Smack: domain transition protections (v3)
    Smack: Provide information for UDS getsockopt(SO_PEERCRED)
    Smack: Clean up comments
    Smack: Repair processing of fcntl
    Smack: Rule list lookup performance
    Smack: check permissions from user space (v2)
    TOMOYO: Fix quota and garbage collector.
    TOMOYO: Remove redundant tasklist_lock.
    TOMOYO: Fix domain transition failure warning.
    TOMOYO: Remove tomoyo_policy_memory_lock spinlock.
    TOMOYO: Simplify garbage collector.
    TOMOYO: Fix make namespacecheck warnings.
    target: check hex2bin result
    encrypted-keys: check hex2bin result
    ...

    Linus Torvalds
     

20 Oct, 2011

1 commit

  • Recently I changed the xattr stuff to unconditionally set the xattr first in
    case the xattr didn't exist yet. This has introduced a regression when setting
    an xattr that already exists with a large value. If we find the key we are
    looking for split_leaf will assume that we're extending that item. The problem
    is the size we pass down to btrfs_search_slot includes the size of the item
    already, so if we have the largest xattr we can possibly have plus the size of
    the xattr item plus the xattr item that btrfs_search_slot we'd overflow the
    leaf. Thankfully this is not what we're doing, but split_leaf doesn't know this
    so it just returns EOVERFLOW. So in the xattr code we need to check and see if
    we got back EOVERFLOW and treat it like EEXIST since that's really what
    happened. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     

11 Sep, 2011

1 commit


09 Aug, 2011

1 commit


19 Jul, 2011

1 commit

  • This patch changes the security_inode_init_security API by adding a
    filesystem specific callback to write security extended attributes.
    This change is in preparation for supporting the initialization of
    multiple LSM xattrs and the EVM xattr. Initially the callback function
    walks an array of xattrs, writing each xattr separately, but could be
    optimized to write multiple xattrs at once.

    For existing security_inode_init_security() calls, which have not yet
    been converted to use the new callback function, such as those in
    reiserfs and ocfs2, this patch defines security_old_inode_init_security().

    Signed-off-by: Mimi Zohar

    Mimi Zohar