Eric Lee / smarc-fsl-linux-kernel

22 Jun, 2017

1 commit

e79a33270 btrfs: Check name_len with boundary in verify dir_item ... Browse Code »

Originally, verify_dir_item verifies name_len of dir_item with fixed
values but not item boundary.
If corrupted name_len was not bigger than the fixed value, for example
255, the function will think the dir_item is fine. And then reading
beyond boundary will cause crash.

Example:
1. Corrupt one dir_item name_len to be 255.
2. Run 'ls -lar /mnt/test/ > /dev/null'
dmesg:
[ 48.451449] BTRFS info (device vdb1): disk space caching is enabled
[ 48.451453] BTRFS info (device vdb1): has skinny extents
[ 48.489420] general protection fault: 0000 [#1] SMP
[ 48.489571] Modules linked in: ext4 jbd2 mbcache btrfs xor raid6_pq
[ 48.489716] CPU: 1 PID: 2710 Comm: ls Not tainted 4.10.0-rc1 #5
[ 48.489853] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
[ 48.490008] task: ffff880035df1bc0 task.stack: ffffc90004800000
[ 48.490008] RIP: 0010:read_extent_buffer+0xd2/0x190 [btrfs]
[ 48.490008] RSP: 0018:ffffc90004803d98 EFLAGS: 00010202
[ 48.490008] RAX: 000000000000001b RBX: 000000000000001b RCX: 0000000000000000
[ 48.490008] RDX: ffff880079dbf36c RSI: 0005080000000000 RDI: ffff880079dbf368
[ 48.490008] RBP: ffffc90004803dc8 R08: ffff880078e8cc48 R09: ffff880000000000
[ 48.490008] R10: 0000160000000000 R11: 0000000000001000 R12: ffff880079dbf288
[ 48.490008] R13: ffff880078e8ca88 R14: 0000000000000003 R15: ffffc90004803e20
[ 48.490008] FS: 00007fef50c60800(0000) GS:ffff88007d400000(0000) knlGS:0000000000000000
[ 48.490008] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 48.490008] CR2: 000055f335ac2ff8 CR3: 000000007356d000 CR4: 00000000001406e0
[ 48.490008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 48.490008] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 48.490008] Call Trace:
[ 48.490008] btrfs_real_readdir+0x3b7/0x4a0 [btrfs]
[ 48.490008] iterate_dir+0x181/0x1b0
[ 48.490008] SyS_getdents+0xa7/0x150
[ 48.490008] ? fillonedir+0x150/0x150
[ 48.490008] entry_SYSCALL_64_fastpath+0x18/0xad
[ 48.490008] RIP: 0033:0x7fef5032546b
[ 48.490008] RSP: 002b:00007ffeafcdb830 EFLAGS: 00000206 ORIG_RAX: 000000000000004e
[ 48.490008] RAX: ffffffffffffffda RBX: 00007fef5061db38 RCX: 00007fef5032546b
[ 48.490008] RDX: 0000000000008000 RSI: 000055f335abaff0 RDI: 0000000000000003
[ 48.490008] RBP: 00007fef5061dae0 R08: 00007fef5061db48 R09: 0000000000000000
[ 48.490008] R10: 000055f335abafc0 R11: 0000000000000206 R12: 00007fef5061db38
[ 48.490008] R13: 0000000000008040 R14: 00007fef5061db38 R15: 000000000000270e
[ 48.490008] RIP: read_extent_buffer+0xd2/0x190 [btrfs] RSP: ffffc90004803d98
[ 48.499455] ---[ end trace 321920d8e8339505 ]---

Fix it by adding a parameter @slot and check name_len with item boundary
by calling btrfs_is_name_len_valid.

Signed-off-by: Su Yue
rev
Signed-off-by: David Sterba

Su Yue
2017-06-22 01:16:04 +0800

14 Feb, 2017

2 commits

f85b7379c btrfs: fix over-80 lines introduced by previous cleanups ... Browse Code »

This goes as a separate patch because fixing that inside the patches
caused too many many conflicts.

Signed-off-by: David Sterba

David Sterba
2017-02-14 22:50:57 +0800
4a0cc7ca6 btrfs: Make btrfs_ino take a struct btrfs_inode ... Browse Code »

Currently btrfs_ino takes a struct inode and this causes a lot of
internal btrfs functions which consume this ino to take a VFS inode,
rather than btrfs' own struct btrfs_inode. In order to fix this "leak"
of VFS structs into the internals of btrfs first it's necessary to
eliminate all uses of struct inode for the purpose of inode. This patch
does that by using BTRFS_I to convert an inode to btrfs_inode. With
this problem eliminated subsequent patches will start eliminating the
passing of struct inode altogether, eventually resulting in a lot cleaner
code.

Signed-off-by: Nikolay Borisov
[ fix btrfs_get_extent tracepoint prototype ]
Signed-off-by: David Sterba

Nikolay Borisov
2017-02-14 22:50:51 +0800

06 Dec, 2016

3 commits

3a45bb207 btrfs: remove root parameter from transaction commit/end routines ... Browse Code »

Now we only use the root parameter to print the root objectid in
a tracepoint. We can use the root parameter from the transaction
handle for that. It's also used to join the transaction with
async commits, so we remove the comment that it's just for checking.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba

Jeff Mahoney
2016-12-06 23:07:00 +0800
2ff7e61e0 btrfs: take an fs_info directly when the root is not used otherwise ... Browse Code »

There are loads of functions in btrfs that accept a root parameter
but only use it to obtain an fs_info pointer. Let's convert those to
just accept an fs_info pointer directly.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba

Jeff Mahoney
2016-12-06 23:06:59 +0800
da17066c4 btrfs: pull node/sector/stripe sizes out of root and into fs_info ... Browse Code »

We track the node sizes per-root, but they never vary from the values
in the superblock. This patch messes with the 80-column style a bit,
but subsequent patches to factor out root->fs_info into a convenience
variable fix it up again.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba

Jeff Mahoney
2016-12-06 23:06:58 +0800

28 Sep, 2016

1 commit

c2050a454 fs: Replace current_fs_time() with current_time() ... Browse Code »

current_fs_time() uses struct super_block* as an argument.
As per Linus's suggestion, this is changed to take struct
inode* as a parameter instead. This is because the function
is primarily meant for vfs inode timestamps.
Also the function was renamed as per Arnd's suggestion.

Change all calls to current_fs_time() to use the new
current_time() function instead. current_fs_time() will be
deleted.

Signed-off-by: Deepa Dinamani
Signed-off-by: Al Viro

Deepa Dinamani
2016-09-28 09:06:22 +0800

28 May, 2016

1 commit

593012268 switch xattr_handler->set() to passing dentry and inode separately ... Browse Code »

preparation for similar switch in ->setxattr() (see the next commit for
rationale).

Signed-off-by: Al Viro

Al Viro
2016-05-28 03:39:43 +0800

18 May, 2016

1 commit

e0d46f5c6 btrfs: Switch to generic xattr handlers ... Browse Code »

The btrfs_{set,remove}xattr inode operations check for a read-only root
(btrfs_root_readonly) before calling into generic_{set,remove}xattr. If
this check is moved into __btrfs_setxattr, we can get rid of
btrfs_{set,remove}xattr.

This patch applies to mainline, I would like to keep it together with
the other xattr cleanups if possible, though. Could you please review?

Thanks,
Andreas

Signed-off-by: Andreas Gruenbacher
Signed-off-by: Al Viro

Andreas Gruenbacher
2016-05-18 07:17:09 +0800

11 Apr, 2016

1 commit

b296821a7 xattr_handler: pass dentry and inode as separate arguments of ->get() ... Browse Code »

... and do not assume they are already attached to each other

Signed-off-by: Al Viro

Al Viro
2016-04-11 08:48:24 +0800

02 Mar, 2016

1 commit

daac7ba61 Btrfs: fix listxattrs not listing all xattrs packed in the same item ... Browse Code »

In the listxattrs handler, we were not listing all the xattrs that are
packed in the same btree item, which happens when multiple xattrs have
a name that when crc32c hashed produce the same checksum value.

Fix this by processing them all.

The following test case for xfstests reproduces the issue:

seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"
tmp=/tmp/$$
status=1 # failure is the default!
trap "_cleanup; exit \$status" 0 1 2 3 15

_cleanup()
{
cd /
rm -f $tmp.*
}

# get standard environment, filters and checks
. ./common/rc
. ./common/filter
. ./common/attr

# real QA test starts here
_supported_fs generic
_supported_os Linux
_require_scratch
_require_attrs

rm -f $seqres.full

_scratch_mkfs >>$seqres.full 2>&1
_scratch_mount

# Create our test file with a few xattrs. The first 3 xattrs have a name
# that when given as input to a crc32c function result in the same checksum.
# This made btrfs list only one of the xattrs through listxattrs system call
# (because it packs xattrs with the same name checksum into the same btree
# item).
touch $SCRATCH_MNT/testfile
$SETFATTR_PROG -n user.foobar -v 123 $SCRATCH_MNT/testfile
$SETFATTR_PROG -n user.WvG1c1Td -v qwerty $SCRATCH_MNT/testfile
$SETFATTR_PROG -n user.J3__T_Km3dVsW_ -v hello $SCRATCH_MNT/testfile
$SETFATTR_PROG -n user.something -v pizza $SCRATCH_MNT/testfile
$SETFATTR_PROG -n user.ping -v pong $SCRATCH_MNT/testfile

# Now call getfattr with --dump, which calls the listxattrs system call.
# It should list all the xattrs we have set before.
$GETFATTR_PROG --absolute-names --dump $SCRATCH_MNT/testfile | _filter_scratch

status=0
exit

Signed-off-by: Filipe Manana
Signed-off-by: Chris Mason

Filipe Manana
2016-03-02 00:23:41 +0800

18 Feb, 2016

1 commit

04b285f35 btrfs: Replace CURRENT_TIME by current_fs_time() ... Browse Code »

CURRENT_TIME macro is not appropriate for filesystems as it
doesn't use the right granularity for filesystem timestamps.
Use current_fs_time() instead.

Signed-off-by: Deepa Dinamani
Cc: Chris Mason
Cc: Josef Bacik
Cc: linux-btrfs@vger.kernel.org
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Deepa Dinamani
2016-02-18 18:46:03 +0800

23 Jan, 2016

1 commit

5955102c9 wrappers for ->i_mutex access ... Browse Code »

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.

Signed-off-by: Al Viro

Al Viro
2016-01-23 07:04:28 +0800

19 Jan, 2016

1 commit

c1a198d92 Merge branch 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs ... Browse Code »

Pull btrfs updates from Chris Mason:
"This has our usual assortment of fixes and cleanups, but the biggest
change included is Omar Sandoval's free space tree. It's not the
default yet, mounting -o space_cache=v2 enables it and sets a readonly
compat bit. The tree can actually be deleted and regenerated if there
are any problems, but it has held up really well in testing so far.

For very large filesystems (30T+) our existing free space caching code
can end up taking a huge amount of time during commits. The new tree
based code is faster and less work overall to update as the commit
progresses.

Omar worked on this during the summer and we'll hammer on it in
production here at FB over the next few months"

* 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (73 commits)
Btrfs: fix fitrim discarding device area reserved for boot loader's use
Btrfs: Check metadata redundancy on balance
btrfs: statfs: report zero available if metadata are exhausted
btrfs: preallocate path for snapshot creation at ioctl time
btrfs: allocate root item at snapshot ioctl time
btrfs: do an allocation earlier during snapshot creation
btrfs: use smaller type for btrfs_path locks
btrfs: use smaller type for btrfs_path lowest_level
btrfs: use smaller type for btrfs_path reada
btrfs: cleanup, use enum values for btrfs_path reada
btrfs: constify static arrays
btrfs: constify remaining structs with function pointers
btrfs tests: replace whole ops structure for free space tests
btrfs: use list_for_each_entry* in backref.c
btrfs: use list_for_each_entry_safe in free-space-cache.c
btrfs: use list_for_each_entry* in check-integrity.c
Btrfs: use linux/sizes.h to represent constants
btrfs: cleanup, remove stray return statements
btrfs: zero out delayed node upon allocation
btrfs: pass proper enum type to start_transaction()
...

Linus Torvalds
2016-01-19 04:44:40 +0800

11 Jan, 2016

1 commit

a3058101c Merge branch 'misc-for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kda… ... Browse Code »

…ve/linux into for-linus-4.5

Chris Mason
2016-01-11 21:59:32 +0800

07 Jan, 2016

1 commit

e4058b54d btrfs: cleanup, use enum values for btrfs_path reada ... Browse Code »

Replace the integers by enums for better readability. The value 2 does
not have any meaning since a717531942f488209dded30f6bc648167bcefa72
"Btrfs: do less aggressive btree readahead" (2009-01-22).

Signed-off-by: David Sterba

David Sterba
2016-01-07 22:01:15 +0800

07 Dec, 2015

1 commit

9172abbcd btrfs: Use xattr handler infrastructure ... Browse Code »

Use the VFS xattr handler infrastructure and get rid of similar code in
the filesystem.

Signed-off-by: Andreas Gruenbacher
Reviewed-by: David Sterba
Signed-off-by: Al Viro

Andreas Gruenbacher
2015-12-07 10:34:14 +0800

03 Dec, 2015

1 commit

39a27ec10 btrfs: use GFP_KERNEL for xattr and acl allocations ... Browse Code »

We don't have to use GFP_NOFS in context of ACL or XATTR actions, not
possible to loop through the allocator and it's safe to fail with
ENOMEM.

Signed-off-by: David Sterba

David Sterba
2015-12-03 22:03:44 +0800

10 Nov, 2015

1 commit

f1cd1f0b7 Btrfs: fix race when listing an inode's xattrs ... Browse Code »

When listing a inode's xattrs we have a time window where we race against
a concurrent operation for adding a new hard link for our inode that makes
us not return any xattr to user space. In order for this to happen, the
first xattr of our inode needs to be at slot 0 of a leaf and the previous
leaf must still have room for an inode ref (or extref) item, and this can
happen because an inode's listxattrs callback does not lock the inode's
i_mutex (nor does the VFS does it for us), but adding a hard link to an
inode makes the VFS lock the inode's i_mutex before calling the inode's
link callback.

If we have the following leafs:

Leaf X (has N items) Leaf Y

[ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ] [ (257 XATTR_ITEM 12345), ... ]
slot N - 2 slot N - 1 slot 0

The race illustrated by the following sequence diagram is possible:

CPU 1 CPU 2

btrfs_listxattr()

searches for key (257 XATTR_ITEM 0)

gets path with path->nodes[0] == leaf X
and path->slots[0] == N

because path->slots[0] is >=
btrfs_header_nritems(leaf X), it calls
btrfs_next_leaf()

btrfs_next_leaf()
releases the path

adds key (257 INODE_REF 666)
to the end of leaf X (slot N),
and leaf X now has N + 1 items

searches for the key (257 INODE_REF 256),
with path->keep_locks == 1, because that
is the last key it saw in leaf X before
releasing the path

ends up at leaf X again and it verifies
that the key (257 INODE_REF 256) is no
longer the last key in leaf X, so it
returns with path->nodes[0] == leaf X
and path->slots[0] == N, pointing to
the new item with key (257 INODE_REF 666)

btrfs_listxattr's loop iteration sees that
the type of the key pointed by the path is
different from the type BTRFS_XATTR_ITEM_KEY
and so it breaks the loop and stops looking
for more xattr items
--> the application doesn't get any xattr
listed for our inode

So fix this by breaking the loop only if the key's type is greater than
BTRFS_XATTR_ITEM_KEY and skip the current key if its type is smaller.

Cc: stable@vger.kernel.org
Signed-off-by: Filipe Manana

Filipe Manana
2015-11-10 02:34:40 +0800

27 Apr, 2015

1 commit

9ec3a646f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull fourth vfs update from Al Viro:
"d_inode() annotations from David Howells (sat in for-next since before
the beginning of merge window) + four assorted fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
RCU pathwalk breakage when running into a symlink overmounting something
fix I_DIO_WAKEUP definition
direct-io: only inc/dec inode->i_dio_count for file systems
fs/9p: fix readdir()
VFS: assorted d_backing_inode() annotations
VFS: fs/inode.c helpers: d_inode() annotations
VFS: fs/cachefiles: d_backing_inode() annotations
VFS: fs library helpers: d_inode() annotations
VFS: assorted weird filesystems: d_inode() annotations
VFS: normal filesystems (and lustre): d_inode() annotations
VFS: security/: d_inode() annotations
VFS: security/: d_backing_inode() annotations
VFS: net/: d_inode() annotations
VFS: net/unix: d_backing_inode() annotations
VFS: kernel/: d_inode() annotations
VFS: audit: d_backing_inode() annotations
VFS: Fix up some ->d_inode accesses in the chelsio driver
VFS: Cachefiles should perform fs modifications on the top layer only
VFS: AF_UNIX sockets should call mknod on the top layer only

Linus Torvalds
2015-04-27 08:22:07 +0800

16 Apr, 2015

1 commit

2b0143b5c VFS: normal filesystems (and lustre): d_inode() annotations ... Browse Code »

that's the bulk of filesystem drivers dealing with inodes of their own

Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2015-04-16 03:06:57 +0800

27 Mar, 2015

1 commit

3c3b04d10 btrfs: don't accept bare namespace as a valid xattr ... Browse Code »

Due to insufficient check in btrfs_is_valid_xattr, this unexpectedly
works:

$ touch file
$ setfattr -n user. -v 1 file
$ getfattr -d file
user.="1"

ie. the missing attribute name after the namespace.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=94291
Reported-by: William Douglas
CC: # 2.6.29+
Signed-off-by: David Sterba
Signed-off-by: Chris Mason

David Sterba
2015-03-27 09:10:24 +0800

03 Mar, 2015

1 commit

5cdf83edb Btrfs: do not ignore errors from btrfs_lookup_xattr in do_setxattr ... Browse Code »

The return value from btrfs_lookup_xattr() can be a pointer encoding an
error, therefore deal with it. This fixes commit 5f5bc6b1e2d5
("Btrfs: make xattr replace operations atomic").

Signed-off-by: Filipe Manana
Signed-off-by: Chris Mason

Filipe Manana
2015-03-03 06:04:45 +0800

21 Nov, 2014

1 commit

5f5bc6b1e Btrfs: make xattr replace operations atomic ... Browse Code »

Replacing a xattr consists of doing a lookup for its existing value, delete
the current value from the respective leaf, release the search path and then
finally insert the new value. This leaves a time window where readers (getxattr,
listxattrs) won't see any value for the xattr. Xattrs are used to store ACLs,
so this has security implications.

This change also fixes 2 other existing issues which were:

*) Deleting the old xattr value without verifying first if the new xattr will
fit in the existing leaf item (in case multiple xattrs are packed in the
same item due to name hash collision);

*) Returning -EEXIST when the flag XATTR_CREATE is given and the xattr doesn't
exist but we have have an existing item that packs muliple xattrs with
the same name hash as the input xattr. In this case we should return ENOSPC.

A test case for xfstests follows soon.

Thanks to Alexandre Oliva for reporting the non-atomicity of the xattr replace
implementation.

Reported-by: Alexandre Oliva
Signed-off-by: Filipe Manana
Signed-off-by: Chris Mason

Filipe Manana
2014-11-21 09:20:07 +0800

18 Sep, 2014

1 commit

962a298f3 btrfs: kill the key type accessor helpers ... Browse Code »

btrfs_set_key_type and btrfs_key_type are used inconsistently along with
open coded variants. Other members of btrfs_key are accessed directly
without any helpers anyway.

Signed-off-by: David Sterba
Signed-off-by: Chris Mason

David Sterba
2014-09-18 04:37:12 +0800

31 Jan, 2014

1 commit

e7651b819 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs ... Browse Code »

Pull btrfs updates from Chris Mason:
"This is a pretty big pull, and most of these changes have been
floating in btrfs-next for a long time. Filipe's properties work is a
cool building block for inheriting attributes like compression down on
a per inode basis.

Jeff Mahoney kicked in code to export filesystem info into sysfs.

Otherwise, lots of performance improvements, cleanups and bug fixes.

Looks like there are still a few other small pending incrementals, but
I wanted to get the bulk of this in first"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (149 commits)
Btrfs: fix spin_unlock in check_ref_cleanup
Btrfs: setup inode location during btrfs_init_inode_locked
Btrfs: don't use ram_bytes for uncompressed inline items
Btrfs: fix btrfs_search_slot_for_read backwards iteration
Btrfs: do not export ulist functions
Btrfs: rework ulist with list+rb_tree
Btrfs: fix memory leaks on walking backrefs failure
Btrfs: fix send file hole detection leading to data corruption
Btrfs: add a reschedule point in btrfs_find_all_roots()
Btrfs: make send's file extent item search more efficient
Btrfs: fix to catch all errors when resolving indirect ref
Btrfs: fix protection between walking backrefs and root deletion
btrfs: fix warning while merging two adjacent extents
Btrfs: fix infinite path build loops in incremental send
btrfs: undo sysfs when open_ctree() fails
Btrfs: fix snprintf usage by send's gen_unique_name
btrfs: fix defrag 32-bit integer overflow
btrfs: sysfs: list the NO_HOLES feature
btrfs: sysfs: don't show reserved incompat feature
btrfs: call permission checks earlier in ioctls and return EPERM
...

Linus Torvalds
2014-01-31 12:08:20 +0800

29 Jan, 2014

1 commit

63541927c Btrfs: add support for inode properties ... Browse Code »

This change adds infrastructure to allow for generic properties for
inodes. Properties are name/value pairs that can be associated with
inodes for different purposes. They are stored as xattrs with the
prefix "btrfs."

Properties can be inherited - this means when a directory inode has
inheritable properties set, these are added to new inodes created
under that directory. Further, subvolumes can also have properties
associated with them, and they can be inherited from their parent
subvolume. Naturally, directory properties have priority over subvolume
properties (in practice a subvolume property is just a regular
property associated with the root inode, objectid 256, of the
subvolume's fs tree).

This change also adds one specific property implementation, named
"compression", whose values can be "lzo" or "zlib" and it's an
inheritable property.

The corresponding changes to btrfs-progs were also implemented.
A patch with xfstests for this feature will follow once there's
agreement on this change/feature.

Further, the script at the bottom of this commit message was used to
do some benchmarks to measure any performance penalties of this feature.

Basically the tests correspond to:

Test 1 - create a filesystem and mount it with compress-force=lzo,
then sequentially create N files of 64Kb each, measure how long it took
to create the files, unmount the filesystem, mount the filesystem and
perform an 'ls -lha' against the test directory holding the N files, and
report the time the command took.

Test 2 - create a filesystem and don't use any compression option when
mounting it - instead set the compression property of the subvolume's
root to 'lzo'. Then create N files of 64Kb, and report the time it took.
The unmount the filesystem, mount it again and perform an 'ls -lha' like
in the former test. This means every single file ends up with a property
(xattr) associated to it.

Test 3 - same as test 2, but uses 4 properties - 3 are duplicates of the
compression property, have no real effect other than adding more work
when inheriting properties and taking more btree leaf space.

Test 4 - same as test 3 but with 10 properties per file.

Results (in seconds, and averages of 5 runs each), for different N
numbers of files follow.

* Without properties (test 1)

file creation time ls -lha time
10 000 files 3.49 0.76
100 000 files 47.19 8.37
1 000 000 files 518.51 107.06

* With 1 property (compression property set to lzo - test 2)

file creation time ls -lha time
10 000 files 3.63 0.93
100 000 files 48.56 9.74
1 000 000 files 537.72 125.11

* With 4 properties (test 3)

file creation time ls -lha time
10 000 files 3.94 1.20
100 000 files 52.14 11.48
1 000 000 files 572.70 142.13

* With 10 properties (test 4)

file creation time ls -lha time
10 000 files 4.61 1.35
100 000 files 58.86 13.83
1 000 000 files 656.01 177.61

The increased latencies with properties are essencialy because of:

*) When creating an inode, we now synchronously write 1 more item
(an xattr item) for each property inherited from the parent dir
(or subvolume). This could be done in an asynchronous way such
as we do for dir intex items (delayed-inode.c), which could help
reduce the file creation latency;

*) With properties, we now have larger fs trees. For this particular
test each xattr item uses 75 bytes of leaf space in the fs tree.
This could be less by using a new item for xattr items, instead of
the current btrfs_dir_item, since we could cut the 'location' and
'type' fields (saving 18 bytes) and maybe 'transid' too (saving a
total of 26 bytes per xattr item) from the btrfs_dir_item type.

Also tried batching the xattr insertions (ignoring proper hash
collision handling, since it didn't exist) when creating files that
inherit properties from their parent inode/subvolume, but the end
results were (surprisingly) essentially the same.

Test script:

$ cat test.pl
#!/usr/bin/perl -w

use strict;
use Time::HiRes qw(time);
use constant NUM_FILES => 10_000;
use constant FILE_SIZES => (64 * 1024);
use constant DEV => '/dev/sdb4';
use constant MNT_POINT => '/home/fdmanana/btrfs-tests/dev';
use constant TEST_DIR => (MNT_POINT . '/testdir');

system("mkfs.btrfs", "-l", "16384", "-f", DEV) == 0 or die "mkfs.btrfs failed!";

# following line for testing without properties
#system("mount", "-o", "compress-force=lzo", DEV, MNT_POINT) == 0 or die "mount failed!";

# following 2 lines for testing with properties
system("mount", DEV, MNT_POINT) == 0 or die "mount failed!";
system("btrfs", "prop", "set", MNT_POINT, "compression", "lzo") == 0 or die "set prop failed!";

system("mkdir", TEST_DIR) == 0 or die "mkdir failed!";
my ($t1, $t2);

$t1 = time();
for (my $i = 1; $i autoflush(1);
for (my $j = 0; $j < FILE_SIZES; $j += 4096) {
print $f ('A' x 4096) or die "Error writing to file!";
}
close($f);
}
$t2 = time();
print "Time to create " . NUM_FILES . ": " . ($t2 - $t1) . " seconds.\n";
system("umount", DEV) == 0 or die "umount failed!";
system("mount", DEV, MNT_POINT) == 0 or die "mount failed!";

$t1 = time();
system("bash -c 'ls -lha " . TEST_DIR . " > /dev/null'") == 0 or die "ls failed!";
$t2 = time();
print "Time to ls -lha all files: " . ($t2 - $t1) . " seconds.\n";
system("umount", DEV) == 0 or die "umount failed!";

Signed-off-by: Filipe David Borba Manana
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Filipe David Borba Manana
2014-01-29 05:20:24 +0800

26 Jan, 2014

1 commit

996a710d4 btrfs: use generic posix ACL infrastructure ... Browse Code »

Also don't bother to set up a .get_acl method for symlinks as we do not
support access control (ACLs or even mode bits) for symlinks in Linux.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2014-01-26 12:58:18 +0800

07 May, 2013

1 commit

48a3b6366 btrfs: make static code static & remove dead code ... Browse Code »

Big patch, but all it does is add statics to functions which
are in fact static, then remove the associated dead-code fallout.

removed functions:

btrfs_iref_to_path()
__btrfs_lookup_delayed_deletion_item()
__btrfs_search_delayed_insertion_item()
__btrfs_search_delayed_deletion_item()
find_eb_for_page()
btrfs_find_block_group()
range_straddles_pages()
extent_range_uptodate()
btrfs_file_extent_length()
btrfs_scrub_cancel_devid()
btrfs_start_transaction_lflush()

btrfs_print_tree() is left because it is used for debugging.
btrfs_start_transaction_lflush() and btrfs_reada_detach() are
left for symmetry.

ulist.c functions are left, another patch will take care of those.

Signed-off-by: Eric Sandeen
Signed-off-by: Josef Bacik

Eric Sandeen
2013-05-07 03:55:23 +0800

17 Dec, 2012

3 commits

e99761514 Btrfs: only log the inode item if we can get away with it ... Browse Code »

Currently we copy all the file information into the log, inode item, the
refs, xattrs etc. Except most of this doesn't change from fsync to fsync,
just the inode item changes. So set a flag if an xattr changes or a link is
added, and otherwise only log the inode item. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2012-12-17 09:46:21 +0800
01e6deb25 Btrfs: don't add a NULL extended attribute ... Browse Code »

Passing a null extended attribute value means to remove the attribute,
but we don't have to add a new NULL extended attribute.

Signed-off-by: Liu Bo
Signed-off-by: Chris Mason

Liu Bo
2012-12-17 09:46:15 +0800
db2254bce Btrfs: fix an while-loop of listxattr ... Browse Code »

If we found an invalid xattr dir item, we'd better try the next one instead.

Signed-off-by: Liu Bo
Signed-off-by: Chris Mason

Liu Bo
2012-12-17 09:46:07 +0800

30 May, 2012

1 commit

0c4d2d95d Btrfs: use i_version instead of our own sequence ... Browse Code »

We've been keeping around the inode sequence number in hopes that somebody
would use it, but nobody uses it and people actually use i_version which
serves the same purpose, so use i_version where we used the incore inode's
sequence number and that way the sequence is updated properly across the
board, and not just in file write. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2012-05-30 22:23:27 +0800

17 Jan, 2012

1 commit

7ad85bb76 Btrfs: do not use btrfs_end_transaction_throttle everywhere ... Browse Code »

A user reported a problem where things like open with O_CREAT would take up to
30 seconds when he had nfs activity on the same mount. This is because all of
our quick metadata operations, like create, symlink etc all do
btrfs_end_transaction_throttle, which if the transaction is blocked will wait
for the commit to complete before it returns. This adds a ridiculous amount of
latency and isn't really needed. The normal btrfs_end_transaction will mark the
transaction as blocked and wake the transaction kthread up if it thinks the
transaction needs to end (this being in the running out of global reserve space
scenario), and this is all that is really needed since we've already done
everything we're going to do, we just need to return. This should help people
with the latency they were seeing when using synchronous heavy workloads.
Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2012-01-17 04:28:54 +0800

07 Nov, 2011

1 commit

6a6662ced Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (114 commits)
Btrfs: check for a null fs root when writing to the backup root log
Btrfs: fix race during transaction joins
Btrfs: fix a potential btrfs_bio leak on scrub fixups
Btrfs: rename btrfs_bio multi -> bbio for consistency
Btrfs: stop leaking btrfs_bios on readahead
Btrfs: stop the readahead threads on failed mount
Btrfs: fix extent_buffer leak in the metadata IO error handling
Btrfs: fix the new inspection ioctls for 32 bit compat
Btrfs: fix delayed insertion reservation
Btrfs: ClearPageError during writepage and clean_tree_block
Btrfs: be smarter about committing the transaction in reserve_metadata_bytes
Btrfs: make a delayed_block_rsv for the delayed item insertion
Btrfs: add a log of past tree roots
btrfs: separate superblock items out of fs_info
Btrfs: use the global reserve when truncating the free space cache inode
Btrfs: release metadata from global reserve if we have to fallback for unlink
Btrfs: make sure to flush queued bios if write_cache_pages waits
Btrfs: fix extent pinning bugs in the tree log
Btrfs: make sure btrfs_remove_free_space doesn't leak EAGAIN
Btrfs: don't wait as long for more batches during SSD log commit
...

Linus Torvalds
2011-11-07 12:03:41 +0800

25 Oct, 2011

1 commit

36b8d186e Merge branch 'next' of git://selinuxproject.org/~jmorris/linux-security ... Browse Code »

* 'next' of git://selinuxproject.org/~jmorris/linux-security: (95 commits)
TOMOYO: Fix incomplete read after seek.
Smack: allow to access /smack/access as normal user
TOMOYO: Fix unused kernel config option.
Smack: fix: invalid length set for the result of /smack/access
Smack: compilation fix
Smack: fix for /smack/access output, use string instead of byte
Smack: domain transition protections (v3)
Smack: Provide information for UDS getsockopt(SO_PEERCRED)
Smack: Clean up comments
Smack: Repair processing of fcntl
Smack: Rule list lookup performance
Smack: check permissions from user space (v2)
TOMOYO: Fix quota and garbage collector.
TOMOYO: Remove redundant tasklist_lock.
TOMOYO: Fix domain transition failure warning.
TOMOYO: Remove tomoyo_policy_memory_lock spinlock.
TOMOYO: Simplify garbage collector.
TOMOYO: Fix make namespacecheck warnings.
target: check hex2bin result
encrypted-keys: check hex2bin result
...

Linus Torvalds
2011-10-25 15:45:31 +0800

20 Oct, 2011

1 commit

ed3ee9f44 Btrfs: fix regression in re-setting a large xattr ... Browse Code »

Recently I changed the xattr stuff to unconditionally set the xattr first in
case the xattr didn't exist yet. This has introduced a regression when setting
an xattr that already exists with a large value. If we find the key we are
looking for split_leaf will assume that we're extending that item. The problem
is the size we pass down to btrfs_search_slot includes the size of the item
already, so if we have the largest xattr we can possibly have plus the size of
the xattr item plus the xattr item that btrfs_search_slot we'd overflow the
leaf. Thankfully this is not what we're doing, but split_leaf doesn't know this
so it just returns EOVERFLOW. So in the xattr code we need to check and see if
we got back EOVERFLOW and treat it like EEXIST since that's really what
happened. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2011-10-20 03:12:56 +0800

11 Sep, 2011

1 commit

4815053ab btrfs: xattr: fix attribute removal ... Browse Code »

An attribute is not removed by 'setfattr -x attr file' and remains
visible in attr list. This makes xfstests/062 pass again.

Signed-off-by: David Sterba
Signed-off-by: Chris Mason

David Sterba
2011-09-11 22:52:25 +0800

09 Aug, 2011

1 commit

5a2f3a02a Merge branch 'next-evm' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/ima-2.6 into next ... Browse Code »

Conflicts:
fs/attr.c

Resolve conflict manually.

Signed-off-by: James Morris

James Morris
2011-08-09 08:31:03 +0800

19 Jul, 2011

1 commit

9d8f13ba3 security: new security_inode_init_security API adds function callback ... Browse Code »

This patch changes the security_inode_init_security API by adding a
filesystem specific callback to write security extended attributes.
This change is in preparation for supporting the initialization of
multiple LSM xattrs and the EVM xattr. Initially the callback function
walks an array of xattrs, writing each xattr separately, but could be
optimized to write multiple xattrs at once.

For existing security_inode_init_security() calls, which have not yet
been converted to use the new callback function, such as those in
reiserfs and ocfs2, this patch defines security_old_inode_init_security().

Signed-off-by: Mimi Zohar

Mimi Zohar
2011-07-19 00:29:38 +0800