11 Oct, 2013

3 commits

  • Now we don't drop all the deleted snapshots/subvolumes before the space
    balance, which means we have to relocate the space held by the dead
    snapshots/subvolumes. So we must insert them into the fs radix tree, or we
    would forget to commit their changes when doing a transaction commit, and
    that would corrupt the metadata.

    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Miao Xie
     
  • Liu fixed part of this problem and unfortunately I steered him in slightly the
    wrong direction and so didn't completely fix the problem. The problem is we
    limit the size of the delalloc range we are looking for to max bytes and then we
    try to lock that range. If we fail to lock the pages in that range we will
    shrink the max bytes to a single page and re-loop. However if our first page is
    inside of the delalloc range then we will end up limiting the end of the range
    to a point before our first page. This is illustrated below:

    [0 -------- delalloc range --------- 256mb]
                                  [page]

    So find_delalloc_range will return with delalloc_start as 0 and end as 128mb,
    and then we will notice that delalloc_start < *start and adjust it up, but not
    adjust delalloc_end up, so things go sideways. To fix this we need to not limit
    the max bytes in find_delalloc_range, but in find_lock_delalloc_range instead;
    that way we don't end up with this confusion. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • btrfs_rename was using the root of the old dir instead of the root of the new
    dir when checking for a hash collision, so if you tried to move a file into a
    subvol it would freak out because it would see the file you are trying to move
    in its current root. This fixes the bug where the following sequence would fail:

    btrfs subvol create test1
    btrfs subvol create test2
    mv test1 test2

    Thanks to Chris Murphy for catching this,

    Cc: stable@vger.kernel.org
    Reported-by: Chris Murphy
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     

05 Oct, 2013

4 commits

  • free_device rcu callback, scheduled from btrfs_rm_dev_replace_srcdev,
    can be processed before btrfs_scratch_superblock is called, which would
    result in a use-after-free on btrfs_device contents. Fix this by
    zeroing the superblock before the rcu callback is registered.

    Cc: Stefan Behrens
    Signed-off-by: Ilya Dryomov
    Signed-off-by: Josef Bacik

    Ilya Dryomov
     
  • The current implementation of worker threads in Btrfs has races in
    worker stopping code, which cause all kinds of panics and lockups when
    running btrfs/011 xfstest in a loop. The problem is that
    btrfs_stop_workers is unsynchronized with respect to check_idle_worker,
    check_busy_worker and __btrfs_start_workers.

    E.g., check_idle_worker race flow:

    btrfs_stop_workers():                    check_idle_worker(aworker):
    - grabs the lock
    - splices the idle list into the
      working list
    - removes the first worker from the
      working list
    - releases the lock to wait for
      its kthread's completion
                                             - grabs the lock
                                             - if aworker is on the working list,
                                               moves aworker from the working
                                               list to the idle list
                                             - releases the lock
    - grabs the lock
    - puts the worker
    - removes the second worker from the
      working list
    ......
    btrfs_stop_workers returns, aworker is on the idle list
    FS is umounted, memory is freed
    ......
    aworker is woken up, fireworks ensue

    With this applied, I wasn't able to trigger the problem in 48 hours,
    whereas previously I could reliably reproduce at least one of these
    races within an hour.

    Reported-by: David Sterba
    Signed-off-by: Ilya Dryomov
    Signed-off-by: Josef Bacik

    Ilya Dryomov
     
  • The crash[1] is found by xfstests/generic/208 with "-o compress";
    it's not reproduced every time, but it does panic.

    The bug is quite interesting, it's actually introduced by a recent commit
    (573aecafca1cf7a974231b759197a1aebcf39c2a,
    Btrfs: actually limit the size of delalloc range).

    Btrfs implements delay allocation, so during writeback, we
    (1) get a page A and lock it
    (2) search the state tree for delalloc bytes and lock all pages within the range
    (3) process the delalloc range, including find disk space and create
    ordered extent and so on.
    (4) submit the page A.

    It runs well in normal cases, but if we're in a racy case, e.g.
    buffered compressed writes and aio-dio writes,
    sometimes we may fail to lock all pages in the 'delalloc' range,
    in which case we need to fall back to searching the state tree again with
    a smaller range limit (max_bytes = PAGE_CACHE_SIZE - offset).

    The mentioned commit has a side effect: in the fallback case,
    we can find delalloc bytes before the index of the page we already have locked,
    so we end up in the case of (delalloc_end <= *start).

    This ends with not locking delalloc pages but making ->writepage still
    process them, and the crash happens.

    This fixes it by treating the search as having found nothing and returning
    to the caller, as the caller knows how to deal with it properly.

    [1]:
    ------------[ cut here ]------------
    kernel BUG at mm/page-writeback.c:2170!
    [...]
    CPU: 2 PID: 11755 Comm: btrfs-delalloc- Tainted: G O 3.11.0+ #8
    [...]
    RIP: 0010:[] [] clear_page_dirty_for_io+0x1e/0x83
    [...]
    [ 4934.248731] Stack:
    [ 4934.248731] ffff8801477e5dc8 ffffea00049b9f00 ffff8801869f9ce8 ffffffffa02b841a
    [ 4934.248731] 0000000000000000 0000000000000000 0000000000000fff 0000000000000620
    [ 4934.248731] ffff88018db59c78 ffffea0005da8d40 ffffffffa02ff860 00000001810016c0
    [ 4934.248731] Call Trace:
    [ 4934.248731] [] extent_range_clear_dirty_for_io+0xcf/0xf5 [btrfs]
    [ 4934.248731] [] compress_file_range+0x1dc/0x4cb [btrfs]
    [ 4934.248731] [] ? detach_if_pending+0x22/0x4b
    [ 4934.248731] [] async_cow_start+0x35/0x53 [btrfs]
    [ 4934.248731] [] worker_loop+0x14b/0x48c [btrfs]
    [ 4934.248731] [] ? btrfs_queue_worker+0x25c/0x25c [btrfs]
    [ 4934.248731] [] kthread+0x8d/0x95
    [ 4934.248731] [] ? kthread_freezable_should_stop+0x43/0x43
    [ 4934.248731] [] ret_from_fork+0x7c/0xb0
    [ 4934.248731] [] ? kthread_freezable_should_stop+0x43/0x43
    [ 4934.248731] Code: ff 85 c0 0f 94 c0 0f b6 c0 59 5b 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 2c de 00 00 49 89 c4 48 8b 03 a8 01 75 02 0b 4d 85 e4 74 52 49 8b 84 24 80 00 00 00 f6 40 20 01 75 44
    [ 4934.248731] RIP [] clear_page_dirty_for_io+0x1e/0x83
    [ 4934.248731] RSP
    [ 4934.280307] ---[ end trace 36f06d3f8750236a ]---

    Signed-off-by: Liu Bo
    Signed-off-by: Josef Bacik

    Liu Bo
     
  • If we crash with a log, remount and recover that log, and then crash before we
    can commit another transaction, we will get transid verify errors on the next
    mount. This is because we were not zeroing out the log when we committed the
    transaction after recovery. This is ok as long as we commit another transaction
    at some point in the future, but if you abort or something else goes wrong you
    can end up in this weird state because the recovery stuff says that the tree log
    should have a generation+1 of the super generation, which won't be the case for
    the transaction that was started for recovery. Fix this by removing the check
    and _always_ zeroing out the log portion of the super when we commit a transaction.
    This fixes the transid verify issues I was seeing with my force errors tests.
    Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     

21 Sep, 2013

26 commits

  • Users have been complaining of the uuid tree stuff warning that there is no uuid
    root when trying to do snapshot operations. This is because if you mount -o ro
    we will not create the uuid tree. But then if you mount -o rw,remount we will
    still not create it and then any subsequent snapshot/subvol operations you try
    to do will fail gloriously. Fix this by creating the uuid_root on remount rw if
    it was not already there. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • btrfs_ioctl_file_extent_same() uses __put_user_unaligned() to copy some data
    back to its argument struct. Unfortunately, not all architectures provide
    __put_user_unaligned(), so compiles break on them if btrfs is selected.

    Instead, just copy the whole struct in / out at the start and end of
    operations, respectively.

    Signed-off-by: Mark Fasheh
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Mark Fasheh
     
  • Commit 2bc5565286121d2a77ccd728eb3484dff2035b58 (Btrfs: don't update atime on
    RO subvolumes) ensures that the access time of an inode is not updated when
    the inode lives in a read-only subvolume.
    However, if a directory on a read-only subvolume is accessed, the atime is
    updated. This results in a write operation to a read-only subvolume. I
    believe that access times should never be updated on read-only subvolumes.

    To reproduce:

    # mkfs.btrfs -f /dev/dm-3
    (...)
    # mount /dev/dm-3 /mnt
    # btrfs subvol create /mnt/sub
    Create subvolume '/mnt/sub'
    # mkdir /mnt/sub/dir
    # echo "abc" > /mnt/sub/dir/file
    # btrfs subvol snapshot -r /mnt/sub /mnt/rosnap
    Create a readonly snapshot of '/mnt/sub' in '/mnt/rosnap'
    # stat /mnt/rosnap/dir
    File: `/mnt/rosnap/dir'
    Size: 8 Blocks: 0 IO Block: 4096 directory
    Device: 16h/22d Inode: 257 Links: 1
    Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
    Access: 2013-09-11 07:21:49.389157126 -0400
    Modify: 2013-09-11 07:22:02.330156079 -0400
    Change: 2013-09-11 07:22:02.330156079 -0400
    # ls /mnt/rosnap/dir
    file
    # stat /mnt/rosnap/dir
    File: `/mnt/rosnap/dir'
    Size: 8 Blocks: 0 IO Block: 4096 directory
    Device: 16h/22d Inode: 257 Links: 1
    Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
    Access: 2013-09-11 07:22:56.797151670 -0400
    Modify: 2013-09-11 07:22:02.330156079 -0400
    Change: 2013-09-11 07:22:02.330156079 -0400

    Reported-by: Koen De Wit
    Signed-off-by: Guangyu Sun
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Guangyu Sun
     
  • The kernel log entries for device label %s and device fsid %pU
    are missing the btrfs: prefix. Add those here.

    Signed-off-by: Frank Holton
    Reviewed-by: David Sterba
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Frank Holton
     
  • It's still possible to flip the filesystem into RW mode after it's
    remounted RO due to an abort. There are lots of places that check for
    the superblock error bit and will not write data, but we should not let
    the filesystem appear read-write.

    Signed-off-by: David Sterba
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    David Sterba
     
  • This patch makes it possible to set BTRFS_FS_TREE_OBJECTID as the default
    subvolume by passing a subvolume id of 0.

    Signed-off-by: chandan
    Reviewed-by: David Sterba
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    chandan
     
  • In btrfs_sync_file(), if the call to btrfs_log_dentry_safe() returns
    a negative error (e.g. -ENOMEM via btrfs_log_inode()), we would
    return without ending/freeing the transaction.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Filipe David Borba Manana
     
  • The BUG() was replaced by btrfs_error() and return -EIO with the
    patch "get rid of one BUG() in write_all_supers()", but the missing
    mutex_unlock() was overlooked.

    The 0-DAY kernel build service from Intel reported the missing
    unlock which was found by the coccinelle tool:

    fs/btrfs/disk-io.c:3422:2-8: preceding lock on line 3374

    Signed-off-by: Stefan Behrens
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Stefan Behrens
     
  • We don't do the iput when we fail to allocate our delayed delalloc work in
    __start_delalloc_inodes, fix this.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • This isn't used for anything anymore, just remove it.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • This is a leftover from how we used to wait for ordered extents, which was to
    grab the inode and then run filemap flush on it. However if we have an ordered
    extent then we already are holding a ref on the inode, and we just use
    btrfs_start_ordered_extent anyway, so there is no reason to have an extra ref on
    the inode to start work on the ordered extent. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • Forever ago I made the worst case calculator say that we could potentially split
    into 3 blocks for every level on the way down, which isn't right. If we split
    we're only going to get two new blocks, the one we originally cow'ed and the new
    one we're going to split. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • This reverts commit 70afa3998c9baed4186df38988246de1abdab56d. It is causing
    performance issues and wasn't actually correct. There were problems with the
    way we flushed delalloc and that was the real cause of the early enospc.
    Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • Various people have hit a deadlock when running btrfs/011. This is because when
    replacing nocow extents we will take the i_mutex to make sure nobody messes with
    the file while we are replacing the extent. The problem is we are already
    holding a transaction open, which is a locking inversion, so instead we need to
    save these inodes we find and then process them outside of the transaction.

    Further we can't just lock the inode and assume we are good to go. We need to
    lock the extent range and then read back the extent cache for the inode to make
    sure the extent really still points at the physical block we want. If it
    doesn't we don't have to copy it. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • So if we have dir_index items in the log that means we also have the inode item
    as well, which means that the inode's i_size is correct. However when we
    process dir_index'es we call btrfs_add_link() which will increase the
    directory's i_size for the new entry. To fix this we need to just set the dir
    item's i_size to 0, and then as we find dir_index items we adjust the i_size.
    btrfs_add_link() will do it for new entries, and if the entry already exists we
    can just add the name_len to the i_size ourselves. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • A user reported a bug where his log would not replay because he was getting
    -EEXIST back. This was because he had a file moved into a directory that was
    logged. What happens is the file had a lower inode number, and so it is
    processed first when replaying the log, and so we add the inode ref in for the
    directory it was moved to. But then we process the directory's DIR_INDEX item
    and try to add the inode ref for that inode and it fails because we already
    added it when we replayed the inode. To solve this problem we need to just
    process any DIR_INDEX items we have in the log first so this all is taken care
    of, and then we can replay the rest of the items. With this patch my reproducer
    can remount the file system properly instead of erroring out. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • Liu introduced a local copy of the last log commit for an inode to make sure we
    actually log an inode even if a log commit has already taken place. In order to
    make sure we didn't relog the same inode multiple times he set this local copy
    to the current trans when we log the inode, because usually we log the inode and
    then sync the log. The exception to this is during rename, we will relog an
    inode if the name changed and it is already in the log. The problem with this
    is then we go to sync the inode, and our check to see if the inode has already
    been logged is tripped and we don't sync the log. To fix this we need to _also_
    check against the roots last log commit, because it could be less than what is
    in our local copy of the log commit. This fixes a bug where we rename a file
    into a directory and then fsync the directory and then on remount the directory
    is no longer there. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • If you just create a directory and then fsync that directory and then pull the
    power plug you will come back up and the directory will not be there. That is
    because we won't actually create directories if we've logged files inside of
    them since they will be created on replay, but in this check we will set our
    logged_trans of our current directory if it happens to be a directory, making us
    think it doesn't need to be logged. Fix the logic to only do this to parent
    directories. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • So forever we have had this thing to limit the amount of delalloc pages we'll
    setup to be written out to 128mb. This is because we have to lock all the pages
    in this range, so anything above this gets a bit unwieldy, and also without a
    limit we'll happily allocate gigantic chunks of disk space. Turns out our check
    for this wasn't quite right, we wouldn't actually limit the chunk we wanted to
    write out, we'd just stop looking for more space after we went over the limit.
    So if you do a giant 20gb dd on my box with lots of ram I could get 2gig
    extents. This is fine normally, except when you go to relocate these extents
    and we can't find enough space to relocate these monster extents, since we have
    to be able to allocate exactly the same sized extent to move it around. So fix
    this by actually enforcing the limit. With this patch I'm no longer seeing
    giant 1.5gb extents. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • With the current code, if the requested size is very large and all the extents
    in the free space cache are small, we will waste lots of CPU time cutting
    the requested size in half and searching the cache again and again until it gets
    down to the size the allocator can return. In fact, we can know the max extent
    size in the cache after the first search, so we needn't cut the size in half
    repeatedly; we can just use the max extent size directly. This saves lots of
    CPU time and improves performance when there are only fragments
    in the free space cache.

    According to my test, if there are only 4KB free space extents in the fs,
    and the total size of those extents is 256MB, we can reduce the execution
    time of the following test from 5.4s to 1.4s.
    dd if=/dev/zero of= bs=1MB count=1 oflag=sync

    Changelog v2 -> v3:
    - fix the problem that we skip block groups whose free space is
    less than what we need.

    Changelog v1 -> v2:
    - address the problem that we return a wrong start position when searching
    the free space in a bitmap.

    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Miao Xie
     
  • Signed-off-by: David Sterba
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    David Sterba
     
  • We want to know if there are debugging features compiled in, this may
    affect performance. The message is printed before the sanity checks.

    (This commit message is a copy of David Sterba's commit message when
    he introduced btrfs_print_info()).

    Signed-off-by: Stefan Behrens
    Reviewed-by: David Sterba
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Stefan Behrens
     
  • Instead of removing the current inode from the red-black tree
    and then adding the new one, just use the red-black tree replace
    operation, which is more efficient.

    Signed-off-by: Filipe David Borba Manana
    Reviewed-by: Zach Brown
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Filipe David Borba Manana
     
  • If a replace was suspended by umount, the replace target device is added
    to the fs_devices->alloc_list during a later mount. This is obviously
    wrong. ->is_tgtdev_for_dev_replace is supposed to guard against that,
    but ->is_tgtdev_for_dev_replace is (and can only ever be) initialized
    *after* everything is opened and fs_devices lists are populated. Fix
    this by checking the devid instead: for replace targets it's always
    equal to BTRFS_DEV_REPLACE_DEVID.

    Cc: Stefan Behrens
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Stefan Behrens
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Ilya Dryomov
     
  • If we failed to actually allocate the correct size of the extent to relocate we
    will end up in an infinite loop because we won't return an error, we'll just
    move on to the next extent. So fix this up by returning an error, and then fix
    all the callers to return an error up the stack rather than BUG_ON()'ing.
    Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • Linux 3.11

    Chris Mason
     

03 Sep, 2013

4 commits


01 Sep, 2013

3 commits

  • When the binary search returns 0 (exact match), the target key
    will necessarily be at slot 0 of all nodes below the current one,
    so in this case the binary search is not needed because it will
    always return 0, and we waste time doing it, holding node locks
    for longer than necessary, etc.

    Below follow histograms with the times spent on the current approach of
    doing a binary search when the previous binary search returned 0, and
    times for the new approach, which directly picks the first item/child
    node in the leaf/node.

    Current approach:

    Count: 6682
    Range: 35.000 - 8370.000; Mean: 85.837; Median: 75.000; Stddev: 106.429
    Percentiles: 90th: 124.000; 95th: 145.000; 99th: 206.000
    35.000 - 61.080: 1235 ################
    61.080 - 106.053: 4207 #####################################################
    106.053 - 183.606: 1122 ##############
    183.606 - 317.341: 111 #
    317.341 - 547.959: 6 |
    547.959 - 8370.000: 1 |

    Approach proposed by this patch:

    Count: 6682
    Range: 6.000 - 135.000; Mean: 16.690; Median: 16.000; Stddev: 7.160
    Percentiles: 90th: 23.000; 95th: 27.000; 99th: 40.000
    6.000 - 8.418: 58 #
    8.418 - 11.670: 1149 #########################
    11.670 - 16.046: 2418 #####################################################
    16.046 - 21.934: 2098 ##############################################
    21.934 - 29.854: 744 ################
    29.854 - 40.511: 154 ###
    40.511 - 54.848: 41 #
    54.848 - 74.136: 5 |
    74.136 - 100.087: 9 |
    100.087 - 135.000: 6 |

    These samples were captured during a run of the btrfs tests 001, 002 and
    004 in the xfstests, with a leaf/node size of 4Kb.

    Signed-off-by: Filipe David Borba Manana
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Filipe David Borba Manana
     
  • We only need an async starter if we can't make a GFP_NOFS allocation in our
    current path. This is the case for the endio stuff since it happens in IRQ
    context, but things like the caching thread workers and the delalloc flushers we
    can easily make this allocation and start threads right away. Also change the
    worker count for the caching thread pool. Traditionally we limited this to 2
    since we took read locks while caching, but nowadays we do this lockless so
    there's no reason to limit the number of caching threads. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • This fixes a problem where if we fail a truncate we will leave the i_size set
    where we wanted to truncate to instead of where we were able to truncate to.
    Fix this by making btrfs_truncate_inode_items do the disk_i_size update as it
    removes extents, that way it will always be consistent with where its extents
    are. Then if the truncate fails at all we can update the in-ram i_size with
    what we have on disk and delete the orphan item. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik