Doug / smarc-fsl-linux-kernel | Embedian Git Server

02 Oct, 2011

6 commits

7a26285ee btrfs: use readahead API for scrub

Scrub uses a simple tree-enumeration to bring the relevant portions
of the extent- and csum-tree into the page cache before starting the
scrub-I/O. This is now replaced by using the new readahead-API.
During readahead the scrub is being accounted as paused, so it won't
hold off transaction commits.

This change raises the average disk bandwidth utilisation on my test
volume from 70% to 90%. On another volume, the time for a test run
went down from 89s to 43s.

Changes v5:
- reada1/2 are now of type struct reada_control *

Signed-off-by: Arne Jansen

Arne Jansen
2011-10-02 14:48:45 +0800
4bb31e928 btrfs: hooks for readahead

This adds the hooks needed for readahead. In the readpage_end_io_hook,
the extent state is checked for the EXTENT_READAHEAD flag. Only in this
case the readahead hook is called, to keep the impact on non-ra as low
as possible.
Additionally, a hook for a failed IO is added, otherwise readahead would
wait indefinitely for the extent to finish.

Changes for v2:
- eliminate race condition

Signed-off-by: Arne Jansen

Arne Jansen
2011-10-02 14:48:44 +0800
7414a03fb btrfs: initial readahead code and prototypes

This is the implementation for the generic read ahead framework.

To trigger a readahead, btrfs_reada_add must be called. It will start
a read ahead for the given range [start, end) on tree root. The returned
handle can either be used to wait on the readahead to finish
(btrfs_reada_wait), or to send it to the background (btrfs_reada_detach).

The read ahead works as follows:
On btrfs_reada_add, the root of the tree is inserted into a radix_tree.
reada_start_machine will then search for extents to prefetch and trigger
some reads. When a read finishes for a node, all contained node/leaf
pointers that lie in the given range will also be enqueued. The reads will
be triggered in sequential order, thus giving a big win over a naive
enumeration. It will also make use of multi-device layouts. Each disk
will have its own read pointer and all disks will be utilized in parallel.
Also, no two disks will read both sides of a mirror simultaneously, as this
would waste seeking capacity. Instead both disks will read different parts
of the filesystem.
Any number of readaheads can be started in parallel. The read order will be
determined globally, i.e. 2 parallel readaheads will normally finish faster
than the 2 started one after another.

Changes v2:
- protect root->node by transaction instead of node_lock
- fix missed branches:
The readahead had a too simple check to determine if a branch from
a node should be checked or not. It now also records the upper bound
of each node to see if the requested RA range lies within.
- use KERN_CONT to debug output, to avoid line breaks
- defer reada_start_machine to worker to avoid deadlock

Changes v3:
- protect root->node by rcu

Changes v5:
- changed EIO-semantics of reada_tree_block_flagged
- remove spin_lock from reada_control and make elems an atomic_t
- remove unused read_total from reada_control
- kill reada_key_cmp, use btrfs_comp_cpu_keys instead
- use kref-style release functions where possible
- return struct reada_control * instead of void * from btrfs_reada_add

Signed-off-by: Arne Jansen

Arne Jansen
2011-10-02 14:48:44 +0800
90519d66a btrfs: state information for readahead

Add state information for readahead to btrfs_fs_info and btrfs_device

Changes v2:
- don't wait in radix_trees
- add own set of workers for readahead

Reviewed-by: Josef Bacik
Signed-off-by: Arne Jansen

Arne Jansen
2011-10-02 14:48:30 +0800
ab0fff030 btrfs: add READAHEAD extent buffer flag

Add a READAHEAD extent buffer flag.
Add a function to trigger a read with this flag set.

Changes v2:
- use extent buffer flags instead of extent state flags

Changes v5:
- adapt to changed read_extent_buffer_pages interface
- don't return eb from reada_tree_block_flagged if it has CORRUPT flag set

Signed-off-by: Arne Jansen

Arne Jansen
2011-10-02 14:47:57 +0800
bb82ab88d btrfs: add an extra wait mode to read_extent_buffer_pages

read_extent_buffer_pages currently has two modes, either trigger a read
without waiting for anything, or wait for the I/O to finish. The former
also bails when it's unable to lock the page. This patch adds an
additional parameter that allows it to block on the page lock without waiting
for completion.

Changes v5:
- merge the 2 wait parameters into one and define WAIT_NONE, WAIT_COMPLETE and
WAIT_PAGE_LOCK

Change v6:
- fix bug introduced in v5

Signed-off-by: Arne Jansen

Arne Jansen
2011-10-02 14:47:55 +0800

01 Oct, 2011

2 commits

286d6e70a Merge branch 'btrfs-3.0' into for-linus

Chris Mason
2011-10-01 03:26:09 +0800
b6316429a Btrfs: force a page fault if we have a shorty copy on a page boundary

A user reported a problem where ceph was getting into 100% cpu usage while doing
some writing. It turns out it's because we were doing a short write on a not
uptodate page, which means we'd fall back to one page at a time and fault the
page in. The problem is our position is on the page boundary, so our fault in
logic wasn't actually reading the page, so we'd just spin forever or until the
page got read in by somebody else. This will force a readpage if we end up
doing a short copy. Alexandre could reproduce this easily with ceph and reports
it fixes his problem. I also wrote a reproducer that no longer hangs my box
with this patch. Thanks,

Reported-and-tested-by: Alexandre Oliva
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2011-10-01 03:23:54 +0800

21 Sep, 2011

2 commits

0a7a0519d Merge branch 'btrfs-3.0' into for-linus

Chris Mason
2011-09-21 02:49:29 +0800
b6f3409b2 Btrfs: reserve sufficient space for ioctl clone

Fix a crash/BUG_ON in the clone ioctl due to insufficient reservation. We
need to reserve space for:

- adjusting the old extent (possibly splitting it)
- adding the new extent
- updating the inode

Signed-off-by: Sage Weil
Signed-off-by: Chris Mason

Sage Weil
2011-09-21 02:48:51 +0800

18 Sep, 2011

7 commits

a66e7cc62 Btrfs: only clear the need lookup flag after the dentry is setup

We can race with readdir and the RCU path walking stuff. This is because we
clear the need lookup flag before actually instantiating the inode. This will
lead the RCU path walk stuff to find a dentry it thinks is valid without a
d_inode attached. So instead unhash the dentry when we first start the lookup,
and then clear the flag after we've instantiated the dentry so we're guaranteed
to either try the slow lookup, or have the d_inode set properly.

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2011-09-18 22:34:03 +0800
48802c8ae BTRFS: Fix lseek return value for error

The recent reworking of btrfs' lseek led to incorrect
values being returned. This adds checks for seeking
beyond EOF in SEEK_HOLE and makes sure the error
values come back correct.

Andi Kleen also sent in similar patches.

Signed-off-by: Jie Liu
Reported-by: Andi Kleen
Signed-off-by: Chris Mason

Jeff Liu
2011-09-18 22:34:02 +0800
2cf4ce7c2 Merge branch 'btrfs-3.0' into for-linus

Chris Mason
2011-09-18 22:31:44 +0800
dde820fbf Btrfs: don't change inode flag of the dest clone file

The dst file will have the same inode flags as the src file after
file clone, which I think is unexpected.

For example, the dst file will suddenly become immutable after
getting some share of data with src file, if the src is immutable.

Signed-off-by: Li Zefan
Signed-off-by: Chris Mason

Li Zefan
2011-09-18 22:20:46 +0800
0e7b824c4 Btrfs: don't make a file partly checksummed through file clone

To reproduce the bug:

# mount /dev/sda7 /mnt
# dd if=/dev/zero of=/mnt/src bs=4K count=1
# umount /mnt

# mount -o nodatasum /dev/sda7 /mnt
# dd if=/dev/zero of=/mnt/dst bs=4K count=1
# clone_range -s 4K -l 4K /mnt/src /mnt/dst

# echo 3 > /proc/sys/vm/drop_caches
# cat /mnt/dst
# dmesg
...
btrfs no csum found for inode 258 start 0
btrfs csum failed ino 258 off 0 csum 2566472073 private 0

It's because part of the file is checksummed and the other part is not,
and then btrfs will complain checksum is not found when we read the file.

Disallow file clone if the src and dst files have different checksum flags,
so we ensure a file is either completely checksummed or completely unchecksummed.

Signed-off-by: Li Zefan
Signed-off-by: Chris Mason

Li Zefan
2011-09-18 22:20:46 +0800
71ef07861 Btrfs: fix pages truncation in btrfs_ioctl_clone()

It's a bug in commit f81c9cdc567cd3160ff9e64868d9a1a7ee226480
(Btrfs: truncate pages from clone ioctl target range)

We should pass the dest range to the truncate function, but not the
src range.

Also move the function before locking extent state.

Signed-off-by: Li Zefan
Signed-off-by: Chris Mason

Li Zefan
2011-09-18 22:20:46 +0800
3765fefae btrfs: fix d_off in the first dirent

Since the d_off in the first dirent for "." (that originates from
the 4th argument "offset" of filldir() for the 2nd dirent for "..")
is wrongly assigned in btrfs_real_readdir(), telldir returns same
offset for different locations.

| # mkfs.btrfs /dev/sdb1
| # mount /dev/sdb1 fs0
| # cd fs0
| # touch file0 file1
| # ../test
| telldir: 0
| readdir: d_off = 2, d_name = "."
| telldir: 2
| readdir: d_off = 2, d_name = ".."
| telldir: 2
| readdir: d_off = 3, d_name = "file0"
| telldir: 3
| readdir: d_off = 2147483647, d_name = "file1"
| telldir: 2147483647

To fix this problem, pass filp->f_pos (which is loff_t) instead.

| # ../test
| telldir: 0
| readdir: d_off = 1, d_name = "."
| telldir: 1
| readdir: d_off = 2, d_name = ".."
| telldir: 2
| readdir: d_off = 3, d_name = "file0"
:

At the moment the "offset" for "." is unused because there is no
preceding dirent, however it is better to pass filp->f_pos to follow
grammatical usage.

Signed-off-by: Hidetoshi Seto
Signed-off-by: Chris Mason

Hidetoshi Seto
2011-09-18 22:20:46 +0800

13 Sep, 2011

8 commits

b6fd41e29 Linux 3.1-rc6

Linus Torvalds
2011-09-13 05:02:02 +0800
8cb3ed17c Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm: Remove duplicate "return" statement
drm/nv04/crtc: Bail out if FB is not bound to crtc
drm/nouveau: fix nv04_sgdma_bind on non-"4kB pages" archs
drm/nouveau: properly handle allocation failure in nouveau_sgdma_populate
drm/nouveau: fix oops on pre-semaphore hardware
drm/nv50/crtc: Bail out if FB is not bound to crtc
drm/radeon/kms: fix DP detect and EDID fetch for DP bridges

Linus Torvalds
2011-09-13 04:49:07 +0800
4c7527821 Merge branch 'fixes' of git://git.linaro.org/people/arnd/arm-soc

* 'fixes' of git://git.linaro.org/people/arnd/arm-soc:
ARM: CSR: add missing sentinels to of_device_id tables
ARM: cns3xxx: Fix newly introduced warnings in the PCIe code
ARM: cns3xxx: Fix compile error caused by hardware.h removed
ARM: davinci: fix cache flush build error
ARM: davinci: correct MDSTAT_STATE_MASK
ARM: davinci: da850 EVM: read mac address from SPI flash
OMAP: omap_device: fix !CONFIG_SUSPEND case in _noirq handlers
OMAP2430: hwmod: musb: add missing terminator to omap2430_usbhsotg_addrs[]
OMAP3: clock: indicate that gpt12_fck and wdt1_fck are in the WKUP clockdomain
OMAP4: clock: fix compile warning
OMAP4: clock: re-enable previous clockdomain enable/disable sequence
OMAP: clockdomain: Wait for powerdomain to be ON when using clockdomain force wakeup
OMAP: powerdomains: Make all powerdomain target states as ON at init

Linus Torvalds
2011-09-13 02:51:35 +0800
14d01ff53 ioctl: register LTTng ioctl

The LTTng 2.0 kernel tracer (stand-alone module package, available at
http://lttng.org) uses the 0xF6 ioctl range for tracer control and
transport operations.

Signed-off-by: Mathieu Desnoyers
Signed-off-by: Linus Torvalds

Mathieu Desnoyers
2011-09-13 02:50:56 +0800
0b001b2ed Merge branch 'for-linus' of git://github.com/chrismason/linux

* 'for-linus' of git://github.com/chrismason/linux:
Btrfs: add dummy extent if dst offset excceeds file end in
Btrfs: calc file extent num_bytes correctly in file clone
btrfs: xattr: fix attribute removal
Btrfs: fix wrong nbytes information of the inode
Btrfs: fix the file extent gap when doing direct IO
Btrfs: fix unclosed transaction handle in btrfs_cont_expand
Btrfs: fix misuse of trans block rsv
Btrfs: reset to appropriate block rsv after orphan operations
Btrfs: skip locking if searching the commit root in csum lookup
btrfs: fix warning in iput for bad-inode
Btrfs: fix an oops when deleting snapshots

Linus Torvalds
2011-09-13 02:47:49 +0800
5dfcc87fd fuse: fix memory leak

kmemleak is reporting that 32 bytes are being leaked by FUSE:

unreferenced object 0xe373b270 (size 32):
comm "fusermount", pid 1207, jiffies 4294707026 (age 2675.187s)
hex dump (first 32 bytes):
01 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[] kmemleak_alloc+0x27/0x50
[] kmem_cache_alloc+0xc5/0x180
[] fuse_alloc_forget+0x1e/0x20
[] fuse_alloc_inode+0xb0/0xd0
[] alloc_inode+0x1c/0x80
[] iget5_locked+0x8f/0x1a0
[] fuse_iget+0x72/0x1a0
[] fuse_get_root_inode+0x8a/0x90
[] fuse_fill_super+0x3ef/0x590
[] mount_nodev+0x3f/0x90
[] fuse_mount+0x15/0x20
[] mount_fs+0x1c/0xc0
[] vfs_kern_mount+0x41/0x90
[] do_kern_mount+0x39/0xd0
[] do_mount+0x2e5/0x660
[] sys_mount+0x66/0xa0

This leak report is consistent and happens once per boot on
3.1.0-rc5-dirty.

This happens if a FORGET request is queued after the fuse device was
released.

Reported-by: Sitsofe Wheeler
Signed-off-by: Miklos Szeredi
Tested-by: Sitsofe Wheeler
Signed-off-by: Linus Torvalds

Miklos Szeredi
2011-09-13 02:47:10 +0800
24114504c fuse: fix flock breakage

Commit 37fb3a30b4 ("fuse: fix flock") added in 3.1-rc4 caused flock() to
fail with ENOSYS with the kernel ABI version 7.16 or earlier.

Fix by falling back to testing FUSE_POSIX_LOCKS for ABI versions 7.16
and earlier.

Reported-by: Martin Ziegler
Signed-off-by: Miklos Szeredi
Tested-by: Martin Ziegler
Signed-off-by: Linus Torvalds

Miklos Szeredi
2011-09-13 02:47:10 +0800
15ce92861 Merge branch 'for_3.1/pm-fixes-2' of git://gitorious.org/khilman/linux-omap-pm into fixes

Arnd Bergmann
2011-09-13 02:30:22 +0800

12 Sep, 2011

3 commits

d035953e5 Merge branch 'sirf/fixes' into fixes

Arnd Bergmann
2011-09-12 22:59:37 +0800
87adf1c66 Merge branch 'v4l_for_linus' of git://linuxtv.org/mchehab/for_linus

* 'v4l_for_linus' of git://linuxtv.org/mchehab/for_linus:
[media] vp7045: fix buffer setup
[media] nuvoton-cir: simplify raw IR sample handling
[media] [Resend] viacam: Don't explode if pci_find_bus() returns NULL
[media] v4l2: Fix documentation of the codec device controls
[media] gspca - sonixj: Fix the darkness of sensor om6802 in 320x240
[media] gspca - sonixj: Fix wrong register mask for sensor om6802
[media] gspca - ov519: Fix LED inversion of some ov519 webcams
[media] pwc: precedence bug in pwc_init_controls()

Linus Torvalds
2011-09-12 05:58:47 +0800
14f69ec70 Merge branch 'for-linus' of git://openrisc.net/~jonas/linux

* 'for-linus' of git://openrisc.net/~jonas/linux:
Add missing DMA ops
openrisc: don't use pt_regs in struct sigcontext

Linus Torvalds
2011-09-12 05:55:43 +0800

11 Sep, 2011

12 commits

d525e8ab0 Btrfs: add dummy extent if dst offset excceeds file end in ...

You can see there's no file extent with range [0, 4096]. Check this by
btrfsck:

# btrfsck /dev/sda7
root 5 inode 258 errors 100
...

Signed-off-by: Li Zefan
Signed-off-by: Chris Mason

Li Zefan
2011-09-11 22:52:25 +0800
d72c0842f Btrfs: calc file extent num_bytes correctly in file clone

num_bytes should be 4096 not 12288.

Signed-off-by: Li Zefan
Signed-off-by: Chris Mason

Li Zefan
2011-09-11 22:52:25 +0800
4815053ab btrfs: xattr: fix attribute removal

An attribute is not removed by 'setfattr -x attr file' and remains
visible in attr list. This makes xfstests/062 pass again.

Signed-off-by: David Sterba
Signed-off-by: Chris Mason

David Sterba
2011-09-11 22:52:25 +0800
a39f75214 Btrfs: fix wrong nbytes information of the inode

If we write some data into a hole in a file (with no preallocation for this
hole), Btrfs allocates disk space and updates the inode's nbytes, but the
other field, disk_i_size, does not need to be updated. In this case we must
still update the inode metadata even though disk_i_size is unchanged
(btrfs_ordered_update_i_size() returns 1).

# mkfs.btrfs /dev/sdb1
# mount /dev/sdb1 /mnt
# touch /mnt/a
# truncate -s 856002 /mnt/a
# dd if=/dev/zero of=/mnt/a bs=4K count=1 conv=nocreat,notrunc
# umount /mnt
# btrfsck /dev/sdb1
root 5 inode 257 errors 400
found 32768 bytes used err is 1

Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Miao Xie
2011-09-11 22:52:25 +0800
0c1a98c81 Btrfs: fix the file extent gap when doing direct IO

When we write some data to a location beyond the end of the file in
direct I/O mode, a data hole is created, and Btrfs should insert a file
extent item that points to this hole into the fs tree. Unfortunately,
Btrfs forgets to do so.

The following is a simple way to reproduce it:
# mkfs.btrfs /dev/sdc2
# mount /dev/sdc2 /test4
# touch /test4/a
# dd if=/dev/zero of=/test4/a seek=8 count=1 bs=4K oflag=direct conv=nocreat,notrunc
# umount /test4
# btrfsck /dev/sdc2
root 5 inode 257 errors 100

Reported-by: Tsutomu Itoh
Signed-off-by: Miao Xie
Tested-by: Tsutomu Itoh
Signed-off-by: Chris Mason

Miao Xie
2011-09-11 22:52:24 +0800
5b397377e Btrfs: fix unclosed transaction handle in btrfs_cont_expand

The function btrfs_cont_expand() forgot to close the transaction handle before
jumping out of the while loop. Fix it.

Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Miao Xie
2011-09-11 22:52:24 +0800
98c9942ac Btrfs: fix misuse of trans block rsv

At the beginning of create_pending_snapshot, trans->block_rsv is set
to pending->block_rsv and is used for snapshot things, however, when
it is done, we do not restore the original value.

Signed-off-by: Liu Bo
Signed-off-by: Chris Mason

Liu Bo
2011-09-11 22:52:24 +0800
65450aa64 Btrfs: reset to appropriate block rsv after orphan operations

While truncating free space cache, we forget to change trans->block_rsv
back to the original one, but leave it with the orphan_block_rsv, and
then, with the inode_cache option enabled, this leads to countless warnings of
btrfs_alloc_free_block and btrfs_orphan_commit_root:

WARNING: at fs/btrfs/extent-tree.c:5711 btrfs_alloc_free_block+0x180/0x350 [btrfs]()
...
WARNING: at fs/btrfs/inode.c:2193 btrfs_orphan_commit_root+0xb0/0xc0 [btrfs]()

Signed-off-by: Liu Bo
Signed-off-by: Chris Mason

Liu Bo
2011-09-11 22:52:24 +0800
ddf23b3fc Btrfs: skip locking if searching the commit root in csum lookup

It's not enough to just search the commit root, since we could be cow'ing the
very block we need to search through, which would mean that it's locked and we'll
still deadlock. So use path->skip_locking as well. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2011-09-11 22:52:24 +0800
e0b6d65be btrfs: fix warning in iput for bad-inode

iput() shouldn't be called for inodes in I_NEW state.
We need to mark inode as constructed first.

WARNING: at fs/inode.c:1309 iput+0x20b/0x210()
Call Trace:
[] warn_slowpath_common+0x7a/0xb0
[] warn_slowpath_null+0x15/0x20
[] iput+0x20b/0x210
[] btrfs_iget+0x1eb/0x4a0
[] btrfs_run_defrag_inodes+0x136/0x210
[] cleaner_kthread+0x17f/0x1a0
[] ? sub_preempt_count+0x9d/0xd0
[] ? transaction_kthread+0x280/0x280
[] kthread+0x96/0xa0
[] kernel_thread_helper+0x4/0x10
[] ? kthread_worker_fn+0x190/0x190
[] ? gs_change+0xb/0xb

Signed-off-by: Sergei Trofimovich
CC: Konstantin Khlebnikov
Tested-by: David Sterba
CC: Josef Bacik
CC: Chris Mason
Signed-off-by: Chris Mason

Sergei Trofimovich
2011-09-11 22:52:24 +0800
14c7cca78 Btrfs: fix an oops when deleting snapshots

We can reproduce this oops via the following steps:

$ mkfs.btrfs /dev/sdb7
$ mount /dev/sdb7 /mnt/btrfs
$ for ((i=0; i [...]

[...] i_ino
to BTRFS_EMPTY_SUBVOL_DIR_OBJECTID instead of BTRFS_FIRST_FREE_OBJECTID,
while the snapshot's location.objectid remains unchanged.

However, btrfs_ino() does not take this into account, and returns a wrong ino,
and causes the oops.

Signed-off-by: Liu Bo
Signed-off-by: Chris Mason

Liu Bo
2011-09-11 22:52:24 +0800
fc61ccd35 [media] vp7045: fix buffer setup

dvb_usb_device_init calls the frontend_attach method of this driver which
uses vp7045_usb_op. In order to have a buffer ready in vp7045_usb_op, it has to
be allocated before that happens.

Luckily we can use the whole private data as the buffer as it gets separately
allocated on the heap via kzalloc in dvb_usb_device_init and is thus apt for
use via usb_control_msg.

This fixes a
BUG: unable to handle kernel paging request at 0000000000001e78

reported by Tino Keitel and diagnosed by Dan Carpenter.

Cc: stable@kernel.org # For v3.0 and upper
Tested-by: Tino Keitel
Signed-off-by: Florian Mickler
Signed-off-by: Mauro Carvalho Chehab

Florian Mickler
2011-09-11 20:33:41 +0800