Eric Lee / smarc-fsl-linux-kernel

19 Mar, 2010

5 commits

441f4058a Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable

* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (30 commits)
Btrfs: fix the inode ref searches done by btrfs_search_path_in_tree
Btrfs: allow treeid==0 in the inode lookup ioctl
Btrfs: return keys for large items to the search ioctl
Btrfs: fix key checks and advance in the search ioctl
Btrfs: buffer results in the space_info ioctl
Btrfs: use __u64 types in ioctl.h
Btrfs: fix search_ioctl key advance
Btrfs: fix gfp flags masking in the compression code
Btrfs: don't look at bio flags after submit_bio
btrfs: using btrfs_stack_device_id() get devid
btrfs: use memparse
Btrfs: add a "df" ioctl for btrfs
Btrfs: cache the extent state everywhere we possibly can V2
Btrfs: cache ordered extent when completing io
Btrfs: cache extent state in find_delalloc_range
Btrfs: change the ordered tree to use a spinlock instead of a mutex
Btrfs: finish read pages in the order they are submitted
btrfs: fix btrfs_mkdir goto for no free objectids
Btrfs: flush data on snapshot creation
Btrfs: make df be a little bit more understandable
...

Linus Torvalds
2010-03-19 07:50:55 +0800
8ad6fcab5 Btrfs: fix the inode ref searches done by btrfs_search_path_in_tree

This is used by the inode lookup ioctl to follow all the backrefs up
to the subvol root. But the search being done would sometimes land one
past the last item in the leaf instead of finding the backref.

This changes the search to look for the highest possible backref and hop
back one item. It also fixes a leaked path on failure to find the root.

Signed-off-by: Chris Mason

Chris Mason
2010-03-19 00:23:10 +0800
1b53ac4d1 Btrfs: allow treeid==0 in the inode lookup ioctl

When a root id of 0 is sent to the inode lookup ioctl, it will
use the root of the file we're ioctling and pass the root id
back to userland along with the results.

This allows userland to do searches based on that root later on.

Signed-off-by: Chris Mason

Chris Mason
2010-03-19 00:17:05 +0800
90fdde147 Btrfs: return keys for large items to the search ioctl

The search ioctl was skipping large items entirely (ones that are too
big for the results buffer). This changes things to at least copy
the item header so that we can send information about the item back to
userland.

Signed-off-by: Chris Mason

Chris Mason
2010-03-19 00:14:54 +0800
abc6e1341 Btrfs: fix key checks and advance in the search ioctl

The search ioctl was working well for finding tree roots, but using it for
generic searches requires a few changes to how the keys are advanced.
This treats the search control min fields for objectid, type and offset
more like a key, where we drop the offset to zero once we bump the type,
etc.

The downside of this is that we are changing the min_type and min_offset
fields during the search, and so the ioctl caller needs extra checks to make sure
the keys in the result are the ones it wanted.

This also changes key_in_sk to use btrfs_comp_cpu_keys, just to make
things more readable.
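
The key-style advance described above can be sketched in plain C. This is a minimal userspace illustration, not the kernel code: `struct sketch_key` and the `sketch_*` helpers are hypothetical names, and only the (objectid, type, offset) ordering and the "reset offset when the type is bumped" rule come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a btrfs key: ordered by (objectid, type, offset). */
struct sketch_key {
    uint64_t objectid;
    uint8_t  type;
    uint64_t offset;
};

/* Three-way compare in (objectid, type, offset) order. */
static int sketch_comp_keys(const struct sketch_key *a,
                            const struct sketch_key *b)
{
    if (a->objectid != b->objectid)
        return a->objectid < b->objectid ? -1 : 1;
    if (a->type != b->type)
        return a->type < b->type ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}

/* Inclusive range check built on the three-way compare. */
static int sketch_key_in_range(const struct sketch_key *k,
                               const struct sketch_key *min,
                               const struct sketch_key *max)
{
    return sketch_comp_keys(k, min) >= 0 && sketch_comp_keys(k, max) <= 0;
}

/*
 * Move *k to the smallest key greater than its current value.
 * Bumping the type drops the offset to zero; bumping the objectid
 * drops both type and offset. Returns 0 when no larger key exists.
 */
static int sketch_advance_key(struct sketch_key *k)
{
    if (k->offset < UINT64_MAX) {
        k->offset++;
        return 1;
    }
    if (k->type < UINT8_MAX) {
        k->type++;
        k->offset = 0;
        return 1;
    }
    if (k->objectid < UINT64_MAX) {
        k->objectid++;
        k->type = 0;
        k->offset = 0;
        return 1;
    }
    return 0;
}
```

Because the minimum key moves during the search, a caller in this sketch would still run `sketch_key_in_range()` on each returned key before trusting it.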

Signed-off-by: Chris Mason

Chris Mason
2010-03-19 00:10:08 +0800

17 Mar, 2010

3 commits

7fde62bff Btrfs: buffer results in the space_info ioctl

The space_info ioctl was using copy_to_user inside rcu_read_lock. This
commit changes things to copy into a buffer first and then dump the
result down to userland.

Signed-off-by: Chris Mason

Chris Mason
2010-03-17 03:40:10 +0800
ce769a290 Btrfs: use __u64 types in ioctl.h

Signed-off-by: Sage Weil
Signed-off-by: Chris Mason

Sage Weil
2010-03-17 02:24:27 +0800
854d2c353 Btrfs: fix search_ioctl key advance

key->type is u8, not u64.

fs/btrfs/ioctl.c: In function 'copy_to_sk':
fs/btrfs/ioctl.c:1024: warning: comparison is always true due to limited range of data type
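
A small sketch of why the compiler warned, assuming the check compared an 8-bit type field against a 64-bit maximum (the patched line itself is not shown here). `sketch_bump_type` is a hypothetical helper; the point is only that the bound has to match the field's width.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bump an 8-bit key type if it has room to grow.
 * Comparing a uint8_t against (uint64_t)-1 is always true, which is
 * what the warning points out; the bound must be the 8-bit maximum.
 */
static int sketch_bump_type(uint8_t *type)
{
    if (*type < UINT8_MAX) {
        (*type)++;
        return 1;
    }
    return 0; /* already at the largest representable type */
}
```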

Signed-off-by: Sage Weil
Signed-off-by: Chris Mason

Sage Weil
2010-03-17 02:24:27 +0800

15 Mar, 2010

23 commits

ef5780c01 Btrfs: fix gfp flags masking in the compression code

GFP_FS must be masked out, NOFS can't be or'd in.
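
The difference between masking a flag out and or-ing a flag set in can be shown with a toy bitmask. The `SKETCH_*` values below are made up for illustration and are not the kernel's real GFP bit values; the sketch only assumes that the NOFS set is the KERNEL set minus the FS bit.

```c
#include <assert.h>

/* Hypothetical bit values, for illustration only. */
#define SKETCH_GFP_IO     0x1u
#define SKETCH_GFP_FS     0x2u
#define SKETCH_GFP_WAIT   0x4u
#define SKETCH_GFP_KERNEL (SKETCH_GFP_WAIT | SKETCH_GFP_IO | SKETCH_GFP_FS)
#define SKETCH_GFP_NOFS   (SKETCH_GFP_WAIT | SKETCH_GFP_IO)

/* Correct: clear the FS bit so the allocation cannot recurse into the fs. */
static unsigned int sketch_mask_out_fs(unsigned int mask)
{
    return mask & ~SKETCH_GFP_FS;
}

/* Ineffective: or-ing in NOFS only adds bits, so the FS bit survives. */
static unsigned int sketch_or_in_nofs(unsigned int mask)
{
    return mask | SKETCH_GFP_NOFS;
}
```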

Signed-off-by: Chris Mason

Nick Piggin
2010-03-15 23:05:57 +0800
5ff7ba3a7 Btrfs: don't look at bio flags after submit_bio

After calling submit_bio, the bio can be freed at any time. The
btrfs submission thread helper was checking the bio flags too late,
which might not give the correct answer.

When CONFIG_DEBUG_PAGE_ALLOC is turned on, it can lead to oopsen.
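
The ordering rule can be sketched in userspace C: read whatever is needed from the object before handing it to a function that may free it. `struct sketch_bio`, `SKETCH_SYNC` and the helpers are hypothetical stand-ins, not the kernel's bio API.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_SYNC 0x1ul /* hypothetical flag bit */

/* Hypothetical stand-in for a bio; only the flags matter here. */
struct sketch_bio {
    unsigned long flags;
};

typedef void (*sketch_submit_fn)(struct sketch_bio *bio);

/* Simulates a submission path that completes and frees the bio at once. */
static void sketch_submit_and_free(struct sketch_bio *bio)
{
    free(bio);
}

/*
 * Sample the flags first, then submit. After submit() returns, the bio
 * must be treated as gone, so only the saved copy is used.
 */
static int sketch_submit_report_sync(struct sketch_bio *bio,
                                     sketch_submit_fn submit)
{
    int sync = (bio->flags & SKETCH_SYNC) != 0;

    submit(bio);
    return sync;
}
```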

Signed-off-by: Chris Mason

Chris Mason
2010-03-15 23:00:15 +0800
a343832f1 btrfs: using btrfs_stack_device_id() get devid

We can use btrfs_stack_device_id() to get dev_item->devid

Signed-off-by: Xiao Guangrong
Signed-off-by: Chris Mason

Xiao Guangrong
2010-03-15 23:00:14 +0800
91748467a btrfs: use memparse

Use memparse() instead of its own private implementation.
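
For readers unfamiliar with memparse(), here is a rough userspace approximation of what such a size parser does: read a number, then scale it by an optional K/M/G suffix. `sketch_memparse` is a hypothetical name and this is a simplified sketch, not the kernel's implementation.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Parse a size such as "4k", "2M" or "1G" into bytes.
 * If end is non-NULL, *end is set to the first unparsed character.
 */
static unsigned long long sketch_memparse(const char *s, char **end)
{
    char *p;
    unsigned long long v = strtoull(s, &p, 0);

    switch (*p) {
    case 'G':
    case 'g':
        v <<= 10;
        /* fall through */
    case 'M':
    case 'm':
        v <<= 10;
        /* fall through */
    case 'K':
    case 'k':
        v <<= 10;
        p++; /* consume the suffix */
        break;
    default:
        break;
    }
    if (end)
        *end = p;
    return v;
}
```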

Signed-off-by: Akinobu Mita
Cc: Chris Mason
Cc: linux-btrfs@vger.kernel.org
Signed-off-by: Chris Mason

Akinobu Mita
2010-03-15 23:00:14 +0800
1406e4327 Btrfs: add a "df" ioctl for btrfs

df is a very loaded question in btrfs. This gives us a way to get the per-space
usage information so we can tell exactly what is in use where. This will help
us figure out ENOSPC problems, and help users better understand where their disk
space is going.

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2010-03-15 23:00:14 +0800
2ac55d41b Btrfs: cache the extent state everywhere we possibly can V2

This patch just goes through and fixes everybody that does

lock_extent()
blah
unlock_extent()

to use

lock_extent_bits()
blah
unlock_extent_cached()

and pass around an extent_state so we only have to do the searches once per
function. This gives me about a 3 mb/s boost on my random write test. I have
not converted some things, like the relocation and ioctl's, since they aren't
heavily used and the relocation stuff is in the middle of being re-written. I
also changed the clear_extent_bit() to only unset the cached state if we are
clearing EXTENT_LOCKED and related stuff, so we can do things like this

lock_extent_bits()
clear delalloc bits
unlock_extent_cached()

without losing our cached state. I tested this thoroughly and turned on
LEAK_DEBUG to make sure we weren't leaking extent states, everything worked out
fine.

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2010-03-15 23:00:13 +0800
5a1a3df1f Btrfs: cache ordered extent when completing io

When finishing io we run btrfs_dec_test_ordered_pending, and then immediately
run btrfs_lookup_ordered_extent, but btrfs_dec_test_ordered_pending does that
already, so we're searching twice when we don't have to. This patch lets us
pass a btrfs_ordered_extent in to btrfs_dec_test_ordered_pending so if we do
complete io on that ordered extent we can just use the one we found then instead
of having to do another btrfs_lookup_ordered_extent. This made my fio job with
the other patch go from 24 mb/s to 29 mb/s.

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2010-03-15 23:00:13 +0800
c2a128d28 Btrfs: cache extent state in find_delalloc_range

This patch makes us cache the extent state we find in find_delalloc_range since
we'll have to lock the extent later on in the function. This will keep us from
re-searching for the range when we try to lock the extent.

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2010-03-15 23:00:13 +0800
49958fd7d Btrfs: change the ordered tree to use a spinlock instead of a mutex

The ordered tree used to need a mutex, but currently all we use it for is to
protect the rb_tree, and a spin_lock is just fine for that. Using a spin_lock
instead makes dbench run a little faster, 58 mb/s instead of 51 mb/s, and have
less latency, 3445.138 ms instead of 3820.633 ms.

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2010-03-15 23:00:12 +0800
4125bf761 Btrfs: finish read pages in the order they are submitted

The endio is done at reverse order of bio vectors.

That means for a sequential read, the page first submitted will finish
last in a bio. Considering we will do checksum (making cache hot) for
every page, this does introduce delay (and chance to squeeze cache used
soon) for pages submitted at the beginning.

I don't observe obvious performance difference with below patch at my
simple test, but seems more natural to finish read in the order they are
submitted.

Signed-off-by: Shaohua Li
Signed-off-by: Chris Mason

Chris Mason
2010-03-15 23:00:12 +0800
0be2e9817 btrfs: fix btrfs_mkdir goto for no free objectids

btrfs_mkdir() must jump to the place of ending transaction after
btrfs_find_free_objectid() failed. Or this transaction can't end.
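
The shape of this fix is the usual goto-cleanup pattern: once a transaction has been started, every failure path has to pass through the label that ends it. The sketch below uses a counter in place of a real transaction; `sketch_mkdir` and `sketch_open_transactions` are hypothetical names, and the `-ENOSPC` return value is an assumption for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Counts transactions that were started but not yet ended. */
static int sketch_open_transactions;

/*
 * objectid_lookup_fails simulates the "no free objectids" case.
 * The error path jumps to the label that ends the transaction rather
 * than to a later label that would skip that step.
 */
static int sketch_mkdir(int objectid_lookup_fails)
{
    int err = 0;

    sketch_open_transactions++; /* start the transaction */

    if (objectid_lookup_fails) {
        err = -ENOSPC;
        goto out_end_trans;
    }

    /* ... the directory inode would be created here ... */

out_end_trans:
    sketch_open_transactions--; /* always end the transaction */
    return err;
}
```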

Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Miao Xie
2010-03-15 23:00:11 +0800
0bdb1db29 Btrfs: flush data on snapshot creation

Flush any delalloc extents when we create a snapshot, so that recently
written file data is always included in the snapshot.

A later commit will add the ability to snapshot without the flush, but
most people expect flushing.

Signed-off-by: Sage Weil
Signed-off-by: Chris Mason

Sage Weil
2010-03-15 23:00:11 +0800
bd4d10888 Btrfs: make df be a little bit more understandable

The way we report df usage is way confusing for everybody, including some other
utilities (bacula for one). So this patch makes df a little bit more
understandable. First we make used actually count the total amount of used
space in all space info's. This will give us a real view of how much disk space
is in use. Second, for blocks available, only count data space. This makes
things like bacula work because it says 0 when you can no longer write any more
data to the disk. I think this is a nice compromise, since you will end up with
something like the following

[root@alpha ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
                      148G   30G  111G  21% /
/dev/sda1             194M  116M   68M  64% /boot
tmpfs                 985M   12K  985M   1% /dev/shm
/dev/mapper/VolGroup-LogVol02
                      145G  140G     0 100% /mnt/btrfs-test

Compare this with btrfsctl -i output

[root@alpha btrfs-progs-unstable]# ./btrfsctl -i /mnt/btrfs-test/
Metadata, DUP: total=4.62GB, used=2.46GB
System, DUP: total=8.00MB, used=24.00KB
Data: total=134.80GB, used=134.80GB
Metadata: total=8.00MB, used=0.00
System: total=4.00MB, used=0.00
operation complete

This way we show that there is no more data space to be used, but we have
another 5GB of space left for metadata. Thanks,
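
The two accounting rules in this message (used is summed over every space info, available counts data space only) can be sketched as follows. `struct sketch_space_info` and `sketch_df` are hypothetical, and the sketch deliberately ignores details such as DUP replication and unallocated device space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one space info. */
struct sketch_space_info {
    int      is_data; /* nonzero for data, zero for metadata/system */
    uint64_t total;
    uint64_t used;
};

/*
 * used:  sum of used space across all space infos.
 * avail: free space in data space infos only, so it reaches zero
 *        as soon as no more file data can be written.
 */
static void sketch_df(const struct sketch_space_info *infos, size_t n,
                      uint64_t *used, uint64_t *avail)
{
    size_t i;

    *used = 0;
    *avail = 0;
    for (i = 0; i < n; i++) {
        *used += infos[i].used;
        if (infos[i].is_data)
            *avail += infos[i].total - infos[i].used;
    }
}
```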

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2010-03-15 23:00:11 +0800
3a0524dc0 btrfs: Update existing btrfs_device for renaming device

When we scan devices in a multi-device filesystem, we memorize the original
name. If the device gets a new name, later scans don't update the
in-kernel structures related to it, and we're not able to mount the
filesystem.

This patch updates the device name during scanning.

Signed-off-by: TARUISI Hiroaki
Signed-off-by: Chris Mason

TARUISI Hiroaki
2010-03-15 23:00:10 +0800
1e701a329 Btrfs: add new defrag-range ioctl.

The btrfs defrag ioctl was limited to doing the entire file. This
commit adds a new interface that can defrag a specific range inside
the file.

It can also force compression on the file, allowing you to selectively
compress individual files after they were created, even when mount -o
compress isn't turned on.

Signed-off-by: Chris Mason

Chris Mason
2010-03-15 23:00:10 +0800
940100a4a Btrfs: be more selective in the defrag ioctl

The btrfs defrag ioctl had some bugs around delalloc accounting, and it
wasn't properly skipping pages that were not in the mapping.

It wasn't properly clearing the page checked flag, which could make the
writeback code ignore the page forever while pinning it as dirty.

This commit fixes those problems and makes defrag a little smarter. It
skips holes and it doesn't waste time defragging large extents. If a
tiny extent comes before a very large extent, it will defrag both of
them to make sure the tiny extent ends up next to something big.

Signed-off-by: Chris Mason

Chris Mason
2010-03-15 23:00:10 +0800
51684082b Btrfs: run the backing dev more often in the submit_bio helper

The submit_bio helper thread can decide to loop back around to
service more bios. This commit forces it to unplug first, which helps
reduce the latency seen by submitters.

Signed-off-by: Chris Mason

Chris Mason
2010-03-15 23:00:09 +0800
4849f01d1 Btrfs: make subvolid=0 mount the original default root

Since there's not a good way to make sure the user sees the original default root
tree id, and not to mention it's 5 so is way different than any other volume,
just make subvol=0 mount the original default root. This makes it a bit easier
for users to handle in the long run. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2010-03-15 23:00:09 +0800
6ef5ed0d3 Btrfs: add ioctl and incompat flag to set the default mount subvol

This patch needs to go along with my previous patch. This lets us set the
default dir item's location to whatever root we want to use as our default
mounting subvol. With this we don't have to use mount -o subvol=
anymore to mount a different subvol, we can just set the new one and it will
just magically work. I've done some moderate testing with this, mostly just
switching the default mount around, mounting subvols and the default mount at
the same time and such, everything seems to work. Thanks,

Older kernels would generally be able to still mount the filesystem with the
default subvolume set, but it would result in a different volume being mounted,
which could be an even more unpleasant surprise for users. So if you set your
default subvolume, you can't go back to older kernels. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2010-03-15 23:00:08 +0800
73f73415c Btrfs: change how we mount subvolumes

This work is in preparation for being able to set a different root as the
default mounting root.

There is currently a problem with how we mount subvolumes. We cannot currently
mount a subvolume of a subvolume, you can only mount subvolumes/snapshots of the
default subvolume. So say you take a snapshot of the default subvolume and call
it snap1, and then take a snapshot of snap1 and call it snap2, so now you have

/
/snap1
/snap1/snap2

as your available volumes. Currently you can only mount / and /snap1,
you cannot mount /snap1/snap2. To fix this problem instead of passing
subvolid=<name> you must pass in subvolid=<treeid>, where <treeid> is
the tree id that gets spit out via the subvolume listing you get from
the subvolume listing patches (btrfs filesystem list). This allows us
to mount /, /snap1 and /snap1/snap2 as the root volume.

In addition to the above, we also now read the default dir item in the
tree root to get the root key that it points to. For now this just
points at what has always been the default subvolume, but later on I plan
to change it to point at whatever root you want to be the new default
root, so you can just set the default mount and not have to mount with
-o subvolid=<treeid>. I tested this out with the above scenario and it
worked perfectly. Thanks,

mount -o subvol operates inside the selected subvolid. For example:

mount -o subvol=snap1,subvolid=256 /dev/xxx /mnt

/mnt will have the snap1 directory for the subvolume with id
256.

mount -o subvol=snap /dev/xxx /mnt

/mnt will be the snap directory of whatever the default subvolume
is.

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2010-03-15 22:58:13 +0800
12534832c Btrfs: make set/get functions for the super compat_ro flags use compat_ro

Our set/get functions for compat_ro_flags actually look at compat_flags. This
will mess any attempt to use compat flags up. The fix is obvious. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2010-03-15 22:55:10 +0800
ac8e9819d Btrfs: add search and inode lookup ioctls

The search ioctl is a generic tool for doing btree searches from
userland applications. The first user of the search ioctl is a
subvolume listing feature, but we'll also use it to find new
files in a subvolume.

The search ioctl allows you to specify min and max keys to search for,
along with min and max transid. It returns the items along with a
header that includes the item key.

Signed-off-by: Chris Mason

Chris Mason
2010-03-15 22:55:10 +0800
98d377a08 Btrfs: add a function to lookup a directory path by following backrefs

This will be used by the inode lookup ioctl.

Signed-off-by: TARUISI Hiroaki
Signed-off-by: Chris Mason

TARUISI Hiroaki
2010-03-15 22:55:09 +0800

09 Mar, 2010

3 commits

51d0f6d1f Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable

* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: kfree correct pointer during mount option parsing
Btrfs: use RB_ROOT to intialize rb_trees instead of setting rb_node to NULL

Linus Torvalds
2010-03-09 06:07:53 +0800
da495ecc0 Btrfs: kfree correct pointer during mount option parsing

We kstrdup the options string, but then strsep screws with the pointer,
so when we kfree() it, we're not giving it the right pointer.
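
The same pitfall exists in userspace with strdup(), strsep() and free(): strsep() advances the pointer it is given, so the pointer to release has to be saved separately. A minimal sketch, with `sketch_count_options` as a hypothetical helper that counts comma-separated mount options:

```c
#define _DEFAULT_SOURCE /* for strsep() and strdup() on glibc */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns the number of non-empty options, or -1 if allocation fails. */
static int sketch_count_options(const char *options)
{
    char *orig;   /* what the allocator returned; this is what gets freed */
    char *cursor; /* what strsep() is allowed to move */
    char *tok;
    int n = 0;

    orig = strdup(options);
    if (!orig)
        return -1;

    cursor = orig;
    while ((tok = strsep(&cursor, ",")) != NULL) {
        if (*tok)
            n++;
    }

    /* cursor is NULL by now; freeing it would leak the copy. */
    free(orig);
    return n;
}
```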

Tested-by: Andy Lutomirski

Signed-off-by: Chris Mason

Josef Bacik
2010-03-09 05:26:50 +0800
6bef4d317 Btrfs: use RB_ROOT to intialize rb_trees instead of setting rb_node to NULL

btrfs initializes rb trees in quite a number of places by setting rb_node =
NULL. The problem with this is that 17d9ddc72fb8bba0d4f678 in the
linux-next tree adds a new field to that struct which needs to be NULL for
the new rbtree library code to work properly. This patch uses RB_ROOT as
the initializer so all of the relevant fields will be NULL'd. Without the
patch I get a panic.
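
Why a whole-struct initializer is more robust than assigning one field can be sketched like this. `struct sketch_root`, its `extra` field and `SKETCH_ROOT` are hypothetical stand-ins for the rbtree root, the newly added field and RB_ROOT.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_node;

/* Hypothetical root; "extra" plays the role of the newly added field. */
struct sketch_root {
    struct sketch_node *node;
    void *extra;
};

/* Initializer that covers every field, in the spirit of RB_ROOT. */
#define SKETCH_ROOT (struct sketch_root){ NULL, NULL }

/* Old style: clears one field and silently misses any added later. */
static void sketch_init_fragile(struct sketch_root *r)
{
    r->node = NULL;
}

/* New style: the initializer is updated in one place when fields change. */
static void sketch_init_robust(struct sketch_root *r)
{
    *r = SKETCH_ROOT;
}
```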

Signed-off-by: Eric Paris
Acked-by: Venkatesh Pallipadi
Signed-off-by: Chris Mason

Eric Paris
2010-03-09 05:26:50 +0800

08 Mar, 2010

1 commit

52cf25d0a Driver core: Constify struct sysfs_ops in struct kobj_type

Constify struct sysfs_ops.

This is part of the ops structure constification
effort started by Arjan van de Ven et al.

Benefits of this constification:

* prevents modification of data that is shared
(referenced) by many other structure instances
at runtime

* detects/prevents accidental (but not intentional)
modification attempts on archs that enforce
read-only kernel data at runtime

* potentially better optimized code as the compiler
can assume that the const data cannot be changed

* the compiler/linker move const data into .rodata
and therefore exclude them from false sharing

Signed-off-by: Emese Revfy
Acked-by: David Teigland
Acked-by: Matt Domsch
Acked-by: Maciej Sosnowski
Acked-by: Hans J. Koch
Acked-by: Pekka Enberg
Acked-by: Jens Axboe
Acked-by: Stephen Hemminger
Signed-off-by: Greg Kroah-Hartman

Emese Revfy
2010-03-08 09:04:49 +0800

06 Mar, 2010

1 commit

a9185b41a pass writeback_control to ->write_inode

This gives the filesystem more information about the writeback that
is happening. Trond requested this for the NFS unstable write handling,
and other filesystems might benefit from this too by being able to
distinguish between the different callers in more detail.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2010-03-06 02:25:52 +0800

16 Feb, 2010

1 commit

0813e22d4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable

* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: btrfs_mark_extent_written uses the wrong slot

Linus Torvalds
2010-02-16 11:56:21 +0800

13 Feb, 2010

1 commit

3f6fae955 Btrfs: btrfs_mark_extent_written uses the wrong slot

My test: fallocate a big file and then write to it. The file is 512M, but
after file write is done btrfs-debug-tree shows:
item 6 key (257 EXTENT_DATA 0) itemoff 3516 itemsize 53
extent data disk byte 1103101952 nr 536870912
extent data offset 0 nr 399634432 ram 536870912
extent compression 0
Looks like a regression introduced by
6c7d54ac87f338c479d9729e8392eca3f76e11e1, where we set wrong slot.

Signed-off-by: Shaohua Li
Acked-by: Yan Zheng
Signed-off-by: Chris Mason

Shaohua Li
2010-02-13 05:47:19 +0800

05 Feb, 2010

2 commits

adbfbcd12 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable

* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: apply updated fallocate i_size fix
Btrfs: do not try and lookup the file extent when finishing ordered io
Btrfs: Fix oopsen when dropping empty tree.
Btrfs: remove BUG_ON() due to mounting bad filesystem
Btrfs: make error return negative in btrfs_sync_file()
Btrfs: fix race between allocate and release extent buffer.

Linus Torvalds
2010-02-05 23:23:03 +0800
23b5c5094 Btrfs: apply updated fallocate i_size fix

This version of the i_size fix for fallocate makes sure we only update
the i_size when the current fallocate is really operating outside of
i_size.
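
The rule stated above amounts to a single comparison. The sketch below is an illustration under assumptions, not the patched function: `sketch_new_isize` is a hypothetical helper, and the `keep_size` argument models a caller asking for the file size to be left alone.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the file size after preallocating [offset, offset + len).
 * The size only grows when the range really extends past the current
 * size and the caller did not ask to keep the size unchanged.
 */
static uint64_t sketch_new_isize(uint64_t cur_size, uint64_t offset,
                                 uint64_t len, int keep_size)
{
    uint64_t end = offset + len;

    if (!keep_size && end > cur_size)
        return end;
    return cur_size;
}
```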

Signed-off-by: Chris Mason

Aneesh Kumar K.V
2010-02-05 00:33:03 +0800