Eric Lee / smarc-fsl-linux-kernel

28 Jul, 2011

7 commits

a65917156 Btrfs: stop using highmem for extent_buffers

The extent_buffers have a very complex interface where
we use HIGHMEM for metadata and try to cache a kmap mapping
to access the memory.

The next commit adds reader/writer locks, and concurrent use
of this kmap cache would make it even more complex.

This commit drops the ability to use HIGHMEM with extent buffers,
and rips out all of the related code.

Signed-off-by: Chris Mason

Chris Mason
2011-07-28 00:46:45 +0800
199c36eaa Btrfs: fix BUG_ON() caused by ENOSPC when relocating space

When we balanced the chunks across the devices, BUG_ON() in
__finish_chunk_alloc() was triggered.

------------[ cut here ]------------
kernel BUG at fs/btrfs/volumes.c:2568!
[SNIP]
Call Trace:
[] btrfs_alloc_chunk+0x8e/0xa0 [btrfs]
[] do_chunk_alloc+0x330/0x3a0 [btrfs]
[] btrfs_reserve_extent+0xb4/0x1f0 [btrfs]
[] btrfs_alloc_free_block+0xdb/0x350 [btrfs]
[] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
[] __btrfs_cow_block+0x14d/0x5e0 [btrfs]
[] ? read_block_for_search+0x14d/0x4d0 [btrfs]
[] btrfs_cow_block+0x10b/0x240 [btrfs]
[] btrfs_search_slot+0x49e/0x7a0 [btrfs]
[] btrfs_insert_empty_items+0x8d/0xf0 [btrfs]
[] insert_with_overflow+0x43/0x110 [btrfs]
[] btrfs_insert_dir_item+0xcd/0x1f0 [btrfs]
[] ? map_extent_buffer+0xb0/0xc0 [btrfs]
[] ? rb_insert_color+0x9d/0x160
[] ? inode_tree_add+0xf0/0x150 [btrfs]
[] btrfs_add_link+0xc1/0x1c0 [btrfs]
[] ? security_inode_init_security+0x1c/0x30
[] ? btrfs_init_acl+0x4a/0x180 [btrfs]
[] btrfs_add_nondir+0x2f/0x70 [btrfs]
[] ? btrfs_init_inode_security+0x46/0x60 [btrfs]
[] btrfs_create+0x150/0x1d0 [btrfs]
[] ? generic_permission+0x23/0xb0
[] vfs_create+0xa5/0xc0
[] do_last+0x5fe/0x880
[] path_openat+0xcd/0x3d0
[] do_filp_open+0x49/0xa0
[] ? alloc_fd+0x95/0x160
[] do_sys_open+0x107/0x1e0
[] ? audit_syscall_entry+0x1bf/0x1f0
[] sys_open+0x20/0x30
[] system_call_fastpath+0x16/0x1b
[SNIP]
RIP [] __finish_chunk_alloc+0x20a/0x220 [btrfs]

The reason is:
    Task1                               Space balance task
    do_chunk_alloc()
      __finish_chunk_alloc()
        update device info
        in the chunk tree
          alloc system metadata block
                                        relocate system metadata block group
                                          set system metadata block group
                                          readonly, This block group is the
                                          only one that can allocate space. So
                                          there is no free space that can be
                                          allocated now.
          find no space and don't try
          to alloc new chunk, and then
          return ENOSPC
        BUG_ON() in __finish_chunk_alloc()
        was triggered.

Fix this bug by allocating a new system metadata chunk before relocating the
old one if we find there is no free space which can be allocated after setting
the old block group to be read-only.

Reported-by: Tsutomu Itoh
Signed-off-by: Miao Xie
Tested-by: Tsutomu Itoh
Signed-off-by: Chris Mason

Miao Xie
2011-07-28 00:46:45 +0800
f7aaa06bf Btrfs: tag pages for writeback in sync

Everybody else does this, we need to do it too. If we're syncing, we need to
tag the pages we're going to write for writeback so we don't end up writing the
same stuff over and over again if somebody is constantly redirtying our file.
This will keep us from having latencies with heavy sync workloads. Thanks,
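
The tagging idea can be illustrated with a toy model in userspace C. This is only a sketch under stated assumptions: `struct toy_page` and both helpers are hypothetical names invented here, not the btrfs or page-cache code. A sync pass first tags the pages that are dirty at that moment, then writes only tagged pages, so a page redirtied afterwards waits for the next pass.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page-cache page. */
struct toy_page {
    bool dirty;    /* page has data that still needs writing */
    bool towrite;  /* page was tagged for the current sync pass */
};

/* Step 1: tag every page that is dirty right now; returns how many were tagged. */
static size_t toy_tag_for_writeback(struct toy_page *pages, size_t n)
{
    size_t tagged = 0;

    for (size_t i = 0; i < n; i++) {
        if (pages[i].dirty) {
            pages[i].towrite = true;
            tagged++;
        }
    }
    return tagged;
}

/* Step 2: write only tagged pages; pages dirtied after tagging are skipped. */
static size_t toy_write_tagged(struct toy_page *pages, size_t n)
{
    size_t written = 0;

    for (size_t i = 0; i < n; i++) {
        if (!pages[i].towrite)
            continue;
        pages[i].towrite = false;
        pages[i].dirty = false;   /* pretend the write completed */
        written++;
    }
    return written;
}
```

Because the set of pages to write is fixed at tagging time, the pass is bounded even if somebody keeps dirtying the file.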

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2011-07-28 00:46:44 +0800
9e0baf60d Btrfs: fix enospc problems with delalloc

So I had this brilliant idea to use atomic counters for outstanding and reserved
extents, but this turned out to be a bad idea. Consider this where we have 1
outstanding extent and 1 reserved extent

Reserver                                Releaser
                                        atomic_dec(outstanding) now 0
atomic_read(outstanding)+1 get 1
atomic_read(reserved) get 1
don't actually reserve anything because
they are the same
                                        atomic_cmpxchg(reserved, 1, 0)
atomic_inc(outstanding)
atomic_add(0, reserved)
                                        free reserved space for 1 extent

Then the reserver now has no actual space reserved for it, and when it goes to
finish the ordered IO it won't have enough space to do its allocation and you
get those lovely warnings.
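
The message above does not spell out the replacement scheme, so the following is only a sketch of one common way to close this kind of window: keep both counters as plain integers and update them together under a single lock. It uses a pthread mutex in userspace; `struct toy_extent_counts` and both helpers are hypothetical names, not the btrfs code.

```c
#include <pthread.h>

/* Hypothetical per-inode counters, both protected by one lock. */
struct toy_extent_counts {
    pthread_mutex_t lock;
    unsigned int outstanding;  /* extents that will need space */
    unsigned int reserved;     /* extents that have space reserved */
};

/* Reserver: returns how many extents' worth of space must be reserved. */
static unsigned int toy_reserve_one(struct toy_extent_counts *c)
{
    unsigned int to_reserve = 0;

    pthread_mutex_lock(&c->lock);
    c->outstanding++;
    if (c->outstanding > c->reserved) {
        to_reserve = c->outstanding - c->reserved;
        c->reserved += to_reserve;
    }
    pthread_mutex_unlock(&c->lock);
    return to_reserve;
}

/* Releaser: returns how many extents' worth of reserved space can be freed. */
static unsigned int toy_release_one(struct toy_extent_counts *c)
{
    unsigned int to_free = 0;

    pthread_mutex_lock(&c->lock);
    if (c->outstanding > 0)
        c->outstanding--;
    if (c->reserved > c->outstanding) {
        to_free = c->reserved - c->outstanding;
        c->reserved -= to_free;
    }
    pthread_mutex_unlock(&c->lock);
    return to_free;
}
```

Since the read, compare, and update happen in one critical section, the two tasks can run in either order and the counters still end up equal.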

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2011-07-28 00:46:44 +0800
a59914280 Btrfs: don't flush delalloc arbitrarily

Kill the check to see if we have 512mb of reserved space in delalloc and
shrink_delalloc if we do. This causes unexpected latencies and we have other
logic to see if we need to throttle. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2011-07-28 00:46:43 +0800
a94733d0b Btrfs: use find_or_create_page instead of grab_cache_page

grab_cache_page will use mapping_gfp_mask(), which for all inodes is set to
GFP_HIGHUSER_MOVABLE. So instead use find_or_create_page in all cases where we
need GFP_NOFS so we don't deadlock. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2011-07-28 00:46:43 +0800
bab39bf99 Btrfs: use a worker thread to do caching

A user reported a deadlock when copying a bunch of files. This is because they
were low on memory and kthreadd got hung up trying to migrate pages for an
allocation when starting the caching kthread. The page was locked by the person
starting the caching kthread. To fix this we just need to use the async thread
stuff so that the threads are already created and we don't have to worry about
deadlocks. Thanks,

Reported-by: Roman Mamedov
Signed-off-by: Josef Bacik

Josef Bacik
2011-07-28 00:46:25 +0800

11 Jul, 2011

5 commits

df98b6e2c Btrfs: fix how we merge extent states and deal with cached states

First, we can sometimes free the state we're merging, which means anybody who
calls merge_state() may have the state it passed in freed. This is
problematic because we could end up caching the state, which makes caching
useless as the state will no longer be part of the tree. So instead of freeing
the state we passed into merge_state(), set its end to other->end and free
the other state. This way we are sure to cache the correct state. Also, because
we can merge states together, instead of only using the cached state if its
start == the start we are looking for, go ahead and use it if the start we are
looking for is within the range of the cached state. Thanks,
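
Both rules can be sketched on a toy singly linked list of ranges. This is an illustration only: `struct toy_state`, `toy_merge_next()` and `toy_cache_hit()` are hypothetical names, ranges are assumed to have an inclusive end, and the real code keeps states in a tree rather than a list.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical extent state covering [start, end], end inclusive. */
struct toy_state {
    uint64_t start;
    uint64_t end;
    struct toy_state *next;
};

/*
 * Merge `state` with the following state if the two are adjacent.
 * The state that was passed in always survives; the neighbour is freed.
 */
static void toy_merge_next(struct toy_state *state)
{
    struct toy_state *other = state->next;

    if (other && other->start == state->end + 1) {
        state->end = other->end;
        state->next = other->next;
        free(other);
    }
}

/* A cached state is usable if `start` falls anywhere inside its range. */
static bool toy_cache_hit(const struct toy_state *cached, uint64_t start)
{
    return cached && cached->start <= start && start <= cached->end;
}
```

A caller that cached `state` before the merge still holds a valid pointer afterwards, and a lookup for any offset inside the merged range hits the cache.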

Signed-off-by: Josef Bacik

Josef Bacik
2011-07-11 22:00:48 +0800
2f356126c Btrfs: use the normal checksumming infrastructure for free space cache

We used to store the checksums of the space cache directly in the space cache,
however that doesn't work out too well if we have more space than we can fit the
checksums into the first page. So instead use the normal checksumming
infrastructure. There were problems with doing this originally but those
problems don't exist now so this works out fine. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2011-07-11 21:58:49 +0800
fdb5effd5 Btrfs: serialize flushers in reserve_metadata_bytes

We keep having problems with early enospc, and that's because our method of
making space is inherently racy. The problem is we can have one guy trying to
make space for himself, and in the meantime people come in and steal his
reservation. In order to stop this we make a waitqueue and put anybody who
comes into reserve_metadata_bytes on that waitqueue if somebody is trying to
make more space. Thanks,
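
A userspace analogue of this serialization, using a pthread condition variable in place of a kernel waitqueue, might look like the sketch below. All names (`struct toy_space_info`, `toy_reserve()`, `toy_finish_flush()`) are hypothetical, and the logic is simplified to a single byte counter.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical space accounting; `wait` stands in for the waitqueue. */
struct toy_space_info {
    pthread_mutex_t lock;
    pthread_cond_t wait;
    bool flushing;        /* somebody is currently making space */
    uint64_t free_bytes;
};

/*
 * Try to reserve `bytes`. Anybody arriving while a flush is in progress
 * sleeps until it finishes. Returns true on success. Returns false if
 * there was not enough space, in which case the caller has become the
 * flusher and must call toy_finish_flush() after reclaiming space.
 */
static bool toy_reserve(struct toy_space_info *s, uint64_t bytes)
{
    bool ok = false;

    pthread_mutex_lock(&s->lock);
    while (s->flushing)
        pthread_cond_wait(&s->wait, &s->lock);
    if (s->free_bytes >= bytes) {
        s->free_bytes -= bytes;
        ok = true;
    } else {
        s->flushing = true;
    }
    pthread_mutex_unlock(&s->lock);
    return ok;
}

/* Flusher: account reclaimed space, take our own share first, wake waiters. */
static bool toy_finish_flush(struct toy_space_info *s, uint64_t reclaimed,
                             uint64_t bytes)
{
    bool ok = false;

    pthread_mutex_lock(&s->lock);
    s->free_bytes += reclaimed;
    if (s->free_bytes >= bytes) {
        s->free_bytes -= bytes;
        ok = true;
    }
    s->flushing = false;
    pthread_cond_broadcast(&s->wait);
    pthread_mutex_unlock(&s->lock);
    return ok;
}
```

The flusher takes its reservation before clearing `flushing`, so waiters cannot steal the space it just reclaimed.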

Signed-off-by: Josef Bacik

Josef Bacik
2011-07-11 21:58:48 +0800
b5009945b Btrfs: do transaction space reservation before joining the transaction

We have to do weird things when handling enospc in the transaction joining code.
Because we've already joined the transaction we cannot commit the transaction
within the reservation code since it will deadlock, so we have to return EAGAIN
and then make sure we don't retry too many times. Instead of doing this, just
do the reservation the normal way before we join the transaction, that way we
can do whatever we want to try and reclaim space, and then if it fails we know
for sure we are out of space and we can return ENOSPC. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2011-07-11 21:58:47 +0800
fa09200b8 Btrfs: try to only do one btrfs_search_slot in do_setxattr

I've been watching how many btrfs_search_slot()'s we do and I noticed that when
we create a file with selinux enabled we were doing 2 each time we initialize
the security context. That's because we look up the xattr first so we can delete
it if we're setting a new value to an existing xattr. But in the create case we
don't have any xattrs, so it is completely useless to have the extra lookup. So
re-arrange things so that we only look up first if we specifically have
XATTR_REPLACE. That way in the basic case we only do 1 search, and in the more
complicated case we do the normal 2 lookups. Thanks,
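
The search counts described above can be reproduced with a toy in-memory store. This is a hedged sketch: `struct toy_xattrs`, `TOY_XATTR_REPLACE` and the helpers are hypothetical, a linear scan stands in for btrfs_search_slot(), and overwriting an existing name without the replace flag is left out.

```c
#include <errno.h>
#include <string.h>

#define TOY_XATTR_REPLACE 0x2   /* hypothetical flag modelled on XATTR_REPLACE */
#define TOY_MAX_XATTRS 8

struct toy_xattrs {
    const char *names[TOY_MAX_XATTRS];
    const char *values[TOY_MAX_XATTRS];
    int count;
    int searches;   /* number of lookups performed so far */
};

/* Stand-in for a tree search; returns the slot or -1. */
static int toy_search(struct toy_xattrs *x, const char *name)
{
    x->searches++;
    for (int i = 0; i < x->count; i++)
        if (strcmp(x->names[i], name) == 0)
            return i;
    return -1;
}

/* Inserting needs one search to find out whether the name already exists. */
static int toy_insert(struct toy_xattrs *x, const char *name, const char *value)
{
    if (toy_search(x, name) >= 0)
        return -EEXIST;
    if (x->count == TOY_MAX_XATTRS)
        return -ENOSPC;
    x->names[x->count] = name;
    x->values[x->count] = value;
    x->count++;
    return 0;
}

static int toy_setxattr(struct toy_xattrs *x, const char *name,
                        const char *value, int flags)
{
    if (flags & TOY_XATTR_REPLACE) {
        /* Replace: look up and delete the old item first (search 1). */
        int slot = toy_search(x, name);

        if (slot < 0)
            return -ENODATA;
        x->count--;
        x->names[slot] = x->names[x->count];
        x->values[slot] = x->values[x->count];
    }
    /* Basic case: go straight to the insert (the only search). */
    return toy_insert(x, name, value);
}
```

Creating a new attribute costs one search, while a replace costs two, which matches the counts in the message.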

Signed-off-by: Josef Bacik

Josef Bacik
2011-07-11 21:58:45 +0800

07 Jul, 2011

3 commits

149e2d76b btrfs: fix oops when doing space balance

We need to make sure the data relocation inode doesn't go through
the delayed metadata updates, otherwise we get an oops during balance:

kernel BUG at fs/btrfs/relocation.c:4303!
[SNIP]
Call Trace:
[] ? update_ref_for_cow+0x22d/0x330 [btrfs]
[] __btrfs_cow_block+0x451/0x5e0 [btrfs]
[] ? read_block_for_search+0x14d/0x4d0 [btrfs]
[] btrfs_cow_block+0x10b/0x240 [btrfs]
[] btrfs_search_slot+0x49e/0x7a0 [btrfs]
[] btrfs_lookup_inode+0x2f/0xa0 [btrfs]
[] ? mutex_lock+0x1e/0x50
[] btrfs_update_delayed_inode+0x71/0x160 [btrfs]
[] ? __btrfs_release_delayed_node+0x67/0x190 [btrfs]
[] btrfs_run_delayed_items+0xe8/0x120 [btrfs]
[] btrfs_commit_transaction+0x250/0x850 [btrfs]
[] ? find_get_pages+0x39/0x130
[] ? join_transaction+0x25/0x250 [btrfs]
[] ? wake_up_bit+0x40/0x40
[] prepare_to_relocate+0xda/0xf0 [btrfs]
[] relocate_block_group+0x4b/0x620 [btrfs]
[] ? btrfs_clean_old_snapshots+0x35/0x150 [btrfs]
[] btrfs_relocate_block_group+0x1b3/0x2e0 [btrfs]
[] ? btrfs_tree_unlock+0x50/0x50 [btrfs]
[] btrfs_relocate_chunk+0x8b/0x670 [btrfs]
[] ? btrfs_set_path_blocking+0x3d/0x50 [btrfs]
[] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
[] ? btrfs_previous_item+0xb1/0x150 [btrfs]
[] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
[] btrfs_balance+0x21a/0x2b0 [btrfs]
[] btrfs_ioctl+0x798/0xd20 [btrfs]
[] ? handle_mm_fault+0x148/0x270
[] ? do_page_fault+0x1d8/0x4b0
[] do_vfs_ioctl+0x9a/0x540
[] sys_ioctl+0xa1/0xb0
[] system_call_fastpath+0x16/0x1b
[SNIP]
RIP [] btrfs_reloc_cow_block+0x22c/0x270 [btrfs]

Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Miao Xie
2011-07-07 06:51:53 +0800
508794eb5 Btrfs: don't panic if we get an error while balancing V2

A user reported an error where if we try to balance an fs after a device has
been removed it will blow up. This is because we get an EIO back and this is
where BUG_ON(ret) bites us in the ass. To fix we just exit. Thanks,

Reported-by: Anand Jain
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2011-07-07 06:46:43 +0800
0942caa37 btrfs: add missing options displayed in mount output

There are three mount options settable by the user which are not
currently displayed in mount output.

Signed-off-by: David Sterba
Signed-off-by: Chris Mason

David Sterba
2011-07-07 06:46:43 +0800

27 Jun, 2011

1 commit

2f7e33d43 btrfs: fix inconsonant inode information

When iputting the inode, we may leave the delayed nodes if they have some
delayed items that have not been dealt with. So when the inode is read again,
we must look up the related delayed node and use the information in it to
initialize the inode. Otherwise we will get inconsistent inode information, which
may cause the same directory index number to be allocated again and hit the
following oops:

[ 5447.554187] err add delayed dir index item(name: pglog_0.965_0) into the
insertion tree of the delayed node(root id: 262, inode id: 258, errno: -17)
[ 5447.569766] ------------[ cut here ]------------
[ 5447.575361] kernel BUG at fs/btrfs/delayed-inode.c:1301!
[SNIP]
[ 5447.790721] Call Trace:
[ 5447.793191] [] btrfs_insert_dir_item+0x189/0x1bb [btrfs]
[ 5447.800156] [] btrfs_add_link+0x12b/0x191 [btrfs]
[ 5447.806517] [] btrfs_add_nondir+0x31/0x58 [btrfs]
[ 5447.812876] [] btrfs_create+0xf9/0x197 [btrfs]
[ 5447.818961] [] vfs_create+0x72/0x92
[ 5447.824090] [] do_last+0x22c/0x40b
[ 5447.829133] [] path_openat+0xc0/0x2ef
[ 5447.834438] [] ? __perf_event_task_sched_out+0x24/0x44
[ 5447.841216] [] ? perf_event_task_sched_out+0x59/0x67
[ 5447.847846] [] do_filp_open+0x3d/0x87
[ 5447.853156] [] ? strncpy_from_user+0x43/0x4d
[ 5447.859072] [] ? getname_flags+0x2e/0x80
[ 5447.864636] [] ? do_getname+0x14b/0x173
[ 5447.870112] [] ? audit_getname+0x16/0x26
[ 5447.875682] [] ? spin_lock+0xe/0x10
[ 5447.880882] [] do_sys_open+0x69/0xae
[ 5447.886153] [] sys_open+0x20/0x22
[ 5447.891114] [] system_call_fastpath+0x16/0x1b

Fix it by reusing the old delayed node.

Reported-by: Jim Schutt
Signed-off-by: Miao Xie
Tested-by: Jim Schutt
Signed-off-by: Chris Mason

Miao Xie
2011-06-27 23:34:27 +0800

25 Jun, 2011

3 commits

9b90f5135 Btrfs: make sure to update total_bitmaps when freeing cache V3

A user reported this bug again where we have more bitmaps than we are supposed
to. This is because we failed to load the free space cache, but don't update
the ctl->total_bitmaps counter when we remove entries from the tree. This patch
fixes this problem and we should be good to go again. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2011-06-25 21:31:06 +0800
e0f540672 Btrfs: fix type mismatch in find_free_extent()

data parameter should be u64 because a full-sized chunk flags field is
passed instead of 0/1 for distinguishing data from metadata. All
underlying functions expect u64.

Signed-off-by: Ilya Dryomov
Signed-off-by: Chris Mason

Ilya Dryomov
2011-06-25 21:31:06 +0800
1973f0fae Btrfs: make sure to record the transid in new inodes

When we create a new inode, we aren't filling in the
field that records the transaction that last changed this
inode.

If we then go to fsync that inode, it will be skipped because the field
isn't filled in.

Signed-off-by: Chris Mason

Chris Mason
2011-06-25 01:13:29 +0800

18 Jun, 2011

7 commits

e999376f0 Btrfs: avoid delayed metadata items during commits

Snapshot creation has two phases. One is the initial snapshot setup,
and the second is done during commit, while nobody is allowed to modify
the root we are snapshotting.

The delayed metadata insertion code can break that rule, it does a
delayed inode update on the inode of the parent of the snapshot,
and delayed directory item insertion.

This makes sure to run the pending delayed operations before we
record the snapshot root, which avoids corruptions.

Signed-off-by: Chris Mason

Chris Mason
2011-06-18 04:38:47 +0800
35a30d7ce btrfs: fix uninitialized return value

When allocation fails in btrfs_read_fs_root_no_name, ret is not set
although it is returned, holding a garbage value.
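
The bug class is easy to show in isolation. The sketch below is a userspace illustration with hypothetical names (`toy_read_root()`, `toy_failing_alloc()`) and an int return convention chosen for brevity. It is not the btrfs function, only the same pattern of a shared exit label where every path must set `ret`.

```c
#include <errno.h>
#include <stdlib.h>

struct toy_root {
    int id;
};

/* Test helper that simulates an allocation failure. */
static void *toy_failing_alloc(size_t n)
{
    (void)n;
    return NULL;
}

/* `alloc` is injectable so the failure path can be exercised. */
static int toy_read_root(void *(*alloc)(size_t), struct toy_root **out)
{
    int ret;
    struct toy_root *root;

    *out = NULL;
    root = alloc(sizeof(*root));
    if (!root) {
        ret = -ENOMEM;   /* without this line, ret is returned as garbage */
        goto done;
    }
    root->id = 1;
    *out = root;
    ret = 0;
done:
    return ret;
}
```

Calling `toy_read_root(toy_failing_alloc, &root)` returns `-ENOMEM` instead of whatever happened to be on the stack.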

Signed-off-by: David Sterba
Reviewed-by: Li Zefan
Signed-off-by: Chris Mason

David Sterba
2011-06-18 02:54:18 +0800
19fd29495 btrfs: fix wrong reservation when doing delayed inode operations

We have migrated the space for the delayed inode items from
trans_block_rsv to global_block_rsv, but we forgot to set trans->block_rsv to
global_block_rsv when doing delayed inode operations, and the following Oops
happened:

[ 9792.654889] ------------[ cut here ]------------
[ 9792.654898] WARNING: at fs/btrfs/extent-tree.c:5681
btrfs_alloc_free_block+0xca/0x27c [btrfs]()
[ 9792.654899] Hardware name: To Be Filled By O.E.M.
[ 9792.654900] Modules linked in: btrfs zlib_deflate libcrc32c
ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
arc4 rt61pci rt2x00pci rt2x00lib snd_hda_codec_hdmi mac80211
snd_hda_codec_realtek cfg80211 snd_hda_intel edac_core snd_seq rfkill
pcspkr serio_raw snd_hda_codec eeprom_93cx6 edac_mce_amd sp5100_tco
i2c_piix4 k10temp snd_hwdep snd_seq_device snd_pcm floppy r8169 xhci_hcd
mii snd_timer snd soundcore snd_page_alloc ipv6 firewire_ohci pata_acpi
ata_generic firewire_core pata_via crc_itu_t radeon ttm drm_kms_helper
drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
[ 9792.654919] Pid: 2762, comm: rm Tainted: G W 2.6.39+ #1
[ 9792.654920] Call Trace:
[ 9792.654922] [] warn_slowpath_common+0x83/0x9b
[ 9792.654925] [] warn_slowpath_null+0x1a/0x1c
[ 9792.654933] [] btrfs_alloc_free_block+0xca/0x27c [btrfs]
[ 9792.654945] [] ? map_extent_buffer+0x6e/0xa8 [btrfs]
[ 9792.654953] [] __btrfs_cow_block+0xfc/0x30c [btrfs]
[ 9792.654963] [] ? btrfs_buffer_uptodate+0x47/0x58 [btrfs]
[ 9792.654970] [] ? read_block_for_search+0x94/0x368 [btrfs]
[ 9792.654978] [] btrfs_cow_block+0xfe/0x146 [btrfs]
[ 9792.654986] [] btrfs_search_slot+0x14d/0x4b6 [btrfs]
[ 9792.654997] [] ? map_extent_buffer+0x6e/0xa8 [btrfs]
[ 9792.655022] [] btrfs_lookup_inode+0x2f/0x8f [btrfs]
[ 9792.655025] [] ? _cond_resched+0xe/0x22
[ 9792.655027] [] ? mutex_lock+0x29/0x50
[ 9792.655039] [] btrfs_update_delayed_inode+0x72/0x137 [btrfs]
[ 9792.655051] [] btrfs_run_delayed_items+0x90/0xdb [btrfs]
[ 9792.655062] [] btrfs_commit_transaction+0x228/0x654 [btrfs]
[ 9792.655064] [] ? remove_wait_queue+0x3a/0x3a
[ 9792.655075] [] btrfs_evict_inode+0x14d/0x202 [btrfs]
[ 9792.655077] [] evict+0x71/0x111
[ 9792.655079] [] iput+0x12a/0x132
[ 9792.655081] [] do_unlinkat+0x106/0x155
[ 9792.655083] [] ? path_put+0x1f/0x23
[ 9792.655085] [] ? audit_syscall_entry+0x145/0x171
[ 9792.655087] [] ? putname+0x34/0x36
[ 9792.655090] [] sys_unlinkat+0x29/0x2b
[ 9792.655092] [] system_call_fastpath+0x16/0x1b
[ 9792.655093] ---[ end trace 02b696eb02b3f768 ]---

This patch fixes it by setting the reservation of the transaction handle to the
correct one.

Reported-by: Josef Bacik
Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Miao Xie
2011-06-18 02:54:18 +0800
9fe6a50fb btrfs: Remove unused sysfs code

Removes code no longer used. The sysfs file itself is kept, because the
btrfs developers expressed interest in putting new entries to sysfs.

Signed-off-by: Maarten Lankhorst
Signed-off-by: Chris Mason

Maarten Lankhorst
2011-06-18 02:54:18 +0800
3ed4498ca btrfs: fix dereference of ERR_PTR value

smatch reports:

btrfs_recover_log_trees error: 'wc.replay_dest' dereferencing
possible ERR_PTR()
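
For readers unfamiliar with the convention smatch is checking, here is a userspace re-creation of the error-pointer idiom. The `toy_*` helpers below only imitate the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() and are assumptions made for illustration, as are `toy_lookup()` and `toy_use()`.

```c
#include <errno.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *toy_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* Decode the errno value from an error pointer. */
static inline long toy_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/* Error pointers live in the top TOY_MAX_ERRNO addresses. */
static inline int toy_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

struct toy_root {
    int id;
};

static struct toy_root toy_the_root = { 5 };

/* Hypothetical lookup that may return an error pointer instead of NULL. */
static struct toy_root *toy_lookup(int fail)
{
    return fail ? toy_err_ptr(-ENOENT) : &toy_the_root;
}

/* Check before dereferencing; a NULL check alone would not catch this. */
static int toy_use(int fail)
{
    struct toy_root *root = toy_lookup(fail);

    if (toy_is_err(root))
        return (int)toy_ptr_err(root);
    return root->id;
}
```

The point of the warning is the check in `toy_use()`: an error pointer is non-NULL, so it has to be tested explicitly before any field access.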

Signed-off-by: David Sterba
Signed-off-by: Chris Mason

David Sterba
2011-06-18 02:54:17 +0800
e038dca80 Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work into for-linus

Conflicts:
fs/btrfs/transaction.c

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Chris Mason
2011-06-18 02:16:13 +0800
7585717f3 Btrfs: fix relocation races

The recent commit to get rid of our trans_mutex introduced
some races with block group relocation. The problem is that relocation
needs to do some record keeping about each root, and it was relying
on the transaction mutex to coordinate things in subtle ways.

This fix adds a mutex just for the relocation code and makes sure
it doesn't have a big impact on normal operations. The race is
really fixed in btrfs_record_root_in_trans, which is where we
step back and wait for the relocation code to finish accounting
setup.

Signed-off-by: Chris Mason

Chris Mason
2011-06-18 01:36:58 +0800

16 Jun, 2011

3 commits

ed0ca1402 Btrfs: set no_trans_join after trying to expand the transaction

We can lock up if we try to allow new writers to join the transaction and we have
flushoncommit set or have a pending snapshot. This is because we set
no_trans_join and then loop around and try to wait for ordered extents again.
The problem is the ordered endio stuff needs to join the transaction, which it
can't do because no_trans_join is set. So instead wait until after this loop to
set no_trans_join and then make sure to wait for num_writers == 1 in case
anybody got started in between us exiting the loop and setting no_trans_join.
This could easily be reproduced by mounting -o flushoncommit and running xfstest
13. It cannot be reproduced with this patch. Thanks,

Reported-by: Jim Schutt
Signed-off-by: Josef Bacik

Josef Bacik
2011-06-16 01:24:47 +0800
8351583e3 Btrfs: protect the pending_snapshots list with trans_lock

Currently there is nothing protecting the pending_snapshots list on the
transaction. We only hold the directory mutex that we are snapshotting and a
read lock on the subvol_sem, so we could race with somebody else creating a
snapshot in a different directory and end up with list corruption. So protect
this list with the trans_lock. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2011-06-16 01:24:46 +0800
71d7aed01 Btrfs: fix path leakage on subvol deletion

The delayed ref patch accidentally removed the btrfs_free_path in
btrfs_unlink_subvol; this puts it back and means we don't leak a path. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2011-06-16 01:24:45 +0800

13 Jun, 2011

2 commits

f4c440162 Btrfs: drop the delalloc_bytes check in shrink_delalloc

Even when delalloc_bytes is zero, we may need to sleep while waiting
for delalloc space.

Signed-off-by: Chris Mason

Chris Mason
2011-06-13 23:30:47 +0800
ac08aedfa Btrfs: check the return value from set_anon_super

Al Viro noticed we weren't checking for set_anon_super failures. This
adds the required checks.

Signed-off-by: Chris Mason

Chris Mason
2011-06-13 23:28:50 +0800

11 Jun, 2011

9 commits

30b4caf5d Btrfs: use join_transaction in btrfs_evict_inode()

The WARN_ON() in start_transaction() was triggered while balancing.

The cause is that btrfs_relocate_chunk() started a transaction and
then called iput() on the inode that stores free space cache,
and iput() called btrfs_start_transaction() again.

Reported-by: Tsutomu Itoh
Signed-off-by: Li Zefan
Reviewed-by: Josef Bacik
Signed-off-by: Chris Mason

Li Zefan
2011-06-11 20:31:55 +0800
22b63a297 Btrfs - use %pU to print fsid

Get rid of FIXME comment. Uuids from dmesg are now the same as uuids
given by btrfs-progs.

Signed-off-by: Ilya Dryomov
Signed-off-by: Chris Mason

Ilya Dryomov
2011-06-11 07:02:04 +0800
08d2f347e Btrfs: fix extent state leak on failed nodatasum reads

When encountering an EIO while reading from a nodatasum extent, we
insert an error record into the inode's failure tree.
btrfs_readpage_end_io_hook returns early for nodatasum inodes. We'd
better clear the failure tree in that case, otherwise the kernel
complains about

BUG extent_state: Objects remaining on kmem_cache_close()

on rmmod.

Signed-off-by: Jan Schmidt
Signed-off-by: Chris Mason

Jan Schmidt
2011-06-11 07:00:53 +0800
0e735872f Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/arne/btrfs-unstable-arne into for-linus

Chris Mason
2011-06-11 06:58:08 +0800
5be76758f btrfs: fix unlocked access of delalloc_inodes

list_splice_init will make delalloc_inodes empty, but without a spinlock
around it, this may produce a corrupted list head, which is accessed in many places.
The race window is very tight and nobody seems to have hit it so far.
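
The pattern of detaching a shared list under its lock can be sketched in userspace as follows. This is a simplified illustration, not the kernel code: a pthread mutex replaces the spinlock, a singly linked list replaces list_head, and `struct toy_delalloc_root`, `toy_add()` and `toy_splice_all()` are hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>

struct toy_node {
    struct toy_node *next;
    int ino;
};

/* Hypothetical owner of the shared list; `lock` protects `delalloc`. */
struct toy_delalloc_root {
    pthread_mutex_t lock;
    struct toy_node *delalloc;
};

/* Other paths add entries under the same lock. */
static void toy_add(struct toy_delalloc_root *r, struct toy_node *n)
{
    pthread_mutex_lock(&r->lock);
    n->next = r->delalloc;
    r->delalloc = n;
    pthread_mutex_unlock(&r->lock);
}

/*
 * Detach the whole list while holding the lock, leaving the shared
 * head empty. The caller can then walk its private copy unlocked.
 */
static struct toy_node *toy_splice_all(struct toy_delalloc_root *r)
{
    struct toy_node *head;

    pthread_mutex_lock(&r->lock);
    head = r->delalloc;
    r->delalloc = NULL;
    pthread_mutex_unlock(&r->lock);
    return head;
}
```

Taking the lock around the splice means concurrent adders never observe a half-updated head.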

Signed-off-by: David Sterba
Signed-off-by: Chris Mason

David Sterba
2011-06-11 06:57:11 +0800
027ed2f00 Btrfs: avoid stack bloat in btrfs_ioctl_fs_info()

The size of struct btrfs_ioctl_fs_info_args is as big as 1KB, so
don't declare the variable on stack.
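
As a rough userspace analogue, the change amounts to allocating the large argument block on the heap instead of declaring it as a local variable. The struct and function below are hypothetical stand-ins (`struct toy_fs_info_args`, `toy_ioctl_fs_info()`), with calloc()/free() playing the role of the kernel allocator.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical 1KB argument block, too large to keep on a small stack. */
struct toy_fs_info_args {
    unsigned char data[1024];
};

static int toy_ioctl_fs_info(unsigned char *first_byte)
{
    /* Heap allocation instead of `struct toy_fs_info_args args;` */
    struct toy_fs_info_args *args = calloc(1, sizeof(*args));

    if (!args)
        return -ENOMEM;

    args->data[0] = 42;          /* pretend to fill in the fs info */
    *first_byte = args->data[0]; /* pretend to copy it to the caller */

    free(args);
    return 0;
}
```

The cost is one extra failure path for the allocation, which the function reports as `-ENOMEM`.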

Signed-off-by: Li Zefan
Reviewed-by: Josef Bacik
Signed-off-by: Chris Mason

Li Zefan
2011-06-11 06:57:10 +0800
9eb9104c6 btrfs: remove 64bit alignment padding to allow extent_buffer to fit into one fewer cacheline

Reorder extent_buffer to remove 8 bytes of alignment padding on 64 bit
builds. This shrinks its size to 128 bytes allowing it to fit into one
fewer cache lines and allows more objects per slab in its kmem_cache.

slabinfo extent_buffer reports :-

before:-
    Sizes (bytes)     Slabs
    ----------------------------------
    Object :     136  Total  :     123
    SlabObj:     136  Full   :     121
    SlabSiz:    4096  Partial:       0
    Loss   :       0  CpuSlab:       2
    Align  :       8  Objects:      30

after :-
    Object :     128  Total  :       4
    SlabObj:     128  Full   :       2
    SlabSiz:    4096  Partial:       0
    Loss   :       0  CpuSlab:       2
    Align  :       8  Objects:      32
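
The effect of member order on padding can be seen with two toy structs. These are hypothetical and much smaller than extent_buffer; exact sizes depend on the ABI, so the sketch only computes the difference rather than asserting specific numbers.

```c
#include <stddef.h>

/* A small member on each side of a wide one forces padding twice. */
struct toy_padded {
    char a;
    long b;
    char c;
};

/* Same members, widest first: the two chars share one padded slot. */
struct toy_reordered {
    long b;
    char a;
    char c;
};

/* Bytes saved per object by reordering (0 if the ABI adds no padding). */
static size_t toy_bytes_saved(void)
{
    return sizeof(struct toy_padded) - sizeof(struct toy_reordered);
}
```

On a typical 64-bit Linux ABI this kind of reordering usually saves 8 bytes per object, which is the same order of saving the commit reports for extent_buffer.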

Signed-off-by: Richard Kennedy
Signed-off-by: Chris Mason

richard kennedy
2011-06-11 06:57:10 +0800
38e880540 Btrfs: clear current->journal_info on async transaction commit

Normally current->journal_info is cleared by commit_transaction. For an
async snap or subvol creation, though, it runs in a work queue. Clear
it in btrfs_commit_transaction_async() to avoid leaking a non-NULL
journal_info when we return to userspace. When the actual commit runs in
the other thread it won't care that its current->journal_info is already
NULL.

Signed-off-by: Sage Weil
Tested-by: Jim Schutt
Signed-off-by: Chris Mason

Sage Weil
2011-06-11 04:42:29 +0800
38e878806 Btrfs: make sure to recheck for bitmaps in clusters

Josef recently changed the free extent cache to look in
the block group cluster for any bitmaps before trying to
add a new bitmap for the same offset. This avoids BUG_ON()s due to
covering duplicate ranges.

But it didn't go quite far enough. A given free range might span
between one or more bitmaps or free space entries. The code has
looping to cover this, but it doesn't check for clustered bitmaps
every time.

This shuffles our gotos to check for a bitmap in the cluster
for every new bitmap entry we try to add.

Signed-off-by: Chris Mason

Chris Mason
2011-06-11 04:36:57 +0800