Eric Lee / smarc-fsl-linux-kernel

30 Jun, 2017

1 commit

364ecf365 btrfs: qgroup: Introduce extent changeset for qgroup reserve functions

Introduce a new parameter, struct extent_changeset, for
btrfs_qgroup_reserve_data() and its callers.

Such an extent_changeset was already used inside btrfs_qgroup_reserve_data()
to record which ranges it reserved in the current call, so it can free them
in error paths.

The reason we need to export it to callers is that, in the buffered write
error path, without knowing exactly which ranges we reserved in the current
allocation, we can free space which was not reserved by us.

This will lead to qgroup reserved space underflow.
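
The effect can be illustrated with a toy model. This is a sketch only, not kernel code: QgroupModel, reserve(), free_range() and free_changeset() are hypothetical names, and ranges are tracked per byte offset purely to keep the example short. Writer A reserves [0, 8), writer B reserves the overlapping [4, 12) and then fails.

```python
class QgroupModel:
    """Toy model of range-based qgroup data reservation (not kernel code)."""

    def __init__(self):
        self.reserved = set()   # byte offsets currently marked as reserved
        self.counter = 0        # reserved-bytes counter of the qgroup

    def reserve(self, start, length):
        """Reserve [start, start + length) and return the changeset:
        only the offsets that this call newly reserved."""
        new = set(range(start, start + length)) - self.reserved
        self.reserved |= new
        self.counter += len(new)
        return new

    def free_range(self, start, length):
        """Error path without a changeset: frees everything reserved in
        the range, no matter which caller reserved it."""
        freed = set(range(start, start + length)) & self.reserved
        self.reserved -= freed
        self.counter -= len(freed)

    def free_changeset(self, changeset):
        """Error path with a changeset: frees only what this caller added."""
        freed = changeset & self.reserved
        self.reserved -= freed
        self.counter -= len(freed)


def counter_after_failed_write(use_changeset):
    q = QgroupModel()
    q.reserve(0, 8)                # writer A reserves [0, 8)
    changeset = q.reserve(4, 8)    # writer B: only [8, 12) is newly reserved
    if use_changeset:
        q.free_changeset(changeset)
    else:
        q.free_range(4, 8)
    return q.counter
```

With the changeset the counter stays at the 8 bytes writer A still holds. Without it the counter drops to 4, so in this model releasing writer A's 8 bytes later would push the counter below zero.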

Reviewed-by: Chandan Rajendra
Signed-off-by: Qu Wenruo
Signed-off-by: David Sterba

Qu Wenruo
2017-06-30 02:17:02 +0800

28 Feb, 2017

1 commit

691fa0596 btrfs: all btrfs_delalloc_release_metadata take btrfs_inode

Signed-off-by: Nikolay Borisov
Signed-off-by: David Sterba

Nikolay Borisov
2017-02-28 18:30:07 +0800

17 Feb, 2017

1 commit

77ab86bf1 btrfs: free-space-cache, clean up unnecessary root arguments

The free space cache APIs accept a root but always use the tree root.

Also, btrfs_truncate_free_space_cache accepts a root AND an inode but
the inode always points to the root anyway, so let's just pass the inode.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba

Jeff Mahoney
2017-02-17 19:03:56 +0800

06 Dec, 2016

3 commits

2ff7e61e0 btrfs: take an fs_info directly when the root is not used otherwise

There are loads of functions in btrfs that accept a root parameter
but only use it to obtain an fs_info pointer. Let's convert those to
just accept an fs_info pointer directly.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba

Jeff Mahoney
2016-12-06 23:06:59 +0800

0b246afa6 btrfs: root->fs_info cleanup, add fs_info convenience variables

In routines where someptr->fs_info is referenced multiple times, we
introduce a convenience variable. This makes the code considerably
more readable.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba

Jeff Mahoney
2016-12-06 23:06:59 +0800

27965b6c2 btrfs: root->fs_info cleanup, btrfs_calc_{trans,trunc}_metadata_size

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba

Jeff Mahoney
2016-12-06 23:06:58 +0800

27 Sep, 2016

1 commit

ab8d0fc48 btrfs: convert pr_* to btrfs_* where possible

For many printks, we want to know which file system issued the message.

This patch converts most pr_* calls to use the btrfs_* versions instead.
In some cases, this means adding plumbing to allow call sites access to
an fs_info pointer.

fs/btrfs/check-integrity.c is left alone for another day.

Signed-off-by: Jeff Mahoney
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Jeff Mahoney
2016-09-27 01:37:04 +0800

25 Aug, 2016

1 commit

18513091a btrfs: update btrfs_space_info's bytes_may_use timely

This patch fixes some false ENOSPC errors. The test script below
reproduces one of them:
#!/bin/bash
dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=128
dev=$(losetup --show -f fs.img)
mkfs.btrfs -f -M $dev
mkdir /tmp/mntpoint
mount $dev /tmp/mntpoint
cd /tmp/mntpoint
xfs_io -f -c "falloc 0 $((64*1024*1024))" testfile

The above script fails with ENOSPC, even though the fs still has enough free
space to satisfy the request. Please see the call graph:
btrfs_fallocate()
|-> btrfs_alloc_data_chunk_ondemand()
|     bytes_may_use += 64M
|-> btrfs_prealloc_file_range()
    |-> btrfs_reserve_extent()
        |-> btrfs_add_reserved_bytes()
        |     alloc_type is RESERVE_ALLOC_NO_ACCOUNT, so it does not
        |     change bytes_may_use, and bytes_reserved += 64M. Now
        |     bytes_may_use + bytes_reserved == 128M, which is greater
        |     than btrfs_space_info's total_bytes, so a false ENOSPC occurs.
        |     Note, the bytes_may_use decrease is only done at the
        |     end of btrfs_fallocate(), which is too late.

Here is another simple case for buffered write:
CPU 1                                   | CPU 2
                                        |
|-> cow_file_range()                    |-> __btrfs_buffered_write()
    |-> btrfs_reserve_extent()          |   |
    |                                   |   |
    |                                   |   |
    |     .....                         |   |-> btrfs_check_data_free_space()
    |                                   |
    |                                   |
    |-> extent_clear_unlock_delalloc()  |
On CPU 1, btrfs_reserve_extent()->find_free_extent()->
btrfs_add_reserved_bytes() does not decrease bytes_may_use; the decrease
is delayed until extent_clear_unlock_delalloc().
Assume that in this case btrfs_reserve_extent() reserved 128MB of data and
CPU 2's btrfs_check_data_free_space() tries to reserve 100MB of data space.
If
100MB > data_sinfo->total_bytes - data_sinfo->bytes_used -
data_sinfo->bytes_reserved - data_sinfo->bytes_pinned -
data_sinfo->bytes_readonly - data_sinfo->bytes_may_use
then btrfs_check_data_free_space() will try to allocate a new data chunk, call
btrfs_start_delalloc_roots(), or commit the current transaction in order to
reserve some free space, which is obviously a lot of work. But it is not
necessary: if bytes_may_use were decreased in time (by the 128M in this
example), we would still have free space.

To fix this issue, this patch updates bytes_may_use for both data and
metadata in btrfs_add_reserved_bytes(). In the compress path, the real
extent length may not be equal to the file content length, and bytes_may_use
is increased by the file content length, so introduce a ram_bytes argument
for btrfs_reserve_extent(), find_free_extent() and
btrfs_add_reserved_bytes(). The compress path can then update bytes_may_use
correctly. Also now we can discard RESERVE_ALLOC_NO_ACCOUNT, RESERVE_ALLOC
and RESERVE_FREE.
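
The fallocate case above can be replayed with a small accounting model. This is a sketch under assumptions, not kernel code: SpaceInfo only mirrors the counters named in this message, the 100 MiB total is an arbitrary illustrative value, and the update_may_use flag merely switches between the old and the new behaviour of the btrfs_add_reserved_bytes() step.

```python
from dataclasses import dataclass

MiB = 1024 * 1024


@dataclass
class SpaceInfo:
    """Toy stand-in for the counters of btrfs_space_info."""
    total_bytes: int
    bytes_used: int = 0
    bytes_reserved: int = 0
    bytes_pinned: int = 0
    bytes_readonly: int = 0
    bytes_may_use: int = 0

    def accounted(self):
        # Everything that counts against total_bytes.
        return (self.bytes_used + self.bytes_reserved + self.bytes_pinned
                + self.bytes_readonly + self.bytes_may_use)


def add_reserved_bytes(info, num_bytes, ram_bytes, update_may_use):
    """Model of the reservation step; ram_bytes is the file content length."""
    info.bytes_reserved += num_bytes
    if update_may_use:
        # New behaviour: hand the bytes over from may_use to reserved.
        info.bytes_may_use -= ram_bytes


def fallocate_accounting(update_may_use, total=100 * MiB, length=64 * MiB):
    info = SpaceInfo(total_bytes=total)
    info.bytes_may_use += length    # btrfs_alloc_data_chunk_ondemand() step
    add_reserved_bytes(info, length, length, update_may_use)
    return info.accounted()
```

Without the update the same 64M is counted twice (128M accounted, more than the assumed 100M total). With the update only 64M is accounted.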

As we know, EXTENT_DO_ACCOUNTING is usually used for the error path. In
run_delalloc_nocow(), for an inode marked as NODATACOW or an extent marked as
PREALLOC, we also need to update bytes_may_use, but cannot pass
EXTENT_DO_ACCOUNTING, because it also clears the metadata reservation. So
here we introduce the EXTENT_CLEAR_DATA_RESV flag to tell btrfs_clear_bit_hook()
to update btrfs_space_info's bytes_may_use.

Meanwhile, __btrfs_prealloc_file_range() now calls
btrfs_free_reserved_data_space() internally for both the successful and failed
paths, so btrfs_prealloc_file_range()'s callers do not need to call
btrfs_free_reserved_data_space() any more.

Signed-off-by: Wang Xiaoguang
Reviewed-by: Josef Bacik
Signed-off-by: David Sterba
Signed-off-by: Chris Mason

Wang Xiaoguang
2016-08-25 18:58:26 +0800

26 Jul, 2016

2 commits

66642832f btrfs: btrfs_abort_transaction, drop root parameter

__btrfs_abort_transaction doesn't use its root parameter except to
obtain an fs_info pointer. We can obtain that from trans->root->fs_info
for now and from trans->fs_info in a later patch.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba

Jeff Mahoney
2016-07-26 19:54:26 +0800

3cdde2240 btrfs: btrfs_test_opt and friends should take a btrfs_fs_info

btrfs_test_opt and friends only use the root pointer to access
the fs_info. Let's pass the fs_info directly in preparation to
eliminate similar patterns all over btrfs.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba

Jeff Mahoney
2016-07-26 19:53:16 +0800

05 Apr, 2016

1 commit

09cbfeaf1 mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
ago with the promise that one day it would be possible to implement the page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized, and it is unlikely it ever will.

We have many places where PAGE_CACHE_SIZE is assumed to be equal to
PAGE_SIZE. And it's a constant source of confusion on whether a
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
breakage to be doable.

Let's stop pretending that pages in page cache are special. They are
not.

The changes are pretty straight-forward:

- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> E;

- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> E;

- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

- page_cache_get() -> get_page();

- page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using the
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is a revert of the changes to the
PAGE_CACHE_ALIGN definition: we are going to drop it later.

There are a few places in the code that coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation will also
be addressed in a separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
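
For readers without coccinelle at hand, the same mapping can be approximated with plain regular expressions. This is only a rough sketch: drop_page_cache_macros() is a hypothetical helper, and unlike the semantic patch above it does not parse C, so it would also touch comments and strings.

```python
import re


def drop_page_cache_macros(source: str) -> str:
    """Approximate the semantic patch above on a string of C source."""
    # A shift by (PAGE_CACHE_SHIFT - PAGE_SHIFT) is a shift by zero: drop it.
    # This must run before PAGE_CACHE_SHIFT itself is renamed.
    source = re.sub(
        r"\s*(?:<<|>>)\s*\(\s*PAGE_CACHE_SHIFT\s*-\s*PAGE_SHIFT\s*\)",
        "",
        source,
    )
    # PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}
    source = re.sub(r"\bPAGE_CACHE_(SIZE|SHIFT|MASK|ALIGN)\b", r"PAGE_\1", source)
    # page_cache_get() -> get_page(), page_cache_release() -> put_page()
    source = re.sub(r"\bpage_cache_get\b", "get_page", source)
    source = re.sub(r"\bpage_cache_release\b", "put_page", source)
    return source
```

For example, drop_page_cache_macros("page_cache_release(page);") yields "put_page(page);".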

Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2016-04-05 01:41:08 +0800

12 Mar, 2016

1 commit

3c1d84b71 Btrfs: Show a warning message if one of objectid reaches its highest value

It's better to show a warning message for the exceptional case
that an objectid (in most cases, an inode number) reaches its
highest value. For example, if the inode cache is off and this event
happens, we can't create any file even if there are not so many files.
This message eases detecting such a problem.

Signed-off-by: Satoru Takeuchi
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Satoru Takeuchi
2016-03-12 00:12:35 +0800

20 Jan, 2016

1 commit

326f78428 Merge branch 'misc-for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kda…

…ve/linux into for-linus-4.5

Chris Mason
2016-01-20 10:21:30 +0800

16 Jan, 2016

1 commit

f32e48e92 Btrfs: Initialize btrfs_root->highest_objectid when loading tree root and subvolume roots

The following call trace is seen when btrfs/031 test is executed in a loop,

[ 158.661848] ------------[ cut here ]------------
[ 158.662634] WARNING: CPU: 2 PID: 890 at /home/chandan/repos/linux/fs/btrfs/ioctl.c:558 create_subvol+0x3d1/0x6ea()
[ 158.664102] BTRFS: Transaction aborted (error -2)
[ 158.664774] Modules linked in:
[ 158.665266] CPU: 2 PID: 890 Comm: btrfs Not tainted 4.4.0-rc6-g511711a #2
[ 158.666251] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 158.667392] ffffffff81c0a6b0 ffff8806c7c4f8e8 ffffffff81431fc8 ffff8806c7c4f930
[ 158.668515] ffff8806c7c4f920 ffffffff81051aa1 ffff880c85aff000 ffff8800bb44d000
[ 158.669647] ffff8808863b5c98 0000000000000000 00000000fffffffe ffff8806c7c4f980
[ 158.670769] Call Trace:
[ 158.671153] [] dump_stack+0x44/0x5c
[ 158.671884] [] warn_slowpath_common+0x81/0xc0
[ 158.672769] [] warn_slowpath_fmt+0x47/0x50
[ 158.673620] [] create_subvol+0x3d1/0x6ea
[ 158.674440] [] btrfs_mksubvol.isra.30+0x369/0x520
[ 158.675376] [] ? percpu_down_read+0x1a/0x50
[ 158.676235] [] btrfs_ioctl_snap_create_transid+0x101/0x180
[ 158.677268] [] btrfs_ioctl_snap_create+0x52/0x70
[ 158.678183] [] btrfs_ioctl+0x474/0x2f90
[ 158.678975] [] ? vma_merge+0xee/0x300
[ 158.679751] [] ? alloc_pages_vma+0x91/0x170
[ 158.680599] [] ? lru_cache_add_active_or_unevictable+0x22/0x70
[ 158.681686] [] ? selinux_file_ioctl+0xff/0x1d0
[ 158.682581] [] do_vfs_ioctl+0x2c1/0x490
[ 158.683399] [] ? security_file_ioctl+0x3e/0x60
[ 158.684297] [] SyS_ioctl+0x74/0x80
[ 158.685051] [] entry_SYSCALL_64_fastpath+0x12/0x6a
[ 158.685958] ---[ end trace 4b63312de5a2cb76 ]---
[ 158.686647] BTRFS: error (device loop0) in create_subvol:558: errno=-2 No such entry
[ 158.709508] BTRFS info (device loop0): forced readonly
[ 158.737113] BTRFS info (device loop0): disk space caching is enabled
[ 158.738096] BTRFS error (device loop0): Remounting read-write after error is not allowed
[ 158.851303] BTRFS error (device loop0): cleaner transaction attach returned -30

This occurs because,

Mount filesystem
Create subvol with ID 257
Unmount filesystem
Mount filesystem
Delete subvol with ID 257
btrfs_drop_snapshot()
Add root corresponding to subvol 257 into
btrfs_transaction->dropped_roots list
Create new subvol (i.e. create_subvol())
257 is returned as the next free objectid
btrfs_read_fs_root_no_name()
Finds the btrfs_root instance corresponding to the old subvol with ID 257
in btrfs_fs_info->fs_roots_radix.
Returns error since btrfs_root_item->refs has the value of 0.

To fix the issue the commit initializes tree root's and subvolume root's
highest_objectid when loading the roots from disk.

Signed-off-by: Chandan Rajendra
Signed-off-by: David Sterba

Chandan Rajendra
2016-01-16 02:25:02 +0800

11 Jan, 2016

1 commit

b28cf5724 Merge branch 'misc-cleanups-4.5' of git://git.kernel.org/pub/scm/linux/kernel/gi…

…t/kdave/linux into for-linus-4.5

Signed-off-by: Chris Mason <clm@fb.com>

Chris Mason
2016-01-11 22:08:37 +0800

07 Jan, 2016

3 commits

e4058b54d btrfs: cleanup, use enum values for btrfs_path reada

Replace the integers by enums for better readability. The value 2 does
not have any meaning since a717531942f488209dded30f6bc648167bcefa72
"Btrfs: do less aggressive btree readahead" (2009-01-22).

Signed-off-by: David Sterba

David Sterba
2016-01-07 22:01:15 +0800

20e5506ba btrfs: constify remaining structs with function pointers

* struct extent_io_ops
* struct btrfs_free_space_op

Signed-off-by: David Sterba

David Sterba
2016-01-07 22:01:14 +0800

ee22184b5 Btrfs: use linux/sizes.h to represent constants

We use many constants to represent size and offset values. To make the
code readable we use '256 * 1024 * 1024' instead of '268435456' to
represent '256MB'. However, we can make it far more readable with 'SZ_256MB',
which is defined in 'linux/sizes.h'.

So this patch replaces 'xxx * 1024 * 1024' kinds of expressions with a
single 'SZ_xxxMB' if 'xxx' is a power of 2, and with 'xxx * SZ_1M' if 'xxx' is
not a power of 2. I haven't touched '4096' & '8192' because they are
more intuitive than 'SZ_4KB' & 'SZ_8KB'.
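
The replacement rule can be written down as a small helper. This is a sketch only: size_constant() is a hypothetical function, not part of the patch. Note that the macros in linux/sizes.h are spelled without the trailing "B" (SZ_1M, SZ_256M), and the sketch assumes that spelling; it is limited to sizes below 1 GiB, where the header switches to SZ_1G.

```python
def size_constant(mib: int) -> str:
    """Return the expression that would replace 'mib * 1024 * 1024'."""
    if not 0 < mib < 1024:
        raise ValueError("this sketch only handles 1..1023 MiB")
    is_power_of_two = (mib & (mib - 1)) == 0
    if is_power_of_two:
        return f"SZ_{mib}M"       # e.g. 256 * 1024 * 1024 -> SZ_256M
    return f"{mib} * SZ_1M"       # e.g. 100 * 1024 * 1024 -> 100 * SZ_1M
```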

Signed-off-by: Byongho Lee
Signed-off-by: David Sterba

Byongho Lee
2016-01-07 21:38:02 +0800

22 Oct, 2015

2 commits

7cf5b9765 btrfs: qgroup: Cleanup old inaccurate facilities

Clean up the old facilities which use the old btrfs_qgroup_reserve() function
call, replace them with the newer version, and remove the "__" prefix from
them.

Also, make the btrfs_qgroup_reserve/free() functions private, as they are
now only used inside the qgroup code.

Now, the whole btrfs qgroup is switched to use the new reserve facilities.

Signed-off-by: Qu Wenruo
Signed-off-by: Chris Mason

Qu Wenruo
2015-10-22 09:41:06 +0800

df480633b btrfs: extent-tree: Switch to new delalloc space reserve and release

Use new __btrfs_delalloc_reserve_space() and
__btrfs_delalloc_release_space() to reserve and release space for
delalloc.

Signed-off-by: Qu Wenruo
Signed-off-by: Chris Mason

Qu Wenruo
2015-10-22 09:41:05 +0800

01 Jul, 2015

2 commits

ae9d8f171 Btrfs: fix race between caching kthread and returning inode to inode cache

While the inode cache caching kthread is calling btrfs_unpin_free_ino(),
we could have a concurrent call to btrfs_return_ino() that adds a new
entry to the root's free space cache of pinned inodes. This concurrent
call does not acquire the fs_info->commit_root_sem before adding a new
entry if the caching state is BTRFS_CACHE_FINISHED, which is a problem
because the caching kthread calls btrfs_unpin_free_ino() after setting
the caching state to BTRFS_CACHE_FINISHED and therefore races with
the task calling btrfs_return_ino(), which is adding a new entry, while
the former (caching kthread) is navigating the cache's rbtree, removing
and freeing nodes from the cache's rbtree without acquiring the spinlock
that protects the rbtree.

This race resulted in memory corruption due to double free of struct
btrfs_free_space objects because both tasks can end up freeing the
same objects. Note that adding a new entry can result in merging it with
other entries in the cache, in which case those entries are freed.
This is particularly important as btrfs_free_space structures are also
used for the block group free space caches.

This memory corruption can be detected by a debugging kernel, which
reports it with the following trace:

[132408.501148] slab error in verify_redzone_free(): cache `btrfs_free_space': double free detected
[132408.505075] CPU: 15 PID: 12248 Comm: btrfs-ino-cache Tainted: G W 4.1.0-rc5-btrfs-next-10+ #1
[132408.505075] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[132408.505075] ffff880023e7d320 ffff880163d73cd8 ffffffff8145eec7 ffffffff81095dce
[132408.505075] ffff880009735d40 ffff880163d73ce8 ffffffff81154e1e ffff880163d73d68
[132408.505075] ffffffff81155733 ffffffffa054a95a ffff8801b6099f00 ffffffffa0505b5f
[132408.505075] Call Trace:
[132408.505075] [] dump_stack+0x4f/0x7b
[132408.505075] [] ? console_unlock+0x356/0x3a2
[132408.505075] [] __slab_error.isra.28+0x25/0x36
[132408.505075] [] __cache_free+0xe2/0x4b6
[132408.505075] [] ? __btrfs_add_free_space+0x2f0/0x343 [btrfs]
[132408.505075] [] ? btrfs_unpin_free_ino+0x8e/0x99 [btrfs]
[132408.505075] [] ? time_hardirqs_off+0x15/0x28
[132408.505075] [] ? trace_hardirqs_off+0xd/0xf
[132408.505075] [] ? kfree+0xb6/0x14e
[132408.505075] [] kfree+0xe5/0x14e
[132408.505075] [] btrfs_unpin_free_ino+0x8e/0x99 [btrfs]
[132408.505075] [] caching_kthread+0x29e/0x2d9 [btrfs]
[132408.505075] [] ? btrfs_unpin_free_ino+0x99/0x99 [btrfs]
[132408.505075] [] kthread+0xef/0xf7
[132408.505075] [] ? time_hardirqs_on+0x15/0x28
[132408.505075] [] ? __kthread_parkme+0xad/0xad
[132408.505075] [] ret_from_fork+0x42/0x70
[132408.505075] [] ? __kthread_parkme+0xad/0xad
[132408.505075] ffff880023e7d320: redzone 1:0x9f911029d74e35b, redzone 2:0x9f911029d74e35b.
[132409.501654] slab: double free detected in cache 'btrfs_free_space', objp ffff880023e7d320
[132409.503355] ------------[ cut here ]------------
[132409.504241] kernel BUG at mm/slab.c:2571!

Therefore fix this by having btrfs_unpin_free_ino() acquire the lock
that protects the rbtree while doing the searches and removing entries.

Fixes: 1c70d8fb4dfa ("Btrfs: fix inode caching vs tree log")
Cc: stable@vger.kernel.org
Signed-off-by: Filipe Manana
Signed-off-by: Chris Mason

Filipe Manana
2015-07-01 05:36:46 +0800

c3f4a1685 Btrfs: use kmem_cache_free when freeing entry in inode cache

The free space entries are allocated using kmem_cache_zalloc(),
through __btrfs_add_free_space(), therefore we should use
kmem_cache_free() and not kfree() to avoid any confusion and
any potential problem. Looking at the kfree() definition at
mm/slab.c it has the following comment:

/*
 * (...)
 *
 * Don't free memory not originally allocated by kmalloc()
 * or you will run into trouble.
 */

So better be safe and use kmem_cache_free().

Cc: stable@vger.kernel.org
Signed-off-by: Filipe Manana
Reviewed-by: David Sterba
Signed-off-by: Chris Mason

Filipe Manana
2015-07-01 05:36:46 +0800

11 Apr, 2015

1 commit

1bbc621ef Btrfs: allow block group cache writeout outside critical section in commit

We loop through all of the dirty block groups during commit and write
the free space cache. In order to make sure the cache is correct, we do
this while no other writers are allowed in the commit.

If a large number of block groups are dirty, this can introduce long
stalls during the final stages of the commit, which can block new procs
trying to change the filesystem.

This commit changes the block group cache writeout to take appropriate
locks and allow it to run earlier in the commit. We'll still have to
redo some of the block groups, but it means we can get most of the work
out of the way without blocking the entire FS.

Signed-off-by: Chris Mason

Chris Mason
2015-04-11 05:07:22 +0800

03 Dec, 2014

1 commit

55507ce36 Btrfs: fix race between writing free space cache and trimming

Trimming is completely transactionless, and the way it operates consists
of hiding free space entries from a block group, performing the trim/discard
and then making the free space entries visible again.
Therefore, while a free space entry is being trimmed, we can have free space
cache writing running in parallel (as part of a transaction commit) which
will miss the free space entry. This means that if we unmount (or
crash/reboot) after that transaction commit, before another transaction
starts/commits once the discard has finished, and then mount again, we will
have some free space that won't be used again unless the free space cache is
rebuilt. After the unmount, fsck (btrfsck, btrfs check) reports the issue
like the following example:

*** fsck.btrfs output ***
checking extents
checking free space cache
There is no free space entry for 521764864-521781248
There is no free space entry for 521764864-1103101952
cache appears valid but isnt 29360128
Checking filesystem on /dev/sdc
UUID: b4789e27-4774-4626-98e9-ae8dfbfb0fb5
found 1235681286 bytes used err is -22
(...)

Another issue caused by this race is a crash while writing bitmap entries
to the cache, because while the cache writeout task accesses the bitmaps,
the trim task can be concurrently modifying the bitmap or, worse, might
be freeing the bitmap. The latter case results in the following crash:

[55650.804460] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
[55650.804835] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc loop parport_pc parport i2c_piix4 psmouse evdev pcspkr microcode processor i2ccore serio_raw thermal_sys button ext4 crc16 jbd2 mbcache sg sd_mod crc_t10dif sr_mod cdrom crct10dif_generic crct10dif_common ata_generic virtio_scsi floppy ata_piix libata virtio_pci virtio_ring virtio scsi_mod e1000 [last unloaded: btrfs]
[55650.806169] CPU: 1 PID: 31002 Comm: btrfs-transacti Tainted: G W 3.17.0-rc5-btrfs-next-1+ #1
[55650.806493] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[55650.806867] task: ffff8800b12f6410 ti: ffff880071538000 task.ti: ffff880071538000
[55650.807166] RIP: 0010:[] [] write_bitmap_entries+0x65/0xbb [btrfs]
[55650.807514] RSP: 0018:ffff88007153bc30 EFLAGS: 00010246
[55650.807687] RAX: 000000005d1ec000 RBX: ffff8800a665df08 RCX: 0000000000000400
[55650.807885] RDX: ffff88005d1ec000 RSI: 6b6b6b6b6b6b6b6b RDI: ffff88005d1ec000
[55650.808017] RBP: ffff88007153bc58 R08: 00000000ddd51536 R09: 00000000000001e0
[55650.808017] R10: 0000000000000000 R11: 0000000000000037 R12: 6b6b6b6b6b6b6b6b
[55650.808017] R13: ffff88007153bca8 R14: 6b6b6b6b6b6b6b6b R15: ffff88007153bc98
[55650.808017] FS: 0000000000000000(0000) GS:ffff88023ec80000(0000) knlGS:0000000000000000
[55650.808017] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[55650.808017] CR2: 0000000002273b88 CR3: 00000000b18f6000 CR4: 00000000000006e0
[55650.808017] Stack:
[55650.808017] ffff88020e834e00 ffff880172d68db0 0000000000000000 ffff88019257c800
[55650.808017] ffff8801d42ea720 ffff88007153bd10 ffffffffa037d2fa ffff880224e99180
[55650.808017] ffff8801469a6188 ffff880224e99140 ffff880172d68c50 00000003000000b7
[55650.808017] Call Trace:
[55650.808017] [] __btrfs_write_out_cache+0x1ea/0x37f [btrfs]
[55650.808017] [] btrfs_write_out_cache+0xa1/0xd8 [btrfs]
[55650.808017] [] btrfs_write_dirty_block_groups+0x4b5/0x505 [btrfs]
[55650.808017] [] commit_cowonly_roots+0x15e/0x1f7 [btrfs]
[55650.808017] [] ? _raw_spin_lock+0xe/0x10
[55650.808017] [] btrfs_commit_transaction+0x411/0x882 [btrfs]
[55650.808017] [] transaction_kthread+0xf2/0x1a4 [btrfs]
[55650.808017] [] ? btrfs_cleanup_transaction+0x3d8/0x3d8 [btrfs]
[55650.808017] [] kthread+0xb7/0xbf
[55650.808017] [] ? __kthread_parkme+0x67/0x67
[55650.808017] [] ret_from_fork+0x7c/0xb0
[55650.808017] [] ? __kthread_parkme+0x67/0x67
[55650.808017] Code: 4c 89 ef 8d 70 ff e8 d4 fc ff ff 41 8b 45 34 41 39 45 30 7d 5c 31 f6 4c 89 ef e8 80 f6 ff ff 49 8b 7d 00 4c 89 f6 b9 00 04 00 00 a5 4c 89 ef 41 8b 45 30 8d 70 ff e8 a3 fc ff ff 41 8b 45 34
[55650.808017] RIP [] write_bitmap_entries+0x65/0xbb [btrfs]
[55650.808017] RSP
[55650.815725] ---[ end trace 1c032e96b149ff86 ]---

Fix this by serializing both tasks in such a way that cache writeout
doesn't wait for the trim/discard of free space entries to finish and
doesn't miss any free space entry.
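
One way to picture "doesn't wait and doesn't miss" is a toy model in which ranges under trim are remembered on the side and included in the writeout. This is an assumption made for illustration, not a description of the actual patch: BlockGroupModel, its trimming set and the track flag are hypothetical.

```python
class BlockGroupModel:
    """Toy model of a block group's free space entries (not kernel code)."""

    def __init__(self, entries):
        self.free_entries = set(entries)   # visible (offset, length) entries
        self.trimming = set()              # entries hidden while being discarded

    def start_trim(self, entry, track):
        # Trimming hides the entry from the block group.
        self.free_entries.remove(entry)
        if track:
            # Remember it so that a concurrent writeout can still see it.
            self.trimming.add(entry)

    def finish_trim(self, entry):
        self.trimming.discard(entry)
        self.free_entries.add(entry)

    def write_out_cache(self):
        # Writeout does not wait for the trim; it also records tracked ranges.
        return sorted(self.free_entries | self.trimming)


def cache_written_during_trim(track):
    bg = BlockGroupModel([(0, 4096), (8192, 4096)])
    bg.start_trim((8192, 4096), track)
    written = bg.write_out_cache()         # runs while the trim is in flight
    bg.finish_trim((8192, 4096))
    return written
```

Without tracking, the cache written during the trim lacks the (8192, 4096) entry, which matches the "There is no free space entry" symptom above. With tracking, both entries are written.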

Signed-off-by: Filipe Manana
Signed-off-by: Chris Mason

Filipe Manana
2014-12-03 10:35:09 +0800

12 Nov, 2014

1 commit

7e1876aca btrfs: switch inode_cache option handling to pending changes

The pending mount option(s) now share namespace and bits with the normal
options, and the existing one for (inode_cache) is unset unconditionally
at each transaction commit.

Introduce a separate namespace for pending changes and enhance the
descriptions of the intended change to use separate bits for each
action.

Signed-off-by: David Sterba

David Sterba
2014-11-12 23:53:13 +0800

18 Sep, 2014

1 commit

57cdc8db2 btrfs: cleanup ino cache members of btrfs_root

The naming is confusing, generic yet used for a specific cache. Add a
prefix 'ino_' or rename appropriately.

Signed-off-by: David Sterba
Signed-off-by: Chris Mason

David Sterba
2014-09-18 04:37:09 +0800

10 Jun, 2014

1 commit

67a77eb14 btrfs: remove newline from inode cache kthread name

Signed-off-by: David Sterba
Signed-off-by: Chris Mason

David Sterba
2014-06-10 08:20:53 +0800

25 Apr, 2014

2 commits

1c70d8fb4 Btrfs: fix inode caching vs tree log

Currently, with the inode cache enabled, we reuse an inode id immediately
after unlinking its file, so we may hit something like the following:

|->iput inode
|->return inode id into inode cache
|->create dir,fsync
|->power off

An easy way to reproduce this problem is:

mkfs.btrfs -f /dev/sdb
mount /dev/sdb /mnt -o inode_cache,commit=100
dd if=/dev/zero of=/mnt/data bs=1M count=10 oflag=sync
inode_id=`ls -i /mnt/data | awk '{print $1}'`
rm -f /mnt/data

i=1
while [ 1 ]
do
    mkdir /mnt/dir_$i
    test1=`stat /mnt/dir_$i | grep Inode: | awk '{print $4}'`
    if [ $test1 -eq $inode_id ]
    then
        dd if=/dev/zero of=/mnt/dir_$i/data bs=1M count=1 oflag=sync
        echo b > /proc/sysrq-trigger
    fi
    sleep 1
    i=$(($i+1))
done

mount /dev/sdb /mnt
umount /dev/sdb
btrfs check /dev/sdb

We fix this problem by adding unlinked inode's id into pinned tree,
and we can not reuse them until committing transaction.
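
The pinning rule can be sketched with a toy allocator. This is not kernel code: InodeCacheModel and its methods are hypothetical names, and the ids 257, 300 and 301 are arbitrary illustrative values.

```python
class InodeCacheModel:
    """Toy model of an inode id cache with a pinned list (not kernel code)."""

    def __init__(self, free_ids):
        self.free = sorted(free_ids)   # ids that may be handed out now
        self.pinned = []               # ids held back until the next commit

    def return_ino(self, ino, pin_until_commit):
        if pin_until_commit:
            self.pinned.append(ino)
        else:
            self.free = sorted(self.free + [ino])

    def alloc_ino(self):
        return self.free.pop(0)        # lowest free id first

    def commit_transaction(self):
        # Only a committed transaction makes unlinked ids reusable.
        self.free = sorted(self.free + self.pinned)
        self.pinned = []


def ids_allocated_after_unlink(pin_until_commit):
    cache = InodeCacheModel(free_ids=[300, 301])
    cache.return_ino(257, pin_until_commit)    # unlink of inode 257
    before_commit = cache.alloc_ino()
    cache.commit_transaction()
    after_commit = cache.alloc_ino()
    return before_commit, after_commit
```

Without pinning, id 257 is handed out again before the commit, which is the situation the reproducer above waits for. With pinning it only becomes available after the commit.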

Cc: stable@vger.kernel.org
Signed-off-by: Miao Xie
Signed-off-by: Wang Shilong
Signed-off-by: Chris Mason

Miao Xie
2014-04-25 07:43:33 +0800

e60efa842 Btrfs: avoid triggering bug_on() when we fail to start inode caching task

When running a stress test (including snapshots, balance, fstress), we trigger
the following BUG_ON(), which is because we fail to start the inode caching task.

[ 181.131945] kernel BUG at fs/btrfs/inode-map.c:179!
[ 181.137963] invalid opcode: 0000 [#1] SMP
[ 181.217096] CPU: 11 PID: 2532 Comm: btrfs Not tainted 3.14.0 #1
[ 181.240521] task: ffff88013b621b30 ti: ffff8800b6ada000 task.ti: ffff8800b6ada000
[ 181.367506] Call Trace:
[ 181.371107] [] btrfs_return_ino+0x9e/0x110 [btrfs]
[ 181.379191] [] btrfs_evict_inode+0x46b/0x4c0 [btrfs]
[ 181.387464] [] ? autoremove_wake_function+0x40/0x40
[ 181.395642] [] evict+0x9e/0x190
[ 181.401882] [] iput+0xf3/0x180
[ 181.408025] [] btrfs_orphan_cleanup+0x1ee/0x430 [btrfs]
[ 181.416614] [] btrfs_mksubvol.isra.29+0x3bd/0x450 [btrfs]
[ 181.425399] [] btrfs_ioctl_snap_create_transid+0x186/0x190 [btrfs]
[ 181.435059] [] btrfs_ioctl_snap_create_v2+0xeb/0x130 [btrfs]
[ 181.444148] [] btrfs_ioctl+0xf76/0x2b90 [btrfs]
[ 181.451971] [] ? handle_mm_fault+0x475/0xe80
[ 181.459509] [] ? __do_page_fault+0x1ec/0x520
[ 181.467046] [] ? do_mmap_pgoff+0x2f5/0x3c0
[ 181.474393] [] do_vfs_ioctl+0x2d8/0x4b0
[ 181.481450] [] SyS_ioctl+0x81/0xa0
[ 181.488021] [] system_call_fastpath+0x16/0x1b

We should avoid triggering BUG_ON() here; instead, we output warning messages
and clear the inode_cache option.

Signed-off-by: Wang Shilong
Signed-off-by: Chris Mason

Wang Shilong
2014-04-25 07:43:32 +0800

07 Apr, 2014

1 commit

9e351cc86 Btrfs: remove transaction from send

Let's try this again. We can deadlock the box if we send on a box and try to
write onto the same fs with the app that is trying to listen to the send pipe.
This is because the writer could get stuck waiting for a transaction commit
which is being blocked by the send. So fix this by making sure looking at the
commit roots is always going to be consistent. We do this by keeping track of
which roots need to have their commit roots swapped during commit, and then
taking the commit_root_sem and swapping them all at once. Then make sure we
take a read lock on the commit_root_sem in cases where we search the commit root
to make sure we're always looking at a consistent view of the commit roots.
Previously we had problems with this because we would swap a fs tree commit root
and then swap the extent tree commit root independently which would cause the
backref walking code to screw up sometimes. With this patch we no longer
deadlock and pass all the weird send/receive corner cases. Thanks,

Reported-by: Hugo Mills
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2014-04-07 08:39:30 +0800

12 Nov, 2013

5 commits

fae7f21ce btrfs: Use WARN_ON()'s return value in place of WARN_ON(1)

Use WARN_ON()'s return value in place of WARN_ON(1) for cleaner source
code that outputs more descriptive warnings. Also fix the styling
warning about redundant braces that came up as a result of this fix.

Signed-off-by: Dulshani Gunawardhana
Reviewed-by: Zach Brown
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Dulshani Gunawardhana
2013-11-12 11:11:53 +0800

ff76b0565 Btrfs: Don't allocate inode that is already in use

Due to an off-by-one error, it is possible to reproduce a bug
when the inode cache is used.

The same inode number is assigned twice, the second time this
leads to an EEXIST in btrfs_insert_empty_items().

The issue can happen when a file is removed right after a subvolume
is created and then a new inode number is created before the
inodes in free_inode_pinned are processed.
unlink() calls btrfs_return_ino() which calls start_caching() in this
case which adds [highest_ino + 1, BTRFS_LAST_FREE_OBJECTID] by
searching for the highest inode (which already cannot find the
unlinked one anymore in btrfs_find_free_objectid()). So if this
unlinked inode's number is equal to the highest_ino + 1 (or >= this value
instead of > this value which was the off-by-one error), we mustn't add
the inode number to free_ino_pinned (caching_thread() does it right).
In this case we need to try directly to add the number to the inode_cache
which will fail in this case.

When this inode number is allocated while it is still in free_ino_pinned,
it is allocated and still added to the free inode cache when the
pinned inodes are processed, thus one of the following inode number
allocations will get an inode that is already in use and fail with EEXIST
in btrfs_insert_empty_items().

One example which was created with the reproducer below:
Create a snapshot, work in the newly created snapshot for the rest.
In unlink(inode 34284) call btrfs_return_ino() which calls start_caching().
start_caching() calls add_free_space [34284, 18446744073709517077].
In btrfs_return_ino(), call start_caching pinned [34284, 1] which is wrong.
mkdir() calls btrfs_find_ino_for_alloc(), which returns the number 34284.
btrfs_unpin_free_ino() calls add_free_space [34284, 1].
mkdir() calls btrfs_find_ino_for_alloc(), which returns the number 34284.
EEXIST when the new inode is inserted.

One possible reproducer is this one:
#!/bin/sh
# preparation
TEST_DEV=/dev/sdc1
TEST_MNT=/mnt
umount ${TEST_MNT} 2>/dev/null || true
mkfs.btrfs -f ${TEST_DEV}
mount ${TEST_DEV} ${TEST_MNT} -o \
rw,relatime,compress=lzo,space_cache,inode_cache
btrfs subv create ${TEST_MNT}/s1
for i in `seq 34027`; do touch ${TEST_MNT}/s1/${i}; done
btrfs subv snap ${TEST_MNT}/s1 ${TEST_MNT}/s2
FILENAME=`find ${TEST_MNT}/s1/ -inum 4085 | sed 's|^.*/\([^/]*\)$|\1|'`
rm ${TEST_MNT}/s2/$FILENAME
touch ${TEST_MNT}/s2/$FILENAME
# the following steps can be repeated to reproduce the issue again and again
[ -e ${TEST_MNT}/s3 ] && btrfs subv del ${TEST_MNT}/s3
btrfs subv snap ${TEST_MNT}/s2 ${TEST_MNT}/s3
rm ${TEST_MNT}/s3/$FILENAME
touch ${TEST_MNT}/s3/$FILENAME
ls -alFi ${TEST_MNT}/s?/$FILENAME
touch ${TEST_MNT}/s3/_1 || logger FAILED
ls -alFi ${TEST_MNT}/s?/_1
touch ${TEST_MNT}/s3/_2 || logger FAILED
ls -alFi ${TEST_MNT}/s?/_2
touch ${TEST_MNT}/s3/__1 || logger FAILED
ls -alFi ${TEST_MNT}/s?/__1
touch ${TEST_MNT}/s3/__2 || logger FAILED
ls -alFi ${TEST_MNT}/s?/__2
# if the above is not enough, add the following loop:
for i in `seq 3 9`; do touch ${TEST_MNT}/s3/__${i} || logger FAILED; done
#for i in `seq 3 34027`; do touch ${TEST_MNT}/s3/__${i} || logger FAILED; done
# one of the touch(1) calls in s3 fails with EEXIST because the inode number
# that btrfs_find_ino_for_alloc() returns is already in use.

Signed-off-by: Stefan Behrens
Reviewed-by: Jan Schmidt
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Stefan Behrens
2013-11-12 11:02:36 +0800
745143239 Btrfs: remove path arg from btrfs_truncate_free_space_cache

The path argument is not used for anything, and removing it spares callers
from allocating a path structure.

Signed-off-by: Filipe David Borba Manana
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Filipe David Borba Manana
2013-11-12 10:51:33 +0800
53645a91f Btrfs: remove duplicated ino cache's inode lookup

We're doing an unnecessary extra lookup of the ino cache's
inode when we already have it (and hold a reference to it)
during the process of saving the ino cache contents to disk.
Therefore remove this extra lookup.

Signed-off-by: Filipe David Borba Manana
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Filipe David Borba Manana
2013-11-12 10:51:24 +0800
69e9c6c6d Btrfs: eliminate the exceptional root_tree refs=0

The fact that btrfs_root_refs() returned 0 for the tree_root caused
bugs in the past, therefore it is set to 1 with this patch and
(hopefully) all affected code is adapted to this change.

I verified this change by temporarily adding WARN_ON() checks
everywhere where btrfs_root_refs() is used, checking whether the
logic of the code is changed by btrfs_root_refs() returning 1
instead of 0 for root->root_key.objectid == BTRFS_ROOT_TREE_OBJECTID.
With these added checks, I ran the xfstests './check -g auto'.

The two roots chunk_root and log_root_tree that are only referenced
by the superblock and the log_roots below the log_root_tree still
have btrfs_root_refs() == 0; only the tree_root is changed.

Signed-off-by: Stefan Behrens
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Stefan Behrens
2013-11-12 10:49:26 +0800

18 May, 2013

2 commits

7b61cd922 Btrfs: don't use global block reservation for inode cache truncation

A filesystem is very likely to contain many subvolumes/snapshots, so if we use
the global block reservation for inode cache truncation, we may hog all the
free space reserved in the global rsv. It is better to reserve the space for
inode cache truncation ourselves.

Cc: Tsutomu Itoh
Signed-off-by: Miao Xie
Signed-off-by: Josef Bacik

Miao Xie
2013-05-18 09:40:22 +0800
7cfa9e51d Btrfs: don't abort the current transaction if there is no enough space for inode cache

A filesystem mounted with inode_cache was forced read-only when we unmounted it.

Steps to reproduce:
# mkfs.btrfs -f ${DEV}
# mount -o inode_cache ${DEV} ${MNT}
# dd if=/dev/zero of=${MNT}/file1 bs=1M count=8192
# btrfs fi syn ${MNT}
# dd if=${MNT}/file1 of=/dev/null bs=1M
# rm -f ${MNT}/file1
# btrfs fi syn ${MNT}
# umount ${MNT}

This happened because there was not enough space to do the inode cache
truncation, so we aborted the current transaction.

But an out-of-space error is not a serious problem when we write out the inode
cache, and it is safe to simply skip this step when we hit it. So we need
not abort the current transaction.
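
The policy described above can be sketched as a small decision helper. This
is an illustrative C sketch only: the function name is hypothetical, and the
assumption that errors other than ENOSPC still abort the transaction is not
stated in the commit message.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether a failure while preparing the inode
 * cache truncation should abort the current transaction.
 *
 * ret follows the kernel convention: 0 on success, negative errno on error.
 */
static inline bool should_abort_on_truncate_error(int ret)
{
    if (ret == 0)
        return false;  /* nothing went wrong */
    if (ret == -ENOSPC)
        return false;  /* out of space: skip writing the inode cache instead */
    return true;       /* assumption: any other error is still fatal */
}
```

A caller would skip writing out the inode cache when ret is -ENOSPC and
carry on with the rest of the commit.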

Reported-by: Tsutomu Itoh
Signed-off-by: Miao Xie
Tested-by: Tsutomu Itoh
Signed-off-by: Josef Bacik

Miao Xie
2013-05-18 09:40:21 +0800

12 Dec, 2012

1 commit

08e007d2e Btrfs: improve the noflush reservation

In some places (such as when evicting an inode), we cannot flush the reserved
space of delalloc, but flushing the delayed directory index and delayed inode
is OK. However, we did not try to flush those either and simply returned when
there was not enough space to reserve. This patch fixes that problem.

We define 3 types of flush operation: NO_FLUSH, FLUSH_LIMIT and FLUSH_ALL.
If we are in a transaction, we must not flush anything, or a deadlock
would happen, so use NO_FLUSH. If flushing the reserved space of delalloc
would cause a deadlock, use FLUSH_LIMIT. In all other cases, FLUSH_ALL is used,
and we flush everything.
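
The selection rule above can be written down as a small C sketch. The enum
values use the names from this message (the kernel's actual identifiers may
differ, for example by carrying a prefix), and pick_flush() with its two
boolean parameters is a hypothetical helper for illustration, not a function
from the patch.

```c
#include <stdbool.h>

/* The three flush types named in the commit message. */
enum reserve_flush {
    NO_FLUSH,    /* never flush: required while inside a transaction */
    FLUSH_LIMIT, /* flush, but leave the delalloc reservation alone */
    FLUSH_ALL,   /* flush everything that can free reserved space */
};

/* Hypothetical helper: choose a flush type from two facts about the caller. */
static inline enum reserve_flush pick_flush(bool in_transaction,
                                            bool delalloc_flush_may_deadlock)
{
    if (in_transaction)
        return NO_FLUSH;     /* flushing here would deadlock */
    if (delalloc_flush_may_deadlock)
        return FLUSH_LIMIT;  /* e.g. while evicting an inode */
    return FLUSH_ALL;
}
```

The transaction check comes first because it is the strictest case: inside a
transaction nothing may be flushed, whatever the second flag says.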

Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Miao Xie
2012-12-12 02:31:31 +0800

29 Mar, 2012

1 commit

2bcc0328c Btrfs: show useful info in space reservation tracepoint

o For space info, the type of space info is useful for debugging.
o For transaction handle, its transid is useful.

Signed-off-by: Liu Bo
Signed-off-by: Chris Mason

Liu Bo
2012-03-29 21:57:44 +0800

22 Mar, 2012

1 commit

79787eaab btrfs: replace many BUG_ONs with proper error handling

btrfs currently handles most errors with BUG_ON. This patch is a work-in-
progress but aims to handle most errors other than internal logic
errors and ENOMEM more gracefully.

This iteration prevents most crashes but can run into lockups with
the page lock on occasion when the timing "works out."

Signed-off-by: Jeff Mahoney

Jeff Mahoney
2012-03-22 18:52:54 +0800