Eric Lee / smarc-fsl-linux-kernel

30 Oct, 2017

1 commit

6939f6672 Btrfs: fix confusing worker helper info in stacktrace ... Browse Code »

We've seen the following backtrace stack in ftrace or dmesg log,

kworker/u16:10-4244 [000] 241942.480955: function: btrfs_put_ordered_extent
kworker/u16:10-4244 [000] 241942.480956: kernel_stack:
=> finish_ordered_fn (ffffffffa0384475)
=> btrfs_scrubparity_helper (ffffffffa03ca577) btrfs_freespace_write_helper (ffffffffa03ca98e) process_one_work (ffffffff81117b2f)
=> worker_thread (ffffffff81118c2a)
=> kthread (ffffffff81121de0)
=> ret_from_fork (ffffffff81d7087a)

btrfs_freespace_write_helper is actually calling normal_worker_helper
instead of btrfs_scrubparity_helper, so somehow kernel has parsed the
incorrect function address while unwinding the stack,
btrfs_scrubparity_helper really shouldn't be shown up.

It's caused by compiler doing inline for our helper function, adding a
noinline tag can fix that.

Signed-off-by: Liu Bo
Reviewed-by: David Sterba
[ use noinline_for_stack ]
Signed-off-by: David Sterba

Liu Bo
2017-10-30 19:27:57 +0800

16 Aug, 2017

1 commit

9a35b6372 btrfs: constify tracepoint arguments ... Browse Code »

Tracepoint arguments are all read-only. If we mark the arguments
as const, we're able to keep or convert those arguments to const
where appropriate.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba

Jeff Mahoney
2017-08-16 20:19:53 +0800

09 Jan, 2017

1 commit

ac0c7cf8b btrfs: fix crash when tracepoint arguments are freed by wq callbacks ... Browse Code »

Enabling btrfs tracepoints leads to instant crash, as reported. The wq
callbacks could free the memory and the tracepoints started to
dereference the members to get to fs_info.

The proposed fix https://marc.info/?l=linux-btrfs&m=148172436722606&w=2
removed the tracepoints but we could preserve them by passing only the
required data in a safe way.

Fixes: bc074524e123 ("btrfs: prefix fsid to all trace events")
CC: stable@vger.kernel.org # 4.8+
Reported-by: Sebastian Andrzej Siewior
Reviewed-by: Qu Wenruo
Signed-off-by: David Sterba

David Sterba
2017-01-09 18:24:50 +0800

14 Dec, 2016

1 commit

2939e1a86 btrfs: limit async_work allocation and worker func duration ... Browse Code »

Problem statement: unprivileged user who has read-write access to more than
one btrfs subvolume may easily consume all kernel memory (eventually
triggering oom-killer).

Reproducer (./mkrmdir below essentially loops over mkdir/rmdir):

[root@kteam1 ~]# cat prep.sh

DEV=/dev/sdb
mkfs.btrfs -f $DEV
mount $DEV /mnt
for i in `seq 1 16`
do
mkdir /mnt/$i
btrfs subvolume create /mnt/SV_$i
ID=`btrfs subvolume list /mnt |grep "SV_$i$" |cut -d ' ' -f 2`
mount -t btrfs -o subvolid=$ID $DEV /mnt/$i
chmod a+rwx /mnt/$i
done

[root@kteam1 ~]# sh prep.sh

[maxim@kteam1 ~]$ for i in `seq 1 16`; do ./mkrmdir /mnt/$i 2000 2000 & done

[root@kteam1 ~]# for i in `seq 1 4`; do grep "kmalloc-128" /proc/slabinfo | grep -v dma; sleep 60; done
kmalloc-128 10144 10144 128 32 1 : tunables 0 0 0 : slabdata 317 317 0
kmalloc-128 9992352 9992352 128 32 1 : tunables 0 0 0 : slabdata 312261 312261 0
kmalloc-128 24226752 24226752 128 32 1 : tunables 0 0 0 : slabdata 757086 757086 0
kmalloc-128 42754240 42754240 128 32 1 : tunables 0 0 0 : slabdata 1336070 1336070 0

The huge numbers above come from insane number of async_work-s allocated
and queued by btrfs_wq_run_delayed_node.

The problem is caused by btrfs_wq_run_delayed_node() queuing more and more
works if the number of delayed items is above BTRFS_DELAYED_BACKGROUND. The
worker func (btrfs_async_run_delayed_root) processes at least
BTRFS_DELAYED_BATCH items (if they are present in the list). So, the machinery
works as expected while the list is almost empty. As soon as it is getting
bigger, worker func starts to process more than one item at a time, it takes
longer, and the chances to have async_works queued more than needed is getting
higher.

The problem above is worsened by another flaw of delayed-inode implementation:
if async_work was queued in a throttling branch (number of items >=
BTRFS_DELAYED_WRITEBACK), corresponding worker func won't quit until
the number of items < BTRFS_DELAYED_BACKGROUND / 2. So, it is possible that
the func occupies CPU infinitely (up to 30sec in my experiments): while the
func is trying to drain the list, the user activity may add more and more
items to the list.

The patch fixes both problems in straightforward way: refuse queuing too
many works in btrfs_wq_run_delayed_node and bail out of worker func if
at least BTRFS_DELAYED_WRITEBACK items are processed.

Changed in v2: remove support of thresh == NO_THRESHOLD.

Signed-off-by: Maxim Patlasov
Signed-off-by: Chris Mason
Cc: stable@vger.kernel.org # v3.15+

Maxim Patlasov
2016-12-14 03:01:30 +0800

26 Jul, 2016

1 commit

cb001095c btrfs: plumb fs_info into btrfs_work ... Browse Code »

In order to provide an fsid for trace events, we'll need a btrfs_fs_info
pointer. The most lightweight way to do that for btrfs_work structures
is to associate it with the __btrfs_workqueue structure. Each queued
btrfs_work structure has a workqueue associated with it, so that's
a natural fit. It's a privately defined structures, so we add accessors
to retrieve the fs_info pointer.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba

Jeff Mahoney
2016-07-26 19:53:15 +0800

26 Jan, 2016

1 commit

0a95b8513 btrfs: async-thread: Fix a use-after-free error for trace ... Browse Code »

Parameter of trace_btrfs_work_queued() can be freed in its workqueue.
So no one use use that pointer after queue_work().

Fix the user-after-free bug by move the trace line before queue_work().

Reported-by: Dave Jones
Signed-off-by: Qu Wenruo
Reviewed-by: David Sterba
Signed-off-by: Chris Mason

Qu Wenruo
2016-01-26 08:50:26 +0800

03 Dec, 2015

1 commit

61dd5ae65 btrfs: use GFP_KERNEL for allocations of workqueues ... Browse Code »

We don't have to use GFP_NOFS to allocate workqueue structures, this is
done from mount context or potentially scrub start context, safe to fail
in both cases.

Signed-off-by: David Sterba

David Sterba
2015-12-03 22:03:43 +0800

01 Sep, 2015

1 commit

c6dd6ea55 btrfs: async_thread: Fix workqueue 'max_active' value when initializing ... Browse Code »

At initializing time, for threshold-able workqueue, it's max_active
of kernel workqueue should be 1 and grow if it hits threshold.

But due to the bad naming, there is both 'max_active' for kernel
workqueue and btrfs workqueue.
So wrong value is given at workqueue initialization.

This patch fixes it, and to avoid further misunderstanding, change the
member name of btrfs_workqueue to 'current_active' and 'limit_active'.

Also corresponding comment is added for readability.

Reported-by: Alex Lyakas
Signed-off-by: Qu Wenruo
Signed-off-by: Chris Mason

Qu Wenruo
2015-09-01 02:46:40 +0800

10 Jun, 2015

1 commit

20b2e3029 btrfs: Fix lockdep warning of wr_ctx->wr_lock in scrub_free_wr_ctx() ... Browse Code »

lockdep report following warning in test:
[25176.843958] =================================
[25176.844519] [ INFO: inconsistent lock state ]
[25176.845047] 4.1.0-rc3 #22 Tainted: G W
[25176.845591] ---------------------------------
[25176.846153] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[25176.846713] fsstress/26661 [HC0[0]:SC1[1]:HE1:SE0] takes:
[25176.847246] (&wr_ctx->wr_lock){+.?...}, at: [] scrub_free_ctx+0x2d/0xf0 [btrfs]
[25176.847838] {SOFTIRQ-ON-W} state was registered at:
[25176.848396] [] __lock_acquire+0x6a0/0xe10
[25176.848955] [] lock_acquire+0xce/0x2c0
[25176.849491] [] mutex_lock_nested+0x7f/0x410
[25176.850029] [] scrub_stripe+0x4df/0x1080 [btrfs]
[25176.850575] [] scrub_chunk.isra.19+0x111/0x130 [btrfs]
[25176.851110] [] scrub_enumerate_chunks+0x27c/0x510 [btrfs]
[25176.851660] [] btrfs_scrub_dev+0x1c7/0x6c0 [btrfs]
[25176.852189] [] btrfs_dev_replace_start+0x36e/0x450 [btrfs]
[25176.852771] [] btrfs_ioctl+0x1e10/0x2d20 [btrfs]
[25176.853315] [] do_vfs_ioctl+0x318/0x570
[25176.853868] [] SyS_ioctl+0x41/0x80
[25176.854406] [] system_call_fastpath+0x12/0x6f
[25176.854935] irq event stamp: 51506
[25176.855511] hardirqs last enabled at (51506): [] vprintk_emit+0x225/0x5e0
[25176.856059] hardirqs last disabled at (51505): [] vprintk_emit+0xb7/0x5e0
[25176.856642] softirqs last enabled at (50886): [] __do_softirq+0x363/0x640
[25176.857184] softirqs last disabled at (50949): [] irq_exit+0x10d/0x120
[25176.857746]
other info that might help us debug this:
[25176.858845] Possible unsafe locking scenario:
[25176.859981] CPU0
[25176.860537] ----
[25176.861059] lock(&wr_ctx->wr_lock);
[25176.861705]
[25176.862272] lock(&wr_ctx->wr_lock);
[25176.862881]
*** DEADLOCK ***

Reason:
Above warning is caused by:
Interrupt
-> bio_endio()
-> ...
-> scrub_put_ctx()
-> scrub_free_ctx() *1
-> ...
-> mutex_lock(&wr_ctx->wr_lock);

scrub_put_ctx() is allowed to be called in end_bio interrupt, but
in code design, it will never call scrub_free_ctx(sctx) in interrupe
context(above *1), because btrfs_scrub_dev() get one additional
reference of sctx->refs, which makes scrub_free_ctx() only called
withine btrfs_scrub_dev().

Now the code runs out of our wish, because free sequence in
scrub_pending_bio_dec() have a gap.

Current code:
-----------------------------------+-----------------------------------
scrub_pending_bio_dec() | btrfs_scrub_dev
-----------------------------------+-----------------------------------
atomic_dec(&sctx->bios_in_flight); |
wake_up(&sctx->list_wait); |
| scrub_put_ctx()
| -> atomic_dec_and_test(&sctx->refs)
scrub_put_ctx(sctx); |
-> atomic_dec_and_test(&sctx->refs)|
-> scrub_free_ctx() |
-----------------------------------+-----------------------------------

We expected:
-----------------------------------+-----------------------------------
scrub_pending_bio_dec() | btrfs_scrub_dev
-----------------------------------+-----------------------------------
atomic_dec(&sctx->bios_in_flight); |
wake_up(&sctx->list_wait); |
scrub_put_ctx(sctx); |
-> atomic_dec_and_test(&sctx->refs)|
| scrub_put_ctx()
| -> atomic_dec_and_test(&sctx->refs)
| -> scrub_free_ctx()
-----------------------------------+-----------------------------------

Fix:
Move scrub_pending_bio_dec() to a workqueue, to avoid this function run
in interrupt context.
Tested by check tracelog in debug.

Changelog v1->v2:
Use workqueue instead of adjust function call sequence in v1,
because v1 will introduce a bug pointed out by:
Filipe David Manana

Reported-by: Qu Wenruo
Signed-off-by: Zhao Lei
Reviewed-by: Filipe Manana
Signed-off-by: Chris Mason

Zhao Lei
2015-06-10 22:04:52 +0800

17 Feb, 2015

1 commit

6f0110581 btrfs: use correct type for workqueue flags ... Browse Code »

Through all the local wrappers to alloc_workqueue, __alloc_workqueue_key
takes an unsigned int.

Signed-off-by: David Sterba

David Sterba
2015-02-17 01:48:47 +0800

02 Oct, 2014

1 commit

5d99a998f btrfs: remove unlikely from NULL checks ... Browse Code »

Unlikely is implicit for NULL checks of pointers.

Signed-off-by: David Sterba

David Sterba
2014-10-02 22:06:19 +0800

18 Sep, 2014

1 commit

8b110e393 Btrfs: implement repair function when direct read fails ... Browse Code »

This patch implement data repair function when direct read fails.

The detail of the implementation is:
- When we find the data is not right, we try to read the data from the other
mirror.
- When the io on the mirror ends, we will insert the endio work into the
dedicated btrfs workqueue, not common read endio workqueue, because the
original endio work is still blocked in the btrfs endio workqueue, if we
insert the endio work of the io on the mirror into that workqueue, deadlock
would happen.
- After we get right data, we write it back to the corrupted mirror.
- And if the data on the new mirror is still corrupted, we will try next
mirror until we read right data or all the mirrors are traversed.
- After the above work, we set the uptodate flag according to the result.

Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Miao Xie
2014-09-18 04:39:01 +0800

24 Aug, 2014

1 commit

9e0af2376 Btrfs: fix task hang under heavy compressed write ... Browse Code »

This has been reported and discussed for a long time, and this hang occurs in
both 3.15 and 3.16.

Btrfs now migrates to use kernel workqueue, but it introduces this hang problem.

Btrfs has a kind of work queued as an ordered way, which means that its
ordered_func() must be processed in the way of FIFO, so it usually looks like --

normal_work_helper(arg)
work = container_of(arg, struct btrfs_work, normal_work);

work->func() ordered_list
ordered_work->ordered_func()
ordered_work->ordered_free()

The hang is a rare case, first when we find free space, we get an uncached block
group, then we go to read its free space cache inode for free space information,
so it will

file a readahead request
btrfs_readpages()
for page that is not in page cache
__do_readpage()
submit_extent_page()
btrfs_submit_bio_hook()
btrfs_bio_wq_end_io()
submit_bio()
end_workqueue_bio() current_work = arg; normal_work
worker->current_func(arg)
normal_work_helper(arg)
A = container_of(arg, struct btrfs_work, normal_work);

A->func()
A->ordered_func()
A->ordered_free() ordered_func()
submit_compressed_extents()
find_free_extent()
load_free_space_inode()
... ordered_free()

As if work A has a high priority in wq->ordered_list and there are more ordered
works queued after it, such as B->ordered_func(), its memory could have been
freed before normal_work_helper() returns, which means that kernel workqueue
code worker_thread() still has worker->current_work pointer to be work
A->normal_work's, ie. arg's address.

Meanwhile, work C is allocated after work A is freed, work C->normal_work
and work A->normal_work are likely to share the same address(I confirmed this
with ftrace output, so I'm not just guessing, it's rare though).

When another kthread picks up work C->normal_work to process, and finds our
kthread is processing it(see find_worker_executing_work()), it'll think
work C as a collision and skip then, which ends up nobody processing work C.

So the situation is that our kthread is waiting forever on work C.

Besides, there're other cases that can lead to deadlock, but the real problem
is that all btrfs workqueue shares one work->func, -- normal_work_helper,
so this makes each workqueue to have its own helper function, but only a
wraper pf normal_work_helper.

With this patch, I no long hit the above hang.

Signed-off-by: Liu Bo
Signed-off-by: Chris Mason

Liu Bo
2014-08-24 22:17:02 +0800

08 Apr, 2014

1 commit

800ee2247 btrfs: fix crash in remount(thread_pool=) case ... Browse Code »

Reproducer:
mount /dev/ubda /mnt
mount -oremount,thread_pool=42 /mnt

Gives a crash:
? btrfs_workqueue_set_max+0x0/0x70
btrfs_resize_thread_pool+0xe3/0xf0
? sync_filesystem+0x0/0xc0
? btrfs_resize_thread_pool+0x0/0xf0
btrfs_remount+0x1d2/0x570
? kern_path+0x0/0x80
do_remount_sb+0xd9/0x1c0
do_mount+0x26a/0xbf0
? kfree+0x0/0x1b0
SyS_mount+0xc4/0x110

It's a call
btrfs_workqueue_set_max(fs_info->scrub_wr_completion_workers, new_pool_size);
with
fs_info->scrub_wr_completion_workers = NULL;

as scrub wqs get created only on user's demand.

Patch skips not-created-yet workqueues.

Signed-off-by: Sergei Trofimovich
CC: Qu Wenruo
CC: Chris Mason
CC: Josef Bacik
CC: linux-btrfs@vger.kernel.org
Signed-off-by: Chris Mason

Sergei Trofimovich
2014-04-08 01:41:52 +0800

21 Mar, 2014

2 commits

c3a468915 btrfs: Add trace for btrfs_workqueue alloc/destroy ... Browse Code »

Since most of the btrfs_workqueue is printed as pointer address,
for easier analysis, add trace for btrfs_workqueue alloc/destroy.
So it is possible to determine the workqueue that a given work belongs
to(by comparing the wq pointer address with alloc trace event).

Signed-off-by: Qu Wenruo
Signed-off-by: Chris Mason

Qu Wenruo
2014-03-21 08:15:28 +0800
ef66af101 Btrfs: add missing kfree in btrfs_destroy_workqueue ... Browse Code »

Signed-off-by: Filipe David Borba Manana
Signed-off-by: Chris Mason

Filipe Manana
2014-03-21 08:15:27 +0800

11 Mar, 2014

8 commits

52483bc26 btrfs: Add ftrace for btrfs_workqueue ... Browse Code »

Add ftrace for btrfs_workqueue for further workqueue tunning.
This patch needs to applied after the workqueue replace patchset.

Signed-off-by: Qu Wenruo
Signed-off-by: Josef Bacik

Qu Wenruo
2014-03-11 03:17:21 +0800
6db8914f9 btrfs: Cleanup the btrfs_workqueue related function type ... Browse Code »

The new btrfs_workqueue still use open-coded function defition,
this patch will change them into btrfs_func_t type which is much the
same as kernel workqueue.

Signed-off-by: Qu Wenruo
Signed-off-by: Josef Bacik

Qu Wenruo
2014-03-11 03:17:20 +0800
d458b0540 btrfs: Cleanup the "_struct" suffix in btrfs_workequeue ... Browse Code »

Since the "_struct" suffix is mainly used for distinguish the differnt
btrfs_work between the original and the newly created one,
there is no need using the suffix since all btrfs_workers are changed
into btrfs_workqueue.

Also this patch fixed some codes whose code style is changed due to the
too long "_struct" suffix.

Signed-off-by: Qu Wenruo
Tested-by: David Sterba
Signed-off-by: Josef Bacik

Qu Wenruo
2014-03-11 03:17:16 +0800
a046e9c88 btrfs: Cleanup the old btrfs_worker. ... Browse Code »

Since all the btrfs_worker is replaced with the newly created
btrfs_workqueue, the old codes can be easily remove.

Signed-off-by: Quwenruo
Tested-by: David Sterba
Signed-off-by: Josef Bacik

Qu Wenruo
2014-03-11 03:17:15 +0800
0bd9289c2 btrfs: Add threshold workqueue based on kernel workqueue ... Browse Code »

The original btrfs_workers has thresholding functions to dynamically
create or destroy kthreads.

Though there is no such function in kernel workqueue because the worker
is not created manually, we can still use the workqueue_set_max_active
to simulated the behavior, mainly to achieve a better HDD performance by
setting a high threshold on submit_workers.
(Sadly, no resource can be saved)

So in this patch, extra workqueue pending counters are introduced to
dynamically change the max active of each btrfs_workqueue_struct, hoping
to restore the behavior of the original thresholding function.

Also, workqueue_set_max_active use a mutex to protect workqueue_struct,
which is not meant to be called too frequently, so a new interval
mechanism is applied, that will only call workqueue_set_max_active after
a count of work is queued. Hoping to balance both the random and
sequence performance on HDD.

Signed-off-by: Qu Wenruo
Tested-by: David Sterba
Signed-off-by: Josef Bacik

Qu Wenruo
2014-03-11 03:17:04 +0800
1ca08976a btrfs: Add high priority workqueue support for btrfs_workqueue_struct ... Browse Code »

Add high priority function to btrfs_workqueue.

This is implemented by embedding a btrfs_workqueue into a
btrfs_workqueue and use some helper functions to differ the normal
priority wq and high priority wq.
So the high priority wq is completely independent from the normal
workqueue.

Signed-off-by: Qu Wenruo
Tested-by: David Sterba
Signed-off-by: Josef Bacik

Qu Wenruo
2014-03-11 03:17:03 +0800
08a9ff326 btrfs: Added btrfs_workqueue_struct implemented ordered execution based on kernel workqueue ... Browse Code »

Use kernel workqueue to implement a new btrfs_workqueue_struct, which
has the ordering execution feature like the btrfs_worker.

The func is executed in a concurrency way, and the
ordred_func/ordered_free is executed in the sequence them are queued
after the corresponding func is done.

The new btrfs_workqueue works much like the original one, one workqueue
for normal work and a list for ordered work.
When a work is queued, ordered work will be added to the list and helper
function will be queued into the workqueue.
The helper function will execute a normal work and then check and execute as many
ordered work as possible in the sequence they were queued.

At this patch, high priority work queue or thresholding is not added yet.
The high priority feature and thresholding will be added in the following patches.

Signed-off-by: Qu Wenruo
Signed-off-by: Lai Jiangshan
Tested-by: David Sterba
Signed-off-by: Josef Bacik

Qu Wenruo
2014-03-11 03:17:03 +0800
51b98effa btrfs: always choose work from prio_head first ... Browse Code »

In case we do not refill, we can overwrite cur pointer from prio_head
by one from not prioritized head, what looks as something that was
not intended.

This change make we always take works from prio_head first until it's
not empty.

Signed-off-by: Stanislaw Gruszka
Signed-off-by: Josef Bacik

Stanislaw Gruszka
2014-03-11 03:16:36 +0800

21 Nov, 2013

1 commit

ba69994a4 Btrfs: fix __btrfs_start_workers retval ... Browse Code »

__btrfs_start_workers returns 0 in case it raced with
btrfs_stop_workers and lost the race. This is wrong because worker in
this case is not allowed to start and is in fact destroyed. Return
-EINVAL instead.

Signed-off-by: Ilya Dryomov
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Ilya Dryomov
2013-11-21 09:42:11 +0800

12 Nov, 2013

1 commit

678712545 btrfs: Fix checkpatch.pl warning of spacing issues ... Browse Code »

Fix spacing issues detected via checkpatch.pl in accordance with the
kernel style guidelines.

Signed-off-by: Dulshani Gunawardhana
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Dulshani Gunawardhana
2013-11-12 11:12:31 +0800

05 Oct, 2013

1 commit

964fb15ac Btrfs: eliminate races in worker stopping code ... Browse Code »

The current implementation of worker threads in Btrfs has races in
worker stopping code, which cause all kinds of panics and lockups when
running btrfs/011 xfstest in a loop. The problem is that
btrfs_stop_workers is unsynchronized with respect to check_idle_worker,
check_busy_worker and __btrfs_start_workers.

E.g., check_idle_worker race flow:

btrfs_stop_workers(): check_idle_worker(aworker):
- grabs the lock
- splices the idle list into the
working list
- removes the first worker from the
working list
- releases the lock to wait for
its kthread's completion
- grabs the lock
- if aworker is on the working list,
moves aworker from the working list
to the idle list
- releases the lock
- grabs the lock
- puts the worker
- removes the second worker from the
working list
......
btrfs_stop_workers returns, aworker is on the idle list
FS is umounted, memory is freed
......
aworker is waken up, fireworks ensue

With this applied, I wasn't able to trigger the problem in 48 hours,
whereas previously I could reliably reproduce at least one of these
races within an hour.

Reported-by: David Sterba
Signed-off-by: Ilya Dryomov
Signed-off-by: Josef Bacik

Ilya Dryomov
2013-10-05 04:02:13 +0800

26 Jul, 2012

1 commit

e9fbcb422 Btrfs: call the ordered free operation without any locks held ... Browse Code »

Each ordered operation has a free callback, and this was called with the
worker spinlock held. Josef made the free callback also call iput,
which we can't do with the spinlock.

This drops the spinlock for the free operation and grabs it again before
moving through the rest of the list. We'll circle back around to this
and find a cleaner way that doesn't bounce the lock around so much.

Signed-off-by: Chris Mason
cc: stable@kernel.org

Chris Mason
2012-07-26 04:15:07 +0800

22 Mar, 2012

1 commit

143bede52 btrfs: return void in functions without error conditions ... Browse Code »

Signed-off-by: Jeff Mahoney

Jeff Mahoney
2012-03-22 08:45:34 +0800

26 Dec, 2011

1 commit

b7ba68c4a Merge branch 'pm-sleep' into pm-for-linus ... Browse Code »

* pm-sleep: (51 commits)
PM: Drop generic_subsys_pm_ops
PM / Sleep: Remove forward-only callbacks from AMBA bus type
PM / Sleep: Remove forward-only callbacks from platform bus type
PM: Run the driver callback directly if the subsystem one is not there
PM / Sleep: Make pm_op() and pm_noirq_op() return callback pointers
PM / Sleep: Merge internal functions in generic_ops.c
PM / Sleep: Simplify generic system suspend callbacks
PM / Hibernate: Remove deprecated hibernation snapshot ioctls
PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled()
PM / Sleep: Recommend [un]lock_system_sleep() over using pm_mutex directly
PM / Sleep: Replace mutex_[un]lock(&pm_mutex) with [un]lock_system_sleep()
PM / Sleep: Make [un]lock_system_sleep() generic
PM / Sleep: Use the freezer_count() functions in [un]lock_system_sleep() APIs
PM / Freezer: Remove the "userspace only" constraint from freezer[_do_not]_count()
PM / Hibernate: Replace unintuitive 'if' condition in kernel/power/user.c with 'else'
Freezer / sunrpc / NFS: don't allow TASK_KILLABLE sleeps to block the freezer
PM / Sleep: Unify diagnostic messages from device suspend/resume
ACPI / PM: Do not save/restore NVS on Asus K54C/K54HR
PM / Hibernate: Remove deprecated hibernation test modes
PM / Hibernate: Thaw processes in SNAPSHOT_CREATE_IMAGE ioctl test path
...

Conflicts:
kernel/kmod.c

Rafael J. Wysocki
2011-12-26 06:42:20 +0800

23 Dec, 2011

1 commit

8d532b2af Btrfs: fix worker lock misuse in find_worker ... Browse Code »

Dan Carpenter noticed that we were doing a double unlock on the worker
lock, and sometimes picking a worker thread without the lock held.

This fixes both errors.

Signed-off-by: Chris Mason
Reported-by: Dan Carpenter

Chris Mason
2011-12-23 20:53:00 +0800

22 Dec, 2011

1 commit

b00f4dc5f Merge branch 'master' into pm-sleep ... Browse Code »

* master: (848 commits)
SELinux: Fix RCU deref check warning in sel_netport_insert()
binary_sysctl(): fix memory leak
mm/vmalloc.c: remove static declaration of va from __get_vm_area_node
ipmi_watchdog: restore settings when BMC reset
oom: fix integer overflow of points in oom_badness
memcg: keep root group unchanged if creation fails
nilfs2: potential integer overflow in nilfs_ioctl_clean_segments()
nilfs2: unbreak compat ioctl
cpusets: stall when updating mems_allowed for mempolicy or disjoint nodemask
evm: prevent racing during tfm allocation
evm: key must be set once during initialization
mmc: vub300: fix type of firmware_rom_wait_states module parameter
Revert "mmc: enable runtime PM by default"
mmc: sdhci: remove "state" argument from sdhci_suspend_host
x86, dumpstack: Fix code bytes breakage due to missing KERN_CONT
IB/qib: Correct sense on freectxts increment and decrement
RDMA/cma: Verify private data length
cgroups: fix a css_set not found bug in cgroup_attach_proc
oprofile: Fix uninitialized memory access when writing to writing to oprofilefs
Revert "xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel"
...

Conflicts:
kernel/cgroup_freezer.c

Rafael J. Wysocki
2011-12-22 04:59:45 +0800

16 Dec, 2011

2 commits

567a45e91 Merge branch 'for-chris' of http://git.kernel.org/pub/scm/linux/kernel/git/josef… ... Browse Code »

…/btrfs-work into integration

Conflicts:
fs/btrfs/inode.c

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Chris Mason
2011-12-16 02:43:49 +0800
0dc3b84a7 Btrfs: fix num_workers_starting bug and other bugs in async thread ... Browse Code »

Al pointed out we have some random problems with the way we account for
num_workers_starting in the async thread stuff. First of all we need to make
sure to decrement num_workers_starting if we fail to start the worker, so make
__btrfs_start_workers do this. Also fix __btrfs_start_workers so that it
doesn't call btrfs_stop_workers(), there is no point in stopping everybody if we
failed to create a worker. Also check_pending_worker_creates needs to call
__btrfs_start_work in it's work function since it already increments
num_workers_starting.

People only start one worker at a time, so get rid of the num_workers argument
everywhere, and make btrfs_queue_worker a void since it will always succeed.
Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2011-12-16 00:04:21 +0800

15 Dec, 2011

1 commit

8f3b65a3d Btrfs: add a cond_resched() into the worker loop ... Browse Code »

If we have a constant stream of end_io completions or crc work,
we can hit softlockup messages from the async helper threads. This
adds a cond_resched() into the loop to avoid them.

Signed-off-by: Chris Mason

Chris Mason
2011-12-15 23:50:38 +0800

22 Nov, 2011

1 commit

a0acae0e8 freezer: unexport refrigerator() and update try_to_freeze() slightly ... Browse Code »

There is no reason to export two functions for entering the
refrigerator. Calling refrigerator() instead of try_to_freeze()
doesn't save anything noticeable or removes any race condition.

* Rename refrigerator() to __refrigerator() and make it return bool
indicating whether it scheduled out for freezing.

* Update try_to_freeze() to return bool and relay the return value of
__refrigerator() if freezing().

* Convert all refrigerator() users to try_to_freeze().

* Update documentation accordingly.

* While at it, add might_sleep() to try_to_freeze().

Signed-off-by: Tejun Heo
Cc: Samuel Ortiz
Cc: Chris Mason
Cc: "Theodore Ts'o"
Cc: Steven Whitehouse
Cc: Andrew Morton
Cc: Jan Kara
Cc: KONISHI Ryusuke
Cc: Christoph Hellwig

Tejun Heo
2011-11-22 04:32:22 +0800

25 May, 2010

1 commit

ed3b3d314 Btrfs: don't walk around with task->state != TASK_RUNNING ... Browse Code »

Yan Zheng noticed two places we were doing a lot of work
without task->state set to TASK_RUNNING. This sets the state
properly after we get ready to sleep but decide not to.

Signed-off-by: Chris Mason

Chris Mason
2010-05-25 22:34:58 +0800

30 Mar, 2010

1 commit

5a0e3ad6a include cleanup: Update gfp.h and slab.h includes to prepare for breaking implic… ... Browse Code »

…it slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

Tejun Heo
2010-03-30 21:02:32 +0800

05 Oct, 2009

1 commit

61d92c328 Btrfs: fix deadlock on async thread startup ... Browse Code »

The btrfs async worker threads are used for a wide variety of things,
including processing bio end_io functions. This means that when
the endio threads aren't running, the rest of the FS isn't
able to do the final processing required to clear PageWriteback.

The endio threads also try to exit as they become idle and
start more as the work piles up. The problem is that starting more
threads means kthreadd may need to allocate ram, and that allocation
may wait until the global number of writeback pages on the system is
below a certain limit.

The result of that throttling is that end IO threads wait on
kthreadd, who is waiting on IO to end, which will never happen.

This commit fixes the deadlock by handing off thread startup to a
dedicated thread. It also fixes a bug where the on-demand thread
creation was creating far too many threads because it didn't take into
account threads being started by other procs.

Signed-off-by: Chris Mason

Chris Mason
2009-10-05 21:44:45 +0800

16 Sep, 2009

1 commit

6e74057c4 Btrfs: Fix async thread shutdown race ... Browse Code »

It was possible for an async worker thread to be selected to
receive a new work item, but exit before the work item was
actually placed into that thread's work list.

This commit fixes the race by incrementing the num_pending
counter earlier, and making sure to check the number of pending
work items before a thread exits.

Signed-off-by: Chris Mason

Chris Mason
2009-09-16 08:20:17 +0800