01 Sep, 2015

1 commit

  • At initialization time, for a threshold-able workqueue, the max_active
    of its kernel workqueue should be 1 and grow only when it hits the
    threshold.

    But due to the bad naming, there is a 'max_active' on both the kernel
    workqueue and the btrfs workqueue, so the wrong value was given at
    workqueue initialization.

    This patch fixes it, and to avoid further misunderstanding, renames the
    btrfs_workqueue members to 'current_active' and 'limit_active'.

    A corresponding comment is also added for readability.
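
    A minimal sketch of the renamed members; the surrounding fields
    (normal_wq, thresh, pending) are assumptions for illustration, only the
    two renamed members come from the commit text:

    struct __btrfs_workqueue {
            struct workqueue_struct *normal_wq;  /* backing kernel workqueue */

            /* Upper bound configured by the caller (was 'max_active'). */
            int limit_active;

            /*
             * Effective max_active currently applied to normal_wq; starts
             * at 1 for threshold-able workqueues and grows toward
             * limit_active once the pending work crosses the threshold.
             */
            int current_active;

            int thresh;          /* 0 disables thresholding */
            atomic_t pending;    /* works queued but not yet finished */
    };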

    Reported-by: Alex Lyakas
    Signed-off-by: Qu Wenruo
    Signed-off-by: Chris Mason

    Qu Wenruo
     

10 Jun, 2015

1 commit

  • lockdep reported the following warning in testing:
    [25176.843958] =================================
    [25176.844519] [ INFO: inconsistent lock state ]
    [25176.845047] 4.1.0-rc3 #22 Tainted: G W
    [25176.845591] ---------------------------------
    [25176.846153] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
    [25176.846713] fsstress/26661 [HC0[0]:SC1[1]:HE1:SE0] takes:
    [25176.847246] (&wr_ctx->wr_lock){+.?...}, at: [] scrub_free_ctx+0x2d/0xf0 [btrfs]
    [25176.847838] {SOFTIRQ-ON-W} state was registered at:
    [25176.848396] [] __lock_acquire+0x6a0/0xe10
    [25176.848955] [] lock_acquire+0xce/0x2c0
    [25176.849491] [] mutex_lock_nested+0x7f/0x410
    [25176.850029] [] scrub_stripe+0x4df/0x1080 [btrfs]
    [25176.850575] [] scrub_chunk.isra.19+0x111/0x130 [btrfs]
    [25176.851110] [] scrub_enumerate_chunks+0x27c/0x510 [btrfs]
    [25176.851660] [] btrfs_scrub_dev+0x1c7/0x6c0 [btrfs]
    [25176.852189] [] btrfs_dev_replace_start+0x36e/0x450 [btrfs]
    [25176.852771] [] btrfs_ioctl+0x1e10/0x2d20 [btrfs]
    [25176.853315] [] do_vfs_ioctl+0x318/0x570
    [25176.853868] [] SyS_ioctl+0x41/0x80
    [25176.854406] [] system_call_fastpath+0x12/0x6f
    [25176.854935] irq event stamp: 51506
    [25176.855511] hardirqs last enabled at (51506): [] vprintk_emit+0x225/0x5e0
    [25176.856059] hardirqs last disabled at (51505): [] vprintk_emit+0xb7/0x5e0
    [25176.856642] softirqs last enabled at (50886): [] __do_softirq+0x363/0x640
    [25176.857184] softirqs last disabled at (50949): [] irq_exit+0x10d/0x120
    [25176.857746]
    other info that might help us debug this:
    [25176.858845]  Possible unsafe locking scenario:
    [25176.859981]        CPU0
    [25176.860537]        ----
    [25176.861059]   lock(&wr_ctx->wr_lock);
    [25176.861705]   <Interrupt>
    [25176.862272]     lock(&wr_ctx->wr_lock);
    [25176.862881]
     *** DEADLOCK ***

    Reason:
    Above warning is caused by:
    Interrupt
    -> bio_endio()
    -> ...
    -> scrub_put_ctx()
    -> scrub_free_ctx() *1
    -> ...
    -> mutex_lock(&wr_ctx->wr_lock);

    scrub_put_ctx() is allowed to be called from the end_bio interrupt, but
    by design it should never call scrub_free_ctx(sctx) in interrupt
    context (above *1), because btrfs_scrub_dev() takes one additional
    reference on sctx->refs, which ensures scrub_free_ctx() is only called
    within btrfs_scrub_dev().

    But the code does not behave as intended, because the free sequence in
    scrub_pending_bio_dec() has a gap.

    Current code:
    -----------------------------------+-----------------------------------
         scrub_pending_bio_dec()       |          btrfs_scrub_dev
    -----------------------------------+-----------------------------------
    atomic_dec(&sctx->bios_in_flight); |
    wake_up(&sctx->list_wait);         |
                                       | scrub_put_ctx()
                                       | -> atomic_dec_and_test(&sctx->refs)
    scrub_put_ctx(sctx);               |
    -> atomic_dec_and_test(&sctx->refs)|
    -> scrub_free_ctx()                |
    -----------------------------------+-----------------------------------

    We expected:
    -----------------------------------+-----------------------------------
         scrub_pending_bio_dec()       |          btrfs_scrub_dev
    -----------------------------------+-----------------------------------
    atomic_dec(&sctx->bios_in_flight); |
    wake_up(&sctx->list_wait);         |
    scrub_put_ctx(sctx);               |
    -> atomic_dec_and_test(&sctx->refs)|
                                       | scrub_put_ctx()
                                       | -> atomic_dec_and_test(&sctx->refs)
                                       | -> scrub_free_ctx()
    -----------------------------------+-----------------------------------

    Fix:
    Move scrub_pending_bio_dec() to a workqueue, so that this function never
    runs in interrupt context.
    Tested by checking the trace log in a debug build.
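
    A minimal sketch of the hand-off described above, with illustrative names
    (struct scrub_done, scrub_done_wq, scrub_bio_done_worker,
    scrub_bio_done_endio); this is not the actual patch:

    struct scrub_done {
            struct work_struct work;        /* INIT_WORK()ed at submit time */
            struct scrub_ctx *sctx;
    };

    static void scrub_bio_done_worker(struct work_struct *work)
    {
            struct scrub_done *done = container_of(work, struct scrub_done, work);

            /*
             * Process context: if this drops the last reference,
             * scrub_free_ctx() and its mutex_lock(&wr_ctx->wr_lock) are safe.
             */
            scrub_pending_bio_dec(done->sctx);
    }

    static void scrub_bio_done_endio(struct bio *bio)
    {
            struct scrub_done *done = bio->bi_private;

            /* Interrupt context: defer everything else to the workqueue. */
            queue_work(scrub_done_wq, &done->work);
    }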

    Changelog v1->v2:
    Use a workqueue instead of adjusting the function call sequence as in v1,
    because v1 would introduce a bug pointed out by:
    Filipe David Manana

    Reported-by: Qu Wenruo
    Signed-off-by: Zhao Lei
    Reviewed-by: Filipe Manana
    Signed-off-by: Chris Mason

    Zhao Lei
     

17 Feb, 2015

1 commit


18 Sep, 2014

1 commit

  • This patch implements the data repair function for when a direct read
    fails.

    The details of the implementation are (a sketch follows this list):
    - When we find the data is not right, we try to read the data from
    another mirror.
    - When the io on the mirror ends, we insert the endio work into a
    dedicated btrfs workqueue, not the common read endio workqueue, because
    the original endio work is still blocked in the btrfs endio workqueue; if
    we inserted the endio work of the io on the mirror into that workqueue, a
    deadlock would happen.
    - After we get the right data, we write it back to the corrupted mirror.
    - If the data on the new mirror is still corrupted, we try the next
    mirror until we read the right data or all the mirrors are traversed.
    - After the above work, we set the uptodate flag according to the result.
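
    A minimal sketch of the retry loop, assuming illustrative helpers
    (read_from_mirror, data_checks_out, write_back_to_mirror); it is not the
    actual btrfs code:

    static int repair_dio_read(struct inode *inode, u64 start, u64 len,
                               int failed_mirror, int num_mirrors, void *buf)
    {
            int mirror;

            for (mirror = 1; mirror <= num_mirrors; mirror++) {
                    if (mirror == failed_mirror)
                            continue;

                    /*
                     * The endio of this read is queued to a dedicated
                     * workqueue (not the common read endio workqueue) to
                     * avoid deadlocking with the original, still-blocked
                     * endio work.
                     */
                    if (read_from_mirror(inode, start, len, mirror, buf))
                            continue;       /* io error, try the next mirror */

                    if (!data_checks_out(inode, start, len, buf))
                            continue;       /* still corrupted, try the next one */

                    /* Got good data: repair the corrupted mirror and succeed. */
                    write_back_to_mirror(inode, start, len, failed_mirror, buf);
                    return 0;
            }

            return -EIO;    /* all mirrors traversed without good data */
    }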

    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie
     

24 Aug, 2014

1 commit

  • This has been reported and discussed for a long time, and this hang occurs in
    both 3.15 and 3.16.

    Btrfs has now migrated to the kernel workqueue, but this introduced the
    hang problem.

    Btrfs has a kind of work that is queued in an ordered way, which means
    that its ordered_func() must be processed FIFO, so it usually looks like --

    normal_work_helper(arg)
        work = container_of(arg, struct btrfs_work, normal_work);

        work->func()
        for ordered_work in wq->ordered_list
                ordered_work->ordered_func()
                ordered_work->ordered_free()

    The hang is a rare case: first, when we find free space, we get an
    uncached block group, then we go to read its free space cache inode for
    free space information, so it will

    file a readahead request
        btrfs_readpages()
            for page that is not in page cache
                __do_readpage()
                    submit_extent_page()
                        btrfs_submit_bio_hook()
                            btrfs_bio_wq_end_io()
                            submit_bio()
    end_workqueue_bio()

    current_work = arg; <-- arg is A->normal_work
    worker->current_func(arg)
        normal_work_helper(arg)
            A = container_of(arg, struct btrfs_work, normal_work);

            A->func()
            A->ordered_func()
            A->ordered_free()
            ordered_func()
                submit_compressed_extents()
                    find_free_extent()
                        load_free_space_inode()
                        ...
            ordered_free()

    So if work A sits early in wq->ordered_list and there are more ordered
    works queued after it, such as B->ordered_func(), its memory could have
    been freed before normal_work_helper() returns, which means that the
    kernel workqueue code in worker_thread() still has worker->current_work
    pointing to work A->normal_work, i.e. arg's address.

    Meanwhile, work C is allocated after work A is freed, and
    work C->normal_work and work A->normal_work are likely to share the same
    address (I confirmed this with ftrace output, so I'm not just guessing;
    it's rare though).

    When another kthread picks up work C->normal_work to process and finds
    that our kthread is apparently processing it (see
    find_worker_executing_work()), it treats work C as a collision and skips
    it, which ends up with nobody processing work C.

    So the situation is that our kthread is waiting forever on work C.

    Besides, there are other cases that can lead to deadlock, but the real
    problem is that all btrfs workqueues share one work->func,
    normal_work_helper. So this patch makes each workqueue have its own
    helper function, which is only a wrapper of normal_work_helper.

    With this patch, I no longer hit the above hang.
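
    A sketch of the per-workqueue wrapper idea, assuming a macro-generated
    set of helpers; the macro name and the particular helper names are
    illustrative:

    #define BTRFS_WORK_HELPER(name)                                      \
            void btrfs_##name(struct work_struct *arg)                   \
            {                                                            \
                    struct btrfs_work *work = container_of(arg,          \
                                    struct btrfs_work, normal_work);     \
                    normal_work_helper(work);                            \
            }

    /* One distinct work->func symbol per btrfs workqueue, so
     * worker->current_func can tell the workqueues apart even when a freed
     * work's address is recycled by another workqueue. */
    BTRFS_WORK_HELPER(endio_helper);
    BTRFS_WORK_HELPER(endio_write_helper);
    BTRFS_WORK_HELPER(delalloc_helper);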

    Signed-off-by: Liu Bo
    Signed-off-by: Chris Mason

    Liu Bo
     

21 Mar, 2014

1 commit

  • Since most of the btrfs_workqueue information is printed as a pointer
    address, for easier analysis add trace events for btrfs_workqueue
    alloc/destroy. This makes it possible to determine the workqueue that a
    given work belongs to (by comparing the wq pointer address with the
    alloc trace event).

    Signed-off-by: Qu Wenruo
    Signed-off-by: Chris Mason

    Qu Wenruo
     

11 Mar, 2014

6 commits

  • The new btrfs_workqueue still uses open-coded function definitions;
    this patch changes them to a btrfs_func_t type, much like the kernel
    workqueue does.
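
    A minimal sketch of what such a type looks like (the exact signature in
    the patch may differ):

    typedef void (*btrfs_func_t)(struct btrfs_work *arg);

    struct btrfs_work {
            btrfs_func_t func;           /* cpu intensive part */
            btrfs_func_t ordered_func;   /* order sensitive part */
            btrfs_func_t ordered_free;   /* frees the work */
            /* ... */
    };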

    Signed-off-by: Qu Wenruo
    Signed-off-by: Josef Bacik

    Qu Wenruo
     
  • Since the "_struct" suffix was mainly used to distinguish the newly
    created btrfs_work from the original one, there is no need for the
    suffix now that all btrfs_workers have been changed into btrfs_workqueue.

    This patch also fixes some code whose style had been distorted by the
    overly long "_struct" suffix.

    Signed-off-by: Qu Wenruo
    Tested-by: David Sterba
    Signed-off-by: Josef Bacik

    Qu Wenruo
     
  • Since all the btrfs_workers are replaced with the newly created
    btrfs_workqueue, the old code can be easily removed.

    Signed-off-by: Quwenruo
    Tested-by: David Sterba
    Signed-off-by: Josef Bacik

    Qu Wenruo
     
  • The original btrfs_workers has thresholding functions to dynamically
    create or destroy kthreads.

    Though there is no such function in the kernel workqueue, because its
    workers are not created manually, we can still use
    workqueue_set_max_active to simulate the behavior, mainly to achieve
    better HDD performance by setting a high threshold on submit_workers.
    (Sadly, no resource can be saved.)

    So in this patch, extra workqueue pending counters are introduced to
    dynamically change the max_active of each btrfs_workqueue_struct, hoping
    to restore the behavior of the original thresholding function.

    Also, workqueue_set_max_active uses a mutex to protect the
    workqueue_struct and is not meant to be called too frequently, so a new
    interval mechanism is applied that only calls workqueue_set_max_active
    after a certain number of works have been queued, hoping to balance both
    random and sequential performance on HDD.
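
    A compact sketch of the idea, with illustrative field names (pending,
    count, thresh, current_active, max_active, normal_wq) and simplified
    locking; it is not the exact code from the patch:

    static void thresh_queue_hook(struct __btrfs_workqueue *wq)
    {
            if (!wq->thresh)
                    return;
            atomic_inc(&wq->pending);       /* one more work in flight */
    }

    static void thresh_exec_hook(struct __btrfs_workqueue *wq)
    {
            long pending;
            int new_active;

            if (!wq->thresh)
                    return;

            pending = atomic_dec_return(&wq->pending);

            /* Only re-evaluate every few works; workqueue_set_max_active()
             * takes a mutex internally, so calling it on every work would
             * be too expensive. */
            if (wq->count++ % 16)
                    return;

            new_active = wq->current_active;
            if (pending > wq->thresh)
                    new_active = min(new_active + 1, wq->max_active);
            else if (pending < wq->thresh / 2)
                    new_active = max(new_active - 1, 1);

            if (new_active != wq->current_active) {
                    workqueue_set_max_active(wq->normal_wq, new_active);
                    wq->current_active = new_active;
            }
    }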

    Signed-off-by: Qu Wenruo
    Tested-by: David Sterba
    Signed-off-by: Josef Bacik

    Qu Wenruo
     
  • Add a high priority function to btrfs_workqueue.

    This is implemented by embedding one btrfs_workqueue into another and
    using some helper functions to tell the normal priority wq and the high
    priority wq apart, so the high priority wq is completely independent
    from the normal workqueue.
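
    A minimal sketch of that shape, assuming illustrative member and helper
    names (normal, high, __btrfs_queue_work):

    struct btrfs_workqueue {
            struct __btrfs_workqueue *normal;
            struct __btrfs_workqueue *high;   /* NULL if no high-prio queue */
    };

    static void queue_work_prio(struct btrfs_workqueue *wq,
                                struct btrfs_work *work, bool high_prio)
    {
            /* The two internal queues are completely independent. */
            if (high_prio && wq->high)
                    __btrfs_queue_work(wq->high, work);
            else
                    __btrfs_queue_work(wq->normal, work);
    }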

    Signed-off-by: Qu Wenruo
    Tested-by: David Sterba
    Signed-off-by: Josef Bacik

    Qu Wenruo
     
  • Use the kernel workqueue to implement a new btrfs_workqueue_struct,
    which has the same ordered execution feature as the btrfs_worker.

    The func is executed concurrently, while ordered_func/ordered_free are
    executed in the sequence they were queued, after the corresponding func
    is done.

    The new btrfs_workqueue works much like the original one: one workqueue
    for normal work and a list for ordered work.
    When a work is queued, the ordered work is added to the list and a helper
    function is queued into the workqueue.
    The helper function executes a normal work and then checks and executes
    as many ordered works as possible, in the sequence they were queued (see
    the sketch below).
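
    A minimal sketch of that flow, with simplified locking and flags relative
    to the real code (run_ordered_work stands for the drain of the ordered
    list):

    static void btrfs_queue_work_sketch(struct __btrfs_workqueue *wq,
                                        struct btrfs_work *work)
    {
            unsigned long flags;

            /* Ordered part: remember the queueing order. */
            spin_lock_irqsave(&wq->list_lock, flags);
            list_add_tail(&work->ordered_list, &wq->ordered_list);
            spin_unlock_irqrestore(&wq->list_lock, flags);

            /* Concurrent part: the helper may run on any worker in parallel. */
            queue_work(wq->normal_wq, &work->normal_work);
    }

    static void normal_work_helper_sketch(struct work_struct *arg)
    {
            struct btrfs_work *work = container_of(arg, struct btrfs_work,
                                                   normal_work);
            struct __btrfs_workqueue *wq = work->wq;

            work->func(work);                       /* cpu intensive, any order */
            set_bit(WORK_DONE_BIT, &work->flags);

            /* Run ordered_func()/ordered_free() for every finished work at
             * the head of the list, in the order they were queued. */
            run_ordered_work(wq);
    }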

    In this patch, the high priority work queue and thresholding are not
    added yet; they will be added in the following patches.

    Signed-off-by: Qu Wenruo
    Signed-off-by: Lai Jiangshan
    Tested-by: David Sterba
    Signed-off-by: Josef Bacik

    Qu Wenruo
     

05 Oct, 2013

1 commit

  • The current implementation of worker threads in Btrfs has races in
    worker stopping code, which cause all kinds of panics and lockups when
    running btrfs/011 xfstest in a loop. The problem is that
    btrfs_stop_workers is unsynchronized with respect to check_idle_worker,
    check_busy_worker and __btrfs_start_workers.

    E.g., check_idle_worker race flow:

    btrfs_stop_workers():                    check_idle_worker(aworker):
    - grabs the lock
    - splices the idle list into the
      working list
    - removes the first worker from the
      working list
    - releases the lock to wait for
      its kthread's completion
                                             - grabs the lock
                                             - if aworker is on the working list,
                                               moves aworker from the working list
                                               to the idle list
                                             - releases the lock
    - grabs the lock
    - puts the worker
    - removes the second worker from the
      working list
    ......
    btrfs_stop_workers returns, aworker is on the idle list
    FS is umounted, memory is freed
    ......
    aworker is woken up, fireworks ensue

    With this applied, I wasn't able to trigger the problem in 48 hours,
    whereas previously I could reliably reproduce at least one of these
    races within an hour.

    Reported-by: David Sterba
    Signed-off-by: Ilya Dryomov
    Signed-off-by: Josef Bacik

    Ilya Dryomov
     

22 Mar, 2012

1 commit


16 Dec, 2011

1 commit

  • Al pointed out we have some random problems with the way we account for
    num_workers_starting in the async thread stuff. First of all, we need to
    make sure to decrement num_workers_starting if we fail to start the
    worker, so make __btrfs_start_workers do this. Also fix
    __btrfs_start_workers so that it doesn't call btrfs_stop_workers(); there
    is no point in stopping everybody if we failed to create a worker. Also,
    check_pending_worker_creates needs to call __btrfs_start_work in its work
    function since it already increments num_workers_starting.

    People only start one worker at a time, so get rid of the num_workers
    argument everywhere, and make btrfs_queue_worker return void since it
    will always succeed. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     

05 Oct, 2009

1 commit

  • The btrfs async worker threads are used for a wide variety of things,
    including processing bio end_io functions. This means that when
    the endio threads aren't running, the rest of the FS isn't
    able to do the final processing required to clear PageWriteback.

    The endio threads also try to exit as they become idle and
    start more as the work piles up. The problem is that starting more
    threads means kthreadd may need to allocate ram, and that allocation
    may wait until the global number of writeback pages on the system is
    below a certain limit.

    The result of that throttling is that end IO threads wait on
    kthreadd, who is waiting on IO to end, which will never happen.

    This commit fixes the deadlock by handing off thread startup to a
    dedicated thread. It also fixes a bug where the on-demand thread
    creation was creating far too many threads because it didn't take into
    account threads being started by other procs.

    Signed-off-by: Chris Mason

    Chris Mason
     

12 Sep, 2009

2 commits

  • The btrfs worker thread spinlock was being used both for the
    queueing of IO and for the processing of ordered events.

    The ordered events never happen from end_io handlers, and so they
    don't need to use the _irq version of spinlocks. This adds a
    dedicated lock to the ordered lists so they don't have to run
    with irqs off.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • The Btrfs worker threads don't currently die off after they have
    been idle for a while, leading to a lot of threads sitting around
    doing nothing for each mount.

    Also, they are unable to start atomically (from end_io handlers).

    This commit reworks the worker threads so they can be started
    from end_io handlers (just setting a flag that asks for a thread
    to be added at a later date) and so they can exit if they
    have been idle for a long time.

    Signed-off-by: Chris Mason

    Chris Mason
     

21 Apr, 2009

1 commit

  • Btrfs is using WRITE_SYNC_PLUG to send down synchronous IOs with a
    higher priority. But, the checksumming helper threads prevent it
    from being fully effective.

    There are two problems. First, a big queue of pending checksumming
    will delay the synchronous IO behind other lower priority writes. Second,
    the checksumming uses an ordered async work queue. The ordering makes sure
    that IOs are sent to the block layer in the same order they are sent
    to the checksumming threads. Usually this gives us less seeky IO.

    But, when we start mixing IO priorities, the lower priority IO can delay
    the higher priority IO.

    This patch solves both problems by adding a high priority list to the
    async helper threads, and a new btrfs_set_work_high_prio(), which is used
    to put a new async work item onto the higher priority list.

    The ordering is still done on high priority IO, but all of the high
    priority bios are ordered separately from the low priority bios. This
    ordering is purely an IO optimization, it is not involved in data
    or metadata integrity.
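
    A minimal sketch of the dispatch order this implies; the struct and list
    names (btrfs_worker_thread, prio_pending, pending) are illustrative:

    static struct btrfs_work *get_next_work(struct btrfs_worker_thread *worker)
    {
            /* Always drain the high priority list before the normal one. */
            if (!list_empty(&worker->prio_pending))
                    return list_first_entry(&worker->prio_pending,
                                            struct btrfs_work, list);
            if (!list_empty(&worker->pending))
                    return list_first_entry(&worker->pending,
                                            struct btrfs_work, list);
            return NULL;
    }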

    Signed-off-by: Chris Mason

    Chris Mason
     

07 Nov, 2008

1 commit

  • Btrfs uses kernel threads to create async work queues for cpu intensive
    operations such as checksumming and decompression. These work well,
    but they make it difficult to keep IO order intact.

    A single writepages call from pdflush or fsync will turn into a number
    of bios, and each bio is checksummed in parallel. Once the checksum is
    computed, the bio is sent down to the disk, and since we don't control
    the order in which the parallel operations happen, they might go down to
    the disk in almost any order.

    The code deals with this somewhat by having deep work queues for a single
    kernel thread, making it very likely that a single thread will process all
    the bios for a single inode.

    This patch introduces an explicitly ordered work queue. As work structs
    are placed into the queue they are put onto the tail of a list. They have
    three callbacks:

    ->func (cpu intensive processing here)
    ->ordered_func (order sensitive processing here)
    ->ordered_free (free the work struct, all processing is done)

    The func callback does the cpu intensive work, and when it completes the
    work struct is marked as done.

    Every time a work struct completes, the list is checked to see if the head
    is marked as done. If so the ordered_func callback is used to do the
    order sensitive processing and the ordered_free callback is used to do
    any cleanup. Then we loop back and check the head of the list again.
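
    A minimal sketch of that drain loop, with illustrative names and
    simplified locking (WORK_DONE_BIT, order_lock, order_list):

    static void run_ordered_completions(struct btrfs_workers *workers)
    {
            spin_lock(&workers->order_lock);
            while (!list_empty(&workers->order_list)) {
                    struct btrfs_work *work =
                            list_first_entry(&workers->order_list,
                                             struct btrfs_work, order_list);

                    /* Head not finished yet: later works must keep waiting. */
                    if (!test_bit(WORK_DONE_BIT, &work->flags))
                            break;

                    list_del(&work->order_list);
                    spin_unlock(&workers->order_lock);

                    work->ordered_func(work);       /* order sensitive part */
                    work->ordered_free(work);       /* all processing is done */

                    spin_lock(&workers->order_lock);
            }
            spin_unlock(&workers->order_lock);
    }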

    This patch also changes the checksumming code to use the ordered workqueues.
    On a 4 drive array, it increases streaming writes from 280MB/s to 350MB/s.

    Signed-off-by: Chris Mason

    Chris Mason
     

30 Sep, 2008

1 commit

  • This improves the comments at the top of many functions. It didn't
    dive into the guts of functions because I was trying to
    avoid merging problems with the new allocator and back reference work.

    extent-tree.c and volumes.c were both skipped, and there is definitely
    more work to do in cleaning and commenting the code.

    Signed-off-by: Chris Mason

    Chris Mason
     

25 Sep, 2008

3 commits

  • Signed-off-by: Chris Mason

    Chris Mason
     
  • This changes the worker thread pool to maintain a list of idle threads,
    avoiding a complex search for a good thread to wake up.

    Threads have two states:

    idle - we try to reuse the last thread used in hopes of improving the batching
    ratios

    busy - each time a new work item is added to a busy task, the task is
    rotated to the end of the line.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • Btrfs has been using workqueues to spread the checksumming load across
    other CPUs in the system. But, workqueues only schedule work on the
    same CPU that queued the work, giving them a limited benefit for systems with
    higher CPU counts.

    This code adds a generic facility to schedule work with pools of kthreads,
    and changes the bio submission code to queue bios up. The queueing is
    important to make sure large numbers of procs on the system don't
    turn streaming workloads into random workloads by sending IO down
    concurrently.

    The end result of all of this is much higher performance (and CPU usage) when
    doing checksumming on large machines. Two worker pools are created,
    one for writes and one for endio processing. The two could deadlock if
    we tried to service both from a single pool.

    Signed-off-by: Chris Mason

    Chris Mason