16 Dec, 2011

1 commit

  • Al pointed out we have some random problems with the way we account for
    num_workers_starting in the async thread stuff. First of all we need to make
    sure to decrement num_workers_starting if we fail to start the worker, so make
    __btrfs_start_workers do this. Also fix __btrfs_start_workers so that it
    doesn't call btrfs_stop_workers(); there is no point in stopping everybody if we
    failed to create a worker. Also, check_pending_worker_creates needs to call
    __btrfs_start_work in its work function, since it already increments
    num_workers_starting.

    People only start one worker at a time, so get rid of the num_workers argument
    everywhere, and make btrfs_queue_worker return void since it will always succeed.
    Thanks,

    Signed-off-by: Josef Bacik


05 Oct, 2009

1 commit

  • The btrfs async worker threads are used for a wide variety of things,
    including processing bio end_io functions. This means that when
    the endio threads aren't running, the rest of the FS isn't
    able to do the final processing required to clear PageWriteback.

    The endio threads also try to exit as they become idle and
    start more as the work piles up. The problem is that starting more
    threads means kthreadd may need to allocate RAM, and that allocation
    may wait until the global number of writeback pages on the system is
    below a certain limit.

    The result of that throttling is that the end IO threads wait on
    kthreadd, which is in turn waiting on IO to end, and that can never happen.

    This commit fixes the deadlock by handing off thread startup to a
    dedicated thread. It also fixes a bug where the on-demand thread
    creation was creating far too many threads because it didn't take into
    account threads being started by other procs.

    Signed-off-by: Chris Mason


12 Sep, 2009

2 commits

  • The btrfs worker thread spinlock was being used both for the
    queueing of IO and for the processing of ordered events.

    The ordered events never happen from end_io handlers, and so they
    don't need to use the _irq version of spinlocks. This adds a
    dedicated lock to the ordered lists so they don't have to run
    with irqs off.

    Signed-off-by: Chris Mason

  • The Btrfs worker threads don't currently die off after they have
    been idle for a while, leading to a lot of threads sitting around
    doing nothing for each mount.

    Also, they are unable to start atomically (from end_io handlers).

    This commit reworks the worker threads so they can be started
    from end_io handlers (just setting a flag that asks for a thread
    to be added at a later date) and so they can exit if they
    have been idle for a long time.

    Signed-off-by: Chris Mason


21 Apr, 2009

1 commit

  • Btrfs is using WRITE_SYNC_PLUG to send down synchronous IOs with a
    higher priority. But, the checksumming helper threads prevent it
    from being fully effective.

    There are two problems. First, a big queue of pending checksumming
    will delay the synchronous IO behind other lower priority writes. Second,
    the checksumming uses an ordered async work queue. The ordering makes sure
    that IOs are sent to the block layer in the same order they are sent
    to the checksumming threads. Usually this gives us less seeky IO.

    But, when we start mixing IO priorities, the lower priority IO can delay
    the higher priority IO.

    This patch solves both problems by adding a high priority list to the async
    helper threads, and a new btrfs_set_work_high_prio(), which is used
    to put a new async work item onto the higher priority list.

    The ordering is still done on high priority IO, but all of the high
    priority bios are ordered separately from the low priority bios. This
    ordering is purely an IO optimization, it is not involved in data
    or metadata integrity.

    Signed-off-by: Chris Mason


07 Nov, 2008

1 commit

  • Btrfs uses kernel threads to create async work queues for cpu intensive
    operations such as checksumming and decompression. These work well,
    but they make it difficult to keep IO order intact.

    A single writepages call from pdflush or fsync will turn into a number
    of bios, and each bio is checksummed in parallel. Once the checksum is
    computed, the bio is sent down to the disk, and since we don't control
    the order in which the parallel operations happen, they might go down to
    the disk in almost any order.

    The code deals with this somewhat by having deep work queues for a single
    kernel thread, making it very likely that a single thread will process all
    the bios for a single inode.

    This patch introduces an explicitly ordered work queue. As work structs
    are placed into the queue they are put onto the tail of a list. They have
    three callbacks:

    ->func (cpu intensive processing here)
    ->ordered_func (order sensitive processing here)
    ->ordered_free (free the work struct, all processing is done)

    The func callback does the cpu intensive work, and when it completes
    the work struct is marked as done.

    Every time a work struct completes, the list is checked to see if the head
    is marked as done. If so the ordered_func callback is used to do the
    order sensitive processing and the ordered_free callback is used to do
    any cleanup. Then we loop back and check the head of the list again.

    This patch also changes the checksumming code to use the ordered workqueues.
    On a 4 drive array, it increases streaming writes from 280MB/s to 350MB/s.

    Signed-off-by: Chris Mason


30 Sep, 2008

1 commit

  • This improves the comments at the top of many functions. It didn't
    dive into the guts of functions because I was trying to
    avoid merging problems with the new allocator and back reference work.

    extent-tree.c and volumes.c were both skipped, and there is definitely
    more work to do in cleaning and commenting the code.

    Signed-off-by: Chris Mason


25 Sep, 2008

3 commits

  • Signed-off-by: Chris Mason

  • This changes the worker thread pool to maintain a list of idle threads,
    avoiding a complex search for a good thread to wake up.

    Threads have two states:

    idle - we try to reuse the last thread used in hopes of improving the batching
    ratios

    busy - each time a new work item is added to a busy task, the task is
    rotated to the end of the line.

    Signed-off-by: Chris Mason

  • Btrfs has been using workqueues to spread the checksumming load across
    other CPUs in the system. But, workqueues only schedule work on the
    same CPU that queued the work, giving them a limited benefit for systems with
    higher CPU counts.

    This code adds a generic facility to schedule work with pools of kthreads,
    and changes the bio submission code to queue bios up. The queueing is
    important to make sure large numbers of procs on the system don't
    turn streaming workloads into random workloads by sending IO down
    concurrently.

    The end result of all of this is much higher performance (and CPU usage) when
    doing checksumming on large machines. Two worker pools are created,
    one for writes and one for endio processing. The two could deadlock if
    we tried to service both from a single pool.

    Signed-off-by: Chris Mason
