04 Oct, 2010

1 commit

  • We currently use struct backing_dev_info for various different purposes.
    Originally it was introduced to describe a backing device, which includes
    an unplug and congestion function and various bits of readahead information
    and VM-relevant flags. We're also using it for tracking dirty inodes for
    writeback.

    To make writeback properly find all inodes, we need to access only the
    per-filesystem backing device pointed to by the superblock in ->s_bdi
    inside the writeback code, and not the instances pointed to by
    inode->i_mapping->backing_dev_info, which can be overridden by special
    devices or might not be set at all by some filesystems.
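    A minimal sketch of the rule this establishes (illustrative helper, name
    hypothetical - not the actual patch):

        /* resolve the bdi for writeback via the superblock,
         * never via the inode's mapping */
        static struct backing_dev_info *wb_bdi(struct inode *inode)
        {
                return inode->i_sb->s_bdi;
        }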

    Long term we should split out the writeback-relevant bits of struct
    backing_dev_info (which include more than the current bdi_writeback)
    and only point to them from the superblock, while leaving the traditional
    backing device as a separate structure that can be overridden by devices.

    The one exception for now is the block device filesystem, which really
    wants different writeback contexts for its different (internal) inodes
    to handle the writeout more efficiently. For now we do this with
    a hack in fs-writeback.c because we're so late in the cycle, but in
    the future I plan to replace this with a superblock method that allows
    for multiple writeback contexts per filesystem.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

22 Sep, 2010

1 commit

  • Inodes of devices such as /dev/zero can get dirty, for example via the
    utime(2) syscall or due to an atime update. The backing device of such
    inodes (zero_bdi, etc.) is, however, unable to handle dirty inodes, and
    thus __mark_inode_dirty complains. In fact, such an inode should rather
    be dirtied against the backing device of the filesystem holding it. This
    is generally a good rule, except for filesystems such as 'bdev' or
    'mtd_inodefs'. Inodes in these pseudofilesystems are referenced from
    ordinary filesystem inodes and carry the mapping with the real data of
    the device. Thus for these inodes we have to keep using
    inode->i_mapping->backing_dev_info as before. We distinguish these
    filesystems by checking whether sb->s_bdi points to a non-trivial
    backing device or not.
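    A rough sketch of that decision (helper name and the "non-trivial" test
    via noop_backing_dev_info are assumptions for illustration):

        static struct backing_dev_info *pick_dirty_bdi(struct inode *inode)
        {
                struct super_block *sb = inode->i_sb;

                /* pseudo-filesystems like 'bdev' carry the real device
                 * data in the mapping; keep using the mapping's bdi */
                if (!sb->s_bdi || sb->s_bdi == &noop_backing_dev_info)
                        return inode->i_mapping->backing_dev_info;

                /* ordinary case: dirty against the filesystem's bdi */
                return sb->s_bdi;
        }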

    Example: Assume we have an ext3 filesystem on /dev/sda1 mounted on /.
    There's a device inode A, described by the path "/dev/sdb", on this
    filesystem. This inode will be dirtied against backing device "8:0"
    after this patch. The bdev filesystem contains a block device inode B
    coupled with our inode A. When someone modifies a page of /dev/sdb, it's
    B that gets dirtied, and the dirtying happens against the backing device
    "8:16". Thus both inodes get filed on the correct bdi list.

    Cc: stable@kernel.org
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     

28 Aug, 2010

1 commit

  • Setting the task state here may cause us to miss the wake up from
    kthread_stop(), so we need to recheck kthread_should_stop() or risk
    sleeping forever in the following schedule().
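    A minimal sketch of the standard idiom the fix applies (illustrative,
    not the exact hunk):

        set_current_state(TASK_INTERRUPTIBLE);
        if (kthread_should_stop()) {
                /* kthread_stop() already woke us up; don't sleep */
                __set_current_state(TASK_RUNNING);
                break;
        }
        schedule();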

    Symptom was an indefinite hang on an NFSv4 mount. (NFSv4 may create
    multiple mounts in a temporary namespace while traversing the mount
    path, and since the temporary namespace is immediately destroyed, it may
    end up destroying a mount very soon after it was created, possibly
    making this race more likely.)

    INFO: task mount.nfs4:4314 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    mount.nfs4 D 0000000000000000 2880 4314 4313 0x00000000
    ffff88001ed6da28 0000000000000046 ffff88001ed6dfd8 ffff88001ed6dfd8
    ffff88001ed6c000 ffff88001ed6c000 ffff88001ed6c000 ffff88001e5003a0
    ffff88001ed6dfd8 ffff88001e5003a8 ffff88001ed6c000 ffff88001ed6dfd8
    Call Trace:
    [] schedule_timeout+0x1cd/0x2e0
    [] ? mark_held_locks+0x6c/0xa0
    [] ? _raw_spin_unlock_irq+0x30/0x60
    [] ? trace_hardirqs_on_caller+0x14d/0x190
    [] ? sub_preempt_count+0xe/0xd0
    [] wait_for_common+0x120/0x190
    [] ? default_wake_function+0x0/0x20
    [] wait_for_completion+0x1d/0x20
    [] kthread_stop+0x4a/0x150
    [] ? thaw_process+0x70/0x80
    [] bdi_unregister+0x10a/0x1a0
    [] nfs_put_super+0x19/0x20
    [] generic_shutdown_super+0x54/0xe0
    [] kill_anon_super+0x16/0x60
    [] nfs4_kill_super+0x39/0x90
    [] deactivate_locked_super+0x45/0x60
    [] deactivate_super+0x49/0x70
    [] mntput_no_expire+0x84/0xe0
    [] release_mounts+0x9f/0xc0
    [] put_mnt_ns+0x65/0x80
    [] nfs_follow_remote_path+0x1e6/0x420
    [] nfs4_try_mount+0x6f/0xd0
    [] nfs4_get_sb+0xa2/0x360
    [] vfs_kern_mount+0x88/0x1f0
    [] do_kern_mount+0x52/0x130
    [] ? _lock_kernel+0x6a/0x170
    [] do_mount+0x26e/0x7f0
    [] ? copy_mount_options+0xea/0x190
    [] sys_mount+0x98/0xf0
    [] system_call_fastpath+0x16/0x1b
    1 lock held by mount.nfs4/4314:
    #0: (&type->s_umount_key#24){+.+...}, at: [] deactivate_super+0x41/0x70

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Jens Axboe
    Acked-by: Artem Bityutskiy

    J. Bruce Fields
     

12 Aug, 2010

5 commits

  • Commit 83ba7b071f3 ("writeback: simplify the write back thread queue")
    broke writeback_in_progress() as in that commit we started to remove work
    items from the list at the moment we start working on them and not at the
    moment they are finished. Thus if the flusher thread was doing some work
    but there was no other work queued, writeback_in_progress() returned
    false. This could in particular cause unnecessary queueing of background
    writeback from balance_dirty_pages() or writeout work from
    writeback_inodes_sb_if_idle().

    This patch fixes the problem by introducing a bit in the bdi state which
    indicates that the flusher thread is processing some work and uses this
    bit for writeback_in_progress() test.
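    A plausible shape of the test after this change (the bit name is assumed
    for illustration):

        int writeback_in_progress(struct backing_dev_info *bdi)
        {
                return test_bit(BDI_writeback_running, &bdi->state);
        }

    with the flusher thread setting the bit before it starts processing work
    and clearing it once the work list is drained.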

    NOTE: Both callsites of writeback_in_progress() (namely,
    writeback_inodes_sb_if_idle() and balance_dirty_pages()) actually need
    different information from what writeback_in_progress() provides. They
    need to know whether *the kind of writeback they are going to submit* is
    already queued. But this information isn't that simple to provide, so
    let's fix writeback_in_progress() for the time being.

    Signed-off-by: Jan Kara
    Cc: Christoph Hellwig
    Cc: Wu Fengguang
    Acked-by: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Unify the logic for the kupdate and non-kupdate cases. There won't be
    starvation, because the inodes requeued into b_more_io will later be
    spliced _after_ the remaining inodes in b_io, and hence won't stand in
    the way of other inodes in the next run.

    This avoids unnecessary redirty_tail() calls, and hence updates of
    dirtied_when. The timestamp update is undesirable because it could later
    delay the inode's periodic writeback, or may exclude the inode from the
    data integrity sync operation (which checks the timestamp to avoid extra
    work and livelock).
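    For reference, simplified forms of the two queueing helpers being traded
    off here (shapes assumed, with the bdi lookup passed in explicitly):

        /* requeue to b_more_io without touching the dirty timestamp */
        static void requeue_io(struct inode *inode, struct bdi_writeback *wb)
        {
                list_move(&inode->i_list, &wb->b_more_io);
        }

        /* requeue to b_dirty, refreshing dirtied_when when needed to keep
         * the list sorted - the timestamp update discussed above */
        static void redirty_tail(struct inode *inode, struct bdi_writeback *wb)
        {
                if (!list_empty(&wb->b_dirty)) {
                        struct inode *tail = list_entry(wb->b_dirty.next,
                                                        struct inode, i_list);

                        if (time_before(inode->dirtied_when,
                                        tail->dirtied_when))
                                inode->dirtied_when = jiffies;
                }
                list_move(&inode->i_list, &wb->b_dirty);
        }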

    ===
    How the redirty_tail() comes about:

    It is a long story. This redirty_tail() was introduced with
    wbc.more_io. The initial patch for more_io actually did not have the
    redirty_tail(), and when it was merged, several 100% iowait bug reports
    arose:

    reiserfs:
    http://lkml.org/lkml/2007/10/23/93

    jfs:
    commit 29a424f28390752a4ca2349633aaacc6be494db5
    JFS: clear PAGECACHE_TAG_DIRTY for no-write pages

    ext2:
    http://www.spinics.net/linux/lists/linux-ext4/msg04762.html

    They are all old bugs hidden in various filesystems that became "visible"
    with the more_io patch. At the time, the ext2 bug was thought to be
    "trivial", so it was not fixed. Instead, the following updated more_io
    patch with redirty_tail() was merged:

    http://www.spinics.net/linux/lists/linux-ext4/msg04507.html

    This will in general prevent 100% iowait on ext2 and possibly work
    around other unknown FS bugs.

    Signed-off-by: Wu Fengguang
    Cc: Dave Chinner
    Cc: Martin Bligh
    Cc: Michael Rubin
    Cc: Peter Zijlstra
    Cc: Christoph Hellwig
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • This was not a bug, since b_io is empty for kupdate writeback. The next
    patch will do requeue_io() for non-kupdate writeback, so let's fix it.

    Signed-off-by: Wu Fengguang
    Cc: Dave Chinner
    Cc: Martin Bligh
    Cc: Michael Rubin
    Cc: Peter Zijlstra
    Cc: Christoph Hellwig
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Avoid delaying writeback for an expired inode that has lots of dirty
    pages but no active dirtier at the moment. Previously we only did that
    in the kupdate case.

    Any filesystem that does delayed allocation or unwritten extent
    conversion after IO completion will cause this - for example, XFS.

    Signed-off-by: Wu Fengguang
    Acked-by: Jan Kara
    Cc: Dave Chinner
    Cc: Christoph Hellwig
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Split get_dirty_limits() into global_dirty_limits() + bdi_dirty_limit(),
    so that the latter can be avoided when we are under the global dirty
    background threshold (which is the normal state for most systems).
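    Sketched usage of the split (signatures per the description; the
    surrounding variables are illustrative context, not the merged code):

        unsigned long background_thresh, dirty_thresh, bdi_thresh;

        global_dirty_limits(&background_thresh, &dirty_thresh);
        if (nr_reclaimable <= background_thresh)
                return;         /* common case: bdi_dirty_limit() skipped */
        bdi_thresh = bdi_dirty_limit(bdi, dirty_thresh);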

    Signed-off-by: Wu Fengguang
    Cc: Peter Zijlstra
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     

11 Aug, 2010

2 commits

  • * 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block: (149 commits)
    block: make sure that REQ_* types are seen even with CONFIG_BLOCK=n
    xen-blkfront: fix missing out label
    blkdev: fix blkdev_issue_zeroout return value
    block: update request stacking methods to support discards
    block: fix missing export of blk_types.h
    writeback: fix bad _bh spinlock nesting
    drbd: revert "delay probes", feature is being re-implemented differently
    drbd: Initialize all members of sync_conf to their defaults [Bugz 315]
    drbd: Disable delay probes for the upcomming release
    writeback: cleanup bdi_register
    writeback: add new tracepoints
    writeback: remove unnecessary init_timer call
    writeback: optimize periodic bdi thread wakeups
    writeback: prevent unnecessary bdi threads wakeups
    writeback: move bdi threads exiting logic to the forker thread
    writeback: restructure bdi forker loop a little
    writeback: move last_active to bdi
    writeback: do not remove bdi from bdi_list
    writeback: simplify bdi code a little
    writeback: do not lose wake-ups in bdi threads
    ...

    Fixed up pretty trivial conflicts in drivers/block/virtio_blk.c and
    drivers/scsi/scsi_error.c as per Jens.

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits)
    no need for list_for_each_entry_safe()/resetting with superblock list
    Fix sget() race with failing mount
    vfs: don't hold s_umount over close_bdev_exclusive() call
    sysv: do not mark superblock dirty on remount
    sysv: do not mark superblock dirty on mount
    btrfs: remove junk sb_dirt change
    BFS: clean up the superblock usage
    AFFS: wait for sb synchronization when needed
    AFFS: clean up dirty flag usage
    cifs: truncate fallout
    mbcache: fix shrinker function return value
    mbcache: Remove unused features
    add f_flags to struct statfs(64)
    pass a struct path to vfs_statfs
    update VFS documentation for method changes.
    All filesystems that need invalidate_inode_buffers() are doing that explicitly
    convert remaining ->clear_inode() to ->evict_inode()
    Make ->drop_inode() just return whether inode needs to be dropped
    fs/inode.c:clear_inode() is gone
    fs/inode.c:evict() doesn't care about delete vs. non-delete paths now
    ...

    Fix up trivial conflicts in fs/nilfs2/super.c

    Linus Torvalds
     

10 Aug, 2010

2 commits

  • WB_SYNC_NONE writeback is done in rounds of 1024 pages so that we don't
    write out some huge inode for too long while starving writeout of other
    inodes. To avoid livelocks, we record the time we started writeback in
    wbc->wb_start and do not write out inodes which were dirtied after this
    time. But currently, writeback_inodes_wb() resets wb_start each time it
    is called, thus effectively invalidating this logic and making any
    WB_SYNC_NONE writeback prone to livelocks.

    This patch makes sure wb_start is set only once when we start writeback.
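    The fix boils down to initializing wb_start only when it is unset
    (sketch; exact placement per the description):

        if (!wbc->wb_start)
                wbc->wb_start = jiffies;        /* livelock avoidance */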

    Signed-off-by: Jan Kara
    Reviewed-by: Wu Fengguang
    Cc: Christoph Hellwig
    Acked-by: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Add I_CLEAR instead of replacing I_FREEING with it. I_CLEAR is
    equivalent to I_FREEING for almost all code looking at either;
    it's there to keep track of having called clear_inode() exactly
    once per inode lifetime, at some point after having set I_FREEING.
    I_CLEAR and I_FREEING never get set at the same time with the
    current code, so we can switch to setting i_state to I_FREEING | I_CLEAR
    instead of I_CLEAR alone without loss of information. As a result of
    this change, checks become simpler and the amount of code that needs
    to know about I_CLEAR shrinks a lot.
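    In other words, after this change the end-of-life marking becomes
    (sketch):

        /* in clear_inode(), called exactly once after I_FREEING is set */
        inode->i_state = I_FREEING | I_CLEAR;

    and "is this inode going away?" checks can simply test I_FREEING.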

    Signed-off-by: Al Viro

    Al Viro
     

08 Aug, 2010

12 commits

  • When the first inode for a bdi is marked dirty, we wake up the bdi
    thread, which should take care of the periodic background write-out.
    However, the write-out will actually start only
    'dirty_writeback_interval' centisecs later, so we can delay the wake-up.

    This change was requested by Nick Piggin, who pointed out that if we
    delay the wake-up, we avoid two unnecessary context switches, which
    matters because '__mark_inode_dirty()' is a hot-path function.

    This patch introduces a new function - 'bdi_wakeup_thread_delayed()' -
    which sets up a timer to wake up the bdi thread and returns, so the
    wake-up is delayed.

    We also delete the timer in bdi threads just before writing back, and
    delete it synchronously when unregistering the bdi. At the unregister
    point the bdi does not have any users, so no one can arm it again.

    Since we now take 'bdi->wb_lock' in the timer, which can execute in
    softirq context, we have to use 'spin_lock_bh()' for 'bdi->wb_lock'.
    This patch makes that change as well.

    This patch also moves the 'bdi_wb_init()' function down in the file to
    avoid a forward declaration of 'bdi_wakeup_thread_delayed()'.
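    The helper is essentially a one-liner around a timer (sketch; the timer
    field name is an assumption):

        void bdi_wakeup_thread_delayed(struct backing_dev_info *bdi)
        {
                /* dirty_writeback_interval is in centisecs */
                unsigned long timeout =
                        msecs_to_jiffies(dirty_writeback_interval * 10);

                mod_timer(&bdi->wb.wakeup_timer, jiffies + timeout);
        }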

    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Jens Axboe

    Artem Bityutskiy
     
  • Finally, we can get rid of unnecessary wake-ups in bdi threads, which are very
    bad for battery-driven devices.

    There are two types of activities bdi threads do:
    1. process bdi works from the 'bdi->work_list'
    2. periodic write-back

    So there are 2 sources of wake-up events for bdi threads:

    1. 'bdi_queue_work()' - submits bdi works
    2. '__mark_inode_dirty()' - adds dirty I/O to bdi's

    The former already has bdi wake-up code. The latter does not, and this patch
    adds it.

    '__mark_inode_dirty()' is a hot-path function, but this patch adds
    another 'spin_lock(&bdi->wb_lock)' there. However, it is taken only in
    the rare case when the bdi has no dirty inodes, so adding this spinlock
    should be fine and should not affect performance.
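    A sketch of the wake-up condition (simplified from the surrounding
    '__mark_inode_dirty()' logic, locking elided; the delayed wake-up helper
    is the one introduced a commit above):

        bool wakeup_bdi = false;

        if (!wb_has_dirty_io(&bdi->wb))
                wakeup_bdi = true;      /* bdi was idle: first dirty inode */

        list_move(&inode->i_list, &bdi->wb.b_dirty);
        /* ... drop locks ... */
        if (wakeup_bdi)
                bdi_wakeup_thread_delayed(bdi);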

    This patch makes sure bdi threads and the forker thread do not wake up
    if there is nothing to do. The forker thread will nevertheless wake up
    at least every 5 minutes to check whether it has to kill a bdi thread.
    This could also be optimized, but is not worth it.

    This patch also tidies up the warning about an unregistered bdi, and
    turns it from an ugly crocodile into a simple 'WARN()' statement.

    Signed-off-by: Artem Bityutskiy
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Artem Bityutskiy
     
  • Currently, bdi threads can decide to exit if there were no useful
    activities for 5 minutes. However, this causes nasty races: we can
    easily oops in 'bdi_queue_work()' if the bdi thread decides to exit
    while we are waking it up.

    And even if we do not oops, if the bdi thread exits immediately after we
    wake it up, we lose the wake-up event and incur an unnecessary delay (up
    to 5 secs) in the bdi work processing.

    This patch makes the forker thread the central place which not only
    creates bdi threads, but also kills them if they have been inactive long
    enough. This is better design-wise.

    Another reason why this change was done is to prepare for the further changes
    which will prevent the bdi threads from waking up every 5 sec and wasting
    power. Indeed, when the task does not wake up periodically anymore, it won't be
    able to exit either.

    This patch also moves the 'wake_up_bit()' call from the bdi thread to
    the forker thread. So now the forker thread sets the BDI_pending bit,
    then forks the task or kills it, then clears the bit and wakes up the
    waiting process.

    The only process which may wait on the bit is 'bdi_wb_shutdown()'. This
    function was changed as well - now it first removes the bdi from the
    'bdi_list', then waits on the 'BDI_pending' bit. Once it wakes up, it is
    guaranteed that the forker thread won't race with it, because the bdi is
    no longer visible. Note, the forker thread sets the 'BDI_pending' bit
    under 'bdi->wb_lock', which is essential for proper serialization.

    Additionally, when we change 'bdi->wb.task', we now take
    'bdi->work_lock' to make sure that we do not lose wake-ups which we
    otherwise would when racing with, say, 'bdi_queue_work()'.

    Signed-off-by: Artem Bityutskiy
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Artem Bityutskiy
     
  • Currently bdi threads use the local variable 'last_active', which stores
    the last time the bdi thread did some useful work. Move this local
    variable to 'struct bdi_writeback'. This is just preparation for further
    patches which will make the forker thread decide when bdi threads should
    be killed.

    Signed-off-by: Artem Bityutskiy
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Artem Bityutskiy
     
  • The forker thread removes bdis from 'bdi_list' before forking the bdi
    thread. But this is wrong for at least two reasons.

    Reason #1: if we temporarily remove a bdi from the list, we may miss
    works which would otherwise be given to us.

    Reason #2: this is racy; indeed, 'bdi_wb_shutdown()' expects that bdis
    are always in the 'bdi_list' (see 'bdi_remove_from_list()'), and when it
    races with the forker thread, it can shut down the bdi thread at the
    same time as the forker creates it.

    This patch makes sure the forker thread never removes bdis from 'bdi_list'
    (which was suggested by Christoph Hellwig).

    In order to make sure that we do not race with 'bdi_wb_shutdown()', we have to
    hold the 'bdi_lock' while walking the 'bdi_list' and setting the 'BDI_pending'
    flag.

    NOTE! The error path is interesting. Currently, when we fail to create a
    bdi thread, we move the bdi to the tail of 'bdi_list'. But if we never
    remove the bdi from the list, we cannot move it to the tail either,
    because then we could mess up the RCU readers which walk the list. And
    we would also have the race described above in "Reason #2".

    But I do not think that adding to the tail is important, so I just do
    not do that.

    Signed-off-by: Artem Bityutskiy
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Artem Bityutskiy
     
  • Currently, bdi threads ('bdi_writeback_thread()') can lose wake-ups.
    This happens, for example, if 'bdi_queue_work()' is executed after the
    bdi thread has finished 'wb_do_writeback()' but before it has called
    'schedule_timeout_interruptible()'.

    To fix this issue, we have to check whether we have works to process
    after we have changed the task state to 'TASK_INTERRUPTIBLE'.
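    The resulting loop shape (sketch based on the description; details may
    differ from the merged code):

        set_current_state(TASK_INTERRUPTIBLE);
        if (!list_empty(&bdi->work_list)) {
                __set_current_state(TASK_RUNNING);
                continue;               /* work arrived while we ran */
        }

        if (dirty_writeback_interval)
                schedule_timeout(
                        msecs_to_jiffies(dirty_writeback_interval * 10));
        else
                schedule();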

    This patch also cleans up the handling of the cases when
    'dirty_writeback_interval' is zero or non-zero.

    Additionally, this patch removes an unneeded 'list_empty_careful()'
    call.

    Signed-off-by: Artem Bityutskiy
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Artem Bityutskiy
     
  • The write-back code mixes the words "thread" and "task" for the same
    things. This is not a big deal, but still an inconsistency.

    hch: a convention I tend to use, and have seen in various places,
    is to always use _task for the storage of the task_struct pointer,
    and thread everywhere else. This especially helps with having
    foo_thread for the actual thread and foo_task for a global
    variable keeping the task_struct pointer.

    This patch renames:
    * 'bdi_add_default_flusher_task()' -> 'bdi_add_default_flusher_thread()'
    * 'bdi_forker_task()' -> 'bdi_forker_thread()'

    because bdi threads are 'bdi_writeback_thread()', so these names are more
    consistent.

    This patch also amends commentaries and makes them refer the forker and bdi
    threads as "thread", not "task".

    Also, while on it, make 'bdi_add_default_flusher_thread()' declaration use
    'static void' instead of 'void static' and make checkpatch.pl happy.

    Signed-off-by: Artem Bityutskiy
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Artem Bityutskiy
     
  • Commit 83ba7b07 ("writeback: simplify the write back thread queue")
    cleaned up the writeback code, so we no longer use 'wb' in
    get_next_work_item(). Remove the unnecessary argument.

    CC: Christoph Hellwig
    Signed-off-by: Minchan Kim
    Signed-off-by: Jens Axboe

    Minchan Kim
     
  • Tracing high-level background writeback events is good, but it doesn't
    give the entire picture. Add visibility into write throttling to catch
    IO dispatched by foreground throttling of processes dirtying lots of
    pages.

    Signed-off-by: Dave Chinner
    Signed-off-by: Jens Axboe

    Dave Chinner
     
  • Trace queue/sched/exec parts of the writeback loop. This provides
    insight into when and why flusher threads are scheduled to run, e.g.
    a sync invocation leaves traces like:

    sync-[...]: writeback_queue: bdi 8:0: sb_dev 8:1 nr_pages=7712 sync_mode=0 kupdate=0 range_cyclic=0 background=0
    flush-8:0-[...]: writeback_exec: bdi 8:0: sb_dev 8:1 nr_pages=7712 sync_mode=0 kupdate=0 range_cyclic=0 background=0

    This also lays the foundation for adding more writeback tracing to
    provide deeper insight into the whole writeback path.

    The original tracing code is from Jens Axboe, though this version is
    a rewrite as a result of the code being traced changing
    significantly.

    Signed-off-by: Dave Chinner
    Signed-off-by: Jens Axboe

    Dave Chinner
     
  • Move all code for the writeback thread into fs/fs-writeback.c instead of
    splitting it over two functions in two files.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • The wb_list member of struct backing_device_info always has exactly one
    element. Just use the direct bdi->wb pointer instead and simplify some
    code.

    Also remove bdi_task_init, which is now trivial, to prepare for the next
    patch.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

06 Jul, 2010

3 commits

  • First, remove items from the work_list as soon as we start working on
    them. This means we don't have to track any pending or visited state and
    can get rid of all the RCU magic for freeing the work items - we can
    simply free them once the operation has finished. Second, use a real
    completion for tracking synchronous requests - if the caller sets the
    completion pointer we complete it, otherwise we use it as a boolean
    indicator that we can free the work item directly. Third, unify struct
    wb_writeback_args and struct bdi_work into a single data structure,
    wb_writeback_work. Previously we set all parameters in a struct
    wb_writeback_args, copied it into a struct bdi_work, and copied it again
    onto the stack to use it there. Instead, just allocate one structure
    dynamically or on the stack and use it all the way through the call
    chain.
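    The unified structure plausibly looks like this (field set inferred from
    the description; treat it as a sketch, not the merged definition):

        struct wb_writeback_work {
                long nr_pages;
                struct super_block *sb;
                enum writeback_sync_modes sync_mode;
                unsigned int for_kupdate:1;
                unsigned int range_cyclic:1;
                unsigned int for_background:1;

                struct list_head list;          /* pending work list */
                struct completion *done;        /* set if the caller waits */
        };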

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • The case where we have a superblock doesn't require a loop here, as we
    scan over all inodes in writeback_sb_inodes. Split it out into a
    separate helper to make the code simpler. This also allows us to get rid
    of the sb member in struct writeback_control, which was rather out of
    place there.

    Also update the comments in writeback_sb_inodes that explain the
    handling of inodes from wrong superblocks.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • This was just an odd wrapper around writeback_inodes_wb. Removing it
    also allows us to get rid of the bdi member of struct writeback_control,
    which was rather out of place there.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

01 Jul, 2010

1 commit

  • Fix kernel-doc to match the function's changed args.

    Warning(fs/fs-writeback.c:190): No description found for parameter 'args'
    Warning(fs/fs-writeback.c:190): Excess function parameter 'sb' description in 'bdi_queue_work_onstack'

    Signed-off-by: Randy Dunlap
    Cc: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Randy Dunlap
     

11 Jun, 2010

8 commits

  • We need to check for s_instances to make sure we don't bother working
    against a filesystem that is being unmounted, and we need to call
    put_super to make sure a superblock is freed when we race against
    umount. Also, there is no need to keep sb_lock once we have got a
    reference on the superblock.
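    A sketch of the iteration pattern this describes (the exact helper used
    to drop the reference is an assumption here):

        spin_lock(&sb_lock);
        list_for_each_entry(sb, &super_blocks, s_list) {
                if (list_empty(&sb->s_instances))
                        continue;               /* being unmounted, skip */
                sb->s_count++;
                spin_unlock(&sb_lock);          /* ref held, lock not needed */

                /* ... kick writeback for sb ... */

                spin_lock(&sb_lock);
                __put_super(sb);                /* may free the superblock */
        }
        spin_unlock(&sb_lock);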

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • In "writeback: fix writeback_inodes_wb from writeback_inodes_sb" I
    accidentally removed the requeue_io if we need to skip a superblock
    because we can't pin it. Add it back, otherwise we're getting spurious
    lockups after multiple xfstests runs.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • bdi_start_writeback now never gets a superblock passed, so we can just
    remove that case. To further untangle the code and flatten the call
    stack, split it into two trivial helpers for its two callers.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • bdi_writeback_all only has one caller, so fold it to simplify the code and
    flatten the call stack.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • When we call writeback_inodes_wb from writeback_inodes_sb we always have
    s_umount held, which currently makes the whole operation a no-op.

    But if we are called to write out inodes for a specific superblock we
    always have s_umount held, so replace the incorrect check for
    WB_SYNC_ALL, which only worked by coincidence, with a proper check for
    an explicit superblock argument.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Make sure that not only sync_filesystem but all callers of writeback_inodes_sb
    have the superblock protected against remount. As-is this disables all
    functionality for these callers, but the next patch relies on this locking to
    fix writeback_inodes_sb for sync_filesystem.
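    The enforcement is plausibly a lockdep-style assertion at the top of
    writeback_inodes_sb and friends (sketch):

        WARN_ON(!rwsem_is_locked(&sb->s_umount));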

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • If we want to rely on s_umount in the caller we need to wait for completion
    of the I/O submission before returning to the caller. Refactor
    bdi_sync_writeback into a bdi_queue_work_onstack helper and use it for this
    case.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • The code dealing with bdi_work->state and completion of a bdi_work is a
    major mess currently. This patch makes sure we directly use one set of
    flags to deal with it, and use it consistently, which means:

    - always notify about completion from the RCU callback. We only ever
      wait for it from on-stack callers, so this simplification does not
      even cause a theoretical slowdown currently. It also makes sure we
      don't miss out on the notification if we ever add other callers that
      wait for it.
    - make the earlier completion notification depend on the on-stack
      allocation, not the sync mode. If we introduce new callers that want
      to do WB_SYNC_NONE writeback from on-stack callers, this will be
      necessary.

    Also rename bdi_wait_on_work_clear to bdi_wait_on_work_done and inline
    a few small functions into their only caller to make the code
    understandable.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

01 Jun, 2010

3 commits


25 May, 2010

1 commit

  • When wb_writeback() hasn't written anything, it will re-acquire the
    inode lock before calling inode_wait_for_writeback.

    This change tests the sync bit first, so that it doesn't need to drop
    and re-acquire the lock if the inode became available while
    wb_writeback() was waiting to get the lock.
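    The waiting helper then takes roughly this shape (sketch based on
    inode_wait_for_writeback of this era; wqh/wq are set up as in the
    original helper):

        while (inode->i_state & I_SYNC) {
                /* only drop inode_lock when we actually have to wait */
                spin_unlock(&inode_lock);
                __wait_on_bit(wqh, &wq, inode_wait, TASK_UNINTERRUPTIBLE);
                spin_lock(&inode_lock);
        }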

    Signed-off-by: Richard Kennedy
    Cc: Alexander Viro
    Cc: Jens Axboe
    Cc: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Richard Kennedy