17 Jan, 2011

2 commits

  • Instead of splitting refcount between (per-cpu) mnt_count
    and (SMP-only) mnt_longrefs, make all references contribute
    to mnt_count again and keep track of how many are longterm
    ones.

    Accounting rules for longterm count:
    * 1 for each fs_struct.root.mnt
    * 1 for each fs_struct.pwd.mnt
    * 1 for having non-NULL ->mnt_ns
    * decrement to 0 happens only under vfsmount lock exclusive

    That allows a nice common case for mntput() - since we can't drop the
    final reference until after mnt_longterm has reached 0 due to the rules
    above, mntput() can grab vfsmount lock shared and check mnt_longterm.
    If it turns out to be non-zero (which is the common case), we know
    that this is not the final mntput() and can just blindly decrement
    percpu mnt_count. Otherwise we grab vfsmount lock exclusive and
    do usual decrement-and-check of percpu mnt_count.
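
    The fast path described above can be sketched in userspace as follows.
    This is a single-threaded model, not the kernel code: struct mnt_model
    and both helper names are hypothetical, the per-cpu counter is collapsed
    into one int, and the vfsmount lock appears only as comments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a mount; not the kernel's struct vfsmount. */
struct mnt_model {
    int count;     /* stands in for the summed per-cpu mnt_count */
    int longterm;  /* fs_struct root/pwd users, plus 1 for non-NULL mnt_ns */
};

/* Drop one reference; returns true if it was the final one. */
static bool mntput_model(struct mnt_model *m)
{
    assert(m->count > 0);
    /* kernel: vfsmount lock would be taken shared here */
    if (m->longterm > 0) {
        /* A longterm holder exists, so this cannot be the final put. */
        m->count--;
        return false;
    }
    /* kernel: shared lock dropped, vfsmount lock taken exclusive here */
    m->count--;
    return m->count == 0;
}

/* Turn a longterm reference into a shortterm one; count is unchanged. */
static void mnt_make_shortterm_model(struct mnt_model *m)
{
    assert(m->longterm > 0);
    /* kernel: the decrement to 0 happens under vfsmount lock exclusive */
    m->longterm--;
}
```

    Only the slow path ever has to decide whether the mount is dead, which
    is why the common case can get away with the shared lock.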

    For fs_struct.c we have mnt_make_longterm() and mnt_make_shortterm();
    namespace.c uses the latter in places where we don't already hold
    vfsmount lock exclusive and opencodes a few remaining spots where
    we need to manipulate mnt_longterm.

    Note that we mostly revert the code outside of fs/namespace.c back
    to what we used to have; in particular, normal code doesn't need
    to care about two kinds of references, etc. And we get to keep
    the optimization Nick's variant had bought us...

    Signed-off-by: Al Viro

    Al Viro
     
  • Expiry-related code calls umount_tree() several times with
    the same list to collect vfsmounts to. Which is fine, except
    that umount_tree() implicitly assumed that the list would
    be empty on each call - it moves the victims over there and
    then iterates through the list kicking them out. It's *almost*
    idempotent, so everything nearly worked. However, mnt->ghosts
    handling (and thus expirability checks) had been broken - that
    part was not idempotent...

    The fix is trivial - use a local temporary list and splice it onto
    the collector list when we are through.
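
    A minimal userspace sketch of that pattern, assuming a hand-rolled
    circular list rather than the kernel's list_head API. All names here
    (list_node, victim, collect_victims) are hypothetical, and the
    "processed" counter merely stands in for the non-idempotent work.

```c
#include <assert.h>
#include <stddef.h>

struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *h) { h->prev = h->next = h; }
static int list_is_empty(const struct list_node *h) { return h->next == h; }

static void list_add_tail_node(struct list_node *n, struct list_node *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Append everything on 'from' to the tail of 'to', leaving 'from' empty. */
static void list_splice_tail_nodes(struct list_node *from, struct list_node *to)
{
    if (list_is_empty(from))
        return;
    from->next->prev = to->prev;
    to->prev->next = from->next;
    from->prev->next = to;
    to->prev = from->prev;
    list_init(from);
}

struct victim {
    struct list_node link;   /* must stay the first member (see cast below) */
    int processed;
};

/* Gather this call's victims on a private list, process only those,
 * then hand them over to the shared collector list. */
static void collect_victims(struct victim *v, size_t n,
                            struct list_node *collector)
{
    struct list_node tmp, *p;

    assert(collector != NULL);
    list_init(&tmp);
    for (size_t i = 0; i < n; i++)
        list_add_tail_node(&v[i].link, &tmp);
    for (p = tmp.next; p != &tmp; p = p->next)
        ((struct victim *)p)->processed++;   /* non-idempotent step */
    list_splice_tail_nodes(&tmp, collector);
}
```

    Because the loop walks only the temporary list, entries left on the
    collector by an earlier call are never processed a second time.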

    Signed-off-by: Al Viro

    Al Viro
     

16 Jan, 2011

20 commits

  • Merge the remaining autofs4 dentry ops tables. It doesn't matter if
    d_automount and d_manage are present on something that's not mountable or
    holdable as these ops are only used if the appropriate flags are set in
    dentry->d_flags.

    [AV] switch to ->s_d_op, since now _everything_ on autofs4 is using the
    same dentry_operations.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Unexport do_add_mount() and make ->d_automount() return the vfsmount to be
    added rather than calling do_add_mount() itself. follow_automount() will then
    do the addition.

    This slightly complicates things as ->d_automount() normally wants to add the
    new vfsmount to an expiration list and start an expiration timer. The problem
    with that is that the vfsmount will be deleted if it has a refcount of 1 and
    the timer will not repeat if the expiration list is empty.

    To this end, we require the vfsmount to be returned from d_automount() with a
    refcount of (at least) 2. One of these refs will be dropped unconditionally.
    In addition, follow_automount() must get a 3rd ref around the call to
    do_add_mount() lest it eat a ref and return an error, leaving the mount we
    have open to being expired as we would otherwise have only 1 ref on it.

    d_automount() should also add the vfsmount to the expiration list (by
    calling mnt_set_expiry()) and start the expiration timer before returning, if
    this mechanism is to be used. The vfsmount will be unlinked from the
    expiration list by follow_automount() if do_add_mount() fails.

    This patch also fixes the call to do_add_mount() for AFS to propagate the mount
    flags from the parent vfsmount.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Allow d_manage() to be called from pathwalk when it is in RCU-walk mode as well
    as when it is in Ref-walk mode. This permits __follow_mount_rcu() to call
    d_manage() directly. d_manage() needs a parameter to indicate that it is in
    RCU-walk mode as it isn't allowed to sleep if in that mode (but should return
    -ECHILD instead).
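
    The convention can be illustrated with a small userspace model. The
    function name, its parameters and the "slept" out-flag are all
    hypothetical; the real operation takes kernel structures and really
    does block in ref-walk mode.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of a d_manage()-style hook.
 * Returns 0 to let the walker continue, or -ECHILD to ask the caller to
 * drop out of RCU-walk mode and retry in ref-walk mode.
 * *slept is set when the model "would have" blocked. */
static int d_manage_model(bool caller_is_daemon, bool rcu_walk, bool *slept)
{
    assert(slept != NULL);
    *slept = false;

    if (caller_is_daemon)
        return 0;          /* daemon is let through; RCU-walk is retained */

    if (rcu_walk)
        return -ECHILD;    /* sleeping is forbidden here: request ref-walk */

    *slept = true;         /* ref-walk: the real op would wait at this point */
    return 0;
}
```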

    autofs4_d_manage() can then be set to retain RCU-walk mode if the daemon
    accesses it and otherwise request dropping back to ref-walk mode.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Remove a further kludge from __do_follow_link() as it's no longer required with
    the automount code.

    This reverts the non-helper-function parts of
    051d381259eb57d6074d02a6ba6e90e744f1a29f, which breaks union mounts.

    Reported-by: vaurora@redhat.com
    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Version 4 of autofs provides a pseudo direct mount implementation
    that relies on directories at the leaves of a directory tree under
    an indirect mount to trigger mounts.

    This patch adds support for that functionality.

    Signed-off-by: Ian Kent
    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    Ian Kent
     
  • It is possible for the check in wait.c:validate_request() to return
    an incorrect result if the dentry that was mounted upon has changed
    during the callback.

    Signed-off-by: Ian Kent
    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    Ian Kent
     
  • When this function is called the local reference count doesn't need to
    be updated since the dentry is going away and dput definitely must
    not be called here.

    Also the autofs info struct field inode isn't used so remove it.

    Signed-off-by: Ian Kent
    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    Ian Kent
     
  • There are now two distinct uses of dentry operations: one for dentries
    that trigger mounts and one for dentries that do not.

    Rationalize the use of these dentry operations and rename them to
    reflect their function.

    Signed-off-by: Ian Kent
    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    Ian Kent
     
  • Since the use of ->follow_link() has been eliminated there is no
    need to separate the indirect and direct inode operations.

    Signed-off-by: Ian Kent
    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    Ian Kent
     
  • Remove code that is not used due to the use of ->d_automount()
    and ->d_manage().

    Signed-off-by: Ian Kent
    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    Ian Kent
     
  • This patch requires a previous patch to add the ->d_automount()
    dentry operation.

    Add a function to use the newly defined ->d_manage() dentry operation
    for blocking during mount and expire.

    Whether the VFS calls the dentry operations d_automount() and d_manage()
    is controlled by the DMANAGED_AUTOMOUNT and DMANAGED_TRANSIT flags. autofs
    uses the d_automount() operation to callback to user space to request
    mount operations and the d_manage() operation to block walks into mounts
    that are under construction or destruction.

    In order to prevent these functions from being called unnecessarily the
    DMANAGED_* flags are cleared for cases which would cause this. In the
    common case the DMANAGED_AUTOMOUNT and DMANAGED_TRANSIT flags are both
    set for dentries waiting to be mounted. The DMANAGED_TRANSIT flag is
    cleared upon successful mount request completion and set during expire
    runs, both during the dentry expire check, and if selected for expire,
    is left set until a subsequent successful mount request completes.

    The exception to this is the so-called rootless multi-mount which has
    no actual mount at its base. In this case the DMANAGED_AUTOMOUNT flag
    is cleared upon successful mount request completion as well and set
    again after a successful expire.

    Signed-off-by: Ian Kent
    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    Ian Kent
     
  • Add a function to use the newly defined ->d_automount() dentry operation
    for triggering mounts instead of doing the user space callback in ->lookup()
    and ->d_revalidate().

    Note, to be useful the subsequent patch to add the ->d_manage() dentry
    operation is also needed so the discussion of functionality is deferred to
    that patch.

    Signed-off-by: Ian Kent
    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    Ian Kent
     
  • Remove the automount through follow_link() kludge code from pathwalk in favour
    of using d_automount().

    Signed-off-by: David Howells
    Acked-by: Ian Kent
    Signed-off-by: Al Viro

    David Howells
     
  • Make CIFS use the new d_automount() dentry operation rather than abusing
    follow_link() on directories.

    [NOTE: THIS IS UNTESTED!]

    Signed-off-by: David Howells
    Cc: Steve French
    Signed-off-by: Al Viro

    David Howells
     
  • Make NFS use the new d_automount() dentry operation rather than abusing
    follow_link() on directories.

    Signed-off-by: David Howells
    Acked-by: Trond Myklebust
    Acked-by: Ian Kent
    Signed-off-by: Al Viro

    David Howells
     
  • Make AFS use the new d_automount() dentry operation rather than abusing
    follow_link() on directories.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Add an AT_NO_AUTOMOUNT flag to suppress terminal automounting of automount
    point directories. This can be used by fstatat() users to permit the
    gathering of attributes on an automount point and also prevent
    mass-automounting of a directory of automount points by ls.
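
    From userspace the flag would be passed to fstatat() roughly as below.
    This is a sketch: is_dir_no_automount() is a hypothetical helper, the
    fallback #define assumes the Linux uapi value 0x800 for libcs that do
    not expose the flag, and how later kernels treat the flag in the stat
    family is not covered here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>

#ifndef AT_NO_AUTOMOUNT
#define AT_NO_AUTOMOUNT 0x800   /* assumed Linux uapi value */
#endif

/* Returns 1 if path is a directory, 0 if it is something else,
 * and -1 on error (errno is set by fstatat). The terminal component is
 * inspected without asking the kernel to automount it. */
static int is_dir_no_automount(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (fstatat(AT_FDCWD, path, &st, AT_NO_AUTOMOUNT) != 0)
        return -1;
    return S_ISDIR(st.st_mode) ? 1 : 0;
}
```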

    Signed-off-by: David Howells
    Acked-by: Ian Kent
    Signed-off-by: Al Viro

    David Howells
     
  • Add a dentry op (d_manage) to permit a filesystem to hold a process and make it
    sleep when it tries to transit away from one of that filesystem's directories
    during a pathwalk. The operation is keyed off a new dentry flag
    (DCACHE_MANAGE_TRANSIT).

    The filesystem is allowed to be selective about which processes it holds and
    which it permits to continue on or prohibits from transiting from each flagged
    directory. This will allow autofs to hold up client processes whilst letting
    its userspace daemon through to maintain the directory or the stuff behind it
    or mounted upon it.

    The ->d_manage() dentry operation:

    int (*d_manage)(struct path *path, bool mounting_here);

    takes a pointer to the directory about to be transited away from and a flag
    indicating whether the transit is undertaken by do_add_mount() or
    do_move_mount() skipping through a pile of filesystems mounted on a mountpoint.

    It should return 0 if successful and to let the process continue on its way;
    -EISDIR to prohibit the caller from skipping to overmounted filesystems or
    automounting, and to use this directory; or some other error code to return to
    the user.
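
    The three-way return convention can be summarised by a small
    classifier. The enum and function names are hypothetical and exist
    only to make the cases explicit; they are not part of the patch.

```c
#include <assert.h>
#include <errno.h>

enum transit_action {
    TRANSIT_CONTINUE,   /* 0: let the process carry on past the directory */
    TRANSIT_STAY_HERE,  /* -EISDIR: use this directory, skip overmounts   */
    TRANSIT_FAIL        /* anything else: error goes back to the user     */
};

/* Map a d_manage()-style return value onto what the caller should do. */
static enum transit_action classify_d_manage(int ret)
{
    assert(ret <= 0);   /* the convention uses 0 or a negative errno */
    if (ret == 0)
        return TRANSIT_CONTINUE;
    if (ret == -EISDIR)
        return TRANSIT_STAY_HERE;
    return TRANSIT_FAIL;
}
```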

    ->d_manage() is called with namespace_sem writelocked if mounting_here is true
    and no other locks held, so it may sleep. However, if mounting_here is true,
    it may not initiate or wait for a mount or unmount upon the parameter
    directory, even if the act is actually performed by userspace.

    Within fs/namei.c, follow_managed() is extended to check with d_manage() first
    on each managed directory, before transiting away from it or attempting to
    automount upon it.

    follow_down() is renamed follow_down_one() and should only be used where the
    filesystem deliberately intends to avoid management steps (e.g. autofs).

    A new follow_down() is added that incorporates the loop done by all other
    callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS
    and CIFS do use it, their use is removed by converting them to use
    d_automount()). The new follow_down() calls d_manage() as appropriate. It
    also takes an extra parameter to indicate if it is being called from mount code
    (with namespace_sem writelocked) which it passes to d_manage(). follow_down()
    ignores automount points so that it can be used to mount on them.

    __follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with
    DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to
    sleep. It would be possible to enter d_manage() in rcu-walk mode too, and have
    that determine whether to abort or not itself. That would allow the autofs
    daemon to continue on in rcu-walk mode.

    Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't
    required as every transit from that directory will cause d_manage() to be
    invoked. It can always be set again when necessary.

    ==========================
    WHAT THIS MEANS FOR AUTOFS
    ==========================

    Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to
    trigger the automounting of indirect mounts, and both of these can be called
    with i_mutex held.

    autofs knows that the i_mutex will be held by the caller in lookup(), and so
    can drop it before invoking the daemon - but this isn't so for d_revalidate(),
    since the lock is only held on _some_ of the code paths that call it. This
    means that autofs can't risk dropping i_mutex from its d_revalidate() function
    before it calls the daemon.

    The bug could manifest itself as, for example, a process that's trying to
    validate an automount dentry that gets made to wait because that dentry is
    expired and needs cleaning up:

    mkdir S ffffffff8014e05a 0 32580 24956
    Call Trace:
    [] :autofs4:autofs4_wait+0x674/0x897
    [] avc_has_perm+0x46/0x58
    [] autoremove_wake_function+0x0/0x2e
    [] :autofs4:autofs4_expire_wait+0x41/0x6b
    [] :autofs4:autofs4_revalidate+0x91/0x149
    [] __lookup_hash+0xa0/0x12f
    [] lookup_create+0x46/0x80
    [] sys_mkdirat+0x56/0xe4

    versus the automount daemon which wants to remove that dentry, but can't
    because the normal process is holding the i_mutex lock:

    automount D ffffffff8014e05a 0 32581 1 32561
    Call Trace:
    [] __mutex_lock_slowpath+0x60/0x9b
    [] do_path_lookup+0x2ca/0x2f1
    [] .text.lock.mutex+0xf/0x14
    [] do_rmdir+0x77/0xde
    [] tracesys+0x71/0xe0
    [] tracesys+0xd5/0xe0

    which means that the system is deadlocked.

    This patch allows autofs to hold up normal processes whilst the daemon goes
    ahead and does things to the dentry tree behind the automounter point without
    risking a deadlock as almost no locks are held in d_manage() and none in
    d_automount().

    Signed-off-by: David Howells
    Was-Acked-by: Ian Kent
    Signed-off-by: Al Viro

    David Howells
     
  • Add a dentry op (d_automount) to handle automounting directories rather than
    abusing the follow_link() inode operation. The operation is keyed off a new
    dentry flag (DCACHE_NEED_AUTOMOUNT).

    This also makes it easier to add an AT_ flag to suppress terminal segment
    automount during pathwalk and removes the need for the kludge code in the
    pathwalk algorithm to handle directories with follow_link() semantics.

    The ->d_automount() dentry operation:

    struct vfsmount *(*d_automount)(struct path *mountpoint);

    takes a pointer to the directory to be mounted upon, which is expected to
    provide sufficient data to determine what should be mounted. If successful, it
    should return the vfsmount struct it creates (which it should also have added
    to the namespace using do_add_mount() or similar). If there's a collision with
    another automount attempt, NULL should be returned. If the directory specified
    by the parameter should be used directly rather than being mounted upon,
    -EISDIR should be returned. In any other case, an error code should be
    returned.

    The ->d_automount() operation is called with no locks held and may sleep. At
    this point the pathwalk algorithm will be in ref-walk mode.

    Within fs/namei.c itself, a new pathwalk subroutine (follow_automount()) is
    added to handle mountpoints. It will return -EREMOTE if the automount flag was
    set, but no d_automount() op was supplied, -ELOOP if we've encountered too many
    symlinks or mountpoints, -EISDIR if the walk point should be used without
    mounting and 0 if successful. The path will be updated to point to the mounted
    filesystem if a successful automount took place.
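
    A userspace model of those outcomes is sketched below. The struct,
    its fields and the order in which the checks run are assumptions made
    for illustration; only the four return values come from the text.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an automount point seen during a pathwalk. */
struct automount_point {
    bool has_d_automount;  /* filesystem supplied a d_automount() op   */
    bool use_directory;    /* op wants the directory used as it stands */
    bool mounted;          /* set once the model has "mounted" on it   */
};

/* Model of the documented return values; check order is assumed. */
static int follow_automount_model(struct automount_point *p,
                                  int links_followed, int max_links)
{
    assert(p != NULL);
    if (!p->has_d_automount)
        return -EREMOTE;   /* flag set but no op to call */
    if (links_followed >= max_links)
        return -ELOOP;     /* too many symlinks or mountpoints */
    if (p->use_directory)
        return -EISDIR;    /* walk point is used without mounting */
    p->mounted = true;     /* the path would now point at the new mount */
    return 0;
}
```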

    __follow_mount() is replaced by follow_managed() which is more generic
    (especially with the patch that adds ->d_manage()). This handles transits from
    directories during pathwalk, including automounting and skipping over
    mountpoints (and holding processes with the next patch).

    __follow_mount_rcu() will jump out of RCU-walk mode if it encounters an
    automount point with nothing mounted on it.

    follow_dotdot*() does not handle automounts as you don't want to trigger them
    whilst following "..".

    I've also extracted the mount/don't-mount logic from autofs4 and included it
    here. It makes the mount go ahead anyway if someone calls open() or creat(),
    tries to traverse the directory, tries to chdir/chroot/etc. into the directory,
    or sticks a '/' on the end of the pathname. If they do a stat(), however,
    they'll only trigger the automount if they didn't also say O_NOFOLLOW.
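
    That decision can be written down as a predicate over walk flags.
    The WALK_* bits below are hypothetical placeholders for whatever
    lookup flags the pathwalk carries, so this is a sketch of the rule
    as stated, not the code in fs/namei.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits describing why the terminal component is walked. */
enum {
    WALK_FOLLOW    = 1 << 0,  /* caller follows the last component (stat)  */
    WALK_CONTINUE  = 1 << 1,  /* more path components come after this one  */
    WALK_DIRECTORY = 1 << 2,  /* chdir/chroot-style use, or a trailing '/' */
    WALK_OPEN      = 1 << 3,  /* open() */
    WALK_CREATE    = 1 << 4,  /* creat() */
    WALK_ALL       = (1 << 5) - 1
};

/* True if reaching an automount point with these flags should mount. */
static bool should_automount(unsigned int flags)
{
    assert((flags & ~(unsigned int)WALK_ALL) == 0);
    if (flags & (WALK_CONTINUE | WALK_DIRECTORY | WALK_OPEN | WALK_CREATE))
        return true;                    /* the mount goes ahead anyway */
    return (flags & WALK_FOLLOW) != 0;  /* stat-like: only when following */
}
```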

    I've also added an inode flag (S_AUTOMOUNT) so that filesystems can mark their
    inodes as automount points. This flag is automatically propagated to the
    dentry as DCACHE_NEED_AUTOMOUNT by __d_instantiate(). This saves NFS and could
    save AFS a private flag bit apiece, but is not strictly necessary. It would be
    preferable to do the propagation in d_set_d_op(), but that doesn't normally
    have access to the inode.

    [AV: fixed breakage in case if __follow_mount_rcu() fails and nameidata_drop_rcu()
    succeeds in RCU case of do_lookup(); we need to fall through to non-RCU case after
    that, rather than just returning with ungrabbed *path]

    Signed-off-by: David Howells
    Was-Acked-by: Ian Kent
    Signed-off-by: Al Viro

    David Howells
     
  • do_lookup() has a path leading from LOOKUP_RCU case to non-RCU
    crossing of mountpoints, which breaks things badly. If we
    hit need_revalidate: and do nothing in there, we need to come
    back into LOOKUP_RCU half of things, not to done: in non-RCU
    one.

    Signed-off-by: Al Viro

    Al Viro
     

15 Jan, 2011

3 commits

  • flush_scheduled_work() is going away. afs needs to make sure all the
    works it has queued have finished before being unloaded and there can
    be an arbitrary number of pending works. Add afs_wq and use it as the
    flush domain instead of the system workqueue.

    Also, convert cancel_delayed_work() + flush_scheduled_work() to
    cancel_delayed_work_sync() in afs_mntpt_kill_timer().

    Signed-off-by: Tejun Heo
    Signed-off-by: David Howells
    Cc: linux-afs@lists.infradead.org
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • fscache_submit_exclusive_op() adds an operation to the pending list if
    other operations are pending. Fix the check for pending ops as n_ops
    must be greater than 0 at the point it is checked as it is incremented
    immediately before under lock.
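
    The off-by-one can be shown with a tiny model. It assumes the
    corrected test compares n_ops against 1 (the message does not spell
    out the new condition), and object_model and submit_exclusive_model
    are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-object operation counters. */
struct object_model {
    int n_ops;       /* operations submitted, including the current one */
    int n_pending;   /* operations parked on the pending list */
};

/* Returns true if the exclusive op had to be queued behind others. */
static bool submit_exclusive_model(struct object_model *o)
{
    assert(o->n_ops >= 0);
    o->n_ops++;             /* the kernel does this under the object lock */
    if (o->n_ops > 1) {     /* "> 0" would always be true at this point */
        o->n_pending++;
        return true;
    }
    return false;           /* nothing else in flight: run immediately */
}
```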

    Signed-off-by: Akshat Aranya
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    Akshat Aranya
     
  • …t/npiggin/linux-npiggin

    * 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin:
    kernel: fix hlist_bl again
    cgroups: Fix a lockdep warning at cgroup removal
    fs: namei fix ->put_link on wrong inode in do_filp_open

    Linus Torvalds
     

14 Jan, 2011

15 commits

  • J. R. Okajima noticed that ->put_link is being attempted on the
    wrong inode, and suggested the way to fix it. I changed it a bit
    according to Al's suggestion to keep an explicit link path around.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • …t/npiggin/linux-npiggin

    * 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin:
    fs: fix do_last error case when need_reval_dot
    nfs: add missing rcu-walk check
    fs: hlist UP debug fixup
    fs: fix dropping of rcu-walk from force_reval_path
    fs: force_reval_path drop rcu-walk before d_invalidate
    fs: small rcu-walk documentation fixes

    Fixed up trivial conflicts in Documentation/filesystems/porting

    Linus Torvalds
     
  • When open(2) without O_DIRECTORY opens an existing dir, it should return
    EISDIR. In do_last(), the variable 'error' is initialized to EISDIR, but it
    is changed by d_revalidate() which returns any positive value to represent
    'the target dir is valid.'

    We should keep and return the initialized 'error' in this case.

    Signed-off-by: Nick Piggin

    J. R. Okajima
     
  • Signed-off-by: Nick Piggin

    Nick Piggin
     
  • As J. R. Okajima noted, force_reval_path passes in the same dentry to
    d_revalidate as the one in the nameidata structure (other callers pass in a
    child), so the locking breaks. This can oops with a chrooted nfs mount, for
    example. Similarly there can be other problems with revalidating a dentry
    which is already in nameidata of the path walk.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • d_revalidate can return in rcu-walk mode even when it returns 0. We can't just
    call any old dcache function on rcu-walk dentry (the dentry is unstable, so
    even though d_lock can safely be taken, the result may no longer be what we
    expect -- careful re-checks would be required). So just drop rcu in this case.

    (I missed this conversion when switching to the rcu-walk convention that Linus
    suggested)

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • The sync_inodes_sb() function does not have a return value. Remove the
    outdated documentation comment.

    Signed-off-by: Stefan Hajnoczi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stefan Hajnoczi
     
  • PG_buddy can be converted to _mapcount == -2. So the PG_compound_lock can
    be added to page->flags without overflowing (because of the sparse section
    bits increasing) with CONFIG_X86_PAE=y and CONFIG_X86_PAT=y. This also
    has to move the memory hotplug code from _mapcount to lru.next to avoid
    any risk of clashes. We can't use lru.next for PG_buddy removal, but
    memory hotplug can use lru.next even more easily than the mapcount
    instead.

    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • Add hugepage stat information to /proc/vmstat and /proc/meminfo.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • We'd like to be able to oom_score_adj a process up/down as it
    enters/leaves the foreground. Currently, it is not possible to oom_adj
    down without CAP_SYS_RESOURCE. This patch allows a task to decrease its
    oom_score_adj back to the value that a CAP_SYS_RESOURCE thread set it to
    or its inherited value at fork. Assuming the thread that has forked it
    has oom_score_adj of 0, each process could decrease it back from 0 upon
    activation unless a CAP_SYS_RESOURCE thread elevated it to something
    higher.
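
    The permission rule reads roughly as follows in a userspace model.
    struct task_model and set_oom_score_adj_model() are hypothetical, the
    capability check is reduced to a bool, and the -1000..1000 range is
    taken as the valid oom_score_adj range.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-task state for the model. */
struct task_model {
    int oom_score_adj;       /* current value */
    int oom_score_adj_min;   /* lowest value the task may set unprivileged */
};

/* Returns 0 on success or -EACCES if an unprivileged caller tries to go
 * below the floor set by a privileged writer (or inherited at fork). */
static int set_oom_score_adj_model(struct task_model *t, int value,
                                   bool has_cap_sys_resource)
{
    assert(value >= -1000 && value <= 1000);
    if (value < t->oom_score_adj_min && !has_cap_sys_resource)
        return -EACCES;
    t->oom_score_adj = value;
    if (has_cap_sys_resource)
        t->oom_score_adj_min = value;   /* privileged write moves the floor */
    return 0;
}
```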

    Alternatives considered:

    * a setuid binary
    * a daemon with CAP_SYS_RESOURCE

    Since you don't want all processes to be able to reduce their oom_adj, a
    setuid or daemon implementation would be complex. The alternatives also
    have much higher overhead.

    This patch was updated from the original patch based on feedback from
    David Rientjes.

    Signed-off-by: Mandeep Singh Baines
    Acked-by: David Rientjes
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Rik van Riel
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mandeep Singh Baines
     
  • Currently there is no way to find out whether a process has locked its
    pages in memory, or which of its memory regions are locked.

    Add a new field "Locked" to export this information via the smaps file.
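
    A reader of the new field might look like the sketch below. Both
    helpers are hypothetical, and the code assumes the field is printed
    in the same "Name: <n> kB" shape as the other smaps entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Parse one smaps line. Returns 1 and stores the size in kB if the line
 * is a "Locked:" entry, otherwise returns 0 and leaves *kb untouched. */
static int parse_locked_kb(const char *line, unsigned long *kb)
{
    unsigned long value;

    assert(line != NULL && kb != NULL);
    if (sscanf(line, "Locked: %lu kB", &value) != 1)
        return 0;
    *kb = value;
    return 1;
}

/* Sum the Locked entries of every mapping in an smaps-style file.
 * Returns the total in kB, or -1 if the file cannot be opened. */
static long total_locked_kb(const char *smaps_path)
{
    char line[512];
    unsigned long kb, total = 0;
    FILE *f = fopen(smaps_path, "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL)
        if (parse_locked_kb(line, &kb))
            total += kb;
    fclose(f);
    return (long)total;
}
```

    Calling total_locked_kb("/proc/self/smaps") would then give the
    process-wide figure on a kernel that exports the field.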

    Signed-off-by: Nikanth Karthikesan
    Acked-by: Balbir Singh
    Acked-by: Wu Fengguang
    Cc: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nikanth Karthikesan
     
  • Merge mpage_end_io_read() and mpage_end_io_write() into mpage_end_io() to
    eliminate code duplication.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Hai Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hai Shan
     
  • Use correct function name, remove incorrect apostrophe

    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • When wb_writeback() is called in WB_SYNC_ALL mode, work->nr_to_write is
    usually set to LONG_MAX. The logic in wb_writeback() then calls
    __writeback_inodes_sb() with nr_to_write == MAX_WRITEBACK_PAGES and we
    easily end up with non-positive nr_to_write after the function returns, if
    the inode has more than MAX_WRITEBACK_PAGES dirty pages at the moment.

    When nr_to_write is
    Signed-off-by: Wu Fengguang
    Cc: Johannes Weiner
    Cc: Dave Chinner
    Cc: Christoph Hellwig
    Cc: Jan Engelhardt
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Background writeback is easily livelockable in a loop in wb_writeback() by
    a process continuously re-dirtying pages (or continuously appending to a
    file). This is in fact intended as the target of background writeback is
    to write dirty pages it can find as long as we are over
    dirty_background_threshold.

    But the above behavior gets inconvenient at times because no other work
    queued in the flusher thread's queue gets processed. In particular, since
    e.g. sync(1) relies on flusher thread to do all the IO for it, sync(1)
    can hang forever waiting for flusher thread to do the work.

    Generally, when a flusher thread has some work queued, someone submitted
    the work to achieve a goal more specific than what background writeback
    does. Moreover, by working on the specific work, we also reduce the amount
    of dirty pages, which is exactly the target of background writeout. So it
    makes sense to give specific work a priority over a generic page cleaning.

    Thus we interrupt background writeback if there is some other work to do.
    We return to the background writeback after completing all the queued
    work.
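
    The resulting control flow can be modelled as below. The struct, the
    chunked loop and all names are hypothetical simplifications of
    wb_writeback(); the model only shows where the check for queued work
    sits relative to the background threshold test.

```c
#include <assert.h>

/* Hypothetical flusher state for the model. */
struct flusher_model {
    int dirty_pages;        /* pages currently dirty */
    int background_thresh;  /* background writeback runs while above this */
    int queued_work;        /* other work items waiting in the queue */
};

/* Background writeback in chunks. Returns the number of pages written
 * before either dropping to the threshold or yielding to queued work. */
static int background_writeback_model(struct flusher_model *f, int chunk)
{
    int written = 0;

    assert(chunk > 0);
    while (f->dirty_pages > f->background_thresh) {
        if (f->queued_work > 0)
            break;          /* specific work takes priority; resume later */
        int n = chunk < f->dirty_pages ? chunk : f->dirty_pages;
        f->dirty_pages -= n;
        written += n;
    }
    return written;
}
```

    Without the queued_work test, a process that keeps re-dirtying pages
    would keep the loop condition true and starve everything else.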

    This may delay the writeback of expired inodes for a while, however the
    expired inodes will eventually be flushed to disk as long as the other
    works won't livelock.

    [fengguang.wu@intel.com: update comment]
    Signed-off-by: Jan Kara
    Signed-off-by: Wu Fengguang
    Cc: Johannes Weiner
    Cc: Dave Chinner
    Cc: Christoph Hellwig
    Cc: Jan Engelhardt
    Cc: Jens Axboe

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara