23 Feb, 2015

2 commits

  • Fix up the following scripted S_ISDIR/S_ISREG/S_ISLNK conversions (or lack
    thereof) in cachefiles:

    (1) Cachefiles mostly wants to use d_can_lookup() rather than d_is_dir() as
    it doesn't want to deal with automounts in its cache.

    (2) Coccinelle didn't find S_IS* expressions in ASSERT() statements in
    cachefiles.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Convert the following where appropriate:

    (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).

    (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).

    (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more
    complicated than it appears as some calls should be converted to
    d_can_lookup() instead. The difference is whether the directory in
    question is a real dir with a ->lookup op or whether it's a fake dir with
    a ->d_automount op.

    In some circumstances, we can subsume checks for dentry->d_inode not being
    NULL into this, provided the code isn't in a filesystem that expects
    d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
    use d_inode() rather than d_backing_inode() to get the inode pointer).

    Note that the dentry type field may be set to something other than
    DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
    manages the fall-through from a negative dentry to a lower layer. In such a
    case, the dentry type of the negative union dentry is set to the same as the
    type of the lower dentry.

    However, if you know d_inode is not NULL at the call site, then you can use
    the d_is_xxx() functions even in a filesystem.
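
    As a purely illustrative sketch of what the conversion looks like at a call
    site (both helpers below are hypothetical stand-ins for typical callers):

    #include <linux/fs.h>
    #include <linux/dcache.h>

    /* Hypothetical caller, before the conversion. */
    static bool victim_is_dir_old(struct dentry *dentry)
    {
            return dentry->d_inode && S_ISDIR(dentry->d_inode->i_mode);
    }

    /* After: the type flag also covers the negative-dentry check.  Use
     * d_can_lookup() instead where an automount point must not count. */
    static bool victim_is_dir_new(struct dentry *dentry)
    {
            return d_is_dir(dentry);
    }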

    There is one further complication: a 0,0 chardev dentry may be labelled
    DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was
    intended for special directory entry types that don't have attached inodes.

    The following perl+coccinelle script was used:

    use strict;

    my $fd;
    my @callers;
    open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
        die "Can't grep for S_ISDIR and co. callers";
    @callers = <$fd>;
    close($fd);
    unless (@callers) {
        print "No matches\n";
        exit(0);
    }

    my @cocci = (
        '@@',
        'expression E;',
        '@@',
        '',
        '- S_ISLNK(E->d_inode->i_mode)',
        '+ d_is_symlink(E)',
        '',
        '@@',
        'expression E;',
        '@@',
        '',
        '- S_ISDIR(E->d_inode->i_mode)',
        '+ d_is_dir(E)',
        '',
        '@@',
        'expression E;',
        '@@',
        '',
        '- S_ISREG(E->d_inode->i_mode)',
        '+ d_is_reg(E)' );

    my $coccifile = "tmp.sp.cocci";
    open($fd, ">$coccifile") || die $coccifile;
    print($fd "$_\n") || die $coccifile foreach (@cocci);
    close($fd);

    foreach my $file (@callers) {
        chomp $file;
        print "Processing ", $file, "\n";
        system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
            die "spatch failed";
    }

    [AV: overlayfs parts skipped]

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     


14 Oct, 2014

2 commits

  • …git/dhowells/linux-fs

    Pull fs-cache fixes from David Howells:
    "Two fixes for bugs in CacheFiles and a cleanup in FS-Cache"

    * tag 'fscache-fixes-20141013' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
    fs/fscache/object-list.c: use __seq_open_private()
    CacheFiles: Fix incorrect test for in-memory object collision
    CacheFiles: Handle object being killed before being set up

    Linus Torvalds
     
  • When CacheFiles cache objects are in use, they have in-memory representations,
    as defined by the cachefiles_object struct. These are kept in a tree rooted in
    the cache and indexed by dentry pointer (since there's a unique mapping between
    object index key and dentry).

    Collisions can occur between a representation already in the tree and a new
    representation being set up because it takes time to dispose of an old
    representation - particularly if it must be unlinked or renamed.

    When such a collision occurs, cachefiles_mark_object_active() is meant to check
    to see if the old, already-present representation is in the process of being
    discarded (ie. FSCACHE_OBJECT_IS_LIVE is not set on it) - and, if so, wait for
    the representation to be removed (ie. CACHEFILES_OBJECT_ACTIVE is then
    cleared).

    However, the test for whether the old representation is still live is checking
    the new object - which always will be live at this point. This leads to an
    oops looking like:

    CacheFiles: Error: Unexpected object collision
    object: OBJ1b354
    objstate=LOOK_UP_OBJECT fl=8 wbusy=2 ev=0[0]
    ops=0 inp=0 exc=0
    parent=ffff88053f5417c0
    cookie=ffff880538f202a0 [pr=ffff8805381b7160 nd=ffff880509c6eb78 fl=27]
    key=[8] '2490000000000000'
    xobject: OBJ1a600
    xobjstate=DROP_OBJECT fl=70 wbusy=2 ev=0[0]
    xops=0 inp=0 exc=0
    xparent=ffff88053f5417c0
    xcookie=ffff88050f4cbf70 [pr=ffff8805381b7160 nd= (null) fl=12]
    ------------[ cut here ]------------
    kernel BUG at fs/cachefiles/namei.c:200!
    ...
    Workqueue: fscache_object fscache_object_work_func [fscache]
    ...
    RIP: ... cachefiles_walk_to_object+0x7ea/0x860 [cachefiles]
    ...
    Call Trace:
    [] ? cachefiles_lookup_object+0x58/0x100 [cachefiles]
    [] ? fscache_look_up_object+0xb9/0x1d0 [fscache]
    [] ? fscache_parent_ready+0x2d/0x80 [fscache]
    [] ? fscache_object_work_func+0x92/0x1f0 [fscache]
    [] ? process_one_work+0x16b/0x400
    [] ? worker_thread+0x116/0x380
    [] ? manage_workers.isra.21+0x290/0x290
    [] ? kthread+0xbc/0xe0
    [] ? flush_kthread_worker+0x80/0x80
    [] ? ret_from_fork+0x7c/0xb0
    [] ? flush_kthread_worker+0x80/0x80
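
    In outline, the fix is to perform the liveness test on the old xobject rather
    than on the new object - a minimal sketch, assuming the fscache_object_is_live()
    helper that tests FSCACHE_OBJECT_IS_LIVE:

    #include <linux/fscache-cache.h>

    /* Sketch only (not the exact upstream diff): the object whose liveness
     * matters is the old, colliding representation ("xobject"); the new
     * object is always live when the collision is detected. */
    static bool collision_is_unexpected(struct fscache_object *xobject)
    {
            /* Previously this was (wrongly) evaluated on the new object. */
            return fscache_object_is_live(xobject);
    }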

    Reported-by: Manuel Schölling
    Signed-off-by: David Howells
    Acked-by: Steve Dickson

    David Howells
     


30 Sep, 2014

1 commit

  • If a cache object gets killed whilst in the process of being set up - for
    instance if the netfs relinquishes the cookie that the object is associated
    with - then the object's state machine will transit to the DROP_OBJECT state
    without necessarily going through the LOOKUP_OBJECT or CREATE_OBJECT states.

    This is a problem for CacheFiles because cachefiles_drop_object() assumes that
    object->dentry will be set upon reaching the DROP_OBJECT state and has an
    ASSERT() to that effect (see the oops below) - but object->dentry doesn't get
    set until the LOOKUP_OBJECT or CREATE_OBJECT states (and not always then if
    they fail).

    To fix this, just make the dentry cleanup in cachefiles_drop_object()
    conditional on the dentry actually being set and remove the assertion.
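
    In outline, the conditional cleanup looks something like this (a sketch of the
    shape of the fix, not the exact upstream code; the helper name is made up):

    #include <linux/dcache.h>

    /* Clean up the backing dentry only if lookup/creation got far enough to
     * set it, instead of asserting that it must be set. */
    static void drop_object_dentry_sketch(struct dentry **pdentry)
    {
            if (*pdentry) {                 /* previously: ASSERT(*pdentry) */
                    dput(*pdentry);         /* plus deletion if the object was retired */
                    *pdentry = NULL;
            }
    }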

    CacheFiles: Assertion failed
    ------------[ cut here ]------------
    kernel BUG at .../fs/cachefiles/namei.c:425!
    ...
    Workqueue: fscache_object fscache_object_work_func [fscache]
    ...
    RIP: ... cachefiles_delete_object+0xcd/0x110 [cachefiles]
    ...
    Call Trace:
    [] ? cachefiles_drop_object+0xff/0x130 [cachefiles]
    [] ? fscache_drop_object+0xd1/0x1d0 [fscache]
    [] ? fscache_object_work_func+0x87/0x210 [fscache]
    [] ? process_one_work+0x155/0x450
    [] ? worker_thread+0x114/0x370
    [] ? manage_workers.isra.21+0x2c0/0x2c0
    [] ? kthread+0xbc/0xe0
    [] ? flush_kthread_worker+0xa0/0xa0
    [] ? ret_from_fork+0x7c/0xb0
    [] ? flush_kthread_worker+0xa0/0xa0

    Reported-by: Manuel Schölling
    Signed-off-by: David Howells
    Acked-by: Steve Dickson

    David Howells
     


18 Sep, 2014

2 commits

  • Not all filesystems now provide the rename i_op - ext4, for one, provides the
    rename2 i_op instead. CacheFiles checks that the filesystem has rename and so
    will now reject ext4 with EPERM:

    CacheFiles: Failed to register: -1

    Fix this by checking for rename2 as an alternative. The call to vfs_rename()
    actually handles selection of the appropriate function, so we needn't worry
    about that.
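
    A sketch of the relaxed capability check (the helper below is illustrative;
    the actual test in cachefiles checks more operations than just rename):

    #include <linux/fs.h>
    #include <linux/dcache.h>

    /* A backing directory is usable if it supports either rename op;
     * vfs_rename() picks the appropriate one at call time. */
    static bool backing_dir_can_rename(struct dentry *dir)
    {
            const struct inode_operations *iops = dir->d_inode->i_op;

            return iops->rename || iops->rename2;
    }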

    Turning on debugging shows:

    [cachef] ==> cachefiles_get_directory(,,cache)
    [cachef] subdir -> ffff88000b22b778 positive
    [cachef]

    David Howells
     
  • These two have been unused since

    commit c4d6d8dbf335c7fa47341654a37c53a512b519bb
    CacheFiles: Fix the marking of cached pages

    in 3.8.

    Signed-off-by: NeilBrown
    Signed-off-by: David Howells

    NeilBrown
     


13 Apr, 2014

1 commit

  • Pull vfs updates from Al Viro:
    "The first vfs pile, with deep apologies for being very late in this
    window.

    Assorted cleanups and fixes, plus a large preparatory part of iov_iter
    work. There's a lot more of that, but it'll probably go into the next
    merge window - it *does* shape up nicely, removes a lot of
    boilerplate, gets rid of locking inconsistencies between aio_write and
    splice_write and I hope to get Kent's direct-io rewrite merged into
    the same queue, but some of the stuff after this point is having
    (mostly trivial) conflicts with the things already merged into
    mainline and with some I want more testing.

    This one passes LTP and xfstests without regressions, in addition to
    usual beating. BTW, readahead02 in ltp syscalls testsuite has started
    giving failures since "mm/readahead.c: fix readahead failure for
    memoryless NUMA nodes and limit readahead pages" - might be a false
    positive, might be a real regression..."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    missing bits of "splice: fix racy pipe->buffers uses"
    cifs: fix the race in cifs_writev()
    ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
    kill generic_file_buffered_write()
    ocfs2_file_aio_write(): switch to generic_perform_write()
    ceph_aio_write(): switch to generic_perform_write()
    xfs_file_buffered_aio_write(): switch to generic_perform_write()
    export generic_perform_write(), start getting rid of generic_file_buffer_write()
    generic_file_direct_write(): get rid of ppos argument
    btrfs_file_aio_write(): get rid of ppos
    kill the 5th argument of generic_file_buffered_write()
    kill the 4th argument of __generic_file_aio_write()
    lustre: don't open-code kernel_recvmsg()
    ocfs2: don't open-code kernel_recvmsg()
    drbd: don't open-code kernel_recvmsg()
    constify blk_rq_map_user_iov() and friends
    lustre: switch to kernel_sendmsg()
    ocfs2: don't open-code kernel_sendmsg()
    take iov_iter stuff to mm/iov_iter.c
    process_vm_access: tidy up a bit
    ...

    Linus Torvalds
     

05 Apr, 2014

1 commit

  • Pull renameat2 system call from Miklos Szeredi:
    "This adds a new syscall, renameat2(), which is the same as renameat()
    but with a flags argument.

    The purpose of extending rename is to add cross-rename, a symmetric
    variant of rename, which exchanges the two files. This allows
    interesting things, which were not possible before, for example
    atomically replacing a directory tree with a symlink, etc... This
    also allows overlayfs and friends to operate on whiteouts atomically.

    Andy Lutomirski also suggested a "noreplace" flag, which disables the
    overwriting behavior of rename.

    These two flags, RENAME_EXCHANGE and RENAME_NOREPLACE are only
    implemented for ext4 as an example and for testing"

    * 'cross-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    ext4: add cross rename support
    ext4: rename: split out helper functions
    ext4: rename: move EMLINK check up
    ext4: rename: create ext4_renament structure for local vars
    vfs: add cross-rename
    vfs: lock_two_nondirectories: allow directory args
    security: add flags to rename hooks
    vfs: add RENAME_NOREPLACE flag
    vfs: add renameat2 syscall
    vfs: rename: use common code for dir and non-dir
    vfs: rename: move d_move() up
    vfs: add d_is_dir()

    Linus Torvalds
     

04 Apr, 2014

1 commit

  • This code used to have its own lru cache pagevec up until a0b8cab3 ("mm:
    remove lru parameter from __pagevec_lru_add and remove parts of pagevec
    API"). Now it's just add_to_page_cache() followed by lru_cache_add(),
    might as well use add_to_page_cache_lru() directly.
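
    Roughly, the substitution looks like this (a sketch; both wrappers are
    hypothetical and error handling is elided):

    #include <linux/pagemap.h>
    #include <linux/swap.h>

    /* Before: two calls that add_to_page_cache_lru() now combines. */
    static int readpage_add_old(struct page *page, struct address_space *mapping,
                                pgoff_t index, gfp_t gfp)
    {
            int ret = add_to_page_cache(page, mapping, index, gfp);

            if (ret == 0)
                    lru_cache_add(page);
            return ret;
    }

    /* After: one call inserts the page and puts it on the LRU. */
    static int readpage_add_new(struct page *page, struct address_space *mapping,
                                pgoff_t index, gfp_t gfp)
    {
            return add_to_page_cache_lru(page, mapping, index, gfp);
    }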

    Signed-off-by: Johannes Weiner
    Reviewed-by: Rik van Riel
    Reviewed-by: Minchan Kim
    Cc: Andrea Arcangeli
    Cc: Bob Liu
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Greg Thelen
    Cc: Hugh Dickins
    Cc: Jan Kara
    Cc: KOSAKI Motohiro
    Cc: Luigi Semenzato
    Cc: Mel Gorman
    Cc: Metin Doslu
    Cc: Michel Lespinasse
    Cc: Ozgun Erdogan
    Cc: Peter Zijlstra
    Cc: Roman Gushchin
    Cc: Ryan Mallon
    Cc: Tejun Heo
    Cc: Vlastimil Babka
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     



13 Nov, 2013

1 commit

  • Pull vfs updates from Al Viro:
    "All kinds of stuff this time around; some more notable parts:

    - RCU'd vfsmounts handling
    - new primitives for coredump handling
    - files_lock is gone
    - Bruce's delegations handling series
    - exportfs fixes

    plus misc stuff all over the place"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits)
    ecryptfs: ->f_op is never NULL
    locks: break delegations on any attribute modification
    locks: break delegations on link
    locks: break delegations on rename
    locks: helper functions for delegation breaking
    locks: break delegations on unlink
    namei: minor vfs_unlink cleanup
    locks: implement delegations
    locks: introduce new FL_DELEG lock flag
    vfs: take i_mutex on renamed file
    vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
    vfs: don't use PARENT/CHILD lock classes for non-directories
    vfs: pull ext4's double-i_mutex-locking into common code
    exportfs: fix quadratic behavior in filehandle lookup
    exportfs: better variable name
    exportfs: move most of reconnect_path to helper function
    exportfs: eliminate unused "noprogress" counter
    exportfs: stop retrying once we race with rename/remove
    exportfs: clear DISCONNECTED on all parents sooner
    exportfs: more detailed comment for path_reconnect
    ...

    Linus Torvalds
     

09 Nov, 2013

3 commits

  • NFSv4 uses leases to guarantee that clients can cache metadata as well
    as data.

    Cc: Mikulas Patocka
    Cc: David Howells
    Cc: Tyler Hicks
    Cc: Dustin Kirkland
    Acked-by: Jeff Layton
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Al Viro

    J. Bruce Fields
     
  • Cc: David Howells
    Acked-by: Jeff Layton
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Al Viro

    J. Bruce Fields
     
  • We need to break delegations on any operation that changes the set of
    links pointing to an inode. Start with unlink.

    Such operations also hold the i_mutex on a parent directory. Breaking a
    delegation may require waiting for a timeout (by default 90 seconds) in
    the case of an unresponsive NFS client. To avoid blocking all directory
    operations, we therefore drop locks before waiting for the delegation.
    The logic then looks like:

    acquire locks
    ...
    test for delegation; if found:
        take reference on inode
        release locks
        wait for delegation break
        drop reference on inode
        retry

    It is possible this could never terminate. (Even if we take precautions
    to prevent another delegation being acquired on the same inode, we could
    get a different inode on each retry.) But this seems very unlikely.

    The initial test for a delegation happens after the lock on the target
    inode is acquired, but the directory inode may have been acquired
    further up the call stack. We therefore add a "struct inode **"
    argument to any intervening functions, which we use to pass the inode
    back up to the caller in the case it needs a delegation synchronously
    broken.
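
    A rough sketch of what a caller ends up looking like under this scheme
    (modelled on the unlink path; names follow the series, but the code is
    simplified - a real caller redoes the lookup on each pass rather than
    reusing the dentry):

    #include <linux/fs.h>
    #include <linux/mutex.h>

    static int unlink_with_deleg_sketch(struct inode *dir, struct dentry *dentry)
    {
            struct inode *delegated_inode = NULL;
            int error;

    retry:
            mutex_lock_nested(&dir->i_mutex, I_MUTEX_PARENT);
            error = vfs_unlink(dir, dentry, &delegated_inode);
            mutex_unlock(&dir->i_mutex);

            if (delegated_inode) {
                    /* vfs_unlink() took a reference and asked us to break the
                     * delegation outside the lock; wait, then retry. */
                    error = break_deleg_wait(&delegated_inode);
                    if (!error)
                            goto retry;
            }
            return error;
    }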

    Cc: David Howells
    Cc: Tyler Hicks
    Cc: Dustin Kirkland
    Acked-by: Jeff Layton
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Al Viro

    J. Bruce Fields
     

28 Sep, 2013

1 commit

  • Provide the ability to enable and disable fscache cookies. A disabled cookie
    will reject or ignore further requests to:

    Acquire a child cookie
    Invalidate and update backing objects
    Check the consistency of a backing object
    Allocate storage for backing page
    Read backing pages
    Write to backing pages

    but still allows:

    Checks/waits on the completion of already in-progress objects
    Uncaching of pages
    Relinquishment of cookies

    Two new operations are provided:

    (1) Disable a cookie:

    void fscache_disable_cookie(struct fscache_cookie *cookie,
                                bool invalidate);

    If the cookie is not already disabled, this locks the cookie against other
    dis/enablement ops, marks the cookie as being disabled, discards or
    invalidates any backing objects and waits for cessation of activity on any
    associated object.

    This is a wrapper around a chunk split out of fscache_relinquish_cookie(),
    but it reinitialises the cookie such that it can be reenabled.

    All possible failures are handled internally. The caller should consider
    calling fscache_uncache_all_inode_pages() afterwards to make sure all page
    markings are cleared up.

    (2) Enable a cookie:

    void fscache_enable_cookie(struct fscache_cookie *cookie,
                               bool (*can_enable)(void *data),
                               void *data)

    If the cookie is not already enabled, this locks the cookie against other
    dis/enablement ops, invokes can_enable() and, if the cookie is not an
    index cookie, will begin the procedure of acquiring backing objects.

    The optional can_enable() function is passed the data argument and returns
    a ruling as to whether or not enablement should actually be permitted to
    begin.

    All possible failures are handled internally. The cookie will only be
    marked as enabled if provisional backing objects are allocated.
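
    For illustration, a netfs might use the pair roughly like this (the
    surrounding open routine and the i_writecount-based gate are hypothetical):

    #include <linux/fs.h>
    #include <linux/fscache.h>

    /* Hypothetical gate: only enable caching while nobody has the file open
     * for writing. */
    static bool my_netfs_can_enable(void *data)
    {
            struct inode *inode = data;

            return atomic_read(&inode->i_writecount) <= 0;
    }

    static void my_netfs_open(struct inode *inode, struct fscache_cookie *cookie,
                              bool open_for_write)
    {
            if (open_for_write)
                    fscache_disable_cookie(cookie, true);   /* invalidate on disable */
            else
                    fscache_enable_cookie(cookie, my_netfs_can_enable, inode);
    }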

    A later patch will introduce these to NFS. Cookie enablement during nfs_open()
    is then contingent on i_writecount <= 0.

    Signed-off-by: David Howells

    David Howells
     

21 Sep, 2013

2 commits

  • Don't try to dump the index key that distinguishes an object if netfs
    data in the cookie the object refers to has been cleared (ie. the
    cookie has passed most of the way through
    __fscache_relinquish_cookie()).

    Since the netfs holds the index key, we can't get at it once the ->def
    and ->netfs_data pointers have been cleared - and a NULL pointer
    exception will ensue, usually just after a:

    CacheFiles: Error: Unexpected object collision

    error is reported.

    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     
  • In cachefiles_check_auxdata(), we allocate auxbuf but fail to free it if
    we determine there's an error or that the data is stale.

    Further, assigning the output of vfs_getxattr() to auxbuf->len gives
    problems with checking for errors as auxbuf->len is a u16. We don't
    actually need to set auxbuf->len, so keep the length in a variable for
    now. We shouldn't need to check the upper limit of the buffer as an
    overflow there should be indicated by -ERANGE.

    While we're at it, fscache_check_aux() returns an enum value, not an
    int, so assign it to an appropriately typed variable rather than to ret.
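
    In outline, the corrected pattern looks something like the following sketch
    (buffer size, xattr name and helper name are illustrative, not the actual
    cachefiles code):

    #include <linux/xattr.h>
    #include <linux/slab.h>
    #include <linux/errno.h>

    static int check_auxdata_sketch(struct dentry *dentry)
    {
            void *auxbuf;
            ssize_t xlen;           /* keep the signed length, not a u16 */
            int ret;

            auxbuf = kmalloc(512, GFP_KERNEL);
            if (!auxbuf)
                    return -ENOMEM;

            xlen = vfs_getxattr(dentry, "CacheFiles.cache", auxbuf, 512);
            if (xlen < 0) {
                    ret = -ESTALE;  /* covers -ERANGE if the value overflowed */
                    goto out;
            }

            /* ... hand (auxbuf, xlen) to fscache_check_aux(), remembering that
             * it returns an enum fscache_checkaux rather than an int ... */
            ret = 0;
    out:
            kfree(auxbuf);          /* freed on every path, unlike before */
            return ret;
    }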

    Signed-off-by: Josh Boyer
    Signed-off-by: David Howells
    cc: Hongyi Jia
    cc: Milosz Tanski
    Signed-off-by: Linus Torvalds

    Josh Boyer
     


04 Jul, 2013

1 commit

  • Now that the LRU to add a page to is decided at LRU-add time, remove the
    misleading lru parameter from __pagevec_lru_add. A consequence of this
    is that the pagevec_lru_add_file, pagevec_lru_add_anon and similar
    helpers are misleading as the caller no longer has direct control over
    what LRU the page is added to. Unused helpers are removed by this patch
    and existing users of pagevec_lru_add_file() are converted to use
    lru_cache_add_file() directly and use the per-cpu pagevecs instead of
    creating their own pagevec.
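
    For a converted caller, the change is roughly this (sketch; both wrappers
    are hypothetical):

    #include <linux/pagevec.h>
    #include <linux/swap.h>

    /* Before: a private pagevec, flushed to the file LRU by the caller. */
    static void add_readpage_old(struct pagevec *pvec, struct page *page)
    {
            if (!pagevec_add(pvec, page))
                    pagevec_lru_add_file(pvec);     /* helper removed by this patch */
    }

    /* After: the per-cpu pagevecs behind lru_cache_add_file() do the work. */
    static void add_readpage_new(struct page *page)
    {
            lru_cache_add_file(page);
    }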

    Signed-off-by: Mel Gorman
    Reviewed-by: Jan Kara
    Reviewed-by: Rik van Riel
    Acked-by: Johannes Weiner
    Cc: Alexey Lyahkov
    Cc: Andrew Perepechko
    Cc: Robin Dong
    Cc: Theodore Tso
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Bernd Schubert
    Cc: David Howells
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

19 Jun, 2013

5 commits

  • Signed-off-by: Haicheng Li
    Signed-off-by: David Howells
    Tested-By: Milosz Tanski
    Acked-by: Jeff Layton

    Haicheng Li
     
  • Simplify the way fscache cache objects retain their cookie. The way I
    implemented the cookie storage handling made synchronisation a pain (ie. the
    object state machine can't rely on the cookie actually still being there).

    Instead of the object being detached from the cookie and the cookie being
    freed in __fscache_relinquish_cookie(), we defer both operations:

    (*) The detachment of the object from the list in the cookie now takes place
    in fscache_drop_object() and is thus governed by the object state machine
    (fscache_detach_from_cookie() has been removed).

    (*) The release of the cookie is now in fscache_object_destroy() - which is
    called by the cache backend just before it frees the object.

    This means that the fscache_cookie struct is now available to the cache all the
    way through from ->alloc_object() to ->drop_object() and ->put_object() -
    meaning that it's no longer necessary to take object->lock to guarantee access.

    However, __fscache_relinquish_cookie() doesn't wait for the object to go all
    the way through to destruction before letting the netfs proceed. That would
    massively slow down the netfs. Since __fscache_relinquish_cookie() leaves the
    cookie around, it must therefore break all attachments to the netfs - which
    includes ->def, ->netfs_data and any outstanding page read/writes.

    To handle this, struct fscache_cookie now has an n_active counter:

    (1) This starts off initialised to 1.

    (2) Any time the cache needs to get at the netfs data, it calls
    fscache_use_cookie() to increment it - if it is not zero. If it was zero,
    then access is not permitted.

    (3) When the cache has finished with the data, it calls fscache_unuse_cookie()
    to decrement it. This does a wake-up on it if it reaches 0.

    (4) __fscache_relinquish_cookie() decrements n_active and then waits for it to
    reach 0. The initialisation to 1 in step (1) ensures that we only get
    wake ups when we're trying to get rid of the cookie.

    This leaves __fscache_relinquish_cookie() a lot simpler.
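
    A minimal sketch of that counter pattern (the real fscache_use_cookie() and
    fscache_unuse_cookie() helpers also deal with flags and queueing):

    #include <linux/fscache.h>
    #include <linux/atomic.h>
    #include <linux/wait.h>

    /* (2) Only grant access to the netfs data while n_active is non-zero. */
    static bool use_cookie_sketch(struct fscache_cookie *cookie)
    {
            return atomic_inc_not_zero(&cookie->n_active) != 0;
    }

    /* (3) Drop a use; the relinquisher waiting for zero gets woken. */
    static void unuse_cookie_sketch(struct fscache_cookie *cookie)
    {
            if (atomic_dec_and_test(&cookie->n_active))
                    wake_up_atomic_t(&cookie->n_active);
    }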

    ***
    This fixes a problem in the current code whereby if fscache_invalidate() is
    followed sufficiently quickly by fscache_relinquish_cookie() then it is
    possible for __fscache_relinquish_cookie() to have detached the cookie from the
    object and cleared the pointer before a thread is dispatched to process the
    invalidation state in the object state machine.

    Since the pending write clearance was deferred to the invalidation state to
    make it asynchronous, we need to either wait in relinquishment for the stores
    tree to be cleared in the invalidation state or we need to handle the clearance
    in relinquishment.

    Further, if the relinquishment code does clear the tree, then the invalidation
    state needs to make the clearance contingent on still having the cookie to hand
    (since that's where the tree is rooted) and we have to prevent the cookie from
    disappearing for the duration.

    This can lead to an oops like the following:

    BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
    ...
    RIP: 0010:[] _spin_lock+0xe/0x30
    ...
    CR2: 000000000000000c ...
    ...
    Process kslowd002 (...)
    ....
    Call Trace:
    [] fscache_invalidate_writes+0x38/0xd0 [fscache]
    [] ? __switch_to+0xd0/0x320
    [] ? find_busiest_queue+0x69/0x150
    [] ? slow_work_enqueue+0x104/0x180
    [] fscache_object_slow_work_execute+0x5e3/0x9d0 [fscache]
    [] ? bit_waitqueue+0x17/0xd0
    [] slow_work_execute+0x233/0x310
    [] slow_work_thread+0x205/0x360
    [] ? autoremove_wake_function+0x0/0x40
    [] ? slow_work_thread+0x0/0x360
    [] kthread+0x96/0xa0
    [] child_rip+0xa/0x20
    [] ? kthread+0x0/0xa0
    [] ? child_rip+0x0/0x20

    The parameter to fscache_invalidate_writes() was object->cookie which is NULL.

    Signed-off-by: David Howells
    Tested-By: Milosz Tanski
    Acked-by: Jeff Layton

    David Howells
     
  • Fix object state machine to have separate work and wait states as that makes
    it easier to envision.

    There are now three kinds of state:

    (1) Work state. This is an execution state. No event processing is performed
    by a work state. The function attached to a work state returns a pointer
    indicating the next state to which the OSM should transition. Returning
    NO_TRANSIT repeats the current state, but goes back to the scheduler
    first.

    (2) Wait state. This is an event processing state. No execution is
    performed by a wait state. Wait states are just tables of "if event X
    occurs, clear it and transition to state Y". The dispatcher returns to
    the scheduler if none of the events in which the wait state has an
    interest are currently pending.

    (3) Out-of-band state. This is a special work state. Transitions to normal
    states can be overridden when an unexpected event occurs (eg. I/O error).
    Instead the dispatcher disables and clears the OOB event and transits to
    the specified work state. This then acts as an ordinary work state,
    though object->state points to the overridden destination. Returning
    NO_TRANSIT resumes the overridden transition.
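
    Schematically, the three kinds of state can be pictured like this (a
    simplified sketch, not the actual fscache definitions; all names below are
    illustrative):

    struct my_object;
    struct osm_state;

    typedef const struct osm_state *(*osm_work_func_t)(struct my_object *object,
                                                        int event);

    struct osm_state {
            const char              *name;          /* states carry their own names  */
            osm_work_func_t         work;           /* NULL => this is a wait state  */
            struct osm_transition {
                    unsigned long           events; /* "if event X occurs ..."       */
                    const struct osm_state  *to;    /* "... transition to state Y"   */
            } transits[4];                          /* wait-state transition table   */
    };

    /* A work-state function returns the next state; returning NO_TRANSIT repeats
     * the current state after going back through the scheduler. */
    #define NO_TRANSIT ((const struct osm_state *)NULL)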

    In addition, the states have names in their definitions, so there's no need for
    tables of state names. Further, the EV_REQUEUE event is no longer necessary as
    that is automatic for work states.

    Since the states are now separate structs rather than values in an enum, it's
    not possible to use comparisons other than (non-)equality between them, so use
    some object->flags to indicate what phase an object is in.

    The EV_RELEASE, EV_RETIRE and EV_WITHDRAW events have been squished into one
    (EV_KILL). An object flag now carries the information about retirement.

    Similarly, the RELEASING, RECYCLING and WITHDRAWING states have been merged
    into a KILL_OBJECT state and additional states have been added for handling
    waiting dependent objects (JUMPSTART_DEPS and KILL_DEPENDENTS).

    A state has also been added for synchronising with parent object initialisation
    (WAIT_FOR_PARENT) and another for initiating look up (PARENT_READY).

    Signed-off-by: David Howells
    Tested-By: Milosz Tanski
    Acked-by: Jeff Layton

    David Howells
     
  • Wrap checks on object state (mostly outside of fs/fscache/object.c) with
    inline functions so that the mechanism can be replaced.

    Some of the state checks within object.c are left as-is as they will be
    replaced.

    Signed-off-by: David Howells
    Tested-By: Milosz Tanski
    Acked-by: Jeff Layton

    David Howells
     
  • Just some cleanup.

    (And note the caller of this function may, for example, call vfs_unlink
    on a child, so the "1" (I_MUTEX_PARENT) really was what was intended
    here.)

    Signed-off-by: J. Bruce Fields
    Signed-off-by: David Howells
    Tested-By: Milosz Tanski
    Acked-by: Jeff Layton

    J. Bruce Fields
     


21 Dec, 2012

5 commits

  • Mark as cancelled an operation that is in progress rather than pending at the
    time it is cancelled, and call fscache_complete_op() to cancel an operation so
    that blocked ops can be started.

    Signed-off-by: David Howells

    David Howells
     
  • Don't mask off the object event mask when printing it. That way it can be seen
    if there are bits set that shouldn't be.

    Signed-off-by: David Howells

    David Howells
     
  • CacheFiles is missing some calls to fscache_retrieval_complete() in the error
    handling/collision paths of its reader functions.

    This can be seen by the following assertion tripping in fscache_put_operation()
    whereby the operation being destroyed is still in the in-progress state and has
    not been cancelled or completed:

    FS-Cache: Assertion failed
    3 == 5 is false
    ------------[ cut here ]------------
    kernel BUG at fs/fscache/operation.c:408!
    invalid opcode: 0000 [#1] SMP
    CPU 2
    Modules linked in: xfs ioatdma dca loop joydev evdev
    psmouse dcdbas pcspkr serio_raw i5000_edac edac_core i5k_amb shpchp
    pci_hotplug sg sr_mod]

    Pid: 8062, comm: httpd Not tainted 3.1.0-rc8 #1 Dell Inc. PowerEdge 1950/0DT097
    RIP: 0010:[] [] fscache_put_operation+0x304/0x330
    RSP: 0018:ffff880062f739d8 EFLAGS: 00010296
    RAX: 0000000000000025 RBX: ffff8800c5122e84 RCX: ffffffff81ddf040
    RDX: 00000000ffffffff RSI: 0000000000000082 RDI: ffffffff81ddef30
    RBP: ffff880062f739f8 R08: 0000000000000005 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000003 R12: ffff8800c5122e40
    R13: ffff880037a2cd20 R14: ffff880087c7a058 R15: ffff880087c7a000
    FS: 00007f63dcf636e0(0000) GS:ffff88022fc80000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f0c0a91f000 CR3: 0000000062ec2000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process httpd (pid: 8062, threadinfo ffff880062f72000, task ffff880087e58000)
    Stack:
    ffff880062f73bf8 0000000000000000 ffff880062f73bf8 ffff880037a2cd20
    ffff880062f73a68 ffffffff8119aa7e ffff88006540e000 ffff880062f73ad4
    ffff88008e9a4308 ffff880037a2cd20 ffff880062f73a48 ffff8800c5122e40
    Call Trace:
    [] __fscache_read_or_alloc_pages+0x1fe/0x530
    [] __nfs_readpages_from_fscache+0x70/0x1c0
    [] nfs_readpages+0xca/0x1e0
    [] ? rpc_do_put_task+0x36/0x50
    [] ? alloc_nfs_open_context+0x4b/0x110
    [] ? rpc_call_sync+0x5a/0x70
    [] __do_page_cache_readahead+0x1ca/0x270
    [] ra_submit+0x21/0x30
    [] ondemand_readahead+0x11d/0x250
    [] page_cache_sync_readahead+0x36/0x60
    [] generic_file_aio_read+0x454/0x770
    [] nfs_file_read+0xe1/0x130
    [] do_sync_read+0xd9/0x120
    [] ? mntput+0x1f/0x40
    [] ? fput+0x1cb/0x260
    [] vfs_read+0xc8/0x180
    [] sys_read+0x55/0x90

    Reported-by: Mark Moseley
    Signed-off-by: David Howells

    David Howells
     
  • Implement invalidation for CacheFiles. This is in two parts:

    (1) Provide an invalidation method (which just truncates the backing file).

    (2) Abort attempts to copy anything read from the backing file whilst
    invalidation is in progress.

    Question: CacheFiles uses truncation in a couple of places. It has been using
    notify_change() rather than sys_truncate() or something similar. This means
    it bypasses a bunch of checks and suchlike that it possibly should be making
    (security, file locking, lease breaking, vfsmount write). Should it be using
    vfs_truncate() as added by a preceding patch or should it use notify_change()
    and assume that anyone poking around in the cache files on disk gets
    everything they deserve?

    Signed-off-by: David Howells

    David Howells
     
  • Fix the state management of internal fscache operations and the accounting of
    what operations are in what states.

    This is done by:

    (1) Give struct fscache_operation an enum variable that directly represents the
    state it's currently in, rather than spreading this knowledge over a bunch
    of flags, who's processing the operation at the moment and whether it is
    queued or not.

    This makes it easier to write assertions to check the state at various
    points and to prevent invalid state transitions.

    (2) Add an 'operation complete' state and supply a function to indicate the
    completion of an operation (fscache_op_complete()) and make things call
    it. The final call to fscache_put_operation() can then check that an op is
    in the appropriate state (complete or cancelled).

    (3) Adjust the use of object->n_ops, ->n_in_progress, ->n_exclusive to better
    govern the state of an object:

    (a) The ->n_ops is now the number of extant operations on the object
    and is now decremented by fscache_put_operation() only.

    (b) The ->n_in_progress is simply the number of ops that have been
    taken off of the object's pending queue for the purposes of being
    run. This is decremented by fscache_op_complete() only.

    (c) The ->n_exclusive is the number of exclusive ops that have been
    submitted and queued or are in progress. It is decremented by
    fscache_op_complete() and by fscache_cancel_op().

    fscache_put_operation() and fscache_operation_gc() now no longer try to
    clean up ->n_exclusive and ->n_in_progress. That was leading to double
    decrements against fscache_cancel_op().

    fscache_cancel_op() now no longer decrements ->n_ops. That was leading to
    double decrements against fscache_put_operation().

    fscache_submit_exclusive_op() now decides whether it has to queue an op
    based on ->n_in_progress being > 0 rather than ->n_ops > 0 as the latter
    will persist in being true even after all preceding operations have been
    cancelled or completed. Furthermore, if an object is active and there are
    runnable ops against it, there must be at least one op running.

    (4) Add a remaining-pages counter (n_pages) to struct fscache_retrieval and
    provide a function to record completion of the pages as they complete.

    When n_pages reaches 0, the operation is deemed to be complete and
    fscache_op_complete() is called.

    Add calls to fscache_retrieval_complete() anywhere we've finished with a
    page we've been given to read or allocate for. This includes places where
    we just return pages to the netfs for reading from the server and where
    accessing the cache fails and we discard the proposed netfs page.
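
    The remaining-pages accounting in (4) then looks roughly like this (a toy
    sketch; in fscache the counter lives in struct fscache_retrieval and
    completion goes through fscache_op_complete()):

    #include <linux/atomic.h>

    struct retrieval_sketch {
            atomic_t n_pages;       /* pages still to be read, allocated or discarded */
    };

    static void retrieval_pages_done(struct retrieval_sketch *op, int n_pages)
    {
            atomic_sub(n_pages, &op->n_pages);
            if (atomic_read(&op->n_pages) <= 0) {
                    /* all pages accounted for: this is the point at which the
                     * whole operation is marked complete */
            }
    }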

    The bugs in the unfixed state management manifest themselves as oopses like the
    following where the operation completion gets out of sync with return of the
    cookie by the netfs. This is possible because the cache unlocks and returns
    all the netfs pages before recording its completion - which means that there's
    nothing to stop the netfs discarding them and returning the cookie.

    FS-Cache: Cookie 'NFS.fh' still has outstanding reads
    ------------[ cut here ]------------
    kernel BUG at fs/fscache/cookie.c:519!
    invalid opcode: 0000 [#1] SMP
    CPU 1
    Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc

    Pid: 400, comm: kswapd0 Not tainted 3.1.0-rc7-fsdevel+ #1090 /DG965RY
    RIP: 0010:[] [] __fscache_relinquish_cookie+0x170/0x343 [fscache]
    RSP: 0018:ffff8800368cfb00 EFLAGS: 00010282
    RAX: 000000000000003c RBX: ffff880023cc8790 RCX: 0000000000000000
    RDX: 0000000000002f2e RSI: 0000000000000001 RDI: ffffffff813ab86c
    RBP: ffff8800368cfb50 R08: 0000000000000002 R09: 0000000000000000
    R10: ffff88003a1b7890 R11: ffff88001df6e488 R12: ffff880023d8ed98
    R13: ffff880023cc8798 R14: 0000000000000004 R15: ffff88003b8bf370
    FS: 0000000000000000(0000) GS:ffff88003bd00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00000000008ba008 CR3: 0000000023d93000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process kswapd0 (pid: 400, threadinfo ffff8800368ce000, task ffff88003b8bf040)
    Stack:
    ffff88003b8bf040 ffff88001df6e528 ffff88001df6e528 ffffffffa00b46b0
    ffff88003b8bf040 ffff88001df6e488 ffff88001df6e620 ffffffffa00b46b0
    ffff88001ebd04c8 0000000000000004 ffff8800368cfb70 ffffffffa00b2c91
    Call Trace:
    [] nfs_fscache_release_inode_cookie+0x3b/0x47 [nfs]
    [] nfs_clear_inode+0x3c/0x41 [nfs]
    [] nfs4_evict_inode+0x2f/0x33 [nfs]
    [] evict+0xa1/0x15c
    [] dispose_list+0x2c/0x38
    [] prune_icache_sb+0x28c/0x29b
    [] prune_super+0xd5/0x140
    [] shrink_slab+0x102/0x1ab
    [] balance_pgdat+0x2f2/0x595
    [] ? process_timeout+0xb/0xb
    [] kswapd+0x270/0x289
    [] ? __init_waitqueue_head+0x46/0x46
    [] ? balance_pgdat+0x595/0x595
    [] kthread+0x7f/0x87
    [] kernel_thread_helper+0x4/0x10
    [] ? finish_task_switch+0x45/0xc0
    [] ? retint_restore_args+0xe/0xe
    [] ? __init_kthread_worker+0x53/0x53
    [] ? gs_change+0xb/0xb

    Signed-off-by: David Howells

    David Howells