05 May, 2020

1 commit

  • - Add a SPDX header;
    - Adjust document and section titles;
    - Some whitespace fixes and new line breaks;
    - Mark literal blocks as such;
    - Add it to filesystems/caching/index.rst.

    Signed-off-by: Mauro Carvalho Chehab
    Link: https://lore.kernel.org/r/cfe4cb1bf8e1f0093d44c30801ec42e74721e543.1588021877.git.mchehab+huawei@kernel.org
    Signed-off-by: Jonathan Corbet

    Mauro Carvalho Chehab
     

31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

18 Oct, 2018

2 commits

  • fscache_set_key() can incur an out-of-bounds read, reported by KASAN:

    BUG: KASAN: slab-out-of-bounds in fscache_alloc_cookie+0x5b3/0x680 [fscache]
    Read of size 4 at addr ffff88084ff056d4 by task mount.nfs/32615

    and also reported by syzbot at https://lkml.org/lkml/2018/7/8/236

    BUG: KASAN: slab-out-of-bounds in fscache_set_key fs/fscache/cookie.c:120 [inline]
    BUG: KASAN: slab-out-of-bounds in fscache_alloc_cookie+0x7a9/0x880 fs/fscache/cookie.c:171
    Read of size 4 at addr ffff8801d3cc8bb4 by task syz-executor907/4466

    This happens for any index_key_len which is not divisible by 4 and is
    larger than the size of the inline key, because the code allocates exactly
    index_key_len for the key buffer, but the hashing loop is stepping through
    it 4 bytes (u32) at a time in the buf[] array.

    Fix this by calculating how many u32 buffers we'll need by using
    DIV_ROUND_UP, and then using kcalloc() to allocate a precleared allocation
    buffer to hold the index_key, then using that same count as the hashing
    index limit.
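    A minimal sketch of that approach, with names taken from the commit text
    (the surrounding function and the actual hash mixing are abridged):

        size_t n = DIV_ROUND_UP(index_key_len, sizeof(u32));
        u32 *buf = kcalloc(n, sizeof(u32), GFP_KERNEL); /* precleared */

        if (!buf)
                return -ENOMEM;
        memcpy(buf, index_key, index_key_len);  /* tail bytes stay zero */
        for (i = 0; i < n; i++)                 /* whole words only, never
                                                 * past the allocation */
                h += buf[i];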

    Fixes: ec0328e46d6e ("fscache: Maintain a catalogue of allocated cookies")
    Reported-by: syzbot+a95b989b2dde8e806af8@syzkaller.appspotmail.com
    Signed-off-by: Eric Sandeen
    Cc: stable
    Signed-off-by: David Howells
    Signed-off-by: Greg Kroah-Hartman

    Eric Sandeen
     
  • The inline key in struct fscache_cookie is insufficiently initialized,
    zeroing only 3 of the 4 slots, therefore an index_key_len between 13 and 15
    bytes will end up hashing uninitialized memory because the memcpy only
    partially fills the last buf[] element.

    Fix this by clearing fscache_cookie objects on allocation rather than using
    the slab constructor to initialise them. We're going to pretty much fill
    in the entire struct anyway, so bringing it into our dcache writably
    shouldn't incur much overhead.
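    A sketch of the allocation-side change (fscache_cookie_jar is the cookie
    slab named in this series; its use here is assumed):

        /* before: ctor-initialised slab object, inline key tail unzeroed */
        cookie = kmem_cache_alloc(fscache_cookie_jar, GFP_KERNEL);

        /* after: start from all zeroes, then fill the fields in */
        cookie = kmem_cache_zalloc(fscache_cookie_jar, GFP_KERNEL);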

    This removes the need to do clearance in fscache_set_key() (where we aren't
    doing it correctly anyway).

    Also, we don't need to set cookie->key_len in fscache_set_key() as we
    already did it in the only caller, so remove that.

    Fixes: ec0328e46d6e ("fscache: Maintain a catalogue of allocated cookies")
    Reported-by: syzbot+a95b989b2dde8e806af8@syzkaller.appspotmail.com
    Reported-by: Eric Sandeen
    Cc: stable
    Signed-off-by: David Howells
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     

25 Jul, 2018

1 commit

  • When a cookie is allocated that causes fscache_object structs to be
    allocated, those objects are initialised with the cookie pointer, but
    aren't blessed with a ref on that cookie unless the attachment is
    successfully completed in fscache_attach_object().

    If attachment fails because the parent object was dying or there was a
    collision, fscache_attach_object() returns without incrementing the cookie
    counter - but upon failure of this function, the object is released which
    then puts the cookie, whether or not a ref was taken on the cookie.

    Fix this by taking a ref on the cookie when it is assigned in
    fscache_object_init(), even when we're creating a root object.
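    Illustratively, the fix amounts to taking the ref where the pointer is
    assigned ('usage' is the counter named in the analysis below):

        object->cookie = cookie;
        atomic_inc(&cookie->usage);     /* ref taken at init time, so the
                                         * put on the failure path is
                                         * always balanced */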

    Analysis from Kiran Kumar:

    This bug has been seen in 4.4.0-124-generic #148-Ubuntu kernel

    BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1776277

    fscache cookie ref count updated incorrectly during fscache object
    allocation resulting in following Oops.

    kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/internal.h:321!
    kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/cookie.c:639!

    [Cause]
    Two threads are trying to operate on a cookie and two objects.

    (1) One thread tries to unmount the filesystem and in the process goes
    over a huge list of objects, marking them dead and deleting the objects.
    cookie->usage is also decremented in the following path:

    nfs_fscache_release_super_cookie
    -> __fscache_relinquish_cookie
    ->__fscache_cookie_put
    ->BUG_ON(atomic_read(&cookie->usage) <= 0);

    (2) A second thread tries to look up an object for reading data, in the
    following path:

    1) fscache_object_init
    -> assign cookie, but usage not bumped.
    2) fscache_attach_object -> fails in cant_attach_object because the
    cookie's backing object or cookie's->parent object are going away
    3) fscache_put_object
    -> cachefiles_put_object
    ->fscache_object_destroy
    ->fscache_cookie_put
    ->BUG_ON(atomic_read(&cookie->usage) <= 0);

    Signed-off-by: David Howells

    Kiran Kumar Modukuri
     

12 Apr, 2018

1 commit

  • Don't open-code accesses to data structure internals.

    Link: http://lkml.kernel.org/r/20180313132639.17387-7-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Reviewed-by: Jeff Layton
    Cc: Darrick J. Wong
    Cc: Dave Chinner
    Cc: Ryusuke Konishi
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

06 Apr, 2018

2 commits

  • Maintain a catalogue of allocated cookies so that cookie collisions can be
    handled properly. For the moment, this just involves printing a warning
    and returning a NULL cookie to the caller of fscache_acquire_cookie(), but
    in future it might make sense to wait for the old cookie to finish being
    cleaned up.

    This requires the cookie key to be stored attached to the cookie so that we
    still have the key available if the netfs relinquishes the cookie. This is
    done by an earlier patch.

    The catalogue also renders redundant fscache_netfs_list (used for checking
    for duplicates), so that can be removed.

    Signed-off-by: David Howells
    Acked-by: Anna Schumaker
    Tested-by: Steve Dickson

    David Howells
     
  • Pass the object size in to fscache_acquire_cookie() and
    fscache_write_page() rather than the netfs providing a callback by which it
    can be received. This makes it easier to update the size of the object
    when a new page is written that extends the object.

    The current object size is also passed by fscache to the check_aux
    function, obviating the need to store it in the aux data.

    Signed-off-by: David Howells
    Acked-by: Anna Schumaker
    Tested-by: Steve Dickson

    David Howells
     

04 Apr, 2018

4 commits

  • Attach copies of the index key and auxiliary data to the fscache cookie so
    that:

    (1) The callbacks to the netfs for this stuff can be eliminated. This
    can simplify things in the cache as the information is still
    available, even after the cache has relinquished the cookie.

    (2) Simplifies the locking requirements of accessing the information as we
    don't have to worry about the netfs object going away on us.

    (3) The cache can do lazy updating of the coherency information on disk.
    As long as the cache is flushed before reboot/poweroff, there's no
    need to update the coherency info on disk every time it changes.

    (4) Cookies can be hashed or put in a tree as the index key is easily
    available. This allows:

    (a) Checks for duplicate cookies can be made at the top fscache layer
    rather than down in the bowels of the cache backend.

    (b) Caching can be added to a netfs object that has a cookie if the
    cache is brought online after the netfs object is allocated.

    A certain amount of space is made in the cookie for inline copies of the
    data, but if it won't fit there, extra memory will be allocated for it.

    The downside of this is that live cache operation requires more memory.
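    The resulting layout is roughly as below; this is a sketch, with the
    inline sizes inferred from the out-of-bounds fixes earlier in this log
    (which describe a four-slot u32 inline key) rather than quoted from the
    patch:

        struct fscache_cookie {
                /* ... refcount, locks and state omitted ... */
                u8      type;
                u8      key_len;
                u8      aux_len;
                union {
                        void    *key;           /* allocated if too big */
                        u8      inline_key[16]; /* 4 x u32 hash slots */
                };
                union {
                        void    *aux;           /* allocated if too big */
                        u8      inline_aux[8];
                };
        };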

    Signed-off-by: David Howells
    Acked-by: Anna Schumaker
    Tested-by: Steve Dickson

    David Howells
     
  • Add more tracepoints to fscache, including:

    (*) fscache_page - Tracks netfs pages known to fscache.

    (*) fscache_check_page - Tracks the netfs querying whether a page is
    pending storage.

    (*) fscache_wake_cookie - Tracks cookies being woken up after a page
    completes/aborts storage in the cache.

    (*) fscache_op - Tracks operations being initialised.

    (*) fscache_wrote_page - Tracks return of the backend write_page op.

    (*) fscache_gang_lookup - Tracks lookup of pages to be stored in the write
    operation.

    Signed-off-by: David Howells

    David Howells
     
  • Add some tracepoints to fscache:

    (*) fscache_cookie - Tracks a cookie's usage count.

    (*) fscache_netfs - Logs registration of a network filesystem, including
    the pointer to the cookie allocated.

    (*) fscache_acquire - Logs cookie acquisition.

    (*) fscache_relinquish - Logs cookie relinquishment.

    (*) fscache_enable - Logs enablement of a cookie.

    (*) fscache_disable - Logs disablement of a cookie.

    (*) fscache_osm - Tracks execution of states in the object state machine.

    and cachefiles:

    (*) cachefiles_ref - Tracks a cachefiles object's usage count.

    (*) cachefiles_lookup - Logs result of lookup_one_len().

    (*) cachefiles_mkdir - Logs result of vfs_mkdir().

    (*) cachefiles_create - Logs result of vfs_create().

    (*) cachefiles_unlink - Logs calls to vfs_unlink().

    (*) cachefiles_rename - Logs calls to vfs_rename().

    (*) cachefiles_mark_active - Logs an object becoming active.

    (*) cachefiles_wait_active - Logs a wait for an old object to be
    destroyed.

    (*) cachefiles_mark_inactive - Logs an object becoming inactive.

    (*) cachefiles_mark_buried - Logs the burial of an object.

    Signed-off-by: David Howells

    David Howells
     
  • Report if an fscache cookie is relinquished multiple times by the netfs.

    Signed-off-by: David Howells

    David Howells
     

13 Nov, 2017

1 commit

  • Make wait_on_atomic_t() pass the TASK_* mode onto its action function as an
    extra argument and make it 'unsigned int' throughout.

    Also, consolidate a bunch of identical action functions into a default
    function that can do the appropriate thing for the mode.
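    A sketch of such a default action for the atomic_t variant (the mode now
    arrives as an argument instead of being baked into each function):

        int atomic_t_wait(atomic_t *counter, unsigned int mode)
        {
                schedule();
                if (signal_pending_state(mode, current))
                        return -EINTR;
                return 0;
        }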

    Also, change the argument name in the bit_wait*() function declarations to
    reflect the fact that it's the mode and not the bit number.

    [Peter Z gives this a grudging ACK, but thinks that the whole atomic_t wait
    should be done differently, though he's not immediately sure as to how]

    Signed-off-by: David Howells
    Acked-by: Peter Zijlstra
    cc: Ingo Molnar

    David Howells
     

01 Feb, 2017

1 commit

  • fscache_disable_cookie() needs to clear the outstanding writes on the
    cookie it's disabling because they cannot be completed afterwards.

    Without this, fscache_nfs_open_file() gets stuck because it disables the
    cookie when the file is opened for writing but can't uncache the pages till
    afterwards - otherwise there's a race between the open routine and anyone
    who already has it open R/O and is still reading from it.

    Looking in /proc/pid/stack of the offending process shows:

    [] __fscache_wait_on_page_write+0x82/0x9b [fscache]
    [] __fscache_uncache_all_inode_pages+0x91/0xe1 [fscache]
    [] nfs_fscache_open_file+0x59/0x9e [nfs]
    [] nfs4_file_open+0x17f/0x1b8 [nfsv4]
    [] do_dentry_open+0x16d/0x2b7
    [] vfs_open+0x5c/0x65
    [] path_openat+0x785/0x8fb
    [] do_filp_open+0x48/0x9e
    [] do_sys_open+0x13b/0x1cb
    [] SyS_open+0x19/0x1b
    [] do_syscall_64+0x80/0x17a
    [] return_from_SYSCALL_64+0x0/0x7a
    [] 0xffffffffffffffff

    Reported-by: Jianhong Yin
    Signed-off-by: David Howells
    Acked-by: Jeff Layton
    Acked-by: Steve Dickson
    Signed-off-by: Al Viro

    David Howells
     

07 Nov, 2015

1 commit

  • …d avoiding waking kswapd

    __GFP_WAIT has been used to identify atomic context in callers that hold
    spinlocks or are in interrupts. They are expected to be high priority and
    have access to one of two watermarks lower than "min" which can be referred
    to as the "atomic reserve". __GFP_HIGH users get access to the first
    lower watermark and can be called the "high priority reserve".

    Over time, callers had a requirement to not block when fallback options
    were available. Some have abused __GFP_WAIT leading to a situation where
    an optimistic allocation with a fallback option can access atomic
    reserves.

    This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
    cannot sleep and have no alternative. High priority users continue to use
    __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
    are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM identifies
    callers that want to wake kswapd for background reclaim. __GFP_WAIT is
    redefined as a caller that is willing to enter direct reclaim and wake
    kswapd for background reclaim.

    This patch then converts a number of sites

    o __GFP_ATOMIC is used by callers that are high priority and have memory
    pools for those requests. GFP_ATOMIC uses this flag.

    o Callers that have a limited mempool to guarantee forward progress clear
    __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
    into this category where kswapd will still be woken but atomic reserves
    are not used as there is a one-entry mempool to guarantee progress.

    o Callers that are checking if they are non-blocking should use the
    helper gfpflags_allow_blocking() where possible. This is because
    checking for __GFP_WAIT as was done historically now can trigger false
    positives. Some exceptions like dm-crypt.c exist where the code intent
    is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
    flag manipulations.

    o Callers that built their own GFP flags instead of starting with GFP_KERNEL
    and friends now also need to specify __GFP_KSWAPD_RECLAIM.
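    Illustrative conversions under those rules (the GFP flags and the helper
    are as named above; the call sites are hypothetical):

        /* truly atomic, no fallback: GFP_ATOMIC carries __GFP_ATOMIC */
        skb = alloc_skb(len, GFP_ATOMIC);

        /* mempool-backed caller: still wake kswapd, never direct-reclaim */
        gfp_mask &= ~__GFP_DIRECT_RECLAIM;

        /* "may I block?" is now asked via the helper, not __GFP_WAIT */
        if (gfpflags_allow_blocking(gfp_mask))
                return try_blocking_path(gfp_mask);     /* hypothetical */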

    The first key hazard to watch out for is callers that removed __GFP_WAIT
    and were depending on access to atomic reserves for inconspicuous reasons.
    In some cases it may be appropriate for them to use __GFP_HIGH.

    The second key hazard is callers that assembled their own combination of
    GFP flags instead of starting with something like GFP_KERNEL. They may
    now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
    if it's missed in most cases as other activity will wake kswapd.

    Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Vitaly Wool <vitalywool@gmail.com>
    Cc: Rik van Riel <riel@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Mel Gorman
     

02 Apr, 2015

2 commits

  • Any time an incomplete operation is cancelled, the operation cancellation
    function needs to be called to clean up. This is currently being passed
    directly to some of the functions that might want to call it, but not all.

    Instead, pass the cancellation method pointer to the fscache_operation_init()
    and have that cache it in the operation struct. Further, plug in a dummy
    cancellation handler if the caller declines to set one as this allows us to
    call the function unconditionally (the extra overhead isn't worth bothering
    about as we don't expect to be calling this typically).
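    A sketch of the revised initialisation, with the release op taken from
    the backtrace below and the argument order assumed:

        fscache_operation_init(&op->op,
                               NULL,                            /* processor */
                               fscache_do_cancel_retrieval,     /* cancel */
                               fscache_release_retrieval_op);   /* release */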

    The cancellation method must thence be called everywhere the CANCELLED state
    is set. Note that we call it *before* setting the CANCELLED state such that
    the method can use the old state value to guide its operation.

    fscache_do_cancel_retrieval() needs moving higher up in the sources so that
    the init function can use it now.

    Without this, the following oops may be seen:

    FS-Cache: Assertion failed
    FS-Cache: 3 == 0 is false
    ------------[ cut here ]------------
    kernel BUG at ../fs/fscache/page.c:261!
    ...
    RIP: 0010:[] fscache_release_retrieval_op+0x77/0x100
    [] fscache_put_operation+0x114/0x2da
    [] __fscache_read_or_alloc_pages+0x358/0x3b3
    [] __nfs_readpages_from_fscache+0x59/0xbf [nfs]
    [] nfs_readpages+0x10c/0x185 [nfs]
    [] ? alloc_pages_current+0x119/0x13e
    [] ? __page_cache_alloc+0xfb/0x10a
    [] __do_page_cache_readahead+0x188/0x22c
    [] ondemand_readahead+0x29e/0x2af
    [] page_cache_sync_readahead+0x38/0x3a
    [] generic_file_read_iter+0x1a2/0x55a
    [] ? nfs_revalidate_mapping+0xd6/0x288 [nfs]
    [] nfs_file_read+0x49/0x70 [nfs]
    [] new_sync_read+0x78/0x9c
    [] __vfs_read+0x13/0x38
    [] vfs_read+0x95/0x121
    [] SyS_read+0x4c/0x8a
    [] system_call_fastpath+0x12/0x17

    The assertion is showing that the remaining number of pages (n_pages) is not 0
    when the operation is being released.

    Signed-off-by: David Howells
    Reviewed-by: Steve Dickson
    Acked-by: Jeff Layton

    David Howells
     
  • fscache_object_is_dead() returns true only if the object is marked dead and
    the cache got an I/O error. This should be a logical OR instead. Since two
    of the callers got split up into handling for separate subcases, expand the
    other callers and kill the function. This is probably the right thing to do
    anyway since one of the subcases isn't about the object at all, but rather
    about the cache.

    Signed-off-by: David Howells
    Reviewed-by: Steve Dickson
    Acked-by: Jeff Layton

    David Howells
     

16 Jul, 2014

1 commit

  • The current "wait_on_bit" interface requires an 'action'
    function to be provided which does the actual waiting.
    There are over 20 such functions, many of them identical.
    Most cases can be satisfied by one of just two functions, one
    which uses io_schedule() and one which just uses schedule().

    So:
    Rename wait_on_bit and wait_on_bit_lock to
    wait_on_bit_action and wait_on_bit_lock_action
    to make it explicit that they need an action function.

    Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
    which are *not* given an action function but implicitly use
    a standard one.
    The decision to error-out if a signal is pending is now made
    based on the 'mode' argument rather than being encoded in the action
    function.

    All instances of the old wait_on_bit and wait_on_bit_lock which
    can use the new version have been changed accordingly and their
    action functions have been discarded.
    wait_on_bit{_lock} does not return any specific error code in the
    event of a signal so the caller must check for non-zero and
    interpolate their own error code as appropriate.
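    Before and after, illustratively (MY_FLAG_BIT and the action function are
    hypothetical):

        /* old: the caller had to supply an action function */
        err = wait_on_bit(&flags, MY_FLAG_BIT, my_wait_action,
                          TASK_UNINTERRUPTIBLE);

        /* new: the standard schedule()-based wait is implicit */
        err = wait_on_bit(&flags, MY_FLAG_BIT, TASK_INTERRUPTIBLE);
        if (err)
                return -ERESTARTSYS;    /* caller picks its own error code */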

    The wait_on_bit() call in __fscache_wait_on_invalidate() was
    ambiguous as it specified TASK_UNINTERRUPTIBLE but used
    fscache_wait_bit_interruptible as an action function.
    David Howells confirms this should be uniformly
    "uninterruptible".

    The main remaining user of wait_on_bit{,_lock}_action is NFS
    which needs to use a freezer-aware schedule() call.

    A comment in fs/gfs2/glock.c notes that having multiple 'action'
    functions is useful as they display differently in the 'wchan'
    field of 'ps' (and /proc/$PID/wchan).
    As the new bit_wait{,_io} functions are tagged "__sched", they
    will not show up at all, but something higher in the stack. So
    the distinction will still be visible, only with different
    function names (gfs2_glock_wait versus gfs2_glock_dq_wait in the
    gfs2/glock.c case).

    Since the first version of this patch (against 3.15), two new action
    functions have appeared, one in NFS and one in CIFS. CIFS also now
    uses an action function that makes the same freezer aware
    schedule call as NFS.

    Signed-off-by: NeilBrown
    Acked-by: David Howells (fscache, keys)
    Acked-by: Steven Whitehouse (gfs2)
    Acked-by: Peter Zijlstra
    Cc: Oleg Nesterov
    Cc: Steve French
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
    Signed-off-by: Ingo Molnar

    NeilBrown
     

28 Sep, 2013

2 commits

  • Provide the ability to enable and disable fscache cookies. A disabled cookie
    will reject or ignore further requests to:

    Acquire a child cookie
    Invalidate and update backing objects
    Check the consistency of a backing object
    Allocate storage for backing page
    Read backing pages
    Write to backing pages

    but still allows:

    Checks/waits on the completion of already in-progress objects
    Uncaching of pages
    Relinquishment of cookies

    Two new operations are provided:

    (1) Disable a cookie:

    void fscache_disable_cookie(struct fscache_cookie *cookie,
    bool invalidate);

    If the cookie is not already disabled, this locks the cookie against other
    dis/enablement ops, marks the cookie as being disabled, discards or
    invalidates any backing objects and waits for cessation of activity on any
    associated object.

    This is a wrapper around a chunk split out of fscache_relinquish_cookie(),
    but it reinitialises the cookie such that it can be reenabled.

    All possible failures are handled internally. The caller should consider
    calling fscache_uncache_all_inode_pages() afterwards to make sure all page
    markings are cleared up.

    (2) Enable a cookie:

    void fscache_enable_cookie(struct fscache_cookie *cookie,
    bool (*can_enable)(void *data),
    void *data)

    If the cookie is not already enabled, this locks the cookie against other
    dis/enablement ops, invokes can_enable() and, if the cookie is not an
    index cookie, will begin the procedure of acquiring backing objects.

    The optional can_enable() function is passed the data argument and returns
    a ruling as to whether or not enablement should actually be permitted to
    begin.

    All possible failures are handled internally. The cookie will only be
    marked as enabled if provisional backing objects are allocated.
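    A hedged usage sketch built on the two prototypes above (the predicate
    and its data argument are hypothetical; the i_writecount test mirrors the
    NFS example below):

        static bool my_can_enable(void *data)
        {
                struct inode *inode = data;

                return atomic_read(&inode->i_writecount) <= 0;
        }

        /* on open: */
        fscache_enable_cookie(cookie, my_can_enable, inode);

        /* on open-for-write, say: */
        fscache_disable_cookie(cookie, true /* invalidate */);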

    A later patch will introduce these to NFS. Cookie enablement during
    nfs_open() is then contingent on i_writecount <= 0.

    Signed-off-by: David Howells

    David Howells
     
  • Add wrapper functions for dealing with cookie->n_active:

    (*) __fscache_use_cookie() to increment it.

    (*) __fscache_unuse_cookie() to decrement and test against zero.

    (*) __fscache_wake_unused_cookie() to wake up anyone waiting for it to reach
    zero.

    The second and third are split so that the third can be done after cookie->lock
    has been released in case the waiter wakes up whilst we're still holding it and
    tries to get it.

    We will need to wake-on-zero once the cookie disablement patch is applied
    because it will then be possible to see n_active become zero without the cookie
    being relinquished.

    Also move the cookie usage accounting out of fscache_attr_changed_op() and into
    fscache_attr_changed() and the operation struct so that cookie disablement
    will be able to track it.

    Whilst we're at it, only increment n_active if we're about to do
    fscache_submit_op() so that we don't have to deal with undoing it if anything
    earlier fails. Possibly this should be moved into fscache_submit_op() which
    could look at FSCACHE_OP_UNUSE_COOKIE.
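    Sketches of the three wrappers as described (the bodies are illustrative;
    wake_up_atomic_t pairs with the wait_on_atomic_t machinery seen elsewhere
    in this log):

        static inline void __fscache_use_cookie(struct fscache_cookie *cookie)
        {
                atomic_inc(&cookie->n_active);
        }

        static inline bool __fscache_unuse_cookie(struct fscache_cookie *cookie)
        {
                return atomic_dec_and_test(&cookie->n_active);
        }

        static inline void __fscache_wake_unused_cookie(struct fscache_cookie *cookie)
        {
                wake_up_atomic_t(&cookie->n_active);
        }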

    Signed-off-by: David Howells

    David Howells
     

11 Sep, 2013

1 commit

  • __fscache_check_consistency() does not decrement the count of operations
    active after it finishes in the success case. This leads to hung tasks on
    cookie de-registration (commonly in inode eviction).
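    The fix is, in essence, one put on the success path (sketch; the
    surrounding function is abridged):

        /* previously the success path returned without this, leaking the
         * operation count and hanging later relinquishment: */
        fscache_put_operation(op);
        return ret;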

    INFO: task kworker/1:2:4214 blocked for more than 120 seconds.
    kworker/1:2 D ffff880443513fc0 0 4214 2 0x00000000
    Workqueue: ceph-msgr con_work [libceph]
    ...
    Call Trace:
    [] ? _raw_spin_unlock_irqrestore+0x16/0x20
    [] ? fscache_wait_bit_interruptible+0x30/0x30 [fscache]
    [] schedule+0x29/0x70
    [] fscache_wait_atomic_t+0xe/0x20 [fscache]
    [] out_of_line_wait_on_atomic_t+0x9f/0xe0
    [] ? autoremove_wake_function+0x40/0x40
    [] __fscache_relinquish_cookie+0x15c/0x310 [fscache]
    [] ceph_fscache_unregister_inode_cookie+0x3e/0x50 [ceph]
    [] ceph_destroy_inode+0x33/0x200 [ceph]
    [] ? __fsnotify_inode_delete+0xe/0x10
    [] destroy_inode+0x3c/0x70
    [] evict+0x119/0x1b0

    Signed-off-by: Milosz Tanski
    Acked-by: David Howells
    Signed-off-by: Sage Weil

    Milosz Tanski
     

06 Sep, 2013

1 commit

  • Extend the fscache netfs API so that the netfs can ask whether a cache
    object is up to date with respect to its corresponding netfs object:

    int fscache_check_consistency(struct fscache_cookie *cookie)

    This will call back to the netfs to check whether the auxiliary data associated
    with a cookie is correct. It returns 0 if it is and -ESTALE if it isn't; it
    may also return -ENOMEM and -ERESTARTSYS.

    The backends now have to implement a mandatory operation pointer:

    int (*check_consistency)(struct fscache_object *object)

    that corresponds to the above API call. FS-Cache takes care of pinning the
    object and the cookie in memory and managing this call with respect to the
    object state.
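    Netfs-side usage is then, illustratively:

        ret = fscache_check_consistency(cookie);
        if (ret == -ESTALE) {
                /* aux data mismatch: the cache object is out of date */
        }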

    Original-author: Hongyi Jia
    Signed-off-by: David Howells
    cc: Hongyi Jia
    cc: Milosz Tanski

    David Howells
     

19 Jun, 2013

3 commits

  • Simplify the way fscache cache objects retain their cookie. The way I
    implemented the cookie storage handling made synchronisation a pain (ie. the
    object state machine can't rely on the cookie actually still being there).

    Instead of the object being detached from the cookie and the cookie being
    freed in __fscache_relinquish_cookie(), we defer both operations:

    (*) The detachment of the object from the list in the cookie now takes place
    in fscache_drop_object() and is thus governed by the object state machine
    (fscache_detach_from_cookie() has been removed).

    (*) The release of the cookie is now in fscache_object_destroy() - which is
    called by the cache backend just before it frees the object.

    This means that the fscache_cookie struct is now available to the cache all the
    way through from ->alloc_object() to ->drop_object() and ->put_object() -
    meaning that it's no longer necessary to take object->lock to guarantee access.

    However, __fscache_relinquish_cookie() doesn't wait for the object to go all
    the way through to destruction before letting the netfs proceed. That would
    massively slow down the netfs. Since __fscache_relinquish_cookie() leaves the
    cookie around, it must therefore break all attachments to the netfs - which
    includes ->def, ->netfs_data and any outstanding page read/writes.

    To handle this, struct fscache_cookie now has an n_active counter:

    (1) This starts off initialised to 1.

    (2) Any time the cache needs to get at the netfs data, it calls
    fscache_use_cookie() to increment it - if it is not zero. If it was zero,
    then access is not permitted.

    (3) When the cache has finished with the data, it calls fscache_unuse_cookie()
    to decrement it. This does a wake-up on it if it reaches 0.

    (4) __fscache_relinquish_cookie() decrements n_active and then waits for it to
    reach 0. The initialisation to 1 in step (1) ensures that we only get
    wake ups when we're trying to get rid of the cookie.
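    Step (4) then looks roughly like this (fscache_wait_atomic_t appears in a
    backtrace later in this log; its use here is assumed):

        if (!atomic_dec_and_test(&cookie->n_active))
                wait_on_atomic_t(&cookie->n_active, fscache_wait_atomic_t,
                                 TASK_UNINTERRUPTIBLE);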

    This leaves __fscache_relinquish_cookie() a lot simpler.

    ***
    This fixes a problem in the current code whereby if fscache_invalidate() is
    followed sufficiently quickly by fscache_relinquish_cookie() then it is
    possible for __fscache_relinquish_cookie() to have detached the cookie from the
    object and cleared the pointer before a thread is dispatched to process the
    invalidation state in the object state machine.

    Since the pending write clearance was deferred to the invalidation state to
    make it asynchronous, we need to either wait in relinquishment for the stores
    tree to be cleared in the invalidation state or we need to handle the clearance
    in relinquishment.

    Further, if the relinquishment code does clear the tree, then the invalidation
    state needs to make the clearance contingent on still having the cookie to hand
    (since that's where the tree is rooted) and we have to prevent the cookie from
    disappearing for the duration.

    This can lead to an oops like the following:

    BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
    ...
    RIP: 0010:[] _spin_lock+0xe/0x30
    ...
    CR2: 000000000000000c ...
    ...
    Process kslowd002 (...)
    ....
    Call Trace:
    [] fscache_invalidate_writes+0x38/0xd0 [fscache]
    [] ? __switch_to+0xd0/0x320
    [] ? find_busiest_queue+0x69/0x150
    [] ? slow_work_enqueue+0x104/0x180
    [] fscache_object_slow_work_execute+0x5e3/0x9d0 [fscache]
    [] ? bit_waitqueue+0x17/0xd0
    [] slow_work_execute+0x233/0x310
    [] slow_work_thread+0x205/0x360
    [] ? autoremove_wake_function+0x0/0x40
    [] ? slow_work_thread+0x0/0x360
    [] kthread+0x96/0xa0
    [] child_rip+0xa/0x20
    [] ? kthread+0x0/0xa0
    [] ? child_rip+0x0/0x20

    The parameter to fscache_invalidate_writes() was object->cookie which is NULL.

    Signed-off-by: David Howells
    Tested-By: Milosz Tanski
    Acked-by: Jeff Layton

    David Howells
     
  • Fix object state machine to have separate work and wait states as that makes
    it easier to envision.

    There are now three kinds of state:

    (1) Work state. This is an execution state. No event processing is performed
    by a work state. The function attached to a work state returns a pointer
    indicating the next state to which the OSM should transition. Returning
    NO_TRANSIT repeats the current state, but goes back to the scheduler
    first.

    (2) Wait state. This is an event processing state. No execution is
    performed by a wait state. Wait states are just tables of "if event X
    occurs, clear it and transition to state Y". The dispatcher returns to
    the scheduler if none of the events in which the wait state has an
    interest are currently pending.

    (3) Out-of-band state. This is a special work state. Transitions to normal
    states can be overridden when an unexpected event occurs (eg. I/O error).
    Instead the dispatcher disables and clears the OOB event and transits to
    the specified work state. This then acts as an ordinary work state,
    though object->state points to the overridden destination. Returning
    NO_TRANSIT resumes the overridden transition.

    In addition, the states have names in their definitions, so there's no need for
    tables of state names. Further, the EV_REQUEUE event is no longer necessary as
    that is automatic for work states.

    Since the states are now separate structs rather than values in an enum, it's
    not possible to use comparisons other than (non-)equality between them, so use
    some object->flags to indicate what phase an object is in.

    The EV_RELEASE, EV_RETIRE and EV_WITHDRAW events have been squished into one
    (EV_KILL). An object flag now carries the information about retirement.

    Similarly, the RELEASING, RECYCLING and WITHDRAWING states have been merged
    into a KILL_OBJECT state and additional states have been added for handling
    waiting dependent objects (JUMPSTART_DEPS and KILL_DEPENDENTS).

    A state has also been added for synchronising with parent object initialisation
    (WAIT_FOR_PARENT) and another for initiating look up (PARENT_READY).

    Signed-off-by: David Howells
    Tested-By: Milosz Tanski
    Acked-by: Jeff Layton

    David Howells
     
  • Wrap checks on object state (mostly outside of fs/fscache/object.c) with
    inline functions so that the mechanism can be replaced.
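    For example, a raw state comparison becomes an inline predicate along
    these lines (the flag name is an assumption):

        static inline bool fscache_object_is_available(
                struct fscache_object *object)
        {
                return test_bit(FSCACHE_OBJECT_IS_AVAILABLE, &object->flags);
        }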

    Some of the state checks within object.c are left as-is as they will be
    replaced.

    Signed-off-by: David Howells
    Tested-By: Milosz Tanski
    Acked-by: Jeff Layton

    David Howells
     

28 Feb, 2013

1 commit

  • I'm not sure why, but the hlist for each entry iterators were conceived
    differently from the list ones. While the list ones are formed as follows:

    list_for_each_entry(pos, head, member)

    The hlist ones were greedy and wanted an extra parameter:

    hlist_for_each_entry(tpos, pos, head, member)

    Why did they need an extra pos parameter? I'm not quite sure. Not only
    do they not really need it, it also prevents the iterator from looking
    exactly like the list iterator, which is unfortunate.
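    That is:

        /* before: a scratch hlist_node pointer had to be supplied */
        hlist_for_each_entry(obj, pos, head, member)
                do_something(obj);

        /* after: the same shape as list_for_each_entry() */
        hlist_for_each_entry(obj, head, member)
                do_something(obj);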

    Besides the semantic patch, there was some manual work required:

    - Fix up the actual hlist iterators in linux/list.h
    - Fix up the declaration of other iterators based on the hlist ones.
    - A very small number of places were using the 'node' parameter; these
    were modified to use 'obj->member' instead.
    - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
    properly, so those had to be fixed up manually.

    The semantic patch which is mostly the work of Peter Senna Tschudin is here:

    @@
    iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

    type T;
    expression a,c,d,e;
    identifier b;
    statement S;
    @@

    -T b;

    [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
    [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foundation.org: redo intrusive kvm changes]
    Tested-by: Peter Senna Tschudin
    Acked-by: Paul E. McKenney
    Signed-off-by: Sasha Levin
    Cc: Wu Fengguang
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     

21 Dec, 2012

3 commits

  • Provide a proper invalidation method rather than relying on the netfs retiring
    the cookie it has and getting a new one. The problem with this is that it isn't
    easy for the netfs to make sure that it has completed/cancelled all its
    outstanding storage and retrieval operations on the cookie it is retiring.

    Instead, have the cache provide an invalidation method that will cancel or wait
    for all currently outstanding operations before invalidating the cache, and
    will cause new operations to queue up behind that. Whilst invalidation is in
    progress, some requests will be rejected until the cache can stack a barrier on
    the operation queue to cause new operations to be deferred behind it.

    Signed-off-by: David Howells

    David Howells
     
  • Make fscache_relinquish_cookie() log a warning and wait if there are any
    outstanding reads left on the cookie it was given.

    Signed-off-by: David Howells

    David Howells
     
  • Check that the netfs isn't trying to relinquish a cookie that still has read
    operations in progress upon it. If there are, then log a warning and BUG.

    Signed-off-by: David Howells

    David Howells
     

20 Nov, 2009

6 commits

  • Add a stat counter to count retirement events rather than ordinary release
    events (the retire argument to fscache_relinquish_cookie()).

    Signed-off-by: David Howells

    David Howells
     
  • FS-Cache has two structs internally for keeping track of the internal state of
    a cached file: the fscache_cookie struct, which represents the netfs's state,
    and fscache_object struct, which represents the cache's state. Each has a
    pointer that points to the other (when both are in existence), and each has a
    spinlock for pointer maintenance.

    Since netfs operations approach these structures from the cookie side, they get
    the cookie lock first, then the object lock. Cache operations, on the other
    hand, approach from the object side, and get the object lock first. It is not
    then permitted for a cache operation to get the cookie lock whilst it is
    holding the object lock lest deadlock occur; instead, it must do one of two
    things:

    (1) increment the cookie usage counter, drop the object lock and then get both
    locks in order, or

    (2) simply hold the object lock as certain parts of the cookie may not be
    altered whilst the object lock is held.

    It is also not permitted to follow either pointer without holding the lock at
    the end you start with. To break the pointers between the cookie and the
    object, both locks must be held.

    fscache_write_op(), however, violates the locking rules: It attempts to get the
    cookie lock without (a) checking that the cookie pointer is a valid pointer,
    and (b) holding the object lock to protect the cookie pointer whilst it follows
    it. This is so that it can access the pending page store tree without
    interference from __fscache_write_page().

    This is fixed by splitting the cookie lock, such that the page store tracking
    tree is protected by its own lock, and checking that the cookie pointer is
    non-NULL before we attempt to follow it whilst holding the object lock.

    The new lock is subordinate to both the cookie lock and the object lock, and so
    should be taken after those.
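    The resulting nesting, sketched (the new lock is the stores_lock seen in
    the lockdep splat quoted in the next entry):

        spin_lock(&object->lock);
        cookie = object->cookie;
        if (cookie) {           /* pointer checked under the object lock */
                spin_lock(&cookie->stores_lock);
                /* ... operate on the page store tracking tree ... */
                spin_unlock(&cookie->stores_lock);
        }
        spin_unlock(&object->lock);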

    Signed-off-by: David Howells

    David Howells
     
  • __fscache_write_page() attempts to load the radix tree preallocation pool for
    the CPU it is on before calling radix_tree_insert(), as the insertion must be
    done inside a pair of spinlocks.

    Use of the preallocation pool, however, is contingent on the radix tree being
    initialised without __GFP_WAIT specified. __fscache_acquire_cookie() was
    passing GFP_NOFS to INIT_RADIX_TREE() - but that includes __GFP_WAIT.

    The solution is to AND out __GFP_WAIT.
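    That is (sketch; 'stores' being the cookie's page store tracking tree):

        INIT_RADIX_TREE(&cookie->stores, GFP_NOFS & ~__GFP_WAIT);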

    Additionally, the banner comment to radix_tree_preload() is altered to make
    note of this prerequisite. Possibly there should be a WARN_ON() too.

    Without this fix, I have seen the following recursive deadlock caused by
    radix_tree_insert() attempting to allocate memory inside the spinlocked
    region, which resulted in FS-Cache being called back into to release memory -
    which required the spinlock already held.

    =============================================
    [ INFO: possible recursive locking detected ]
    2.6.32-rc6-cachefs #24
    ---------------------------------------------
    nfsiod/7916 is trying to acquire lock:
    (&cookie->lock){+.+.-.}, at: [] __fscache_uncache_page+0xdb/0x160 [fscache]

    but task is already holding lock:
    (&cookie->lock){+.+.-.}, at: [] __fscache_write_page+0x15c/0x3f3 [fscache]

    other info that might help us debug this:
    5 locks held by nfsiod/7916:
    #0: (nfsiod){+.+.+.}, at: [] worker_thread+0x19a/0x2e2
    #1: (&task->u.tk_work#2){+.+.+.}, at: [] worker_thread+0x19a/0x2e2
    #2: (&cookie->lock){+.+.-.}, at: [] __fscache_write_page+0x15c/0x3f3 [fscache]
    #3: (&object->lock#2){+.+.-.}, at: [] __fscache_write_page+0x197/0x3f3 [fscache]
    #4: (&cookie->stores_lock){+.+...}, at: [] __fscache_write_page+0x19f/0x3f3 [fscache]

    stack backtrace:
    Pid: 7916, comm: nfsiod Not tainted 2.6.32-rc6-cachefs #24
    Call Trace:
    [] __lock_acquire+0x1649/0x16e3
    [] ? __lock_acquire+0x7b7/0x16e3
    [] ? dump_trace+0x248/0x257
    [] lock_acquire+0x57/0x6d
    [] ? __fscache_uncache_page+0xdb/0x160 [fscache]
    [] _spin_lock+0x2c/0x3b
    [] ? __fscache_uncache_page+0xdb/0x160 [fscache]
    [] __fscache_uncache_page+0xdb/0x160 [fscache]
    [] ? __fscache_check_page_write+0x0/0x71 [fscache]
    [] nfs_fscache_release_page+0x86/0xc4 [nfs]
    [] nfs_release_page+0x3c/0x41 [nfs]
    [] try_to_release_page+0x32/0x3b
    [] shrink_page_list+0x316/0x4ac
    [] ? mark_held_locks+0x52/0x70
    [] ? _spin_unlock_irq+0x2b/0x31
    [] shrink_inactive_list+0x392/0x67c
    [] ? mark_held_locks+0x52/0x70
    [] shrink_list+0x8d/0x8f
    [] shrink_zone+0x278/0x33c
    [] ? ktime_get_ts+0xad/0xba
    [] try_to_free_pages+0x22e/0x392
    [] ? isolate_pages_global+0x0/0x212
    [] __alloc_pages_nodemask+0x3dc/0x5cf
    [] cache_alloc_refill+0x34d/0x6c1
    [] ? radix_tree_node_alloc+0x52/0x5c
    [] kmem_cache_alloc+0xb2/0x118
    [] radix_tree_node_alloc+0x52/0x5c
    [] radix_tree_insert+0x57/0x19c
    [] __fscache_write_page+0x1e3/0x3f3 [fscache]
    [] __nfs_readpage_to_fscache+0x58/0x11e [nfs]
    [] nfs_readpage_release+0x34/0x9b [nfs]
    [] nfs_readpage_release_full+0x32/0x4b [nfs]
    [] rpc_release_calldata+0x12/0x14 [sunrpc]
    [] rpc_free_task+0x59/0x61 [sunrpc]
    [] rpc_async_release+0x10/0x12 [sunrpc]
    [] worker_thread+0x1ef/0x2e2
    [] ? worker_thread+0x19a/0x2e2
    [] ? thread_return+0x3e/0x101
    [] ? rpc_async_release+0x0/0x12 [sunrpc]
    [] ? autoremove_wake_function+0x0/0x34
    [] ? trace_hardirqs_on+0xd/0xf
    [] ? worker_thread+0x0/0x2e2
    [] kthread+0x7a/0x82
    [] child_rip+0xa/0x20
    [] ? restore_args+0x0/0x30
    [] ? add_wait_queue+0x15/0x44
    [] ? kthread+0x0/0x82
    [] ? child_rip+0x0/0x20

    Signed-off-by: David Howells

    David Howells
     
  • Clear the pointers from the fscache_cookie struct to netfs private data after
    clearing the pointer to the cookie from the fscache_object struct and
    releasing the object lock, rather than before.

    This allows the netfs private data pointers to be relied on simply by holding
    the object lock, rather than having to hold the cookie lock. This makes
    things simpler as the cookie lock has to be taken before the object lock, but
    sometimes the object pointer is all that the code has.

    Signed-off-by: David Howells

    David Howells
     
  • Count entries to and exits from cache operation table functions. Maintain
    these as a single counter that's added to or removed from as appropriate.

    Signed-off-by: David Howells

    David Howells
     
  • Allow the current state of all fscache objects to be dumped by doing:

    cat /proc/fs/fscache/objects

    By default, all objects and all fields will be shown. This can be restricted
    by adding a suitable key to one of the caller's keyrings (such as the session
    keyring):

    keyctl add user fscache:objlist "<restrictions>" @s

    The <restrictions> are:

    K Show hexdump of object key (don't show if not given)
    A Show hexdump of object aux data (don't show if not given)

    And paired restrictions:

    C Show objects that have a cookie
    c Show objects that don't have a cookie
    B Show objects that are busy
    b Show objects that aren't busy
    W Show objects that have pending writes
    w Show objects that don't have pending writes
    R Show objects that have outstanding reads
    r Show objects that don't have outstanding reads
    S Show objects that have slow work queued
    s Show objects that don't have slow work queued

    If neither side of a restriction pair is given, then both are implied. For
    example:

    keyctl add user fscache:objlist KB @s

    shows objects that are busy, and lists their object keys, but does not dump
    their auxiliary data. It also implies "CcWwRrSs", but as 'B' is given, 'b' is
    not implied.

    Signed-off-by: David Howells

    David Howells
     

03 Apr, 2009

2 commits

  • Implement the cookie management part of the FS-Cache netfs client API. The
    documentation and API header file were added in a previous patch.

    This patch implements the following three functions:

    (1) fscache_acquire_cookie().

    Acquire a cookie to represent an object to the netfs. If the object in
    question is a non-index object, then that object and its parent indices
    will be created on disk at this point if they don't already exist. Index
    creation is deferred because an index may reside in multiple caches.

    (2) fscache_relinquish_cookie().

    Retire or release a cookie previously acquired. At this point, the
    object on disk may be destroyed.

    (3) fscache_update_cookie().

    Update the in-cache representation of a cookie. This is used to update
    the auxiliary data for coherency management purposes.

    With this patch it is possible to have a netfs instruct a cache backend to
    look up, validate and create metadata on disk and to destroy it again.
    The ability to actually store and retrieve data in the objects so created is
    added in later patches.

    Note that these functions will never return an error. _All_ errors are
    handled internally to FS-Cache.

    The worst that can happen is that fscache_acquire_cookie() may return a NULL
    pointer - which is considered a negative cookie pointer and can be passed back
    to any function that takes a cookie without harm. A negative cookie pointer
    merely suppresses caching at that level.

    The stub in linux/fscache.h will detect inline the negative cookie pointer and
    abort the operation as fast as possible. This means that the compiler doesn't
    have to set up for a call in that case.
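    A hedged usage sketch of the three calls (parameters abridged to the
    original three-argument acquire; exact signatures are in the API header):

        cookie = fscache_acquire_cookie(parent, &my_object_def, netfs_priv);
        /* NULL is a negative cookie: safe to pass on, caching is off */

        fscache_update_cookie(cookie);        /* refresh coherency aux data */

        fscache_relinquish_cookie(cookie, 0); /* 0 = release, 1 = retire */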

    See the documentation in Documentation/filesystems/caching/netfs-api.txt for
    more information.

    Signed-off-by: David Howells
    Acked-by: Steve Dickson
    Acked-by: Trond Myklebust
    Acked-by: Al Viro
    Tested-by: Daire Byrne

    David Howells
     
  • Provide a slab from which can be allocated the FS-Cache cookies that will be
    presented to the netfs.

    Also provide a slab constructor and a function to recursively discard a cookie
    and its ancestor chain.

    Signed-off-by: David Howells
    Acked-by: Steve Dickson
    Acked-by: Trond Myklebust
    Acked-by: Al Viro
    Tested-by: Daire Byrne

    David Howells