31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

25 Jul, 2018

1 commit

  • Alter the state-check assertion in fscache_enqueue_operation() to allow
    cancelled operations to be given processing time so they can be cleaned up.

    Also fix a debugging statement that was requiring such operations to have
    an object assigned.

    Fixes: 9ae326a69004 ("CacheFiles: A cache that backs onto a mounted filesystem")
    Reported-by: Kiran Kumar Modukuri
    Signed-off-by: David Howells

    Kiran Kumar Modukuri
     

04 Apr, 2018

2 commits

  • Attach copies of the index key and auxiliary data to the fscache cookie so
    that:

    (1) The callbacks to the netfs for this stuff can be eliminated. This
    can simplify things in the cache as the information is still
    available, even after the cache has relinquished the cookie.

    (2) Simplifies the locking requirements of accessing the information as we
    don't have to worry about the netfs object going away on us.

    (3) The cache can do lazy updating of the coherency information on disk.
    As long as the cache is flushed before reboot/poweroff, there's no
    need to update the coherency info on disk every time it changes.

    (4) Cookies can be hashed or put in a tree as the index key is easily
    available. This allows:

    (a) Checks for duplicate cookies can be made at the top fscache layer
    rather than down in the bowels of the cache backend.

    (b) Caching can be added to a netfs object that has a cookie if the
    cache is brought online after the netfs object is allocated.

    A certain amount of space is made in the cookie for inline copies of the
    data, but if it won't fit there, extra memory will be allocated for it.

    The downside of this is that live cache operation requires more memory.
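
    As a minimal illustration of the inline-or-allocate scheme described above
    (the struct layout and names below are hypothetical, not the actual
    fscache_cookie definition):

    #include <linux/types.h>
    #include <linux/string.h>
    #include <linux/slab.h>
    #include <linux/errno.h>

    #define EX_INLINE_KEY_LEN 16            /* space reserved inside the cookie */

    struct ex_cookie {
            u8      key_len;
            union {
                    u8      inline_key[EX_INLINE_KEY_LEN];
                    void    *key;           /* used when the key doesn't fit inline */
            };
    };

    static int ex_cookie_set_key(struct ex_cookie *cookie,
                                 const void *index_key, u8 key_len)
    {
            void *buf;

            cookie->key_len = key_len;
            if (key_len <= EX_INLINE_KEY_LEN) {
                    memcpy(cookie->inline_key, index_key, key_len);
                    return 0;
            }

            buf = kmemdup(index_key, key_len, GFP_KERNEL);  /* the extra memory */
            if (!buf)
                    return -ENOMEM;
            cookie->key = buf;
            return 0;
    }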

    Signed-off-by: David Howells
    Acked-by: Anna Schumaker
    Tested-by: Steve Dickson

    David Howells
     
  • Add more tracepoints to fscache, including:

    (*) fscache_page - Tracks netfs pages known to fscache.

    (*) fscache_check_page - Tracks the netfs querying whether a page is
    pending storage.

    (*) fscache_wake_cookie - Tracks cookies being woken up after a page
    completes/aborts storage in the cache.

    (*) fscache_op - Tracks operations being initialised.

    (*) fscache_wrote_page - Tracks return of the backend write_page op.

    (*) fscache_gang_lookup - Tracks lookup of pages to be stored in the write
    operation.

    Signed-off-by: David Howells

    David Howells
     

02 Apr, 2015

10 commits

  • Now that the retrieval operation may be disposed of by fscache_put_operation()
    before we actually set the context, the retrieval-specific cleanup operation
    can produce a NULL-pointer dereference when it tries to unconditionally clean
    up the netfs context.

    Given that it is expected that we'll get at least as far as the place where we
    currently set the context pointer and it is unlikely we'll go through the
    error handling paths prior to that point, retain the context right from the
    point that the retrieval op is allocated.

    Concomitant to this, we need to retain the cookie pointer in the retrieval op
    also so that we can call the netfs to release its context in the release
    method.

    In addition, we might now get into fscache_release_retrieval_op() with the op
    only initialised. To this end, set the operation to DEAD only after the
    release method has been called and skip the n_pages test upon cleanup if the
    op is still in the INITIALISED state.

    Without these changes, the following oops might be seen:

    BUG: unable to handle kernel NULL pointer dereference at 00000000000000b8
    ...
    RIP: 0010:[] fscache_release_retrieval_op+0xae/0x100
    ...
    Call Trace:
    [] fscache_put_operation+0x117/0x2e0
    [] __fscache_read_or_alloc_pages+0x351/0x3ac
    [] __nfs_readpages_from_fscache+0x59/0xbf [nfs]
    [] nfs_readpages+0x10c/0x185 [nfs]
    [] ? alloc_pages_current+0x119/0x13e
    [] ? __page_cache_alloc+0xfb/0x10a
    [] __do_page_cache_readahead+0x188/0x22c
    [] ondemand_readahead+0x29e/0x2af
    [] page_cache_sync_readahead+0x38/0x3a
    [] generic_file_read_iter+0x1a2/0x55a
    [] ? nfs_revalidate_mapping+0xd6/0x288 [nfs]
    [] nfs_file_read+0x49/0x70 [nfs]
    [] new_sync_read+0x78/0x9c
    [] __vfs_read+0x13/0x38
    [] vfs_read+0x95/0x121
    [] SyS_read+0x4c/0x8a
    [] system_call_fastpath+0x12/0x17

    Signed-off-by: David Howells
    Reviewed-by: Steve Dickson
    Acked-by: Jeff Layton

    David Howells
     
  • Any time an incomplete operation is cancelled, the operation cancellation
    function needs to be called to clean up. This is currently being passed
    directly to some of the functions that might want to call it, but not all.

    Instead, pass the cancellation method pointer to the fscache_operation_init()
    and have that cache it in the operation struct. Further, plug in a dummy
    cancellation handler if the caller declines to set one as this allows us to
    call the function unconditionally (the extra overhead isn't worth bothering
    about as we don't expect to be calling this typically).

    The cancellation method must thence be called everywhere the CANCELLED state
    is set. Note that we call it *before* setting the CANCELLED state such that
    the method can use the old state value to guide its operation.

    fscache_do_cancel_retrieval() needs moving higher up in the sources so that
    the init function can use it now.
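
    A rough sketch of the init-time plumbing described above; the type and
    function names here are invented for illustration (the real entry point is
    fscache_operation_init()):

    struct ex_operation;

    typedef void (*ex_op_cancel_t)(struct ex_operation *);
    typedef void (*ex_op_release_t)(struct ex_operation *);

    struct ex_operation {
            int                     state;          /* e.g. INITIALISED */
            ex_op_cancel_t          cancel;
            ex_op_release_t         release;
    };

    /* Dummy handler so op->cancel can be called unconditionally wherever the
     * op is moved to the CANCELLED state. */
    static void ex_op_cancel_nop(struct ex_operation *op)
    {
    }

    static void ex_operation_init(struct ex_operation *op,
                                  ex_op_cancel_t cancel,
                                  ex_op_release_t release)
    {
            op->state   = 0;
            op->cancel  = cancel ? cancel : ex_op_cancel_nop;
            op->release = release;
    }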

    Without this, the following oops may be seen:

    FS-Cache: Assertion failed
    FS-Cache: 3 == 0 is false
    ------------[ cut here ]------------
    kernel BUG at ../fs/fscache/page.c:261!
    ...
    RIP: 0010:[] fscache_release_retrieval_op+0x77/0x100
    [] fscache_put_operation+0x114/0x2da
    [] __fscache_read_or_alloc_pages+0x358/0x3b3
    [] __nfs_readpages_from_fscache+0x59/0xbf [nfs]
    [] nfs_readpages+0x10c/0x185 [nfs]
    [] ? alloc_pages_current+0x119/0x13e
    [] ? __page_cache_alloc+0xfb/0x10a
    [] __do_page_cache_readahead+0x188/0x22c
    [] ondemand_readahead+0x29e/0x2af
    [] page_cache_sync_readahead+0x38/0x3a
    [] generic_file_read_iter+0x1a2/0x55a
    [] ? nfs_revalidate_mapping+0xd6/0x288 [nfs]
    [] nfs_file_read+0x49/0x70 [nfs]
    [] new_sync_read+0x78/0x9c
    [] __vfs_read+0x13/0x38
    [] vfs_read+0x95/0x121
    [] SyS_read+0x4c/0x8a
    [] system_call_fastpath+0x12/0x17

    The assertion is showing that the remaining number of pages (n_pages) is not 0
    when the operation is being released.

    Signed-off-by: David Howells
    Reviewed-by: Steve Dickson
    Acked-by: Jeff Layton

    David Howells
     
  • Call fscache_put_operation() or a wrapper on any op that has gone through
    fscache_operation_init() so that the accounting shown in /proc is done
    correctly, specifically fscache_n_op_release.

    fscache_put_operation() therefore now allows an op in the INITIALISED state as
    well as in the CANCELLED and COMPLETE states.

    Note that this means that an operation can get put that doesn't have its
    ->object pointer filled in, so anything that depends on the object needs to be
    conditional in fscache_put_operation().

    Signed-off-by: David Howells
    Reviewed-by: Steve Dickson
    Acked-by: Jeff Layton

    David Howells
     
  • Cancellation of an in-progress operation needs to update the relevant counters
    and start any operations that are pending waiting on this one.

    Signed-off-by: David Howells
    Reviewed-by: Steve Dickson
    Acked-by: Jeff Layton

    David Howells
     
  • Count and display through /proc/fs/fscache/stats the number of initialised
    operations.

    Signed-off-by: David Howells
    Reviewed-by: Steve Dickson
    Acked-by: Jeff Layton

    David Howells
     
  • Out of line fscache_operation_init() so that it can access internal FS-Cache
    features, such as stats, in a later commit.

    Signed-off-by: David Howells
    Reviewed-by: Steve Dickson
    Acked-by: Jeff Layton

    David Howells
     
  • Currently, fscache_cancel_op() only cancels pending operations - attempts to
    cancel in-progress operations are ignored. This leads to a problem in
    fscache_wait_for_operation_activation() whereby the wait is terminated, but
    the object has been killed.

    The check at the end of the function now triggers because it's no longer
    contingent on the cache having produced an I/O error since the commit that
    fixed the logic error in fscache_object_is_dead().

    The result of the check is that it tries to cancel the operation - but since
    the operation may no longer be pending by this point, the cancellation request
    may be ignored - with the result that the operation is just put by the caller
    and fscache_put_operation() has an assertion failure because the operation
    isn't in either the COMPLETE or the CANCELLED states.

    To fix this, we permit in-progress ops to be cancelled under some
    circumstances.

    The bug results in an oops that looks something like this:

    FS-Cache: fscache_wait_for_operation_activation() = -ENOBUFS [obj dead 3]
    FS-Cache:
    FS-Cache: Assertion failed
    FS-Cache: 3 == 5 is false
    ------------[ cut here ]------------
    kernel BUG at ../fs/fscache/operation.c:432!
    ...
    RIP: 0010:[] fscache_put_operation+0xf2/0x2cd
    Call Trace:
    [] __fscache_read_or_alloc_pages+0x2ec/0x3b3
    [] __nfs_readpages_from_fscache+0x59/0xbf [nfs]
    [] nfs_readpages+0x10c/0x185 [nfs]
    [] ? alloc_pages_current+0x119/0x13e
    [] ? __page_cache_alloc+0xfb/0x10a
    [] __do_page_cache_readahead+0x188/0x22c
    [] ondemand_readahead+0x29e/0x2af
    [] page_cache_sync_readahead+0x38/0x3a
    [] generic_file_read_iter+0x1a2/0x55a
    [] ? nfs_revalidate_mapping+0xd6/0x288 [nfs]
    [] nfs_file_read+0x49/0x70 [nfs]
    [] new_sync_read+0x78/0x9c
    [] __vfs_read+0x13/0x38
    [] vfs_read+0x95/0x121
    [] SyS_read+0x4c/0x8a
    [] system_call_fastpath+0x12/0x17

    Signed-off-by: David Howells
    Reviewed-by: Steve Dickson
    Acked-by: Jeff Layton

    David Howells
     
  • Reject new operations that are being submitted against an object if that
    object has failed its lookup or creation states or has been killed by the
    cache backend for some other reason, such as having been culled.

    Signed-off-by: David Howells
    Reviewed-by: Steve Dickson
    Acked-by: Jeff Layton

    David Howells
     
  • When submitting an operation, prefer to cancel the operation immediately
    rather than queuing it for later processing if the object is marked as dying
    (ie. the object state machine has reached the KILL_OBJECT state).

    Whilst we're at it, change the series of related test_bit() calls into a
    READ_ONCE() and bitwise-AND operators to reduce the number of load
    instructions (test_bit() has a volatile address).
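
    A hedged sketch of the kind of transformation meant here; the flag names and
    helper are illustrative, not the actual patch:

    #include <linux/bitops.h>
    #include <linux/compiler.h>
    #include <linux/types.h>

    #define EX_OBJECT_IS_AVAILABLE  0       /* hypothetical flag bits */
    #define EX_OBJECT_KILLED        1

    struct ex_object {
            unsigned long flags;
    };

    /* Before: each test_bit() performs its own volatile load of ->flags.
     * After: take one READ_ONCE() snapshot and test plain bits on it. */
    static bool ex_object_is_usable(const struct ex_object *object)
    {
            unsigned long flags = READ_ONCE(object->flags);

            return (flags & BIT(EX_OBJECT_IS_AVAILABLE)) &&
                   !(flags & BIT(EX_OBJECT_KILLED));
    }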

    Signed-off-by: David Howells
    Reviewed-by: Steve Dickson
    Acked-by: Jeff Layton

    David Howells
     
  • Move fscache_report_unexpected_submission() up within operation.c so that it
    can be called from fscache_submit_exclusive_op() too.

    Signed-off-by: David Howells
    Reviewed-by: Steve Dickson
    Acked-by: Jeff Layton

    David Howells
     

19 Jun, 2013

4 commits

  • Under certain circumstances, spin_is_locked() is hardwired to 0 - even when the
    code would normally be in a locked section where it should return 1. This
    means it cannot be used for an assertion that checks that a spinlock is locked.

    Remove such usages from FS-Cache.
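
    For illustration only (the patch itself simply removes the assertions):
    where a held-lock check is genuinely wanted, lockdep offers a form that
    stays reliable on uniprocessor builds, roughly:

    #include <linux/spinlock.h>
    #include <linux/lockdep.h>

    struct ex_object {
            spinlock_t lock;
    };

    static void ex_start_operations(struct ex_object *object)
    {
            /* On !CONFIG_SMP builds without spinlock debugging, spinlocks
             * compile away and spin_is_locked() may return 0 even here, so it
             * must not back an assertion; lockdep_assert_held() is safe. */
            lockdep_assert_held(&object->lock);

            /* ... start the queued operations ... */
    }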

    The following oops might otherwise be observed:

    FS-Cache: Assertion failed
    BUG: failure at fs/fscache/operation.c:270/fscache_start_operations()!
    Kernel panic - not syncing: BUG!
    CPU: 0 PID: 10 Comm: kworker/u2:1 Not tainted 3.10.0-rc1-00133-ge7ebb75 #2
    Workqueue: fscache_operation fscache_op_work_func [fscache]
    7f091c48 603c8947 7f090000 7f9b1361 7f25f080 00000001 7f26d440 7f091c90
    60299eb8 7f091d90 602951c5 7f26d440 3000000008 7f091da0 7f091cc0 7f091cd0
    00000007 00000007 00000006 7f091ae0 00000010 0000010e 7f9af330 7f091ae0
    Call Trace:
    7f091c88: [] dump_stack+0x17/0x19
    7f091c98: [] panic+0xf4/0x1e9
    7f091d38: [] set_signals+0x1e/0x40
    7f091d58: [] __wake_up+0x4e/0x70
    7f091d98: [] fscache_start_operations+0x43/0x50 [fscache]
    7f091da8: [] fscache_op_complete+0x1d3/0x220 [fscache]
    7f091db8: [] unlock_page+0x55/0x60
    7f091de8: [] cachefiles_read_copier+0x250/0x330 [cachefiles]
    7f091e58: [] fscache_op_work_func+0xac/0x120 [fscache]
    7f091e88: [] process_one_work+0x250/0x3a0
    7f091ef8: [] worker_thread+0x177/0x2a0
    7f091f38: [] worker_thread+0x0/0x2a0
    7f091f58: [] kthread+0xd8/0xe0
    7f091f68: [] finish_task_switch.isra.64+0x37/0xa0
    7f091fd8: [] new_thread_handler+0x8f/0xb0

    Reported-by: Milosz Tanski
    Signed-off-by: David Howells
    Reviewed-and-tested-By: Milosz Tanski

    David Howells
     
  • Simplify the way fscache cache objects retain their cookie. The way I
    implemented the cookie storage handling made synchronisation a pain (ie. the
    object state machine can't rely on the cookie actually still being there).

    Instead of the object being detached from the cookie and the cookie being
    freed in __fscache_relinquish_cookie(), we defer both operations:

    (*) The detachment of the object from the list in the cookie now takes place
    in fscache_drop_object() and is thus governed by the object state machine
    (fscache_detach_from_cookie() has been removed).

    (*) The release of the cookie is now in fscache_object_destroy() - which is
    called by the cache backend just before it frees the object.

    This means that the fscache_cookie struct is now available to the cache all the
    way through from ->alloc_object() to ->drop_object() and ->put_object() -
    meaning that it's no longer necessary to take object->lock to guarantee access.

    However, __fscache_relinquish_cookie() doesn't wait for the object to go all
    the way through to destruction before letting the netfs proceed. That would
    massively slow down the netfs. Since __fscache_relinquish_cookie() leaves the
    cookie around, it must therefore break all attachments to the netfs - which
    includes ->def, ->netfs_data and any outstanding page read/writes.

    To handle this, struct fscache_cookie now has an n_active counter:

    (1) This starts off initialised to 1.

    (2) Any time the cache needs to get at the netfs data, it calls
    fscache_use_cookie() to increment it - if it is not zero. If it was zero,
    then access is not permitted.

    (3) When the cache has finished with the data, it calls fscache_unuse_cookie()
    to decrement it. This does a wake-up on it if it reaches 0.

    (4) __fscache_relinquish_cookie() decrements n_active and then waits for it to
    reach 0. The initialisation to 1 in step (1) ensures that we only get
    wake ups when we're trying to get rid of the cookie.

    This leaves __fscache_relinquish_cookie() a lot simpler.
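
    A minimal sketch of the n_active scheme in points (1)-(4) above, using
    invented types (the real helpers are fscache_use_cookie() and
    fscache_unuse_cookie()):

    #include <linux/atomic.h>
    #include <linux/types.h>
    #include <linux/wait.h>

    struct ex_cookie {
            atomic_t                n_active;       /* initialised to 1 */
            wait_queue_head_t       waitq;          /* init_waitqueue_head() */
    };

    /* Cache side: pin the netfs data, but only if the cookie is still live. */
    static bool ex_use_cookie(struct ex_cookie *cookie)
    {
            return atomic_inc_not_zero(&cookie->n_active) != 0;
    }

    static void ex_unuse_cookie(struct ex_cookie *cookie)
    {
            if (atomic_dec_and_test(&cookie->n_active))
                    wake_up(&cookie->waitq);
    }

    /* Netfs side: drop the initial count and wait for all users to go away. */
    static void ex_relinquish_cookie(struct ex_cookie *cookie)
    {
            ex_unuse_cookie(cookie);
            wait_event(cookie->waitq, atomic_read(&cookie->n_active) == 0);
    }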

    ***
    This fixes a problem in the current code whereby if fscache_invalidate() is
    followed sufficiently quickly by fscache_relinquish_cookie() then it is
    possible for __fscache_relinquish_cookie() to have detached the cookie from the
    object and cleared the pointer before a thread is dispatched to process the
    invalidation state in the object state machine.

    Since the pending write clearance was deferred to the invalidation state to
    make it asynchronous, we need to either wait in relinquishment for the stores
    tree to be cleared in the invalidation state or we need to handle the clearance
    in relinquishment.

    Further, if the relinquishment code does clear the tree, then the invalidation
    state needs to make the clearance contingent on still having the cookie to hand
    (since that's where the tree is rooted) and we have to prevent the cookie from
    disappearing for the duration.

    This can lead to an oops like the following:

    BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
    ...
    RIP: 0010:[] _spin_lock+0xe/0x30
    ...
    CR2: 000000000000000c ...
    ...
    Process kslowd002 (...)
    ....
    Call Trace:
    [] fscache_invalidate_writes+0x38/0xd0 [fscache]
    [] ? __switch_to+0xd0/0x320
    [] ? find_busiest_queue+0x69/0x150
    [] ? slow_work_enqueue+0x104/0x180
    [] fscache_object_slow_work_execute+0x5e3/0x9d0 [fscache]
    [] ? bit_waitqueue+0x17/0xd0
    [] slow_work_execute+0x233/0x310
    [] slow_work_thread+0x205/0x360
    [] ? autoremove_wake_function+0x0/0x40
    [] ? slow_work_thread+0x0/0x360
    [] kthread+0x96/0xa0
    [] child_rip+0xa/0x20
    [] ? kthread+0x0/0xa0
    [] ? child_rip+0x0/0x20

    The parameter to fscache_invalidate_writes() was object->cookie which is NULL.

    Signed-off-by: David Howells
    Tested-By: Milosz Tanski
    Acked-by: Jeff Layton

    David Howells
     
  • Fix object state machine to have separate work and wait states as that makes
    it easier to envision.

    There are now three kinds of state:

    (1) Work state. This is an execution state. No event processing is performed
    by a work state. The function attached to a work state returns a pointer
    indicating the next state to which the OSM should transition. Returning
    NO_TRANSIT repeats the current state, but goes back to the scheduler
    first.

    (2) Wait state. This is an event processing state. No execution is
    performed by a wait state. Wait states are just tables of "if event X
    occurs, clear it and transition to state Y". The dispatcher returns to
    the scheduler if none of the events in which the wait state has an
    interest are currently pending.

    (3) Out-of-band state. This is a special work state. Transitions to normal
    states can be overridden when an unexpected event occurs (eg. I/O error).
    Instead the dispatcher disables and clears the OOB event and transits to
    the specified work state. This then acts as an ordinary work state,
    though object->state points to the overridden destination. Returning
    NO_TRANSIT resumes the overridden transition.

    In addition, the states have names in their definitions, so there's no need for
    tables of state names. Further, the EV_REQUEUE event is no longer necessary as
    that is automatic for work states.

    Since the states are now separate structs rather than values in an enum, it's
    not possible to use comparisons other than (non-)equality between them, so use
    some object->flags to indicate what phase an object is in.

    The EV_RELEASE, EV_RETIRE and EV_WITHDRAW events have been squished into one
    (EV_KILL). An object flag now carries the information about retirement.

    Similarly, the RELEASING, RECYCLING and WITHDRAWING states have been merged
    into a KILL_OBJECT state and additional states have been added for handling
    waiting dependent objects (JUMPSTART_DEPS and KILL_DEPENDENTS).

    A state has also been added for synchronising with parent object initialisation
    (WAIT_FOR_PARENT) and another for initiating look up (PARENT_READY).
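
    By way of illustration, states defined as named structs whose work function
    returns the next state might look roughly like this (heavily simplified,
    with invented names):

    struct ex_object;
    struct ex_state;

    typedef const struct ex_state *(*ex_work_t)(struct ex_object *);

    struct ex_state {
            const char      *name;  /* no separate table of state names needed */
            ex_work_t       work;   /* set for work states, NULL for wait states */
    };

    static const struct ex_state *ex_do_some_work(struct ex_object *object);

    static const struct ex_state EX_WAIT_FOR_EVENT = {
            .name = "WAIT_FOR_EVENT",
            .work = NULL,           /* wait state: an event->state table would go here */
    };

    static const struct ex_state EX_DO_SOME_WORK = {
            .name = "DO_SOME_WORK",
            .work = ex_do_some_work,
    };

    static const struct ex_state *ex_do_some_work(struct ex_object *object)
    {
            /* ... do the work for this state ... */
            return &EX_WAIT_FOR_EVENT;      /* tell the dispatcher where to go next */
    }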

    Signed-off-by: David Howells
    Tested-By: Milosz Tanski
    Acked-by: Jeff Layton

    David Howells
     
  • Wrap checks on object state (mostly outside of fs/fscache/object.c) with
    inline functions so that the mechanism can be replaced.

    Some of the state checks within object.c are left as-is as they will be
    replaced.
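
    The sort of wrapper meant here might look like the following sketch (the
    helper name and state encoding are illustrative):

    #include <linux/types.h>

    struct ex_object {
            int state;      /* stand-in for the current state representation */
    };

    #define EX_STATE_DYING  7       /* hypothetical value */

    /* Callers test the helper rather than comparing object->state themselves,
     * so the underlying representation can be swapped out later. */
    static inline bool ex_object_is_active(const struct ex_object *object)
    {
            return object->state < EX_STATE_DYING;
    }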

    Signed-off-by: David Howells
    Tested-By: Milosz Tanski
    Acked-by: Jeff Layton

    David Howells
     

21 Dec, 2012

6 commits

  • Provide fscache_cancel_op() with a pointer to a function it should invoke under
    lock if it cancels an operation.

    Use this to clear the remaining page count upon cancellation of a pending
    retrieval operation so that fscache_release_retrieval_op() doesn't get an
    assertion failure (see below). This can happen when a signal occurs, say from
    CTRL-C being pressed during data retrieval.

    FS-Cache: Assertion failed
    3 == 0 is false
    ------------[ cut here ]------------
    kernel BUG at fs/fscache/page.c:237!
    invalid opcode: 0000 [#641] SMP
    Modules linked in: cachefiles(F) nfsv4(F) nfsv3(F) nfsv2(F) nfs(F) fscache(F) auth_rpcgss(F) nfs_acl(F) lockd(F) sunrpc(F)
    CPU 0
    Pid: 6075, comm: slurp-q Tainted: GF D 3.7.0-rc8-fsdevel+ #411 /DG965RY
    RIP: 0010:[] [] fscache_release_retrieval_op+0x75/0xff [fscache]
    RSP: 0000:ffff88001c6d7988 EFLAGS: 00010296
    RAX: 000000000000000f RBX: ffff880014cdfe00 RCX: ffffffff6c102000
    RDX: ffffffff8102d1ad RSI: ffffffff6c102000 RDI: ffffffff8102d1d6
    RBP: ffff88001c6d7998 R08: 0000000000000002 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffe00
    R13: ffff88001c6d7ab4 R14: ffff88001a8638a0 R15: ffff88001552b190
    FS: 00007f877aaf0700(0000) GS:ffff88003bc00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00007fff11378fd2 CR3: 000000001c6c6000 CR4: 00000000000007f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process slurp-q (pid: 6075, threadinfo ffff88001c6d6000, task ffff88001c6c4080)
    Stack:
    ffffffffa007ec07 ffff880014cdfe00 ffff88001c6d79c8 ffffffffa007db4d
    ffffffffa007ec07 ffff880014cdfe00 00000000fffffe00 ffff88001c6d7ab4
    ffff88001c6d7a38 ffffffffa008116d 0000000000000000 ffff88001c6c4080
    Call Trace:
    [] ? fscache_cancel_op+0x194/0x1cf [fscache]
    [] fscache_put_operation+0x135/0x2ed [fscache]
    [] ? fscache_cancel_op+0x194/0x1cf [fscache]
    [] __fscache_read_or_alloc_pages+0x413/0x4bc [fscache]
    [] ? __alloc_pages_nodemask+0x195/0x75c
    [] __nfs_readpages_from_fscache+0x86/0x13d [nfs]
    [] nfs_readpages+0x186/0x1bd [nfs]
    [] ? alloc_pages_current+0xc7/0xe4
    [] ? __page_cache_alloc+0x84/0x91
    [] ? __do_page_cache_readahead+0xa6/0x2e0
    [] __do_page_cache_readahead+0x237/0x2e0
    [] ? __do_page_cache_readahead+0xa6/0x2e0
    [] ra_submit+0x1c/0x20
    [] ondemand_readahead+0x359/0x382
    [] page_cache_sync_readahead+0x38/0x3a
    [] generic_file_aio_read+0x26b/0x637
    [] ? nfs_mark_delegation_referenced+0xb/0xb [nfsv4]
    [] nfs_file_read+0xaa/0xcf [nfs]
    [] do_sync_read+0x91/0xd1
    [] vfs_read+0x9b/0x144
    [] sys_read+0x44/0x75
    [] system_call_fastpath+0x16/0x1b

    Signed-off-by: David Howells

    David Howells
     
  • Mark as cancelled an operation that is in progress rather than pending at the
    time it is cancelled, and call fscache_complete_op() to cancel an operation so
    that blocked ops can be started.

    Signed-off-by: David Howells

    David Howells
     
  • The function to submit an exclusive op (fscache_submit_exclusive_op()) can BUG
    if there's been an I/O error because it may see the parent cache object in an
    unexpected state. It should only BUG if there hasn't been an I/O error.

    In this case the problem was produced by remounting the cache partition to be
    R/O. The EROFS state was detected and the cache was aborted, but not
    everything handled the aborting correctly.

    SysRq : Emergency Remount R/O
    EXT4-fs (sda6): re-mounted. Opts: (null)
    Emergency Remount complete
    CacheFiles: I/O Error: Failed to update xattr with error -30
    FS-Cache: Cache cachefiles stopped due to I/O error
    ------------[ cut here ]------------
    kernel BUG at fs/fscache/operation.c:128!
    invalid opcode: 0000 [#1] SMP
    CPU 0
    Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc

    Pid: 6612, comm: kworker/u:2 Not tainted 3.1.0-rc8-fsdevel+ #1093 /DG965RY
    RIP: 0010:[] [] fscache_submit_exclusive_op+0x2ad/0x2c2 [fscache]
    RSP: 0018:ffff880000853d40 EFLAGS: 00010206
    RAX: ffff880038ac72a8 RBX: ffff8800181f2260 RCX: ffffffff81f2b2b0
    RDX: 0000000000000001 RSI: ffffffff8179a478 RDI: ffff8800181f2280
    RBP: ffff880000853d60 R08: 0000000000000002 R09: 0000000000000000
    R10: 0000000000000001 R11: 0000000000000001 R12: ffff880038ac7268
    R13: ffff8800181f2280 R14: ffff88003a359190 R15: 000000010122b162
    FS: 0000000000000000(0000) GS:ffff88003bc00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00000034cc4a77f0 CR3: 0000000010e96000 CR4: 00000000000006f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process kworker/u:2 (pid: 6612, threadinfo ffff880000852000, task ffff880014c3c040)
    Stack:
    ffff8800181f2260 ffff8800181f2310 ffff880038ac7268 ffff8800181f2260
    ffff880000853dc0 ffffffffa0072375 ffff880037ecfe00 ffff88003a359198
    ffff880000853dc0 0000000000000246 0000000000000000 ffff88000a91d308
    Call Trace:
    [] fscache_object_work_func+0x792/0xe65 [fscache]
    [] process_one_work+0x1eb/0x37f
    [] ? process_one_work+0x18d/0x37f
    [] ? fscache_enqueue_dependents+0xd8/0xd8 [fscache]
    [] worker_thread+0x15a/0x21a
    [] ? rescuer_thread+0x188/0x188
    [] kthread+0x7f/0x87
    [] kernel_thread_helper+0x4/0x10
    [] ? finish_task_switch+0x45/0xc0
    [] ? retint_restore_args+0xe/0xe
    [] ? __init_kthread_worker+0x53/0x53
    [] ? gs_change+0xb/0xb

    Signed-off-by: David Howells

    David Howells
     
  • Provide a proper invalidation method rather than relying on the netfs retiring
    the cookie it has and getting a new one. The problem with this is that it isn't
    easy for the netfs to make sure that it has completed/cancelled all its
    outstanding storage and retrieval operations on the cookie it is retiring.

    Instead, have the cache provide an invalidation method that will cancel or wait
    for all currently outstanding operations before invalidating the cache, and
    will cause new operations to queue up behind that. Whilst invalidation is in
    progress, some requests will be rejected until the cache can stack a barrier on
    the operation queue to cause new operations to be deferred behind it.

    Signed-off-by: David Howells

    David Howells
     
  • Fix the state management of internal fscache operations and the accounting of
    what operations are in what states.

    This is done by:

    (1) Give struct fscache_operation an enum variable that directly represents
    the state it's currently in, rather than spreading this knowledge across a
    bunch of flags, the identity of whoever is processing the operation at the
    moment and whether or not it is queued.

    This makes it easier to write assertions to check the state at various
    points and to prevent invalid state transitions.

    (2) Add an 'operation complete' state and supply a function to indicate the
    completion of an operation (fscache_op_complete()) and make things call
    it. The final call to fscache_put_operation() can then check that an op is
    in the appropriate state (complete or cancelled).

    (3) Adjust the use of object->n_ops, ->n_in_progress, ->n_exclusive to better
    govern the state of an object:

    (a) The ->n_ops is now the number of extant operations on the object
    and is now decremented by fscache_put_operation() only.

    (b) The ->n_in_progress is simply the number of operations that have been
    taken off of the object's pending queue for the purposes of being
    run. This is decremented by fscache_op_complete() only.

    (c) The ->n_exclusive is the number of exclusive ops that have been
    submitted and queued or are in progress. It is decremented by
    fscache_op_complete() and by fscache_cancel_op().

    fscache_put_operation() and fscache_operation_gc() now no longer try to
    clean up ->n_exclusive and ->n_in_progress. That was leading to double
    decrements against fscache_cancel_op().

    fscache_cancel_op() now no longer decrements ->n_ops. That was leading to
    double decrements against fscache_put_operation().

    fscache_submit_exclusive_op() now decides whether it has to queue an op
    based on ->n_in_progress being > 0 rather than ->n_ops > 0 as the latter
    will persist in being true even after all preceding operations have been
    cancelled or completed. Furthermore, if an object is active and there are
    runnable ops against it, there must be at least one op running.

    (4) Add a remaining-pages counter (n_pages) to struct fscache_retrieval and
    provide a function to record completion of the pages as they complete.

    When n_pages reaches 0, the operation is deemed to be complete and
    fscache_op_complete() is called.

    Add calls to fscache_retrieval_complete() anywhere we've finished with a
    page we've been given to read or allocate for. This includes places where
    we just return pages to the netfs for reading from the server and where
    accessing the cache fails and we discard the proposed netfs page.
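
    A hedged sketch of the pieces described in (1), (2) and (4) above
    (simplified; the real structures carry many more fields):

    #include <linux/atomic.h>

    enum ex_operation_state {               /* (1) one explicit state value */
            EX_OP_ST_INITIALISED,
            EX_OP_ST_PENDING,
            EX_OP_ST_IN_PROGRESS,
            EX_OP_ST_COMPLETE,
            EX_OP_ST_CANCELLED,
    };

    struct ex_operation {
            enum ex_operation_state state;
    };

    struct ex_retrieval {
            struct ex_operation     op;
            atomic_t                n_pages;        /* (4) pages still outstanding */
    };

    static void ex_op_complete(struct ex_operation *op)
    {
            op->state = EX_OP_ST_COMPLETE;  /* (2) would also drop n_in_progress */
    }

    /* Called each time one or more of the requested pages has been dealt with. */
    static void ex_retrieval_complete(struct ex_retrieval *r, int n_pages)
    {
            atomic_sub(n_pages, &r->n_pages);
            if (atomic_read(&r->n_pages) <= 0)
                    ex_op_complete(&r->op);
    }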

    The bugs in the unfixed state management manifest themselves as oopses like the
    following where the operation completion gets out of sync with return of the
    cookie by the netfs. This is possible because the cache unlocks and returns
    all the netfs pages before recording its completion - which means that there's
    nothing to stop the netfs discarding them and returning the cookie.

    FS-Cache: Cookie 'NFS.fh' still has outstanding reads
    ------------[ cut here ]------------
    kernel BUG at fs/fscache/cookie.c:519!
    invalid opcode: 0000 [#1] SMP
    CPU 1
    Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc

    Pid: 400, comm: kswapd0 Not tainted 3.1.0-rc7-fsdevel+ #1090 /DG965RY
    RIP: 0010:[] [] __fscache_relinquish_cookie+0x170/0x343 [fscache]
    RSP: 0018:ffff8800368cfb00 EFLAGS: 00010282
    RAX: 000000000000003c RBX: ffff880023cc8790 RCX: 0000000000000000
    RDX: 0000000000002f2e RSI: 0000000000000001 RDI: ffffffff813ab86c
    RBP: ffff8800368cfb50 R08: 0000000000000002 R09: 0000000000000000
    R10: ffff88003a1b7890 R11: ffff88001df6e488 R12: ffff880023d8ed98
    R13: ffff880023cc8798 R14: 0000000000000004 R15: ffff88003b8bf370
    FS: 0000000000000000(0000) GS:ffff88003bd00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00000000008ba008 CR3: 0000000023d93000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process kswapd0 (pid: 400, threadinfo ffff8800368ce000, task ffff88003b8bf040)
    Stack:
    ffff88003b8bf040 ffff88001df6e528 ffff88001df6e528 ffffffffa00b46b0
    ffff88003b8bf040 ffff88001df6e488 ffff88001df6e620 ffffffffa00b46b0
    ffff88001ebd04c8 0000000000000004 ffff8800368cfb70 ffffffffa00b2c91
    Call Trace:
    [] nfs_fscache_release_inode_cookie+0x3b/0x47 [nfs]
    [] nfs_clear_inode+0x3c/0x41 [nfs]
    [] nfs4_evict_inode+0x2f/0x33 [nfs]
    [] evict+0xa1/0x15c
    [] dispose_list+0x2c/0x38
    [] prune_icache_sb+0x28c/0x29b
    [] prune_super+0xd5/0x140
    [] shrink_slab+0x102/0x1ab
    [] balance_pgdat+0x2f2/0x595
    [] ? process_timeout+0xb/0xb
    [] kswapd+0x270/0x289
    [] ? __init_waitqueue_head+0x46/0x46
    [] ? balance_pgdat+0x595/0x595
    [] kthread+0x7f/0x87
    [] kernel_thread_helper+0x4/0x10
    [] ? finish_task_switch+0x45/0xc0
    [] ? retint_restore_args+0xe/0xe
    [] ? __init_kthread_worker+0x53/0x53
    [] ? gs_change+0xb/0xb

    Signed-off-by: David Howells

    David Howells
     
  • Make fscache_relinquish_cookie() log a warning and wait if there are any
    outstanding reads left on the cookie it was given.

    Signed-off-by: David Howells

    David Howells
     

15 Jan, 2011

1 commit

  • fscache_submit_exclusive_op() adds an operation to the pending list if
    other operations are pending. Fix the check for pending ops: n_ops is
    always greater than 0 at the point it is checked, because it has just been
    incremented for the op being submitted under the same lock.
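
    Put concretely, as a simplified sketch rather than the exact diff: because
    n_ops already counts the op being submitted, "other ops are pending" means
    the count exceeds one:

    #include <linux/list.h>
    #include <linux/spinlock.h>

    struct ex_object {
            spinlock_t              lock;
            int                     n_ops;
            struct list_head        pending_ops;
    };

    struct ex_operation {
            struct list_head        pend_link;
    };

    static void ex_submit_exclusive_op(struct ex_object *object,
                                       struct ex_operation *op)
    {
            spin_lock(&object->lock);
            object->n_ops++;                /* counts this op too */
            if (object->n_ops > 1) {        /* not "> 0", which is always true here */
                    list_add_tail(&op->pend_link, &object->pending_ops);
            } else {
                    /* ... start the operation straight away ... */
            }
            spin_unlock(&object->lock);
    }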

    Signed-off-by: Akshat Aranya
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    Akshat Aranya
     

23 Jul, 2010

1 commit

    Make fscache operations use only a workqueue instead of a combination of a
    workqueue and slow-work. FSCACHE_OP_SLOW is dropped and
    FSCACHE_OP_FAST is renamed to FSCACHE_OP_ASYNC and uses newly added
    fscache_op_wq workqueue to execute op->processor().
    fscache_operation_init_slow() is dropped and fscache_operation_init()
    now takes @processor argument directly.

    * Unbound workqueue is used.

    * fscache_retrieval_work() is no longer necessary as OP_ASYNC now does
    the equivalent thing.

    * sysctl fscache.operation_max_active added to control concurrency.
    The default value is nr_cpus clamped between 2 and
    WQ_UNBOUND_MAX_ACTIVE.

    * debugfs support is dropped for now. Tracing API based debug
    facility is planned to be added.
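
    A sketch of the unbound workqueue set-up with a clamped max_active, as
    mentioned in the list above (the initialisation code here is illustrative,
    not the actual patch):

    #include <linux/workqueue.h>
    #include <linux/cpumask.h>
    #include <linux/kernel.h>
    #include <linux/init.h>
    #include <linux/errno.h>

    static struct workqueue_struct *ex_op_wq;

    static int __init ex_init_op_wq(void)
    {
            unsigned int max_active =
                    clamp_val(num_possible_cpus(), 2u, WQ_UNBOUND_MAX_ACTIVE);

            ex_op_wq = alloc_workqueue("fscache_operation", WQ_UNBOUND,
                                       max_active);
            return ex_op_wq ? 0 : -ENOMEM;
    }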

    Signed-off-by: Tejun Heo
    Acked-by: David Howells

    Tejun Heo
     

30 Mar, 2010

2 commits

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming their availability. As this conversion
    needs to touch a large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of the
    specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     
  • CONFIG_SLOW_WORK_PROC was changed to CONFIG_SLOW_WORK_DEBUG, but not in all
    instances. Change the remaining instances. This makes the debugfs file
    display the time mark and the owner's description again.

    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     

20 Nov, 2009

5 commits

  • FS-Cache doesn't correctly handle the netfs requesting a read from the cache
    on an object that failed or was withdrawn by the cache. A trace similar to
    the following might be seen:

    CacheFiles: Lookup failed error -105
    [exe ] unexpected submission OP165afe [OBJ6cac OBJECT_LC_DYING]
    [exe ] objstate=OBJECT_LC_DYING [OBJECT_LC_DYING]
    [exe ] objflags=0
    [exe ] objevent=9 [fffffffffffffffb]
    [exe ] ops=0 inp=0 exc=0
    Pid: 6970, comm: exe Not tainted 2.6.32-rc6-cachefs #50
    Call Trace:
    [] fscache_submit_op+0x3ff/0x45a [fscache]
    [] __fscache_read_or_alloc_pages+0x187/0x3c4 [fscache]
    [] ? nfs_readpage_from_fscache_complete+0x0/0x66 [nfs]
    [] __nfs_readpages_from_fscache+0x7e/0x176 [nfs]
    [] ? __alloc_pages_nodemask+0x11c/0x5cf
    [] nfs_readpages+0x114/0x1d7 [nfs]
    [] __do_page_cache_readahead+0x15f/0x1ec
    [] ? __do_page_cache_readahead+0x73/0x1ec
    [] ra_submit+0x1c/0x20
    [] ondemand_readahead+0x227/0x23a
    [] page_cache_sync_readahead+0x17/0x19
    [] generic_file_aio_read+0x236/0x5a0
    [] nfs_file_read+0xe4/0xf3 [nfs]
    [] do_sync_read+0xe3/0x120
    [] ? _spin_unlock_irq+0x2b/0x31
    [] ? autoremove_wake_function+0x0/0x34
    [] ? selinux_file_permission+0x5d/0x10f
    [] ? thread_return+0x3e/0x101
    [] ? security_file_permission+0x11/0x13
    [] vfs_read+0xaa/0x16f
    [] ? trace_hardirqs_on_caller+0x10c/0x130
    [] sys_read+0x45/0x6c
    [] system_call_fastpath+0x16/0x1b

    The object state might also be OBJECT_DYING or OBJECT_WITHDRAWING.

    This should be handled by simply rejecting the new operation with ENOBUFS.
    There's no need to log an error for it. Events of this type now appear in the
    stats file under Ops:rej.

    Signed-off-by: David Howells

    David Howells
     
  • Permit the operations to retrieve data from the cache or to allocate space in
    the cache for future writes to be interrupted whilst they're waiting for
    permission for the operation to proceed. Typically this wait occurs whilst the
    cache object is being looked up on disk in the background.

    If an interruption occurs, and the operation has not yet been given the
    go-ahead to run, the operation is dequeued and cancelled, and control returns
    to the read operation of the netfs routine with none of the requested pages
    having been read or in any way marked as known by the cache.

    This means that the initial wait is done interruptibly rather than
    uninterruptibly.

    In addition, extra stats values are made available to show the number of ops
    cancelled and the number of cache space allocations interrupted.
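
    As a rough sketch of the interruptible wait described above (helper names
    are hypothetical and the modern three-argument wait_on_bit() signature is
    used):

    #include <linux/bitops.h>
    #include <linux/errno.h>
    #include <linux/sched.h>
    #include <linux/wait_bit.h>

    #define EX_OP_WAITING   0       /* hypothetical flag bit */

    struct ex_operation {
            unsigned long flags;
    };

    static void ex_cancel_op(struct ex_operation *op)
    {
            /* dequeue from the pending list and mark the op cancelled (elided) */
    }

    static int ex_wait_for_activation(struct ex_operation *op)
    {
            if (!test_bit(EX_OP_WAITING, &op->flags))
                    return 0;       /* already given the go-ahead */

            if (wait_on_bit(&op->flags, EX_OP_WAITING, TASK_INTERRUPTIBLE) == 0)
                    return 0;       /* woken up: permitted to run */

            /* A signal arrived first: dequeue and cancel the op so the netfs
             * sees none of its pages read or marked by the cache. */
            ex_cancel_op(op);
            return -ERESTARTSYS;
    }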

    Signed-off-by: David Howells

    David Howells
     
  • Allow the current state of all fscache objects to be dumped by doing:

    cat /proc/fs/fscache/objects

    By default, all objects and all fields will be shown. This can be restricted
    by adding a suitable key to one of the caller's keyrings (such as the session
    keyring):

    keyctl add user fscache:objlist "" @s

    The unpaired restrictions are:

    K Show hexdump of object key (don't show if not given)
    A Show hexdump of object aux data (don't show if not given)

    And paired restrictions:

    C Show objects that have a cookie
    c Show objects that don't have a cookie
    B Show objects that are busy
    b Show objects that aren't busy
    W Show objects that have pending writes
    w Show objects that don't have pending writes
    R Show objects that have outstanding reads
    r Show objects that don't have outstanding reads
    S Show objects that have slow work queued
    s Show objects that don't have slow work queued

    If neither side of a restriction pair is given, then both are implied. For
    example:

    keyctl add user fscache:objlist KB @s

    shows objects that are busy, and lists their object keys, but does not dump
    their auxiliary data. It also implies "CcWwRrSs", but as 'B' is given, 'b' is
    not implied.

    Signed-off-by: David Howells

    David Howells
     
  • Annotate slow-work runqueue proc lines for FS-Cache work items. Objects
    include the object ID and the state. Operations include the object ID, the
    operation ID and the operation type and state.

    Signed-off-by: David Howells

    David Howells
     
  • Wait for outstanding slow work items belonging to a module to clear when
    unregistering that module as a user of the facility. This prevents the put_ref
    code of a work item from being taken away before it returns.

    Signed-off-by: David Howells

    David Howells
     

03 Apr, 2009

1 commit

  • Add and document asynchronous operation handling for use by FS-Cache's data
    storage and retrieval routines.

    The following documentation is added to:

    Documentation/filesystems/caching/operations.txt

    ================================
    ASYNCHRONOUS OPERATIONS HANDLING
    ================================

    ========
    OVERVIEW
    ========

    FS-Cache has an asynchronous operations handling facility that it uses for its
    data storage and retrieval routines. Its operations are represented by
    fscache_operation structs, though these are usually embedded into some other
    structure.

    This facility is available to and expected to be used by the cache backends,
    and FS-Cache will create operations and pass them off to the appropriate cache
    backend for completion.

    To make use of this facility, <linux/fscache-cache.h> should be #included.

    ===============================
    OPERATION RECORD INITIALISATION
    ===============================

    An operation is recorded in an fscache_operation struct:

    struct fscache_operation {
            union {
                    struct work_struct fast_work;
                    struct slow_work slow_work;
            };
            unsigned long flags;
            fscache_operation_processor_t processor;
            ...
    };

    Someone wanting to issue an operation should allocate something with this
    struct embedded in it. They should initialise it by calling:

    void fscache_operation_init(struct fscache_operation *op,
    fscache_operation_release_t release);

    with the operation to be initialised and the release function to use.

    The op->flags parameter should be set to indicate the CPU time provision and
    the exclusivity (see the Parameters section).

    The op->fast_work, op->slow_work and op->processor fields should be set as
    appropriate for the CPU time provision (see the Parameters section).

    FSCACHE_OP_WAITING may be set in op->flags prior to each submission of the
    operation and waited for afterwards.

    ==========
    PARAMETERS
    ==========

    There are a number of parameters that can be set in the operation record's flag
    parameter. There are three options for the provision of CPU time in these
    operations:

    (1) The operation may be done synchronously (FSCACHE_OP_MYTHREAD). A thread
    may decide it wants to handle an operation itself without deferring it to
    another thread.

    This is, for example, used in read operations for calling readpages() on
    the backing filesystem in CacheFiles. Although readpages() does an
    asynchronous data fetch, the determination of whether pages exist is done
    synchronously - and the netfs does not proceed until this has been
    determined.

    If this option is to be used, FSCACHE_OP_WAITING must be set in op->flags
    before submitting the operation, and the operating thread must wait for it
    to be cleared before proceeding:

    wait_on_bit(&op->flags, FSCACHE_OP_WAITING,
    fscache_wait_bit, TASK_UNINTERRUPTIBLE);

    (2) The operation may be fast asynchronous (FSCACHE_OP_FAST), in which case it
    will be given to keventd to process. Such an operation is not permitted
    to sleep on I/O.

    This is, for example, used by CacheFiles to copy data from a backing fs
    page to a netfs page after the backing fs has read the page in.

    If this option is used, op->fast_work and op->processor must be
    initialised before submitting the operation:

    INIT_WORK(&op->fast_work, do_some_work);

    (3) The operation may be slow asynchronous (FSCACHE_OP_SLOW), in which case it
    will be given to the slow work facility to process. Such an operation is
    permitted to sleep on I/O.

    This is, for example, used by FS-Cache to handle background writes of
    pages that have just been fetched from a remote server.

    If this option is used, op->slow_work and op->processor must be
    initialised before submitting the operation:

    fscache_operation_init_slow(op, processor)

    Furthermore, operations may be one of two types:

    (1) Exclusive (FSCACHE_OP_EXCLUSIVE). Operations of this type may not run in
    conjunction with any other operation on the object being operated upon.

    An example of this is the attribute change operation, in which the file
    being written to may need truncation.

    (2) Shareable. Operations of this type may be running simultaneously. It's
    up to the operation implementation to prevent interference between other
    operations running at the same time.

    =========
    PROCEDURE
    =========

    Operations are used through the following procedure:

    (1) The submitting thread must allocate the operation and initialise it
    itself. Normally this would be part of a more specific structure with the
    generic op embedded within.

    (2) The submitting thread must then submit the operation for processing using
    one of the following two functions:

    int fscache_submit_op(struct fscache_object *object,
    struct fscache_operation *op);

    int fscache_submit_exclusive_op(struct fscache_object *object,
    struct fscache_operation *op);

    The first function should be used to submit non-exclusive ops and the
    second to submit exclusive ones. The caller must still set the
    FSCACHE_OP_EXCLUSIVE flag.

    If successful, both functions will assign the operation to the specified
    object and return 0. -ENOBUFS will be returned if the object specified is
    permanently unavailable.

    The operation manager will defer operations on an object that is still
    undergoing lookup or creation. The operation will also be deferred if an
    operation of conflicting exclusivity is in progress on the object.

    If the operation is asynchronous, the manager will retain a reference to
    it, so the caller should put their reference to it by passing it to:

    void fscache_put_operation(struct fscache_operation *op);

    (3) If the submitting thread wants to do the work itself, and has marked the
    operation with FSCACHE_OP_MYTHREAD, then it should monitor
    FSCACHE_OP_WAITING as described above and check the state of the object if
    necessary (the object might have died whilst the thread was waiting).

    When it has finished doing its processing, it should call
    fscache_put_operation() on it.

    (4) The operation holds an effective lock upon the object, preventing other
    exclusive ops conflicting until it is released. The operation can be
    enqueued for further immediate asynchronous processing by adjusting the
    CPU time provisioning option if necessary, eg:

    op->flags &= ~FSCACHE_OP_TYPE;
    op->flags |= FSCACHE_OP_FAST;

    and calling:

    void fscache_enqueue_operation(struct fscache_operation *op)

    This can be used to allow other things to have use of the worker thread
    pools.

    =====================
    ASYNCHRONOUS CALLBACK
    =====================

    When used in asynchronous mode, the worker thread pool will invoke the
    processor method with a pointer to the operation. This should then get at the
    container struct by using container_of():

    static void fscache_write_op(struct fscache_operation *_op)
    {
            struct fscache_storage *op =
                    container_of(_op, struct fscache_storage, op);
            ...
    }

    The caller holds a reference on the operation, and will invoke
    fscache_put_operation() when the processor function returns. The processor
    function is at liberty to call fscache_enqueue_operation() or to take extra
    references.

    Signed-off-by: David Howells
    Acked-by: Steve Dickson
    Acked-by: Trond Myklebust
    Acked-by: Al Viro
    Tested-by: Daire Byrne

    David Howells