16 Jul, 2014

1 commit

  • The current "wait_on_bit" interface requires an 'action'
    function to be provided which does the actual waiting.
    There are over 20 such functions, many of them identical.
    Most cases can be satisfied by one of just two functions, one
    which uses io_schedule() and one which just uses schedule().

    So:
    Rename wait_on_bit and wait_on_bit_lock to
    wait_on_bit_action and wait_on_bit_lock_action
    to make it explicit that they need an action function.

    Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
    which are *not* given an action function but implicitly use
    a standard one.
    The decision to error-out if a signal is pending is now made
    based on the 'mode' argument rather than being encoded in the action
    function.

    All instances of the old wait_on_bit and wait_on_bit_lock which
    can use the new version have been changed accordingly and their
    action functions have been discarded.
    wait_on_bit{_lock} does not return any specific error code in the
    event of a signal so the caller must check for non-zero and
    interpolate their own error code as appropriate.

    The wait_on_bit() call in __fscache_wait_on_invalidate() was
    ambiguous as it specified TASK_UNINTERRUPTIBLE but used
    fscache_wait_bit_interruptible as an action function.
    David Howells confirms this should be uniformly
    "uninterruptible"

    The main remaining user of wait_on_bit{,_lock}_action is NFS
    which needs to use a freezer-aware schedule() call.

    A comment in fs/gfs2/glock.c notes that having multiple 'action'
    functions is useful as they display differently in the 'wchan'
    field of 'ps'. (and /proc/$PID/wchan).
    As the new bit_wait{,_io} functions are tagged "__sched", they
    will not show up at all, but something higher in the stack. So
    the distinction will still be visible, only with different
    function names (gds2_glock_wait versus gfs2_glock_dq_wait in the
    gfs2/glock.c case).

    Since first version of this patch (against 3.15) two new action
    functions appeared, on in NFS and one in CIFS. CIFS also now
    uses an action function that makes the same freezer aware
    schedule call as NFS.

    Signed-off-by: NeilBrown
    Acked-by: David Howells (fscache, keys)
    Acked-by: Steven Whitehouse (gfs2)
    Acked-by: Peter Zijlstra
    Cc: Oleg Nesterov
    Cc: Steve French
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
    Signed-off-by: Ingo Molnar

    NeilBrown
     

14 Nov, 2013

1 commit

  • Key pointers stored in the keyring are marked in bit 1 to indicate if they
    point to a keyring. We need to strip off this bit before using the pointer
    when iterating over the keyring for the purpose of looking for links to garbage
    collect.

    This means that expirable keyrings aren't correctly expiring because the
    checker is seeing their key pointer with 2 added to it.

    Since the fix for this involves knowing about the internals of the keyring,
    key_gc_keyring() is moved to keyring.c and merged into keyring_gc().

    This can be tested by:

    echo 2 >/proc/sys/kernel/keys/gc_delay
    keyctl timeout `keyctl add keyring qwerty "" @s` 2
    cat /proc/keys
    sleep 5; cat /proc/keys

    which should see a keyring called "qwerty" appear in the session keyring and
    then disappear after it expires, and:

    echo 2 >/proc/sys/kernel/keys/gc_delay
    a=`keyctl get_persistent @s`
    b=`keyctl add keyring 0 "" $a`
    keyctl add user a a $b
    keyctl timeout $b 2
    cat /proc/keys
    sleep 5; cat /proc/keys

    which should see a keyring called "0" with a key called "a" in it appear in the
    user's persistent keyring (which will be attached to the session keyring) and
    then both the "0" keyring and the "a" key should disappear when the "0" keyring
    expires.

    Signed-off-by: David Howells
    Acked-by: Simo Sorce

    David Howells
     

24 Sep, 2013

1 commit

  • Expand the capacity of a keyring to be able to hold a lot more keys by using
    the previously added associative array implementation. Currently the maximum
    capacity is:

    (PAGE_SIZE - sizeof(header)) / sizeof(struct key *)

    which, on a 64-bit system, is a little more 500. However, since this is being
    used for the NFS uid mapper, we need more than that. The new implementation
    gives us effectively unlimited capacity.

    With some alterations, the keyutils testsuite runs successfully to completion
    after this patch is applied. The alterations are because (a) keyrings that
    are simply added to no longer appear ordered and (b) some of the errors have
    changed a bit.

    Signed-off-by: David Howells

    David Howells
     

21 Aug, 2012

1 commit

  • system_nrt[_freezable]_wq are now spurious. Mark them deprecated and
    convert all users to system[_freezable]_wq.

    If you're cc'd and wondering what's going on: Now all workqueues are
    non-reentrant, so there's no reason to use system_nrt[_freezable]_wq.
    Please use system[_freezable]_wq instead.

    This patch doesn't make any functional difference.

    Signed-off-by: Tejun Heo
    Acked-By: Lai Jiangshan

    Cc: Jens Axboe
    Cc: David Airlie
    Cc: Jiri Kosina
    Cc: "David S. Miller"
    Cc: Rusty Russell
    Cc: "Paul E. McKenney"
    Cc: David Howells

    Tejun Heo
     

11 May, 2012

3 commits

  • Add support for invalidating a key - which renders it immediately invisible to
    further searches and causes the garbage collector to immediately wake up,
    remove it from keyrings and then destroy it when it's no longer referenced.

    It's better not to do this with keyctl_revoke() as that marks the key to start
    returning -EKEYREVOKED to searches when what is actually desired is to have the
    key refetched.

    To invalidate a key the caller must be granted SEARCH permission by the key.
    This may be too strict. It may be better to also permit invalidation if the
    caller has any of READ, WRITE or SETATTR permission.

    The primary use for this is to evict keys that are cached in special keyrings,
    such as the DNS resolver or an ID mapper.

    Signed-off-by: David Howells

    David Howells
     
  • Make use of the previous patch that makes the garbage collector perform RCU
    synchronisation before destroying defunct keys. Key pointers can now be
    replaced in-place without creating a new keyring payload and replacing the
    whole thing as the discarded keys will not be destroyed until all currently
    held RCU read locks are released.

    If the keyring payload space needs to be expanded or contracted, then a
    replacement will still need allocating, and the original will still have to be
    freed by RCU.

    Signed-off-by: David Howells

    David Howells
     
  • Make the keys garbage collector invoke synchronize_rcu() prior to destroying
    keys with a zero usage count. This means that a key can be examined under the
    RCU read lock in the safe knowledge that it won't get deallocated until after
    the lock is released - even if its usage count becomes zero whilst we're
    looking at it.

    This is useful in keyring search vs key link. Consider a keyring containing a
    link to a key. That link can be replaced in-place in the keyring without
    requiring an RCU copy-and-replace on the keyring contents without breaking a
    search underway on that keyring when the displaced key is released, provided
    the key is actually destroyed only after the RCU read lock held by the search
    algorithm is released.

    This permits __key_link() to replace a key without having to reallocate the key
    payload. A key gets replaced if a new key being linked into a keyring has the
    same type and description.

    Signed-off-by: David Howells
    Acked-by: Jeff Layton

    David Howells
     

18 Jan, 2012

1 commit

  • Add missing smp_rmb() primitives to the keyring search code.

    When keyring payloads are appended to without replacement (thus using up spare
    slots in the key pointer array), an smp_wmb() is issued between the pointer
    assignment and the increment of the key count (nkeys).

    There should be corresponding read barriers between the read of nkeys and
    dereferences of keys[n] when n is dependent on the value of nkeys.

    Signed-off-by: David Howells
    Reviewed-by: Paul E. McKenney
    Signed-off-by: James Morris

    David Howells
     

23 Aug, 2011

3 commits

  • unregister_key_type() has code to mark a key as dead and make it unavailable in
    one loop and then destroy all those unavailable key payloads in the next loop.
    However, the loop to mark keys dead renders the key undetectable to the second
    loop by changing the key type pointer also.

    Fix this by the following means:

    (1) The key code has two garbage collectors: one deletes unreferenced keys and
    the other alters keyrings to delete links to old dead, revoked and expired
    keys. They can end up holding each other up as both want to scan the key
    serial tree under spinlock. Combine these into a single routine.

    (2) Move the dead key marking, dead link removal and dead key removal into the
    garbage collector as a three phase process running over the three cycles
    of the normal garbage collection procedure. This is tracked by the
    KEY_GC_REAPING_DEAD_1, _2 and _3 state flags.

    unregister_key_type() then just unlinks the key type from the list, wakes
    up the garbage collector and waits for the third phase to complete.

    (3) Downgrade the key types sem in unregister_key_type() once it has deleted
    the key type from the list so that it doesn't block the keyctl() syscall.

    (4) Dead keys that cannot be simply removed in the third phase have their
    payloads destroyed with the key's semaphore write-locked to prevent
    interference by the keyctl() syscall. There should be no in-kernel users
    of dead keys of that type by the point of unregistration, though keyctl()
    may be holding a reference.

    (5) Only perform timer recalculation in the GC if the timer actually expired.
    If it didn't, we'll get another cycle when it goes off - and if the key
    that actually triggered it has been removed, it's not a problem.

    (6) Only garbage collect link if the timer expired or if we're doing dead key
    clean up phase 2.

    (7) As only key_garbage_collector() is permitted to use rb_erase() on the key
    serial tree, it doesn't need to revalidate its cursor after dropping the
    spinlock as the node the cursor points to must still exist in the tree.

    (8) Drop the spinlock in the GC if there is contention on it or if we need to
    reschedule. After dealing with that, get the spinlock again and resume
    scanning.

    This has been tested in the following ways:

    (1) Run the keyutils testsuite against it.

    (2) Using the AF_RXRPC and RxKAD modules to test keytype removal:

    Load the rxrpc_s key type:

    # insmod /tmp/af-rxrpc.ko
    # insmod /tmp/rxkad.ko

    Create a key (http://people.redhat.com/~dhowells/rxrpc/listen.c):

    # /tmp/listen &
    [1] 8173

    Find the key:

    # grep rxrpc_s /proc/keys
    091086e1 I--Q-- 1 perm 39390000 0 0 rxrpc_s 52:2

    Link it to a session keyring, preferably one with a higher serial number:

    # keyctl link 0x20e36251 @s

    Kill the process (the key should remain as it's linked to another place):

    # fg
    /tmp/listen
    ^C

    Remove the key type:

    rmmod rxkad
    rmmod af-rxrpc

    This can be made a more effective test by altering the following part of
    the patch:

    if (unlikely(gc_state & KEY_GC_REAPING_DEAD_2)) {
    /* Make sure everyone revalidates their keys if we marked a
    * bunch as being dead and make sure all keyring ex-payloads
    * are destroyed.
    */
    kdebug("dead sync");
    synchronize_rcu();

    To call synchronize_rcu() in GC phase 1 instead. That causes that the
    keyring's old payload content to hang around longer until it's RCU
    destroyed - which usually happens after GC phase 3 is complete. This
    allows the destroy_dead_key branch to be tested.

    Reported-by: Benjamin Coddington
    Signed-off-by: David Howells
    Signed-off-by: James Morris

    David Howells
     
  • The dead key link reaper should be non-reentrant as it relies on global state
    to keep track of where it's got to when it returns to the work queue manager to
    give it some air.

    Signed-off-by: David Howells
    Signed-off-by: James Morris

    David Howells
     
  • Move the unreferenced key reaper function to the keys garbage collector file
    as that's a more appropriate place with the dead key link reaper.

    Signed-off-by: David Howells
    Signed-off-by: James Morris

    David Howells
     

22 Jan, 2011

1 commit


05 May, 2010

1 commit

  • key_gc_keyring() needs to either hold the RCU read lock or hold the keyring
    semaphore if it's going to scan the keyring's list. Given that it only needs
    to read the key list, and it's doing so under a spinlock, the RCU read lock is
    the thing to use.

    Furthermore, the RCU check added in e7b0a61b7929632d36cf052d9e2820ef0a9c1bfe is
    incorrect as holding the spinlock on key_serial_lock is not grounds for
    assuming a keyring's pointer list can be read safely. Instead, a simple
    rcu_dereference() inside of the previously mentioned RCU read lock is what we
    want.

    Reported-by: Serge E. Hallyn
    Signed-off-by: David Howells
    Acked-by: Serge Hallyn
    Acked-by: "Paul E. McKenney"
    Signed-off-by: James Morris

    David Howells
     

25 Feb, 2010

1 commit

  • Apply lockdep-ified RCU primitives to key_gc_keyring() and
    keyring_destroy().

    Cc: David Howells
    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

24 Sep, 2009

1 commit

  • The key garbage collector sets a timer to start a new collection cycle at the
    point the earliest key to expire should be considered garbage. However, it
    currently only does this if the key it is considering hasn't yet expired.

    If the key being considering has expired, but hasn't yet reached the collection
    time then it is ignored, and won't be collected until some other key provokes a
    round of collection.

    Make the garbage collector set the timer for the earliest key that hasn't yet
    passed its collection time, rather than the earliest key that hasn't yet
    expired.

    Signed-off-by: David Howells
    Signed-off-by: James Morris

    David Howells
     

15 Sep, 2009

1 commit

  • Fix a number of problems with the new key garbage collector:

    (1) A rogue semicolon in keyring_gc() was causing the initial count of dead
    keys to be miscalculated.

    (2) A missing return in keyring_gc() meant that under certain circumstances,
    the keyring semaphore would be unlocked twice.

    (3) The key serial tree iterator (key_garbage_collector()) part of the garbage
    collector has been modified to:

    (a) Complete each scan of the keyrings before setting the new timer.

    (b) Only set the new timer for keys that have yet to expire. This means
    that the new timer is now calculated correctly, and the gc doesn't
    get into a loop continually scanning for keys that have expired, and
    preventing other things from happening, like RCU cleaning up the old
    keyring contents.

    (c) Perform an extra scan if any keys were garbage collected in this one
    as a key might become garbage during a scan, and (b) could mean we
    don't set the timer again.

    (4) Made key_schedule_gc() take the time at which to do a collection run,
    rather than the time at which the key expires. This means the collection
    of dead keys (key type unregistered) can happen immediately.

    Signed-off-by: David Howells
    Signed-off-by: James Morris

    David Howells
     

02 Sep, 2009

2 commits

  • Add a keyctl to install a process's session keyring onto its parent. This
    replaces the parent's session keyring. Because the COW credential code does
    not permit one process to change another process's credentials directly, the
    change is deferred until userspace next starts executing again. Normally this
    will be after a wait*() syscall.

    To support this, three new security hooks have been provided:
    cred_alloc_blank() to allocate unset security creds, cred_transfer() to fill in
    the blank security creds and key_session_to_parent() - which asks the LSM if
    the process may replace its parent's session keyring.

    The replacement may only happen if the process has the same ownership details
    as its parent, and the process has LINK permission on the session keyring, and
    the session keyring is owned by the process, and the LSM permits it.

    Note that this requires alteration to each architecture's notify_resume path.
    This has been done for all arches barring blackfin, m68k* and xtensa, all of
    which need assembly alteration to support TIF_NOTIFY_RESUME. This allows the
    replacement to be performed at the point the parent process resumes userspace
    execution.

    This allows the userspace AFS pioctl emulation to fully emulate newpag() and
    the VIOCSETTOK and VIOCSETTOK2 pioctls, all of which require the ability to
    alter the parent process's PAG membership. However, since kAFS doesn't use
    PAGs per se, but rather dumps the keys into the session keyring, the session
    keyring of the parent must be replaced if, for example, VIOCSETTOK is passed
    the newpag flag.

    This can be tested with the following program:

    #include
    #include
    #include

    #define KEYCTL_SESSION_TO_PARENT 18

    #define OSERROR(X, S) do { if ((long)(X) == -1) { perror(S); exit(1); } } while(0)

    int main(int argc, char **argv)
    {
    key_serial_t keyring, key;
    long ret;

    keyring = keyctl_join_session_keyring(argv[1]);
    OSERROR(keyring, "keyctl_join_session_keyring");

    key = add_key("user", "a", "b", 1, keyring);
    OSERROR(key, "add_key");

    ret = keyctl(KEYCTL_SESSION_TO_PARENT);
    OSERROR(ret, "KEYCTL_SESSION_TO_PARENT");

    return 0;
    }

    Compiled and linked with -lkeyutils, you should see something like:

    [dhowells@andromeda ~]$ keyctl show
    Session Keyring
    -3 --alswrv 4043 4043 keyring: _ses
    355907932 --alswrv 4043 -1 \_ keyring: _uid.4043
    [dhowells@andromeda ~]$ /tmp/newpag
    [dhowells@andromeda ~]$ keyctl show
    Session Keyring
    -3 --alswrv 4043 4043 keyring: _ses
    1055658746 --alswrv 4043 4043 \_ user: a
    [dhowells@andromeda ~]$ /tmp/newpag hello
    [dhowells@andromeda ~]$ keyctl show
    Session Keyring
    -3 --alswrv 4043 4043 keyring: hello
    340417692 --alswrv 4043 4043 \_ user: a

    Where the test program creates a new session keyring, sticks a user key named
    'a' into it and then installs it on its parent.

    Signed-off-by: David Howells
    Signed-off-by: James Morris

    David Howells
     
  • Add garbage collection for dead, revoked and expired keys. This involved
    erasing all links to such keys from keyrings that point to them. At that
    point, the key will be deleted in the normal manner.

    Keyrings from which garbage collection occurs are shrunk and their quota
    consumption reduced as appropriate.

    Dead keys (for which the key type has been removed) will be garbage collected
    immediately.

    Revoked and expired keys will hang around for a number of seconds, as set in
    /proc/sys/kernel/keys/gc_delay before being automatically removed. The default
    is 5 minutes.

    Signed-off-by: David Howells
    Signed-off-by: James Morris

    David Howells