02 Dec, 2013

1 commit

  • The second word of key->payload does not get initialised in key_alloc(), but
    the big_key type is relying on it having been cleared. The problem comes when
    big_key fails to instantiate a large key and doesn't then set the payload. The
    big_key_destroy() op is called from the garbage collector and this assumes that
    the dentry pointer stored in the second word will be NULL if instantiation did
    not complete.

    Therefore just pre-clear the entire struct key on allocation rather than trying
    to be clever and initialising to 0 only those bits that aren't otherwise
    initialised.
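
    As a rough userspace illustration of why pre-clearing helps, the sketch
    below uses a hypothetical struct toy_key (not the kernel's struct key) that
    is allocated with calloc(), so a destroy routine can rely on an unset
    payload word being NULL. All names here are assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct key: two payload words, the second of
 * which a type such as big_key would use for a dentry-like pointer. */
struct toy_key {
    int serial;
    void *payload[2];
};

/* Allocate a key with every field zeroed, so payload words that are never
 * set are guaranteed to read back as NULL. */
struct toy_key *toy_key_alloc(int serial)
{
    struct toy_key *key = calloc(1, sizeof(*key));

    if (key)
        key->serial = serial;
    return key;
}

/* Destroy op: releases the second payload word only if instantiation got
 * far enough to set it. Returns 1 if something was released, else 0. */
int toy_key_destroy(struct toy_key *key)
{
    int released = 0;

    assert(key != NULL);
    if (key->payload[1]) {
        free(key->payload[1]);
        key->payload[1] = NULL;
        released = 1;
    }
    free(key);
    return released;
}
```

    With malloc() in place of calloc(), payload[1] would hold whatever was
    previously in that memory and the destroy routine could free a garbage
    pointer, which is the userspace analogue of the fault shown below.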

    The lack of initialisation can lead to a bug report like the following if
    big_key failed to initialise its file:

    general protection fault: 0000 [#1] SMP
    Modules linked in: ...
    CPU: 0 PID: 51 Comm: kworker/0:1 Not tainted 3.10.0-53.el7.x86_64 #1
    Hardware name: Dell Inc. PowerEdge 1955/0HC513, BIOS 1.4.4 12/09/2008
    Workqueue: events key_garbage_collector
    task: ffff8801294f5680 ti: ffff8801296e2000 task.ti: ffff8801296e2000
    RIP: 0010:[] dput+0x21/0x2d0
    ...
    Call Trace:
    [] path_put+0x16/0x30
    [] big_key_destroy+0x44/0x60
    [] key_gc_unused_keys.constprop.2+0x5b/0xe0
    [] key_garbage_collector+0x1df/0x3c0
    [] process_one_work+0x17b/0x460
    [] worker_thread+0x11b/0x400
    [] ? rescuer_thread+0x3e0/0x3e0
    [] kthread+0xc0/0xd0
    [] ? kthread_create_on_node+0x110/0x110
    [] ret_from_fork+0x7c/0xb0
    [] ? kthread_create_on_node+0x110/0x110

    Reported-by: Patrik Kis
    Signed-off-by: David Howells
    Reviewed-by: Stephen Gallagher

    David Howells
     

30 Oct, 2013

1 commit

  • key_reject_and_link() marking a key as negative and setting the error with
    which it was negated races with keyring searches and other things that read
    that error.

    The fix is to switch the order in which the assignments are done in
    key_reject_and_link() and to use memory barriers.
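
    The ordering idea can be sketched in userspace with C11 atomics. This is
    not the kernel code: struct toy_key and both functions are hypothetical,
    and release/acquire ordering stands in for the kernel's memory barriers.
    The error is written first and the negative flag is published last, so a
    reader that sees the flag also sees the error.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical key state: the error must be visible before the flag. */
struct toy_key {
    int reject_error;       /* written before the flag is published */
    atomic_bool negative;   /* published last, with release ordering */
};

/* Writer: store the error, then set the flag with release semantics so the
 * error store cannot be reordered after the flag store. */
void toy_key_reject(struct toy_key *key, int error)
{
    assert(error > 0);
    key->reject_error = error;
    atomic_store_explicit(&key->negative, true, memory_order_release);
}

/* Reader: load the flag with acquire semantics; only if it is set is the
 * error read. Returns 0 for a non-negative key, else the negated error. */
int toy_key_check(struct toy_key *key)
{
    if (!atomic_load_explicit(&key->negative, memory_order_acquire))
        return 0;
    return -key->reject_error;
}
```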

    Kudos to Dave Wysochanski and Scott Mayhew
    for tracking this down.

    This may be the cause of:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
    IP: [] wait_for_key_construction+0x31/0x80
    PGD c6b2c3067 PUD c59879067 PMD 0
    Oops: 0000 [#1] SMP
    last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
    CPU 0
    Modules linked in: ...

    Pid: 13359, comm: amqzxma0 Not tainted 2.6.32-358.20.1.el6.x86_64 #1 IBM System x3650 M3 -[7945PSJ]-/00J6159
    RIP: 0010:[] wait_for_key_construction+0x31/0x80
    RSP: 0018:ffff880c6ab33758 EFLAGS: 00010246
    RAX: ffffffff81219080 RBX: 0000000000000000 RCX: 0000000000000002
    RDX: ffffffff81219060 RSI: 0000000000000000 RDI: 0000000000000000
    RBP: ffff880c6ab33768 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000001 R11: 0000000000000000 R12: ffff880adfcbce40
    R13: ffffffffa03afb84 R14: ffff880adfcbce40 R15: ffff880adfcbce43
    FS: 00007f29b8042700(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000070 CR3: 0000000c613dc000 CR4: 00000000000007f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process amqzxma0 (pid: 13359, threadinfo ffff880c6ab32000, task ffff880c610deae0)
    Stack:
    ffff880adfcbce40 0000000000000000 ffff880c6ab337b8 ffffffff81219695
    0000000000000000 ffff880a000000d0 ffff880c6ab337a8 000000000000000f
    ffffffffa03afb93 000000000000000f ffff88186c7882c0 0000000000000014
    Call Trace:
    [] request_key+0x65/0xa0
    [] nfs_idmap_request_key+0xc5/0x170 [nfs]
    [] nfs_idmap_lookup_id+0x34/0x80 [nfs]
    [] nfs_map_group_to_gid+0x75/0xa0 [nfs]
    [] decode_getfattr_attrs+0xbdd/0xfb0 [nfs]
    [] ? __dequeue_entity+0x30/0x50
    [] ? __switch_to+0x26e/0x320
    [] decode_getfattr+0x83/0xe0 [nfs]
    [] ? nfs4_xdr_dec_getattr+0x0/0xa0 [nfs]
    [] nfs4_xdr_dec_getattr+0x8f/0xa0 [nfs]
    [] rpcauth_unwrap_resp+0x84/0xb0 [sunrpc]
    [] ? nfs4_xdr_dec_getattr+0x0/0xa0 [nfs]
    [] call_decode+0x1b3/0x800 [sunrpc]
    [] ? wake_bit_function+0x0/0x50
    [] ? call_decode+0x0/0x800 [sunrpc]
    [] __rpc_execute+0x77/0x350 [sunrpc]
    [] ? bit_waitqueue+0x17/0xd0
    [] rpc_execute+0x61/0xa0 [sunrpc]
    [] rpc_run_task+0x75/0x90 [sunrpc]
    [] rpc_call_sync+0x42/0x70 [sunrpc]
    [] _nfs4_call_sync+0x30/0x40 [nfs]
    [] _nfs4_proc_getattr+0xac/0xc0 [nfs]
    [] ? futex_wait+0x227/0x380
    [] nfs4_proc_getattr+0x56/0x80 [nfs]
    [] __nfs_revalidate_inode+0xe3/0x220 [nfs]
    [] nfs_revalidate_mapping+0x4e/0x170 [nfs]
    [] nfs_file_read+0x77/0x130 [nfs]
    [] do_sync_read+0xfa/0x140
    [] ? autoremove_wake_function+0x0/0x40
    [] ? apic_timer_interrupt+0xe/0x20
    [] ? common_interrupt+0xe/0x13
    [] ? selinux_file_permission+0xfb/0x150
    [] ? security_file_permission+0x16/0x20
    [] vfs_read+0xb5/0x1a0
    [] sys_read+0x51/0x90
    [] ? __audit_syscall_exit+0x265/0x290
    [] system_call_fastpath+0x16/0x1b

    Signed-off-by: David Howells
    cc: Dave Wysochanski
    cc: Scott Mayhew

    David Howells
     

26 Sep, 2013

1 commit

  • Add KEY_FLAG_TRUSTED to indicate that a key either comes from a trusted source
    or had a cryptographic signature chain that led back to a trusted key the
    kernel already possessed.

    Add KEY_FLAG_TRUSTED_ONLY to indicate that a keyring will only accept links to
    keys marked with KEY_FLAG_TRUSTED.

    Signed-off-by: David Howells
    Reviewed-by: Kees Cook

    David Howells
     

24 Sep, 2013

4 commits

  • Expand the capacity of a keyring to be able to hold a lot more keys by using
    the previously added associative array implementation. Currently the maximum
    capacity is:

    (PAGE_SIZE - sizeof(header)) / sizeof(struct key *)

    which, on a 64-bit system, is a little more than 500. However, since this is being
    used for the NFS uid mapper, we need more than that. The new implementation
    gives us effectively unlimited capacity.
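
    To make the old limit concrete, the formula can be evaluated as below. The
    4096-byte page and 16-byte header used in the usage note are assumptions
    for illustration; the real header size is not stated here.

```c
#include <assert.h>
#include <stddef.h>

/* Number of key pointers that fit in one page after the list header:
 * (PAGE_SIZE - sizeof(header)) / sizeof(struct key *). */
size_t keyring_capacity(size_t page_size, size_t header_size, size_t ptr_size)
{
    assert(ptr_size > 0 && header_size <= page_size);
    return (page_size - header_size) / ptr_size;
}
```

    Under those assumptions, keyring_capacity(4096, 16, 8) gives 510 pointers
    with 8-byte pointers, consistent with "a little more than 500".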

    With some alterations, the keyutils testsuite runs successfully to completion
    after this patch is applied. The alterations are because (a) keyrings that
    are simply added to no longer appear ordered and (b) some of the errors have
    changed a bit.

    Signed-off-by: David Howells

    David Howells
     
  • Drop the permissions argument from __keyring_search_one() as the only caller
    passes 0 here - which causes all checks to be skipped.

    Signed-off-by: David Howells

    David Howells
     
  • Define a __key_get() wrapper to use rather than atomic_inc() on the key usage
    count as this makes it easier to hook in refcount error debugging.

    Signed-off-by: David Howells

    David Howells
     
  • Consolidate the concept of an 'index key' for accessing keys. The index key
    is the search term needed to find a key directly - basically the key type and
    the key description. We can add to that the description length.

    This will be useful when turning a keyring into an associative array rather
    than just a pointer block.
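
    As an illustration only, an index key of this shape could be modelled as
    follows. The struct and helpers are hypothetical, and the type is
    represented by a name string here purely to keep the sketch self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical index key: the search term made of type, description and
 * description length. */
struct toy_index_key {
    const char *type;         /* key type, modelled as a name string */
    const char *description;
    size_t desc_len;          /* cached so comparisons can reject early */
};

/* Build an index key, computing the description length once. */
struct toy_index_key toy_index_key_make(const char *type,
                                        const char *description)
{
    struct toy_index_key ik;

    assert(type != NULL && description != NULL);
    ik.type = type;
    ik.description = description;
    ik.desc_len = strlen(description);
    return ik;
}

/* Two index keys match if type, length and description bytes all match. */
bool toy_index_key_equal(const struct toy_index_key *a,
                         const struct toy_index_key *b)
{
    return strcmp(a->type, b->type) == 0 &&
           a->desc_len == b->desc_len &&
           memcmp(a->description, b->description, a->desc_len) == 0;
}
```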

    Signed-off-by: David Howells

    David Howells
     

17 Dec, 2012

1 commit

  • Pull security subsystem updates from James Morris:
    "A quiet cycle for the security subsystem with just a few maintenance
    updates."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
    Smack: create a sysfs mount point for smackfs
    Smack: use select not depends in Kconfig
    Yama: remove locking from delete path
    Yama: add RCU to drop read locking
    drivers/char/tpm: remove tasklet and cleanup
    KEYS: Use keyring_alloc() to create special keyrings
    KEYS: Reduce initial permissions on keys
    KEYS: Make the session and process keyrings per-thread
    seccomp: Make syscall skipping and nr changes more consistent
    key: Fix resource leak
    keys: Fix unreachable code
    KEYS: Add payload preparsing opportunity prior to key instantiate or update

    Linus Torvalds
     

15 Oct, 2012

1 commit

  • Pull module signing support from Rusty Russell:
    "module signing is the highlight, but it's an all-over David Howells frenzy..."

    Hmm "Magrathea: Glacier signing key". Somebody has been reading too much HHGTTG.

    * 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (37 commits)
    X.509: Fix indefinite length element skip error handling
    X.509: Convert some printk calls to pr_devel
    asymmetric keys: fix printk format warning
    MODSIGN: Fix 32-bit overflow in X.509 certificate validity date checking
    MODSIGN: Make mrproper should remove generated files.
    MODSIGN: Use utf8 strings in signer's name in autogenerated X.509 certs
    MODSIGN: Use the same digest for the autogen key sig as for the module sig
    MODSIGN: Sign modules during the build process
    MODSIGN: Provide a script for generating a key ID from an X.509 cert
    MODSIGN: Implement module signature checking
    MODSIGN: Provide module signing public keys to the kernel
    MODSIGN: Automatically generate module signing keys if missing
    MODSIGN: Provide Kconfig options
    MODSIGN: Provide gitignore and make clean rules for extra files
    MODSIGN: Add FIPS policy
    module: signature checking hook
    X.509: Add a crypto key parser for binary (DER) X.509 certificates
    MPILIB: Provide a function to read raw data into an MPI
    X.509: Add an ASN.1 decoder
    X.509: Add simple ASN.1 grammar compiler
    ...

    Linus Torvalds
     

08 Oct, 2012

1 commit

  • Give the key type the opportunity to preparse the payload prior to the
    instantiation and update routines being called. This is done with the
    provision of two new key type operations:

    int (*preparse)(struct key_preparsed_payload *prep);
    void (*free_preparse)(struct key_preparsed_payload *prep);

    If the first operation is present, then it is called before key creation (in
    the add/update case) or before the key semaphore is taken (in the update and
    instantiate cases). The second operation is called to clean up if the first
    was called.

    preparse() is given the opportunity to fill in the following structure:

    struct key_preparsed_payload {
        char *description;
        void *type_data[2];
        void *payload;
        const void *data;
        size_t datalen;
        size_t quotalen;
    };

    Before the preparser is called, the first three fields will have been cleared,
    the payload pointer and size will be stored in data and datalen and the default
    quota size from the key_type struct will be stored into quotalen.

    The preparser may parse the payload in any way it likes and may store data in
    the type_data[] and payload fields for use by the instantiate() and update()
    ops.

    The preparser may also propose a description for the key by attaching it as a
    string to the description field. This can be used by passing a NULL or ""
    description to the add_key() system call or the key_create_or_update()
    function. This cannot work with request_key() as that requires the description
    to tell the upcall about the key to be created.

    This, for example, permits keys that store PGP public keys to generate their own
    name from the user ID and public key fingerprint in the key.

    The instantiate() and update() operations are then modified to look like this:

    int (*instantiate)(struct key *key, struct key_preparsed_payload *prep);
    int (*update)(struct key *key, struct key_preparsed_payload *prep);

    and the new payload data is passed in *prep, whether or not it was preparsed.
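
    The calling sequence described above might be modelled in userspace as
    below. Only struct key_preparsed_payload is taken from this message;
    struct toy_key, struct toy_key_type, toy_type and toy_key_create() are
    hypothetical, and the sketch always takes the description proposed by the
    preparser, as if the caller had passed a NULL description.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same fields as the structure quoted above. */
struct key_preparsed_payload {
    char *description;
    void *type_data[2];
    void *payload;
    const void *data;
    size_t datalen;
    size_t quotalen;
};

/* Hypothetical key and key type, reduced to what the sketch needs. */
struct toy_key { char *description; void *payload; };
struct toy_key_type {
    size_t def_datalen;
    int (*preparse)(struct key_preparsed_payload *prep);
    void (*free_preparse)(struct key_preparsed_payload *prep);
    int (*instantiate)(struct toy_key *key, struct key_preparsed_payload *prep);
};

/* Preparser: copies the raw data into payload and proposes a description
 * (here simply the data as a string). */
static int toy_preparse(struct key_preparsed_payload *prep)
{
    char *copy = malloc(prep->datalen + 1);
    char *desc = malloc(prep->datalen + 1);

    if (!copy || !desc) {
        free(copy);
        free(desc);
        return -1;
    }
    memcpy(copy, prep->data, prep->datalen);
    copy[prep->datalen] = '\0';
    memcpy(desc, copy, prep->datalen + 1);
    prep->payload = copy;
    prep->description = desc;
    prep->quotalen = prep->datalen;
    return 0;
}

/* Cleanup: frees whatever the key did not take ownership of. */
static void toy_free_preparse(struct key_preparsed_payload *prep)
{
    free(prep->description);
    free(prep->payload);
}

/* Instantiate: steals the preparsed payload so cleanup will not free it. */
static int toy_instantiate(struct toy_key *key,
                           struct key_preparsed_payload *prep)
{
    key->payload = prep->payload;
    prep->payload = NULL;
    return 0;
}

const struct toy_key_type toy_type = {
    .def_datalen = 0,
    .preparse = toy_preparse,
    .free_preparse = toy_free_preparse,
    .instantiate = toy_instantiate,
};

/* Create path: clear prep, fill data/datalen/quotalen, preparse, adopt the
 * proposed description, instantiate, then always clean up. */
int toy_key_create(const struct toy_key_type *type, struct toy_key *key,
                   const void *data, size_t datalen)
{
    struct key_preparsed_payload prep;
    int ret;

    assert(type != NULL && key != NULL && data != NULL);
    memset(&prep, 0, sizeof(prep));
    prep.data = data;
    prep.datalen = datalen;
    prep.quotalen = type->def_datalen;
    key->description = NULL;
    key->payload = NULL;

    if (type->preparse) {
        ret = type->preparse(&prep);
        if (ret < 0)
            goto out;
    }
    key->description = prep.description;
    prep.description = NULL;
    ret = type->instantiate(key, &prep);
out:
    if (type->preparse)
        type->free_preparse(&prep);
    return ret;
}
```

    Ownership is handed over by setting the stolen pointer in *prep to NULL,
    so the cleanup operation can be called unconditionally afterwards.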

    Signed-off-by: David Howells
    Signed-off-by: Rusty Russell

    David Howells
     

03 Oct, 2012

3 commits

  • Signed-off-by: David Howells

    David Howells
     
  • Reduce the initial permissions on new keys to grant the possessor everything,
    view permission only to the user (so the keys can be seen in /proc/keys) and
    nothing else.

    This gives the creator a chance to adjust the permissions mask before other
    processes can access the new key or create a link to it.

    To aid with this, keyring_alloc() now takes a permission argument rather than
    setting the permissions itself.

    The following permissions are now set:

    (1) The user and user-session keyrings grant the user that owns them full
    permissions and grant a possessor everything bar SETATTR.

    (2) The process and thread keyrings grant the possessor full permissions but
    only grant the user VIEW. This permits the user to see them in
    /proc/keys, but not to do anything with them.

    (3) Anonymous session keyrings grant the possessor full permissions, but only
    grant the user VIEW and READ. This means that the user can see them in
    /proc/keys and can list them, but nothing else. Possibly READ shouldn't
    be provided either.

    (4) Named session keyrings grant everything an anonymous session keyring does,
    plus they grant the user LINK permission. The whole point of named
    session keyrings is that others can also subscribe to them. Possibly this
    should be a separate permission to LINK.

    (5) The temporary session keyring created by call_sbin_request_key() gets the
    same permissions as an anonymous session keyring.

    (6) Keys created by add_key() get VIEW, SEARCH, LINK and SETATTR for the
    possessor, plus READ and/or WRITE if the key type supports them. The user
    only gets VIEW now.

    (7) Keys created by request_key() now get the same as those created by
    add_key().
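
    Case (6) can be sketched as a permission-mask computation. The bit values
    below are written from memory of include/linux/key.h and should be treated
    as assumptions, as should the helper name add_key_default_perm() and its
    two boolean parameters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t key_perm_t;

/* Possessor and user permission bits (values assumed, see lead-in). */
#define KEY_POS_VIEW    0x01000000u
#define KEY_POS_READ    0x02000000u
#define KEY_POS_WRITE   0x04000000u
#define KEY_POS_SEARCH  0x08000000u
#define KEY_POS_LINK    0x10000000u
#define KEY_POS_SETATTR 0x20000000u
#define KEY_POS_ALL     0x3f000000u
#define KEY_USR_VIEW    0x00010000u

/* Default mask for a key created by add_key(): VIEW, SEARCH, LINK and
 * SETATTR for the possessor, READ/WRITE only if the type supports them,
 * and nothing but VIEW for the user. */
key_perm_t add_key_default_perm(bool type_can_read, bool type_can_write)
{
    key_perm_t perm = KEY_POS_VIEW | KEY_POS_SEARCH |
                      KEY_POS_LINK | KEY_POS_SETATTR | KEY_USR_VIEW;

    if (type_can_read)
        perm |= KEY_POS_READ;
    if (type_can_write)
        perm |= KEY_POS_WRITE;

    /* Sanity check: no group or other bits, and no user bit except VIEW. */
    assert((perm & ~(KEY_POS_ALL | KEY_USR_VIEW)) == 0);
    return perm;
}
```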

    Reported-by: Lennart Poettering
    Reported-by: Stef Walter
    Signed-off-by: David Howells

    David Howells
     
  • Pull user namespace changes from Eric Biederman:
    "This is a mostly modest set of changes to enable basic user namespace
    support. This allows the code to compile with user namespaces
    enabled and removes the assumption there is only the initial user
    namespace. Everything is converted except for the most complex of the
    filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
    nfs, ocfs2 and xfs as those patches need a bit more review.

    The strategy is to push kuid_t and kgid_t values as far down into
    subsystems and filesystems as reasonable. Leaving the make_kuid and
    from_kuid operations to happen at the edge of userspace, as the values
    come off the disk, and as the values come in from the network.
    Letting compile type incompatible compile errors (present when user
    namespaces are enabled) guide me to find the issues.

    The most tricky areas have been the places where we had an implicit
    union of uid and gid values and were storing them in an unsigned int.
    Those places were converted into explicit unions. I made certain to
    handle those places with simple trivial patches.

    Out of that work I discovered we have generic interfaces for storing
    quota by projid. I had never heard of the project identifiers before.
    Adding full user namespace support for project identifiers accounts
    for most of the code size growth in my git tree.

    Ultimately there will be work to relax privilege checks from
    "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
    root in a user namespace to do those things that today we only forbid to
    non-root users because it will confuse suid root applications.

    While I was pushing kuid_t and kgid_t changes deep into the audit code
    I made a few other cleanups. I capitalized on the fact we process
    netlink messages in the context of the message sender. I removed
    usage of NETLINK_CRED, and started directly using current->tty.

    Some of these patches have also made it into maintainer trees, with no
    problems from identical code from different trees showing up in
    linux-next.

    After reading through all of this code I feel like I might be able to
    win a game of kernel trivial pursuit."

    Fix up some fairly trivial conflicts in netfilter uid/gid logging code.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
    userns: Convert the ufs filesystem to use kuid/kgid where appropriate
    userns: Convert the udf filesystem to use kuid/kgid where appropriate
    userns: Convert ubifs to use kuid/kgid
    userns: Convert squashfs to use kuid/kgid where appropriate
    userns: Convert reiserfs to use kuid and kgid where appropriate
    userns: Convert jfs to use kuid/kgid where appropriate
    userns: Convert jffs2 to use kuid and kgid where appropriate
    userns: Convert hpfs to use kuid and kgid where appropriate
    userns: Convert btrfs to use kuid/kgid where appropriate
    userns: Convert bfs to use kuid/kgid where appropriate
    userns: Convert affs to use kuid/kgid wherwe appropriate
    userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
    userns: On ia64 deal with current_uid and current_gid being kuid and kgid
    userns: On ppc convert current_uid from a kuid before printing.
    userns: Convert s390 getting uid and gid system calls to use kuid and kgid
    userns: Convert s390 hypfs to use kuid and kgid where appropriate
    userns: Convert binder ipc to use kuids
    userns: Teach security_path_chown to take kuids and kgids
    userns: Add user namespace support to IMA
    userns: Convert EVM to deal with kuids and kgids in it's hmac computation
    ...

    Linus Torvalds
     

14 Sep, 2012

1 commit

  • - Replace key_user ->user_ns equality checks with kuid_has_mapping checks.
    - Use from_kuid to generate key descriptions
    - Use kuid_t and kgid_t and the associated helpers instead of uid_t and gid_t
    - Avoid potential problems with file descriptor passing by displaying
    keys in the user namespace of the opener of key status proc files.

    Cc: linux-security-module@vger.kernel.org
    Cc: keyrings@linux-nfs.org
    Cc: David Howells
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

13 Sep, 2012

1 commit

  • Give the key type the opportunity to preparse the payload prior to the
    instantiation and update routines being called. This is done with the
    provision of two new key type operations:

    int (*preparse)(struct key_preparsed_payload *prep);
    void (*free_preparse)(struct key_preparsed_payload *prep);

    If the first operation is present, then it is called before key creation (in
    the add/update case) or before the key semaphore is taken (in the update and
    instantiate cases). The second operation is called to clean up if the first
    was called.

    preparse() is given the opportunity to fill in the following structure:

    struct key_preparsed_payload {
        char *description;
        void *type_data[2];
        void *payload;
        const void *data;
        size_t datalen;
        size_t quotalen;
    };

    Before the preparser is called, the first three fields will have been cleared,
    the payload pointer and size will be stored in data and datalen and the default
    quota size from the key_type struct will be stored into quotalen.

    The preparser may parse the payload in any way it likes and may store data in
    the type_data[] and payload fields for use by the instantiate() and update()
    ops.

    The preparser may also propose a description for the key by attaching it as a
    string to the description field. This can be used by passing a NULL or ""
    description to the add_key() system call or the key_create_or_update()
    function. This cannot work with request_key() as that requires the description
    to tell the upcall about the key to be created.

    This, for example, permits keys that store PGP public keys to generate their own
    name from the user ID and public key fingerprint in the key.

    The instantiate() and update() operations are then modified to look like this:

    int (*instantiate)(struct key *key, struct key_preparsed_payload *prep);
    int (*update)(struct key *key, struct key_preparsed_payload *prep);

    and the new payload data is passed in *prep, whether or not it was preparsed.

    Signed-off-by: David Howells

    David Howells
     

21 Aug, 2012

1 commit

  • system_nrt[_freezable]_wq are now spurious. Mark them deprecated and
    convert all users to system[_freezable]_wq.

    If you're cc'd and wondering what's going on: Now all workqueues are
    non-reentrant, so there's no reason to use system_nrt[_freezable]_wq.
    Please use system[_freezable]_wq instead.

    This patch doesn't make any functional difference.

    Signed-off-by: Tejun Heo
    Acked-By: Lai Jiangshan

    Cc: Jens Axboe
    Cc: David Airlie
    Cc: Jiri Kosina
    Cc: "David S. Miller"
    Cc: Rusty Russell
    Cc: "Paul E. McKenney"
    Cc: David Howells

    Tejun Heo
     

24 May, 2012

1 commit

  • Pull user namespace enhancements from Eric Biederman:
    "This is a course correction for the user namespace, so that we can
    reach an inexpensive, maintainable, and reasonably complete
    implementation.

    Highlights:
    - Config guards make it impossible to enable the user namespace and
    code that has not been converted to be user namespace safe.

    - Use of the new kuid_t type ensures that if you somehow get past the
    config guards the kernel will encounter type errors if you enable
    user namespaces and attempt to compile in code whose permission
    checks have not been updated to be user namespace safe.

    - All uids from child user namespaces are mapped into the initial
    user namespace before they are processed. Removing the need to add
    an additional check to see if the user namespace of the compared
    uids remains the same.

    - With the user namespaces compiled out the performance is as good or
    better than it is today.

    - For most operations absolutely nothing changes, in performance or
    operationally, with the user namespace enabled.

    - The worst case performance I could come up with was timing 1
    billion cache cold stat operations with the user namespace code
    enabled. This went from 156s to 164s on my laptop (or 156ns to
    164ns per stat operation).

    - (uid_t)-1 and (gid_t)-1 are reserved as an internal error value.
    Most uid/gid setting system calls treat these values specially
    anyway so attempting to use -1 as a uid would likely cause
    entertaining failures in userspace.

    - If setuid is called with a uid that can not be mapped setuid fails.
    I have looked at sendmail, login, ssh and every other program I
    could think of that would call setuid and they all check for and
    handle the case where setuid fails.

    - If stat or a similar system call is called from a context in which
    we can not map a uid we lie and return overflowuid. The LFS
    experience suggests not lying and returning an error code might be
    better, but the historical precedent with uids is different and I
    can not think of anything that would break by lying about a uid we
    can't map.

    - Capabilities are localized to the current user namespace making it
    safe to give the initial user in a user namespace all capabilities.

    My git tree covers all of the modifications needed to convert the core
    kernel and enough changes to make a system bootable to runlevel 1."

    Fix up trivial conflicts due to nearby independent changes in fs/stat.c

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
    userns: Silence silly gcc warning.
    cred: use correct cred accessor with regards to rcu read lock
    userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq
    userns: Convert cgroup permission checks to use uid_eq
    userns: Convert tmpfs to use kuid and kgid where appropriate
    userns: Convert sysfs to use kgid/kuid where appropriate
    userns: Convert sysctl permission checks to use kuid and kgids.
    userns: Convert proc to use kuid/kgid where appropriate
    userns: Convert ext4 to user kuid/kgid where appropriate
    userns: Convert ext3 to use kuid/kgid where appropriate
    userns: Convert ext2 to use kuid/kgid where appropriate.
    userns: Convert devpts to use kuid/kgid where appropriate
    userns: Convert binary formats to use kuid/kgid where appropriate
    userns: Add negative depends on entries to avoid building code that is userns unsafe
    userns: signal remove unnecessary map_cred_ns
    userns: Teach inode_capable to understand inodes whose uids map to other namespaces.
    userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
    userns: Convert stat to return values mapped from kuids and kgids
    userns: Convert user specfied uids and gids in chown into kuids and kgid
    userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
    ...

    Linus Torvalds
     

11 May, 2012

2 commits

  • Add support for invalidating a key - which renders it immediately invisible to
    further searches and causes the garbage collector to immediately wake up,
    remove it from keyrings and then destroy it when it's no longer referenced.

    It's better not to do this with keyctl_revoke() as that marks the key to start
    returning -EKEYREVOKED to searches when what is actually desired is to have the
    key refetched.

    To invalidate a key the caller must be granted SEARCH permission by the key.
    This may be too strict. It may be better to also permit invalidation if the
    caller has any of READ, WRITE or SETATTR permission.

    The primary use for this is to evict keys that are cached in special keyrings,
    such as the DNS resolver or an ID mapper.

    Signed-off-by: David Howells

    David Howells
     
  • Announce the (un)registration of a key type in the core key code rather than
    in the callers.

    Signed-off-by: David Howells
    Acked-by: Mimi Zohar

    David Howells
     

08 Apr, 2012

1 commit


02 Mar, 2012

1 commit


18 Jan, 2012

1 commit

  • For CIFS, we want to be able to store NTLM credentials (aka username
    and password) in the keyring. We do not, however want to allow users
    to fetch those keys back out of the keyring since that would be a
    security risk.

    Unfortunately, due to the nuances of key permission bits, it's not
    possible to do this. We need to grant search permissions so the kernel
    can find these keys, but that also implies permissions to read the
    payload.

    Resolve this by adding a new key_type. This key type is essentially
    the same as key_type_user, but does not define a .read op. This
    prevents the payload from ever being visible from userspace. This
    key type also vets the description to ensure that it's "qualified"
    by checking to ensure that it has a ':' in it that is preceded by
    other characters.
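
    The description check described above amounts to something like the
    following userspace sketch. The function name toy_vet_description() is
    hypothetical; the rule it applies (a ':' must be present and must not be
    the first character) is the one stated in this message.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Accept a description only if it is "qualified": it contains a ':' that is
 * preceded by at least one other character. Returns 0 or -EINVAL. */
int toy_vet_description(const char *desc)
{
    const char *colon;

    assert(desc != NULL);
    colon = strchr(desc, ':');
    if (!colon)
        return -EINVAL;     /* no qualifier separator at all */
    if (colon == desc)
        return -EINVAL;     /* ':' is first, so the qualifier is empty */
    return 0;
}
```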

    Acked-by: David Howells
    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
     

17 Nov, 2011

1 commit


23 Aug, 2011

3 commits

  • unregister_key_type() has code to mark a key as dead and make it unavailable in
    one loop and then destroy all those unavailable key payloads in the next loop.
    However, the loop to mark keys dead renders the key undetectable to the second
    loop by changing the key type pointer also.

    Fix this by the following means:

    (1) The key code has two garbage collectors: one deletes unreferenced keys and
    the other alters keyrings to delete links to old dead, revoked and expired
    keys. They can end up holding each other up as both want to scan the key
    serial tree under spinlock. Combine these into a single routine.

    (2) Move the dead key marking, dead link removal and dead key removal into the
    garbage collector as a three phase process running over the three cycles
    of the normal garbage collection procedure. This is tracked by the
    KEY_GC_REAPING_DEAD_1, _2 and _3 state flags.

    unregister_key_type() then just unlinks the key type from the list, wakes
    up the garbage collector and waits for the third phase to complete.

    (3) Downgrade the key types sem in unregister_key_type() once it has deleted
    the key type from the list so that it doesn't block the keyctl() syscall.

    (4) Dead keys that cannot be simply removed in the third phase have their
    payloads destroyed with the key's semaphore write-locked to prevent
    interference by the keyctl() syscall. There should be no in-kernel users
    of dead keys of that type by the point of unregistration, though keyctl()
    may be holding a reference.

    (5) Only perform timer recalculation in the GC if the timer actually expired.
    If it didn't, we'll get another cycle when it goes off - and if the key
    that actually triggered it has been removed, it's not a problem.

    (6) Only garbage collect links if the timer expired or if we're doing dead key
    clean up phase 2.

    (7) As only key_garbage_collector() is permitted to use rb_erase() on the key
    serial tree, it doesn't need to revalidate its cursor after dropping the
    spinlock as the node the cursor points to must still exist in the tree.

    (8) Drop the spinlock in the GC if there is contention on it or if we need to
    reschedule. After dealing with that, get the spinlock again and resume
    scanning.

    This has been tested in the following ways:

    (1) Run the keyutils testsuite against it.

    (2) Using the AF_RXRPC and RxKAD modules to test keytype removal:

    Load the rxrpc_s key type:

    # insmod /tmp/af-rxrpc.ko
    # insmod /tmp/rxkad.ko

    Create a key (http://people.redhat.com/~dhowells/rxrpc/listen.c):

    # /tmp/listen &
    [1] 8173

    Find the key:

    # grep rxrpc_s /proc/keys
    091086e1 I--Q-- 1 perm 39390000 0 0 rxrpc_s 52:2

    Link it to a session keyring, preferably one with a higher serial number:

    # keyctl link 0x20e36251 @s

    Kill the process (the key should remain as it's linked to another place):

    # fg
    /tmp/listen
    ^C

    Remove the key type:

    # rmmod rxkad
    # rmmod af-rxrpc

    This can be made a more effective test by altering the following part of
    the patch:

    if (unlikely(gc_state & KEY_GC_REAPING_DEAD_2)) {
        /* Make sure everyone revalidates their keys if we marked a
         * bunch as being dead and make sure all keyring ex-payloads
         * are destroyed.
         */
        kdebug("dead sync");
        synchronize_rcu();

    to call synchronize_rcu() in GC phase 1 instead. That causes the
    keyring's old payload content to hang around longer until it's RCU
    destroyed - which usually happens after GC phase 3 is complete. This
    allows the destroy_dead_key branch to be tested.

    Reported-by: Benjamin Coddington
    Signed-off-by: David Howells
    Signed-off-by: James Morris

    David Howells
     
  • Make the key reaper non-reentrant by sticking it on the appropriate system work
    queue when we queue it. This will allow it to have global state and drop
    locks. It should probably be non-reentrant already as it may spend a long time
    holding the key serial spinlock, and so multiple entrants can spend long
    periods of time just sitting there spinning, waiting to get the lock.

    Signed-off-by: David Howells
    Signed-off-by: James Morris

    David Howells
     
  • Move the unreferenced key reaper function to the keys garbage collector file
    as that's a more appropriate place with the dead key link reaper.

    Signed-off-by: David Howells
    Signed-off-by: James Morris

    David Howells
     

08 Mar, 2011

2 commits

  • Add a new keyctl op to reject a key with a specified error code. This works
    much the same as negating a key, and so keyctl_negate_key() is made a special
    case of keyctl_reject_key(). The difference is that keyctl_negate_key()
    selects ENOKEY as the error to be reported.

    Typically the key would be rejected with EKEYEXPIRED, EKEYREVOKED or
    EKEYREJECTED, but this is not mandatory.
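
    The relationship between the two operations can be sketched in user space
    as follows. This is only a model: struct model_key and both model_*
    functions are hypothetical names invented for illustration, and the
    rejection of non-positive error codes is an assumption of the sketch
    rather than something stated above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of the negative state of a key. */
struct model_key {
    int negative;          /* non-zero once the key has been rejected */
    int reject_error;      /* errno value that lookups will report */
    unsigned int timeout;  /* seconds for which the rejection stands */
};

/* Reject a key with a caller-chosen error code. */
static int model_reject_key(struct model_key *key, unsigned int timeout,
                            int error)
{
    assert(key != NULL);
    if (error <= 0)        /* assumption: only positive errno values */
        return -EINVAL;
    key->negative = 1;
    key->reject_error = error;
    key->timeout = timeout;
    return 0;
}

/* Negation is rejection with the error fixed to ENOKEY. */
static int model_negate_key(struct model_key *key, unsigned int timeout)
{
    return model_reject_key(key, timeout, ENOKEY);
}
```

    A caller that wants searches to see EKEYEXPIRED would call
    model_reject_key(&key, 10, EKEYEXPIRED), whereas model_negate_key(&key, 10)
    always records ENOKEY.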

    Signed-off-by: David Howells
    Signed-off-by: James Morris

    David Howells
     
  • Add a key type operation to permit the key type to vet the description of a new
    key that key_alloc() is about to allocate. The operation may reject the
    description if it wishes with an error of its choosing. If it does this, the
    key will not be allocated.
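
    A minimal user-space sketch of such an optional vetting hook is shown
    below. The names struct model_key_type, model_vet_requires_colon() and
    model_key_alloc_check() are hypothetical, and the "description must
    contain a colon" rule is an arbitrary example policy, not one taken from
    any real key type.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key type with an optional description-vetting operation. */
struct model_key_type {
    const char *name;
    /* Return 0 to accept the description or a negative errno to reject. */
    int (*vet_description)(const char *description);
};

/* Example policy: accept only descriptions of the form "prefix:rest". */
static int model_vet_requires_colon(const char *description)
{
    return strchr(description, ':') ? 0 : -EINVAL;
}

/* The check an allocator would run before allocating anything. */
static int model_key_alloc_check(const struct model_key_type *type,
                                 const char *description)
{
    assert(type != NULL);
    if (!description || !*description)
        return -EINVAL;
    if (type->vet_description)
        return type->vet_description(description);
    return 0;   /* types without the hook accept any description */
}
```

    Because the hook runs before allocation, a rejected description costs
    nothing to clean up: the caller simply propagates the error.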

    Signed-off-by: David Howells
    Reviewed-by: Mimi Zohar
    Signed-off-by: James Morris

    David Howells
     

26 Jan, 2011

1 commit

  • Fix __key_link_end()'s attempt to fix up the quota if an error occurs.

    There are two erroneous cases: Firstly, we always decrease the quota if
    the preallocated replacement keyring needs cleaning up, irrespective of
    whether or not we should (we may have replaced a pointer rather than
    adding another pointer).

    Secondly, we never clean up the quota if we added a pointer without the
    keyring storage being extended (we allocate multiple pointers at a time,
    even if we're not going to use them all immediately).

    We handle this by setting the bottom bit of the preallocation pointer in
    __key_link_begin() to indicate that the quota needs fixing up, which is
    then passed to __key_link() (which clears the whole thing) and
    __key_link_end().
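
    The bottom-bit trick can be illustrated with a small user-space sketch.
    It assumes the pointed-to object is aligned to at least two bytes so that
    bit 0 of its address is always zero; struct prealloc_list and the
    prealloc_*() helpers are hypothetical names, not the kernel's own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the preallocated replacement keyring storage. */
struct prealloc_list {
    unsigned int maxkeys;
};

#define PREALLOC_QUOTA_FLAG ((uintptr_t)1)

/* Pack the pointer and a "quota needs fixing up" flag into one word. */
static uintptr_t prealloc_pack(struct prealloc_list *p, bool fix_quota)
{
    /* Bit 0 must be free, which holds for any suitably aligned object. */
    assert(((uintptr_t)p & PREALLOC_QUOTA_FLAG) == 0);
    return (uintptr_t)p | (fix_quota ? PREALLOC_QUOTA_FLAG : 0);
}

/* Recover the real pointer by masking the flag off. */
static struct prealloc_list *prealloc_ptr(uintptr_t word)
{
    return (struct prealloc_list *)(word & ~PREALLOC_QUOTA_FLAG);
}

/* Test whether the flag was set when the word was packed. */
static bool prealloc_needs_quota_fixup(uintptr_t word)
{
    return (word & PREALLOC_QUOTA_FLAG) != 0;
}
```

    Carrying the flag inside the pointer word means the begin, link and end
    steps can share one argument instead of growing an extra parameter.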

    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     

22 Jan, 2011

2 commits

  • Fix up comments in the key management code. No functional changes.

    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Do a bit of a style clean up in the key management code. No functional
    changes.

    Done using:

    perl -p -i -e 's!^/[*]*/\n!!' security/keys/*.c
    perl -p -i -e 's!} /[*] end [a-z0-9_]*[(][)] [*]/\n!}\n!' security/keys/*.c
    sed -i -s -e ": next" -e N -e 's/^\n[}]$/}/' -e t -e P -e 's/^.*\n//' -e "b next" security/keys/*.c

    These commands remove /*****/ lines, remove the comments on a function's
    closing brace that name the function, and remove blank lines before a
    function's closing brace.

    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     

06 May, 2010

1 commit

  • Do preallocation for __key_link() so that the various callers in request_key.c
    can deal with any errors from this source before attempting to construct a key.
    This allows them to assume that the actual linkage step is guaranteed to be
    successful.
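
    The split between a fallible reservation step and an infallible linking
    step can be sketched in user space as below. This is a simplified model
    under stated assumptions: a keyring is reduced to a growable array of
    serial numbers, and struct model_keyring, model_link_begin() and
    model_link() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical keyring: a growable array of key serial numbers. */
struct model_keyring {
    int *keys;
    size_t nkeys;
    size_t maxkeys;
};

/* Reserve room for one more link. All allocation failure happens here. */
static int model_link_begin(struct model_keyring *kr)
{
    size_t newmax;
    int *p;

    if (kr->nkeys < kr->maxkeys)
        return 0;                     /* a spare slot already exists */
    newmax = kr->maxkeys ? kr->maxkeys * 2 : 4;
    p = realloc(kr->keys, newmax * sizeof(*p));
    if (!p)
        return -ENOMEM;
    kr->keys = p;
    kr->maxkeys = newmax;
    return 0;
}

/* Use the reserved slot. This cannot fail once begin has succeeded. */
static void model_link(struct model_keyring *kr, int serial)
{
    assert(kr->nkeys < kr->maxkeys);
    kr->keys[kr->nkeys++] = serial;
}
```

    A caller checks the result of model_link_begin() before doing any
    expensive construction work, and only then calls model_link().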

    Signed-off-by: David Howells
    Signed-off-by: James Morris

    David Howells
     

23 Apr, 2010

1 commit


15 Sep, 2009

1 commit

  • Fix a number of problems with the new key garbage collector:

    (1) A rogue semicolon in keyring_gc() was causing the initial count of dead
    keys to be miscalculated.

    (2) A missing return in keyring_gc() meant that under certain circumstances,
    the keyring semaphore would be unlocked twice.

    (3) The key serial tree iterator (key_garbage_collector()) part of the garbage
    collector has been modified to:

    (a) Complete each scan of the keyrings before setting the new timer.

    (b) Only set the new timer for keys that have yet to expire. This means
    that the new timer is now calculated correctly, and the gc doesn't
    get into a loop continually scanning for keys that have expired, and
    preventing other things from happening, like RCU cleaning up the old
    keyring contents.

    (c) Perform an extra scan if any keys were garbage collected in this one
    as a key might become garbage during a scan, and (b) could mean we
    don't set the timer again.

    (4) Made key_schedule_gc() take the time at which to do a collection run,
    rather than the time at which the key expires. This means the collection
    of dead keys (key type unregistered) can happen immediately.

    Signed-off-by: David Howells
    Signed-off-by: James Morris

    David Howells
     

02 Sep, 2009

3 commits

  • Add garbage collection for dead, revoked and expired keys. This involved
    erasing all links to such keys from keyrings that point to them. At that
    point, the key will be deleted in the normal manner.

    Keyrings from which garbage collection occurs are shrunk and their quota
    consumption reduced as appropriate.

    Dead keys (for which the key type has been removed) will be garbage collected
    immediately.

    Revoked and expired keys will hang around for a number of seconds, as set in
    /proc/sys/kernel/keys/gc_delay, before being automatically removed. The
    default is 5 minutes.

    Signed-off-by: David Howells
    Signed-off-by: James Morris

    David Howells
     
  • Set the KEY_FLAG_DEAD flag on keys for which the type has been removed. This
    causes the key_permission() function to return EKEYREVOKED in response to
    various commands. It does not, however, prevent unlinking or clearing of
    keyrings from detaching the key.

    Signed-off-by: David Howells
    Acked-by: Serge Hallyn
    Signed-off-by: James Morris

    David Howells
     
  • Allow keys for which the key type has been removed to be unlinked. Currently
    dead-type keys can only be disposed of by completely clearing the keyrings
    that point to them.

    Signed-off-by: David Howells
    Acked-by: Serge Hallyn
    Signed-off-by: James Morris

    David Howells
     

27 Feb, 2009

1 commit

  • Per-uid keys were looked up by uid only. Use the user namespace
    to distinguish the same uid in different namespaces.
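
    The effect on lookup can be sketched as a two-part key comparison. The
    linear array here is an assumption made to keep the example short, and
    struct model_user_ns, struct model_key_user and model_key_user_lookup()
    are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user namespace; identity is the address of the object. */
struct model_user_ns {
    int id;
};

/* Hypothetical per-user key accounting record. */
struct model_key_user {
    unsigned int uid;
    const struct model_user_ns *user_ns;
    int nkeys;
};

/* Find the record matching both the uid and the namespace it belongs to. */
static struct model_key_user *
model_key_user_lookup(struct model_key_user *table, size_t n,
                      unsigned int uid, const struct model_user_ns *ns)
{
    size_t i;

    assert(table != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (table[i].uid == uid && table[i].user_ns == ns)
            return &table[i];
    }
    return NULL;
}
```

    With this comparison, uid 1000 in one namespace and uid 1000 in another
    resolve to different records, so their key counts are tracked separately.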

    This does not address key_permission. So a task can for instance
    try to join a keyring owned by the same uid in another namespace.
    That will be handled by a separate patch.

    Signed-off-by: Serge E. Hallyn
    Acked-by: David Howells
    Signed-off-by: James Morris

    Serge E. Hallyn
     

14 Nov, 2008

2 commits

  • Inaugurate copy-on-write credentials management. This uses RCU to manage the
    credentials pointer in the task_struct with respect to accesses by other tasks.
    A process may only modify its own credentials, and so does not need locking to
    access or modify its own credentials.

    A mutex (cred_replace_mutex) is added to the task_struct to control the effect
    of PTRACE_ATTACHED on credential calculations, particularly with respect to
    execve().

    With this patch, the contents of an active credentials struct may not be
    changed directly; rather a new set of credentials must be prepared, modified
    and committed using something like the following sequence of events:

        struct cred *new = prepare_creds();
        int ret = blah(new);
        if (ret < 0) {
                abort_creds(new);
                return ret;
        }
        return commit_creds(new);

    There are some exceptions to this rule: the keyrings pointed to by the active
    credentials may be instantiated - keyrings violate the COW rule as managing
    COW keyrings is tricky, given that it is possible for a task to directly alter
    the keys in a keyring in use by another task.

    To help enforce this, various pointers to sets of credentials, such as those in
    the task_struct, are declared const. The purpose of this is compile-time
    discouragement of altering credentials through those pointers. Once a set of
    credentials has been made public through one of these pointers, it may not be
    modified, except under special circumstances:

    (1) Its reference count may be incremented and decremented.

    (2) The keyrings to which it points may be modified, but not replaced.

    The only safe way to modify anything else is to create a replacement and commit
    using the functions described in Documentation/credentials.txt (which will be
    added by a later patch).
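
    The prepare, modify, commit-or-abort discipline can be modelled in user
    space as below. This is a single-threaded sketch with no RCU, and every
    model_* name is hypothetical; it only shows that the published pointer is
    const and is swapped whole rather than edited in place.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, much-reduced credentials record. */
struct model_cred {
    int usage;             /* reference count */
    unsigned int uid;
    unsigned int gid;
};

/* The published credentials. Const discourages in-place modification. */
static const struct model_cred *current_cred_ptr;

static void model_put_cred(const struct model_cred *c)
{
    struct model_cred *m = (struct model_cred *)c;

    assert(m->usage > 0);
    if (--m->usage == 0)
        free(m);
}

static int model_init_creds(unsigned int uid, unsigned int gid)
{
    struct model_cred *c = malloc(sizeof(*c));

    if (!c)
        return -ENOMEM;
    c->usage = 1;
    c->uid = uid;
    c->gid = gid;
    current_cred_ptr = c;
    return 0;
}

/* Make a private, writable copy of the current credentials. */
static struct model_cred *model_prepare_creds(void)
{
    struct model_cred *new = malloc(sizeof(*new));

    if (!new)
        return NULL;
    *new = *current_cred_ptr;
    new->usage = 1;
    return new;
}

/* Discard a copy that was never published. */
static void model_abort_creds(struct model_cred *new)
{
    model_put_cred(new);
}

/* Publish the copy and drop the reference to the old credentials. */
static int model_commit_creds(struct model_cred *new)
{
    const struct model_cred *old = current_cred_ptr;

    current_cred_ptr = new;
    model_put_cred(old);
    return 0;
}

/* Example user: change the uid, aborting on an invalid value. */
static int model_setuid(unsigned int uid)
{
    struct model_cred *new = model_prepare_creds();

    if (!new)
        return -ENOMEM;
    if (uid == (unsigned int)-1) {
        model_abort_creds(new);
        return -EINVAL;
    }
    new->uid = uid;
    return model_commit_creds(new);
}
```

    A failed model_setuid() leaves current_cred_ptr untouched, which is the
    property the const pointers are meant to protect.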

    This patch and the preceding patches have been tested with the LTP SELinux
    testsuite.

    This patch makes several logical sets of alteration:

    (1) execve().

    This now prepares and commits credentials in various places in the
    security code rather than altering the current creds directly.

    (2) Temporary credential overrides.

    do_coredump() and sys_faccessat() now prepare their own credentials and
    temporarily override the ones currently on the acting thread, whilst
    preventing interference from other threads by holding cred_replace_mutex
    on the thread being dumped.

    This will be replaced in a future patch by something that hands down the
    credentials directly to the functions being called, rather than altering
    the task's objective credentials.

    (3) LSM interface.

    A number of functions have been changed, added or removed:

    (*) security_capset_check(), ->capset_check()
    (*) security_capset_set(), ->capset_set()

    Removed in favour of security_capset().

    (*) security_capset(), ->capset()

    New. This is passed a pointer to the new creds, a pointer to the old
    creds and the proposed capability sets. It should fill in the new
    creds or return an error. All pointers, barring the pointer to the
    new creds, are now const.

    (*) security_bprm_apply_creds(), ->bprm_apply_creds()

    Changed; now returns a value, which will cause the process to be
    killed if it's an error.

    (*) security_task_alloc(), ->task_alloc_security()

    Removed in favour of security_prepare_creds().

    (*) security_cred_free(), ->cred_free()

    New. Free security data attached to cred->security.

    (*) security_prepare_creds(), ->cred_prepare()

    New. Duplicate any security data attached to cred->security.

    (*) security_commit_creds(), ->cred_commit()

    New. Apply any security effects for the upcoming installation of new
    security by commit_creds().

    (*) security_task_post_setuid(), ->task_post_setuid()

    Removed in favour of security_task_fix_setuid().

    (*) security_task_fix_setuid(), ->task_fix_setuid()

    Fix up the proposed new credentials for setuid(). This is used by
    cap_set_fix_setuid() to implicitly adjust capabilities in line with
    setuid() changes. Changes are made to the new credentials, rather
    than the task itself as in security_task_post_setuid().

    (*) security_task_reparent_to_init(), ->task_reparent_to_init()

    Removed. Instead the task being reparented to init is referred
    directly to init's credentials.

    NOTE! This results in the loss of some state: SELinux's osid no
    longer records the sid of the thread that forked it.

    (*) security_key_alloc(), ->key_alloc()
    (*) security_key_permission(), ->key_permission()

    Changed. These now take cred pointers rather than task pointers to
    refer to the security context.

    (4) sys_capset().

    This has been simplified and uses less locking. The LSM functions it
    calls have been merged.

    (5) reparent_to_kthreadd().

    This gives the current thread the same credentials as init by simply using
    commit_thread() to point that way.

    (6) __sigqueue_alloc() and switch_uid()

    __sigqueue_alloc() can't stop the target task from changing its creds
    beneath it, so this function gets a reference to the currently applicable
    user_struct which it then passes into the sigqueue struct it returns if
    successful.

    switch_uid() is now called from commit_creds(), and possibly should be
    folded into that. commit_creds() should take care of protecting
    __sigqueue_alloc().

    (7) [sg]et[ug]id() and co and [sg]et_current_groups.

    The set functions now all use prepare_creds(), commit_creds() and
    abort_creds() to build and check a new set of credentials before applying
    it.

    security_task_set[ug]id() is called inside the prepared section. This
    guarantees that nothing else will affect the creds until we've finished.

    The calling of set_dumpable() has been moved into commit_creds().

    Much of the functionality of set_user() has been moved into
    commit_creds().

    The get functions all simply access the data directly.

    (8) security_task_prctl() and cap_task_prctl().

    security_task_prctl() has been modified to return -ENOSYS if it doesn't
    want to handle a function, or otherwise return the return value directly
    rather than through an argument.

    Additionally, cap_task_prctl() now prepares a new set of credentials, even
    if it doesn't end up using it.
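
    The -ENOSYS convention in item (8) can be sketched as a two-level
    dispatch. The option numbers, the model_keepcaps state and both model_*
    functions are hypothetical; the sketch only shows how "not handled" is
    told apart from a real result.

```c
#include <assert.h>   /* for the caller-side checks in the usage note */
#include <errno.h>

/* Hypothetical option numbers for the sketch. */
enum {
    MODEL_PR_GET_KEEPCAPS = 1,
    MODEL_PR_SET_KEEPCAPS = 2,
    MODEL_PR_GENERIC      = 3,
};

static int model_keepcaps;

/* Security hook: returns its result directly, or -ENOSYS if not handled. */
static int model_security_task_prctl(int option, unsigned long arg2)
{
    switch (option) {
    case MODEL_PR_GET_KEEPCAPS:
        return model_keepcaps;
    case MODEL_PR_SET_KEEPCAPS:
        if (arg2 > 1)
            return -EINVAL;
        model_keepcaps = (int)arg2;
        return 0;
    default:
        return -ENOSYS;   /* let the generic code deal with it */
    }
}

/* Generic entry point: give the hook first refusal on every option. */
static int model_prctl(int option, unsigned long arg2)
{
    int ret = model_security_task_prctl(option, arg2);

    if (ret != -ENOSYS)
        return ret;
    switch (option) {
    case MODEL_PR_GENERIC:
        return 7;         /* arbitrary value standing in for real work */
    default:
        return -EINVAL;
    }
}
```

    For example, assert(model_prctl(MODEL_PR_GENERIC, 0) == 7) passes because
    the hook declines that option, while an option neither layer knows
    yields -EINVAL from the generic code.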

    (9) Keyrings.

    A number of changes have been made to the keyrings code:

    (a) switch_uid_keyring(), copy_keys(), exit_keys() and suid_keys() have
    all been dropped and built in to the credentials functions directly.
    They may want separating out again later.

    (b) key_alloc() and search_process_keyrings() now take a cred pointer
    rather than a task pointer to specify the security context.

    (c) copy_creds() gives a new thread within the same thread group a new
    thread keyring if its parent had one, otherwise it discards the thread
    keyring.

    (d) The authorisation key now points directly to the credentials to extend
    the search into rather than pointing to the task that carries them.

    (e) Installing thread, process or session keyrings causes a new set of
    credentials to be created, even though it's not strictly necessary for
    process or session keyrings (they're shared).

    (10) Usermode helper.

    The usermode helper code now carries a cred struct pointer in its
    subprocess_info struct instead of a new session keyring pointer. This set
    of credentials is derived from init_cred and installed on the new process
    after it has been cloned.

    call_usermodehelper_setup() allocates the new credentials and
    call_usermodehelper_freeinfo() discards them if they haven't been used. A
    special cred function (prepare_usermodeinfo_creds()) is provided
    specifically for call_usermodehelper_setup() to call.

    call_usermodehelper_setkeys() adjusts the credentials to sport the
    supplied keyring as the new session keyring.

    (11) SELinux.

    SELinux has a number of changes, in addition to those to support the LSM
    interface changes mentioned above:

    (a) selinux_setprocattr() no longer does its check for whether the
    current ptracer can access processes with the new SID inside the lock
    that covers getting the ptracer's SID. Whilst this lock ensures that
    the check is done with the ptracer pinned, the result is only valid
    until the lock is released, so there's no point doing it inside the
    lock.

    (12) is_single_threaded().

    This function has been extracted from selinux_setprocattr() and put into
    a file of its own in the lib/ directory as join_session_keyring() now
    wants to use it too.

    The code in SELinux just checked to see whether a task shared mm_structs
    with other tasks (CLONE_VM), but that isn't good enough. We really want
    to know if they're part of the same thread group (CLONE_THREAD).

    (13) nfsd.

    The NFS server daemon now has to use the COW credentials to set the
    credentials it is going to use. It really needs to pass the credentials
    down to the functions it calls, but it can't do that until other patches
    in this series have been applied.

    Signed-off-by: David Howells
    Acked-by: James Morris
    Signed-off-by: James Morris

    David Howells
     
  • Wrap access to task credentials so that they can be separated more easily from
    the task_struct during the introduction of COW creds.

    Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

    Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
    sense to use RCU directly rather than a convenient wrapper; these will be
    addressed by later patches.

    Signed-off-by: David Howells
    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: James Morris

    David Howells