24 Sep, 2013

1 commit

  • Add support for per-user_namespace registers of persistent per-UID kerberos
    caches held within the kernel.

    This allows the kerberos cache to be retained beyond the life of all a user's
    processes so that the user's cron jobs can work.

    The kerberos cache is envisioned as a keyring/key tree looking something like:

    struct user_namespace
    \___ .krb_cache keyring - The register
    \___ _krb.0 keyring - Root's Kerberos cache
    \___ _krb.5000 keyring - User 5000's Kerberos cache
    \___ _krb.5001 keyring - User 5001's Kerberos cache
    \___ tkt785 big_key - A ccache blob
    \___ tkt12345 big_key - Another ccache blob

    Or possibly:

    struct user_namespace
    \___ .krb_cache keyring - The register
    \___ _krb.0 keyring - Root's Kerberos cache
    \___ _krb.5000 keyring - User 5000's Kerberos cache
    \___ _krb.5001 keyring - User 5001's Kerberos cache
    \___ tkt785 keyring - A ccache
    \___ krbtgt/REDHAT.COM@REDHAT.COM big_key
    \___ http/REDHAT.COM@REDHAT.COM user
    \___ afs/REDHAT.COM@REDHAT.COM user
    \___ nfs/REDHAT.COM@REDHAT.COM user
    \___ krbtgt/KERNEL.ORG@KERNEL.ORG big_key
    \___ http/KERNEL.ORG@KERNEL.ORG big_key

    What goes into a particular Kerberos cache is entirely up to userspace. Kernel
    support is limited to giving you the Kerberos cache keyring that you want.

    The user asks for their Kerberos cache by:

    krb_cache = keyctl_get_krbcache(uid, dest_keyring);

    The uid is -1 or the user's own UID for the user's own cache or the uid of some
    other user's cache (requires CAP_SETUID). This permits rpc.gssd or whatever to
    mess with the cache.

    The cache returned is a keyring named "_krb." that the possessor can read,
    search, clear, invalidate, unlink from and add links to. Active LSMs get a
    chance to rule on whether the caller is permitted to make a link.

    Each uid's cache keyring is created when it first accessed and is given a
    timeout that is extended each time this function is called so that the keyring
    goes away after a while. The timeout is configurable by sysctl but defaults to
    three days.

    Each user_namespace struct gets a lazily-created keyring that serves as the
    register. The cache keyrings are added to it. This means that standard key
    search and garbage collection facilities are available.

    The user_namespace struct's register goes away when it does and anything left
    in it is then automatically gc'd.

    Signed-off-by: David Howells
    Tested-by: Simo Sorce
    cc: Serge E. Hallyn
    cc: Eric W. Biederman

    David Howells
     

08 May, 2013

1 commit

  • Faster kernel compiles by way of fewer unnecessary includes.

    [akpm@linux-foundation.org: fix fallout]
    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     

17 Dec, 2012

1 commit

  • Pull security subsystem updates from James Morris:
    "A quiet cycle for the security subsystem with just a few maintenance
    updates."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
    Smack: create a sysfs mount point for smackfs
    Smack: use select not depends in Kconfig
    Yama: remove locking from delete path
    Yama: add RCU to drop read locking
    drivers/char/tpm: remove tasklet and cleanup
    KEYS: Use keyring_alloc() to create special keyrings
    KEYS: Reduce initial permissions on keys
    KEYS: Make the session and process keyrings per-thread
    seccomp: Make syscall skipping and nr changes more consistent
    key: Fix resource leak
    keys: Fix unreachable code
    KEYS: Add payload preparsing opportunity prior to key instantiate or update

    Linus Torvalds
     

15 Oct, 2012

1 commit

  • Pull module signing support from Rusty Russell:
    "module signing is the highlight, but it's an all-over David Howells frenzy..."

    Hmm "Magrathea: Glacier signing key". Somebody has been reading too much HHGTTG.

    * 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (37 commits)
    X.509: Fix indefinite length element skip error handling
    X.509: Convert some printk calls to pr_devel
    asymmetric keys: fix printk format warning
    MODSIGN: Fix 32-bit overflow in X.509 certificate validity date checking
    MODSIGN: Make mrproper should remove generated files.
    MODSIGN: Use utf8 strings in signer's name in autogenerated X.509 certs
    MODSIGN: Use the same digest for the autogen key sig as for the module sig
    MODSIGN: Sign modules during the build process
    MODSIGN: Provide a script for generating a key ID from an X.509 cert
    MODSIGN: Implement module signature checking
    MODSIGN: Provide module signing public keys to the kernel
    MODSIGN: Automatically generate module signing keys if missing
    MODSIGN: Provide Kconfig options
    MODSIGN: Provide gitignore and make clean rules for extra files
    MODSIGN: Add FIPS policy
    module: signature checking hook
    X.509: Add a crypto key parser for binary (DER) X.509 certificates
    MPILIB: Provide a function to read raw data into an MPI
    X.509: Add an ASN.1 decoder
    X.509: Add simple ASN.1 grammar compiler
    ...

    Linus Torvalds
     

08 Oct, 2012

1 commit

  • Give the key type the opportunity to preparse the payload prior to the
    instantiation and update routines being called. This is done with the
    provision of two new key type operations:

    int (*preparse)(struct key_preparsed_payload *prep);
    void (*free_preparse)(struct key_preparsed_payload *prep);

    If the first operation is present, then it is called before key creation (in
    the add/update case) or before the key semaphore is taken (in the update and
    instantiate cases). The second operation is called to clean up if the first
    was called.

    preparse() is given the opportunity to fill in the following structure:

    struct key_preparsed_payload {
    char *description;
    void *type_data[2];
    void *payload;
    const void *data;
    size_t datalen;
    size_t quotalen;
    };

    Before the preparser is called, the first three fields will have been cleared,
    the payload pointer and size will be stored in data and datalen and the default
    quota size from the key_type struct will be stored into quotalen.

    The preparser may parse the payload in any way it likes and may store data in
    the type_data[] and payload fields for use by the instantiate() and update()
    ops.

    The preparser may also propose a description for the key by attaching it as a
    string to the description field. This can be used by passing a NULL or ""
    description to the add_key() system call or the key_create_or_update()
    function. This cannot work with request_key() as that required the description
    to tell the upcall about the key to be created.

    This, for example permits keys that store PGP public keys to generate their own
    name from the user ID and public key fingerprint in the key.

    The instantiate() and update() operations are then modified to look like this:

    int (*instantiate)(struct key *key, struct key_preparsed_payload *prep);
    int (*update)(struct key *key, struct key_preparsed_payload *prep);

    and the new payload data is passed in *prep, whether or not it was preparsed.

    Signed-off-by: David Howells
    Signed-off-by: Rusty Russell

    David Howells
     

03 Oct, 2012

3 commits

  • Signed-off-by: David Howells

    David Howells
     
  • Make the session keyring per-thread rather than per-process, but still
    inherited from the parent thread to solve a problem with PAM and gdm.

    The problem is that join_session_keyring() will reject attempts to change the
    session keyring of a multithreaded program but gdm is now multithreaded before
    it gets to the point of starting PAM and running pam_keyinit to create the
    session keyring. See:

    https://bugs.freedesktop.org/show_bug.cgi?id=49211

    The reason that join_session_keyring() will only change the session keyring
    under a single-threaded environment is that it's hard to alter the other
    thread's credentials to effect the change in a multi-threaded program. The
    problems are such as:

    (1) How to prevent two threads both running join_session_keyring() from
    racing.

    (2) Another thread's credentials may not be modified directly by this process.

    (3) The number of threads is uncertain whilst we're not holding the
    appropriate spinlock, making preallocation slightly tricky.

    (4) We could use TIF_NOTIFY_RESUME and key_replace_session_keyring() to get
    another thread to replace its keyring, but that means preallocating for
    each thread.

    A reasonable way around this is to make the session keyring per-thread rather
    than per-process and just document that if you want a common session keyring,
    you must get it before you spawn any threads - which is the current situation
    anyway.

    Whilst we're at it, we can the process keyring behave in the same way. This
    means we can clean up some of the ickyness in the creds code.

    Basically, after this patch, the session, process and thread keyrings are about
    inheritance rules only and not about sharing changes of keyring.

    Reported-by: Mantas M.
    Signed-off-by: David Howells
    Tested-by: Ray Strode

    David Howells
     
  • Pull user namespace changes from Eric Biederman:
    "This is a mostly modest set of changes to enable basic user namespace
    support. This allows the code to code to compile with user namespaces
    enabled and removes the assumption there is only the initial user
    namespace. Everything is converted except for the most complex of the
    filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
    nfs, ocfs2 and xfs as those patches need a bit more review.

    The strategy is to push kuid_t and kgid_t values are far down into
    subsystems and filesystems as reasonable. Leaving the make_kuid and
    from_kuid operations to happen at the edge of userspace, as the values
    come off the disk, and as the values come in from the network.
    Letting compile type incompatible compile errors (present when user
    namespaces are enabled) guide me to find the issues.

    The most tricky areas have been the places where we had an implicit
    union of uid and gid values and were storing them in an unsigned int.
    Those places were converted into explicit unions. I made certain to
    handle those places with simple trivial patches.

    Out of that work I discovered we have generic interfaces for storing
    quota by projid. I had never heard of the project identifiers before.
    Adding full user namespace support for project identifiers accounts
    for most of the code size growth in my git tree.

    Ultimately there will be work to relax privlige checks from
    "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
    root in a user names to do those things that today we only forbid to
    non-root users because it will confuse suid root applications.

    While I was pushing kuid_t and kgid_t changes deep into the audit code
    I made a few other cleanups. I capitalized on the fact we process
    netlink messages in the context of the message sender. I removed
    usage of NETLINK_CRED, and started directly using current->tty.

    Some of these patches have also made it into maintainer trees, with no
    problems from identical code from different trees showing up in
    linux-next.

    After reading through all of this code I feel like I might be able to
    win a game of kernel trivial pursuit."

    Fix up some fairly trivial conflicts in netfilter uid/git logging code.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
    userns: Convert the ufs filesystem to use kuid/kgid where appropriate
    userns: Convert the udf filesystem to use kuid/kgid where appropriate
    userns: Convert ubifs to use kuid/kgid
    userns: Convert squashfs to use kuid/kgid where appropriate
    userns: Convert reiserfs to use kuid and kgid where appropriate
    userns: Convert jfs to use kuid/kgid where appropriate
    userns: Convert jffs2 to use kuid and kgid where appropriate
    userns: Convert hpfs to use kuid and kgid where appropriate
    userns: Convert btrfs to use kuid/kgid where appropriate
    userns: Convert bfs to use kuid/kgid where appropriate
    userns: Convert affs to use kuid/kgid wherwe appropriate
    userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
    userns: On ia64 deal with current_uid and current_gid being kuid and kgid
    userns: On ppc convert current_uid from a kuid before printing.
    userns: Convert s390 getting uid and gid system calls to use kuid and kgid
    userns: Convert s390 hypfs to use kuid and kgid where appropriate
    userns: Convert binder ipc to use kuids
    userns: Teach security_path_chown to take kuids and kgids
    userns: Add user namespace support to IMA
    userns: Convert EVM to deal with kuids and kgids in it's hmac computation
    ...

    Linus Torvalds
     

28 Sep, 2012

1 commit


14 Sep, 2012

1 commit

  • - Replace key_user ->user_ns equality checks with kuid_has_mapping checks.
    - Use from_kuid to generate key descriptions
    - Use kuid_t and kgid_t and the associated helpers instead of uid_t and gid_t
    - Avoid potential problems with file descriptor passing by displaying
    keys in the user namespace of the opener of key status proc files.

    Cc: linux-security-module@vger.kernel.org
    Cc: keyrings@linux-nfs.org
    Cc: David Howells
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

13 Sep, 2012

2 commits

  • This reverts commit d35abdb28824cf74f0a106a0f9c6f3ff700a35bf.

    task_lock() was added to ensure exit_mm() and thus exit_task_work() is
    not possible before task_work_add().

    This is wrong, task_lock() must not be nested with write_lock(tasklist).
    And this is no longer needed, task_work_add() now fails if it is called
    after exit_task_work().

    Reported-by: Dave Jones
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra
    Cc: Al Viro
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20120826191214.GA4231@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • Give the key type the opportunity to preparse the payload prior to the
    instantiation and update routines being called. This is done with the
    provision of two new key type operations:

    int (*preparse)(struct key_preparsed_payload *prep);
    void (*free_preparse)(struct key_preparsed_payload *prep);

    If the first operation is present, then it is called before key creation (in
    the add/update case) or before the key semaphore is taken (in the update and
    instantiate cases). The second operation is called to clean up if the first
    was called.

    preparse() is given the opportunity to fill in the following structure:

    struct key_preparsed_payload {
    char *description;
    void *type_data[2];
    void *payload;
    const void *data;
    size_t datalen;
    size_t quotalen;
    };

    Before the preparser is called, the first three fields will have been cleared,
    the payload pointer and size will be stored in data and datalen and the default
    quota size from the key_type struct will be stored into quotalen.

    The preparser may parse the payload in any way it likes and may store data in
    the type_data[] and payload fields for use by the instantiate() and update()
    ops.

    The preparser may also propose a description for the key by attaching it as a
    string to the description field. This can be used by passing a NULL or ""
    description to the add_key() system call or the key_create_or_update()
    function. This cannot work with request_key() as that required the description
    to tell the upcall about the key to be created.

    This, for example permits keys that store PGP public keys to generate their own
    name from the user ID and public key fingerprint in the key.

    The instantiate() and update() operations are then modified to look like this:

    int (*instantiate)(struct key *key, struct key_preparsed_payload *prep);
    int (*update)(struct key *key, struct key_preparsed_payload *prep);

    and the new payload data is passed in *prep, whether or not it was preparsed.

    Signed-off-by: David Howells

    David Howells
     

24 Jul, 2012

1 commit

  • Pull security subsystem updates from James Morris:
    "Nothing groundbreaking for this kernel, just cleanups and fixes, and a
    couple of Smack enhancements."

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (21 commits)
    Smack: Maintainer Record
    Smack: don't show empty rules when /smack/load or /smack/load2 is read
    Smack: user access check bounds
    Smack: onlycap limits on CAP_MAC_ADMIN
    Smack: fix smack_new_inode bogosities
    ima: audit is compiled only when enabled
    ima: ima_initialized is set only if successful
    ima: add policy for pseudo fs
    ima: remove unused cleanup functions
    ima: free securityfs violations file
    ima: use full pathnames in measurement list
    security: Fix nommu build.
    samples: seccomp: add .gitignore for untracked executables
    tpm: check the chip reference before using it
    TPM: fix memleak when register hardware fails
    TPM: chip disabled state erronously being reported as error
    MAINTAINERS: TPM maintainers' contacts update
    Merge branches 'next-queue' and 'next' into next
    Remove unused code from MPI library
    Revert "crypto: GnuPG based MPI lib - additional sources (part 4)"
    ...

    Linus Torvalds
     

23 Jul, 2012

3 commits


10 Jun, 2012

1 commit


01 Jun, 2012

3 commits

  • Pull second pile of signal handling patches from Al Viro:
    "This one is just task_work_add() series + remaining prereqs for it.

    There probably will be another pull request from that tree this
    cycle - at least for helpers, to get them out of the way for per-arch
    fixes remaining in the tree."

    Fix trivial conflict in kernel/irq/manage.c: the merge of Andrew's pile
    had brought in commit 97fd75b7b8e0 ("kernel/irq/manage.c: use the
    pr_foo() infrastructure to prefix printks") which changed one of the
    pr_err() calls that this merge moves around.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
    keys: kill task_struct->replacement_session_keyring
    keys: kill the dummy key_replace_session_keyring()
    keys: change keyctl_session_to_parent() to use task_work_add()
    genirq: reimplement exit_irq_thread() hook via task_work_add()
    task_work_add: generic process-context callbacks
    avr32: missed _TIF_NOTIFY_RESUME on one of do_notify_resume callers
    parisc: need to check NOTIFY_RESUME when exiting from syscall
    move key_repace_session_keyring() into tracehook_notify_resume()
    TIF_NOTIFY_RESUME is defined on all targets now

    Linus Torvalds
     
  • A cleanup of rw_copy_check_uvector and compat_rw_copy_check_uvector after
    changes made to support CMA in an earlier patch.

    Rather than having an additional check_access parameter to these
    functions, the first paramater type is overloaded to allow the caller to
    specify CHECK_IOVEC_ONLY which means check that the contents of the iovec
    are valid, but do not check the memory that they point to. This is used
    by process_vm_readv/writev where we need to validate that a iovec passed
    to the syscall is valid but do not want to check the memory that it points
    to at this point because it refers to an address space in another process.

    Signed-off-by: Chris Yeoh
    Reviewed-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christopher Yeoh
     
  • This allocation may be large. The code is probing to see if it will
    succeed and if not, it falls back to vmalloc(). We should suppress any
    page-allocation failure messages when the fallback happens.

    Reported-by: Dave Jones
    Acked-by: David Howells
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

25 May, 2012

1 commit

  • Fix some sparse warnings in the keyrings code:

    (1) compat_keyctl_instantiate_key_iov() should be static.

    (2) There were a couple of places where a pointer was being compared against
    integer 0 rather than NULL.

    (3) keyctl_instantiate_key_common() should not take a __user-labelled iovec
    pointer as the caller must have copied the iovec to kernel space.

    (4) __key_link_begin() takes and __key_link_end() releases
    keyring_serialise_link_sem under some circumstances and so this should be
    declared.

    Note that adding __acquires() and __releases() for this doesn't help cure
    the warnings messages - something only commenting out both helps.

    Signed-off-by: David Howells
    Signed-off-by: James Morris

    David Howells
     

24 May, 2012

2 commits

  • Change keyctl_session_to_parent() to use task_work_add() and move
    key_replace_session_keyring() logic into task_work->func().

    Note that we do task_work_cancel() before task_work_add() to ensure that
    only one work can be pending at any time. This is important, we must not
    allow user-space to abuse the parent's ->task_works list.

    The callback, replace_session_keyring(), checks PF_EXITING. I guess this
    is not really needed but looks better.

    As a side effect, this fixes the (unlikely) race. The callers of
    key_replace_session_keyring() and keyctl_session_to_parent() lack the
    necessary barriers, the parent can miss the request.

    Now we can remove task_struct->replacement_session_keyring and related
    code.

    Signed-off-by: Oleg Nesterov
    Acked-by: David Howells
    Cc: Thomas Gleixner
    Cc: Richard Kuo
    Cc: Linus Torvalds
    Cc: Alexander Gordeev
    Cc: Chris Zankel
    Cc: David Smith
    Cc: "Frank Ch. Eigler"
    Cc: Geert Uytterhoeven
    Cc: Larry Woodman
    Cc: Peter Zijlstra
    Cc: Tejun Heo
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Oleg Nesterov
     
  • Signed-off-by: Al Viro

    Al Viro
     

11 May, 2012

1 commit

  • Add support for invalidating a key - which renders it immediately invisible to
    further searches and causes the garbage collector to immediately wake up,
    remove it from keyrings and then destroy it when it's no longer referenced.

    It's better not to do this with keyctl_revoke() as that marks the key to start
    returning -EKEYREVOKED to searches when what is actually desired is to have the
    key refetched.

    To invalidate a key the caller must be granted SEARCH permission by the key.
    This may be too strict. It may be better to also permit invalidation if the
    caller has any of READ, WRITE or SETATTR permission.

    The primary use for this is to evict keys that are cached in special keyrings,
    such as the DNS resolver or an ID mapper.

    Signed-off-by: David Howells

    David Howells
     

23 Mar, 2012

1 commit

  • Pull NFS client updates for Linux 3.4 from Trond Myklebust:
    "New features include:
    - Add NFS client support for containers.

    This should enable most of the necessary functionality, including
    lockd support, and support for rpc.statd, NFSv4 idmapper and
    RPCSEC_GSS upcalls into the correct network namespace from which
    the mount system call was issued.

    - NFSv4 idmapper scalability improvements

    Base the idmapper cache on the keyring interface to allow
    concurrent access to idmapper entries. Start the process of
    migrating users from the single-threaded daemon-based approach to
    the multi-threaded request-key based approach.

    - NFSv4.1 implementation id.

    Allows the NFSv4.1 client and server to mutually identify each
    other for logging and debugging purposes.

    - Support the 'vers=4.1' mount option for mounting NFSv4.1 instead of
    having to use the more counterintuitive 'vers=4,minorversion=1'.

    - SUNRPC tracepoints.

    Start the process of adding tracepoints in order to improve
    debugging of the RPC layer.

    - pNFS object layout support for autologin.

    Important bugfixes include:

    - Fix a bug in rpc_wake_up/rpc_wake_up_status that caused them to
    fail to wake up all tasks when applied to priority waitqueues.

    - Ensure that we handle read delegations correctly, when we try to
    truncate a file.

    - A number of fixes for NFSv4 state manager loops (mostly to do with
    delegation recovery)."

    * tag 'nfs-for-3.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (224 commits)
    NFS: fix sb->s_id in nfs debug prints
    xprtrdma: Remove assumption that each segment is ls_state in release_lockowner
    NFS: ncommit count is being double decremented
    SUNRPC: We must not use list_for_each_entry_safe() in rpc_wake_up()
    Try using machine credentials for RENEW calls
    NFSv4.1: Fix a few issues in filelayout_commit_pagelist
    NFSv4.1: Clean ups and bugfixes for the pNFS read/writeback/commit code
    ...

    Linus Torvalds
     

02 Mar, 2012

1 commit


19 Jan, 2012

1 commit

  • The kernel contains some special internal keyrings, for instance the DNS
    resolver keyring :

    2a93faf1 I----- 1 perm 1f030000 0 0 keyring .dns_resolver: empty

    It would occasionally be useful to allow the contents of such keyrings to be
    flushed by root (cache invalidation).

    Allow a flag to be set on a keyring to mark that someone possessing the
    sysadmin capability can clear the keyring, even without normal write access to
    the keyring.

    Set this flag on the special keyrings created by the DNS resolver, the NFS
    identity mapper and the CIFS identity mapper.

    Signed-off-by: David Howells
    Acked-by: Jeff Layton
    Acked-by: Steve Dickson
    Signed-off-by: James Morris

    David Howells
     

01 Nov, 2011

1 commit

  • The basic idea behind cross memory attach is to allow MPI programs doing
    intra-node communication to do a single copy of the message rather than a
    double copy of the message via shared memory.

    The following patch attempts to achieve this by allowing a destination
    process, given an address and size from a source process, to copy memory
    directly from the source process into its own address space via a system
    call. There is also a symmetrical ability to copy from the current
    process's address space into a destination process's address space.

    - Use of /proc/pid/mem has been considered, but there are issues with
    using it:
    - Does not allow for specifying iovecs for both src and dest, assuming
    preadv or pwritev was implemented either the area read from or
    written to would need to be contiguous.
    - Currently mem_read allows only processes who are currently
    ptrace'ing the target and are still able to ptrace the target to read
    from the target. This check could possibly be moved to the open call,
    but its not clear exactly what race this restriction is stopping
    (reason appears to have been lost)
    - Having to send the fd of /proc/self/mem via SCM_RIGHTS on unix
    domain socket is a bit ugly from a userspace point of view,
    especially when you may have hundreds if not (eventually) thousands
    of processes that all need to do this with each other
    - Doesn't allow for some future use of the interface we would like to
    consider adding in the future (see below)
    - Interestingly reading from /proc/pid/mem currently actually
    involves two copies! (But this could be fixed pretty easily)

    As mentioned previously use of vmsplice instead was considered, but has
    problems. Since you need the reader and writer working co-operatively if
    the pipe is not drained then you block. Which requires some wrapping to
    do non blocking on the send side or polling on the receive. In all to all
    communication it requires ordering otherwise you can deadlock. And in the
    example of many MPI tasks writing to one MPI task vmsplice serialises the
    copying.

    There are some cases of MPI collectives where even a single copy interface
    does not get us the performance gain we could. For example in an
    MPI_Reduce rather than copy the data from the source we would like to
    instead use it directly in a mathops (say the reduce is doing a sum) as
    this would save us doing a copy. We don't need to keep a copy of the data
    from the source. I haven't implemented this, but I think this interface
    could in the future do all this through the use of the flags - eg could
    specify the math operation and type and the kernel rather than just
    copying the data would apply the specified operation between the source
    and destination and store it in the destination.

    Although we don't have a "second user" of the interface (though I've had
    some nibbles from people who may be interested in using it for intra
    process messaging which is not MPI). This interface is something which
    hardware vendors are already doing for their custom drivers to implement
    fast local communication. And so in addition to this being useful for
    OpenMPI it would mean the driver maintainers don't have to fix things up
    when the mm changes.

    There was some discussion about how much faster a true zero copy would
    go. Here's a link back to the email with some testing I did on that:

    http://marc.info/?l=linux-mm&m=130105930902915&w=2

    There is a basic man page for the proposed interface here:

    http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt

    This has been implemented for x86 and powerpc, other architecture should
    mainly (I think) just need to add syscall numbers for the process_vm_readv
    and process_vm_writev. There are 32 bit compatibility versions for
    64-bit kernels.

    For arch maintainers there are some simple tests to be able to quickly
    verify that the syscalls are working correctly here:

    http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgz

    Signed-off-by: Chris Yeoh
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Arnd Bergmann
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: David Howells
    Cc: James Morris
    Cc:
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christopher Yeoh
     

17 Mar, 2011

1 commit

  • Make request_key() and co. return an error for a negative or rejected key. If
    the key was simply negated, then return ENOKEY, otherwise return the error
    with which it was rejected.

    Without this patch, the following command returns a key number (with the latest
    keyutils):

    [root@andromeda ~]# keyctl request2 user debug:foo rejected @s
    586569904

    Trying to print the key merely gets you a permission denied error:

    [root@andromeda ~]# keyctl print 586569904
    keyctl_read_alloc: Permission denied

    Doing another request_key() call does get you the error, as long as it hasn't
    expired yet:

    [root@andromeda ~]# keyctl request user debug:foo
    request_key: Key was rejected by service

    Signed-off-by: David Howells
    Signed-off-by: James Morris

    David Howells
     

08 Mar, 2011

2 commits

  • Add a keyctl op (KEYCTL_INSTANTIATE_IOV) that is like KEYCTL_INSTANTIATE, but
    takes an iovec array and concatenates the data in-kernel into one buffer.
    Since the KEYCTL_INSTANTIATE copies the data anyway, this isn't too much of a
    problem.

    Signed-off-by: David Howells
    Signed-off-by: James Morris

    David Howells
     
  • Add a new keyctl op to reject a key with a specified error code. This works
    much the same as negating a key, and so keyctl_negate_key() is made a special
    case of keyctl_reject_key(). The difference is that keyctl_negate_key()
    selects ENOKEY as the error to be reported.

    Typically the key would be rejected with EKEYEXPIRED, EKEYREVOKED or
    EKEYREJECTED, but this is not mandatory.

    Signed-off-by: David Howells
    Signed-off-by: James Morris

    David Howells
     

22 Jan, 2011

2 commits

  • Fix up comments in the key management code. No functional changes.

    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Do a bit of a style clean up in the key management code. No functional
    changes.

    Done using:

    perl -p -i -e 's!^/[*]*/\n!!' security/keys/*.c
    perl -p -i -e 's!} /[*] end [a-z0-9_]*[(][)] [*]/\n!}\n!' security/keys/*.c
    sed -i -s -e ": next" -e N -e 's/^\n[}]$/}/' -e t -e P -e 's/^.*\n//' -e "b next" security/keys/*.c

    To remove /*****/ lines, remove comments on the closing brace of a
    function to name the function and remove blank lines before the closing
    brace of a function.

    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     

10 Sep, 2010

2 commits

  • Fix a bug in keyctl_session_to_parent() whereby it tries to check the ownership
    of the parent process's session keyring whether or not the parent has a session
    keyring [CVE-2010-2960].

    This results in the following oops:

    BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
    IP: [] keyctl_session_to_parent+0x251/0x443
    ...
    Call Trace:
    [] ? keyctl_session_to_parent+0x67/0x443
    [] ? __do_fault+0x24b/0x3d0
    [] sys_keyctl+0xb4/0xb8
    [] system_call_fastpath+0x16/0x1b

    if the parent process has no session keyring.

    If the system is using pam_keyinit then it mostly protected against this as all
    processes derived from a login will have inherited the session keyring created
    by pam_keyinit during the log in procedure.

    To test this, pam_keyinit calls need to be commented out in /etc/pam.d/.

    Reported-by: Tavis Ormandy
    Signed-off-by: David Howells
    Acked-by: Tavis Ormandy
    Signed-off-by: Linus Torvalds

    David Howells
     
  • There's an protected access to the parent process's credentials in the middle
    of keyctl_session_to_parent(). This results in the following RCU warning:

    ===================================================
    [ INFO: suspicious rcu_dereference_check() usage. ]
    ---------------------------------------------------
    security/keys/keyctl.c:1291 invoked rcu_dereference_check() without protection!

    other info that might help us debug this:

    rcu_scheduler_active = 1, debug_locks = 0
    1 lock held by keyctl-session-/2137:
    #0: (tasklist_lock){.+.+..}, at: [] keyctl_session_to_parent+0x60/0x236

    stack backtrace:
    Pid: 2137, comm: keyctl-session- Not tainted 2.6.36-rc2-cachefs+ #1
    Call Trace:
    [] lockdep_rcu_dereference+0xaa/0xb3
    [] keyctl_session_to_parent+0xed/0x236
    [] sys_keyctl+0xb4/0xb6
    [] system_call_fastpath+0x16/0x1b

    The code should take the RCU read lock to make sure the parents credentials
    don't go away, even though it's holding a spinlock and has IRQ disabled.

    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     

02 Aug, 2010

2 commits

  • keyctl_describe_key() turns the key reference it gets into a usable key pointer
    and assigns that to a variable called 'key', which it then ignores in favour of
    recomputing the key pointer each time it needs it. Make it use the precomputed
    pointer instead.

    Without this patch, gcc 4.6 reports that the variable key is set but not used:

    building with gcc 4.6 I'm getting a warning message:
    CC security/keys/keyctl.o
    security/keys/keyctl.c: In function 'keyctl_describe_key':
    security/keys/keyctl.c:472:14: warning: variable 'key' set but not used

    Reported-by: Justin P. Mattock
    Signed-off-by: David Howells
    Signed-off-by: James Morris

    David Howells
     
  • Authorise a process to perform keyctl_set_timeout() on an uninstantiated key if
    that process has the authorisation key for it.

    This allows the instantiator to set the timeout on a key it is instantiating -
    provided it does it before instantiating the key.

    For instance, the test upcall script provided with the keyutils package could
    be modified to set the expiry to an hour hence before instantiating the key:

    [/usr/share/keyutils/request-key-debug.sh]
    if [ "$3" != "neg" ]
    then
    + keyctl timeout $1 3600
    keyctl instantiate $1 "Debug $3" $4 || exit 1
    else

    Signed-off-by: David Howells
    Signed-off-by: James Morris

    David Howells
     

27 Jun, 2010

1 commit

  • This is from a Smatch check I'm writing.

    strncpy_from_user() returns -EFAULT on error so the first change just
    silences a warning but doesn't change how the code works.

    The other change is a bug fix because install_thread_keyring_to_cred()
    can return a variety of errors such as -EINVAL, -EEXIST, -ENOMEM or
    -EKEYREVOKED.

    Signed-off-by: Dan Carpenter
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    Dan Carpenter
     

28 May, 2010

1 commit


23 Apr, 2010

1 commit