12 Apr, 2016

1 commit

  • Add a facility whereby proposed new links to be added to a keyring can be
    vetted, permitting them to be rejected if necessary. This can be used to
    block public keys from which the signature cannot be verified or for which
    the signature verification fails. It could also be used to provide
    blacklisting.

    This affects operations like add_key(), KEYCTL_LINK and KEYCTL_INSTANTIATE.

    To this end:

    (1) A function pointer is added to the key struct that, if set, points to
    the vetting function. This is called as:

    int (*restrict_link)(struct key *keyring,
    const struct key_type *key_type,
    unsigned long key_flags,
    const union key_payload *key_payload),

    where 'keyring' will be the keyring being added to, key_type and
    key_payload will describe the key being added and key_flags[*] can be
    AND'ed with KEY_FLAG_TRUSTED.

    [*] This parameter will be removed in a later patch when
    KEY_FLAG_TRUSTED is removed.

    The function should return 0 to allow the link to take place or an
    error (typically -ENOKEY, -ENOPKG or -EKEYREJECTED) to reject the
    link.

    The pointer should not be set directly, but rather should be set
    through keyring_alloc().

    Note that if called during add_key(), preparse is called before this
    method, but a key isn't actually allocated until after this function
    is called.

    (2) KEY_ALLOC_BYPASS_RESTRICTION is added. This can be passed to
    key_create_or_update() or key_instantiate_and_link() to bypass the
    restriction check.

    (3) KEY_FLAG_TRUSTED_ONLY is removed. The entire contents of a keyring
    with this restriction emplaced can be considered 'trustworthy' by
    virtue of being in the keyring when that keyring is consulted.

    (4) key_alloc() and keyring_alloc() take an extra argument that will be
    used to set restrict_link in the new key. This ensures that the
    pointer is set before the key is published, thus preventing a window
    of unrestrictedness. Normally this argument will be NULL.

    (5) As a temporary affair, keyring_restrict_trusted_only() is added. It
    should be passed to keyring_alloc() as the extra argument instead of
    setting KEY_FLAG_TRUSTED_ONLY on a keyring. This will be replaced in
    a later patch with functions that look in the appropriate places for
    authoritative keys.

    Signed-off-by: David Howells
    Reviewed-by: Mimi Zohar

    David Howells
     

20 Jan, 2016

1 commit

  • This fixes CVE-2016-0728.

    If a thread is asked to join as a session keyring the keyring that's already
    set as its session, we leak a keyring reference.

    This can be tested with the following program:

    #include
    #include
    #include
    #include

    int main(int argc, const char *argv[])
    {
    int i = 0;
    key_serial_t serial;

    serial = keyctl(KEYCTL_JOIN_SESSION_KEYRING,
    "leaked-keyring");
    if (serial < 0) {
    perror("keyctl");
    return -1;
    }

    if (keyctl(KEYCTL_SETPERM, serial,
    KEY_POS_ALL | KEY_USR_ALL) < 0) {
    perror("keyctl");
    return -1;
    }

    for (i = 0; i < 100; i++) {
    serial = keyctl(KEYCTL_JOIN_SESSION_KEYRING,
    "leaked-keyring");
    if (serial < 0) {
    perror("keyctl");
    return -1;
    }
    }

    return 0;
    }

    If, after the program has run, there something like the following line in
    /proc/keys:

    3f3d898f I--Q--- 100 perm 3f3f0000 0 0 keyring leaked-keyring: empty

    with a usage count of 100 * the number of times the program has been run,
    then the kernel is malfunctioning. If leaked-keyring has zero usages or
    has been garbage collected, then the problem is fixed.

    Reported-by: Yevgeny Pats
    Signed-off-by: David Howells
    Acked-by: Don Zickus
    Acked-by: Prarit Bhargava
    Acked-by: Jarod Wilson
    Signed-off-by: James Morris

    Yevgeny Pats
     

21 Oct, 2015

1 commit

  • Merge the type-specific data with the payload data into one four-word chunk
    as it seems pointless to keep them separate.

    Use user_key_payload() for accessing the payloads of overloaded
    user-defined keys.

    Signed-off-by: David Howells
    cc: linux-cifs@vger.kernel.org
    cc: ecryptfs@vger.kernel.org
    cc: linux-ext4@vger.kernel.org
    cc: linux-f2fs-devel@lists.sourceforge.net
    cc: linux-nfs@vger.kernel.org
    cc: ceph-devel@vger.kernel.org
    cc: linux-ima-devel@lists.sourceforge.net

    David Howells
     

05 Sep, 2015

1 commit

  • Credit where credit is due: this idea comes from Christoph Lameter with
    a lot of valuable input from Serge Hallyn. This patch is heavily based
    on Christoph's patch.

    ===== The status quo =====

    On Linux, there are a number of capabilities defined by the kernel. To
    perform various privileged tasks, processes can wield capabilities that
    they hold.

    Each task has four capability masks: effective (pE), permitted (pP),
    inheritable (pI), and a bounding set (X). When the kernel checks for a
    capability, it checks pE. The other capability masks serve to modify
    what capabilities can be in pE.

    Any task can remove capabilities from pE, pP, or pI at any time. If a
    task has a capability in pP, it can add that capability to pE and/or pI.
    If a task has CAP_SETPCAP, then it can add any capability to pI, and it
    can remove capabilities from X.

    Tasks are not the only things that can have capabilities; files can also
    have capabilities. A file can have no capabilty information at all [1].
    If a file has capability information, then it has a permitted mask (fP)
    and an inheritable mask (fI) as well as a single effective bit (fE) [2].
    File capabilities modify the capabilities of tasks that execve(2) them.

    A task that successfully calls execve has its capabilities modified for
    the file ultimately being excecuted (i.e. the binary itself if that
    binary is ELF or for the interpreter if the binary is a script.) [3] In
    the capability evolution rules, for each mask Z, pZ represents the old
    value and pZ' represents the new value. The rules are:

    pP' = (X & fP) | (pI & fI)
    pI' = pI
    pE' = (fE ? pP' : 0)
    X is unchanged

    For setuid binaries, fP, fI, and fE are modified by a moderately
    complicated set of rules that emulate POSIX behavior. Similarly, if
    euid == 0 or ruid == 0, then fP, fI, and fE are modified differently
    (primary, fP and fI usually end up being the full set). For nonroot
    users executing binaries with neither setuid nor file caps, fI and fP
    are empty and fE is false.

    As an extra complication, if you execute a process as nonroot and fE is
    set, then the "secure exec" rules are in effect: AT_SECURE gets set,
    LD_PRELOAD doesn't work, etc.

    This is rather messy. We've learned that making any changes is
    dangerous, though: if a new kernel version allows an unprivileged
    program to change its security state in a way that persists cross
    execution of a setuid program or a program with file caps, this
    persistent state is surprisingly likely to allow setuid or file-capped
    programs to be exploited for privilege escalation.

    ===== The problem =====

    Capability inheritance is basically useless.

    If you aren't root and you execute an ordinary binary, fI is zero, so
    your capabilities have no effect whatsoever on pP'. This means that you
    can't usefully execute a helper process or a shell command with elevated
    capabilities if you aren't root.

    On current kernels, you can sort of work around this by setting fI to
    the full set for most or all non-setuid executable files. This causes
    pP' = pI for nonroot, and inheritance works. No one does this because
    it's a PITA and it isn't even supported on most filesystems.

    If you try this, you'll discover that every nonroot program ends up with
    secure exec rules, breaking many things.

    This is a problem that has bitten many people who have tried to use
    capabilities for anything useful.

    ===== The proposed change =====

    This patch adds a fifth capability mask called the ambient mask (pA).
    pA does what most people expect pI to do.

    pA obeys the invariant that no bit can ever be set in pA if it is not
    set in both pP and pI. Dropping a bit from pP or pI drops that bit from
    pA. This ensures that existing programs that try to drop capabilities
    still do so, with a complication. Because capability inheritance is so
    broken, setting KEEPCAPS, using setresuid to switch to nonroot uids, and
    then calling execve effectively drops capabilities. Therefore,
    setresuid from root to nonroot conditionally clears pA unless
    SECBIT_NO_SETUID_FIXUP is set. Processes that don't like this can
    re-add bits to pA afterwards.

    The capability evolution rules are changed:

    pA' = (file caps or setuid or setgid ? 0 : pA)
    pP' = (X & fP) | (pI & fI) | pA'
    pI' = pI
    pE' = (fE ? pP' : pA')
    X is unchanged

    If you are nonroot but you have a capability, you can add it to pA. If
    you do so, your children get that capability in pA, pP, and pE. For
    example, you can set pA = CAP_NET_BIND_SERVICE, and your children can
    automatically bind low-numbered ports. Hallelujah!

    Unprivileged users can create user namespaces, map themselves to a
    nonzero uid, and create both privileged (relative to their namespace)
    and unprivileged process trees. This is currently more or less
    impossible. Hallelujah!

    You cannot use pA to try to subvert a setuid, setgid, or file-capped
    program: if you execute any such program, pA gets cleared and the
    resulting evolution rules are unchanged by this patch.

    Users with nonzero pA are unlikely to unintentionally leak that
    capability. If they run programs that try to drop privileges, dropping
    privileges will still work.

    It's worth noting that the degree of paranoia in this patch could
    possibly be reduced without causing serious problems. Specifically, if
    we allowed pA to persist across executing non-pA-aware setuid binaries
    and across setresuid, then, naively, the only capabilities that could
    leak as a result would be the capabilities in pA, and any attacker
    *already* has those capabilities. This would make me nervous, though --
    setuid binaries that tried to privilege-separate might fail to do so,
    and putting CAP_DAC_READ_SEARCH or CAP_DAC_OVERRIDE into pA could have
    unexpected side effects. (Whether these unexpected side effects would
    be exploitable is an open question.) I've therefore taken the more
    paranoid route. We can revisit this later.

    An alternative would be to require PR_SET_NO_NEW_PRIVS before setting
    ambient capabilities. I think that this would be annoying and would
    make granting otherwise unprivileged users minor ambient capabilities
    (CAP_NET_BIND_SERVICE or CAP_NET_RAW for example) much less useful than
    it is with this patch.

    ===== Footnotes =====

    [1] Files that are missing the "security.capability" xattr or that have
    unrecognized values for that xattr end up with has_cap set to false.
    The code that does that appears to be complicated for no good reason.

    [2] The libcap capability mask parsers and formatters are dangerously
    misleading and the documentation is flat-out wrong. fE is *not* a mask;
    it's a single bit. This has probably confused every single person who
    has tried to use file capabilities.

    [3] Linux very confusingly processes both the script and the interpreter
    if applicable, for reasons that elude me. The results from thinking
    about a script's file capabilities and/or setuid bits are mostly
    discarded.

    Preliminary userspace code is here, but it needs updating:
    https://git.kernel.org/cgit/linux/kernel/git/luto/util-linux-playground.git/commit/?h=cap_ambient&id=7f5afbd175d2

    Here is a test program that can be used to verify the functionality
    (from Christoph):

    /*
    * Test program for the ambient capabilities. This program spawns a shell
    * that allows running processes with a defined set of capabilities.
    *
    * (C) 2015 Christoph Lameter
    * Released under: GPL v3 or later.
    *
    *
    * Compile using:
    *
    * gcc -o ambient_test ambient_test.o -lcap-ng
    *
    * This program must have the following capabilities to run properly:
    * Permissions for CAP_NET_RAW, CAP_NET_ADMIN, CAP_SYS_NICE
    *
    * A command to equip the binary with the right caps is:
    *
    * setcap cap_net_raw,cap_net_admin,cap_sys_nice+p ambient_test
    *
    *
    * To get a shell with additional caps that can be inherited by other processes:
    *
    * ./ambient_test /bin/bash
    *
    *
    * Verifying that it works:
    *
    * From the bash spawed by ambient_test run
    *
    * cat /proc/$$/status
    *
    * and have a look at the capabilities.
    */

    #include
    #include
    #include
    #include
    #include
    #include

    /*
    * Definitions from the kernel header files. These are going to be removed
    * when the /usr/include files have these defined.
    */
    #define PR_CAP_AMBIENT 47
    #define PR_CAP_AMBIENT_IS_SET 1
    #define PR_CAP_AMBIENT_RAISE 2
    #define PR_CAP_AMBIENT_LOWER 3
    #define PR_CAP_AMBIENT_CLEAR_ALL 4

    static void set_ambient_cap(int cap)
    {
    int rc;

    capng_get_caps_process();
    rc = capng_update(CAPNG_ADD, CAPNG_INHERITABLE, cap);
    if (rc) {
    printf("Cannot add inheritable cap\n");
    exit(2);
    }
    capng_apply(CAPNG_SELECT_CAPS);

    /* Note the two 0s at the end. Kernel checks for these */
    if (prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE, cap, 0, 0)) {
    perror("Cannot set cap");
    exit(1);
    }
    }

    int main(int argc, char **argv)
    {
    int rc;

    set_ambient_cap(CAP_NET_RAW);
    set_ambient_cap(CAP_NET_ADMIN);
    set_ambient_cap(CAP_SYS_NICE);

    printf("Ambient_test forking shell\n");
    if (execv(argv[1], argv + 1))
    perror("Cannot exec");

    return 0;
    }

    Signed-off-by: Christoph Lameter # Original author
    Signed-off-by: Andy Lutomirski
    Acked-by: Serge E. Hallyn
    Acked-by: Kees Cook
    Cc: Jonathan Corbet
    Cc: Aaron Jones
    Cc: Ted Ts'o
    Cc: Andrew G. Morgan
    Cc: Mimi Zohar
    Cc: Austin S Hemmelgarn
    Cc: Markku Savela
    Cc: Jarkko Sakkinen
    Cc: Michael Kerrisk
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
     

17 Sep, 2014

2 commits

  • Make the key matching functions pointed to by key_match_data::cmp return bool
    rather than int.

    Signed-off-by: David Howells
    Acked-by: Vivek Goyal

    David Howells
     
  • Preparse the match data. This provides several advantages:

    (1) The preparser can reject invalid criteria up front.

    (2) The preparser can convert the criteria to binary data if necessary (the
    asymmetric key type really wants to do binary comparison of the key IDs).

    (3) The preparser can set the type of search to be performed. This means
    that it's not then a one-off setting in the key type.

    (4) The preparser can set an appropriate comparator function.

    Signed-off-by: David Howells
    Acked-by: Vivek Goyal

    David Howells
     

26 Sep, 2013

1 commit


24 Sep, 2013

3 commits

  • Define a __key_get() wrapper to use rather than atomic_inc() on the key usage
    count as this makes it easier to hook in refcount error debugging.

    Signed-off-by: David Howells

    David Howells
     
  • Search functions pass around a bunch of arguments, each of which gets copied
    with each call. Introduce a search context structure to hold these.

    Whilst we're at it, create a search flag that indicates whether the search
    should be directly to the description or whether it should iterate through all
    keys looking for a non-description match.

    This will be useful when keyrings use a generic data struct with generic
    routines to manage their content as the search terms can just be passed
    through to the iterator callback function.

    Also, for future use, the data to be supplied to the match function is
    separated from the description pointer in the search context. This makes it
    clear which is being supplied.

    Signed-off-by: David Howells

    David Howells
     
  • Skip key state checks (invalidation, revocation and expiration) when checking
    for possession. Without this, keys that have been marked invalid, revoked
    keys and expired keys are not given a possession attribute - which means the
    possessor is not granted any possession permits and cannot do anything with
    them unless they also have one a user, group or other permit.

    This causes failures in the keyutils test suite's revocation and expiration
    tests now that commit 96b5c8fea6c0861621051290d705ec2e971963f1 reduced the
    initial permissions granted to a key.

    The failures are due to accesses to revoked and expired keys being given
    EACCES instead of EKEYREVOKED or EKEYEXPIRED.

    Signed-off-by: David Howells

    David Howells
     

12 Mar, 2013

1 commit

  • This fixes CVE-2013-1792.

    There is a race in install_user_keyrings() that can cause a NULL pointer
    dereference when called concurrently for the same user if the uid and
    uid-session keyrings are not yet created. It might be possible for an
    unprivileged user to trigger this by calling keyctl() from userspace in
    parallel immediately after logging in.

    Assume that we have two threads both executing lookup_user_key(), both
    looking for KEY_SPEC_USER_SESSION_KEYRING.

    THREAD A THREAD B
    =============================== ===============================
    ==>call install_user_keyrings();
    if (!cred->user->session_keyring)
    ==>call install_user_keyrings()
    ...
    user->uid_keyring = uid_keyring;
    if (user->uid_keyring)
    return 0;
    user->session_keyring [== NULL]
    user->session_keyring = session_keyring;
    atomic_inc(&key->usage); [oops]

    At the point thread A dereferences cred->user->session_keyring, thread B
    hasn't updated user->session_keyring yet, but thread A assumes it is
    populated because install_user_keyrings() returned ok.

    The race window is really small but can be exploited if, for example,
    thread B is interrupted or preempted after initializing uid_keyring, but
    before doing setting session_keyring.

    This couldn't be reproduced on a stock kernel. However, after placing
    systemtap probe on 'user->session_keyring = session_keyring;' that
    introduced some delay, the kernel could be crashed reliably.

    Fix this by checking both pointers before deciding whether to return.
    Alternatively, the test could be done away with entirely as it is checked
    inside the mutex - but since the mutex is global, that may not be the best
    way.

    Signed-off-by: David Howells
    Reported-by: Mateusz Guzik
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: James Morris

    David Howells
     

04 Mar, 2013

1 commit

  • Dave Jones writes:
    > Just hit this on Linus' current tree.
    >
    > [ 89.621770] BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8
    > [ 89.623111] IP: [] commit_creds+0x250/0x2f0
    > [ 89.624062] PGD 122bfd067 PUD 122bfe067 PMD 0
    > [ 89.624901] Oops: 0000 [#1] PREEMPT SMP
    > [ 89.625678] Modules linked in: caif_socket caif netrom bridge hidp 8021q garp stp mrp rose llc2 af_rxrpc phonet af_key binfmt_misc bnep l2tp_ppp can_bcm l2tp_core pppoe pppox can_raw scsi_transport_iscsi ppp_generic slhc nfnetlink can ipt_ULOG ax25 decnet irda nfc rds x25 crc_ccitt appletalk atm ipx p8023 psnap p8022 llc lockd sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables btusb bluetooth snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_pcm vhost_net snd_page_alloc snd_timer tun macvtap usb_debug snd rfkill microcode macvlan edac_core pcspkr serio_raw kvm_amd soundcore kvm r8169 mii
    > [ 89.637846] CPU 2
    > [ 89.638175] Pid: 782, comm: trinity-main Not tainted 3.8.0+ #63 Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H
    > [ 89.639850] RIP: 0010:[] [] commit_creds+0x250/0x2f0
    > [ 89.641161] RSP: 0018:ffff880115657eb8 EFLAGS: 00010207
    > [ 89.641984] RAX: 00000000000003e8 RBX: ffff88012688b000 RCX: 0000000000000000
    > [ 89.643069] RDX: 0000000000000000 RSI: ffffffff81c32960 RDI: ffff880105839600
    > [ 89.644167] RBP: ffff880115657ed8 R08: 0000000000000000 R09: 0000000000000000
    > [ 89.645254] R10: 0000000000000001 R11: 0000000000000246 R12: ffff880105839600
    > [ 89.646340] R13: ffff88011beea490 R14: ffff88011beea490 R15: 0000000000000000
    > [ 89.647431] FS: 00007f3ac063b740(0000) GS:ffff88012b200000(0000) knlGS:0000000000000000
    > [ 89.648660] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    > [ 89.649548] CR2: 00000000000000c8 CR3: 0000000122bfc000 CR4: 00000000000007e0
    > [ 89.650635] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    > [ 89.651723] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    > [ 89.652812] Process trinity-main (pid: 782, threadinfo ffff880115656000, task ffff88011beea490)
    > [ 89.654128] Stack:
    > [ 89.654433] 0000000000000000 ffff8801058396a0 ffff880105839600 ffff88011beeaa78
    > [ 89.655769] ffff880115657ef8 ffffffff812c7d9b ffffffff82079be0 0000000000000000
    > [ 89.657073] ffff880115657f28 ffffffff8106c665 0000000000000002 ffff880115657f58
    > [ 89.658399] Call Trace:
    > [ 89.658822] [] key_change_session_keyring+0xfb/0x140
    > [ 89.659845] [] task_work_run+0xa5/0xd0
    > [ 89.660698] [] do_notify_resume+0x71/0xb0
    > [ 89.661581] [] int_signal+0x12/0x17
    > [ 89.662385] Code: 24 90 00 00 00 48 8b b3 90 00 00 00 49 8b 4c 24 40 48 39 f2 75 08 e9 83 00 00 00 48 89 ca 48 81 fa 60 29 c3 81 0f 84 41 fe ff ff 8b 8a c8 00 00 00 48 39 ce 75 e4 3b 82 d0 00 00 00 0f 84 4b
    > [ 89.667778] RIP [] commit_creds+0x250/0x2f0
    > [ 89.668733] RSP
    > [ 89.669301] CR2: 00000000000000c8
    >
    > My fastest trinity induced oops yet!
    >
    >
    > Appears to be..
    >
    > if ((set_ns == subset_ns->parent) &&
    > 850: 48 8b 8a c8 00 00 00 mov 0xc8(%rdx),%rcx
    >
    > from the inlined cred_cap_issubset

    By historical accident we have been reading trying to set new->user_ns
    from new->user_ns. Which is totally silly as new->user_ns is NULL (as
    is every other field in new except session_keyring at that point).

    The intent is clearly to copy all of the fields from old to new so copy
    old->user_ns into into new->user_ns.

    Cc: stable@vger.kernel.org
    Reported-by: Dave Jones
    Tested-by: Dave Jones
    Acked-by: Serge Hallyn
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

21 Feb, 2013

1 commit

  • A patch to fix some unreachable code in search_my_process_keyrings() got
    applied twice by two different routes upstream as commits e67eab39bee2
    and b010520ab3d2 (both "fix unreachable code").

    Unfortunately, the second application removed something it shouldn't
    have and this wasn't detected by GIT. This is due to the patch not
    having sufficient lines of context to distinguish the two places of
    application.

    The effect of this is relatively minor: inside the kernel, the keyring
    search routines may search multiple keyrings and then prioritise the
    errors if no keys or negative keys are found in any of them. With the
    extra deletion, the presence of a negative key in the thread keyring
    (causing ENOKEY) is incorrectly overridden by an error searching the
    process keyring.

    So revert the second application of the patch.

    Signed-off-by: David Howells
    Cc: Jiri Kosina
    Cc: Andrew Morton
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds

    David Howells
     

21 Dec, 2012

1 commit


17 Dec, 2012

1 commit

  • Pull security subsystem updates from James Morris:
    "A quiet cycle for the security subsystem with just a few maintenance
    updates."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
    Smack: create a sysfs mount point for smackfs
    Smack: use select not depends in Kconfig
    Yama: remove locking from delete path
    Yama: add RCU to drop read locking
    drivers/char/tpm: remove tasklet and cleanup
    KEYS: Use keyring_alloc() to create special keyrings
    KEYS: Reduce initial permissions on keys
    KEYS: Make the session and process keyrings per-thread
    seccomp: Make syscall skipping and nr changes more consistent
    key: Fix resource leak
    keys: Fix unreachable code
    KEYS: Add payload preparsing opportunity prior to key instantiate or update

    Linus Torvalds
     

29 Oct, 2012

1 commit


26 Oct, 2012

1 commit


03 Oct, 2012

2 commits

  • Reduce the initial permissions on new keys to grant the possessor everything,
    view permission only to the user (so the keys can be seen in /proc/keys) and
    nothing else.

    This gives the creator a chance to adjust the permissions mask before other
    processes can access the new key or create a link to it.

    To aid with this, keyring_alloc() now takes a permission argument rather than
    setting the permissions itself.

    The following permissions are now set:

    (1) The user and user-session keyrings grant the user that owns them full
    permissions and grant a possessor everything bar SETATTR.

    (2) The process and thread keyrings grant the possessor full permissions but
    only grant the user VIEW. This permits the user to see them in
    /proc/keys, but not to do anything with them.

    (3) Anonymous session keyrings grant the possessor full permissions, but only
    grant the user VIEW and READ. This means that the user can see them in
    /proc/keys and can list them, but nothing else. Possibly READ shouldn't
    be provided either.

    (4) Named session keyrings grant everything an anonymous session keyring does,
    plus they grant the user LINK permission. The whole point of named
    session keyrings is that others can also subscribe to them. Possibly this
    should be a separate permission to LINK.

    (5) The temporary session keyring created by call_sbin_request_key() gets the
    same permissions as an anonymous session keyring.

    (6) Keys created by add_key() get VIEW, SEARCH, LINK and SETATTR for the
    possessor, plus READ and/or WRITE if the key type supports them. The used
    only gets VIEW now.

    (7) Keys created by request_key() now get the same as those created by
    add_key().

    Reported-by: Lennart Poettering
    Reported-by: Stef Walter
    Signed-off-by: David Howells

    David Howells
     
  • Make the session keyring per-thread rather than per-process, but still
    inherited from the parent thread to solve a problem with PAM and gdm.

    The problem is that join_session_keyring() will reject attempts to change the
    session keyring of a multithreaded program but gdm is now multithreaded before
    it gets to the point of starting PAM and running pam_keyinit to create the
    session keyring. See:

    https://bugs.freedesktop.org/show_bug.cgi?id=49211

    The reason that join_session_keyring() will only change the session keyring
    under a single-threaded environment is that it's hard to alter the other
    thread's credentials to effect the change in a multi-threaded program. The
    problems are such as:

    (1) How to prevent two threads both running join_session_keyring() from
    racing.

    (2) Another thread's credentials may not be modified directly by this process.

    (3) The number of threads is uncertain whilst we're not holding the
    appropriate spinlock, making preallocation slightly tricky.

    (4) We could use TIF_NOTIFY_RESUME and key_replace_session_keyring() to get
    another thread to replace its keyring, but that means preallocating for
    each thread.

    A reasonable way around this is to make the session keyring per-thread rather
    than per-process and just document that if you want a common session keyring,
    you must get it before you spawn any threads - which is the current situation
    anyway.

    Whilst we're at it, we can the process keyring behave in the same way. This
    means we can clean up some of the ickyness in the creds code.

    Basically, after this patch, the session, process and thread keyrings are about
    inheritance rules only and not about sharing changes of keyring.

    Reported-by: Mantas M.
    Signed-off-by: David Howells
    Tested-by: Ray Strode

    David Howells
     

28 Sep, 2012

1 commit


14 Sep, 2012

1 commit

  • - Replace key_user ->user_ns equality checks with kuid_has_mapping checks.
    - Use from_kuid to generate key descriptions
    - Use kuid_t and kgid_t and the associated helpers instead of uid_t and gid_t
    - Avoid potential problems with file descriptor passing by displaying
    keys in the user namespace of the opener of key status proc files.

    Cc: linux-security-module@vger.kernel.org
    Cc: keyrings@linux-nfs.org
    Cc: David Howells
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

23 Jul, 2012

2 commits


24 May, 2012

2 commits

  • Change keyctl_session_to_parent() to use task_work_add() and move
    key_replace_session_keyring() logic into task_work->func().

    Note that we do task_work_cancel() before task_work_add() to ensure that
    only one work can be pending at any time. This is important, we must not
    allow user-space to abuse the parent's ->task_works list.

    The callback, replace_session_keyring(), checks PF_EXITING. I guess this
    is not really needed but looks better.

    As a side effect, this fixes the (unlikely) race. The callers of
    key_replace_session_keyring() and keyctl_session_to_parent() lack the
    necessary barriers, the parent can miss the request.

    Now we can remove task_struct->replacement_session_keyring and related
    code.

    Signed-off-by: Oleg Nesterov
    Acked-by: David Howells
    Cc: Thomas Gleixner
    Cc: Richard Kuo
    Cc: Linus Torvalds
    Cc: Alexander Gordeev
    Cc: Chris Zankel
    Cc: David Smith
    Cc: "Frank Ch. Eigler"
    Cc: Geert Uytterhoeven
    Cc: Larry Woodman
    Cc: Peter Zijlstra
    Cc: Tejun Heo
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Oleg Nesterov
     
  • Pull user namespace enhancements from Eric Biederman:
    "This is a course correction for the user namespace, so that we can
    reach an inexpensive, maintainable, and reasonably complete
    implementation.

    Highlights:
    - Config guards make it impossible to enable the user namespace and
    code that has not been converted to be user namespace safe.

    - Use of the new kuid_t type ensures the if you somehow get past the
    config guards the kernel will encounter type errors if you enable
    user namespaces and attempt to compile in code whose permission
    checks have not been updated to be user namespace safe.

    - All uids from child user namespaces are mapped into the initial
    user namespace before they are processed. Removing the need to add
    an additional check to see if the user namespace of the compared
    uids remains the same.

    - With the user namespaces compiled out the performance is as good or
    better than it is today.

    - For most operations absolutely nothing changes performance or
    operationally with the user namespace enabled.

    - The worst case performance I could come up with was timing 1
    billion cache cold stat operations with the user namespace code
    enabled. This went from 156s to 164s on my laptop (or 156ns to
    164ns per stat operation).

    - (uid_t)-1 and (gid_t)-1 are reserved as an internal error value.
    Most uid/gid setting system calls treat these value specially
    anyway so attempting to use -1 as a uid would likely cause
    entertaining failures in userspace.

    - If setuid is called with a uid that can not be mapped setuid fails.
    I have looked at sendmail, login, ssh and every other program I
    could think of that would call setuid and they all check for and
    handle the case where setuid fails.

    - If stat or a similar system call is called from a context in which
    we can not map a uid we lie and return overflowuid. The LFS
    experience suggests not lying and returning an error code might be
    better, but the historical precedent with uids is different and I
    can not think of anything that would break by lying about a uid we
    can't map.

    - Capabilities are localized to the current user namespace making it
    safe to give the initial user in a user namespace all capabilities.

    My git tree covers all of the modifications needed to convert the core
    kernel and enough changes to make a system bootable to runlevel 1."

    Fix up trivial conflicts due to nearby independent changes in fs/stat.c

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
    userns: Silence silly gcc warning.
    cred: use correct cred accessor with regards to rcu read lock
    userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq
    userns: Convert cgroup permission checks to use uid_eq
    userns: Convert tmpfs to use kuid and kgid where appropriate
    userns: Convert sysfs to use kgid/kuid where appropriate
    userns: Convert sysctl permission checks to use kuid and kgids.
    userns: Convert proc to use kuid/kgid where appropriate
    userns: Convert ext4 to user kuid/kgid where appropriate
    userns: Convert ext3 to use kuid/kgid where appropriate
    userns: Convert ext2 to use kuid/kgid where appropriate.
    userns: Convert devpts to use kuid/kgid where appropriate
    userns: Convert binary formats to use kuid/kgid where appropriate
    userns: Add negative depends on entries to avoid building code that is userns unsafe
    userns: signal remove unnecessary map_cred_ns
    userns: Teach inode_capable to understand inodes whose uids map to other namespaces.
    userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
    userns: Convert stat to return values mapped from kuids and kgids
    userns: Convert user specfied uids and gids in chown into kuids and kgid
    userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
    ...

    Linus Torvalds
     

11 May, 2012

1 commit

  • Do an LRU discard in keyrings that are full rather than returning ENFILE. To
    perform this, a time_t is added to the key struct and updated by the creation
    of a link to a key and by a key being found as the result of a search. At the
    completion of a successful search, the keyrings in the path between the root of
    the search and the first found link to it also have their last-used times
    updated.

    Note that discarding a link to a key from a keyring does not necessarily
    destroy the key as there may be references held by other places.

    An alternate discard method that might suffice is to perform FIFO discard from
    the keyring, using the spare 2-byte hole in the keylist header as the index of
    the next link to be discarded.

    This is useful when using a keyring as a cache for DNS results or foreign
    filesystem IDs.

    This can be tested by the following. As root do:

    echo 1000 >/proc/sys/kernel/keys/root_maxkeys

    kr=`keyctl newring foo @s`
    for ((i=0; i

    David Howells
     

08 Apr, 2012

2 commits


07 Mar, 2012

1 commit

  • The test for "if (cred->request_key_auth->flags & KEY_FLAG_REVOKED) {"
    should actually testing that the (1 << KEY_FLAG_REVOKED) bit is set.
    The current code actually checks for KEY_FLAG_DEAD.

    Signed-off-by: Dan Carpenter
    Signed-off-by: David Howells
    Signed-off-by: James Morris

    Dan Carpenter
     

23 Aug, 2011

2 commits

  • The keyctl call:

    keyctl_get_keyring_ID(KEY_SPEC_SESSION_KEYRING, 1)

    should create a session keyring if the process doesn't have one of its own
    because the create flag argument is set - rather than subscribing to and
    returning the user-session keyring as:

    keyctl_get_keyring_ID(KEY_SPEC_SESSION_KEYRING, 0)

    will do.

    This can be tested by commenting out pam_keyinit in the /etc/pam.d files and
    running the following program a couple of times in a row:

    #include
    #include
    #include
    int main(int argc, char *argv[])
    {
    key_serial_t uk, usk, sk, nsk;
    uk = keyctl_get_keyring_ID(KEY_SPEC_USER_KEYRING, 0);
    usk = keyctl_get_keyring_ID(KEY_SPEC_USER_SESSION_KEYRING, 0);
    sk = keyctl_get_keyring_ID(KEY_SPEC_SESSION_KEYRING, 0);
    nsk = keyctl_get_keyring_ID(KEY_SPEC_SESSION_KEYRING, 1);
    printf("keys: %08x %08x %08x %08x\n", uk, usk, sk, nsk);
    return 0;
    }

    Without this patch, I see:

    keys: 3975ddc7 119c0c66 119c0c66 119c0c66
    keys: 3975ddc7 119c0c66 119c0c66 119c0c66

    With this patch, I see:

    keys: 2cb4997b 34112878 34112878 17db2ce3
    keys: 2cb4997b 34112878 34112878 39f3c73e

    As can be seen, the session keyring starts off the same as the user-session
    keyring each time, but with the patch a new session keyring is created when
    the create flag is set.

    Reported-by: Greg Wettstein
    Signed-off-by: David Howells
    Tested-by: Greg Wettstein
    Signed-off-by: James Morris

    David Howells
     
  • If install_session_keyring() is given a keyring, it should install it rather
    than just creating a new one anyway. This was accidentally broken in:

    commit d84f4f992cbd76e8f39c488cf0c5d123843923b1
    Author: David Howells
    Date: Fri Nov 14 10:39:23 2008 +1100
    Subject: CRED: Inaugurate COW credentials

    The impact of that commit is that pam_keyinit no longer works correctly if
    'force' isn't specified against a login process. This is because:

    keyctl_get_keyring_ID(KEY_SPEC_SESSION_KEYRING, 0)

    now always creates a new session keyring and thus the check whether the session
    keyring and the user-session keyring are the same is always false. This leads
    pam_keyinit to conclude that a session keyring is installed and it shouldn't be
    revoked by pam_keyinit here if 'revoke' is specified.

    Any system that specifies 'force' against pam_keyinit in the PAM configuration
    files for login methods (login, ssh, su -l, kdm, etc.) is not affected since
    that bypasses the broken check and forces the creation of a new session keyring
    anyway (for which the revoke flag is not cleared) - and any subsequent call to
    pam_keyinit really does have a session keyring already installed, and so the
    check works correctly there.

    Reverting to the previous behaviour will cause the kernel to subscribe the
    process to the user-session keyring as its session keyring if it doesn't have a
    session keyring of its own. pam_keyinit will detect this and install a new
    session keyring anyway (and won't clear the revert flag).

    This can be tested by commenting out pam_keyinit in the /etc/pam.d files and
    running the following program a couple of times in a row:

    #include
    #include
    #include
    int main(int argc, char *argv[])
    {
    key_serial_t uk, usk, sk;
    uk = keyctl_get_keyring_ID(KEY_SPEC_USER_KEYRING, 0);
    usk = keyctl_get_keyring_ID(KEY_SPEC_USER_SESSION_KEYRING, 0);
    sk = keyctl_get_keyring_ID(KEY_SPEC_SESSION_KEYRING, 0);
    printf("keys: %08x %08x %08x\n", uk, usk, sk);
    return 0;
    }

    Without the patch, I see:

    keys: 3884e281 24c4dfcf 22825f8e
    keys: 3884e281 24c4dfcf 068772be

    With the patch, I see:

    keys: 26be9c83 0e755ce0 0e755ce0
    keys: 26be9c83 0e755ce0 0e755ce0

    As can be seen, with the patch, the session keyring is the same as the
    user-session keyring each time; without the patch a new session keyring is
    generated each time.

    Reported-by: Greg Wettstein
    Signed-off-by: David Howells
    Tested-by: Greg Wettstein
    Signed-off-by: James Morris

    David Howells
     

27 May, 2011

1 commit

  • Since this cred was not created with copy_creds(), it needs to get
    initialized. Otherwise use of syscall(__NR_keyctl, KEYCTL_SESSION_TO_PARENT);
    can lead to a NULL deref. Thanks to Robert for finding this.

    But introduced by commit 47a150edc2a ("Cache user_ns in struct cred").

    Signed-off-by: Serge E. Hallyn
    Reported-by: Robert Święcki
    Cc: David Howells
    Cc: stable@kernel.org (2.6.39)
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     

17 Mar, 2011

1 commit

  • Improve /proc/keys by:

    (1) Don't attempt to summarise the payload of a negated key. It won't have
    one. To this end, a helper function - key_is_instantiated() has been
    added that allows the caller to find out whether the key is positively
    instantiated (as opposed to being uninstantiated or negatively
    instantiated).

    (2) Do show keys that are negative, expired or revoked rather than hiding
    them. This requires an override flag (no_state_check) to be passed to
    search_my_process_keyrings() and keyring_search_aux() to suppress this
    check.

    Without this, keys that are possessed by the caller, but only grant
    permissions to the caller if possessed are skipped as the possession check
    fails.

    Keys that are visible due to user, group or other checks are visible with
    or without this patch.

    Signed-off-by: David Howells
    Signed-off-by: James Morris

    David Howells
     

22 Jan, 2011

2 commits

  • Fix up comments in the key management code. No functional changes.

    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Do a bit of a style clean up in the key management code. No functional
    changes.

    Done using:

    perl -p -i -e 's!^/[*]*/\n!!' security/keys/*.c
    perl -p -i -e 's!} /[*] end [a-z0-9_]*[(][)] [*]/\n!}\n!' security/keys/*.c
    sed -i -s -e ": next" -e N -e 's/^\n[}]$/}/' -e t -e P -e 's/^.*\n//' -e "b next" security/keys/*.c

    To remove /*****/ lines, remove comments on the closing brace of a
    function to name the function and remove blank lines before the closing
    brace of a function.

    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     

29 Oct, 2010

1 commit


02 Aug, 2010

1 commit

  • Make /proc/keys check to see if the calling process possesses each key before
    performing the security check. The possession check can be skipped if the key
    doesn't have the possessor-view permission bit set.

    This causes the keys a process possesses to show up in /proc/keys, even if they
    don't have matching user/group/other view permissions.

    Signed-off-by: David Howells
    Signed-off-by: James Morris

    David Howells
     

28 May, 2010

1 commit

  • call_usermodehelper_keys() uses call_usermodehelper_setkeys() to change
    subprocess_info->cred in advance. Now that we have info->init() we can
    change this code to set tgcred->session_keyring in context of execing
    kernel thread.

    Note: since currently call_usermodehelper_keys() is never called with
    UMH_NO_WAIT, call_usermodehelper_keys()->key_get() and umh_keys_cleanup()
    are not really needed, we could rely on install_session_keyring_to_cred()
    which does key_get() on success.

    Signed-off-by: Oleg Nesterov
    Acked-by: Neil Horman
    Acked-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

18 May, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo