25 Sep, 2015

1 commit

  • There appears to be a race between:

    (1) key_gc_unused_keys() which frees key->security and then calls
    keyring_destroy() to unlink the name from the name list

    (2) find_keyring_by_name() which calls key_permission(), thus accessing
    key->security, on a key before checking to see whether the key usage is 0
    (i.e. the key is dead and might be cleaned up).

    Fix this by calling ->destroy() before cleaning up the core key data -
    including key->security.

    Reported-by: Petr Matousek
    Signed-off-by: David Howells

    David Howells
     

17 Sep, 2015

1 commit


12 Sep, 2015

1 commit


11 Sep, 2015

1 commit

  • With two exceptions (drm/qxl and drm/radeon) all vm_operations_struct
    structs should be constant.

    Signed-off-by: Kirill A. Shutemov
    Reviewed-by: Oleg Nesterov
    Cc: "H. Peter Anvin"
    Cc: Andy Lutomirski
    Cc: Dave Hansen
    Cc: Ingo Molnar
    Cc: Minchan Kim
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

09 Sep, 2015

1 commit

  • Pull security subsystem updates from James Morris:
    "Highlights:

    - PKCS#7 support added to support signed kexec, also utilized for
    module signing. See comments in 3f1e1bea.

    ** NOTE: this requires linking against the OpenSSL library, which
    must be installed, e.g. the openssl-devel on Fedora **

    - Smack:
    - add IPv6 host labeling; ignore labels on kernel threads
    - support smack labeling mounts which use binary mount data

    - SELinux:
    - add ioctl whitelisting (see
    http://kernsec.org/files/lss2015/vanderstoep.pdf)
    - fix mprotect PROT_EXEC regression caused by mm change

    - Seccomp:
    - add ptrace options for suspend/resume"

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (57 commits)
    PKCS#7: Add OIDs for sha224, sha384 and sha512 hash algos and use them
    Documentation/Changes: Now need OpenSSL devel packages for module signing
    scripts: add extract-cert and sign-file to .gitignore
    modsign: Handle signing key in source tree
    modsign: Use if_changed rule for extracting cert from module signing key
    Move certificate handling to its own directory
    sign-file: Fix warning about BIO_reset() return value
    PKCS#7: Add MODULE_LICENSE() to test module
    Smack - Fix build error with bringup unconfigured
    sign-file: Document dependency on OpenSSL devel libraries
    PKCS#7: Appropriately restrict authenticated attributes and content type
    KEYS: Add a name for PKEY_ID_PKCS7
    PKCS#7: Improve and export the X.509 ASN.1 time object decoder
    modsign: Use extract-cert to process CONFIG_SYSTEM_TRUSTED_KEYS
    extract-cert: Cope with multiple X.509 certificates in a single file
    sign-file: Generate CMS message as signature instead of PKCS#7
    PKCS#7: Support CMS messages also [RFC5652]
    X.509: Change recorded SKID & AKID to not include Subject or Issuer
    PKCS#7: Check content type and versions
    MAINTAINERS: The keyrings mailing list has moved
    ...

    Linus Torvalds
     

05 Sep, 2015

3 commits

  • Many file systems that implement the show_options hook fail to correctly
    escape their output which could lead to unescaped characters (e.g. new
    lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files. This
    could lead to confusion, spoofed entries (resulting in things like
    systemd issuing false d-bus "mount" notifications), and who knows what
    else. This looks like it would only be the root user stepping on
    themselves, but it's possible weird things could happen in containers or
    in other situations with delegated mount privileges.

    Here's an example using overlay with setuid fusermount trusting the
    contents of /proc/mounts (via the /etc/mtab symlink). Imagine the use
    of "sudo" is something more sneaky:

    $ BASE="ovl"
    $ MNT="$BASE/mnt"
    $ LOW="$BASE/lower"
    $ UP="$BASE/upper"
    $ WORK="$BASE/work/ 0 0
    none /proc fuse.pwn user_id=1000"
    $ mkdir -p "$LOW" "$UP" "$WORK"
    $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
    $ cat /proc/mounts
    none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
    none /proc fuse.pwn user_id=1000 0 0
    $ fusermount -u /proc
    $ cat /proc/mounts
    cat: /proc/mounts: No such file or directory

    This fixes the problem by adding new seq_show_option and
    seq_show_option_n helpers, and updating the vulnerable show_option
    handlers to use them as needed. Some, like SELinux, need to be open
    coded due to unusual existing escape mechanisms.
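    The escaping idea can be sketched in userspace C. This is a minimal
    illustration, not the kernel helper: escape_option() is a hypothetical
    name, and the set of escaped characters passed in by the caller is an
    assumption chosen for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Userspace illustration only (hypothetical helper, not kernel code).
 * Copy `src` into `dst`, replacing every byte listed in `esc` with a
 * backslash followed by its three-digit octal code. Returns the number
 * of bytes written (excluding the NUL), or -1 if `dst` is too small.
 */
static int escape_option(char *dst, size_t dstlen, const char *src,
                         const char *esc)
{
    size_t n = 0;

    for (; *src; src++) {
        unsigned char c = (unsigned char)*src;

        if (strchr(esc, c)) {
            /* Need room for 4 output bytes plus the final NUL. */
            if (n + 4 >= dstlen)
                return -1;
            n += (size_t)snprintf(dst + n, dstlen - n, "\\%03o",
                                  (unsigned int)c);
        } else {
            /* Need room for 1 output byte plus the final NUL. */
            if (n + 1 >= dstlen)
                return -1;
            dst[n++] = (char)c;
        }
    }
    if (n >= dstlen)
        return -1;
    dst[n] = '\0';
    return (int)n;
}
```

    With spaces and newlines in the escape set, the malicious workdir
    value from the example above stays on a single line of output instead
    of producing a second, spoofed mount entry.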

    [akpm@linux-foundation.org: add lost chunk, per Kees]
    [keescook@chromium.org: seq_show_option should be using const parameters]
    Signed-off-by: Kees Cook
    Acked-by: Serge Hallyn
    Acked-by: Jan Kara
    Acked-by: Paul Moore
    Cc: J. R. Okajima
    Signed-off-by: Kees Cook
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Per Andrew Morgan's request, add a securebit to allow admins to disable
    PR_CAP_AMBIENT_RAISE. This securebit will prevent processes from adding
    capabilities to their ambient set.

    For simplicity, this disables PR_CAP_AMBIENT_RAISE entirely rather than
    just disabling setting previously cleared bits.

    Signed-off-by: Andy Lutomirski
    Acked-by: Andrew G. Morgan
    Acked-by: Serge Hallyn
    Cc: Kees Cook
    Cc: Christoph Lameter
    Cc: Serge Hallyn
    Cc: Jonathan Corbet
    Cc: Aaron Jones
    Cc: Ted Ts'o
    Cc: Andrew G. Morgan
    Cc: Mimi Zohar
    Cc: Austin S Hemmelgarn
    Cc: Markku Savela
    Cc: Jarkko Sakkinen
    Cc: Michael Kerrisk
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
     
  • Credit where credit is due: this idea comes from Christoph Lameter with
    a lot of valuable input from Serge Hallyn. This patch is heavily based
    on Christoph's patch.

    ===== The status quo =====

    On Linux, there are a number of capabilities defined by the kernel. To
    perform various privileged tasks, processes can wield capabilities that
    they hold.

    Each task has four capability masks: effective (pE), permitted (pP),
    inheritable (pI), and a bounding set (X). When the kernel checks for a
    capability, it checks pE. The other capability masks serve to modify
    what capabilities can be in pE.

    Any task can remove capabilities from pE, pP, or pI at any time. If a
    task has a capability in pP, it can add that capability to pE and/or pI.
    If a task has CAP_SETPCAP, then it can add any capability to pI, and it
    can remove capabilities from X.

    Tasks are not the only things that can have capabilities; files can also
    have capabilities. A file can have no capability information at all [1].
    If a file has capability information, then it has a permitted mask (fP)
    and an inheritable mask (fI) as well as a single effective bit (fE) [2].
    File capabilities modify the capabilities of tasks that execve(2) them.

    A task that successfully calls execve has its capabilities modified for
    the file ultimately being executed (i.e. the binary itself if that
    binary is ELF or for the interpreter if the binary is a script.) [3] In
    the capability evolution rules, for each mask Z, pZ represents the old
    value and pZ' represents the new value. The rules are:

    pP' = (X & fP) | (pI & fI)
    pI' = pI
    pE' = (fE ? pP' : 0)
    X is unchanged

    For setuid binaries, fP, fI, and fE are modified by a moderately
    complicated set of rules that emulate POSIX behavior. Similarly, if
    euid == 0 or ruid == 0, then fP, fI, and fE are modified differently
    (primarily, fP and fI usually end up being the full set). For nonroot
    users executing binaries with neither setuid nor file caps, fI and fP
    are empty and fE is false.

    As an extra complication, if you execute a process as nonroot and fE is
    set, then the "secure exec" rules are in effect: AT_SECURE gets set,
    LD_PRELOAD doesn't work, etc.

    This is rather messy. We've learned that making any changes is
    dangerous, though: if a new kernel version allows an unprivileged
    program to change its security state in a way that persists across
    execution of a setuid program or a program with file caps, this
    persistent state is surprisingly likely to allow setuid or file-capped
    programs to be exploited for privilege escalation.

    ===== The problem =====

    Capability inheritance is basically useless.

    If you aren't root and you execute an ordinary binary, fI is zero, so
    your capabilities have no effect whatsoever on pP'. This means that you
    can't usefully execute a helper process or a shell command with elevated
    capabilities if you aren't root.
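    A small userspace model of the status-quo rules makes this concrete.
    It is only a sketch: the struct and function names are hypothetical,
    capabilities are modelled as bits in a 64-bit word, and the setuid and
    root special cases described earlier are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t capset;                 /* one bit per capability */
#define CAP_BIT(n) ((capset)1 << (n))

struct task_caps { capset pP, pI, pE, X; };
struct file_caps { capset fP, fI; bool fE; };

/*
 * Apply the pre-patch execve rules:
 *   pP' = (X & fP) | (pI & fI)
 *   pI' = pI
 *   pE' = (fE ? pP' : 0)
 *   X is unchanged
 */
static struct task_caps exec_status_quo(struct task_caps t, struct file_caps f)
{
    struct task_caps n = t;

    n.pP = (t.X & f.fP) | (t.pI & f.fI);
    n.pI = t.pI;
    n.pE = f.fE ? n.pP : 0;
    return n;
}
```

    For a nonroot task holding one capability in pP, pI and pE, executing
    a file with empty fP and fI yields pP' = 0 and pE' = 0. Setting fI to
    the full set instead gives pP' = pI, which is the workaround discussed
    next.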

    On current kernels, you can sort of work around this by setting fI to
    the full set for most or all non-setuid executable files. This causes
    pP' = pI for nonroot, and inheritance works. No one does this because
    it's a PITA and it isn't even supported on most filesystems.

    If you try this, you'll discover that every nonroot program ends up with
    secure exec rules, breaking many things.

    This is a problem that has bitten many people who have tried to use
    capabilities for anything useful.

    ===== The proposed change =====

    This patch adds a fifth capability mask called the ambient mask (pA).
    pA does what most people expect pI to do.

    pA obeys the invariant that no bit can ever be set in pA if it is not
    set in both pP and pI. Dropping a bit from pP or pI drops that bit from
    pA. This ensures that existing programs that try to drop capabilities
    still do so, with a complication. Because capability inheritance is so
    broken, setting KEEPCAPS, using setresuid to switch to nonroot uids, and
    then calling execve effectively drops capabilities. Therefore,
    setresuid from root to nonroot conditionally clears pA unless
    SECBIT_NO_SETUID_FIXUP is set. Processes that don't like this can
    re-add bits to pA afterwards.

    The capability evolution rules are changed:

    pA' = (file caps or setuid or setgid ? 0 : pA)
    pP' = (X & fP) | (pI & fI) | pA'
    pI' = pI
    pE' = (fE ? pP' : pA')
    X is unchanged

    If you are nonroot but you have a capability, you can add it to pA. If
    you do so, your children get that capability in pA, pP, and pE. For
    example, you can set pA = CAP_NET_BIND_SERVICE, and your children can
    automatically bind low-numbered ports. Hallelujah!
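    The changed rules and the pA invariant can be modelled the same way.
    Again this is a hedged userspace sketch with hypothetical names; the
    has_caps_or_setid flag stands in for the "file caps or setuid or
    setgid" condition, and the setresuid handling is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t capset;                 /* one bit per capability */
#define CAP_BIT(n) ((capset)1 << (n))

struct task_caps { capset pP, pI, pE, pA, X; };
struct file_caps { capset fP, fI; bool fE; bool has_caps_or_setid; };

/* A bit may only enter pA if it is already set in both pP and pI. */
static bool ambient_raise(struct task_caps *t, capset bit)
{
    if ((t->pP & bit) != bit || (t->pI & bit) != bit)
        return false;
    t->pA |= bit;
    return true;
}

/*
 * Apply the post-patch execve rules:
 *   pA' = (file caps or setuid or setgid ? 0 : pA)
 *   pP' = (X & fP) | (pI & fI) | pA'
 *   pI' = pI
 *   pE' = (fE ? pP' : pA')
 *   X is unchanged
 */
static struct task_caps exec_with_ambient(struct task_caps t,
                                          struct file_caps f)
{
    struct task_caps n = t;

    n.pA = f.has_caps_or_setid ? 0 : t.pA;
    n.pP = (t.X & f.fP) | (t.pI & f.fI) | n.pA;
    n.pI = t.pI;
    n.pE = f.fE ? n.pP : n.pA;
    return n;
}
```

    In this model, a task that raises one capability into pA and then
    executes an ordinary binary keeps that capability in pA, pP and pE,
    while executing a setuid or file-capped binary clears pA.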

    Unprivileged users can create user namespaces, map themselves to a
    nonzero uid, and create both privileged (relative to their namespace)
    and unprivileged process trees. This is currently more or less
    impossible. Hallelujah!

    You cannot use pA to try to subvert a setuid, setgid, or file-capped
    program: if you execute any such program, pA gets cleared and the
    resulting evolution rules are unchanged by this patch.

    Users with nonzero pA are unlikely to unintentionally leak that
    capability. If they run programs that try to drop privileges, dropping
    privileges will still work.

    It's worth noting that the degree of paranoia in this patch could
    possibly be reduced without causing serious problems. Specifically, if
    we allowed pA to persist across executing non-pA-aware setuid binaries
    and across setresuid, then, naively, the only capabilities that could
    leak as a result would be the capabilities in pA, and any attacker
    *already* has those capabilities. This would make me nervous, though --
    setuid binaries that tried to privilege-separate might fail to do so,
    and putting CAP_DAC_READ_SEARCH or CAP_DAC_OVERRIDE into pA could have
    unexpected side effects. (Whether these unexpected side effects would
    be exploitable is an open question.) I've therefore taken the more
    paranoid route. We can revisit this later.

    An alternative would be to require PR_SET_NO_NEW_PRIVS before setting
    ambient capabilities. I think that this would be annoying and would
    make granting otherwise unprivileged users minor ambient capabilities
    (CAP_NET_BIND_SERVICE or CAP_NET_RAW for example) much less useful than
    it is with this patch.

    ===== Footnotes =====

    [1] Files that are missing the "security.capability" xattr or that have
    unrecognized values for that xattr end up with has_cap set to false.
    The code that does that appears to be complicated for no good reason.

    [2] The libcap capability mask parsers and formatters are dangerously
    misleading and the documentation is flat-out wrong. fE is *not* a mask;
    it's a single bit. This has probably confused every single person who
    has tried to use file capabilities.

    [3] Linux very confusingly processes both the script and the interpreter
    if applicable, for reasons that elude me. The results from thinking
    about a script's file capabilities and/or setuid bits are mostly
    discarded.

    Preliminary userspace code is here, but it needs updating:
    https://git.kernel.org/cgit/linux/kernel/git/luto/util-linux-playground.git/commit/?h=cap_ambient&id=7f5afbd175d2

    Here is a test program that can be used to verify the functionality
    (from Christoph):

    /*
    * Test program for the ambient capabilities. This program spawns a shell
    * that allows running processes with a defined set of capabilities.
    *
    * (C) 2015 Christoph Lameter
    * Released under: GPL v3 or later.
    *
    *
    * Compile using:
    *
    * gcc -o ambient_test ambient_test.o -lcap-ng
    *
    * This program must have the following capabilities to run properly:
    * Permissions for CAP_NET_RAW, CAP_NET_ADMIN, CAP_SYS_NICE
    *
    * A command to equip the binary with the right caps is:
    *
    * setcap cap_net_raw,cap_net_admin,cap_sys_nice+p ambient_test
    *
    *
    * To get a shell with additional caps that can be inherited by other processes:
    *
    * ./ambient_test /bin/bash
    *
    *
    * Verifying that it works:
    *
    * From the bash spawned by ambient_test run
    *
    * cat /proc/$$/status
    *
    * and have a look at the capabilities.
    */

    #include <stdlib.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <cap-ng.h>
    #include <sys/prctl.h>
    #include <linux/capability.h>

    /*
    * Definitions from the kernel header files. These are going to be removed
    * when the /usr/include files have these defined.
    */
    #define PR_CAP_AMBIENT 47
    #define PR_CAP_AMBIENT_IS_SET 1
    #define PR_CAP_AMBIENT_RAISE 2
    #define PR_CAP_AMBIENT_LOWER 3
    #define PR_CAP_AMBIENT_CLEAR_ALL 4

    static void set_ambient_cap(int cap)
    {
        int rc;

        capng_get_caps_process();
        rc = capng_update(CAPNG_ADD, CAPNG_INHERITABLE, cap);
        if (rc) {
            printf("Cannot add inheritable cap\n");
            exit(2);
        }
        capng_apply(CAPNG_SELECT_CAPS);

        /* Note the two 0s at the end. Kernel checks for these */
        if (prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE, cap, 0, 0)) {
            perror("Cannot set cap");
            exit(1);
        }
    }

    int main(int argc, char **argv)
    {
        int rc;

        set_ambient_cap(CAP_NET_RAW);
        set_ambient_cap(CAP_NET_ADMIN);
        set_ambient_cap(CAP_SYS_NICE);

        printf("Ambient_test forking shell\n");
        if (execv(argv[1], argv + 1))
            perror("Cannot exec");

        return 0;
    }

    Signed-off-by: Christoph Lameter # Original author
    Signed-off-by: Andy Lutomirski
    Acked-by: Serge E. Hallyn
    Acked-by: Kees Cook
    Cc: Jonathan Corbet
    Cc: Aaron Jones
    Cc: Ted Ts'o
    Cc: Andrew G. Morgan
    Cc: Mimi Zohar
    Cc: Austin S Hemmelgarn
    Cc: Markku Savela
    Cc: Jarkko Sakkinen
    Cc: Michael Kerrisk
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
     

04 Sep, 2015

1 commit

  • f78f5b90c4ff ("rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN()")
    introduced a bug by incorrectly inverting the condition when moving from
    rcu_lockdep_assert() to RCU_LOCKDEP_WARN(). This commit therefore fixes
    the inversion.

    Reported-by: Felipe Balbi
    Reported-by: Tejun Heo
    Signed-off-by: Paul E. McKenney
    Acked-by: Serge Hallyn
    Tested-by: Josh Boyer

    Paul E. McKenney
     

02 Sep, 2015

1 commit

  • Pull user namespace updates from Eric Biederman:
    "This finishes up the changes to ensure proc and sysfs do not start
    implementing executable files, as there are applications today that
    are only secure because such files do not exist.

    It also fixes a long standing misfeature of /proc/<pid>/mountinfo that
    did not show the proper source for files bind mounted from
    /proc/<pid>/ns/*.

    It also straightens out the handling of clone flags related to user
    namespaces, fixing an unnecessary failure of unshare(CLONE_NEWUSER)
    when files such as /proc/<pid>/environ are read while <pid> is calling
    unshare. This winds up fixing a minor bug in unshare flag handling
    that dates back to the first version of unshare in the kernel.

    Finally, this fixes a minor regression caused by the introduction of
    sysfs_create_mount_point, which broke someone's in house application,
    by restoring the size of /sys/fs/cgroup to 0 bytes. Apparently that
    application uses the directory size to determine if a tmpfs is mounted
    on /sys/fs/cgroup.

    The bind mount escape fixes are present in Al Viro's for-next branch,
    and I expect them to come from there. The bind mount escape is the
    last of the user namespace related security bugs that I am aware of"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    fs: Set the size of empty dirs to 0.
    userns,pidns: Force thread group sharing, not signal handler sharing.
    unshare: Unsharing a thread does not require unsharing a vm
    nsfs: Add a show_path method to fix mountinfo
    mnt: fs_fully_visible enforce noexec and nosuid if !SB_I_NOEXEC
    vfs: Commit to never having executables on proc and sysfs.

    Linus Torvalds
     

01 Sep, 2015

1 commit

  • Pull RCU updates from Ingo Molnar:
    "The main RCU changes in this cycle are:

    - the combination of tree geometry-initialization simplifications and
    OS-jitter-reduction changes to expedited grace periods. These two
    are stacked due to the large number of conflicts that would
    otherwise result.

    - privatize smp_mb__after_unlock_lock().

    This commit moves the definition of smp_mb__after_unlock_lock() to
    kernel/rcu/tree.h, in recognition of the fact that RCU is the only
    thing using this, that nothing else is likely to use it, and that
    it is likely to go away completely.

    - documentation updates.

    - torture-test updates.

    - misc fixes"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
    rcu,locking: Privatize smp_mb__after_unlock_lock()
    rcu: Silence lockdep false positive for expedited grace periods
    rcu: Don't disable CPU hotplug during OOM notifiers
    scripts: Make checkpatch.pl warn on expedited RCU grace periods
    rcu: Update MAINTAINERS entry
    rcu: Clarify CONFIG_RCU_EQS_DEBUG help text
    rcu: Fix backwards RCU_LOCKDEP_WARN() in synchronize_rcu_tasks()
    rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN()
    rcu: Make rcu_is_watching() really notrace
    cpu: Wait for RCU grace periods concurrently
    rcu: Create a synchronize_rcu_mult()
    rcu: Fix obsolete priority-boosting comment
    rcu: Use WRITE_ONCE in RCU_INIT_POINTER
    rcu: Hide RCU_NOCB_CPU behind RCU_EXPERT
    rcu: Add RCU-sched flavors of get-state and cond-sync
    rcu: Add fastpath bypassing funnel locking
    rcu: Rename RCU_GP_DONE_FQS to RCU_GP_DOING_FQS
    rcu: Pull out wait_event*() condition into helper function
    documentation: Describe new expedited stall warnings
    rcu: Add stall warnings to synchronize_sched_expedited()
    ...

    Linus Torvalds
     

26 Aug, 2015

1 commit

  • While in most cases commit b1d9e6b064 ("LSM: Switch to lists of hooks")
    retained previous error returns, in three cases it altered them without
    any explanation in the commit message. Restore all of them - in the
    security_old_inode_init_security() case this led to reiserfs using
    uninitialized data, sooner or later crashing the system (the only other
    user of this function - ocfs2 - was unaffected afaict, since it passes
    pre-initialized structures).

    Signed-off-by: Jan Beulich
    Signed-off-by: Casey Schaufler
    Signed-off-by: James Morris

    Jan Beulich
     

15 Aug, 2015

1 commit


14 Aug, 2015

1 commit


13 Aug, 2015

1 commit


12 Aug, 2015

1 commit

  • …k/linux-rcu into core/rcu

    Pull RCU changes from Paul E. McKenney:

    - The combination of tree geometry-initialization simplifications
    and OS-jitter-reduction changes to expedited grace periods.
    These two are stacked due to the large number of conflicts
    that would otherwise result.

    [ With one addition, a temporary commit to silence a lockdep false
    positive. Additional changes to the expedited grace-period
    primitives (queued for 4.4) remove the cause of this false
    positive, and therefore include a revert of this temporary commit. ]

    - Documentation updates.

    - Torture-test updates.

    - Miscellaneous fixes.

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

11 Aug, 2015

2 commits


03 Aug, 2015

1 commit


01 Aug, 2015

1 commit


28 Jul, 2015

3 commits

  • IPv6 appears to be (finally) coming of age with the
    influx of autonomous devices. In support of this, add
    the ability to associate a Smack label with IPv6 addresses.

    This patch also cleans up some of the conditional
    compilation associated with the introduction of
    secmark processing. It's now more obvious which bit
    of code goes with which feature.

    Signed-off-by: Casey Schaufler

    Casey Schaufler
     
  • Now that minor LSMs can cleanly stack with major LSMs, remove the unneeded
    config for Yama to be made to explicitly stack. Just selecting the main
    Yama CONFIG will allow it to work, regardless of the major LSM. Since
    distros using Yama are already forcing it to stack, this is effectively
    a no-op change.

    Additionally add MAINTAINERS entry.

    Signed-off-by: Kees Cook
    Signed-off-by: James Morris

    Kees Cook
     
  • __key_link_end is not freeing the associated array edit structure
    and this leads to a 512 byte memory leak each time an identical
    existing key is added with add_key().

    The reason the add_key() system call returns okay is that
    key_create_or_update() calls __key_link_begin() before checking to see
    whether it can update a key directly rather than adding/replacing - which
    it turns out it can. Thus __key_link() is not called through
    __key_instantiate_and_link() and __key_link_end() must cancel the edit.
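    The begin/apply/end pattern behind this leak can be sketched in
    userspace C. All names here are hypothetical stand-ins for the keyring
    functions, and the counter exists only so that the example can show
    that nothing is left allocated.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the pending array edit. */
struct demo_edit { int pending; };

static int edits_outstanding;            /* allocated but not yet released */

/* Begin: allocate an edit that may or may not be applied later. */
static struct demo_edit *demo_link_begin(void)
{
    struct demo_edit *e = malloc(sizeof(*e));

    if (e) {
        e->pending = 1;
        edits_outstanding++;
    }
    return e;
}

/* Apply: consumes the edit and clears the caller's pointer. */
static void demo_link_apply(struct demo_edit **e)
{
    assert(*e && (*e)->pending);
    free(*e);
    *e = NULL;
    edits_outstanding--;
}

/* End: must cancel an edit that was begun but never applied. */
static void demo_link_end(struct demo_edit *e)
{
    if (e) {
        free(e);
        edits_outstanding--;
    }
}
```

    Calling demo_link_begin() followed directly by demo_link_end(), which
    models the update-in-place path where the link step is skipped, leaves
    edits_outstanding at zero. Without the cancel in the end function, each
    such call would leak one edit.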

    CVE-2015-1333

    Signed-off-by: Colin Ian King
    Signed-off-by: David Howells
    Signed-off-by: James Morris

    Colin Ian King
     

23 Jul, 2015

3 commits

  • This commit renames rcu_lockdep_assert() to RCU_LOCKDEP_WARN() for
    consistency with the WARN() series of macros. This also requires
    inverting the sense of the conditional, which this commit also does.
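    Why the condition has to be inverted can be seen with two toy macros.
    DEMO_ASSERT and DEMO_WARN are hypothetical userspace stand-ins, not
    the kernel macros: an assert-style check complains when its condition
    is false, a WARN-style check complains when its condition is true.

```c
#include <assert.h>
#include <stdio.h>

static int warnings;                     /* number of complaints emitted */

/* Assert-style: complain when the condition is FALSE. */
#define DEMO_ASSERT(cond, msg)                      \
    do {                                            \
        if (!(cond)) {                              \
            warnings++;                             \
            fprintf(stderr, "%s\n", (msg));         \
        }                                           \
    } while (0)

/* WARN-style: complain when the condition is TRUE. */
#define DEMO_WARN(cond, msg)                        \
    do {                                            \
        if (cond) {                                 \
            warnings++;                             \
            fprintf(stderr, "%s\n", (msg));         \
        }                                           \
    } while (0)

/* The same check written in both styles; returns the warning count. */
static int check_assert_style(int lock_held)
{
    warnings = 0;
    DEMO_ASSERT(lock_held, "lock not held");
    return warnings;
}

static int check_warn_style(int lock_held)
{
    warnings = 0;
    DEMO_WARN(!lock_held, "lock not held");   /* note the added '!' */
    return warnings;
}
```

    Forgetting the negation when converting a call site makes the check
    fire in exactly the cases where it should stay silent.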

    Reported-by: Ingo Molnar
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Ingo Molnar

    Paul E. McKenney
     
  • security/smack/smackfs.c:2251:1-4: WARNING: end returns can be
    simplified and declaration on line 2250 can be dropped

    Simplify a trivial if-return sequence. Possibly combine with a
    preceding function call.

    Generated by: scripts/coccinelle/misc/simple_return.cocci

    Signed-off-by: Fengguang Wu
    Acked-by: Serge Hallyn
    Acked-by: Casey Schaufler

    kbuild test robot
     
  • Add support for setting smack mount labels (using smackfsdef, smackfsroot,
    smackfshat, smackfsfloor, smackfstransmute) for filesystems with binary
    mount data like NFS.

    To achieve this, implement sb_parse_opts_str and sb_set_mnt_opts security
    operations in smack LSM similar to SELinux.

    Signed-off-by: Vivek Trivedi
    Signed-off-by: Amit Sahrawat
    Acked-by: Casey Schaufler

    Vivek Trivedi
     

14 Jul, 2015

6 commits

  • Create a common helper function to determine the label for a new inode.
    This is then used by:

    - may_create()
    - selinux_dentry_init_security()
    - selinux_inode_init_security()

    This will change the behaviour of the functions slightly, bringing them
    all into line.

    Suggested-by: Stephen Smalley
    Signed-off-by: David Howells
    Acked-by: Stephen Smalley
    Signed-off-by: Paul Moore

    David Howells
     
  • Ensure that we catch any cases where tclass == 0.

    Signed-off-by: Stephen Smalley
    Signed-off-by: Paul Moore

    Stephen Smalley
     
  • Initialize the security class of sock security structures
    to the generic socket class. This is similar to what is
    already done in inode_alloc_security for files. Generally
    the sclass field will later be set by socket_post_create
    or sk_clone or sock_graft, but for protocol implementations
    that fail to call any of these for newly accepted sockets,
    we want some sane default that will yield a legitimate
    avc denied message with non-garbage values for class and
    permission.

    Signed-off-by: Stephen Smalley
    Signed-off-by: Paul Moore

    Stephen Smalley
     
  • The inode_free_security() function just took the superblock's isec_lock
    before checking and trying to remove the inode security struct from the
    linked list. In many cases, the list was empty and so the lock taking
    is wasteful as no useful work is done. On multi-socket systems with
    a large number of CPUs, there can also be a fair amount of spinlock
    contention on the isec_lock if many tasks are exiting at the same time.

    This patch changes the code to check the state of the list first before
    taking the lock and attempting to dequeue it. The list_del_init()
    can be called more than once on the same list with no harm as long
    as they are properly serialized. It should not be possible to have
    inode_free_security() called concurrently with list_add(). For better
    safety, however, we use list_empty_careful() here even though it is
    still not completely safe in case that happens.
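    The check-before-lock idea looks roughly like this in a single-threaded
    userspace sketch. The list helpers and the lock counter are
    hypothetical; the counter stands in for taking the spinlock, and a
    plain emptiness test replaces list_empty_careful().

```c
#include <assert.h>
#include <stdbool.h>

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h)
{
    h->next = h->prev = h;
}

static bool list_is_empty(const struct list_head *h)
{
    return h->next == h;
}

static void list_add_node(struct list_head *n, struct list_head *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Unlink the entry and leave it pointing at itself, so it reads as empty. */
static void list_del_reinit(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    list_init(e);
}

static int lock_acquisitions;            /* stand-in for taking the lock */

/* Only pay for the lock when the entry is actually on a list. */
static void remove_entry(struct list_head *entry)
{
    if (list_is_empty(entry))
        return;

    lock_acquisitions++;                 /* lock would be taken here */
    list_del_reinit(entry);
    /* lock would be released here */
}
```

    An entry that was never queued, or that has already been removed,
    costs no lock acquisition at all in this model.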

    Signed-off-by: Waiman Long
    Acked-by: Stephen Smalley
    Signed-off-by: Paul Moore

    Waiman Long
     
  • Add extended permissions logic to SELinux. Extended permissions
    provide additional permissions in 256 bit increments. Extend the
    generic ioctl permission check to use the extended permissions for
    per-command filtering. Source/target/class sets including the ioctl
    permission may additionally include a set of commands. Example:

    allowxperm <source> <target>:<class> ioctl unpriv_app_socket_cmds
    auditallowxperm <source> <target>:<class> ioctl priv_gpu_cmds

    Where unpriv_app_socket_cmds and priv_gpu_cmds are macros
    representing commonly granted sets of ioctl commands.

    When ioctl commands are omitted only the permissions are checked.
    This feature is intended to provide finer granularity for the ioctl
    permission that may be too imprecise. For example, the same driver
    may use ioctls to provide important and benign functionality such as
    driver version or socket type as well as dangerous capabilities such
    as debugging features, read/write/execute to physical memory or
    access to sensitive data. Per-command filtering provides a mechanism
    to reduce the attack surface of the kernel, and limit applications
    to the subset of commands required.
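    A 256-bit permission set with per-command filtering can be sketched as
    follows. This is an illustration under stated assumptions, not the
    SELinux implementation: the names are hypothetical, and splitting a
    16-bit command into a high "driver" byte that selects the set and a
    low byte that indexes into it is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 8 words of 32 bits give one 256-bit block of extended permissions. */
struct xperm_set { uint32_t bits[8]; };

static void xperm_allow(struct xperm_set *s, uint8_t n)
{
    s->bits[n >> 5] |= (uint32_t)1 << (n & 0x1f);
}

static bool xperm_test(const struct xperm_set *s, uint8_t n)
{
    return (s->bits[n >> 5] >> (n & 0x1f)) & 1u;
}

/*
 * Assumed layout: the high byte of the command names the driver this
 * set applies to, the low byte is the position within the 256-bit set.
 */
static bool ioctl_allowed(const struct xperm_set *s, uint8_t driver,
                          uint16_t cmd)
{
    return (uint8_t)(cmd >> 8) == driver &&
           xperm_test(s, (uint8_t)(cmd & 0xff));
}
```

    A policy that allows only command 0x01 for driver 0x89 would then
    admit 0x8901 and reject both 0x8902 and any command belonging to a
    different driver byte.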

    The format of the policy binary has been modified to include ioctl
    commands, and the policy version number has been incremented to
    POLICYDB_VERSION_XPERMS_IOCTL=30 to account for the format
    change.

    The extended permissions logic is deliberately generic to allow
    components to be reused, e.g. for netlink filters.

    Signed-off-by: Jeff Vander Stoep
    Acked-by: Nick Kralevich
    Signed-off-by: Paul Moore

    Jeff Vander Stoep
     
  • Add information about ioctl calls to the LSM audit data. Log the
    file path and command number.

    Signed-off-by: Jeff Vander Stoep
    Acked-by: Nick Kralevich
    [PM: subject line tweak]
    Signed-off-by: Paul Moore

    Jeff Vander Stoep
     

11 Jul, 2015

2 commits

  • James Morris
     
  • commit 66fc13039422ba7df2d01a8ee0873e4ef965b50b ("mm: shmem_zero_setup
    skip security check and lockdep conflict with XFS") caused a regression
    for SELinux by disabling any SELinux checking of mprotect PROT_EXEC on
    shared anonymous mappings. However, even before that regression, the
    checking on such mprotect PROT_EXEC calls was inconsistent with the
    checking on a mmap PROT_EXEC call for a shared anonymous mapping. On a
    mmap, the security hook is passed a NULL file and knows it is dealing
    with an anonymous mapping and therefore applies an execmem check and no
    file checks. On a mprotect, the security hook is passed a vma with a
    non-NULL vm_file (as this was set from the internally-created shmem
    file during mmap) and therefore applies the file-based execute check
    and no execmem check. Since the aforementioned commit now marks the
    shmem zero inode with the S_PRIVATE flag, the file checks are disabled
    and we have no checking at all on mprotect PROT_EXEC. Add a test to
    the mprotect hook logic for such private inodes, and apply an execmem
    check in that case. This makes the mmap and mprotect checking
    consistent for shared anonymous mappings, as well as for /dev/zero and
    ashmem.

    Cc: <stable@vger.kernel.org> # 4.1.x
    Signed-off-by: Stephen Smalley
    Signed-off-by: Paul Moore

    Stephen Smalley
     

10 Jul, 2015

2 commits

  • Today proc and sysfs do not contain any executable files. Several
    applications today mount proc or sysfs without noexec and nosuid and
    then depend on there being no executable files on proc or sysfs.
    Having any executable files show on proc or sysfs would cause
    a user space visible regression, and most likely security problems.

    Therefore commit to never allowing executables on proc and sysfs by
    adding a new flag to mark them as filesystems without executables and
    enforce that flag.

    Test the flag where MNT_NOEXEC is tested today, so that the only user
    visible effect will be that executables will be treated as if the
    execute bit is cleared.
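    The combined test can be modelled in a few lines of userspace C. The
    flag values and names below are illustrative placeholders, not the
    kernel's definitions; the point is only that the per-filesystem flag
    is checked in the same place as the per-mount noexec flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values, not the kernel's. */
#define DEMO_MNT_NOEXEC   0x1u           /* mount was made with noexec */
#define DEMO_SB_I_NOEXEC  0x2u           /* filesystem never has executables */

struct demo_mount {
    unsigned int mnt_flags;              /* per-mount flags */
    unsigned int sb_iflags;              /* per-filesystem internal flags */
};

/* True if nothing on this mount may be executed. */
static bool demo_path_noexec(const struct demo_mount *m)
{
    return (m->mnt_flags & DEMO_MNT_NOEXEC) ||
           (m->sb_iflags & DEMO_SB_I_NOEXEC);
}

/* A file is executable only if the mount allows it and a mode bit is set. */
static bool demo_may_exec(const struct demo_mount *m, unsigned int mode)
{
    return !demo_path_noexec(m) && (mode & 0111u) != 0;
}
```

    With the filesystem flag set, a file with mode 0755 is treated the
    same as one with no execute bits, even when the mount itself was made
    without noexec.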

    The filesystems proc and sysfs do not currently incorporate any
    executable files so this does not result in any user visible effects.

    This makes it unnecessary to vet changes to proc and sysfs tightly for
    adding executable files or changes to chattr that would modify
    existing files, as no matter what the individual files say they will
    not be treated as executable files by the vfs.

    Not having to vet changes too closely is important as without this we
    are only one proc_create call (or another goof up in the
    implementation of notify_change) from having problematic executables
    on proc. Those mistakes are all too easy to make and would create
    a situation where there are security issues or where the assumptions
    of some program are broken (causing userspace regressions).

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • At present we don't create efficient ebitmaps when importing NetLabel
    category bitmaps. This can present a problem when comparing ebitmaps
    since ebitmap_cmp() is very strict about these things and considers
    these wasteful ebitmaps not equal when compared to their more
    efficient counterparts, even if their values are the same. This isn't
    likely to cause problems on 64-bit systems due to a bit of luck on
    how NetLabel/CIPSO works and the default ebitmap size, but it can be
    a problem on 32-bit systems.

    This patch fixes this problem by being a bit more intelligent when
    importing NetLabel category bitmaps by skipping over empty sections
    which should result in a nice, efficient ebitmap.

    Cc: stable@vger.kernel.org # 3.17
    Signed-off-by: Paul Moore

    Paul Moore
     
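
The NetLabel commit above turns on the difference between a wasteful and an efficient sparse bitmap under a strict comparison. The sketch below illustrates that with a simplified, hypothetical node layout (struct sketch_node, sketch_import, sketch_cmp). It is not the kernel's ebitmap implementation, and the 64-bit section size and fixed node array are assumptions made to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_NODES 8

/* One populated 64-bit section of a sparse bitmap (hypothetical layout). */
struct sketch_node {
    uint32_t startbit; /* index of the first bit covered by this node */
    uint64_t map;      /* the 64 bits themselves */
};

struct sketch_bitmap {
    size_t count;
    struct sketch_node nodes[SKETCH_MAX_NODES];
};

/*
 * Import a flat bitmap, one 64-bit word per section. With skip_empty false
 * every word gets a node, mimicking the wasteful import; with skip_empty
 * true, all-zero words are skipped. Returns false if the fixed-size node
 * array would overflow.
 */
bool sketch_import(struct sketch_bitmap *dst, const uint64_t *words,
                   size_t nwords, bool skip_empty)
{
    assert(dst != NULL && words != NULL);
    dst->count = 0;
    for (size_t i = 0; i < nwords; i++) {
        if (skip_empty && words[i] == 0)
            continue;
        if (dst->count == SKETCH_MAX_NODES)
            return false;
        dst->nodes[dst->count].startbit = (uint32_t)(i * 64);
        dst->nodes[dst->count].map = words[i];
        dst->count++;
    }
    return true;
}

/* Strict node-by-node comparison: the layout must match, not just the set bits. */
bool sketch_cmp(const struct sketch_bitmap *a, const struct sketch_bitmap *b)
{
    if (a->count != b->count)
        return false;
    for (size_t i = 0; i < a->count; i++) {
        if (a->nodes[i].startbit != b->nodes[i].startbit ||
            a->nodes[i].map != b->nodes[i].map)
            return false;
    }
    return true;
}
```

Importing the words { 0x5, 0, 0x1 } without skipping yields three nodes, one of them empty, so a strict comparison against a two-node bitmap holding the same bits fails. Importing with skipping yields the same two nodes and compares equal.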

05 Jul, 2015

1 commit

  • Pull more vfs updates from Al Viro:
    "Assorted VFS fixes and related cleanups (IMO the most interesting in
    that part are f_path-related things and Eric's descriptor-related
    stuff). UFS regression fixes (it got broken last cycle). 9P fixes.
    fs-cache series, DAX patches, Jan's file_remove_suid() work"

    [ I'd say this is much more than "fixes and related cleanups". The
    file_table locking rule change by Eric Dumazet is a rather big and
    fundamental update even if the patch isn't huge. - Linus ]

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
    9p: cope with bogus responses from server in p9_client_{read,write}
    p9_client_write(): avoid double p9_free_req()
    9p: forgetting to cancel request on interrupted zero-copy RPC
    dax: bdev_direct_access() may sleep
    block: Add support for DAX reads/writes to block devices
    dax: Use copy_from_iter_nocache
    dax: Add block size note to documentation
    fs/file.c: __fget() and dup2() atomicity rules
    fs/file.c: don't acquire files->file_lock in fd_install()
    fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
    vfs: avoid creation of inode number 0 in get_next_ino
    namei: make set_root_rcu() return void
    make simple_positive() public
    ufs: use dir_pages instead of ufs_dir_pages()
    pagemap.h: move dir_pages() over there
    remove the pointless include of lglock.h
    fs: cleanup slight list_entry abuse
    xfs: Correctly lock inode when removing suid and file capabilities
    fs: Call security_ops->inode_killpriv on truncate
    fs: Provide function telling whether file_remove_privs() will do anything
    ...

    Linus Torvalds
     

04 Jul, 2015

1 commit

  • Pull user namespace updates from Eric Biederman:
    "Long ago and far away when user namespaces were young it was realized
    that allowing fresh mounts of proc and sysfs with only user namespace
    permissions could violate the basic rule that only root gets to decide
    if proc or sysfs should be mounted at all.

    Some hacks were put in place to reduce the worst of the damage that
    could be done, and the common sense rule was adopted that fresh mounts of
    proc and sysfs should allow no more than bind mounts of proc and
    sysfs. Unfortunately that rule has not been fully enforced.

    There are two kinds of gaps in that enforcement. Only filesystems
    mounted on empty directories of proc and sysfs should be ignored but
    the test for empty directories was insufficient. So in my tree
    directories on proc, sysctl and sysfs that will always be empty are
    created specially. Every other technique is imperfect as an ordinary
    directory can have entries added even after a readdir returns and
    shows that the directory is empty. Special creation of directories
    for mount points makes the code in the kernel a smidge clearer about
    its purpose. I asked container developers from the various container
    projects to help test this and no holes were found in the set of mount
    points on proc and sysfs that are created specially.

    This set of changes also starts enforcing that the mount flags of fresh
    mounts of proc and sysfs are consistent with the existing mount of
    proc and sysfs. I expected this to be the boring part of the work but
    unfortunately unprivileged userspace winds up mounting fresh copies of
    proc and sysfs with noexec and nosuid clear when root set those flags
    on the previous mount of proc and sysfs. So for now only the atime,
    read-only and nodev attributes which userspace happens to keep
    consistent are enforced. Dealing with the noexec and nosuid
    attributes remains for another time.

    This set of changes also addresses an issue with how open file
    descriptors from /proc//ns/* are displayed. Recently readlink of
    /proc//fd has been triggering a WARN_ON that has not been
    meaningful since it was added (as all of the code in the kernel was
    converted) and is now actively wrong.

    There is also a short list of issues that have not been fixed yet that
    I will mention briefly.

    It is possible to rename a directory from below to above a bind mount.
    At which point any directory pointers below the renamed directory can
    be walked up to the root directory of the filesystem. With user
    namespaces enabled a bind mount of the bind mount can be created
    allowing the user to pick a directory whose children they can rename
    to outside of the bind mount. This is challenging to fix and doubly
    so because all obvious solutions must touch code that is in the
    performance part of pathname resolution.

    As mentioned above there is also a question of how to ensure that
    developers by accident or with purpose do not introduce executable
    files on sysfs and proc and in doing so introduce security regressions
    in the current userspace that will not be immediately obvious and as
    such are likely to require breaking userspace in painful ways once
    they are recognized"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    vfs: Remove incorrect debugging WARN in prepend_path
    mnt: Update fs_fully_visible to test for permanently empty directories
    sysfs: Create mountpoints with sysfs_create_mount_point
    sysfs: Add support for permanently empty directories to serve as mount points.
    kernfs: Add support for always empty directories.
    proc: Allow creating permanently empty directories that serve as mount points
    sysctl: Allow creating permanently empty directories that serve as mountpoints.
    fs: Add helper functions for permanently empty directories.
    vfs: Ignore unlocked mounts in fs_fully_visible
    mnt: Modify fs_fully_visible to deal with locked ro nodev and atime
    mnt: Refactor the logic for mounting sysfs and proc in a user namespace

    Linus Torvalds
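
The merge message above says that only the atime, read-only and nodev attributes of a fresh proc or sysfs mount are checked against the existing mount, while noexec and nosuid are left for later. A rough sketch of such a check in C is below. The flag bits and the function name are hypothetical, and reading "consistent" as "a restriction may be added but not dropped, and the atime policy must match" is an assumption of this example rather than something the message spells out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this sketch, not the kernel's values. */
#define SK_RDONLY     0x01u
#define SK_NODEV      0x02u
#define SK_NOEXEC     0x04u
#define SK_NOSUID     0x08u
#define SK_ATIME_MASK 0x30u /* two bits selecting the atime policy */

static_assert((SK_ATIME_MASK & (SK_RDONLY | SK_NODEV | SK_NOEXEC | SK_NOSUID)) == 0,
              "atime bits must not overlap the other flags");

/* Would a fresh mount with flags `fresh` be accepted next to `existing`? */
bool sketch_flags_consistent(unsigned int existing, unsigned int fresh)
{
    /* A restriction present on the existing mount must be kept. */
    if ((existing & SK_RDONLY) && !(fresh & SK_RDONLY))
        return false;
    if ((existing & SK_NODEV) && !(fresh & SK_NODEV))
        return false;
    /* The atime policy must match exactly. */
    if ((existing & SK_ATIME_MASK) != (fresh & SK_ATIME_MASK))
        return false;
    /* noexec and nosuid are deliberately not compared here. */
    return true;
}
```

Under these assumptions a fresh mount that clears noexec and nosuid is still accepted, which mirrors the compromise described in the message.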
     

02 Jul, 2015

1 commit

  • Pull module updates from Rusty Russell:
    "Main excitement here is Peter Zijlstra's lockless rbtree optimization
    to speed module address lookup. He found some abusers of the module
    lock doing that too.

    A little bit of parameter work here too; including Dan Streetman's
    breaking up the big param mutex so writing a parameter can load
    another module (yeah, really). Unfortunately that broke the usual
    suspects, !CONFIG_MODULES and !CONFIG_SYSFS, so those fixes were
    appended too"

    * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (26 commits)
    modules: only use mod->param_lock if CONFIG_MODULES
    param: fix module param locks when !CONFIG_SYSFS.
    rcu: merge fix for Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE()
    module: add per-module param_lock
    module: make perm const
    params: suppress unused variable error, warn once just in case code changes.
    modules: clarify CONFIG_MODULE_COMPRESS help, suggest 'N'.
    kernel/module.c: avoid ifdefs for sig_enforce declaration
    kernel/workqueue.c: remove ifdefs over wq_power_efficient
    kernel/params.c: export param_ops_bool_enable_only
    kernel/params.c: generalize bool_enable_only
    kernel/module.c: use generic module param operaters for sig_enforce
    kernel/params: constify struct kernel_param_ops uses
    sysfs: tightened sysfs permission checks
    module: Rework module_addr_{min,max}
    module: Use __module_address() for module_address_lookup()
    module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING
    module: Optimize __module_address() using a latched RB-tree
    rbtree: Implement generic latch_tree
    seqlock: Introduce raw_read_seqcount_latch()
    ...

    Linus Torvalds
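
The module merge above mentions a lockless lookup built on a latched tree and a latch-style sequence counter. The general latch idea can be sketched in portable C11 as follows: keep two copies of the data and a counter whose low bit tells readers which copy is currently stable. The struct and function names are hypothetical, the payload (an address range) is invented for illustration, and sequentially consistent atomics are used for simplicity rather than the kernel's own barriers. A single writer is assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical payload: an address range, stored twice. */
struct sketch_range {
    atomic_ulong base;
    atomic_ulong size;
};

struct sketch_latch {
    atomic_uint seq;             /* readers use copy[seq & 1] */
    struct sketch_range copy[2];
};

/* Single writer only: readers are steered away from the copy being changed. */
void sketch_latch_write(struct sketch_latch *l, unsigned long base,
                        unsigned long size)
{
    assert(l != NULL);
    atomic_fetch_add(&l->seq, 1); /* seq is odd: readers use copy[1] */
    atomic_store(&l->copy[0].base, base);
    atomic_store(&l->copy[0].size, size);
    atomic_fetch_add(&l->seq, 1); /* seq is even: readers use copy[0] */
    atomic_store(&l->copy[1].base, base);
    atomic_store(&l->copy[1].size, size);
}

/* Lock-free read: takes no lock and retries only if seq moved underneath it. */
void sketch_latch_read(struct sketch_latch *l, unsigned long *base,
                       unsigned long *size)
{
    unsigned int seq;

    assert(l != NULL && base != NULL && size != NULL);
    do {
        seq = atomic_load(&l->seq);
        *base = atomic_load(&l->copy[seq & 1].base);
        *size = atomic_load(&l->copy[seq & 1].size);
    } while (atomic_load(&l->seq) != seq);
}
```

The point of the two copies is that a reader always has one copy the writer is not touching, so a lookup can proceed even while an update is in flight.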
     

01 Jul, 2015

1 commit

  • This allows for better documentation in the code and
    it allows for a simpler and fully correct version of
    fs_fully_visible to be written.

    The mount points converted and their filesystems are:
    /sys/hypervisor/s390/       s390_hypfs
    /sys/kernel/config/         configfs
    /sys/kernel/debug/          debugfs
    /sys/firmware/efi/efivars/  efivarfs
    /sys/fs/fuse/connections/   fusectl
    /sys/fs/pstore/             pstore
    /sys/kernel/tracing/        tracefs
    /sys/fs/cgroup/             cgroup
    /sys/kernel/security/       securityfs
    /sys/fs/selinux/            selinuxfs
    /sys/fs/smackfs/            smackfs

    Cc: stable@vger.kernel.org
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman