24 Feb, 2021

2 commits

  • Pull keyring updates from David Howells:
    "Here's a set of minor keyrings fixes/cleanups that I've collected from
    various people for the upcoming merge window.

    A couple of them might, in theory, be visible to userspace:

    - Make blacklist_vet_description() reject uppercase letters as they
    don't match the all-lowercase hex string generated for a blacklist
    search.

    This may want reconsideration in the future, but, currently, you
    can't add to the blacklist keyring from userspace and the only
    source of blacklist keys generates lowercase descriptions.

    - Fix blacklist_init() to use a new KEY_ALLOC_* flag to indicate that
    it wants KEY_FLAG_KEEP to be set rather than passing KEY_FLAG_KEEP
    into keyring_alloc() as KEY_FLAG_KEEP isn't a valid alloc flag.

    This isn't currently a problem as the blacklist keyring isn't
    currently writable by userspace.

    The rest of the patches are cleanups and I don't think they should
    have any visible effect"
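
    The lowercase-only rule described for blacklist_vet_description() can be
    sketched as follows. This is a minimal illustration in Python, not the
    kernel code: the function name vet_description() is hypothetical, and the
    "non-empty prefix, colon, even number of hex digits" shape is an
    assumption about the description format.

```python
import string

# Only lowercase hex digits are accepted, matching the all-lowercase
# hex string that a blacklist search generates.
LOWER_HEX = set(string.digits + "abcdef")


def vet_description(desc: str) -> bool:
    """Return True if desc looks like '<type>:<lowercase hex bytes>'.

    Hypothetical helper; the accepted shape is an assumption made for
    this sketch.
    """
    prefix, sep, hex_part = desc.partition(":")
    if not sep or not prefix:
        return False  # no colon, or nothing before it
    if len(hex_part) == 0 or len(hex_part) % 2:
        return False  # empty or odd number of hex digits
    return all(ch in LOWER_HEX for ch in hex_part)
```

    With this rule, "tbs:deadbeef" is accepted while "tbs:DEADBEEF" is
    rejected; the "tbs" prefix is used here only as an example type.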

    * tag 'keys-misc-20210126' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
    watch_queue: rectify kernel-doc for init_watch()
    certs: Replace K{U,G}IDT_INIT() with GLOBAL_ROOT_{U,G}ID
    certs: Fix blacklist flag type confusion
    PKCS#7: Fix missing include
    certs: Fix blacklisted hexadecimal hash string check
    certs/blacklist: fix kernel doc interface issue
    crypto: public_key: Remove redundant header file from public_key.h
    keys: remove trailing semicolon in macro definition
    crypto: pkcs7: Use match_string() helper to simplify the code
    PKCS#7: drop function from kernel-doc pkcs7_validate_trust_one
    encrypted-keys: Replace HTTP links with HTTPS ones
    crypto: asymmetric_keys: fix some comments in pkcs7_parser.h
    KEYS: remove redundant memset
    security: keys: delete repeated words in comments
    KEYS: asymmetric: Fix kerneldoc
    security/keys: use kvfree_sensitive()
    watch_queue: Drop references to /dev/watch_queue
    keys: Remove outdated __user annotations
    security: keys: Fix fall-through warnings for Clang

    Linus Torvalds
     
  • Pull idmapped mounts from Christian Brauner:
    "This introduces idmapped mounts which has been in the making for some
    time. Simply put, different mounts can expose the same file or
    directory with different ownership. This initial implementation comes
    with ports for fat, ext4 and with Christoph's port for xfs with more
    filesystems being actively worked on by independent people and
    maintainers.

    Idmapped mounts handle a wide range of long standing use-cases. Here
    are just a few:

    - Idmapped mounts make it possible to easily share files between
    multiple users or multiple machines especially in complex
    scenarios. For example, idmapped mounts will be used in the
    implementation of portable home directories in
    systemd-homed.service(8) where they allow users to move their home
    directory to an external storage device and use it on multiple
    computers where they are assigned different uids and gids. This
    effectively makes it possible to assign random uids and gids at
    login time.

    - It is possible to share files from the host with unprivileged
    containers without having to change ownership permanently through
    chown(2).

    - It is possible to idmap a container's rootfs without having to
    mangle every file. For example, Chromebooks use it to share the
    user's Download folder with their unprivileged containers in their
    Linux subsystem.

    - It is possible to share files between containers with
    non-overlapping idmappings.

    - Filesystems that lack a proper concept of ownership such as fat can
    use idmapped mounts to implement discretionary access (DAC)
    permission checking.

    - They allow users to efficiently change ownership on a per-mount
    basis without having to (recursively) chown(2) all files. In
    contrast to chown(2), changing ownership of large sets of files is
    instantaneous with idmapped mounts. This is especially useful when
    ownership of a whole root filesystem of a virtual machine or
    container is changed. With idmapped mounts a single
    mount_setattr() syscall will be sufficient to change the ownership of
    all files.

    - Idmapped mounts always take the current ownership into account as
    idmappings specify what a given uid or gid is supposed to be mapped
    to. This contrasts with the chown(2) syscall which cannot by itself
    take the current ownership of the files it changes into account. It
    simply changes the ownership to the specified uid and gid. This is
    especially problematic when recursively chown(2)ing a large set of
    files, which is common in the aforementioned portable home
    directory, container, and VM scenarios.

    - Idmapped mounts allow to change ownership locally, restricting it
    to specific mounts, and temporarily as the ownership changes only
    apply as long as the mount exists.

    Several userspace projects have either already put up patches and
    pull-requests for this feature or will do so should you decide to pull
    this:

    - systemd: In a wide variety of scenarios but especially right away
    in their implementation of portable home directories.

    https://systemd.io/HOME_DIRECTORY/

    - container runtimes: containerd, runC, LXD: To share data between
    host and unprivileged containers, unprivileged and privileged
    containers, etc. The pull request for idmapped mounts support in
    containerd, the default Kubernetes runtime, has been up for quite
    a while now: https://github.com/containerd/containerd/pull/4734

    - The virtio-fs developers and several users have expressed interest
    in using this feature with virtual machines once virtio-fs is
    ported.

    - ChromeOS: Sharing host-directories with unprivileged containers.

    I've tightly synced with all those projects and all of those listed
    here have also expressed their need/desire for this feature on the
    mailing list. For more info on how people use this there's a bunch of
    talks about this too. Here's just two recent ones:

    https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf
    https://fosdem.org/2021/schedule/event/containers_idmap/

    This comes with an extensive xfstests suite covering both ext4 and
    xfs:

    https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts

    It covers truncation, creation, opening, xattrs, vfscaps, setid
    execution, setgid inheritance and more both with idmapped and
    non-idmapped mounts. It already helped to discover an unrelated xfs
    setgid inheritance bug which has since been fixed in mainline. It will
    be sent for inclusion with the xfstests project should you decide to
    merge this.

    In order to support per-mount idmappings vfsmounts are marked with
    user namespaces. The idmapping of the user namespace will be used to
    map the ids of vfs objects when they are accessed through that mount.
    By default all vfsmounts are marked with the initial user namespace.
    The initial user namespace is used to indicate that a mount is not
    idmapped. All operations behave as before and this is verified in the
    testsuite.
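
    The per-mount translation described above can be pictured with a small
    model. This is only a sketch in Python: Extent and map_id() are
    hypothetical names, the extent layout (source, target, count) is an
    assumption modelled on the "both:1000:1001:1" argument used with the
    example tool, and the way the kernel presents unmapped ids is not
    modelled.

```python
from typing import List, NamedTuple, Optional


class Extent(NamedTuple):
    """One contiguous range of an idmapping (hypothetical model)."""
    source_first: int  # first id as stored by the filesystem
    target_first: int  # first id as seen through the mount
    count: int         # number of consecutive ids covered


def map_id(extents: List[Extent], fs_id: int) -> Optional[int]:
    """Translate an on-disk id to the id visible through the mount."""
    for ext in extents:
        if ext.source_first <= fs_id < ext.source_first + ext.count:
            return ext.target_first + (fs_id - ext.source_first)
    return None  # the id has no mapping in this idmapping


# A mount that is not idmapped behaves like an identity mapping here.
IDENTITY = [Extent(0, 0, 2**32 - 1)]
```

    In this model a mount carrying [Extent(1000, 1001, 1)] shows files owned
    by id 1000 as owned by id 1001, while IDENTITY leaves every id unchanged.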

    Based on prior discussions we want to attach the whole user namespace
    and not just a dedicated idmapping struct. This allows us to reuse all
    the helpers that already exist for dealing with idmappings instead of
    introducing a whole new range of helpers. In addition, if we decide in
    the future that we are confident enough to enable unprivileged users
    to setup idmapped mounts the permission checking can take into account
    whether the caller is privileged in the user namespace the mount is
    currently marked with.

    The user namespace the mount will be marked with can be specified by
    passing a file descriptor referring to the user namespace as an
    argument to the new mount_setattr() syscall together with the new
    MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern
    of extensibility.
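
    The extensible-struct pattern means the caller hands the kernel a struct
    together with its size, so new fields can be appended later. A minimal
    sketch of the argument layout in Python follows. The four 64-bit fields
    and the MOUNT_ATTR_IDMAP value are taken from the Linux UAPI header as I
    understand it and should be treated as assumptions; pack_mount_attr() is
    a hypothetical helper, and no syscall is issued.

```python
import struct

# Assumed value of the flag from <linux/mount.h>.
MOUNT_ATTR_IDMAP = 0x00100000


def pack_mount_attr(attr_set: int = 0, attr_clr: int = 0,
                    propagation: int = 0, userns_fd: int = 0) -> bytes:
    """Pack the assumed struct mount_attr: four unsigned 64-bit fields."""
    return struct.pack("=QQQQ", attr_set, attr_clr, propagation, userns_fd)


# Request an idmapped mount using the user namespace behind fd 5
# (the fd number is a placeholder for illustration).
attr = pack_mount_attr(attr_set=MOUNT_ATTR_IDMAP, userns_fd=5)
attr_size = len(attr)  # passed alongside the struct so it can grow later
```

    Because attr_set and attr_clr are separate fields, one call can both add
    and remove mount options.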

    The following conditions must be met in order to create an idmapped
    mount:

    - The caller must currently have the CAP_SYS_ADMIN capability in the
    user namespace the underlying filesystem has been mounted in.

    - The underlying filesystem must support idmapped mounts.

    - The mount must not already be idmapped. This also implies that the
    idmapping of a mount cannot be altered once it has been idmapped.

    - The mount must be a detached/anonymous mount, i.e. it must have
    been created by calling open_tree() with the OPEN_TREE_CLONE flag
    and it must not already have been visible in the filesystem.

    The last two points guarantee easier semantics for userspace and the
    kernel and make the implementation significantly simpler.

    By default vfsmounts are marked with the initial user namespace and no
    behavioral or performance changes are observed.

    The manpage with a detailed description can be found here:

    https://git.kernel.org/brauner/man-pages/c/1d7b902e2875a1ff342e036a9f866a995640aea8

    In order to support idmapped mounts, filesystems need to be changed
    and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The
    patches to convert individual filesystems are not very large or
    complicated overall as can be seen from the included fat, ext4, and
    xfs ports. Patches for other filesystems are actively worked on and
    will be sent out separately. The xfstests suite can be used to verify
    that a port has been done correctly.

    The mount_setattr() syscall is motivated independent of the idmapped
    mounts patches and it's been around since July 2019. One of the most
    valuable features of the new mount api is the ability to perform
    mounts based on file descriptors only.

    Together with the lookup restrictions available in the openat2()
    RESOLVE_* flag namespace which we added in v5.6 this is the first time
    we are close to hardened and race-free (e.g. symlinks) mounting and
    path resolution.

    While userspace has started porting to the new mount api to mount
    proper filesystems and create new bind-mounts it is currently not
    possible to change mount options of an already existing bind mount in
    the new mount api since the mount_setattr() syscall is missing.

    With the addition of the mount_setattr() syscall we remove this last
    restriction and userspace can now fully port to the new mount api,
    covering every use-case the old mount api could. We also add the
    crucial ability to recursively change mount options for a whole mount
    tree, both removing and adding mount options at the same time. This
    syscall has been requested multiple times by various people and
    projects.

    There is a simple tool available at

    https://github.com/brauner/mount-idmapped

    that allows creating idmapped mounts so people can play with this
    patch series. I'll add support for the regular mount binary should you
    decide to pull this in the following weeks.

    Here's an example of a simple idmapped mount of another user's home
    directory:

    u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt

    u1001@f2-vm:/$ ls -al /home/ubuntu/
    total 28
    drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 .
    drwxr-xr-x 4 root root 4096 Oct 28 04:00 ..
    -rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history
    -rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout
    -rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc
    -rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile
    -rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful
    -rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo

    u1001@f2-vm:/$ ls -al /mnt/
    total 28
    drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 .
    drwxr-xr-x 29 root root 4096 Oct 28 22:01 ..
    -rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history
    -rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout
    -rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc
    -rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile
    -rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful
    -rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo

    u1001@f2-vm:/$ touch /mnt/my-file

    u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file

    u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file

    u1001@f2-vm:/$ ls -al /mnt/my-file
    -rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file

    u1001@f2-vm:/$ ls -al /home/ubuntu/my-file
    -rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file

    u1001@f2-vm:/$ getfacl /mnt/my-file
    getfacl: Removing leading '/' from absolute path names
    # file: mnt/my-file
    # owner: u1001
    # group: u1001
    user::rw-
    user:u1001:rwx
    group::rw-
    mask::rwx
    other::r--

    u1001@f2-vm:/$ getfacl /home/ubuntu/my-file
    getfacl: Removing leading '/' from absolute path names
    # file: home/ubuntu/my-file
    # owner: ubuntu
    # group: ubuntu
    user::rw-
    user:ubuntu:rwx
    group::rw-
    mask::rwx
    other::r--"

    * tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits)
    xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl
    xfs: support idmapped mounts
    ext4: support idmapped mounts
    fat: handle idmapped mounts
    tests: add mount_setattr() selftests
    fs: introduce MOUNT_ATTR_IDMAP
    fs: add mount_setattr()
    fs: add attr_flags_to_mnt_flags helper
    fs: split out functions to hold writers
    namespace: only take read lock in do_reconfigure_mnt()
    mount: make {lock,unlock}_mount_hash() static
    namespace: take lock_mount_hash() directly when changing flags
    nfs: do not export idmapped mounts
    overlayfs: do not mount on top of idmapped mounts
    ecryptfs: do not mount on top of idmapped mounts
    ima: handle idmapped mounts
    apparmor: handle idmapped mounts
    fs: make helpers idmap mount aware
    exec: handle idmapped mounts
    would_dump: handle idmapped mounts
    ...

    Linus Torvalds
     

23 Feb, 2021

2 commits

  • …/ebiederm/user-namespace

    Pull user namespace update from Eric Biederman:
    "There are several pieces of active development, but only a single
    change made it through the gauntlet to be ready for v5.12. That change
    is tightening up the semantics of the v3 capabilities xattr. It is
    just short of being a bug-fix/security issue as no user space is known
    to even generate the problem case"

    * 'userns-for-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    capabilities: Don't allow writing ambiguous v3 file capabilities

    Linus Torvalds
     
  • Pull RCU-safe common_lsm_audit() from Al Viro:
    "Make common_lsm_audit() non-blocking and usable from RCU pathwalk
    context.

    We don't really need to grab/drop dentry in there - rcu_read_lock() is
    enough. There's a couple of followups using that to simplify the
    logics in selinux, but those hadn't soaked in -next yet, so they'll
    have to go in next window"

    * 'work.audit' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    make dump_common_audit_data() safe to be called from RCU pathwalk
    new helper: d_find_alias_rcu()

    Linus Torvalds
     

22 Feb, 2021

5 commits

  • …/git/jarkko/linux-tpmdd

    Pull tpm updates from Jarkko Sakkinen:
    "New features:

    - Cr50 I2C TPM driver

    - sysfs exports of PCR registers in TPM 2.0 chips

    Bug fixes:

    - bug fixes for tpm_tis driver, which had a racy wait for hardware
    state change to be ready to send a command to the TPM chip. The bug
    has existed already since 2006, but has only made itself known in
    recent past. This is the same as the "last time" :-)

    - Otherwise there's a bunch of fixes for not as alarming regressions. I
    think the list is about the same as last time, except I added fixes
    for some disjoint bugs in trusted keys that I found some time ago"

    * tag 'tpmdd-next-v5.12-rc1-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
    KEYS: trusted: Reserve TPM for seal and unseal operations
    KEYS: trusted: Fix migratable=1 failing
    KEYS: trusted: Fix incorrect handling of tpm_get_random()
    tpm/ppi: Constify static struct attribute_group
    ABI: add sysfs description for tpm exports of PCR registers
    tpm: add sysfs exports for all banks of PCR registers
    keys: Update comment for restrict_link_by_key_or_keyring_chain
    tpm: Remove tpm_dev_wq_lock
    char: tpm: add i2c driver for cr50
    tpm: Fix fall-through warnings for Clang
    tpm_tis: Clean up locality release
    tpm_tis: Fix check_locality for correct locality acquisition

    Linus Torvalds
     
  • Pull smack updates from Casey Schaufler:
    "Bounds checking for writes to smackfs interfaces"

    * tag 'Smack-for-v5.12' of git://github.com/cschaufler/smack-next:
    smackfs: restrict bytes count in smackfs write functions

    Linus Torvalds
     
  • Pull IMA updates from Mimi Zohar:
    "New is IMA support for measuring kernel critical data, as per usual
    based on policy. The first example measures the in memory SELinux
    policy. The second example measures the kernel version.

    In addition are four bug fixes to address memory leaks and a missing
    'static' function declaration"

    * tag 'integrity-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
    integrity: Make function integrity_add_key() static
    ima: Free IMA measurement buffer after kexec syscall
    ima: Free IMA measurement buffer on error
    IMA: Measure kernel version in early boot
    selinux: include a consumer of the new IMA critical data hook
    IMA: define a builtin critical data measurement policy
    IMA: extend critical data hook to limit the measurement based on a label
    IMA: limit critical data measurement based on a label
    IMA: add policy rule to measure critical data
    IMA: define a hook to measure kernel integrity critical data
    IMA: add support to measure buffer data hash
    IMA: generalize keyring specific measurement constructs
    evm: Fix memleak in init_desc

    Linus Torvalds
     
  • Pull selinux updates from Paul Moore:
    "We've got a good handful of patches for SELinux this time around; with
    everything passing the selinux-testsuite and applying cleanly to your
    tree as of a few minutes ago. The highlights are:

    - Add support for labeling anonymous inodes, and extend this new
    support to userfaultfd.

    - Fallback to SELinux genfs file labeling if the filesystem does not
    have xattr support. This is useful for virtiofs which can vary in
    its xattr support depending on the backing filesystem.

    - Classify and handle MPTCP the same as TCP in SELinux.

    - Ensure consistent behavior between inode_getxattr and
    inode_listsecurity when the SELinux policy is not loaded. This
    fixes a known problem with overlayfs.

    - A couple of patches to prune some unused variables from the SELinux
    code, mark private variables as static, and mark other variables as
    __ro_after_init or __read_mostly"

    * tag 'selinux-pr-20210215' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
    fs: anon_inodes: rephrase to appropriate kernel-doc
    userfaultfd: use secure anon inodes for userfaultfd
    selinux: teach SELinux about anonymous inodes
    fs: add LSM-supporting anon-inode interface
    security: add inode_init_security_anon() LSM hook
    selinux: fall back to SECURITY_FS_USE_GENFS if no xattr support
    selinux: mark selinux_xfrm_refcount as __read_mostly
    selinux: mark some global variables __ro_after_init
    selinux: make selinuxfs_mount static
    selinux: drop the unnecessary aurule_callback variable
    selinux: remove unused global variables
    selinux: fix inconsistency between inode_getxattr and inode_listsecurity
    selinux: handle MPTCP consistently with TCP

    Linus Torvalds
     
  • Pull tomoyo updates from Tetsuo Handa:
    "Detect kernel thread correctly, and ignore harmless data race"

    * tag 'tomoyo-pr-20210215' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1:
    tomoyo: recognize kernel threads correctly
    tomoyo: ignore data race while checking quota

    Linus Torvalds
     

16 Feb, 2021

3 commits

  • When TPM 2.0 trusted keys code was moved to the trusted keys subsystem,
    the operations were unwrapped from tpm_try_get_ops() and tpm_put_ops(),
    which are used to temporarily take ownership of the TPM chip. The
    ownership is only taken inside tpm_send(), but this is not sufficient,
    as in the key load TPM2_CC_LOAD, TPM2_CC_UNSEAL and TPM2_FLUSH_CONTEXT
    need to be done as one single atom.

    Take the TPM chip ownership before sending anything with
    tpm_try_get_ops() and tpm_put_ops(), and use tpm_transmit_cmd() to send
    TPM commands instead of tpm_send(), reverting back to the old behaviour.
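
    The idea of holding the chip across the whole command sequence can be
    modelled as below. This is a user-space sketch in Python, not the driver
    code: FakeTpm, ops() and transmit() are hypothetical stand-ins for the
    chip, the tpm_try_get_ops()/tpm_put_ops() pair and tpm_transmit_cmd().

```python
import threading
from contextlib import contextmanager


class FakeTpm:
    """Hypothetical stand-in for a TPM chip guarded by one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.log = []

    @contextmanager
    def ops(self):
        # Models taking and releasing chip ownership around a sequence.
        with self._lock:
            yield self

    def transmit(self, command: str) -> None:
        # Models sending one command; ownership must already be held.
        if not self._lock.locked():
            raise RuntimeError("chip ownership not held")
        self.log.append(command)

    def is_owned(self) -> bool:
        return self._lock.locked()


def unseal(chip: FakeTpm) -> None:
    """Run load, unseal and flush as one uninterrupted sequence."""
    with chip.ops():
        for cmd in ("TPM2_CC_LOAD", "TPM2_CC_UNSEAL", "TPM2_FLUSH_CONTEXT"):
            chip.transmit(cmd)
```

    Taking ownership once around all three commands, rather than once per
    command, is what keeps another user of the chip from slipping in between
    them.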

    Fixes: 2e19e10131a0 ("KEYS: trusted: Move TPM2 trusted keys code")
    Reported-by: "James E.J. Bottomley"
    Cc: stable@vger.kernel.org
    Cc: David Howells
    Cc: Mimi Zohar
    Cc: Sumit Garg
    Acked-by: Sumit Garg
    Tested-by: Mimi Zohar
    Signed-off-by: Jarkko Sakkinen

    Jarkko Sakkinen
     
  • Consider the following transcript:

    $ keyctl add trusted kmk "new 32 blobauth=helloworld keyhandle=80000000 migratable=1" @u
    add_key: Invalid argument

    The documentation has the following description:

    migratable= 0|1 indicating permission to reseal to new PCR values,
    default 1 (resealing allowed)

    The consequence is that "migratable=1" should succeed. Fix this by
    allowing this condition to pass instead of returning -EINVAL.
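
    The intended option handling can be sketched like this. It is an
    illustration in Python, not the kernel parser: parse_migratable() is a
    hypothetical helper, and it compares the whole value rather than
    mirroring how the kernel inspects the argument.

```python
import errno


def parse_migratable(value: str) -> int:
    """Return 0 or 1 for a valid value, or -EINVAL otherwise.

    Hypothetical helper: both documented values are accepted, so
    "migratable=1" no longer fails.
    """
    if value == "0":
        return 0
    if value == "1":
        return 1
    return -errno.EINVAL
```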

    [*] Documentation/security/keys/trusted-encrypted.rst

    Cc: stable@vger.kernel.org
    Cc: "James E.J. Bottomley"
    Cc: Mimi Zohar
    Cc: David Howells
    Fixes: d00a1c72f7f4 ("keys: add new trusted key-type")
    Signed-off-by: Jarkko Sakkinen

    Jarkko Sakkinen
     
  • When tpm_get_random() was introduced, it defined the following API for the
    return value:

    1. A positive value tells how many bytes of random data was generated.
    2. A negative value on error.

    However, in the call sites the API was used incorrectly, i.e. as if it would
    only return negative values and otherwise zero. Returning the positive read
    counts to the user space does not make any possible sense.

    Fix this by returning -EIO when tpm_get_random() returns a positive value.
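
    A call site that respects this return convention could look like the
    sketch below. It is written in Python for illustration only:
    check_random_result() is a hypothetical helper, and treating a positive
    count that differs from the requested length as -EIO, while mapping a
    full read to 0, is an assumption about how the call sites behave.

```python
import errno


def check_random_result(ret: int, wanted: int) -> int:
    """Normalize a tpm_get_random()-style result for the caller.

    ret follows the API above: negative errno on error, otherwise the
    number of bytes generated. The positive count is never passed on.
    """
    if ret < 0:
        return ret            # propagate the error unchanged
    if ret != wanted:
        return -errno.EIO     # assumed handling of a short read
    return 0                  # success
```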

    Fixes: 41ab999c80f1 ("tpm: Move tpm_get_random api into the TPM device driver")
    Cc: stable@vger.kernel.org
    Cc: Mimi Zohar
    Cc: "James E.J. Bottomley"
    Cc: David Howells
    Cc: Kent Yoder
    Signed-off-by: Jarkko Sakkinen
    Reviewed-by: Mimi Zohar

    Jarkko Sakkinen
     

13 Feb, 2021

1 commit

  • The sparse tool complains as follows:

    security/integrity/digsig.c:146:12: warning:
    symbol 'integrity_add_key' was not declared. Should it be static?

    This function is not used outside of digsig.c, so this
    commit marks it static.

    Reported-by: Hulk Robot
    Fixes: 60740accf784 ("integrity: Load certs to the platform keyring")
    Signed-off-by: Wei Yongjun
    Reviewed-by: Kees Cook
    Reviewed-by: Nayna Jain
    Signed-off-by: Mimi Zohar

    Wei Yongjun
     

11 Feb, 2021

3 commits

  • Mimi Zohar
     
  • IMA allocates kernel virtual memory to carry forward the measurement
    list, from the current kernel to the next kernel on kexec system call,
    in ima_add_kexec_buffer() function. This buffer is not freed before
    completing the kexec system call resulting in memory leak.

    Add ima_buffer field in "struct kimage" to store the virtual address
    of the buffer allocated for the IMA measurement list.
    Free the memory allocated for the IMA measurement list in
    kimage_file_post_load_cleanup() function.

    Signed-off-by: Lakshmi Ramasubramanian
    Suggested-by: Tyler Hicks
    Reviewed-by: Thiago Jung Bauermann
    Reviewed-by: Tyler Hicks
    Fixes: 7b8589cc29e7 ("ima: on soft reboot, save the measurement list")
    Signed-off-by: Mimi Zohar

    Lakshmi Ramasubramanian
     
  • IMA allocates kernel virtual memory to carry forward the measurement
    list, from the current kernel to the next kernel on kexec system call,
    in ima_add_kexec_buffer() function. In error code paths this memory
    is not freed resulting in memory leak.

    Free the memory allocated for the IMA measurement list in
    the error code paths in ima_add_kexec_buffer() function.

    Signed-off-by: Lakshmi Ramasubramanian
    Suggested-by: Tyler Hicks
    Fixes: 7b8589cc29e7 ("ima: on soft reboot, save the measurement list")
    Signed-off-by: Mimi Zohar

    Lakshmi Ramasubramanian
     

03 Feb, 2021

1 commit

  • syzbot found WARNINGs in several smackfs write operations where a
    byte count that exceeds GFP MAX_ORDER is passed to memdup_user_nul().
    Check whether the count is bigger than PAGE_SIZE.

    Per smackfs doc, smk_write_net4addr accepts any label or -CIPSO,
    smk_write_net6addr accepts any label or -DELETE. I couldn't find
    any general rule for other label lengths except SMK_LABELLEN,
    SMK_LONGLABEL, SMK_CIPSOMAX which are documented.

    Let's constrain, in general, smackfs label lengths to PAGE_SIZE,
    although the fuzzer crash comes from a write of length 0x400000 to
    smackfs/netlabel.

    Here is a quick way to reproduce the WARNING:
    python -c "print('A' * 0x400000)" > /sys/fs/smackfs/netlabel
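
    The bounds check itself can be modelled in a few lines. This is a sketch
    in Python, not the smackfs code: smk_write() is a hypothetical handler,
    PAGE_SIZE is assumed to be 4096, and rejecting empty writes with -EINVAL
    is an assumption.

```python
import errno

PAGE_SIZE = 4096  # assumed page size for this sketch


def smk_write(buf: bytes) -> int:
    """Model of a write handler that bounds the byte count up front.

    Returns the number of bytes consumed, or a negative errno.
    """
    count = len(buf)
    if count == 0 or count > PAGE_SIZE:
        return -errno.EINVAL  # reject before any large copy is attempted
    # ... the label would be copied and parsed here ...
    return count
```

    With this check the 0x400000-byte write from the reproducer is rejected
    before any allocation of that size is requested.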

    Reported-by: syzbot+a71a442385a0b2815497@syzkaller.appspotmail.com
    Signed-off-by: Sabyrzhan Tasbolatov
    Signed-off-by: Casey Schaufler

    Sabyrzhan Tasbolatov
     

01 Feb, 2021

2 commits

  • Commit db68ce10c4f0a27c ("new helper: uaccess_kernel()") replaced
    segment_eq(get_fs(), KERNEL_DS) with uaccess_kernel(). But the correct
    method for tomoyo to check whether current is a kernel thread in order
    to assume that kernel threads are privileged for socket operations was
    (current->flags & PF_KTHREAD). Now that uaccess_kernel() became 0 on x86,
    tomoyo has to fix this problem. Do what commit 942cb357ae7d9249 ("Smack:
    Handle io_uring kernel thread privileges") does.

    Signed-off-by: Tetsuo Handa

    Tetsuo Handa
     
  • syzbot is reporting that tomoyo's quota check is racy [1]. But this check
    is tolerant of some degree of inaccuracy. Thus, teach KCSAN to ignore
    this data race.

    [1] https://syzkaller.appspot.com/bug?id=999533deec7ba6337f8aa25d8bd1a4d5f7e50476

    Reported-by: syzbot
    Signed-off-by: Tetsuo Handa

    Tetsuo Handa
     

28 Jan, 2021

1 commit

  • If a capability is stored on disk in v2 format cap_inode_getsecurity() will
    currently return in v2 format unconditionally.

    This is wrong: a v2 cap should be equivalent to a v3 cap with zero rootid,
    and so the same conversions should be performed on it.

    If the rootid cannot be mapped, v3 is returned unconverted. Fix this so
    that both v2 and v3 return -EOVERFLOW if the rootid (or the owner of the fs
    user namespace in case of v2) cannot be mapped into the current user
    namespace.
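
    The unified handling can be pictured with the following model. It is a
    sketch in Python under stated assumptions, not the kernel logic:
    rootid_for_reader() is a hypothetical helper, the reader's user namespace
    is reduced to a plain dict from kernel uids to uids, and the choice of
    output format (v2 or v3) is not modelled.

```python
import errno
from typing import Dict


def rootid_for_reader(version: int, stored_rootid: int,
                      fs_ns_owner: int,
                      reader_map: Dict[int, int]) -> int:
    """Return the rootid as seen by the reader, or -EOVERFLOW.

    version:       2 or 3, the on-disk capability format
    stored_rootid: kernel uid recorded in a v3 capability (unused for v2)
    fs_ns_owner:   owner of the filesystem's user namespace, used for v2
    reader_map:    hypothetical mapping into the reader's namespace
    """
    kroot = fs_ns_owner if version == 2 else stored_rootid
    mapped = reader_map.get(kroot)
    if mapped is None:
        return -errno.EOVERFLOW  # same failure for v2 and v3
    return mapped
```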

    Signed-off-by: Miklos Szeredi
    Acked-by: "Eric W. Biederman"

    Miklos Szeredi
     

27 Jan, 2021

1 commit

  • The integrity of a kernel can be verified by the boot loader on cold
    boot, and during kexec, by the current running kernel, before it is
    loaded. However, it is still possible that the new kernel being
    loaded is older than the current kernel, and/or has known
    vulnerabilities. Therefore, it is imperative that an attestation
    service be able to verify the version of the kernel being loaded on
    the client, from cold boot and subsequent kexec system calls,
    ensuring that only kernels with versions known to be good are loaded.

    Measure the kernel version using ima_measure_critical_data() early on
    in the boot sequence, reducing the chances of known kernel
    vulnerabilities being exploited. With IMA being part of the kernel,
    this overall approach makes the measurement itself more trustworthy.

    To enable measuring the kernel version "ima_policy=critical_data"
    needs to be added to the kernel command line arguments.
    For example,
    BOOT_IMAGE=/boot/vmlinuz-5.11.0-rc3+ root=UUID=fd643309-a5d2-4ed3-b10d-3c579a5fab2f ro nomodeset ima_policy=critical_data

    If runtime measurement of the kernel version is ever needed, the
    following should be added to /etc/ima/ima-policy:

    measure func=CRITICAL_DATA label=kernel_info

    To extract the measured data after boot, the following command can be used:

    grep -m 1 "kernel_version" \
    /sys/kernel/security/integrity/ima/ascii_runtime_measurements

    Sample output from the command above:

    10 a8297d408e9d5155728b619761d0dd4cedf5ef5f ima-buf
    sha256:5660e19945be0119bc19cbbf8d9c33a09935ab5d30dad48aa11f879c67d70988
    kernel_version 352e31312e302d7263332d31363138372d676564623634666537383234342d6469727479

    The above hex-ascii string corresponds to the kernel version
    (e.g. xxd -r -p):

    5.11.0-rc3-16187-gedb64fe78244-dirty
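
    The same decoding can be done without xxd. A minimal sketch in Python,
    where decode_ima_buf() is a hypothetical helper name and the buffer is
    assumed to hold plain ASCII:

```python
def decode_ima_buf(hex_data: str) -> str:
    """Decode the hex-ascii field of an ima-buf measurement entry."""
    return bytes.fromhex(hex_data).decode("ascii")


SAMPLE = ("352e31312e302d7263332d31363138372d"
          "676564623634666537383234342d6469727479")
version = decode_ima_buf(SAMPLE)
# version == "5.11.0-rc3-16187-gedb64fe78244-dirty"
```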

    Signed-off-by: Raphael Gianotti
    Signed-off-by: Mimi Zohar

    Raphael Gianotti
     

24 Jan, 2021

8 commits

  • IMA does sometimes access the inode's i_uid and compares it against the
    rules' fowner. Enable IMA to handle idmapped mounts by passing down the
    mount's user namespace. We simply make use of the helpers we introduced
    before. If the initial user namespace is passed nothing changes so
    non-idmapped mounts will see identical behavior as before.

    Link: https://lore.kernel.org/r/20210121131959.646623-27-christian.brauner@ubuntu.com
    Signed-off-by: Christian Brauner

    Christian Brauner
     
  • The i_uid and i_gid are mostly used when logging for AppArmor. This is
    broken in a bunch of places where the global root id is reported instead
    of the i_uid or i_gid of the file. Nonetheless, be kind and log the
    mapped inode if we're coming from an idmapped mount. If the initial user
    namespace is passed nothing changes so non-idmapped mounts will see
    identical behavior as before.

    Link: https://lore.kernel.org/r/20210121131959.646623-26-christian.brauner@ubuntu.com
    Signed-off-by: Christian Brauner

    Christian Brauner
     
  • Extend some inode methods with an additional user namespace argument. A
    filesystem that is aware of idmapped mounts will receive the user
    namespace the mount has been marked with. This can be used for
    additional permission checking and also to enable filesystems to
    translate between uids and gids if they need to. We have implemented all
    relevant helpers in earlier patches.

    As requested we simply extend the existing inode methods instead of
    introducing new ones. This is a little more code churn but it's mostly
    mechanical and doesn't leave us with additional inode methods.

    Link: https://lore.kernel.org/r/20210121131959.646623-25-christian.brauner@ubuntu.com
    Cc: Christoph Hellwig
    Cc: David Howells
    Cc: Al Viro
    Cc: linux-fsdevel@vger.kernel.org
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Christian Brauner

    Christian Brauner
     
  • When interacting with user namespace and non-user namespace aware
    filesystem capabilities the vfs will perform various security checks to
    determine whether or not the filesystem capabilities can be used by the
    caller, whether they need to be removed and so on. The main
    infrastructure for this resides in the capability codepaths but they are
    called through the LSM security infrastructure even though they are not
    technically an LSM or optional. This extends the existing security hooks
    security_inode_removexattr(), security_inode_killpriv(),
    security_inode_getsecurity() to pass down the mount's user namespace and
    makes them aware of idmapped mounts.

    In order to actually get filesystem capabilities from disk the
    capability infrastructure exposes the get_vfs_caps_from_disk() helper.
    For user namespace aware filesystem capabilities a root uid is stored
    alongside the capabilities.

    In order to determine whether the caller can make use of the filesystem
    capability or whether it needs to be ignored it is translated according
    to the superblock's user namespace. If it can be translated to uid 0
    according to that id mapping the caller can use the filesystem
    capabilities stored on disk. If we are accessing the inode that holds
    the filesystem capabilities through an idmapped mount we map the root
    uid according to the mount's user namespace. Afterwards the checks are
    identical to non-idmapped mounts: reading filesystem caps from disk
    enforces that the root uid associated with the filesystem capability
    must have a mapping in the superblock's user namespace and that the
    caller is either in the same user namespace or is a descendant of the
    superblock's user namespace. For filesystems that are mountable inside
    user namespace the caller can just mount the filesystem and won't
    usually need to idmap it. If they do want to idmap it they can create an
    idmapped mount and mark it with a user namespace they created and which
    is thus a descendant of s_user_ns. For filesystems that are not
    mountable inside user namespaces the descendant rule is trivially true
    because the s_user_ns will be the initial user namespace.

    If the initial user namespace is passed nothing changes so non-idmapped
    mounts will see identical behavior as before.

    Link: https://lore.kernel.org/r/20210121131959.646623-11-christian.brauner@ubuntu.com
    Cc: Christoph Hellwig
    Cc: David Howells
    Cc: Al Viro
    Cc: linux-fsdevel@vger.kernel.org
    Reviewed-by: Christoph Hellwig
    Acked-by: James Morris
    Signed-off-by: Christian Brauner

    Christian Brauner
     
  • When interacting with extended attributes the vfs verifies that the
    caller is privileged over the inode with which the extended attribute is
    associated. For posix access and posix default extended attributes a uid
    or gid can be stored on-disk. Let the functions handle posix extended
    attributes on idmapped mounts. If the inode is accessed through an
    idmapped mount we need to map it according to the mount's user
    namespace. Afterwards the checks are identical to non-idmapped mounts.
    This has no effect for e.g. security xattrs since they don't store uids
    or gids and don't perform permission checks on them like posix acls do.

    Link: https://lore.kernel.org/r/20210121131959.646623-10-christian.brauner@ubuntu.com
    Cc: Christoph Hellwig
    Cc: David Howells
    Cc: Al Viro
    Cc: linux-fsdevel@vger.kernel.org
    Reviewed-by: Christoph Hellwig
    Reviewed-by: James Morris
    Signed-off-by: Tycho Andersen
    Signed-off-by: Christian Brauner

    Tycho Andersen
     
  • The posix acl permission checking helpers determine whether a caller is
    privileged over an inode according to the acls associated with the
    inode. Add helpers that make it possible to handle acls on idmapped
    mounts.

    The vfs and the filesystems targeted by this first iteration make use of
    posix_acl_fix_xattr_from_user() and posix_acl_fix_xattr_to_user() to
    translate basic posix access and default permissions such as the
    ACL_USER and ACL_GROUP type according to the initial user namespace (or
    the superblock's user namespace) to and from the caller's current user
    namespace. Adapt these two helpers to handle idmapped mounts whereby we
    either map from or into the mount's user namespace depending on in which
    direction we're translating.
    Similarly, cap_convert_nscap() is used by the vfs to translate user
    namespace and non-user namespace aware filesystem capabilities from the
    superblock's user namespace to the caller's user namespace. Enable it to
    handle idmapped mounts by accounting for the mount's user namespace.

    In addition the filesystems targeted in the first iteration of this patch
    series make use of the posix_acl_chmod() and posix_acl_update_mode()
    helpers. Both helpers perform permission checks on the target inode. Let
    them handle idmapped mounts. These two helpers are called when posix
    acls are set by the respective filesystems. To handle this case we extend
    the ->set() method to take an additional user namespace argument to pass
    the mount's user namespace down.

    Link: https://lore.kernel.org/r/20210121131959.646623-9-christian.brauner@ubuntu.com
    Cc: Christoph Hellwig
    Cc: David Howells
    Cc: Al Viro
    Cc: linux-fsdevel@vger.kernel.org
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Christian Brauner

    Christian Brauner
     
  • The inode_owner_or_capable() helper determines whether the caller is the
    owner of the inode or is capable with respect to that inode. Allow it to
    handle idmapped mounts. If the inode is accessed through an idmapped
    mount, map it according to the mount's user namespace. Afterwards the checks
    are identical to non-idmapped mounts. If the initial user namespace is
    passed nothing changes so non-idmapped mounts will see identical
    behavior as before.

    Similarly, allow the inode_init_owner() helper to handle idmapped
    mounts. It initializes a new inode on idmapped mounts by mapping the
    fsuid and fsgid of the caller from the mount's user namespace. If the
    initial user namespace is passed nothing changes so non-idmapped mounts
    will see identical behavior as before.
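
    The two mapping directions described above can be sketched in a small
    userspace model. This is not the kernel code: struct idmap_model, the
    single-extent mapping and every function name below are hypothetical, and
    the direction used by each helper is an assumption based on the wording
    of this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_INVALID_ID UINT32_MAX

/* Hypothetical stand-in for a user namespace with one mapping extent:
 * ids [ns_first, ns_first + count) inside the namespace correspond to
 * kernel-side ids [kern_first, kern_first + count). */
struct idmap_model {
    uint32_t ns_first;
    uint32_t kern_first;
    uint32_t count;
};

/* Identity mapping: models passing the initial user namespace. */
static const struct idmap_model init_ns_model = { 0, 0, UINT32_MAX };

/* Namespace id -> kernel-side id. */
static uint32_t ns_to_kernel(const struct idmap_model *m, uint32_t id)
{
    assert(m != NULL);
    if (id < m->ns_first || id - m->ns_first >= m->count)
        return MODEL_INVALID_ID;
    return m->kern_first + (id - m->ns_first);
}

/* Kernel-side id -> namespace id. */
static uint32_t kernel_to_ns(const struct idmap_model *m, uint32_t id)
{
    assert(m != NULL);
    if (id < m->kern_first || id - m->kern_first >= m->count)
        return MODEL_INVALID_ID;
    return m->ns_first + (id - m->kern_first);
}

/* Ownership half of the check only: map the inode's uid through the
 * mount's mapping, then compare it with the caller's fsuid. The
 * capability half is left out of this model. */
static bool owner_check_model(const struct idmap_model *mnt,
                              uint32_t caller_fsuid, uint32_t inode_uid)
{
    uint32_t mapped = ns_to_kernel(mnt, inode_uid);

    return mapped != MODEL_INVALID_ID && mapped == caller_fsuid;
}

/* Owner stored on a newly created inode: the caller's fsuid mapped
 * back through the mount's mapping. */
static uint32_t init_owner_model(const struct idmap_model *mnt,
                                 uint32_t caller_fsuid)
{
    return kernel_to_ns(mnt, caller_fsuid);
}
```

    With a mount mapping of { 0, 100000, 65536 }, a caller with fsuid 101000
    passes the ownership check for an inode whose stored uid is 1000, and a
    file it creates is stored with uid 1000. With init_ns_model both helpers
    reduce to a plain comparison and a plain copy.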

    Link: https://lore.kernel.org/r/20210121131959.646623-7-christian.brauner@ubuntu.com
    Cc: Christoph Hellwig
    Cc: David Howells
    Cc: Al Viro
    Cc: linux-fsdevel@vger.kernel.org
    Reviewed-by: Christoph Hellwig
    Reviewed-by: James Morris
    Signed-off-by: Christian Brauner

    Christian Brauner
     
  • In order to determine whether a caller holds privilege over a given
    inode the capability framework exposes the two helpers
    privileged_wrt_inode_uidgid() and capable_wrt_inode_uidgid(). The former
    verifies that the inode has a mapping in the caller's user namespace and
    the latter additionally verifies that the caller has the requested
    capability in their current user namespace.
    If the inode is accessed through an idmapped mount, map it into the
    mount's user namespace. Afterwards the checks are identical to
    non-idmapped inodes. If the initial user namespace is passed all
    operations are a nop so non-idmapped mounts will not see a change in
    behavior.

    Link: https://lore.kernel.org/r/20210121131959.646623-5-christian.brauner@ubuntu.com
    Cc: Christoph Hellwig
    Cc: David Howells
    Cc: Al Viro
    Cc: linux-fsdevel@vger.kernel.org
    Reviewed-by: Christoph Hellwig
    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: Christian Brauner

    Christian Brauner
     

22 Jan, 2021

7 commits

  • KEY_FLAG_KEEP is not meant to be passed to keyring_alloc() or key_alloc(),
    as these only take KEY_ALLOC_* flags. KEY_FLAG_KEEP has the same value as
    KEY_ALLOC_BYPASS_RESTRICTION, but fortunately only key_create_or_update()
    uses it. LSMs using the key_alloc hook don't check that flag.

    KEY_FLAG_KEEP is then ignored but fortunately (again) the root user cannot
    write to the blacklist keyring, so it is not possible to remove a key/hash
    from it.

    Fix this by adding a KEY_ALLOC_SET_KEEP flag that tells key_alloc() to set
    KEY_FLAG_KEEP on the new key. blacklist_init() can then, correctly, pass
    this to keyring_alloc().

    We can also use this in ima_mok_init() rather than setting the flag
    manually.

    Note that this doesn't fix an observable bug with the current
    implementation but it is required to allow addition of new hashes to the
    blacklist in the future without making it possible for them to be removed.
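
    The type confusion can be illustrated with a small userspace model. The
    MODEL_* names, the struct and the numeric values are hypothetical; the
    values are only chosen so that the per-key flag and the allocation flag
    collide as described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-key state flags are bit numbers (illustrative value). */
enum { MODEL_KEY_FLAG_KEEP = 8 };

/* Allocation flags are bit masks (illustrative values). */
#define MODEL_KEY_ALLOC_BYPASS_RESTRICTION 0x0008u
#define MODEL_KEY_ALLOC_SET_KEEP           0x0080u

/* The collision: a bit number that happens to equal an unrelated mask. */
static_assert(MODEL_KEY_FLAG_KEEP == MODEL_KEY_ALLOC_BYPASS_RESTRICTION,
              "flag number and alloc mask share a value");

struct model_key {
    unsigned long flags; /* bitmap indexed by MODEL_KEY_FLAG_* */
};

/* What an allocation-flag consumer would see in the passed value. */
static bool model_reads_as_bypass(unsigned int alloc_flags)
{
    return (alloc_flags & MODEL_KEY_ALLOC_BYPASS_RESTRICTION) != 0;
}

/* Allocation only understands MODEL_KEY_ALLOC_* masks. */
static struct model_key model_key_alloc(unsigned int alloc_flags)
{
    struct model_key key = { 0 };

    if (alloc_flags & MODEL_KEY_ALLOC_SET_KEEP)
        key.flags |= 1UL << MODEL_KEY_FLAG_KEEP;
    return key;
}

static bool model_key_is_kept(const struct model_key *key)
{
    return ((key->flags >> MODEL_KEY_FLAG_KEEP) & 1UL) != 0;
}
```

    Passing MODEL_KEY_FLAG_KEEP to model_key_alloc() leaves the keep bit
    clear and reads as a bypass request, while MODEL_KEY_ALLOC_SET_KEEP sets
    the keep bit on the new key.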

    Fixes: 734114f8782f ("KEYS: Add a system blacklist keyring")
    Reported-by: Mickaël Salaün
    Signed-off-by: David Howells
    cc: Mickaël Salaün
    cc: Mimi Zohar
    Cc: David Woodhouse

    David Howells
     
  • Reviewing use of memset in keyctl_pkey.c

    keyctl_pkey_params_get has prologue code to set params up:

    memset(params, 0, sizeof(*params));
    params->encoding = "raw";

    keyctl_pkey_query has the same prologue
    and calls keyctl_pkey_params_get.

    So remove the prologue.
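
    A userspace sketch of why the duplicate is redundant, assuming it is the
    caller's copy that goes away. struct model_params and both functions are
    hypothetical stand-ins, not the code in keyctl_pkey.c.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical parameter block; only the fields needed for the demo. */
struct model_params {
    const char *encoding;
    int key_id;
    unsigned int in_len;
};

/* Callee: always resets the structure before filling it in. */
static int model_params_get(int key_id, struct model_params *params)
{
    memset(params, 0, sizeof(*params));
    params->encoding = "raw";
    params->key_id = key_id;
    return 0;
}

/* Caller without its own prologue: the callee's reset already
 * guarantees a zeroed structure with the default encoding. */
static int model_query(int key_id, struct model_params *params)
{
    return model_params_get(key_id, params);
}
```

    Even if the structure holds stale data on entry, model_query() returns
    it fully initialised, so a second memset in the caller adds nothing.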

    Signed-off-by: Tom Rix
    Signed-off-by: David Howells
    Reviewed-by: Ben Boeckel

    Tom Rix
     
  • Drop repeated words in comments.
    {to, will, the}

    Signed-off-by: Randy Dunlap
    Signed-off-by: David Howells
    Reviewed-by: Jarkko Sakkinen
    Reviewed-by: Ben Boeckel
    Cc: keyrings@vger.kernel.org
    Cc: James Morris
    Cc: "Serge E. Hallyn"
    Cc: linux-security-module@vger.kernel.org

    Randy Dunlap
     
  • Use kvfree_sensitive() instead of open-coding it.
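
    For readers unfamiliar with the pattern, a userspace analogue of "wipe,
    then free" might look as follows. model_wipe() and model_free_sensitive()
    are hypothetical names, and free() stands in for the kernel allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Zero a buffer through a volatile pointer so the stores are not
 * optimised away as dead writes. */
static void model_wipe(void *ptr, size_t len)
{
    volatile unsigned char *p = ptr;

    while (len--)
        *p++ = 0;
}

/* Wipe the contents, then release the memory. NULL is a no-op. */
static void model_free_sensitive(void *ptr, size_t len)
{
    if (!ptr)
        return;
    model_wipe(ptr, len);
    free(ptr);
}
```

    Folding both steps into one helper means a call site cannot free the
    buffer and forget to clear it first.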

    Signed-off-by: Denis Efremov
    Signed-off-by: David Howells
    Reviewed-by: Jarkko Sakkinen
    Reviewed-by: Ben Boeckel

    Denis Efremov
     
  • The merged API doesn't use a watch_queue device, but instead relies on
    pipes, so let the documentation reflect that.

    Fixes: f7e47677e39a ("watch_queue: Add a key/keyring notification facility")
    Signed-off-by: Gabriel Krisman Bertazi
    Signed-off-by: David Howells
    Acked-by: Jarkko Sakkinen
    Reviewed-by: Ben Boeckel

    Gabriel Krisman Bertazi
     
  • When the semantics of the ->read() handlers were changed such that "buffer"
    is a kernel pointer, some __user annotations survived.
    Since they're wrong now, get rid of them.

    Fixes: d3ec10aa9581 ("KEYS: Don't write out to userspace while holding key semaphore")
    Signed-off-by: Jann Horn
    Signed-off-by: David Howells
    Reviewed-by: Ben Boeckel

    Jann Horn
     
  • In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
    by explicitly adding a break statement instead of letting the code fall
    through to the next case.
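
    A minimal illustration of the pattern, using a hypothetical enum and
    function rather than the code touched by this patch:

```c
#include <assert.h>

enum model_op { MODEL_OP_READ, MODEL_OP_WRITE, MODEL_OP_OTHER };

static int model_dispatch(enum model_op op)
{
    int result = 0;

    switch (op) {
    case MODEL_OP_READ:
        result = 1;
        break;
    case MODEL_OP_WRITE:
        result = 2;
        break; /* explicit: do not rely on falling into the next label */
    default:
        break;
    }
    return result;
}
```

    The behaviour is the same with or without the marked break, because the
    following label only breaks as well. Writing it out states the intent
    and removes the implicit fall-through from the code.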

    Link: https://github.com/KSPP/linux/issues/115
    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: David Howells
    Reviewed-by: Jarkko Sakkinen
    Reviewed-by: Ben Boeckel

    Gustavo A. R. Silva
     

17 Jan, 2021

2 commits


15 Jan, 2021

2 commits

  • SELinux stores the active policy in memory, so changes to this data
    at runtime affect the security guarantees provided by SELinux.
    Measuring the in-memory SELinux policy through the IMA subsystem
    provides a secure way for the attestation service to remotely validate
    the policy contents at runtime.

    Measure the hash of the loaded policy by calling the IMA hook
    ima_measure_critical_data(). Since the size of the loaded policy
    can be large (several MB), measure the hash of the policy instead of
    the entire policy to avoid bloating the IMA log entry.

    To enable SELinux data measurement, the following steps are required:

    1. Add "ima_policy=critical_data" to the kernel command line arguments
       to enable measuring SELinux data at boot time.
       For example,
       BOOT_IMAGE=/boot/vmlinuz-5.10.0-rc1+ root=UUID=fd643309-a5d2-4ed3-b10d-3c579a5fab2f ro nomodeset security=selinux ima_policy=critical_data

    2. Add the following rule to /etc/ima/ima-policy
       measure func=CRITICAL_DATA label=selinux

    Sample measurement of the hash of SELinux policy:

    To verify the measured data with the current SELinux policy run
    the following commands and verify the output hash values match.

    sha256sum /sys/fs/selinux/policy | cut -d' ' -f 1

    grep "selinux-policy-hash" /sys/kernel/security/integrity/ima/ascii_runtime_measurements | tail -1 | cut -d' ' -f 6

    Note that the actual verification of SELinux policy would require loading
    the expected policy into an identical kernel on a pristine/known-safe
    system and running sha256sum /sys/fs/selinux/policy there to get
    the expected hash.

    Signed-off-by: Lakshmi Ramasubramanian
    Suggested-by: Stephen Smalley
    Acked-by: Paul Moore
    Reviewed-by: Tyler Hicks
    Signed-off-by: Mimi Zohar

    Lakshmi Ramasubramanian
     
  • Define a new critical data builtin policy to allow measuring
    early kernel integrity critical data before a custom IMA policy
    is loaded.

    Update the documentation on kernel parameters to document
    the new critical data builtin policy.

    Signed-off-by: Lakshmi Ramasubramanian
    Reviewed-by: Tyler Hicks
    Signed-off-by: Mimi Zohar

    Lakshmi Ramasubramanian