01 Apr, 2020

2 commits

  • Pull networking updates from David Miller:
    "Highlights:

    1) Fix the iwlwifi regression, from Johannes Berg.

    2) Support BSS coloring and 802.11 encapsulation offloading in
    hardware, from John Crispin.

    3) Fix some potential Spectre issues in qtnfmac, from Sergey
    Matyukevich.

    4) Add TTL decrement action to openvswitch, from Matteo Croce.

    5) Allow parallelization through flow_action setup by not taking the
    RTNL mutex, from Vlad Buslov.

    6) A lot of zero-length array to flexible-array conversions, from
    Gustavo A. R. Silva.

    7) Align XDP statistics names across several drivers for consistency,
    from Lorenzo Bianconi.

    8) Add various pieces of infrastructure for offloading conntrack, and
    make use of it in mlx5 driver, from Paul Blakey.

    9) Allow using listening sockets in BPF sockmap, from Jakub Sitnicki.

    10) Lots of parallelization improvements during configuration changes
    in mlxsw driver, from Ido Schimmel.

    11) Add support to devlink for generic packet traps, which report
    packets dropped during ACL processing. And use them in mlxsw
    driver. From Jiri Pirko.

    12) Support bcmgenet on ACPI, from Jeremy Linton.

    13) Make BPF compatible with RT, from Thomas Gleixner, Alexei
    Starovoitov, and yours truly.

    14) Support XDP meta-data in virtio_net, from Yuya Kusakabe.

    15) Fix sysfs permissions when network devices change namespaces, from
    Christian Brauner.

    16) Add a flags element to ethtool_ops so that drivers can more simply
    indicate which coalescing parameters they actually support, and
    therefore the generic layer can validate the user's ethtool
    request. Use this in all drivers, from Jakub Kicinski.

    17) Offload FIFO qdisc in mlxsw, from Petr Machata.

    18) Support UDP sockets in sockmap, from Lorenz Bauer.

    19) Fix stretch ACK bugs in several TCP congestion control modules,
    from Pengcheng Yang.

    20) Support virtual functions in octeontx2 driver, from Tomasz
    Duszynski.

    21) Add region operations for devlink and use it in ice driver to dump
    NVM contents, from Jacob Keller.

    22) Add support for hw offload of MACSEC, from Antoine Tenart.

    23) Add support for BPF programs that can be attached to LSM hooks,
    from KP Singh.

    24) Support for multiple paths, path managers, and counters in MPTCP.
    From Peter Krystad, Paolo Abeni, Florian Westphal, Davide Caratti,
    and others.

    25) More progress on adding the netlink interface to ethtool, from
    Michal Kubecek"

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2121 commits)
    net: ipv6: rpl_iptunnel: Fix potential memory leak in rpl_do_srh_inline
    cxgb4/chcr: nic-tls stats in ethtool
    net: dsa: fix oops while probing Marvell DSA switches
    net/bpfilter: remove superfluous testing message
    net: macb: Fix handling of fixed-link node
    net: dsa: ksz: Select KSZ protocol tag
    netdevsim: dev: Fix memory leak in nsim_dev_take_snapshot_write
    net: stmmac: add EHL 2.5Gbps PCI info and PCI ID
    net: stmmac: add EHL PSE0 & PSE1 1Gbps PCI info and PCI ID
    net: stmmac: create dwmac-intel.c to contain all Intel platform
    net: dsa: bcm_sf2: Support specifying VLAN tag egress rule
    net: dsa: bcm_sf2: Add support for matching VLAN TCI
    net: dsa: bcm_sf2: Move writing of CFP_DATA(5) into slicing functions
    net: dsa: bcm_sf2: Check earlier for FLOW_EXT and FLOW_MAC_EXT
    net: dsa: bcm_sf2: Disable learning for ASP port
    net: dsa: b53: Deny enslaving port 7 for 7278 into a bridge
    net: dsa: b53: Prevent tagged VLAN on port 7 for 7278
    net: dsa: b53: Restore VLAN entries upon (re)configuration
    net: dsa: bcm_sf2: Fix overflow checks
    hv_netvsc: Remove unnecessary round_up for recv_completion_cnt
    ...

    Linus Torvalds
     
  • Pull SELinux updates from Paul Moore:
    "We've got twenty SELinux patches for the v5.7 merge window; the
    highlights are below:

    - Deprecate setting /sys/fs/selinux/checkreqprot to 1.

    This flag was originally created to deal with legacy userspace and
    the READ_IMPLIES_EXEC personality flag. We changed the default from
    1 to 0 back in Linux v4.4 and now we are taking the next step of
    deprecating it; at some point in the future we will take the final
    step of rejecting 1.

    - Allow kernfs symlinks to inherit the SELinux label of the parent
    directory. In order to preserve backwards compatibility this is
    protected by the genfs_seclabel_symlinks SELinux policy capability.

    - Optimize how we store filename transitions in the kernel, resulting
    in some significant improvements to policy load times.

    - Do a better job calculating our internal hash table sizes which
    resulted in additional policy load improvements and likely general
    SELinux performance improvements as well.

    - Remove the unused initial SIDs (labels) and improve how we handle
    initial SIDs.

    - Enable per-file labeling for the bpf filesystem.

    - Ensure that we properly label NFS v4.2 filesystems to avoid a
    temporary unlabeled condition.

    - Add some missing XFS quota command types to the SELinux quota
    access controls.

    - Fix a problem where we were not updating the seq_file position
    index correctly in selinuxfs.

    - We consolidate some duplicated code into helper functions.

    - A number of list to array conversions.

    - Update Stephen Smalley's email address in MAINTAINERS"

    * tag 'selinux-pr-20200330' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
    selinux: clean up indentation issue with assignment statement
    NFS: Ensure security label is set for root inode
    MAINTAINERS: Update my email address
    selinux: avtab_init() and cond_policydb_init() return void
    selinux: clean up error path in policydb_init()
    selinux: remove unused initial SIDs and improve handling
    selinux: reduce the use of hard-coded hash sizes
    selinux: Add xfs quota command types
    selinux: optimize storage of filename transitions
    selinux: factor out loop body from filename_trans_read()
    security: selinux: allow per-file labeling for bpffs
    selinux: generalize evaluate_cond_node()
    selinux: convert cond_expr to array
    selinux: convert cond_av_list to array
    selinux: convert cond_list to array
    selinux: sel_avc_get_stat_idx should increase position index
    selinux: allow kernfs symlinks to inherit parent directory context
    selinux: simplify evaluate_cond_node()
    Documentation,selinux: deprecate setting checkreqprot to 1
    selinux: move status variables out of selinux_ss

    Linus Torvalds
     

31 Mar, 2020

2 commits

  • The assignment of e->type_names is indented one level too deep;
    clean this up by removing the extraneous tab.

    Signed-off-by: Colin Ian King
    Signed-off-by: Paul Moore

    Colin Ian King
     
  • Pull EFI updates from Ingo Molnar:
    "The EFI changes in this cycle are much larger than usual, for two
    (positive) reasons:

    - The GRUB project is showing signs of life again, resulting in the
    introduction of the generic Linux/UEFI boot protocol, instead of
    x86 specific hacks which are increasingly difficult to maintain.
    There's hope that all future extensions will now go through that
    boot protocol.

    - Preparatory work for RISC-V EFI support.

    The main changes are:

    - Boot time GDT handling changes

    - Simplify handling of EFI properties table on arm64

    - Generic EFI stub cleanups, to improve command line handling, file
    I/O, memory allocation, etc.

    - Introduce a generic initrd loading method based on calling back
    into the firmware, instead of relying on the x86 EFI handover
    protocol or device tree.

    - Introduce a mixed mode boot method that does not rely on the x86
    EFI handover protocol either, and could potentially be adopted by
    other architectures (if another one ever surfaces where one
    execution mode is a superset of another)

    - Clean up the contents of 'struct efi', and move out everything that
    doesn't need to be stored there.

    - Incorporate support for UEFI spec v2.8A changes that permit
    firmware implementations to return EFI_UNSUPPORTED from UEFI
    runtime services at OS runtime, and expose a mask of which ones are
    supported or unsupported via a configuration table.

    - Partial fix for the lack of by-VA cache maintenance in the
    decompressor on 32-bit ARM.

    - Changes to load device firmware from EFI boot service memory
    regions

    - Various documentation updates and minor code cleanups and fixes"

    * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
    efi/libstub/arm: Fix spurious message that an initrd was loaded
    efi/libstub/arm64: Avoid image_base value from efi_loaded_image
    partitions/efi: Fix partition name parsing in GUID partition entry
    efi/x86: Fix cast of image argument
    efi/libstub/x86: Use ULONG_MAX as upper bound for all allocations
    efi: Fix a mistype in comments mentioning efivar_entry_iter_begin()
    efi/libstub: Avoid linking libstub/lib-ksyms.o into vmlinux
    efi/x86: Preserve %ebx correctly in efi_set_virtual_address_map()
    efi/x86: Ignore the memory attributes table on i386
    efi/x86: Don't relocate the kernel unless necessary
    efi/x86: Remove extra headroom for setup block
    efi/x86: Add kernel preferred address to PE header
    efi/x86: Decompress at start of PE image load address
    x86/boot/compressed/32: Save the output address instead of recalculating it
    efi/libstub/x86: Deal with exit() boot service returning
    x86/boot: Use unsigned comparison for addresses
    efi/x86: Avoid using code32_start
    efi/x86: Make efi32_pe_entry() more readable
    efi/x86: Respect 32-bit ABI in efi32_pe_entry()
    efi/x86: Annotate the LOADED_IMAGE_PROTOCOL_GUID with SYM_DATA
    ...

    Linus Torvalds
     

30 Mar, 2020

2 commits

  • * The hooks are initialized using the definitions in
    include/linux/lsm_hook_defs.h.
    * The LSM can be enabled / disabled with CONFIG_BPF_LSM.

    Signed-off-by: KP Singh
    Signed-off-by: Daniel Borkmann
    Reviewed-by: Brendan Jackman
    Reviewed-by: Florent Revest
    Acked-by: Kees Cook
    Acked-by: James Morris
    Link: https://lore.kernel.org/bpf/20200329004356.27286-6-kpsingh@chromium.org

    KP Singh
     
  • The information about the different types of LSM hooks is scattered
    in two locations i.e. union security_list_options and
    struct security_hook_heads. Rather than duplicating this information
    even further for BPF_PROG_TYPE_LSM, define all the hooks with the
    LSM_HOOK macro in lsm_hook_defs.h which is then used to generate all
    the data structures required by the LSM framework.

    The LSM hooks are defined as:

    LSM_HOOK(<return_type>, <default_value>, <hook_name>, args...)

    with <default_value> accessible in security.c as:

    LSM_RET_DEFAULT(<hook_name>)

    Signed-off-by: KP Singh
    Signed-off-by: Daniel Borkmann
    Reviewed-by: Brendan Jackman
    Reviewed-by: Florent Revest
    Reviewed-by: Kees Cook
    Reviewed-by: Casey Schaufler
    Acked-by: James Morris
    Link: https://lore.kernel.org/bpf/20200329004356.27286-3-kpsingh@chromium.org

    KP Singh
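    The LSM_HOOK scheme in the 30 Mar commits is the X-macro technique: one
    list of hook definitions is expanded several times under different
    definitions of the macro, so every generated structure stays in sync.
    The sketch below is a minimal userspace illustration of that idea, not
    the kernel code; the DEMO_* macros and the two hooks (demo_file_open,
    demo_task_kill) are hypothetical stand-ins for lsm_hook_defs.h,
    union security_list_options and LSM_RET_DEFAULT().

```c
#include <stddef.h>

/* Single list of hook definitions in the style of lsm_hook_defs.h:
 * HOOK(<return_type>, <default_value>, <hook_name>, args...).
 * Both hooks are hypothetical. */
#define DEMO_HOOK_DEFS(HOOK)                          \
	HOOK(int, 0, demo_file_open, int fd)          \
	HOOK(int, 0, demo_task_kill, int pid, int sig)

/* Expansion 1: one function-pointer member per hook
 * (the role played by union security_list_options). */
union demo_list_options {
#define HOOK(RET, DEFAULT, NAME, ...) RET (*NAME)(__VA_ARGS__);
	DEMO_HOOK_DEFS(HOOK)
#undef HOOK
};

/* Expansion 2: a named constant holding each hook's default return
 * value, reachable through DEMO_RET_DEFAULT(<hook_name>). */
#define DEMO_RET_DEFAULT(NAME) (NAME##_default)
#define HOOK(RET, DEFAULT, NAME, ...) \
	static const RET DEMO_RET_DEFAULT(NAME) = (DEFAULT);
DEMO_HOOK_DEFS(HOOK)
#undef HOOK

/* Expansion 3: an enum that numbers and counts the hooks. */
enum demo_hook_id {
#define HOOK(RET, DEFAULT, NAME, ...) DEMO_HOOK_##NAME,
	DEMO_HOOK_DEFS(HOOK)
#undef HOOK
	DEMO_HOOK_COUNT
};
```

    With this layout, adding a hook means adding one line to the list; the
    union, the default-value constants and the count all pick it up on the
    next build.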
     

16 Mar, 2020

1 commit

  • Currently, when we add a new user key, the call trace is as follows:

    add_key()
    key_create_or_update()
    key_alloc()
    __key_instantiate_and_link
    generic_key_instantiate
    key_payload_reserve
    ......

    Since commit a08bf91ce28e ("KEYS: allow reaching the keys quotas exactly"),
    we can reach max bytes/keys in key_alloc, but we forgot to remove this
    limit when we reserve space for the payload in key_payload_reserve. So we
    can only reach max keys but not max bytes when there is a delta between
    plen and type->def_datalen. Remove this limit when instantiating the key,
    so we stay consistent with key_alloc.

    Also, fix the similar problem in keyctl_chown_key().

    Fixes: 0b77f5bfb45c ("keys: make the keyring quotas controllable through /proc/sys")
    Fixes: a08bf91ce28e ("KEYS: allow reaching the keys quotas exactly")
    Cc: stable@vger.kernel.org # 5.0.x
    Cc: Eric Biggers
    Signed-off-by: Yang Xu
    Reviewed-by: Jarkko Sakkinen
    Reviewed-by: Eric Biggers
    Signed-off-by: Jarkko Sakkinen

    Yang Xu
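    The off-by-one described in the 16 Mar commit comes down to the
    comparison used for the quota check. A simplified sketch, assuming a
    hypothetical struct demo_quota that tracks only the byte quota (the
    real counters live in struct key_user, and the real functions are
    key_payload_reserve() and keyctl_chown_key()):

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-user byte quota. */
struct demo_quota {
	unsigned int nbytes;   /* bytes currently charged */
	unsigned int maxbytes; /* quota limit */
};

/* Old check: rejects a request that would land exactly on the limit. */
static bool exceeds_old(unsigned int used, unsigned int delta, unsigned int max)
{
	return used + delta >= max;
}

/* Fixed check: rejects only requests that would go past the limit, so
 * the quota can be reached exactly, as key_alloc() already allowed. */
static bool exceeds_fixed(unsigned int used, unsigned int delta, unsigned int max)
{
	return used + delta > max;
}

/* Charge delta bytes; returns 0 on success or -1 if over quota (the
 * kernel reports -EDQUOT in this case). */
static int demo_payload_reserve(struct demo_quota *q, unsigned int delta)
{
	if (exceeds_fixed(q->nbytes, delta, q->maxbytes))
		return -1;
	q->nbytes += delta;
	return 0;
}
```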
     

06 Mar, 2020

2 commits


28 Feb, 2020

2 commits

  • Remove initial SIDs that have never been used or are no longer used by
    the kernel from its string table, which is also used to generate the
    SECINITSID_* symbols referenced in code. Update the code to
    gracefully handle the fact that these can now be NULL. Stop treating
    it as an error if a policy defines additional initial SIDs unknown to
    the kernel. Do not load unused initial SID contexts into the sidtab.
    Fix the incorrect usage of the name from the ocontext in error
    messages when loading initial SIDs since these are not presently
    written to the kernel policy and are therefore always NULL.

    After this change, it is possible to safely reclaim and reuse some of
    the unused initial SIDs without compatibility issues. Specifically,
    unused initial SIDs that were being assigned the same context as the
    unlabeled initial SID in policies can be reclaimed and reused for
    another purpose, with existing policies still treating them as having
    the unlabeled context and future policies having the option of mapping
    them to a more specific context. For example, this could have been
    used when the infiniband labeling support was introduced to define
    initial SIDs for the default pkey and endport SIDs similar to the
    handling of port/netif/node SIDs rather than always using
    SECINITSID_UNLABELED as the default.

    The set of safely reclaimable unused initial SIDs across all known
    policies is igmp_packet (13), icmp_socket (14), tcp_socket (15), kmod
    (24), policy (25), and scmp_packet (26); these initial SIDs were
    assigned the same context as unlabeled in all known policies including
    mls. If only considering non-mls policies (i.e. assuming that mls
    users always upgrade policy with their kernels), the set of safely
    reclaimable unused initial SIDs further includes file_labels (6), init
    (7), sysctl_modprobe (16), and sysctl_fs (18) through sysctl_dev (23).

    Adding new initial SIDs beyond SECINITSID_NUM to policy unfortunately
    became a fatal error in commit 24ed7fdae669 ("selinux: use separate
    table for initial SID lookup") and even before that it could cause
    problems on a policy reload (collision between the new initial SID and
    one allocated at runtime) ever since commit 42596eafdd75 ("selinux:
    load the initial SIDs upon every policy load") so we cannot safely
    start adding new initial SIDs to policies beyond SECINITSID_NUM (27)
    until such a time as all such kernels do not need to be supported and
    only those that include this commit are relevant. That is not a big
    deal since we haven't added a new initial SID since 2004 (v2.6.7) and
    we have plenty of unused ones we can reclaim if we truly need one.

    If we want to avoid the wasted storage in initial_sid_to_string[]
    and/or sidtab->isids[] for the unused initial SIDs, we could introduce
    an indirection between the kernel initial SID values and the policy
    initial SID values and just map the policy SID values in the ocontexts
    to the kernel values during policy_load_isids(). Originally I thought
    we'd do this by preserving the initial SID names in the kernel policy
    and creating a mapping at load time like we do for the security
    classes and permissions but that would require a new kernel policy
    format version and associated changes to libsepol/checkpolicy and I'm
    not sure it is justified. A simpler approach is just to create a fixed
    mapping table in the kernel from the existing fixed policy values to
    the kernel values. Less flexible but probably sufficient.

    A separate selinux userspace change was applied in
    https://github.com/SELinuxProject/selinux/commit/8677ce5e8f592950ae6f14cea1b68a20ddc1ac25
    to enable removal of most of the unused initial SID contexts from
    policies, but there is no dependency between that change and this one.
    That change permits removing all of the unused initial SID contexts
    from policy except for the fs and sysctl SID contexts. The initial
    SID declarations themselves would remain in policy to preserve the
    values of subsequent ones but the contexts can be dropped. If/when
    the kernel decides to reuse one of them, future policies can change
    the name and start assigning a context again without breaking
    compatibility.

    Here is how I would envision staging changes to the initial SIDs in a
    compatible manner after this commit is applied:

    1. At any time after this commit is applied, the kernel could choose
    to reclaim one of the safely reclaimable unused initial SIDs listed
    above for a new purpose (i.e. replace its NULL entry in the
    initial_sid_to_string[] table with a new name and start using the
    newly generated SECINITSID_name symbol in code), and refpolicy could
    at that time rename its declaration of that initial SID to reflect its
    new purpose and start assigning it a context going
    forward. Existing/old policies would map the reclaimed initial SID to
    the unlabeled context, so that would be the initial default behavior
    until policies are updated. This doesn't depend on the selinux
    userspace change; it will work with existing policies and userspace.

    2. In 6 months or so we'll have another SELinux userspace release that
    will include the libsepol/checkpolicy support for omitting unused
    initial SID contexts.

    3. At any time after that release, refpolicy can make that release its
    minimum build requirement and drop the sid context statements (but not
    the sid declarations) for all of the unused initial SIDs except for
    fs and sysctl, which must remain for compatibility on policy
    reload with old kernels and for compatibility with kernels that were
    still using SECINITSID_SYSCTL (< 2.6.39). This doesn't depend on this
    kernel commit; it will work with previous kernels as well.

    4. After N years for some value of N, refpolicy decides that it no
    longer cares about policy reload compatibility for kernels that
    predate this kernel commit, and refpolicy drops the fs and sysctl
    SID contexts from policy too (but retains the declarations).

    5. After M years for some value of M, the kernel decides that it no
    longer cares about compatibility with refpolicies that predate step 4
    (dropping the fs and sysctl SIDs), and those two SIDs also become
    safely reclaimable. This step is optional and need not ever occur unless
    we decide that the need to reclaim those two SIDs outweighs the
    compatibility cost.

    6. After O years for some value of O, refpolicy decides that it no
    longer cares about policy load (not just reload) compatibility for
    kernels that predate this kernel commit, and both kernel and refpolicy
    can then start adding and using new initial SIDs beyond 27. This does
    not depend on the previous change (step 5) and can occur independent
    of it.

    Fixes: https://github.com/SELinuxProject/selinux-kernel/issues/12
    Signed-off-by: Stephen Smalley
    Signed-off-by: Paul Moore

    Stephen Smalley
     
  • Instead allocate hash tables with just the right size based on the
    actual number of elements (which is almost always known beforehand, we
    just need to defer the hashtab allocation to the right time). The only
    case when we don't know the size (with the current policy format) is the
    new filename transitions hashtable. Here I just left the existing value.

    After this patch, the time to load Fedora policy on x86_64 decreases
    from 790 ms to 167 ms. If the unconfined module is removed, it decreases
    from 750 ms to 122 ms. It is also likely that other operations are going
    to be faster, mainly string_to_context_struct() or mls_compute_sid(),
    but I didn't try to quantify that.

    The memory usage of all hash table arrays increases from ~58 KB to
    ~163 KB (with Fedora policy on x86_64).

    Signed-off-by: Ondrej Mosnacek
    Acked-by: Stephen Smalley
    Signed-off-by: Paul Moore

    Ondrej Mosnacek
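    Ondrej Mosnacek's 28 Feb hash table commit sizes each table from its
    element count instead of using a hard-coded size. A minimal sketch of
    that idea, assuming the count is rounded up to a power of two so that a
    bucket can be selected with a mask; the function names are hypothetical:

```c
#include <stdint.h>

/* Round n up to the next power of two (requires 0 < n <= 2^31). */
static uint32_t round_up_pow2(uint32_t n)
{
	uint32_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Pick a bucket count from the number of elements known when the
 * policy is loaded: about one bucket per element, 0 for an empty table. */
static uint32_t compute_table_size(uint32_t nel)
{
	return nel == 0 ? 0 : round_up_pow2(nel);
}

/* With a power-of-two size, a hash value maps to a bucket via a mask. */
static uint32_t bucket_index(uint32_t hash, uint32_t size)
{
	return hash & (size - 1);
}
```

    For example, 5 elements get 8 buckets and 2500 elements get 4096.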
     

26 Feb, 2020

1 commit

  • Pull EFI updates for v5.7 from Ard Biesheuvel:

    This time, the set of changes for the EFI subsystem is much larger than
    usual. The main reasons are:

    - Get things cleaned up before EFI support for RISC-V arrives, which will
    increase the size of the validation matrix, and therefore the threshold for
    making drastic changes.

    - After years of defunct maintainership, the GRUB project has finally started
    to consider changes from the distros regarding UEFI boot, some of which are
    highly specific to the way x86 does UEFI secure boot and measured boot,
    based on knowledge of both shim internals and the layout of bootparams and
    the x86 setup header. Having this maintenance burden on other architectures
    (which don't need shim in the first place) is hard to justify, so instead,
    we are introducing a generic Linux/UEFI boot protocol.

    Summary of changes:

    - Boot time GDT handling changes (Arvind)

    - Simplify handling of EFI properties table on arm64

    - Generic EFI stub cleanups, to improve command line handling, file I/O,
    memory allocation, etc.

    - Introduce a generic initrd loading method based on calling back into
    the firmware, instead of relying on the x86 EFI handover protocol or
    device tree.

    - Introduce a mixed mode boot method that does not rely on the x86 EFI
    handover protocol either, and could potentially be adopted by other
    architectures (if another one ever surfaces where one execution mode
    is a superset of another)

    - Clean up the contents of struct efi, and move out everything that
    doesn't need to be stored there.

    - Incorporate support for UEFI spec v2.8A changes that permit firmware
    implementations to return EFI_UNSUPPORTED from UEFI runtime services at
    OS runtime, and expose a mask of which ones are supported or unsupported
    via a configuration table.

    - Various documentation updates and minor code cleanups (Heinrich)

    - Partial fix for the lack of by-VA cache maintenance in the decompressor
    on 32-bit ARM. Note that these patches were deliberately put at the
    beginning so they can be used as a stable branch that will be shared with
    a PR containing the complete fix, which I will send to the ARM tree.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

24 Feb, 2020

1 commit


23 Feb, 2020

2 commits

  • Add Q_XQUOTAOFF, Q_XQUOTAON and Q_XSETQLIM to trigger filesystem quotamod
    permission check.

    Add Q_XGETQUOTA, Q_XGETQSTAT, Q_XGETQSTATV and Q_XGETNEXTQUOTA to trigger
    filesystem quotaget permission check.

    Signed-off-by: Richard Haines
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Paul Moore

    Richard Haines
     
  • In these rules, each rule with the same (target type, target class,
    filename) values is (in practice) always mapped to the same result type.
    Therefore, it is much more efficient to group the rules by (ttype,
    tclass, filename).

    Thus, this patch drops the stype field from the key and changes the
    datum to be a linked list of one or more structures that contain a
    result type and an ebitmap of source types that map the given target to
    the given result type under the given filename. The size of the hash
    table is also increased to 2048 to better suit Fedora policy
    (which currently has ~2500 unique (ttype, tclass, filename) tuples,
    regardless of whether the 'unconfined' module is enabled).

    Not only does this dramatically reduce memory usage when the policy
    contains a lot of unconfined domains (ergo a lot of filename based
    transitions), but it also slightly reduces memory usage of strongly
    confined policies (modeled on Fedora policy with 'unconfined' module
    disabled) and significantly reduces lookup times of these rules on
    Fedora (roughly matches the performance of the rhashtable conversion
    patch [1] posted recently to selinux@vger.kernel.org).

    An obvious next step is to change binary policy format to match this
    layout, so that disk space is also saved. However, since that requires
    more work (including matching userspace changes) and this patch is
    already beneficial on its own, I'm posting it separately.

    Performance/memory usage comparison:

    Kernel           | Policy load | Policy load   | Mem usage | Mem usage     | openbench
                     |             | (-unconfined) |           | (-unconfined) | (createfiles)
    -----------------|-------------|---------------|-----------|---------------|--------------
    reference        | 1.30 s      | 0.91 s        | 90 MB     | 77 MB         | 55 us/file
    rhashtable patch | 0.98 s      | 0.85 s        | 85 MB     | 75 MB         | 38 us/file
    this patch       | 0.95 s      | 0.87 s        | 75 MB     | 75 MB         | 40 us/file

    (Memory usage is measured after boot. With SELinux disabled the memory
    usage was ~60MB on the same system.)

    [1] https://lore.kernel.org/selinux/20200116213937.77795-1-dev@lynxeye.de/T/

    Signed-off-by: Ondrej Mosnacek
    Acked-by: Stephen Smalley
    Signed-off-by: Paul Moore

    Ondrej Mosnacek
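    The 23 Feb filename-transition commit regroups rules under a (ttype,
    tclass, filename) key whose value is a list of (result type,
    source-type bitmap) pairs. A simplified sketch of that layout and its
    lookup; a 64-bit mask stands in for the kernel's ebitmap, a linear scan
    stands in for the hash table, and all type and function names are
    hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One result for a key: every source type whose bit is set in stypes
 * transitions to otype. The mask limits type values to < 64 here. */
struct ft_datum {
	uint64_t stypes;
	uint32_t otype;
	const struct ft_datum *next;
};

/* The key no longer includes the source type. */
struct ft_key {
	uint32_t ttype;
	uint16_t tclass;
	const char *name;
};

struct ft_entry {
	struct ft_key key;
	const struct ft_datum *datum; /* list of one or more results */
};

/* Return the new type for (stype, ttype, tclass, name), or 0 if no
 * rule matches. */
static uint32_t ft_lookup(const struct ft_entry *tab, size_t n,
			  uint32_t stype, uint32_t ttype, uint16_t tclass,
			  const char *name)
{
	for (size_t i = 0; i < n; i++) {
		const struct ft_key *k = &tab[i].key;

		if (k->ttype != ttype || k->tclass != tclass ||
		    strcmp(k->name, name) != 0)
			continue;
		/* Key found: walk its results for a matching source type. */
		for (const struct ft_datum *d = tab[i].datum; d; d = d->next)
			if (stype < 64 && ((d->stypes >> stype) & 1))
				return d->otype;
		return 0;
	}
	return 0;
}
```

    Source types that share a result collapse into one bitmap, so storage
    grows with the number of unique (ttype, tclass, filename) tuples rather
    than with the number of individual rules.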
     

21 Feb, 2020

1 commit

  • Pull IMA fixes from Mimi Zohar:
    "Two bug fixes and an associated change for each.

    The one that adds SM3 to the IMA list of supported hash algorithms is
    a simple change, but could be considered a new feature"

    * 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
    ima: add sm3 algorithm to hash algorithm configuration list
    crypto: rename sm3-256 to sm3 in hash_algo_name
    efi: Only print errors about failing to get certs if EFI vars are found
    x86/ima: use correct identifier for SetupMode variable

    Linus Torvalds
     

18 Feb, 2020

2 commits

  • IMA already supports sm3 as a hash algorithm, but it is not yet in
    the Kconfig configuration list. With it added, both IMA and TPM2 can
    use sm3.

    Signed-off-by: Tianjia Zhang
    Signed-off-by: Mimi Zohar

    Tianjia Zhang
     
  • If CONFIG_LOAD_UEFI_KEYS is enabled, the kernel attempts to load the certs
    from the db, dbx and MokListRT EFI variables into the appropriate keyrings.

    But it just assumes that the variables will be present and prints an error
    if the certs can't be loaded, even though it is possible that the variables
    do not exist. For example, the MokListRT variable will only be present if
    shim is used.

    So only print an error message about failing to get the certs list from an
    EFI variable if the variable is found. Otherwise these printed errors just
    pollute the kernel log ring buffer with confusing messages like the following:

    [ 5.427251] Couldn't get size: 0x800000000000000e
    [ 5.427261] MODSIGN: Couldn't get UEFI db list
    [ 5.428012] Couldn't get size: 0x800000000000000e
    [ 5.428023] Couldn't get UEFI MokListRT

    Reported-by: Hans de Goede
    Signed-off-by: Javier Martinez Canillas
    Tested-by: Hans de Goede
    Acked-by: Ard Biesheuvel
    Signed-off-by: Mimi Zohar

    Javier Martinez Canillas
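    The 18 Feb EFI commit boils down to treating a missing variable as a
    normal outcome. The status 0x800000000000000e in the log above is
    EFI_NOT_FOUND (the error bit plus code 14) on a 64-bit system. A sketch
    of the decision, assuming hypothetical DEMO_* constants and a
    hypothetical report_get_var_status() helper rather than the kernel's
    actual certificate-loading code:

```c
#include <stdint.h>
#include <stdio.h>

/* 64-bit UEFI status values: errors have the top bit set. */
#define DEMO_EFI_ERROR_BIT (UINT64_C(1) << 63)
#define DEMO_EFI_SUCCESS   UINT64_C(0)
#define DEMO_EFI_NOT_FOUND (DEMO_EFI_ERROR_BIT | 14)

/* A missing variable (e.g. MokListRT when shim is not used) stays
 * silent; any other error is logged. Returns 1 if a message was printed. */
static int report_get_var_status(const char *var, uint64_t status)
{
	if (status == DEMO_EFI_SUCCESS || status == DEMO_EFI_NOT_FOUND)
		return 0;
	fprintf(stderr, "Couldn't get size of %s: 0x%llx\n", var,
		(unsigned long long)status);
	return 1;
}
```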
     

14 Feb, 2020

1 commit


12 Feb, 2020

5 commits


11 Feb, 2020

1 commit


10 Feb, 2020

5 commits

  • If a seq_file .next function does not change the position index, a
    read after some lseek can generate unexpected output.

    $ dd if=/sys/fs/selinux/avc/cache_stats # usual output
    lookups hits misses allocations reclaims frees
    817223 810034 7189 7189 6992 7037
    1934894 1926896 7998 7998 7632 7683
    1322812 1317176 5636 5636 5456 5507
    1560571 1551548 9023 9023 9056 9115
    0+1 records in
    0+1 records out
    189 bytes copied, 5,1564e-05 s, 3,7 MB/s

    $# read after lseek to middle of last line
    $ dd if=/sys/fs/selinux/avc/cache_stats bs=180 skip=1
    dd: /sys/fs/selinux/avc/cache_stats: cannot skip to specified offset
    056 9115 <<<< end of last line
    1560571 1551548 9023 9023 9056 9115 <<< whole last line once again
    0+1 records in
    0+1 records out
    45 bytes copied, 8,7221e-05 s, 516 kB/s

    $# read after lseek beyond end of file
    $ dd if=/sys/fs/selinux/avc/cache_stats bs=1000 skip=1
    dd: /sys/fs/selinux/avc/cache_stats: cannot skip to specified offset
    1560571 1551548 9023 9023 9056 9115 <<<< generates whole last line
    0+1 records in
    0+1 records out
    36 bytes copied, 9,0934e-05 s, 396 kB/s

    https://bugzilla.kernel.org/show_bug.cgi?id=206283

    Signed-off-by: Vasily Averin
    Acked-by: Stephen Smalley
    Signed-off-by: Paul Moore

    Vasily Averin
     
  • Currently symlinks on kernel filesystems, like sysfs, are labeled on
    creation with the parent filesystem root sid.

    Allow symlinks to inherit the parent directory context, so fine-grained
    kernfs labeling can be applied to symlinks too and context checks do not
    complain about them.

    For backward-compatibility this behavior is contained in a new policy
    capability: genfs_seclabel_symlinks

    Signed-off-by: Christian Göttsche
    Acked-by: Stephen Smalley
    Signed-off-by: Paul Moore

    Christian Göttsche
     
  • It never fails, so it can just return void.

    Signed-off-by: Ondrej Mosnacek
    Reviewed-by: Stephen Smalley
    Signed-off-by: Paul Moore

    Ondrej Mosnacek
     
  • Deprecate setting the SELinux checkreqprot tunable to 1 via kernel
    parameter or /sys/fs/selinux/checkreqprot. Setting it to 0 is left
    intact for compatibility since Android and some Linux distributions
    do so for security and treat an inability to set it as a fatal error.
    Eventually setting it to 0 will become a no-op and the kernel will
    stop using checkreqprot's value internally altogether.

    checkreqprot was originally introduced as a compatibility mechanism
    for legacy userspace and the READ_IMPLIES_EXEC personality flag.
    However, if set to 1, it weakens security by allowing mappings to be
    made executable without authorization by policy. The default value
    for the SECURITY_SELINUX_CHECKREQPROT_VALUE config option was changed
    from 1 to 0 in commit 2a35d196c160e3 ("selinux: change
    CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE default") and both Android
    and Linux distributions began explicitly setting
    /sys/fs/selinux/checkreqprot to 0 some time ago.

    Signed-off-by: Stephen Smalley
    Signed-off-by: Paul Moore

    Stephen Smalley
     
  • It fits more naturally in selinux_state, since it also reflects global
    state (the enforcing and policyload fields).

    Signed-off-by: Ondrej Mosnacek
    Reviewed-by: Stephen Smalley
    Signed-off-by: Paul Moore

    Ondrej Mosnacek
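    The 10 Feb seq_file fix is about the iterator contract: the position
    must advance on every .next call, including the final one that returns
    NULL. A simplified userspace model of an index helper that honours
    this, assuming a hypothetical four-entry per-CPU table with one hole
    (values borrowed from the dd output above); it leaves out the header
    line and the seq_file core itself:

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NR_CPUS 4

/* Hypothetical per-CPU stats; CPU 2 is "not possible" and is skipped. */
static const bool demo_cpu_possible[DEMO_NR_CPUS] = { true, true, false, true };
static unsigned long demo_lookups[DEMO_NR_CPUS] = { 817223, 1934894, 0, 1560571 };

/* Return the record at or after *idx and leave *idx one past it. When
 * nothing is left, still advance *idx before returning NULL; per the
 * commit, leaving the position unchanged is what lets a later read
 * emit the last record again. */
static unsigned long *demo_get_stat_idx(long *idx)
{
	for (long cpu = *idx; cpu < DEMO_NR_CPUS; ++cpu) {
		if (!demo_cpu_possible[cpu])
			continue;
		*idx = cpu + 1;
		return &demo_lookups[cpu];
	}
	(*idx)++;
	return NULL;
}
```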
     

09 Feb, 2020

1 commit

  • Pull vfs file system parameter updates from Al Viro:
    "Saner fs_parser.c guts and data structures. The system-wide registry
    of syntax types (string/enum/int32/oct32/.../etc.) is gone and so is
    the horror switch() in fs_parse() that would have to grow another case
    every time something got added to that system-wide registry.

    New syntax types can be added by filesystems easily now, and their
    namespace is that of functions - not of system-wide enum members. IOW,
    they can be shared or kept private and if some turn out to be widely
    useful, we can make them common library helpers, etc., without having
    to do anything whatsoever to fs_parse() itself.

    And we already get that kind of requests - the thing that finally
    pushed me into doing that was "oh, and let's add one for timeouts -
    things like 15s or 2h". If some filesystem really wants that, let them
    do it. Without somebody having to play gatekeeper for the variants
    blessed by direct support in fs_parse(), TYVM.

    Quite a bit of boilerplate is gone. And IMO the data structures make a
    lot more sense now. -200LoC, while we are at it"

    * 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (25 commits)
    tmpfs: switch to use of invalfc()
    cgroup1: switch to use of errorfc() et.al.
    procfs: switch to use of invalfc()
    hugetlbfs: switch to use of invalfc()
    cramfs: switch to use of errofc() et.al.
    gfs2: switch to use of errorfc() et.al.
    fuse: switch to use errorfc() et.al.
    ceph: use errorfc() and friends instead of spelling the prefix out
    prefix-handling analogues of errorf() and friends
    turn fs_param_is_... into functions
    fs_parse: handle optional arguments sanely
    fs_parse: fold fs_parameter_desc/fs_parameter_spec
    fs_parser: remove fs_parameter_description name field
    add prefix to fs_context->log
    ceph_parse_param(), ceph_parse_mon_ips(): switch to passing fc_log
    new primitive: __fs_parse()
    switch rbd and libceph to p_log-based primitives
    struct p_log, variants of warnf() et.al. taking that one instead
    teach logfc() to handle prefices, give it saner calling conventions
    get rid of cg_invalf()
    ...

    Linus Torvalds
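    The central idea of this series, that a parameter's syntax type is just a function rather than a member of a system-wide enum, can be illustrated with a small userspace C sketch. Everything here is hypothetical and deliberately does not mirror the kernel's fs_parser structures: `param_type_fn`, `struct param_spec`, `param_is_timeout()` and `parse_param()` are invented names. The "15s"/"2h" timeout type is the example mentioned in the pull request.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: a syntax type is any function with this signature. */
typedef int (*param_type_fn)(const char *value, unsigned long *out);

struct param_spec {
    const char *name;      /* option key, e.g. "timeout" */
    param_type_fn type;    /* how to parse its value */
};

/* A filesystem-private type: "15s", "30m" or "2h", stored in seconds. */
static int param_is_timeout(const char *value, unsigned long *out)
{
    char *end;
    unsigned long n, mult;

    if (!isdigit((unsigned char)value[0]))
        return -EINVAL;            /* rejects "", "-5s", " 5s" */
    errno = 0;
    n = strtoul(value, &end, 10);
    if (errno)
        return -EINVAL;

    switch (*end) {
    case 's': mult = 1; break;
    case 'm': mult = 60; break;
    case 'h': mult = 3600; break;
    default:  return -EINVAL;      /* missing or unknown unit */
    }
    if (end[1] != '\0' || n > ULONG_MAX / mult)
        return -EINVAL;            /* trailing junk or overflow */

    *out = n * mult;
    return 0;
}

/* Generic dispatcher: no switch over a global list of types; it just
 * calls whatever function the spec table supplies for the key. */
static int parse_param(const struct param_spec *specs, size_t nspecs,
                       const char *key, const char *value,
                       unsigned long *out)
{
    for (size_t i = 0; i < nspecs; i++)
        if (strcmp(specs[i].name, key) == 0)
            return specs[i].type(value, out);
    return -ENOENT;                /* unknown option */
}
```

    With `struct param_spec specs[] = { { "timeout", param_is_timeout } };`, calling `parse_param(specs, 1, "timeout", "2h", &v)` stores 7200 in `v`. Supporting another syntax means writing another function; `parse_param()` itself never changes.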
     

08 Feb, 2020

2 commits


06 Feb, 2020

4 commits

  • Pull smack fix from Casey Schaufler:
    "One fix for an obscure error found using an old version of ping(1)
    that did not use IPv6 sockets in the documented way"

    * tag 'Smack-for-5.6' of git://github.com/cschaufler/smack-next:
    broken ping to ipv6 linklocal addresses on debian buster

    Linus Torvalds
     
  • Avoiding taking a lock in an IRQ context is not enough to prevent
    deadlocks, as discovered by syzbot:

    ===
    WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
    5.5.0-syzkaller #0 Not tainted
    -----------------------------------------------------
    syz-executor.0/8927 [HC0[0]:SC0[2]:HE1:SE0] is trying to acquire:
    ffff888027c94098 (&(&s->cache_lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:338 [inline]
    ffff888027c94098 (&(&s->cache_lock)->rlock){+.+.}, at: sidtab_sid2str_put.part.0+0x36/0x880 security/selinux/ss/sidtab.c:533

    and this task is already holding:
    ffffffff898639b0 (&(&nf_conntrack_locks[i])->rlock){+.-.}, at: spin_lock include/linux/spinlock.h:338 [inline]
    ffffffff898639b0 (&(&nf_conntrack_locks[i])->rlock){+.-.}, at: nf_conntrack_lock+0x17/0x70 net/netfilter/nf_conntrack_core.c:91
    which would create a new lock dependency:
    (&(&nf_conntrack_locks[i])->rlock){+.-.} -> (&(&s->cache_lock)->rlock){+.+.}

    but this new dependency connects a SOFTIRQ-irq-safe lock:
    (&(&nf_conntrack_locks[i])->rlock){+.-.}

    [...]

    other info that might help us debug this:

    Possible interrupt unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(&(&s->cache_lock)->rlock);
                                   local_irq_disable();
                                   lock(&(&nf_conntrack_locks[i])->rlock);
                                   lock(&(&s->cache_lock)->rlock);
      <Interrupt>
        lock(&(&nf_conntrack_locks[i])->rlock);

    *** DEADLOCK ***
    [...]
    ===

    Fix this by simply locking with irqsave/irqrestore and no longer giving
    up on !in_task(). It makes the locking a bit slower, but it shouldn't
    make a big difference in real workloads. Under the scenario from [1]
    (only cache hits) it only increased the runtime overhead of the
    security_secid_to_secctx() function from ~2% to ~3% (it was ~5-65%
    before introducing the cache).

    [1] https://bugzilla.redhat.com/show_bug.cgi?id=1733259

    Fixes: d97bd23c2d7d ("selinux: cache the SID -> context string translation")
    Reported-by: syzbot+61cba5033e2072d61806@syzkaller.appspotmail.com
    Signed-off-by: Ondrej Mosnacek
    Acked-by: Stephen Smalley
    Signed-off-by: Paul Moore

    Ondrej Mosnacek
     
  • Correct the filesystem name to "binder" to enable genfscon per-file
    labelling for binderfs.

    Fixes: 7a4b5194747 ("selinux: allow per-file labelling for binderfs")
    Signed-off-by: Hridya Valsaraju
    Acked-by: Stephen Smalley
    [PM: slight style changes to the subj/description]
    Signed-off-by: Paul Moore

    Hridya Valsaraju
     
  • I am seeing ping failures to IPv6 linklocal addresses with Debian
    buster. Easiest example to reproduce is:

    $ ping -c1 -w1 ff02::1%eth1
    connect: Invalid argument

    $ ping -c1 -w1 ff02::1%eth1
    PING ff02::01%eth1(ff02::1%eth1) 56 data bytes
    64 bytes from fe80::e0:f9ff:fe0c:37%eth1: icmp_seq=1 ttl=64 time=0.059 ms

    git bisect traced the failure to
    commit b9ef5513c99b ("smack: Check address length before reading address family")

    Arguably ping is being stupid since the buster version is not setting
    the address family properly (ping on stretch for example does):

    $ strace -e connect ping6 -c1 -w1 ff02::1%eth1
    connect(5, {sa_family=AF_UNSPEC,
    sa_data="\4\1\0\0\0\0\377\2\0\0\0\0\0\0\0\0\0\0\0\0\0\1\3\0\0\0"}, 28)
    = -1 EINVAL (Invalid argument)

    but the command works fine on kernels prior to this commit, so this is
    breakage which goes against the Linux paradigm of "don't break userspace".

    Cc: stable@vger.kernel.org
    Reported-by: David Ahern
    Suggested-by: Tetsuo Handa
    Signed-off-by: Casey Schaufler

     security/smack/smack_lsm.c | 41 +++++++++++++++++++----------------------
    1 file changed, 19 insertions(+), 22 deletions(-)

    Casey Schaufler
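    The check named in the title of commit b9ef5513c99b (read the address family only after confirming that the address is long enough to contain it) can be sketched in userspace C as follows. This illustrates the bounds check only and is not the Smack code; `checked_family()` is a hypothetical helper. Note that the buster ping call shown above passes such a check (addrlen is 28, the size of a `struct sockaddr_in6` on Linux) and then yields AF_UNSPEC; how a security hook should treat that case is what this fix is about, and the sketch takes no position on it.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Return the address family only if addrlen covers sa_family;
 * otherwise report AF_UNSPEC instead of reading past the buffer. */
static sa_family_t checked_family(const struct sockaddr *sa,
                                  socklen_t addrlen)
{
    size_t need = offsetof(struct sockaddr, sa_family) +
                  sizeof(sa->sa_family);   /* sizeof does not dereference */

    if (sa == NULL || addrlen < need)
        return AF_UNSPEC;
    return sa->sa_family;
}
```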
     

30 Jan, 2020

2 commits

  • …kernel/git/shuah/linux-kselftest

    Pull Kselftest kunit updates from Shuah Khan:
    "This kunit update consists of:

    - Support for building kunit as a module from Alan Maguire

    - AppArmor KUnit tests for policy unpack from Mike Salvatore"

    * tag 'linux-kselftest-5.6-rc1-kunit' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
    kunit: building kunit as a module breaks allmodconfig
    kunit: update documentation to describe module-based build
    kunit: allow kunit to be loaded as a module
    kunit: remove timeout dependence on sysctl_hung_task_timeout_seconds
    kunit: allow kunit tests to be loaded as a module
    kunit: hide unexported try-catch interface in try-catch-impl.h
    kunit: move string-stream.h to lib/kunit
    apparmor: add AppArmor KUnit tests for policy unpack

    Linus Torvalds
     
  • Pull openat2 support from Al Viro:
    "This is the openat2() series from Aleksa Sarai.

    I'm afraid that the rest of namei stuff will have to wait - it got
    zero review the last time I'd posted #work.namei, and there had been a
    leak in the posted series I'd caught only last weekend. I was going to
    repost it on Monday, but the window opened and the odds of getting any
    review during that... Oh, well.

    Anyway, openat2 part should be ready; that _did_ get sane amount of
    review and public testing, so here it comes"

    From Aleksa's description of the series:
    "For a very long time, extending openat(2) with new features has been
    incredibly frustrating. This stems from the fact that openat(2) is
    possibly the most famous counter-example to the mantra "don't silently
    accept garbage from userspace" -- it doesn't check whether unknown
    flags are present[1].

    This means that (generally) the addition of new flags to openat(2) has
    been fraught with backwards-compatibility issues (O_TMPFILE has to be
    defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
    kernels gave errors, since it's insecure to silently ignore the
    flag[2]). All new security-related flags therefore have a tough road
    to being added to openat(2).

    Furthermore, the need for some sort of control over VFS's path
    resolution (to avoid malicious paths resulting in inadvertent
    breakouts) has been a very long-standing desire of many userspace
    applications.

    This patchset is a revival of Al Viro's old AT_NO_JUMPS[3] patchset
    (which was a variant of David Drysdale's O_BENEATH patchset[4] which
    was a spin-off of the Capsicum project[5]) with a few additions and
    changes made based on the previous discussion within [6] as well as
    others I felt were useful.

    In line with the conclusions of the original discussion of
    AT_NO_JUMPS, the flag has been split up into separate flags. However,
    instead of being an openat(2) flag it is provided through a new
    syscall openat2(2) which provides several other improvements to the
    openat(2) interface (see the patch description for more details). The
    following new LOOKUP_* flags are added:

    LOOKUP_NO_XDEV:

    Blocks all mountpoint crossings (upwards, downwards, or through
    absolute links). Absolute pathnames alone in openat(2) do not
    trigger this. Magic-link traversal which implies a vfsmount jump is
    also blocked (though magic-link jumps on the same vfsmount are
    permitted).

    LOOKUP_NO_MAGICLINKS:

    Blocks resolution through /proc/$pid/fd-style links. This is done
    by blocking the usage of nd_jump_link() during resolution in a
    filesystem. The term "magic-links" is used to match with the only
    reference to these links in Documentation/, but I'm happy to change
    the name.

    It should be noted that this is different to the scope of
    ~LOOKUP_FOLLOW in that it applies to all path components. However,
    you can do openat2(NO_FOLLOW|NO_MAGICLINKS) on a magic-link and it
    will *not* fail (assuming that no parent component was a
    magic-link), and you will have an fd for the magic-link.

    In order to correctly detect magic-links, the introduction of a new
    LOOKUP_MAGICLINK_JUMPED state flag was required.

    LOOKUP_BENEATH:

    Disallows escapes to outside the starting dirfd's
    tree, using techniques such as ".." or absolute links. Absolute
    paths in openat(2) are also disallowed.

    Conceptually this flag is to ensure you "stay below" a certain
    point in the filesystem tree -- but this requires some additional
    work to protect against various races that would allow escape using
    "..".

    Currently LOOKUP_BENEATH implies LOOKUP_NO_MAGICLINKS, because
    magic-links can trivially beam you around the filesystem (breaking the
    protection). In future, there might be similar safety checks done
    as in LOOKUP_IN_ROOT, but that requires more discussion.

    In addition, two new flags are added that expand on the above ideas:

    LOOKUP_NO_SYMLINKS:

    Does what it says on the tin. No symlink resolution is allowed at
    all, including magic-links. Just as with LOOKUP_NO_MAGICLINKS this
    can still be used with NOFOLLOW to open an fd for the symlink as
    long as no parent path had a symlink component.

    LOOKUP_IN_ROOT:

    This is an extension of LOOKUP_BENEATH that, rather than blocking
    attempts to move past the root, forces all such movements to be
    scoped to the starting point. This provides chroot(2)-like
    protection but without the cost of a chroot(2) for each filesystem
    operation, as well as being safe against race attacks that
    chroot(2) is not.

    If a race is detected (as with LOOKUP_BENEATH) then an error is
    generated, and similar to LOOKUP_BENEATH it is not permitted to
    cross magic-links with LOOKUP_IN_ROOT.

    The primary need for this is from container runtimes, which
    currently need to do symlink scoping in userspace[7] when opening
    paths in a potentially malicious container.

    There is a long list of CVEs that could have been mitigated by
    having RESOLVE_THIS_ROOT (such as CVE-2017-1002101,
    CVE-2017-1002102, CVE-2018-15664, and CVE-2019-5736, just to name a
    few).

    In order to make all of the above more usable, I'm working on
    libpathrs[8] which is a C-friendly library for safe path resolution.
    It features a userspace-emulated backend if the kernel doesn't support
    openat2(2). Hopefully we can get userspace to switch to using it, and
    thus get openat2(2) support for free once it's ready.

    Future work would include implementing things like
    RESOLVE_NO_AUTOMOUNT and possibly a RESOLVE_NO_REMOTE (to allow
    programs to be sure they don't hit DoSes through stale NFS handles)"

    * 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    Documentation: path-lookup: include new LOOKUP flags
    selftests: add openat2(2) selftests
    open: introduce openat2(2) syscall
    namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution
    namei: LOOKUP_IN_ROOT: chroot-like scoped resolution
    namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution
    namei: LOOKUP_NO_XDEV: block mountpoint crossing
    namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution
    namei: LOOKUP_NO_SYMLINKS: block symlink resolution
    namei: allow set_root() to produce errors
    namei: allow nd_jump_link() to produce errors
    nsfs: clean-up ns_get_path() signature to return int
    namei: only return -ECHILD from follow_dotdot_rcu()

    Linus Torvalds
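    A minimal userspace sketch of calling the new syscall follows. The cover letter describes the internal LOOKUP_* flags; userspace passes the corresponding RESOLVE_* bits in the `resolve` field of `struct open_how` from <linux/openat2.h>. So that it builds against older headers, the sketch declares a local copy of that three-field structure and of the RESOLVE_BENEATH (0x08) and RESOLVE_IN_ROOT (0x10) values, and falls back to syscall number 437 (the number on x86-64 and other architectures sharing the common syscall table) if `__NR_openat2` is missing. `open_scoped()` and the `SKETCH_*` names are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_openat2
#define __NR_openat2 437     /* assumption: common syscall table number */
#endif

/* Local copy of struct open_how's layout (flags, mode, resolve). */
struct open_how_sketch {
    uint64_t flags;
    uint64_t mode;
    uint64_t resolve;
};

#define SKETCH_RESOLVE_BENEATH 0x08   /* RESOLVE_BENEATH */
#define SKETCH_RESOLVE_IN_ROOT 0x10   /* RESOLVE_IN_ROOT */

/* Open path relative to dirfd, read-only, with the given resolve
 * restrictions. Returns a file descriptor, or -errno on failure. */
int open_scoped(int dirfd, const char *path, uint64_t resolve)
{
    struct open_how_sketch how = {
        .flags = O_RDONLY | O_CLOEXEC,
        .mode = 0,                    /* must be 0 without O_CREAT */
        .resolve = resolve,
    };
    long fd = syscall(__NR_openat2, dirfd, path, &how, sizeof how);

    return fd < 0 ? -errno : (int)fd;
}
```

    With a directory fd for /tmp, `open_scoped(dirfd, "../etc/passwd", SKETCH_RESOLVE_BENEATH)` is expected to fail with -EXDEV because the path leaves the starting tree, whereas under SKETCH_RESOLVE_IN_ROOT a leading ".." is clamped to the starting directory, as it would be at "/" after chroot(2). A -ENOSYS result means the kernel predates openat2(2) (some seccomp policies report -EPERM instead); that is the case where a library like libpathrs would switch to its userspace-emulated backend.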
     

29 Jan, 2020

1 commit