29 Apr, 2020

1 commit


27 Apr, 2020

1 commit

  • Instead of having all the sysctl handlers deal with user pointers, which
    is rather hairy in terms of the BPF interaction, copy the input to and
    from userspace in common code. This also means that the strings are
    always NUL-terminated by the common code, making the API a little bit
    safer.

    As most handler just pass through the data to one of the common handlers
    a lot of the changes are mechnical.

    Signed-off-by: Christoph Hellwig
    Acked-by: Andrey Ignatov
    Signed-off-by: Al Viro

    Christoph Hellwig
     

17 Apr, 2020

2 commits

  • Pull SELinux fix from Paul Moore:
    "One small SELinux fix to ensure we cleanup properly on an error
    condition"

    * tag 'selinux-pr-20200416' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
    selinux: free str on error in str_read()

    Linus Torvalds
     
  • If seq_file .next function does not change position index,
    read after some lseek can generate unexpected output:

    $ dd if=/proc/keys bs=1 # full usual output
    0f6bfdf5 I--Q--- 2 perm 3f010000 1000 1000 user 4af2f79ab8848d0a: 740
    1fb91b32 I--Q--- 3 perm 1f3f0000 1000 65534 keyring _uid.1000: 2
    27589480 I--Q--- 1 perm 0b0b0000 0 0 user invocation_id: 16
    2f33ab67 I--Q--- 152 perm 3f030000 0 0 keyring _ses: 2
    33f1d8fa I--Q--- 4 perm 3f030000 1000 1000 keyring _ses: 1
    3d427fda I--Q--- 2 perm 3f010000 1000 1000 user 69ec44aec7678e5a: 740
    3ead4096 I--Q--- 1 perm 1f3f0000 1000 65534 keyring _uid_ses.1000: 1
    521+0 records in
    521+0 records out
    521 bytes copied, 0,00123769 s, 421 kB/s

    But a read after lseek in middle of last line results in the partial
    last line and then a repeat of the final line:

    $ dd if=/proc/keys bs=500 skip=1
    dd: /proc/keys: cannot skip to specified offset
    g _uid_ses.1000: 1
    3ead4096 I--Q--- 1 perm 1f3f0000 1000 65534 keyring _uid_ses.1000: 1
    0+1 records in
    0+1 records out
    97 bytes copied, 0,000135035 s, 718 kB/s

    and a read after lseek beyond end of file results in the last line being
    shown:

    $ dd if=/proc/keys bs=1000 skip=1 # read after lseek beyond end of file
    dd: /proc/keys: cannot skip to specified offset
    3ead4096 I--Q--- 1 perm 1f3f0000 1000 65534 keyring _uid_ses.1000: 1
    0+1 records in
    0+1 records out
    76 bytes copied, 0,000119981 s, 633 kB/s

    See https://bugzilla.kernel.org/show_bug.cgi?id=206283

    Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code ...")
    Signed-off-by: Vasily Averin
    Signed-off-by: David Howells
    Reviewed-by: Jarkko Sakkinen
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Vasily Averin
     

16 Apr, 2020

1 commit

  • In [see "Fixes:"] I missed the fact that str_read() may give back an
    allocated pointer even if it returns an error, causing a potential
    memory leak in filename_trans_read_one(). Fix this by making the
    function free the allocated string whenever it returns a non-zero value,
    which also makes its behavior more obvious and prevents repeating the
    same mistake in the future.

    Reported-by: coverity-bot
    Addresses-Coverity-ID: 1461665 ("Resource leaks")
    Fixes: c3a276111ea2 ("selinux: optimize storage of filename transitions")
    Signed-off-by: Ondrej Mosnacek
    Reviewed-by: Kees Cook
    Signed-off-by: Paul Moore

    Ondrej Mosnacek
     

05 Apr, 2020

1 commit

  • Pull keyrings fixes from David Howells:
    "Here's a couple of patches that fix a circular dependency between
    holding key->sem and mm->mmap_sem when reading data from a key.

    One potential issue is that a filesystem looking to use a key inside,
    say, ->readpages() could deadlock if the key being read is the key
    that's required and the buffer the key is being read into is on a page
    that needs to be fetched.

    The case actually detected is a bit more involved - with a filesystem
    calling request_key() and locking the target keyring for write - which
    could be being read"

    * tag 'keys-fixes-20200329' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
    KEYS: Avoid false positive ENOMEM error on key read
    KEYS: Don't write out to userspace while holding key semaphore

    Linus Torvalds
     

04 Apr, 2020

1 commit

  • Pull SPDX updates from Greg KH:
    "Here are three SPDX patches for 5.7-rc1.

    One fixes up the SPDX tag for a single driver, while the other two go
    through the tree and add SPDX tags for all of the .gitignore files as
    needed.

    Nothing too complex, but you will get a merge conflict with your
    current tree, that should be trivial to handle (one file modified by
    two things, one file deleted.)

    All three of these have been in linux-next for a while, with no
    reported issues other than the merge conflict"

    * tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
    ASoC: MT6660: make spdxcheck.py happy
    .gitignore: add SPDX License Identifier
    .gitignore: remove too obvious comments

    Linus Torvalds
     

03 Apr, 2020

1 commit

  • Pull integrity updates from Mimi Zohar:
    "Just a couple of updates for linux-5.7:

    - A new Kconfig option to enable IMA architecture specific runtime
    policy rules needed for secure and/or trusted boot, as requested.

    - Some message cleanup (eg. pr_fmt, additional error messages)"

    * 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
    ima: add a new CONFIG for loading arch-specific policies
    integrity: Remove duplicate pr_fmt definitions
    IMA: Add log statements for failure conditions
    IMA: Update KBUILD_MODNAME for IMA files to ima

    Linus Torvalds
     

01 Apr, 2020

2 commits

  • Pull networking updates from David Miller:
    "Highlights:

    1) Fix the iwlwifi regression, from Johannes Berg.

    2) Support BSS coloring and 802.11 encapsulation offloading in
    hardware, from John Crispin.

    3) Fix some potential Spectre issues in qtnfmac, from Sergey
    Matyukevich.

    4) Add TTL decrement action to openvswitch, from Matteo Croce.

    5) Allow paralleization through flow_action setup by not taking the
    RTNL mutex, from Vlad Buslov.

    6) A lot of zero-length array to flexible-array conversions, from
    Gustavo A. R. Silva.

    7) Align XDP statistics names across several drivers for consistency,
    from Lorenzo Bianconi.

    8) Add various pieces of infrastructure for offloading conntrack, and
    make use of it in mlx5 driver, from Paul Blakey.

    9) Allow using listening sockets in BPF sockmap, from Jakub Sitnicki.

    10) Lots of parallelization improvements during configuration changes
    in mlxsw driver, from Ido Schimmel.

    11) Add support to devlink for generic packet traps, which report
    packets dropped during ACL processing. And use them in mlxsw
    driver. From Jiri Pirko.

    12) Support bcmgenet on ACPI, from Jeremy Linton.

    13) Make BPF compatible with RT, from Thomas Gleixnet, Alexei
    Starovoitov, and your's truly.

    14) Support XDP meta-data in virtio_net, from Yuya Kusakabe.

    15) Fix sysfs permissions when network devices change namespaces, from
    Christian Brauner.

    16) Add a flags element to ethtool_ops so that drivers can more simply
    indicate which coalescing parameters they actually support, and
    therefore the generic layer can validate the user's ethtool
    request. Use this in all drivers, from Jakub Kicinski.

    17) Offload FIFO qdisc in mlxsw, from Petr Machata.

    18) Support UDP sockets in sockmap, from Lorenz Bauer.

    19) Fix stretch ACK bugs in several TCP congestion control modules,
    from Pengcheng Yang.

    20) Support virtual functiosn in octeontx2 driver, from Tomasz
    Duszynski.

    21) Add region operations for devlink and use it in ice driver to dump
    NVM contents, from Jacob Keller.

    22) Add support for hw offload of MACSEC, from Antoine Tenart.

    23) Add support for BPF programs that can be attached to LSM hooks,
    from KP Singh.

    24) Support for multiple paths, path managers, and counters in MPTCP.
    From Peter Krystad, Paolo Abeni, Florian Westphal, Davide Caratti,
    and others.

    25) More progress on adding the netlink interface to ethtool, from
    Michal Kubecek"

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2121 commits)
    net: ipv6: rpl_iptunnel: Fix potential memory leak in rpl_do_srh_inline
    cxgb4/chcr: nic-tls stats in ethtool
    net: dsa: fix oops while probing Marvell DSA switches
    net/bpfilter: remove superfluous testing message
    net: macb: Fix handling of fixed-link node
    net: dsa: ksz: Select KSZ protocol tag
    netdevsim: dev: Fix memory leak in nsim_dev_take_snapshot_write
    net: stmmac: add EHL 2.5Gbps PCI info and PCI ID
    net: stmmac: add EHL PSE0 & PSE1 1Gbps PCI info and PCI ID
    net: stmmac: create dwmac-intel.c to contain all Intel platform
    net: dsa: bcm_sf2: Support specifying VLAN tag egress rule
    net: dsa: bcm_sf2: Add support for matching VLAN TCI
    net: dsa: bcm_sf2: Move writing of CFP_DATA(5) into slicing functions
    net: dsa: bcm_sf2: Check earlier for FLOW_EXT and FLOW_MAC_EXT
    net: dsa: bcm_sf2: Disable learning for ASP port
    net: dsa: b53: Deny enslaving port 7 for 7278 into a bridge
    net: dsa: b53: Prevent tagged VLAN on port 7 for 7278
    net: dsa: b53: Restore VLAN entries upon (re)configuration
    net: dsa: bcm_sf2: Fix overflow checks
    hv_netvsc: Remove unnecessary round_up for recv_completion_cnt
    ...

    Linus Torvalds
     
  • Pull SELinux updates from Paul Moore:
    "We've got twenty SELinux patches for the v5.7 merge window, the
    highlights are below:

    - Deprecate setting /sys/fs/selinux/checkreqprot to 1.

    This flag was originally created to deal with legacy userspace and
    the READ_IMPLIES_EXEC personality flag. We changed the default from
    1 to 0 back in Linux v4.4 and now we are taking the next step of
    deprecating it, at some point in the future we will take the final
    step of rejecting 1.

    - Allow kernfs symlinks to inherit the SELinux label of the parent
    directory. In order to preserve backwards compatibility this is
    protected by the genfs_seclabel_symlinks SELinux policy capability.

    - Optimize how we store filename transitions in the kernel, resulting
    in some significant improvements to policy load times.

    - Do a better job calculating our internal hash table sizes which
    resulted in additional policy load improvements and likely general
    SELinux performance improvements as well.

    - Remove the unused initial SIDs (labels) and improve how we handle
    initial SIDs.

    - Enable per-file labeling for the bpf filesystem.

    - Ensure that we properly label NFS v4.2 filesystems to avoid a
    temporary unlabeled condition.

    - Add some missing XFS quota command types to the SELinux quota
    access controls.

    - Fix a problem where we were not updating the seq_file position
    index correctly in selinuxfs.

    - We consolidate some duplicated code into helper functions.

    - A number of list to array conversions.

    - Update Stephen Smalley's email address in MAINTAINERS"

    * tag 'selinux-pr-20200330' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
    selinux: clean up indentation issue with assignment statement
    NFS: Ensure security label is set for root inode
    MAINTAINERS: Update my email address
    selinux: avtab_init() and cond_policydb_init() return void
    selinux: clean up error path in policydb_init()
    selinux: remove unused initial SIDs and improve handling
    selinux: reduce the use of hard-coded hash sizes
    selinux: Add xfs quota command types
    selinux: optimize storage of filename transitions
    selinux: factor out loop body from filename_trans_read()
    security: selinux: allow per-file labeling for bpffs
    selinux: generalize evaluate_cond_node()
    selinux: convert cond_expr to array
    selinux: convert cond_av_list to array
    selinux: convert cond_list to array
    selinux: sel_avc_get_stat_idx should increase position index
    selinux: allow kernfs symlinks to inherit parent directory context
    selinux: simplify evaluate_cond_node()
    Documentation,selinux: deprecate setting checkreqprot to 1
    selinux: move status variables out of selinux_ss

    Linus Torvalds
     

31 Mar, 2020

2 commits

  • The assignment of e->type_names is indented one level too deep,
    clean this up by removing the extraneous tab.

    Signed-off-by: Colin Ian King
    Signed-off-by: Paul Moore

    Colin Ian King
     
  • Pull EFI updates from Ingo Molnar:
    "The EFI changes in this cycle are much larger than usual, for two
    (positive) reasons:

    - The GRUB project is showing signs of life again, resulting in the
    introduction of the generic Linux/UEFI boot protocol, instead of
    x86 specific hacks which are increasingly difficult to maintain.
    There's hope that all future extensions will now go through that
    boot protocol.

    - Preparatory work for RISC-V EFI support.

    The main changes are:

    - Boot time GDT handling changes

    - Simplify handling of EFI properties table on arm64

    - Generic EFI stub cleanups, to improve command line handling, file
    I/O, memory allocation, etc.

    - Introduce a generic initrd loading method based on calling back
    into the firmware, instead of relying on the x86 EFI handover
    protocol or device tree.

    - Introduce a mixed mode boot method that does not rely on the x86
    EFI handover protocol either, and could potentially be adopted by
    other architectures (if another one ever surfaces where one
    execution mode is a superset of another)

    - Clean up the contents of 'struct efi', and move out everything that
    doesn't need to be stored there.

    - Incorporate support for UEFI spec v2.8A changes that permit
    firmware implementations to return EFI_UNSUPPORTED from UEFI
    runtime services at OS runtime, and expose a mask of which ones are
    supported or unsupported via a configuration table.

    - Partial fix for the lack of by-VA cache maintenance in the
    decompressor on 32-bit ARM.

    - Changes to load device firmware from EFI boot service memory
    regions

    - Various documentation updates and minor code cleanups and fixes"

    * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
    efi/libstub/arm: Fix spurious message that an initrd was loaded
    efi/libstub/arm64: Avoid image_base value from efi_loaded_image
    partitions/efi: Fix partition name parsing in GUID partition entry
    efi/x86: Fix cast of image argument
    efi/libstub/x86: Use ULONG_MAX as upper bound for all allocations
    efi: Fix a mistype in comments mentioning efivar_entry_iter_begin()
    efi/libstub: Avoid linking libstub/lib-ksyms.o into vmlinux
    efi/x86: Preserve %ebx correctly in efi_set_virtual_address_map()
    efi/x86: Ignore the memory attributes table on i386
    efi/x86: Don't relocate the kernel unless necessary
    efi/x86: Remove extra headroom for setup block
    efi/x86: Add kernel preferred address to PE header
    efi/x86: Decompress at start of PE image load address
    x86/boot/compressed/32: Save the output address instead of recalculating it
    efi/libstub/x86: Deal with exit() boot service returning
    x86/boot: Use unsigned comparison for addresses
    efi/x86: Avoid using code32_start
    efi/x86: Make efi32_pe_entry() more readable
    efi/x86: Respect 32-bit ABI in efi32_pe_entry()
    efi/x86: Annotate the LOADED_IMAGE_PROTOCOL_GUID with SYM_DATA
    ...

    Linus Torvalds
     

30 Mar, 2020

2 commits

  • * The hooks are initialized using the definitions in
    include/linux/lsm_hook_defs.h.
    * The LSM can be enabled / disabled with CONFIG_BPF_LSM.

    Signed-off-by: KP Singh
    Signed-off-by: Daniel Borkmann
    Reviewed-by: Brendan Jackman
    Reviewed-by: Florent Revest
    Acked-by: Kees Cook
    Acked-by: James Morris
    Link: https://lore.kernel.org/bpf/20200329004356.27286-6-kpsingh@chromium.org

    KP Singh
     
  • The information about the different types of LSM hooks is scattered
    in two locations i.e. union security_list_options and
    struct security_hook_heads. Rather than duplicating this information
    even further for BPF_PROG_TYPE_LSM, define all the hooks with the
    LSM_HOOK macro in lsm_hook_defs.h which is then used to generate all
    the data structures required by the LSM framework.

    The LSM hooks are defined as:

    LSM_HOOK(, , , args...)

    with acccessible in security.c as:

    LSM_RET_DEFAULT()

    Signed-off-by: KP Singh
    Signed-off-by: Daniel Borkmann
    Reviewed-by: Brendan Jackman
    Reviewed-by: Florent Revest
    Reviewed-by: Kees Cook
    Reviewed-by: Casey Schaufler
    Acked-by: James Morris
    Link: https://lore.kernel.org/bpf/20200329004356.27286-3-kpsingh@chromium.org

    KP Singh
     

29 Mar, 2020

2 commits

  • By allocating a kernel buffer with a user-supplied buffer length, it
    is possible that a false positive ENOMEM error may be returned because
    the user-supplied length is just too large even if the system do have
    enough memory to hold the actual key data.

    Moreover, if the buffer length is larger than the maximum amount of
    memory that can be returned by kmalloc() (2^(MAX_ORDER-1) number of
    pages), a warning message will also be printed.

    To reduce this possibility, we set a threshold (PAGE_SIZE) over which we
    do check the actual key length first before allocating a buffer of the
    right size to hold it. The threshold is arbitrary, it is just used to
    trigger a buffer length check. It does not limit the actual key length
    as long as there is enough memory to satisfy the memory request.

    To further avoid large buffer allocation failure due to page
    fragmentation, kvmalloc() is used to allocate the buffer so that vmapped
    pages can be used when there is not a large enough contiguous set of
    pages available for allocation.

    In the extremely unlikely scenario that the key keeps on being changed
    and made longer (still
    Signed-off-by: David Howells

    Waiman Long
     
  • A lockdep circular locking dependency report was seen when running a
    keyutils test:

    [12537.027242] ======================================================
    [12537.059309] WARNING: possible circular locking dependency detected
    [12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
    [12537.125253] ------------------------------------------------------
    [12537.153189] keyctl/25598 is trying to acquire lock:
    [12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
    [12537.208365]
    [12537.208365] but task is already holding lock:
    [12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
    [12537.270476]
    [12537.270476] which lock already depends on the new lock.
    [12537.270476]
    [12537.307209]
    [12537.307209] the existing dependency chain (in reverse order) is:
    [12537.340754]
    [12537.340754] -> #3 (&type->lock_class){++++}:
    [12537.367434] down_write+0x4d/0x110
    [12537.385202] __key_link_begin+0x87/0x280
    [12537.405232] request_key_and_link+0x483/0xf70
    [12537.427221] request_key+0x3c/0x80
    [12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
    [12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
    [12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
    [12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
    [12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
    [12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
    [12537.601045] kthread+0x30c/0x3d0
    [12537.617906] ret_from_fork+0x3a/0x50
    [12537.636225]
    [12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
    [12537.664525] __mutex_lock+0x105/0x11f0
    [12537.683734] request_key_and_link+0x35a/0xf70
    [12537.705640] request_key+0x3c/0x80
    [12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
    [12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
    [12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
    [12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
    [12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
    [12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
    [12537.873477] kthread+0x30c/0x3d0
    [12537.890281] ret_from_fork+0x3a/0x50
    [12537.908649]
    [12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
    [12537.935225] __mutex_lock+0x105/0x11f0
    [12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
    [12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
    [12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
    [12538.023920] read_pages+0xf5/0x560
    [12538.041583] __do_page_cache_readahead+0x41d/0x4b0
    [12538.067047] ondemand_readahead+0x44c/0xc10
    [12538.092069] filemap_fault+0xec1/0x1830
    [12538.111637] __do_fault+0x82/0x260
    [12538.129216] do_fault+0x419/0xfb0
    [12538.146390] __handle_mm_fault+0x862/0xdf0
    [12538.167408] handle_mm_fault+0x154/0x550
    [12538.187401] __do_page_fault+0x42f/0xa60
    [12538.207395] do_page_fault+0x38/0x5e0
    [12538.225777] page_fault+0x1e/0x30
    [12538.243010]
    [12538.243010] -> #0 (&mm->mmap_sem){++++}:
    [12538.267875] lock_acquire+0x14c/0x420
    [12538.286848] __might_fault+0x119/0x1b0
    [12538.306006] keyring_read_iterator+0x7e/0x170
    [12538.327936] assoc_array_subtree_iterate+0x97/0x280
    [12538.352154] keyring_read+0xe9/0x110
    [12538.370558] keyctl_read_key+0x1b9/0x220
    [12538.391470] do_syscall_64+0xa5/0x4b0
    [12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
    [12538.435535]
    [12538.435535] other info that might help us debug this:
    [12538.435535]
    [12538.472829] Chain exists of:
    [12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
    [12538.472829]
    [12538.524820] Possible unsafe locking scenario:
    [12538.524820]
    [12538.551431] CPU0 CPU1
    [12538.572654] ---- ----
    [12538.595865] lock(&type->lock_class);
    [12538.613737] lock(root_key_user.cons_lock);
    [12538.644234] lock(&type->lock_class);
    [12538.672410] lock(&mm->mmap_sem);
    [12538.687758]
    [12538.687758] *** DEADLOCK ***
    [12538.687758]
    [12538.714455] 1 lock held by keyctl/25598:
    [12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
    [12538.770573]
    [12538.770573] stack backtrace:
    [12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
    [12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
    [12538.881963] Call Trace:
    [12538.892897] dump_stack+0x9a/0xf0
    [12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
    [12538.932891] ? save_trace+0xd6/0x250
    [12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
    [12538.971643] ? keyring_compare_object+0x104/0x190
    [12538.992738] ? check_usage+0x550/0x550
    [12539.009845] ? sched_clock+0x5/0x10
    [12539.025484] ? sched_clock_cpu+0x18/0x1e0
    [12539.043555] __lock_acquire+0x1f12/0x38d0
    [12539.061551] ? trace_hardirqs_on+0x10/0x10
    [12539.080554] lock_acquire+0x14c/0x420
    [12539.100330] ? __might_fault+0xc4/0x1b0
    [12539.119079] __might_fault+0x119/0x1b0
    [12539.135869] ? __might_fault+0xc4/0x1b0
    [12539.153234] keyring_read_iterator+0x7e/0x170
    [12539.172787] ? keyring_read+0x110/0x110
    [12539.190059] assoc_array_subtree_iterate+0x97/0x280
    [12539.211526] keyring_read+0xe9/0x110
    [12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
    [12539.249076] keyctl_read_key+0x1b9/0x220
    [12539.266660] do_syscall_64+0xa5/0x4b0
    [12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

    One way to prevent this deadlock scenario from happening is to not
    allow writing to userspace while holding the key semaphore. Instead,
    an internal buffer is allocated for getting the keys out from the
    read method first before copying them out to userspace without holding
    the lock.

    That requires taking out the __user modifier from all the relevant
    read methods as well as additional changes to not use any userspace
    write helpers. That is,

    1) The put_user() call is replaced by a direct copy.
    2) The copy_to_user() call is replaced by memcpy().
    3) All the fault handling code is removed.

    Compiling on a x86-64 system, the size of the rxrpc_read() function is
    reduced from 3795 bytes to 2384 bytes with this patch.

    Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
    Reviewed-by: Jarkko Sakkinen
    Signed-off-by: Waiman Long
    Signed-off-by: David Howells

    Waiman Long
     

25 Mar, 2020

2 commits


16 Mar, 2020

1 commit

  • Currently, when we add a new user key, the calltrace as below:

    add_key()
    key_create_or_update()
    key_alloc()
    __key_instantiate_and_link
    generic_key_instantiate
    key_payload_reserve
    ......

    Since commit a08bf91ce28e ("KEYS: allow reaching the keys quotas exactly"),
    we can reach max bytes/keys in key_alloc, but we forget to remove this
    limit when we reserver space for payload in key_payload_reserve. So we
    can only reach max keys but not max bytes when having delta between plen
    and type->def_datalen. Remove this limit when instantiating the key, so we
    can keep consistent with key_alloc.

    Also, fix the similar problem in keyctl_chown_key().

    Fixes: 0b77f5bfb45c ("keys: make the keyring quotas controllable through /proc/sys")
    Fixes: a08bf91ce28e ("KEYS: allow reaching the keys quotas exactly")
    Cc: stable@vger.kernel.org # 5.0.x
    Cc: Eric Biggers
    Signed-off-by: Yang Xu
    Reviewed-by: Jarkko Sakkinen
    Reviewed-by: Eric Biggers
    Signed-off-by: Jarkko Sakkinen

    Yang Xu
     

12 Mar, 2020

1 commit

  • Every time a new architecture defines the IMA architecture specific
    functions - arch_ima_get_secureboot() and arch_ima_get_policy(), the IMA
    include file needs to be updated. To avoid this "noise", this patch
    defines a new IMA Kconfig IMA_SECURE_AND_OR_TRUSTED_BOOT option, allowing
    the different architectures to select it.

    Suggested-by: Linus Torvalds
    Signed-off-by: Nayna Jain
    Acked-by: Ard Biesheuvel
    Acked-by: Philipp Rudo (s390)
    Acked-by: Michael Ellerman (powerpc)
    Signed-off-by: Mimi Zohar

    Nayna Jain
     

06 Mar, 2020

2 commits


29 Feb, 2020

3 commits

  • The #define for formatting log messages, pr_fmt, is duplicated in the
    files under security/integrity.

    This change moves the definition to security/integrity/integrity.h and
    removes the duplicate definitions in the other files under
    security/integrity.

    With this change, the messages in the following files will be prefixed
    with 'integrity'.

    security/integrity/platform_certs/platform_keyring.c
    security/integrity/platform_certs/load_powerpc.c
    security/integrity/platform_certs/load_uefi.c
    security/integrity/iint.c

    e.g. "integrity: Error adding keys to platform keyring %s\n"

    And the messages in the following file will be prefixed with 'ima'.

    security/integrity/ima/ima_mok.c

    e.g. "ima: Allocating IMA blacklist keyring.\n"

    For the rest of the files under security/integrity, there will be no
    change in the message format.

    Suggested-by: Shuah Khan
    Suggested-by: Joe Perches
    Signed-off-by: Tushar Sugandhi
    Reviewed-by: Lakshmi Ramasubramanian
    Signed-off-by: Mimi Zohar

    Tushar Sugandhi
     
  • process_buffer_measurement() does not have log messages for failure
    conditions.

    This change adds a log statement in the above function.

    Suggested-by: Joe Perches
    Signed-off-by: Tushar Sugandhi
    Reviewed-by: Mimi Zohar
    Reviewed-by: Lakshmi Ramasubramanian
    Signed-off-by: Mimi Zohar

    Tushar Sugandhi
     
  • The kbuild Makefile specifies object files for vmlinux in the $(obj-y)
    lists. These lists depend on the kernel configuration[1].

    The kbuild Makefile for IMA combines the object files for IMA into a
    single object file namely ima.o. All the object files for IMA should be
    combined into ima.o. But certain object files are being added to their
    own $(obj-y). This results in the log messages from those modules getting
    prefixed with their respective base file name, instead of "ima". This is
    inconsistent with the log messages from the IMA modules that are combined
    into ima.o.

    This change fixes the above issue.

    [1] Documentation\kbuild\makefiles.rst

    Signed-off-by: Tushar Sugandhi
    Reviewed-by: Mimi Zohar
    Reviewed-by: Lakshmi Ramasubramanian
    Signed-off-by: Mimi Zohar

    Tushar Sugandhi
     

28 Feb, 2020

2 commits

  • Remove initial SIDs that have never been used or are no longer used by
    the kernel from its string table, which is also used to generate the
    SECINITSID_* symbols referenced in code. Update the code to
    gracefully handle the fact that these can now be NULL. Stop treating
    it as an error if a policy defines additional initial SIDs unknown to
    the kernel. Do not load unused initial SID contexts into the sidtab.
    Fix the incorrect usage of the name from the ocontext in error
    messages when loading initial SIDs since these are not presently
    written to the kernel policy and are therefore always NULL.

    After this change, it is possible to safely reclaim and reuse some of
    the unused initial SIDs without compatibility issues. Specifically,
    unused initial SIDs that were being assigned the same context as the
    unlabeled initial SID in policies can be reclaimed and reused for
    another purpose, with existing policies still treating them as having
    the unlabeled context and future policies having the option of mapping
    them to a more specific context. For example, this could have been
    used when the infiniband labeling support was introduced to define
    initial SIDs for the default pkey and endport SIDs similar to the
    handling of port/netif/node SIDs rather than always using
    SECINITSID_UNLABELED as the default.

    The set of safely reclaimable unused initial SIDs across all known
    policies is igmp_packet (13), icmp_socket (14), tcp_socket (15), kmod
    (24), policy (25), and scmp_packet (26); these initial SIDs were
    assigned the same context as unlabeled in all known policies including
    mls. If only considering non-mls policies (i.e. assuming that mls
    users always upgrade policy with their kernels), the set of safely
    reclaimable unused initial SIDs further includes file_labels (6), init
    (7), sysctl_modprobe (16), and sysctl_fs (18) through sysctl_dev (23).

    Adding new initial SIDs beyond SECINITSID_NUM to policy unfortunately
    became a fatal error in commit 24ed7fdae669 ("selinux: use separate
    table for initial SID lookup") and even before that it could cause
    problems on a policy reload (collision between the new initial SID and
    one allocated at runtime) ever since commit 42596eafdd75 ("selinux:
    load the initial SIDs upon every policy load") so we cannot safely
    start adding new initial SIDs to policies beyond SECINITSID_NUM (27)
    until such a time as all such kernels do not need to be supported and
    only those that include this commit are relevant. That is not a big
    deal since we haven't added a new initial SID since 2004 (v2.6.7) and
    we have plenty of unused ones we can reclaim if we truly need one.

    If we want to avoid the wasted storage in initial_sid_to_string[]
    and/or sidtab->isids[] for the unused initial SIDs, we could introduce
    an indirection between the kernel initial SID values and the policy
    initial SID values and just map the policy SID values in the ocontexts
    to the kernel values during policy_load_isids(). Originally I thought
    we'd do this by preserving the initial SID names in the kernel policy
    and creating a mapping at load time like we do for the security
    classes and permissions but that would require a new kernel policy
    format version and associated changes to libsepol/checkpolicy and I'm
    not sure it is justified. Simpler approach is just to create a fixed
    mapping table in the kernel from the existing fixed policy values to
    the kernel values. Less flexible but probably sufficient.

    A separate selinux userspace change was applied in
    https://github.com/SELinuxProject/selinux/commit/8677ce5e8f592950ae6f14cea1b68a20ddc1ac25
    to enable removal of most of the unused initial SID contexts from
    policies, but there is no dependency between that change and this one.
    That change permits removing all of the unused initial SID contexts
    from policy except for the fs and sysctl SID contexts. The initial
    SID declarations themselves would remain in policy to preserve the
    values of subsequent ones but the contexts can be dropped. If/when
    the kernel decides to reuse one of them, future policies can change
    the name and start assigning a context again without breaking
    compatibility.

    Here is how I would envision staging changes to the initial SIDs in a
    compatible manner after this commit is applied:

    1. At any time after this commit is applied, the kernel could choose
    to reclaim one of the safely reclaimable unused initial SIDs listed
    above for a new purpose (i.e. replace its NULL entry in the
    initial_sid_to_string[] table with a new name and start using the
    newly generated SECINITSID_name symbol in code), and refpolicy could
    at that time rename its declaration of that initial SID to reflect its
    new purpose and start assigning it a context going
    forward. Existing/old policies would map the reclaimed initial SID to
    the unlabeled context, so that would be the initial default behavior
    until policies are updated. This doesn't depend on the selinux
    userspace change; it will work with existing policies and userspace.

    2. In 6 months or so we'll have another SELinux userspace release that
    will include the libsepol/checkpolicy support for omitting unused
    initial SID contexts.

    3. At any time after that release, refpolicy can make that release its
    minimum build requirement and drop the sid context statements (but not
    the sid declarations) for all of the unused initial SIDs except for
    fs and sysctl, which must remain for compatibility on policy
    reload with old kernels and for compatibility with kernels that were
    still using SECINITSID_SYSCTL (< 2.6.39). This doesn't depend on this
    kernel commit; it will work with previous kernels as well.

    4. After N years for some value of N, refpolicy decides that it no
    longer cares about policy reload compatibility for kernels that
    predate this kernel commit, and refpolicy drops the fs and sysctl
    SID contexts from policy too (but retains the declarations).

    5. After M years for some value of M, the kernel decides that it no
    longer cares about compatibility with refpolicies that predate step 4
    (dropping the fs and sysctl SIDs), and those two SIDs also become
    safely reclaimable. This step is optional and need not ever occur unless
    we decide that the need to reclaim those two SIDs outweighs the
    compatibility cost.

    6. After O years for some value of O, refpolicy decides that it no
    longer cares about policy load (not just reload) compatibility for
    kernels that predate this kernel commit, and both kernel and refpolicy
    can then start adding and using new initial SIDs beyond 27. This does
    not depend on the previous change (step 5) and can occur independent
    of it.

    Fixes: https://github.com/SELinuxProject/selinux-kernel/issues/12
    Signed-off-by: Stephen Smalley
    Signed-off-by: Paul Moore

    Stephen Smalley
     
  • Instead allocate hash tables with just the right size based on the
    actual number of elements (which is almost always known beforehand, we
    just need to defer the hashtab allocation to the right time). The only
    case when we don't know the size (with the current policy format) is the
    new filename transitions hashtable. Here I just left the existing value.

    After this patch, the time to load Fedora policy on x86_64 decreases
    from 790 ms to 167 ms. If the unconfined module is removed, it decreases
    from 750 ms to 122 ms. It is also likely that other operations are going
    to be faster, mainly string_to_context_struct() or mls_compute_sid(),
    but I didn't try to quantify that.

    The memory usage of all hash table arrays increases from ~58 KB to
    ~163 KB (with Fedora policy on x86_64).

    Signed-off-by: Ondrej Mosnacek
    Acked-by: Stephen Smalley
    Signed-off-by: Paul Moore

    Ondrej Mosnacek
     

26 Feb, 2020

1 commit

  • Pull EFI updates for v5.7 from Ard Biesheuvel:

    This time, the set of changes for the EFI subsystem is much larger than
    usual. The main reasons are:

    - Get things cleaned up before EFI support for RISC-V arrives, which will
    increase the size of the validation matrix, and therefore the threshold to
    making drastic changes,

    - After years of defunct maintainership, the GRUB project has finally started
    to consider changes from the distros regarding UEFI boot, some of which are
    highly specific to the way x86 does UEFI secure boot and measured boot,
    based on knowledge of both shim internals and the layout of bootparams and
    the x86 setup header. Having this maintenance burden on other architectures
    (which don't need shim in the first place) is hard to justify, so instead,
    we are introducing a generic Linux/UEFI boot protocol.

    Summary of changes:

    - Boot time GDT handling changes (Arvind)

    - Simplify handling of EFI properties table on arm64

    - Generic EFI stub cleanups, to improve command line handling, file I/O,
    memory allocation, etc.

    - Introduce a generic initrd loading method based on calling back into
    the firmware, instead of relying on the x86 EFI handover protocol or
    device tree.

    - Introduce a mixed mode boot method that does not rely on the x86 EFI
    handover protocol either, and could potentially be adopted by other
    architectures (if another one ever surfaces where one execution mode
    is a superset of another)

    - Clean up the contents of struct efi, and move out everything that
    doesn't need to be stored there.

    - Incorporate support for UEFI spec v2.8A changes that permit firmware
    implementations to return EFI_UNSUPPORTED from UEFI runtime services at
    OS runtime, and expose a mask of which ones are supported or unsupported
    via a configuration table.

    - Various documentation updates and minor code cleanups (Heinrich)

    - Partial fix for the lack of by-VA cache maintenance in the decompressor
    on 32-bit ARM. Note that these patches were deliberately put at the
    beginning so they can be used as a stable branch that will be shared with
    a PR containing the complete fix, which I will send to the ARM tree.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

24 Feb, 2020

1 commit


23 Feb, 2020

2 commits

  • Add Q_XQUOTAOFF, Q_XQUOTAON and Q_XSETQLIM to trigger filesystem quotamod
    permission check.

    Add Q_XGETQUOTA, Q_XGETQSTAT, Q_XGETQSTATV and Q_XGETNEXTQUOTA to trigger
    filesystem quotaget permission check.

    Signed-off-by: Richard Haines
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Paul Moore

    Richard Haines
     
  • In these rules, each rule with the same (target type, target class,
    filename) values is (in practice) always mapped to the same result type.
    Therefore, it is much more efficient to group the rules by (ttype,
    tclass, filename).

    Thus, this patch drops the stype field from the key and changes the
    datum to be a linked list of one or more structures that contain a
    result type and an ebitmap of source types that map the given target to
    the given result type under the given filename. The size of the hash
    table is also incremented to 2048 to be more optimal for Fedora policy
    (which currently has ~2500 unique (ttype, tclass, filename) tuples,
    regardless of whether the 'unconfined' module is enabled).

    Not only does this dramtically reduce memory usage when the policy
    contains a lot of unconfined domains (ergo a lot of filename based
    transitions), but it also slightly reduces memory usage of strongly
    confined policies (modeled on Fedora policy with 'unconfined' module
    disabled) and significantly reduces lookup times of these rules on
    Fedora (roughly matches the performance of the rhashtable conversion
    patch [1] posted recently to selinux@vger.kernel.org).

    An obvious next step is to change binary policy format to match this
    layout, so that disk space is also saved. However, since that requires
    more work (including matching userspace changes) and this patch is
    already beneficial on its own, I'm posting it separately.

    Performance/memory usage comparison:

    Kernel | Policy load | Policy load | Mem usage | Mem usage | openbench
    | | (-unconfined) | | (-unconfined) | (createfiles)
    -----------------|-------------|---------------|-----------|---------------|--------------
    reference | 1,30s | 0,91s | 90MB | 77MB | 55 us/file
    rhashtable patch | 0.98s | 0,85s | 85MB | 75MB | 38 us/file
    this patch | 0,95s | 0,87s | 75MB | 75MB | 40 us/file

    (Memory usage is measured after boot. With SELinux disabled the memory
    usage was ~60MB on the same system.)

    [1] https://lore.kernel.org/selinux/20200116213937.77795-1-dev@lynxeye.de/T/

    Signed-off-by: Ondrej Mosnacek
    Acked-by: Stephen Smalley
    Signed-off-by: Paul Moore

    Ondrej Mosnacek
     

21 Feb, 2020

1 commit

  • Pull IMA fixes from Mimi Zohar:
    "Two bug fixes and an associated change for each.

    The one that adds SM3 to the IMA list of supported hash algorithms is
    a simple change, but could be considered a new feature"

    * 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
    ima: add sm3 algorithm to hash algorithm configuration list
    crypto: rename sm3-256 to sm3 in hash_algo_name
    efi: Only print errors about failing to get certs if EFI vars are found
    x86/ima: use correct identifier for SetupMode variable

    Linus Torvalds
     

18 Feb, 2020

2 commits

  • sm3 has been supported by the ima hash algorithm, but it is not
    yet in the Kconfig configuration list. After adding, both ima and tpm2
    can support sm3 well.

    Signed-off-by: Tianjia Zhang
    Signed-off-by: Mimi Zohar

    Tianjia Zhang
     
  • If CONFIG_LOAD_UEFI_KEYS is enabled, the kernel attempts to load the certs
    from the db, dbx and MokListRT EFI variables into the appropriate keyrings.

    But it just assumes that the variables will be present and prints an error
    if the certs can't be loaded, even when is possible that the variables may
    not exist. For example the MokListRT variable will only be present if shim
    is used.

    So only print an error message about failing to get the certs list from an
    EFI variable if this is found. Otherwise these printed errors just pollute
    the kernel log ring buffer with confusing messages like the following:

    [ 5.427251] Couldn't get size: 0x800000000000000e
    [ 5.427261] MODSIGN: Couldn't get UEFI db list
    [ 5.428012] Couldn't get size: 0x800000000000000e
    [ 5.428023] Couldn't get UEFI MokListRT

    Reported-by: Hans de Goede
    Signed-off-by: Javier Martinez Canillas
    Tested-by: Hans de Goede
    Acked-by: Ard Biesheuvel
    Signed-off-by: Mimi Zohar

    Javier Martinez Canillas
     

14 Feb, 2020

1 commit


12 Feb, 2020

5 commits