29 Oct, 2020

2 commits


26 Oct, 2020

3 commits

  • …nux/kernel/git/mchehab/linux-media") into android-mainline

    Steps on the way to 5.10-rc1

    Resolves conflicts in:
    fs/userfaultfd.c

    Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
    Change-Id: Ie3fe3c818f1f6565cfd4fa551de72d2b72ef60af

    Greg Kroah-Hartman
     
  • tid_addr is not a "pointer to (pointer to int in userspace)"; it is in
    fact a "pointer to (pointer to int in userspace) in userspace". So
    sparse rightfully complains about passing a kernel pointer to
    put_user().

    Reported-by: kernel test robot
    Signed-off-by: Rasmus Villemoes
    Signed-off-by: Linus Torvalds

    Rasmus Villemoes
     
  • Pull SafeSetID updates from Micah Morton:
    "The changes are mostly contained to within the SafeSetID LSM, with the
    exception of a few 1-line changes to change some ns_capable() calls to
    ns_capable_setid() -- causing a flag (CAP_OPT_INSETID) to be set that
    is examined by SafeSetID code and nothing else in the kernel.

    The changes to SafeSetID internally allow for setting up GID
    transition security policies, as already existed for UIDs"

    * tag 'safesetid-5.10' of git://github.com/micah-morton/linux:
    LSM: SafeSetID: Fix warnings reported by test bot
    LSM: SafeSetID: Add GID security policy handling
    LSM: Signal to SafeSetID when setting group IDs

    Linus Torvalds
     

17 Oct, 2020

1 commit


14 Oct, 2020

1 commit

  • For SafeSetID to properly gate set*gid() calls, it needs to know whether
    ns_capable() is being called from within a sys_set*gid() function or is
    being called from elsewhere in the kernel. This allows SafeSetID to deny
    CAP_SETGID to restricted groups when they are attempting to use the
    capability for code paths other than updating GIDs (e.g. setting up
    userns GID mappings). This is the identical approach to what is
    currently done for CAP_SETUID.

    NOTE: We also add signaling to SafeSetID from the setgroups() syscall,
    as we have future plans to restrict a process' ability to set
    supplementary groups in addition to what is added in this series for
    restricting setting of the primary group.

    Signed-off-by: Thomas Cedeno
    Signed-off-by: Micah Morton

    Thomas Cedeno
     

01 Sep, 2020

1 commit


24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and their variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
    fall-through markings where applicable.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
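
    As a rough userspace illustration of the pseudo-keyword, the sketch
    below defines fallthrough on top of the compiler attribute, with a no-op
    fallback. The levels_granted() helper is hypothetical and exists only to
    show the annotation in a switch statement.

```c
#include <stdio.h>

/* Assumption: approximates the kernel's definition; compilers without the
 * attribute get a no-op statement instead. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)  /* fallthrough */
#endif

/* Hypothetical helper: count how many levels apply at or below 'level'. */
static int levels_granted(int level)
{
	int granted = 0;

	switch (level) {
	case 2:
		granted++;
		fallthrough;	/* intentional: level 2 implies level 1 */
	case 1:
		granted++;
		fallthrough;	/* intentional: level 1 implies level 0 */
	case 0:
		granted++;
		break;
	default:
		break;
	}
	return granted;
}
```

    Building with -Wimplicit-fallthrough should flag any case that falls
    through without the annotation.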

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
     

07 Aug, 2020

1 commit


20 Jul, 2020

2 commits

  • This brings consistency with the rest of the prctl() syscall where
    -EPERM is returned when failing a capability check.

    Signed-off-by: Nicolas Viennot
    Signed-off-by: Adrian Reber
    Reviewed-by: Serge Hallyn
    Link: https://lore.kernel.org/r/20200719100418.2112740-7-areber@redhat.com
    Signed-off-by: Christian Brauner

    Nicolas Viennot
     
  • Originally, only a local CAP_SYS_ADMIN could change the exe link,
    making it difficult for doing checkpoint/restore without CAP_SYS_ADMIN.
    This commit adds CAP_CHECKPOINT_RESTORE in addition to CAP_SYS_ADMIN
    for permitting changing the exe link.
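
    From userspace, changing the exe link goes through prctl(PR_SET_MM,
    PR_SET_MM_EXE_FILE, fd). The sketch below is a minimal wrapper; the
    set_exe_link() name is hypothetical, and whether the call succeeds
    depends on the caller's capabilities and on the kernel configuration.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Try to point /proc/self/exe at the file open on 'fd'.
 * Returns 0 on success, or a negative errno value on failure. */
static int set_exe_link(int fd)
{
	if (prctl(PR_SET_MM, PR_SET_MM_EXE_FILE, (unsigned long)fd, 0UL, 0UL) == -1)
		return -errno;
	return 0;
}
```

    A checkpoint/restore tool would typically open the original binary and
    pass that descriptor; an unprivileged caller should expect a failure.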

    The following describes the history of the /proc/self/exe permission
    checks as it may be difficult to understand what decisions led to this
    point.

    * [1] May 2012: This commit introduces the ability of changing
    /proc/self/exe if the user is CAP_SYS_RESOURCE capable.
    In the related discussion [2], no clear threat model is presented for
    what could happen if the /proc/self/exe changes multiple times, or why
    the admin would be at the mercy of userspace.

    * [3] Oct 2014: This commit introduces a new API to change
    /proc/self/exe. The permission no longer checks for CAP_SYS_RESOURCE,
    but instead checks if the current user is root (uid=0) in its local
    namespace. In the related discussion [4] it is said that "Controlling
    exe_fd without privileges may turn out to be dangerous. At least
    things like tomoyo examine it for making policy decisions (see
    tomoyo_manager())."

    * [5] Dec 2016: This commit removes the restriction to change
    /proc/self/exe at most once. The related discussion [6] informs that
    the audit subsystem relies on the exe symlink, presumably
    audit_log_d_path_exe() in kernel/audit.c.

    * [7] May 2017: This commit changed the check from uid==0 to local
    CAP_SYS_ADMIN. No discussion.

    * [8] July 2020: A PoC to spoof any program's /proc/self/exe via ptrace
    is demonstrated.

    Overall, the concrete points that were made to retain capability checks
    around changing the exe symlink are that tomoyo_manager() and
    audit_log_d_path_exe() use the exe_file path.

    Christian Brauner said that relying on /proc/<pid>/exe being immutable (or
    guarded by caps) for the sake of security is a bit misleading. It can only
    be used as a hint without any guarantees of what code is being executed
    once execve() returns to userspace. Christian suggested that in the
    future, we could call audit_log() or similar to inform the admin of all
    exe link changes, instead of attempting to provide security guarantees
    via permission checks. However, this proposed change requires
    understanding the security implications in the tomoyo/audit subsystems.

    [1] b32dfe377102 ("c/r: prctl: add ability to set new mm_struct::exe_file")
    [2] https://lore.kernel.org/patchwork/patch/292515/
    [3] f606b77f1a9e ("prctl: PR_SET_MM -- introduce PR_SET_MM_MAP operation")
    [4] https://lore.kernel.org/patchwork/patch/479359/
    [5] 3fb4afd9a504 ("prctl: remove one-shot limitation for changing exe link")
    [6] https://lore.kernel.org/patchwork/patch/697304/
    [7] 4d28df6152aa ("prctl: Allow local CAP_SYS_ADMIN changing exe_file")
    [8] https://github.com/nviennot/run_as_exe

    Signed-off-by: Nicolas Viennot
    Signed-off-by: Adrian Reber
    Link: https://lore.kernel.org/r/20200719100418.2112740-6-areber@redhat.com
    Signed-off-by: Christian Brauner

    Nicolas Viennot
     

25 Jun, 2020

2 commits


24 Jun, 2020

1 commit


15 Jun, 2020

2 commits

  • Pull SafeSetID update from Micah Morton:
    "Add additional LSM hooks for SafeSetID

    SafeSetID is capable of making allow/deny decisions for set*uid calls
    on a system, and we want to add similar functionality for set*gid
    calls.

    The work to do that is not yet complete, so probably won't make it in
    for v5.8, but we are looking to get this simple patch in for v5.8
    since we have it ready.

    We are planning on the rest of the work for extending the SafeSetID
    LSM being merged during the v5.9 merge window"

    * tag 'LSM-add-setgid-hook-5.8-author-fix' of git://github.com/micah-morton/linux:
    security: Add LSM hooks to set*gid syscalls

    Linus Torvalds
     
  • The SafeSetID LSM uses the security_task_fix_setuid hook to filter
    set*uid() syscalls according to its configured security policy. In
    preparation for adding analogous support in the LSM for set*gid()
    syscalls, we add the requisite hook here. Tested by putting print
    statements in the security_task_fix_setgid hook and seeing them get hit
    during kernel boot.

    Signed-off-by: Thomas Cedeno
    Signed-off-by: Micah Morton

    Thomas Cedeno
     

12 Jun, 2020

2 commits


10 Jun, 2020

2 commits

  • Convert comments that reference mmap_sem to reference mmap_lock instead.

    [akpm@linux-foundation.org: fix up linux-next leftovers]
    [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
    [akpm@linux-foundation.org: more linux-next fixups, per Michel]

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Daniel Jordan
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • This change converts the existing mmap_sem rwsem calls to use the new mmap
    locking API instead.

    The change is generated using coccinelle with the following rule:

    // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

    @@
    expression mm;
    @@
    (
    -init_rwsem
    +mmap_init_lock
    |
    -down_write
    +mmap_write_lock
    |
    -down_write_killable
    +mmap_write_lock_killable
    |
    -down_write_trylock
    +mmap_write_trylock
    |
    -up_write
    +mmap_write_unlock
    |
    -downgrade_write
    +mmap_write_downgrade
    |
    -down_read
    +mmap_read_lock
    |
    -down_read_killable
    +mmap_read_lock_killable
    |
    -down_read_trylock
    +mmap_read_trylock
    |
    -up_read
    +mmap_read_unlock
    )
    -(&mm->mmap_sem)
    +(mm)

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Daniel Jordan
    Reviewed-by: Laurent Dufour
    Reviewed-by: Vlastimil Babka
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

03 Jun, 2020

2 commits

  • Merge updates from Andrew Morton:
    "A few little subsystems and a start of a lot of MM patches.

    Subsystems affected by this patch series: squashfs, ocfs2, parisc,
    vfs. With mm subsystems: slab-generic, slub, debug, pagecache, gup,
    swap, memcg, pagemap, memory-failure, vmalloc, kasan"

    * emailed patches from Andrew Morton: (128 commits)
    kasan: move kasan_report() into report.c
    mm/mm_init.c: report kasan-tag information stored in page->flags
    ubsan: entirely disable alignment checks under UBSAN_TRAP
    kasan: fix clang compilation warning due to stack protector
    x86/mm: remove vmalloc faulting
    mm: remove vmalloc_sync_(un)mappings()
    x86/mm/32: implement arch_sync_kernel_mappings()
    x86/mm/64: implement arch_sync_kernel_mappings()
    mm/ioremap: track which page-table levels were modified
    mm/vmalloc: track which page-table levels were modified
    mm: add functions to track page directory modifications
    s390: use __vmalloc_node in stack_alloc
    powerpc: use __vmalloc_node in alloc_vm_stack
    arm64: use __vmalloc_node in arch_alloc_vmap_stack
    mm: remove vmalloc_user_node_flags
    mm: switch the test_vmalloc module to use __vmalloc_node
    mm: remove __vmalloc_node_flags_caller
    mm: remove both instances of __vmalloc_node_flags
    mm: remove the prot argument to __vmalloc_node
    mm: remove the pgprot argument to __vmalloc
    ...

    Linus Torvalds
     
  • PF_LESS_THROTTLE exists for loop-back nfsd (and a similar need in the
    loop block driver and callers of prctl(PR_SET_IO_FLUSHER)), where a
    daemon needs to write to one bdi (the final bdi) in order to free up
    writes queued to another bdi (the client bdi).

    The daemon sets PF_LESS_THROTTLE and gets a larger allowance of dirty
    pages, so that it can still dirty pages after other processes have been
    throttled. The purpose of this is to avoid deadlocks that happen when
    the PF_LESS_THROTTLE process must write for any dirty pages to be freed,
    but it is being throttled and cannot write.

    This approach was designed when all threads were blocked equally,
    independently of which device they were writing to, or how fast it was.
    Since that time the writeback algorithm has changed substantially with
    different threads getting different allowances based on non-trivial
    heuristics. This means the simple "add 25%" heuristic is no longer
    reliable.

    The important issue is not that the daemon needs a *larger* dirty page
    allowance, but that it needs a *private* dirty page allowance, so that
    dirty pages for the "client" bdi that it is helping to clear (the bdi
    for an NFS filesystem or loop block device etc) do not affect the
    throttling of the daemon writing to the "final" bdi.

    This patch changes the heuristic so that the task is not throttled when
    the bdi it is writing to has a dirty page count below (or equal
    to) the free-run threshold for that bdi. This ensures it will always be
    able to have some pages in flight, and so will not deadlock.

    In a steady-state, it is expected that PF_LOCAL_THROTTLE tasks might
    still be throttled by global threshold, but that is acceptable as it is
    only the deadlock state that is interesting for this flag.

    This approach of "only throttle when target bdi is busy" is consistent
    with the other use of PF_LESS_THROTTLE in current_may_throttle(), where
    it causes attention to be focussed only on the target bdi.

    So this patch
    - renames PF_LESS_THROTTLE to PF_LOCAL_THROTTLE,
    - removes the 25% bonus that the flag gives, and
    - does not delay at all if PF_LOCAL_THROTTLE is set, unless the
    global and the local free-run thresholds are exceeded.

    Note that previously realtime threads were treated the same as
    PF_LESS_THROTTLE threads. This patch does *not* change the behaviour
    for real-time threads, so it is now different from the behaviour of nfsd
    and loop tasks. I don't know what is wanted for realtime.

    [akpm@linux-foundation.org: coding style fixes]
    Signed-off-by: NeilBrown
    Signed-off-by: Andrew Morton
    Reviewed-by: Jan Kara
    Acked-by: Chuck Lever [nfsd]
    Cc: Christoph Hellwig
    Cc: Michal Hocko
    Cc: Trond Myklebust
    Link: http://lkml.kernel.org/r/87ftbf7gs3.fsf@notabene.neil.brown.name
    Signed-off-by: Linus Torvalds

    NeilBrown
     

26 Apr, 2020

1 commit


16 Mar, 2020

1 commit


04 Mar, 2020

1 commit

  • The sysinfo() syscall includes uptime in seconds but has no correction for
    time namespaces which makes it inconsistent with the /proc/uptime inside of
    a time namespace.

    Add the missing time namespace adjustment call.
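
    The inconsistency can be observed from userspace by comparing the two
    sources. The sketch below uses hypothetical helper names; inside a time
    namespace with a non-zero offset, the two values should agree only on a
    kernel that carries this adjustment.

```c
#include <stdio.h>
#include <sys/sysinfo.h>

/* Uptime in seconds as reported by sysinfo(), or -1 on error. */
static long sysinfo_uptime(void)
{
	struct sysinfo si;

	if (sysinfo(&si) != 0)
		return -1;
	return si.uptime;
}

/* Uptime in whole seconds as reported by /proc/uptime, or -1 on error. */
static long proc_uptime(void)
{
	double up = 0.0;
	FILE *f = fopen("/proc/uptime", "r");

	if (!f)
		return -1;
	if (fscanf(f, "%lf", &up) != 1)
		up = -1.0;
	fclose(f);
	return (long)up;
}
```

    Running both helpers back to back inside a time namespace and comparing
    the results is a simple way to check for the missing adjustment.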

    Signed-off-by: Cyril Hrubis
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Dmitry Safonov
    Link: https://lkml.kernel.org/r/20200303150638.7329-1-chrubis@suse.cz

    Cyril Hrubis
     

03 Feb, 2020

1 commit


28 Jan, 2020

1 commit

  • There are several storage drivers like dm-multipath, iscsi, tcmu-runner,
    and nbd that have userspace components that can run in the IO path. For
    example, iscsi and nbd's userspace daemons may need to recreate a socket
    and/or send IO on it, and dm-multipath's daemon multipathd may need to
    send SG IO or read/write IO to figure out the state of paths and re-set
    them up.

    In the kernel these drivers have access to GFP_NOIO/GFP_NOFS and the
    memalloc_*_save/restore functions to control the allocation behavior,
    but for userspace we would end up hitting an allocation that ended up
    writing data back to the same device we are trying to allocate for.
    The device is then in a state of deadlock, because to execute IO the
    device needs to allocate memory, but to allocate memory the memory
    layers want to execute IO to the device.

    Here is an example with nbd using a local userspace daemon that performs
    network IO to a remote server. We are using XFS on top of the nbd device,
    but it can happen with any FS or other modules layered on top of the nbd
    device that can write out data to free memory. Here a nbd daemon helper
    thread, msgr-worker-1, is performing a write/sendmsg on a socket to execute
    a request. This kicks off a reclaim operation which results in a WRITE to
    the nbd device and the nbd thread calling back into the mm layer.

    [ 1626.609191] msgr-worker-1 D 0 1026 1 0x00004000
    [ 1626.609193] Call Trace:
    [ 1626.609195] ? __schedule+0x29b/0x630
    [ 1626.609197] ? wait_for_completion+0xe0/0x170
    [ 1626.609198] schedule+0x30/0xb0
    [ 1626.609200] schedule_timeout+0x1f6/0x2f0
    [ 1626.609202] ? blk_finish_plug+0x21/0x2e
    [ 1626.609204] ? _xfs_buf_ioapply+0x2e6/0x410
    [ 1626.609206] ? wait_for_completion+0xe0/0x170
    [ 1626.609208] wait_for_completion+0x108/0x170
    [ 1626.609210] ? wake_up_q+0x70/0x70
    [ 1626.609212] ? __xfs_buf_submit+0x12e/0x250
    [ 1626.609214] ? xfs_bwrite+0x25/0x60
    [ 1626.609215] xfs_buf_iowait+0x22/0xf0
    [ 1626.609218] __xfs_buf_submit+0x12e/0x250
    [ 1626.609220] xfs_bwrite+0x25/0x60
    [ 1626.609222] xfs_reclaim_inode+0x2e8/0x310
    [ 1626.609224] xfs_reclaim_inodes_ag+0x1b6/0x300
    [ 1626.609227] xfs_reclaim_inodes_nr+0x31/0x40
    [ 1626.609228] super_cache_scan+0x152/0x1a0
    [ 1626.609231] do_shrink_slab+0x12c/0x2d0
    [ 1626.609233] shrink_slab+0x9c/0x2a0
    [ 1626.609235] shrink_node+0xd7/0x470
    [ 1626.609237] do_try_to_free_pages+0xbf/0x380
    [ 1626.609240] try_to_free_pages+0xd9/0x1f0
    [ 1626.609245] __alloc_pages_slowpath+0x3a4/0xd30
    [ 1626.609251] ? ___slab_alloc+0x238/0x560
    [ 1626.609254] __alloc_pages_nodemask+0x30c/0x350
    [ 1626.609259] skb_page_frag_refill+0x97/0xd0
    [ 1626.609274] sk_page_frag_refill+0x1d/0x80
    [ 1626.609279] tcp_sendmsg_locked+0x2bb/0xdd0
    [ 1626.609304] tcp_sendmsg+0x27/0x40
    [ 1626.609307] sock_sendmsg+0x54/0x60
    [ 1626.609308] ___sys_sendmsg+0x29f/0x320
    [ 1626.609313] ? sock_poll+0x66/0xb0
    [ 1626.609318] ? ep_item_poll.isra.15+0x40/0xc0
    [ 1626.609320] ? ep_send_events_proc+0xe6/0x230
    [ 1626.609322] ? hrtimer_try_to_cancel+0x54/0xf0
    [ 1626.609324] ? ep_read_events_proc+0xc0/0xc0
    [ 1626.609326] ? _raw_write_unlock_irq+0xa/0x20
    [ 1626.609327] ? ep_scan_ready_list.constprop.19+0x218/0x230
    [ 1626.609329] ? __hrtimer_init+0xb0/0xb0
    [ 1626.609331] ? _raw_spin_unlock_irq+0xa/0x20
    [ 1626.609334] ? ep_poll+0x26c/0x4a0
    [ 1626.609337] ? tcp_tsq_write.part.54+0xa0/0xa0
    [ 1626.609339] ? release_sock+0x43/0x90
    [ 1626.609341] ? _raw_spin_unlock_bh+0xa/0x20
    [ 1626.609342] __sys_sendmsg+0x47/0x80
    [ 1626.609347] do_syscall_64+0x5f/0x1c0
    [ 1626.609349] ? prepare_exit_to_usermode+0x75/0xa0
    [ 1626.609351] entry_SYSCALL_64_after_hwframe+0x44/0xa9

    This patch adds a new prctl command that daemons can use after they have
    done their initial setup, and before they start to do allocations that
    are in the IO path. It sets the PF_MEMALLOC_NOIO and PF_LESS_THROTTLE
    flags so both userspace block and FS threads can use it to avoid the
    allocation recursion and try to avoid being throttled while
    writing out data to free up memory.
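
    A daemon could issue the new command roughly as sketched below. The
    helper names are hypothetical, the fallback constants are the values
    from the uapi prctl header, and the call is expected to fail for
    callers lacking the required capability (CAP_SYS_RESOURCE, as far as
    this sketch assumes).

```c
#include <errno.h>
#include <sys/prctl.h>

/* Fallbacks for older libc headers; values taken from linux/prctl.h. */
#ifndef PR_SET_IO_FLUSHER
#define PR_SET_IO_FLUSHER 57
#endif
#ifndef PR_GET_IO_FLUSHER
#define PR_GET_IO_FLUSHER 58
#endif

/* Mark the calling thread as an IO flusher once initial setup is done.
 * Returns 0 on success, or a negative errno value on failure. */
static int mark_io_flusher(void)
{
	if (prctl(PR_SET_IO_FLUSHER, 1UL, 0UL, 0UL, 0UL) == -1)
		return -errno;
	return 0;
}

/* Returns 1 if the flusher state is set, 0 if not, negative errno on error. */
static int is_io_flusher(void)
{
	int ret = prctl(PR_GET_IO_FLUSHER, 0UL, 0UL, 0UL, 0UL);

	return ret == -1 ? -errno : ret;
}
```

    The call belongs after socket and memory setup and before the daemon
    starts servicing requests in the IO path.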

    Signed-off-by: Mike Christie
    Acked-by: Michal Hocko
    Tested-by: Masato Suzuki
    Reviewed-by: Damien Le Moal
    Reviewed-by: Bart Van Assche
    Reviewed-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Link: https://lore.kernel.org/r/20191112001900.9206-1-mchristi@redhat.com
    Signed-off-by: Christian Brauner

    Mike Christie
     

09 Dec, 2019

1 commit


05 Dec, 2019

1 commit

  • Initialization is not guaranteed to zero padding bytes so use an
    explicit memset instead to avoid leaking any kernel content in any
    possible padding bytes.
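
    The pattern can be sketched as follows. The struct and helper are
    hypothetical; on most ABIs the compiler inserts padding after 'tag',
    and a designated initializer is not guaranteed to clear it, whereas
    memset() writes every byte of the object.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical struct: padding usually follows 'tag' so that 'value' is
 * naturally aligned. */
struct report {
	char tag;
	long value;
};

/* Fill '*out' so that every byte, padding included, starts out as zero. */
static void fill_report(struct report *out, char tag, long value)
{
	memset(out, 0, sizeof(*out));	/* clears padding bytes as well */
	out->tag = tag;
	out->value = value;
}
```

    A buffer prepared this way can be copied out byte for byte without
    carrying stale stack contents in the padding.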

    Link: http://lkml.kernel.org/r/dfa331c00881d61c8ee51577a082d8bebd61805c.camel@perches.com
    Signed-off-by: Joe Perches
    Cc: Dan Carpenter
    Cc: Julia Lawall
    Cc: Thomas Gleixner
    Cc: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     

15 Nov, 2019

1 commit

  • There are two 'struct timeval' fields in 'struct rusage'.

    Unfortunately the definition of timeval is now ambiguous when used in
    user space with a libc that has a 64-bit time_t, and this also changes
    the 'rusage' definition in user space in a way that is incompatible with
    the system call interface.

    While there is no good solution to avoid all ambiguity here, change
    the definition in the kernel headers to be compatible with the kernel
    ABI, using __kernel_old_timeval as an unambiguous base type.

    In previous discussions, there was also a plan to add a replacement
    for rusage based on 64-bit timestamps and nanosecond resolution,
    i.e. 'struct __kernel_timespec'. I have patches for that as well,
    if anyone thinks we should do that.

    Reviewed-by: Cyrill Gorcunov
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

21 Sep, 2019

1 commit

  • This merges Linus's tree as of commit b41dae061bbd ("Merge tag
    'xfs-5.4-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux")
    into android-mainline.

    This "early" merge makes it easier to test and handle merge conflicts
    instead of having to wait until the "end" of the merge window and handle
    all 10000+ commits at once.

    Signed-off-by: Greg Kroah-Hartman
    Change-Id: I6bebf55e5e2353f814e3c87f5033607b1ae5d812

    Greg Kroah-Hartman
     

18 Sep, 2019

1 commit

  • Pull core timer updates from Thomas Gleixner:
    "Timers and timekeeping updates:

    - A large overhaul of the posix CPU timer code which is a preparation
    for moving the CPU timer expiry out into task work so it can be
    properly accounted on the task/process.

    An update to the bogus permission checks will come later during the
    merge window as feedback was not complete before heading off for
    travel.

    - Switch the timerqueue code to use cached rbtrees and get rid of the
    homebrewed caching of the leftmost node.

    - Consolidate hrtimer_init() + hrtimer_init_sleeper() calls into a
    single function

    - Implement the separation of hrtimers to be forced to expire in hard
    interrupt context even when PREEMPT_RT is enabled and mark the
    affected timers accordingly.

    - Implement a mechanism for hrtimers and the timer wheel to protect
    RT against priority inversion and live lock issues when a (hr)timer
    which should be canceled is currently executing the callback.
    Instead of infinitely spinning, the task which tries to cancel the
    timer blocks on a per cpu base expiry lock which is held and
    released by the (hr)timer expiry code.

    - Enable the Hyper-V TSC page based sched_clock for Hyper-V guests
    resulting in faster access to timekeeping functions.

    - Updates to various clocksource/clockevent drivers and their device
    tree bindings.

    - The usual small improvements all over the place"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits)
    posix-cpu-timers: Fix permission check regression
    posix-cpu-timers: Always clear head pointer on dequeue
    hrtimer: Add a missing bracket and hide `migration_base' on !SMP
    posix-cpu-timers: Make expiry_active check actually work correctly
    posix-timers: Unbreak CONFIG_POSIX_TIMERS=n build
    tick: Mark sched_timer to expire in hard interrupt context
    hrtimer: Add kernel doc annotation for HRTIMER_MODE_HARD
    x86/hyperv: Hide pv_ops access for CONFIG_PARAVIRT=n
    posix-cpu-timers: Utilize timerqueue for storage
    posix-cpu-timers: Move state tracking to struct posix_cputimers
    posix-cpu-timers: Deduplicate rlimit handling
    posix-cpu-timers: Remove pointless comparisons
    posix-cpu-timers: Get rid of 64bit divisions
    posix-cpu-timers: Consolidate timer expiry further
    posix-cpu-timers: Get rid of zero checks
    rlimit: Rewrite non-sensical RLIMIT_CPU comment
    posix-cpu-timers: Respect INFINITY for hard RTTIME limit
    posix-cpu-timers: Switch thread group sampling to array
    posix-cpu-timers: Restructure expiry array
    posix-cpu-timers: Remove cputime_expires
    ...

    Linus Torvalds
     

17 Sep, 2019

1 commit

  • Pull x86 cpu-feature updates from Ingo Molnar:

    - Rework the Intel model names symbols/macros, which were decades of
    ad-hoc extensions and added random noise. It's now a coherent, easy
    to follow nomenclature.

    - Add new Intel CPU model IDs:
    - "Tiger Lake" desktop and mobile models
    - "Elkhart Lake" model ID
    - and the "Lightning Mountain" variant of Airmont, plus support code

    - Add the new AVX512_VP2INTERSECT instruction to cpufeatures

    - Remove Intel MPX user-visible APIs and the self-tests, because the
    toolchain (gcc) is not supporting it going forward. This is the
    first, lowest-risk phase of MPX removal.

    - Remove X86_FEATURE_MFENCE_RDTSC

    - Various smaller cleanups and fixes

    * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
    x86/cpu: Update init data for new Airmont CPU model
    x86/cpu: Add new Airmont variant to Intel family
    x86/cpu: Add Elkhart Lake to Intel family
    x86/cpu: Add Tiger Lake to Intel family
    x86: Correct misc typos
    x86/intel: Add common OPTDIFFs
    x86/intel: Aggregate microserver naming
    x86/intel: Aggregate big core graphics naming
    x86/intel: Aggregate big core mobile naming
    x86/intel: Aggregate big core client naming
    x86/cpufeature: Explain the macro duplication
    x86/ftrace: Remove mcount() declaration
    x86/PCI: Remove superfluous returns from void functions
    x86/msr-index: Move AMD MSRs where they belong
    x86/cpu: Use constant definitions for CPU models
    lib: Remove redundant ftrace flag removal
    x86/crash: Remove unnecessary comparison
    x86/bitops: Use __builtin_constant_p() directly instead of IS_IMMEDIATE()
    x86: Remove X86_FEATURE_MFENCE_RDTSC
    x86/mpx: Remove MPX APIs
    ...

    Linus Torvalds
     

28 Aug, 2019

2 commits

  • Deactivation of the expiry cache is done by setting all clock caches to
    0. That requires a check for zero in all places which update the
    expiry cache:

        if (cache == 0 || new < cache)
            cache = new;

    Use U64_MAX as the deactivated value, which allows removing the zero
    checks when updating the cache and reduces it to the obvious check:

        if (new < cache)
            cache = new;

    This also removes the weird workaround in do_prlimit() which was required
    to convert a RLIMIT_CPU value of 0 (immediate expiry) to 1 because handing
    in 0 to the posix CPU timer code would have effectively disarmed it.
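
    The two update schemes can be compared side by side in a small
    standalone sketch. The function names and the EXPIRY_DISABLED macro are
    hypothetical; UINT64_MAX stands in for the kernel's U64_MAX.

```c
#include <stdint.h>

#define EXPIRY_DISABLED UINT64_MAX	/* stands in for the kernel's U64_MAX */

/* Old scheme: 0 means "nothing armed", so every update needs a zero check,
 * and an expiry value of 0 cannot be told apart from "disabled". */
static void update_cache_zero(uint64_t *cache, uint64_t new_expiry)
{
	if (*cache == 0 || new_expiry < *cache)
		*cache = new_expiry;
}

/* New scheme: UINT64_MAX means "nothing armed"; a plain minimum suffices,
 * and an expiry value of 0 (immediate expiry) is representable. */
static void update_cache_max(uint64_t *cache, uint64_t new_expiry)
{
	if (new_expiry < *cache)
		*cache = new_expiry;
}
```

    With the old scheme, storing 0 into an armed cache silently disarms it,
    which is the case the do_prlimit() workaround had to paper over.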

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Frederic Weisbecker
    Link: https://lkml.kernel.org/r/20190821192922.275086128@linutronix.de

    Thomas Gleixner
     
  • The comment above the function which arms RLIMIT_CPU in the posix CPU timer
    code makes no sense at all. It claims that the kernel does not return an
    error code when it rejected the attempt to set RLIMIT_CPU. That's clearly
    bogus as the code does an error check and the rlimit is only set and
    activated when the permission checks are ok. In case of a rejection an
    appropriate error code is returned.

    This is a historical and outdated comment which got dragged along even when
    the rlimit handling code was rewritten.

    Replace it with an explanation why the setup function is not called when
    the rlimit value is RLIM_INFINITY and how the 'disarming' is handled.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Frederic Weisbecker
    Link: https://lkml.kernel.org/r/20190821192922.185511287@linutronix.de

    Thomas Gleixner
     

21 Aug, 2019

1 commit


07 Aug, 2019

1 commit

  • It is not desirable to relax the ABI to allow tagged user addresses into
    the kernel indiscriminately. This patch introduces a prctl() interface
    for enabling or disabling the tagged ABI with a global sysctl control
    for preventing applications from enabling the relaxed ABI (meant for
    testing user-space prctl() return error checking without reconfiguring
    the kernel). The ABI properties are inherited by threads of the same
    application and fork()'ed children but cleared on execve(). A Kconfig
    option allows the overall disabling of the relaxed ABI.

    The PR_SET_TAGGED_ADDR_CTRL will be expanded in the future to handle
    MTE-specific settings like imprecise vs precise exceptions.
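
    An application would opt in to the relaxed ABI roughly as sketched
    below. The helper name is hypothetical and the fallback constants are
    the values from the uapi prctl header; on architectures other than
    arm64, or when the sysctl forbids it, the call is expected to fail.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Fallbacks for older libc headers; values taken from linux/prctl.h. */
#ifndef PR_SET_TAGGED_ADDR_CTRL
#define PR_SET_TAGGED_ADDR_CTRL 55
#endif
#ifndef PR_TAGGED_ADDR_ENABLE
#define PR_TAGGED_ADDR_ENABLE (1UL << 0)
#endif

/* Ask the kernel to accept tagged user addresses from this thread.
 * Returns 0 on success, or a negative errno value on failure. */
static int enable_tagged_addr_abi(void)
{
	if (prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE,
		  0UL, 0UL, 0UL) == -1)
		return -errno;
	return 0;
}
```

    Callers should check the return value before passing tagged pointers
    to system calls, since the request can be refused.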

    Reviewed-by: Kees Cook
    Signed-off-by: Catalin Marinas
    Signed-off-by: Andrey Konovalov
    Signed-off-by: Will Deacon

    Catalin Marinas
     

22 Jul, 2019

1 commit

  • MPX is being removed from the kernel due to a lack of support in the
    toolchain going forward (gcc).

    The first step is to remove the userspace-visible ABIs so that applications
    will stop using it. The most visible ones are the enable/disable prctl()s.
    Remove them first.

    This is the most minimal and least invasive change needed to ensure that
    apps stop using MPX with new kernels.

    Signed-off-by: Dave Hansen
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20190705175321.DB42F0AD@viggo.jf.intel.com

    Dave Hansen
     

03 Jun, 2019

1 commit