09 Jan, 2021

1 commit

  • [ Upstream commit f7cfd871ae0c5008d94b6f66834e7845caa93c15 ]

    Recently syzbot reported[0] that there is a deadlock amongst the users
    of exec_update_mutex. The problematic lock ordering found by lockdep
    was:

    perf_event_open  (exec_update_mutex -> ovl_i_mutex)
    chown            (ovl_i_mutex       -> sb_writers)
    sendfile         (sb_writers        -> p->lock)
      by reading from a proc file and writing to overlayfs
    proc_pid_syscall (p->lock           -> exec_update_mutex)

    While looking at possible solutions it occurred to me that all of the
    users and possible users involved only wanted the state of the given
    process to remain the same. They are all readers. The only writer is
    exec.

    There is no reason for readers to block on each other. So fix
    this deadlock by transforming exec_update_mutex into a rw_semaphore
    named exec_update_lock that only exec takes for writing.
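
    For illustration, a minimal sketch of the reader/writer pattern described
    above, assuming the lock sits in signal_struct as exec_update_lock (as in
    mainline); the two functions are illustrative, not the actual call sites:

    #include <linux/rwsem.h>
    #include <linux/sched/signal.h>

    /* Readers (perf_event_open(), proc_pid_syscall(), ...) no longer block
     * each other; they only exclude a concurrent exec. */
    static int inspect_task(struct task_struct *task)
    {
            int err = down_read_killable(&task->signal->exec_update_lock);

            if (err)
                    return err;
            /* ... look at credentials / mm that must not change under us ... */
            up_read(&task->signal->exec_update_lock);
            return 0;
    }

    /* exec is the only writer. */
    static void exec_update(struct task_struct *task)
    {
            down_write(&task->signal->exec_update_lock);
            /* ... install the new credentials and mm ... */
            up_write(&task->signal->exec_update_lock);
    }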

    Cc: Jann Horn
    Cc: Vasiliy Kulikov
    Cc: Al Viro
    Cc: Bernd Edlinger
    Cc: Oleg Nesterov
    Cc: Christopher Yeoh
    Cc: Cyrill Gorcunov
    Cc: Sargun Dhillon
    Cc: Christian Brauner
    Cc: Arnd Bergmann
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Arnaldo Carvalho de Melo
    Fixes: eea9673250db ("exec: Add exec_update_mutex to replace cred_guard_mutex")
    [0] https://lkml.kernel.org/r/00000000000063640c05ade8e3de@google.com
    Reported-by: syzbot+db9cdf3dd1f64252c6ef@syzkaller.appspotmail.com
    Link: https://lkml.kernel.org/r/87ft4mbqen.fsf@x220.int.ebiederm.org
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Sasha Levin

    Eric W. Biederman
     

01 Oct, 2020

1 commit

  • Grab actual references to the files_struct. To avoid circular reference
    issues due to this, we add a per-task note that keeps track of what
    io_uring contexts a task has used. When the task execs or exits its
    assigned files, we cancel requests based on this tracking.

    With that, we can grab proper references to the files table, and no
    longer need to rely on stashing away ring_fd and ring_file to check
    if the ring_fd may have been closed.
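
    A rough sketch of the tracking idea only; the structure, the list that
    would hang off task_struct, and the cancel callback below are hypothetical
    placeholders, not the actual io_uring symbols:

    #include <linux/errno.h>
    #include <linux/list.h>
    #include <linux/slab.h>

    struct io_ring_ctx;                     /* opaque here */

    /* Hypothetical per-task note recording which io_uring contexts were used. */
    struct ctx_note {
            struct list_head        list;
            struct io_ring_ctx      *ctx;
    };

    /* Hypothetical: record the first use of a ring; 'notes' would be a
     * list_head kept on the task. */
    static int note_ctx_use(struct list_head *notes, struct io_ring_ctx *ctx)
    {
            struct ctx_note *note = kmalloc(sizeof(*note), GFP_KERNEL);

            if (!note)
                    return -ENOMEM;
            note->ctx = ctx;
            list_add(&note->list, notes);
            return 0;
    }

    /* Hypothetical: walk the notes on exec/exit and cancel pending requests. */
    static void cancel_noted_requests(struct list_head *notes,
                                      void (*cancel)(struct io_ring_ctx *))
    {
            struct ctx_note *note, *tmp;

            list_for_each_entry_safe(note, tmp, notes, list) {
                    cancel(note->ctx);
                    list_del(&note->list);
                    kfree(note);
            }
    }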

    Cc: stable@vger.kernel.org # v5.5+
    Reviewed-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Jens Axboe
     

15 Aug, 2020

1 commit

  • Pull OpenRISC updates from Stafford Horne:
    "A few patches all over the place during this cycle, mostly bug and
    sparse warning fixes for OpenRISC, but a few enhancements too. Note,
    there are two non-OpenRISC-specific fixups.

    Non OpenRISC fixes:

    - In init we need to align the init_task correctly to fix an issue
    with MUTEX_FLAGS, reviewed by Peter Z. No one picked this up so I
    kept it on my tree.

    - In asm-generic/io.h I fixed up some sparse warnings, OK'd by Arnd.
    Arnd asked to merge it via my tree.

    OpenRISC fixes:

    - Many fixes for OpenRISC sparse warnings.

    - Add support for OpenRISC SMP TLB flushing rather than always flushing
    the entire TLB on every CPU.

    - Fix bug when dumping stack via /proc/xxx/stack of user threads"

    * tag 'for-linus' of git://github.com/openrisc/linux:
    openrisc: uaccess: Add user address space check to access_ok
    openrisc: signal: Fix sparse address space warnings
    openrisc: uaccess: Remove unused macro __addr_ok
    openrisc: uaccess: Use static inline function in access_ok
    openrisc: uaccess: Fix sparse address space warnings
    openrisc: io: Fixup defines and move include to the end
    asm-generic/io.h: Fix sparse warnings on big-endian architectures
    openrisc: Implement proper SMP tlb flushing
    openrisc: Fix oops caused when dumping stack
    openrisc: Add support for external initrd images
    init: Align init_task to avoid conflict with MUTEX_FLAGS
    openrisc: fix __user in raw_copy_to_user()'s prototype

    Linus Torvalds
     

11 Aug, 2020

1 commit

  • Pull locking updates from Thomas Gleixner:
    "A set of locking fixes and updates:

    - Untangle the header spaghetti which causes build failures in
    various situations caused by the lockdep additions to seqcount to
    validate that the write side critical sections are non-preemptible.

    - The seqcount associated lock debug addons which were blocked by the
    above fallout.

    seqcount writers, contrary to seqlock writers, must be externally
    serialized, which usually happens via locking - except for strict
    per-CPU seqcounts. As the lock is not part of the seqcount, lockdep
    cannot validate that the lock is held.

    This new debug mechanism adds the concept of associated locks.
    The sequence count now has lock-type variants and corresponding
    initializers which take a pointer to the associated lock used for
    writer serialization. If lockdep is enabled, the pointer is stored
    and write_seqcount_begin() has a lockdep assertion to validate that
    the lock is held.

    Aside of the type and the initializer no other code changes are
    required at the seqcount usage sites. The rest of the seqcount API
    is unchanged and determines the type at compile time with the help
    of _Generic which is possible now that the minimal GCC version has
    been moved up.

    Adding this lockdep coverage unearthed a handful of seqcount bugs
    which have been addressed already independent of this.

    While generally useful, this comes with a Trojan Horse twist: on RT
    kernels the write side critical section can become preemptible if
    the writers are serialized by an associated lock, which leads to
    the well-known reader-preempts-writer livelock. RT prevents this by
    storing the associated lock pointer independent of lockdep in the
    seqcount and changing the reader side to block on the lock when a
    reader detects that a writer is in the write side critical section.

    - Conversion of seqcount usage sites to associated types and
    initializers"

    * tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
    locking/seqlock, headers: Untangle the spaghetti monster
    locking, arch/ia64: Reduce header dependencies by moving XTP bits into the new header
    x86/headers: Remove APIC headers from <asm/smp.h>
    seqcount: More consistent seqprop names
    seqcount: Compress SEQCNT_LOCKNAME_ZERO()
    seqlock: Fold seqcount_LOCKNAME_init() definition
    seqlock: Fold seqcount_LOCKNAME_t definition
    seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
    hrtimer: Use sequence counter with associated raw spinlock
    kvm/eventfd: Use sequence counter with associated spinlock
    userfaultfd: Use sequence counter with associated spinlock
    NFSv4: Use sequence counter with associated spinlock
    iocost: Use sequence counter with associated spinlock
    raid5: Use sequence counter with associated spinlock
    vfs: Use sequence counter with associated spinlock
    timekeeping: Use sequence counter with associated raw spinlock
    xfrm: policy: Use sequence counters with associated lock
    netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
    netfilter: conntrack: Use sequence counter with associated spinlock
    sched: tasks: Use sequence counter with associated spinlock
    ...

    Linus Torvalds
     

04 Aug, 2020

1 commit

  • When booting on 32-bit machines (seen on OpenRISC) I saw this warning
    with CONFIG_DEBUG_MUTEXES turned on.

    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 0 at kernel/locking/mutex.c:1242 __mutex_unlock_slowpath+0x328/0x3ec
    DEBUG_LOCKS_WARN_ON(__owner_task(owner) != current)
    Modules linked in:
    CPU: 0 PID: 0 Comm: swapper Not tainted 5.8.0-rc1-simple-smp-00005-g2864e2171db4-dirty #179
    Call trace:
    [] dump_stack+0x34/0x48
    [] __warn+0x104/0x158
    [] ? __mutex_unlock_slowpath+0x328/0x3ec
    [] warn_slowpath_fmt+0x7c/0x94
    [] __mutex_unlock_slowpath+0x328/0x3ec
    [] mutex_unlock+0x18/0x28
    [] __cpuhp_setup_state_cpuslocked.part.0+0x29c/0x2f4
    [] ? page_alloc_cpu_dead+0x0/0x30
    [] ? start_kernel+0x0/0x684
    [] __cpuhp_setup_state+0x4c/0x5c
    [] page_alloc_init+0x34/0x68
    [] ? start_kernel+0x1a0/0x684
    [] ? early_init_dt_scan_nodes+0x60/0x70
    irq event stamp: 0

    I traced this to kernel/locking/mutex.c storing 3 bits of MUTEX_FLAGS in
    the task_struct pointer (mutex.owner). There is a comment saying that
    task_structs are always aligned to L1_CACHE_BYTES. This is not true for
    the init_task.

    On 64-bit machines this is not a problem because symbol addresses are
    naturally aligned to 64 bits, providing 3 bits for MUTEX_FLAGS. However,
    for 32-bit machines the symbol address only has 2 bits available.

    Fix this by setting init_task alignment to at least L1_CACHE_BYTES.
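
    For illustration, the relevant pieces in shape (a sketch, not the full
    patch): the flag mask and helper mirror kernel/locking/mutex.c, and the
    fix boils down to an explicit alignment attribute on init_task's
    definition (real initializers omitted here):

    #include <linux/cache.h>
    #include <linux/sched.h>

    #define MUTEX_FLAGS     0x07UL  /* waiters | handoff | pickup, low bits */

    /* Owner pointer and flags share one word, so task_struct addresses need
     * at least 8-byte alignment for the 3 flag bits to be spare. */
    static inline struct task_struct *__owner_task(unsigned long owner)
    {
            return (struct task_struct *)(owner & ~MUTEX_FLAGS);
    }

    /* Shape of the fix: the statically allocated init_task gets the same
     * alignment guarantee that dynamically allocated task_structs have. */
    struct task_struct init_task __aligned(L1_CACHE_BYTES);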

    Signed-off-by: Stafford Horne
    Acked-by: Peter Zijlstra (Intel)

    Stafford Horne
     

29 Jul, 2020

1 commit

  • A sequence counter write side critical section must be protected by some
    form of locking to serialize writers. A plain seqcount_t does not
    contain the information of which lock must be held when entering a write
    side critical section.

    Use the new seqcount_spinlock_t data type, which allows associating a
    spinlock with the sequence counter. This enables lockdep to verify that
    the spinlock used for writer serialization is held when the write side
    critical section is entered.

    If lockdep is disabled this lock association is compiled out and has
    neither storage size nor runtime overhead.
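
    A minimal usage sketch of the seqcount_spinlock_t type described above
    (illustrative data structure, not code from the patch itself):

    #include <linux/seqlock.h>
    #include <linux/spinlock.h>
    #include <linux/types.h>

    struct foo_state {
            spinlock_t              lock;
            seqcount_spinlock_t     seq;    /* associated with ->lock */
            u64                     a, b;
    };

    static void foo_init(struct foo_state *f)
    {
            spin_lock_init(&f->lock);
            seqcount_spinlock_init(&f->seq, &f->lock);
    }

    static void foo_update(struct foo_state *f, u64 a, u64 b)
    {
            spin_lock(&f->lock);            /* writer serialization ... */
            write_seqcount_begin(&f->seq);  /* ... which lockdep can now verify */
            f->a = a;
            f->b = b;
            write_seqcount_end(&f->seq);
            spin_unlock(&f->lock);
    }

    static u64 foo_read_sum(struct foo_state *f)
    {
            unsigned int seq;
            u64 sum;

            do {
                    seq = read_seqcount_begin(&f->seq);
                    sum = f->a + f->b;
            } while (read_seqcount_retry(&f->seq, seq));
            return sum;
    }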

    Signed-off-by: Ahmed S. Darwish
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200720155530.1173732-14-a.darwish@linutronix.de

    Ahmed S. Darwish
     

11 Jul, 2020

1 commit


12 Jun, 2020

1 commit

  • Merge the state of the locking kcsan branch before the read/write_once()
    and the atomics modifications got merged.

    Squash the fallout of the rebase on top of the read/write once and atomic
    fallback work into the merge. The history of the original branch is
    preserved in tag locking-kcsan-2020-06-02.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

10 Jun, 2020

1 commit

  • Patch series "mm: consolidate definitions of page table accessors", v2.

    The low level page table accessors (pXY_index(), pXY_offset()) are
    duplicated across all architectures and sometimes more than once. For
    instance, we have 31 definitions of pgd_offset() for 25 supported
    architectures.

    Most of these definitions are actually identical and typically it boils
    down to, e.g.

    static inline unsigned long pmd_index(unsigned long address)
    {
            return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
    }

    static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
    {
            return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
    }

    These definitions can be shared among 90% of the arches provided
    XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.

    For architectures that really need a custom version there is always the
    possibility to override the generic version with the usual ifdefs magic.
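
    The override pattern referred to above looks roughly like this in the
    generic header (illustrative; the generic body matches the example shown
    earlier):

    /* Generic version, compiled only if the architecture did not provide one. */
    #ifndef pmd_index
    static inline unsigned long pmd_index(unsigned long address)
    {
            return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
    }
    #define pmd_index pmd_index
    #endif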

    These patches introduce include/linux/pgtable.h that replaces
    include/asm-generic/pgtable.h and add the definitions of the page table
    accessors to the new header.

    This patch (of 12):

    The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
    functions involving page table manipulations, e.g. pte_alloc() and
    pmd_alloc(). So, there is no point to explicitly include <asm/pgtable.h>
    in the files that include <linux/mm.h>.

    The include statements in such cases are removed with a simple loop:

    for f in $(git grep -l "include <linux/mm.h>") ; do
            sed -i -e '/include <asm\/pgtable.h>/ d' $f
    done

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Matthew Wilcox
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Mike Rapoport
    Cc: Nick Hu
    Cc: Paul Walmsley
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
    Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

02 Jun, 2020

1 commit

  • Pull arm64 updates from Will Deacon:
    "A sizeable pile of arm64 updates for 5.8.

    Summary below, but the big two features are support for Branch Target
    Identification and Clang's Shadow Call stack. The latter is currently
    arm64-only, but the high-level parts are all in core code so it could
    easily be adopted by other architectures pending toolchain support

    Branch Target Identification (BTI):

    - Support for ARMv8.5-BTI in both user- and kernel-space. This allows
    branch targets to limit the types of branch from which they can be
    called and additionally prevents branching to arbitrary code,
    although kernel support requires a very recent toolchain.

    - Function annotation via SYM_FUNC_START() so that assembly functions
    are wrapped with the relevant "landing pad" instructions.

    - BPF and vDSO updates to use the new instructions.

    - Addition of a new HWCAP and exposure of BTI capability to userspace
    via ID register emulation, along with ELF loader support for the
    BTI feature in .note.gnu.property.

    - Non-critical fixes to CFI unwind annotations in the sigreturn
    trampoline.

    Shadow Call Stack (SCS):

    - Support for Clang's Shadow Call Stack feature, which reserves
    platform register x18 to point at a separate stack for each task
    that holds only return addresses. This protects function return
    control flow from buffer overruns on the main stack.

    - Save/restore of x18 across problematic boundaries (user-mode,
    hypervisor, EFI, suspend, etc).

    - Core support for SCS, should other architectures want to use it
    too.

    - SCS overflow checking on context-switch as part of the existing
    stack limit check if CONFIG_SCHED_STACK_END_CHECK=y.

    CPU feature detection:

    - Removed numerous "SANITY CHECK" errors when running on a system
    with mismatched AArch32 support at EL1. This is primarily a concern
    for KVM, which disabled support for 32-bit guests on such a system.

    - Addition of new ID registers and fields as the architecture has
    been extended.

    Perf and PMU drivers:

    - Minor fixes and cleanups to system PMU drivers.

    Hardware errata:

    - Unify KVM workarounds for VHE and nVHE configurations.

    - Sort vendor errata entries in Kconfig.

    Secure Monitor Call Calling Convention (SMCCC):

    - Update to the latest specification from Arm (v1.2).

    - Allow PSCI code to query the SMCCC version.

    Software Delegated Exception Interface (SDEI):

    - Unexport a bunch of unused symbols.

    - Minor fixes to handling of firmware data.

    Pointer authentication:

    - Add support for dumping the kernel PAC mask in vmcoreinfo so that
    the stack can be unwound by tools such as kdump.

    - Simplification of key initialisation during CPU bringup.

    BPF backend:

    - Improve immediate generation for logical and add/sub instructions.

    vDSO:

    - Minor fixes to the linker flags for consistency with other
    architectures and support for LLVM's unwinder.

    - Clean up logic to initialise and map the vDSO into userspace.

    ACPI:

    - Work around for an ambiguity in the IORT specification relating to
    the "num_ids" field.

    - Support _DMA method for all named components rather than only PCIe
    root complexes.

    - Minor other IORT-related fixes.

    Miscellaneous:

    - Initialise debug traps early for KGDB and fix KDB cacheflushing
    deadlock.

    - Minor tweaks to early boot state (documentation update, set
    TEXT_OFFSET to 0x0, increase alignment of PE/COFF sections).

    - Refactoring and cleanup"

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits)
    KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h
    KVM: arm64: Check advertised Stage-2 page size capability
    arm64/cpufeature: Add get_arm64_ftr_reg_nowarn()
    ACPI/IORT: Remove the unused __get_pci_rid()
    arm64/cpuinfo: Add ID_MMFR4_EL1 into the cpuinfo_arm64 context
    arm64/cpufeature: Add remaining feature bits in ID_AA64PFR1 register
    arm64/cpufeature: Add remaining feature bits in ID_AA64PFR0 register
    arm64/cpufeature: Add remaining feature bits in ID_AA64ISAR0 register
    arm64/cpufeature: Add remaining feature bits in ID_MMFR4 register
    arm64/cpufeature: Add remaining feature bits in ID_PFR0 register
    arm64/cpufeature: Introduce ID_MMFR5 CPU register
    arm64/cpufeature: Introduce ID_DFR1 CPU register
    arm64/cpufeature: Introduce ID_PFR2 CPU register
    arm64/cpufeature: Make doublelock a signed feature in ID_AA64DFR0
    arm64/cpufeature: Drop TraceFilt feature exposure from ID_DFR0 register
    arm64/cpufeature: Add explicit ftr_id_isar0[] for ID_ISAR0 register
    arm64: mm: Add asid_gen_match() helper
    firmware: smccc: Fix missing prototype warning for arm_smccc_version_init
    arm64: vdso: Fix CFI directives in sigreturn trampoline
    arm64: vdso: Don't prefix sigreturn trampoline with a BTI C instruction
    ...

    Linus Torvalds
     

15 May, 2020

1 commit

  • This change adds generic support for Clang's Shadow Call Stack,
    which uses a shadow stack to protect return addresses from being
    overwritten by an attacker. Details are available here:

    https://clang.llvm.org/docs/ShadowCallStack.html

    Note that security guarantees in the kernel differ from the ones
    documented for user space. The kernel must store addresses of
    shadow stacks in memory, which means an attacker capable of reading
    and writing arbitrary memory may be able to locate them and hijack
    control flow by modifying the stacks.

    Signed-off-by: Sami Tolvanen
    Reviewed-by: Kees Cook
    Reviewed-by: Miguel Ojeda
    [will: Numerous cosmetic changes]
    Signed-off-by: Will Deacon

    Sami Tolvanen
     

08 May, 2020

1 commit


28 Apr, 2020

2 commits

  • This commit splits ->trc_reader_need_end by using the rcu_special union.
    This change permits readers to check to see if a memory barrier is
    required without any added overhead in the common case where no such
    barrier is required. This commit also adds the read-side checking.
    Later commits will add the machinery to properly set the new
    ->trc_reader_special.b.need_mb field.

    This commit also makes rcu_read_unlock_trace_special() tolerate nested
    read-side critical sections within interrupt and NMI handlers.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Because RCU does not watch exception early-entry/late-exit, idle-loop,
    or CPU-hotplug execution, protection of tracing and BPF operations is
    needlessly complicated. This commit therefore adds a variant of
    Tasks RCU that:

    o Has explicit read-side markers to allow finite grace periods in
    the face of in-kernel loops for PREEMPT=n builds. These markers
    are rcu_read_lock_trace() and rcu_read_unlock_trace(); a usage
    sketch follows below.

    o Protects code in the idle loop, exception entry/exit, and
    CPU-hotplug code paths. In this respect, RCU-tasks trace is
    similar to SRCU, but with lighter-weight readers.

    o Avoids expensive read-side instructions, having overhead similar
    to that of Preemptible RCU.

    There are of course downsides:

    o The grace-period code can send IPIs to CPUs, even when those
    CPUs are in the idle loop or in nohz_full userspace. This is
    mitigated by later commits.

    o It is necessary to scan the full tasklist, much as for Tasks RCU.

    o There is a single callback queue guarded by a single lock,
    again, much as for Tasks RCU. However, those early use cases
    that request multiple grace periods in quick succession are
    expected to do so from a single task, which makes the single
    lock almost irrelevant. If needed, multiple callback queues
    can be provided using any number of schemes.

    Perhaps most important, this variant of RCU does not affect the vanilla
    flavors, rcu_preempt and rcu_sched. The fact that RCU Tasks Trace
    readers can operate from idle, offline, and exception entry/exit in no
    way enables rcu_preempt and rcu_sched readers to do so.
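
    A usage sketch of the read-side markers named above, paired with the
    flavor's update-side wait; the data structure is illustrative, not from
    the patch:

    #include <linux/rcupdate_trace.h>
    #include <linux/slab.h>

    struct trampoline {
            void (*func)(void);
    };

    static struct trampoline *active_tramp;         /* updated below */

    static void call_active_tramp(void)
    {
            struct trampoline *t;

            rcu_read_lock_trace();                  /* explicit read-side marker */
            t = READ_ONCE(active_tramp);
            if (t)
                    t->func();
            rcu_read_unlock_trace();
    }

    static void replace_tramp(struct trampoline *new)
    {
            struct trampoline *old = active_tramp;

            WRITE_ONCE(active_tramp, new);          /* publish the new trampoline */
            synchronize_rcu_tasks_trace();          /* wait for all trace readers */
            kfree(old);                             /* old can no longer be in use */
    }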

    The memory ordering was outlined here:
    https://lore.kernel.org/lkml/20200319034030.GX3199@paulmck-ThinkPad-P72/

    This effort benefited greatly from off-list discussions of BPF
    requirements with Alexei Starovoitov and Andrii Nakryiko. At least
    some of the on-list discussions are captured in the Link: tags below.
    In addition, KCSAN was quite helpful in finding some early bugs.

    Link: https://lore.kernel.org/lkml/20200219150744.428764577@infradead.org/
    Link: https://lore.kernel.org/lkml/87mu8p797b.fsf@nanos.tec.linutronix.de/
    Link: https://lore.kernel.org/lkml/20200225221305.605144982@linutronix.de/
    Cc: Alexei Starovoitov
    Cc: Andrii Nakryiko
    [ paulmck: Apply feedback from Steve Rostedt and Joel Fernandes. ]
    [ paulmck: Decrement trc_n_readers_need_end upon IPI failure. ]
    [ paulmck: Fix locking issue reported by rcutorture. ]
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

14 Apr, 2020

1 commit

  • This adds support for scoped accesses, where the memory range is checked
    for the duration of the scope. The feature is implemented by inserting
    the relevant access information into a list of scoped accesses for
    the current execution context, which are then checked (until removed)
    on every call (through instrumentation) into the KCSAN runtime.

    An alternative, more complex, implementation could set up a watchpoint for
    the scoped access, and keep the watchpoint set up. This, however, would
    require first exposing a handle to the watchpoint, as well as dealing
    with cases such as accesses by the same thread while the watchpoint is
    still set up (and several more cases). It is also doubtful if this would
    provide any benefit, since the majority of delay where the watchpoint
    is set up is likely due to the injected delays by KCSAN. Therefore,
    the implementation in this patch is simpler and avoids hurting KCSAN's
    main use-case (normal data race detection); it also implicitly increases
    scoped-access race-detection-ability due to increased probability of
    setting up watchpoints by repeatedly calling __kcsan_check_access()
    throughout the scope of the access.

    The implementation required adding an additional conditional branch to
    the fast-path. However, the microbenchmark showed a *speedup* of ~5%
    on the fast-path. This appears to be due to subtly improved codegen by
    GCC from moving get_ctx() and associated load of preempt_count earlier.
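
    A usage sketch of the scoped assertions built on this mechanism (the
    ASSERT_EXCLUSIVE_*_SCOPED macros come from the same series; the struct
    below is illustrative):

    #include <linux/kcsan-checks.h>

    struct stats {
            unsigned long calls;
            unsigned long bytes;
    };

    /* The scoped assertion registers the access range once and KCSAN keeps
     * re-checking it for the rest of the function scope, as described above. */
    static void stats_add(struct stats *s, unsigned long len)
    {
            ASSERT_EXCLUSIVE_WRITER_SCOPED(*s);

            s->calls++;
            s->bytes += len;
    }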

    Suggested-by: Boqun Feng
    Suggested-by: Paul E. McKenney
    Signed-off-by: Marco Elver
    Signed-off-by: Paul E. McKenney

    Marco Elver
     

13 Apr, 2020

1 commit


25 Mar, 2020

1 commit

  • The cred_guard_mutex is problematic as it is held over possibly
    indefinite waits for userspace. The possible indefinite waits for
    userspace that I have identified are: The cred_guard_mutex is held in
    PTRACE_EVENT_EXIT waiting for the tracer. The cred_guard_mutex is
    held over "put_user(0, tsk->clear_child_tid)" in exit_mm(). The
    cred_guard_mutex is held over "get_user(futex_offset, ...") in
    exit_robust_list. The cred_guard_mutex held over copy_strings.

    The functions get_user and put_user can trigger a page fault which can
    potentially wait indefinitely in the case of userfaultfd or if
    userspace implements part of the page fault path.

    In any of those cases the userspace process that the kernel is waiting
    for might make a different system call that winds up taking the
    cred_guard_mutex and result in deadlock.

    Holding a mutex over any of those possibly indefinite waits for
    userspace does not appear necessary. Add exec_update_mutex that will
    just cover updating the process during exec where the permissions and
    the objects pointed to by the task struct may be out of sync.

    The plan is to switch the users of cred_guard_mutex to
    exec_update_mutex one by one. This lets us move forward while still
    being careful and not introducing any regressions.

    Link: https://lore.kernel.org/lkml/20160921152946.GA24210@dhcp22.suse.cz/
    Link: https://lore.kernel.org/lkml/AM6PR03MB5170B06F3A2B75EFB98D071AE4E60@AM6PR03MB5170.eurprd03.prod.outlook.com/
    Link: https://lore.kernel.org/linux-fsdevel/20161102181806.GB1112@redhat.com/
    Link: https://lore.kernel.org/lkml/20160923095031.GA14923@redhat.com/
    Link: https://lore.kernel.org/lkml/20170213141452.GA30203@redhat.com/
    Ref: 45c1a159b85b ("Add PTRACE_O_TRACEVFORKDONE and PTRACE_O_TRACEEXIT facilities.")
    Ref: 456f17cd1a28 ("[PATCH] user-vm-unlock-2.5.31-A2")
    Reviewed-by: Kirill Tkhai
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Bernd Edlinger
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

21 Mar, 2020

1 commit

  • When setting up an access mask with kcsan_set_access_mask(), KCSAN will
    only report races if concurrent changes to bits set in access_mask are
    observed. Conveying access_mask via a separate call avoids introducing
    overhead in the common-case fast-path.
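
    Illustrative use of the access mask through ASSERT_EXCLUSIVE_BITS(),
    introduced alongside kcsan_set_access_mask(); the flag layout below is
    made up:

    #include <linux/kcsan-checks.h>

    #define MODE_MASK       0x000000ffUL    /* made-up field in the flags word */

    /* Report a race only if another thread concurrently changes bits inside
     * MODE_MASK; changes to the remaining bits are deliberately ignored. */
    static void set_mode(unsigned long *flags, unsigned long mode)
    {
            ASSERT_EXCLUSIVE_BITS(*flags, MODE_MASK);
            *flags = (*flags & ~MODE_MASK) | (mode & MODE_MASK);
    }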

    Acked-by: John Hubbard
    Signed-off-by: Marco Elver
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Ingo Molnar

    Marco Elver
     

16 Nov, 2019

1 commit

  • Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
    kernel space. KCSAN is a sampling watchpoint-based data-race detector.
    See the included Documentation/dev-tools/kcsan.rst for more details.

    This patch adds basic infrastructure, but does not yet enable KCSAN for
    any architecture.

    Signed-off-by: Marco Elver
    Acked-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Marco Elver
     

18 Sep, 2019

1 commit

  • Pull core timer updates from Thomas Gleixner:
    "Timers and timekeeping updates:

    - A large overhaul of the posix CPU timer code which is a preparation
    for moving the CPU timer expiry out into task work so it can be
    properly accounted on the task/process.

    An update to the bogus permission checks will come later during the
    merge window as feedback was not complete before heading off for
    travel.

    - Switch the timerqueue code to use cached rbtrees and get rid of the
    homebrewn caching of the leftmost node.

    - Consolidate hrtimer_init() + hrtimer_init_sleeper() calls into a
    single function

    - Implement the separation of hrtimers to be forced to expire in hard
    interrupt context even when PREEMPT_RT is enabled and mark the
    affected timers accordingly.

    - Implement a mechanism for hrtimers and the timer wheel to protect
    RT against priority inversion and live lock issues when a (hr)timer
    which should be canceled is currently executing the callback.
    Instead of infinitely spinning, the task which tries to cancel the
    timer blocks on a per cpu base expiry lock which is held and
    released by the (hr)timer expiry code.

    - Enable the Hyper-V TSC page based sched_clock for Hyper-V guests
    resulting in faster access to timekeeping functions.

    - Updates to various clocksource/clockevent drivers and their device
    tree bindings.

    - The usual small improvements all over the place"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits)
    posix-cpu-timers: Fix permission check regression
    posix-cpu-timers: Always clear head pointer on dequeue
    hrtimer: Add a missing bracket and hide `migration_base' on !SMP
    posix-cpu-timers: Make expiry_active check actually work correctly
    posix-timers: Unbreak CONFIG_POSIX_TIMERS=n build
    tick: Mark sched_timer to expire in hard interrupt context
    hrtimer: Add kernel doc annotation for HRTIMER_MODE_HARD
    x86/hyperv: Hide pv_ops access for CONFIG_PARAVIRT=n
    posix-cpu-timers: Utilize timerqueue for storage
    posix-cpu-timers: Move state tracking to struct posix_cputimers
    posix-cpu-timers: Deduplicate rlimit handling
    posix-cpu-timers: Remove pointless comparisons
    posix-cpu-timers: Get rid of 64bit divisions
    posix-cpu-timers: Consolidate timer expiry further
    posix-cpu-timers: Get rid of zero checks
    rlimit: Rewrite non-sensical RLIMIT_CPU comment
    posix-cpu-timers: Respect INFINITY for hard RTTIME limit
    posix-cpu-timers: Switch thread group sampling to array
    posix-cpu-timers: Restructure expiry array
    posix-cpu-timers: Remove cputime_expires
    ...

    Linus Torvalds
     

28 Aug, 2019

1 commit


01 Aug, 2019

1 commit

  • CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by
    CONFIG_PREEMPT_RT. Both PREEMPT and PREEMPT_RT require the same
    functionality which today depends on CONFIG_PREEMPT.

    Switch the preemption code, scheduler and init task over to use
    CONFIG_PREEMPTION.

    That's the first step towards RT in that area. The more complex changes are
    coming separately.
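
    The conversion itself is mechanical; in shape (illustrative snippet, not
    a specific hunk from the patch):

    /* A check that used to read "#ifdef CONFIG_PREEMPT". */
    static const char *preempt_model(void)
    {
    #ifdef CONFIG_PREEMPTION        /* selected by both PREEMPT and PREEMPT_RT */
            return "preemptible";
    #else
            return "non-preemptible";
    #endif
    }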

    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Masami Hiramatsu
    Cc: Paolo Bonzini
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/20190726212124.117528401@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

09 Jul, 2019

1 commit

  • Pull scheduler updates from Ingo Molnar:

    - Remove the unused per rq load array and all its infrastructure, by
    Dietmar Eggemann.

    - Add utilization clamping support by Patrick Bellasi. This is a
    refinement of the energy aware scheduling framework with support for
    boosting of interactive and capping of background workloads: to make
    sure critical GUI threads get maximum frequency ASAP, and to make
    sure background processing doesn't unnecessarily move the cpufreq
    governor to higher frequencies and less energy-efficient CPU modes.

    - Add the bare minimum of tracepoints required for LISA EAS regression
    testing, by Qais Yousef - which allows automated testing of various
    power management features, including energy aware scheduling.

    - Restructure the former tsk_nr_cpus_allowed() facility that the -rt
    kernel used to modify the scheduler's CPU affinity logic such as
    migrate_disable() - introduce the task->cpus_ptr value instead of
    taking the address of &task->cpus_allowed directly - by Sebastian
    Andrzej Siewior.

    - Misc optimizations, fixes, cleanups and small enhancements - see the
    Git log for details.

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
    sched/uclamp: Add uclamp support to energy_compute()
    sched/uclamp: Add uclamp_util_with()
    sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks
    sched/uclamp: Set default clamps for RT tasks
    sched/uclamp: Reset uclamp values on RESET_ON_FORK
    sched/uclamp: Extend sched_setattr() to support utilization clamping
    sched/core: Allow sched_setattr() to use the current policy
    sched/uclamp: Add system default clamps
    sched/uclamp: Enforce last task's UCLAMP_MAX
    sched/uclamp: Add bucket local max tracking
    sched/uclamp: Add CPU's clamp buckets refcounting
    sched/fair: Rename weighted_cpuload() to cpu_runnable_load()
    sched/debug: Export the newly added tracepoints
    sched/debug: Add sched_overutilized tracepoint
    sched/debug: Add new tracepoint to track PELT at se level
    sched/debug: Add new tracepoints to track PELT at rq level
    sched/debug: Add a new sched_trace_*() helper functions
    sched/autogroup: Make autogroup_path() always available
    sched/wait: Deduplicate code with do-while
    sched/topology: Remove unused 'sd' parameter from arch_scale_cpu_capacity()
    ...

    Linus Torvalds
     

03 Jun, 2019

3 commits

  • Chain keys are computed using the Jenkins hash function, which needs an
    initial hash to start with. Dedicate a macro to make this clear and
    configurable. A later patch changes this initial chain key.

    Signed-off-by: Yuyang Du
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: bvanassche@acm.org
    Cc: frederic@kernel.org
    Cc: ming.lei@redhat.com
    Cc: will.deacon@arm.com
    Link: https://lkml.kernel.org/r/20190506081939.74287-9-duyuyang@gmail.com
    Signed-off-by: Ingo Molnar

    Yuyang Du
     
  • Despite the fact that there is a lockdep_init_task() which does nothing,
    lockdep initializes tasks by assigning lockdep fields directly and does so
    inconsistently. Fix this by using lockdep_init_task().

    Signed-off-by: Yuyang Du
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: bvanassche@acm.org
    Cc: frederic@kernel.org
    Cc: ming.lei@redhat.com
    Cc: will.deacon@arm.com
    Link: https://lkml.kernel.org/r/20190506081939.74287-8-duyuyang@gmail.com
    Signed-off-by: Ingo Molnar

    Yuyang Du
     
  • In commit:

    4b53a3412d66 ("sched/core: Remove the tsk_nr_cpus_allowed() wrapper")

    the tsk_nr_cpus_allowed() wrapper was removed. There was not
    much difference in !RT, but in RT we used this to implement
    migrate_disable(). Within a migrate_disable() section the CPU mask is
    restricted to a single CPU while the "normal" CPU mask remains untouched.

    As an alternative implementation Ingo suggested to use:

    struct task_struct {
            const cpumask_t         *cpus_ptr;
            cpumask_t               cpus_mask;
    };

    with

    t->cpus_ptr = &t->cpus_mask;

    In -RT we then can switch the cpus_ptr to:

    t->cpus_ptr = &cpumask_of(task_cpu(p));

    in a migration disabled region. The rules are simple:

    - Code that 'uses' ->cpus_allowed would use the pointer.
    - Code that 'modifies' ->cpus_allowed would use the direct mask.
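
    In code, the two rules look roughly like this (illustrative helpers, not
    hunks from the patch):

    #include <linux/cpumask.h>
    #include <linux/sched.h>

    /* 'uses': read through the pointer, which -RT can point at a single-CPU
     * mask inside a migrate_disable() section. */
    static bool task_allowed_on(struct task_struct *p, int cpu)
    {
            return cpumask_test_cpu(cpu, p->cpus_ptr);
    }

    /* 'modifies': affinity changes still write the real mask directly. */
    static void task_set_allowed(struct task_struct *p,
                                 const struct cpumask *new_mask)
    {
            cpumask_copy(&p->cpus_mask, new_mask);
    }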

    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Thomas Gleixner
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Link: https://lkml.kernel.org/r/20190423142636.14347-1-bigeasy@linutronix.de
    Signed-off-by: Ingo Molnar

    Sebastian Andrzej Siewior
     

08 Mar, 2019

1 commit

  • Pull audit updates from Paul Moore:
    "A lucky 13 audit patches for v5.1.

    Despite the rather large diffstat, most of the changes are from two
    bug fix patches that move code from one Kconfig option to another.

    Beyond that bit of churn, the remaining changes are largely cleanups
    and bug-fixes as we slowly march towards container auditing. It isn't
    all boring though, we do have a couple of new things: file
    capabilities v3 support, and expanded support for filtering on
    filesystems to solve problems with remote filesystems.

    All changes pass the audit-testsuite. Please merge for v5.1"

    * tag 'audit-pr-20190305' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
    audit: mark expected switch fall-through
    audit: hide auditsc_get_stamp and audit_serial prototypes
    audit: join tty records to their syscall
    audit: remove audit_context when CONFIG_ AUDIT and not AUDITSYSCALL
    audit: remove unused actx param from audit_rule_match
    audit: ignore fcaps on umount
    audit: clean up AUDITSYSCALL prototypes and stubs
    audit: more filter PATH records keyed on filesystem magic
    audit: add support for fcaps v3
    audit: move loginuid and sessionid from CONFIG_AUDITSYSCALL to CONFIG_AUDIT
    audit: add syscall information to CONFIG_CHANGE records
    audit: hand taken context to audit_kill_trees for syscall logging
    audit: give a clue what CONFIG_CHANGE op was involved

    Linus Torvalds
     

07 Mar, 2019

1 commit

  • Merge misc updates from Andrew Morton:

    - a few misc things

    - ocfs2 updates

    - most of MM

    * emailed patches from Andrew Morton: (159 commits)
    tools/testing/selftests/proc/proc-self-syscall.c: remove duplicate include
    proc: more robust bulk read test
    proc: test /proc/*/maps, smaps, smaps_rollup, statm
    proc: use seq_puts() everywhere
    proc: read kernel cpu stat pointer once
    proc: remove unused argument in proc_pid_lookup()
    fs/proc/thread_self.c: code cleanup for proc_setup_thread_self()
    fs/proc/self.c: code cleanup for proc_setup_self()
    proc: return exit code 4 for skipped tests
    mm,mremap: bail out earlier in mremap_to under map pressure
    mm/sparse: fix a bad comparison
    mm/memory.c: do_fault: avoid usage of stale vm_area_struct
    writeback: fix inode cgroup switching comment
    mm/huge_memory.c: fix "orig_pud" set but not used
    mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC
    mm/memcontrol.c: fix bad line in comment
    mm/cma.c: cma_declare_contiguous: correct err handling
    mm/page_ext.c: fix an imbalance with kmemleak
    mm/compaction: pass pgdat to too_many_isolated() instead of zone
    mm: remove zone_lru_lock() function, access ->lru_lock directly
    ...

    Linus Torvalds
     

06 Mar, 2019

1 commit

  • Patch series "Replace all open encodings for NUMA_NO_NODE", v3.

    All these places for replacement were found by running the following
    grep patterns on the entire kernel code. Please let me know if this
    might have missed some instances. This might also have replaced some
    false positives. I will appreciate suggestions, inputs and review.

    1. git grep "nid == -1"
    2. git grep "node == -1"
    3. git grep "nid = -1"
    4. git grep "node = -1"

    This patch (of 2):

    At present there are multiple places where an invalid node number is
    encoded as -1. Even though implicitly understood, it is always better to
    have macros for it. Replace these open encodings for an invalid node
    number with the global macro NUMA_NO_NODE. This helps remove NUMA
    related assumptions like 'invalid node' from various places redirecting
    them to a common definition.
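
    The replacement is a one-liner wherever it occurs; for example
    (illustrative function):

    #include <linux/numa.h>
    #include <linux/topology.h>

    static int resolve_node(int nid)
    {
            /* was: if (nid == -1) */
            if (nid == NUMA_NO_NODE)
                    nid = numa_node_id();
            return nid;
    }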

    Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Anshuman Khandual
    Reviewed-by: David Hildenbrand
    Acked-by: Jeff Kirsher [ixgbe]
    Acked-by: Jens Axboe [mtip32xx]
    Acked-by: Vinod Koul [dmaengine.c]
    Acked-by: Michael Ellerman [powerpc]
    Acked-by: Doug Ledford [drivers/infiniband]
    Cc: Joseph Qi
    Cc: Hans Verkuil
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anshuman Khandual
     

04 Feb, 2019

3 commits

  • atomic_t variables are currently used to implement reference
    counters with the following properties:

    - counter is initialized to 1 using atomic_set()
    - a resource is freed upon counter reaching zero
    - once counter reaches zero, its further
    increments aren't allowed
    - counter schema uses basic atomic operations
    (set, inc, inc_not_zero, dec_and_test, etc.)

    Such atomic variables should be converted to a newly provided
    refcount_t type and API that prevents accidental counter overflows
    and underflows. This is important since overflows and underflows
    can lead to use-after-free situation and be exploitable.

    The variable task_struct.stack_refcount is used as pure reference counter.
    Convert it to refcount_t and fix up the operations.

    ** Important note for maintainers:

    Some functions from refcount_t API defined in lib/refcount.c
    have different memory ordering guarantees than their atomic
    counterparts.

    The full comparison can be seen in
    https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon
    in state to be merged to the documentation tree.

    Normally the differences should not matter since refcount_t provides
    enough guarantees to satisfy the refcounting use cases, but in
    some rare cases it might matter.

    Please double check that you don't have some undocumented
    memory guarantees for this variable usage.

    For the task_struct.stack_refcount it might make a difference
    in the following places:

    - try_get_task_stack(): increment in refcount_inc_not_zero() only
    guarantees control dependency on success vs. fully ordered
    atomic counterpart
    - put_task_stack(): decrement in refcount_dec_and_test() only
    provides RELEASE ordering and control dependency on success
    vs. fully ordered atomic counterpart
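
    The shape of the conversion, modeled on the try_get_task_stack() /
    put_task_stack() pair named above (illustrative type and release helper,
    not the actual patch):

    #include <linux/refcount.h>

    struct stack_holder {
            refcount_t      stack_refcount;         /* was: atomic_t */
            void            *stack;
    };

    static void release_stack(struct stack_holder *h)
    {
            /* illustrative: free the stack resource here */
    }

    static void holder_init(struct stack_holder *h)
    {
            refcount_set(&h->stack_refcount, 1);    /* was: atomic_set(..., 1) */
    }

    static void *try_get_stack(struct stack_holder *h)
    {
            /* was: atomic_inc_not_zero(); only a control dependency on success */
            return refcount_inc_not_zero(&h->stack_refcount) ? h->stack : NULL;
    }

    static void put_stack(struct stack_holder *h)
    {
            /* was: atomic_dec_and_test(); RELEASE + control dependency on success */
            if (refcount_dec_and_test(&h->stack_refcount))
                    release_stack(h);
    }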

    Suggested-by: Kees Cook
    Signed-off-by: Elena Reshetova
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: David Windsor
    Reviewed-by: Hans Liljestrand
    Reviewed-by: Andrea Parri
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: viro@zeniv.linux.org.uk
    Link: https://lkml.kernel.org/r/1547814450-18902-6-git-send-email-elena.reshetova@intel.com
    Signed-off-by: Ingo Molnar

    Elena Reshetova
     
  • atomic_t variables are currently used to implement reference
    counters with the following properties:

    - counter is initialized to 1 using atomic_set()
    - a resource is freed upon counter reaching zero
    - once counter reaches zero, its further
    increments aren't allowed
    - counter schema uses basic atomic operations
    (set, inc, inc_not_zero, dec_and_test, etc.)

    Such atomic variables should be converted to a newly provided
    refcount_t type and API that prevents accidental counter overflows
    and underflows. This is important since overflows and underflows
    can lead to use-after-free situation and be exploitable.

    The variable task_struct.usage is used as pure reference counter.
    Convert it to refcount_t and fix up the operations.

    ** Important note for maintainers:

    Some functions from refcount_t API defined in lib/refcount.c
    have different memory ordering guarantees than their atomic
    counterparts.

    The full comparison can be seen in
    https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon
    in state to be merged to the documentation tree.

    Normally the differences should not matter since refcount_t provides
    enough guarantees to satisfy the refcounting use cases, but in
    some rare cases it might matter.

    Please double check that you don't have some undocumented
    memory guarantees for this variable usage.

    For the task_struct.usage it might make a difference
    in the following places:

    - put_task_struct(): decrement in refcount_dec_and_test() only
    provides RELEASE ordering and control dependency on success
    vs. fully ordered atomic counterpart

    Suggested-by: Kees Cook
    Signed-off-by: Elena Reshetova
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: David Windsor
    Reviewed-by: Hans Liljestrand
    Reviewed-by: Andrea Parri
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: viro@zeniv.linux.org.uk
    Link: https://lkml.kernel.org/r/1547814450-18902-5-git-send-email-elena.reshetova@intel.com
    Signed-off-by: Ingo Molnar

    Elena Reshetova
     
  • atomic_t variables are currently used to implement reference
    counters with the following properties:

    - counter is initialized to 1 using atomic_set()
    - a resource is freed upon counter reaching zero
    - once counter reaches zero, its further
    increments aren't allowed
    - counter schema uses basic atomic operations
    (set, inc, inc_not_zero, dec_and_test, etc.)

    Such atomic variables should be converted to a newly provided
    refcount_t type and API that prevents accidental counter overflows
    and underflows. This is important since overflows and underflows
    can lead to use-after-free situation and be exploitable.

    The variable signal_struct.sigcnt is used as pure reference counter.
    Convert it to refcount_t and fix up the operations.

    ** Important note for maintainers:

    Some functions from refcount_t API defined in lib/refcount.c
    have different memory ordering guarantees than their atomic
    counterparts.

    The full comparison can be seen in
    https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon
    in state to be merged to the documentation tree.

    Normally the differences should not matter since refcount_t provides
    enough guarantees to satisfy the refcounting use cases, but in
    some rare cases it might matter.

    Please double check that you don't have some undocumented
    memory guarantees for this variable usage.

    For the signal_struct.sigcnt it might make a difference
    in the following places:

    - put_signal_struct(): decrement in refcount_dec_and_test() only
    provides RELEASE ordering and control dependency on success
    vs. fully ordered atomic counterpart

    Suggested-by: Kees Cook
    Signed-off-by: Elena Reshetova
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: David Windsor
    Reviewed-by: Hans Liljestrand
    Reviewed-by: Andrea Parri
    Reviewed-by: Oleg Nesterov
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: viro@zeniv.linux.org.uk
    Link: https://lkml.kernel.org/r/1547814450-18902-3-git-send-email-elena.reshetova@intel.com
    Signed-off-by: Ingo Molnar

    Elena Reshetova
     

26 Jan, 2019

1 commit

  • loginuid and sessionid (and audit_log_session_info) should be part of
    CONFIG_AUDIT scope and not CONFIG_AUDITSYSCALL since it is used in
    CONFIG_CHANGE, ANOM_LINK, FEATURE_CHANGE (and INTEGRITY_RULE), none of
    which are otherwise dependent on AUDITSYSCALL.

    Please see github issue
    https://github.com/linux-audit/audit-kernel/issues/104

    Signed-off-by: Richard Guy Briggs
    [PM: tweaked subject line for better grep'ing]
    Signed-off-by: Paul Moore

    Richard Guy Briggs
     

10 Aug, 2018

1 commit

  • Wen Yang and majiang
    report that a periodic signal received during fork can cause fork to
    continually restart, preventing an application from making progress.

    The code was being overly pessimistic. Fork needs to guarantee that a
    signal sent to multiple processes is logically delivered before the
    fork and just to the forking process or logically delivered after the
    fork to both the forking process and its newly spawned child. For
    signals like periodic timers that are always delivered to a single
    process, fork can safely complete and let them appear to be logically
    delivered after the fork().

    While examining this issue I also discovered that fork today will miss
    signals delivered to multiple processes during the fork and handled by
    another thread. Similarly the current code will also miss blocked
    signals that are delivered to multiple processes, as those signals will
    not appear pending during fork.

    Add a list of each thread that is currently forking, and keep on that
    list a signal set that records all of the signals sent to multiple
    processes. When fork completes, initialize the new process's
    shared_pending signal set with it. The calculate_sigpending function
    will see those signals and set TIF_SIGPENDING, causing the new task to
    take the slow path to userspace to handle those signals, making it
    appear as if those signals were received immediately after the fork.

    It is not possible to send real time signals to multiple processes and
    exceptions don't go to multiple processes, which means that there are
    no signals sent to multiple processes that require siginfo. This
    means it is safe to not bother collecting siginfo on signals sent
    during fork.

    The sigaction of a child of fork is initially the same as the
    sigaction of the parent process. So a signal the parent ignores, the
    child will also initially ignore. Therefore it is safe to ignore
    signals sent to multiple processes and ignored by the forking process.

    Signals sent to only a single process or only a single thread and delivered
    during fork are treated as if they are received after the fork, and generally
    not dealt with. They won't cause any problems.

    V2: Added removal from the multiprocess list on failure.
    V3: Use -ERESTARTNOINTR directly
    V4: - Don't queue both SIGCONT and SIGSTOP
    - Initialize signal_struct.multiprocess in init_task
    - Move setting of shared_pending to before the new task
    is visible to signals. This prevents signals from coming
    in before shared_pending.signal is set to delayed.signal
    and being lost.
    V5: - rework list add and delete to account for idle threads
    v6: - Use sigdelsetmask when removing stop signals

    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200447
    Reported-by: Wen Yang
    Reported-by: majiang
    Fixes: 4a2c7a7837da ("[PATCH] make fork() atomic wrt pgrp/session signals")
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

21 Jul, 2018

3 commits

  • Everywhere except in the pid array we distinguish between a task's pid and
    a task's tgid (thread group id). Even in the enumeration we want that
    distinction sometimes, so we have added __PIDTYPE_TGID. With leader_pid
    we almost have an implementation of PIDTYPE_TGID in struct signal_struct.

    Add PIDTYPE_TGID as a first class member of the pid_type enumeration and
    into the pids array. Then remove the __PIDTYPE_TGID special case and the
    leader_pid in signal_struct.

    The net size increase is just an extra pointer added to struct pid and
    an extra pair of pointers of an hlist_node added to task_struct.

    The effect on code maintenance is the removal of a number of special
    cases today and the potential to remove many more special cases as
    PIDTYPE_TGID gets used to its fullest. The long term potential
    is allowing zombie thread group leaders to exit, which will remove
    a lot more special cases in the code.

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • To access these fields the code always has to go to the group leader, so
    going to the signal struct is no loss and is actually a fundamental simplification.

    This saves a little bit of memory by only allocating the pid pointer array
    once instead of once for every thread, and even better this removes a
    few potential races caused by the fact that group_leader can be changed
    by de_thread, while signal_struct can not.

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • This is cheap and has no runtime cost, so we might as well.

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

15 May, 2018

1 commit


17 Jan, 2018

2 commits