21 Jun, 2018

1 commit

  • [ Upstream commit b5bf9a90bbebffba888c9144c5a8a10317b04064 ]

    Gaurav reported a perceived problem with TASK_PARKED, which turned out
    to be a broken wait-loop pattern in __kthread_parkme(), but the
    reported issue can (and does) in fact happen for states that do not do
    condition based sleeps.

    When the 'current->state = TASK_RUNNING' store of a previous
    (concurrent) try_to_wake_up() collides with the setting of a 'special'
    sleep state, we can lose the sleep state.

    Normal condition based wait-loops are immune to this problem, but
    sleep states that are not condition based are subject to it.

    There already is a fix for TASK_DEAD. Abstract that and also apply it
    to TASK_STOPPED and TASK_TRACED, both of which are also without a
    condition based wait-loop.

    Reported-by: Gaurav Kohli
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Oleg Nesterov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
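
    The difference between the two sleep patterns can be illustrated with a
    toy, single-threaded model. This is only a sketch under stated
    assumptions: the names sim_task, sim_set_state and sim_schedule are
    hypothetical, the stale_wakeups counter stands in for a late
    'state = TASK_RUNNING' store from an earlier wakeup, and none of this is
    kernel code.

```c
#include <assert.h>
#include <stdbool.h>

enum sim_state { SIM_RUNNING, SIM_SLEEPING };

struct sim_task {
    enum sim_state state;
    int stale_wakeups;   /* late "state = RUNNING" stores still in flight */
};

/* Store a state; a pending stale wakeup may overwrite a sleep state. */
static void sim_set_state(struct sim_task *t, enum sim_state s)
{
    t->state = s;
    if (s == SIM_SLEEPING && t->stale_wakeups > 0) {
        t->stale_wakeups--;
        t->state = SIM_RUNNING;   /* the stale store wins the race */
    }
}

/*
 * Block only if the task still looks asleep. In this model the genuine
 * wakeup arrives while blocked and sets *cond. Returns true if it blocked.
 */
static bool sim_schedule(struct sim_task *t, bool *cond)
{
    assert(t && cond);
    if (t->state == SIM_RUNNING)
        return false;
    *cond = true;
    t->state = SIM_RUNNING;
    return true;
}

/* Condition based loop: a lost sleep state only costs one extra pass. */
static bool wait_with_condition(struct sim_task *t)
{
    bool cond = false, slept = false;

    for (;;) {
        sim_set_state(t, SIM_SLEEPING);
        if (cond)
            break;
        if (sim_schedule(t, &cond))
            slept = true;
    }
    sim_set_state(t, SIM_RUNNING);
    return slept;
}

/* Special state without a condition: a lost sleep state is never retried. */
static bool wait_without_condition(struct sim_task *t)
{
    bool cond = false;

    sim_set_state(t, SIM_SLEEPING);
    return sim_schedule(t, &cond);
}
```

    With one stale wakeup pending, the condition based variant still ends up
    blocking, while the unconditional variant returns without ever sleeping.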
     

22 Feb, 2018

1 commit

  • commit 75f296d93bcebcfe375884ddac79e30263a31766 upstream.

    Convert all allocations that used a NOTRACK flag to stop using it.

    Link: http://lkml.kernel.org/r/20171007030159.22241-3-alexander.levin@verizon.com
    Signed-off-by: Sasha Levin
    Cc: Alexander Potapenko
    Cc: Eric W. Biederman
    Cc: Michal Hocko
    Cc: Pekka Enberg
    Cc: Steven Rostedt
    Cc: Tim Hansen
    Cc: Vegard Nossum
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Levin, Alexander (Sasha Levin)
     

10 Jan, 2018

3 commits

  • commit 426915796ccaf9c2bd9bb06dc5702225957bc2e5 upstream.

    complete_signal() checks SIGNAL_UNKILLABLE before it starts to destroy
    the thread group; today this is wrong in many ways.

    If nothing else, fatal_signal_pending() should always imply that the
    whole thread group (except ->group_exit_task if it is not NULL) is
    killed; this check breaks the rule.

    After the previous changes we can rely on sig_task_ignored();
    sig_fatal(sig) && SIGNAL_UNKILLABLE can only be true if we actually want
    to kill this task and sig == SIGKILL OR it is traced and debugger can
    intercept the signal.

    This should hopefully fix the problem reported by Dmitry. This
    test-case

    static int init(void *arg)
    {
            for (;;)
                    pause();
    }

    int main(void)
    {
            char stack[16 * 1024];

            for (;;) {
                    int pid = clone(init, stack + sizeof(stack)/2,
                                    CLONE_NEWPID | SIGCHLD, NULL);
                    assert(pid > 0);

                    assert(ptrace(PTRACE_ATTACH, pid, 0, 0) == 0);
                    assert(waitpid(-1, NULL, WSTOPPED) == pid);

                    assert(ptrace(PTRACE_DETACH, pid, 0, SIGSTOP) == 0);
                    assert(syscall(__NR_tkill, pid, SIGKILL) == 0);
                    assert(pid == wait(NULL));
            }
    }

    triggers the WARN_ON_ONCE(!(task->jobctl & JOBCTL_STOP_PENDING)) in
    task_participate_group_stop(). do_signal_stop()->signal_group_exit()
    checks SIGNAL_GROUP_EXIT and returns false, but task_set_jobctl_pending()
    checks fatal_signal_pending() and does not set JOBCTL_STOP_PENDING.

    And this should fix the minor security problem reported by Kyle,
    SECCOMP_RET_TRACE can miss fatal_signal_pending() the same way if the
    task is the root of a pid namespace.

    Link: http://lkml.kernel.org/r/20171103184246.GD21036@redhat.com
    Signed-off-by: Oleg Nesterov
    Reported-by: Dmitry Vyukov
    Reported-by: Kyle Huey
    Reviewed-by: Kees Cook
    Tested-by: Kyle Huey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Oleg Nesterov
     
  • commit ac25385089f673560867eb5179228a44ade0cfc1 upstream.

    Change sig_task_ignored() to drop the SIG_DFL && !sig_kernel_only()
    signals even if force == T. This simplifies the next change and this
    matches the same check in get_signal() which will drop these signals
    anyway.

    Link: http://lkml.kernel.org/r/20171103184227.GC21036@redhat.com
    Signed-off-by: Oleg Nesterov
    Tested-by: Kyle Huey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Oleg Nesterov
     
  • commit 628c1bcba204052d19b686b5bac149a644cdb72e upstream.

    The comment in sig_ignored() says "Tracers may want to know about even
    ignored signals" but SIGKILL can not be reported to debugger and it is
    just wrong to return 0 in this case: SIGKILL should only kill the
    SIGNAL_UNKILLABLE task if it comes from the parent ns.

    Change sig_ignored() to ignore ->ptrace if sig == SIGKILL and rely on
    sig_task_ignored().

    SIGSTOP coming from within the namespace is not really right either, but
    at least the debugger can intercept it, and we can't drop it here because
    this will break "gdb -p 1": ptrace_attach() won't work. Perhaps we will
    add another ->ptrace check later, we will see.

    Link: http://lkml.kernel.org/r/20171103184206.GB21036@redhat.com
    Signed-off-by: Oleg Nesterov
    Tested-by: Kyle Huey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Oleg Nesterov
     

02 Nov, 2017

1 commit

  • Commit cc731525f26a ("signal: Remove kernel interal si_code magic")
    added a check for SIGMET and NSIGEMT being defined. That SIGMET should
    in fact be SIGEMT, with SIGEMT being defined in
    arch/{alpha,mips,sparc}/include/uapi/asm/signal.h

    This was actually pointed out by Ben Hutchings in a lwn.net comment
    here https://lwn.net/Comments/734608/

    Fixes: cc731525f26a ("signal: Remove kernel interal si_code magic")
    Signed-off-by: Andrew Clayton
    Signed-off-by: "Eric W. Biederman"

    Andrew Clayton
     

12 Sep, 2017

1 commit

  • Pull namespace updates from Eric Biederman:
    "Life has been busy and I have not gotten half as much done this round
    as I would have liked. I delayed it so that a minor conflict
    resolution with the mips tree could spend a little time in linux-next
    before I sent this pull request.

    This includes two long delayed user namespace changes from Kirill
    Tkhai. It also includes a very useful change from Serge Hallyn that
    allows the security capability attribute to be used inside of user
    namespaces. The practical effect of this is people can now untar
    tarballs and install rpms in user namespaces. It had been suggested to
    generalize this and encode some of the namespace information
    in the xattr name. Upon close inspection that makes the
    things that should be hard easy and the things that should be easy
    more expensive.

    Then there is my bugfix/cleanup for signal injection that removes the
    magic encoding of the siginfo union member from the kernel internal
    si_code. The mips folks reported that the case where I had used
    FPE_FIXME is impossible, so I have removed FPE_FIXME from mips, while at
    the same time including a return statement in that case to keep gcc from
    complaining about uninitialized variables.

    I almost finished the work to make copy_siginfo_to_user a trivial
    copy to user. The code is available at:

    git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git neuter-copy_siginfo_to_user-v3

    But I did not have time/energy to get the code posted and reviewed
    before the merge window opened.

    I was able to see that the security excuse for just copying fields
    that we know are initialized doesn't work in practice: there are buggy
    initializations that don't initialize the proper fields in siginfo. So
    we still sometimes copy uninitialized data to userspace"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    Introduce v3 namespaced file capabilities
    mips/signal: In force_fcr31_sig return in the impossible case
    signal: Remove kernel interal si_code magic
    fcntl: Don't use ambiguous SIG_POLL si_codes
    prctl: Allow local CAP_SYS_ADMIN changing exe_file
    security: Use user_namespace::level to avoid redundant iterations in cap_capable()
    userns,pidns: Verify the userns for new pid namespaces
    signal/testing: Don't look for __SI_FAULT in userspace
    signal/mips: Document a conflict with SI_USER with SIGFPE
    signal/sparc: Document a conflict with SI_USER with SIGFPE
    signal/ia64: Document a conflict with SI_USER with SIGFPE
    signal/alpha: Document a conflict with SI_USER for SIGTRAP

    Linus Torvalds
     

19 Aug, 2017

1 commit

  • When forcing a signal, SIGNAL_UNKILLABLE is removed to prevent recursive
    faults, but this is undesirable when tracing. For example, debugging an
    init process (whether global or namespace), hitting a breakpoint and
    SIGTRAP will force SIGTRAP and then remove SIGNAL_UNKILLABLE.
    Everything continues fine, but then once debugging has finished, the
    init process is left killable, which is unlikely to be what the user
    expects, resulting in either an accidentally killed init or an init that
    stops reaping zombies.

    Link: http://lkml.kernel.org/r/20170815112806.10728-1-jamie.iles@oracle.com
    Signed-off-by: Jamie Iles
    Acked-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jamie Iles
     

07 Aug, 2017

1 commit

  • The latest change of compat_sys_sigpending in commit 8f13621abced
    ("sigpending(): move compat to native") has broken it in two ways.

    First, it tries to write 4 bytes more than userspace expects:
    sizeof(old_sigset_t) == sizeof(long) == 8 instead of
    sizeof(compat_old_sigset_t) == sizeof(u32) == 4.

    Second, on big endian architectures these bytes are being written in the
    wrong order.

    This bug was found by strace test suite.

    Reported-by: Anatoly Pugachev
    Inspired-by: Eugene Syromyatnikov
    Fixes: 8f13621abced ("sigpending(): move compat to native")
    Signed-off-by: Dmitry V. Levin
    Acked-by: Al Viro
    Signed-off-by: Linus Torvalds

    Dmitry V. Levin
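
    As a rough userspace illustration of the size mismatch described in this
    commit (not the kernel code itself), the sketch below narrows a 64-bit
    pending mask to the 32-bit value a compat caller expects before copying
    it out. The type names and put_compat_sigpending() are assumptions made
    for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins: native mask is 64 bits wide, compat mask 32 bits. */
typedef uint64_t native_old_sigset;
typedef uint32_t compat_old_sigset;

/*
 * Copy the pending mask into a buffer sized for a 32-bit caller.
 * Narrowing first means exactly 4 bytes are written, and they hold the
 * low 32 bits of the mask regardless of host endianness.
 * Returns the number of bytes written.
 */
static size_t put_compat_sigpending(void *user_buf, native_old_sigset pending)
{
    compat_old_sigset narrow = (compat_old_sigset)pending;

    assert(user_buf);
    memcpy(user_buf, &narrow, sizeof(narrow));
    return sizeof(narrow);
}
```

    Copying sizeof(native_old_sigset) bytes instead would touch 4 bytes past
    the caller's buffer, and on a big endian host the first 4 bytes would be
    the high half of the mask rather than the low half.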
     

25 Jul, 2017

1 commit

  • struct siginfo is a union and the kernel since 2.4 has been hiding a union
    tag in the high 16bits of si_code using the values:
    __SI_KILL
    __SI_TIMER
    __SI_POLL
    __SI_FAULT
    __SI_CHLD
    __SI_RT
    __SI_MESGQ
    __SI_SYS

    While this looks plausible on the surface, in practice this situation has
    not worked well.

    - Injected positive signals are not copied to user space properly
    unless they have these magic high bits set.

    - Injected positive signals are not reported properly by signalfd
    unless they have these magic high bits set.

    - These kernel internal values leaked to userspace via ptrace_peek_siginfo

    - It was possible to inject these kernel internal values and cause
    the kernel to misbehave.

    - Kernel developers got confused and expected these kernel internal values
    in userspace in kernel self tests.

    - Kernel developers got confused and set si_code to __SI_FAULT which
    is SI_USER in userspace which causes userspace to think an ordinary user
    sent the signal and that it was not kernel generated.

    - The values make it impossible to reorganize the code to transform
    siginfo_copy_to_user into a plain copy_to_user, as si_code must
    be massaged before being passed to userspace.

    So remove these kernel internal si codes and make the kernel code simpler
    and more maintainable.

    To replace these kernel internal magic si_codes introduce the helper
    function siginfo_layout, that takes a signal number and an si_code and
    computes which union member of siginfo is being used. Have
    siginfo_layout return an enumeration so that gcc will have enough
    information to warn if a switch statement does not handle all of union
    members.

    A couple of architectures have a messed up ABI that defines signal
    specific duplications of SI_USER which causes more special cases in
    siginfo_layout than I would like. The good news is only problem
    architectures pay the cost.

    Update all of the code that used the previous magic __SI_ values to
    use the new SIL_ values and to call siginfo_layout to get those
    values. Except where not all of the cases are handled, remove the
    defaults in the switch statements so that if a new case is missed in
    the future the lack will show up at compile time.

    Modify the code that copies siginfo si_code to userspace to just copy
    the value and not cast si_code to a short first. The high bits are no
    longer used to hold a magic union member.

    Fixup the siginfo header files to stop including the __SI_ values in
    their constants and for the headers that were missing it to properly
    update the number of si_codes for each signal type.

    The fixes to copy_siginfo_from_user32 implementations have the
    interesting property that several of them previously should never have
    worked, as the __SI_ values they depended upon were kernel internal.
    With that dependency gone those implementations should work much
    better.

    The idea of not passing the __SI_ values out to userspace and then
    not reinserting them has been tested with criu and criu worked without
    changes.

    Ref: 2.4.0-test1
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
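
    The idea of computing the union member from the signal number and
    si_code, instead of hiding a tag in si_code, might look roughly like the
    userspace sketch below. It is a simplified assumption-laden example: the
    enum layout, layout_for() and layout_name() are hypothetical names, the
    set of layouts is reduced, and the mapping is not the kernel's
    siginfo_layout().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>

/* Hypothetical, reduced set of layouts; the real enumeration has more. */
enum layout {
    LAYOUT_KILL,   /* sender pid/uid only */
    LAYOUT_TIMER,  /* POSIX timer fields */
    LAYOUT_FAULT,  /* faulting address */
    LAYOUT_CHLD,   /* child status fields */
    LAYOUT_RT      /* pid/uid plus a sigval payload */
};

/* Derive the layout from (sig, si_code) with no extra bits in si_code. */
static enum layout layout_for(int sig, int si_code)
{
    assert(sig > 0);

    /* Generic codes mean the same thing for every signal. */
    if (si_code == SI_TIMER)
        return LAYOUT_TIMER;
    if (si_code == SI_QUEUE)
        return LAYOUT_RT;
    if (si_code == SI_USER)
        return LAYOUT_KILL;

    /* Remaining codes are signal specific, so the signal number decides. */
    switch (sig) {
    case SIGCHLD:
        return LAYOUT_CHLD;
    case SIGSEGV:
    case SIGBUS:
    case SIGILL:
    case SIGFPE:
        return LAYOUT_FAULT;
    default:
        return LAYOUT_KILL;
    }
}

/* No default case: gcc's -Wswitch can flag an unhandled enumerator. */
static const char *layout_name(enum layout l)
{
    switch (l) {
    case LAYOUT_KILL:  return "kill";
    case LAYOUT_TIMER: return "timer";
    case LAYOUT_FAULT: return "fault";
    case LAYOUT_CHLD:  return "chld";
    case LAYOUT_RT:    return "rt";
    }
    return "unknown";
}
```

    Returning an enumeration rather than an int is what lets the compiler
    check switch statements such as the one in layout_name() for
    completeness.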
     

11 Jul, 2017

1 commit

  • When running kill(72057458746458112, 0) in userspace I hit the following
    issue.

    UBSAN: Undefined behaviour in kernel/signal.c:1462:11
    negation of -2147483648 cannot be represented in type 'int':
    CPU: 226 PID: 9849 Comm: test Tainted: G B ---- ------- 3.10.0-327.53.58.70.x86_64_ubsan+ #116
    Hardware name: Huawei Technologies Co., Ltd. RH8100 V3/BC61PBIA, BIOS BLHSV028 11/11/2014
    Call Trace:
    dump_stack+0x19/0x1b
    ubsan_epilogue+0xd/0x50
    __ubsan_handle_negate_overflow+0x109/0x14e
    SYSC_kill+0x43e/0x4d0
    SyS_kill+0xe/0x10
    system_call_fastpath+0x16/0x1b

    Add code to avoid the UBSAN detection.

    [akpm@linux-foundation.org: tweak comment]
    Link: http://lkml.kernel.org/r/1496670008-59084-1-git-send-email-zhongjiang@huawei.com
    Signed-off-by: zhongjiang
    Cc: Oleg Nesterov
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Cc: Xishi Qiu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    zhongjiang
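
    The argument in the report has 0x80000000 in its low 32 bits, so once it
    is truncated to a 32-bit pid it presumably becomes INT_MIN, whose
    negation cannot be represented in an int. A minimal sketch of guarding
    against that follows; pgrp_from_negative_pid() is a hypothetical helper,
    and returning -ESRCH is an assumption about a reasonable error value
    rather than a statement of what the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/*
 * Hypothetical helper: map a pid argument below -1 to the process group
 * it names. Negating INT_MIN overflows, so reject it first.
 * Returns the group id, or -ESRCH when no valid group can be derived.
 */
static int pgrp_from_negative_pid(int pid)
{
    assert(pid < -1);

    if (pid == INT_MIN)
        return -ESRCH;  /* -INT_MIN is not representable in int */

    return -pid;
}
```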
     

07 Jul, 2017

1 commit

  • Pull misc compat stuff updates from Al Viro:
    "This part is basically untangling various compat stuff. Compat
    syscalls moved to their native counterparts, getting rid of quite a
    bit of double-copying and/or set_fs() uses. A lot of field-by-field
    copyin/copyout killed off.

    - kernel/compat.c is much closer to containing just the
    copyin/copyout of compat structs. Not all compat syscalls are gone
    from it yet, but it's getting there.

    - ipc/compat_mq.c killed off completely.

    - block/compat_ioctl.c cleaned up; floppy compat ioctls moved to
    drivers/block/floppy.c where they belong. Yes, there are several
    drivers that implement some of the same ioctls. Some are m68k and
    one is 32bit-only pmac. drivers/block/floppy.c is the only one in
    that bunch that can be built on biarch"

    * 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    mqueue: move compat syscalls to native ones
    usbdevfs: get rid of field-by-field copyin
    compat_hdio_ioctl: get rid of set_fs()
    take floppy compat ioctls to sodding floppy.c
    ipmi: get rid of field-by-field __get_user()
    ipmi: get COMPAT_IPMICTL_RECEIVE_MSG in sync with the native one
    rt_sigtimedwait(): move compat to native
    select: switch compat_{get,put}_fd_set() to compat_{get,put}_bitmap()
    put_compat_rusage(): switch to copy_to_user()
    sigpending(): move compat to native
    getrlimit()/setrlimit(): move compat to native
    times(2): move compat to native
    compat_{get,put}_bitmap(): use unsafe_{get,put}_user()
    fb_get_fscreeninfo(): don't bother with do_fb_ioctl()
    do_sigaltstack(): lift copying to/from userland into callers
    take compat_sys_old_getrlimit() to native syscall
    trim __ARCH_WANT_SYS_OLD_GETRLIMIT

    Linus Torvalds
     

04 Jul, 2017

2 commits

  • Pull timer updates from Thomas Gleixner:
    "A rather large update for timers/timekeeping:

    - compat syscall consolidation (Al Viro)

    - Posix timer consolidation (Christoph Hellwig / Thomas Gleixner)

    - Cleanup of the device tree based initialization for clockevents and
    clocksources (Daniel Lezcano)

    - Consolidation of the FTTMR010 clocksource/event driver (Linus
    Walleij)

    - The usual set of small fixes and updates all over the place"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (93 commits)
    timers: Make the cpu base lock raw
    clocksource/drivers/mips-gic-timer: Fix an error code in 'gic_clocksource_of_init()'
    clocksource/drivers/fsl_ftm_timer: Unmap region obtained by of_iomap
    clocksource/drivers/tcb_clksrc: Make IO endian agnostic
    clocksource/drivers/sun4i: Switch to the timer-of common init
    clocksource/drivers/timer-of: Fix invalid iomap check
    Revert "ktime: Simplify ktime_compare implementation"
    clocksource/drivers: Fix uninitialized variable use in timer_of_init
    kselftests: timers: Add test for frequency step
    kselftests: timers: Fix inconsistency-check to not ignore first timestamp
    time: Add warning about imminent deprecation of CONFIG_GENERIC_TIME_VSYSCALL_OLD
    time: Clean up CLOCK_MONOTONIC_RAW time handling
    posix-cpu-timers: Make timespec to nsec conversion safe
    itimer: Make timeval to nsec conversion range limited
    timers: Fix parameter description of try_to_del_timer_sync()
    ktime: Simplify ktime_compare implementation
    clocksource/drivers/fttmr010: Factor out clock read code
    clocksource/drivers/fttmr010: Implement delay timer
    clocksource/drivers: Add timer-of common init routine
    clocksource/drivers/tcb_clksrc: Save timer context on suspend/resume
    ...

    Linus Torvalds
     
  • Pull m68k updates from Geert Uytterhoeven:

    - NuBus improvements and cleanups

    - defconfig updates

    - Fix debugger syscall restart interactions, leading to the global
    removal of ptrace_signal_deliver()

    * tag 'm68k-for-v4.13-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
    m68k: Remove ptrace_signal_deliver
    m68k/defconfig: Update defconfigs for v4.12-rc1
    nubus: Fix pointer validation
    nubus: Remove slot zero probe

    Linus Torvalds
     

20 Jun, 2017

1 commit

  • This fixes debugger syscall restart interactions. A debugger that
    modifies the tracee's program counter is expected to set the orig_d0
    pseudo register to -1, to disable a possible syscall restart.

    This removes the last user of the ptrace_signal_deliver hook in the ptrace
    signal handling, so remove that as well.

    Signed-off-by: Andreas Schwab
    Signed-off-by: Geert Uytterhoeven

    Andreas Schwab
     

18 Jun, 2017

1 commit

  • Thomas Gleixner wrote:
    > The CRIU support added a 'feature' which allows a user space task to send
    > arbitrary (kernel) signals to itself. The changelog says:
    >
    > The kernel prevents sending of siginfo with positive si_code, because
    > these codes are reserved for kernel. I think we can allow a task to
    > send such a siginfo to itself. This operation should not be dangerous.
    >
    > Quite contrary to that claim, it turns out that it is outright dangerous
    > for signals with info->si_code == SI_TIMER. The following code sequence in
    > a user space task allows to crash the kernel:
    >
    > id = timer_create(CLOCK_XXX, ..... signo = SIGX);
    > timer_set(id, ....);
    > info->si_signo = SIGX;
    > info->si_code = SI_TIMER;
    > info->_sifields._timer._tid = id;
    > info->_sifields._timer._sys_private = 2;
    > rt_[tg]sigqueueinfo(..., SIGX, info);
    > sigemptyset(&sigset);
    > sigaddset(&sigset, SIGX);
    > rt_sigtimedwait(sigset, info);
    >
    > For timers based on CLOCK_PROCESS_CPUTIME_ID, CLOCK_THREAD_CPUTIME_ID this
    > results in a kernel crash because sigwait() dequeues the signal and the
    > dequeue code observes:
    >
    > info->si_code == SI_TIMER && info->_sifields._timer._sys_private != 0
    >
    > which triggers the following callchain:
    >
    > do_schedule_next_timer() -> posix_cpu_timer_schedule() -> arm_timer()
    >
    > arm_timer() executes a list_add() on the timer, which is already armed via
    > the timer_set() syscall. That's a double list add which corrupts the posix
    > cpu timer list. As a consequence the kernel crashes on the next operation
    > touching the posix cpu timer list.
    >
    > Posix clocks which are internally implemented based on hrtimers are not
    > affected by this because hrtimer_start() can handle already armed timers
    > nicely, but it's a reliable way to trigger the WARN_ON() in
    > hrtimer_forward(), which complains about calling that function on an
    > already armed timer.

    This problem has existed since the posix timer code was merged into
    2.5.63. A few releases earlier in 2.5.60 ptrace gained the ability to
    inject not just a signal (which linux has supported since 1.0) but the
    full siginfo of a signal.

    The core problem is that the code will reschedule in response to
    signals getting dequeued not just for signals the timers sent but
    for other signals that happen to have a si_code of SI_TIMER.

    Avoid this confusion by testing to see if the queued signal was
    preallocated as all timer signals are preallocated, and so far
    only the timer code preallocates signals.

    Move the check for whether a timer needs to be rescheduled up into
    collect_signal where the preallocation check must be performed,
    and pass the result back to dequeue_signal where the code reschedules
    timers. This makes it clear why the code cares about preallocated
    timers.

    Cc: stable@vger.kernel.org
    Reported-by: Thomas Gleixner
    History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
    Reference: 66dd34ad31e5 ("signal: allow to send any siginfo to itself")
    Reference: 1669ce53e2ff ("Add PTRACE_GETSIGINFO and PTRACE_SETSIGINFO")
    Fixes: db8b50ba75f2 ("[PATCH] POSIX clocks & timers")
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
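
    The check described in this commit can be sketched in userspace as
    follows. The struct queued_sig and needs_timer_resched() are
    hypothetical names chosen for the example, and the fields are a reduced
    model of a queued signal, not the kernel's data structures.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical queue entry: only the fields needed for the check. */
struct queued_sig {
    int  si_code;
    int  sys_private;   /* nonzero when the timer asked for a rearm */
    bool preallocated;  /* true only for entries the timer code allocated */
};

/*
 * Decide whether dequeuing this entry should rearm a timer.
 * Checking si_code alone is not enough, since userspace can inject
 * SI_TIMER; the preallocation flag identifies genuine timer signals.
 */
static bool needs_timer_resched(const struct queued_sig *q)
{
    assert(q);
    return q->preallocated &&
           q->si_code == SI_TIMER &&
           q->sys_private != 0;
}
```

    An injected entry that merely claims SI_TIMER fails the preallocation
    test, so it no longer reaches the timer rearm path in this model.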
     

10 Jun, 2017

2 commits


04 Jun, 2017

2 commits

  • That function is a misnomer. Rename it with a proper prefix to
    posixtimer_rearm().

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: John Stultz
    Link: http://lkml.kernel.org/r/20170530211656.811362578@linutronix.de

    Thomas Gleixner
     
  • Having it in asm-generic/siginfo.h doesn't make any sense as it is in no way
    architecture specific. Move it to posix-timers.h instead.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Thomas Gleixner
    Cc: linux-arch@vger.kernel.org
    Cc: Fenghua Yu
    Cc: Tony Luck
    Cc: linux-ia64@vger.kernel.org
    Cc: Arnd Bergmann
    Cc: sparclinux@vger.kernel.org
    Cc: "David S. Miller"
    Link: http://lkml.kernel.org/r/20170603190102.28866-4-hch@lst.de

    Christoph Hellwig
     

28 May, 2017

1 commit


11 May, 2017

1 commit

  • Pull RCU updates from Ingo Molnar:
    "The main changes are:

    - Debloat RCU headers

    - Parallelize SRCU callback handling (plus overlapping patches)

    - Improve the performance of Tree SRCU on a CPU-hotplug stress test

    - Documentation updates

    - Miscellaneous fixes"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
    rcu: Open-code the rcu_cblist_n_lazy_cbs() function
    rcu: Open-code the rcu_cblist_n_cbs() function
    rcu: Open-code the rcu_cblist_empty() function
    rcu: Separately compile large rcu_segcblist functions
    srcu: Debloat the header
    srcu: Adjust default auto-expediting holdoff
    srcu: Specify auto-expedite holdoff time
    srcu: Expedite first synchronize_srcu() when idle
    srcu: Expedited grace periods with reduced memory contention
    srcu: Make rcutorture writer stalls print SRCU GP state
    srcu: Exact tracking of srcu_data structures containing callbacks
    srcu: Make SRCU be built by default
    srcu: Fix Kconfig botch when SRCU not selected
    rcu: Make non-preemptive schedule be Tasks RCU quiescent state
    srcu: Expedite srcu_schedule_cbs_snp() callback invocation
    srcu: Parallelize callback handling
    kvm: Move srcu_struct fields to end of struct kvm
    rcu: Fix typo in PER_RCU_NODE_PERIOD header comment
    rcu: Use true/false in assignment to bool
    rcu: Use bool value directly
    ...

    Linus Torvalds
     

22 Apr, 2017

1 commit


19 Apr, 2017

1 commit

  • A group of Linux kernel hackers reported chasing a bug that resulted
    from their assumption that SLAB_DESTROY_BY_RCU provided an existence
    guarantee, that is, that no block from such a slab would be reallocated
    during an RCU read-side critical section. Of course, that is not the
    case. Instead, SLAB_DESTROY_BY_RCU only prevents freeing of an entire
    slab of blocks.

    However, there is a phrase for this, namely "type safety". This commit
    therefore renames SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU in order
    to avoid future instances of this sort of confusion.

    Signed-off-by: Paul E. McKenney
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrew Morton
    Acked-by: Johannes Weiner
    Acked-by: Vlastimil Babka
    [ paulmck: Add comments mentioning the old name, as requested by Eric
    Dumazet, in order to help people familiar with the old name find
    the new one. ]
    Acked-by: David Rientjes

    Paul E. McKenney
     

02 Mar, 2017

7 commits


28 Feb, 2017

1 commit

    Currently SS_AUTODISARM is not supported in compatibility mode, but does
    not return -EINVAL either. This makes dosemu built with -m32 on x86_64
    crash. Also the kernel's sigaltstack selftest fails if compiled with
    -m32.

    This patch adds the needed support.

    Link: http://lkml.kernel.org/r/20170205101213.8163-2-stsp@list.ru
    Signed-off-by: Stas Sergeev
    Cc: Milosz Tanski
    Cc: Andy Lutomirski
    Cc: Al Viro
    Cc: Arnd Bergmann
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Oleg Nesterov
    Cc: Nicolas Pitre
    Cc: Waiman Long
    Cc: Dave Hansen
    Cc: Dmitry Safonov
    Cc: Wang Xiaoqiang
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stas Sergeev
     

01 Feb, 2017

3 commits

  • Use the new nsec based cputime accessors as part of the whole cputime
    conversion from cputime_t to nsecs.

    Signed-off-by: Frederic Weisbecker
    Cc: Benjamin Herrenschmidt
    Cc: Fenghua Yu
    Cc: Heiko Carstens
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Stanislaw Gruszka
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1485832191-26889-16-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
    Now that most cputime readers use the transition API which returns the
    task cputime in old style cputime_t, we can safely store the cputime in
    nsecs. This will eventually make cputime statistics less opaque and more
    granular. Back and forth conversions between cputime_t and nsecs in order
    to deal with cputime_t random granularity won't be needed anymore.

    Signed-off-by: Frederic Weisbecker
    Cc: Benjamin Herrenschmidt
    Cc: Fenghua Yu
    Cc: Heiko Carstens
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Stanislaw Gruszka
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1485832191-26889-8-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • This API returns a task's cputime in cputime_t in order to ease the
    conversion of cputime internals to use nsecs units instead. Blindly
    converting all cputime readers to use this API now will later let us
    convert more smoothly and step by step all these places to use the
    new nsec based cputime.

    Signed-off-by: Frederic Weisbecker
    Cc: Benjamin Herrenschmidt
    Cc: Fenghua Yu
    Cc: Heiko Carstens
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Stanislaw Gruszka
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1485832191-26889-7-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

11 Jan, 2017

1 commit

  • Since commit 00cd5c37afd5 ("ptrace: permit ptracing of /sbin/init") we
    can now trace init processes. init is initially protected with
    SIGNAL_UNKILLABLE which will prevent fatal signals such as SIGSTOP, but
    there are a number of paths during tracing where SIGNAL_UNKILLABLE can
    be implicitly cleared.

    This can result in init becoming stoppable/killable after tracing. For
    example, running:

    while true; do kill -STOP 1; done &
    strace -p 1

    and then stopping strace and the kill loop will result in init being
    left in state TASK_STOPPED. Sending SIGCONT to init will resume it, but
    init will now respond to future SIGSTOP signals rather than ignoring
    them.

    Make sure that when setting SIGNAL_STOP_CONTINUED/SIGNAL_STOP_STOPPED
    that we don't clear SIGNAL_UNKILLABLE.

    Link: http://lkml.kernel.org/r/20170104122017.25047-1-jamie.iles@oracle.com
    Signed-off-by: Jamie Iles
    Acked-by: Oleg Nesterov
    Cc: Alexander Viro
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jamie Iles
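
    One way to update the stop-related flags without disturbing
    SIGNAL_UNKILLABLE is to mask out only the stop bits before setting the
    new ones. The sketch below assumes made-up bit values and a hypothetical
    set_stop_flags() helper; it illustrates the masking idea only.

```c
#include <assert.h>

/* Illustrative bit values only; the kernel's actual constants may differ. */
#define FLAG_STOP_STOPPED    0x01u
#define FLAG_STOP_CONTINUED  0x02u
#define FLAG_UNKILLABLE      0x40u
#define FLAG_STOP_MASK       (FLAG_STOP_STOPPED | FLAG_STOP_CONTINUED)

/*
 * Replace only the stop-related bits and leave every other flag,
 * including FLAG_UNKILLABLE, as it was.
 */
static unsigned int set_stop_flags(unsigned int flags, unsigned int stop_bits)
{
    assert((stop_bits & ~FLAG_STOP_MASK) == 0);
    return (flags & ~FLAG_STOP_MASK) | stop_bits;
}
```

    A plain assignment such as flags = FLAG_STOP_STOPPED would, by contrast,
    silently drop FLAG_UNKILLABLE.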
     

26 Dec, 2016

1 commit

    ktime is a union because the initial implementation stored the time in
    scalar nanoseconds on 64-bit machines and in an endianness-optimized
    timespec variant for 32-bit machines. The Y2038 cleanup removed the
    timespec variant and switched everything to scalar nanoseconds. The union
    remained, but became completely pointless.

    Get rid of the union and just keep ktime_t as a simple typedef of type s64.

    The conversion was done with coccinelle and some manual mopping up.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra

    Thomas Gleixner
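
    To show what a scalar time type buys, here is a small userspace sketch.
    The names ktime_sketch_t, ktime_sketch_set() and ktime_sketch_add() are
    hypothetical and only mimic the shape of such helpers; they are not the
    kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar time type: signed 64-bit nanoseconds. */
typedef int64_t ktime_sketch_t;

#define SKETCH_NSEC_PER_SEC INT64_C(1000000000)

/* Build a time value from seconds and nanoseconds. */
static ktime_sketch_t ktime_sketch_set(int64_t secs, int64_t nsecs)
{
    assert(nsecs >= 0 && nsecs < SKETCH_NSEC_PER_SEC);
    return secs * SKETCH_NSEC_PER_SEC + nsecs;
}

/* Plain integer addition, with no union member access needed. */
static ktime_sketch_t ktime_sketch_add(ktime_sketch_t a, ktime_sketch_t b)
{
    return a + b;
}
```

    Because the type is a plain integer, comparisons and arithmetic can use
    the ordinary operators directly.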
     

25 Dec, 2016

1 commit


15 Dec, 2016

1 commit

    When running a certain database workload on a high-end system with many
    CPUs, it was found that spinlock contention in the sigprocmask syscalls
    became a significant portion of the overall CPU cycles as shown below.

    9.30% 9.30% 905387 dataserver /proc/kcore 0x7fff8163f4d2
    [k] _raw_spin_lock_irq
    |
    ---_raw_spin_lock_irq
    |
    |--99.34%-- __set_current_blocked
    | sigprocmask
    | sys_rt_sigprocmask
    | system_call_fastpath
    | |
    | |--50.63%-- __swapcontext
    | | |
    | | |--99.91%-- upsleepgeneric
    | |
    | |--49.36%-- __setcontext
    | | ktskRun

    Looking further into the swapcontext function in glibc, it was found that
    the function always calls sigprocmask() without checking if there are
    changes in the signal mask.

    A check was added to the __set_current_blocked() function to avoid taking
    the sighand->siglock spinlock if there is no change in the signal mask.
    This will prevent unneeded spinlock contention when many threads are
    trying to call sigprocmask().

    With this patch applied, the spinlock contention in sigprocmask() was
    gone.

    Link: http://lkml.kernel.org/r/1474979209-11867-1-git-send-email-Waiman.Long@hpe.com
    Signed-off-by: Waiman Long
    Acked-by: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Stas Sergeev
    Cc: Scott J Norton
    Cc: Douglas Hatch
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Waiman Long
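
    The fast path described in this commit can be modelled in a few lines.
    In this sketch struct task_sketch and set_blocked() are hypothetical,
    and a counter stands in for acquiring sighand->siglock; it shows the
    early return only, not the kernel's locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical task state; lock_taken counts simulated lock acquisitions. */
struct task_sketch {
    uint64_t blocked;
    unsigned int lock_taken;
};

/*
 * Update the blocked mask, skipping the (simulated) lock entirely when
 * the new mask equals the current one.
 * Returns true if the mask was changed.
 */
static bool set_blocked(struct task_sketch *t, uint64_t newmask)
{
    assert(t);

    if (t->blocked == newmask)
        return false;        /* fast path: nothing to do, no lock */

    t->lock_taken++;         /* stands in for taking the siglock */
    t->blocked = newmask;
    return true;
}
```

    A caller that repeatedly installs the same mask, as the swapcontext
    pattern above does, then never reaches the locked section.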
     

16 Nov, 2016

1 commit

  • Some embedded systems have no use for them. This removes about
    25KB from the kernel binary size when configured out.

    Corresponding syscalls are routed to a stub logging the attempt to
    use those syscalls which should be enough of a clue if they were
    disabled without proper consideration. They are: timer_create,
    timer_gettime, timer_getoverrun, timer_settime, timer_delete,
    clock_adjtime, setitimer, getitimer, alarm.

    The clock_settime, clock_gettime, clock_getres and clock_nanosleep
    syscalls are replaced by simple wrappers compatible with CLOCK_REALTIME,
    CLOCK_MONOTONIC and CLOCK_BOOTTIME only which should cover the vast
    majority of use cases with very little code.

    Signed-off-by: Nicolas Pitre
    Acked-by: Richard Cochran
    Acked-by: Thomas Gleixner
    Acked-by: John Stultz
    Reviewed-by: Josh Triplett
    Cc: Paul Bolle
    Cc: linux-kbuild@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Cc: Michal Marek
    Cc: Edward Cree
    Link: http://lkml.kernel.org/r/1478841010-28605-7-git-send-email-nicolas.pitre@linaro.org
    Signed-off-by: Thomas Gleixner

    Nicolas Pitre