13 Feb, 2018

2 commits

  • commit dc8635b78cd8669c37e230058d18c33af7451ab1 upstream.

    gcc -fisolate-erroneous-paths-dereference can generate calls to abort()
    from modular code too.

    [arnd@arndb.de: drop duplicate exports of abort()]
    Link: http://lkml.kernel.org/r/20180102103311.706364-1-arnd@arndb.de
    Reported-by: Vineet Gupta
    Cc: Sudip Mukherjee
    Cc: Arnd Bergmann
    Cc: Alexey Brodkin
    Cc: Russell King
    Cc: Jose Abreu
    Signed-off-by: Andrew Morton
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Linus Torvalds
    Cc: Evgeniy Didin
    Signed-off-by: Greg Kroah-Hartman

    Andrew Morton
     
  • commit 7c2c11b208be09c156573fc0076b7b3646e05219 upstream.

    gcc toggle -fisolate-erroneous-paths-dereference (default at -O2
    onwards) isolates faulty code paths such as null pointer access, divide
    by zero etc. If gcc port doesnt implement __builtin_trap, an abort() is
    generated which causes kernel link error.

    In this case, gcc is generating abort due to 'divide by zero' in
    lib/mpi/mpih-div.c.

    Currently 'frv' and 'arc' are failing. Previously other arch was also
    broken like m32r was fixed by commit d22e3d69ee1a ("m32r: fix build
    failure").

    Let's define this weak function which is common for all arch and fix the
    problem permanently. We can even remove the arch specific 'abort' after
    this is done.

    Link: http://lkml.kernel.org/r/1513118956-8718-1-git-send-email-sudipm.mukherjee@gmail.com
    Signed-off-by: Sudip Mukherjee
    Cc: Alexey Brodkin
    Cc: Vineet Gupta
    Cc: Sudip Mukherjee
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Evgeniy Didin
    Signed-off-by: Greg Kroah-Hartman

    Sudip Mukherjee
     

21 Oct, 2017

1 commit

  • As pointed out by Linus and David, the earlier waitid() fix resulted in
    a (currently harmless) unbalanced user_access_end() call. This fixes it
    to just directly return EFAULT on access_ok() failure.

    Fixes: 96ca579a1ecc ("waitid(): Add missing access_ok() checks")
    Acked-by: David Daney
    Cc: Al Viro
    Signed-off-by: Kees Cook
    Signed-off-by: Linus Torvalds

    Kees Cook
     

10 Oct, 2017

1 commit

  • Adds missing access_ok() checks.

    CVE-2017-5123

    Reported-by: Chris Salls
    Signed-off-by: Kees Cook
    Acked-by: Al Viro
    Fixes: 4c48abe91be0 ("waitid(): switch copyout of siginfo to unsafe_put_user()")
    Cc: stable@kernel.org # 4.13
    Signed-off-by: Linus Torvalds

    Kees Cook
     

30 Sep, 2017

1 commit

  • kernel_waitid() can return a PID, an error or 0. rusage is filled in the first
    case and waitid(2) rusage should've been copied out exactly in that case, *not*
    whenever kernel_waitid() has not returned an error. Compat variant shares that
    braino; none of kernel_wait4() callers do, so the below ought to fix it.

    Reported-and-tested-by: Alexander Potapenko
    Fixes: ce72a16fa705 ("wait4(2)/waitid(2): separate copying rusage to userland")
    Cc: stable@vger.kernel.org # v4.13
    Signed-off-by: Al Viro

    Al Viro
     

12 Sep, 2017

1 commit

  • Pull namespace updates from Eric Biederman:
    "Life has been busy and I have not gotten half as much done this round
    as I would have liked. I delayed it so that a minor conflict
    resolution with the mips tree could spend a little time in linux-next
    before I sent this pull request.

    This includes two long delayed user namespace changes from Kirill
    Tkhai. It also includes a very useful change from Serge Hallyn that
    allows the security capability attribute to be used inside of user
    namespaces. The practical effect of this is people can now untar
    tarballs and install rpms in user namespaces. It had been suggested to
    generalize this and encode some of the namespace information
    information in the xattr name. Upon close inspection that makes the
    things that should be hard easy and the things that should be easy
    more expensive.

    Then there is my bugfix/cleanup for signal injection that removes the
    magic encoding of the siginfo union member from the kernel internal
    si_code. The mips folks reported the case where I had used FPE_FIXME
    me is impossible so I have remove FPE_FIXME from mips, while at the
    same time including a return statement in that case to keep gcc from
    complaining about unitialized variables.

    I almost finished the work to get make copy_siginfo_to_user a trivial
    copy to user. The code is available at:

    git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git neuter-copy_siginfo_to_user-v3

    But I did not have time/energy to get the code posted and reviewed
    before the merge window opened.

    I was able to see that the security excuse for just copying fields
    that we know are initialized doesn't work in practice there are buggy
    initializations that don't initialize the proper fields in siginfo. So
    we still sometimes copy unitialized data to userspace"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    Introduce v3 namespaced file capabilities
    mips/signal: In force_fcr31_sig return in the impossible case
    signal: Remove kernel interal si_code magic
    fcntl: Don't use ambiguous SIG_POLL si_codes
    prctl: Allow local CAP_SYS_ADMIN changing exe_file
    security: Use user_namespace::level to avoid redundant iterations in cap_capable()
    userns,pidns: Verify the userns for new pid namespaces
    signal/testing: Don't look for __SI_FAULT in userspace
    signal/mips: Document a conflict with SI_USER with SIGFPE
    signal/sparc: Document a conflict with SI_USER with SIGFPE
    signal/ia64: Document a conflict with SI_USER with SIGFPE
    signal/alpha: Document a conflict with SI_USER for SIGTRAP

    Linus Torvalds
     

05 Sep, 2017

1 commit

  • Pull locking updates from Ingo Molnar:

    - Add 'cross-release' support to lockdep, which allows APIs like
    completions, where it's not the 'owner' who releases the lock, to be
    tracked. It's all activated automatically under
    CONFIG_PROVE_LOCKING=y.

    - Clean up (restructure) the x86 atomics op implementation to be more
    readable, in preparation of KASAN annotations. (Dmitry Vyukov)

    - Fix static keys (Paolo Bonzini)

    - Add killable versions of down_read() et al (Kirill Tkhai)

    - Rework and fix jump_label locking (Marc Zyngier, Paolo Bonzini)

    - Rework (and fix) tlb_flush_pending() barriers (Peter Zijlstra)

    - Remove smp_mb__before_spinlock() and convert its usages, introduce
    smp_mb__after_spinlock() (Peter Zijlstra)

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
    locking/lockdep/selftests: Fix mixed read-write ABBA tests
    sched/completion: Avoid unnecessary stack allocation for COMPLETION_INITIALIZER_ONSTACK()
    acpi/nfit: Fix COMPLETION_INITIALIZER_ONSTACK() abuse
    locking/pvqspinlock: Relax cmpxchg's to improve performance on some architectures
    smp: Avoid using two cache lines for struct call_single_data
    locking/lockdep: Untangle xhlock history save/restore from task independence
    locking/refcounts, x86/asm: Disable CONFIG_ARCH_HAS_REFCOUNT for the time being
    futex: Remove duplicated code and fix undefined behaviour
    Documentation/locking/atomic: Finish the document...
    locking/lockdep: Fix workqueue crossrelease annotation
    workqueue/lockdep: 'Fix' flush_work() annotation
    locking/lockdep/selftests: Add mixed read-write ABBA tests
    mm, locking/barriers: Clarify tlb_flush_pending() barriers
    locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS truly non-interactive
    locking/lockdep: Explicitly initialize wq_barrier::done::map
    locking/lockdep: Rename CONFIG_LOCKDEP_COMPLETE to CONFIG_LOCKDEP_COMPLETIONS
    locking/lockdep: Reword title of LOCKDEP_CROSSRELEASE config
    locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE part of CONFIG_PROVE_LOCKING
    locking/refcounts, x86/asm: Implement fast refcount overflow protection
    locking/lockdep: Fix the rollback and overwrite detection logic in crossrelease
    ...

    Linus Torvalds
     

17 Aug, 2017

3 commits

  • …isc.2017.08.17a', 'spin_unlock_wait_no.2017.08.17a', 'srcu.2017.07.27c' and 'torture.2017.07.24c' into HEAD

    doc.2017.08.17a: Documentation updates.
    fixes.2017.08.17a: RCU fixes.
    hotplug.2017.07.25b: CPU-hotplug updates.
    misc.2017.08.17a: Miscellaneous fixes outside of RCU (give or take conflicts).
    spin_unlock_wait_no.2017.08.17a: Remove spin_unlock_wait().
    srcu.2017.07.27c: SRCU updates.
    torture.2017.07.24c: Torture-test updates.

    Paul E. McKenney
     
  • There is no agreed-upon definition of spin_unlock_wait()'s semantics, and
    it appears that all callers could do just as well with a lock/unlock pair.
    This commit therefore replaces the spin_unlock_wait() call in do_exit()
    with spin_lock() followed immediately by spin_unlock(). This should be
    safe from a performance perspective because the lock is a per-task lock,
    and this is happening only at task-exit time.

    Signed-off-by: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Will Deacon
    Cc: Alan Stern
    Cc: Andrea Parri
    Cc: Linus Torvalds

    Paul E. McKenney
     
  • Currently, the exit-time support for TASKS_RCU is open-coded in do_exit().
    This commit creates exit_tasks_rcu_start() and exit_tasks_rcu_finish()
    APIs for do_exit() use. This has the benefit of confining the use of the
    tasks_rcu_exit_srcu variable to one file, allowing it to become static.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

10 Aug, 2017

1 commit

  • Lockdep is a runtime locking correctness validator that detects and
    reports a deadlock or its possibility by checking dependencies between
    locks. It's useful since it does not report just an actual deadlock but
    also the possibility of a deadlock that has not actually happened yet.
    That enables problems to be fixed before they affect real systems.

    However, this facility is only applicable to typical locks, such as
    spinlocks and mutexes, which are normally released within the context in
    which they were acquired. However, synchronization primitives like page
    locks or completions, which are allowed to be released in any context,
    also create dependencies and can cause a deadlock.

    So lockdep should track these locks to do a better job. The 'crossrelease'
    implementation makes these primitives also be tracked.

    Signed-off-by: Byungchul Park
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: boqun.feng@gmail.com
    Cc: kernel-team@lge.com
    Cc: kirill@shutemov.name
    Cc: npiggin@gmail.com
    Cc: walken@google.com
    Cc: willy@infradead.org
    Link: http://lkml.kernel.org/r/1502089981-21272-6-git-send-email-byungchul.park@lge.com
    Signed-off-by: Ingo Molnar

    Byungchul Park
     

25 Jul, 2017

1 commit

  • struct siginfo is a union and the kernel since 2.4 has been hiding a union
    tag in the high 16bits of si_code using the values:
    __SI_KILL
    __SI_TIMER
    __SI_POLL
    __SI_FAULT
    __SI_CHLD
    __SI_RT
    __SI_MESGQ
    __SI_SYS

    While this looks plausible on the surface, in practice this situation has
    not worked well.

    - Injected positive signals are not copied to user space properly
    unless they have these magic high bits set.

    - Injected positive signals are not reported properly by signalfd
    unless they have these magic high bits set.

    - These kernel internal values leaked to userspace via ptrace_peek_siginfo

    - It was possible to inject these kernel internal values and cause the
    the kernel to misbehave.

    - Kernel developers got confused and expected these kernel internal values
    in userspace in kernel self tests.

    - Kernel developers got confused and set si_code to __SI_FAULT which
    is SI_USER in userspace which causes userspace to think an ordinary user
    sent the signal and that it was not kernel generated.

    - The values make it impossible to reorganize the code to transform
    siginfo_copy_to_user into a plain copy_to_user. As si_code must
    be massaged before being passed to userspace.

    So remove these kernel internal si codes and make the kernel code simpler
    and more maintainable.

    To replace these kernel internal magic si_codes introduce the helper
    function siginfo_layout, that takes a signal number and an si_code and
    computes which union member of siginfo is being used. Have
    siginfo_layout return an enumeration so that gcc will have enough
    information to warn if a switch statement does not handle all of union
    members.

    A couple of architectures have a messed up ABI that defines signal
    specific duplications of SI_USER which causes more special cases in
    siginfo_layout than I would like. The good news is only problem
    architectures pay the cost.

    Update all of the code that used the previous magic __SI_ values to
    use the new SIL_ values and to call siginfo_layout to get those
    values. Escept where not all of the cases are handled remove the
    defaults in the switch statements so that if a new case is missed in
    the future the lack will show up at compile time.

    Modify the code that copies siginfo si_code to userspace to just copy
    the value and not cast si_code to a short first. The high bits are no
    longer used to hold a magic union member.

    Fixup the siginfo header files to stop including the __SI_ values in
    their constants and for the headers that were missing it to properly
    update the number of si_codes for each signal type.

    The fixes to copy_siginfo_from_user32 implementations has the
    interesting property that several of them perviously should never have
    worked as the __SI_ values they depended up where kernel internal.
    With that dependency gone those implementations should work much
    better.

    The idea of not passing the __SI_ values out to userspace and then
    not reinserting them has been tested with criu and criu worked without
    changes.

    Ref: 2.4.0-test1
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

11 Jul, 2017

1 commit

  • wait4(-2147483648, 0x20, 0, 0xdd0000) triggers:
    UBSAN: Undefined behaviour in kernel/exit.c:1651:9

    The related calltrace is as follows:

    negation of -2147483648 cannot be represented in type 'int':
    CPU: 9 PID: 16482 Comm: zj Tainted: G B ---- ------- 3.10.0-327.53.58.71.x86_64+ #66
    Hardware name: Huawei Technologies Co., Ltd. Tecal RH2285 /BC11BTSA , BIOS CTSAV036 04/27/2011
    Call Trace:
    dump_stack+0x19/0x1b
    ubsan_epilogue+0xd/0x50
    __ubsan_handle_negate_overflow+0x109/0x14e
    SyS_wait4+0x1cb/0x1e0
    system_call_fastpath+0x16/0x1b

    Exclude the overflow to avoid the UBSAN warning.

    Link: http://lkml.kernel.org/r/1497264618-20212-1-git-send-email-zhongjiang@huawei.com
    Signed-off-by: zhongjiang
    Cc: Oleg Nesterov
    Cc: David Rientjes
    Cc: Aneesh Kumar K.V
    Cc: Kirill A. Shutemov
    Cc: Xishi Qiu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    zhongjiang
     

08 Jul, 2017

1 commit

  • We lose the distinction between "found a PID" and "nothing, but that's not
    an error" a bit too early in waitid(). Easily fixed, fortunately...

    Reported-by: Markus Trippelsdorf
    Fixes: 67d7ddded322 ("waitid(2): leave copyout of siginfo to syscall itself")
    Tested-by: Markus Trippelsdorf
    Signed-off-by: Al Viro

    Al Viro
     

07 Jul, 2017

1 commit

  • Commit dd0db88d8094 ("userfaultfd: non-cooperative: rollback
    userfaultfd_exit") removed userfaultfd callback from exit() which makes
    the include of unnecessary.

    Link: http://lkml.kernel.org/r/1494930907-3060-1-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

06 Jul, 2017

1 commit

  • Pull wait syscall updates from Al Viro:
    "Consolidating sys_wait* and compat counterparts.

    Gets rid of set_fs()/double-copy mess, simplifies the whole thing
    (lifting the copyouts to the syscalls means less headache in the part
    that does actual work - fewer failure exits, to start with), gets rid
    of the overhead of field-by-field __put_user()"

    * 'work.sys_wait' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    osf_wait4: switch to kernel_wait4()
    waitid(): switch copyout of siginfo to unsafe_put_user()
    wait_task_zombie: consolidate info logics
    kill wait_noreap_copyout()
    lift getrusage() from wait_noreap_copyout()
    waitid(2): leave copyout of siginfo to syscall itself
    kernel_wait4()/kernel_waitid(): delay copying status to userland
    wait4(2)/waitid(2): separate copying rusage to userland
    move compat wait4 and waitid next to native variants

    Linus Torvalds
     

20 Jun, 2017

2 commits

  • This function was introduced by:

    150593bf8693 ("sched/api: Introduce task_rcu_dereference() and try_get_task_struct()")

    ... to allow easier usage of task_rcu_dereference(), however no users
    were ever added. Drop the helper.

    Signed-off-by: Davidlohr Bueso
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dave@stgolabs.net
    Link: http://lkml.kernel.org/r/20170615023730.22827-1-dave@stgolabs.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     
  • Rename:

    wait_queue_t => wait_queue_entry_t

    'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
    but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
    which had to carry the name.

    Start sorting this out by renaming it to 'wait_queue_entry_t'.

    This also allows the real structure name 'struct __wait_queue' to
    lose its double underscore and become 'struct wait_queue_entry',
    which is the more canonical nomenclature for such data types.

    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

22 May, 2017

9 commits


10 Mar, 2017

1 commit

  • Patch series "userfaultfd non-cooperative further update for 4.11 merge
    window".

    Unfortunately I noticed one relevant bug in userfaultfd_exit while doing
    more testing. I've been doing testing before and this was also tested
    by kbuild bot and exercised by the selftest, but this bug never
    reproduced before.

    I dropped userfaultfd_exit as result. I dropped it because of
    implementation difficulty in receiving signals in __mmput and because I
    think -ENOSPC as result from the background UFFDIO_COPY should be enough
    already.

    Before I decided to remove userfaultfd_exit, I noticed userfaultfd_exit
    wasn't exercised by the selftest and when I tried to exercise it, after
    moving it to a more correct place in __mmput where it would make more
    sense and where the vma list is stable, it resulted in the
    event_wait_completion in D state. So then I added the second patch to
    be sure even if we call userfaultfd_event_wait_completion too late
    during task exit(), we won't risk to generate tasks in D state. The
    same check exists in handle_userfault() for the same reason, except it
    makes a difference there, while here is just a robustness check and it's
    run under WARN_ON_ONCE.

    While looking at the userfaultfd_event_wait_completion() function I
    looked back at its callers too while at it and I think it's not ok to
    stop executing dup_fctx on the fcs list because we relay on
    userfaultfd_event_wait_completion to execute
    userfaultfd_ctx_put(fctx->orig) which is paired against
    userfaultfd_ctx_get(fctx->orig) in dup_userfault just before
    list_add(fcs). This change only takes care of fctx->orig but this area
    also needs further review looking for similar problems in fctx->new.

    The only patch that is urgent is the first because it's an use after
    free during a SMP race condition that affects all processes if
    CONFIG_USERFAULTFD=y. Very hard to reproduce though and probably
    impossible without SLUB poisoning enabled.

    This patch (of 3):

    I once reproduced this oops with the userfaultfd selftest, it's not
    easily reproducible and it requires SLUB poisoning to reproduce.

    general protection fault: 0000 [#1] SMP
    Modules linked in:
    CPU: 2 PID: 18421 Comm: userfaultfd Tainted: G ------------ T 3.10.0+ #15
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
    task: ffff8801f83b9440 ti: ffff8801f833c000 task.ti: ffff8801f833c000
    RIP: 0010:[] [] userfaultfd_exit+0x29/0xa0
    RSP: 0018:ffff8801f833fe80 EFLAGS: 00010202
    RAX: ffff8801f833ffd8 RBX: 6b6b6b6b6b6b6b6b RCX: ffff8801f83b9440
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800baf18600
    RBP: ffff8801f833fee8 R08: 0000000000000000 R09: 0000000000000001
    R10: 0000000000000000 R11: ffffffff8127ceb3 R12: 0000000000000000
    R13: ffff8800baf186b0 R14: ffff8801f83b99f8 R15: 00007faed746c700
    FS: 0000000000000000(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00007faf0966f028 CR3: 0000000001bc6000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Call Trace:
    do_exit+0x297/0xd10
    SyS_exit+0x17/0x20
    tracesys+0xdd/0xe2
    Code: 00 00 66 66 66 66 90 55 48 89 e5 41 54 53 48 83 ec 58 48 8b 1f 48 85 db 75 11 eb 73 66 0f 1f 44 00 00 48 8b 5b 10 48 85 db 74 64 8b a3 b8 00 00 00 4d 85 e4 74 eb 41 f6 84 24 2c 01 00 00 80
    RIP [] userfaultfd_exit+0x29/0xa0
    RSP
    ---[ end trace 9fecd6dcb442846a ]---

    In the debugger I located the "mm" pointer in the stack and walking
    mm->mmap->vm_next through the end shows the vma->vm_next list is fully
    consistent and it is null terminated list as expected. So this has to
    be an SMP race condition where userfaultfd_exit was running while the
    vma list was being modified by another CPU.

    When userfaultfd_exit() run one of the ->vm_next pointers pointed to
    SLAB_POISON (RBX is the vma pointer and is 0x6b6b..).

    The reason is that it's not running in __mmput but while there are still
    other threads running and it's not holding the mmap_sem (it can't as it
    has to wait the even to be received by the manager). So this is an use
    after free that was happening for all processes.

    One more implementation problem aside from the race condition:
    userfaultfd_exit has really to check a flag in mm->flags before walking
    the vma or it's going to slowdown the exit() path for regular tasks.

    One more implementation problem: at that point signals can't be
    delivered so it would also create a task in D state if the manager
    doesn't read the event.

    The major design issue: it overall looks superfluous as the manager can
    check for -ENOSPC in the background transfer:

    if (mmget_not_zero(ctx->mm)) {
    [..]
    } else {
    return -ENOSPC;
    }

    It's safer to roll it back and re-introduce it later if at all.

    [rppt@linux.vnet.ibm.com: documentation fixup after removal of UFFD_EVENT_EXIT]
    Link: http://lkml.kernel.org/r/1488345437-4364-1-git-send-email-rppt@linux.vnet.ibm.com
    Link: http://lkml.kernel.org/r/20170224181957.19736-2-aarcange@redhat.com
    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Mike Rapoport
    Acked-by: Mike Rapoport
    Cc: "Dr. David Alan Gilbert"
    Cc: Mike Kravetz
    Cc: Pavel Emelyanov
    Cc: Hillf Danton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

02 Mar, 2017

6 commits

  • …linux/sched/cputime.h>

    Introduce a trivial, mostly empty <linux/sched/cputime.h> header
    to prepare for the moving of cputime functionality out of sched.h.

    Update all code that relies on these facilities.

    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     
  • We are going to split out of , which
    will have to be picked up from other headers and a couple of .c files.

    Create a trivial placeholder file that just
    maps to to make this patch obviously correct and
    bisectable.

    Include the new header in the files that are going to need it.

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • We are going to split out of , which
    will have to be picked up from other headers and a couple of .c files.

    Create a trivial placeholder file that just
    maps to to make this patch obviously correct and
    bisectable.

    Include the new header in the files that are going to need it.

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • We are going to split out of , which
    will have to be picked up from other headers and a couple of .c files.

    Create a trivial placeholder file that just
    maps to to make this patch obviously correct and
    bisectable.

    Include the new header in the files that are going to need it.

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • We are going to split out of , which
    will have to be picked up from other headers and a couple of .c files.

    Create a trivial placeholder file that just
    maps to to make this patch obviously correct and
    bisectable.

    The APIs that are going to be moved first are:

    mm_alloc()
    __mmdrop()
    mmdrop()
    mmdrop_async_fn()
    mmdrop_async()
    mmget_not_zero()
    mmput()
    mmput_async()
    get_task_mm()
    mm_access()
    mm_release()

    Include the new header in the files that are going to need it.

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • We are going to split out of , which
    will have to be picked up from other headers and a couple of .c files.

    Create a trivial placeholder file that just
    maps to to make this patch obviously correct and
    bisectable.

    Include the new header in the files that are going to need it.

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

28 Feb, 2017

1 commit

  • Apart from adding the helper function itself, the rest of the kernel is
    converted mechanically using:

    git grep -l 'atomic_inc.*mm_count' | xargs sed -i 's/atomic_inc(&\(.*\)->mm_count);/mmgrab\(\1\);/'
    git grep -l 'atomic_inc.*mm_count' | xargs sed -i 's/atomic_inc(&\(.*\)\.mm_count);/mmgrab\(\&\1\);/'

    This is needed for a later patch that hooks into the helper, but might
    be a worthwhile cleanup on its own.

    (Michal Hocko provided most of the kerneldoc comment.)

    Link: http://lkml.kernel.org/r/20161218123229.22952-1-vegard.nossum@oracle.com
    Signed-off-by: Vegard Nossum
    Acked-by: Michal Hocko
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vegard Nossum
     

25 Feb, 2017

1 commit

  • Allow userfaultfd monitor track termination of the processes that have
    memory backed by the uffd.

    [rppt@linux.vnet.ibm.com: add comment]
    Link: http://lkml.kernel.org/r/20170202135448.GB19804@rapoport-lnxLink: http://lkml.kernel.org/r/1485542673-24387-4-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Acked-by: Hillf Danton
    Cc: Andrea Arcangeli
    Cc: "Dr. David Alan Gilbert"
    Cc: Mike Kravetz
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

24 Feb, 2017

1 commit

  • Pull namespace updates from Eric Biederman:
    "There is a lot here. A lot of these changes result in subtle user
    visible differences in kernel behavior. I don't expect anything will
    care but I will revert/fix things immediately if any regressions show
    up.

    From Seth Forshee there is a continuation of the work to make the vfs
    ready for unpriviled mounts. We had thought the previous changes
    prevented the creation of files outside of s_user_ns of a filesystem,
    but it turns we missed the O_CREAT path. Ooops.

    Pavel Tikhomirov and Oleg Nesterov worked together to fix a long
    standing bug in the implemenation of PR_SET_CHILD_SUBREAPER where only
    children that are forked after the prctl are considered and not
    children forked before the prctl. The only known user of this prctl
    systemd forks all children after the prctl. So no userspace
    regressions will occur. Holding earlier forked children to the same
    rules as later forked children creates a semantic that is sane enough
    to allow checkpoing of processes that use this feature.

    There is a long delayed change by Nikolay Borisov to limit inotify
    instances inside a user namespace.

    Michael Kerrisk extends the API for files used to maniuplate
    namespaces with two new trivial ioctls to allow discovery of the
    hierachy and properties of namespaces.

    Konstantin Khlebnikov with the help of Al Viro adds code that when a
    network namespace exits purges it's sysctl entries from the dcache. As
    in some circumstances this could use a lot of memory.

    Vivek Goyal fixed a bug with stacked filesystems where the permissions
    on the wrong inode were being checked.

    I continue previous work on ptracing across exec. Allowing a file to
    be setuid across exec while being ptraced if the tracer has enough
    credentials in the user namespace, and if the process has CAP_SETUID
    in it's own namespace. Proc files for setuid or otherwise undumpable
    executables are now owned by the root in the user namespace of their
    mm. Allowing debugging of setuid applications in containers to work
    better.

    A bug I introduced with permission checking and automount is now
    fixed. The big change is to mark the mounts that the kernel initiates
    as a result of an automount. This allows the permission checks in sget
    to be safely suppressed for this kind of mount. As the permission
    check happened when the original filesystem was mounted.

    Finally a special case in the mount namespace is removed preventing
    unbounded chains in the mount hash table, and making the semantics
    simpler which benefits CRIU.

    The vfs fix along with related work in ima and evm I believe makes us
    ready to finish developing and merge fully unprivileged mounts of the
    fuse filesystem. The cleanups of the mount namespace makes discussing
    how to fix the worst case complexity of umount. The stacked filesystem
    fixes pave the way for adding multiple mappings for the filesystem
    uids so that efficient and safer containers can be implemented"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    proc/sysctl: Don't grab i_lock under sysctl_lock.
    vfs: Use upper filesystem inode in bprm_fill_uid()
    proc/sysctl: prune stale dentries during unregistering
    mnt: Tuck mounts under others instead of creating shadow/side mounts.
    prctl: propagate has_child_subreaper flag to every descendant
    introduce the walk_process_tree() helper
    nsfs: Add an ioctl() to return owner UID of a userns
    fs: Better permission checking for submounts
    exit: fix the setns() && PR_SET_CHILD_SUBREAPER interaction
    vfs: open() with O_CREAT should not create inodes with unknown ids
    nsfs: Add an ioctl() to return the namespace type
    proc: Better ownership of files for non-dumpable tasks in user namespaces
    exec: Remove LSM_UNSAFE_PTRACE_CAP
    exec: Test the ptracer's saved cred to see if the tracee can gain caps
    exec: Don't reset euid and egid when the tracee has CAP_SETUID
    inotify: Convert to using per-namespace limits

    Linus Torvalds
     

22 Feb, 2017

1 commit

  • Pull security layer updates from James Morris:
    "Highlights:

    - major AppArmor update: policy namespaces & lots of fixes

    - add /sys/kernel/security/lsm node for easy detection of loaded LSMs

    - SELinux cgroupfs labeling support

    - SELinux context mounts on tmpfs, ramfs, devpts within user
    namespaces

    - improved TPM 2.0 support"

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (117 commits)
    tpm: declare tpm2_get_pcr_allocation() as static
    tpm: Fix expected number of response bytes of TPM1.2 PCR Extend
    tpm xen: drop unneeded chip variable
    tpm: fix misspelled "facilitate" in module parameter description
    tpm_tis: fix the error handling of init_tis()
    KEYS: Use memzero_explicit() for secret data
    KEYS: Fix an error code in request_master_key()
    sign-file: fix build error in sign-file.c with libressl
    selinux: allow changing labels for cgroupfs
    selinux: fix off-by-one in setprocattr
    tpm: silence an array overflow warning
    tpm: fix the type of owned field in cap_t
    tpm: add securityfs support for TPM 2.0 firmware event log
    tpm: enhance read_log_of() to support Physical TPM event log
    tpm: enhance TPM 2.0 PCR extend to support multiple banks
    tpm: implement TPM 2.0 capability to get active PCR banks
    tpm: fix RC value check in tpm2_seal_trusted
    tpm_tis: fix iTPM probe via probe_itpm() function
    tpm: Begin the process to deprecate user_read_timer
    tpm: remove tpm_read_index and tpm_write_index from tpm.h
    ...

    Linus Torvalds
     

21 Feb, 2017

1 commit

  • Pull locking updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Implement wraparound-safe refcount_t and kref_t types based on
    generic atomic primitives (Peter Zijlstra)

    - Improve and fix the ww_mutex code (Nicolai Hähnle)

    - Add self-tests to the ww_mutex code (Chris Wilson)

    - Optimize percpu-rwsems with the 'rcuwait' mechanism (Davidlohr
    Bueso)

    - Micro-optimize the current-task logic all around the core kernel
    (Davidlohr Bueso)

    - Tidy up after recent optimizations: remove stale code and APIs,
    clean up the code (Waiman Long)

    - ... plus misc fixes, updates and cleanups"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
    fork: Fix task_struct alignment
    locking/spinlock/debug: Remove spinlock lockup detection code
    lockdep: Fix incorrect condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKS
    lkdtm: Convert to refcount_t testing
    kref: Implement 'struct kref' using refcount_t
    refcount_t: Introduce a special purpose refcount type
    sched/wake_q: Clarify queue reinit comment
    sched/wait, rcuwait: Fix typo in comment
    locking/mutex: Fix lockdep_assert_held() fail
    locking/rtmutex: Flip unlikely() branch to likely() in __rt_mutex_slowlock()
    locking/rwsem: Reinit wake_q after use
    locking/rwsem: Remove unnecessary atomic_long_t casts
    jump_labels: Move header guard #endif down where it belongs
    locking/atomic, kref: Implement kref_put_lock()
    locking/ww_mutex: Turn off __must_check for now
    locking/atomic, kref: Avoid more abuse
    locking/atomic, kref: Use kref_get_unless_zero() more
    locking/atomic, kref: Kill kref_sub()
    locking/atomic, kref: Add kref_read()
    locking/atomic, kref: Add KREF_INIT()
    ...

    Linus Torvalds
     

01 Feb, 2017

1 commit

  • Now that most cputime readers use the transition API which return the
    task cputime in old style cputime_t, we can safely store the cputime in
    nsecs. This will eventually make cputime statistics less opaque and more
    granular. Back and forth convertions between cputime_t and nsecs in order
    to deal with cputime_t random granularity won't be needed anymore.

    Signed-off-by: Frederic Weisbecker
    Cc: Benjamin Herrenschmidt
    Cc: Fenghua Yu
    Cc: Heiko Carstens
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Stanislaw Gruszka
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1485832191-26889-8-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker