27 Mar, 2019

1 commit

  • commit 5a07168d8d89b00fe1760120714378175b3ef992 upstream.

    The futex code requires that the user space addresses of futexes are 32bit
    aligned. sys_futex() checks this in futex_get_keys() but the robust list
    code has no alignment check in place.

    As a consequence the kernel crashes on architectures with strict alignment
    requirements in handle_futex_death() when trying to cmpxchg() on an
    unaligned futex address which was retrieved from the robust list.

    [ tglx: Rewrote changelog, proper sizeof() based alignement check and add
    comment ]

    Fixes: 0771dfefc9e5 ("[PATCH] lightweight robust futexes: core")
    Signed-off-by: Chen Jie
    Signed-off-by: Thomas Gleixner
    Cc:
    Cc:
    Cc:
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/1552621478-119787-1-git-send-email-chenjie6@huawei.com
    Signed-off-by: Greg Kroah-Hartman

    Chen Jie
     

06 Mar, 2019

1 commit

  • [ Upstream commit b061c38bef43406df8e73c5be06cbfacad5ee6ad ]

    We must not rely on wake_q_add() to delay the wakeup; in particular
    commit:

    1d0dcb3ad9d3 ("futex: Implement lockless wakeups")

    moved wake_q_add() before smp_store_release(&q->lock_ptr, NULL), which
    could result in futex_wait() waking before observing ->lock_ptr ==
    NULL and going back to sleep again.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: 1d0dcb3ad9d3 ("futex: Implement lockless wakeups")
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin

    Peter Zijlstra
     

13 Feb, 2019

1 commit

  • commit 1a1fb985f2e2b85ec0d3dc2e519ee48389ec2434 upstream.

    commit 56222b212e8e ("futex: Drop hb->lock before enqueueing on the
    rtmutex") changed the locking rules in the futex code so that the hash
    bucket lock is not longer held while the waiter is enqueued into the
    rtmutex wait list. This made the lock and the unlock path symmetric, but
    unfortunately the possible early exit from __rt_mutex_proxy_start() due to
    a detected deadlock was not updated accordingly. That allows a concurrent
    unlocker to observe inconsitent state which triggers the warning in the
    unlock path.

    futex_lock_pi() futex_unlock_pi()
    lock(hb->lock)
    queue(hb_waiter) lock(hb->lock)
    lock(rtmutex->wait_lock)
    unlock(hb->lock)
    // acquired hb->lock
    hb_waiter = futex_top_waiter()
    lock(rtmutex->wait_lock)
    __rt_mutex_proxy_start()
    ---> fail
    remove(rtmutex_waiter);
    ---> returns -EDEADLOCK
    unlock(rtmutex->wait_lock)
    // acquired wait_lock
    wake_futex_pi()
    rt_mutex_next_owner()
    --> returns NULL
    --> WARN

    lock(hb->lock)
    unqueue(hb_waiter)

    The problem is caused by the remove(rtmutex_waiter) in the failure case of
    __rt_mutex_proxy_start() as this lets the unlocker observe a waiter in the
    hash bucket but no waiter on the rtmutex, i.e. inconsistent state.

    The original commit handles this correctly for the other early return cases
    (timeout, signal) by delaying the removal of the rtmutex waiter until the
    returning task reacquired the hash bucket lock.

    Treat the failure case of __rt_mutex_proxy_start() in the same way and let
    the existing cleanup code handle the eventual handover of the rtmutex
    gracefully. The regular rt_mutex_proxy_start() gains the rtmutex waiter
    removal for the failure case, so that the other callsites are still
    operating correctly.

    Add proper comments to the code so all these details are fully documented.

    Thanks to Peter for helping with the analysis and writing the really
    valuable code comments.

    Fixes: 56222b212e8e ("futex: Drop hb->lock before enqueueing on the rtmutex")
    Reported-by: Heiko Carstens
    Co-developed-by: Peter Zijlstra
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Thomas Gleixner
    Tested-by: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: linux-s390@vger.kernel.org
    Cc: Stefan Liebler
    Cc: Sebastian Sewior
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1901292311410.1950@nanos.tec.linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

29 Dec, 2018

1 commit

  • commit da791a667536bf8322042e38ca85d55a78d3c273 upstream.

    Stefan reported, that the glibc tst-robustpi4 test case fails
    occasionally. That case creates the following race between
    sys_exit() and sys_futex_lock_pi():

    CPU0 CPU1

    sys_exit() sys_futex()
    do_exit() futex_lock_pi()
    exit_signals(tsk) No waiters:
    tsk->flags |= PF_EXITING; *uaddr == 0x00000PID
    mm_release(tsk) Set waiter bit
    exit_robust_list(tsk) { *uaddr = 0x80000PID;
    Set owner died attach_to_pi_owner() {
    *uaddr = 0xC0000000; tsk = get_task(PID);
    } if (!tsk->flags & PF_EXITING) {
    ... attach();
    tsk->flags |= PF_EXITPIDONE; } else {
    if (!(tsk->flags & PF_EXITPIDONE))
    return -EAGAIN;
    return -ESRCH;
    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra
    Cc: Heiko Carstens
    Cc: Darren Hart
    Cc: Ingo Molnar
    Cc: Sasha Levin
    Cc: stable@vger.kernel.org
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=200467
    Link: https://lkml.kernel.org/r/20181210152311.986181245@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

21 Aug, 2018

1 commit


07 Feb, 2018

1 commit

  • There are several functions that do find_task_by_vpid() followed by
    get_task_struct(). We can use a helper function instead.

    Link: http://lkml.kernel.org/r/1509602027-11337-1-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Acked-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

24 Jan, 2018

1 commit

  • Both Geert and DaveJ reported that the recent futex commit:

    c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex")

    introduced a problem with setting OWNER_DEAD. We set the bit on an
    uninitialized variable and then entirely optimize it away as a
    dead-store.

    Move the setting of the bit to where it is more useful.

    Reported-by: Geert Uytterhoeven
    Reported-by: Dave Jones
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex")
    Link: http://lkml.kernel.org/r/20180122103947.GD2228@hirez.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

15 Jan, 2018

2 commits

  • UBSAN reports signed integer overflow in kernel/futex.c:

    UBSAN: Undefined behaviour in kernel/futex.c:2041:18
    signed integer overflow:
    0 - -2147483648 cannot be represented in type 'int'

    Add a sanity check to catch negative values of nr_wake and nr_requeue.

    Signed-off-by: Li Jinyue
    Signed-off-by: Thomas Gleixner
    Cc: peterz@infradead.org
    Cc: dvhart@infradead.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/1513242294-31786-1-git-send-email-lijinyue@huawei.com

    Li Jinyue
     
  • Julia reported futex state corruption in the following scenario:

    waiter waker stealer (prio > waiter)

    futex(WAIT_REQUEUE_PI, uaddr, uaddr2,
    timeout=[N ms])
    futex_wait_requeue_pi()
    futex_wait_queue_me()
    freezable_schedule()

    futex(LOCK_PI, uaddr2)
    futex(CMP_REQUEUE_PI, uaddr,
    uaddr2, 1, 0)
    /* requeues waiter to uaddr2 */
    futex(UNLOCK_PI, uaddr2)
    wake_futex_pi()
    cmp_futex_value_locked(uaddr2, waiter)
    wake_up_q()

    task>
    futex(LOCK_PI, uaddr2)
    __rt_mutex_start_proxy_lock()
    try_to_take_rt_mutex() /* steals lock */
    rt_mutex_set_owner(lock, stealer)


    rt_mutex_wait_proxy_lock()
    __rt_mutex_slowlock()
    try_to_take_rt_mutex() /* fails, lock held by stealer */
    if (timeout && !timeout->task)
    return -ETIMEDOUT;
    fixup_owner()
    /* lock wasn't acquired, so,
    fixup_pi_state_owner skipped */

    return -ETIMEDOUT;

    /* At this point, we've returned -ETIMEDOUT to userspace, but the
    * futex word shows waiter to be the owner, and the pi_mutex has
    * stealer as the owner */

    futex_lock(LOCK_PI, uaddr2)
    -> bails with EDEADLK, futex word says we're owner.

    And suggested that what commit:

    73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state")

    removes from fixup_owner() looks to be just what is needed. And indeed
    it is -- I completely missed that requeue_pi could also result in this
    case. So we need to restore that, except that subsequent patches, like
    commit:

    16ffa12d7425 ("futex: Pull rt_mutex_futex_unlock() out from under hb->lock")

    changed all the locking rules. Even without that, the sequence:

    - if (rt_mutex_futex_trylock(&q->pi_state->pi_mutex)) {
    - locked = 1;
    - goto out;
    - }

    - raw_spin_lock_irq(&q->pi_state->pi_mutex.wait_lock);
    - owner = rt_mutex_owner(&q->pi_state->pi_mutex);
    - if (!owner)
    - owner = rt_mutex_next_owner(&q->pi_state->pi_mutex);
    - raw_spin_unlock_irq(&q->pi_state->pi_mutex.wait_lock);
    - ret = fixup_pi_state_owner(uaddr, q, owner);

    already suggests there were races; otherwise we'd never have to look
    at next_owner.

    So instead of doing 3 consecutive wait_lock sections with who knows
    what races, we do it all in a single section. Additionally, the usage
    of pi_state->owner in fixup_owner() was only safe because only the
    rt_mutex owner would modify it, which this additional case wrecks.

    Luckily the values can only change away and not to the value we're
    testing, this means we can do a speculative test and double check once
    we have the wait_lock.

    Fixes: 73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state")
    Reported-by: Julia Cartwright
    Reported-by: Gratian Crisan
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Tested-by: Julia Cartwright
    Tested-by: Gratian Crisan
    Cc: Darren Hart
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20171208124939.7livp7no2ov65rrc@hirez.programming.kicks-ass.net

    Peter Zijlstra
     

11 Dec, 2017

1 commit

  • sign_extend32 counts the sign bit parameter from 0, not from 1. So we
    have to use "11" for 12th bit, not "12".

    This mistake means we have not allowed negative op and cmp args since
    commit 30d6e0a4190d ("futex: Remove duplicated code and fix undefined
    behaviour") till now.

    Fixes: 30d6e0a4190d ("futex: Remove duplicated code and fix undefined behaviour")
    Signed-off-by: Jiri Slaby
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Darren Hart
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     

04 Nov, 2017

1 commit


02 Nov, 2017

1 commit

  • In commit 30d6e0a4190d ("futex: Remove duplicated code and fix undefined
    behaviour"), I let FUTEX_WAKE_OP to fail on invalid op. Namely when op
    should be considered as shift and the shift is out of range (< 0 or > 31).

    But strace's test suite does this madness:

    futex(0x7fabd78bcffc, 0x5, 0xfacefeed, 0xb, 0x7fabd78bcffc, 0xa0caffee);
    futex(0x7fabd78bcffc, 0x5, 0xfacefeed, 0xb, 0x7fabd78bcffc, 0xbadfaced);
    futex(0x7fabd78bcffc, 0x5, 0xfacefeed, 0xb, 0x7fabd78bcffc, 0xffffffff);

    When I pick the first 0xa0caffee, it decodes as:

    0x80000000 & 0xa0caffee: oparg is shift
    0x70000000 & 0xa0caffee: op is FUTEX_OP_OR
    0x0f000000 & 0xa0caffee: cmp is FUTEX_OP_CMP_EQ
    0x00fff000 & 0xa0caffee: oparg is sign-extended 0xcaf = -849
    0x00000fff & 0xa0caffee: cmparg is sign-extended 0xfee = -18

    That means the op tries to do this:

    (futex |= (1 << (-849))) == -18

    which is completely bogus. The new check of op in the code is:

    if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) {
    if (oparg < 0 || oparg > 31)
    return -EINVAL;
    oparg = 1 << oparg;
    }

    which results obviously in the "Invalid argument" errno:

    FAIL: futex
    ===========

    futex(0x7fabd78bcffc, 0x5, 0xfacefeed, 0xb, 0x7fabd78bcffc, 0xa0caffee) = -1: Invalid argument
    futex.test: failed test: ../futex failed with code 1

    So let us soften the failure to print only a (ratelimited) message, crop
    the value and continue as if it were right. When userspace keeps up, we
    can switch this to return -EINVAL again.

    [v2] Do not return 0 immediatelly, proceed with the cropped value.

    Fixes: 30d6e0a4190d ("futex: Remove duplicated code and fix undefined behaviour")
    Signed-off-by: Jiri Slaby
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Darren Hart
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     

01 Nov, 2017

1 commit

  • Dmitry (through syzbot) reported being able to trigger the WARN in
    get_pi_state() and a use-after-free on:

    raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);

    Both are due to this race:

    exit_pi_state_list() put_pi_state()

    lock(&curr->pi_lock)
    while() {
    pi_state = list_first_entry(head);
    hb = hash_futex(&pi_state->key);
    unlock(&curr->pi_lock);

    dec_and_test(&pi_state->refcount);

    lock(&hb->lock)
    lock(&pi_state->pi_mutex.wait_lock) // uaf if pi_state free'd
    lock(&curr->pi_lock);

    ....

    unlock(&curr->pi_lock);
    get_pi_state(); // WARN; refcount==0

    The problem is we take the reference count too late, and don't allow it
    being 0. Fix it by using inc_not_zero() and simply retrying the loop
    when we fail to get a refcount. In that case put_pi_state() should
    remove the entry from the list.

    Reported-by: Dmitry Vyukov
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Thomas Gleixner
    Cc: Gratian Crisan
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: dvhart@infradead.org
    Cc: syzbot
    Cc: syzkaller-bugs@googlegroups.com
    Cc:
    Fixes: c74aef2d06a9 ("futex: Fix pi_state->owner serialization")
    Link: http://lkml.kernel.org/r/20171031101853.xpfh72y643kdfhjs@hirez.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

25 Sep, 2017

1 commit

  • There was a reported suspicion about a race between exit_pi_state_list()
    and put_pi_state(). The same report mentioned the comment with
    put_pi_state() said it should be called with hb->lock held, and it no
    longer is in all places.

    As it turns out, the pi_state->owner serialization is indeed broken. As per
    the new rules:

    734009e96d19 ("futex: Change locking rules")

    pi_state->owner should be serialized by pi_state->pi_mutex.wait_lock.
    For the sites setting pi_state->owner we already hold wait_lock (where
    required) but exit_pi_state_list() and put_pi_state() were not and
    raced on clearing it.

    Fixes: 734009e96d19 ("futex: Change locking rules")
    Reported-by: Gratian Crisan
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Cc: dvhart@infradead.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20170922154806.jd3ffltfk24m4o4y@hirez.programming.kicks-ass.net

    Peter Zijlstra
     

26 Aug, 2017

1 commit

  • There is code duplicated over all architecture's headers for
    futex_atomic_op_inuser. Namely op decoding, access_ok check for uaddr,
    and comparison of the result.

    Remove this duplication and leave up to the arches only the needed
    assembly which is now in arch_futex_atomic_op_inuser.

    This effectively distributes the Will Deacon's arm64 fix for undefined
    behaviour reported by UBSAN to all architectures. The fix was done in
    commit 5f16a046f8e1 (arm64: futex: Fix undefined behaviour with
    FUTEX_OP_OPARG_SHIFT usage). Look there for an example dump.

    And as suggested by Thomas, check for negative oparg too, because it was
    also reported to cause undefined behaviour report.

    Note that s390 removed access_ok check in d12a29703 ("s390/uaccess:
    remove pointless access_ok() checks") as access_ok there returns true.
    We introduce it back to the helper for the sake of simplicity (it gets
    optimized away anyway).

    Signed-off-by: Jiri Slaby
    Signed-off-by: Thomas Gleixner
    Acked-by: Russell King
    Acked-by: Michael Ellerman (powerpc)
    Acked-by: Heiko Carstens [s390]
    Acked-by: Chris Metcalf [for tile]
    Reviewed-by: Darren Hart (VMware)
    Reviewed-by: Will Deacon [core/arm64]
    Cc: linux-mips@linux-mips.org
    Cc: Rich Felker
    Cc: linux-ia64@vger.kernel.org
    Cc: linux-sh@vger.kernel.org
    Cc: peterz@infradead.org
    Cc: Benjamin Herrenschmidt
    Cc: Max Filippov
    Cc: Paul Mackerras
    Cc: sparclinux@vger.kernel.org
    Cc: Jonas Bonn
    Cc: linux-s390@vger.kernel.org
    Cc: linux-arch@vger.kernel.org
    Cc: Yoshinori Sato
    Cc: linux-hexagon@vger.kernel.org
    Cc: Helge Deller
    Cc: "James E.J. Bottomley"
    Cc: Catalin Marinas
    Cc: Matt Turner
    Cc: linux-snps-arc@lists.infradead.org
    Cc: Fenghua Yu
    Cc: Arnd Bergmann
    Cc: linux-xtensa@linux-xtensa.org
    Cc: Stefan Kristiansson
    Cc: openrisc@lists.librecores.org
    Cc: Ivan Kokshaysky
    Cc: Stafford Horne
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: Richard Henderson
    Cc: Chris Zankel
    Cc: Michal Simek
    Cc: Tony Luck
    Cc: linux-parisc@vger.kernel.org
    Cc: Vineet Gupta
    Cc: Ralf Baechle
    Cc: Richard Kuo
    Cc: linux-alpha@vger.kernel.org
    Cc: Martin Schwidefsky
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: "David S. Miller"
    Link: http://lkml.kernel.org/r/20170824073105.3901-1-jslaby@suse.cz

    Jiri Slaby
     

10 Aug, 2017

2 commits

  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Commit 65d8fc777f6d ("futex: Remove requirement for lock_page() in
    get_futex_key()") removed an unnecessary lock_page() with the
    side-effect that page->mapping needed to be treated very carefully.

    Two defensive warnings were added in case any assumption was missed and
    the first warning assumed a correct application would not alter a
    mapping backing a futex key. Since merging, it has not triggered for
    any unexpected case but Mark Rutland reported the following bug
    triggering due to the first warning.

    kernel BUG at kernel/futex.c:679!
    Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
    Modules linked in:
    CPU: 0 PID: 3695 Comm: syz-executor1 Not tainted 4.13.0-rc3-00020-g307fec773ba3 #3
    Hardware name: linux,dummy-virt (DT)
    task: ffff80001e271780 task.stack: ffff000010908000
    PC is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679
    LR is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679
    pc : [] lr : [] pstate: 80000145

    The fact that it's a bug instead of a warning was due to an unrelated
    arm64 problem, but the warning itself triggered because the underlying
    mapping changed.

    This is an application issue but from a kernel perspective it's a
    recoverable situation and the warning is unnecessary so this patch
    removes the warning. The warning may potentially be triggered with the
    following test program from Mark although it may be necessary to adjust
    NR_FUTEX_THREADS to be a value smaller than the number of CPUs in the
    system.

    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include

    #define NR_FUTEX_THREADS 16
    pthread_t threads[NR_FUTEX_THREADS];

    void *mem;

    #define MEM_PROT (PROT_READ | PROT_WRITE)
    #define MEM_SIZE 65536

    static int futex_wrapper(int *uaddr, int op, int val,
    const struct timespec *timeout,
    int *uaddr2, int val3)
    {
    syscall(SYS_futex, uaddr, op, val, timeout, uaddr2, val3);
    }

    void *poll_futex(void *unused)
    {
    for (;;) {
    futex_wrapper(mem, FUTEX_CMP_REQUEUE_PI, 1, NULL, mem + 4, 1);
    }
    }

    int main(int argc, char *argv[])
    {
    int i;

    mem = mmap(NULL, MEM_SIZE, MEM_PROT,
    MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    printf("Mapping @ %p\n", mem);

    printf("Creating futex threads...\n");

    for (i = 0; i < NR_FUTEX_THREADS; i++)
    pthread_create(&threads[i], NULL, poll_futex, NULL);

    printf("Flipping mapping...\n");
    for (;;) {
    mmap(mem, MEM_SIZE, MEM_PROT,
    MAP_FIXED | MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    }

    return 0;
    }

    Reported-and-tested-by: Mark Rutland
    Signed-off-by: Mel Gorman
    Acked-by: Peter Zijlstra (Intel)
    Cc: stable@vger.kernel.org # 4.7+
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

01 Aug, 2017

1 commit

  • This makes it possible to preserve basic futex support and compile out the
    PI support when RT mutexes are not available.

    Signed-off-by: Nicolas Pitre
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Darren Hart
    Link: http://lkml.kernel.org/r/alpine.LFD.2.20.1708010024190.5981@knanqh.ubzr

    Nicolas Pitre
     

19 Jul, 2017

1 commit

  • Pull structure randomization updates from Kees Cook:
    "Now that IPC and other changes have landed, enable manual markings for
    randstruct plugin, including the task_struct.

    This is the rest of what was staged in -next for the gcc-plugins, and
    comes in three patches, largest first:

    - mark "easy" structs with __randomize_layout

    - mark task_struct with an optional anonymous struct to isolate the
    __randomize_layout section

    - mark structs to opt _out_ of automated marking (which will come
    later)

    And, FWIW, this continues to pass allmodconfig (normal and patched to
    enable gcc-plugins) builds of x86_64, i386, arm64, arm, powerpc, and
    s390 for me"

    * tag 'gcc-plugins-v4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    randstruct: opt-out externally exposed function pointer structs
    task_struct: Allow randomized layout
    randstruct: Mark various structs for randomization

    Linus Torvalds
     

04 Jul, 2017

1 commit

  • Pull documentation updates from Jonathan Corbet:
    "There has been a fair amount of activity in the docs tree this time
    around. Highlights include:

    - Conversion of a bunch of security documentation into RST

    - The conversion of the remaining DocBook templates by The Amazing
    Mauro Machine. We can now drop the entire DocBook build chain.

    - The usual collection of fixes and minor updates"

    * tag 'docs-4.13' of git://git.lwn.net/linux: (90 commits)
    scripts/kernel-doc: handle DECLARE_HASHTABLE
    Documentation: atomic_ops.txt is core-api/atomic_ops.rst
    Docs: clean up some DocBook loose ends
    Make the main documentation title less Geocities
    Docs: Use kernel-figure in vidioc-g-selection.rst
    Docs: fix table problems in ras.rst
    Docs: Fix breakage with Sphinx 1.5 and upper
    Docs: Include the Latex "ifthen" package
    doc/kokr/howto: Only send regression fixes after -rc1
    docs-rst: fix broken links to dynamic-debug-howto in kernel-parameters
    doc: Document suitability of IBM Verse for kernel development
    Doc: fix a markup error in coding-style.rst
    docs: driver-api: i2c: remove some outdated information
    Documentation: DMA API: fix a typo in a function name
    Docs: Insert missing space to separate link from text
    doc/ko_KR/memory-barriers: Update control-dependencies example
    Documentation, kbuild: fix typo "minimun" -> "minimum"
    docs: Fix some formatting issues in request-key.rst
    doc: ReSTify keys-trusted-encrypted.txt
    doc: ReSTify keys-request-key.txt
    ...

    Linus Torvalds
     

01 Jul, 2017

1 commit

  • This marks many critical kernel structures for randomization. These are
    structures that have been targeted in the past in security exploits, or
    contain functions pointers, pointers to function pointer tables, lists,
    workqueues, ref-counters, credentials, permissions, or are otherwise
    sensitive. This initial list was extracted from Brad Spengler/PaX Team's
    code in the last public patch of grsecurity/PaX based on my understanding
    of the code. Changes or omissions from the original code are mine and
    don't reflect the original grsecurity/PaX code.

    Left out of this list is task_struct, which requires special handling
    and will be covered in a subsequent patch.

    Signed-off-by: Kees Cook

    Kees Cook
     

20 Jun, 2017

1 commit

  • Rename:

    wait_queue_t => wait_queue_entry_t

    'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
    but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
    which had to carry the name.

    Start sorting this out by renaming it to 'wait_queue_entry_t'.

    This also allows the real structure name 'struct __wait_queue' to
    lose its double underscore and become 'struct wait_queue_entry',
    which is the more canonical nomenclature for such data types.

    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

16 May, 2017

1 commit

  • There are a few issues on some kernel-doc markups that was
    causing troubles with kernel-doc output on ReST format:

    ./kernel/futex.c:492: WARNING: Inline emphasis start-string without end-string.
    ./kernel/futex.c:1264: WARNING: Block quote ends without a blank line; unexpected unindent.
    ./kernel/futex.c:1721: WARNING: Block quote ends without a blank line; unexpected unindent.
    ./kernel/futex.c:2338: WARNING: Block quote ends without a blank line; unexpected unindent.
    ./kernel/futex.c:2426: WARNING: Block quote ends without a blank line; unexpected unindent.
    ./kernel/futex.c:2899: WARNING: Block quote ends without a blank line; unexpected unindent.
    ./kernel/futex.c:2972: WARNING: Block quote ends without a blank line; unexpected unindent.

    Fix them.

    No functional changes.

    Acked-by: Darren Hart (VMware)
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     

15 Apr, 2017

1 commit

  • Clarify the scenario described in mark_wake_futex requiring the
    smp_store_release(). Update the comment to explicitly refer to the
    plist_del now under __unqueue_futex() (previously plist_del was in the
    same function as the comment).

    Signed-off-by: Darren Hart (VMware)
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20170414223138.GA4222@fury
    Signed-off-by: Thomas Gleixner

    Darren Hart (VMware)
     

14 Apr, 2017

2 commits

  • During (post-commit) review Darren spotted a few minor things. One
    (harmless AFAICT) type inconsistency and a comment that wasn't as
    clear as hoped.

    Reported-by: Darren Hart (VMWare)
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Darren Hart (VMware)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Alexander reported a hrtimer debug_object splat:

    ODEBUG: free active (active state 0) object type: hrtimer hint: hrtimer_wakeup (kernel/time/hrtimer.c:1423)

    debug_object_free (lib/debugobjects.c:603)
    destroy_hrtimer_on_stack (kernel/time/hrtimer.c:427)
    futex_lock_pi (kernel/futex.c:2740)
    do_futex (kernel/futex.c:3399)
    SyS_futex (kernel/futex.c:3447 kernel/futex.c:3415)
    do_syscall_64 (arch/x86/entry/common.c:284)
    entry_SYSCALL64_slow_path (arch/x86/entry/entry_64.S:249)

    Which was caused by commit:

    cfafcd117da0 ("futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()")

    ... losing the hrtimer_cancel() in the shuffle. Where previously the
    hrtimer_cancel() was done by rt_mutex_slowlock() we now need to do it
    manually.

    Reported-by: Alexander Levin
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Fixes: cfafcd117da0 ("futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()")
    Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1704101802370.2906@nanos
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

04 Apr, 2017

2 commits

  • Previous patches changed the meaning of the return value of
    rt_mutex_slowunlock(); update comments and code to reflect this.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: juri.lelli@arm.com
    Cc: bigeasy@linutronix.de
    Cc: xlpang@redhat.com
    Cc: rostedt@goodmis.org
    Cc: mathieu.desnoyers@efficios.com
    Cc: jdesfossez@efficios.com
    Cc: bristot@redhat.com
    Link: http://lkml.kernel.org/r/20170323150216.255058238@infradead.org
    Signed-off-by: Thomas Gleixner

    Peter Zijlstra
     
  • We should deboost before waking the high-priority task, such that we
    don't run two tasks with the same "state" (priority, deadline,
    sched_class, etc).

    In order to make sure the boosting task doesn't start running between
    unlock and deboost (due to 'spurious' wakeup), we move the deboost
    under the wait_lock, that way its serialized against the wait loop in
    __rt_mutex_slowlock().

    Doing the deboost early can however lead to priority-inversion if
    current would get preempted after the deboost but before waking our
    high-prio task, hence we disable preemption before doing deboost, and
    enabling it after the wake up is over.

    This gets us the right semantic order, but most importantly however;
    this change ensures pointer stability for the next patch, where we
    have rt_mutex_setprio() cache a pointer to the top-most waiter task.
    If we, as before this change, do the wakeup first and then deboost,
    this pointer might point into thin air.

    [peterz: Changelog + patch munging]
    Suggested-by: Peter Zijlstra
    Signed-off-by: Xunlei Pang
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Steven Rostedt
    Cc: juri.lelli@arm.com
    Cc: bigeasy@linutronix.de
    Cc: mathieu.desnoyers@efficios.com
    Cc: jdesfossez@efficios.com
    Cc: bristot@redhat.com
    Link: http://lkml.kernel.org/r/20170323150216.110065320@infradead.org
    Signed-off-by: Thomas Gleixner

    Xunlei Pang
     

24 Mar, 2017

12 commits

  • When PREEMPT_RT_FULL does the spinlock -> rt_mutex substitution the PI
    chain code will (falsely) report a deadlock and BUG.

    The problem is that it hold hb->lock (now an rt_mutex) while doing
    task_blocks_on_rt_mutex on the futex's pi_state::rtmutex. This, when
    interleaved just right with futex_unlock_pi() leads it to believe to see an
    AB-BA deadlock.

    Task1 (holds rt_mutex, Task2 (does FUTEX_LOCK_PI)
    does FUTEX_UNLOCK_PI)

    lock hb->lock
    lock rt_mutex (as per start_proxy)
    lock hb->lock

    Which is a trivial AB-BA.

    It is not an actual deadlock, because it won't be holding hb->lock by the
    time it actually blocks on the rt_mutex, but the chainwalk code doesn't
    know that and it would be a nightmare to handle this gracefully.

    To avoid this problem, do the same as in futex_unlock_pi() and drop
    hb->lock after acquiring wait_lock. This still fully serializes against
    futex_unlock_pi(), since adding to the wait_list does the very same lock
    dance, and removing it holds both locks.

    Aside of solving the RT problem this makes the lock and unlock mechanism
    symetric and reduces the hb->lock held time.

    Reported-and-tested-by: Sebastian Andrzej Siewior
    Suggested-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: juri.lelli@arm.com
    Cc: xlpang@redhat.com
    Cc: rostedt@goodmis.org
    Cc: mathieu.desnoyers@efficios.com
    Cc: jdesfossez@efficios.com
    Cc: dvhart@infradead.org
    Cc: bristot@redhat.com
    Link: http://lkml.kernel.org/r/20170322104152.161341537@infradead.org
    Signed-off-by: Thomas Gleixner

    Peter Zijlstra
     
  • The problem with returning -EAGAIN when the waiter state mismatches is that
    it becomes very hard to proof a bounded execution time on the
    operation. And seeing that this is a RT operation, this is somewhat
    important.

    While in practise; given the previous patch; it will be very unlikely to
    ever really take more than one or two rounds, proving so becomes rather
    hard.

    However, now that modifying wait_list is done while holding both hb->lock
    and wait_lock, the scenario can be avoided entirely by acquiring wait_lock
    while still holding hb-lock. Doing a hand-over, without leaving a hole.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: juri.lelli@arm.com
    Cc: bigeasy@linutronix.de
    Cc: xlpang@redhat.com
    Cc: rostedt@goodmis.org
    Cc: mathieu.desnoyers@efficios.com
    Cc: jdesfossez@efficios.com
    Cc: dvhart@infradead.org
    Cc: bristot@redhat.com
    Link: http://lkml.kernel.org/r/20170322104152.112378812@infradead.org
    Signed-off-by: Thomas Gleixner

    Peter Zijlstra
     
  • By changing futex_lock_pi() to use rt_mutex_*_proxy_lock() all wait_list
    modifications are done under both hb->lock and wait_lock.

    This closes the obvious interleave pattern between futex_lock_pi() and
    futex_unlock_pi(), but not entirely so. See below:

    Before:

    futex_lock_pi() futex_unlock_pi()
    unlock hb->lock

    lock hb->lock
    unlock hb->lock

    lock rt_mutex->wait_lock
    unlock rt_mutex_wait_lock
    -EAGAIN

    lock rt_mutex->wait_lock
    list_add
    unlock rt_mutex->wait_lock

    schedule()

    lock rt_mutex->wait_lock
    list_del
    unlock rt_mutex->wait_lock


    -EAGAIN

    lock hb->lock

    After:

    futex_lock_pi() futex_unlock_pi()

    lock hb->lock
    lock rt_mutex->wait_lock
    list_add
    unlock rt_mutex->wait_lock
    unlock hb->lock

    schedule()
    lock hb->lock
    unlock hb->lock
    lock hb->lock
    lock rt_mutex->wait_lock
    list_del
    unlock rt_mutex->wait_lock

    lock rt_mutex->wait_lock
    unlock rt_mutex_wait_lock
    -EAGAIN

    unlock hb->lock

    It does however solve the earlier starvation/live-lock scenario which got
    introduced with the -EAGAIN since unlike the before scenario; where the
    -EAGAIN happens while futex_unlock_pi() doesn't hold any locks; in the
    after scenario it happens while futex_unlock_pi() actually holds a lock,
    and then it is serialized on that lock.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: juri.lelli@arm.com
    Cc: bigeasy@linutronix.de
    Cc: xlpang@redhat.com
    Cc: rostedt@goodmis.org
    Cc: mathieu.desnoyers@efficios.com
    Cc: jdesfossez@efficios.com
    Cc: dvhart@infradead.org
    Cc: bristot@redhat.com
    Link: http://lkml.kernel.org/r/20170322104152.062785528@infradead.org
    Signed-off-by: Thomas Gleixner

    Peter Zijlstra
     
  • With the ultimate goal of keeping rt_mutex wait_list and futex_q waiters
    consistent it's necessary to split 'rt_mutex_futex_lock()' into finer
    parts, such that only the actual blocking can be done without hb->lock
    held.

    Split split_mutex_finish_proxy_lock() into two parts, one that does the
    blocking and one that does remove_waiter() when the lock acquire failed.

    When the rtmutex was acquired successfully the waiter can be removed in the
    acquisiton path safely, since there is no concurrency on the lock owner.

    This means that, except for futex_lock_pi(), all wait_list modifications
    are done with both hb->lock and wait_lock held.

    [bigeasy@linutronix.de: fix for futex_requeue_pi_signal_restart]

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: juri.lelli@arm.com
    Cc: bigeasy@linutronix.de
    Cc: xlpang@redhat.com
    Cc: rostedt@goodmis.org
    Cc: mathieu.desnoyers@efficios.com
    Cc: jdesfossez@efficios.com
    Cc: dvhart@infradead.org
    Cc: bristot@redhat.com
    Link: http://lkml.kernel.org/r/20170322104152.001659630@infradead.org
    Signed-off-by: Thomas Gleixner

    Peter Zijlstra
     
  • Since there's already two copies of this code, introduce a helper now
    before adding a third one.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: juri.lelli@arm.com
    Cc: bigeasy@linutronix.de
    Cc: xlpang@redhat.com
    Cc: rostedt@goodmis.org
    Cc: mathieu.desnoyers@efficios.com
    Cc: jdesfossez@efficios.com
    Cc: dvhart@infradead.org
    Cc: bristot@redhat.com
    Link: http://lkml.kernel.org/r/20170322104151.950039479@infradead.org
    Signed-off-by: Thomas Gleixner

    Peter Zijlstra
     
  • There's a number of 'interesting' problems, all caused by holding
    hb->lock while doing the rt_mutex_unlock() equivalient.

    Notably:

    - a PI inversion on hb->lock; and,

    - a SCHED_DEADLINE crash because of pointer instability.

    The previous changes:

    - changed the locking rules to cover {uval,pi_state} with wait_lock.

    - allow to do rt_mutex_futex_unlock() without dropping wait_lock; which in
    turn allows to rely on wait_lock atomicity completely.

    - simplified the waiter conundrum.

    It's now sufficient to hold rtmutex::wait_lock and a reference on the
    pi_state to protect the state consistency, so hb->lock can be dropped
    before calling rt_mutex_futex_unlock().

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: juri.lelli@arm.com
    Cc: bigeasy@linutronix.de
    Cc: xlpang@redhat.com
    Cc: rostedt@goodmis.org
    Cc: mathieu.desnoyers@efficios.com
    Cc: jdesfossez@efficios.com
    Cc: dvhart@infradead.org
    Cc: bristot@redhat.com
    Link: http://lkml.kernel.org/r/20170322104151.900002056@infradead.org
    Signed-off-by: Thomas Gleixner

    Peter Zijlstra
     
  • There is a weird state in the futex_unlock_pi() path when it interleaves
    with a concurrent futex_lock_pi() at the point where it drops hb->lock.

    In this case, it can happen that the rt_mutex wait_list and the futex_q
    disagree on pending waiters, in particular rt_mutex will find no pending
    waiters where futex_q thinks there are. In this case the rt_mutex unlock
    code cannot assign an owner.

    The futex side fixup code has to cleanup the inconsistencies with quite a
    bunch of interesting corner cases.

    Simplify all this by changing wake_futex_pi() to return -EAGAIN when this
    situation occurs. This then gives the futex_lock_pi() code the opportunity
    to continue and the retried futex_unlock_pi() will now observe a coherent
    state.

    The only problem is that this breaks RT timeliness guarantees. That
    is, consider the following scenario:

    T1 and T2 are both pinned to CPU0. prio(T2) > prio(T1)

    CPU0

    T1
    lock_pi()
    queue_me()
    Cc: juri.lelli@arm.com
    Cc: bigeasy@linutronix.de
    Cc: xlpang@redhat.com
    Cc: rostedt@goodmis.org
    Cc: mathieu.desnoyers@efficios.com
    Cc: jdesfossez@efficios.com
    Cc: dvhart@infradead.org
    Cc: bristot@redhat.com
    Link: http://lkml.kernel.org/r/20170322104151.850383690@infradead.org
    Signed-off-by: Thomas Gleixner

    Peter Zijlstra
     
  • Add a put_pit_state() as counterpart for get_pi_state() so the refcounting
    becomes consistent.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: juri.lelli@arm.com
    Cc: bigeasy@linutronix.de
    Cc: xlpang@redhat.com
    Cc: rostedt@goodmis.org
    Cc: mathieu.desnoyers@efficios.com
    Cc: jdesfossez@efficios.com
    Cc: dvhart@infradead.org
    Cc: bristot@redhat.com
    Link: http://lkml.kernel.org/r/20170322104151.801778516@infradead.org
    Signed-off-by: Thomas Gleixner

    Peter Zijlstra
     
  • Currently futex-pi relies on hb->lock to serialize everything. But hb->lock
    creates another set of problems, especially priority inversions on RT where
    hb->lock becomes a rt_mutex itself.

    The rt_mutex::wait_lock is the most obvious protection for keeping the
    futex user space value and the kernel internal pi_state in sync.

    Rework and document the locking so rt_mutex::wait_lock is held accross all
    operations which modify the user space value and the pi state.

    This allows to invoke rt_mutex_unlock() (including deboost) without holding
    hb->lock as a next step.

    Nothing yet relies on the new locking rules.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: juri.lelli@arm.com
    Cc: bigeasy@linutronix.de
    Cc: xlpang@redhat.com
    Cc: rostedt@goodmis.org
    Cc: mathieu.desnoyers@efficios.com
    Cc: jdesfossez@efficios.com
    Cc: dvhart@infradead.org
    Cc: bristot@redhat.com
    Link: http://lkml.kernel.org/r/20170322104151.751993333@infradead.org
    Signed-off-by: Thomas Gleixner

    Peter Zijlstra
     
  • Part of what makes futex_unlock_pi() intricate is that
    rt_mutex_futex_unlock() -> rt_mutex_slowunlock() can drop
    rt_mutex::wait_lock.

    This means it cannot rely on the atomicy of wait_lock, which would be
    preferred in order to not rely on hb->lock so much.

    The reason rt_mutex_slowunlock() needs to drop wait_lock is because it can
    race with the rt_mutex fastpath, however futexes have their own fast path.

    Since futexes already have a bunch of separate rt_mutex accessors, complete
    that set and implement a rt_mutex variant without fastpath for them.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: juri.lelli@arm.com
    Cc: bigeasy@linutronix.de
    Cc: xlpang@redhat.com
    Cc: rostedt@goodmis.org
    Cc: mathieu.desnoyers@efficios.com
    Cc: jdesfossez@efficios.com
    Cc: dvhart@infradead.org
    Cc: bristot@redhat.com
    Link: http://lkml.kernel.org/r/20170322104151.702962446@infradead.org
    Signed-off-by: Thomas Gleixner

    Peter Zijlstra
     
  • Since the futex_q can dissapear the instruction after assigning NULL,
    this really should be a RELEASE barrier. That stops loads from hitting
    dead memory too.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: juri.lelli@arm.com
    Cc: bigeasy@linutronix.de
    Cc: xlpang@redhat.com
    Cc: rostedt@goodmis.org
    Cc: mathieu.desnoyers@efficios.com
    Cc: jdesfossez@efficios.com
    Cc: dvhart@infradead.org
    Cc: bristot@redhat.com
    Link: http://lkml.kernel.org/r/20170322104151.604296452@infradead.org
    Signed-off-by: Thomas Gleixner

    Peter Zijlstra
     
  • futex_top_waiter() returns the top-waiter on the pi_mutex. Assinging
    this to a variable 'match' totally obscures the code.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: juri.lelli@arm.com
    Cc: bigeasy@linutronix.de
    Cc: xlpang@redhat.com
    Cc: rostedt@goodmis.org
    Cc: mathieu.desnoyers@efficios.com
    Cc: jdesfossez@efficios.com
    Cc: dvhart@infradead.org
    Cc: bristot@redhat.com
    Link: http://lkml.kernel.org/r/20170322104151.554710645@infradead.org
    Signed-off-by: Thomas Gleixner

    Peter Zijlstra