16 Nov, 2020

1 commit


09 Nov, 2020

2 commits

  • The exit_pi_state_list() function calls put_pi_state() with IRQs disabled
    and is not expecting that IRQs will be enabled inside the function.

    Use the _irqsave() variant so that IRQs are restored to the original state
    instead of being enabled unconditionally.

    Fixes: 153fbd1226fb ("futex: Fix more put_pi_state() vs. exit_pi_state_list() races")
    Signed-off-by: Dan Carpenter
    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra (Intel)
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20201106085205.GA1159983@mwanda

    Dan Carpenter
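
The difference between the two locking variants can be illustrated with a small userspace model. This is only a sketch: the global flag and every `model_*` / `put_state_*` name below are hypothetical stand-ins for the CPU interrupt state and the kernel's `_irq` / `_irqsave` lock helpers, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CPU's interrupt-enable flag. */
static bool model_irqs_enabled = true;

/* Models an _irq lock: disables IRQs without remembering the old state. */
static void model_lock_irq(void) { model_irqs_enabled = false; }

/* Models an _irq unlock: enables IRQs unconditionally. */
static void model_unlock_irq(void) { model_irqs_enabled = true; }

/* Models an _irqsave lock: records the old state, then disables IRQs. */
static bool model_lock_irqsave(void)
{
    bool flags = model_irqs_enabled;

    model_irqs_enabled = false;
    return flags;
}

/* Models an _irqrestore unlock: puts back whatever was saved. */
static void model_unlock_irqrestore(bool flags) { model_irqs_enabled = flags; }

/* Callee using the _irq pair: IRQs end up enabled no matter what. */
static void put_state_irq(void)
{
    model_lock_irq();
    model_unlock_irq();
}

/* Callee using the _irqsave pair: the caller's IRQ state survives. */
static void put_state_irqsave(void)
{
    bool flags = model_lock_irqsave();

    model_unlock_irqrestore(flags);
}

/* Returns the IRQ state seen by a caller that entered with IRQs disabled. */
static bool irqs_after_call(void (*callee)(void))
{
    model_irqs_enabled = false;
    callee();
    return model_irqs_enabled;
}
```

In this model `irqs_after_call(put_state_irq)` reports IRQs enabled, which is the surprise the commit describes, while `irqs_after_call(put_state_irqsave)` leaves them disabled.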
     
  • Linux 5.10-rc3

    Signed-off-by: Greg Kroah-Hartman
    Change-Id: I7884051ea7b86204b2685b51462368e122ad0772

    Greg Kroah-Hartman
     

08 Nov, 2020

1 commit

  • Gratian managed to trigger the BUG_ON(!newowner) in fixup_pi_state_owner().
    This is one possible chain of events leading to this:

    Task  Prio     Operation
    T1    120      lock(F)
    T2    120      lock(F)   -> blocks (top waiter)
    T3    50 (RT)  lock(F)   -> boosts T1 and blocks (new top waiter)
    XX             timeout/  -> wakes T2
                   signal
    T1    50       unlock(F) -> wakes T3 (rtmutex->owner == NULL, waiter bit is set)
    T2    120      cleanup   -> try_to_take_mutex() fails because T3 is the top waiter
                                and the lower priority T2 cannot steal the lock.
                             -> fixup_pi_state_owner() sees newowner == NULL -> BUG_ON()

    The comment states that this is invalid and rt_mutex_real_owner() must
    return a non NULL owner when the trylock failed, but in case of a queued
    and woken up waiter rt_mutex_real_owner() == NULL is a valid transient
    state. The higher priority waiter has simply not yet managed to take over
    the rtmutex.

    The BUG_ON() is therefore wrong and this is just another retry condition in
    fixup_pi_state_owner().

    Drop the locks, so that T3 can make progress, and then try the fixup again.

    Gratian provided a great analysis, traces and a reproducer. The analysis is
    to the point, but it confused the hell out of that tglx dude who had to
    page in all the futex horrors again. Condensed version is above.

    [ tglx: Wrote comment and changelog ]

    Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex")
    Reported-by: Gratian Crisan
    Signed-off-by: Mike Galbraith
    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/87a6w6x7bb.fsf@ni.com
    Link: https://lore.kernel.org/r/87sg9pkvf7.fsf@nanos.tec.linutronix.de

    Mike Galbraith
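
The retry condition can be sketched as a tiny decision function. This is a simplified userspace model under stated assumptions: `model_task`, `fixup_result` and `model_fixup_owner()` are hypothetical names, and the real fixup_pi_state_owner() does considerably more (locking, user space value update).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; only the priority is modelled. */
struct model_task {
    int prio;
};

enum fixup_result {
    FIXUP_DONE,   /* a new owner was determined */
    FIXUP_RETRY   /* transient state: drop the locks and try again */
};

/*
 * trylock_ok: whether the trylock on the rtmutex succeeded.
 * real_owner: the current rtmutex owner, NULL while a woken-up waiter
 *             has not yet taken the lock over.
 */
static enum fixup_result
model_fixup_owner(int trylock_ok, const struct model_task *self,
                  const struct model_task *real_owner,
                  const struct model_task **newowner)
{
    if (trylock_ok) {
        *newowner = self;
        return FIXUP_DONE;
    }

    /*
     * The old code treated this as impossible and hit BUG_ON().
     * It is a valid transient state, so ask the caller to retry.
     */
    if (real_owner == NULL)
        return FIXUP_RETRY;

    *newowner = real_owner;
    return FIXUP_DONE;
}
```

For T2 in the table above, the first call (trylock failed, no owner yet) yields FIXUP_RETRY; once T3 has taken the rtmutex, a second call reports T3 as the new owner.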
     

02 Nov, 2020

2 commits

  • Linux 5.10-rc2

    Signed-off-by: Greg Kroah-Hartman
    Change-Id: Ib7738b2fe5c513b7eb2dc7b475f4dc848df931d2

    Greg Kroah-Hartman
     
  • Pull locking fixes from Thomas Gleixner:
    "A couple of locking fixes:

    - Fix incorrect failure injection handling in the futex code

    - Prevent a preemption warning in lockdep when tracking
    local_irq_enable() and interrupts are already enabled

    - Remove more raw_cpu_read() usage from lockdep which causes state
    corruption on !X86 architectures.

    - Make the nr_unused_locks accounting in lockdep correct again"

    * tag 'locking-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    lockdep: Fix nr_unused_locks accounting
    locking/lockdep: Remove more raw_cpu_read() usage
    futex: Fix incorrect should_fail_futex() handling
    lockdep: Fix preemption WARN for spurious IRQ-enable

    Linus Torvalds
     

29 Oct, 2020

1 commit


28 Oct, 2020

1 commit

  • If should_fail_futex() returns true in futex_wake_pi(), then the 'ret'
    variable is set to -EFAULT and then immediately overwritten. So the failure
    injection is non-functional.

    Fix it by actually leaving the function and returning -EFAULT.

    The Fixes tag is kinda blurry because the initial commit which introduced
    failure injection was already sloppy, but the below mentioned commit broke
    it completely.

    [ tglx: Massaged changelog ]

    Fixes: 6b4f4bc9cb22 ("locking/futex: Allow low-level atomic operations to return -EAGAIN")
    Signed-off-by: Mateusz Nosek
    Signed-off-by: Thomas Gleixner
    Link: https://lore.kernel.org/r/20200927000858.24219-1-mateusznosek0@gmail.com

    Mateusz Nosek
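
The pattern behind this bug is easy to reproduce in isolation. The following is a minimal sketch, not the kernel function: `wake_pi_buggy()` and `wake_pi_fixed()` are hypothetical names, and `later_result` stands in for whatever the subsequent code assigns to 'ret'.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Buggy shape: the injected error is stored and then overwritten. */
static int wake_pi_buggy(bool inject_failure, int later_result)
{
    int ret = 0;

    if (inject_failure)
        ret = -EFAULT;

    ret = later_result;   /* clobbers the injected -EFAULT */
    return ret;
}

/* Fixed shape: leave the function as soon as the failure is injected. */
static int wake_pi_fixed(bool inject_failure, int later_result)
{
    if (inject_failure)
        return -EFAULT;

    return later_result;
}
```

With injection enabled and a successful later step, the buggy variant returns 0 while the fixed variant returns -EFAULT.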
     

26 Oct, 2020

1 commit


20 Oct, 2020

1 commit

  • For all commands except FUTEX_WAIT, the timeout is interpreted as an
    absolute value. This absolute value is inside the task's time namespace and
    has to be converted to the host's time.

    Fixes: 5a590f35add9 ("posix-clocks: Wire up clock_gettime() with timens offsets")
    Reported-by: Hans van der Laan
    Signed-off-by: Andrei Vagin
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Dmitry Safonov
    Cc:
    Link: https://lore.kernel.org/r/20201015160020.293748-1-avagin@gmail.com

    Andrei Vagin
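
A rough model of the conversion, assuming that time inside the namespace equals host time plus a per-namespace offset. The function names are hypothetical, times are plain nanosecond counts, and clock selection (for example realtime clocks) is deliberately left out of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Namespace time = host time + offset, so host time = namespace time - offset.
 * An absolute value earlier than the offset has already expired on the host.
 */
static int64_t model_timens_to_host(int64_t ns_abs, int64_t offset)
{
    if (ns_abs < offset)
        return 0;

    /* Clamp instead of overflowing when the offset is negative. */
    if (offset < 0 && ns_abs > INT64_MAX + offset)
        return INT64_MAX;

    return ns_abs - offset;
}

/*
 * FUTEX_WAIT takes a relative timeout, which needs no namespace offset.
 * In this model every other command takes an absolute timeout that is
 * converted to host time.
 */
static int64_t model_futex_timeout(bool is_futex_wait, int64_t timeout,
                                   int64_t offset)
{
    if (is_futex_wait)
        return timeout;

    return model_timens_to_host(timeout, offset);
}
```

For example, with an offset of 1000 ns an absolute timeout of 1500 ns in the namespace corresponds to 500 ns on the host, whereas a FUTEX_WAIT timeout of 1500 ns is passed through unchanged.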
     

17 Oct, 2020

1 commit

  • Fix multiple occurrences of duplicated words in kernel/.

    Fix one typo/spello on the same line as a duplicate word. Change one
    instance of "the the" to "that the". Otherwise just drop one of the
    repeated words.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Link: https://lkml.kernel.org/r/98202fa6-8919-ef63-9efe-c0fad5ca7af1@infradead.org
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

17 Aug, 2020

1 commit


14 Aug, 2020

1 commit


13 Aug, 2020

3 commits


05 Aug, 2020

1 commit

  • Pull uninitialized_var() macro removal from Kees Cook:
    "This is long overdue, and has hidden too many bugs over the years. The
    series has several "by hand" fixes, and then a trivial treewide
    replacement.

    - Clean up non-trivial uses of uninitialized_var()

    - Update documentation and checkpatch for uninitialized_var() removal

    - Treewide removal of uninitialized_var()"

    * tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    compiler: Remove uninitialized_var() macro
    treewide: Remove uninitialized_var() usage
    checkpatch: Remove awareness of uninitialized_var() macro
    mm/debug_vm_pgtable: Remove uninitialized_var() usage
    f2fs: Eliminate usage of uninitialized_var() macro
    media: sur40: Remove uninitialized_var() usage
    KVM: PPC: Book3S PR: Remove uninitialized_var() usage
    clk: spear: Remove uninitialized_var() usage
    clk: st: Remove uninitialized_var() usage
    spi: davinci: Remove uninitialized_var() usage
    ide: Remove uninitialized_var() usage
    rtlwifi: rtl8192cu: Remove uninitialized_var() usage
    b43: Remove uninitialized_var() usage
    drbd: Remove uninitialized_var() usage
    x86/mm/numa: Remove uninitialized_var() usage
    docs: deprecated.rst: Add uninitialized_var()

    Linus Torvalds
     

18 Jul, 2020

4 commits

  • Since 82af7aca ("Removal of FUTEX_FD"), some includes related to file
    operations aren't needed anymore. More investigation around the includes
    showed that a lot of includes aren't required for compilation, possibly
    due to redundant includes. Simplify the code by removing unused
    includes.

    Signed-off-by: André Almeida
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20200702202843.520764-4-andrealmeid@collabora.com

    André Almeida
     
  • Since fshared is only conveying true/false values, declare it as bool.

    In get_futex_key() the usage of fshared can be restricted to the first part
    of the function. If fshared is false the function is terminated early and
    the subsequent code can use a constant 'true' instead of the variable.

    Signed-off-by: André Almeida
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20200702202843.520764-5-andrealmeid@collabora.com

    André Almeida
     
  • As stated in the coding style documentation, "if there is no cleanup
    needed then just return directly", instead of jumping to a label and
    then returning.

    Remove such goto's and replace with a return statement. When there's a
    ternary operator on the return value, replace it with the result of the
    operation when it is logically possible to determine it by the control
    flow.

    Signed-off-by: André Almeida
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20200702202843.520764-3-andrealmeid@collabora.com

    André Almeida
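
The shape of this cleanup can be shown on a toy function. Both functions below are hypothetical and only illustrate the before/after pattern; they are not taken from kernel/futex.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Before: jump to a label that does nothing but return. */
static int wake_count_goto(const unsigned int *uaddr, int nr_woken)
{
    int ret = 0;

    if (uaddr == NULL) {
        ret = -EINVAL;
        goto out;
    }
out:
    return ret ? ret : nr_woken;
}

/*
 * After: return directly. On the success path 'ret' is known to be 0,
 * so the ternary collapses to its second operand.
 */
static int wake_count_direct(const unsigned int *uaddr, int nr_woken)
{
    if (uaddr == NULL)
        return -EINVAL;

    return nr_woken;
}
```

Both variants return the same values for every input; the second one simply makes the control flow visible at the point where the decision is taken.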
     
  • Since 4b39f99c ("futex: Remove {get,drop}_futex_key_refs()"),
    put_futex_key() is empty.

    Remove all references to this function and the now redundant labels.

    Signed-off-by: André Almeida
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20200702202843.520764-2-andrealmeid@collabora.com

    André Almeida
     

17 Jul, 2020

1 commit

  • Using uninitialized_var() is dangerous as it papers over real bugs[1]
    (or can in the future), and suppresses unrelated compiler warnings
    (e.g. "unused variable"). If the compiler thinks it is uninitialized,
    either simply initialize the variable or make compiler changes.

    In preparation for removing[2] the[3] macro[4], remove all remaining
    needless uses with the following script:

    git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
            xargs perl -pi -e \
                    's/\buninitialized_var\(([^\)]+)\)/\1/g;
                     s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

    drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
    pathological white-space.

    No outstanding warnings were found building allmodconfig with GCC 9.3.0
    for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
    alpha, and m68k.

    [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
    [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
    [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
    [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

    Reviewed-by: Leon Romanovsky # drivers/infiniband and mlx4/mlx5
    Acked-by: Jason Gunthorpe # IB
    Acked-by: Kalle Valo # wireless drivers
    Reviewed-by: Chao Yu # erofs
    Signed-off-by: Kees Cook

    Kees Cook
     

10 Jun, 2020

1 commit

  • This change converts the existing mmap_sem rwsem calls to use the new mmap
    locking API instead.

    The change is generated using coccinelle with the following rule:

    // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

    @@
    expression mm;
    @@
    (
    -init_rwsem
    +mmap_init_lock
    |
    -down_write
    +mmap_write_lock
    |
    -down_write_killable
    +mmap_write_lock_killable
    |
    -down_write_trylock
    +mmap_write_trylock
    |
    -up_write
    +mmap_write_unlock
    |
    -downgrade_write
    +mmap_write_downgrade
    |
    -down_read
    +mmap_read_lock
    |
    -down_read_killable
    +mmap_read_lock_killable
    |
    -down_read_trylock
    +mmap_read_trylock
    |
    -up_read
    +mmap_read_unlock
    )
    -(&mm->mmap_sem)
    +(mm)

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Daniel Jordan
    Reviewed-by: Laurent Dufour
    Reviewed-by: Vlastimil Babka
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

21 Apr, 2020

1 commit

  • Adjust whitespaces and blank lines in order to get rid of this:

    ./kernel/futex.c:491: WARNING: Definition list ends without a blank line; unexpected unindent.

    Signed-off-by: Mauro Carvalho Chehab
    Link: https://lore.kernel.org/r/57788af7889161483e0c97f91c079cfb3986c4b3.1586881715.git.mchehab+huawei@kernel.org
    Signed-off-by: Jonathan Corbet

    Mauro Carvalho Chehab
     

31 Mar, 2020

1 commit

  • Pull locking updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Continued user-access cleanups in the futex code.

    - percpu-rwsem rewrite that uses its own waitqueue and atomic_t
    instead of an embedded rwsem. This addresses a couple of
    weaknesses, but the primary motivation was complications on the -rt
    kernel.

    - Introduce raw lock nesting detection on lockdep
    (CONFIG_PROVE_RAW_LOCK_NESTING=y), document the raw_lock vs. normal
    lock differences. This too originates from -rt.

    - Reuse lockdep zapped chain_hlocks entries, to conserve RAM
    footprint on distro-ish kernels running into the "BUG:
    MAX_LOCKDEP_CHAIN_HLOCKS too low!" depletion of the lockdep
    chain-entries pool.

    - Misc cleanups, smaller fixes and enhancements - see the changelog
    for details"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits)
    fs/buffer: Make BH_Uptodate_Lock bit_spin_lock a regular spinlock_t
    thermal/x86_pkg_temp: Make pkg_temp_lock a raw_spinlock_t
    Documentation/locking/locktypes: Minor copy editor fixes
    Documentation/locking/locktypes: Further clarifications and wordsmithing
    m68knommu: Remove mm.h include from uaccess_no.h
    x86: get rid of user_atomic_cmpxchg_inatomic()
    generic arch_futex_atomic_op_inuser() doesn't need access_ok()
    x86: don't reload after cmpxchg in unsafe_atomic_op2() loop
    x86: convert arch_futex_atomic_op_inuser() to user_access_begin/user_access_end()
    objtool: whitelist __sanitizer_cov_trace_switch()
    [parisc, s390, sparc64] no need for access_ok() in futex handling
    sh: no need of access_ok() in arch_futex_atomic_op_inuser()
    futex: arch_futex_atomic_op_inuser() calling conventions change
    completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all()
    lockdep: Add posixtimer context tracing bits
    lockdep: Annotate irq_work
    lockdep: Add hrtimer context tracing bits
    lockdep: Introduce wait-type checks
    completion: Use simple wait queues
    sched/swait: Prepare usage in completions
    ...

    Linus Torvalds
     

28 Mar, 2020

2 commits


10 Mar, 2020

1 commit

  • The recent futex inode life time fix changed the ordering of the futex key
    union struct members, but forgot to adjust the hash function accordingly.

    As a result the hashing omits the leading 64bit and even hashes beyond the
    futex key causing a bad hash distribution which led to a ~100% performance
    regression.

    Hand in the futex key pointer instead of a random struct member and make
    the size calculation based on the struct offset.

    Fixes: 8019ad13ef7f ("futex: Fix inode life-time issue")
    Reported-by: Rong Chen
    Decoded-by: Linus Torvalds
    Signed-off-by: Thomas Gleixner
    Tested-by: Rong Chen
    Link: https://lkml.kernel.org/r/87h7yy90ve.fsf@nanos.tec.linutronix.de

    Thomas Gleixner
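
The idea of hashing from the start of the key up to a struct offset can be sketched in userspace. Assumptions: `model_futex_key` is a hypothetical, simplified layout, and FNV-1a is swapped in as the hash purely for illustration; it is not the hash the kernel uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified key layout: two leading 64-bit members, then the offset. */
struct model_futex_key {
    uint64_t i_seq;
    uint64_t pgoff;
    uint32_t offset;   /* mixed in as the seed, not part of the hashed range */
};

/* Zero the whole key first so that padding bytes are deterministic. */
static void model_key_init(struct model_futex_key *key, uint64_t i_seq,
                           uint64_t pgoff, uint32_t offset)
{
    memset(key, 0, sizeof(*key));
    key->i_seq = i_seq;
    key->pgoff = pgoff;
    key->offset = offset;
}

/*
 * Hash from the start of the key up to (not including) 'offset'.
 * Taking the key pointer and deriving the length from offsetof() keeps
 * the hashed range correct even if the members are reordered.
 */
static uint32_t model_hash_key(const struct model_futex_key *key)
{
    const unsigned char *p = (const unsigned char *)key;
    size_t len = offsetof(struct model_futex_key, offset);
    uint32_t h = 2166136261u ^ key->offset;   /* FNV offset basis, seeded */
    size_t i;

    for (i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;                       /* FNV prime */
    }
    return h;
}
```

Because the hashed range starts at the key itself, two keys that differ only in the leading 64-bit member produce different hashes in this model, which is exactly the property the broken version lost.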
     

06 Mar, 2020

3 commits

  • Now that {get,drop}_futex_key_refs() have become a glorified NOP,
    remove them entirely.

    The only thing get_futex_key_refs() is still doing is an smp_mb(), and
    now that we don't need to (ab)use existing atomic ops to obtain them,
    we can place it explicitly where we need it.

    Signed-off-by: Peter Zijlstra (Intel)

    Peter Zijlstra
     
  • We always set 'key->private.mm' to 'current->mm', so getting an extra
    reference on 'current->mm' is quite pointless: as long as the
    task is blocked it isn't going to go away.

    Signed-off-by: Peter Zijlstra (Intel)

    Peter Zijlstra
     
  • As reported by Jann, ihold() does not in fact guarantee inode
    persistence. And instead of making it so, replace the usage of inode
    pointers with a per boot, machine wide, unique inode identifier.

    This sequence number is global, but shared (file backed) futexes are
    rare enough that this should not become a performance issue.

    Reported-by: Jann Horn
    Suggested-by: Linus Torvalds
    Signed-off-by: Peter Zijlstra (Intel)

    Peter Zijlstra
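
One way such an identifier could be handed out is sketched below with C11 atomics. This is an assumption-laden userspace model: `model_inode`, `model_get_inode_seq()` and the lazy assignment on first use are illustrative choices, not a copy of the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for an inode; 0 means "no sequence number yet". */
struct model_inode {
    _Atomic uint64_t i_sequence;
};

/* Returns the inode's unique sequence number, assigning one on first use. */
static uint64_t model_get_inode_seq(struct model_inode *inode)
{
    static _Atomic uint64_t global_seq;   /* one global counter */
    uint64_t old = atomic_load(&inode->i_sequence);

    if (old)
        return old;

    for (;;) {
        uint64_t new_seq = atomic_fetch_add(&global_seq, 1) + 1;
        uint64_t expected = 0;

        if (new_seq == 0)
            continue;   /* 0 is reserved for "unassigned" */

        /* Only the first assignment wins; later callers adopt that value. */
        if (atomic_compare_exchange_strong(&inode->i_sequence,
                                           &expected, new_seq))
            return new_seq;

        return expected;   /* lost the race: holds the winner's value */
    }
}
```

The single global counter is the contention point the commit message mentions; it is touched only the first time an inode is used for a shared futex in this model.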
     

09 Jan, 2020

1 commit

  • Fix a kernel-doc warning in kernel/futex.c by adding notation
    for @ret.

    ../kernel/futex.c:1187: warning: Function parameter or member 'ret' not described in 'wait_for_owner_exiting'

    Fixes: 3ef240eaff36 ("futex: Prevent exit livelock")
    Signed-off-by: Randy Dunlap
    Signed-off-by: Thomas Gleixner
    Link: https://lore.kernel.org/r/223be78c-f3c8-52df-836d-c5fb8e7907e9@infradead.org

    Randy Dunlap
     

20 Nov, 2019

8 commits

  • Oleg provided the following test case:

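    #define _GNU_SOURCE
    #include <assert.h>
    #include <sched.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/futex.h>

    int main(void)
    {
            struct sched_param sp = {};

            sp.sched_priority = 2;
            assert(sched_setscheduler(0, SCHED_FIFO, &sp) == 0);

            /* vfork() returns 0 in the child and the child's PID in the parent */
            int lock = vfork();
            if (!lock) {
                    sp.sched_priority = 1;
                    assert(sched_setscheduler(0, SCHED_FIFO, &sp) == 0);
                    _exit(0);
            }

            /* Parent: 'lock' now names the exiting child as the PI futex owner */
            syscall(__NR_futex, &lock, FUTEX_LOCK_PI, 0,0,0);
            return 0;
    }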

    This creates an unkillable RT process spinning in futex_lock_pi() on a UP
    machine or if the process is affine to a single CPU. The reason is:

      parent                          child

      set FIFO prio 2

      vfork()                     ->  set FIFO prio 1
       implies wait_for_child()       sched_setscheduler(...)
                                      exit()
                                      do_exit()
                                      ....
                                      mm_release()
                                        tsk->futex_state = FUTEX_STATE_EXITING;
                                        exit_futex(); (NOOP in this case)
                                      complete() --> wakes parent
      sys_futex()
        loop infinite because
        tsk->futex_state == FUTEX_STATE_EXITING

    The same problem can happen just by regular preemption as well:

      task holds futex
      ...
      do_exit()
        tsk->futex_state = FUTEX_STATE_EXITING;

      --> preemption (unrelated wakeup of some other higher prio task, e.g. timer)

      switch_to(other_task)

      return to user
      sys_futex()
        loop infinite as above

    Just for the fun of it the futex exit cleanup could trigger the wakeup
    itself before the task sets its futex state to DEAD.

    To cure this, the handling of the exiting owner is changed as follows:

    - A refcount is held on the task

    - The task pointer is stored in a caller visible location

    - The caller drops all locks (hash bucket, mmap_sem) and blocks
    on task::futex_exit_mutex. When the mutex is acquired then
    the exiting task has completed the cleanup and the state
    is consistent and can be reevaluated.

    This is not a pretty solution, but there is no choice other than returning
    an error code to user space, which would break the state consistency
    guarantee and open another can of problems including regressions.

    For stable backports the preparatory commits ac31c7ff8624 .. ba31c1a48538
    are required as well, but for anything older than 5.3.y the backports are
    going to be provided when this hits mainline as the other dependencies for
    those kernels are definitely not stable material.

    Fixes: 778e9a9c3e71 ("pi-futex: fix exit races and locking problems")
    Reported-by: Oleg Nesterov
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Ingo Molnar
    Acked-by: Peter Zijlstra (Intel)
    Cc: Stable Team
    Link: https://lkml.kernel.org/r/20191106224557.041676471@linutronix.de

    Thomas Gleixner
     
  • attach_to_pi_owner() returns -EAGAIN for various cases:

    - Owner task is exiting
    - Futex value has changed

    The caller drops the held locks (hash bucket, mmap_sem) and retries the
    operation. In case of the owner task exiting this can result in a live
    lock.

    As a preparatory step for separating those cases, provide a distinct return
    value (EBUSY) for the owner exiting case.

    No functional change.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Ingo Molnar
    Acked-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20191106224556.935606117@linutronix.de

    Thomas Gleixner
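
Why a distinct return value helps can be sketched as follows. Everything here is a hypothetical model: `model_attach_to_owner()` only mirrors the two cases listed above, and the `WAIT_FOR_EXIT` action is an assumption about what a caller could do once the cases are distinguishable. This preparatory commit itself changes no behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Models the two retry reasons with distinct error codes. */
static int model_attach_to_owner(bool owner_exiting, bool value_changed)
{
    if (owner_exiting)
        return -EBUSY;    /* owner task is exiting */
    if (value_changed)
        return -EAGAIN;   /* futex value has changed */
    return 0;
}

enum retry_action {
    PROCEED,         /* attach succeeded */
    RETRY_NOW,       /* drop the locks and retry immediately */
    WAIT_FOR_EXIT    /* drop the locks and wait for the exiting owner */
};

/* With distinct codes the caller can pick a different strategy per case. */
static enum retry_action model_handle_attach(int ret)
{
    switch (ret) {
    case -EBUSY:
        return WAIT_FOR_EXIT;
    case -EAGAIN:
        return RETRY_NOW;
    default:
        return PROCEED;
    }
}
```

If both cases returned -EAGAIN, the caller could only retry blindly, which is where the potential live lock comes from.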
     
  • The mutex will be used in subsequent changes to replace the busy looping of
    a waiter when the futex owner is currently executing the exit cleanup to
    prevent a potential live lock.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Ingo Molnar
    Acked-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20191106224556.845798895@linutronix.de

    Thomas Gleixner
     
  • exec() attempts to handle potentially held futexes gracefully by running
    the futex exit handling code like exit() does.

    The current implementation has no protection against concurrent incoming
    waiters. The reason is that the futex state cannot be set to
    FUTEX_STATE_DEAD after the cleanup because the task struct is still active
    and just about to execute the new binary.

    While it's arguably buggy when a task holds a futex over exec(), for
    consistency's sake the state handling can at least cover the actual futex
    exit cleanup section. This provides state consistency protection across
    the cleanup. As the futex state of the task becomes FUTEX_STATE_OK after the
    cleanup has been finished, this cannot prevent subsequent attempts to
    attach to the task in case the cleanup was not successful in mopping
    up all leftovers.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Ingo Molnar
    Acked-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20191106224556.753355618@linutronix.de

    Thomas Gleixner
     
  • Instead of having a smp_mb() and an empty lock/unlock of task::pi_lock, move
    the state setting into the lock section.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Ingo Molnar
    Acked-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20191106224556.645603214@linutronix.de

    Thomas Gleixner
     
  • Instead of relying on PF_EXITING use an explicit state for the futex exit
    and set it in the futex exit function. This moves the smp barrier and the
    lock/unlock serialization into the futex code.

    As with the DEAD state this is restricted to the exit path as exec
    continues to use the same task struct.

    This allows that logic to be simplified in a subsequent step.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Ingo Molnar
    Acked-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20191106224556.539409004@linutronix.de

    Thomas Gleixner
     
  • Setting task::futex_state in do_exit() is rather arbitrarily placed for no
    reason. Move it into the futex code.

    Note, this is only done for the exit cleanup as the exec cleanup cannot set
    the state to FUTEX_STATE_DEAD because the task struct is still in active
    use.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Ingo Molnar
    Acked-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20191106224556.439511191@linutronix.de

    Thomas Gleixner
     
  • To allow separate handling of the futex exit state in the futex exit code
    for exit and exec, split futex_mm_release() into two functions and invoke
    them from the corresponding exit/exec_mm_release() callsites.

    Preparatory only, no functional change.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Ingo Molnar
    Acked-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20191106224556.332094221@linutronix.de

    Thomas Gleixner