18 Aug, 2016

1 commit

  • The current percpu-rwsem read side is entirely free of serializing insns
    at the cost of having a synchronize_sched() in the write path.

    The latency of the synchronize_sched() is too high for cgroups. Commit
    1ed1328792ff describes the write path as a fairly cold path, but that is
    not the case for Android, which moves tasks to the foreground cgroup and
    back around binder IPC calls from foreground processes to background
    processes, making the path significantly hotter than human-initiated
    operations.

    Switch cgroup_threadgroup_rwsem to the slow mode for now to avoid the
    problem; it should hopefully not be that slow after another commit:

    80127a39681b ("locking/percpu-rwsem: Optimize readers and reduce global impact").

    We could just add rcu_sync_enter() into cgroup_init() but we do not want
    another synchronize_sched() at boot time, so this patch adds a new helper
    that does not block but currently can only be called before the first use.
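
    A minimal sketch of such a non-blocking helper, assuming the gp_count /
    gp_state bookkeeping of kernel/rcu/sync.c (upstream the helper is
    rcu_sync_enter_start()); it is only safe before readers can observe the
    structure:

    /*
     * Sketch, not verbatim kernel code: mark the rcu_sync as "writer
     * present, grace period already passed" without blocking.  Only safe
     * before the structure is visible to readers, e.g. from cgroup_init().
     */
    void rcu_sync_enter_start(struct rcu_sync *rsp)
    {
            rsp->gp_count++;
            rsp->gp_state = GP_PASSED;
    }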

    Reported-by: John Stultz
    Reported-by: Dmitry Shmidt
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Colin Cross
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rom Lemarchand
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Todd Kjos
    Link: http://lkml.kernel.org/r/20160811165413.GA22807@redhat.com
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

10 Aug, 2016

1 commit

    Currently the percpu-rwsem switches to (global) atomic ops while a
    writer is waiting, which could be quite a while and slows down
    releasing the readers.

    This patch cures this problem by ordering the reader-state vs
    reader-count (see the comments in __percpu_down_read() and
    percpu_down_write()). This changes a global atomic op into a full
    memory barrier, which doesn't have the global cacheline contention.

    This also enables using the percpu-rwsem with rcu_sync disabled in order
    to bias the implementation differently, reducing the writer latency by
    adding some cost to readers.
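
    A simplified sketch of the ordering described above (not the exact
    kernel code; the function names are illustrative and the read_count /
    readers_block fields are assumed from the patched percpu_rw_semaphore):

    /* Reader fast path: bump the per-CPU count, then check for a writer.
     * The full barrier orders the count increment against the
     * readers_block load. */
    static bool percpu_down_read_fastpath(struct percpu_rw_semaphore *sem)
    {
            __this_cpu_inc(*sem->read_count);        /* reader-count */
            smp_mb();                                /* A: pairs with B below */
            if (likely(!smp_load_acquire(&sem->readers_block)))
                    return true;                     /* no writer, fast path */
            __this_cpu_dec(*sem->read_count);        /* back out, take slow path */
            return false;
    }

    /* Writer side: publish the blocked state, then wait for the sum of the
     * per-CPU read counts to drain. */
    static void percpu_down_write_wait(struct percpu_rw_semaphore *sem)
    {
            WRITE_ONCE(sem->readers_block, 1);       /* reader-state */
            smp_mb();                                /* B: pairs with A above */
            /* ... wait until the per-CPU read_count sum reaches zero ... */
    }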

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Oleg Nesterov
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Paul McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    [ Fixed modular build. ]
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

07 Oct, 2015

5 commits

  • 1. Rename __rcu_sync_is_idle() to rcu_sync_lockdep_assert() and
    change it to use rcu_lockdep_assert().

    2. Change rcu_sync_is_idle() to return rsp->gp_state == GP_IDLE
    unconditionally; this way we can remove the same check from
    rcu_sync_lockdep_assert() and clearly isolate the debugging
    code.

    Note: rcu_sync_enter()->wait_event(gp_state == GP_PASSED) needs
    another CONFIG_PROVE_RCU check, the same as is done in ->sync(); but
    this needs some simple preparations in the core RCU code to avoid the
    code duplication.
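
    A conceptual sketch of the resulting shape (the gp_ops[]->held() hook
    comes from an earlier commit in this series; exact declarations are
    assumed from kernel/rcu/sync.c):

    #ifdef CONFIG_PROVE_RCU
    /* Debugging-only check, now clearly separated from the predicate. */
    void rcu_sync_lockdep_assert(struct rcu_sync *rsp)
    {
            rcu_lockdep_assert(gp_ops[rsp->gp_type].held(),
                               "suspicious rcu_sync_is_idle() usage");
    }
    #else
    static inline void rcu_sync_lockdep_assert(struct rcu_sync *rsp) { }
    #endif

    /* Fast-path predicate: the GP_IDLE check is now unconditional. */
    bool rcu_sync_is_idle(struct rcu_sync *rsp)
    {
            rcu_sync_lockdep_assert(rsp);
            return rsp->gp_state == GP_IDLE;
    }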

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Oleg Nesterov
     
    This commit allows rcu_sync structures to be safely deallocated.
    The trick is to add a new ->wait field to the gp_ops array.
    This field is a pointer to the rcu_barrier() function corresponding
    to the flavor of RCU in question. This allows a new rcu_sync_dtor()
    to wait for any outstanding callbacks before freeing the rcu_sync
    structure.
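
    A sketch of the idea (member names such as cb_state and gp_count are
    assumed from kernel/rcu/sync.c, and the callback bookkeeping is
    simplified):

    /* Each RCU flavor's ops now also carry its rcu_barrier() variant. */
    struct rcu_sync_ops {
            void (*sync)(void);
            void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
            void (*wait)(void);                  /* e.g. rcu_barrier_sched() */
    };

    void rcu_sync_dtor(struct rcu_sync *rsp)
    {
            BUG_ON(rsp->gp_count);               /* no writer may still be inside */

            if (rsp->cb_state != CB_IDLE)        /* exit callback still in flight? */
                    gp_ops[rsp->gp_type].wait(); /* drain it via rcu_barrier() */
    }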

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Oleg Nesterov
     
  • This commit validates that the caller of rcu_sync_is_idle() holds the
    corresponding type of RCU read-side lock, but only in kernels built
    with CONFIG_PROVE_RCU=y. This validation is carried out via a new
    rcu_sync_ops->held() method that is checked within rcu_sync_is_idle().

    Note that although this does add code to the fast path, it only does so
    in kernels built with CONFIG_PROVE_RCU=y.
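
    One way the per-flavor hook can be wired up (a sketch; the __INIT_HELD
    wrapper and the enum names are assumptions here, while the held()
    functions are the stock lockdep predicates):

    #ifdef CONFIG_PROVE_RCU
    #define __INIT_HELD(func)       .held = func,
    #else
    #define __INIT_HELD(func)
    #endif

    static const struct rcu_sync_ops gp_ops[] = {
            [RCU_SYNC] = {
                    .sync = synchronize_rcu,
                    .call = call_rcu,
                    __INIT_HELD(rcu_read_lock_held)
            },
            [RCU_SCHED_SYNC] = {
                    .sync = synchronize_sched,
                    .call = call_rcu_sched,
                    __INIT_HELD(rcu_read_lock_sched_held)
            },
    };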

    Suggested-by: "Paul E. McKenney"
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Oleg Nesterov
     
  • This commit adds the new struct rcu_sync_ops which holds sync/call
    methods, and turns the function pointers in rcu_sync_struct into an array
    of struct rcu_sync_ops. This simplifies the "init" helpers by collapsing
    a switch statement and explicit multiple definitions into a simple
    assignment and a helper macro, respectively.
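
    A sketch of the shape this gives (member names assumed; the ->wait and
    ->held fields are added by other commits in this series):

    struct rcu_sync_ops {
            void (*sync)(void);
            void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
    };

    static const struct rcu_sync_ops gp_ops[] = {
            [RCU_SYNC]       = { .sync = synchronize_rcu,    .call = call_rcu       },
            [RCU_SCHED_SYNC] = { .sync = synchronize_sched,  .call = call_rcu_sched },
            [RCU_BH_SYNC]    = { .sync = synchronize_rcu_bh, .call = call_rcu_bh    },
    };

    /* The per-flavor switch in the init helper collapses into storing the type. */
    void rcu_sync_init(struct rcu_sync *rsp, enum rcu_sync_type type)
    {
            memset(rsp, 0, sizeof(*rsp));
            init_waitqueue_head(&rsp->gp_wait);
            rsp->gp_type = type;
    }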

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Oleg Nesterov
     
    The rcu_sync infrastructure can be thought of as a building block for
    implementing reader-writer primitives whose readers are extremely
    lightweight during times when there are no writers. The first use is in
    the percpu_rwsem used by the VFS subsystem.

    This infrastructure is functionally equivalent to

    struct rcu_sync_struct {
            atomic_t counter;
    };

    /* Check possibility of fast-path read-side operations. */
    static inline bool rcu_sync_is_idle(struct rcu_sync_struct *rss)
    {
            return atomic_read(&rss->counter) == 0;
    }

    /* Tell readers to use slowpaths. */
    static inline void rcu_sync_enter(struct rcu_sync_struct *rss)
    {
            atomic_inc(&rss->counter);
            synchronize_sched();
    }

    /* Allow readers to once again use fastpaths. */
    static inline void rcu_sync_exit(struct rcu_sync_struct *rss)
    {
            synchronize_sched();
            atomic_dec(&rss->counter);
    }

    The main difference is that it records the state and only calls
    synchronize_sched() if required. At least some of the calls to
    synchronize_sched() will be optimized away when rcu_sync_enter() and
    rcu_sync_exit() are invoked repeatedly in quick succession.
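
    A sketch of the state-recording idea behind the real rcu_sync_enter(),
    continuing the simplified model above (locking and the callback-driven
    exit path are omitted; the gp_state/gp_count/GP_* names are assumed
    from kernel/rcu/sync.c):

    /* Only the first writer pays for synchronize_sched(); writers that
     * arrive while a grace period is in flight, or has already passed,
     * simply wait for or reuse its result. */
    void rcu_sync_enter(struct rcu_sync_struct *rss)
    {
            bool need_sync = (rss->gp_state == GP_IDLE);

            rss->gp_count++;
            if (need_sync) {
                    rss->gp_state = GP_PENDING;
                    synchronize_sched();
                    rss->gp_state = GP_PASSED;
                    wake_up_all(&rss->gp_wait);
            } else {
                    wait_event(rss->gp_wait, rss->gp_state == GP_PASSED);
            }
    }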

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Oleg Nesterov