18 Dec, 2012

3 commits

  • Add lockdep annotations. Not only can this help to find potential
    problems, we also do not want false warnings if, say, a task takes two
    different percpu_rw_semaphores for reading. IOW, at least ->rw_sem
    should not use a single lock class.

    This patch exposes the internal ->rw_sem to lockdep so that it represents
    the whole percpu_rw_semaphore. This way we do not need to add another
    "fake" ->lockdep_map and lock_class_key. More importantly, this also
    makes the output from lockdep much more understandable if it finds a
    problem.

    In short, with this patch, from lockdep's point of view percpu_down_read()
    and percpu_up_read() acquire/release ->rw_sem for reading, which matches
    the actual semantics (see the sketch at the end of this entry). This
    abuses __up_read(), but I hope this is fine; in fact I'd like to have
    down_read_no_lockdep() as well, since
    percpu_down_read_recursive_readers() will need it.

    Signed-off-by: Oleg Nesterov
    Cc: Anton Arapov
    Cc: Ingo Molnar
    Cc: Linus Torvalds
    Cc: Michal Marek
    Cc: Mikulas Patocka
    Cc: "Paul E. McKenney"
    Cc: Peter Zijlstra
    Cc: Srikar Dronamraju
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
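
    A minimal sketch of the annotated reader paths, assuming the helper and
    field names used by the reimplementation in the third entry below
    (update_fast_ctr(), slow_read_ctr, write_waitq); this illustrates the
    idea rather than reproducing the exact patch:

        /* ->rw_sem.dep_map is available under CONFIG_DEBUG_LOCK_ALLOC */
        void percpu_down_read(struct percpu_rw_semaphore *brw)
        {
                might_sleep();

                if (likely(update_fast_ctr(brw, +1))) {
                        /* fast path: tell lockdep we took ->rw_sem for reading */
                        rwsem_acquire_read(&brw->rw_sem.dep_map, 0, 0, _RET_IP_);
                        return;
                }

                down_read(&brw->rw_sem);
                atomic_inc(&brw->slow_read_ctr);
                /* avoid up_read()->rwsem_release(); lockdep still "holds" it */
                __up_read(&brw->rw_sem);
        }

        void percpu_up_read(struct percpu_rw_semaphore *brw)
        {
                /* the matching release, on both the fast and the slow path */
                rwsem_release(&brw->rw_sem.dep_map, 1, _RET_IP_);

                if (likely(update_fast_ctr(brw, -1)))
                        return;

                if (atomic_dec_and_test(&brw->slow_read_ctr))
                        wake_up_all(&brw->write_waitq);
        }

    This way the fast path and the slow path look identical to lockdep: a
    single read acquisition of ->rw_sem, released in percpu_up_read().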
     
  • percpu_rw_semaphore->writer_mutex was only added to simplify the initial
    rewrite; the only thing it protects is clear_fast_ctr(), which otherwise
    could be called by multiple writers. ->rw_sem is enough to serialize the
    writers.

    Kill this mutex and add an "atomic_t write_ctr" instead. The writers
    increment/decrement this counter, and the readers check that it is zero
    instead of calling mutex_is_locked() (see the sketch at the end of this
    entry).

    Move atomic_add(clear_fast_ctr(), slow_read_ctr) under down_write() to
    avoid the race with other writers. This is a bit sub-optimal: only the
    first writer needs this, and we do not need to exclude the readers at
    this stage. But this is simple; we do not want another internal lock
    until we add more features.

    And this speeds up the write-contended case. Before this patch the racing
    writers slept in synchronize_sched_expedited() sequentially; with this
    patch multiple synchronize_sched_expedited() calls can "overlap" with
    each other. Note: more optimizations are possible; this is only the
    first step.

    Signed-off-by: Oleg Nesterov
    Cc: Anton Arapov
    Cc: Ingo Molnar
    Cc: Linus Torvalds
    Cc: Michal Marek
    Cc: Mikulas Patocka
    Cc: "Paul E. McKenney"
    Cc: Peter Zijlstra
    Cc: Srikar Dronamraju
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
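
    A minimal sketch of the writer side with ->write_ctr, using the same
    assumed names as in the first entry; again an illustration, not the
    exact patch:

        void percpu_down_write(struct percpu_rw_semaphore *brw)
        {
                /* tell update_fast_ctr() there is a pending writer */
                atomic_inc(&brw->write_ctr);

                /* force the new readers onto the slow path */
                synchronize_sched_expedited();

                /* serializes the writers and blocks the slow-path readers */
                down_write(&brw->rw_sem);

                /* nobody uses fast_read_ctr; fold its sum into slow_read_ctr */
                atomic_add(clear_fast_ctr(brw), &brw->slow_read_ctr);

                /* wait until the active readers do percpu_up_read() */
                wait_event(brw->write_waitq,
                           !atomic_read(&brw->slow_read_ctr));
        }

        void percpu_up_write(struct percpu_rw_semaphore *brw)
        {
                up_write(&brw->rw_sem);

                /* make the "write" section visible to the fast-path readers */
                synchronize_sched_expedited();

                /* the last writer re-enables the fast path */
                atomic_dec(&brw->write_ctr);
        }

    Note that atomic_inc(&brw->write_ctr) and the first
    synchronize_sched_expedited() happen before down_write(), so the grace
    periods of racing writers can run concurrently; only the
    clear_fast_ctr() accounting needs ->rw_sem.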
     
  • Currently the writer does msleep() plus synchronize_sched() 3 times to
    acquire/release the semaphore, and during this time the readers are
    blocked completely, even if the "write" section has not actually started
    or has already finished.

    With this patch, down_write/up_write does synchronize_sched() twice, and
    down_read/up_read are still possible during this time; they just use the
    slow path (see the sketch at the end of this entry).

    percpu_down_write() first forces the readers to use the rw_semaphore,
    incrementing the "slow" counter to take the lock for reading, and then
    takes that rw_semaphore for writing and blocks the readers.

    Also, with this patch the code relies on the documented behaviour of
    synchronize_sched(); it doesn't try to pair synchronize_sched() with a
    barrier.

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Paul E. McKenney
    Cc: Linus Torvalds
    Cc: Mikulas Patocka
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Srikar Dronamraju
    Cc: Ananth N Mavinakayanahalli
    Cc: Anton Arapov
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
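
    A minimal sketch of the reader side after this change; the helper and
    field names (update_fast_ctr(), fast_read_ctr, writer_mutex,
    slow_read_ctr) are inferred from the description above, so treat this
    as an illustration rather than the exact patch:

        static bool update_fast_ctr(struct percpu_rw_semaphore *brw,
                                    unsigned int val)
        {
                bool success = false;

                preempt_disable();
                if (likely(!mutex_is_locked(&brw->writer_mutex))) {
                        /* no pending writer: the per-cpu counter suffices */
                        __this_cpu_add(*brw->fast_read_ctr, val);
                        success = true;
                }
                preempt_enable();

                return success;
        }

        void percpu_down_read(struct percpu_rw_semaphore *brw)
        {
                if (likely(update_fast_ctr(brw, +1)))
                        return;

                /*
                 * slow path: a writer is pending, but we only block on
                 * ->rw_sem while it is actually held for writing
                 */
                down_read(&brw->rw_sem);
                atomic_inc(&brw->slow_read_ctr);
                up_read(&brw->rw_sem);
        }

    The preempt_disable/enable pair is what lets the writer use
    synchronize_sched() to flush the fast-path readers: once it returns,
    every CPU has either seen the locked writer_mutex or has left the
    fast path.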