24 May, 2011

1 commit

  • Ben Nagy reported a scalability problem with KVM/QEMU that hits a
    single spinlock (idr_lock) in the posix-timers code very hard on his
    48-core machine.

    Even on a 16-CPU machine (2x4x2), a single test can show 98% of CPU time
    spent in ticket_spin_lock, called from lock_timer.

    Ref: http://www.spinics.net/lists/kvm/msg51526.html

    Switching to RCU is quite easy because IDR is already RCU-ready:
    idr_lock should be taken only for an insert or delete, not for a lookup.

    Benchmark on a 2x4x2 machine, 16 processes calling timer_gettime().

    Before:

    real 1m18.669s
    user 0m1.346s
    sys 1m17.180s

    After:

    real 0m3.296s
    user 0m1.366s
    sys 0m1.926s

    Reported-by: Ben Nagy
    Signed-off-by: Eric Dumazet
    Tested-by: Ben Nagy
    Cc: Oleg Nesterov
    Cc: Avi Kivity
    Cc: John Stultz
    Cc: Richard Cochran
    Cc: Paul E. McKenney
    Cc: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Eric Dumazet
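
The benchmark in the commit above (several processes calling timer_gettime() in a loop) can be approximated from user space. The following is a minimal sketch, not the program used for the reported numbers: the function names run_benchmark() and timer_gettime_loop(), the choice of CLOCK_MONOTONIC, and the iteration count are all assumptions. On older glibc versions it may need to be linked with -lrt.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Child body (hypothetical helper): create one timer and query it
 * `iters` times. Each timer_gettime() call makes the kernel look the
 * timer up by id, which is the path the commit message discusses. */
static int timer_gettime_loop(long iters)
{
    timer_t tid;
    struct itimerspec cur;

    /* NULL sigevent selects the default notification; the timer is
     * never armed, so no signal is delivered. */
    if (timer_create(CLOCK_MONOTONIC, NULL, &tid) != 0)
        return 1;

    for (long i = 0; i < iters; i++) {
        if (timer_gettime(tid, &cur) != 0) {
            timer_delete(tid);
            return 1;
        }
    }
    return timer_delete(tid) != 0;
}

/* Fork `nprocs` children that each run the loop, then wait for them.
 * Returns 0 if every child succeeded, -1 otherwise. */
int run_benchmark(int nprocs, long iters)
{
    int started = 0;
    int failed = 0;

    for (int i = 0; i < nprocs; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            failed = 1;
            break;
        }
        if (pid == 0)
            _exit(timer_gettime_loop(iters));
        started++;
    }

    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) ||
            WEXITSTATUS(status) != 0)
            failed = 1;
    }
    return failed ? -1 : 0;
}
```

Calling run_benchmark(16, N) from a small main() and running the result under time(1) would mirror the 16-process setup; the value of N is not given in the commit message and has to be chosen by the reader.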
     

23 May, 2011

1 commit


31 Mar, 2011

1 commit


22 Feb, 2011

1 commit


02 Feb, 2011

22 commits


21 Oct, 2010

1 commit

  • lock_timer() takes it_lock only when it returns non-NULL, but
    unlock_timer() releases it unconditionally. Sparse therefore complains
    about the lock context imbalance. Rename lock_timer() and wrap it with
    the __cond_lock() macro to make sparse happy.

    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Namhyung Kim
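
The conditional-lock wrapping described in the commit above can be illustrated outside the kernel. This is a sketch under stated assumptions: struct toy_timer, __lock_toy(), lock_toy() and unlock_toy() are hypothetical stand-ins for k_itimer and the timer lock helpers, the "lock" is a plain int flag rather than a spinlock, and the annotation macros are modeled on the kernel's sparse helpers (they expand to nothing useful unless sparse defines __CHECKER__). It relies on GNU statement expressions, so it needs gcc or clang.

```c
#include <stddef.h>

/* Sparse context annotations, modeled on the kernel's helpers.
 * Without __CHECKER__ they have no effect on the generated code. */
#ifdef __CHECKER__
# define __acquire(x)      __context__(x, 1)
# define __release(x)      __context__(x, -1)
# define __cond_lock(x, c) ((c) ? ({ __acquire(x); 1; }) : 0)
#else
# define __acquire(x)      (void)0
# define __release(x)      (void)0
# define __cond_lock(x, c) (c)
#endif

/* Hypothetical stand-in for a timer object with a lock. */
struct toy_timer {
    int valid;   /* non-zero if the timer may be used */
    int locked;  /* toy lock flag, not a real spinlock */
};

/* The renamed worker: takes the lock only when it returns non-NULL. */
static struct toy_timer *__lock_toy(struct toy_timer *t)
{
    if (t == NULL || !t->valid)
        return NULL;
    t->locked = 1;
    return t;
}

/* The wrapper tells sparse that the lock is held exactly when the
 * result is non-NULL, which balances the unconditional release. */
#define lock_toy(t)                                          \
({  struct toy_timer *__t;                                   \
    __cond_lock(&__t->locked, __t = __lock_toy(t));          \
    __t;                                                     \
})

/* Unconditional release, as unlock_timer() does in the commit text. */
static void unlock_toy(struct toy_timer *t)
{
    t->locked = 0;
    __release(&t->locked);
}
```

Callers keep the usual shape: check the result of lock_toy() for NULL, and call unlock_toy() only on the non-NULL path.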
     

23 Jul, 2010

1 commit


28 May, 2010

1 commit

  • Move CLOCK_DISPATCH(which_clock, timer_create, (new_timer)) after all
    possible EFAULT errors.

    *_timer_create may allocate/get resources.
    (for example posix_cpu_timer_create does get_task_struct)

    [ tglx: fold the remove crappy comment patch into this ]

    Signed-off-by: Andrey Vagin
    Cc: Oleg Nesterov
    Cc: Pavel Emelyanov
    Cc:
    Reviewed-by: Stanislaw Gruszka
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Andrey Vagin
     

05 Feb, 2010

1 commit


22 Aug, 2009

1 commit

  • After talking with some application writers who want very fast, but not
    fine-grained timestamps, I decided to try to implement new clock_ids
    for clock_gettime(): CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE,
    which return the time at the last tick. This is very fast as we don't
    have to access any hardware (which can be very painful if you're using
    something like the acpi_pm clocksource), and we can even use the vdso
    clock_gettime() method to avoid the syscall. The only trade-off is that
    you only get low-resolution, tick-grained time.

    This isn't a new idea; I know Ingo has a patch in the -rt tree that made
    the vsyscall gettimeofday() return coarse-grained time when the
    vsyscall64 sysctl was set to 2. However, this affects all applications
    on a system.

    With this method, applications can choose the proper speed/granularity
    trade-off for themselves.

    Signed-off-by: John Stultz
    Cc: Andi Kleen
    Cc: nikolag@ca.ibm.com
    Cc: Darren Hart
    Cc: arjan@infradead.org
    Cc: jonathan@jonmasters.org
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    john stultz
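
From an application's point of view, using the coarse clock ids described in the commit above might look like the sketch below. The function name sample_coarse() is an assumption made for illustration; clock_gettime(), clock_getres() and the CLOCK_MONOTONIC_COARSE id are the standard Linux interfaces. On older glibc versions it may need -lrt.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

/* Read the coarse monotonic clock, then the regular monotonic clock,
 * and query the resolution the kernel reports for the coarse one.
 * Returns 0 on success, -1 if any call fails (errno is left set). */
int sample_coarse(struct timespec *coarse, struct timespec *fine,
                  struct timespec *res)
{
    /* Time as of the last tick: cheap, no clocksource hardware access. */
    if (clock_gettime(CLOCK_MONOTONIC_COARSE, coarse) != 0)
        return -1;

    /* Fine-grained time for comparison. */
    if (clock_gettime(CLOCK_MONOTONIC, fine) != 0)
        return -1;

    /* Reported granularity of the coarse clock (tick-sized). */
    if (clock_getres(CLOCK_MONOTONIC_COARSE, res) != 0)
        return -1;

    return 0;
}
```

An application that only needs tick-grained timestamps would call the coarse variant on its hot path and keep CLOCK_MONOTONIC or CLOCK_REALTIME for measurements that need finer resolution.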
     

04 Aug, 2009

1 commit


14 Jan, 2009

1 commit


26 Dec, 2008

1 commit


21 Dec, 2008

1 commit

  • Impact: Prevent kernel crash with posix timer clockid CLOCK_MONOTONIC_RAW

    commit 2d42244ae71d6c7b0884b5664cf2eda30fb2ae68 (clocksource:
    introduce CLOCK_MONOTONIC_RAW) introduced a new clockid, which is only
    meant for reading out the raw, non-NTP-adjusted system time.

    The above commit did not prevent a posix timer from being created
    with that clockid. The timer_create() syscall succeeds and initializes
    the timer to a non-existent hrtimer base. When the timer is deleted,
    either by timer_delete() or by the exit() cleanup, the kernel crashes.

    Prevent the creation of timers for CLOCK_MONOTONIC_RAW by setting the
    posix clock function to no_timer_create, which returns an error code.

    Reported-and-tested-by: Eric Sesterhenn
    Signed-off-by: Thomas Gleixner
    Acked-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
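
The behaviour described in the commit above can be probed from user space. The sketch below is an illustration only: read_raw_clock() and try_raw_timer() are hypothetical helper names, and the commit message does not say which error code timer_create() returns, so the code just reports whatever errno it sees. On older glibc versions it may need -lrt.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <signal.h>
#include <time.h>

/* Reading the raw clock is the supported use of this clockid.
 * Returns 0 on success, -1 on failure. */
int read_raw_clock(struct timespec *ts)
{
    return clock_gettime(CLOCK_MONOTONIC_RAW, ts) == 0 ? 0 : -1;
}

/* Try to create a posix timer on CLOCK_MONOTONIC_RAW.
 * Returns 0 if the kernel allowed it (the timer is deleted again),
 * otherwise the errno value reported by timer_create(). */
int try_raw_timer(void)
{
    timer_t tid;

    if (timer_create(CLOCK_MONOTONIC_RAW, NULL, &tid) != 0)
        return errno;

    /* Only reached on kernels that accept the clockid for timers. */
    timer_delete(tid);
    return 0;
}
```

On a kernel that includes this fix, try_raw_timer() is expected to return a non-zero errno value, while read_raw_clock() keeps working.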
     

13 Dec, 2008

2 commits

  • Impact: clean up, speed up

    ->it_pid (was ->it_process) also has a special meaning: if it is NULL,
    the timer is under deletion or it wasn't initialized yet. We can check
    ->it_signal != NULL instead; this way we can

    - simplify sys_timer_create() a bit

    - remove yet another check from lock_timer()

    - move put_pid(->it_pid) into release_posix_timer() which
    runs outside of ->it_lock

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Oleg Nesterov
     
  • Impact: restructure, clean up code

    k_itimer holds the ref to the ->it_process until sys_timer_delete(). This
    allows pinning up to RLIMIT_SIGPENDING dead task_structs. Change the code
    to use "struct pid *" instead.

    The patch doesn't kill ->it_process, it places ->it_pid into the union.
    ->it_process is still used by do_cpu_nanosleep() as before. It would be
    trivial to change the nanosleep code as well, but since it uses it_process
    in a special way I think it is better to keep this field for grep.

    The patch bloats the kernel by 104 bytes and it also adds the new pointer,
    ->it_signal, to k_itimer. It is used by lock_timer() to verify that the
    found timer was not created by another process. It is not clear why we
    use the global database (and thus the global idr_lock) for posix timers.
    We still need the signal_struct->posix_timers which contains all usable
    timers; perhaps it is better to use some form of per-process array
    instead.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Oleg Nesterov
     

22 Oct, 2008

1 commit


20 Oct, 2008

1 commit


18 Oct, 2008

1 commit