21 Oct, 2010

1 commit

  • lock_timer() conditionally grabs it_lock in case of returning non-NULL
    but unlock_timer() releases it unconditionally. This leads sparse to
    complain about the lock context imbalance. Rename and wrap lock_timer
    using __cond_lock() macro to make sparse happy.

    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Namhyung Kim
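
    The wrapper pattern this commit describes can be sketched in user space
    roughly as follows. This is only an illustration: toy_lock, toy_timer,
    lock_timer_inner and cond_lock_note are hypothetical names, the annotation
    macro is reduced to the plain "evaluate the condition" case that applies
    when sparse is not running, and the statement-expression syntax assumes
    GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified stand-in for the sparse annotation: outside the checker it
 * only evaluates the condition, which is all this sketch models.
 */
#define cond_lock_note(lock, cond) (cond)

struct toy_lock {
    int held;
};

struct toy_timer {
    struct toy_lock it_lock;
    int valid;              /* 0 means "being deleted", so lookup must fail */
};

/* Takes it_lock only on the path that returns non-NULL. */
struct toy_timer *lock_timer_inner(struct toy_timer *t)
{
    if (t == NULL || !t->valid)
        return NULL;
    t->it_lock.held = 1;
    return t;
}

/* Wrapper that makes the conditional acquire visible to a checker. */
#define lock_timer(t)                                               \
({                                                                  \
    struct toy_timer *timr_;                                        \
    cond_lock_note(&timr_->it_lock, timr_ = lock_timer_inner(t));   \
    timr_;                                                          \
})

/* Releases unconditionally; only legal after a non-NULL lock_timer(). */
void unlock_timer(struct toy_timer *t)
{
    assert(t->it_lock.held);
    t->it_lock.held = 0;
}
```

    A caller checks the result of lock_timer() and calls unlock_timer() only
    on the non-NULL path, which is the pairing the annotation lets a checker
    verify.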
     

23 Jul, 2010

1 commit


28 May, 2010

1 commit

  • Move CLOCK_DISPATCH(which_clock, timer_create, (new_timer)) after all
    possible EFAULT errors.

    *_timer_create may allocate/get resources.
    (for example posix_cpu_timer_create does get_task_struct)

    [ tglx: fold the remove crappy comment patch into this ]

    Signed-off-by: Andrey Vagin
    Cc: Oleg Nesterov
    Cc: Pavel Emelyanov
    Reviewed-by: Stanislaw Gruszka
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Andrey Vagin
     

05 Feb, 2010

1 commit


22 Aug, 2009

1 commit

  • After talking with some application writers who want very fast, but not
    fine-grained timestamps, I decided to try to implement new clock_ids
    to clock_gettime(): CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE
    which return the time at the last tick. This is very fast as we don't
    have to access any hardware (which can be very painful if you're using
    something like the acpi_pm clocksource), and we can even use the vdso
    clock_gettime() method to avoid the syscall. The only trade off is you
    only get low-res tick grained time resolution.

    This isn't a new idea, I know Ingo has a patch in the -rt tree that made
    the vsyscall gettimeofday() return coarse grained time when the
    vsyscall64 sysctl was set to 2. However this affects all applications
    on a system.

    With this method, applications can choose the proper speed/granularity
    trade-off for themselves.

    Signed-off-by: John Stultz
    Cc: Andi Kleen
    Cc: nikolag@ca.ibm.com
    Cc: Darren Hart
    Cc: arjan@infradead.org
    Cc: jonathan@jonmasters.org
    Signed-off-by: Thomas Gleixner

    john stultz
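
    From user space, the coarse clock ids described in this commit are read
    through the ordinary clock_gettime() call. A minimal sketch, assuming a
    Linux system whose libc headers define CLOCK_MONOTONIC_COARSE and
    CLOCK_REALTIME_COARSE; the helper names clock_now_ns and clock_res_ns are
    made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <time.h>

/* Read a clock and return its value in nanoseconds, or -1 on failure. */
long long clock_now_ns(clockid_t id)
{
    struct timespec ts;

    if (clock_gettime(id, &ts) != 0)
        return -1;
    /* Sanity check on what the kernel handed back. */
    assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Return the advertised resolution of a clock in nanoseconds, or -1. */
long long clock_res_ns(clockid_t id)
{
    struct timespec ts;

    if (clock_getres(id, &ts) != 0)
        return -1;
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

    Comparing clock_res_ns(CLOCK_MONOTONIC_COARSE) with
    clock_res_ns(CLOCK_MONOTONIC) on a given machine shows the granularity
    being traded for speed; the actual numbers depend on the kernel
    configuration.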
     

04 Aug, 2009

1 commit


14 Jan, 2009

1 commit


26 Dec, 2008

1 commit


21 Dec, 2008

1 commit

  • Impact: Prevent kernel crash with posix timer clockid CLOCK_MONOTONIC_RAW

    commit 2d42244ae71d6c7b0884b5664cf2eda30fb2ae68 (clocksource:
    introduce CLOCK_MONOTONIC_RAW) introduced a new clockid, which is only
    available to read out the raw not NTP adjusted system time.

    The above commit did not prevent that a posix timer can be created
    with that clockid. The timer_create() syscall succeeds and initializes
    the timer to a non existing hrtimer base. When the timer is deleted
    either by timer_delete() or by the exit() cleanup the kernel crashes.

    Prevent the creation of timers for CLOCK_MONOTONIC_RAW by setting the
    posix clock function to no_timer_create which returns an error code.

    Reported-and-tested-by: Eric Sesterhenn
    Signed-off-by: Thomas Gleixner
    Acked-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

13 Dec, 2008

2 commits

  • Impact: clean up, speed up

    ->it_pid (was ->it_process) has also a special meaning: if it is NULL,
    the timer is under deletion or it wasn't initialized yet. We can check
    ->it_signal != NULL instead, this way we can

    - simplify sys_timer_create() a bit

    - remove yet another check from lock_timer()

    - move put_pid(->it_pid) into release_posix_timer() which
    runs outside of ->it_lock

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Oleg Nesterov
     
  • Impact: restructure, clean up code

    k_itimer holds the ref to the ->it_process until sys_timer_delete(). This
    allows to pin up to RLIMIT_SIGPENDING dead task_struct's. Change the code
    to use "struct pid *" instead.

    The patch doesn't kill ->it_process, it places ->it_pid into the union.
    ->it_process is still used by do_cpu_nanosleep() as before. It would be
    trivial to change the nanosleep code as well, but since it uses it_process
    in a special way I think it is better to keep this field for grep.

    The patch bloats the kernel by 104 bytes and it also adds the new pointer,
    ->it_signal, to k_itimer. It is used by lock_timer() to verify that the
    found timer was not created by another process. It is not clear why we
    use the global database (and thus the global idr_lock) for posix timers.
    We still need the signal_struct->posix_timers which contains all useable
    timers, perhaps it is better to use some form of per-process array
    instead.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Oleg Nesterov
     

22 Oct, 2008

1 commit


20 Oct, 2008

1 commit


18 Oct, 2008

1 commit


03 Oct, 2008

1 commit


24 Sep, 2008

9 commits

  • Cleanup. Imho makes the code much more understandable. At least this
    patch lessens both the source and compiled code.

    Signed-off-by: Oleg Nesterov
    Cc: mingo@elte.hu
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Oleg Nesterov
     
  • lock_timer() checks that the timer found by idr_find(timer_id) has ->it_id
    == timer_id. This buys nothing. This check can fail only if
    sys_timer_create() unlocked idr_lock after idr_get_new(), but didn't set
    ->it_id = new_timer_id yet. But in that case ->it_process == NULL so
    lock_timer() can't succeed anyway.

    Also remove a couple of unneeded typecasts.

    Note that with or without this patch we have a small problem.
    sys_timer_create() doesn't ensure that the result of setting (say)
    ->it_sigev_notify must be visible if lock_timer() succeeds.

    Signed-off-by: Oleg Nesterov
    Cc: mingo@elte.hu
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Oleg Nesterov
     
  • With the recent changes ->it_sigev_signo and ->it_sigev_value are only
    used in sys_timer_create(), kill them.

    Signed-off-by: Oleg Nesterov
    Cc: mingo@elte.hu
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Oleg Nesterov
     
  • Cleanup.

    - sys_timer_create() is big and complicated. The code above the "out:"
    label relies on the fact that "error" must be == 0. This is not very
    robust, make the code more explicit. Remove the unneeded initialization
    of error.

    - If idr_get_new() succeeds (as it normally should), we check the returned
    value twice. Move the "-EAGAIN" check under "if (error)".

    Signed-off-by: Oleg Nesterov
    Cc: mingo@elte.hu
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Oleg Nesterov
     
  • posix_timer_event() always populates timer->sigq with the same numbers,
    move this code into sys_timer_create().

    Note that with this patch we can kill it_sigev_signo and it_sigev_value.

    Signed-off-by: Oleg Nesterov
    Cc: mingo@elte.hu
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Oleg Nesterov
     
  • - Change the code to do rcu_read_lock() instead of taking tasklist_lock,
    it is safe to get_task_struct(p) if p was found under RCU.

    However, now we must not use process's sighand/signal, they may be NULL.
    We can use current->sighand/signal instead, this "process" must belong
    to the current's thread-group.

    - Factor out the common code for 2 "if (timer_event_spec)" branches, the
    !timer_event_spec case can use current too.

    - use spin_lock_irq() instead of _irqsave(), kill "flags".

    Signed-off-by: Oleg Nesterov
    Cc: mingo@elte.hu
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Oleg Nesterov
     
  • sys_timer_create() returns -EINVAL if the target thread has PF_EXITING.

    This doesn't really make sense, the sub-thread can die right after unlock.
    And in fact, this is just wrong. Without SIGEV_THREAD_ID good_sigevent()
    returns ->group_leader, and it is very possible that the leader is already
    dead. This is OK, we shouldn't return the error in this case.

    Remove this check and the comment. Note that the "process" was found
    under tasklist_lock, it must have ->sighand != NULL.

    Also, remove a couple of unneeded initializations.

    Signed-off-by: Oleg Nesterov
    Cc: mingo@elte.hu
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Oleg Nesterov
     
  • Change the code to get/put timer->it_process regardless of
    SIGEV_THREAD_ID. This streamlines the create/destroy paths and allows us
    to simplify the usage of exit_itimers() in de_thread().

    Signed-off-by: Oleg Nesterov
    Cc: mingo@elte.hu
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Oleg Nesterov
     
  • posix_timer_event() drops SIGEV_THREAD_ID and switches to ->group_leader
    if send_sigqueue() fails.

    This is not very useful and doesn't work reliably. send_sigqueue() can
    only fail if ->it_process is dead. But it can die before it dequeues the
    SI_TIMER signal, in that case the timer stops anyway.

    Remove this code. I guess it was needed long ago to ensure that the
    timer is not destroyed when its creator thread dies.

    Q: perhaps it makes sense to change sys_timer_settime() to return an error
    if ->it_process is dead?

    Signed-off-by: Oleg Nesterov
    Cc: mingo@elte.hu
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Oleg Nesterov
     

06 Sep, 2008

1 commit


21 Aug, 2008

1 commit

  • In talking with Josip Loncaric, and his work on clock synchronization (see
    btime.sf.net), he mentioned that for really close synchronization, it is
    useful to have access to "hardware time", that is a notion of time that is
    not in any way adjusted by the clock slewing done to keep close time sync.

    Part of the issue is if we are using the kernel's ntp adjusted
    representation of time in order to measure how we should correct time, we
    can run into what Paul McKenney aptly described as "Painting a road using
    the lines we're painting as the guide".

    I had been thinking of a similar problem, and was trying to come up with a
    way to give users access to a purely hardware based time representation
    that avoided users having to know the underlying frequency and mask values
    needed to deal with the wide variety of possible underlying hardware
    counters.

    My solution is to introduce CLOCK_MONOTONIC_RAW. This exposes a
    nanosecond based time value, that increments starting at bootup and has no
    frequency adjustments made to it whatsoever.

    The time is accessed from userspace via the clock_gettime() syscall,
    passing CLOCK_MONOTONIC_RAW as the clock_id.

    Signed-off-by: John Stultz
    Signed-off-by: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Ingo Molnar

    John Stultz
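
    To see the clock from user space, one interval can be timed on both
    CLOCK_MONOTONIC_RAW and CLOCK_MONOTONIC. A minimal sketch, assuming a
    Linux libc that defines CLOCK_MONOTONIC_RAW; the helper names ts_to_ns
    and measure_both are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <time.h>

/* Convert a timespec to nanoseconds. */
long long ts_to_ns(const struct timespec *ts)
{
    return (long long)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/*
 * Sleep for sleep_ns and report how long that took according to each
 * clock. Returns 0 on success, -1 if either clock could not be read.
 */
int measure_both(long sleep_ns, long long *raw_elapsed, long long *mono_elapsed)
{
    struct timespec r0, r1, m0, m1;
    struct timespec req = { .tv_sec = 0, .tv_nsec = sleep_ns };

    /* nanosleep() rejects tv_nsec outside [0, 1e9). */
    assert(sleep_ns >= 0 && sleep_ns < 1000000000L);

    if (clock_gettime(CLOCK_MONOTONIC_RAW, &r0) != 0 ||
        clock_gettime(CLOCK_MONOTONIC, &m0) != 0)
        return -1;

    nanosleep(&req, NULL);

    if (clock_gettime(CLOCK_MONOTONIC_RAW, &r1) != 0 ||
        clock_gettime(CLOCK_MONOTONIC, &m1) != 0)
        return -1;

    *raw_elapsed = ts_to_ns(&r1) - ts_to_ns(&r0);
    *mono_elapsed = ts_to_ns(&m1) - ts_to_ns(&m0);
    return 0;
}
```

    Over short intervals the two results are normally very close; any
    difference reflects the frequency adjustment applied to CLOCK_MONOTONIC
    but not to the raw clock.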
     

12 Aug, 2008

1 commit


26 Jul, 2008

2 commits

  • release_posix_timer() can't be called with ->it_process != NULL. Once
    sys_timer_create() sets ->it_process it must not call
    release_posix_timer(), otherwise we can race with another thread doing
    sys_timer_delete(), this timer is visible to idr_find() and unlocked.

    The same is true for two other callers (actually, for any possible
    caller), sys_timer_delete() and itimer_delete(). They must clear
    ->it_process before unlock_timer() + release_posix_timer().

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Cc: john stultz
    Cc: Thomas Gleixner
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • sys_timer_delete() and itimer_delete() check "timer->it_process != NULL",
    this looks completely bogus. ->it_process == NULL means that this timer
    is already under destruction or it is not fully initialized, this must not
    happen.

    sys_timer_delete: the timer is locked, and lock_timer() can't succeed
    if ->it_process == NULL.

    itimer_delete: it is called by exit_itimers() when there are no other
    threads which can play with signal_struct->posix_timers.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Cc: john stultz
    Cc: Thomas Gleixner
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

24 Jul, 2008

2 commits

  • The bug was reported and analysed by Mark McLoughlin,
    the patch is based on his and Roland's suggestions.

    posix_timer_event() always rewrites the pre-allocated siginfo before sending
    the signal. Most of the written info is the same all the time, but memset(0)
    is very wrong. If ->sigq is queued we can race with collect_signal() which
    can fail to find this siginfo looking at .si_signo, or copy_siginfo() can
    copy the wrong .si_code/si_tid/etc.

    In short, sys_timer_settime() can in fact stop the active timer, or the user
    can receive the siginfo with the wrong .si_xxx values.

    Move "memset(->info, 0)" from posix_timer_event() to alloc_posix_timer(),
    change send_sigqueue() to set .si_overrun = 0 when ->sigq is not queued.
    It would be nice to move the whole sigq->info initialization from send to
    create path, but this is not easy to do without uglifying timer_create()
    further.

    As Roland rightly pointed out, we need more cleanups/fixes here, see the
    "FIXME" comment in the patch. Hopefully this patch makes sense anyway, and
    it can mask the most bad implications.

    Reported-by: Mark McLoughlin
    Signed-off-by: Oleg Nesterov
    Cc: Mark McLoughlin
    Cc: Oliver Pinter
    Cc: Roland McGrath
    Cc: stable@kernel.org
    Cc: Andrew Morton
    Signed-off-by: Thomas Gleixner

    kernel/posix-timers.c | 17 +++++++++++++----
    kernel/signal.c | 1 +
    2 files changed, 14 insertions(+), 4 deletions(-)

    Oleg Nesterov
     
  • do_schedule_next_timer() sets info->si_overrun = timr->it_overrun_last,
    this discards the already accumulated overruns.

    Signed-off-by: Oleg Nesterov
    Cc: Mark McLoughlin
    Cc: Oliver Pinter
    Cc: Roland McGrath
    Cc: stable@kernel.org
    Cc: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Oleg Nesterov
     

30 Apr, 2008

1 commit

  • We export send_sigqueue() and send_group_sigqueue() for the only user,
    posix_timer_event(). This is a bit silly, because both are just trivial
    helpers on top of do_send_sigqueue() and because we pass the unused
    .si_signo parameter.

    Kill them both, rename do_send_sigqueue() to send_sigqueue(), and export it.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

19 Apr, 2008

1 commit


15 Feb, 2008

1 commit

  • Various user space callers ask for relative timeouts. While we fixed
    that overflow issue in hrtimer_start(), the sites which convert
    relative user space values to absolute timeouts themselves were uncovered.

    Instead of putting overflow checks into each place add a function
    which does the sanity checking and convert all affected callers to use
    it.

    Thanks to Frans Pop, who reported the problem and tested the fixes.

    Signed-off-by: Thomas Gleixner
    Acked-by: Ingo Molnar
    Tested-by: Frans Pop

    Thomas Gleixner
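
    The kind of sanity check this commit describes, converting a relative
    timeout to an absolute one without wrapping, can be illustrated with a
    small user-space analogue. This is a sketch only, not the kernel helper
    itself: abs_timeout_ns and TIMEOUT_MAX_NS are hypothetical names, and
    times are modelled as signed 64-bit nanosecond counts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the largest representable expiry time. */
#define TIMEOUT_MAX_NS INT64_MAX

/*
 * Add a relative timeout to the current time, clamping to the maximum
 * instead of wrapping around when the sum would overflow.
 * Callers are expected to pass a non-negative rel_ns.
 */
int64_t abs_timeout_ns(int64_t now_ns, int64_t rel_ns)
{
    assert(rel_ns >= 0);

    /* rel_ns > 0 guarantees the subtraction below cannot overflow. */
    if (rel_ns > 0 && now_ns > TIMEOUT_MAX_NS - rel_ns)
        return TIMEOUT_MAX_NS;

    return now_ns + rel_ns;
}
```

    Putting the check in one function means each call site that turns a
    user-supplied relative value into an absolute expiry gets the same
    clamping behaviour.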
     

10 Feb, 2008

1 commit

  • Spotted by Pavel Emelyanov and Alexey Dobriyan.

    hrtimer_nanosleep() sets restart_block->arg1 = rmtp, but this rmtp points to
    the local variable which lives in the caller's stack frame. This means that
    if sys_restart_syscall() actually happens and it is interrupted as well, we
    don't update the user-space variable, but write into the already dead stack
    frame.

    Introduced by commit 04c227140fed77587432667a574b14736a06dd7f
    hrtimer: Rework hrtimer_nanosleep to make sys_compat_nanosleep easier

    Change the callers to pass "__user *rmtp" to hrtimer_nanosleep(), and change
    hrtimer_nanosleep() to use copy_to_user() to actually update *rmtp.

    Small problem remains. man 2 nanosleep states that *rmtp should be written if
    nanosleep() was interrupted (it says nothing whether it is OK to update *rmtp
    if nanosleep returns 0), but (with or without this patch) we can dirty *rem
    even if nanosleep() returns 0.

    NOTE: this patch doesn't change compat_sys_nanosleep(), because it has other
    bugs. Fixed by the next patch.

    Signed-off-by: Oleg Nesterov
    Cc: Alexey Dobriyan
    Cc: Michael Kerrisk
    Cc: Pavel Emelyanov
    Cc: Peter Zijlstra
    Cc: Toyo Abe
    Cc: Andrew Morton
    Signed-off-by: Thomas Gleixner

    include/linux/hrtimer.h | 2 -
    kernel/hrtimer.c | 51 +++++++++++++++++++++++++-----------------------
    kernel/posix-timers.c | 14 +------------
    3 files changed, 30 insertions(+), 37 deletions(-)

    Oleg Nesterov
     

09 Feb, 2008

1 commit

  • All the functions that need to lookup a task by pid in posix timers obtain
    this pid from a user space, and thus this value refers to a task in the same
    namespace, as the current task lives in.

    So the proper behavior is to call find_task_by_vpid() here.

    Signed-off-by: Pavel Emelyanov
    Cc: "Eric W. Biederman"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

06 Feb, 2008

1 commit

  • This is the new timerfd API as it is implemented by the following patch:

    int timerfd_create(int clockid, int flags);
    int timerfd_settime(int ufd, int flags,
                        const struct itimerspec *utmr,
                        struct itimerspec *otmr);
    int timerfd_gettime(int ufd, struct itimerspec *otmr);

    The timerfd_create() API creates an un-programmed timerfd fd. The "clockid"
    parameter can be either CLOCK_MONOTONIC or CLOCK_REALTIME.

    The timerfd_settime() API applies new settings to the timerfd fd, optionally
    retrieving the previous expiration time (in case the "otmr" parameter is not
    NULL).

    The time value specified in "utmr" is absolute, if the TFD_TIMER_ABSTIME bit
    is set in the "flags" parameter. Otherwise it's a relative time.

    The timerfd_gettime() API returns the next expiration time of the timer, or
    {0, 0} if the timerfd has not been set yet.

    Like the previous timerfd API implementation, read(2) and poll(2) are
    supported (with the same interface). Here's a simple test program I used to
    exercise the new timerfd APIs:

    http://www.xmailserver.org/timerfd-test2.c

    [akpm@linux-foundation.org: coding-style cleanups]
    [akpm@linux-foundation.org: fix ia64 build]
    [akpm@linux-foundation.org: fix m68k build]
    [akpm@linux-foundation.org: fix mips build]
    [akpm@linux-foundation.org: fix alpha, arm, blackfin, cris, m68k, s390, sparc and sparc64 builds]
    [heiko.carstens@de.ibm.com: fix s390]
    [akpm@linux-foundation.org: fix powerpc build]
    [akpm@linux-foundation.org: fix sparc64 more]
    Signed-off-by: Davide Libenzi
    Cc: Michael Kerrisk
    Cc: Thomas Gleixner
    Cc: Davide Libenzi
    Cc: Michael Kerrisk
    Cc: Martin Schwidefsky
    Signed-off-by: Heiko Carstens
    Cc: Michael Kerrisk
    Cc: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     

03 Feb, 2008

1 commit


20 Oct, 2007

1 commit

  • With pid namespaces this field is now dangerous to use explicitly, so hide
    it behind the helpers.

    Also the pid and pgrp fields of task_struct and signal_struct are to be
    deprecated. Unfortunately this patch cannot be sent right now as this
    leads to tons of warnings, so start isolating them, and deprecate later.

    Actually the p->tgid == pid has to be changed to has_group_leader_pid(),
    but Oleg pointed out that in case of posix cpu timers this is the same, and
    thread_group_leader() is more preferable.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

19 Oct, 2007

1 commit