11 Aug, 2010

1 commit

  • * 'writable_limits' of git://decibel.fi.muni.cz/~xslaby/linux:
    unistd: add __NR_prlimit64 syscall numbers
    rlimits: implement prlimit64 syscall
    rlimits: switch more rlimit syscalls to do_prlimit
    rlimits: redo do_setrlimit to more generic do_prlimit
    rlimits: add rlimit64 structure
    rlimits: do security check under task_lock
    rlimits: allow setrlimit to non-current tasks
    rlimits: split sys_setrlimit
    rlimits: selinux, do rlimits changes under task_lock
    rlimits: make sure ->rlim_max never grows in sys_setrlimit
    rlimits: add task_struct to update_rlimit_cpu
    rlimits: security, add task_struct to setrlimit

    Fix up various system call number conflicts. We not only added fanotify
    system calls in the meantime, but asm-generic/unistd.h added a wait4
    along with a range of reserved per-architecture system calls.

    Linus Torvalds
     

16 Jul, 2010

1 commit


18 Jun, 2010

3 commits

  • fastpath_timer_check()->thread_group_cputimer() is racy and
    unneeded.

    It is racy because another thread can clear ->running before
    thread_group_cputimer() takes cputimer->lock. In this case
    thread_group_cputimer() will set ->running = true again and call
    thread_group_cputime(). But since we do not hold tasklist or
    siglock, we can race with fork/exit and copy the wrong results
    into cputimer->cputime.

    It is unneeded because if ->running == true we can just use
    the numbers in cputimer->cputime we already have.

    Change fastpath_timer_check() to copy cputimer->cputime into
    the local variable under cputimer->lock. We do not re-check
    ->running under cputimer->lock, run_posix_cpu_timers() does
    this check later.

    Note: we can add more optimizations on top of this change.
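
    The snapshot-under-lock idea described above can be modelled in user space. This is only a sketch under stated assumptions: struct cpu_sample, struct group_timer, sample_expired() and fastpath_check() are hypothetical stand-ins rather than the kernel's code, and an expiry value of 0 is assumed to mean "not armed".

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins, not the kernel's types. */
struct cpu_sample {
    uint64_t prof;   /* user + system time */
    uint64_t virt;   /* user time */
    uint64_t sched;  /* scheduler runtime */
};

struct group_timer {
    pthread_mutex_t lock;
    atomic_bool running;
    struct cpu_sample cputime;  /* protected by lock */
};

/* Assumption: an expiry of 0 means "not armed". */
static bool sample_expired(const struct cpu_sample *now,
                           const struct cpu_sample *expires)
{
    return (expires->prof  && now->prof  >= expires->prof)  ||
           (expires->virt  && now->virt  >= expires->virt)  ||
           (expires->sched && now->sched >= expires->sched);
}

static bool fastpath_check(struct group_timer *t,
                           const struct cpu_sample *expires)
{
    struct cpu_sample snapshot;

    assert(t != NULL && expires != NULL);
    if (!atomic_load(&t->running))
        return false;            /* no group timer running: nothing to compare */

    pthread_mutex_lock(&t->lock);
    snapshot = t->cputime;       /* consistent copy of all three fields */
    pthread_mutex_unlock(&t->lock);

    /* running is not re-checked under the lock; a slow path would do that */
    return sample_expired(&snapshot, expires);
}
```

    The fast path never recomputes the group totals; it only compares the numbers that are already accumulated against the armed expirations.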

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • run_posix_cpu_timers() doesn't work if current has already passed
    exit_notify(). This was needed to prevent the races with do_wait().

    Since ea6d290c ->signal is always valid and can't go away. We can
    remove the "tsk->exit_state == 0" in fastpath_timer_check() and
    convert run_posix_cpu_timers() to use lock_task_sighand().

    Note: it makes sense to take group_leader's sighand instead, the
    sub-thread still uses CPU after release_task(). But we need more
    changes to do this.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • thread_group_cputime() looks as if it is rcu-safe, but in fact this
    was wrong until ea6d290c which pins task->signal to task_struct.
    It checks ->sighand != NULL under rcu, but this can't help if ->signal
    can go away. Fortunately the caller either holds ->siglock, or it is
    fastpath_timer_check() which uses current and checks exit_state == 0.

    - Since ea6d290c commit tsk->signal is stable, we can read it first
    and avoid the initialization from INIT_CPUTIME.

    - Even if tsk->signal is always valid, we still have to check it
    is safe to use next_thread() under rcu_read_lock(). Currently
    the code checks ->sighand != NULL, change it to use pid_alive()
    which is commonly used to ensure the task wasn't unhashed before
    we take rcu_read_lock().

    Add the comment to explain this check.

    - Change the main loop to use the while_each_thread() helper.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     

28 May, 2010

1 commit

  • Preparation to make task->signal immutable, no functional changes.

    posix-cpu-timers.c checks task->signal != NULL to ensure this task is
    alive and didn't pass __exit_signal(). This is correct but we are going
    to change the lifetime rules for ->signal and never reset this pointer.

    Change the code to check ->sighand instead, it doesn't matter which
    pointer we check under tasklist, they both are cleared simultaneously.

    As Roland pointed out, some of these changes are not strictly needed and
    probably it makes sense to revert them later, when ->signal will be pinned
    to task_struct. But this patch tries to ensure the subsequent changes in
    fork/exit can't make any visible impact on posix cpu timers.

    Signed-off-by: Oleg Nesterov
    Cc: Fenghua Yu
    Acked-by: Roland McGrath
    Cc: Stanislaw Gruszka
    Cc: Tony Luck
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

10 May, 2010

2 commits

  • We can optimize and simplify things by taking into account that
    signal->cputimer is always running when any process-wide cpu timer is
    configured.

    In check_process_timers(), we don't have to check whether the newly
    updated value of signal->cputime_expires is smaller, since we maintain
    the new first expiration time ({prof,virt,sched}_expires) in the code
    flow and all other writes to the expiration cache are protected by
    sighand->siglock.

    Signed-off-by: Stanislaw Gruszka
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Hidetoshi Seto
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Stanislaw Gruszka
     
  • Reason: Further posix_cpu_timer patches depend on mainline changes

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

27 Mar, 2010

1 commit


13 Mar, 2010

6 commits

  • Extend the p->sighand->siglock locking scope to make sure that
    fastpath_timer_check() never iterates over all threads. Without
    locking there is a small possibility that signal->cputimer will stop
    running while we write values to signal->cputime_expires.

    Calling thread_group_cputime() from fastpath_timer_check() is bad not
    only because it is slow, but also because it races with __exit_signal(),
    which can lead to invalid signal->{s,u}time values.

    Signed-off-by: Stanislaw Gruszka
    Cc: Ingo Molnar
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Hidetoshi Seto
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Stanislaw Gruszka
     
  • When the user sets up a timer without an associated signal, and the
    process does not use any other cpu timers and does not exit,
    tsk->signal->cputimer is enabled and keeps running forever.

    Avoid running the timer for no reason.

    I used the program below to check that the patch does not break current
    user-space-visible behavior.

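    #include <assert.h>
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    void consume_cpu(void)
    {
            int i = 0;
            volatile int count = 0; /* volatile keeps the loop from being optimized away */

            /* the iteration count is an assumed value */
            for (i = 0; i < 100000000; i++)
                    count++;
    }

    int main(void)
    {
            /* assumed setup: a process CPU-time timer with no associated signal */
            int i;
            struct sigevent evt = { 0 };
            timer_t tid;
            struct itimerspec spec = { 0 };

            evt.sigev_notify = SIGEV_NONE;
            assert(timer_create(CLOCK_PROCESS_CPUTIME_ID, &evt, &tid) == 0);

            spec.it_value.tv_sec = 10; /* assumed initial expiry */
            assert(timer_settime(tid, 0, &spec, NULL) == 0);

            for (i = 0; i < 30; i++) {
                    consume_cpu();
                    memset(&spec, 0, sizeof(spec));
                    assert(timer_gettime(tid, &spec) == 0);
                    printf("%lu.%09lu\n",
                           (unsigned long) spec.it_value.tv_sec,
                           (unsigned long) spec.it_value.tv_nsec);
            }

            assert(timer_delete(tid) == 0);
            return 0;
    }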

    Signed-off-by: Stanislaw Gruszka
    Cc: Ingo Molnar
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Hidetoshi Seto
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Stanislaw Gruszka
     
  • According to POSIX, we need to correctly set the old timer's it_interval
    value when the user requests it in timer_settime(). Tested using the
    program below.

    #include <assert.h>
    #include <signal.h>
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
            struct sigaction act;
            struct sigevent evt = { 0 };
            timer_t tid;
            struct itimerspec spec, u_spec, k_spec;

            evt.sigev_notify = SIGEV_SIGNAL;
            evt.sigev_signo = SIGPROF;
            assert(timer_create(CLOCK_PROCESS_CPUTIME_ID, &evt, &tid) == 0);

            /* first setting: remembered in u_spec as the expected old value */
            spec.it_value.tv_sec = 1;
            spec.it_value.tv_nsec = 2;
            spec.it_interval.tv_sec = 3;
            spec.it_interval.tv_nsec = 4;
            u_spec = spec;
            assert(timer_settime(tid, 0, &spec, NULL) == 0);

            /* second setting: the kernel reports the old value in k_spec */
            spec.it_value.tv_sec = 5;
            spec.it_value.tv_nsec = 6;
            spec.it_interval.tv_sec = 7;
            spec.it_interval.tv_nsec = 8;
            assert(timer_settime(tid, 0, &spec, &k_spec) == 0);

            /* print user-side value / kernel-reported old value */
            #define PRT(val) printf(#val ":\t%d/%d\n", (int) u_spec.val, (int) k_spec.val)
            PRT(it_value.tv_sec);
            PRT(it_value.tv_nsec);
            PRT(it_interval.tv_sec);
            PRT(it_interval.tv_nsec);

            return 0;
    }

    Signed-off-by: Stanislaw Gruszka
    Cc: Ingo Molnar
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Hidetoshi Seto
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Stanislaw Gruszka
     
  • Signed-off-by: Stanislaw Gruszka
    Cc: Ingo Molnar
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Hidetoshi Seto
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Stanislaw Gruszka
     
  • Always set the signal->cputime_expires expiration cache when setting a
    new itimer, POSIX 1.b timer, or RLIMIT_CPU. Since we initialize the
    prof_exp expiration cache during fork(), this allows us to remove the
    "RLIMIT_CPU != inf" check from fastpath_timer_check() and do some
    other cleanups.

    Checked for regressions using the test cases from:
    http://marc.info/?l=linux-kernel&m=123749066504641&w=4
    http://marc.info/?l=linux-kernel&m=123811277916642&w=2

    Signed-off-by: Stanislaw Gruszka
    Cc: Ingo Molnar
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Hidetoshi Seto
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Stanislaw Gruszka
     
  • When a process deletes a cpu timer or a timer expires, we do not clear
    the expiration cache sig->cputime_expires.

    As a result, fastpath_timer_check(), which is meant to prevent us from
    looping over all threads when no timer is active, does not work, and we
    run the slow path needlessly on every tick.

    Zero sig->cputime_expires in stop_process_timers().

    Signed-off-by: Stanislaw Gruszka
    Cc: Ingo Molnar
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Hidetoshi Seto
    Cc: Spencer Candland
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Stanislaw Gruszka
     

07 Mar, 2010

2 commits

  • Make sure the compiler won't do weird things with limits. E.g. fetching
    them twice may return two different values once writable limits are
    implemented.

    I.e. either use the rlimit helpers added in commit 3e10e716abf3 ("resource:
    add helpers for fetching rlimits") or ACCESS_ONCE where they are not
    applicable.
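
    The "fetch once, then work on the copy" pattern can be sketched as follows. This is an illustration only: READ_LIMIT_ONCE(), struct limit, LIMIT_INFINITY and check_cpu_limit() are hypothetical names, the macro is a simplified stand-in for the idea behind ACCESS_ONCE, and it relies on the GNU __typeof__ extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in: force exactly one load through a volatile access. */
#define READ_LIMIT_ONCE(x) (*(volatile __typeof__(x) *)&(x))

#define LIMIT_INFINITY UINT64_MAX

struct limit {
    uint64_t cur;  /* soft limit, may be rewritten concurrently */
    uint64_t max;  /* hard limit, may be rewritten concurrently */
};

/* Returns 0 if below both limits, 1 if the soft limit is reached,
 * 2 if the hard limit is reached. */
static int check_cpu_limit(struct limit *lim, uint64_t used)
{
    assert(lim != NULL);

    /* Read each value once; all later decisions use the local copies,
     * so a concurrent writer cannot make two checks disagree. */
    uint64_t soft = READ_LIMIT_ONCE(lim->cur);
    uint64_t hard = READ_LIMIT_ONCE(lim->max);

    if (hard != LIMIT_INFINITY && used >= hard)
        return 2;
    if (soft != LIMIT_INFINITY && used >= soft)
        return 1;
    return 0;
}
```

    Without the local copies, the compiler would be free to reload lim->cur between the comparison and any later use, which is exactly the double fetch the commit guards against.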

    Signed-off-by: Jiri Slaby
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • Fetch rlimit (both hard and soft) values only once and work on them. It
    removes many accesses through sig structure and makes the code cleaner.

    Mostly a preparation for writable resource limits support.

    Signed-off-by: Jiri Slaby
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     

18 Nov, 2009

1 commit


29 Aug, 2009

2 commits

  • Add tracepoints for all itimer variants: ITIMER_REAL, ITIMER_VIRTUAL
    and ITIMER_PROF.

    [ tglx: Fixed comments and made the output more readable, parseable
    and consistent. Replaced pid_vnr by pid_nr because the hrtimer
    callback can happen in any namespace ]

    Signed-off-by: Xiao Guangrong
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Mathieu Desnoyers
    Cc: Anton Blanchard
    Cc: Peter Zijlstra
    Cc: KOSAKI Motohiro
    Cc: Zhaolei
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Xiao Guangrong
     
  • Merge reason: timer tracepoint patches depend on both branches

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

09 Aug, 2009

1 commit

  • When the process exits we don't have to start a new cputimer or
    use a running one (it does not account when tsk->exit_state != 0)
    to get the process CPU times. As there is only one thread we can
    just use the CPU time fields from the task and signal structs.

    Signed-off-by: Stanislaw Gruszka
    Cc: Peter Zijlstra
    Cc: Roland McGrath
    Cc: Vitaly Mayatskikh
    Signed-off-by: Andrew Morton
    Signed-off-by: Ingo Molnar

    Stanislaw Gruszka
     

03 Aug, 2009

4 commits

  • For powerpc with CONFIG_VIRT_CPU_ACCOUNTING,
    jiffies_to_cputime(1) is not a compile-time constant and run-time
    calculations are quite expensive. To optimize, we use a
    precomputed value. For all other architectures it is a
    preprocessor definition.

    Signed-off-by: Stanislaw Gruszka
    Acked-by: Peter Zijlstra
    Acked-by: Thomas Gleixner
    Cc: Oleg Nesterov
    Cc: Andrew Morton
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Stanislaw Gruszka
     
  • Don't update values in expiration cache when new ones are
    equal. Add expire_le() and expire_gt() helpers to simplify the
    code.
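
    A possible shape for such helpers is sketched below. The semantics are an assumption, not taken from the patch: an expiry of 0 is treated as "nothing armed", plain uint64_t replaces the kernel's cputime type, and cache_update() is a hypothetical caller added for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 0 means "no expiry armed", so any new expiry beats it. */
static bool expire_gt(uint64_t expires, uint64_t new_exp)
{
    return expires == 0 || expires > new_exp;
}

static bool expire_le(uint64_t expires, uint64_t new_exp)
{
    return expires != 0 && expires <= new_exp;
}

/* Write the cache only when the new expiry is strictly earlier.
 * Returns true if the cached value was changed. */
static bool cache_update(uint64_t *cached, uint64_t new_exp)
{
    assert(cached != NULL && new_exp != 0);
    if (expire_le(*cached, new_exp))
        return false;   /* equal or earlier value already cached: skip the write */
    *cached = new_exp;
    return true;
}
```

    Under these assumptions expire_gt() and expire_le() are exact complements, and an equal value never triggers a write.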

    Signed-off-by: Stanislaw Gruszka
    Acked-by: Peter Zijlstra
    Acked-by: Thomas Gleixner
    Cc: Oleg Nesterov
    Cc: Andrew Morton
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Stanislaw Gruszka
     
  • Measure the interval error of ITIMER_PROF and ITIMER_VIRTUAL timers,
    i.e. the difference between real ticks and the interval requested
    by the user. Take it into account when scheduling the next tick.

    This patch introduces the possibility that the time between two
    consecutive ticks is smaller than the requested interval. It
    preserves, however, the guarantee that the n-th tick is generated
    no earlier than n*interval, counting from the beginning of
    periodic signal generation.
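
    The arithmetic can be illustrated with a small model. It is a sketch under assumptions, not the patch itself: expirations are assumed to be quantised to whole ticks (the requested interval is rounded up), the rounding error is accumulated on every re-arm, and one tick is given back whenever the accumulated error reaches a full tick. struct itimer_model, itimer_arm() and itimer_rearm() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct itimer_model {
    uint64_t tick_ns;     /* length of one tick */
    uint64_t incr_ticks;  /* requested interval rounded up to whole ticks */
    uint64_t incr_error;  /* rounded interval minus requested interval, in ns */
    uint64_t error;       /* accumulated error, in ns */
    uint64_t expires;     /* next expiry, in ticks since arming */
};

static void itimer_arm(struct itimer_model *t, uint64_t tick_ns,
                       uint64_t interval_ns)
{
    assert(t != NULL && tick_ns > 0 && interval_ns > 0);
    t->tick_ns = tick_ns;
    t->incr_ticks = (interval_ns + tick_ns - 1) / tick_ns;  /* round up */
    t->incr_error = t->incr_ticks * tick_ns - interval_ns;
    t->error = 0;
    t->expires = t->incr_ticks;
}

/* Called when the timer fires; returns the next expiry in ticks. */
static uint64_t itimer_rearm(struct itimer_model *t)
{
    t->error += t->incr_error;
    if (t->error >= t->tick_ns) {
        /* A whole tick of error has built up: fire one tick earlier. */
        t->expires -= 1;
        t->error -= t->tick_ns;
    }
    t->expires += t->incr_ticks;
    return t->expires;
}
```

    With toy numbers (a tick of 10 ns and a requested interval of 15 ns) the model fires at ticks 2, 4, 5, 7 and 8. Some gaps are a single tick, which is shorter than the requested interval, yet the n-th expiry never comes before n*15 ns.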

    Signed-off-by: Stanislaw Gruszka
    Acked-by: Peter Zijlstra
    Acked-by: Thomas Gleixner
    Cc: Oleg Nesterov
    Cc: Andrew Morton
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Stanislaw Gruszka
     
  • Both cpu itimers have the same data flow in a few places; this
    patch unifies the code related to the VIRT and PROF
    itimers.

    Signed-off-by: Stanislaw Gruszka
    Acked-by: Peter Zijlstra
    Acked-by: Thomas Gleixner
    Cc: Oleg Nesterov
    Cc: Andrew Morton
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Stanislaw Gruszka
     

30 Apr, 2009

1 commit


10 Apr, 2009

1 commit

  • …l/git/tip/linux-2.6-tip

    * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    sched: do not count frozen tasks toward load
    sched: refresh MAINTAINERS entry
    sched: Print sched_group::__cpu_power in sched_domain_debug
    cpuacct: add per-cgroup utime/stime statistics
    posixtimers, sched: Fix posix clock monotonicity
    sched_rt: don't allocate cpumask in fastpath
    cpuacct: make cpuacct hierarchy walk in cpuacct_charge() safe when rcupreempt is used -v2

    Linus Torvalds
     

08 Apr, 2009

2 commits

  • update_rlimit_cpu() tries to optimize out set_process_cpu_timer() in case
    when we already have CPUCLOCK_PROF timer which should expire first. But it
    uses cputime_lt() instead of cputime_gt().

    Test case:

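    #include <assert.h>
    #include <stddef.h>
    #include <sys/resource.h>
    #include <sys/time.h>

    int main(void)
    {
            struct itimerval it = {
                    .it_value = { .tv_sec = 1000 },
            };

            assert(!setitimer(ITIMER_PROF, &it, NULL));

            struct rlimit rl = {
                    .rlim_cur = 1,
                    .rlim_max = 1,
            };

            assert(!setrlimit(RLIMIT_CPU, &rl));

            /* spin so that CPU time accumulates past RLIMIT_CPU */
            for (;;)
                    ;

            return 0;
    }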

    Without this patch, the task is not killed as RLIMIT_CPU demands.

    Signed-off-by: Oleg Nesterov
    Acked-by: Peter Zijlstra
    Cc: Peter Lojkin
    Cc: Roland McGrath
    Cc: stable@kernel.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • Merge reason: update to latest upstream to queue up fix

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

01 Apr, 2009

1 commit

  • Impact: Regression fix (clock_gettime() going backwards)

    This patch re-introduces a couple of functions, task_sched_runtime
    and thread_group_sched_runtime, which were removed at the
    time of 2.6.28-rc1.

    These functions protect the sampling of the thread/process clock with
    the rq lock. The rq lock is required so that rq->clock is not updated
    during the sampling.

    Otherwise clock_gettime() may return
    ((accounted runtime before update) + (delta after update)),
    which is less than what it should be.
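
    The need for the lock can be shown with a user-space model. This is a sketch only: struct model_rq, struct model_task, model_update() and model_sample() are hypothetical, and a pthread mutex stands in for the rq lock. The point is that the accounted runtime and the not-yet-accounted delta are read as one unit.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

struct model_rq {
    pthread_mutex_t lock;
    uint64_t clock;              /* last time accounting was updated */
};

struct model_task {
    uint64_t sum_exec_runtime;   /* runtime accounted so far */
    uint64_t exec_start;         /* when the unaccounted slice began */
};

/* Accounting update: folds the pending delta into the accounted sum. */
static void model_update(struct model_rq *rq, struct model_task *p,
                         uint64_t now)
{
    pthread_mutex_lock(&rq->lock);
    assert(now >= p->exec_start);
    rq->clock = now;
    p->sum_exec_runtime += now - p->exec_start;
    p->exec_start = now;
    pthread_mutex_unlock(&rq->lock);
}

/* Sampling: reads the sum and the pending delta under the same lock,
 * so an update cannot slip in between the two reads. */
static uint64_t model_sample(struct model_rq *rq, struct model_task *p,
                             uint64_t now)
{
    uint64_t ns;

    pthread_mutex_lock(&rq->lock);
    assert(now >= p->exec_start);
    ns = p->sum_exec_runtime + (now - p->exec_start);
    pthread_mutex_unlock(&rq->lock);
    return ns;
}
```

    If model_sample() read sum_exec_runtime before an update and exec_start after it, the old sum would be paired with the new, smaller delta, and the result could be lower than an earlier sample.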

    v2 -> v3:
    - Rename static helper function __task_delta_exec()
    to do_task_delta_exec() since -tip tree already has
    a __task_delta_exec() of different version.

    v1 -> v2:
    - Revises comments of function and patch description.
    - Add note about accuracy of thread group's runtime.

    Signed-off-by: Hidetoshi Seto
    Acked-by: Peter Zijlstra
    Cc: stable@kernel.org [2.6.28.x][2.6.29.x]
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hidetoshi Seto
     

24 Mar, 2009

1 commit

  • See http://bugzilla.kernel.org/show_bug.cgi?id=12911

    copy_signal() copies signal->rlim, but RLIMIT_CPU is "lost" because
    posix_cpu_timers_init_group() sets cputime_expires.prof_exp = 0, and thus
    fastpath_timer_check() returns false unless we have other cpu timers.

    This is the minimal fix for 2.6.29 (tested) and 2.6.28. The patch is not
    optimal, we need further cleanups here. With this patch update_rlimit_cpu()
    is not really needed, but I don't think it should be removed.

    The proper fix (I think) is:

    - set_process_cpu_timer() should just start the cputimer->running
    logic (it does), no need to change cputime_expires.xxx_exp

    - posix_cpu_timers_init_group() should set ->running when needed

    - fastpath_timer_check() can check ->running instead of
    task_cputime_zero(signal->cputime_expires)

    Reported-by: Peter Lojkin
    Signed-off-by: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Roland McGrath
    Cc: [for 2.6.29.x]
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     

13 Feb, 2009

1 commit

  • While reviewing the manpages, I noticed I'd missed some clock vs timer sites.

    Make sure that all timer functions call cpu_timer_sample_group() and not
    cpu_clock_sample_group(). This ensures that we enable the process wide timer
    in time, and therefore pay the O(n) thread group cost from the syscall.

    Not doing it here will result in the first jiffy tick after setting the
    timer doing this, resulting in a very expensive tick (but only once) and
    a delay in actually starting the timer.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

11 Feb, 2009

2 commits


05 Feb, 2009

1 commit

  • Change the process wide cpu timers/clocks so that we:

    1) don't mess up the kernel with too many threads,
    2) don't have a per-cpu allocation for each process,
    3) have no impact when not used.

    In order to accomplish this we're going to split it into two parts:

    - clocks; which can take all the time they want since they run
    from user context -- ie. sys_clock_gettime(CLOCK_PROCESS_CPUTIME_ID)

    - timers; which need constant time sampling but since they're
    explicitly used, the user can pay the overhead.

    The clock readout will go back to a full sum of the thread group, while the
    timers will run off a global 'clock' that only runs when needed, so only
    programs that make use of the facility pay the price.
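
    The split can be modelled roughly as follows. This is a sketch under assumptions rather than the kernel's data structures: struct group, group_clock_read(), group_timer_start() and group_account() are hypothetical names, and a fixed-size array stands in for the thread list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_THREADS 8

struct group {
    uint64_t thread_runtime[MAX_THREADS];  /* per-thread accounted time */
    size_t nthreads;
    bool timer_running;    /* shared timer clock is only fed when set */
    uint64_t timer_clock;  /* group-wide clock used by timers */
};

/* Clock readout: a full O(n) sum over the thread group. */
static uint64_t group_clock_read(const struct group *g)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < g->nthreads; i++)
        sum += g->thread_runtime[i];
    return sum;
}

/* Arming a timer pays the O(n) cost once to sync the shared clock. */
static void group_timer_start(struct group *g)
{
    if (!g->timer_running) {
        g->timer_clock = group_clock_read(g);
        g->timer_running = true;
    }
}

/* Per-tick accounting: the shared clock is touched only while running. */
static void group_account(struct group *g, size_t thread, uint64_t delta)
{
    assert(thread < g->nthreads);
    g->thread_runtime[thread] += delta;
    if (g->timer_running)
        g->timer_clock += delta;
}
```

    A process that never arms a timer never touches timer_clock, while a process that does pays one full sum at arming time and a constant-time update per tick afterwards.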

    Signed-off-by: Peter Zijlstra
    Reviewed-by: Ingo Molnar
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

08 Jan, 2009

1 commit

  • Either we bounce one cacheline per cpu per tick, yielding n^2 bounces,
    or we just bounce a single one.

    Also, using per-cpu allocations for the thread-groups complicates the
    per-cpu allocator in that it's currently aimed to be a fixed-size
    allocator and the only possible extension to that would be vmap based,
    which is seriously constrained on 32 bit archs.

    So making the per-cpu memory requirement depend on the number of
    processes is an issue.

    Lastly, it didn't deal with cpu-hotplug, although admittedly that might
    be fixable.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

25 Dec, 2008

1 commit


24 Nov, 2008

1 commit


17 Nov, 2008

2 commits

  • Impact: simplify the code

    thread_group_cputime() is called by current when it must have the valid
    ->signal, or under ->siglock, or under tasklist_lock after the ->signal
    check, or the caller is wait_task_zombie() which reaps the child. In any
    case ->signal can't be NULL.

    But the point of this patch is not optimization. If it is possible to call
    thread_group_cputime() when ->signal == NULL we are doing something wrong,
    and we should not mask the problem. thread_group_cputime() fills *times
    and the caller will use it, if we silently use task_struct->*times* we
    report the wrong values.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • Impact: fix potential NULL dereference

    Contrary to ad474caca3e2a0550b7ce0706527ad5ab389a4d4 changelog, other
    acct_group_xxx() helpers can be called after exit_notify() by timer tick.
    Thanks to Roland for pointing this out. Somehow I missed this simple fact
    when I read the original patch, and I am afraid I confused Frank during
    the discussion. Sorry.

    Fortunately, these helpers work with current, we can check ->exit_state
    to ensure that ->signal can't go away under us.

    Also, add the comment and compiler barrier to account_group_exec_runtime(),
    to make sure we load ->signal only once.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Ingo Molnar

    Oleg Nesterov