02 Dec, 2011

3 commits


23 Nov, 2011

2 commits

  • The current code checks if abs(delta_delta.tv_sec) is greater than or
    equal to two before it discards the old delta value, but this can
    trigger at close to -1 seconds since -1.000000001 seconds is stored
    as tv_sec -2 and tv_nsec 999999999 in a normalized timespec.

    rtc_resume had an early return check if the rtc value had not changed
    since rtc_suspend. This effectively stops time for the duration of the
    short sleep. Check if sleep_time is positive after all the adjustments
    have been applied instead since this allows the old_system adjustment
    in rtc_suspend to have an effect even for short sleep cycles.

    CC: stable@kernel.org
    Signed-off-by: Arve Hjønnevåg
    Signed-off-by: John Stultz

    Arve Hjønnevåg
     
  • Currently, the RTC code does not disable the alarm in the hardware.

    This means that after a sequence such as the one below (the files are in the
    RTC sysfs), the box will boot up after 2 minutes even though we've
    asked for the alarm to be turned off.

    # echo $((`cat since_epoch`)+120) > wakealarm
    # echo 0 > wakealarm
    # poweroff

    Fix this by disabling the alarm when there are no timers to run.

    Cc: stable@kernel.org
    Cc: John Stultz
    Signed-off-by: Rabin Vincent
    Signed-off-by: John Stultz

    Rabin Vincent
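
The normalization detail behind the first commit in this group (the delta_delta check) can be illustrated with a small userspace sketch. It is not the kernel code: `demo_timespec`, `demo_ns_to_timespec` and `demo_abs_sec_ge_two` are hypothetical names, and the sketch only assumes that a normalized timespec keeps tv_nsec in the range 0..999999999.

```c
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000LL

/* Hypothetical stand-in for a normalized timespec: 0 <= tv_nsec < 1e9. */
struct demo_timespec {
    int64_t tv_sec;
    int64_t tv_nsec;
};

/* Convert a signed nanosecond count into normalized (floored) form. */
static struct demo_timespec demo_ns_to_timespec(int64_t ns)
{
    struct demo_timespec ts;

    ts.tv_sec = ns / DEMO_NSEC_PER_SEC;
    ts.tv_nsec = ns % DEMO_NSEC_PER_SEC;
    if (ts.tv_nsec < 0) {
        /* C division truncates toward zero, so borrow one second. */
        ts.tv_sec -= 1;
        ts.tv_nsec += DEMO_NSEC_PER_SEC;
    }
    return ts;
}

/* A check that looks at tv_sec only, as the commit message describes. */
static int demo_abs_sec_ge_two(struct demo_timespec ts)
{
    int64_t s = ts.tv_sec < 0 ? -ts.tv_sec : ts.tv_sec;

    return s >= 2;
}
```

Feeding -1000000001 ns through `demo_ns_to_timespec()` gives tv_sec -2 and tv_nsec 999999999, so a check on tv_sec alone fires even though the delta is only about one second.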
     

19 Nov, 2011

1 commit

  • __remove_hrtimer() attempts to reprogram the clockevent device when
    the timer being removed is the next to expire. However,
    __remove_hrtimer() reprograms the clockevent *before* removing the
    timer from the timerqueue and thus when hrtimer_force_reprogram()
    finds the next timer to expire it finds the timer we're trying to
    remove.

    This is especially noticeable when the system switches to NOHz mode
    and the system tick is removed. The timer tick is removed from the
    system but the clockevent is programmed to wake up in another HZ
    anyway.

    Silence the extra wakeup by removing the timer from the timerqueue
    before calling hrtimer_force_reprogram() so that we actually program
    the clockevent for the next timer to expire.

    This was broken by 998adc3 "hrtimers: Convert hrtimers to use
    timerlist infrastructure".

    Signed-off-by: Jeff Ohlstein
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/1321660030-8520-1-git-send-email-johlstei@codeaurora.org
    Signed-off-by: Thomas Gleixner

    Jeff Ohlstein
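
The ordering problem described above can be shown with a toy model. This is only a sketch under stated assumptions: the queue is a flat array rather than the kernel's timerqueue, and every `demo_` name is hypothetical. It contrasts reprogramming before dequeueing with dequeueing before reprogramming.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_NO_EVENT UINT64_MAX

/* Toy timer queue plus the expiry the pretend clockevent is armed for. */
struct demo_queue {
    uint64_t expires[8];
    int queued[8];
    size_t count;
    uint64_t programmed;
};

/* Earliest expiry among queued timers, or DEMO_NO_EVENT if none. */
static uint64_t demo_next_expiry(const struct demo_queue *q)
{
    uint64_t next = DEMO_NO_EVENT;

    for (size_t i = 0; i < q->count; i++)
        if (q->queued[i] && q->expires[i] < next)
            next = q->expires[i];
    return next;
}

/* Arm the pretend device for whatever is currently first in the queue. */
static void demo_reprogram(struct demo_queue *q)
{
    q->programmed = demo_next_expiry(q);
}

/* Problematic order: the timer being removed is still found as "next". */
static void demo_remove_reprogram_first(struct demo_queue *q, size_t idx)
{
    demo_reprogram(q);
    q->queued[idx] = 0;
}

/* Fixed order: dequeue first, then program for the real next timer. */
static void demo_remove_dequeue_first(struct demo_queue *q, size_t idx)
{
    q->queued[idx] = 0;
    demo_reprogram(q);
}
```

With timers at 100 and 500, removing the first one via `demo_remove_reprogram_first()` leaves the device armed for 100, while `demo_remove_dequeue_first()` arms it for 500.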
     

18 Nov, 2011

1 commit

  • ktime_get and ktime_get_ts were calling timekeeping_get_ns()
    but later they were not calling arch_gettimeoffset() so architectures
    using this mechanism returned 0 ns when calling these functions.

    This happened for example when running Busybox's ping which calls
    syscall(__NR_clock_gettime, CLOCK_MONOTONIC, ts) which eventually
    calls ktime_get. As a result the returned ping travel time was zero.

    CC: stable@kernel.org
    Signed-off-by: Hector Palacios
    Signed-off-by: John Stultz

    Hector Palacios
     

11 Nov, 2011

2 commits

  • …tz/linux into timers/core

    Conflicts:
    kernel/time/timekeeping.c

    Ingo Molnar
     
  • For some frequencies, the clocks_calc_mult_shift() function will
    unfortunately select mult values very close to 0xffffffff. This
    has the potential to overflow when NTP adjusts the clock, adding
    to the mult value.

    This patch adds a clocksource.maxadj value, which provides
    an approximation of an 11% adjustment (NTP limits adjustments to
    500ppm and the tick adjustment is limited to 10%), which could
    be made to the clocksource.mult value. This is then used to both
    check that the current mult value won't overflow/underflow, as
    well as warning us if the timekeeping_adjust() code pushes over
    that 11% boundary.

    v2: Fix max_adjustment calculation, and improve WARN_ONCE
    messages.

    v3: Don't warn before maxadj has actually been set

    CC: Yong Zhang
    CC: David Daney
    CC: Thomas Gleixner
    CC: Chen Jie
    CC: zhangfx
    CC: stable@kernel.org
    Reported-by: Chen Jie
    Reported-by: zhangfx
    Tested-by: Yong Zhang
    Signed-off-by: John Stultz

    John Stultz
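
A rough userspace sketch of the idea follows, assuming a 32-bit unsigned mult as the commit message implies. The function names are hypothetical and the code is not taken from the patch; it only shows how an 11% margin can be computed without intermediate overflow and how wraparound in either direction can be detected.

```c
#include <stdint.h>

/* Roughly 11% of mult, computed in 64 bits to avoid intermediate overflow. */
static uint32_t demo_max_adjustment(uint32_t mult)
{
    uint64_t ret = (uint64_t)mult * 11;

    return (uint32_t)(ret / 100);
}

/* Nonzero if mult + maxadj or mult - maxadj would wrap a 32-bit value. */
static int demo_mult_may_wrap(uint32_t mult, uint32_t maxadj)
{
    return (uint32_t)(mult + maxadj) < mult ||
           (uint32_t)(mult - maxadj) > mult;
}
```

A mult of 0xffffffff is flagged by `demo_mult_may_wrap()`, while a mult of 1000000 with its margin of 110000 is not.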
     

28 Oct, 2011

1 commit

  • After getting a number of questions in private emails about the
    math around the admittedly very complex timekeeping_adjust() and
    timekeeping_big_adjust(), I figure the code needs some better
    comments.

    Hopefully the explanations are clear enough and don't muddy the
    water any worse.

    Still needs documentation for ntp_error, but I couldn't recall
    exactly the full explanation behind the code that's there
    (although I do recall once working it out when Roman first
    proposed it). Given a bit more time I can probably work it out,
    but I don't want to hold back this documentation until then.

    Signed-off-by: John Stultz
    Cc: Chen Jie
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/1319764362-32367-1-git-send-email-john.stultz@linaro.org
    Signed-off-by: Ingo Molnar

    John Stultz
     

12 Oct, 2011

1 commit

  • "s390: Use direct ktime path for s390 clockevent device" in linux-next
    introduces this compile warning:

    arch/s390/kernel/time.c: In function 's390_next_ktime':
    arch/s390/kernel/time.c:118:2: warning:
    comparison of distinct pointer types lacks a cast [enabled by default]

    Just use a u64 instead of an s64 variable. This is not a problem since it
    will always contain a positive value.

    Signed-off-by: Heiko Carstens
    Cc: Martin Schwidefsky
    Link: http://lkml.kernel.org/r/1316675957-5538-1-git-send-email-heiko.carstens@de.ibm.com
    Signed-off-by: Thomas Gleixner

    Heiko Carstens
     

05 Oct, 2011

2 commits


21 Sep, 2011

1 commit

  • The parameter's original type is long. On an i386 architecture, it can
    easily be larger than 0x80000000, causing this function to convert it
    to a sign-extended u64 type.

    Change the type to unsigned long so we get the correct result.

    Signed-off-by: hank
    Cc: John Stultz
    Cc:
    [ build fix ]
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    hank
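
The sign-extension effect can be reproduced in userspace. Because long is 64 bits wide on many current hosts, this sketch uses int32_t and uint32_t as stand-ins for a 32-bit long and unsigned long; the helper names are hypothetical.

```c
#include <stdint.h>

/* Widening a signed 32-bit value copies the sign bit into the upper half. */
static uint64_t demo_widen_signed(int32_t v)
{
    return (uint64_t)v;
}

/* Widening an unsigned 32-bit value zero-fills the upper half. */
static uint64_t demo_widen_unsigned(uint32_t v)
{
    return (uint64_t)v;
}
```

The bit pattern 0x90000000, which is -1879048192 when read as a signed 32-bit value, widens to 0xFFFFFFFF90000000 through the signed path and to 0x0000000090000000 through the unsigned one.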
     

14 Sep, 2011

1 commit


13 Sep, 2011

1 commit

  • KGDB needs to trylock watchdog_lock when trying to reset the
    clocksource watchdog after the system has been stopped to avoid a
    potential deadlock. When the trylock fails TSC usually becomes
    unstable.

    We can be more clever by using an atomic counter and checking it in
    the clocksource_watchdog callback. We restart the watchdog whenever
    the counter is > 0 and only decrement the counter when we ran through
    a full update cycle.

    Signed-off-by: Thomas Gleixner
    Cc: John Stultz
    Acked-by: Jason Wessel
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1109121326280.2723@ionos
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
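
The counter scheme can be sketched with C11 atomics. This is an illustration only, not the kernel implementation: the names are hypothetical, and the actual drift comparison is replaced by comments.

```c
#include <stdatomic.h>

/* Number of outstanding reset requests; zero-initialized at program start. */
static atomic_int demo_reset_pending;

/* Lock-free request, usable from a context that must not block on a lock. */
static void demo_touch_watchdog(void)
{
    atomic_fetch_add(&demo_reset_pending, 1);
}

/*
 * One pass of the periodic check. Returns 1 if the pass only re-baselined
 * because a reset was pending, 0 if it ran the normal comparison.
 */
static int demo_watchdog_pass(void)
{
    int pending = atomic_load(&demo_reset_pending);

    if (pending > 0) {
        /* ... re-read reference values instead of judging drift ... */
        atomic_fetch_sub(&demo_reset_pending, 1); /* full cycle done */
        return 1;
    }
    /* ... normal drift comparison would go here ... */
    return 0;
}
```

Two calls to `demo_touch_watchdog()` make the next two passes re-baseline, after which the third pass performs the normal comparison again.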
     

08 Sep, 2011

9 commits

  • David reported:

    Attached below is a watered-down version of rt/tst-cpuclock2.c from
    GLIBC. Just build it with "gcc -o test test.c -lpthread -lrt" or
    similar.

    Run it several times, and you will see cases where the main thread
    will measure a process clock difference before and after the nanosleep
    which is smaller than the cpu-burner thread's individual thread clock
    difference. This doesn't make any sense since the cpu-burner thread
    is part of the top-level process's thread group.

    I've reproduced this on both x86-64 and sparc64 (using both 32-bit and
    64-bit binaries).

    For example:

    [davem@boricha build-x86_64-linux]$ ./test
    process: before(0.001221967) after(0.498624371) diff(497402404)
    thread: before(0.000081692) after(0.498316431) diff(498234739)
    self: before(0.001223521) after(0.001240219) diff(16698)
    [davem@boricha build-x86_64-linux]$

    The diff of 'process' should always be >= the diff of 'thread'.

    I make sure to wrap the 'thread' clock measurements the most tightly
    around the nanosleep() call, and that the 'process' clock measurements
    are the outer-most ones.

    ---
    #include <stdio.h>
    #include <time.h>
    #include <pthread.h>

    static pthread_barrier_t barrier;

    /* Spin forever so this thread accumulates CPU time. */
    static void *chew_cpu(void *arg)
    {
        pthread_barrier_wait(&barrier);
        while (1)
            __asm__ __volatile__("" : : : "memory");
        return NULL;
    }

    int main(void)
    {
        clockid_t process_clock, my_thread_clock, th_clock;
        struct timespec process_before, process_after;
        struct timespec me_before, me_after;
        struct timespec th_before, th_after;
        struct timespec sleeptime;
        unsigned long diff;
        pthread_t th;
        int err;

        err = clock_getcpuclockid(0, &process_clock);
        if (err)
            return 1;

        err = pthread_getcpuclockid(pthread_self(), &my_thread_clock);
        if (err)
            return 1;

        pthread_barrier_init(&barrier, NULL, 2);
        err = pthread_create(&th, NULL, chew_cpu, NULL);
        if (err)
            return 1;

        err = pthread_getcpuclockid(th, &th_clock);
        if (err)
            return 1;

        pthread_barrier_wait(&barrier);

        /* Process clock is sampled outermost, thread clock innermost. */
        err = clock_gettime(process_clock, &process_before);
        if (err)
            return 1;

        err = clock_gettime(my_thread_clock, &me_before);
        if (err)
            return 1;

        err = clock_gettime(th_clock, &th_before);
        if (err)
            return 1;

        sleeptime.tv_sec = 0;
        sleeptime.tv_nsec = 500000000;
        nanosleep(&sleeptime, NULL);

        err = clock_gettime(th_clock, &th_after);
        if (err)
            return 1;

        err = clock_gettime(my_thread_clock, &me_after);
        if (err)
            return 1;

        err = clock_gettime(process_clock, &process_after);
        if (err)
            return 1;

        diff = process_after.tv_nsec - process_before.tv_nsec;
        printf("process: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
               process_before.tv_sec, process_before.tv_nsec,
               process_after.tv_sec, process_after.tv_nsec, diff);
        diff = th_after.tv_nsec - th_before.tv_nsec;
        printf("thread: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
               th_before.tv_sec, th_before.tv_nsec,
               th_after.tv_sec, th_after.tv_nsec, diff);
        diff = me_after.tv_nsec - me_before.tv_nsec;
        printf("self: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
               me_before.tv_sec, me_before.tv_nsec,
               me_after.tv_sec, me_after.tv_nsec, diff);

        return 0;
    }

    This is due to us using p->se.sum_exec_runtime in
    thread_group_cputime() where we iterate the thread group and sum all
    data. This does not take time since the last schedule operation (tick
    or otherwise) into account. We can cure this by using
    task_sched_runtime() at the cost of having to take locks.

    This also means we can (and must) do away with
    thread_group_sched_runtime() since the modified thread_group_cputime()
    is now more accurate and would deadlock when called from
    thread_group_sched_runtime().

    Reported-by: David Miller
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twins
    Cc: stable@kernel.org
    Signed-off-by: Thomas Gleixner

    Peter Zijlstra
     
  • The clock comparator on s390 uses the same format as the TOD clock.
    If the value in the clock comparator is smaller than the current TOD
    value an interrupt is pending. Use the CLOCK_EVT_FEAT_KTIME feature
    to get the unmodified ktime of the next clockevent expiration and
    use it to program the clock comparator without querying the TOD clock.

    Signed-off-by: Martin Schwidefsky
    Cc: john stultz
    Link: http://lkml.kernel.org/r/20110823133143.153017933@de.ibm.com
    Signed-off-by: Thomas Gleixner

    Martin Schwidefsky
     
  • There is at least one architecture (s390) with a sane clockevent device
    that can be programmed with the equivalent of a ktime. No need to create
    a delta against the current time, the ktime can be used directly.

    A new clock device function 'set_next_ktime' is introduced that is called
    with the unmodified ktime for the timer if the clock event device has the
    CLOCK_EVT_FEAT_KTIME bit set.

    Signed-off-by: Martin Schwidefsky
    Cc: john stultz
    Link: http://lkml.kernel.org/r/20110823133142.815350967@de.ibm.com
    Signed-off-by: Thomas Gleixner

    Martin Schwidefsky
     
  • The automatic increase of the min_delta_ns of a clockevents device
    should be done in the clockevents code as the minimum delay is an
    attribute of the clockevents device.

    In addition, not all architectures want the automatic adjustment. On a
    massively virtualized system it can happen that the programming of a
    clock event fails several times in a row because the virtual cpu has
    been rescheduled quickly enough. In that case the minimum delay will
    erroneously be increased with no way back. The new config symbol
    GENERIC_CLOCKEVENTS_MIN_ADJUST is used to enable the automatic
    adjustment. The config option is selected only for x86.

    Signed-off-by: Martin Schwidefsky
    Cc: john stultz
    Link: http://lkml.kernel.org/r/20110823133142.494157493@de.ibm.com
    Signed-off-by: Thomas Gleixner

    Martin Schwidefsky
     
  • When performing cpu hotplug tests the kernel printk log buffer gets flooded
    with pointless "Switched to NOHz mode..." messages. When analyzing a dump
    afterwards, this may have pushed more interesting content out of the
    buffer.
    Assuming that switching to NOHz mode simply works, just remove the printk.

    Signed-off-by: Heiko Carstens
    Link: http://lkml.kernel.org/r/20110823112046.GB2540@osiris.boeblingen.de.ibm.com
    Signed-off-by: Thomas Gleixner

    Heiko Carstens
     
  • The show_stat handler of the /proc/stat file relies on kstat_cpu(cpu)
    statistics when printing information about idle and iowait times.
    This is OK if we are not using tickless kernel (CONFIG_NO_HZ) because
    counters are updated periodically.
    With NO_HZ things got more tricky because we are not doing idle/iowait
    accounting while we are tickless so the value might get outdated.
    Users of /proc/stat will notice that by unchanged idle/iowait values
    which is then interpreted as 0% idle/iowait time. From the user space
    POV this is an unexpected behavior and a change of the interface.

    Let's fix this by using get_cpu_{idle,iowait}_time_us which accounts the
    total idle/iowait time since boot and it doesn't rely on sampling or any
    other periodic activity. Fall back to the previous behavior if NO_HZ is
    disabled or not configured.

    Signed-off-by: Michal Hocko
    Cc: Dave Jones
    Cc: Arnd Bergmann
    Cc: Alexey Dobriyan
    Link: http://lkml.kernel.org/r/39181366adac1b39cb6aa3cd53ff0f7c78d32676.1314172057.git.mhocko@suse.cz
    Signed-off-by: Thomas Gleixner

    Michal Hocko
     
  • get_cpu_{idle,iowait}_time_us update idle/iowait counters
    unconditionally if the given CPU is in the idle loop.

    This doesn't work well outside of CPU governors which are singletons
    so nobody (except for IRQ) can race with them.

    We will need to use both functions from /proc/stat handler to properly
    handle nohz idle/iowait times.

    Make the update depend on a non NULL last_update_time argument.

    Signed-off-by: Michal Hocko
    Cc: Dave Jones
    Cc: Arnd Bergmann
    Cc: Alexey Dobriyan
    Link: http://lkml.kernel.org/r/11f23179472635ce52e78921d47a20216b872f23.1314172057.git.mhocko@suse.cz
    Signed-off-by: Thomas Gleixner

    Michal Hocko
     
  • update_ts_time_stat currently updates idle time even if we are in
    the iowait loop at the moment. The only real users of the idle counter
    (via get_cpu_idle_time_us) are CPU governors and they expect to get
    cumulative time for both idle and iowait times.
    The value (idle_sleeptime) is also printed to userspace by print_cpu
    but it prints both idle and iowait times so the idle part is misleading.

    Let's clean this up and fix update_ts_time_stat to account both counters
    properly and update consumers of idle to consider iowait time as well.
    If we do this we might use get_cpu_{idle,iowait}_time_us from other
    contexts as well and we will get expected values.

    Signed-off-by: Michal Hocko
    Cc: Dave Jones
    Cc: Arnd Bergmann
    Cc: Alexey Dobriyan
    Link: http://lkml.kernel.org/r/e9c909c221a8da402c4da07e4cd968c3218f8eb1.1314172057.git.mhocko@suse.cz
    Signed-off-by: Thomas Gleixner

    Michal Hocko
     
  • Get rid of the semicolon so that those expressions can also be used
    somewhere other than in an assignment.

    Signed-off-by: Michal Hocko
    Acked-by: Arnd Bergmann
    Cc: Dave Jones
    Cc: Alexey Dobriyan
    Link: http://lkml.kernel.org/r/7565417ce30d7e6b1ddc169843af0777dbf66e75.1314172057.git.mhocko@suse.cz
    Signed-off-by: Thomas Gleixner

    Michal Hocko
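
A generic illustration of why a trailing semicolon in a macro body is limiting follows. The macros are hypothetical and unrelated to the ones touched by the commit; they only show the difference between a statement-only expansion and an expression.

```c
/* With a trailing semicolon the macro is only usable as a full statement. */
#define DEMO_SCALE_BAD(x) ((x) * 10);

/* Without it, the expansion is an ordinary expression. */
#define DEMO_SCALE(x) ((x) * 10)

static int demo_sum_scaled(int a, int b)
{
    /*
     * As a function argument, f(DEMO_SCALE_BAD(a)) would expand to
     * f(((a) * 10);) and fail to compile. DEMO_SCALE() has no such limit.
     */
    return DEMO_SCALE(a) + DEMO_SCALE(b);
}
```

`int v = DEMO_SCALE_BAD(3)` still compiles because the stray semicolon merely produces an empty statement, which is why such macros tend to go unnoticed until they are used inside a larger expression.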
     

11 Aug, 2011

9 commits


10 Aug, 2011

2 commits


08 Aug, 2011

4 commits