02 Dec, 2011
3 commits
-
If a device is shut down, there might be a pending interrupt, which
will be processed after we re-enable interrupts and causes the
original handler to be run. If the old handler is the (broadcast)
periodic handler, the shutdown state might hang the kernel completely.

Signed-off-by: Thomas Gleixner
Cc: stable@vger.kernel.org
-
When a better-rated broadcast device is installed, the currently
active device is not disabled, which results in two running broadcast
devices.

Signed-off-by: Thomas Gleixner
Cc: stable@vger.kernel.org
-
To leave a margin of 12.5%, we should shift by >> 3, not >> 5.
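A right shift by n divides by 2^n, so subtracting `x >> 3` removes one eighth (12.5%) of a value, while subtracting `x >> 5` removes only one thirty-second (about 3.1%). The helper below is a hypothetical sketch of applying such a margin; `apply_margin` is an illustrative name, not a kernel function.

```c
#include <stdint.h>

/* Hypothetical helper: shave a safety margin off a maximum value.
 * shift = 3 removes 1/8 (12.5%); shift = 5 removes only 1/32 (~3.1%). */
static uint64_t apply_margin(uint64_t max_value, unsigned int shift)
{
    return max_value - (max_value >> shift);
}
```

For example, `apply_margin(800, 3)` yields 700, whereas `apply_margin(800, 5)` yields 775.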
CC: stable@kernel.org
Signed-off-by: Yang Honggang (Joseph)
[jstultz: Modified commit subject]
Signed-off-by: John Stultz
23 Nov, 2011
2 commits
-
The current code checks if abs(delta_delta.tv_sec) is greater than or
equal to two before it discards the old delta value, but this can
trigger for deltas only slightly beyond -1 second, since -1.000000001
seconds is stored as tv_sec -2 and tv_nsec 999999999 in a normalized
timespec.

rtc_resume had an early return check if the rtc value had not changed
since rtc_suspend. This effectively stops time for the duration of the
short sleep. Instead, check if sleep_time is positive after all the
adjustments have been applied, since this allows the old_system
adjustment in rtc_suspend to have an effect even for short sleep cycles.

CC: stable@kernel.org
Signed-off-by: Arve Hjønnevåg
Signed-off-by: John Stultz
-
Currently, the RTC code does not disable the alarm in the hardware.
This means that after a sequence such as the one below (the files are in the
RTC sysfs), the box will boot up after 2 minutes even though we've
asked for the alarm to be turned off.

    # echo $((`cat since_epoch` + 120)) > wakealarm
    # echo 0 > wakealarm
    # poweroff

Fix this by disabling the alarm when there are no timers to run.
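The fix can be sketched with a toy model of an RTC that has one hardware alarm and a list of pending software timers. `struct fake_rtc` and `rtc_update_alarm()` are hypothetical names, not the RTC core's API. The point is the empty-queue branch: with no timers left, the hardware alarm is switched off instead of being left armed for the cancelled wakeup.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an RTC with a hardware alarm and a queue of
 * software timers; none of these names are real kernel APIs. */
struct fake_rtc {
    bool alarm_enabled;     /* hardware alarm interrupt armed? */
    int64_t alarm_time;     /* time the hardware alarm is programmed for */
    const int64_t *timers;  /* pending timer expiry times (unsorted) */
    size_t nr_timers;
};

/* Re-evaluate the hardware alarm after the timer queue changes. */
static void rtc_update_alarm(struct fake_rtc *rtc)
{
    if (rtc->nr_timers == 0) {
        /* Nothing left to run: turn the hardware alarm off instead of
         * leaving the stale one armed. */
        rtc->alarm_enabled = false;
        return;
    }

    /* Otherwise program the earliest pending expiry. */
    int64_t next = rtc->timers[0];
    for (size_t i = 1; i < rtc->nr_timers; i++)
        if (rtc->timers[i] < next)
            next = rtc->timers[i];

    rtc->alarm_time = next;
    rtc->alarm_enabled = true;
}
```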
Cc: stable@kernel.org
Cc: John Stultz
Signed-off-by: Rabin Vincent
Signed-off-by: John Stultz
19 Nov, 2011
1 commit
-
__remove_hrtimer() attempts to reprogram the clockevent device when
the timer being removed is the next to expire. However,
__remove_hrtimer() reprograms the clockevent *before* removing the
timer from the timerqueue, and thus, when hrtimer_force_reprogram()
finds the next timer to expire, it finds the timer we're trying to
remove.

This is especially noticeable when the system switches to NOHz mode
and the system tick is removed. The timer tick is removed from the
system, but the clockevent is programmed to wake up in another HZ
anyway.

Silence the extra wakeup by removing the timer from the timerqueue
before calling hrtimer_force_reprogram() so that we actually program
the clockevent for the next timer to expire.

This was broken by 998adc3 "hrtimers: Convert hrtimers to use
timerlist infrastructure".

Signed-off-by: Jeff Ohlstein
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1321660030-8520-1-git-send-email-johlstei@codeaurora.org
Signed-off-by: Thomas Gleixner
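The ordering problem in the hrtimer commit above can be shown with a toy model in which a sorted array stands in for the timerqueue and each function's return value stands in for the expiry the clockevent device gets programmed for. All names here are hypothetical; the real code uses hrtimer_force_reprogram() and an rbtree-based timerqueue.

```c
#include <stddef.h>
#include <stdint.h>

#define NO_TIMER INT64_MAX  /* "nothing to program" sentinel */

/* Hypothetical timer queue: expiries kept sorted ascending. */
struct timer_queue {
    int64_t expires[8];
    size_t len;
};

static int64_t queue_next(const struct timer_queue *q)
{
    return q->len ? q->expires[0] : NO_TIMER;
}

/* Remove the entry at index idx (idx < len), keeping the array sorted. */
static void queue_del(struct timer_queue *q, size_t idx)
{
    for (size_t i = idx + 1; i < q->len; i++)
        q->expires[i - 1] = q->expires[i];
    q->len--;
}

/* Old order: look up the next expiry first, so the timer being removed is
 * still the head of the queue and the device is programmed for it anyway. */
static int64_t reprogram_then_remove_buggy(struct timer_queue *q, size_t idx)
{
    int64_t programmed = queue_next(q);
    queue_del(q, idx);
    return programmed;
}

/* Fixed order: drop the timer from the queue, then find the next expiry. */
static int64_t remove_then_reprogram_fixed(struct timer_queue *q, size_t idx)
{
    queue_del(q, idx);
    return queue_next(q);
}
```

Removing the head timer (expiry 10) leaves 50 as the real next event: the old order still programs 10, which is the spurious wakeup, while the new order programs 50.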
18 Nov, 2011
1 commit
-
ktime_get() and ktime_get_ts() called timekeeping_get_ns() but did
not then call arch_gettimeoffset(), so architectures using this
mechanism got a 0 ns offset from these functions.

This happened, for example, when running Busybox's ping, which calls
syscall(__NR_clock_gettime, CLOCK_MONOTONIC, ts), which eventually
calls ktime_get(). As a result, the reported ping travel time was zero.

CC: stable@kernel.org
Signed-off-by: Hector Palacios
Signed-off-by: John Stultz
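The effect of the missing call in the ktime_get() fix above can be modelled by splitting a reading into the time at the last tick, the clocksource contribution, and the architecture's sub-tick offset. The struct and function names are hypothetical stand-ins for timekeeping_get_ns() and arch_gettimeoffset(), and the model assumes the clocksource contribution stays 0 within a tick, which is what makes the missing offset visible.

```c
#include <stdint.h>

/* Hypothetical snapshot of the values a reading is built from. */
struct fake_timekeeper {
    int64_t base_ns;        /* time accumulated at the last tick */
    int64_t cs_ns;          /* what timekeeping_get_ns() would return */
    int64_t arch_offset_ns; /* what arch_gettimeoffset() would return */
};

/* Before the fix: the architecture's sub-tick offset is never added. */
static int64_t ktime_get_ns_buggy(const struct fake_timekeeper *tk)
{
    return tk->base_ns + tk->cs_ns;
}

/* After the fix: the architecture-provided offset is added as well. */
static int64_t ktime_get_ns_fixed(const struct fake_timekeeper *tk)
{
    return tk->base_ns + tk->cs_ns + tk->arch_offset_ns;
}
```

Two readings taken 270 us apart within the same tick differ by 0 ns without the offset (the zero ping time) and by 270000 ns with it.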
11 Nov, 2011
2 commits
-
…tz/linux into timers/core
Conflicts:
kernel/time/timekeeping.c
-
For some frequencies, the clocks_calc_mult_shift() function will
unfortunately select mult values very close to 0xffffffff. This
has the potential to overflow when NTP adjusts the clock, adding
to the mult value.

This patch adds a clocksource.maxadj value, which provides
an approximation of an 11% adjustment (NTP limits adjustments to
500ppm and the tick adjustment is limited to 10%), which could
be made to the clocksource.mult value. This is then used both to
check that the current mult value won't overflow/underflow and
to warn us if the timekeeping_adjust() code pushes over
that 11% boundary.

v2: Fix max_adjustment calculation, and improve WARN_ONCE
messages.
v3: Don't warn before maxadj has actually been set
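The overflow check can be sketched as follows, assuming maxadj is computed as 11% of mult in 64-bit arithmetic (500 ppm plus 10% is about 10.05%, so 11% presumably leaves some headroom). The function names are hypothetical, and the kernel's own helpers may differ in detail.

```c
#include <stdbool.h>
#include <stdint.h>

/* Roughly 11% of mult, computed in 64 bits so mult * 11 cannot overflow.
 * Even for mult = 0xffffffff the result (~4.7e8) fits in an int32_t. */
static int32_t clocksource_maxadj_model(uint32_t mult)
{
    return (int32_t)(((uint64_t)mult * 11) / 100);
}

/* True if mult + maxadj or mult - maxadj would wrap around in 32 bits. */
static bool mult_adjust_overflows(uint32_t mult, int32_t maxadj)
{
    return (uint32_t)(mult + (uint32_t)maxadj) < mult ||
           (uint32_t)(mult - (uint32_t)maxadj) > mult;
}
```

With mult = 0xfff00000, adding maxadj wraps past 0xffffffff, so such a mult value would be rejected; with mult = 0x10000000 there is ample room.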
CC: Yong Zhang
CC: David Daney
CC: Thomas Gleixner
CC: Chen Jie
CC: zhangfx
CC: stable@kernel.org
Reported-by: Chen Jie
Reported-by: zhangfx
Tested-by: Yong Zhang
Signed-off-by: John Stultz
28 Oct, 2011
1 commit
-
After getting a number of questions in private emails about the
math around the admittedly very complex timekeeping_adjust() and
timekeeping_big_adjust(), I figured the code needs some better
comments.

Hopefully the explanations are clear enough and don't muddy the
water any worse.

This still needs documentation for ntp_error, but I couldn't recall
exactly the full explanation behind the code that's there
(although I do recall once working it out when Roman first
proposed it). Given a bit more time I can probably work it out,
but I don't want to hold back this documentation until then.

Signed-off-by: John Stultz
Cc: Chen Jie
Cc: Steven Rostedt
Link: http://lkml.kernel.org/r/1319764362-32367-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar
12 Oct, 2011
1 commit
-
"s390: Use direct ktime path for s390 clockevent device" in linux-next
introduces this compile warning:

arch/s390/kernel/time.c: In function 's390_next_ktime':
arch/s390/kernel/time.c:118:2: warning:
comparison of distinct pointer types lacks a cast [enabled by default]

Just use a u64 instead of an s64 variable. This is not a problem since it
will always contain a positive value.

Signed-off-by: Heiko Carstens
Cc: Martin Schwidefsky
Link: http://lkml.kernel.org/r/1316675957-5538-1-git-send-email-heiko.carstens@de.ibm.com
Signed-off-by: Thomas Gleixner
05 Oct, 2011
2 commits
-
The clocksource name should be const for correctness.
Cc: John Stultz
Signed-off-by: Jamie Iles
Signed-off-by: John Stultz
-
A while back I removed all the CONFIG_GENERIC_TIME references as
the last of the non-GENERIC_TIME arches were converted.

However, because the functionality was important and had been around
for a while, there apparently were some out-of-tree hardware enablement
patches that used it and have since been merged.

This patch removes the remaining instances of GENERIC_TIME.

Signed-off-by: John Stultz
21 Sep, 2011
1 commit
-
The parameter's original type is long. On i386, the value passed in
can easily be larger than 0x80000000, in which case this function
sign-extends it when converting it to u64.

Change the type to unsigned long so we get the correct result.
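The difference between the two parameter types is easy to reproduce with fixed-width types that model i386, where long and unsigned long are 32 bits; the function names are hypothetical.

```c
#include <stdint.h>

/* Old parameter type (32-bit long on i386): negative values, i.e. inputs
 * at or above 0x80000000, are sign-extended when widened to 64 bits. */
static uint64_t widen_signed(int32_t x)
{
    return (uint64_t)x;
}

/* New parameter type (32-bit unsigned long on i386): zero-extended. */
static uint64_t widen_unsigned(uint32_t x)
{
    return (uint64_t)x;
}
```

For the input 0x80000001, the signed version yields 0xffffffff80000001 and the unsigned one yields 0x80000001.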
Signed-off-by: hank
Cc: John Stultz
Cc:
[ build fix ]
Signed-off-by: Andrew Morton
Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
14 Sep, 2011
1 commit
-
Commit 8bc0daf (alarmtimers: Rework RTC device selection using class
interface) did not implement the required error checks. Add them.

Signed-off-by: Thomas Gleixner
13 Sep, 2011
1 commit
-
KGDB needs to trylock watchdog_lock when trying to reset the
clocksource watchdog after the system has been stopped to avoid a
potential deadlock. When the trylock fails, the TSC usually becomes
unstable.

We can be more clever by using an atomic counter and checking it in
the clocksource_watchdog callback. We restart the watchdog whenever
the counter is > 0 and only decrement the counter when we have run
through a full update cycle.

Signed-off-by: Thomas Gleixner
Cc: John Stultz
Acked-by: Jason Wessel
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1109121326280.2723@ionos
Signed-off-by: Thomas Gleixner
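The counter scheme from the watchdog commit above can be sketched in user space as follows. C11 atomics stand in for the kernel's atomic_t, `watchdog_request_reset()` and `watchdog_run()` are hypothetical names, and the actual clocksource comparison is left out. The key property is that the requester never blocks: it only increments the counter, and the callback does the restart work.

```c
#include <stdatomic.h>

/* Pending reset requests; a C11 atomic stands in for the kernel's atomic_t. */
static atomic_int watchdog_reset_pending;

/* Called from a context (such as KGDB) that must not take watchdog_lock:
 * it only records that a reset is wanted and returns. */
static void watchdog_request_reset(void)
{
    atomic_fetch_add(&watchdog_reset_pending, 1);
}

/* One run of the watchdog callback over nr_sources watched clocksources.
 * Returns how many of them were actually checked for skew. */
static int watchdog_run(int nr_sources)
{
    /* Sample the counter once so the whole pass sees the same value. */
    int reset_pending = atomic_load(&watchdog_reset_pending);
    int checked = 0;

    for (int i = 0; i < nr_sources; i++) {
        if (reset_pending > 0) {
            /* Restart: re-read the reference values for this clocksource
             * instead of comparing against stale ones (not modelled). */
            continue;
        }
        checked++;  /* the normal skew comparison would go here */
    }

    /* Only after a full pass with the reset applied is one request consumed. */
    if (reset_pending > 0)
        atomic_fetch_sub(&watchdog_reset_pending, 1);

    return checked;
}
```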
08 Sep, 2011
9 commits
-
David reported:
Attached below is a watered-down version of rt/tst-cpuclock2.c from
GLIBC. Just build it with "gcc -o test test.c -lpthread -lrt" or
similar.

Run it several times, and you will see cases where the main thread
will measure a process clock difference before and after the nanosleep
which is smaller than the cpu-burner thread's individual thread clock
difference. This doesn't make any sense since the cpu-burner thread
is part of the top-level process's thread group.

I've reproduced this on both x86-64 and sparc64 (using both 32-bit and
64-bit binaries).

For example:

[davem@boricha build-x86_64-linux]$ ./test
process: before(0.001221967) after(0.498624371) diff(497402404)
thread: before(0.000081692) after(0.498316431) diff(498234739)
self: before(0.001223521) after(0.001240219) diff(16698)
[davem@boricha build-x86_64-linux]$

The diff of 'process' should always be >= the diff of 'thread'.

I make sure to wrap the 'thread' clock measurements the most tightly
around the nanosleep() call, and that the 'process' clock measurements
are the outer-most ones.

---
#include <pthread.h>
#include <stdio.h>
#include <time.h>

static pthread_barrier_t barrier;

/* Burn CPU in a loop once both threads have reached the barrier. */
static void *chew_cpu(void *arg)
{
    (void)arg;
    pthread_barrier_wait(&barrier);
    while (1)
        __asm__ __volatile__("" : : : "memory");
    return NULL;
}

/* after - before in nanoseconds, including whole seconds. */
static long long ts_diff_ns(const struct timespec *after,
                            const struct timespec *before)
{
    return (long long)(after->tv_sec - before->tv_sec) * 1000000000LL
           + (after->tv_nsec - before->tv_nsec);
}

int main(void)
{
    clockid_t process_clock, my_thread_clock, th_clock;
    struct timespec process_before, process_after;
    struct timespec me_before, me_after;
    struct timespec th_before, th_after;
    struct timespec sleeptime;
    pthread_t th;
    int err;

    err = clock_getcpuclockid(0, &process_clock);
    if (err)
        return 1;
    err = pthread_getcpuclockid(pthread_self(), &my_thread_clock);
    if (err)
        return 1;

    pthread_barrier_init(&barrier, NULL, 2);
    err = pthread_create(&th, NULL, chew_cpu, NULL);
    if (err)
        return 1;
    err = pthread_getcpuclockid(th, &th_clock);
    if (err)
        return 1;
    pthread_barrier_wait(&barrier);

    /* The process clock is read outermost, the burner thread's innermost. */
    err = clock_gettime(process_clock, &process_before);
    if (err)
        return 1;
    err = clock_gettime(my_thread_clock, &me_before);
    if (err)
        return 1;
    err = clock_gettime(th_clock, &th_before);
    if (err)
        return 1;

    sleeptime.tv_sec = 0;
    sleeptime.tv_nsec = 500000000;
    nanosleep(&sleeptime, NULL);

    err = clock_gettime(th_clock, &th_after);
    if (err)
        return 1;
    err = clock_gettime(my_thread_clock, &me_after);
    if (err)
        return 1;
    err = clock_gettime(process_clock, &process_after);
    if (err)
        return 1;

    printf("process: before(%ld.%09ld) after(%ld.%09ld) diff(%lld)\n",
           (long)process_before.tv_sec, process_before.tv_nsec,
           (long)process_after.tv_sec, process_after.tv_nsec,
           ts_diff_ns(&process_after, &process_before));
    printf("thread: before(%ld.%09ld) after(%ld.%09ld) diff(%lld)\n",
           (long)th_before.tv_sec, th_before.tv_nsec,
           (long)th_after.tv_sec, th_after.tv_nsec,
           ts_diff_ns(&th_after, &th_before));
    printf("self: before(%ld.%09ld) after(%ld.%09ld) diff(%lld)\n",
           (long)me_before.tv_sec, me_before.tv_nsec,
           (long)me_after.tv_sec, me_after.tv_nsec,
           ts_diff_ns(&me_after, &me_before));

    return 0;
}

This is due to us using p->se.sum_exec_runtime in
thread_group_cputime(), where we iterate over the thread group and sum
all data. This does not take into account the time since the last
schedule operation (tick or otherwise). We can cure this by using
task_sched_runtime() at the cost of having to take locks.

This also means we can (and must) do away with
thread_group_sched_runtime(), since the modified thread_group_cputime()
is now more accurate and would deadlock when called from
thread_group_sched_runtime().

Reported-by: David Miller
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twins
Cc: stable@kernel.org
Signed-off-by: Thomas Gleixner
-
The clock comparator on s390 uses the same format as the TOD clock.
If the value in the clock comparator is smaller than the current TOD
value, an interrupt is pending. Use the CLOCK_EVT_FEAT_KTIME feature
to get the unmodified ktime of the next clockevent expiration and
use it to program the clock comparator without querying the TOD clock.

Signed-off-by: Martin Schwidefsky
Cc: john stultz
Link: http://lkml.kernel.org/r/20110823133143.153017933@de.ibm.com
Signed-off-by: Thomas Gleixner
-
There is at least one architecture (s390) with a sane clockevent device
that can be programmed with the equivalent of a ktime. There is no need
to create a delta against the current time; the ktime can be used directly.

A new clock device function 'set_next_ktime' is introduced that is called
with the unmodified ktime for the timer if the clock event device has the
CLOCK_EVT_FEAT_KTIME bit set.

Signed-off-by: Martin Schwidefsky
Cc: john stultz
Link: http://lkml.kernel.org/r/20110823133142.815350967@de.ibm.com
Signed-off-by: Thomas Gleixner
-
The automatic increase of the min_delta_ns of a clockevents device
should be done in the clockevents code, as the minimum delay is an
attribute of the clockevents device.

In addition, not all architectures want the automatic adjustment: on a
massively virtualized system it can happen that programming a clock
event fails several times in a row because the virtual CPU has not
been rescheduled quickly enough. In that case the minimum delay will
erroneously be increased with no way back. The new config symbol
GENERIC_CLOCKEVENTS_MIN_ADJUST is used to enable the automatic
adjustment. The config option is selected only for x86.

Signed-off-by: Martin Schwidefsky
Cc: john stultz
Link: http://lkml.kernel.org/r/20110823133142.494157493@de.ibm.com
Signed-off-by: Thomas Gleixner
-
When performing CPU hotplug tests, the kernel printk log buffer gets flooded
with pointless "Switched to NOHz mode..." messages. Especially when analyzing
a dump afterwards, these messages may have pushed more interesting output
out of the buffer.

Assuming that switching to NOHz mode simply works, just remove the printk.

Signed-off-by: Heiko Carstens
Link: http://lkml.kernel.org/r/20110823112046.GB2540@osiris.boeblingen.de.ibm.com
Signed-off-by: Thomas Gleixner
-
The show_stat handler of the /proc/stat file relies on kstat_cpu(cpu)
statistics when printing information about idle and iowait times.
This is OK if we are not using a tickless kernel (CONFIG_NO_HZ), because
the counters are updated periodically.

With NO_HZ, things get trickier because we do not do idle/iowait
accounting while we are tickless, so the values might become outdated.
Users of /proc/stat will notice this as unchanged idle/iowait values,
which are then interpreted as 0% idle/iowait time. From the userspace
point of view, this is unexpected behavior and a change of the interface.

Let's fix this by using get_cpu_{idle,iowait}_time_us, which account the
total idle/iowait time since boot and don't rely on sampling or any
other periodic activity. Fall back to the previous behavior if NO_HZ is
disabled or not configured.

Signed-off-by: Michal Hocko
Cc: Dave Jones
Cc: Arnd Bergmann
Cc: Alexey Dobriyan
Link: http://lkml.kernel.org/r/39181366adac1b39cb6aa3cd53ff0f7c78d32676.1314172057.git.mhocko@suse.cz
Signed-off-by: Thomas Gleixner
-
get_cpu_{idle,iowait}_time_us update the idle/iowait counters
unconditionally if the given CPU is in the idle loop.

This doesn't work well outside of CPU governors, which are singletons,
so nobody (except for IRQ) can race with them.

We will need to use both functions from the /proc/stat handler to
properly handle nohz idle/iowait times.

Make the update depend on a non-NULL last_update_time argument.
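The new contract can be sketched with a hypothetical per-CPU structure (`struct idle_stats` and `get_idle_time_us()` are illustrative names, with times in microseconds): only a caller passing last_update_time folds the running idle period into the stored counter, while a NULL caller, such as the /proc/stat handler, gets the same total without side effects.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU idle bookkeeping (times in microseconds). */
struct idle_stats {
    bool idle_active;        /* CPU is currently in the idle loop */
    uint64_t idle_entry;     /* when the current idle period began */
    uint64_t idle_sleeptime; /* idle time accounted so far */
};

static uint64_t get_idle_time_us(struct idle_stats *st, uint64_t now,
                                 uint64_t *last_update_time)
{
    if (last_update_time) {
        /* Updating caller (e.g. a CPU governor): fold the running idle
         * period into the stored counter and restart it at now. */
        if (st->idle_active) {
            st->idle_sleeptime += now - st->idle_entry;
            st->idle_entry = now;
        }
        *last_update_time = now;
        return st->idle_sleeptime;
    }

    /* Read-only caller: report the running total without touching *st. */
    if (st->idle_active)
        return st->idle_sleeptime + (now - st->idle_entry);
    return st->idle_sleeptime;
}
```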
Signed-off-by: Michal Hocko
Cc: Dave Jones
Cc: Arnd Bergmann
Cc: Alexey Dobriyan
Link: http://lkml.kernel.org/r/11f23179472635ce52e78921d47a20216b872f23.1314172057.git.mhocko@suse.cz
Signed-off-by: Thomas Gleixner
-
update_ts_time_stat currently updates idle time even if we are in the
iowait loop at the moment. The only real users of the idle counter
(via get_cpu_idle_time_us) are CPU governors, and they expect to get
the cumulative time for both idle and iowait.
The value (idle_sleeptime) is also printed to userspace by print_cpu,
but it prints both idle and iowait times, so the idle part is misleading.

Let's clean this up and fix update_ts_time_stat to account both counters
properly, and update consumers of idle to consider iowait time as well.
If we do this, we can use get_cpu_{idle,iowait}_time_us from other
contexts as well and get the expected values.

Signed-off-by: Michal Hocko
Cc: Dave Jones
Cc: Arnd Bergmann
Cc: Alexey Dobriyan
Link: http://lkml.kernel.org/r/e9c909c221a8da402c4da07e4cd968c3218f8eb1.1314172057.git.mhocko@suse.cz
Signed-off-by: Thomas Gleixner
-
Get rid of the semicolons so that those expressions can also be used
somewhere other than in an assignment.

Signed-off-by: Michal Hocko
Acked-by: Arnd Bergmann
Cc: Dave Jones
Cc: Alexey Dobriyan
Link: http://lkml.kernel.org/r/7565417ce30d7e6b1ddc169843af0777dbf66e75.1314172057.git.mhocko@suse.cz
Signed-off-by: Thomas Gleixner
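Why the trailing semicolon in the commit above matters is easiest to see with a hypothetical macro; `usecs_to_ticks` is an illustrative name, not one of the macros the commit touched.

```c
#include <stdint.h>

/* Hypothetical conversion macro written the fixed way, without a trailing
 * semicolon, so it expands to a plain expression.
 *
 * Had it been defined as
 *     #define usecs_to_ticks(us) ((us) / 1000);
 * then "x = usecs_to_ticks(t);" would still compile (the extra ';' is an
 * empty statement), but "f(usecs_to_ticks(t))" would not, and
 * "return usecs_to_ticks(t) + 1;" would silently become
 * "return ((t) / 1000); + 1;". */
#define usecs_to_ticks(us) ((us) / 1000)

/* Uses the macro inside a larger expression, which the fixed form allows. */
static uint64_t ticks_plus_one(uint64_t us)
{
    return usecs_to_ticks(us) + 1;
}
```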
11 Aug, 2011
9 commits
-
This allows cleaner detection of the RTC device being registered, rather
than probing every time someone calls alarmtimer_get_rtcdev.

CC: Thomas Gleixner
Signed-off-by: John Stultz
-
There are a number of edge cases when cancelling an alarm, so
to be sure we do so accurately, introduce try_to_cancel, which
returns proper failure errors if it cannot cancel. Also modify cancel
to spin until the alarm is properly disabled.

CC: Thomas Gleixner
Signed-off-by: John Stultz
-
In order to allow for functionality like try_to_cancel, add
more refined state tracking (similar to hrtimers).

CC: Thomas Gleixner
Signed-off-by: John Stultz
-
Now that periodic alarmtimers are managed by the handler function,
remove the period value from the alarm structure and let the handlers
manage the interval on their own.

CC: Thomas Gleixner
Signed-off-by: John Stultz
-
Now that the alarmtimers code has been refactored, the interval
cap limit can be removed.

CC: Thomas Gleixner
Signed-off-by: John Stultz
-
In order to avoid wasting time expiring and re-adding very high-frequency
periodic alarmtimers, introduce alarm_forward(), which is similar to
hrtimer_forward(): it moves the timer to the next future expiration time
and returns the number of overruns.

CC: Thomas Gleixner
Signed-off-by: John Stultz
-
This patch pushes the periodic alarmtimer re-arming down into the alarmtimer
handler, mimicking how hrtimers handle this.

CC: Thomas Gleixner
Signed-off-by: John Stultz
-
In order to properly fix the denial-of-service issue with high-frequency
periodic alarm timers, we need to push the re-arming logic into the
alarm timer handler, much as the hrtimer code does.

This patch introduces the alarmtimer_restart enum and changes the
alarmtimer handler declarations to use it as a return value. Further,
to ease the following changes, it extends the alarmtimer handler
functions to also take the time at expiration. No logic is modified yet.

CC: Thomas Gleixner
Signed-off-by: John Stultz
-
It's possible to jam up the alarm timers by setting very small interval
timers, which will cause the alarmtimer subsystem to spend all of its time
firing and restarting timers. This can effectively lock up a box.

A deeper fix is needed, closely mimicking the hrtimer code, but for now
just cap the interval to 100us to avoid userland hanging the system.

CC: Thomas Gleixner
CC: stable@kernel.org
Signed-off-by: John Stultz
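The forwarding step described in the alarm_forward() commit in the 11 Aug list above can be sketched as follows, assuming nanosecond times in int64_t and a positive interval; `struct alarm_model` and `alarm_forward_model()` are hypothetical names, not the alarmtimer API. Because one call skips every missed period at once, a very short interval no longer costs one expiry per elapsed period.

```c
#include <stdint.h>

/* Hypothetical alarm with an absolute expiry time in nanoseconds. */
struct alarm_model {
    int64_t expires;
};

/* Move alarm->expires to the first period boundary strictly after now and
 * return the number of periods that elapsed (the overrun count).
 * Returns 0 without touching the alarm if it has not expired yet.
 * Assumes interval > 0. */
static uint64_t alarm_forward_model(struct alarm_model *alarm, int64_t now,
                                    int64_t interval)
{
    int64_t delta = now - alarm->expires;
    uint64_t overrun;

    if (delta < 0)
        return 0;

    /* One division skips every missed period at once. */
    overrun = (uint64_t)(delta / interval) + 1;
    alarm->expires += (int64_t)overrun * interval;
    return overrun;
}
```

For an alarm due at 100 with a 10 ns period, forwarding at time 135 reports 4 overruns (100, 110, 120, 130) and re-arms it for 140.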
10 Aug, 2011
2 commits
-
Following common_timer_get, zero out the itimerspec passed in.
CC: Thomas Gleixner
CC: stable@kernel.org
Signed-off-by: John Stultz
-
We don't check whether old_setting is non-NULL before assigning to it,
so correct this.

CC: Thomas Gleixner
CC: stable@kernel.org
Signed-off-by: John Stultz
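Both 10 Aug fixes above concern a timer's get/set paths. The sketch below models them with hypothetical types (`struct itimer_model` stands in for struct itimerspec): the getter zeroes its output first, following common_timer_get as the first commit says, and the setter only writes the previous setting when the caller supplied a buffer for it.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct itimerspec (nanoseconds). */
struct itimer_model {
    int64_t interval_ns;
    int64_t value_ns;
};

struct timer_model {
    int armed;
    struct itimer_model setting;
};

/* get: zero the output first, so a disarmed timer reports all zeros
 * instead of whatever the caller's buffer happened to contain. */
static void timer_get_model(const struct timer_model *t, struct itimer_model *cur)
{
    memset(cur, 0, sizeof(*cur));
    if (t->armed)
        *cur = t->setting;
}

/* set: old_setting may be NULL, so only report the previous value if asked. */
static void timer_set_model(struct timer_model *t,
                            const struct itimer_model *new_setting,
                            struct itimer_model *old_setting)
{
    if (old_setting)
        timer_get_model(t, old_setting);
    t->setting = *new_setting;
    t->armed = new_setting->value_ns != 0;
}
```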
08 Aug, 2011
4 commits
-
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: Fix build with DEBUG_PAGEALLOC enabled.
-
Commit d006199e72a9 ("serial: sh-sci: Regtype probing doesn't need to be
fatal.") made sci_init_single() return when sci_probe_regmap() succeeds,
although it should return when sci_probe_regmap() fails. This causes
systems using the serial sh-sci driver to crash during boot.

Fix the problem by using the right return condition.
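The corrected condition can be sketched with hypothetical stand-ins for sci_probe_regmap() and sci_init_single(), assuming the probe returns 0 on success and a negative errno on failure (an assumption; the commit does not state the convention).

```c
#include <errno.h>

/* Hypothetical probe: returns 0 on success or a negative errno on failure. */
static int probe_regmap_model(int regmap_ok)
{
    return regmap_ok ? 0 : -EINVAL;
}

/* Hypothetical port init with the corrected early return: bail out only
 * when probing fails, and continue with the setup when it succeeds. */
static int init_single_model(int regmap_ok, int *configured)
{
    int ret = probe_regmap_model(regmap_ok);

    if (ret)            /* failure: propagate the error */
        return ret;

    *configured = 1;    /* success: the remaining setup runs */
    return 0;
}
```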
Signed-off-by: Rafael J. Wysocki
Signed-off-by: Linus Torvalds
-
The generic library code already exports the generic function; this was
left over from the ARM-specific version that just got removed.

Signed-off-by: Linus Torvalds