26 Dec, 2016
2 commits
-
ktime_set(S,N) was required for the timespec storage type and is still
useful for situations where a Seconds and Nanoseconds part of a time value
needs to be converted. For anything where the Seconds argument is 0, this
is pointless and can be replaced with a simple assignment.
Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra -
ktime is a union because the initial implementation stored the time in
scalar nanoseconds on 64-bit machines and in an endianness-optimized timespec
variant for 32-bit machines. The Y2038 cleanup removed the timespec variant
and switched everything to scalar nanoseconds. The union remained, but
became completely pointless.
Get rid of the union and just keep ktime_t as a simple typedef of type s64.
The conversion was done with coccinelle and some manual mopping up.
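A minimal userspace sketch of what the scalar representation means in practice, assuming int64_t stands in for s64. ktime_set() below is modeled on the kernel helper (combine seconds and nanoseconds, clamp to KTIME_MAX) but is an illustration, not the kernel source; ktime_add_ns_sketch() is a hypothetical helper. With a Seconds argument of 0 the call reduces to the nanosecond value itself, which is why such calls can become plain assignments.

```c
#include <stdint.h>

/* Sketch: ktime_t as a plain signed 64-bit nanosecond count (int64_t for s64). */
typedef int64_t ktime_t;

#define NSEC_PER_SEC  1000000000LL
#define KTIME_MAX     INT64_MAX
#define KTIME_SEC_MAX (KTIME_MAX / NSEC_PER_SEC)

/* Modeled on ktime_set(): combine seconds and nanoseconds into one value,
 * clamping to KTIME_MAX when the seconds part would overflow. */
static inline ktime_t ktime_set(int64_t secs, unsigned long nsecs)
{
    if (secs >= KTIME_SEC_MAX)
        return KTIME_MAX;
    return secs * NSEC_PER_SEC + (int64_t)nsecs;
}

/* Hypothetical helper: with a scalar type, arithmetic is plain integer
 * math, no union member access is needed. */
static inline ktime_t ktime_add_ns_sketch(ktime_t kt, int64_t ns)
{
    return kt + ns;
}
```

For example, `ktime_t t = ktime_set(0, 500);` and `ktime_t t = 500;` produce the same value.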
Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra
25 Dec, 2016
2 commits
-
There is no point in having an extra type for extra confusion. u64 is
unambiguous.
Conversion was done with the following coccinelle script:
@rem@
@@
-typedef u64 cycle_t;
@fix@
typedef cycle_t;
@@
-cycle_t
+u64
Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra
Cc: John Stultz -
This was entirely automated, using the script by Al:
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*'
sed -i -e "s!$PATT!#include !" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)
to do the replacement at the end of the merge window.
Requested-by: Al Viro
Signed-off-by: Linus Torvalds
19 Dec, 2016
1 commit
-
Pull timer fix from Thomas Gleixner:
"Prevent NULL pointer dereferencing in the tick broadcast code. Old
bug, which got unearthed by the hotplug ordering problem"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/broadcast: Prevent NULL pointer dereference
15 Dec, 2016
2 commits
-
When a dysfunctional timer, e.g. the dummy timer, is installed, the tick core
tries to set up the broadcast timer.
If no broadcast device is installed, the kernel crashes with a NULL pointer
dereference in tick_broadcast_setup_oneshot() because the function has no
sanity check.
Reported-by: Mason
Signed-off-by: Thomas Gleixner
Cc: Mark Rutland
Cc: Anna-Maria Gleixner
Cc: Richard Cochran
Cc: Sebastian Andrzej Siewior
Cc: Daniel Lezcano
Cc: Peter Zijlstra
Cc: Sebastian Frias
Cc: Thibaud Cornic
Cc: Robin Murphy
Link: http://lkml.kernel.org/r/1147ef90-7877-e4d2-bb2b-5c4fa8d3144b@free.fr -
The OpenRISC compiler (so far) fails to optimize away a large portion of
code containing a reference to posix_timer_event in alarmtimer.c when
CONFIG_POSIX_TIMERS is unset. Let's give it a direct clue to let the
build succeed.
This fixes
[linux-next:master 6682/7183] alarmtimer.c:undefined reference to `posix_timer_event'
reported by kbuild test robot.
Signed-off-by: Nicolas Pitre
Cc: Thomas Gleixner
Cc: Josh Triplett
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
13 Dec, 2016
3 commits
-
Pull timer updates from Thomas Gleixner:
"The time/timekeeping/timer folks deliver with this update:
- Fix a reintroduced signed/unsigned issue and clean up the whole
signed/unsigned mess in the timekeeping core so this won't happen
accidentally again.
- Add a new trace clock based on boot time
- Prevent injection of random sleep times when PM tracing abuses the
RTC for storage
- Make posix timers configurable for really tiny systems
- Add tracepoints for the alarm timer subsystem so timer based
suspend wakeups can be instrumented
- The usual pile of fixes and updates to core and drivers"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
timekeeping: Use mul_u64_u32_shr() instead of open coding it
timekeeping: Get rid of pointless typecasts
timekeeping: Make the conversion call chain consistently unsigned
timekeeping: Force unsigned clocksource to nanoseconds conversion
alarmtimer: Add tracepoints for alarm timers
trace: Update documentation for mono, mono_raw and boot clock
trace: Add an option for boot clock as trace clock
timekeeping: Add a fast and NMI safe boot clock
timekeeping/clocksource_cyc2ns: Document intended range limitation
timekeeping: Ignore the bogus sleep time if pm_trace is enabled
selftests/timers: Fix spelling mistake "Asyncrhonous" -> "Asynchronous"
clocksource/drivers/bcm2835_timer: Unmap region obtained by of_iomap
clocksource/drivers/arm_arch_timer: Map frame with of_io_request_and_map()
arm64: dts: rockchip: Arch counter doesn't tick in system suspend
clocksource/drivers/arm_arch_timer: Don't assume clock runs in suspend
posix-timers: Make them configurable
posix_cpu_timers: Move the add_device_randomness() call to a proper place
timer: Move sys_alarm from timer.c to itimer.c
ptp_clock: Allow for it to be optional
Kconfig: Regenerate *.c_shipped files after previous changes
... -
Pull smp hotplug updates from Thomas Gleixner:
"This is the final round of converting the notifier mess to the state
machine. The removal of the notifiers and the related infrastructure
will happen around rc1, as there are conversions outstanding in other
trees.
The whole exercise removed about 2000 lines of code in total, and in the
course of the conversion several dozen bugs got fixed. The new
mechanism allows testing almost every hotplug step standalone, so
usage sites can exercise all transitions extensively.
There is more room for improvement, like integrating all the
pointlessly different architecture mechanisms of synchronizing,
setting cpus online etc into the core code"
* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
tracing/rb: Init the CPU mask on allocation
soc/fsl/qbman: Convert to hotplug state machine
soc/fsl/qbman: Convert to hotplug state machine
zram: Convert to hotplug state machine
KVM/PPC/Book3S HV: Convert to hotplug state machine
arm64/cpuinfo: Convert to hotplug state machine
arm64/cpuinfo: Make hotplug notifier symmetric
mm/compaction: Convert to hotplug state machine
iommu/vt-d: Convert to hotplug state machine
mm/zswap: Convert pool to hotplug state machine
mm/zswap: Convert dst-mem to hotplug state machine
mm/zsmalloc: Convert to hotplug state machine
mm/vmstat: Convert to hotplug state machine
mm/vmstat: Avoid on each online CPU loops
mm/vmstat: Drop get_online_cpus() from init_cpu_node_state/vmstat_cpu_dead()
tracing/rb: Convert to hotplug state machine
oprofile/nmi timer: Convert to hotplug state machine
net/iucv: Use explicit clean up labels in iucv_init()
x86/pci/amd-bus: Convert to hotplug state machine
x86/oprofile/nmi: Convert to hotplug state machine
... -
Pull scheduler updates from Ingo Molnar:
"The main scheduler changes in this cycle were:
- support Intel Turbo Boost Max Technology 3.0 (TBM3) by introducing a
notion of 'better cores', which the scheduler will prefer to
schedule single threaded workloads on. (Tim Chen, Srinivas
Pandruvada)
- enhance the handling of asymmetric capacity CPUs further (Morten
Rasmussen)
- improve/fix load handling when moving tasks between task groups
(Vincent Guittot)
- simplify and clean up the cputime code (Stanislaw Gruszka)
- improve mass fork()ed task spread a.k.a. hackbench speedup (Vincent
Guittot)
- make struct kthread kmalloc()ed and related fixes (Oleg Nesterov)
- add uaccess atomicity debugging (when using access_ok() in the
wrong context), under CONFIG_DEBUG_ATOMIC_SLEEP=y (Peter Zijlstra)
- implement various fixes, cleanups and other enhancements (Daniel
Bristot de Oliveira, Martin Schwidefsky, Rafael J. Wysocki)"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
sched/core: Use load_avg for selecting idlest group
sched/core: Fix find_idlest_group() for fork
kthread: Don't abuse kthread_create_on_cpu() in __kthread_create_worker()
kthread: Don't use to_live_kthread() in kthread_[un]park()
kthread: Don't use to_live_kthread() in kthread_stop()
Revert "kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function"
kthread: Make struct kthread kmalloc'ed
x86/uaccess, sched/preempt: Verify access_ok() context
sched/x86: Make CONFIG_SCHED_MC_PRIO=y easier to enable
sched/x86: Change CONFIG_SCHED_ITMT to CONFIG_SCHED_MC_PRIO
x86/sched: Use #include instead of #include
cpufreq/intel_pstate: Use CPPC to get max performance
acpi/bus: Set _OSC for diverse core support
acpi/bus: Enable HWP CPPC objects
x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU
x86/sysctl: Add sysctl for ITMT scheduling feature
x86: Enable Intel Turbo Boost Max Technology 3.0
x86/topology: Define x86's arch_update_cpu_topology
sched: Extend scheduler's asym packing
sched/fair: Clean up the tunable parameter definitions
...
09 Dec, 2016
4 commits
-
The resume code must deal with a clocksource delta which is potentially big
enough to overflow the 64-bit multiplication.
Replace the open-coded handling with the proper function.
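To illustrate the overflow, here is a C sketch comparing the open-coded (delta * mult) >> shift with a split multiply modeled on the generic mul_u64_u32_shr() fallback. The names mul_shr_naive() and mul_u64_u32_shr_sketch() are hypothetical, and the sketch assumes shift <= 32 and a final result that still fits in 64 bits.

```c
#include <stdint.h>

/* Open-coded conversion: the full 64-bit product wraps before the shift. */
static uint64_t mul_shr_naive(uint64_t a, uint32_t mul, unsigned int shift)
{
    return (a * mul) >> shift;
}

/* Sketch modeled on the generic mul_u64_u32_shr(): split a into 32-bit
 * halves so each partial product fits in 64 bits. Assumes shift <= 32. */
static uint64_t mul_u64_u32_shr_sketch(uint64_t a, uint32_t mul, unsigned int shift)
{
    uint32_t al = (uint32_t)a;
    uint32_t ah = (uint32_t)(a >> 32);
    uint64_t ret = ((uint64_t)al * mul) >> shift;

    /* ah * mul * 2^32 >> shift == (ah * mul) << (32 - shift) */
    if (ah)
        ret += ((uint64_t)ah * mul) << (32 - shift);
    return ret;
}
```

With a = 2^50 cycles, mult = 2^20 and shift = 20, the exact result is 2^50; the naive product 2^70 wraps to 0, while the split version returns 2^50.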
Signed-off-by: Thomas Gleixner
Reviewed-by: David Gibson
Acked-by: Peter Zijlstra (Intel)
Cc: Parit Bhargava
Cc: Laurent Vivier
Cc: "Christopher S. Hall"
Cc: Chris Metcalf
Cc: Richard Cochran
Cc: Liav Rehana
Cc: John Stultz
Link: http://lkml.kernel.org/r/20161208204228.921674404@linutronix.de
Signed-off-by: Thomas Gleixner -
cycle_t is defined as u64, so casting it to u64 is a pointless and
confusing exercise. cycle_t should simply go away and be replaced with a
plain u64 to avoid further confusion.
Signed-off-by: Thomas Gleixner
Reviewed-by: David Gibson
Acked-by: Peter Zijlstra (Intel)
Cc: Parit Bhargava
Cc: Laurent Vivier
Cc: "Christopher S. Hall"
Cc: Chris Metcalf
Cc: Richard Cochran
Cc: Liav Rehana
Cc: John Stultz
Link: http://lkml.kernel.org/r/20161208204228.844699737@linutronix.de
Signed-off-by: Thomas Gleixner -
Propagating an unsigned value through signed variables and functions makes
absolutely no sense and is just prone to (re)introduce subtle signed
vs. unsigned issues as happened recently.
Clean it up.
Signed-off-by: Thomas Gleixner
Reviewed-by: David Gibson
Acked-by: Peter Zijlstra (Intel)
Cc: Parit Bhargava
Cc: Laurent Vivier
Cc: "Christopher S. Hall"
Cc: Chris Metcalf
Cc: Richard Cochran
Cc: Liav Rehana
Cc: John Stultz
Link: http://lkml.kernel.org/r/20161208204228.765843099@linutronix.de
Signed-off-by: Thomas Gleixner -
The clocksource delta to nanoseconds conversion uses signed math, but
the delta is unsigned. This makes the conversion space smaller than
necessary, and in case of a multiplication overflow the conversion can
become negative. The conversion is done with scaled math:
  s64 nsec_delta = ((s64)clkdelta * clk->mult) >> clk->shift;
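A small C sketch of why the signed form of the scaled math above misbehaves once the product sets bit 63. cyc2ns_signed() and cyc2ns_unsigned() are hypothetical names; the product is formed unsigned to avoid signed-overflow undefined behaviour, and the example relies on GCC/Clang semantics (two's complement conversion, arithmetic right shift of negative values), which the C standard leaves implementation-defined.

```c
#include <stdint.h>

/* Signed variant, mirroring ((s64)delta * mult) >> shift: once bit 63 of
 * the product is set, the arithmetic shift propagates it and the result
 * becomes negative. */
static int64_t cyc2ns_signed(uint64_t delta, uint32_t mult, uint32_t shift)
{
    int64_t prod = (int64_t)(delta * mult);
    return prod >> shift;
}

/* Unsigned variant: a logical shift, so the result stays non-negative. */
static uint64_t cyc2ns_unsigned(uint64_t delta, uint32_t mult, uint32_t shift)
{
    return (delta * mult) >> shift;
}
```

With delta = 2^40, mult = 2^23 and shift = 10, the product is exactly 2^63: the unsigned version yields 2^53 ns, the signed one -2^53 ns, so time jumps backwards.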
Shifting a signed integer right obviously preserves the sign, which has
interesting consequences:
- Time jumps backwards
- __iter_div_u64_rem(), which is used in one of the calling code paths,
will take forever to piecewise calculate the seconds/nanoseconds part.
This has been reported by several people with different scenarios:
David observed that when stopping a VM with a debugger:
"It was essentially the stopped by debugger case. I forget exactly why,
but the guest was being explicitly stopped from outside, it wasn't just
scheduling lag. I think it was something in the vicinity of 10 minutes
stopped."
When lifting the stop, the machine went dead.
The stopped by debugger case is not really interesting, but nevertheless it
would be a good thing not to die completely.
But this was also observed on a live system by Liav:
"When the OS is too overloaded, delta will get a high enough value for the
msb of the sum delta * tkr->mult + tkr->xtime_nsec to be set, and so
after the shift the nsec variable will gain a value similar to
0xffffffffff000000."
Unfortunately this has been reintroduced recently with commit 6bd58f09e1d8
("time: Add cycles to nanoseconds translation"). It had been fixed a year
ago already in commit 35a4933a8959 ("time: Avoid signed overflow in
timekeeping_get_ns()").
It is not surprising that the issue has been reintroduced, though, because
the function itself and the whole call chain use s64 for the result and the
propagation of it. The change in this recent commit is subtle:
  s64 nsec;
- nsec = (d * m + n) >> s;
+ nsec = d * m + n;
+ nsec >>= s;
d being of type cycle_t adds another level of obfuscation.
This wouldn't have happened if the previous change to unsigned computation
had made the 'nsec' variable u64 right away and a follow-up patch
had cleaned up the whole call chain.
There have been patches submitted which basically did a revert of the above
patch, leaving everything else unchanged as signed. Back to square one. This
spawned an admittedly pointless discussion about potential users which rely
on the unsigned behaviour until someone pointed out that it had been fixed
before. The changelogs of said patches added further confusion as they made
false claims about the consequences for potential users which expect
signed results.
Despite delta being cycle_t, aka. u64, it's very well possible to hand in
a signed negative value and the signed computation will happily return the
correct result. But nobody actually sat down and analyzed the code which
was added as a user after the probably unintended signed conversion.
In sensitive code like this, though, it's better to analyze it properly and
make sure that nothing relies on this than to hunt the subtle wreckage half
a year later. After analyzing all call chains, it is clear that no caller can
hand in a negative value (which actually would work due to the s64 cast)
and rely on the signed math to do the right thing.
Change the conversion function to unsigned math. The conversion of all call
chains is done in a follow-up patch.
This solves the starvation issue, which was caused by the negative result,
but it does not solve the underlying problem. It merely postpones
it. When the timekeeper update is deferred long enough that the unsigned
multiplication overflows, then time going backwards is observable again.
Nor does it solve the issue of clocksources with a small counter width
which will wrap around possibly several times and cause random time stamps
to be generated. But those are usually not found on systems used for
virtualization, so this is likely a non-issue.
I took the liberty of claiming authorship for this simply because
analyzing all call sites and writing the changelog took substantially
more time than just making the simple s/s64/u64/ change and ignoring the
rest.
Fixes: 6bd58f09e1d8 ("time: Add cycles to nanoseconds translation")
Reported-by: David Gibson
Reported-by: Liav Rehana
Signed-off-by: Thomas Gleixner
Reviewed-by: David Gibson
Acked-by: Peter Zijlstra (Intel)
Cc: Parit Bhargava
Cc: Laurent Vivier
Cc: "Christopher S. Hall"
Cc: Chris Metcalf
Cc: Richard Cochran
Cc: John Stultz
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20161208204228.688545601@linutronix.de
Signed-off-by: Thomas Gleixner
08 Dec, 2016
1 commit
-
The CPSW CPTS driver is capable of timestamping tx/rx packets and
requires knowing the mult and shift factors for timestamp conversion from
raw value to nanoseconds (ptp clock). Currently these mult and shift factors
are calculated manually and provided through DT, which makes it very hard
to support a large number of platforms, especially if the CPTS refclk is
not the same for some kinds of boards and depends on efuse settings
(Keystone 2 platforms). Hence, export clocks_calc_mult_shift() to allow
drivers like CPSW CPTS (and other ptp drivers) to benefit from automatic
calculation of mult and shift factors.
Cc: John Stultz
Signed-off-by: Murali Karicheri
Signed-off-by: Grygorii Strashko
Acked-by: Thomas Gleixner
Signed-off-by: David S. Miller
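For the clocks_calc_mult_shift() export above, a userspace sketch of the mult/shift calculation, modeled on the kernel function (the kernel uses do_div() where this uses plain 64-bit division). calc_mult_shift_sketch() and cyc2ns_sketch() are hypothetical names, and the 19.2 MHz refclk used below is only an example value.

```c
#include <stdint.h>

/* Sketch modeled on clocks_calc_mult_shift(): choose mult/shift so that
 * to_units ~= (from_units * mult) >> shift, while maxsec seconds worth of
 * input can still be multiplied by mult without overflowing 64 bits. */
static void calc_mult_shift_sketch(uint32_t *mult, uint32_t *shift,
                                   uint32_t from, uint32_t to, uint32_t maxsec)
{
    uint64_t tmp;
    uint32_t sft, sftacc = 32;

    /* Every bit that maxsec * from needs beyond 32 bits is taken away
     * from the 32 bits available for mult. */
    tmp = ((uint64_t)maxsec * from) >> 32;
    while (tmp) {
        tmp >>= 1;
        sftacc--;
    }

    /* Pick the largest shift (best precision) whose mult fits in sftacc bits. */
    for (sft = 32; sft > 0; sft--) {
        tmp = (uint64_t)to << sft;
        tmp += from / 2;            /* round to nearest */
        tmp /= from;
        if ((tmp >> sftacc) == 0)
            break;
    }
    *mult = (uint32_t)tmp;
    *shift = sft;
}

/* Convert raw counter cycles to nanoseconds with the computed factors. */
static uint64_t cyc2ns_sketch(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    return (cycles * mult) >> shift;
}
```

With maxsec = 600, a 1 GHz counter gets mult = 2^23 and shift = 23 (an exact 1:1 conversion); a 19.2 MHz counter gets shift = 24, and one second's worth of cycles (19200000) converts to 999999999 ns.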
01 Dec, 2016
1 commit
-
Alarm timers are one of the mechanisms to wake up a system from suspend,
but there exist no tracepoints to analyse which process/thread armed an
alarmtimer.
Add tracepoints for start/cancel/expire of individual alarm timers and one
for tracing the suspend time decision when to resume the system.
The following trace excerpt illustrates the new mechanism:
Binder:3292_2-3304 [000] d..2 149.981123: alarmtimer_cancel:
alarmtimer:ffffffc1319a7800 type:REALTIME
expires:1325463120000000000 now:1325376810370370245
Binder:3292_2-3304 [000] d..2 149.981136: alarmtimer_start:
alarmtimer:ffffffc1319a7800 type:REALTIME
expires:1325376840000000000 now:1325376810370384591
Binder:3292_9-3953 [000] d..2 150.212991: alarmtimer_cancel:
alarmtimer:ffffffc1319a5a00 type:BOOTTIME
expires:179552000000 now:150154008122
Binder:3292_9-3953 [000] d..2 150.213006: alarmtimer_start:
alarmtimer:ffffffc1319a5a00 type:BOOTTIME
expires:179551000000 now:150154025622
system_server-3000 [002] ...1 162.701940: alarmtimer_suspend:
alarmtimer type:REALTIME expires:1325376840000000000
The wakeup time which is selected at suspend time allows mapping it back to
the task arming the timer: Binder:3292_2.
[ tglx: Store alarm timer expiry time instead of some useless RTC relative
information, add proper type information for wakeups which are
handled via the clock_nanosleep/freezer and massage the changelog. ]
Signed-off-by: Baolin Wang
Signed-off-by: John Stultz
Acked-by: Steven Rostedt
Cc: Prarit Bhargava
Cc: Richard Cochran
Link: http://lkml.kernel.org/r/1480372524-15181-5-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner
30 Nov, 2016
1 commit
-
This boot clock can be used as a tracing clock and will account for
suspend time.
To keep it NMI safe since we're accessing from tracing, we're not using a
separate timekeeper with updates to monotonic clock and boot offset
protected with seqlocks. This has the following minor side effects:
(1) It's possible that a timestamp is taken after the boot offset is updated
but before the timekeeper is updated. If this happens, the new boot offset
is added to the old timekeeping, making the clock appear to update slightly
earlier:
  CPU 0                                        CPU 1
  timekeeping_inject_sleeptime64()
  __timekeeping_inject_sleeptime(tk, delta);
                                               timestamp();
  timekeeping_update(tk, TK_CLEAR_NTP...);
(2) On 32-bit systems, the 64-bit boot offset (tk->offs_boot) may be
partially updated. Since the tk->offs_boot update is a rare event, this
should be a rare occurrence which postprocessing should be able to handle.
Signed-off-by: Joel Fernandes
Signed-off-by: John Stultz
Reviewed-by: Thomas Gleixner
Cc: Prarit Bhargava
Cc: Richard Cochran
Cc: Steven Rostedt
Link: http://lkml.kernel.org/r/1480372524-15181-6-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner
23 Nov, 2016
1 commit
-
Install the callbacks via the state machine.
Signed-off-by: Sebastian Andrzej Siewior
Cc: rt@linuxtronix.de
Link: http://lkml.kernel.org/r/20161117183541.8588-14-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner
16 Nov, 2016
3 commits
-
Some embedded systems have no use for them. This removes about
25KB from the kernel binary size when configured out.
Corresponding syscalls are routed to a stub logging the attempt to
use those syscalls, which should be enough of a clue if they were
disabled without proper consideration. They are: timer_create,
timer_gettime, timer_getoverrun, timer_settime, timer_delete,
clock_adjtime, setitimer, getitimer, alarm.
The clock_settime, clock_gettime, clock_getres and clock_nanosleep
syscalls are replaced by simple wrappers compatible with CLOCK_REALTIME,
CLOCK_MONOTONIC and CLOCK_BOOTTIME only, which should cover the vast
majority of use cases with very little code.
Signed-off-by: Nicolas Pitre
Acked-by: Richard Cochran
Acked-by: Thomas Gleixner
Acked-by: John Stultz
Reviewed-by: Josh Triplett
Cc: Paul Bolle
Cc: linux-kbuild@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: Michal Marek
Cc: Edward Cree
Link: http://lkml.kernel.org/r/1478841010-28605-7-git-send-email-nicolas.pitre@linaro.org
Signed-off-by: Thomas Gleixner -
There is no logical relation between add_device_randomness() and
posix_cpu_timers_exit(). Let's move the former to where the latter
is called. This way, when posix-cpu-timers.c is compiled out, there
is no need to worry about losing a call to add_device_randomness().
Signed-off-by: Nicolas Pitre
Acked-by: John Stultz
Cc: Paul Bolle
Cc: linux-kbuild@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: Richard Cochran
Cc: Josh Triplett
Cc: Michal Marek
Cc: Edward Cree
Link: http://lkml.kernel.org/r/1478841010-28605-6-git-send-email-nicolas.pitre@linaro.org
Signed-off-by: Thomas Gleixner -
Move the only user of alarm_setitimer to itimer.c where it is defined.
This allows for making alarm_setitimer static, and dropping it from the
build when __ARCH_WANT_SYS_ALARM is not defined.
Signed-off-by: Nicolas Pitre
Acked-by: John Stultz
Cc: Paul Bolle
Cc: linux-kbuild@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: Richard Cochran
Cc: Josh Triplett
Cc: Michal Marek
Cc: Edward Cree
Link: http://lkml.kernel.org/r/1478841010-28605-5-git-send-email-nicolas.pitre@linaro.org
Signed-off-by: Thomas Gleixner
15 Nov, 2016
1 commit
-
Now, since fetch_task_cputime() has no other users than task_cputime(),
its code can be used directly in task_cputime().
Moreover, since only 2 of the 17 task_cputime() calls use a NULL argument,
we can add dummy variables to those calls and remove the NULL checks from
task_cputime().
Also remove the NULL checks from task_cputime_scaled().
Signed-off-by: Stanislaw Gruszka
Signed-off-by: Frederic Weisbecker
Cc: Benjamin Herrenschmidt
Cc: Heiko Carstens
Cc: Linus Torvalds
Cc: Martin Schwidefsky
Cc: Michael Neuling
Cc: Paul Mackerras
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1479175612-14718-5-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar
26 Oct, 2016
2 commits
-
The documentation for schedule_timeout(), schedule_hrtimeout(), and
schedule_hrtimeout_range() claims that the routines couldn't possibly
return early if the task state was TASK_UNINTERRUPTIBLE. This is simply
not true, since wake_up_process() will cause those routines to exit early.
We cannot make schedule_[hr]timeout() loop until the timeout expires if the
task state is uninterruptible, because we have users which rely on the
existing and designed behaviour.
Make the documentation match the (correct) implementation.
schedule_hrtimeout() returns -EINTR even when an uninterruptible task was
woken up. This might look strange, but making the return code depend on the
state is too much of an effort as it would affect all the call sites. There
is no value in doing so, but we spell it out clearly in the documentation.
Suggested-by: Daniel Kurtz
Signed-off-by: Douglas Anderson
Cc: huangtao@rock-chips.com
Cc: heiko@sntech.de
Cc: broonie@kernel.org
Cc: briannorris@chromium.org
Cc: Andreas Mohr
Cc: linux-rockchip@lists.infradead.org
Cc: tony.xie@rock-chips.com
Cc: John Stultz
Cc: linux@roeck-us.net
Cc: tskd08@gmail.com
Link: http://lkml.kernel.org/r/1477065531-30342-2-git-send-email-dianders@chromium.org
Signed-off-by: Thomas Gleixner -
Users of usleep_range() expect that it will _never_ return in less time
than the minimum passed parameter. However, nothing in the code ensures
this when the sleeping task is woken by wake_up_process() or any other
mechanism which can wake a task from uninterruptible state.
Neither usleep_range() nor schedule_hrtimeout_range*() has any protection
against wakeups. schedule_hrtimeout_range*() is designed this way, although
the API documentation does not mention it.
msleep() already has code to handle this case since it will loop as long
as there is still time left. usleep_range() has no such loop; add it.
Presumably this problem was not detected before because usleep_range() is
only used in a few places and the function is mostly used in contexts which
are not exposed to wakeups of any form.
An effort was made to look for users relying on the old behavior by
looking for usleep_range() in the same file as wake_up_process().
No problems were found by this search, though it is conceivable that
someone could have put the sleep and wakeup in two different files.
An effort was made to ask several upstream maintainers if they were aware
of people relying on wake_up_process() to wake up usleep_range(). No
maintainers were aware of that, but they were aware of many people relying
on usleep_range() never returning before the minimum.
Reported-by: Tao Huang
Signed-off-by: Douglas Anderson
Cc: heiko@sntech.de
Cc: broonie@kernel.org
Cc: briannorris@chromium.org
Cc: Andreas Mohr
Cc: linux-rockchip@lists.infradead.org
Cc: tony.xie@rock-chips.com
Cc: John Stultz
Cc: djkurtz@chromium.org
Cc: linux@roeck-us.net
Cc: tskd08@gmail.com
Link: http://lkml.kernel.org/r/1477065531-30342-1-git-send-email-dianders@chromium.org
Signed-off-by: Thomas Gleixner
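For the usleep_range() change above: the kernel fix loops while sleep time remains. A userspace analogue of the same never-return-early guarantee, using the POSIX clock_nanosleep() with TIMER_ABSTIME rather than any kernel API, could look like this; sleep_at_least_us() and now_ns() are hypothetical helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Userspace analogue (not the kernel code): sleep until an absolute
 * CLOCK_MONOTONIC deadline and retry when interrupted, so an early
 * wakeup never shortens the sleep below min_us. */
static void sleep_at_least_us(long min_us)
{
    struct timespec deadline;

    clock_gettime(CLOCK_MONOTONIC, &deadline);
    deadline.tv_sec += min_us / 1000000;
    deadline.tv_nsec += (min_us % 1000000) * 1000;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }
    /* clock_nanosleep() returns the error number directly; EINTR means a
     * signal woke us early, so sleep again until the same deadline. */
    while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL) == EINTR)
        ;
}

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static int64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

Sleeping to an absolute deadline means a retry after an early wakeup does not have to recompute the remaining time.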
25 Oct, 2016
4 commits
-
When a timer is enqueued we try to forward the timer base clock. This
mechanism has two issues:
1) Forwarding a remote base unlocked
The forwarding function is called from get_target_base() with the current
timer base lock held. But if the new target base is a different base than
the current base (can happen with NOHZ, sigh!) then the forwarding is done
on an unlocked base. This can lead to corruption of base->clk.
Solution is simple: Invoke the forwarding after the target base is locked.
2) Possible corruption due to jiffies advancing
This is similar to the issue in get_next_timer_interrupt() which was fixed
in the previous patch. jiffies can advance between check and assignment
and therefore advance base->clk beyond the next expiry value.
So we need to read jiffies into a local variable once and do the checks and
assignment with the local copy.
Fixes: a683f390b93f ("timers: Forward the wheel clock whenever possible")
Reported-by: Ashton Holmes
Reported-by: Michael Thayer
Signed-off-by: Thomas Gleixner
Cc: Michal Necasek
Cc: Peter Zijlstra
Cc: knut.osmundsen@oracle.com
Cc: stable@vger.kernel.org
Cc: stern@rowland.harvard.edu
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20161022110552.253640125@linutronix.de
Signed-off-by: Thomas Gleixner -
Ashton and Michael reported that kernel versions 4.8 and later suffer from
USB timeouts which are caused by the timer wheel rework.
This is caused by a bug in the base clock forwarding mechanism, which leads
to timers expiring early. The scenario which leads to this is:
  run_timers()
    while (jiffies >= base->clk) {
      collect_expired_timers();
      base->clk++;
      expire_timers();
    }
So base->clk = jiffies + 1. Now the cpu goes idle:
  idle()
    get_next_timer_interrupt()
      nextevt = __next_timer_interrupt();
      if (time_after(nextevt, base->clk))
        base->clk = jiffies;
jiffies has not advanced since run_timers(), so this assignment effectively
decrements base->clk by one.
base->clk is the index into the timer wheel arrays. So let's assume the
following state after the base->clk increment in run_timers():
  jiffies = 0
  base->clk = 1
A timer gets enqueued with an expiry delta of 63 ticks (which is the case
with the USB timeout and HZ=250) so the resulting bucket index is:
  base->clk + delta = 1 + 63 = 64
The timer goes into the first wheel level. The array size is 64 so it ends
up in bucket 0, which is correct as it takes 63 ticks to advance base->clk
to index into bucket 0 again.
If the cpu goes idle before jiffies advance, then the bug in the forwarding
mechanism sets base->clk back to 0, so the next invocation of run_timers()
at the next tick will index into bucket 0 and therefore expire the timer 62
ticks too early.
Instead of blindly setting base->clk to jiffies we must make the forwarding
conditional on jiffies > base->clk, but we cannot use jiffies for this as
we might run into the following issue:
  if (time_after(jiffies, base->clk)) {
    if (time_after(nextevt, base->clk))
      base->clk = jiffies;
  }
jiffies can increment between the check and the assignment far enough to
advance beyond nextevt. So we need to use a stable value for checking.
get_next_timer_interrupt() has the basej argument which is the jiffies
value snapshot taken in the calling code. So we can just use that.
Thanks to Ashton for bisecting and providing trace data!
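A deliberately simplified single-level model of the scenario above: no wheel levels, no granularity, one timer, and plain comparisons instead of time_after(). simulate_expiry() is hypothetical, and the bucket arithmetic follows the changelog's numbers. The timer is requested for jiffy 63 (delta 63 from jiffies = 0). Without the bug the model fires at jiffy 64, late by one tick as wheel timers may be; with the old forwarding it fires at jiffy 1, i.e. 62 ticks early.

```c
#include <stdbool.h>

#define WHEEL_SIZE 64   /* level-0 array size from the changelog */

/* Returns the jiffies value at which the single timer expires. */
static unsigned long simulate_expiry(bool buggy_forward)
{
    unsigned long jiffies = 0;
    unsigned long clk = 1;                              /* after run_timers(): clk = jiffies + 1 */
    unsigned long delta = 63;                           /* USB timeout at HZ=250 */
    unsigned long bucket = (clk + delta) % WHEEL_SIZE;  /* 64 % 64 == 0 */
    unsigned long basej = jiffies;                      /* snapshot taken by the caller */

    /* CPU goes idle: get_next_timer_interrupt() forwards the base clock
     * (nextevt is assumed to be after clk, as in the scenario). */
    if (buggy_forward)
        clk = jiffies;          /* old code: moves clk backwards from 1 to 0 */
    else if (basej > clk)
        clk = basej;            /* fixed: only forward, using the stable snapshot */

    for (;;) {
        jiffies++;              /* next tick */
        while (jiffies >= clk) {            /* run_timers() */
            if (clk % WHEEL_SIZE == bucket)
                return jiffies;             /* timer's bucket is processed */
            clk++;
        }
    }
}
```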
Fixes: a683f390b93f ("timers: Forward the wheel clock whenever possible")
Reported-by: Ashton Holmes
Reported-by: Michael Thayer
Signed-off-by: Thomas Gleixner
Cc: Michal Necasek
Cc: Peter Zijlstra
Cc: knut.osmundsen@oracle.com
Cc: stable@vger.kernel.org
Cc: stern@rowland.harvard.edu
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20161022110552.175308322@linutronix.de
Signed-off-by: Thomas Gleixner -
Linus stumbled over the unlocked modification of the timer expiry value in
mod_timer(), which is an optimization for timers which stay in the same
bucket - due to the bucket granularity - despite their expiry time getting
updated.
The optimization itself still makes sense even if we take the lock, because
in case the bucket stays the same, we avoid the pointless
dequeue/enqueue dance.
Make the check and the modification of timer->expires protected by the base
lock and shuffle the remaining code around so we can keep the lock held
when we actually have to requeue the timer to a different bucket.
Fixes: f00c0afdfa62 ("timers: Implement optimization for same expiry time in mod_timer()")
Reported-by: Linus Torvalds
Signed-off-by: Thomas Gleixner
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610241711220.4983@nanos
Cc: stable@vger.kernel.org
Cc: Andrew Morton
Cc: Peter Zijlstra -
Linus noticed that lock_timer_base() lacks a READ_ONCE() for accessing the
timer flags. As a consequence the compiler is allowed to reload the flags
between the initial check for TIMER_MIGRATION and the following timer base
computation and the spin lock of the base.
While this has not been observed (yet), we need to make sure that it never
happens.
Fixes: 0eeda71bc30d ("timer: Replace timer base by a cpu index")
Reported-by: Linus Torvalds
Signed-off-by: Thomas Gleixner
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610241711220.4983@nanos
Cc: stable@vger.kernel.org
Cc: Andrew Morton
Cc: Peter Zijlstra
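For the lock_timer_base() change above, a sketch of the pattern READ_ONCE() enforces: load the flags word once and derive both the migration check and the base index from that single snapshot. READ_ONCE_SKETCH(), the flag layout and timer_base_index() are hypothetical (the real lock_timer_base() also takes the base lock and rechecks the flags); the volatile cast mirrors the idea behind the kernel macro and relies on the GCC/Clang __typeof__ extension.

```c
#include <stdint.h>

/* Hypothetical stand-in for READ_ONCE(): a volatile access forces exactly
 * one load, so the compiler cannot re-read the variable later. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

#define TIMER_MIGRATING_SKETCH 0x0001u  /* hypothetical flag bit */
#define TIMER_CPUMASK_SKETCH   0xff00u  /* hypothetical cpu-index field */

struct timer_sketch {
    uint32_t flags;
};

/* Returns the CPU index the timer belongs to, or -1 while it is migrating.
 * Check and index computation both use the same snapshot tf. */
static int timer_base_index(struct timer_sketch *t)
{
    uint32_t tf = READ_ONCE_SKETCH(t->flags);

    if (tf & TIMER_MIGRATING_SKETCH)
        return -1;
    return (int)((tf & TIMER_CPUMASK_SKETCH) >> 8);
}
```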
17 Oct, 2016
1 commit
-
Remove the set but unused variable base in alarm_timer_create() to fix the
following warning when building with 'W=1':
kernel/time/alarmtimer.c: In function ‘alarm_timer_create’:
kernel/time/alarmtimer.c:545:21: warning: variable ‘base’ set but not used [-Wunused-but-set-variable]
Signed-off-by: Tobias Klauser
Cc: John Stultz
Link: http://lkml.kernel.org/r/20161017094702.10873-1-tklauser@distanz.ch
Signed-off-by: Thomas Gleixner
16 Oct, 2016
1 commit
-
Pull gcc plugins update from Kees Cook:
"This adds a new gcc plugin named "latent_entropy". It is designed to
extract as much uncertainty as possible from a running system at boot
time, hoping to capitalize on any possible variation in
CPU operation (due to runtime data differences, hardware differences,
SMP ordering, thermal timing variation, cache behavior, etc).
At the very least, this plugin is a much more comprehensive example
for how to manipulate kernel code using the gcc plugin internals"
* tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
latent_entropy: Mark functions with __latent_entropy
gcc-plugins: Add latent_entropy plugin
11 Oct, 2016
1 commit
-
The __latent_entropy gcc attribute can be used only on functions and
variables. If it is on a function then the plugin will instrument it for
gathering control-flow entropy. If the attribute is on a variable then
the plugin will initialize it with random contents. The variable must
be an integer, an integer array type or a structure with integer fields.
These specific functions have been selected because they are init
functions (to help gather boot-time entropy), are called at unpredictable
times, or have variable loops, each of which provides some level of
latent entropy.
Signed-off-by: Emese Revfy
[kees: expanded commit message]
Signed-off-by: Kees Cook
05 Oct, 2016
1 commit
-
In commit 27727df240c7 ("Avoid taking lock in NMI path with
CONFIG_DEBUG_TIMEKEEPING"), I changed the logic to open-code
the timekeeping_get_ns() function, but I forgot to include
the unit conversion from cycles to nanoseconds, breaking the
function's output, which impacts users like perf.
This results in bogus perf timestamps like:
swapper 0 [000] 253.427536: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 254.426573: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 254.426687: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 254.426800: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 254.426905: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 254.427022: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 254.427127: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 254.427239: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 254.427346: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 254.427463: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 255.426572: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])

Instead of more reasonable expected timestamps like:
swapper 0 [000] 39.953768: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 40.064839: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 40.175956: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 40.287103: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 40.398217: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 40.509324: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 40.620437: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 40.731546: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 40.842654: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 40.953772: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 41.064881: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])

Add the proper use of timekeeping_delta_to_ns() to convert
the cycle delta to nanoseconds as needed.

Thanks to Brendan and Alexei for finding this quickly after
the v4.8 release. Unfortunately the problematic commit has
landed in some -stable trees so they'll need this fix as
well.

Many apologies for this mistake. I'll be looking to add a
perf-clock sanity test to the kselftest timers tests soon.

Fixes: 27727df240c7 "timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING"
Reported-by: Brendan Gregg
Reported-by: Alexei Starovoitov
Tested-and-reviewed-by: Mathieu Desnoyers
Signed-off-by: John Stultz
Cc: Peter Zijlstra
Cc: stable
Cc: Steven Rostedt
Link: http://lkml.kernel.org/r/1475636148-26539-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner
13 Sep, 2016
1 commit
-
can_stop_full_tick() has no check for offline cpus. So it allows the tick
to be stopped on an offline cpu from the interrupt return path, which is
wrong and subsequently makes irq_work_needs_cpu() warn about being called
for an offline cpu.

Commit f7ea0fd639c2c4 ("tick: Don't invoke tick_nohz_stop_sched_tick() if
the cpu is offline") added prevention for can_stop_idle_tick(), but forgot
to do the same in can_stop_full_tick(). Add it.

[ tglx: Massaged changelog ]
Signed-off-by: Wanpeng Li
Cc: Peter Zijlstra
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/r/1473245473-4463-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Thomas Gleixner
08 Sep, 2016
1 commit
-
Signed-off-by: Ingo Molnar
02 Sep, 2016
1 commit
-
Since commit 1f3b0f8243cb934 ("tick/nohz: Optimize nohz idle enter"),
tick_nohz_start_idle() is not called if the idle tick can't be stopped.
As a result, after the host machine is suspended and resumed, a full
dynticks kvm guest soft-locks up:

NMI watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [swapper/0:0]
Call Trace:
default_idle+0x31/0x1a0
arch_cpu_idle+0xf/0x20
default_idle_call+0x2a/0x50
cpu_startup_entry+0x39b/0x4d0
rest_init+0x138/0x140
? rest_init+0x5/0x140
start_kernel+0x4c1/0x4ce
? set_init_arg+0x55/0x55
? early_idt_handler_array+0x120/0x120
x86_64_start_reservations+0x24/0x26
x86_64_start_kernel+0x142/0x14f

In addition, running cat /proc/stat | grep cpu in the guest or host shows:
cpu 398 16 5049 15754 5490 0 1 46 0 0
cpu0 206 5 450 0 0 0 1 14 0 0
cpu1 81 0 3937 3149 1514 0 0 9 0 0
cpu2 45 6 332 6052 2243 0 0 11 0 0
cpu3 65 2 328 6552 1732 0 0 11 0 0

The idle and iowait states are oddly 0 for cpu0 (housekeeping).
The bug is present in both guest and host kernels, and both show the
cpu0 idle and iowait issue; however, the host kernel's suspend/resume
path (among others) touches the watchdog, which avoids the softlockup
there.

- The watchdog will not be touched in the tick_nohz_stop_idle path (it
  needs to be touched since the scheduler stall is expected) if the
  idle_active flag is not set.
- The idle and iowait states will not be accounted on exit from the idle
  loop (resched or interrupt) if the idle start time and idle_active flag
  are not set.

This patch fixes it by reverting commit 1f3b0f8243cb934, since being
unable to stop the idle tick doesn't mean the cpu can't be idle.

Fixes: 1f3b0f8243cb934 ("tick/nohz: Optimize nohz idle enter")
Signed-off-by: Wanpeng Li
Cc: Sanjeev Yadav
Cc: Gaurav Jindal
Cc: stable@vger.kernel.org
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář
Cc: Peter Zijlstra
Cc: Paolo Bonzini
Link: http://lkml.kernel.org/r/1472798303-4154-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Thomas Gleixner
01 Sep, 2016
5 commits
-
I ran into this:
================================================================================
UBSAN: Undefined behaviour in kernel/time/hrtimer.c:310:16
signed integer overflow:
9223372036854775807 + 50000 cannot be represented in type 'long long int'
CPU: 2 PID: 4798 Comm: trinity-c2 Not tainted 4.8.0-rc1+ #91
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
0000000000000000 ffff88010ce6fb88 ffffffff82344740 0000000041b58ab3
ffffffff84f97a20 ffffffff82344694 ffff88010ce6fbb0 ffff88010ce6fb60
000000000000c350 ffff88010ce6f968 dffffc0000000000 ffffffff857bc320
Call Trace:
dump_stack+0xac/0xfc
? _atomic_dec_and_lock+0xc4/0xc4
ubsan_epilogue+0xd/0x8a
handle_overflow+0x202/0x23d
? val_to_string.constprop.6+0x11e/0x11e
? timerqueue_add+0x151/0x410
? hrtimer_start_range_ns+0x3b8/0x1380
? memset+0x31/0x40
__ubsan_handle_add_overflow+0xe/0x10
hrtimer_nanosleep+0x5d9/0x790
? hrtimer_init_sleeper+0x80/0x80
? __might_sleep+0x5b/0x260
common_nsleep+0x20/0x30
SyS_clock_nanosleep+0x197/0x210
? SyS_clock_getres+0x150/0x150
? __this_cpu_preempt_check+0x13/0x20
? __context_tracking_exit.part.3+0x30/0x1b0
? SyS_clock_getres+0x150/0x150
do_syscall_64+0x1b3/0x4b0
entry_SYSCALL64_slow_path+0x25/0x25
================================================================================

Add a new ktime_add_unsafe() helper which doesn't check for overflow, but
doesn't throw a UBSAN warning when it does overflow either.

Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Richard Cochran
Cc: Prarit Bhargava
Signed-off-by: Vegard Nossum
Signed-off-by: John Stultz
-
I ran into this:
================================================================================
UBSAN: Undefined behaviour in kernel/time/time.c:783:2
signed integer overflow:
5273 + 9223372036854771711 cannot be represented in type 'long int'
CPU: 0 PID: 17363 Comm: trinity-c0 Not tainted 4.8.0-rc1+ #88
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
0000000000000000 ffff88011457f8f0 ffffffff82344f50 0000000041b58ab3
ffffffff84f98080 ffffffff82344ea4 ffff88011457f918 ffff88011457f8c8
ffff88011457f8e0 7fffffffffffefff ffff88011457f6d8 dffffc0000000000
Call Trace:
dump_stack+0xac/0xfc
? _atomic_dec_and_lock+0xc4/0xc4
ubsan_epilogue+0xd/0x8a
handle_overflow+0x202/0x23d
? val_to_string.constprop.6+0x11e/0x11e
? debug_smp_processor_id+0x17/0x20
? __sigqueue_free.part.13+0x51/0x70
? rcu_is_watching+0x110/0x110
__ubsan_handle_add_overflow+0xe/0x10
timespec64_add_safe+0x298/0x340
? timespec_add_safe+0x330/0x330
? wait_noreap_copyout+0x1d0/0x1d0
poll_select_set_timeout+0xf8/0x170
? poll_schedule_timeout+0x2b0/0x2b0
? __might_sleep+0x5b/0x260
__sys_recvmmsg+0x107/0x790
? SyS_recvmsg+0x20/0x20
? hrtimer_start_range_ns+0x3b8/0x1380
? _raw_spin_unlock_irqrestore+0x3b/0x60
? do_setitimer+0x39a/0x8e0
? __might_sleep+0x5b/0x260
? __sys_recvmmsg+0x790/0x790
SyS_recvmmsg+0xd9/0x160
? __sys_recvmmsg+0x790/0x790
? __this_cpu_preempt_check+0x13/0x20
? __context_tracking_exit.part.3+0x30/0x1b0
? __sys_recvmmsg+0x790/0x790
do_syscall_64+0x1b3/0x4b0
entry_SYSCALL64_slow_path+0x25/0x25
================================================================================

Line 783 is this:

783 set_normalized_timespec64(&res, lhs.tv_sec + rhs.tv_sec,
784 lhs.tv_nsec + rhs.tv_nsec);

In other words, since lhs.tv_sec and rhs.tv_sec are both time64_t, this
is a signed addition, which will cause undefined behaviour on overflow.

Note that this is not currently a huge concern since the kernel should be
built with -fno-strict-overflow by default, but it could be a problem in
the future, with older compilers, or with compilers other than gcc.

The easiest way to avoid the overflow is to cast one of the arguments to
unsigned (so the addition will be done using unsigned arithmetic).

Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Richard Cochran
Cc: Prarit Bhargava
Signed-off-by: Vegard Nossum
Signed-off-by: John Stultz
-
In addition to keeping a histogram of suspend times, also
print out the time spent in suspend to dmesg.

This helps to keep track of suspend time while debugging using
kernel logs.

Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Richard Cochran
Cc: Prarit Bhargava
Signed-off-by: Ruchi Kandoi
[jstultz: Tweaked commit message]
Signed-off-by: John Stultz
-
Clocksources don't get the VALID_FOR_HRES flag until they have been
checked by a watchdog. However, when using an override, the
clocksource_select logic will clear the override value if the
clocksource is not marked VALID_FOR_HRES during that initial check.
When using the clocksource= boot argument, this selection can
run before the watchdog, and can cause the override to be incorrectly
cleared.

To address this condition, the override_name is only invalidated for
unstable clocksources. Otherwise, the override is left intact until after
the watchdog has validated the clocksource as stable/unstable.

Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Richard Cochran
Cc: Prarit Bhargava
Cc: Martin Schwidefsky
Signed-off-by: Kyle Walker
Signed-off-by: John Stultz
-
Fix a minor spelling error.
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Richard Cochran
Cc: Prarit Bhargava
Signed-off-by: Pratyush Patel
[jstultz: Added commit message]
Signed-off-by: John Stultz