15 Jul, 2013

1 commit

  • The __cpuinit type of throwaway sections might have made sense
    some time ago when RAM was more constrained, but now the savings
    do not offset the cost and complications. For example, the fix in
    commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
    is a good example of the nasty type of bugs that can be created
    with improper use of the various __init prefixes.

    After a discussion on LKML[1] it was decided that cpuinit should go
    the way of devinit and be phased out. Once all the users are gone,
    we can then finally remove the macros themselves from linux/init.h.

    This removes all the uses of the __cpuinit macros from C files in
    the core kernel directories (kernel, init, lib, mm, and include)
    that don't really have a specific maintainer.

    [1] https://lkml.org/lkml/2013/5/20/589

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

07 Jul, 2013

1 commit

  • Pull timer core updates from Thomas Gleixner:
    "The timer changes contain:

    - posix timer code consolidation and fixes for odd corner cases

    - sched_clock implementation moved from ARM to core code to avoid
    duplication by other architectures

    - alarm timer updates

    - clocksource and clockevents unregistration facilities

    - clocksource/events support for new hardware

    - precise nanoseconds RTC readout (Xen feature)

    - generic support for Xen suspend/resume oddities

    - the usual lot of fixes and cleanups all over the place

    The parts which touch other areas (ARM/XEN) have been coordinated with
    the relevant maintainers. Though this results in a handful of
    trivial-to-solve merge conflicts, which we preferred over nasty cross
    tree merge dependencies.

    The patches which have been committed in the last few days are bug
    fixes plus the posix timer lot. The latter was in akpm's queue and
    next for quite some time; they just got forgotten and Frederic
    collected them last minute."

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (59 commits)
    hrtimer: Remove unused variable
    hrtimers: Move SMP function call to thread context
    clocksource: Reselect clocksource when watchdog validated high-res capability
    posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting
    posix_timers: fix racy timer delta caching on task exit
    posix-timers: correctly get dying task time sample in posix_cpu_timer_schedule()
    selftests: add basic posix timers selftests
    posix_cpu_timers: consolidate expired timers check
    posix_cpu_timers: consolidate timer list cleanups
    posix_cpu_timer: consolidate expiry time type
    tick: Sanitize broadcast control logic
    tick: Prevent uncontrolled switch to oneshot mode
    tick: Make oneshot broadcast robust vs. CPU offlining
    x86: xen: Sync the CMOS RTC as well as the Xen wallclock
    x86: xen: Sync the wallclock when the system time is set
    timekeeping: Indicate that clock was set in the pvclock gtod notifier
    timekeeping: Pass flags instead of multiple bools to timekeeping_update()
    xen: Remove clock_was_set() call in the resume path
    hrtimers: Support resuming with two or more CPUs online (but stopped)
    timer: Fix jiffies wrap behavior of round_jiffies_common()
    ...

    Linus Torvalds
     

06 Jul, 2013

1 commit


05 Jul, 2013

1 commit

  • smp_call_function_* must not be called from softirq context.

    But clock_was_set(), which calls on_each_cpu(), is called from softirq
    context to implement a delayed clock_was_set() for the timer interrupt
    handler, though that path almost never gets invoked. A recent change in
    the resume code uses the softirq-based delayed clock_was_set() to
    support Xen's resume mechanism.

    linux-next contains a new warning that fires if smp_call_function_*
    is called from softirq context, and the Xen change triggers it.

    Fix this by moving the delayed clock_was_set() call to a work context.

    Reported-and-tested-by: Artem Savkov
    Reported-by: Sasha Levin
    Cc: David Vrabel
    Cc: Ingo Molnar
    Cc: H. Peter Anvin
    Cc: Konrad Wilk
    Cc: John Stultz
    Cc: xen-devel@lists.xen.org
    Cc: stable@vger.kernel.org
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

29 Jun, 2013

1 commit

  • hrtimers_resume() only reprograms the timers for the current CPU as it
    assumes that all other CPUs are offline at this point in the resume
    process. If other CPUs are online then their timers will not be
    corrected and they may fire at the wrong time.

    When running as a Xen guest, this assumption is not true. Non-boot
    CPUs are only stopped with IRQs disabled instead of offlining them.
    This is a performance optimization as disabling the CPUs would add an
    unacceptable amount of additional downtime during a live migration (>
    200 ms for a 4 VCPU guest).

    hrtimers_resume() cannot call on_each_cpu(retrigger_next_event,...)
    as the other CPUs will be stopped with IRQs disabled. Instead, defer
    the call to the next softirq.

    [ tglx: Separated the xen change out ]

    Signed-off-by: David Vrabel
    Cc: Konrad Rzeszutek Wilk
    Cc: John Stultz
    Cc:
    Link: http://lkml.kernel.org/r/1372329348-20841-2-git-send-email-david.vrabel@citrix.com
    Signed-off-by: Thomas Gleixner

    David Vrabel
     

12 May, 2013

1 commit

  • Avoid waking up every thread sleeping in a nanosleep call during
    suspend and resume by calling a freezable blocking call. Previous
    patches modified the freezer to avoid sending wakeups to threads
    that are blocked in freezable blocking calls.

    This call was selected to be converted to a freezable call because
    it doesn't hold any locks or release any resources when interrupted
    that might be needed by another freezing task or a kernel driver
    during suspend, and is a common site where idle userspace tasks are
    blocked.

    Acked-by: Tejun Heo
    Acked-by: Thomas Gleixner
    Signed-off-by: Colin Cross
    Signed-off-by: Rafael J. Wysocki

    Colin Cross
     

06 May, 2013

1 commit

  • Pull 'full dynticks' support from Ingo Molnar:
    "This tree from Frederic Weisbecker adds a new, (exciting! :-) core
    kernel feature to the timer and scheduler subsystems: 'full dynticks',
    or CONFIG_NO_HZ_FULL=y.

    This feature extends the nohz variable-size timer tick feature from
    idle to busy CPUs (running at most one task) as well, potentially
    reducing the number of timer interrupts significantly.

    This feature got motivated by real-time folks and the -rt tree, but
    the general utility and motivation of full-dynticks runs wider than
    that:

    - HPC workloads get faster: CPUs running a single task should be able
    to utilize a maximum amount of CPU power. A periodic timer tick at
    HZ=1000 can cause a constant overhead of up to 1.0%. This feature
    removes that overhead - and speeds up the system by 0.5%-1.0% on
    typical distro configs even on modern systems.

    - Real-time workload latency reduction: CPUs running critical tasks
    should experience as little jitter as possible. The last remaining
    source of kernel-related jitter was the periodic timer tick.

    - A single task executing on a CPU is a pretty common situation,
    especially with an increasing number of cores/CPUs, so this feature
    helps desktop and mobile workloads as well.

    The cost of the feature is mainly related to increased timer
    reprogramming overhead when a CPU switches its tick period, and thus
    slightly longer to-idle and from-idle latency.

    Configuration-wise a third mode of operation is added to the existing
    two NOHZ kconfig modes:

    - CONFIG_HZ_PERIODIC: [formerly !CONFIG_NO_HZ], now explicitly named
    as a config option. This is the traditional Linux periodic tick
    design: there's a HZ tick going on all the time, regardless of
    whether a CPU is idle or not.

    - CONFIG_NO_HZ_IDLE: [formerly CONFIG_NO_HZ=y], this turns off the
    periodic tick when a CPU enters idle mode.

    - CONFIG_NO_HZ_FULL: this new mode, in addition to turning off the
    tick when a CPU is idle, also slows the tick down to 1 Hz (one
    timer interrupt per second) when only a single task is running on a
    CPU.

    The .config behavior is compatible: existing !CONFIG_NO_HZ and
    CONFIG_NO_HZ=y settings get translated to the new values, without the
    user having to configure anything. CONFIG_NO_HZ_FULL is turned off by
    default.

    This feature is based on a lot of infrastructure work that has been
    steadily going upstream in the last 2-3 cycles: related RCU support
    and non-periodic cputime support in particular is upstream already.

    This tree adds the final pieces and activates the feature. The pull
    request is marked RFC because:

    - it's marked 64-bit only at the moment - the 32-bit support patch is
    small but did not get ready in time.

    - it has a number of fresh commits that came in after the merge
    window. The overwhelming majority of commits are from before the
    merge window, but still some aspects of the tree are fresh and so I
    marked it RFC.

    - it's a pretty wide-reaching feature with lots of effects - and
    while the components have been in testing for some time, the full
    combination is still not very widely used. That it's default-off
    should reduce its regression abilities and obviously there are no
    known regressions with CONFIG_NO_HZ_FULL=y enabled either.

    - the feature is not completely idempotent: there is no 100%
    equivalent replacement for a periodic scheduler/timer tick. In
    particular there's ongoing work to map out and reduce its effects
    on scheduler load-balancing and statistics. This should not impact
    correctness though, there are no known regressions related to this
    feature at this point.

    - it's a pretty ambitious feature that with time will likely be
    enabled by most Linux distros, and we'd like you to make input on
    its design/implementation, if you dislike some aspect we missed.
    Without flaming us to crisp! :-)

    Future plans:

    - there's ongoing work to reduce 1Hz to 0Hz, to essentially shut off
    the periodic tick altogether when there's a single busy task on a
    CPU. We'd first like 1 Hz to be exposed more widely before we go
    for the 0 Hz target though.

    - once we reach 0 Hz we can remove the periodic tick assumption from
    nr_running>=2 as well, by essentially interrupting busy tasks only
    as frequently as the sched_latency constraints require us to do -
    once every 4-40 msecs, depending on nr_running.

    I am personally leaning towards biting the bullet and doing this in
    v3.10, like the -rt tree this effort has been going on for too long -
    but the final word is up to you as usual.

    More technical details can be found in Documentation/timers/NO_HZ.txt"

    * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
    sched: Keep at least 1 tick per second for active dynticks tasks
    rcu: Fix full dynticks' dependency on wide RCU nocb mode
    nohz: Protect smp_processor_id() in tick_nohz_task_switch()
    nohz_full: Add documentation.
    cputime_nsecs: use math64.h for nsec resolution conversion helpers
    nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config
    nohz: Reduce overhead under high-freq idling patterns
    nohz: Remove full dynticks' superfluous dependency on RCU tree
    nohz: Fix unavailable tick_stop tracepoint in dynticks idle
    nohz: Add basic tracing
    nohz: Select wide RCU nocb for full dynticks
    nohz: Disable the tick when irq resume in full dynticks CPU
    nohz: Re-evaluate the tick for the new task after a context switch
    nohz: Prepare to stop the tick on irq exit
    nohz: Implement full dynticks kick
    nohz: Re-evaluate the tick from the scheduler IPI
    sched: New helper to prevent from stopping the tick in full dynticks
    sched: Kick full dynticks CPU that have more than one task enqueued.
    perf: New helper to prevent full dynticks CPUs from stopping tick
    perf: Kick full dynticks CPU if events rotation is needed
    ...

    Linus Torvalds
     

02 May, 2013

1 commit


25 Apr, 2013

1 commit


09 Apr, 2013

2 commits

  • One can trigger an overflow when using ktime_add_ns() on a 32bit
    architecture not supporting CONFIG_KTIME_SCALAR.

    When passing a very high value for u64 nsec, e.g. 7881299347898368000,
    the do_div() function converts this value to seconds (7881299347), which
    is still too high to pass to the ktime_set() function as a long. The
    result is a negative value.

    The problem on my system occurs in tick-sched.c, in
    tick_nohz_stop_sched_tick(), when time_delta is set to
    timekeeping_max_deferment(). The check for time_delta < KTIME_MAX
    passes, thus ktime_add_ns() is called with a too large value, resulting
    in a negative expire value. This leads to an endless loop in the ticker
    code:

    time_delta: 7881299347898368000
    expires = ktime_add_ns(last_update, time_delta)
    expires: negative value

    This fix caps the value to KTIME_MAX.

    This error does not occur on 64bit or on architectures supporting
    CONFIG_KTIME_SCALAR (e.g. ARM, x86-32).

    Cc: stable@vger.kernel.org
    Signed-off-by: David Engraf
    [jstultz: Minor tweaks to commit message & header]
    Signed-off-by: John Stultz

    David Engraf
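
    The capping described in the commit above can be illustrated with a
    small userspace sketch. It is not the kernel code: the sketch_ names
    are hypothetical, a 32-bit long is modelled with INT32_MAX, and ktime
    is treated as a plain nanosecond count.

```c
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000ULL
#define SKETCH_KTIME_MAX    INT64_MAX   /* stand-in for KTIME_MAX */

/* Returns 1 when the whole-second part of nsec does not fit into a
 * 32-bit signed long (the case that produced the negative value). */
static int sketch_secs_overflow_long32(uint64_t nsec)
{
    return nsec / SKETCH_NSEC_PER_SEC > (uint64_t)INT32_MAX;
}

/* Saturating add: cap at SKETCH_KTIME_MAX instead of wrapping negative. */
static int64_t sketch_ktime_add_ns(int64_t kt_ns, uint64_t nsec)
{
    if (sketch_secs_overflow_long32(nsec))
        return SKETCH_KTIME_MAX;

    /* nsec is below 2^31 seconds here, so the cast to int64_t is safe. */
    if (kt_ns > 0 && (int64_t)nsec > SKETCH_KTIME_MAX - kt_ns)
        return SKETCH_KTIME_MAX;

    return kt_ns + (int64_t)nsec;
}
```

    With the value from the report, 7881299347898368000 ns is 7881299347
    whole seconds, which exceeds INT32_MAX (2147483647), so the sketch
    returns the cap rather than a negative expiry.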
     
  • The settimeofday01 test in the LTP testsuite effectively does

    gettimeofday(current time);
    settimeofday(Jan 1, 1970 + 100 seconds);
    settimeofday(current time);

    This test causes a stack trace to be displayed on the console during the
    setting of timeofday to Jan 1, 1970 + 100 seconds:

    [ 131.066751] ------------[ cut here ]------------
    [ 131.096448] WARNING: at kernel/time/clockevents.c:209 clockevents_program_event+0x135/0x140()
    [ 131.104935] Hardware name: Dinar
    [ 131.108150] Modules linked in: sg nfsv3 nfs_acl nfsv4 auth_rpcgss nfs dns_resolver fscache lockd sunrpc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables kvm_amd kvm sp5100_tco bnx2 i2c_piix4 crc32c_intel k10temp fam15h_power ghash_clmulni_intel amd64_edac_mod pcspkr serio_raw edac_mce_amd edac_core microcode xfs libcrc32c sr_mod sd_mod cdrom ata_generic crc_t10dif pata_acpi radeon i2c_algo_bit drm_kms_helper ttm drm ahci pata_atiixp libahci libata usb_storage i2c_core dm_mirror dm_region_hash dm_log dm_mod
    [ 131.176784] Pid: 0, comm: swapper/28 Not tainted 3.8.0+ #6
    [ 131.182248] Call Trace:
    [ 131.184684] [] warn_slowpath_common+0x7f/0xc0
    [ 131.191312] [] warn_slowpath_null+0x1a/0x20
    [ 131.197131] [] clockevents_program_event+0x135/0x140
    [ 131.203721] [] tick_program_event+0x24/0x30
    [ 131.209534] [] hrtimer_interrupt+0x131/0x230
    [ 131.215437] [] ? cpufreq_p4_target+0x130/0x130
    [ 131.221509] [] smp_apic_timer_interrupt+0x69/0x99
    [ 131.227839] [] apic_timer_interrupt+0x6d/0x80
    [ 131.233816] [] ? sched_clock_cpu+0xc5/0x120
    [ 131.240267] [] ? cpuidle_wrap_enter+0x50/0xa0
    [ 131.246252] [] ? cpuidle_wrap_enter+0x49/0xa0
    [ 131.252238] [] cpuidle_enter_tk+0x10/0x20
    [ 131.257877] [] cpuidle_idle_call+0xa9/0x260
    [ 131.263692] [] cpu_idle+0xaf/0x120
    [ 131.268727] [] start_secondary+0x255/0x257
    [ 131.274449] ---[ end trace 1151a50552231615 ]---

    When we change the system time to a low value like this, the value of
    timekeeper->offs_real will be a negative value.

    It seems that the WARN occurs because an hrtimer has been started in the time
    between the releasing of the timekeeper lock and the IPI call (via a call to
    on_each_cpu) in clock_was_set() in the do_settimeofday() code. The end result
    is that a REALTIME_CLOCK timer has been added with softexpires = expires =
    KTIME_MAX. The hrtimer_interrupt() fires/is called and the loop at
    kernel/hrtimer.c:1289 is executed. In this loop the code subtracts the
    clock base's offset (which was set to timekeeper->offs_real in
    do_settimeofday()) from the current hrtimer_cpu_base->expiry value (which
    was KTIME_MAX):

    KTIME_MAX - (a negative value) = overflow

    A simple check for an overflow can resolve this problem. Using KTIME_MAX
    instead of the overflow value will result in the hrtimer function being run,
    and the reprogramming of the timer after that.

    Cc: Thomas Gleixner
    Cc: stable@vger.kernel.org
    Reviewed-by: Rik van Riel
    Signed-off-by: Prarit Bhargava
    [jstultz: Tweaked commit subject]
    Signed-off-by: John Stultz

    Prarit Bhargava
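
    The overflow check described above can be sketched in userspace C as
    follows. This is an illustration only: sketch_expiry_minus_offset is a
    hypothetical helper, and because signed overflow is undefined in
    standard C the sketch tests for it before subtracting rather than
    inspecting the result afterwards.

```c
#include <stdint.h>

#define SKETCH_KTIME_MAX INT64_MAX   /* stand-in for KTIME_MAX */

/* Compute expires - offset, clamping to SKETCH_KTIME_MAX when a negative
 * offset would push the result past the largest representable value. */
static int64_t sketch_expiry_minus_offset(int64_t expires, int64_t offset)
{
    /* KTIME_MAX - (a negative value) = overflow, so clamp instead. */
    if (offset < 0 && expires > SKETCH_KTIME_MAX + offset)
        return SKETCH_KTIME_MAX;

    /* Symmetric guard for the opposite direction; this is an addition of
     * the sketch and not something the commit message describes. */
    if (offset > 0 && expires < INT64_MIN + offset)
        return INT64_MIN;

    return expires - offset;
}
```

    A timer with expires = KTIME_MAX and a negative realtime offset then
    yields KTIME_MAX, so the timer function runs and the timer is
    reprogrammed afterwards, as the message describes.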
     

03 Apr, 2013

2 commits

  • We are planning to convert the dynticks Kconfig options layout
    into a choice menu. The user must be able to easily pick
    any of the following implementations: constant periodic tick,
    idle dynticks, full dynticks.

    As this implies a mutual exclusion, the two dynticks implementations
    need to converge on the selection of a common Kconfig option in order
    to ease the sharing of a common infrastructure.

    It would thus seem pretty natural to reuse CONFIG_NO_HZ to
    that end. It already implements all the idle dynticks code
    and the full dynticks depends on all that code for now.
    So ideally the choice menu would propose CONFIG_NO_HZ_IDLE and
    CONFIG_NO_HZ_EXTENDED then both would select CONFIG_NO_HZ.

    On the other hand we want to stay backward compatible: if
    CONFIG_NO_HZ is set in an older config file, we want to
    enable CONFIG_NO_HZ_IDLE by default.

    But we can't afford both at the same time or we run into
    a circular dependency:

    1) CONFIG_NO_HZ_IDLE and CONFIG_NO_HZ_EXTENDED both select
    CONFIG_NO_HZ
    2) If CONFIG_NO_HZ is set, we default to CONFIG_NO_HZ_IDLE

    We might be able to support that from Kconfig/Kbuild but it
    may not be wise to introduce such a confusing behaviour.

    So to solve this, create a new CONFIG_NO_HZ_COMMON option
    which gathers the common code between idle and full dynticks
    (that common code for now is simply the idle dynticks code)
    and select it from their referring Kconfig.

    Then we'll later create CONFIG_NO_HZ_IDLE and map CONFIG_NO_HZ
    to it for backward compatibility.

    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Namhyung Kim
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     
  • Thomas Gleixner
     

27 Mar, 2013

1 commit

  • The current code makes the assumption that a cpu_base lock won't be
    held if the CPU corresponding to that cpu_base is offline, which isn't
    always true.

    If a hrtimer is not queued, then it will not be migrated by
    migrate_hrtimers() when a CPU is offlined. Therefore, the hrtimer's
    cpu_base may still point to a CPU which has subsequently gone offline
    if the timer wasn't enqueued at the time the CPU went down.

    Normally this wouldn't be a problem, but a cpu_base's lock is blindly
    reinitialized each time a CPU is brought up. If a CPU is brought
    online during the period that another thread is performing a hrtimer
    operation on a stale hrtimer, then the lock will be reinitialized
    under its feet, and a SPIN_BUG() like the following will be observed:

    [ 28.082085] BUG: spinlock already unlocked on CPU#0, swapper/0/0
    [ 28.087078] lock: 0xc4780b40, value 0x0 .magic: dead4ead, .owner: /-1, .owner_cpu: -1
    [ 42.451150] [] (unwind_backtrace+0x0/0x120) from [] (do_raw_spin_unlock+0x44/0xdc)
    [ 42.460430] [] (do_raw_spin_unlock+0x44/0xdc) from [] (_raw_spin_unlock+0x8/0x30)
    [ 42.469632] [] (_raw_spin_unlock+0x8/0x30) from [] (__hrtimer_start_range_ns+0x1e4/0x4f8)
    [ 42.479521] [] (__hrtimer_start_range_ns+0x1e4/0x4f8) from [] (hrtimer_start+0x20/0x28)
    [ 42.489247] [] (hrtimer_start+0x20/0x28) from [] (rcu_idle_enter_common+0x1ac/0x320)
    [ 42.498709] [] (rcu_idle_enter_common+0x1ac/0x320) from [] (rcu_idle_enter+0xa0/0xb8)
    [ 42.508259] [] (rcu_idle_enter+0xa0/0xb8) from [] (cpu_idle+0x24/0xf0)
    [ 42.516503] [] (cpu_idle+0x24/0xf0) from [] (rest_init+0x88/0xa0)
    [ 42.524319] [] (rest_init+0x88/0xa0) from [] (start_kernel+0x3d0/0x434)

    As an example, this particular crash occurred when hrtimer_start() was
    executed on CPU #0. The code locked the hrtimer's current cpu_base
    corresponding to CPU #1. CPU #0 then tried to switch the hrtimer's
    cpu_base to an optimal CPU which was online. In this case, it selected
    the cpu_base corresponding to CPU #3.

    Before it could proceed, CPU #1 came online and reinitialized the
    spinlock corresponding to its cpu_base. Thus now CPU #0 held a lock
    which was reinitialized. When CPU #0 finally ended up unlocking the
    old cpu_base corresponding to CPU #1 so that it could switch to CPU
    #3, we hit this SPIN_BUG() above while in switch_hrtimer_base().

    CPU #0                              CPU #1
    ----                                ----
    ...
    hrtimer_start()
    lock_hrtimer_base(base #1)
    ...                                 init_hrtimers_cpu()
    switch_hrtimer_base()               ...
    ...                                 raw_spin_lock_init(&cpu_base->lock)
    raw_spin_unlock(&cpu_base->lock)    ...

    Solve this by statically initializing the lock.

    Signed-off-by: Michael Bohan
    Link: http://lkml.kernel.org/r/1363745965-23475-1-git-send-email-mbohan@codeaurora.org
    Cc: stable@vger.kernel.org
    Signed-off-by: Thomas Gleixner

    Michael Bohan
     

25 Mar, 2013

1 commit

  • The comments mention HRTIMER_ABS and HRTIMER_REL, but these symbols
    don't exist; the proper names are HRTIMER_MODE_ABS and HRTIMER_MODE_REL.

    Signed-off-by: David Daney
    Cc: Jiri Kosina
    Link: http://lkml.kernel.org/r/1363202438-21234-1-git-send-email-ddaney.cavm@gmail.com
    Signed-off-by: Thomas Gleixner

    David Daney
     

23 Mar, 2013

1 commit


20 Feb, 2013

1 commit

  • Pull timer changes from Ingo Molnar:
    "Main changes:

    - ntp: Add CONFIG_RTC_SYSTOHC: a generic RTC driver facility
    complementing the existing CONFIG_RTC_HCTOSYS, which uses NTP to
    keep the hardware clock updated.

    - posix-timers: Fix clock_adjtime to always return timex data on
    success. This is changing the ABI, but no breakage was expected
    and found - caution is warranted nevertheless.

    - platform persistent clock improvements/cleanups.

    - clockevents: refactor timer broadcast handling to be more generic
    and less duplicated with matching architecture code (mostly ARM
    motivated.)

    - various fixes and cleanups"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    timers/x86/hpet: Use HPET_COUNTER to specify the hpet counter in vread_hpet()
    posix-cpu-timers: Fix nanosleep task_struct leak
    clockevents: Fix generic broadcast for FEAT_C3STOP
    time, Fix setting of hardware clock in NTP code
    hrtimer: Prevent hrtimer_enqueue_reprogram race
    clockevents: Add generic timer broadcast function
    clockevents: Add generic timer broadcast receiver
    timekeeping: Switch HAS_PERSISTENT_CLOCK to ALWAYS_USE_PERSISTENT_CLOCK
    x86/time/rtc: Don't print extended CMOS year when reading RTC
    x86: Select HAS_PERSISTENT_CLOCK on x86
    timekeeping: Add CONFIG_HAS_PERSISTENT_CLOCK option
    rtc: Skip the suspend/resume handling if persistent clock exist
    timekeeping: Add persistent_clock_exist flag
    posix-timers: Fix clock_adjtime to always return timex data on success
    Round the calculated scale factor in set_cyc2ns_scale()
    NTP: Add a CONFIG_RTC_SYSTOHC configuration
    MAINTAINERS: Update John Stultz's email
    time: create __getnstimeofday for WARNless calls

    Linus Torvalds
     

08 Feb, 2013

2 commits

  • Move rt scheduler definitions out of include/linux/sched.h into
    new file include/linux/sched/rt.h

    Signed-off-by: Clark Williams
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/20130207094707.7b9f825f@riff.lan
    Signed-off-by: Ingo Molnar

    Clark Williams
     
  • Move the sysctl-related bits from include/linux/sched.h into
    a new file: include/linux/sched/sysctl.h. Then update source
    files requiring access to those bits by including the new
    header file.

    Signed-off-by: Clark Williams
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/20130207094659.06dced96@riff.lan
    Signed-off-by: Ingo Molnar

    Clark Williams
     

05 Feb, 2013

1 commit

  • hrtimer_enqueue_reprogram contains a race which could result in
    timer.base switch during unlock/lock sequence.

    hrtimer_enqueue_reprogram is releasing the lock protecting the timer
    base for calling raise_softirq_irqsoff() due to a lock ordering issue
    versus rq->lock.

    If during that time another CPU calls __hrtimer_start_range_ns() on
    the same hrtimer, the timer base might switch, before the current CPU
    can lock base->lock again and therefore the unlock_timer_base() call
    will unlock the wrong lock.

    [ tglx: Added comment and massaged changelog ]

    Signed-off-by: Leonid Shatz
    Signed-off-by: Izik Eidus
    Cc: Andrea Arcangeli
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/1359981217-389-1-git-send-email-izik.eidus@ravellosystems.com
    Signed-off-by: Thomas Gleixner

    Leonid Shatz
     

12 Jul, 2012

3 commits

  • The update of the hrtimer base offsets on all cpus cannot be made
    atomically from the timekeeper.lock held and interrupt disabled region
    as smp function calls are not allowed there.

    clock_was_set(), which enforces the update on all cpus, is called
    either from preemptible process context in case of do_settimeofday()
    or from the softirq context when the offset modification happened in
    the timer interrupt itself due to a leap second.

    In both cases there is a race window for an hrtimer interrupt between
    dropping timekeeper lock, enabling interrupts and clock_was_set()
    issuing the updates. Any interrupt which arrives in that window will
    see the new time but operate on stale offsets.

    So we need to make sure that an hrtimer interrupt always sees a
    consistent state of time and offsets.

    ktime_get_update_offsets() allows us to get the current monotonic time
    and update the per cpu hrtimer base offsets from hrtimer_interrupt()
    to capture a consistent state of monotonic time and the offsets. The
    function replaces the existing ktime_get() calls in hrtimer_interrupt().

    The overhead of the new function vs. ktime_get() is minimal as it just
    adds two store operations.

    This ensures that any changes to realtime or boottime offsets are
    noticed and stored into the per-cpu hrtimer base structures, prior to
    any hrtimer expiration and guarantees that timers are not expired early.

    Signed-off-by: John Stultz
    Reviewed-by: Ingo Molnar
    Acked-by: Peter Zijlstra
    Acked-by: Prarit Bhargava
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/1341960205-56738-8-git-send-email-johnstul@us.ibm.com
    Signed-off-by: Thomas Gleixner

    John Stultz
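
    The idea of reading the time and the offsets as one consistent
    snapshot can be sketched with a sequence counter. This is a userspace
    illustration using C11 atomics under the assumption of a single
    writer; the struct, field and function names are hypothetical and are
    not the kernel's implementation.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical timekeeper state guarded by a sequence counter. */
struct sketch_timekeeper {
    atomic_uint seq;                 /* odd while an update is in flight */
    _Atomic int64_t mono_ns;
    _Atomic int64_t offs_real_ns;
    _Atomic int64_t offs_boot_ns;
};

/* Single writer: bump seq to odd, update, bump seq to even. */
static void sketch_tk_write(struct sketch_timekeeper *tk, int64_t mono,
                            int64_t offs_real, int64_t offs_boot)
{
    unsigned int s = atomic_load_explicit(&tk->seq, memory_order_relaxed);

    atomic_store_explicit(&tk->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&tk->mono_ns, mono, memory_order_relaxed);
    atomic_store_explicit(&tk->offs_real_ns, offs_real, memory_order_relaxed);
    atomic_store_explicit(&tk->offs_boot_ns, offs_boot, memory_order_relaxed);
    atomic_store_explicit(&tk->seq, s + 2, memory_order_release);
}

/* Reader: retry until time and offsets come from the same update. */
static int64_t sketch_get_update_offsets(struct sketch_timekeeper *tk,
                                         int64_t *offs_real,
                                         int64_t *offs_boot)
{
    unsigned int start, end;
    int64_t mono;

    do {
        start = atomic_load_explicit(&tk->seq, memory_order_acquire);
        mono = atomic_load_explicit(&tk->mono_ns, memory_order_relaxed);
        *offs_real = atomic_load_explicit(&tk->offs_real_ns,
                                          memory_order_relaxed);
        *offs_boot = atomic_load_explicit(&tk->offs_boot_ns,
                                          memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        end = atomic_load_explicit(&tk->seq, memory_order_relaxed);
    } while ((start & 1u) || start != end);

    return mono;
}
```

    The two out-parameters correspond to the "two store operations" the
    message mentions as the only extra cost over a plain time read.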
     
  • We need to update the base offsets from this code and we need to do
    that under base->lock. Move the lock held region around the
    ktime_get() calls. The ktime_get() calls are going to be replaced with
    a function which gets the time and the offsets atomically.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Ingo Molnar
    Acked-by: Peter Zijlstra
    Acked-by: Prarit Bhargava
    Cc: stable@vger.kernel.org
    Signed-off-by: John Stultz
    Link: http://lkml.kernel.org/r/1341960205-56738-6-git-send-email-johnstul@us.ibm.com
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • clock_was_set() cannot be called from hard interrupt context because
    it calls on_each_cpu().

    For fixing the widely reported leap seconds issue it is necessary to
    call it from hard interrupt context, i.e. the timer tick code, which
    does the timekeeping updates.

    Provide a new function which records the pending update in the hrtimer
    cpu base structure of the cpu on which it is called and raises the
    hrtimer softirq. We then execute the clock_was_set() notification from
    softirq context in run_hrtimer_softirq(). The hrtimer softirq is
    rarely used, so polling the flag there is not a performance issue.

    [ tglx: Made it depend on CONFIG_HIGH_RES_TIMERS. We really should get
    rid of all this ifdeffery ASAP ]

    Signed-off-by: John Stultz
    Reported-by: Jan Engelhardt
    Reviewed-by: Ingo Molnar
    Acked-by: Peter Zijlstra
    Acked-by: Prarit Bhargava
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/1341960205-56738-2-git-send-email-johnstul@us.ibm.com
    Signed-off-by: Thomas Gleixner

    John Stultz
     

29 Nov, 2011

1 commit


19 Nov, 2011

1 commit

  • __remove_hrtimer() attempts to reprogram the clockevent device when
    the timer being removed is the next to expire. However,
    __remove_hrtimer() reprograms the clockevent *before* removing the
    timer from the timerqueue and thus when hrtimer_force_reprogram()
    finds the next timer to expire it finds the timer we're trying to
    remove.

    This is especially noticeable when the system switches to NOHz mode
    and the system tick is removed. The timer tick is removed from the
    system but the clockevent is programmed to wakeup in another HZ
    anyway.

    Silence the extra wakeup by removing the timer from the timerqueue
    before calling hrtimer_force_reprogram() so that we actually program
    the clockevent for the next timer to expire.

    This was broken by 998adc3 "hrtimers: Convert hrtimers to use
    timerlist infrastructure".

    Signed-off-by: Jeff Ohlstein
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/1321660030-8520-1-git-send-email-johlstei@codeaurora.org
    Signed-off-by: Thomas Gleixner

    Jeff Ohlstein
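
    The ordering problem described above can be shown with a toy timer
    queue. This is a userspace sketch with hypothetical names (an unsorted
    array stands in for the timerqueue), contrasting the broken order with
    the fixed one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NO_TIMER INT64_MAX

struct sketch_queue {
    int64_t expires[8];
    size_t count;
};

/* Earliest expiry in the queue, or SKETCH_NO_TIMER if it is empty. */
static int64_t sketch_next_expiry(const struct sketch_queue *q)
{
    int64_t best = SKETCH_NO_TIMER;

    for (size_t i = 0; i < q->count; i++)
        if (q->expires[i] < best)
            best = q->expires[i];
    return best;
}

/* Drop entry idx by moving the last entry into its slot. */
static void sketch_dequeue(struct sketch_queue *q, size_t idx)
{
    assert(idx < q->count);
    q->expires[idx] = q->expires[q->count - 1];
    q->count--;
}

/* Broken order: the "device" is programmed while the timer being removed
 * is still queued, so its stale expiry is picked. */
static int64_t sketch_remove_buggy(struct sketch_queue *q, size_t idx)
{
    int64_t programmed = sketch_next_expiry(q);

    sketch_dequeue(q, idx);
    return programmed;
}

/* Fixed order: dequeue first, then program the real next expiry. */
static int64_t sketch_remove_fixed(struct sketch_queue *q, size_t idx)
{
    sketch_dequeue(q, idx);
    return sketch_next_expiry(q);
}
```

    Removing the timer that expires at 100 from a queue holding 100 and
    500 programs 100 in the broken order (the extra wakeup) and 500 in the
    fixed order.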
     

31 Oct, 2011

1 commit

  • The changed files were only including linux/module.h for the
    EXPORT_SYMBOL infrastructure, and nothing else. Revector them
    onto the isolated export header for faster compile times.

    Nothing to see here but a whole lot of instances of:

    -#include <linux/module.h>
    +#include <linux/export.h>

    This commit is only changing the kernel dir; next targets
    will probably be mm, fs, the arch dirs, etc.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

26 May, 2011

1 commit

  • commit 9ec2690758a5 ("timerfd: Manage cancelable timers in timerfd")
    introduced a CONFIG_HIGHRES_TIMERS (should be CONFIG_HIGH_RES_TIMERS)
    typo, which caused applications depending on CLOCK_REALTIME timers to
    become sluggish due to the fact that the time base of the realtime
    timers was not updated when the wall clock time was set.

    This causes anything from 100% CPU use for some applications to odd
    delays and hiccups.

    Reported-bisected-and-tested-by: Anca Emanuel
    Tested-by: Linus Torvalds
    Fatfingered-by: Thomas Gleixner
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
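
    Why such a typo goes unnoticed can be seen in a short sketch: a
    misspelled symbol in an #ifdef is simply undefined, so the guarded
    code is silently compiled out without any diagnostic. The function
    names below are hypothetical; only the two CONFIG_ spellings come from
    the commit message.

```c
/* What the build actually defines when high-resolution timers are on. */
#define CONFIG_HIGH_RES_TIMERS 1

/* Guarded by the misspelled symbol: the update path is compiled out. */
static int sketch_offsets_updated_with_typo(void)
{
#ifdef CONFIG_HIGHRES_TIMERS    /* typo: this symbol is never defined */
    return 1;
#else
    return 0;
#endif
}

/* Guarded by the correct symbol: the update path is compiled in. */
static int sketch_offsets_updated_fixed(void)
{
#ifdef CONFIG_HIGH_RES_TIMERS
    return 1;
#else
    return 0;
#endif
}
```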
     

23 May, 2011

3 commits

  • The ordering of the clock bases is historical due to the
    CLOCK_REALTIME and CLOCK_MONOTONIC constants. Now the hrtimer bases
    have their own enumeration due to the gap between CLOCK_MONOTONIC and
    CLOCK_BOOTTIME. So we can be more clever as most timers end up on the
    CLOCK_MONOTONIC base due to the virtue of POSIX declaring that
    relative CLOCK_REALTIME timers are not affected by time changes. In
    desktop environments this is slowly changing as applications switch to
    absolute timers, but I've observed empty CLOCK_REALTIME bases often
    enough. There is no performance penalty or overhead when
    CLOCK_REALTIME timers are active, but in case they are not we don't
    skip over a full cache line.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Peter Zijlstra

    Thomas Gleixner
     
  • Avoid iterating over all possible timer bases by marking the active
    bases in the cpu base.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Peter Zijlstra

    Thomas Gleixner
     
  • Peter is concerned about the extra scan of CLOCK_REALTIME_COS in the
    timer interrupt. Yes, I did not think about it, because the solution
    was so elegant. I didn't like the extra list in timerfd when it was
    proposed some time ago, but with an RCU-based list the list walk is
    less horrible than the original global lock, which was held over the
    list iteration.

    Requested-by: Peter Zijlstra
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Peter Zijlstra

    Thomas Gleixner
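
The "mark the active bases in the cpu base" idea from this group of commits can be sketched as a bitmask that is kept in sync on enqueue and dequeue. This is an illustrative model only: `toy_cpu_base`, its fields and the helper functions are hypothetical names, and the per-base queue is reduced to a counter.

```c
#include <assert.h>

enum { TOY_MAX_BASES = 3 };

/* Hypothetical per-CPU structure: one bit per clock base. */
struct toy_cpu_base {
    unsigned int active_bases;            /* bit n set: base n has queued timers */
    unsigned int queued[TOY_MAX_BASES];   /* number of queued timers per base */
};

/* Queue a timer on a base and mark the base active. */
void toy_enqueue(struct toy_cpu_base *cb, unsigned int base)
{
    assert(base < TOY_MAX_BASES);
    cb->queued[base]++;
    cb->active_bases |= 1u << base;
}

/* Remove a timer; clear the bit once the base becomes empty. */
void toy_dequeue(struct toy_cpu_base *cb, unsigned int base)
{
    assert(base < TOY_MAX_BASES && cb->queued[base] > 0);
    if (--cb->queued[base] == 0)
        cb->active_bases &= ~(1u << base);
}

/* Visit only the bases whose bit is set; returns how many were visited.
 * Empty bases are skipped without looking at their queues. */
unsigned int toy_visit_active(const struct toy_cpu_base *cb)
{
    unsigned int visited = 0;

    for (unsigned int base = 0; base < TOY_MAX_BASES; base++) {
        if (!(cb->active_bases & (1u << base)))
            continue;
        visited++;
    }
    return visited;
}
```

The check per base is a single bit test on a word that is already at hand, so an empty base costs no access to its own data.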
     

03 May, 2011

3 commits

  • Some applications must be aware of clock realtime being set
    backward. A simple example is a clock applet which arms a timer for
    the next minute display. If clock realtime is set backward then the
    applet displays a stale time for the amount of time which the clock
    was set backwards. Because we don't have an interface for this,
    applications poll the time.

    Extend the timerfd interface by adding a flag which puts the timer
    onto a different internal realtime clock. All timers on this clock are
    expired whenever the clock is set.

    The timerfd core records the monotonic offset when the timer is
    created. When the timer is armed, the current offset is compared
    to the previously recorded offset. When it has changed,
    timerfd_settime returns -ECANCELED. When a timer is read, the offset is
    compared and, if it changed, -ECANCELED is returned to user space.
    Periodic timers are not rearmed in the cancellation case.

    Signed-off-by: Thomas Gleixner
    Acked-by: John Stultz
    Cc: Chris Friesen
    Tested-by: Kay Sievers
    Cc: "Kirill A. Shutemov"
    Cc: Peter Zijlstra
    Cc: Davide Libenzi
    Reviewed-by: Alexander Shishkin
    Link: http://lkml.kernel.org/r/%3Calpine.LFD.2.02.1104271359580.3323%40ionos%3E
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Make clock_was_set() unconditional and rename hres_timers_resume to
    hrtimers_resume. This is a preparatory patch for hrtimers which are
    cancelled when clock realtime was set.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Signed-off-by: Mike Frysinger
    Link: http://lkml.kernel.org/r/%3C1304364267-14489-1-git-send-email-vapier%40gentoo.org%3E
    Signed-off-by: Thomas Gleixner

    Mike Frysinger
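
From user space, the cancelable timerfd described in the first commit of this group might be used roughly as follows. This is a hedged sketch, not taken from the commit: the commit message does not name the flag, so the example assumes the `TFD_TIMER_CANCEL_ON_SET` flag as documented in the current timerfd_create(2) manual page, where it is combined with `TFD_TIMER_ABSTIME` on a `CLOCK_REALTIME` timer and a cancelled timer makes `read()` fail with `ECANCELED`. The behaviour at the time of this commit may have differed. `arm_cancelable_timer()` and `wait_timer()` are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Older C libraries may not expose the flag; this value matches the
 * Linux UAPI definition. */
#ifndef TFD_TIMER_CANCEL_ON_SET
#define TFD_TIMER_CANCEL_ON_SET (1 << 1)
#endif

/* Arm an absolute CLOCK_REALTIME timer delay_sec seconds from now that
 * is cancelled when the wall clock is set. Returns the fd, or -1. */
int arm_cancelable_timer(time_t delay_sec)
{
    struct timespec now;
    struct itimerspec its = { 0 };
    int fd;

    assert(delay_sec >= 0);

    if (clock_gettime(CLOCK_REALTIME, &now) != 0)
        return -1;

    fd = timerfd_create(CLOCK_REALTIME, 0);
    if (fd < 0)
        return -1;

    its.it_value.tv_sec = now.tv_sec + delay_sec;
    its.it_value.tv_nsec = now.tv_nsec;

    if (timerfd_settime(fd, TFD_TIMER_ABSTIME | TFD_TIMER_CANCEL_ON_SET,
                        &its, NULL) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Blocking wait. Returns 1 on a normal expiry, 0 if the timer was
 * cancelled because the clock was set, -1 on any other error. */
int wait_timer(int fd)
{
    uint64_t expirations;
    ssize_t n = read(fd, &expirations, sizeof(expirations));

    if (n == (ssize_t)sizeof(expirations))
        return 1;
    if (n < 0 && errno == ECANCELED)
        return 0;   /* wall clock jumped: recompute the deadline and re-arm */
    return -1;
}
```

A clock applet like the one in the commit message would call `arm_cancelable_timer()` for the next minute boundary and, when `wait_timer()` returns 0, redraw immediately and arm a fresh timer instead of polling the time.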
     

29 Apr, 2011

1 commit

  • Sedat and Bruno reported RCU stalls which turned out to be caused by
    the following:

    sched_init() calls init_rt_bandwidth() which calls hrtimer_init()
    _BEFORE_ hrtimers_init() is called. While not entirely correct this
    worked because hrtimer_init() only accessed statically initialized
    data (hrtimer_bases.clock_base[CLOCK_MONOTONIC])

    Commit e06383db9 (hrtimers: extend hrtimer base code to handle more
    then 2 clockids) added an indirection to the hrtimer_bases.clock_base
    lookup to avoid gap handling in the hot path. The table which is used
    for the translation from CLOCK_ID to HRTIMER_BASE index is
    initialized at runtime in hrtimers_init(). So the early call of the
    scheduler code translates CLOCK_MONOTONIC to HRTIMER_BASE_REALTIME.

    Thus the rt_bandwidth timer ends up on CLOCK_REALTIME. If the timer is
    armed and the wall clock time is set (e.g. ntpdate in the early boot
    process - which also gives the problem deterministic behaviour
    i.e. magic recovery after N hours), then the timer ends up with an
    expiry time far into the future. That breaks the RT throttler
    mechanism as rt runtime is accumulated and never cleared, so the rt
    throttler detects a false cpu hog condition and blocks all RT tasks
    until the timer finally expires. That in turn stalls the RCU thread of
    TINYRCU which leads to a huge amount of RCU callbacks piling up.

    Make the translation table statically initialized, so we are back to
    the status of
    Reported-by: Bruno Prémont
    Cc: John Stultz
    Cc: Mike Galbraith
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/%3Calpine.LFD.2.02.1104282353140.3005%40ionos%3E
    Reviewed-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
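
The difference between a table filled at runtime and a statically initialized one can be sketched in plain C. This is an illustrative model, not the kernel source: all `TOY_*` and `toy_*` names are hypothetical, the table size is an arbitrary choice for the sketch, and the clock id values are chosen to mirror the Linux constants. As in the commit message, the realtime base has index 0, so a zero-filled table maps every clock to it.

```c
#include <assert.h>

/* Clock ids for this sketch; note the gap between MONOTONIC and BOOTTIME. */
enum {
    TOY_CLOCK_REALTIME  = 0,
    TOY_CLOCK_MONOTONIC = 1,
    TOY_CLOCK_BOOTTIME  = 7,
    TOY_MAX_CLOCKS      = 16
};

/* Dense base indices; the realtime base is index 0. */
enum toy_base {
    TOY_BASE_REALTIME,
    TOY_BASE_MONOTONIC,
    TOY_BASE_BOOTTIME
};

/* Runtime-filled table: all zeroes (TOY_BASE_REALTIME) until the init
 * function below has run. */
int toy_runtime_table[TOY_MAX_CLOCKS];

void toy_init_runtime_table(void)
{
    toy_runtime_table[TOY_CLOCK_REALTIME]  = TOY_BASE_REALTIME;
    toy_runtime_table[TOY_CLOCK_MONOTONIC] = TOY_BASE_MONOTONIC;
    toy_runtime_table[TOY_CLOCK_BOOTTIME]  = TOY_BASE_BOOTTIME;
}

/* Statically initialized table: correct before any init code runs. */
const int toy_static_table[TOY_MAX_CLOCKS] = {
    [TOY_CLOCK_REALTIME]  = TOY_BASE_REALTIME,
    [TOY_CLOCK_MONOTONIC] = TOY_BASE_MONOTONIC,
    [TOY_CLOCK_BOOTTIME]  = TOY_BASE_BOOTTIME,
};

/* Translate a clock id to a base index through the given table. */
int toy_lookup(const int *table, int clock_id)
{
    assert(clock_id >= 0 && clock_id < TOY_MAX_CLOCKS);
    return table[clock_id];
}
```

A lookup of `TOY_CLOCK_MONOTONIC` through `toy_runtime_table` before `toy_init_runtime_table()` has run returns `TOY_BASE_REALTIME`, which mirrors the early scheduler call in the commit message; the same lookup through `toy_static_table` returns `TOY_BASE_MONOTONIC` at any time.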
     

16 Mar, 2011

1 commit

  • …l/git/tip/linux-2.6-tip

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (62 commits)
    posix-clocks: Check write permissions in posix syscalls
    hrtimer: Remove empty hrtimer_init_hres_timer()
    hrtimer: Update hrtimer->state documentation
    hrtimer: Update base[CLOCK_BOOTTIME].offset correctly
    timers: Export CLOCK_BOOTTIME via the posix timers interface
    timers: Add CLOCK_BOOTTIME hrtimer base
    time: Extend get_xtime_and_monotonic_offset() to also return sleep
    time: Introduce get_monotonic_boottime and ktime_get_boottime
    hrtimers: extend hrtimer base code to handle more then 2 clockids
    ntp: Remove redundant and incorrect parameter check
    mn10300: Switch do_timer() to xtimer_update()
    posix clocks: Introduce dynamic clocks
    posix-timers: Cleanup namespace
    posix-timers: Add support for fd based clocks
    x86: Add clock_adjtime for x86
    posix-timers: Introduce a syscall for clock tuning.
    time: Splitout compat timex accessors
    ntp: Add ADJ_SETOFFSET mode bit
    time: Introduce timekeeping_inject_offset
    posix-timer: Update comment
    ...

    Fix up new system-call-related conflicts in
    arch/x86/ia32/ia32entry.S
    arch/x86/include/asm/unistd_32.h
    arch/x86/include/asm/unistd_64.h
    arch/x86/kernel/syscall_table_32.S
    (name_to_handle_at()/open_by_handle_at() vs clock_adjtime()), and some
    due to movement of get_jiffies_64() in:
    kernel/time.c

    Linus Torvalds
     

11 Mar, 2011

1 commit


08 Mar, 2011

1 commit

  • In complex subsystems like mac80211 structures can contain several
    timers and work structs, so identifying a specific instance from the
    call trace and object type output of debugobjects can be hard.

    Allow the subsystems which support debugobjects to provide a hint
    function. This function returns a pointer to a kernel address
    (preferably the object's callback function) which is printed along
    with the debugobjects type.

    Add hint methods for timer_list, work_struct and hrtimer.

    [ tglx: Massaged changelog, made it compile ]

    Signed-off-by: Stanislaw Gruszka
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Stanislaw Gruszka
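
The hint mechanism described above can be modelled as an optional callback in an object-type descriptor. This is a user-space sketch under stated assumptions, not the debugobjects API: `toy_debug_descr`, `toy_timer`, `toy_timer_debug_hint()` and `toy_get_hint()` are hypothetical names, and the hint is returned as a generic function pointer so the example stays within standard C.

```c
#include <assert.h>
#include <stddef.h>

/* Generic function pointer type used to carry the hint. */
typedef void (*toy_fn)(void);

/* Hypothetical object-type descriptor with an optional hint method. */
struct toy_debug_descr {
    const char *name;
    toy_fn (*debug_hint)(void *addr);   /* may be NULL */
};

/* Hypothetical timer object whose callback identifies the instance. */
struct toy_timer {
    void (*function)(struct toy_timer *);
};

/* Example callback; its address is what the hint reports. */
void toy_expired(struct toy_timer *t)
{
    (void)t;
}

/* Hint method for timers: return the object's callback function. */
toy_fn toy_timer_debug_hint(void *addr)
{
    struct toy_timer *t = addr;

    return (toy_fn)t->function;
}

/* What a report path would print next to the type name: the hint if
 * the type provides one, NULL otherwise. */
toy_fn toy_get_hint(const struct toy_debug_descr *descr, void *obj)
{
    assert(descr != NULL);
    return descr->debug_hint ? descr->debug_hint(obj) : NULL;
}
```

Two timers embedded in the same structure report different callback addresses, which is what makes a specific instance identifiable in the debug output.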
     

03 Mar, 2011

1 commit

  • We calculate the current time of each clock base by adding an offset
    to clock_monotonic. The offset for the clock bases is set in
    retrigger_next_event() which is called when we switch a cpu to highres
    mode or when the clock was set.

    Add the missing update for clock boottime.

    Signed-off-by: Thomas Gleixner
    Cc: John Stultz

    Thomas Gleixner
     

22 Feb, 2011

2 commits

  • CLOCK_MONOTONIC stops while the system is in suspend. This is because
    to applications system suspend is invisible. However, there is a
    growing set of applications that are wanting to be suspend-aware,
    but do not want to deal with the complications of CLOCK_REALTIME
    (which might jump around if settimeofday is called).

    For these applications, I propose a new clockid: CLOCK_BOOTTIME.
    CLOCK_BOOTTIME is identical to CLOCK_MONOTONIC, except it also
    includes any time spent in suspend.

    This patch adds an hrtimer base for CLOCK_BOOTTIME, using
    get_monotonic_boottime/ktime_get_boottime, to allow
    in-kernel users to set timers against it.

    CC: Jamie Lokier
    CC: Thomas Gleixner
    CC: Alexander Shishkin
    CC: Arve Hjønnevåg
    Signed-off-by: John Stultz

    John Stultz
     
  • Extend get_xtime_and_monotonic_offset to
    get_xtime_and_monotonic_and_sleep_offset().

    CC: Jamie Lokier
    CC: Thomas Gleixner
    CC: Alexander Shishkin
    CC: Arve Hjønnevåg
    Signed-off-by: John Stultz

    John Stultz
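
The relationship between CLOCK_MONOTONIC and CLOCK_BOOTTIME described in the first commit of this group can be observed from user space. This is a sketch under assumptions: it presumes a kernel that exposes CLOCK_BOOTTIME through `clock_gettime()`, which the commits in this group (an in-kernel hrtimer base) do not provide by themselves, and `read_clock_ns()` and `suspended_ns()` are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Older headers may lack the constant; 7 is the Linux clock id. */
#ifndef CLOCK_BOOTTIME
#define CLOCK_BOOTTIME 7
#endif

/* Read a clock as nanoseconds; returns -1 on failure. */
int64_t read_clock_ns(clockid_t id)
{
    struct timespec ts;

    if (clock_gettime(id, &ts) != 0)
        return -1;
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Approximate time spent in suspend since boot: the difference between
 * the clock that keeps counting during suspend and the one that stops. */
int64_t suspended_ns(void)
{
    int64_t mono = read_clock_ns(CLOCK_MONOTONIC);
    int64_t boot = read_clock_ns(CLOCK_BOOTTIME);

    assert(mono >= 0 && boot >= 0);
    return boot - mono;
}
```

On a machine that has never been suspended the two clocks are expected to stay close together, so `suspended_ns()` should be small; after a suspend/resume cycle the difference should grow by roughly the time spent asleep.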