04 May, 2008

2 commits


01 May, 2008

10 commits

  • Remove the leap second handling from second_overflow(), which doesn't have to
    check for it every second anymore. With CONFIG_NO_HZ this also makes sure the
    leap second is handled close to the full second. Additionally this makes it
    possible to abort a leap second properly by resetting the STA_INS/STA_DEL
    status bits.

    Signed-off-by: Roman Zippel
    Cc: john stultz
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Zippel
     
  • current_tick_length used to do a little more, but now it just returns
    tick_length, which we can also access directly in the few places where it's
    needed.

    Signed-off-by: Roman Zippel
    Cc: john stultz
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Zippel
     
  • As TICK_LENGTH_SHIFT is used for more than just the tick length, the name
    isn't quite appropriate anymore, so this renames it to NTP_SCALE_SHIFT.

    Signed-off-by: Roman Zippel
    Cc: john stultz
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Zippel
     
  • This adds support for setting the TAI value (International Atomic Time). The
    value is reported back to userspace via timex (as we don't have a
    ntp_gettime() syscall).

    Signed-off-by: Roman Zippel
    Cc: john stultz
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Zippel
     
  • time_offset is already a 64bit value but its resolution is barely used, so this
    makes better use of it by replacing SHIFT_UPDATE with TICK_LENGTH_SHIFT.

    Side note: the SHIFT_HZ in SHIFT_UPDATE was incorrect for CONFIG_NO_HZ and the
    primary reason for changing time_offset to 64bit to avoid the overflow.

    Signed-off-by: Roman Zippel
    Cc: john stultz
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Zippel
     
  • This changes time_freq to a 64bit value and makes it static (the only outside
    user had no real need to modify it). Intermediate values were already 64bit,
    so the change isn't that big, but it saves a little in shifts by replacing
    SHIFT_NSEC with TICK_LENGTH_SHIFT. PPM_SCALE is then used to convert between
    user space and kernel space representation.

    Signed-off-by: Roman Zippel
    Cc: john stultz
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Zippel
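
    The conversion described above can be sketched in userspace as follows.
    This is only an illustration under stated assumptions, not the code from
    the patch: it assumes the adjtimex() convention of ppm with a 16-bit
    fraction (SHIFT_USEC = 16), a kernel-side value in nanoseconds per second
    with a 32-bit fraction (TICK_LENGTH_SHIFT = 32), and a PPM_SCALE derived
    from those two. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed constants: names mirror the kernel's, values are assumptions. */
#define SHIFT_USEC        16      /* user space: ppm with a 16-bit fraction */
#define TICK_LENGTH_SHIFT 32      /* kernel: ns/s with a 32-bit fraction */
#define NSEC_PER_USEC     1000LL
#define PPM_SCALE (NSEC_PER_USEC << (TICK_LENGTH_SHIFT - SHIFT_USEC))

/* Hypothetical helper: user-space scaled ppm -> kernel representation. */
static int64_t freq_user_to_kernel(int32_t user_freq)
{
    return (int64_t)user_freq * PPM_SCALE;
}

/* Hypothetical helper: kernel representation -> user-space scaled ppm. */
static int32_t freq_kernel_to_user(int64_t kernel_freq)
{
    return (int32_t)(kernel_freq / PPM_SCALE);
}
```

    Under these assumptions a user-space value of 65536 (1 ppm) maps to
    1000 << 32, that is 1000 ns per second in the scaled representation, using
    a single multiply instead of a chain of shifts.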
     
  • This adds a few more things from the ntp nanokernel related to user space.
    It's now possible to select the resolution used of some values via STA_NANO
    and the kernel reports in which mode it works (pll/fll).

    If some values for adjtimex() are outside the acceptable range, they are now
    simply normalized instead of letting the syscall fail. I removed
    MOD_CLKA/MOD_CLKB as the mapping didn't really make any sense; the kernel
    doesn't support setting the clock.

    Signed-off-by: Roman Zippel
    Cc: john stultz
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Zippel
     
  • This is mostly a style cleanup of ntp.c and extracts part of do_adjtimex as
    ntp_update_offset(). Otherwise the functionality is still the same as before.

    Signed-off-by: Roman Zippel
    Cc: john stultz
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Zippel
     
  • x86 is the only arch right now which provides an optimized version of
    div_long_long_rem, and it has the downside that one has to be very careful
    that the divide doesn't overflow.

    The API is a little awkward, as the arguments for the unsigned divide are
    signed. The signed version also doesn't handle a negative divisor and
    produces worse code on 64bit archs.

    There is little incentive to keep this API alive, so this converts the few
    users to the new API.

    Signed-off-by: Roman Zippel
    Cc: Ralf Baechle
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: john stultz
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Zippel
     
  • This converts a few users of do_div to div_[su]64 and this demonstrates nicely
    how it can reduce some expressions to one-liners.

    Signed-off-by: Roman Zippel
    Cc: john stultz
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Zippel
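
    As a rough illustration of the one-liner point, the sketch below uses
    userspace stand-ins that mimic the semantics of div_u64_rem() and
    div_s64(). The stand-ins, the simple_ts struct and ns_to_ts() are
    assumptions made for this example and are not taken from the patch.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000U

/* Userspace stand-in: unsigned 64/32 divide that also returns the remainder. */
static uint64_t div_u64_rem(uint64_t dividend, uint32_t divisor,
                            uint32_t *remainder)
{
    *remainder = (uint32_t)(dividend % divisor);
    return dividend / divisor;
}

/* Userspace stand-in: signed 64/32 divide, truncating toward zero. */
static int64_t div_s64(int64_t dividend, int32_t divisor)
{
    return dividend / divisor;
}

/* Hypothetical result type for the example. */
struct simple_ts {
    uint64_t sec;
    uint32_t nsec;
};

/*
 * With do_div() the dividend is modified in place and the remainder is the
 * macro's value, so a temporary is usually needed. Here the split into
 * seconds and nanoseconds is a single expression.
 */
static struct simple_ts ns_to_ts(uint64_t ns)
{
    struct simple_ts ts;

    ts.sec = div_u64_rem(ns, NSEC_PER_SEC, &ts.nsec);
    return ts;
}
```

    Unlike do_div(), the function-style helpers return the quotient and leave
    their arguments untouched, which is what allows them to be used directly
    inside larger expressions.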
     

29 Apr, 2008

1 commit


25 Apr, 2008

1 commit

  • David Miller reported:

    |--------------->
    the following commit:

    | commit 27ec4407790d075c325e1f4da0a19c56953cce23
    | Author: Ingo Molnar
    | Date: Thu Feb 28 21:00:21 2008 +0100
    |
    | sched: make cpu_clock() globally synchronous
    |
    | Alexey Zaytsev reported (and bisected) that the introduction of
    | cpu_clock() in printk made the timestamps jump back and forth.
    |
    | Make cpu_clock() more reliable while still keeping it fast when it's
    | called frequently.
    |
    | Signed-off-by: Ingo Molnar

    causes watchdog triggers when a cpu exits NOHZ state when it has been
    there for >= the soft lockup threshold, for example here are some
    messages from a 128 cpu Niagara2 box:

    [ 168.106406] BUG: soft lockup - CPU#11 stuck for 128s! [dd:3239]
    [ 168.989592] BUG: soft lockup - CPU#21 stuck for 86s! [swapper:0]
    [ 168.999587] BUG: soft lockup - CPU#29 stuck for 91s! [make:4511]
    [ 168.999615] BUG: soft lockup - CPU#2 stuck for 85s! [swapper:0]
    [ 169.020514] BUG: soft lockup - CPU#37 stuck for 91s! [swapper:0]
    [ 169.020514] BUG: soft lockup - CPU#45 stuck for 91s! [sh:4515]
    [ 169.020515] BUG: soft lockup - CPU#69 stuck for 92s! [swapper:0]
    [ 169.020515] BUG: soft lockup - CPU#77 stuck for 92s! [swapper:0]
    [ 169.020515] BUG: soft lockup - CPU#61 stuck for 92s! [swapper:0]
    [ 169.112554] BUG: soft lockup - CPU#85 stuck for 92s! [swapper:0]
    [ 169.112554] BUG: soft lockup - CPU#101 stuck for 92s! [swapper:0]
    [ 169.112554] BUG: soft lockup - CPU#109 stuck for 92s! [swapper:0]
    [ 169.112554] BUG: soft lockup - CPU#117 stuck for 92s! [swapper:0]
    [ 169.171483] BUG: soft lockup - CPU#40 stuck for 80s! [dd:3239]
    [ 169.331483] BUG: soft lockup - CPU#13 stuck for 86s! [swapper:0]
    [ 169.351500] BUG: soft lockup - CPU#43 stuck for 101s! [dd:3239]
    [ 169.531482] BUG: soft lockup - CPU#9 stuck for 129s! [mkdir:4565]
    [ 169.595754] BUG: soft lockup - CPU#20 stuck for 93s! [swapper:0]
    [ 169.626787] BUG: soft lockup - CPU#52 stuck for 93s! [swapper:0]
    [ 169.626787] BUG: soft lockup - CPU#84 stuck for 92s! [swapper:0]
    [ 169.636812] BUG: soft lockup - CPU#116 stuck for 94s! [swapper:0]

    It's simple enough to trigger this by doing a 10 minute sleep after a
    fresh bootup then starting a parallel kernel build.

    I suspect this might be reintroducing a problem we've had and fixed
    before, see the thread:

    http://marc.info/?l=linux-kernel&m=119546414004065&w=2

    Ingo Molnar
     

22 Apr, 2008

2 commits

  • * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt:
    hrtimer: optimize the softirq time optimization
    hrtimer: reduce calls to hrtimer_get_softirq_time()
    clockevents: fix typo in tick-broadcast.c
    jiffies: add time_is_after_jiffies and others which compare with jiffies

    Linus Torvalds
     
  • …linux-2.6-sched-devel

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel: (62 commits)
    sched: build fix
    sched: better rt-group documentation
    sched: features fix
    sched: /debug/sched_features
    sched: add SCHED_FEAT_DEADLINE
    sched: debug: show a weight tree
    sched: fair: weight calculations
    sched: fair-group: de-couple load-balancing from the rb-trees
    sched: fair-group scheduling vs latency
    sched: rt-group: optimize dequeue_rt_stack
    sched: debug: add some debug code to handle the full hierarchy
    sched: fair-group: SMP-nice for group scheduling
    sched, cpuset: customize sched domains, core
    sched, cpuset: customize sched domains, docs
    sched: prepatory code movement
    sched: rt: multi level group constraints
    sched: task_group hierarchy
    sched: fix the task_group hierarchy for UID grouping
    sched: allow the group scheduler to have multiple levels
    sched: mix tasks and groups
    ...

    Linus Torvalds
     

21 Apr, 2008

1 commit


20 Apr, 2008

2 commits

  • Various SMP balancing algorithms require that the bandwidth period
    run in sync.

    Possible improvements are moving the rt_bandwidth thing into root_domain
    and keeping a span per rt_bandwidth which marks throttled cpus.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • We already catch most of the TSC problems by sanity checks, but there
    is a subtle bug which has been in the code forever. This can cause
    time jumps in the range of hours.

    This was reported in:
    http://lkml.org/lkml/2007/8/23/96
    and
    http://lkml.org/lkml/2008/3/31/23

    I was able to reproduce the problem with a gettimeofday loop test on a
    dual core and a quad core machine which both have synchronized
    TSCs. The TSCs seem not to be perfectly in sync though, but the
    kernel is not able to detect the slight delta in the sync check. Still
    there exists an extremely small window where this delta can be observed
    with a real big time jump. So far I was only able to reproduce this
    with the vsyscall gettimeofday implementation, but in theory this
    might be observable with the syscall based version as well.

    CPU0 updates the clock source variables under xtime/vsyscall lock and
    CPU1, where the TSC is slightly behind CPU0, is reading the time right
    after the seqlock was unlocked.

    The clocksource reference data was updated with the TSC from CPU0 and
    the value which is read from TSC on CPU1 is less than the reference
    data. This results in a huge delta value due to the unsigned
    subtraction of the TSC value and the reference value. This algorithm
    can not be changed due to the support of wrapping clock sources like
    pm timer.

    The huge delta is converted to nanoseconds and added to xtime, which
    is then observable by the caller. The next gettimeofday call on CPU1
    will show the correct time again as now the TSC has advanced above the
    reference value.

    To prevent this TSC specific wreckage we need to compare the TSC value
    against the reference value and return the latter when it is larger
    than the actual TSC value.

    I considered marking the TSC unstable when the readout is smaller than
    the reference value, but this would render an otherwise good and fast
    clocksource unusable without a real good reason.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
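
    The failure mode and the clamp described above can be sketched in
    userspace as follows. The function names and the plain uint64_t cycle
    type are assumptions for illustration; this is not the code from the
    patch.

```c
#include <stdint.h>

/*
 * Unsigned delta as used for wrapping clocksources: a readout that is
 * slightly behind the reference turns into a huge positive value.
 */
static uint64_t cycles_delta(uint64_t now, uint64_t cycle_last)
{
    return now - cycle_last;
}

/*
 * Hypothetical clamp: never report a TSC value that is behind the
 * reference value, return the reference value instead.
 */
static uint64_t clamp_tsc(uint64_t tsc, uint64_t cycle_last)
{
    return tsc >= cycle_last ? tsc : cycle_last;
}
```

    With a reference value of 1000 and a readout of 990, the raw delta is
    2^64 - 10 cycles, while the clamped readout gives a delta of 0. A readout
    that is ahead of the reference passes through unchanged.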
     

18 Apr, 2008

3 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt:
    clocksource: make clocksource watchdog cycle through online CPUs
    Documentation: move timer related documentation to a single place
    clockevents: optimise tick_nohz_stop_sched_tick() a bit
    locking: remove unused double_spin_lock()
    hrtimers: simplify lockdep handling
    timers: simplify lockdep handling
    posix-timers: fix shadowed variables
    timer_list: add annotations to workqueue.c
    hrtimer: use nanosleep specific restart_block fields
    hrtimer: add nanosleep specific restart_block member

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-kgdb:
    kgdb: always use icache flush for sw breakpoints
    kgdb: fix SMP NMI kgdb_handle_exception exit race
    kgdb: documentation fixes
    kgdb: allow static kgdbts boot configuration
    kgdb: add documentation
    kgdb: Kconfig fix
    kgdb: add kgdb internal test suite
    kgdb: fix several kgdb regressions
    kgdb: kgdboc pl011 I/O module
    kgdb: fix optional arch functions and probe_kernel_*
    kgdb: add x86 HW breakpoints
    kgdb: print breakpoint removed on exception
    kgdb: clocksource watchdog
    kgdb: fix NMI hangs
    kgdb: fix kgdboc dynamic module configuration
    kgdb: document parameters
    x86: kgdb support
    consoles: polling support, kgdboc
    kgdb: core
    uaccess: add probe_kernel_write()

    Linus Torvalds
     
  • In order to not trip the clocksource watchdog, kgdb must touch the
    clocksource watchdog on the return to normal system run state.

    Signed-off-by: Jason Wessel
    Signed-off-by: Ingo Molnar

    Jason Wessel
     

17 Apr, 2008

3 commits

  • This way it checks if the clocks are synchronized between CPUs too.
    This might be able to detect slowly drifting TSCs which only
    go wrong over longer time.

    Signed-off-by: Andi Kleen
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Andi Kleen
     
  • Call
    ts = &per_cpu(tick_cpu_sched, cpu);
    and
    cpu = smp_processor_id();
    once instead of twice.

    No functional change done, as changed code runs with local irq off.
    Reduces source lines and text size (20bytes on x86_64).

    [ akpm@linux-foundation.org: Build fix ]

    Signed-off-by: Karsten Wiese
    Cc: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Karsten Wiese
     
  • > Generic code is not supposed to include irq.h. Replace this include
    > by linux/hardirq.h instead and add/replace an include of linux/irq.h
    > in asm header files where necessary.
    > This change should only matter for architectures that make use of
    > GENERIC_CLOCKEVENTS.
    > Architectures in question are mips, x86, arm, sh, powerpc, uml and sparc64.
    >
    > I did some cross compile tests for mips, x86_64, arm, powerpc and sparc64.
    > This patch fixes also build breakages caused by the include replacement in
    > tick-common.h.

    I generally dislike adding optional linux/* includes in asm/* includes -
    I'm nervous about this causing include loops.

    However, there's a separate point to be discussed here.

    That is, what interfaces are expected of every architecture in the kernel.
    If generic code wants to be able to set the affinity of interrupts, then
    that needs to become part of the interfaces listed in linux/interrupt.h
    rather than linux/irq.h.

    So what I suggest is this approach instead (against Linus' tree of a
    couple of days ago) - we move irq_set_affinity() and irq_can_set_affinity()
    to linux/interrupt.h, change the linux/irq.h includes to linux/interrupt.h
    and include asm/irq_regs.h where needed (asm/irq_regs.h is supposed to be
    rarely used include since not much touches the stacked parent context
    registers.)

    Build tested on ARM PXA family kernels and ARM's Realview platform
    kernels which both use genirq.

    [ tglx@linutronix.de: add GENERIC_HARDIRQ dependencies ]

    Signed-off-by: Russell King
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Heiko Carstens

    Russell King
     

26 Mar, 2008

1 commit

  • Revert

    commit 1077f5a917b7c630231037826b344b2f7f5b903f
    Author: Parag Warudkar
    Date: Wed Jan 30 13:30:01 2008 +0100

    clocksource.c: use init_timer_deferrable for clocksource_watchdog

    clocksource_watchdog can use a deferrable timer - reduces wakeups from
    idle per second.

    The watchdog timer needs to run with the specified interval. Otherwise
    it will miss the possible wrap of the watchdog clocksource.

    Signed-off-by: Thomas Gleixner
    Cc: stable@kernel.org

    Thomas Gleixner
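
    Why a missed wrap matters can be shown with a small userspace sketch of
    a masked counter delta. The 24-bit width used in the note below is an
    assumption chosen for illustration, and masked_delta() is a hypothetical
    helper, not a kernel function.

```c
#include <stdint.h>

/*
 * Elapsed cycles on a free-running counter that is (mask + 1) cycles wide.
 * The result is only correct if fewer than (mask + 1) cycles have passed
 * since `last` was sampled.
 */
static uint32_t masked_delta(uint32_t now, uint32_t last, uint32_t mask)
{
    return (now - last) & mask;
}
```

    For a 24-bit counter, masked_delta(0x000010, 0xFFFFF0, 0xFFFFFF) yields
    0x20, so a single wrap is handled. If the timer is deferred for longer
    than one full counter period, the extra period is silently lost: an
    elapsed time of 0x1000005 cycles is reported as 5.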
     

25 Mar, 2008

1 commit


20 Mar, 2008

1 commit

  • Revert commit 1ada5cba6a0318f90e45b38557e7b5206a9cba38 ("clocksource:
    make clocksource watchdog cycle through online CPUs") due to the
    regression reported by Gabriel C at

    http://lkml.org/lkml/2008/2/24/281

    (short version: it makes TSC be marked as always unstable on his
    machine).

    Cc: Andi Kleen
    Acked-by: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Robert Hancock
    Acked-by: Linus Torvalds
    Cc: "Rafael J. Wysocki"
    Cc: Gabriel C
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

09 Mar, 2008

3 commits

  • The first version of the ntp_interval/tick_length inconsistent usage patch was
    recently merged as bbe4d18ac2e058c56adb0cd71f49d9ed3216a405

    http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=bbe4d18ac2e058c56adb0cd71f49d9ed3216a405

    While the fix did greatly improve the situation, it was correctly pointed out
    by Roman that it does have a small bug: If the users change clocksources after
    the system has been running and NTP has made corrections, the corrections made
    against the old clocksource will be applied against the new clocksource,
    causing error.

    The second attempt, which corrects the issue in the NTP_INTERVAL_LENGTH
    definition has also made it up-stream as commit
    e13a2e61dd5152f5499d2003470acf9c838eab84

    http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e13a2e61dd5152f5499d2003470acf9c838eab84

    Roman has correctly pointed out that CLOCK_TICK_ADJUST is calculated
    based on the PIT's frequency, and isn't really relevant to non-PIT
    driven clocksources (that is, clocksources other than jiffies and pit).

    This patch reverts both of those changes, and simply removes
    CLOCK_TICK_ADJUST.

    This does remove the granularity error correction for users of the PIT and
    jiffies clocksources, but for the majority of users the granularity error
    should be within the 500ppm range NTP can accommodate.

    For systems that have granularity errors greater than 500ppm, the
    "ntp_tick_adj=" boot option can be used to compensate.

    [johnstul@us.ibm.com: provided changelog]
    [mattilinnanvuori@yahoo.com: make ntp_tick_adj static]
    Signed-off-by: Roman Zippel
    Acked-by: john stultz
    Signed-off-by: Matti Linnanvuori
    Signed-off-by: Andrew Morton
    Cc: mingo@elte.hu
    Signed-off-by: Thomas Gleixner

    Roman Zippel
     
  • Silences WARN_ONs in rcu_enter_nohz() and rcu_exit_nohz(), which previously
    appeared after (repeated) calls to:
    $ echo 0 > /sys/devices/system/cpu/cpu1/online
    $ echo 1 > /sys/devices/system/cpu/cpu1/online

    Signed-off-by: Karsten Wiese
    Cc: johnstul@us.ibm.com
    Cc: Rafael Wysocki
    Cc: Steven Rostedt
    Cc: Ingo Molnar
    Acked-by: Paul E. McKenney
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Karsten Wiese
     
  • The kernel NTP code shouldn't hand 64-bit *signed* values to do_div(). Make it
    instead hand 64-bit unsigned values. This gets rid of a couple of warnings.

    Signed-off-by: David Howells
    Cc: Roman Zippel
    Cc: Ingo Molnar
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    David Howells
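
    One way to divide a signed 64-bit value with an unsigned-only primitive is
    to divide the magnitude and restore the sign afterwards. The sketch below
    is an assumption about the general pattern, not the code from the patch;
    udiv64_32() is a userspace stand-in for a do_div()-like unsigned divide
    and sdiv64_32() is a hypothetical wrapper.

```c
#include <stdint.h>

/* Userspace stand-in for an unsigned 64/32 divide. */
static uint64_t udiv64_32(uint64_t n, uint32_t base)
{
    return n / base;
}

/* Hypothetical wrapper: only unsigned values reach the divide. */
static int64_t sdiv64_32(int64_t n, uint32_t base)
{
    int neg = n < 0;
    /* Negate in unsigned arithmetic so INT64_MIN does not overflow. */
    uint64_t mag = neg ? 0 - (uint64_t)n : (uint64_t)n;
    uint64_t q = udiv64_32(mag, base);

    return neg ? (int64_t)(0 - q) : (int64_t)q;
}
```

    The result truncates toward zero, matching ordinary C signed division.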
     

01 Mar, 2008

1 commit

  • The PREEMPT-RCU can get stuck if a CPU goes idle and NO_HZ is set. The
    idle CPU will not progress the RCU through its grace period and a
    synchronize_rcu may get stuck. Without this patch I have a box that will
    not boot when PREEMPT_RCU and NO_HZ are set. That same box boots fine
    with this patch.

    This patch comes from the -rt kernel where it has been tested for
    several months.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     

18 Feb, 2008

1 commit


10 Feb, 2008

1 commit

  • clocksource initialization and error accumulation. This corrects a 280ppm
    drift seen on some systems using acpi_pm, and affects other clocksources as
    well (likely to a lesser degree).

    Signed-off-by: John Stultz
    Cc: Roman Zippel
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    john stultz
     

09 Feb, 2008

4 commits

  • Fix typo in comments.

    BTW: I have to fix coding style in arch/ia64/kernel/time.c also, otherwise
    checkpatch.pl will be complaining.

    Signed-off-by: Li Zefan
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • Function timekeeping_is_continuous() no longer checks flag
    CLOCK_IS_CONTINUOUS, and it checks CLOCK_SOURCE_VALID_FOR_HRES now. So rename
    the function accordingly.

    Signed-off-by: Li Zefan
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • list_for_each_safe() suffices here.

    Signed-off-by: Li Zefan
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • Flag CLOCK_SOURCE_WATCHDOG is cleared twice. Note clocksource_change_rating()
    won't do anything with the cs flag.

    Signed-off-by: Li Zefan
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     

07 Feb, 2008

1 commit

  • I found that there is a buffer overflow problem in the following code.

    Version: 2.6.24-rc2,
    File: kernel/time/clocksource.c:417-432
    --------------------------------------------------------------------
    static ssize_t
    sysfs_show_available_clocksources(struct sys_device *dev, char *buf)
    {
            struct clocksource *src;
            char *curr = buf;

            spin_lock_irq(&clocksource_lock);
            list_for_each_entry(src, &clocksource_list, list) {
                    curr += sprintf(curr, "%s ", src->name);
            }
            spin_unlock_irq(&clocksource_lock);

            curr += sprintf(curr, "\n");

            return curr - buf;
    }
    -----------------------------------------------------------------------

    sysfs_show_current_clocksources() also has the same problem though in practice
    the size of current clocksource's name won't exceed PAGE_SIZE.

    I fix the bug by using snprintf according to the specification of the kernel
    (Version:2.6.24-rc2,File:Documentation/filesystems/sysfs.txt)

    Fix sysfs_show_available_clocksources() and sysfs_show_current_clocksources()
    buffer overflow problem with snprintf().

    Signed-off-by: Miao Xie
    Cc: WANG Cong
    Cc: Thomas Gleixner
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miao Xie
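
    A bounded version of the loop above can be sketched in userspace with
    snprintf(). This is a minimal sketch, not the actual patch: show_names()
    and its parameters are hypothetical, and the buffer size is passed in
    explicitly instead of relying on PAGE_SIZE.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Append each name followed by a space, then a newline, without ever
 * writing past buf[size - 1]. Returns the number of characters stored,
 * not counting the terminating NUL.
 */
static size_t show_names(char *buf, size_t size,
                         const char *const *names, size_t count)
{
    size_t used = 0;
    size_t i;

    if (size == 0)
        return 0;
    buf[0] = '\0';

    for (i = 0; i < count && used < size - 1; i++) {
        int n = snprintf(buf + used, size - used, "%s ", names[i]);

        if (n < 0)
            break;
        /* snprintf() returns the length it wanted; clamp on truncation. */
        used += ((size_t)n < size - used) ? (size_t)n : size - used - 1;
    }

    if (used < size - 1) {
        int n = snprintf(buf + used, size - used, "\n");

        if (n > 0)
            used += 1;
    }

    return used;
}
```

    The clamp matters because snprintf() reports the length the full string
    would have needed, so adding its return value unchecked could move the
    write position past the end of the buffer.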
     

02 Feb, 2008

1 commit

  • To allow better diagnosis of tick-sched related, especially NOHZ
    related problems, we need to know when the last wakeup via an irq
    happened and when the CPU left the idle state.

    Add two fields (idle_waketime, idle_exittime) to the tick_sched
    structure and add them to the timer_list output.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner