08 Oct, 2007

1 commit

  • When using /proc/timer_stats on ppc64 I noticed the events/sec field wasn't
    accurate. Sometimes the integer part was incorrect due to rounding (we
    weren't taking the fractional seconds into consideration).

    The fractional part is also wrong: we need to pad the printf statement and
    take the bottom three digits of 1000 times the value.

    Signed-off-by: Anton Blanchard
    Acked-by: Ingo Molnar
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anton Blanchard
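
    A minimal user-space sketch of the corrected events/sec arithmetic, assuming
    the sample period has already been converted to whole milliseconds (`ms`)
    and that `events * 1000000` fits in a long. The helper name `format_rate` is
    hypothetical, not the timer_stats code itself. The integer part is
    `events * 1000 / ms`, the fraction is the bottom three digits of 1000 times
    the rate, and `%03ld` supplies the zero padding:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format "events per second" with three decimals.
 * events: number of events seen; ms: sample period in milliseconds.
 * Assumes events * 1000000 does not overflow a long.
 */
static void format_rate(char *buf, size_t len, long events, long ms)
{
	if (ms <= 0)
		ms = 1;                                 /* avoid division by zero */
	long whole = events * 1000 / ms;                /* integer events/sec */
	long frac = (events * 1000000 / ms) % 1000;     /* bottom 3 digits of 1000 * rate */

	/* %03ld pads the fraction, so 0.05 prints as "0.050", not "0.50" */
	snprintf(buf, len, "%ld.%03ld", whole, frac);
}
```

    For example, 1 event over 20000 ms gives "0.050"; printing the fraction
    with a plain `%ld` would give "0.50", which is the padding bug described
    above.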
     

23 Sep, 2007

1 commit

  • In a desperate attempt to fix the suspend/resume problem on Andrew's
    VAIO I added a workaround which enforced the broadcast of the oneshot
    timer on resume. This did resolve the problem on the VAIO, but it was
    just a stupid workaround which did not tackle the root cause: the
    assignment of lower idle C-states in the ACPI processor_idle code. The
    cpuidle patches, which utilize the dynamic tick feature and go faster
    into deeper C-states, exposed the problem again. The correct solution is
    the previous patch, which prevents lower C-states across suspend/resume.

    Remove the enforcement code, including the conditional broadcast timer
    arming, which helped to paper over the real problem for quite some time.
    The oneshot broadcast flag for the CPU that runs the resume code can
    never be set at the time this code is executed. It only gets set when
    the CPU enters a lower idle C-state.

    Signed-off-by: Thomas Gleixner
    Tested-by: Andrew Morton
    Cc: Len Brown
    Cc: Venkatesh Pallipadi
    Cc: Rafael J. Wysocki
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

16 Sep, 2007

5 commits

  • Taking a cpu offline removes the cpu from the online mask before the
    CPU_DEAD notification is done. The clock events layer does the cleanup
    of the dead CPU from the CPU_DEAD notifier chain. tick_do_timer_cpu is
    used to avoid xtime lock contention by assigning the jiffies and xtime
    updates to one CPU. If a CPU is taken offline, then this
    assignment becomes stale. This went unnoticed because most of the time
    the offline CPU went dead before the online CPU reached __cpu_die(),
    where the CPU_DEAD state is checked. In the case that the offline CPU did
    not reach the DEAD state before we reach __cpu_die(), the code in there
    goes to sleep for 100ms. Due to the stale time update assignment, the
    system is stuck forever.

    Take the assignment away when a CPU is no longer in the cpu_online_mask.
    We do this in the last call to tick_nohz_stop_sched_tick() when the offline
    CPU is on the way to the final play_dead() idle entry.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
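
    A minimal model of the offline check described above, assuming a plain
    bitmask for the online CPUs and -1 as the "nobody owns the update duty"
    marker. All names (`online_mask`, `do_timer_cpu`, `idle_offline_check`,
    `tick_takeover`) are hypothetical; this only illustrates why dropping a
    stale assignment lets a live CPU pick the duty up again:

```c
#include <assert.h>
#include <stdbool.h>

#define NO_TIMER_CPU (-1)            /* hypothetical "no owner" marker */

static unsigned long online_mask;    /* bit n set => CPU n is online */
static int do_timer_cpu = NO_TIMER_CPU; /* CPU doing jiffies/xtime updates */

static bool cpu_is_online(int cpu)
{
	return (online_mask >> cpu) & 1UL;
}

/*
 * Last idle entry of a CPU on its way to play_dead(): an offline CPU
 * must not keep the update duty, or nobody advances jiffies.
 * (This models only the offline check, not the regular idle path.)
 */
static void idle_offline_check(int cpu)
{
	if (!cpu_is_online(cpu) && do_timer_cpu == cpu)
		do_timer_cpu = NO_TIMER_CPU;
}

/* Tick on a running CPU: take over the duty if it is unassigned. */
static void tick_takeover(int cpu)
{
	if (do_timer_cpu == NO_TIMER_CPU)
		do_timer_cpu = cpu;
}
```

    Without the check in `idle_offline_check()`, the duty would stay with the
    dead CPU and `tick_takeover()` on the remaining CPU would never fire.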
     
  • When a CPU goes offline it is removed from the broadcast masks. If the
    mask becomes empty, the code shuts down the broadcast device. This is
    wrong, because the broadcast device needs to be ready for the online
    CPU going idle (into a C-state, which stops the local APIC timer).

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Unless this is enforced, the jinxed VAIO refuses to resume until keys
    are pressed on the keyboard. It is unclear why the CPU ends up in a lower
    C-state without notifying the clock events layer, but enforcing the
    oneshot broadcast here is safe.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Timekeeping resume adjusts xtime by adding the slept time in seconds and
    resets the reference value of the clock source (clock->cycle_last).
    clock->cycle_last is used to calculate the delta between the last xtime
    update and the readout of the clock source in __get_nsec_offset(). xtime
    plus the offset is the current time. The resume code ignores the delta
    which had already elapsed between the last xtime update and the actual
    time of suspend. If the suspend time is short, then we can see time
    going backwards on resume.

    Suspend:
    offs_s = clock->read() - clock->cycle_last;
    now = xtime + offs_s;
    timekeeping_suspend_time = read_rtc();

    Resume:
    sleep_time = read_rtc() - timekeeping_suspend_time;
    xtime.tv_sec += sleep_time;
    clock->cycle_last = clock->read();
    offs_r = clock->read() - clock->cycle_last;
    now = xtime + offs_r;

    If sleep_time == 0 and offs_r < offs_s, then time goes
    backwards.

    Fix this by storing the offset from the last xtime update and adding it
    to xtime during resume, when we reset clock->cycle_last:

    sleep_time = read_rtc() - timekeeping_suspend_time;
    xtime.tv_sec += sleep_time;
    xtime += offs_s; /* Fixup xtime offset at suspend time */
    clock->cycle_last = clock->read();
    offs_r = clock->read() - clock->cycle_last;
    now = xtime + offs_r;

    Thanks to Marcelo for tracking this down on the OLPC and providing the
    necessary details to analyze the root cause.

    Signed-off-by: Thomas Gleixner
    Cc: John Stultz
    Cc: Tosatti

    Thomas Gleixner
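
    The suspend/resume pseudocode above can be turned into a small simulation.
    This is a sketch under simplifying assumptions: every quantity is in
    nanoseconds, the counter counts nanoseconds directly, and the RTC reports
    a sleep of 0 because it only has whole-second resolution. The struct and
    function names are hypothetical, not the kernel's timekeeping API:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the timekeeping state (all values in ns). */
struct tk_model {
	uint64_t counter;    /* current clocksource readout */
	uint64_t cycle_last; /* readout at the last xtime update */
	uint64_t xtime;      /* wall time at the last xtime update */
	uint64_t offs_s;     /* offset saved at suspend time */
};

/* now = xtime + (clock->read() - clock->cycle_last) */
static uint64_t tk_now(const struct tk_model *tk)
{
	return tk->xtime + (tk->counter - tk->cycle_last);
}

static void tk_suspend(struct tk_model *tk)
{
	tk->offs_s = tk->counter - tk->cycle_last;
}

/* sleep_ns comes from the RTC: whole seconds only, possibly 0. */
static void tk_resume(struct tk_model *tk, uint64_t sleep_ns, int fixed)
{
	tk->xtime += sleep_ns;
	if (fixed)
		tk->xtime += tk->offs_s; /* the fix: keep the pre-suspend delta */
	tk->cycle_last = tk->counter;
}
```

    With `fixed == 0` and a sub-second suspend, the time read after resume is
    smaller than the time read before suspend; with `fixed == 1` it is not.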
     
  • Lockdep complains about the RTC access in timekeeping_suspend(), which
    happens inside the interrupt-disabled region of the write-locked xtime
    lock. Move the access outside.

    Signed-off-by: Thomas Gleixner
    Cc: John Stultz

    Thomas Gleixner
     

12 Sep, 2007

1 commit


12 Aug, 2007

1 commit


01 Aug, 2007

1 commit


26 Jul, 2007

2 commits

  • This avoids xtime lag seen with dynticks, because while 'xtime' itself
    is still not updated often, we keep a 'xtime_cache' variable around that
    contains the approximate real-time that _is_ updated each time we do a
    'update_wall_time()', and is thus never off by more than one tick.

    IOW, this restores the original semantics for 'xtime' users, as long as
    you use the proper abstraction functions (ie 'current_kernel_time()' or
    'get_seconds()' depending on whether you want a timespec or just the
    seconds field).

    [ Updated Patch. As penance for my sins I've also yanked another #ifdef
    that was added to avoid the xtime lag w/ hrtimers. ]

    Signed-off-by: John Stultz
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Linus Torvalds

    john stultz
     
  • This avoids use of the kernel-internal "xtime" variable directly outside
    of the actual time-related functions. Instead, use the helper functions
    that we already have available to us.

    This doesn't actually change any behaviour, but this will allow us to
    fix the fact that "xtime" isn't updated very often with CONFIG_NO_HZ
    (because much of the realtime information is maintained as separate
    offsets to 'xtime'), which has caused interfaces that use xtime directly
    to get a time that is out of sync with the real-time clock by up to a
    third of a second or so.

    Signed-off-by: John Stultz
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Linus Torvalds

    john stultz
     

22 Jul, 2007

5 commits

  • i386 and sparc64 have identical code to update the CMOS clock. Move it
    into kernel/time/ntp.c, as other architectures with the same
    requirements are coming along.

    [akpm@linux-foundation.org: build fixes]
    Signed-off-by: Thomas Gleixner
    Cc: Chris Wright
    Cc: Ingo Molnar
    Cc: john stultz
    Cc: David Miller
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • Add some more debug information to the hrtimer and clock events code.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • After discussing it with Thomas over IRC, it seems the issue is that the
    sched tick fires on every CPU at the same time, causing extra lock
    contention.

    This smaller change adds an extra per-CPU offset so the ticks don't line
    up. This patch also drops the idle latency from 40us to under 20us.

    Signed-off-by: john stultz
    Signed-off-by: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    john stultz
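
    One way to compute such a per-CPU offset is sketched below. It assumes, as
    one plausible choice, that the CPUs' first ticks are spread evenly over
    half a tick period; the function name `tick_skew_ns` is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Per-CPU tick offset: spread the CPUs across half a tick period so
 * their tick timers do not all expire at the same instant.
 * The half-period spread is an assumption of this sketch.
 */
static uint64_t tick_skew_ns(uint64_t tick_period_ns, unsigned int cpu,
			     unsigned int nr_cpus)
{
	uint64_t step = (tick_period_ns / 2) / nr_cpus;

	return step * cpu;
}
```

    With HZ=1000 (a 1000000 ns tick) and 4 CPUs, the ticks land 125000 ns
    apart, so CPU 3 fires 375000 ns after CPU 0.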
     
  • When a device is replaced by a better-rated device, the broadcast mode
    needs to be evaluated again. If the new device has no requirement for
    broadcasting, the broadcast bits for the CPU must be cleared.

    Signed-off-by: Thomas Gleixner
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • We need to make sure that the clockevent devices are resumed before the
    tick is resumed. The current resume logic does not guarantee this.

    Add CLOCK_EVT_MODE_RESUME and call the set mode functions of the clock
    event devices before resuming the tick / oneshot functionality.

    Fixup the existing users.

    Thanks to Nigel Cunningham for tracking down a long-standing thinko,
    which affected the jinxed VAIO.

    [akpm@linux-foundation.org: xen build fix]
    Signed-off-by: Thomas Gleixner
    Cc: john stultz
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
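
    A minimal sketch of the ordering this commit enforces: a RESUME mode lets
    each device reinitialize its hardware before the tick code reprograms it.
    The enum, struct and function names here are hypothetical stand-ins
    loosely modelled on the description, not the kernel's clockevents API:

```c
#include <assert.h>

/* Hypothetical device modes, including a dedicated RESUME mode. */
enum evt_mode { EVT_UNUSED, EVT_SHUTDOWN, EVT_PERIODIC, EVT_ONESHOT, EVT_RESUME };

struct evt_dev {
	int hw_ready;          /* hardware state valid (lost over suspend) */
	enum evt_mode mode;
	void (*set_mode)(struct evt_dev *dev, enum evt_mode mode);
};

/* Example driver callback. */
static void demo_set_mode(struct evt_dev *dev, enum evt_mode mode)
{
	if (mode == EVT_RESUME) {
		dev->hw_ready = 1;  /* reinitialize registers lost in suspend */
		return;             /* operating mode is restored by the tick code */
	}
	dev->mode = mode;
}

/* Step 1: resume every clock event device. */
static void resume_devices(struct evt_dev *devs, int n)
{
	for (int i = 0; i < n; i++)
		devs[i].set_mode(&devs[i], EVT_RESUME);
}

/* Step 2: resume the tick; fails if the device was not resumed first. */
static int tick_resume(struct evt_dev *dev)
{
	if (!dev->hw_ready)
		return -1;          /* would program uninitialized hardware */
	dev->set_mode(dev, EVT_PERIODIC);
	return 0;
}
```

    Calling `tick_resume()` before `resume_devices()` fails in this model,
    which is the ordering problem the commit describes.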
     

21 Jul, 2007

2 commits


20 Jul, 2007

1 commit


18 Jul, 2007

1 commit

  • KSYM_NAME_LEN is peculiar in that it does not include the space for the
    trailing '\0', forcing all users to use KSYM_NAME_LEN + 1 when allocating
    buffers. This is nonsense and error-prone. Moreover, when a caller
    forgets this, it is very likely to bite back subtly by corrupting the
    stack, because the last position of the buffer is always cleared to zero.

    This patch increments KSYM_NAME_LEN by one and updates code accordingly.

    * off-by-one bug in asm-powerpc/kprobes.h::kprobe_lookup_name() macro
    is fixed.

    * Where MODULE_NAME_LEN and KSYM_NAME_LEN were used together,
    MODULE_NAME_LEN was treated as if it didn't include space for the
    trailing '\0'. Fix it.

    Signed-off-by: Tejun Heo
    Acked-by: Paulo Marques
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
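
    The convention after this change can be illustrated with a small sketch:
    when the length constant already counts the terminator, callers declare
    the buffer with exactly that length and never add 1. `SYM_NAME_LEN` (kept
    tiny for the demo) and `copy_sym_name` are hypothetical names:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical length that, like the new KSYM_NAME_LEN, counts the '\0'. */
#define SYM_NAME_LEN 8

/* Copy a symbol name into a SYM_NAME_LEN buffer, truncating safely. */
static void copy_sym_name(char buf[SYM_NAME_LEN], const char *name)
{
	strncpy(buf, name, SYM_NAME_LEN - 1);
	buf[SYM_NAME_LEN - 1] = '\0';   /* the last slot is the terminator */
}
```

    A caller writes `char buf[SYM_NAME_LEN];` and an 8-character name such as
    "do_timer" is truncated to "do_time" instead of overrunning the buffer.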
     

17 Jul, 2007

4 commits

  • I forgot to remove capability.h from mm.h while removing sched.h! This
    patch remedies that, because the only inline function which was using
    CAP_something was made out of line.

    Cross-compile tested without regressions on:

    all powerpc defconfigs
    all mips defconfigs
    all m68k defconfigs
    all arm defconfigs
    all ia64 defconfigs

    alpha alpha-allnoconfig alpha-defconfig alpha-up
    arm
    i386 i386-allnoconfig i386-defconfig i386-up
    ia64 ia64-allnoconfig ia64-defconfig ia64-up
    m68k
    mips
    parisc parisc-allnoconfig parisc-defconfig parisc-up
    powerpc powerpc-up
    s390 s390-allnoconfig s390-defconfig s390-up
    sparc sparc-allnoconfig sparc-defconfig sparc-up
    sparc64 sparc64-allnoconfig sparc64-defconfig sparc64-up
    um-x86_64
    x86_64 x86_64-allnoconfig x86_64-defconfig x86_64-up

    as well as my two usual configs.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Add a flag in /proc/timer_stats to indicate deferrable timers. This will
    let developers/users differentiate between types of timers in
    /proc/timer_stats.

    Deferrable timer and normal timer will appear in /proc/timer_stats as below.
    10D, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)
    10, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)

    Also, the timer_stats version changes from v0.1 to v0.2.

    Signed-off-by: Venkatesh Pallipadi
    Acked-by: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Venki Pallipadi
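
    A sketch of how a formatter could produce the two sample lines above,
    appending "D" to the count for deferrable timers. The struct and field
    names are hypothetical, and the sketch ignores the column padding used in
    the real /proc/timer_stats output:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical timer_stats entry; field names are illustrative only. */
struct stat_entry {
	unsigned long count;
	int deferrable;
	int pid;
	const char *comm, *start_func, *expire_func;
};

static void format_entry(char *buf, size_t len, const struct stat_entry *e)
{
	/* A "D" directly after the count marks a deferrable timer. */
	snprintf(buf, len, "%lu%s, %d %s %s (%s)", e->count,
		 e->deferrable ? "D" : "", e->pid, e->comm,
		 e->start_func, e->expire_func);
}
```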
     
  • Not called by anything in tree.

    Signed-off-by: Andi Kleen
    Acked-by: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • The commits

    411187fb05cd11676b0979d9fbf3291db69dbce2 (GTOD: persistent clock support)
    c1d370e167d66b10bca3b602d3740405469383de (i386: use GTOD persistent clock
    support)

    changed the monotonic time so that it no longer jumps after resume, but as
    a result it can no longer be used for boot time and process start time
    calculations. Also, the uptime no longer increases during suspend.

    I add a variable to track the wall_to_monotonic changes, a function to get the
    real boot time and a function to get the boot based time from the monotonic
    one.

    [akpm@linux-foundation.org: remove exports, add comment]
    Signed-off-by: Tomas Janousek
    Cc: Tomas Smetana
    Cc: John Stultz
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tomas Janousek
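
    The arithmetic behind tracking the sleep time can be sketched as follows,
    assuming monotonic = xtime + wall_to_monotonic and that each suspend moves
    xtime forward while wall_to_monotonic absorbs the jump. Values are in
    whole seconds, and the struct and function names (including
    `mono_to_bootbased`) are hypothetical, not the kernel functions added by
    this commit:

```c
#include <assert.h>

/* Hypothetical model, all values in seconds. */
struct clock_model {
	long long xtime;             /* wall clock time */
	long long wall_to_monotonic; /* monotonic = xtime + wall_to_monotonic */
	long long total_sleep;       /* accumulated suspend time */
};

static long long monotonic(const struct clock_model *c)
{
	return c->xtime + c->wall_to_monotonic;
}

/* Suspend: wall time jumps, monotonic is held still, sleep is recorded. */
static void account_sleep(struct clock_model *c, long long slept)
{
	c->xtime += slept;
	c->wall_to_monotonic -= slept;
	c->total_sleep += slept;
}

/* Wall clock time at boot, independent of how long we slept. */
static long long real_boot_time(const struct clock_model *c)
{
	return -(c->wall_to_monotonic + c->total_sleep);
}

/* Time since boot including suspend ("boot based" time). */
static long long mono_to_bootbased(long long mono, const struct clock_model *c)
{
	return mono + c->total_sleep;
}
```

    Booting at wall time 1000, running 50 s and sleeping 30 s gives a
    monotonic time of 50, a boot-based time of 80, and a boot time that is
    still 1000.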
     

04 Jul, 2007

1 commit

  • The clock_was_set() call in seconds_overflow() which happens only when
    leap seconds are inserted / deleted is wrong in two aspects:

    1. it results in a call to on_each_cpu() with interrupts disabled
    2. it is a potential deadlock source vs. call_lock in smp_call_function()

    The only possible side effect of the removal is that an absolute
    CLOCK_REALTIME timer might fire 1 second too late, in the rare case of a
    leap second deletion and an absolute CLOCK_REALTIME timer which expires in
    the affected time frame. It will never fire too early.

    This was probably observed by the reporter of a June 30th -> July 1st
    hang: http://lkml.org/lkml/2007/7/3/103

    A similar problem was observed by Dave Jones, who provided a screenshot
    with a lockdep backtrace, which made it possible to analyse the problem.

    Signed-off-by: Thomas Gleixner
    Acked-by: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

01 Jun, 2007

2 commits

  • Make timer-stats have almost zero overhead when enabled in the config but
    not used. (this way distros can enable it more easily)

    Also update the documentation about overhead of timer_stats - it was
    written for the first version which had a global lock and a linear list
    walk based lookup ;-)

    Signed-off-by: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Fix two races in the timer stats lookup code. One by ensuring that the
    initialization of a new entry is finished upon insertion of that entry.
    The other by cleaning up the hash table when the entries array is cleared,
    so that we don't have any "pre-inserted" entries.

    Thanks to Eric Dumazet for reminding me of the memory barriers.

    Signed-off-by: Bjorn Steinbrink
    Signed-off-by: Ian Kumlien
    Acked-by: Ingo Molnar
    Cc: Eric Dumazet
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bjorn Steinbrink
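
    The two races can be illustrated with a small publication sketch. C11
    release/acquire atomics stand in here for the kernel's memory barriers,
    and the table layout and names are hypothetical, not the timer_stats code.
    The writer finishes initializing an entry before publishing it, and a
    reset unpublishes every slot so no stale "pre-inserted" entry survives:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define TABLE_SIZE 16

struct entry {
	int key;
	long count;
};

static struct entry entries[TABLE_SIZE];
/* Slot pointers that concurrent readers may load. */
static _Atomic(struct entry *) slots[TABLE_SIZE];

/* Writer: initialize the entry completely, then publish it (release). */
static void publish(size_t slot, int key)
{
	struct entry *e = &entries[slot];

	e->key = key;
	e->count = 0;
	atomic_store_explicit(&slots[slot], e, memory_order_release);
}

/* Reader: acquire pairs with the release, so key/count are initialized. */
static struct entry *lookup(size_t slot, int key)
{
	struct entry *e = atomic_load_explicit(&slots[slot], memory_order_acquire);

	return (e && e->key == key) ? e : NULL;
}

/* Reset: unpublish every slot before the entry array is reused. */
static void reset_table(void)
{
	for (size_t i = 0; i < TABLE_SIZE; i++)
		atomic_store(&slots[i], NULL);
}
```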
     

30 May, 2007

1 commit

  • get_next_timer_interrupt() returns a delta of (LONG_MAX >> 1) in case
    there is no timer pending. On 64 bit machines this results in a
    multiplication overflow in tick_nohz_stop_sched_tick().

    Reported by: Dave Miller

    Make the return value a constant and limit the return value to a 32 bit
    value.

    When the max timeout value is returned, we can safely stop the tick
    timer device. The max jiffies delta results in a timeout of about 12
    days for HZ=1000.

    In the long term the get_next_timer_interrupt() code needs to be
    reworked to return ktime instead of jiffies, but we have to wait until
    the last users of the original NO_IDLE_HZ code are converted.

    Signed-off-by: Thomas Gleixner
    Acked-off-by: David S. Miller
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
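
    The overflow and the effect of the cap can be checked with a few lines of
    arithmetic. This sketch assumes HZ=1000 (1 jiffy = 1000000 ns) and a cap of
    2^30 - 1 jiffies, chosen here because it fits in 32 bits and matches the
    "about 12 days" figure; the macro and function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cap: 2^30 - 1 jiffies, comfortably within 32 bits. */
#define MAX_DELTA_JIFFIES ((1UL << 30) - 1)

/* Would delta_jiffies * ns_per_jiffy overflow a signed 64-bit value? */
static int ns_mul_overflows(uint64_t delta_jiffies, uint64_t ns_per_jiffy)
{
	return delta_jiffies > (uint64_t)INT64_MAX / ns_per_jiffy;
}

/* Limit the "no timer pending" delta to the cap. */
static unsigned long clamp_delta(unsigned long delta)
{
	return delta > MAX_DELTA_JIFFIES ? MAX_DELTA_JIFFIES : delta;
}
```

    A delta on the order of LONG_MAX / 2 on a 64-bit machine (about 2^62)
    overflows when multiplied by 1000000 ns, while the capped delta gives
    roughly 1.07e15 ns, i.e. about 1073741 s or 12.4 days.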
     

24 May, 2007

2 commits

  • The warning in the NOHZ code, which triggers when a CPU goes idle with
    softirqs pending, can fill up the logs quite quickly. Rate limit the
    output until we have found the root cause of that problem.

    Signed-off-by: Thomas Gleixner
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
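
    A minimal sketch of one way to keep such a warning from flooding the logs,
    assuming a simple cap on the total number of messages rather than a
    time-windowed rate limiter. The function name, the cap of 10 and the
    message text are assumptions for illustration:

```c
#include <assert.h>
#include <stdio.h>

#define WARN_LIMIT 10   /* assumed cap on the number of messages */

/* Returns 1 if the warning was printed, 0 if it was suppressed. */
static int warn_softirq_pending(unsigned int pending)
{
	static int printed;   /* persists across calls */

	if (printed >= WARN_LIMIT)
		return 0;
	printed++;
	fprintf(stderr, "NOHZ: local_softirq_pending %02x\n", pending);
	return 1;
}
```

    A cap is the simplest form; a true rate limiter would also reset the
    counter after some interval so that later occurrences are still reported.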
     
  • Booting an SMP kernel with maxcpus=1 on an SMP system leads to a hard
    hang, because ACPI ignores the maxcpus setting and sends timer broadcast
    info for the offline CPUs. This results in a call to
    smp_call_function_single() on an offline CPU that is stuck forever.

    Ignore the bogus information and print a kernel error to remind ACPI
    folks to fix it.

    Signed-off-by: Thomas Gleixner
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

22 May, 2007

1 commit

  • The first thing mm.h does is include sched.h, solely for the
    can_do_mlock() inline function, which dereferences "current". By dealing
    with can_do_mlock(), mm.h can be detached from sched.h, which is good.
    See below for why.

    This patch
    a) removes unconditional inclusion of sched.h from mm.h
    b) makes can_do_mlock() a normal function in mm/mlock.c
    c) exports can_do_mlock() to not break compilation
    d) adds sched.h inclusions back to files that were getting it indirectly.
    e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
    getting them indirectly

    Net result is:
    a) mm.h users would get less code to open, read, preprocess, parse, ... if
    they don't need sched.h
    b) sched.h stops being a dependency for a significant number of files:
    on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
    after patch it's only 3744 (-8.3%).

    Cross-compile tested on

    all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
    alpha alpha-up
    arm
    i386 i386-up i386-defconfig i386-allnoconfig
    ia64 ia64-up
    m68k
    mips
    parisc parisc-up
    powerpc powerpc-up
    s390 s390-up
    sparc sparc-up
    sparc64 sparc64-up
    um-x86_64
    x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig

    as well as my two usual configs.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

15 May, 2007

2 commits

  • lockdep complains about the lock nesting of clocksource and watchdog lock
    in the resume path.

    Change the resume marker to a bit operation and remove the lock from this
    path.

    Signed-off-by: Thomas Gleixner
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
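
    Replacing a lock-protected resume marker with a bit operation can be
    sketched with C11 atomics: the resume path sets a bit without taking any
    lock, and the watchdog atomically tests and clears it. The variable, bit
    and function names are hypothetical:

```c
#include <assert.h>
#include <stdatomic.h>

#define WD_RESUME_BIT 0               /* hypothetical bit number */

static atomic_ulong watchdog_flags;   /* zero-initialized */

/* Resume path: just set the bit, no lock taken. */
static void clocksource_mark_resumed(void)
{
	atomic_fetch_or(&watchdog_flags, 1UL << WD_RESUME_BIT);
}

/* Watchdog: test-and-clear; returns 1 if a resume was pending. */
static int watchdog_consume_resume(void)
{
	unsigned long old = atomic_fetch_and(&watchdog_flags,
					     ~(1UL << WD_RESUME_BIT));

	return (old >> WD_RESUME_BIT) & 1UL;
}
```

    Because both sides are single atomic operations, there is no second lock
    to nest inside the clocksource lock, which is what lockdep objected to.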
     
  • Moving the timekeeping code to kernel/time/timekeeping.c broke the
    clocksource resume logic patch, which was applied to the old file by
    fuzzy patch application. Fix it up and move the clocksource_resume()
    call to the appropriate place.

    Signed-off-by: Thomas Gleixner
    [ tssk, tssk, everybody should use --fuzz=0 ]
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

10 May, 2007

3 commits

  • We need to make sure that the clocksources are resumed when timekeeping
    is resumed. The current resume logic does not guarantee this.

    Add a resume function pointer to the clocksource struct, so clocksource
    drivers which need to reinitialize the clocksource can provide a resume
    function.

    Add a resume function, which calls any available clocksource resume
    functions and resets the watchdog function, so a stable TSC can be used
    across suspend/resume.

    Signed-off-by: Thomas Gleixner
    Cc: john stultz
    Cc: Andi Kleen
    Cc: Ingo Molnar
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
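
    The optional resume hook can be sketched as a nullable function pointer
    in the clocksource descriptor, with a resume routine that calls every
    hook that is present and then resets the watchdog. The trimmed struct,
    the list linkage and all names are hypothetical, not the kernel's
    `struct clocksource`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily trimmed clocksource descriptor. */
struct clocksource_model {
	const char *name;
	void (*resume)(void);             /* optional: NULL if not needed */
	struct clocksource_model *next;
};

static int example_resumed;   /* set by the example driver's hook */
static int watchdog_reset;    /* stands in for resetting the watchdog */

static void example_resume(void)
{
	example_resumed = 1;  /* a driver would reprogram its hardware here */
}

/* Call each available resume hook, then restart the watchdog. */
static void clocksources_resume(struct clocksource_model *list)
{
	for (struct clocksource_model *cs = list; cs; cs = cs->next)
		if (cs->resume)
			cs->resume();
	watchdog_reset = 1;
}
```

    Drivers that need no reinitialization simply leave `resume` as NULL.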
     
  • u64 and s64 are not necessarily 'long long' on some 64-bit
    platforms, so cast the type explicitly to kill the compiler warnings.

    Also consistently use '%Lu', which is unsigned.

    Signed-off-by: David S. Miller
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Miller
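
    A user-space analogue of the explicit cast, as a minimal sketch: the
    commit uses the kernel's `%Lu`, while portable standard C code would use
    `%llu` with a cast to `unsigned long long`. The helper name is
    hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Print a 64-bit unsigned value without assuming uint64_t is long long. */
static void format_u64(char *buf, size_t len, uint64_t v)
{
	/* The cast makes the argument match %llu on every platform. */
	snprintf(buf, len, "%llu", (unsigned long long)v);
}
```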
     
  • There's more that need fixing, and fix my own subject spelling error too.

    Signed-off-by: Daniel Walker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Walker
     

09 May, 2007

3 commits

  • Fix the process idle load balancing in the presence of dynticks. CPUs
    whose ticks are stopped sleep until the next event wakes them up. These
    sleeps can potentially last a long time, and currently no periodic idle
    load balancing is done during them.

    This patch nominates an owner among the idle CPUs, which does the idle
    load balancing on behalf of the other idle CPUs. Once all the CPUs are
    completely idle, we can stop this idle load balancing too. The checks
    added in the fast path are minimized. Whenever there are busy CPUs in the
    system, there will be an owner (an idle CPU) doing the system-wide idle
    load balancing.

    Open items:
    1. Intelligent owner selection (like an idle core in a busy package).
    2. Merge with rcu's nohz_cpu_mask?

    Signed-off-by: Suresh Siddha
    Acked-by: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Siddha, Suresh B
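
    The owner rule can be sketched as follows, assuming the naive choice of
    the lowest-numbered idle CPU (the commit lists smarter selection as an
    open item). `pick_ilb_owner` and `NO_OWNER` are hypothetical names:

```c
#include <assert.h>
#include <limits.h>

#define NO_OWNER (-1)

/*
 * idle_mask: bit n set => CPU n is idle with its tick stopped.
 * Returns the idle-balancing owner, or NO_OWNER when no CPU is idle
 * or when every CPU is idle (balancing can stop entirely).
 */
static int pick_ilb_owner(unsigned long idle_mask, int nr_cpus)
{
	int bits = (int)(CHAR_BIT * sizeof(unsigned long));
	unsigned long all = nr_cpus >= bits ? ~0UL : (1UL << nr_cpus) - 1;

	idle_mask &= all;
	if (idle_mask == 0 || idle_mask == all)
		return NO_OWNER;
	for (int cpu = 0; cpu < nr_cpus; cpu++)
		if (idle_mask & (1UL << cpu))
			return cpu;   /* naive choice: lowest idle CPU */
	return NO_OWNER;
}
```

    With 4 CPUs where CPUs 1 and 2 are idle, CPU 1 becomes the owner; once all
    4 are idle, nobody does idle balancing.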
     
  • While the !highres/!dyntick code assigns the duty of the do_timer() call to
    one specific CPU, this was dropped in the highres/dyntick part during
    development.

    Steven Rostedt discovered the xtime lock contention on highres/dyntick due
    to several CPUs trying to update jiffies.

    Add the single CPU assignment back. In the dyntick case this needs to be
    handled carefully, as the CPU which has the do_timer() duty must drop the
    assignment and let it be grabbed by another CPU, which is active.
    Otherwise the do_timer() calls would not happen during the long sleep.

    Signed-off-by: Thomas Gleixner
    Acked-by: Ingo Molnar
    Cc: Steven Rostedt
    Acked-by: Mark Lord
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
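
    The single-owner scheme can be sketched as below, assuming -1 means "duty
    unassigned". Only the owner advances jiffies on each tick, an owner that
    enters a long dyntick sleep drops the duty, and the next active CPU to
    tick grabs it. All names are hypothetical:

```c
#include <assert.h>

#define NO_TIMER_CPU (-1)

static int do_timer_cpu = NO_TIMER_CPU;  /* owner of the jiffies update */
static unsigned long jiffies_model;      /* stand-in for jiffies */

/* Per-CPU tick: grab a free duty; only the owner advances jiffies. */
static void tick(int cpu)
{
	if (do_timer_cpu == NO_TIMER_CPU)
		do_timer_cpu = cpu;
	if (do_timer_cpu == cpu)
		jiffies_model++;
}

/* Entering a long dyntick sleep: the owner gives the duty away. */
static void enter_long_idle(int cpu)
{
	if (do_timer_cpu == cpu)
		do_timer_cpu = NO_TIMER_CPU;
}
```

    One tick period on 4 CPUs advances jiffies once instead of four times,
    which removes the xtime lock contention; after the owner goes idle, an
    active CPU takes over.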
     
  • kallsyms_lookup() can iterate over the modules list unprotected, which is
    OK for emergency situations (oops), but not OK for regular users like
    /proc/*/wchan.

    Introduce lookup_symbol_name()/lookup_module_symbol_name() which copy symbol
    name into caller-supplied buffer or return -ERANGE. All copying is done with
    module_mutex held, so...

    Signed-off-by: Alexey Dobriyan
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
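
    The "copy into a caller-supplied buffer under the lock" pattern can be
    sketched in user space with a pthread mutex standing in for
    module_mutex. The symbol table, addresses and `lookup_name` are
    hypothetical; only the -ERANGE return mirrors the description above:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical symbol table guarded by a mutex. */
struct sym {
	unsigned long addr;
	const char *name;
};

static const struct sym table[] = {
	{ 0x1000, "do_timer" },
	{ 0x2000, "tick_sched_timer" },
};
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Copy the name for addr into buf (len > 0) with the lock held. */
static int lookup_name(unsigned long addr, char *buf, size_t len)
{
	int ret = -ERANGE;   /* returned when addr is not found */

	pthread_mutex_lock(&table_lock);
	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if (table[i].addr == addr) {
			strncpy(buf, table[i].name, len - 1);
			buf[len - 1] = '\0';
			ret = 0;
			break;
		}
	}
	pthread_mutex_unlock(&table_lock);
	return ret;
}
```

    Because the name is copied before the lock is released, the caller never
    holds a pointer into data that could disappear when a module unloads.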