14 Jan, 2008

1 commit

  • task_ppid_nr_ns is called in three places. One of these should never
    have called it. In the other two, using it broke the existing
    semantics. This was presumably accidental. If the function had not
    been there, it would have been much more obvious to the eye that those
    patches were changing the behavior. We don't need this function.

    In task_state, the pid of the ptracer is not the ppid of the ptracer.

    In do_task_stat, ppid is the tgid of the real_parent, not its pid.
    I also moved the call outside of lock_task_sighand, since it doesn't
    need it.

    In sys_getppid, ppid is the tgid of the real_parent, not its pid.

    Signed-off-by: Roland McGrath
    Signed-off-by: Linus Torvalds

    Roland McGrath
     

19 Dec, 2007

1 commit

  • This patch fixes the following section mismatches with CONFIG_HOTPLUG=n,
    CONFIG_HOTPLUG_CPU=y:

    ...
    WARNING: vmlinux.o(.text+0x41cd3): Section mismatch: reference to .init.data:tvec_base_done.22610 (between 'timer_cpu_notify' and 'run_timer_softirq')
    WARNING: vmlinux.o(.text+0x41d67): Section mismatch: reference to .init.data:tvec_base_done.22610 (between 'timer_cpu_notify' and 'run_timer_softirq')
    ...

    Signed-off-by: Adrian Bunk
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Adrian Bunk
     

10 Nov, 2007

1 commit

  • Since powerpc started using CONFIG_GENERIC_CLOCKEVENTS, the
    deterministic CPU accounting (CONFIG_VIRT_CPU_ACCOUNTING) has been
    broken on powerpc, because we end up counting user time twice: once in
    timer_interrupt() and once in update_process_times().

    This fixes the problem by pulling the code in update_process_times
    that updates utime and stime into a separate function called
    account_process_tick. If CONFIG_VIRT_CPU_ACCOUNTING is not defined,
    there is a version of account_process_tick in kernel/timer.c that
    simply accounts a whole tick to either utime or stime as before. If
    CONFIG_VIRT_CPU_ACCOUNTING is defined, then arch code gets to
    implement account_process_tick.
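
    The split described above can be modelled in plain C. This is only a
    sketch of the shape of the change, not the kernel code: the struct, the
    _sketch function names and the one-unit-per-tick accounting are
    assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields of task_struct that matter here. */
struct task_sketch {
    unsigned long utime;   /* ticks charged to user mode */
    unsigned long stime;   /* ticks charged to kernel mode */
};

#ifndef CONFIG_VIRT_CPU_ACCOUNTING
/* Generic fallback: charge one whole tick to either utime or stime. */
static void account_process_tick_sketch(struct task_sketch *p, int user_tick)
{
    assert(p != NULL);
    if (user_tick)
        p->utime += 1;
    else
        p->stime += 1;
}
#endif

/* The tick handler no longer touches utime/stime itself; it delegates. */
static void update_process_times_sketch(struct task_sketch *p, int user_tick)
{
    account_process_tick_sketch(p, user_tick);
    /* Timer processing and scheduler work would follow here. */
}
```

    With CONFIG_VIRT_CPU_ACCOUNTING defined, the fallback disappears and an
    architecture would have to supply its own account_process_tick_sketch().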

    This also lets us simplify the s390 code a bit; it means that the s390
    timer interrupt can now call update_process_times even when
    CONFIG_VIRT_CPU_ACCOUNTING is turned on, and can just implement a
    suitable account_process_tick().

    account_process_tick() now takes the task_struct * as an argument.
    Tested both with and without CONFIG_VIRT_CPU_ACCOUNTING.

    Signed-off-by: Paul Mackerras
    Signed-off-by: Ingo Molnar

    Paul Mackerras
     

06 Nov, 2007

1 commit


20 Oct, 2007

1 commit

  • This is the largest patch in the set. Make all (I hope) the places where
    the pid is shown to or obtained from the user operate on the virtual pids.

    The idea is:
    - all in-kernel data structures must store either struct pid itself
    or the pid's global nr, obtained with pid_nr() call;
    - when seeking the task from kernel code with the stored id one
    should use find_task_by_pid() call that works with global pids;
    - when showing pid's numerical value to the user the virtual one
    should be used; however, when one shows a task's pid outside this
    task's namespace the global one is to be used;
    - when getting the pid from userspace one needs to consider this as
    the virtual one and use appropriate task/pid-searching functions.

    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: nuther build fix]
    [akpm@linux-foundation.org: yet nuther build fix]
    [akpm@linux-foundation.org: remove unneeded casts]
    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Alexey Dobriyan
    Cc: Sukadev Bhattiprolu
    Cc: Oleg Nesterov
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

19 Oct, 2007

2 commits

  • This adds items to the taskstats struct to account for user and system
    time based on scaling the CPU frequency and instruction issue rates.

    Adds account_(user|system)_time_scaled callbacks which architectures
    can use to account for time using this mechanism.

    Signed-off-by: Michael Neuling
    Cc: Balbir Singh
    Cc: Jay Lan
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Neuling
     
  • Signed-off-by: Daniel Walker
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Walker
     

21 Jul, 2007

2 commits


20 Jul, 2007

1 commit


18 Jul, 2007

1 commit

  • kmalloc_node() and kmem_cache_alloc_node() were not available in a zeroing
    variant in the past. But with __GFP_ZERO it is possible now to do zeroing
    while allocating.

    Use __GFP_ZERO to remove the explicit clearing of memory via memset wherever
    we can.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

17 Jul, 2007

2 commits

  • Add a flag in /proc/timer_stats to indicate deferrable timers. This will
    let developers/users differentiate between types of timers in
    /proc/timer_stats.

    Deferrable timer and normal timer will appear in /proc/timer_stats as below.
    10D, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)
    10, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)

    Also, the version of timer_stats changes from v0.1 to v0.2.

    Signed-off-by: Venkatesh Pallipadi
    Acked-by: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Venki Pallipadi
     
  • Commit 411187fb05cd11676b0979d9fbf3291db69dbce2 caused uptime not to increase
    during suspend. This may cause confusion so I restore the old behaviour by
    using the boot based time instead of monotonic for uptime.

    Signed-off-by: Tomas Janousek
    Acked-by: John Stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tomas Janousek
     

30 May, 2007

1 commit

  • get_next_timer_interrupt() returns a delta of (LONG_MAX >> 1) in case
    there is no timer pending. On 64 bit machines this results in a
    multiplication overflow in tick_nohz_stop_sched_tick().

    Reported by: Dave Miller

    Make the return value a constant and limit the return value to a 32 bit
    value.

    When the max timeout value is returned, we can safely stop the tick
    timer device. The max jiffies delta results in a 12 days timeout for
    HZ=1000.
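
    The arithmetic behind the 12-day figure and the overflow can be checked
    with a small sketch. The constant below, (1 << 30) - 1, is an assumed
    value for the capped delta that is consistent with the numbers in this
    message; the helper names are hypothetical, and the overflow is modelled
    as jiffies multiplied by nanoseconds per tick.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed capped "no timer pending" delta; it fits in 32 bits. */
#define NEXT_TIMER_MAX_DELTA_SKETCH ((UINT64_C(1) << 30) - 1)

/* Nanoseconds per jiffy for a given tick rate. */
static uint64_t nsec_per_jiffy(uint64_t hz)
{
    assert(hz > 0);
    return UINT64_C(1000000000) / hz;
}

/* Whole days covered by `delta` jiffies at `hz` ticks per second. */
static uint64_t delta_to_days(uint64_t delta, uint64_t hz)
{
    assert(hz > 0);
    return delta / hz / 86400;
}

/* Returns 1 if delta * nsec_per_jiffy(hz) fits in a signed 64-bit value. */
static int delta_fits_in_s64(uint64_t delta, uint64_t hz)
{
    return delta <= (uint64_t)INT64_MAX / nsec_per_jiffy(hz);
}
```

    At HZ=1000 the capped delta converts to nanoseconds without overflow,
    whereas a delta of roughly half the 64-bit LONG_MAX does not.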

    In the long term the get_next_timer_interrupt() code needs to be
    reworked to return ktime instead of jiffies, but we have to wait until
    the last users of the original NO_IDLE_HZ code are converted.

    Signed-off-by: Thomas Gleixner
    Acked-by: David S. Miller
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

15 May, 2007

1 commit

  • Moving the timekeeping code to kernel/time/timekeeping.c broke the
    clocksource resume logic patch, which got applied to the old file by a
    fuzzy application. Fix it up and move the clocksource_resume() call to
    the appropriate place.

    Signed-off-by: Thomas Gleixner
    [ tssk, tssk, everybody should use --fuzz=0 ]
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

11 May, 2007

1 commit

  • On 09-05-2007 21:10, Pallipadi, Venkatesh wrote:
    ...
    > On a 64 bit system, converting pointer to int causes unnecessary
    > compiler warning, and intermediate long conversion was to avoid that.
    > I will have to rephrase my comment to remove 32 bit value and use int,
    > as that is what the function returns.

    So, this patch reverts all changes done by my previous patch.

    I apologize for my wrong comment about "logical error" here.

    Cc: "Pallipadi, Venkatesh"
    Cc: Satyam Sharma
    Cc: Oleg Nesterov
    Signed-off-by: Jarek Poplawski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    akpm@linux-foundation.org
     

10 May, 2007

3 commits

  • We need to make sure that the clocksources are resumed, when timekeeping is
    resumed. The current resume logic does not guarantee this.

    Add a resume function pointer to the clocksource struct, so clocksource
    drivers which need to reinitialize the clocksource can provide a resume
    function.

    Add a resume function, which calls the clocksource resume functions
    where they are available and resets the watchdog function, so a stable
    TSC can be used across suspend/resume.

    Signed-off-by: Thomas Gleixner
    Cc: john stultz
    Cc: Andi Kleen
    Cc: Ingo Molnar
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • Since nonboot CPUs are now disabled after tasks and devices have been
    frozen and the CPU hotplug infrastructure is used for this purpose, we need
    special CPU hotplug notifications that will help the CPU-hotplug-aware
    subsystems distinguish normal CPU hotplug events from CPU hotplug events
    related to a system-wide suspend or resume operation in progress. This
    patch introduces such notifications and causes them to be used during
    suspend and resume transitions. It also changes all of the
    CPU-hotplug-aware subsystems to take these notifications into consideration
    (for now they are handled in the same way as the corresponding "normal"
    ones).

    [oleg@tv-sign.ru: cleanups]
    Signed-off-by: Rafael J. Wysocki
    Cc: Gautham R Shenoy
    Cc: Pavel Machek
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • Signed-off-by: Jarek Poplawski
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jarek Poplawski
     

09 May, 2007

3 commits

  • There are many places in the kernel where a construction like

    foo = list_entry(head->next, struct foo_struct, list);

    is used.
    The code might look more descriptive and neat if it used the macro

    #define list_first_entry(head, type, member) \
            list_entry((head)->next, type, member)

    Here is the macro itself and examples of its usage in the generic code.
    If it turns out to be useful, I can prepare a set of patches to
    inject it into arch-specific code, drivers, networking, etc.
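
    A userspace sketch of how the macro reads in practice follows. The
    list_head type, list_entry and the two helpers are minimal stand-ins
    written for this illustration, and foo_struct is a hypothetical payload
    type; only list_first_entry itself comes from the message above.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-ins for the kernel's list primitives. */
struct list_head {
    struct list_head *next, *prev;
};

#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define list_first_entry(head, type, member) \
    list_entry((head)->next, type, member)

/* Hypothetical payload type used only for this illustration. */
struct foo_struct {
    int value;
    struct list_head list;
};

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Append `node` just before `head`, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Build a two-element list and return the value stored in its first entry. */
static int first_value_demo(void)
{
    struct list_head head;
    struct foo_struct a = { .value = 10 }, b = { .value = 20 };
    struct foo_struct *foo;

    list_init(&head);
    list_add_tail(&a.list, &head);
    list_add_tail(&b.list, &head);

    foo = list_first_entry(&head, struct foo_struct, list);
    assert(foo == &a);
    return foo->value;
}
```

    The macro is only meaningful on a non-empty list: on an empty one,
    head->next points back at the head itself, which is not embedded in a
    foo_struct.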

    Signed-off-by: Pavel Emelianov
    Signed-off-by: Kirill Korotaev
    Cc: Randy Dunlap
    Cc: Andi Kleen
    Cc: Zach Brown
    Cc: Davide Libenzi
    Cc: John McCutchan
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: john stultz
    Cc: Ram Pai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelianov
     
  • Move the timekeeping code out of kernel/timer.c and into
    kernel/time/timekeeping.c. I made no cleanups or other changes in transit.

    [akpm@linux-foundation.org: build fix]
    Signed-off-by: John Stultz
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    john stultz
     
  • Introduce a new flag for timers - deferrable: timers that work normally
    when the system is busy, but will not cause the CPU to come out of idle
    (just to service this timer) when the CPU is idle. Instead, such a timer
    will be serviced when the CPU eventually wakes up for a subsequent
    non-deferrable timer.

    The main advantage of this is to avoid unnecessary timer interrupts when
    the CPU is idle. If the routine currently called by a timer can wait until
    the next event without any issues, this new timer can be used to set up a
    timer event for that routine. This, with dynticks, allows CPUs to be lazy,
    allowing them to stay idle for extended periods of time by reducing
    unnecessary wakeups and thereby reducing power consumption.

    This patch:

    Builds this new timer on top of the existing timer infrastructure. It uses
    the last bit in the 'base' pointer of the timer_list structure to store
    the deferrable timer flag. The __next_timer_interrupt() function skips
    over these deferrable timers when the CPU looks for the next timer event
    for which it has to wake up.

    This is exported by a new interface init_timer_deferrable() that can be
    called in place of regular init_timer().
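
    Storing a flag in the last bit of a pointer works because a sufficiently
    aligned pointer always has that bit clear. The sketch below shows the
    technique in isolation; the struct and function names are hypothetical
    and are not the kernel's helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in; the real per-CPU base structure is far larger. */
struct timer_base_sketch {
    long dummy;   /* alignment is greater than 1, so bit 0 of a pointer is free */
};

struct timer_sketch {
    uintptr_t base;   /* pointer to the base, with the flag kept in bit 0 */
};

#define DEFERRABLE_FLAG ((uintptr_t)0x1)

/* Replace the pointer bits while preserving the flag bit. */
static void timer_set_base(struct timer_sketch *t, struct timer_base_sketch *b)
{
    assert(((uintptr_t)b & DEFERRABLE_FLAG) == 0);
    t->base = (uintptr_t)b | (t->base & DEFERRABLE_FLAG);
}

static void timer_set_deferrable(struct timer_sketch *t)
{
    t->base |= DEFERRABLE_FLAG;
}

static int timer_is_deferrable(const struct timer_sketch *t)
{
    return (int)(t->base & DEFERRABLE_FLAG);
}

/* Mask the flag off before the value is used as a pointer again. */
static struct timer_base_sketch *timer_get_base(const struct timer_sketch *t)
{
    return (struct timer_base_sketch *)(t->base & ~DEFERRABLE_FLAG);
}
```

    Every reader of the pointer has to go through the masking accessor,
    which is the main cost of this encoding.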

    [akpm@linux-foundation.org: Privatise a #define]
    Signed-off-by: Venkatesh Pallipadi
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Oleg Nesterov
    Cc: Dave Jones
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Venki Pallipadi
     

27 Apr, 2007

1 commit


08 Apr, 2007

1 commit

  • Soeren Sonnenburg reported that upon resume he is getting
    this backtrace:

    [] smp_apic_timer_interrupt+0x57/0x90
    [] retrigger_next_event+0x0/0xb0
    [] apic_timer_interrupt+0x28/0x30
    [] retrigger_next_event+0x0/0xb0
    [] __kfifo_put+0x8/0x90
    [] on_each_cpu+0x35/0x60
    [] clock_was_set+0x18/0x20
    [] timekeeping_resume+0x7c/0xa0
    [] __sysdev_resume+0x11/0x80
    [] sysdev_resume+0x47/0x80
    [] device_power_up+0x5/0x10

    It turns out that on resume we mistakenly re-enable interrupts too
    early. Do the timer retrigger only on the current CPU.

    Signed-off-by: Ingo Molnar
    Acked-by: Thomas Gleixner
    Acked-by: Soeren Sonnenburg
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     

26 Mar, 2007

1 commit

  • The rework of next_timer_interrupt() fixed the timer wheel bugs, but
    introduced a rounding error versus the next hrtimer event. This is caused
    by the conversion of the hrtimer internal representation to relative
    jiffies.

    This causes bug #8100:
    http://bugzilla.kernel.org/show_bug.cgi?id=8100

    next_timer_interrupt() returns "now" in such a case and causes the code
    in tick_nohz_stop_sched_tick() to trigger the timer softirq, which is
    bogus as no timer is due for expiry. This results in an endless context
    switching between idle and ksoftirqd until a timer is due for expiry.

    Modify the hrtimer evaluation so that it returns now + 1 when the
    conversion results in a delta < 1 jiffy.
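
    The rounding fix amounts to clamping the converted delta to at least
    one jiffy. A minimal sketch, assuming a truncating nanosecond-to-jiffy
    conversion; the function name and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert the nanosecond delta to the next hrtimer event into an absolute
 * jiffies value. nsec_per_jiffy is a parameter so the sketch does not
 * depend on any particular HZ setting.
 */
static unsigned long next_event_jiffies(unsigned long now, int64_t delta_ns,
                                        int64_t nsec_per_jiffy)
{
    int64_t delta;

    assert(nsec_per_jiffy > 0);

    /* An event that is already due is reported for the next tick. */
    if (delta_ns <= 0)
        return now + 1;

    /* Truncating division: any delta below one jiffy becomes 0 here. */
    delta = delta_ns / nsec_per_jiffy;

    /* Never report "now", which would retrigger the softirq for nothing. */
    if (delta < 1)
        delta = 1;

    return now + (unsigned long)delta;
}
```

    With a 1 ms jiffy, an event 0.5 ms away is reported as now + 1 rather
    than now, so the caller does not raise the timer softirq for nothing.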

    This is confirmed to resolve bug #8100.

    Reported-by: Emil Karlson
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

07 Mar, 2007

2 commits

  • I've only seen this on x86_64.

    The vsyscall state only gets updated when a timer interrupt comes in. So
    if the time is set long before the next timer, there will be a period when
    a gettimeofday() won't reflect the correct time.

    I added an explicit update_vsyscall() during the settimeofday(), that way
    the vsyscall state doesn't get stale.

    Signed-off-by: Daniel Walker
    Cc: Thomas Gleixner
    Acked-by: Ingo Molnar
    Acked-by: John Stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Walker
     
  • The programming of periodic tick devices needs to be saved/restored
    across suspend/resume - otherwise we might end up with a system coming
    up that relies on getting a PIT (or HPET) interrupt, while those devices
    default to 'no interrupts' after powerup. (To confuse things it worked
    to a certain degree on some systems because the lapic gets initialized
    as a side-effect of SMP bootup.)

    This suspend / resume thing was dropped unintentionally during the
    last-minute -mm code reshuffling.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

05 Mar, 2007

1 commit

  • Doing something like this on a two cpu system

    # echo 0 > /sys/devices/system/cpu/cpu0/online
    # echo 1 > /sys/devices/system/cpu/cpu0/online
    # echo 0 > /sys/devices/system/cpu/cpu1/online

    will give me this:

    =======================================================
    [ INFO: possible circular locking dependency detected ]
    2.6.21-rc2-g562aa1d4-dirty #7
    -------------------------------------------------------
    bash/1282 is trying to acquire lock:
    (&cpu_base->lock_key){.+..}, at: [] hrtimer_cpu_notify+0xc6/0x240

    but task is already holding lock:
    (&cpu_base->lock_key#2){.+..}, at: [] hrtimer_cpu_notify+0xbc/0x240

    which lock already depends on the new lock.

    This happens because we have the following code in kernel/hrtimer.c:

    migrate_hrtimers(int cpu)
    [...]
    old_base = &per_cpu(hrtimer_bases, cpu);
    new_base = &get_cpu_var(hrtimer_bases);
    [...]
    spin_lock(&new_base->lock);
    spin_lock(&old_base->lock);

    Which means the spinlocks are taken in an order which depends on which cpu
    gets shut down from which other cpu. Therefore lockdep complains that there
    might be an ABBA deadlock. Since migrate_hrtimers() gets only called on
    cpu hotplug it's safe to assume that it isn't executed concurrently on a

    The same problem exists in kernel/timer.c: migrate_timers().

    As pointed out by Christian Borntraeger one possible solution to avoid
    the locking order complaints would be to make sure that the locks are
    always taken in the same order. E.g. by taking the lock of the cpu with
    the lower number first.

    To achieve this we introduce two new spinlock functions double_spin_lock
    and double_spin_unlock which lock or unlock two locks in a given order.
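
    The ordering idea can be sketched in userspace with a toy spinlock built
    on C11 atomic_flag. Everything here is illustrative: the toy lock, the
    _sketch helpers and their l1_first parameter are assumptions, not the
    kernel's spinlock API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy spinlock for illustration; not the kernel's spinlock_t. */
struct toy_spinlock {
    atomic_flag locked;
};

static void toy_spin_lock(struct toy_spinlock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ;   /* busy-wait until the holder releases the lock */
}

static void toy_spin_unlock(struct toy_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Take both locks; l1 is taken first when l1_first is true. */
static void double_lock_sketch(struct toy_spinlock *l1,
                               struct toy_spinlock *l2, bool l1_first)
{
    assert(l1 != l2);
    if (l1_first) {
        toy_spin_lock(l1);
        toy_spin_lock(l2);
    } else {
        toy_spin_lock(l2);
        toy_spin_lock(l1);
    }
}

/* Release both locks in the reverse of the acquisition order. */
static void double_unlock_sketch(struct toy_spinlock *l1,
                                 struct toy_spinlock *l2, bool l1_first)
{
    if (l1_first) {
        toy_spin_unlock(l2);
        toy_spin_unlock(l1);
    } else {
        toy_spin_unlock(l1);
        toy_spin_unlock(l2);
    }
}

/* Ordering rule from the text: the lower-numbered CPU's lock goes first. */
static bool lower_cpu_first(int cpu_a, int cpu_b)
{
    return cpu_a < cpu_b;
}
```

    A caller migrating timers from one CPU to another would pass
    lower_cpu_first(new_cpu, old_cpu) as l1_first, so both possible
    migration directions acquire the two locks in the same global order.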

    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Roman Zippel
    Cc: John Stultz
    Cc: Christian Borntraeger
    Cc: Martin Schwidefsky
    Signed-off-by: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     

02 Mar, 2007

2 commits


17 Feb, 2007

10 commits

  • Provides generic infrastructure for vsyscall-gtod.

    [akpm@osdl.org: cleanup]
    Signed-off-by: John Stultz
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Andi Kleen
    Cc: Roman Zippel

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    john stultz
     
  • Add /proc/timer_stats support: debugging feature to profile timer expiration.
    The starting site, the process/PID and the expiration function are all
    captured. This allows the quick identification of timer event sources in
    a system.

    Sample output:

    # echo 1 > /proc/timer_stats
    # cat /proc/timer_stats
    Timer Stats Version: v0.1
    Sample period: 4.010 s
    24, 0 swapper hrtimer_stop_sched_tick (hrtimer_sched_tick)
    11, 0 swapper sk_reset_timer (tcp_delack_timer)
    6, 0 swapper hrtimer_stop_sched_tick (hrtimer_sched_tick)
    2, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)
    17, 0 swapper hrtimer_restart_sched_tick (hrtimer_sched_tick)
    2, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)
    4, 2050 pcscd do_nanosleep (hrtimer_wakeup)
    5, 4179 sshd sk_reset_timer (tcp_write_timer)
    4, 2248 yum-updatesd schedule_timeout (process_timeout)
    18, 0 swapper hrtimer_restart_sched_tick (hrtimer_sched_tick)
    3, 0 swapper sk_reset_timer (tcp_delack_timer)
    1, 1 swapper neigh_table_init_no_netlink (neigh_periodic_timer)
    2, 1 swapper e1000_up (e1000_watchdog)
    1, 1 init schedule_timeout (process_timeout)
    100 total events, 25.24 events/sec

    [ cleanups and hrtimers support from Thomas Gleixner ]
    [bunk@stusta.de: nr_entries can become static]
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner
    Cc: john stultz
    Cc: Roman Zippel
    Cc: Andi Kleen
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • With Ingo Molnar

    Add functions to provide dynamic ticks and high resolution timers. The code
    which keeps track of jiffies and handles the long idle periods is shared
    between tick based and high resolution timer based dynticks. The dyntick
    functionality can be disabled on the kernel commandline. Provide also the
    infrastructure to support high resolution timers.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Cc: john stultz
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • Architectures register their clock event devices, in the clock events core.
    Users of the clockevents core can get clock event devices for their use. The
    clockevents core code provides notification mechanisms for various clock
    related management events.

    This allows control of the clock event devices without the architectures
    having to worry about the details of function assignment. This is also a
    prerequisite for high resolution timers and dynamic ticks, allowing the
    core code to control the clock functionality without intrusive changes to
    the architecture code.

    [Fixes-by: Ingo Molnar ]
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Cc: Roman Zippel
    Cc: john stultz
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • For CONFIG_NO_HZ we need to calculate the next timer wheel event based on a
    given jiffies value. Extend the existing code to allow the extra 'now'
    argument. Provide a compatibility function for the existing implementations
    to call the function with now == jiffies. (This also solves the raciness of
    the original code vs. jiffies changing during the iteration.)

    No functional changes to existing users of this infrastructure.

    [ remove WARN_ON() that triggered on s390, by Carsten Otte ]
    [ made new helper static, Adrian Bunk ]
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Cc: john stultz
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • When searching for the next pending timer in the timer wheel we need to take
    the cascade into account. The current code has several problems:

    1. it looks into the previous cascade
    2. it ignores a pending cascade
    3. it ignores multiple cascades

    Change the cascade lookup so that it calculates the array index from the
    point of the next cascade and always looks at the cascade buckets when the
    cascade is pending, i.e. gets executed in the next timer softirq. When
    multiple cascades are pending, look up the next buckets too.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Cc: john stultz
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • The TSC needs to be verified against another clocksource. Instead of using
    hardwired assumptions of available hardware, provide a generic verification
    mechanism. The verification uses the best available clocksource and handles
    the usability for high resolution timers / dynticks of the clocksource which
    needs to be verified.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Cc: john stultz
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • The clocksource code allows direct updates of the rating of a given
    clocksource now. Change TSC unstable tracking to use this interface and
    remove the update callback.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Cc: john stultz
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • Using a flag field allows encoding more than one piece of information in a variable.
    Preparatory patch for the generic clocksource verification.
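
    The point about a flag field can be illustrated with a few bit
    operations. The flag names and the struct below are hypothetical and
    chosen only for this sketch; each flag occupies its own bit, so several
    properties share one variable.

```c
#include <assert.h>

/* Hypothetical flag names; each one occupies its own bit. */
#define SRC_FLAG_IS_CONTINUOUS   0x01UL
#define SRC_FLAG_MUST_VERIFY     0x02UL
#define SRC_FLAG_VALID_FOR_HRES  0x04UL

struct clocksource_sketch {
    unsigned long flags;
};

static void src_set_flag(struct clocksource_sketch *cs, unsigned long flag)
{
    assert(flag != 0);
    cs->flags |= flag;
}

static void src_clear_flag(struct clocksource_sketch *cs, unsigned long flag)
{
    cs->flags &= ~flag;
}

static int src_test_flag(const struct clocksource_sketch *cs, unsigned long flag)
{
    return (cs->flags & flag) != 0;
}
```

    A single boolean member could only carry one of these properties; the
    flag word carries all of them and leaves room for more.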

    [mingo@elte.hu: convert vmitime.c to the new clocksource flag]
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Cc: john stultz
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • Persistent clock support: do proper timekeeping across suspend/resume.

    [bunk@stusta.de: cleanup]
    Signed-off-by: John Stultz
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Cc: Roman Zippel
    Cc: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John Stultz