26 Mar, 2006

2 commits

  • This removes the support for pps. It's completely unused within the kernel
    and is basically in the way for further cleanups. It should be easier to
    re-add proper support for it after the rest has been converted to NTP4
    (where the pps mechanisms are quite different from NTP3 anyway).

    Signed-off-by: Roman Zippel
    Cc: Adrian Bunk
    Cc: john stultz
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Zippel
     
  • alarm() calls the kernel with an unsigned int timeout in seconds. The
    value is stored in the tv_sec field of a struct timeval to set up the
    itimer. The tv_sec field of struct timeval is of type long, which causes
    the tv_sec value to be negative on 32 bit machines if seconds > INT_MAX.

    Before the hrtimer merge (pre 2.6.16) such a negative value was converted
    to the maximum jiffies timeout by the timeval_to_jiffies conversion. It's
    not clear whether this was intended or just happened to be done by the
    timeval_to_jiffies code.

    hrtimers expect a timeval in canonical form and treat a negative timeout as
    already expired. This breaks the legitimate usage of alarm() with a
    timeout value > INT_MAX seconds.

    For 32 bit machines it is therefore necessary to limit the internal seconds
    value to avoid API breakage. Instead of doing this in all implementations
    of sys_alarm, the duplicated sys_alarm code is moved into a common function
    in itimer.c
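
    The sign flip can be sketched in userspace (a hedged illustration with
    made-up helper names, not the kernel code; int32_t models the tv_sec
    field of struct timeval on a 32 bit machine):

```c
#include <assert.h>
#include <stdint.h>

/* Illustration only: models tv_sec (a 32-bit long on 32-bit machines).
 * Storing an unsigned int > INT_MAX yields a negative value, which
 * hrtimers treat as already expired. */
static int32_t store_seconds(unsigned int seconds)
{
        return (int32_t)seconds;        /* wraps negative past INT_MAX */
}

/* The fix's idea: limit the seconds value before the conversion. */
static unsigned int clamp_seconds(unsigned int seconds)
{
        if (seconds > INT32_MAX)
                seconds = INT32_MAX;
        return seconds;
}
```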

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

24 Mar, 2006

2 commits

  • Make the softlockup detector purely timer-interrupt driven, removing
    softirq-context (timer) dependencies. This means that if the softlockup
    watchdog triggers, it has truly observed a longer than 10 seconds
    scheduling delay of a SCHED_FIFO prio 99 task.

    (the patch also turns off the softlockup detector during the initial bootup
    phase and does small style fixes)

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • With internal Xen-enabled kernels we see the kernel's static per-cpu data
    area exceed the limit of 32k on x86-64, and even native x86-64 kernels get
    fairly close to that limit. I generally question whether it is reasonable
    to have data structures several kb in size allocated as per-cpu data when
    the space there is rather limited.

    The biggest arch-independent consumer is tvec_bases (over 4k on 32-bit
    archs, over 8k on 64-bit ones), which now gets converted to use dynamically
    allocated memory instead.

    Signed-off-by: Jan Beulich
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     

17 Mar, 2006

1 commit

  • The pointer to the current time interpolator and the current list of time
    interpolators are typically only changed during bootup. Adding
    __read_mostly takes them away from possibly hot cachelines.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

07 Mar, 2006

2 commits

  • Add a compiler barrier so that we don't read jiffies before updating
    jiffies_64.
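
    A toy model of the issue (a sketch, not the kernel source: barrier() is
    written as a GCC-style memory clobber, and jiffies is modelled as the
    low word of jiffies_64):

```c
#include <assert.h>

/* Sketch: on 32-bit, jiffies aliases the low word of jiffies_64. Without
 * a compiler barrier, a later read of jiffies may be hoisted above the
 * 64-bit update; barrier() forbids such reordering at compile time. */
#define barrier() __asm__ __volatile__("" ::: "memory")

static unsigned long long jiffies_64;
#define jiffies ((unsigned long)jiffies_64)

static unsigned long tick_and_read(void)
{
        jiffies_64++;
        barrier();      /* make sure jiffies is read after the update */
        return jiffies;
}
```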

    Signed-off-by: Atsushi Nemoto
    Cc: Ralf Baechle
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Atsushi Nemoto
     
  • Also from Thomas Gleixner

    Function next_timer_interrupt() got broken with a recent patch
    6ba1b91213e81aa92b5cf7539f7d2a94ff54947c as sys_nanosleep() was moved to
    hrtimer. This broke things, as next_timer_interrupt() did not check the
    hrtimer tree for the next event.

    Function next_timer_interrupt() is needed with dyntick (CONFIG_NO_IDLE_HZ,
    VST) implementations, as the system can be idle when the next hrtimer event
    is supposed to happen. At least ARM and S390 currently use
    next_timer_interrupt().

    Signed-off-by: Thomas Gleixner
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tony Lindgren
     

03 Mar, 2006

1 commit

  • On some platforms readq performs additional work to make sure I/O is done
    in a coherent way. This is not needed for time retrieval as done by the
    time interpolator. So we can use readq_relaxed instead which will improve
    performance.

    It affects sparc64 and ia64 only. Apparently it makes a significant
    difference on ia64.

    Signed-off-by: Christoph Lameter
    Cc: john stultz
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

18 Feb, 2006

1 commit

  • This provides an interface for arch code to find out how many
    nanoseconds are going to be added on to xtime by the next call to
    do_timer. The value returned is a fixed-point number in 52.12 format
    in nanoseconds. The reason for this format is that it gives the
    full precision that the timekeeping code is using internally.
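
    The 52.12 representation can be illustrated with a pair of helpers
    (hypothetical names; a sketch of the format, not the powerpc code):

```c
#include <assert.h>
#include <stdint.h>

/* 52.12 fixed point: the low 12 bits hold fractional nanoseconds, so one
 * unit in the fraction is 1/4096 ns. */
#define FP_SHIFT 12

static uint64_t ns_to_fp(uint64_t ns) { return ns << FP_SHIFT; }
static uint64_t fp_to_ns(uint64_t fp) { return fp >> FP_SHIFT; }
```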

    The motivation for this is to fix a problem that has arisen on 32-bit
    powerpc in that the value returned by do_gettimeofday drifts apart
    from xtime if NTP is being used. PowerPC is now using a lockless
    do_gettimeofday based on reading the timebase register and performing
    some simple arithmetic. (This method of getting the time is also
    exported to userspace via the VDSO.) However, the factor and offset
    it uses were calculated based on the nominal tick length and weren't
    being adjusted when NTP varied the tick length.

    Note that 64-bit powerpc has had the lockless do_gettimeofday for a
    long time now. It also had an extremely hairy routine that got called
    from the 32-bit compat routine for adjtimex, which adjusted the
    factor and offset according to what it thought the timekeeping code
    was going to do. Not only was this only called if a 32-bit task did
    adjtimex (i.e. not if a 64-bit task did adjtimex), it was also
    duplicating computations from kernel/timer.c and it wasn't clear that
    it was (still) correct.

    The simple solution is to ask the timekeeping code how long the
    current jiffy will be on each timer interrupt, after calling
    do_timer. If this jiffy will be a different length from the last one,
    we then need to compute new values for the factor and offset used in
    the lockless do_gettimeofday. In this way we can keep xtime and
    do_gettimeofday in sync, even when NTP is varying the tick length.

    Note that when adjtimex varies the tick length, it almost always
    introduces the variation from the next tick on. The only case I could
    see where adjtimex would vary the length of the current tick is when
    an old-style adjtime adjustment is being cancelled. (It's not clear
    to me why the adjustment has to be cancelled immediately rather than
    from the next tick on.) Thus I don't see any real need for a hook in
    adjtimex; the rare case of an old-style adjustment being cancelled can
    be fixed up at the next tick.

    Signed-off-by: Paul Mackerras
    Acked-by: john stultz
    Signed-off-by: Linus Torvalds

    Paul Mackerras
     

08 Feb, 2006

1 commit


11 Jan, 2006

2 commits


09 Jan, 2006

1 commit

  • This patch contains the following cleanups:
    - make needlessly global functions static
    - every file should include the headers containing the prototypes for
    its global functions

    Signed-off-by: Adrian Bunk
    Acked-by: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     

31 Oct, 2005

5 commits

  • Define jiffies_64 in kernel/timer.c rather than having 24 duplicated
    defines in each architecture.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • Remove timer_list.magic and associated debugging code.

    I originally added this when a spinlock was added to timer_list - this meant
    that an all-zeroes timer became illegal and init_timer() was required.

    That spinlock isn't even there any more, although timer.base must now be
    initialised.

    I'll keep this debugging code in -mm.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Fix bizarre 4-space coding style in the NTP code.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Create a macro shift_right() that avoids the numerous ugly conditionals in the
    NTP code that look like:

    if (a < 0)
            b = -(-a >> shift);
    else
            b = a >> shift;

    Replacing it with:

    b = shift_right(a, shift);

    This should have zero effect on the logic; however, it should probably get
    a bit of testing just to be sure.

    Also replace open-coded min/max with the macros.
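
    A minimal sketch of the helper (written here as a function for clarity;
    the kernel version is a macro):

```c
#include <assert.h>

/* Arithmetic shift that handles negative values explicitly, replacing the
 * open-coded conditionals quoted above. */
static long shift_right(long a, int shift)
{
        return a < 0 ? -(-a >> shift) : a >> shift;
}
```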

    Signed-off-by: John Stultz

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    john stultz
     
  • Every user of init_timer() also needs to initialize ->function and ->data
    fields. This patch adds a simple setup_timer() helper for that.

    The schedule_timeout() is patched as an example of usage.
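
    The helper's shape can be sketched against a minimal model of the timer
    types (field names follow the 2.6-era struct timer_list; the stubbed
    init_timer() and demo_handler() are illustrative):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel types, for illustration only. */
struct timer_list {
        void (*function)(unsigned long);
        unsigned long data;
};

static void init_timer(struct timer_list *timer)
{
        /* the kernel's init_timer() sets up base/entry; a no-op here */
        (void)timer;
}

/* The new helper: one call instead of three assignments at every site. */
static void setup_timer(struct timer_list *timer,
                        void (*function)(unsigned long),
                        unsigned long data)
{
        timer->function = function;
        timer->data = data;
        init_timer(timer);
}

static void demo_handler(unsigned long data)
{
        (void)data;
}
```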

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

30 Oct, 2005

1 commit


13 Sep, 2005

1 commit


11 Sep, 2005

2 commits


08 Sep, 2005

2 commits

  • Christoph Lameter

    When using a time interpolator that is susceptible to jitter there's
    potentially contention over a cmpxchg used to prevent time from going
    backwards. This is unnecessary when the caller holds the xtime write
    seqlock as all readers will be blocked from returning until the write is
    complete. We can therefore allow writers to insert a new value and exit
    rather than fight with CPUs who only hold a reader lock.

    Signed-off-by: Alex Williamson
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Williamson
     
  • This patch adds a new kernel debug feature: CONFIG_DETECT_SOFTLOCKUP.

    When enabled then per-CPU watchdog threads are started, which try to run
    once per second. If they get delayed for more than 10 seconds then a
    callback from the timer interrupt detects this condition and prints out a
    warning message and a stack dump (once per lockup incident). The feature
    is otherwise non-intrusive: it doesn't try to unlock the box in any way, it
    only gets the debug info out, automatically, and on all CPUs affected by
    the lockup.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Nishanth Aravamudan
    Signed-Off-By: Matthias Urlichs
    Signed-off-by: Richard Purdie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     

24 Aug, 2005

1 commit

  • With CONFIG_PREEMPT && !CONFIG_SMP, it's possible for sys_getppid to
    return a bogus value if the parent's task_struct gets reallocated after
    current->group_leader->real_parent is read:

    asmlinkage long sys_getppid(void)
    {
            int pid;
            struct task_struct *me = current;
            struct task_struct *parent;

            parent = me->group_leader->real_parent;
    RACE HERE => for (;;) {
                    pid = parent->tgid;
    #ifdef CONFIG_SMP
                    {
                            struct task_struct *old = parent;

                            /*
                             * Make sure we read the pid before re-reading the
                             * parent pointer:
                             */
                            smp_rmb();
                            parent = me->group_leader->real_parent;
                            if (old != parent)
                                    continue;
                    }
    #endif
                    break;
            }
            return pid;
    }

    If the process gets preempted at the indicated point, the parent process
    can go ahead and call exit() and then get wait()'d on to reap its
    task_struct. When the preempted process gets resumed, it will not do any
    further checks of the parent pointer on !CONFIG_SMP: it will read the
    bad pid and return.

    So, the same algorithm used when SMP is enabled should be used when
    preempt is enabled, which will recheck ->real_parent in this case.

    Signed-off-by: David Meybohm
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Meybohm
     

26 Jun, 2005

1 commit


24 Jun, 2005

3 commits

  • In kernel/sched.c the return value from preempt_count() is cast to an int.
    That made sense when preempt_count was defined as different types on
    different architectures, but it no longer is, so the cast is not needed
    and should go away. The patch removes the cast.

    In kernel/timer.c the return value from preempt_count() is assigned to a
    variable of type u32 and then that unsigned value is later compared to
    preempt_count(). Since preempt_count() returns an int, an int is what
    should be used to store its return value. Storing the result in an
    unsigned 32bit integer made a tiny bit of sense back when preempt_count was
    different types on different archs, but no more - let's not play signed vs
    unsigned comparison games when we don't have to. The patch modifies the
    code to use an int to hold the value. While I was around that bit of code
    I also made two changes to a nearby (related) printk() - I modified it to
    specify the loglevel explicitly and also broke the line into a few pieces
    to avoid it being longer than 80 chars and clarified the text a bit.

    Signed-off-by: Jesper Juhl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesper Juhl
     
  • This patch splits del_timer_sync() into 2 functions. The new one,
    try_to_del_timer_sync(), returns -1 when it hits an executing timer.

    It can be used in interrupt context, or when the caller holds locks which
    can prevent completion of the timer's handler.

    NOTE. Currently it can't be used in interrupt context in UP case, because
    ->running_timer is used only with CONFIG_SMP.

    Should the need arise, it is possible to kill #ifdef CONFIG_SMP in
    set_running_timer(), it is cheap.
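
    The split can be modelled in userspace (a toy: pending/running are plain
    flags here, and the busy loop stands in for the real locking):

```c
#include <assert.h>

/* Toy model of the two functions; the kernel versions take the base lock. */
struct timer_list { int pending; int running; };

static int try_to_del_timer_sync(struct timer_list *t)
{
        if (t->running)
                return -1;      /* handler running: caller must not wait */
        int was_pending = t->pending;
        t->pending = 0;
        return was_pending;
}

static int del_timer_sync(struct timer_list *t)
{
        int ret;

        while ((ret = try_to_del_timer_sync(t)) < 0)
                ;               /* in the kernel: cpu_relax() and retry */
        return ret;
}
```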

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • This patch tries to solve following problems:

    1. del_timer_sync() is racy. The timer can be fired again after
    del_timer_sync has checked all cpus and before it will recheck
    timer_pending().

    2. It has scalability problems. All cpus are scanned to determine
    if the timer is running on that cpu.

    With this patch del_timer_sync is O(1) and no slower than plain
    del_timer(pending_timer), unless it has to actually wait for
    completion of the currently running timer.

    The only restriction is that the recurring timer should not use
    add_timer_on().

    3. The timers are not serialized wrt themselves.

    If CPU_0 does mod_timer(jiffies+1) while the timer is currently
    running on CPU_1, it is quite possible that the local interrupt on
    CPU_0 will start that timer before it has finished on CPU_1.

    4. The timers locking is suboptimal. __mod_timer() takes 3 locks
    at once and still requires wmb() in del_timer/run_timers.

    The new implementation takes 2 locks sequentially and does not
    need memory barriers.

    Currently ->base != NULL means that the timer is pending. In that case
    ->base.lock is used to lock the timer. __mod_timer also takes timer->lock
    because ->base can be == NULL.

    This patch uses timer->entry.next != NULL as indication that the timer is
    pending. So it does __list_del(), entry->next = NULL instead of list_del()
    when the timer is deleted.

    The ->base field is used for hashed locking only, it is initialized
    in init_timer() which sets ->base = per_cpu(tvec_bases). When the
    tvec_bases.lock is locked, it means that all timers which are tied
    to this base via timer->base are locked, and the base itself is locked
    too.

    So __run_timers/migrate_timers can safely modify all timers which could
    be found on ->tvX lists (pending timers).

    When the timer's base is locked, and the timer removed from ->entry list
    (which means that __run_timers/migrate_timers can't see this timer), it is
    possible to set timer->base = NULL and drop the lock: the timer remains
    locked.

    This patch adds lock_timer_base() helper, which waits for ->base != NULL,
    locks the ->base, and checks it is still the same.
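
    The helper's shape, as a userspace toy (the lock is a plain flag here;
    the kernel takes ->base->lock with spin_lock_irqsave()):

```c
#include <assert.h>
#include <stddef.h>

struct tvec_base { int locked; };
struct timer_list { struct tvec_base *base; };

/* Wait for ->base to be non-NULL, lock it, then recheck that the timer
 * was not migrated to another base in between. */
static struct tvec_base *lock_timer_base(struct timer_list *timer)
{
        for (;;) {
                struct tvec_base *base = timer->base;

                if (base != NULL) {
                        base->locked = 1;       /* spin_lock_irqsave() */
                        if (base == timer->base)
                                return base;    /* stable: timer locked */
                        base->locked = 0;       /* raced; retry */
                }
        }
}
```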

    __mod_timer() schedules the timer on the local CPU and changes its base.
    However, it does not lock both old and new bases at once. It locks the
    timer via lock_timer_base(), deletes the timer, sets ->base = NULL, and
    unlocks old base. Then __mod_timer() locks new_base, sets ->base = new_base,
    and adds this timer. This simplifies the code, because AB-BA deadlock is not
    possible. __mod_timer() also ensures that the timer's base is not changed
    while the timer's handler is running on the old base.

    __run_timers(), del_timer() do not change ->base anymore, they only clear
    pending flag.

    So del_timer_sync() can test timer->base->running_timer == timer to detect
    whether it is running or not.

    We don't need timer_list->lock anymore, this patch kills it.

    We also don't need barriers. del_timer() and __run_timers() used smp_wmb()
    before clearing timer's pending flag. It was needed because __mod_timer()
    did not lock old_base if the timer is not pending, so __mod_timer()->list_add()
    could race with del_timer()->list_del(). With this patch these functions are
    serialized through base->lock.

    One problem: TIMER_INITIALIZER can't use per_cpu(tvec_bases). So this patch
    adds a global

    struct timer_base_s {
            spinlock_t lock;
            struct timer_list *running_timer;
    } __init_timer_base;

    which is used by TIMER_INITIALIZER. The corresponding fields in tvec_t_base_s
    struct are replaced by struct timer_base_s t_base.

    It is indeed ugly. But this can't have scalability problems. The global
    __init_timer_base.lock is used only when __mod_timer() is called for the first
    time AND the timer was compile time initialized. After that the timer migrates
    to the local CPU.

    Signed-off-by: Oleg Nesterov
    Acked-by: Ingo Molnar
    Signed-off-by: Renaud Lienhart
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

01 May, 2005

1 commit


17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds