31 Jan, 2009

1 commit

  • Impact: fix CPU hotplug hang on Power6 testbox

    On architectures that support offlining all cpus (at least powerpc/pseries),
    hot-unplugging the tick_do_timer_cpu can result in a system hang.

    This comes from the fact that if the cpu going down happens to be the
    cpu doing the tick, then as the tick_do_timer_cpu handover happens after the
    cpu is dead (via the CPU_DEAD notification), we're left without ticks,
    jiffies are frozen and any task relying on timers (msleep, ...) is stuck.
    That's particularly the case for the cpu looping in __cpu_die() waiting
    for the dying cpu to be dead.

    This patch addresses this by having the tick_do_timer_cpu handover happen
    earlier during the CPU_DYING notification. For this, a new clockevent
    notification type is introduced (CLOCK_EVT_NOTIFY_CPU_DYING) which is triggered
    in hrtimer_cpu_notify().

    Signed-off-by: Sebastien Dugue
    Cc:
    Signed-off-by: Ingo Molnar

    Sebastien Dugue
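
The handover described in this commit can be modelled in user space. The following is a minimal sketch, not the kernel code: the `cpu_online` array, the helper names and the "first other online CPU" policy are assumptions made for illustration. It only shows the ordering idea, namely that the tick duty moves away while the CPU is dying rather than after it is dead.

```c
#include <stdbool.h>

#define NR_CPUS 4
#define TICK_DO_TIMER_NONE (-1)

/* Hypothetical model state: the CPU that updates jiffies, and the online map. */
static int tick_do_timer_cpu = 0;
static bool cpu_online[NR_CPUS] = { true, true, true, true };

/* If the dying CPU owns the tick duty, pass it to another online CPU. */
void tick_handover_do_timer(int dying_cpu)
{
    if (tick_do_timer_cpu != dying_cpu)
        return; /* the dying CPU was not doing the tick */

    tick_do_timer_cpu = TICK_DO_TIMER_NONE;
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (cpu != dying_cpu && cpu_online[cpu]) {
            tick_do_timer_cpu = cpu;
            break;
        }
    }
}

/* Model of the "dying" step: hand over first, then mark the CPU offline. */
void cpu_dying(int cpu)
{
    tick_handover_do_timer(cpu);
    cpu_online[cpu] = false;
}
```

In this model, some online CPU keeps the tick duty at every point while at least one other CPU remains online, so jiffies keep advancing for tasks that sleep on timers.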
     

15 Jan, 2009

1 commit


08 Jan, 2009

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits)
    trivial: chack -> check typo fix in main Makefile
    trivial: Add a space (and a comma) to a printk in 8250 driver
    trivial: Fix misspelling of "firmware" in docs for ncr53c8xx/sym53c8xx
    trivial: Fix misspelling of "firmware" in powerpc Makefile
    trivial: Fix misspelling of "firmware" in usb.c
    trivial: Fix misspelling of "firmware" in qla1280.c
    trivial: Fix misspelling of "firmware" in a100u2w.c
    trivial: Fix misspelling of "firmware" in megaraid.c
    trivial: Fix misspelling of "firmware" in ql4_mbx.c
    trivial: Fix misspelling of "firmware" in acpi_memhotplug.c
    trivial: Fix misspelling of "firmware" in ipw2100.c
    trivial: Fix misspelling of "firmware" in atmel.c
    trivial: Fix misspelled firmware in Kconfig
    trivial: fix an -> a typos in documentation and comments
    trivial: fix then -> than typos in comments and documentation
    trivial: update Jesper Juhl CREDITS entry with new email
    trivial: fix singal -> signal typo
    trivial: Fix incorrect use of "loose" in event.c
    trivial: printk: fix indentation of new_text_line declaration
    trivial: rtc-stk17ta8: fix sparse warning
    ...

    Linus Torvalds
     

06 Jan, 2009

2 commits


04 Jan, 2009

3 commits

  • …/git/tip/linux-2.6-tip

    * 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (77 commits)
    x86: setup_per_cpu_areas() cleanup
    cpumask: fix compile error when CONFIG_NR_CPUS is not defined
    cpumask: use alloc_cpumask_var_node where appropriate
    cpumask: convert shared_cpu_map in acpi_processor* structs to cpumask_var_t
    x86: use cpumask_var_t in acpi/boot.c
    x86: cleanup some remaining usages of NR_CPUS where s/b nr_cpu_ids
    sched: put back some stack hog changes that were undone in kernel/sched.c
    x86: enable cpus display of kernel_max and offlined cpus
    ia64: cpumask fix for is_affinity_mask_valid()
    cpumask: convert RCU implementations, fix
    xtensa: define __fls
    mn10300: define __fls
    m32r: define __fls
    h8300: define __fls
    frv: define __fls
    cris: define __fls
    cpumask: CONFIG_DISABLE_OBSOLETE_CPUMASK_FUNCTIONS
    cpumask: zero extra bits in alloc_cpumask_var_node
    cpumask: replace for_each_cpu_mask_nr with for_each_cpu in kernel/time/
    cpumask: convert mm/
    ...

    Linus Torvalds
     
  • * 'cputime' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
    [PATCH] fast vdso implementation for CLOCK_THREAD_CPUTIME_ID
    [PATCH] improve idle cputime accounting
    [PATCH] improve precision of idle time detection.
    [PATCH] improve precision of process accounting.
    [PATCH] idle cputime accounting
    [PATCH] fix scaled & unscaled cputime accounting

    Linus Torvalds
     
  • …ux-2.6-cpumask into merge-rr-cpumask

    Conflicts:
    arch/x86/kernel/io_apic.c
    kernel/rcuclassic.c
    kernel/sched.c
    kernel/time/tick-sched.c

    Signed-off-by: Mike Travis <travis@sgi.com>
    [ mingo@elte.hu: backmerged typo fix for io_apic.c ]
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

    Mike Travis
     

03 Jan, 2009

1 commit

  • …/git/tip/linux-2.6-tip

    * 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits)
    x86: export vector_used_by_percpu_irq
    x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and()
    sched: nominate preferred wakeup cpu, fix
    x86: fix lguest used_vectors breakage, -v2
    x86: fix warning in arch/x86/kernel/io_apic.c
    sched: fix warning in kernel/sched.c
    sched: move test_sd_parent() to an SMP section of sched.h
    sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0
    sched: activate active load balancing in new idle cpus
    sched: bias task wakeups to preferred semi-idle packages
    sched: nominate preferred wakeup cpu
    sched: favour lower logical cpu number for sched_mc balance
    sched: framework for sched_mc/smt_power_savings=N
    sched: convert BALANCE_FOR_xx_POWER to inline functions
    x86: use possible_cpus=NUM to extend the possible cpus allowed
    x86: fix cpu_mask_to_apicid_and to include cpu_online_mask
    x86: update io_apic.c to the new cpumask code
    x86: Introduce topology_core_cpumask()/topology_thread_cpumask()
    x86: xen: use smp_call_function_many()
    x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c
    ...

    Fixed up trivial conflict in kernel/time/tick-sched.c manually

    Linus Torvalds
     

01 Jan, 2009

2 commits


31 Dec, 2008

4 commits

  • The cpu time spent by the idle process actually doing something is
    currently accounted as idle time. This is plain wrong; the architectures
    that support VIRT_CPU_ACCOUNTING=y can do better: distinguish between the
    time spent doing nothing and the time spent by idle doing work. The first
    is accounted with account_idle_time and the second with account_system_time.
    The architectures that use the account_xxx_time interface directly and not
    the account_xxx_ticks interface now need to do the check for the idle
    process in their arch code. In particular to improve the system vs true
    idle time accounting the arch code needs to measure the true idle time
    instead of just testing for the idle process.
    To improve the tick based accounting as well we would need an architecture
    primitive that can tell us if the pt_regs of the interrupted context
    points to the magic instruction that halts the cpu.

    In addition, idle time is no longer added to the stime of the idle process.
    This field now contains the system time of the idle process as it should
    be. On systems without VIRT_CPU_ACCOUNTING this will always be zero as
    every tick that occurs while idle is running will be accounted as idle
    time.

    This patch contains the necessary common code changes to be able to
    distinguish idle system time and true idle time. The architectures with
    support for VIRT_CPU_ACCOUNTING need some changes to exploit this.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
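
The distinction drawn in this commit can be sketched as follows. This is an illustrative user-space model only: `struct cpu_stat`, `account_idle_task_time()` and the `cpu_was_halted` flag are hypothetical stand-ins for the arch-specific measurement of true idle time that the message asks for.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU counters; the field names are illustrative. */
struct cpu_stat {
    uint64_t idle;   /* time spent doing nothing */
    uint64_t system; /* kernel work, including work done by the idle task */
};

/*
 * Account a slice of time consumed while the idle task was current.
 * cpu_was_halted stands in for an arch-specific "true idle" measurement.
 */
void account_idle_task_time(struct cpu_stat *st, uint64_t delta,
                            bool cpu_was_halted)
{
    if (cpu_was_halted)
        st->idle += delta;   /* true idle time */
    else
        st->system += delta; /* the idle task was doing work */
}
```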
     
  • The utimescaled / stimescaled fields in the task structure and the
    global cpustat should be set on all architectures. On s390 the calls
    to account_user_time_scaled and account_system_time_scaled have never
    been added. In addition, system time that is accounted as guest time
    to the user time of a process is accounted to the scaled system time
    instead of the scaled user time.
    To fix the bugs and to prevent future forgetfulness this patch merges
    account_system_time_scaled into account_system_time and
    account_user_time_scaled into account_user_time.

    Cc: Benjamin Herrenschmidt
    Cc: Hidetoshi Seto
    Cc: Tony Luck
    Cc: Jeremy Fitzhardinge
    Cc: Chris Wright
    Cc: Michael Neuling
    Acked-by: Paul Mackerras
    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
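
The effect of the merge can be sketched with simplified types. This is an assumption-laden model rather than the kernel interface: `struct task_times` is hypothetical, and the real functions take a task pointer and further arguments. The point is only that one call updates the scaled and unscaled fields together, so the scaled half cannot be forgotten.

```c
#include <stdint.h>

/* Hypothetical reduction of the per-task time fields named in the commit. */
struct task_times {
    uint64_t utime, utimescaled;
    uint64_t stime, stimescaled;
};

/* A single call accounts both the unscaled and the scaled user time. */
void account_user_time(struct task_times *t, uint64_t cputime,
                       uint64_t cputime_scaled)
{
    t->utime += cputime;
    t->utimescaled += cputime_scaled;
}

/* Likewise for system time. */
void account_system_time(struct task_times *t, uint64_t cputime,
                         uint64_t cputime_scaled)
{
    t->stime += cputime;
    t->stimescaled += cputime_scaled;
}
```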
     
  • Conflicts:

    arch/x86/kernel/io_apic.c

    Rusty Russell
     
  • Redo:

    5b7dba4: sched_clock: prevent scd->clock from moving backwards

    which had to be reverted due to s2ram hangs:

    ca7e716: Revert "sched_clock: prevent scd->clock from moving backwards"

    ... this time with resume restoring GTOD later in the sequence
    taken into account as well.

    The "timekeeping_suspended" flag is not very nice but we cannot call into
    GTOD before it has been properly resumed and the scheduler will run very
    early in the resume sequence.

    Cc:
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
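
A rough sketch of the two ideas in this commit: never let the clock value move backwards, and do not consult the time source while timekeeping is suspended. The structure and function below are hypothetical and much simpler than the scheduler clock code; only the `timekeeping_suspended` name is taken from the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU clock state. */
struct clock_data {
    uint64_t clock; /* last value handed out */
};

/* Modelled after the flag named in the commit message. */
static bool timekeeping_suspended;

/* Return a clock value that never goes backwards. */
uint64_t update_clock(struct clock_data *cd, uint64_t raw_now)
{
    if (timekeeping_suspended)
        return cd->clock; /* time source not resumed yet: keep old value */

    if (raw_now > cd->clock)
        cd->clock = raw_now; /* only ever move forward */

    return cd->clock;
}
```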
     

26 Dec, 2008

1 commit


13 Dec, 2008

3 commits

  • Conflicts:

    arch/x86/kernel/io_apic.c
    kernel/sched.c
    kernel/sched_stats.h

    Rusty Russell
     
  • Impact: change calling convention of existing clock_event APIs

    struct clock_event_device's cpumask field is changed to a pointer,
    as is the argument of the ->broadcast function.

    Another single-patch change. For safety, we BUG_ON() in
    clockevents_register_device() if it's not set.

    Signed-off-by: Rusty Russell
    Cc: Ingo Molnar

    Rusty Russell
     
  • Impact: change existing irq_chip API

    Not much point with gentle transition here: the struct irq_chip's
    setaffinity method signature needs to change.

    Fortunately, not widely used code, but hits a few architectures.

    Note: In irq_select_affinity() I save a temporary by mangling
    irq_desc[irq].affinity directly. Ingo, does this break anything?

    (Folded in fix from KOSAKI Motohiro)

    Signed-off-by: Rusty Russell
    Signed-off-by: Mike Travis
    Reviewed-by: Grant Grundler
    Acked-by: Ingo Molnar
    Cc: ralf@linux-mips.org
    Cc: grundler@parisc-linux.org
    Cc: jeremy@xensource.com
    Cc: KOSAKI Motohiro

    Rusty Russell
     

12 Dec, 2008

3 commits

  • In my device I get many interrupts from a high speed USB device in a very
    short period of time. The system spends a lot of time reprogramming the
    hardware timer which is in a slower timing domain as compared to the CPU.
    This results in the CPU spending a huge amount of time waiting for the
    timer posting to be done. All of this reprogramming is useless as the
    wake up time has not changed.

    As measured using ETM trace this drops my reprogramming penalty from
    almost 60% CPU load down to 15% during high interrupt rate. I can send
    traces to show this.

    Suppress setting of duplicate timer event when timer already stopped.
    Timer programming can be very costly and can result in long cpu stall/wait
    times.

    [akpm@linux-foundation.org: coding-style fixes]
    [tglx@linutronix.de: move the check to the right place and avoid raising
    the softirq for nothing]

    Signed-off-by: Richard Woodruff
    Cc: johnstul@us.ibm.com
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Woodruff, Richard
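
The suppression described here amounts to comparing the requested expiry with the one already programmed. A minimal sketch, assuming a hypothetical `struct timer_dev` that counts hardware writes; the kernel performs the equivalent check in its tick-stop path, not in a helper with this name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical timer device state. */
struct timer_dev {
    uint64_t next_event;   /* expiry currently programmed in hardware */
    bool     tick_stopped; /* the periodic tick is already stopped */
    unsigned programs;     /* number of (expensive) hardware writes */
};

/* Program the next event, skipping the write if nothing would change. */
bool program_next_event(struct timer_dev *dev, uint64_t expires)
{
    if (dev->tick_stopped && dev->next_event == expires)
        return false; /* same wake-up time: no reprogramming needed */

    dev->next_event = expires;
    dev->tick_stopped = true;
    dev->programs++;
    return true;
}
```

Under a burst of interrupts that all compute the same wake-up time, only the first call in this model touches the hardware.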
     
  • Conflicts:
    include/linux/ftrace.h
    kernel/sched.c

    Ingo Molnar
     
  • Impact: remove false positive warning

    After a cpu was taken down during cpu hotplug (read: disabled for interrupts)
    it still might have pending softirqs. However take_cpu_down makes sure
    that the idle task will run next instead of ksoftirqd on the taken down cpu.
    The idle task will call tick_nohz_stop_sched_tick which might warn about
    pending softirqs just before the cpu kills itself completely.

    However the pending softirqs on the dead cpu aren't a problem because they
    will be moved to an online cpu during CPU_DEAD handling.

    So make sure we warn only for online cpus.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Ingo Molnar

    Heiko Carstens
     

04 Dec, 2008

1 commit

  • Impact: fix time warp bug

    Alex Shi and Yanmin Zhang have been noticing occasional time
    inconsistencies recently. Through their great diagnosis, they found that
    the xtime_nsec value used in update_wall_time was occasionally going
    negative. After looking through the code for a while, I realized we have
    the possibility for an underflow when three conditions are met in
    update_wall_time():

    1) We have accumulated a second's worth of nanoseconds, so we
    incremented xtime.tv_sec and appropriately decremented xtime_nsec.
    (This doesn't cause xtime_nsec to go negative, but it can cause it
    to be small).

    2) The remaining offset value is large, but just slightly less than
    cycle_interval.

    3) clocksource_adjust() is speeding up the clock, causing a
    corrective amount (compensating for the increase in the multiplier
    being multiplied against the unaccumulated offset value) to be
    subtracted from xtime_nsec.

    This can cause xtime_nsec to underflow.

    Unfortunately, since we notify the NTP subsystem via second_overflow()
    whenever we accumulate a full second, and this affects the error
    accumulation that has already occurred, we cannot simply revert the
    accumulated second from xtime nor move the second accumulation to after
    the clocksource_adjust call without a change in behavior.

    This leaves us with (at least) two options:

    1) Simply return from clocksource_adjust() without making a change if we
    notice the adjustment would cause xtime_nsec to go negative.

    This would work, but I'm concerned that if a large adjustment was needed
    (due to the error being large), it may be possible to get stuck with an
    ever increasing error that becomes too large to correct (since it may
    always force xtime_nsec negative). This may just be paranoia on my part.

    2) Catch xtime_nsec if it is negative, then add back the amount by which
    it is negative to both xtime_nsec and the error.

    This second method is consistent with how we've handled earlier rounding
    issues, and also has the benefit that the error being added is always in
    the opposite direction and always equal to or smaller than the correction
    being applied. So the risk of a corner case where things get out of
    control is lessened.

    This patch fixes bug 11970, as tested by Yanmin Zhang
    http://bugzilla.kernel.org/show_bug.cgi?id=11970

    Reported-by: alex.shi@intel.com
    Signed-off-by: John Stultz
    Acked-by: "Zhang, Yanmin"
    Tested-by: "Zhang, Yanmin"
    Signed-off-by: Ingo Molnar

    john stultz
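
Option 2 from this commit can be sketched in a few lines. This is a simplified model: `struct clock_state` is hypothetical, and both fields are kept in the same unshifted units, whereas the kernel stores them as shifted fixed-point values.

```c
#include <stdint.h>

/* Hypothetical reduction of the clocksource state discussed above. */
struct clock_state {
    int64_t xtime_nsec; /* accumulated nanoseconds (unshifted here) */
    int64_t error;      /* accumulated error, same units in this sketch */
};

/*
 * If the adjustment drove xtime_nsec negative, add the shortfall back to
 * both xtime_nsec and the error so that it is corrected later.
 */
void fixup_negative_xtime_nsec(struct clock_state *c)
{
    if (c->xtime_nsec < 0) {
        int64_t neg = -c->xtime_nsec;

        c->xtime_nsec += neg; /* now exactly zero */
        c->error += neg;      /* remember what was borrowed */
    }
}
```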
     

25 Nov, 2008

2 commits

  • Impact: cleanup, move all hrtimer processing into hardirq context

    This is an attempt at removing some of the hrtimer complexity by
    reducing the number of callback modes to 1.

    This means that all hrtimer callback functions will be run from HARD-irq
    context.

    I went through all the 30 odd hrtimer callback functions in the kernel
    and saw only one that I'm not quite sure of, which is the one in
    net/can/bcm.c - hence I'm CC-ing the folks responsible for that code.

    Furthermore, the hrtimer core now calls callbacks directly with IRQs
    disabled in case you try to enqueue an expired timer. If this timer is a
    periodic timer (which should use hrtimer_forward() to advance its time)
    then it might be possible to end up in an infinite recursive loop due to the
    fact that hrtimer_forward() doesn't round up to the next timer
    granularity, and therefore keeps on calling the callback - obviously
    this needs a fix.

    Aside from that, this seems to compile and actually boot on my dual core
    test box - although I'm sure there are some bugs in, me not hitting any
    makes me certain :-)

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
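
For context, a periodic callback typically pushes its expiry forward by whole intervals until it lies in the future. The sketch below models that forwarding step with plain integers; `timer_forward()` is a hypothetical stand-in, and it does not model the timer granularity rounding that the message identifies as the missing piece.

```c
#include <stdint.h>

/*
 * Advance *expires by whole intervals until it lies after now.
 * Returns the number of intervals that were added (the overrun count).
 */
uint64_t timer_forward(uint64_t *expires, uint64_t now, uint64_t interval)
{
    if (interval == 0 || now < *expires)
        return 0; /* nothing to do: the timer is still in the future */

    uint64_t overruns = (now - *expires) / interval + 1;

    *expires += overruns * interval;
    return overruns;
}
```

If the forwarded expiry is still treated as expired when the timer is enqueued again, and the core runs the callback directly in that case, the callback is entered again at once; that is the recursion the message warns about.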
     
  • Impact: (future) size reduction for large NR_CPUS.

    Dynamically allocating cpumasks (when CONFIG_CPUMASK_OFFSTACK) saves
    space for small nr_cpu_ids but big CONFIG_NR_CPUS. cpumask_var_t
    is just a struct cpumask for !CONFIG_CPUMASK_OFFSTACK.

    Signed-off-by: Rusty Russell
    Signed-off-by: Ingo Molnar

    Rusty Russell
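
The two shapes of `cpumask_var_t` can be sketched in user space as below. The `CPUMASK_OFFSTACK` macro and the `mask_alloc()` / `mask_free()` helpers are simplified stand-ins invented for this example (the kernel allocators take allocation flags), and the sketch always allocates the full structure rather than sizing it by the number of possible CPUs.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define NR_CPUS 4096
#define BITS_PER_LONG (8 * sizeof(unsigned long))

struct cpumask {
    unsigned long bits[NR_CPUS / BITS_PER_LONG];
};

#ifdef CPUMASK_OFFSTACK
/* Off-stack variant: the variable is only a pointer, storage is on the heap. */
typedef struct cpumask *cpumask_var_t;

bool mask_alloc(cpumask_var_t *mask)
{
    *mask = calloc(1, sizeof(struct cpumask));
    return *mask != NULL;
}

void mask_free(cpumask_var_t mask)
{
    free(mask);
}
#else
/* On-stack variant: a one-element array, so the same call sites compile. */
typedef struct cpumask cpumask_var_t[1];

bool mask_alloc(cpumask_var_t *mask)
{
    memset(*mask, 0, sizeof(struct cpumask));
    return true; /* nothing to allocate */
}

void mask_free(cpumask_var_t mask)
{
    (void)mask; /* nothing to free */
}
#endif
```

Because the array form decays to a pointer, callers can write `m->bits[0]` in either configuration; only the off-stack build keeps the 512-byte mask out of the stack frame.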
     

11 Nov, 2008

1 commit

  • Impact: nohz powersavings and wakeup regression

    commit fb02fbc14d17837b4b7b02dbb36142c16a7bf208 (NOHZ: restart tick
    device from irq_enter()) causes a serious wakeup regression.

    While the patch is correct it does not take into account that spurious
    wakeups happen on x86. A fix for this issue is available, but we just
    revert to the .27 behaviour and let long running softirqs screw
    themselves.

    Disable it for now.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

22 Oct, 2008

2 commits

  • Conflicts:

    kernel/time/tick-sched.c

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • commit fb02fbc14d17837b4b7b02dbb36142c16a7bf208 (NOHZ: restart tick
    device from irq_enter())

    solves the problem of stale jiffies when long running softirqs happen
    in a long idle sleep period, but it has a major thinko in it:

    When the interrupt which came in _is_ the timer interrupt which should
    expire ts->sched_timer then we cancel and rearm the timer _before_ it
    gets expired in hrtimer_interrupt() to the next period. That means the
    call back function is not called. This game can go on for ever :(

    Prevent this by making sure to only rearm the timer when the expiry
    time is more than one tick_period away. Otherwise keep it running as
    it is either already expired or will expire at the right point to
    update jiffies.

    Signed-off-by: Thomas Gleixner
    Tested-by: Venkatesch Pallipadi

    Thomas Gleixner
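
The rearm rule stated in this commit reduces to one comparison. A minimal sketch with plain integer timestamps; the function name is hypothetical and the values stand in for the kernel's time type.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether the tick timer should be cancelled and rearmed on
 * interrupt entry. Rearm only if the programmed expiry is more than one
 * tick period away; otherwise leave the timer alone so that it can expire.
 */
bool should_rearm_tick(uint64_t now, uint64_t expires, uint64_t tick_period)
{
    if (expires <= now)
        return false; /* already expired: let it fire */

    return expires - now > tick_period;
}
```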
     

20 Oct, 2008

4 commits


18 Oct, 2008

4 commits

  • Conflicts:

    arch/x86/kvm/i8254.c

    Arjan van de Ven
     
  • We did not restart the tick device from irq_enter() to avoid double
    reprogramming and extra events in the case of an immediate return to idle.

    But long lasting softirqs can lead to a situation where jiffies become
    stale:

    idle()
    tick stopped (reprogrammed to next pending timer)
    halt()
    interrupt
    jiffies updated from irq_enter()
    interrupt handler
    softirq function 1 runs 20ms
    softirq function 2 arms a 10ms timer with a stale jiffies value
    jiffies updated from irq_exit()
    timer wheel has now an already expired timer
    (the one added in function 2)
    timer fires and timer softirq runs

    This was discovered when debugging a timer problem which happened only
    when the ath5k driver is active. The debugging proved that there is a
    softirq function running for more than 20ms, which is a bug by itself.

    To solve this we restart the tick timer right from irq_enter(), but do
    not go through the other functions which are necessary to return from
    idle when need_resched() is set.

    Reported-by: Elias Oltmanns
    Signed-off-by: Thomas Gleixner
    Tested-by: Elias Oltmanns

    Thomas Gleixner
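
The timeline in this commit can be replayed with numbers. The sketch assumes one jiffy per millisecond, and the two helpers are hypothetical; they only show why a timer armed from a stale jiffies snapshot is already expired once jiffies catch up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Compute an expiry from whatever jiffies value the caller observed. */
uint64_t arm_timer(uint64_t jiffies_snapshot, uint64_t delay)
{
    return jiffies_snapshot + delay;
}

/* A timer counts as expired once the current jiffies reach its expiry. */
bool timer_already_expired(uint64_t expires, uint64_t current_jiffies)
{
    return expires <= current_jiffies;
}
```

With jiffies at 1000 on interrupt entry and a first softirq that runs for 20 ms, a 10 ms timer armed from the stale value gets expiry 1010, while the updated jiffies value is 1020, so the timer is late from the start. Arming from an up-to-date value gives 1030 instead.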
     
  • Split out the clock event device reprogramming. Preparatory
    patch.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • We have two separate nohz function calls in irq_enter() for no good
    reason. Just call a single NOHZ function from irq_enter() and call
    the bits in the tick code.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

17 Oct, 2008

2 commits

  • * 'core-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    do_generic_file_read: s/EINTR/EIO/ if lock_page_killable() fails
    softirq, warning fix: correct a format to avoid a warning
    softirqs, debug: preemption check
    x86, pci-hotplug, calgary / rio: fix EBDA ioremap()
    IO resources, x86: ioremap sanity check to catch mapping requests exceeding, fix
    IO resources, x86: ioremap sanity check to catch mapping requests exceeding the BAR sizes
    softlockup: Documentation/sysctl/kernel.txt: fix softlockup_thresh description
    dmi scan: warn about too early calls to dmi_check_system()
    generic: redefine resource_size_t as phys_addr_t
    generic: make PFN_PHYS explicitly return phys_addr_t
    generic: add phys_addr_t for holding physical addresses
    softirq: allocate less vectors
    IO resources: fix/remove printk
    printk: robustify printk, update comment
    printk: robustify printk, fix #2
    printk: robustify printk, fix
    printk: robustify printk

    Fixed up conflicts in:
    arch/powerpc/include/asm/types.h
    arch/powerpc/platforms/Kconfig.cputype
    manually.

    Linus Torvalds
     
  • Using "def_bool n" is pointless, simply using bool here appears more
    appropriate.

    Further, retaining such options that don't have a prompt and aren't
    selected by anything seems also at least questionable.

    Signed-off-by: Jan Beulich
    Cc: Ingo Molnar
    Cc: Tony Luck
    Cc: Thomas Gleixner
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Sam Ravnborg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     

15 Oct, 2008

1 commit


10 Oct, 2008

1 commit