07 Jan, 2012

1 commit

  • It was brought to my attention that my x86 change to use NMI in
    the reboot path broke Intel Nehalem and Westmere boxes when
    using kexec.

    I realized I had mistyped the if statement in commit
    3603a2512f9e69dc87914ba922eb4a0812b21cd6 and stuck the ')' in
    the wrong spot. Putting it in the right spot fixes kexec again.

    Doh.

    Reported-by: Yinghai Lu
    Cc: Linus Torvalds
    Signed-off-by: Don Zickus
    Link: http://lkml.kernel.org/r/1325866671-9797-1-git-send-email-dzickus@redhat.com
    Signed-off-by: Ingo Molnar

    Don Zickus
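
The commit above does not quote the offending line, so the following is only a hypothetical illustration of how a misplaced ')' can change what an if statement tests. The helper `cmpxchg_sketch` and both `claim_*` functions are invented names for this sketch, not kernel code. In the buggy shape the comparison `!= -1` ends up inside the argument list, so a boolean is stored instead of the cpu number and the return value is tested for non-zero rather than compared against the sentinel.

```c
/* Hypothetical stand-in for a compare-and-exchange: if *v == old, store
 * new_val; in every case return the value *v held before the call.
 * (Not atomic; this sketch only shows how the arguments are evaluated.) */
static int cmpxchg_sketch(int *v, int old, int new_val)
{
    int prev = *v;

    if (prev == old)
        *v = new_val;
    return prev;
}

/* Buggy shape: the ')' sits after "!= -1", so the comparison result
 * (0 or 1) is passed as the new value, and the raw previous value is
 * used as the condition. Returns 1 if the caller believes it won. */
static int claim_buggy(int *owner, int cpu)
{
    if (cmpxchg_sketch(owner, -1, cpu != -1))
        return 0;
    return 1;
}

/* Fixed shape: the cpu number is stored and the previous value is
 * compared against the "unowned" sentinel -1. */
static int claim_fixed(int *owner, int cpu)
{
    if (cmpxchg_sketch(owner, -1, cpu) != -1)
        return 0;
    return 1;
}
```

With `owner` starting at -1, `claim_fixed(&owner, 3)` succeeds and records 3, while `claim_buggy(&owner, 3)` records 1 and reports failure even though nobody else held the slot.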
     

05 Dec, 2011

2 commits

  • Some machines may exhibit problems using the NMI to stop other
    cpus. This knob just allows one to revert back to the original
    behaviour to help diagnose the problem.

    V2:
    make function static

    Signed-off-by: Don Zickus
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: seiji.aguchi@hds.com
    Cc: vgoyal@redhat.com
    Cc: mjg@redhat.com
    Cc: tony.luck@intel.com
    Cc: gong.chen@intel.com
    Cc: satoru.moriya@hds.com
    Cc: avi@redhat.com
    Cc: Andi Kleen
    Link: http://lkml.kernel.org/r/1318533267-18880-4-git-send-email-dzickus@redhat.com
    Signed-off-by: Ingo Molnar

    Don Zickus
     
  • A recent discussion started talking about the locking on the
    pstore fs and how it relates to the kmsg infrastructure. We
    noticed it was possible for userspace to r/w to the pstore fs
    (grabbing the locks in the process) and block the panic path
    from r/w to the same fs.

    The reason was the cpu with the lock could be doing work while
    the crashing cpu is panic'ing. Busting those spinlocks might
    cause those cpus to step on each other's data. Fine, fair
    enough.

    It was suggested it would be nice to serialize the panic path
    (ie stop the other cpus) and have only one cpu running. This
    would allow us to bust the spinlocks and not worry about another
    cpu stepping on the data.

    Of course, smp_send_stop() does this in the panic case.
    kmsg_dump() would have to be moved to be called after it. Easy
    enough.

    The only problem is that on x86 the smp_send_stop() function sends
    the REBOOT_VECTOR IPI. Any cpu with irqs disabled (which pstore and
    its backend ERST would do) blocks this IPI and thus does not stop.
    This makes it difficult to reliably log data to the pstore fs.

    The patch below switches from the REBOOT_VECTOR to NMI (and
    mimics what kdump does). Switching to NMI allows us to deliver
    the IPI when irqs are disabled, increasing the reliability of
    this function.

    However, Andi carefully noted that on some machines this
    approach does not work because of broken BIOSes or whatever.

    To help accommodate this, the next couple of patches will run a
    selftest and provide a knob to disable it.

    V2:
    uses atomic ops to serialize the cpu that shuts everyone down
    V3:
    comment cleanup

    Signed-off-by: Don Zickus
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: seiji.aguchi@hds.com
    Cc: vgoyal@redhat.com
    Cc: mjg@redhat.com
    Cc: tony.luck@intel.com
    Cc: gong.chen@intel.com
    Cc: satoru.moriya@hds.com
    Cc: avi@redhat.com
    Cc: Andi Kleen
    Link: http://lkml.kernel.org/r/1318533267-18880-2-git-send-email-dzickus@redhat.com
    Signed-off-by: Ingo Molnar

    Don Zickus
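
The V2 note above ("uses atomic ops to serialize the cpu that shuts everyone down") can be sketched in userspace with C11 atomics. This is a minimal sketch under stated assumptions: the variable `stopping_cpu_sketch`, the sentinel -1, and the function name are invented for illustration and are not taken from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed sentinel: -1 means no cpu has started the shutdown yet. */
static atomic_int stopping_cpu_sketch = -1;

/* Returns true only for the first caller. The compare-and-exchange
 * succeeds only while the variable still holds -1, so at most one cpu
 * ever becomes the "stopper"; later callers see a non-sentinel value
 * and back off. */
static bool try_become_stopper(int this_cpu)
{
    int expected = -1;

    return atomic_compare_exchange_strong(&stopping_cpu_sketch,
                                          &expected, this_cpu);
}
```

If two cpus race into the shutdown path, exactly one call returns true; the loser can then simply wait to be stopped instead of sending a second round of stop requests.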
     

01 Nov, 2011

1 commit

  • These files were implicitly getting EXPORT_SYMBOL via device.h
    which was including module.h, but that will be fixed up shortly.

    By fixing these now, we can avoid seeing things like:

    arch/x86/kernel/rtc.c:29: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL’
    arch/x86/kernel/pci-dma.c:20: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL’
    arch/x86/kernel/e820.c:69: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL_GPL’

    [ with input from Randy Dunlap and also
    from Stephen Rothwell ]

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

14 Apr, 2011

1 commit

  • For future rework of try_to_wake_up() we'd like to push part of that
    function onto the CPU the task is actually going to run on.

    In order to do so we need a generic callback from the existing scheduler IPI.

    This patch introduces such a generic callback: scheduler_ipi() and
    implements it as a NOP.

    BenH notes: PowerPC might use this IPI on offline CPUs under rare conditions!

    Acked-by: Russell King
    Acked-by: Martin Schwidefsky
    Acked-by: Chris Metcalf
    Acked-by: Jesper Nilsson
    Acked-by: Benjamin Herrenschmidt
    Signed-off-by: Ralf Baechle
    Reviewed-by: Frank Rowand
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110405152728.744338123@chello.nl

    Peter Zijlstra
     

22 Oct, 2010

1 commit

  • x86 smp_ops now has a new op, stop_other_cpus, which takes a parameter
    "wait". This allows the caller to specify whether it wants to wait until
    all the cpus have processed the stop IPI. This is required specifically
    for the kexec case where we should wait for all the cpus to be stopped
    before starting the new kernel. We now wait for the cpus to stop in
    all cases except for panic/kdump where we expect things to be broken
    and we are doing our best to make things work anyway.

    This patch fixes a legitimate regression, which was introduced during
    2.6.30, by commit id 4ef702c10b5df18ab04921fc252c26421d4d6c75.

    Signed-off-by: Alok N Kataria
    LKML-Reference:
    Cc: Eric W. Biederman
    Cc: Jeremy Fitzhardinge
    Cc: v2.6.30-36
    Signed-off-by: H. Peter Anvin

    Alok Kataria
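
The "wait" semantics described above can be modelled as follows. This is a simulation, not the kernel implementation: the `_sketch` names, the simulated cpu count, and the poll budget are assumptions made for the example, and each loop iteration simply pretends that one more cpu has acknowledged the stop request.

```c
/* Simulated state: number of cpus still online (including the caller)
 * and how many polls a non-waiting caller is willing to spend. */
static int sim_online = 4;
static int sim_budget = 2;

/* With wait != 0 the caller keeps polling until it is the only cpu
 * left. With wait == 0 it gives up once the budget is used, which is
 * the best-effort behaviour wanted on the panic/kdump path. */
static void native_stop_other_cpus_sketch(int wait)
{
    int budget = sim_budget;

    while (sim_online > 1 && (wait || budget-- > 0))
        sim_online--;   /* stand-in for "one more cpu has stopped" */
}

/* Ops table with the single hook discussed in the commit message. */
struct smp_ops_sketch {
    void (*stop_other_cpus)(int wait);
};

static struct smp_ops_sketch ops_sketch = {
    .stop_other_cpus = native_stop_other_cpus_sketch,
};
```

A kexec-style caller would invoke `ops_sketch.stop_other_cpus(1)` and only continue once every other cpu is down, whereas a panic-style caller would pass 0 and proceed after a bounded delay.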
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    The percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch a large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.
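
The include-selection rule from the first bullet above (gfp.h if only gfp is used, slab.h if slab is used) can be sketched as a small classifier. The real script is written in Python and its matching patterns are not given here, so the substrings below are assumptions chosen only to make the rule concrete; the enum and function names are likewise hypothetical.

```c
#include <string.h>

enum needed_header_sketch { NEED_NONE, NEED_GFP_H, NEED_SLAB_H };

/* Decide which header a source text needs. slab usage wins because
 * slab.h itself pulls in gfp.h; gfp.h alone is chosen only when no
 * slab API appears. The marker strings are illustrative guesses. */
static enum needed_header_sketch header_for(const char *src)
{
    if (strstr(src, "kmalloc(") || strstr(src, "kzalloc(") ||
        strstr(src, "kfree("))
        return NEED_SLAB_H;
    if (strstr(src, "GFP_") || strstr(src, "alloc_page"))
        return NEED_GFP_H;
    return NEED_NONE;
}
```

A plain substring match like this would also fire on comments and string literals, which is one reason the flagged files were checked by hand in the steps that follow.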

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

09 Oct, 2009

1 commit

  • This reverts commit 9bcbdd9c58617f1301dd4f17c738bb9bc73aca70.

    The real bug producing LatencyTop latencies has been fixed in:

    f5dc375: sched: Update the clock of runqueue select_task_rq() selected

    And the commit being reverted here triggers local timer processing
    from every device IRQ. If device IRQs come in at a high frequency,
    this could cause a performance regression.

    The commit being reverted here purely 'fixed' the reported latency
    as a side effect, because CPUs were being moved out of idle more
    often.

    Acked-by: Peter Zijlstra
    Cc: Arjan van de Ven
    Cc: Frans Pop
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Thomas Gleixner
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

08 Oct, 2009

1 commit

  • Now that range timers and deferred timers are common, I found a
    problem with these using the "perf timechart" tool. Frans Pop also
    reported high scheduler latencies via LatencyTop, when using
    iwlagn.

    It turns out that on x86, these two 'opportunistic' timers only get
    checked when another "real" timer happens. These opportunistic
    timers aim to save power by hitchhiking on other
    wakeups, so as to avoid CPU wakeups by themselves as much as possible.

    The change in this patch runs this check not only at timer
    interrupts, but at all (device) interrupts. The effect is that:

    1) the deferred timers/range timers get delayed less

    2) the range timers cause fewer wakeups by themselves because
    the percentage of hitchhiking on existing wakeup events goes up.

    I've verified the working of the patch using "perf timechart"; the
    originally exposed bug is gone with this patch. Frans also reported
    success - the latencies are now down in the expected ~10 msec
    range.

    Signed-off-by: Arjan van de Ven
    Tested-by: Frans Pop
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arjan van de Ven
     

12 Jun, 2009

2 commits

  • Conflicts:
    arch/x86/kernel/cpu/mcheck/mce_64.c
    arch/x86/kernel/irq.c

    Merge reason: Resolve the conflicts above.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • * 'kvm-updates/2.6.31' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (138 commits)
    KVM: Prevent overflow in largepages calculation
    KVM: Disable large pages on misaligned memory slots
    KVM: Add VT-x machine check support
    KVM: VMX: Rename rmode.active to rmode.vm86_active
    KVM: Move "exit due to NMI" handling into vmx_complete_interrupts()
    KVM: Disable CR8 intercept if tpr patching is active
    KVM: Do not migrate pending software interrupts.
    KVM: inject NMI after IRET from a previous NMI, not before.
    KVM: Always request IRQ/NMI window if an interrupt is pending
    KVM: Do not re-execute INTn instruction.
    KVM: skip_emulated_instruction() decode instruction if size is not known
    KVM: Remove irq_pending bitmap
    KVM: Do not allow interrupt injection from userspace if there is a pending event.
    KVM: Unprotect a page if #PF happens during NMI injection.
    KVM: s390: Verify memory in kvm run
    KVM: s390: Sanity check on validity intercept
    KVM: s390: Unlink vcpu on destroy - v2
    KVM: s390: optimize float int lock: spin_lock_bh --> spin_lock
    KVM: s390: use hrtimer for clock wakeup from idle - v2
    KVM: s390: Fix memory slot versus run - v3
    ...

    Linus Torvalds
     

10 Jun, 2009

1 commit

  • KVM uses a function call IPI to cause the exit of a guest running on a
    physical cpu. For virtual interrupt notification there is no need to
    wait on IPI receival, or to execute any function.

    This is exactly what the reschedule IPI does, without the overhead
    of function IPI. So use it instead of smp_call_function_single in
    kvm_vcpu_kick.

    Also change the "guest_mode" variable to a bit in vcpu->requests, and
    use that to collapse multiple IPI's that would be issued between the
    first one and zeroing of guest mode.

    This allows kvm_vcpu_kick to be called with interrupts disabled.

    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Marcelo Tosatti
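
The idea of using a bit in vcpu->requests to collapse repeated IPIs can be sketched with a C11 atomic fetch-or. This is a simplified model under stated assumptions: the struct, the bit number, and the function names are hypothetical, the IPI is replaced by a counter, and the real code's handling of guest entry and exit is reduced to a single "clear" helper.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define REQ_KICK_SKETCH (1u << 0)   /* hypothetical bit position */

struct vcpu_sketch {
    atomic_uint requests;       /* request bits, as in the description */
    unsigned int ipis_sent;     /* stand-in for reschedule IPIs sent */
};

/* Atomically set the kick bit. Only the caller that flips it from 0 to
 * 1 "sends" an IPI; everyone else sees the bit already set and skips
 * the send. Returns true if an IPI was sent by this call. */
static bool vcpu_kick_sketch(struct vcpu_sketch *v)
{
    unsigned int old = atomic_fetch_or(&v->requests, REQ_KICK_SKETCH);

    if (old & REQ_KICK_SKETCH)
        return false;
    v->ipis_sent++;
    return true;
}

/* Clear the bit once the earlier kick has been consumed, so that the
 * next kick sends a fresh IPI. */
static void vcpu_clear_kick_sketch(struct vcpu_sketch *v)
{
    atomic_fetch_and(&v->requests, ~REQ_KICK_SKETCH);
}
```

Because the kick path consists of one atomic read-modify-write and no waiting, nothing in it requires interrupts to be enabled, which matches the last point of the commit message.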
     

04 Jun, 2009

1 commit

  • For some time each panic() called with interrupts disabled
    triggered the !irqs_disabled() WARN_ON in smp_call_function(),
    producing ugly backtraces and confusing users.

    This is a common situation with machine checks for example which
    tend to call panic with interrupts disabled, but will also hit
    in other situations e.g. panic during early boot. In fact it
    means that panic cannot be called in many circumstances, which
    would be bad.

    This all started with the new fancy queued smp_call_function,
    which is then used by the shutdown path to shut down the other
    CPUs.

    On closer examination it turned out that the fancy RCU
    smp_call_function() does lots of things not suitable in a panic
    situation anyways, like allocating memory and relying on complex
    system state.

    I originally tried to patch this over by checking for panic
    there, but it was quite complicated and the original patch
    was also not very popular. This also didn't fix some of the
    underlying complexity problems.

    The new code in post 2.6.29 tries to patch around this by
    checking for oops_in_progress, but that is not enough to make
    this fully safe and I don't think that's a real solution
    because panic has to be reliable.

    So instead use a dedicated vector to reboot. This makes the reboot
    code extremely straightforward, which is definitely a big plus
    in a panic situation where it is important to avoid relying on
    too much kernel state. The new simple code is also safe to be
    called from interrupts-off regions because it is very very simple.

    There can be situations where it is important that panic
    is reliable. For example on a fatal machine check the panic
    is needed to get the system up again and running as quickly
    as possible. So it's important that panic is reliable and
    all functions it calls are simple.

    This is why I came up with this simple vector scheme.
    It's very hard to beat in simplicity. Vectors are not
    particularly precious anymore since all big systems are
    using per CPU vectors.

    Another possibility would have been to use an NMI similar
    to kdump, but there is still the problem that NMIs don't
    work reliably on some systems due to BIOS issues. NMIs
    would have been able to stop CPUs running with interrupts
    off too. For the sake of universal reliability I opted for
    using a non NMI vector for now.

    I put the reboot vector into the highest priority bucket of
    the APIC vectors and moved the 64bit UV_BAU message down
    instead into the next lower priority.

    [ Impact: bug fix, fixes an old regression ]

    Signed-off-by: Andi Kleen
    Signed-off-by: Hidetoshi Seto
    Signed-off-by: H. Peter Anvin

    Andi Kleen
     

13 Apr, 2009

1 commit


18 Feb, 2009

1 commit


31 Jan, 2009

1 commit


29 Jan, 2009

3 commits


10 Jan, 2009

1 commit

  • …git/tip/linux-2.6-tip

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (36 commits)
    x86: fix section mismatch warnings in mcheck/mce_amd_64.c
    x86: offer frame pointers in all build modes
    x86: remove duplicated #include's
    x86: k8 numa register active regions later
    x86: update Alan Cox's email addresses
    x86: rename all fields of mpc_table mpc_X to X
    x86: rename all fields of mpc_oemtable oem_X to X
    x86: rename all fields of mpc_bus mpc_X to X
    x86: rename all fields of mpc_cpu mpc_X to X
    x86: rename all fields of mpc_intsrc mpc_X to X
    x86: rename all fields of mpc_lintsrc mpc_X to X
    x86: rename all fields of mpc_iopic mpc_X to X
    x86: irqinit_64.c init_ISA_irqs should be static
    Documentation/x86/boot.txt: payload length was changed to payload_length
    x86: setup_percpu.c fix style problems
    x86: irqinit_64.c fix style problems
    x86: irqinit_32.c fix style problems
    x86: i8259.c fix style problems
    x86: irq_32.c fix style problems
    x86: ioport.c fix style problems
    ...

    Linus Torvalds
     

05 Jan, 2009

1 commit


04 Jan, 2009

1 commit

  • Impact: use new cpumask API to reduce memory and stack usage

    Allocate the following local cpumasks based on the number of cpus that
    are present. References will use new cpumask API. (Currently only
    modified for x86_64, x86_32 continues to use the *_map variants.)

    cpu_callin_mask
    cpu_callout_mask
    cpu_initialized_mask
    cpu_sibling_setup_mask

    Provide the following accessor functions:

    struct cpumask *cpu_sibling_mask(int cpu)
    struct cpumask *cpu_core_mask(int cpu)

    Other changes are when setting or clearing the cpu online, possible
    or present maps, use the accessor functions.

    Signed-off-by: Mike Travis
    Acked-by: Rusty Russell
    Signed-off-by: Ingo Molnar

    Mike Travis
     

03 Jan, 2009

1 commit

  • …/git/tip/linux-2.6-tip

    * 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits)
    x86: export vector_used_by_percpu_irq
    x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and()
    sched: nominate preferred wakeup cpu, fix
    x86: fix lguest used_vectors breakage, -v2
    x86: fix warning in arch/x86/kernel/io_apic.c
    sched: fix warning in kernel/sched.c
    sched: move test_sd_parent() to an SMP section of sched.h
    sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0
    sched: activate active load balancing in new idle cpus
    sched: bias task wakeups to preferred semi-idle packages
    sched: nominate preferred wakeup cpu
    sched: favour lower logical cpu number for sched_mc balance
    sched: framework for sched_mc/smt_power_savings=N
    sched: convert BALANCE_FOR_xx_POWER to inline functions
    x86: use possible_cpus=NUM to extend the possible cpus allowed
    x86: fix cpu_mask_to_apicid_and to include cpu_online_mask
    x86: update io_apic.c to the new cpumask code
    x86: Introduce topology_core_cpumask()/topology_thread_cpumask()
    x86: xen: use smp_call_function_many()
    x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c
    ...

    Fixed up trivial conflict in kernel/time/tick-sched.c manually

    Linus Torvalds
     

23 Dec, 2008

1 commit


17 Dec, 2008

2 commits

  • This patch simply changes cpumask_t to struct cpumask and similar
    trivial modernizations.

    Signed-off-by: Rusty Russell
    Signed-off-by: Mike Travis

    Mike Travis
     
  • Impact: cleanup, change parameter passing

    * Change genapic interfaces to accept cpumask_t pointers where possible.

    * Modify external callers to use cpumask_t pointers in function calls.

    * Create new send_IPI_mask_allbutself which is the same as the
    send_IPI_mask functions but removes smp_processor_id() from list.
    This removes another common need for a temporary cpumask_t variable.

    * Functions that used a temp cpumask_t variable for:

    cpumask_t allbutme = cpu_online_map;

    cpu_clear(smp_processor_id(), allbutme);
    if (!cpus_empty(allbutme))
    ...

    become:

    if (!cpus_equal(cpu_online_map, cpumask_of_cpu(cpu)))
    ...

    * Other minor code optimizations (like using cpus_clear instead of
    CPU_MASK_NONE, etc.)

    Applies to linux-2.6.tip/master.

    Signed-off-by: Mike Travis
    Signed-off-by: Rusty Russell
    Acked-by: Ingo Molnar

    Mike Travis
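
The "all but self" idea in the commit above can be illustrated with a plain 64-bit bitmask instead of the kernel's cpumask types. This is a minimal sketch: `mask_sketch_t` and both helpers are invented for the example, it assumes at most 64 cpus and `self < 64`, and it is not the genapic interface itself.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t mask_sketch_t;   /* one bit per cpu, assumed <= 64 cpus */

/* Build the target set directly, without a caller-side temporary:
 * every online cpu except the one executing the call. */
static mask_sketch_t mask_allbutself(mask_sketch_t online, unsigned int self)
{
    return online & ~((mask_sketch_t)1 << self);
}

/* Counterpart of the "!cpus_equal(cpu_online_map, cpumask_of_cpu(cpu))"
 * test: true when some cpu other than self is online. */
static bool others_online(mask_sketch_t online, unsigned int self)
{
    return online != ((mask_sketch_t)1 << self);
}
```

For an online mask of 0xF and `self == 2`, `mask_allbutself` yields 0xB; when only cpu 2 is online the result is empty and `others_online` is false, so no IPI needs to be sent.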
     

12 Dec, 2008

1 commit


11 Nov, 2008

1 commit

  • Impact: really halt all CPUs on halt

    Function machine_halt (resp. native_machine_halt) is empty for x86
    architectures. When command 'halt -f' is invoked, the message "System
    halted." is displayed but this is not really true because all CPUs are
    still running.

    There are also similar inconsistencies for other arches (some use
    power-off for halt or a forever-loop with IRQs enabled/disabled).

    IMO the same approach should be used for all architectures OR
    what does the message "System halted" really mean?

    This patch fixes it for x86.

    Signed-off-by: Ivan Vecera
    Signed-off-by: Ingo Molnar

    Ivan Vecera
     

25 Aug, 2008

1 commit


06 Jul, 2008

1 commit


26 Jun, 2008

2 commits


14 May, 2008

1 commit

  • After resume on a 2cpu laptop, kernel builds collapse with a sed hang,
    sh or make segfault (often on 20295564), real-time signal to cc1 etc.

    Several hurdles to jump, but a manually-assisted bisect led to -rc1's
    d2bcbad5f3ad38a1c09861bca7e252dde7bb8259 x86: do not zap_low_mappings
    in __smp_prepare_cpus. Though the low mappings were removed at bootup,
    they were left behind (with Global flags helping to keep them in TLB)
    after resume or cpu online, causing the crashes seen.

    Reinstate zap_low_mappings (with local __flush_tlb_all) for each cpu_up
    on x86_32. This used to be serialized by smp_commenced_mask: that's now
    gone, but a low_mappings flag will do. No need for native_smp_cpus_done
    to repeat the zap: let mem_init zap BSP's low mappings just like on UP.

    (In passing, fix error code from native_cpu_up: do_boot_cpu returns a
    variety of diagnostic values, Dprintk what it says but convert to -EIO.
    And save_pg_dir separately before zap_low_mappings: doesn't matter now,
    but zapping twice in succession wiped out resume's swsusp_pg_dir.)

    That worked well on the duo and one quad, but wouldn't boot 3rd or 4th
    cpu on P4 Xeon, oopsing just after unlock_ipi_call_lock. The TLB flush
    IPI now being sent reveals a long-standing bug: the booting cpu has its
    APIC readied in smp_callin at the top of start_secondary, but isn't put
    into the cpu_online_map until just before that unlock_ipi_call_lock.

    So native_smp_call_function_mask to online cpus would send_IPI_allbutself,
    including the cpu just coming up, though it has been excluded from the
    count to wait for: by the time it handles the IPI, the call data on
    native_smp_call_function_mask's stack may well have been overwritten.

    So fall back to send_IPI_mask while cpu_online_map does not match
    cpu_callout_map: perhaps there's a better APICological fix to be
    made at the start_secondary end, but I wouldn't know that.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Ingo Molnar

    Hugh Dickins
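
The fallback described in the last paragraph of the commit above can be sketched as a pure decision function over bitmasks. The names, the 64-bit mask representation, and the exact form of the condition are assumptions for illustration; the commit text only states that the explicit-mask path is used while the online map and the callout map differ.

```c
#include <stdint.h>

enum ipi_method_sketch { IPI_ALLBUTSELF_SHORTCUT, IPI_EXPLICIT_MASK };

/* The broadcast shortcut reaches every cpu whose APIC is ready, which
 * can include a cpu that is still booting and not yet marked online.
 * Use it only when the target is exactly "all online cpus but self"
 * and no cpu is in that in-between state (online == callout). */
static enum ipi_method_sketch choose_ipi_method(uint64_t online,
                                                uint64_t callout,
                                                uint64_t target,
                                                unsigned int self)
{
    uint64_t allbutself = online & ~((uint64_t)1 << self);

    if (target == allbutself && online == callout)
        return IPI_ALLBUTSELF_SHORTCUT;
    return IPI_EXPLICIT_MASK;
}
```

With cpus 0 to 2 online and cpu 3 called out but not yet online (online 0x7, callout 0xF), a call targeting 0x6 from cpu 0 takes the explicit-mask path, so the booting cpu never receives an IPI it was not counted for.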
     

17 Apr, 2008

4 commits