21 Sep, 2010

1 commit


29 May, 2010

1 commit


14 Apr, 2010

1 commit

  • This is a partial revert of 4cd8b5e2a159 "lguest: use KVM hypercalls";
    we revert to using (just as questionable but more reliable) int $15 for
    hypercalls. I didn't revert the register mapping, so we still use the
    same calling convention as kvm.

    KVM in more recent incarnations stopped injecting a fault when a guest
    tried to use the VMCALL instruction from ring 1, so lguest under kvm
    fails to make hypercalls. It was nice to share code with our KVM
    cousins, but this was overreach.

    Signed-off-by: Rusty Russell
    Cc: Matias Zabaljauregui
    Cc: Avi Kivity

    Rusty Russell
     

15 Mar, 2010

1 commit

  • acpi=ht was important in 2003 -- before ACPI was
    universally deployed and enabled by default in
    the major Linux distributions.

    At that time, there were a fair number of people who
    either chose to, or needed to, run with acpi=off,
    yet also wanted access to Hyper-threading.

    Today we find that many invocations of "acpi=ht"
    are accidental, and thus it is possible that it
    is doing more harm than good.

    In 2.6.34, we warn on invocation of acpi=ht.
    In 2.6.35, we delete the boot option.

    Signed-off-by: Len Brown

    Len Brown
     

23 Sep, 2009

1 commit


16 Sep, 2009

1 commit

  • get/set_wallclock() already have a set of platform dependent
    implementations (default, EFI, paravirt). MRST will add another
    variant.

    Moving them to platform ops simplifies the existing code and minimizes
    the effort to integrate new variants.

    Signed-off-by: Feng Tang
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Feng Tang
     

31 Aug, 2009

3 commits

  • TSC calibration is modified by the vmware hypervisor and paravirt by
    separate means. Moorestown wants to add its own calibration routine as
    well. So make calibrate_tsc a proper x86_init_ops function and
    override it by paravirt or by the early setup of the vmware
    hypervisor.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • The timer init code is convoluted with several quirks and the paravirt
    timer chooser. Figuring out which code path is actually taken is not
    for the faint hearted.

    Move the numaq TSC quirk to tsc_pre_init x86_init_ops function and
    replace the paravirt time chooser and the remaining x86 quirk with a
    simple x86_init_ops function.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • irq_init is overridden by x86_quirks and by paravirts. Unify the whole
    mess and make it an unconditional x86_init_ops function which defaults
    to the standard function and can be overridden by the early platform
    code.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

27 Aug, 2009

1 commit

  • memory_setup is overridden by x86_quirks and by paravirts with weak
    functions and quirks. Unify the whole mess and make it an
    unconditional x86_init_ops function which defaults to the standard
    function and can be overridden by the early platform code.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
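
The override pattern described in this commit can be pictured with a small user-space sketch. It is not the kernel's x86_init_ops definition: the struct, the enum and every function name below are hypothetical stand-ins, chosen only to show a hook that defaults to the standard function and is replaced by early platform code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the kernel's x86_init_ops. */
enum memory_map_source { MAP_FROM_BIOS, MAP_FROM_HYPERVISOR };

struct init_ops {
	enum memory_map_source (*memory_setup)(void);
};

static enum memory_map_source default_memory_setup(void)
{
	return MAP_FROM_BIOS;
}

static enum memory_map_source guest_memory_setup(void)
{
	return MAP_FROM_HYPERVISOR;
}

/* Every hook starts out pointing at the standard implementation. */
static struct init_ops init_ops = { .memory_setup = default_memory_setup };

/* Early platform code overrides only the hooks it cares about. */
void guest_early_init(void)
{
	init_ops.memory_setup = guest_memory_setup;
}

/* Generic setup code calls the hook unconditionally, with no quirk checks. */
enum memory_map_source setup_memory(void)
{
	assert(init_ops.memory_setup != NULL);
	return init_ops.memory_setup();
}
```

Because the caller never tests which platform it runs on, adding another variant only means writing one more function and assigning it during early setup.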
     

30 Jul, 2009

2 commits

  • Every so often, after code shuffles, I need to go through and unbitrot
    the Lguest Journey (see drivers/lguest/README). Since we now use RCU in
    a simple form in one place I took the opportunity to expand that explanation.

    Signed-off-by: Rusty Russell
    Cc: Ingo Molnar
    Cc: Paul McKenney

    Rusty Russell
     
  • I don't really notice it (except to begrudge the extra vertical
    space), but Ingo does. And he pointed out that since one excuse for
    lguest is as a teaching tool, it should set a good example.

    Signed-off-by: Rusty Russell
    Cc: Ingo Molnar

    Rusty Russell
     

17 Jul, 2009

2 commits


12 Jun, 2009

7 commits

  • This version requires that host and guest have the same PAE status.
    NX cap is not offered to the guest, yet.

    Signed-off-by: Matias Zabaljauregui
    Signed-off-by: Rusty Russell

    Matias Zabaljauregui
     
  • Add support for kvm_hypercall4(); PAE wants it.

    Signed-off-by: Matias Zabaljauregui
    Signed-off-by: Rusty Russell

    Matias Zabaljauregui
     
  • replace LHCALL_SET_PMD with LHCALL_SET_PGD hypercall name
    (That's really what it is, and the confusion gets worse with PAE support)

    Signed-off-by: Matias Zabaljauregui
    Signed-off-by: Rusty Russell
    Reported-by: Jeremy Fitzhardinge

    Matias Zabaljauregui
     
  • Some cleanups, and replace direct assignment with native_set_* macros,
    which properly handle 64-bit entries when PAE is activated.

    Signed-off-by: Matias Zabaljauregui
    Signed-off-by: Rusty Russell

    Matias Zabaljauregui
     
  • The downside of the last patch which made restore_flags and irq_enable
    check interrupts is that they are now too big to be patched directly
    into the callsites, so the C versions are always used.

    But the C versions go via PV_CALLEE_SAVE_REGS_THUNK which saves all
    the registers. In fact, we don't need any registers in the fast path,
    so we can do better than this if we actually code them in assembler.

    The results are in the noise, but since it's about the same amount of
    code, it's worth applying.

    1GB Guest->Host: input(suppressed),output(suppressed)
    Before:
    Seconds: 0:16.53
    Packets: 377268,753673
    Interrupts: 22461,24297
    Notifications: 1(5245),21303(732370)
    Net IRQs triggered: 377023(245),42578(711095)

    After:
    Seconds: 0:16.48
    Packets: 377289,753673
    Interrupts: 22281,24465
    Notifications: 1(5245),21296(732377)
    Net IRQs triggered: 377060(229),42564(711109)

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • lguest never checked for pending interrupts when enabling interrupts, and
    things still worked. However, it makes a significant difference to TCP
    performance, so it's time we fixed it by introducing a pending_irq flag
    and checking it on irq_restore and irq_enable.

    These two routines are now too big to patch into the 8/10 bytes
    patch space, so we drop that code.

    Note: The high latency on interrupt delivery had a very curious
    effect: once everything else was optimized, networking without GSO was
    faster than networking with GSO, since more interrupts were sent and
    hence a greater chance of one getting through to the Guest!

    Note2: (Almost) Closing the same loophole for iret doesn't have any
    measurable effect, so I'm leaving that patch for the moment.

    Before:
    1GB tcpblast Guest->Host: 30.7 seconds
    1GB tcpblast Guest->Host (no GSO): 76.0 seconds

    After:
    1GB tcpblast Guest->Host: 6.8 seconds
    1GB tcpblast Guest->Host (no GSO): 27.8 seconds

    Signed-off-by: Rusty Russell

    Rusty Russell
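
The pending_irq idea from this commit can be modelled in a few lines of user-space C. This is a sketch under stated assumptions, not the lguest code: struct guest_state, its fields and all function names are hypothetical, and the "hypercall" is only a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for state shared between Host and Guest. */
struct guest_state {
	bool irq_enabled;  /* the Guest's virtual interrupt flag */
	bool irq_pending;  /* set by the Host while an interrupt is waiting */
	int hypercalls;    /* counts simulated "deliver interrupts now" calls */
};

/* Simulated hypercall: the Host delivers whatever was pending. */
static void notify_host(struct guest_state *g)
{
	g->hypercalls++;
	g->irq_pending = false;
}

/* Host side: an interrupt arrived while the Guest may have irqs off. */
void host_post_irq(struct guest_state *g)
{
	assert(g != NULL);
	g->irq_pending = true;
}

void guest_irq_disable(struct guest_state *g)
{
	g->irq_enabled = false;
}

/* The fix: enabling interrupts also checks the pending flag. */
void guest_irq_enable(struct guest_state *g)
{
	g->irq_enabled = true;
	if (g->irq_pending)
		notify_host(g);
}

/* Restoring saved flags needs the same check when it re-enables irqs. */
void guest_irq_restore(struct guest_state *g, bool was_enabled)
{
	g->irq_enabled = was_enabled;
	if (was_enabled && g->irq_pending)
		notify_host(g);
}
```

Without the check in the enable and restore paths, a posted interrupt would sit until some unrelated event returned control to the Host, which is the latency the commit message describes.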
     
  • Copy from arch/x86/kernel/irqinit_32.c: we don't use the vectors beyond
    LGUEST_IRQS (if any), but we might as well set them all.

    Signed-off-by: Rusty Russell

    Rusty Russell
     

11 Jun, 2009

2 commits

  • * 'x86-xen-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (42 commits)
    xen: cache cr0 value to avoid trap'n'emulate for read_cr0
    xen/x86-64: clean up warnings about IST-using traps
    xen/x86-64: fix breakpoints and hardware watchpoints
    xen: reserve Xen start_info rather than e820 reserving
    xen: add FIX_TEXT_POKE to fixmap
    lguest: update lazy mmu changes to match lguest's use of kvm hypercalls
    xen: honour VCPU availability on boot
    xen: add "capabilities" file
    xen: drop kexec bits from /sys/hypervisor since kexec isn't implemented yet
    xen/sys/hypervisor: change writable_pt to features
    xen: add /sys/hypervisor support
    xen/xenbus: export xenbus_dev_changed
    xen: use device model for suspending xenbus devices
    xen: remove suspend_cancel hook
    xen/dev-evtchn: clean up locking in evtchn
    xen: export ioctl headers to userspace
    xen: add /dev/xen/evtchn driver
    xen: add irq_from_evtchn
    xen: clean up gate trap/interrupt constants
    xen: set _PAGE_NX in __supported_pte_mask before pagetable construction
    ...

    Linus Torvalds
     
  • * 'irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (76 commits)
    x86, apic: Fix dummy apic read operation together with broken MP handling
    x86, apic: Restore irqs on fail paths
    x86: Print real IOAPIC version for x86-64
    x86: enable_update_mptable should be a macro
    sparseirq: Allow early irq_desc allocation
    x86, io-apic: Don't mark pin_programmed early
    x86, irq: don't call mp_config_acpi_gsi() if update_mptable is not enabled
    x86, irq: update_mptable needs pci_routeirq
    x86: don't call read_apic_id if !cpu_has_apic
    x86, apic: introduce io_apic_irq_attr
    x86/pci: add 4 more return parameters to IO_APIC_get_PCI_irq_vector(), fix
    x86: read apic ID in the !acpi_lapic case
    x86: apic: Fixmap apic address even if apic disabled
    x86: display extended apic registers with print_local_APIC and cpu_debug code
    x86: read apic ID in the !acpi_lapic case
    x86: clean up and fix setup_clear/force_cpu_cap handling
    x86: apic: Check rev 3 fadt correctly for physical_apic bit
    x86/pci: update pirq_enable_irq() to setup io apic routing
    x86/acpi: move setup io apic routing out of CONFIG_ACPI scope
    x86/pci: add 4 more return parameters to IO_APIC_get_PCI_irq_vector()
    ...

    Linus Torvalds
     

05 Jun, 2009

1 commit


08 May, 2009

1 commit

  • Conflicts:
    arch/frv/include/asm/pgtable.h
    arch/x86/include/asm/required-features.h
    arch/x86/xen/mmu.c

    Merge reason: x86/xen was on a .29 base still, move it to a fresher
    branch and pick up Xen fixes as well, plus resolve
    conflicts

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

28 Apr, 2009

1 commit

  • This simplifies the node awareness of the code. All our allocators
    only deal with a NUMA node ID locality not with CPU ids anyway - so
    there's no need to maintain (and transform) a CPU id all across the
    IRQ layer.

    v2: keep move_irq_desc related

    [ Impact: cleanup, prepare IRQ code to be NUMA-aware ]

    Signed-off-by: Yinghai Lu
    Cc: Andrew Morton
    Cc: Suresh Siddha
    Cc: "Eric W. Biederman"
    Cc: Rusty Russell
    Cc: Jeremy Fitzhardinge
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     

22 Apr, 2009

1 commit

  • Pass clocksource pointer to the read() callback for clocksources. This
    allows us to share the callback between multiple instances.

    [hugh@veritas.com: fix powerpc build of clocksource pass clocksource mods]
    [akpm@linux-foundation.org: cleanup]
    Signed-off-by: Magnus Damm
    Acked-by: John Stultz
    Cc: Thomas Gleixner
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Magnus Damm
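
Why passing the clocksource pointer lets several instances share one callback can be shown with a small sketch. The struct clocksource here is a two-field stand-in rather than the kernel's definition, and struct counter_dev, counter_read() and counter_init() are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for a clocksource whose read() receives itself. */
struct clocksource {
	const char *name;
	uint64_t (*read)(struct clocksource *cs);
};

/* Hypothetical device that embeds a clocksource next to its own state. */
struct counter_dev {
	uint64_t ticks;
	struct clocksource cs;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* One callback serves every instance: the argument says which one. */
uint64_t counter_read(struct clocksource *cs)
{
	struct counter_dev *dev = container_of(cs, struct counter_dev, cs);

	return dev->ticks;
}

void counter_init(struct counter_dev *dev, const char *name, uint64_t ticks)
{
	assert(dev != NULL);
	dev->ticks = ticks;
	dev->cs.name = name;
	dev->cs.read = counter_read;
}
```

With a read(void) signature, each device would need its own callback or a global pointer to locate its state.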
     

19 Apr, 2009

1 commit

  • Fixes guest crash 'lguest: bad read address 0x4800000 len 256'

    The new per-cpu allocator ends up handing a non-linear address to
    write_gdt_entry. We do __pa() on it, and hand it to the host, which
    kills us.

    I've long wanted to make the hypercall "LOAD_GDT_ENTRY" to match the IDT
    code, but had no pressing reason until now.

    Signed-off-by: Rusty Russell
    Cc: lguest@ozlabs.org

    Rusty Russell
     

08 Apr, 2009

2 commits

  • Duplicate hcall -> kvm_hypercall0 conversion from "lguest: use KVM
    hypercalls".

    Signed-off-by: Jeremy Fitzhardinge
    Cc: Matias Zabaljauregui
    Cc: Rusty Russell

    Jeremy Fitzhardinge
     
  • * commit 'origin/master': (4825 commits)
    Fix build errors due to CONFIG_BRANCH_TRACER=y
    parport: Use the PCI IRQ if offered
    tty: jsm cleanups
    Adjust path to gpio headers
    KGDB_SERIAL_CONSOLE check for module
    Change KCONFIG name
    tty: Blackin CTS/RTS
    Change hardware flow control from poll to interrupt driven
    Add support for the MAX3100 SPI UART.
    lanana: assign a device name and numbering for MAX3100
    serqt: initial clean up pass for tty side
    tty: Use the generic RS485 ioctl on CRIS
    tty: Correct inline types for tty_driver_kref_get()
    splice: fix deadlock in splicing to file
    nilfs2: support nanosecond timestamp
    nilfs2: introduce secondary super block
    nilfs2: simplify handling of active state of segments
    nilfs2: mark minor flag for checkpoint created by internal operation
    nilfs2: clean up sketch file
    nilfs2: super block operations fix endian bug
    ...

    Conflicts:
    arch/x86/include/asm/thread_info.h
    arch/x86/lguest/boot.c
    drivers/xen/manage.c

    Jeremy Fitzhardinge
     

31 Mar, 2009

1 commit


30 Mar, 2009

4 commits

  • Impact: cleanup

    This patch allows us to use KVM hypercalls

    Signed-off-by: Matias Zabaljauregui
    Signed-off-by: Rusty Russell

    Matias Zabaljauregui
     
  • Impact: intermittent guest segv/crash fix

    I've been seeing random guest bad address crashes and segmentation faults:
    bisect led to 4f98a2fee8 (vmscan: split LRU lists into anon & file sets),
    but that's a red herring.

    It turns out that lguest never hooked up the pte_update/pte_update_defer
    calls, so our ptes were not always in sync. After the vmscan commit, the
    bug became reproducible; now a fsck in a 64MB guest causes reproducible
    pagetable corruption.

    Signed-off-by: Rusty Russell
    Cc: jeremy@xensource.com
    Cc: virtualization@lists.osdl.org
    Cc: stable@kernel.org

    Rusty Russell
     
  • Impact: fix lazy context switch API

    Pass the previous and next tasks into the context switch start and
    end calls, so that the called functions can properly access the
    task state (esp in end_context_switch, in which the next task
    is not yet completely current).

    Signed-off-by: Jeremy Fitzhardinge
    Acked-by: Peter Zijlstra

    Jeremy Fitzhardinge
     
  • Impact: allow preemption during lazy mmu updates

    If we're in lazy mmu mode when context switching, leave
    lazy mmu mode, but remember the task's state in
    TIF_LAZY_MMU_UPDATES. When we resume the task, check this
    flag and re-enter lazy mmu mode if it's set.

    This sets things up for allowing lazy mmu mode while preemptible,
    though that won't actually be active until the next change.

    Signed-off-by: Jeremy Fitzhardinge
    Acked-by: Peter Zijlstra

    Jeremy Fitzhardinge
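
The save-and-resume behaviour described above can be sketched as a tiny state machine. This is an illustration under assumptions, not the paravirt code: struct task, the lazy_mmu_updates field (standing in for TIF_LAZY_MMU_UPDATES) and the function names are hypothetical, and clearing the flag on resume is a choice made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

enum lazy_mode { LAZY_NONE, LAZY_MMU };

/* Hypothetical task with one flag standing in for TIF_LAZY_MMU_UPDATES. */
struct task {
	bool lazy_mmu_updates;
};

/* Mode of the (single, simulated) CPU. */
static enum lazy_mode cpu_mode = LAZY_NONE;

bool in_lazy_mmu(void)
{
	return cpu_mode == LAZY_MMU;
}

void enter_lazy_mmu(void)
{
	assert(cpu_mode == LAZY_NONE);
	cpu_mode = LAZY_MMU;
}

void leave_lazy_mmu(void)
{
	assert(cpu_mode == LAZY_MMU);
	cpu_mode = LAZY_NONE;  /* a real implementation would flush here */
}

/* Switching away: leave lazy mode but remember that prev was in it. */
void start_context_switch(struct task *prev)
{
	if (in_lazy_mmu()) {
		leave_lazy_mmu();
		prev->lazy_mmu_updates = true;
	}
}

/* Switching in: re-enter lazy mode if next was interrupted inside it. */
void end_context_switch(struct task *next)
{
	if (next->lazy_mmu_updates) {
		next->lazy_mmu_updates = false;
		enter_lazy_mmu();
	}
}
```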
     

28 Mar, 2009

1 commit


15 Mar, 2009

1 commit

  • Impact: use new interface instead of previous ad hoc implementation

    Rather than having special purpose init_pg_table_start/end variables
    to delimit the kernel pagetable built by head_32.S, just use the brk
    mechanism to extend the bss for the new pagetable.

    This patch removes init_pg_table_start/end and pg0, defines __brk_base
    (which is page-aligned and immediately follows _end), initializes
    the brk region to start there, and uses it for the 32-bit pagetable.

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: H. Peter Anvin

    Jeremy Fitzhardinge
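
As a rough picture of a brk-style region, the following user-space sketch hands out space from a fixed, page-aligned area by bumping an end pointer. It is not the kernel implementation: brk_area, BRK_SIZE and brk_extend() are hypothetical names, the region size is arbitrary, and C11 is assumed for _Alignas.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096
#define BRK_SIZE  (16 * PAGE_SIZE)

/* Page-aligned reserve that, in this sketch, plays the role of the brk area. */
static _Alignas(PAGE_SIZE) unsigned char brk_area[BRK_SIZE];
static size_t brk_end;  /* offset of the first unused byte */

/*
 * Extend the region by size bytes at the given power-of-two alignment
 * (at most PAGE_SIZE). Returns NULL when the reserve is exhausted.
 */
void *brk_extend(size_t size, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0 && align <= PAGE_SIZE);

	size_t start = (brk_end + align - 1) & ~(align - 1);

	if (start > BRK_SIZE || size > BRK_SIZE - start)
		return NULL;

	brk_end = start + size;
	return brk_area + start;
}
```

An early page table would then be obtained with a call such as brk_extend(PAGE_SIZE, PAGE_SIZE) instead of being delimited by dedicated start/end variables.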
     

11 Mar, 2009

1 commit


09 Mar, 2009

2 commits

  • Impact: remove lots of lguest boot WARN_ON() when CONFIG_SPARSE_IRQ=y

    We now need to call irq_to_desc_alloc_cpu() before
    set_irq_chip_and_handler_name(), but we can't do that from init_IRQ (no
    kmalloc available).

    So do it as we use interrupts instead. Also means we only alloc for
    irqs we use, which was the intent of CONFIG_SPARSE_IRQ anyway.

    Signed-off-by: Rusty Russell
    Cc: Ingo Molnar

    Rusty Russell
     
  • Impact: fix lguest boot crash on modern Intel machines

    The code in early_init_intel does:

        if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
                u64 misc_enable;

                rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);

    And that rdmsr faults (not allowed from non-0 PL). We can get around
    this by mugging the family ID part of the cpuid. 5 seems like a good
    number.

    Of course, this is a hack (how very lguest!). We could just indicate
    that we don't support MSRs, or implement lguest_rdmsr.

    Reported-by: Patrick McHardy
    Signed-off-by: Rusty Russell
    Tested-by: Patrick McHardy

    Rusty Russell
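
The family ID rewrite can be illustrated with plain bit manipulation on the EAX value returned by CPUID leaf 1, where bits 8-11 hold the base family. This is a sketch, not the lguest cpuid hook: the macro and function names are hypothetical, and the sample EAX value in the usage note is only an example of a family 6 signature.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID leaf 1, EAX: bits 8-11 hold the base family ID. */
#define CPUID_FAMILY_MASK  0x00000F00u
#define CPUID_FAMILY_SHIFT 8

/* Extract the base family from a leaf-1 EAX value. */
uint32_t cpuid_base_family(uint32_t eax)
{
	return (eax & CPUID_FAMILY_MASK) >> CPUID_FAMILY_SHIFT;
}

/* Replace the base family, leaving model and stepping bits untouched. */
uint32_t cpuid_set_family(uint32_t eax, uint32_t family)
{
	assert(family <= 0xF);
	return (eax & ~CPUID_FAMILY_MASK) | (family << CPUID_FAMILY_SHIFT);
}
```

For an EAX of 0x000106A5 (family 6), cpuid_set_family(eax, 5) yields 0x000105A5, so the quoted c->x86 > 6 || c->x86 == 6 test is false and the rdmsrl() is skipped.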