19 Apr, 2009

2 commits

  • Fixes guest crash 'lguest: bad read address 0x4800000 len 256'

    The new per-cpu allocator ends up handing a non-linear address to
    write_gdt_entry. We do __pa() on it, and hand it to the host, which
    kills us.

    I've long wanted to make the hypercall "LOAD_GDT_ENTRY" to match the IDT
    code, but had no pressing reason until now.

    Signed-off-by: Rusty Russell
    Cc: lguest@ozlabs.org

    Rusty Russell
     
  • Typical message: 'lguest: unhandled trap 6 at 0x418726 (0x0)'

    vmlinux guests were broken by 4cd8b5e2a159f18a1507f1187b44a1acbfa6341b
    'lguest: use KVM hypercalls', which rewrites guest text from kvm hypercalls
    to trap 31.

    The Launcher mmaps the kernel image. The Guest executes and
    immediately faults in the first text page (read-only). Then it hits a
    hypercall, and we rewrite that hypercall, causing a copy-on-write.
    But the Guest pagetables still refer to the old page: we fault again,
    but as Host we see the hypercall already rewritten, and pass the fault
    back to the Guest. The Guest hasn't set up an IDT yet, so we kill it.

    This doesn't happen with bzImages: they unpack themselves and so the
    text pages are already read-write.

    Signed-off-by: Rusty Russell
    Tested-by: Patrick McHardy

    Matias Zabaljauregui
     

30 Mar, 2009

3 commits


28 Mar, 2009

1 commit


09 Mar, 2009

1 commit

  • Impact: remove lots of lguest boot WARN_ON() when CONFIG_SPARSE_IRQ=y

    We now need to call irq_to_desc_alloc_cpu() before
    set_irq_chip_and_handler_name(), but we can't do that from init_IRQ (no
    kmalloc available).

    So do it as we use interrupts instead. Also means we only alloc for
    irqs we use, which was the intent of CONFIG_SPARSE_IRQ anyway.

    Signed-off-by: Rusty Russell
    Cc: Ingo Molnar

    Rusty Russell
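
    The allocate-on-first-use idea above can be sketched in plain userspace C.
    This is only an illustration under stated assumptions: sketch_irq_desc,
    sketch_irq_desc_get(), sketch_irq_desc_count() and SKETCH_NR_IRQS are
    hypothetical names, and calloc() stands in for the kernel allocator. It is
    not the lguest code, nor the kernel's irq_to_desc_alloc_cpu().

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical upper bound on interrupt numbers for this sketch. */
#define SKETCH_NR_IRQS 32

/* Hypothetical stand-in for a per-interrupt descriptor. */
struct sketch_irq_desc {
    unsigned int irq;
};

/* All slots start out NULL: nothing is allocated up front. */
static struct sketch_irq_desc *sketch_descs[SKETCH_NR_IRQS];

/*
 * Return the descriptor for 'irq', allocating it the first time the
 * interrupt is actually used. Returns NULL if 'irq' is out of range or
 * the allocation fails.
 */
static struct sketch_irq_desc *sketch_irq_desc_get(unsigned int irq)
{
    if (irq >= SKETCH_NR_IRQS)
        return NULL;

    if (sketch_descs[irq] == NULL) {
        struct sketch_irq_desc *desc = calloc(1, sizeof(*desc));

        if (desc == NULL)
            return NULL;
        desc->irq = irq;
        sketch_descs[irq] = desc;
    }
    return sketch_descs[irq];
}

/* Count how many descriptors have been allocated so far. */
static size_t sketch_irq_desc_count(void)
{
    size_t n = 0;

    for (size_t i = 0; i < SKETCH_NR_IRQS; i++)
        if (sketch_descs[i] != NULL)
            n++;
    return n;
}
```

    Only interrupts that are requested ever consume memory, and asking for the
    same interrupt twice returns the same descriptor.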
     

23 Feb, 2009

1 commit

  • Impact: remove unused/broken code

    The Voyager subarch last built successfully on the v2.6.26 kernel
    and has been stale since then and does not build on the v2.6.27,
    v2.6.28 and v2.6.29-rc5 kernels.

    No actual users beyond the maintainer reported this breakage.
    Patches were sent and most of the fixes were accepted, but the
    discussion about how to resolve the few remaining issues cleanly
    fizzled out with no resolution, and the code remained broken.

    In the v2.6.30 x86 tree development cycle 32-bit subarch support
    has been reworked and removed - and the Voyager code, beyond the
    build problems already known, needs serious and significant
    changes and probably a rewrite to support it.

    CONFIG_X86_VOYAGER has been marked BROKEN then. The maintainer has
    been notified but no patches have been sent so far to fix it.

    While all other subarchs have been converted to the new scheme,
    voyager is still broken. We'd prefer to receive patches which
    clean up the current situation in a constructive way, but even in
    case of removal there is no obstacle to add that support back
    after the issues have been sorted out in a mutually acceptable
    fashion.

    So remove this inactive code for now.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

30 Jan, 2009

2 commits


07 Jan, 2009

1 commit


03 Jan, 2009

1 commit

  • …/git/tip/linux-2.6-tip

    * 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits)
    x86: export vector_used_by_percpu_irq
    x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and()
    sched: nominate preferred wakeup cpu, fix
    x86: fix lguest used_vectors breakage, -v2
    x86: fix warning in arch/x86/kernel/io_apic.c
    sched: fix warning in kernel/sched.c
    sched: move test_sd_parent() to an SMP section of sched.h
    sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0
    sched: activate active load balancing in new idle cpus
    sched: bias task wakeups to preferred semi-idle packages
    sched: nominate preferred wakeup cpu
    sched: favour lower logical cpu number for sched_mc balance
    sched: framework for sched_mc/smt_power_savings=N
    sched: convert BALANCE_FOR_xx_POWER to inline functions
    x86: use possible_cpus=NUM to extend the possible cpus allowed
    x86: fix cpu_mask_to_apicid_and to include cpu_online_mask
    x86: update io_apic.c to the new cpumask code
    x86: Introduce topology_core_cpumask()/topology_thread_cpumask()
    x86: xen: use smp_call_function_many()
    x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c
    ...

    Fixed up trivial conflict in kernel/time/tick-sched.c manually

    Linus Torvalds
     

30 Dec, 2008

4 commits


24 Dec, 2008

1 commit

  • Impact: fix lguest, clean up

    32-bit lguest used used_vectors to record vectors, but that model of
    allocating vectors changed and got broken, after we changed vector
    allocation to a per_cpu array.

    Try to enable that for 64-bit; the array is used for all vectors that
    are not managed by the vector_irq per_cpu array.

    Also kill system_vectors[], that is now a duplication of the
    used_vectors bitmap.

    [ merged in cpus4096 due to io_apic.c cpumask changes. ]
    [ -v2, fix build failure ]

    Signed-off-by: Yinghai Lu
    Signed-off-by: Ingo Molnar
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     

25 Aug, 2008

1 commit


12 Aug, 2008

1 commit

  • Using a simple page table thrashing program I measure a slight
    improvement. The program creates five processes. Each touches 1000
    pages then schedules the next process. We repeat this 1000 times. As
    lguest only caches 4 cr3 values, this rebuilds a lot of shadow page
    tables requiring virt->phys mappings.

    Before: 5.93 seconds
    After: 5.40 seconds

    (Counts of slow vs fastpath in this usage are 6092 and 2852462 respectively.)

    And more importantly for lguest, the code is simpler.

    Signed-off-by: Rusty Russell

    Rusty Russell
     

29 Jul, 2008

3 commits


25 Jul, 2008

2 commits


16 Jul, 2008

1 commit

  • Conflicts:

    arch/powerpc/Kconfig
    arch/s390/kernel/time.c
    arch/x86/kernel/apic_32.c
    arch/x86/kernel/cpu/perfctr-watchdog.c
    arch/x86/kernel/i8259_64.c
    arch/x86/kernel/ldt.c
    arch/x86/kernel/nmi_64.c
    arch/x86/kernel/smpboot.c
    arch/x86/xen/smp.c
    include/asm-x86/hw_irq_32.h
    include/asm-x86/hw_irq_64.h
    include/asm-x86/mach-default/irq_vectors.h
    include/asm-x86/mach-voyager/irq_vectors.h
    include/asm-x86/smp.h
    kernel/Makefile

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

11 Jul, 2008

1 commit


26 Jun, 2008

1 commit


25 Jun, 2008

1 commit


20 Jun, 2008

1 commit

  • I am able to reproduce the oops reported by Simon in __switch_to() with
    lguest.

    My debug showed that there is at least one lguest specific
    issue (which should be present in 2.6.25 and before as well) and it got
    exposed with a kernel oops with the recent fpu dynamic allocation patches.

    In addition to the previous possible scenario (with fpu_counter), in the
    presence of lguest, it is possible that the cpu's TS bit is still set and the
    lguest launcher task's thread_info has TS_USEDFPU still set.

    This is because of the way the lguest launcher handles the guest's TS bit.
    (look at lguest_set_ts() in lguest_arch_run_guest()). This can result
    in a DNA fault while doing unlazy_fpu() in __switch_to(). This will
    end up causing a DNA fault in the context of the new process that's
    getting context switched in (as opposed to handling the DNA fault in the context
    of the lguest launcher/helper process).

    This is wrong in both pre and post 2.6.25 kernels. In the recent
    2.6.26-rc series, this is showing up as NULL pointer dereferences or
    sleeping function called from atomic context (__switch_to()), as
    we free and dynamically allocate the FPU context for the newly
    created threads. Older kernels might show some FPU corruption for processes
    running inside of lguest.

    With the appended patch, my test system has been running for more than 50 mins
    now. So at least some of your oopses (hopefully all!) should get fixed.
    Please give it a try. I will spend more time with this fix tomorrow.

    Reported-by: Simon Holm Thøgersen
    Reported-by: Patrick McHardy
    Signed-off-by: Suresh Siddha
    Signed-off-by: Ingo Molnar

    Suresh Siddha
     

16 Jun, 2008

1 commit


30 May, 2008

2 commits

  • Anthony Liguori points out that three different transports use the virtio code,
    but each one keeps its own counter to set the virtio_device's index field. In
    theory (though not in current practice) this means that names could be
    duplicated, and that risk grows as more transports are created.

    So we move the selection of the unique virtio_device.index into the common code
    in virtio.c, which has the side-benefit of removing duplicate code.

    The only complexity is that lguest and S/390 use the index to uniquely identify
    the device in case of catastrophic failure before register_virtio_device() is
    called: now we use the offset within the descriptor page as a unique identifier
    for the printks.

    Signed-off-by: Rusty Russell
    Cc: Christian Borntraeger
    Cc: Martin Schwidefsky
    Cc: Carsten Otte
    Cc: Heiko Carstens
    Cc: Chris Lalancette
    Cc: Anthony Liguori

    Rusty Russell
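
    A minimal sketch of a single shared index counter, assuming one common
    registration function that every transport calls. The names
    sketch_virtio_device, sketch_next_index and sketch_register_device() are
    hypothetical and the sketch ignores locking; it is not the code in
    virtio.c.

```c
/* Hypothetical stand-in for struct virtio_device: only the index field. */
struct sketch_virtio_device {
    unsigned int index;
};

/*
 * One counter shared by every transport. With a counter per transport,
 * two devices on different transports could end up with the same index.
 */
static unsigned int sketch_next_index;

/*
 * Common registration path: the index is chosen here, so transports no
 * longer pick it themselves. Not thread-safe; a real implementation
 * would need to serialise access to the counter.
 */
static int sketch_register_device(struct sketch_virtio_device *dev)
{
    dev->index = sketch_next_index++;
    return 0;
}
```

    Devices registered through this one function get distinct indices
    regardless of which transport created them.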
     
  • Thanks to Jon Corbet & LWN. Only took me a day to join the dots.

    Host->Guest netcat before (with unnecessarily large receive buffers):
    1073741824 bytes (1.1 GB) copied, 24.7528 seconds, 43.4 MB/s

    After:
    1073741824 bytes (1.1 GB) copied, 17.6369 seconds, 60.9 MB/s

    Signed-off-by: Rusty Russell

    Rusty Russell
     

27 May, 2008

1 commit

  • Add pte_flags() to extract the flags from a pte. This is a special
    case of pte_val() which is only guaranteed to return the pte's flags
    correctly; the page number may be corrupted or missing.

    The intent is to allow paravirt implementations to return pte flags
    without having to do any translation of the page number (most notably,
    Xen).

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Thomas Gleixner

    Jeremy Fitzhardinge
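
    The distinction can be illustrated with a small userspace sketch. The
    layout here is an assumption made for the example (64-bit entries, 4 KiB
    pages, frame number in bits 12..51), and sketch_pte_flags() and
    sketch_translate_frame() are hypothetical helpers, not the kernel's
    pte_flags() or any Xen routine.

```c
#include <stdint.h>

typedef uint64_t sketch_pteval_t;

/* Assumed layout: frame number in bits 12..51, flags everywhere else. */
#define SKETCH_PAGE_SHIFT  12
#define SKETCH_FRAME_BITS  40
#define SKETCH_FRAME_MAX   ((UINT64_C(1) << SKETCH_FRAME_BITS) - 1)
#define SKETCH_PFN_MASK    (SKETCH_FRAME_MAX << SKETCH_PAGE_SHIFT)
#define SKETCH_FLAGS_MASK  (~SKETCH_PFN_MASK)

/* Extract only the flag bits; the frame number is simply discarded. */
static sketch_pteval_t sketch_pte_flags(sketch_pteval_t pte)
{
    return pte & SKETCH_FLAGS_MASK;
}

/*
 * Hypothetical frame translation, standing in for whatever work a
 * paravirt backend would do to convert the page number. It rewrites the
 * frame bits and leaves the flag bits alone.
 */
static sketch_pteval_t sketch_translate_frame(sketch_pteval_t pte,
                                              uint64_t delta)
{
    uint64_t frame = (pte & SKETCH_PFN_MASK) >> SKETCH_PAGE_SHIFT;

    frame = (frame + delta) & SKETCH_FRAME_MAX;
    return (pte & SKETCH_FLAGS_MASK) | (frame << SKETCH_PAGE_SHIFT);
}
```

    Because the flags come out the same whether or not the frame has been
    translated, a caller that only wants flags can skip the translation.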
     

02 May, 2008

4 commits

  • This brings us closer to Real Life, where we'd examine the device
    features once it's set the DRIVER_OK status bit.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • If lg isn't NULL, and cpu_id is sane, &lg->cpus[cpu_id] can't be NULL.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • NR_CPUS (being a host number) is an arbitrary limit for the Guest.
    Using the array size directly (which currently happens to be NR_CPUS)
    is more futureproof.

    Signed-off-by: Rusty Russell

    Rusty Russell
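
    A minimal sketch of bounding a cpu id by the array's own size rather than
    by a separate constant. SKETCH_ARRAY_SIZE, SKETCH_MAX_GUEST_CPUS,
    sketch_lg_cpu, sketch_lguest and sketch_get_cpu() are hypothetical names
    for illustration, not the lguest definitions.

```c
#include <stddef.h>

/* Number of elements in a true array (not a pointer). */
#define SKETCH_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical guest limit; the check below never names it. */
#define SKETCH_MAX_GUEST_CPUS 4

struct sketch_lg_cpu {
    unsigned int id;
};

struct sketch_lguest {
    struct sketch_lg_cpu cpus[SKETCH_MAX_GUEST_CPUS];
};

/*
 * Look up a guest cpu. The bound comes from the array itself, so it
 * stays correct if the array is later resized independently of any
 * host-side constant.
 */
static struct sketch_lg_cpu *sketch_get_cpu(struct sketch_lguest *lg,
                                            size_t cpu_id)
{
    if (lg == NULL || cpu_id >= SKETCH_ARRAY_SIZE(lg->cpus))
        return NULL;

    /* lg is valid and cpu_id is in range, so this cannot be NULL. */
    return &lg->cpus[cpu_id];
}
```

    Once the guest pointer and the id have been checked, the address of the
    array element needs no further NULL test.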
     
  • A recent proposed feature addition to the virtio block driver revealed
    some flaws in the API: in particular, we assume that feature
    negotiation is complete once a driver's probe function returns.

    There is nothing in the API to require this, however, and even I
    didn't notice when it was violated.

    So instead, we require the driver to specify what features it supports
    in a table; we can then move the feature negotiation into the virtio
    core. The intersection of device and driver features is presented in
    a new 'features' bitmap in the struct virtio_device.

    Note that this highlights the difference between Linux unsigned-long
    bitmaps where each unsigned long is in native endian, and a
    straight-forward little-endian array of bytes.

    Drivers can still remove feature bits in their probe routine if they
    really have to.

    API changes:
    - dev->config->feature() no longer gets and acks a feature.
    - drivers should advertise their features in the 'feature_table' field
    - use virtio_has_feature() for extra sanity when checking feature bits

    Signed-off-by: Rusty Russell

    Rusty Russell
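
    The table-driven negotiation can be sketched in userspace C as follows.
    This is an illustration under assumptions, not the virtio core: the
    sketch_* types and functions are hypothetical, and a single 32-bit word
    stands in for the 'features' bitmap.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver: lists the feature bit numbers it understands. */
struct sketch_driver {
    const unsigned int *feature_table;
    size_t feature_table_size;
};

/* Hypothetical device: what it offers and what was negotiated. */
struct sketch_device {
    uint32_t offered;   /* bits the device advertises */
    uint32_t features;  /* intersection, filled in by the core */
};

/*
 * Core-side negotiation: build a mask from the driver's table and keep
 * only the bits the device also offers. Runs before the driver's probe
 * routine would be called.
 */
static void sketch_negotiate(struct sketch_device *dev,
                             const struct sketch_driver *drv)
{
    uint32_t driver_bits = 0;

    for (size_t i = 0; i < drv->feature_table_size; i++) {
        unsigned int bit = drv->feature_table[i];

        if (bit < 32)
            driver_bits |= UINT32_C(1) << bit;
    }
    dev->features = dev->offered & driver_bits;
}

/* Check a negotiated feature bit; out-of-range bits are never set. */
static bool sketch_has_feature(const struct sketch_device *dev,
                               unsigned int bit)
{
    return bit < 32 && (dev->features & (UINT32_C(1) << bit)) != 0;
}
```

    A driver listing bits 0, 2 and 5 against a device offering bits 2, 3 and 5
    would end up with bits 2 and 5; a probe routine could still clear bits
    from the result afterwards.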
     

19 Apr, 2008

1 commit


31 Mar, 2008

1 commit


28 Mar, 2008

1 commit