08 May, 2007

2 commits

  • If you actually clear the bit, you need to:

    + pte_update_defer(vma->vm_mm, addr, ptep);

    The reason is, when updating PTEs, the hypervisor must be notified. Using
    atomic operations to do this is fine for all hypervisors I am aware of.
    However, for hypervisors which shadow page tables, if these PTE
    modifications are not trapped, you need a post-modification call to fulfill
    the update of the shadow page table.

    Acked-by: Zachary Amsden
    Cc: Hugh Dickins
    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
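
The rule in this commit (clear the bit, then make the post-modification call) can be sketched in portable C11. This is a userspace illustration under stated assumptions, not the kernel code: the PTE is modelled as a bare 32-bit word, and `shadow_update_hook` with its counter is a hypothetical stand-in for `pte_update_defer`.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 5 is the x86 "accessed" (young) flag in a page table entry. */
#define PTE_ACCESSED (UINT32_C(1) << 5)

/* Hypothetical stand-in for pte_update_defer(): counts notifications. */
static unsigned shadow_update_calls;

static void shadow_update_hook(_Atomic uint32_t *pte)
{
    (void)pte;
    shadow_update_calls++;
}

/* Atomically clear the accessed bit; notify only if it was actually set. */
static bool test_and_clear_young(_Atomic uint32_t *pte)
{
    uint32_t old = atomic_fetch_and(pte, ~PTE_ACCESSED);
    bool was_set = (old & PTE_ACCESSED) != 0;

    if (was_set)
        shadow_update_hook(pte);
    return was_set;
}
```

The notification is skipped when the bit was already clear, since nothing changed that a shadow page table would need to learn about.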
     
  • Add ptep_test_and_clear_{dirty,young} to i386. They advertise that they
    have it and there is at least one place where it needs to be called without
    the page table lock: to clear the accessed bit on write to
    /proc/pid/clear_refs.

    ptep_clear_flush_{dirty,young} are updated to use the new functions. The
    overall net effect to current users of ptep_clear_flush_{dirty,young} is
    that we introduce an additional branch.

    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Signed-off-by: David Rientjes
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

06 May, 2007

2 commits

  • * 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (231 commits)
    [PATCH] i386: Don't delete cpu_devs data to identify different x86 types in late_initcall
    [PATCH] i386: type may be unused
    [PATCH] i386: Some additional chipset register values validation.
    [PATCH] i386: Add missing !X86_PAE dependincy to the 2G/2G split.
    [PATCH] x86-64: Don't exclude asm-offsets.c in Documentation/dontdiff
    [PATCH] i386: avoid redundant preempt_disable in __unlazy_fpu
    [PATCH] i386: white space fixes in i387.h
    [PATCH] i386: Drop noisy e820 debugging printks
    [PATCH] x86-64: Fix allnoconfig error in genapic_flat.c
    [PATCH] x86-64: Shut up warnings for vfat compat ioctls on other file systems
    [PATCH] x86-64: Share identical video.S between i386 and x86-64
    [PATCH] x86-64: Remove CONFIG_REORDER
    [PATCH] x86-64: Print type and size correctly for unknown compat ioctls
    [PATCH] i386: Remove copy_*_user BUG_ONs for (size < 0)
    [PATCH] i386: Little cleanups in smpboot.c
    [PATCH] x86-64: Don't enable NUMA for a single node in K8 NUMA scanning
    [PATCH] x86: Use RDTSCP for synchronous get_cycles if possible
    [PATCH] i386: Add X86_FEATURE_RDTSCP
    [PATCH] i386: Implement X86_FEATURE_SYNC_RDTSC on i386
    [PATCH] i386: Implement alternative_io for i386
    ...

    Fix up trivial conflict in include/linux/highmem.h manually.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • * master.kernel.org:/pub/scm/linux/kernel/git/jejb/voyager-2.6:
    [VOYAGER] add smp alternatives
    [VOYAGER] Use modern techniques to setup and teardown low identiy mappings.
    [VOYAGER] Convert the monitor thread to use the kthread API
    [VOYAGER] clockevents driver: bring voyager in to line
    [VOYAGER] clockevents: correct boot cpu is zero assumption
    [VOYAGER] add smp_call_function_single

    Linus Torvalds
     

05 May, 2007

2 commits

  • * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (59 commits)
    PCI: Free resource files in error path of pci_create_sysfs_dev_files()
    pci-quirks: disable MSI on RS400-200 and RS480
    PCI hotplug: Use menuconfig objects
    PCI: ZT5550 CPCI Hotplug driver fix
    PCI: rpaphp: Remove semaphores
    PCI: rpaphp: Ensure more pcibios_add/pcibios_remove symmetry
    PCI: rpaphp: Use pcibios_remove_pci_devices() symmetrically
    PCI: rpaphp: Document is_php_dn()
    PCI: rpaphp: Document find_php_slot()
    PCI: rpaphp: Rename rpaphp_register_pci_slot() to rpaphp_enable_slot()
    PCI: rpaphp: refactor tail call to rpaphp_register_slot()
    PCI: rpaphp: remove rpaphp_set_attention_status()
    PCI: rpaphp: remove print_slot_pci_funcs()
    PCI: rpaphp: Remove setup_pci_slot()
    PCI: rpaphp: remove a call that does nothing but a pointer lookup
    PCI: rpaphp: Remove another wrappered function
    PCI: rpaphp: Remve another call that is a wrapper
    PCI: rpaphp: remove a function that does nothing but wrap debug printks
    PCI: rpaphp: Remove un-needed goto
    PCI: rpaphp: Fix a memleak; slot->location string was never freed
    ...

    Linus Torvalds
     
  • * master.kernel.org:/pub/scm/linux/kernel/git/davej/agpgart:
    [AGPGART] sworks-agp: Switch to PCI ref counting APIs
    [AGPGART] Nvidia AGP: Use refcount aware PCI interfaces
    [AGPGART] Fix sparse warning in sgi-agp.c
    [AGPGART] Intel-agp adjustments
    [AGPGART] Move [un]map_page_into_agp into asm/agp.h
    [AGPGART] Add missing calls to global_flush_tlb() to ali-agp
    [AGPGART] prevent probe collision of sis-agp and amd64_agp

    Linus Torvalds
     

03 May, 2007

34 commits

  • Most architectures' scatterlist.h use the type dma_addr_t, but omit to
    include the header which defines it. This could lead to build failures,
    so let's add the missing includes.

    Signed-off-by: Jean Delvare
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    Jean Delvare
     
  • There are two callers of __unlazy_fpu, unlazy_fpu and __switch_to, and
    neither of them appears to require additional preempt_disable/enable here.
    Let's open-code save_init_fpu in __unlazy_fpu to save a few ops.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Andi Kleen

    Jan Kiszka
     
  • Signed-off-by: Jan Kiszka
    Signed-off-by: Andi Kleen

    Jan Kiszka
     
  • RDTSCP is already synchronous and doesn't need an explicit CPUID.
    This is a little faster and more importantly avoids VMEXITs on Hypervisors.

    Original patch from Joerg Roedel, but reworked by AK
    Also includes miscompilation fix by Eric Biederman

    Cc: "Joerg Roedel"

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • Following x86-64
    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • Syncs up with x86-64.

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • Ported from x86-64.

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • Redefine cpu_has() to evaluate cpu features already checked in early
    boot at compile time. This way the compiler might eliminate some dead code.
    Signed-off-by: Andi Kleen

    Andi Kleen
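
One way to picture the compile-time evaluation described in this commit is a macro that answers "yes" without looking at runtime data whenever the requested bit is in a mask of features already verified at boot. The sketch below assumes a single 32-bit capability word; the names `FEAT_*`, `REQUIRED_MASK` and `cpu_has_sketch` are hypothetical and do not reproduce the kernel's definitions.

```c
#include <stdint.h>

/* Hypothetical feature bit numbers, for illustration only. */
enum { FEAT_FPU = 0, FEAT_CX8 = 8, FEAT_OPTIONAL = 23 };

/* Features assumed to have been verified during early boot. */
#define REQUIRED_MASK ((UINT32_C(1) << FEAT_FPU) | (UINT32_C(1) << FEAT_CX8))

/*
 * With a constant bit number the first operand is a constant expression,
 * so for required features the whole test folds to 1 at compile time.
 */
#define cpu_has_sketch(caps, bit) \
    ((REQUIRED_MASK & (UINT32_C(1) << (bit))) ? 1 : (int)(((caps) >> (bit)) & 1u))

static int uses_fast_path(uint32_t caps)
{
    if (cpu_has_sketch(caps, FEAT_CX8))
        return 1;
    return 0; /* unreachable for a required feature; the compiler may drop it */
}
```

For a feature outside the mask the macro still falls back to the runtime bit test, so only checks on required features become dead-code candidates.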
     
  • Check some CPUID bits that are needed for compiler generated code early in boot.
    When the system is still in real mode before changing the VESA BIOS mode
    it is possible to still display a visible error message on the screen.

    Similar to x86-64.

    Includes cleanups from Eric Biederman

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • - Introduce a wd_ops structure
    - Convert the various nmi watchdogs over to it
    - This allows splitting the perfctr reservation from the watchdog
      setup cleanly.
    - Do perfctr reservation globally as it should have always been
    - Remove dead code referenced only by unused EXPORT_SYMBOLs

    Signed-off-by: Andi Kleen

    Andi Kleen
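
The separation this commit describes, one global reservation followed by per-CPU setup through a table of operations, might look roughly like the sketch below. The structure layout, the `demo_*` backend and `watchdog_start` are hypothetical and chosen only to illustrate the idea; they are not the kernel's wd_ops definition.

```c
/* Hypothetical operations table; field names are illustrative. */
struct wd_ops_sketch {
    int  (*reserve)(void);       /* claim the counter once, globally */
    void (*unreserve)(void);     /* give the counter back */
    int  (*setup)(unsigned cpu); /* program the watchdog on one CPU */
};

static int reserved;
static unsigned cpus_armed;

static int demo_reserve(void)
{
    if (reserved)
        return 0; /* someone else already owns the counter */
    reserved = 1;
    return 1;
}

static void demo_unreserve(void)
{
    reserved = 0;
}

static int demo_setup(unsigned cpu)
{
    (void)cpu;
    if (!reserved)
        return 0;
    cpus_armed++;
    return 1;
}

static const struct wd_ops_sketch demo_wd_ops = {
    demo_reserve, demo_unreserve, demo_setup
};

/* Reserve once, then set up every CPU through the same table. */
static int watchdog_start(const struct wd_ops_sketch *ops, unsigned ncpus)
{
    if (!ops->reserve())
        return 0;
    for (unsigned cpu = 0; cpu < ncpus; cpu++) {
        if (!ops->setup(cpu)) {
            ops->unreserve();
            return 0;
        }
    }
    return 1;
}
```

Because the caller only sees the table, a different watchdog backend can be swapped in without touching the reservation logic.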
     
  • Add comment and condense code to make use of native_local_ptep_get_and_clear
    function. Also, it turns out the 2-level and 3-level paging definitions were
    identical, so move the common definition into pgtable.h

    Signed-off-by: Zachary Amsden
    Signed-off-by: Andrew Morton
    Signed-off-by: Andi Kleen

    Zachary Amsden
     
  • In situations where page table updates need only be made locally, and there is
    no cross-processor A/D bit races involved, we need not use the heavyweight
    xchg instruction to atomically fetch and clear page table entries. Instead,
    we can just read and clear them directly.

    This introduces a neat optimization for non-SMP kernels; drop the atomic xchg
    operations from page table updates.

    Thanks to Michel Lespinasse for noting this potential optimization.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Andrew Morton
    Signed-off-by: Andi Kleen

    Zachary Amsden
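
The difference between the two approaches in this commit can be shown with a small C11 sketch. It models a page table entry as a 32-bit word; the function names are hypothetical, and the local variant is only safe under the commit's stated condition that no other processor can race on the entry.

```c
#include <stdatomic.h>
#include <stdint.h>

/* SMP-safe variant: one atomic exchange fetches and clears the entry. */
static uint32_t get_and_clear_atomic(_Atomic uint32_t *pte)
{
    return atomic_exchange(pte, 0);
}

/*
 * Local variant: a plain load followed by a plain store. Only correct
 * when nothing else can modify the entry between the two steps.
 */
static uint32_t get_and_clear_local(uint32_t *pte)
{
    uint32_t old = *pte;

    *pte = 0;
    return old;
}
```

Both return the previous contents; the local version simply avoids the cost of a locked bus operation.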
     
  • When exiting from an address space, no special hypervisor notification of page
    table updates needs to occur; direct page table hypervisors, such as Xen,
    switch to another address space first (init_mm) and unprotect the page tables
    to avoid the cost of trapping to the hypervisor for each pte_clear. Shadow
    mode hypervisors, such as VMI and lhype, don't need to do the extra work of
    calling through paravirt-ops, and can just directly clear the page table
    entries without notifying the hypervisor, since all the page tables are about
    to be freed.

    So introduce native_pte_clear functions which bypass any paravirt-ops
    notification. This results in a significant performance win for VMI and
    removes some indirect calls from zap_pte_range.

    Note the 3-level paging already had a native_pte_clear function, thus
    demanding argument conformance and extra args for the 2-level definition.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Andrew Morton
    Signed-off-by: Andi Kleen

    Zachary Amsden
     
  • apic_wait_icr_idle looks like this:

    static __inline__ void apic_wait_icr_idle(void)
    {
            while (apic_read(APIC_ICR) & APIC_ICR_BUSY)
                    cpu_relax();
    }

    The busy loop in this function would not be problematic if the
    corresponding status bit in the ICR were always updated, but that does
    not seem to be the case under certain crash scenarios. Kdump uses an IPI
    to stop the other CPUs in the event of a crash, but when any of the
    other CPUs are locked-up inside the NMI handler the CPU that sends the
    IPI will end up looping forever in the ICR check, effectively
    hard-locking the whole system.

    Quoting from Intel's "MultiProcessor Specification" (Version 1.4), B-3:

    "A local APIC unit indicates successful dispatch of an IPI by
    resetting the Delivery Status bit in the Interrupt Command
    Register (ICR). The operating system polls the delivery status
    bit after sending an INIT or STARTUP IPI until the command has
    been dispatched.

    A period of 20 microseconds should be sufficient for IPI dispatch
    to complete under normal operating conditions. If the IPI is not
    successfully dispatched, the operating system can abort the
    command. Alternatively, the operating system can retry the IPI by
    writing the lower 32-bit double word of the ICR. This “time-out”
    mechanism can be implemented through an external interrupt, if
    interrupts are enabled on the processor, or through execution of
    an instruction or time-stamp counter spin loop."

    Intel's documentation suggests the implementation of a time-out
    mechanism, which, by the way, is already being open-coded in some parts
    of the kernel that tinker with ICR.

    Create an apic_wait_icr_idle replacement that implements the time-out
    mechanism and that can be used to solve the aforementioned problem.

    AK: moved both functions out of line
    AK: added improved loop from Keith Owens

    Signed-off-by: Fernando Luis Vazquez Cao
    Signed-off-by: Andi Kleen

    Fernando Luis Vazquez Cao
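
The time-out mechanism described in this commit can be sketched as a bounded polling loop. The version below is a userspace illustration under stated assumptions: the ICR is replaced by a hypothetical simulated register (`struct fake_icr`), the function name and the poll limit are made up for the example, and the delay between polls is omitted.

```c
#include <stdint.h>

#define ICR_BUSY (UINT32_C(1) << 12) /* delivery-status bit of the ICR */

/* Hypothetical simulated register: reports busy for a number of reads. */
struct fake_icr {
    unsigned busy_reads_left;
    unsigned reads;
};

static uint32_t fake_icr_read(struct fake_icr *icr)
{
    icr->reads++;
    if (icr->busy_reads_left > 0) {
        icr->busy_reads_left--;
        return ICR_BUSY;
    }
    return 0;
}

/*
 * Poll until the busy bit clears or max_polls reads have been made.
 * Returns 0 if the register went idle, ICR_BUSY on time-out.
 */
static uint32_t wait_icr_idle_bounded(struct fake_icr *icr, unsigned max_polls)
{
    uint32_t busy;
    unsigned polls = 0;

    do {
        busy = fake_icr_read(icr) & ICR_BUSY;
        if (!busy)
            break;
        /* A real implementation would delay here between polls. */
    } while (++polls < max_polls);

    return busy;
}
```

Returning the final status lets the caller decide whether to abort or retry the IPI, which is the choice the quoted specification leaves to the operating system.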
     
  • If our copy of the MTRRs of the BSP has RdMem or WrMem set, and
    we are running on an AMD64/K8 system, the boot CPU must have had
    MtrrFixDramEn and MtrrFixDramModEn set (otherwise our RDMSR would
    have copied these bits cleared), so we set them on this CPU as well.

    This allows us to keep the AMD64/K8 RdMem and WrMem bits in sync
    across the CPUs of SMP systems in order to fulfill the duty of
    system software to "initialize and maintain MTRR consistency
    across all processors." as written in the AMD and Intel manuals.

    If a WRMSR instruction fails because MtrrFixDramModEn is not
    set, I expect that also the Intel-style MTRR bits are not updated.

    AK: minor cleanup, moved MSR defines around

    Signed-off-by: Bernhard Kaindl
    Signed-off-by: Andi Kleen
    Cc: Andrew Morton
    Cc: Andi Kleen
    Cc: Dave Jones

    Bernhard Kaindl
     
  • Applied fix by Andrew Morton:
    http://lkml.org/lkml/2007/4/8/88 - Fix `make headers_check'.

    AMD and Intel x86 CPU manuals state that it is the responsibility of
    system software to initialize and maintain MTRR consistency across
    all processors in Multi-Processing Environments.

    Quote from page 188 of the AMD64 System Programming manual (Volume 2):

    7.6.5 MTRRs in Multi-Processing Environments

    "In multi-processing environments, the MTRRs located in all processors must
    characterize memory in the same way. Generally, this means that identical
    values are written to the MTRRs used by the processors." (short omission here)
    "Failure to do so may result in coherency violations or loss of atomicity.
    Processor implementations do not check the MTRR settings in other processors
    to ensure consistency. It is the responsibility of system software to
    initialize and maintain MTRR consistency across all processors."

    Current Linux MTRR code already implements the above in the case that the
    BIOS does not properly initialize MTRRs on the secondary processors,
    but the case where the fixed-range MTRRs of the boot processor are changed
    after Linux started to boot, before the initialisation of a secondary
    processor, is not handled yet.

    In this case, secondary processors are currently initialized by Linux
    with MTRRs which the boot processor had very early, when mtrr_bp_init()
    did run, but not with the MTRRs which the boot processor uses at the
    time when that secondary processor is actually booted,
    causing differing MTRR contents on the secondary processors.

    Such a situation happens on Acer Ferrari 1000 and 5000 notebooks where the
    BIOS enables and sets AMD-specific IORR bits in the fixed-range MTRRs
    of the boot processor when it transitions the system into ACPI mode.
    The SMI handler of the BIOS does this in SMM, entered while Linux ACPI
    code runs acpi_enable().

    Other occasions where the SMI handler of the BIOS may change bits in
    the MTRRs could occur as well. To initialize newly booted secondary
    processors with the fixed-range MTRRs which the boot processor uses
    at that time, this patch saves the fixed-range MTRRs of the boot
    processor before new secondary processors are started. When the
    secondary processors run their Linux initialisation code, their
    fixed-range MTRRs will be updated with the saved fixed-range MTRRs.

    If CONFIG_MTRR is not set, we define mtrr_save_state
    as an empty statement because there is nothing to do.

    Possible TODOs:

    *) CPU-hotplugging outside of SMP suspend/resume is not yet tested
    with this patch.

    *) If, even in this case, an AP never runs i386/do_boot_cpu or x86_64/cpu_up,
    then the calls to mtrr_save_state() could be replaced by calls to
    mtrr_save_fixed_ranges(NULL) and mtrr_save_state() would not be
    needed.

    That would need either verification of the CPU-hotplug code or
    at least a test on a >2 CPU machine.

    *) The MTRRs of other running processors are not yet checked at this
    time but it might be interesting to synchronize the MTRRs of all
    processors before booting. That would be an incremental patch,
    but of rather low priority since there is no machine known so
    far which would require this.

    AK: moved prototypes on x86-64 around to fix warnings

    Signed-off-by: Bernhard Kaindl
    Signed-off-by: Andrew Morton
    Signed-off-by: Andi Kleen
    Cc: Andi Kleen
    Cc: Dave Jones

    Bernhard Kaindl
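
The save-then-apply sequence this commit describes can be modelled in a few lines. The sketch assumes each CPU's fixed-range MTRRs are a plain array rather than MSRs, and the names `cpu_mtrrs`, `save_fixed_ranges` and `apply_saved_ranges` are hypothetical; it shows only the ordering, not the kernel's implementation.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* x86 has 11 fixed-range MTRR MSRs: 1 x 64K, 2 x 16K and 8 x 4K. */
#define NUM_FIXED_MSRS 11

/* Hypothetical model of one CPU's fixed-range MTRR registers. */
struct cpu_mtrrs {
    uint64_t fixed[NUM_FIXED_MSRS];
};

/* Backup of the values every CPU is expected to use. */
static struct cpu_mtrrs saved_state;

/* Snapshot the boot CPU's registers just before starting a secondary. */
static void save_fixed_ranges(const struct cpu_mtrrs *boot_cpu)
{
    memcpy(&saved_state, boot_cpu, sizeof(saved_state));
}

/* Run on the secondary CPU; returns true if any register was changed. */
static bool apply_saved_ranges(struct cpu_mtrrs *cpu)
{
    bool changed = false;

    for (int i = 0; i < NUM_FIXED_MSRS; i++) {
        if (cpu->fixed[i] != saved_state.fixed[i]) {
            cpu->fixed[i] = saved_state.fixed[i];
            changed = true;
        }
    }
    return changed;
}
```

Taking the snapshot late, rather than once at early boot, is what picks up changes firmware made to the boot CPU in the meantime.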
     
  • In the current implementation, which is used in other patches,
    mtrr_save_fixed_ranges() accepts a dummy void pointer because
    one of these patches may call this function from
    smp_call_function_single(), which requires that the function
    it calls take a void pointer argument.

    This function calls get_fixed_ranges(), passing mtrr_state.fixed_ranges
    which is the element of the static struct which stores our current
    backup of the fixed-range MTRR values which all CPUs shall be
    using.

    Because mtrr_save_fixed_ranges calls get_fixed_ranges after
    kernel initialisation time, __init needs to be removed from
    the declaration of get_fixed_ranges().

    If CONFIG_MTRR is not set, we define mtrr_save_fixed_ranges
    as an empty statement because there is nothing to do.

    AK: Moved prototypes for x86-64 around to fix warnings

    Signed-off-by: Bernhard Kaindl
    Signed-off-by: Andi Kleen
    Cc: Andrew Morton
    Cc: Andi Kleen
    Cc: Dave Jones

    Bernhard Kaindl
     
  • The other symbols used to delineate the alt-instructions sections have the
    form __foo/__foo_end. Rename parainstructions to match.

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Andi Kleen
    Cc: Andi Kleen
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton

    Jeremy Fitzhardinge
     
  • Convert VMI timer to use clock events, making it properly able to use the NO_HZ
    infrastructure. On UP systems, with no local APIC, we just continue to route
    these events through the PIT. On systems with a local APIC, or SMP, we provide
    a single source interrupt chip which creates the local timer IRQ. It actually
    gets delivered by the APIC hardware, but we don't want to use the same local
    APIC clocksource processing, so we create our own handler here.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Andi Kleen
    CC: Dan Hecht
    CC: Ingo Molnar
    CC: Thomas Gleixner

    Zachary Amsden
     
  • Remove spurious comments, headers and keywords from x86-64 bugs.[ch].

    Use identify_boot_cpu()

    AK: merged with other patch

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Andi Kleen

    Jeremy Fitzhardinge
     
  • Fixes two problems with the GDT when compiling for uniprocessor:
    - There's no percpu segment, so trying to load its selector into %fs fails.
      Use a null selector instead.
    - The real gdt needs to be loaded at some point. Do it in cpu_init().

    Signed-off-by: Chris Wright
    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Andi Kleen
    Cc: Rusty Russell

    Jeremy Fitzhardinge
     
  • Define per_cpu_offset in asm-i386/percpu.h when SMP defined, like
    asm-generic/percpu.h does for UP.

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Andi Kleen
    Cc: Rusty Russell
    Cc: Andi Kleen

    Jeremy Fitzhardinge
     
  • This patch does a few small cleanups:
    - use PER_CPU_NAME to generate the names of per-cpu variables
    - use lea to add the per_cpu offset in PER_CPU(), because it doesn't
      affect condition flags
    - add PER_CPU_VAR which allows direct access to per-cpu variables
      with the %fs: prefix on SMP.

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Andi Kleen
    Cc: Rusty Russell
    Cc: Andi Kleen

    Jeremy Fitzhardinge
     
  • Currently x86 (similar to x86-64) has a special per-cpu structure
    called "i386_pda" which can be easily and efficiently referenced via
    the %fs register. An ELF section is more flexible than a structure,
    allowing any piece of code to use this area. Indeed, such a section
    already exists: the per-cpu area.

    So this patch:
    (1) Removes the PDA and uses per-cpu variables for each current member.
    (2) Replaces the __KERNEL_PDA segment with __KERNEL_PERCPU.
    (3) Creates a per-cpu mirror of __per_cpu_offset called this_cpu_off, which
    can be used to calculate addresses for this CPU's variables.
    (4) Simplifies startup, because %fs doesn't need to be loaded with a
    special segment at early boot; it can be deferred until the first
    percpu area is allocated (or never for UP).

    The result is less code and one less x86-specific concept.

    Signed-off-by: Rusty Russell
    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Andi Kleen
    Cc: Andi Kleen

    Jeremy Fitzhardinge
     
  • Xen wants a dedicated page for the GDT. I believe VMI likes it too.
    lguest, KVM and native don't care.

    Simple transformation to page-aligned "struct gdt_page".

    Signed-off-by: Rusty Russell
    Signed-off-by: Andi Kleen
    Acked-by: Jeremy Fitzhardinge

    Jeremy Fitzhardinge
     
  • In shadow mode hypervisors, ptep_get_and_clear achieves the desired
    purpose of keeping the shadows in sync by issuing a native_get_and_clear,
    followed by a call to pte_update, which indicates the PTE has been
    modified.

    Direct mode hypervisors (Xen) have no need for this anyway, and will trap
    the update using writable pagetables.

    This means no hypervisor makes use of ptep_get_and_clear; there is no
    reason to have it in the paravirt-ops structure. Change confusing
    terminology about raw vs. native functions into consistent use of
    native_pte_xxx for operations which do not invoke paravirt-ops.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Andi Kleen

    Jeremy Fitzhardinge
     
  • Replace all the open-coded macros for generating calls with a pair of
    more general macros (__PVOP_CALL/VCALL), and redefine all the
    PVOP_V?CALL[0-4] in terms of them.

    [ Andrew, Andi: this should slot in immediately after "Document asm-i386/paravirt.h"
    (paravirt_ops-document-asm-i386-paravirth.patch) ]

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Andi Kleen
    Cc: Ingo Molnar

    Jeremy Fitzhardinge
     
  • Remove #defines, add enum for PARAVIRT_LAZY_FLUSH.

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Andi Kleen

    Jeremy Fitzhardinge
     
  • Xen and VMI both have special requirements when mapping a highmem pte
    page into the kernel address space. These can be dealt with by adding
    a new kmap_atomic_pte() function for mapping highptes, and hooking it
    into the paravirt_ops infrastructure.

    Xen specifically wants to map the pte page RO, so this patch exposes a
    helper function, kmap_atomic_prot, which maps the page with the
    specified page protections.

    This also adds a kmap_flush_unused() function to clear out the cached
    kmap mappings. Xen needs this to clear out any potential stray RW
    mappings of pages which will become part of a pagetable.

    [ Zach - vmi.c will need some attention after this patch. It wasn't
    immediately obvious to me what needs to be done. ]

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Andi Kleen
    Cc: Zachary Amsden

    Jeremy Fitzhardinge
     
  • Back out the map_pt_hook to clear the way for kmap_atomic_pte.

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Andi Kleen
    Cc: Zachary Amsden

    Jeremy Fitzhardinge
     
  • This patch adds a pv_op for flush_tlb_others. Linux running on native
    hardware uses cross-CPU IPIs to flush the TLB on any CPU which may
    have a particular mm's pagetable entries cached in its TLB. This is
    inefficient in a paravirtualized environment, since the hypervisor
    knows which real CPUs actually contain cached mappings, which may be a
    small subset of a guest's VCPUs.

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Andi Kleen

    Jeremy Fitzhardinge
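
The benefit of making the remote TLB flush an overridable operation can be illustrated with a function pointer and two backends. Everything in this sketch is hypothetical: the 32-bit CPU mask, the `tlb_ops_sketch` table, and the `running_vcpus` value standing in for what a hypervisor knows about scheduled VCPUs. Each backend returns how many CPUs it would interrupt.

```c
#include <stdint.h>

typedef uint32_t cpumask_sketch; /* one bit per CPU, illustrative */

struct tlb_ops_sketch {
    /* Returns the number of CPUs that would be interrupted. */
    unsigned (*flush_tlb_others)(cpumask_sketch targets);
};

static unsigned count_bits(cpumask_sketch m)
{
    unsigned n = 0;

    for (; m; m &= m - 1)
        n++;
    return n;
}

/* Native behaviour: one IPI per CPU named in the mask. */
static unsigned native_flush_tlb_others(cpumask_sketch targets)
{
    return count_bits(targets);
}

/* Hypothetical: VCPUs 0 and 2 are currently running on real CPUs. */
static cpumask_sketch running_vcpus = 0x5;

/* Paravirtualized behaviour: skip VCPUs that hold no live mappings. */
static unsigned hv_flush_tlb_others(cpumask_sketch targets)
{
    return count_bits(targets & running_vcpus);
}

static const struct tlb_ops_sketch native_ops = { native_flush_tlb_others };
static const struct tlb_ops_sketch hv_ops = { hv_flush_tlb_others };
```

With four target CPUs, the native backend interrupts all four while the hypervisor-aware one touches only the two that are running.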
     
  • Implement the actual patching machinery. paravirt_patch_default()
    contains the logic to automatically patch a callsite based on a few
    simple rules:

    - if the paravirt_op function is paravirt_nop, then patch nops
    - if the paravirt_op function is a jmp target, then jmp to it
    - if the paravirt_op function is callable and doesn't clobber too much
      for the callsite, call it directly

    paravirt_patch_default is suitable as a default implementation of
    paravirt_ops.patch, and will remove most of the expensive indirect calls
    in favour of either a direct call or a pile of nops.

    Backends may implement their own patcher, however. There are several
    helper functions to help with this:

    paravirt_patch_nop      nop out a callsite
    paravirt_patch_ignore   leave the callsite as-is
    paravirt_patch_call     patch a call if the caller and callee
                            have compatible clobbers
    paravirt_patch_jmp      patch in a jmp
    paravirt_patch_insns    patch some literal instructions over
                            the callsite, if they fit

    This patch also implements more direct patches for the native case, so
    that when running on native hardware many common operations are
    implemented inline.

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Andi Kleen
    Cc: Rusty Russell
    Cc: Zachary Amsden
    Cc: Anthony Liguori
    Acked-by: Ingo Molnar

    Jeremy Fitzhardinge
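
The three rules listed in this commit amount to a small decision procedure. The sketch below encodes them as a pure function so the ordering is easy to see; the `patch_site` structure, the action names and the clobber bitmasks are hypothetical, and no instructions are actually rewritten.

```c
#include <stdbool.h>

enum patch_action {
    PATCH_NOPS,           /* replace the callsite with nops */
    PATCH_JMP,            /* replace it with a direct jmp */
    PATCH_CALL,           /* replace it with a direct call */
    PATCH_LEAVE_INDIRECT  /* keep the original indirect call */
};

/* Hypothetical description of one callsite and its target. */
struct patch_site {
    bool target_is_nop;
    bool target_is_jmp;
    unsigned target_clobbers; /* registers the target may overwrite */
    unsigned site_clobbers;   /* registers the callsite tolerates losing */
};

static enum patch_action choose_patch(const struct patch_site *s)
{
    if (s->target_is_nop)
        return PATCH_NOPS;
    if (s->target_is_jmp)
        return PATCH_JMP;
    /* A direct call is safe only if the target clobbers nothing extra. */
    if ((s->target_clobbers & ~s->site_clobbers) == 0)
        return PATCH_CALL;
    return PATCH_LEAVE_INDIRECT;
}
```

Falling back to the unpatched indirect call when the clobbers do not match keeps the default patcher conservative: it never makes a callsite less correct, only faster.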
     
  • Clean things up, and broadly document:
    - the paravirt_ops functions themselves
    - the patching mechanism

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Andi Kleen
    Cc: Rusty Russell

    Jeremy Fitzhardinge
     
  • Wrap a set of interesting paravirt_ops calls in a wrapper which makes
    the callsites available for patching. Unfortunately this is pretty
    ugly because there's no way to get gcc to generate a function call,
    but also wrap just the callsite itself with the necessary labels.

    This patch supports functions with 0-4 arguments, and either void or
    returning a value. 64-bit arguments must be split into a pair of
    32-bit arguments (lower word first). Small structures are returned in
    registers.

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Andi Kleen
    Cc: Rusty Russell
    Cc: Zachary Amsden
    Cc: Anthony Liguori

    Jeremy Fitzhardinge
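
The convention for 64-bit arguments stated in this commit (a pair of 32-bit arguments, lower word first) is shown below in isolation. The functions are hypothetical: `op_store_u64` stands in for an operation that receives the two halves, and `call_op_u64` for the wrapper that performs the split.

```c
#include <stdint.h>

/* Hypothetical op that receives a 64-bit value as two 32-bit words. */
static uint64_t op_store_u64(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Wrapper: split the argument and pass the lower word first. */
static uint64_t call_op_u64(uint64_t value)
{
    uint32_t lo = (uint32_t)value;
    uint32_t hi = (uint32_t)(value >> 32);

    return op_store_u64(lo, hi);
}
```

Keeping every argument register-sized is what lets a single family of wrappers cover all the operations with zero to four arguments.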