06 May, 2013

1 commit

  • Pull kvm updates from Gleb Natapov:
    "Highlights of the updates are:

    general:
    - new emulated device API
    - legacy device assignment is now optional
    - irqfd interface is more generic and can be shared between arches

    x86:
    - VMCS shadow support and other nested VMX improvements
    - APIC virtualization and Posted Interrupt hardware support
    - Optimize mmio spte zapping

    ppc:
    - BookE: in-kernel MPIC emulation with irqfd support
    - Book3S: in-kernel XICS emulation (incomplete)
    - Book3S: HV: migration fixes
    - BookE: more debug support preparation
    - BookE: e6500 support

    ARM:
    - reworking of Hyp idmaps

    s390:
    - ioeventfd for virtio-ccw

    And many other bug fixes, cleanups and improvements"

    * tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
    kvm: Add compat_ioctl for device control API
    KVM: x86: Account for failing enable_irq_window for NMI window request
    KVM: PPC: Book3S: Add API for in-kernel XICS emulation
    kvm/ppc/mpic: fix missing unlock in set_base_addr()
    kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write
    kvm/ppc/mpic: remove users
    kvm/ppc/mpic: fix mmio region lists when multiple guests used
    kvm/ppc/mpic: remove default routes from documentation
    kvm: KVM_CAP_IOMMU only available with device assignment
    ARM: KVM: iterate over all CPUs for CPU compatibility check
    KVM: ARM: Fix spelling in error message
    ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally
    KVM: ARM: Fix API documentation for ONE_REG encoding
    ARM: KVM: promote vfp_host pointer to generic host cpu context
    ARM: KVM: add architecture specific hook for capabilities
    ARM: KVM: perform HYP initilization for hotplugged CPUs
    ARM: KVM: switch to a dual-step HYP init code
    ARM: KVM: rework HYP page table freeing
    ARM: KVM: enforce maximum size for identity mapped code
    ARM: KVM: move to a KVM provided HYP idmap
    ...

    Linus Torvalds
     

03 May, 2013

1 commit

  • * 'kvm-arm-for-3.10' of git://github.com/columbia/linux-kvm-arm:
    ARM: KVM: iterate over all CPUs for CPU compatibility check
    KVM: ARM: Fix spelling in error message
    ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally
    KVM: ARM: Fix API documentation for ONE_REG encoding
    ARM: KVM: promote vfp_host pointer to generic host cpu context
    ARM: KVM: add architecture specific hook for capabilities
    ARM: KVM: perform HYP initilization for hotplugged CPUs
    ARM: KVM: switch to a dual-step HYP init code
    ARM: KVM: rework HYP page table freeing
    ARM: KVM: enforce maximum size for identity mapped code
    ARM: KVM: move to a KVM provided HYP idmap
    ARM: KVM: fix HYP mapping limitations around zero
    ARM: KVM: simplify HYP mapping population
    ARM: KVM: arch_timer: use symbolic constants
    ARM: KVM: add support for minimal host vs guest profiling

    Marcelo Tosatti
     

02 May, 2013

1 commit

  • This adds the API for userspace to instantiate an XICS device in a VM
    and connect VCPUs to it. The API consists of a new device type for
    the KVM_CREATE_DEVICE ioctl, a new capability KVM_CAP_IRQ_XICS, which
    functions similarly to KVM_CAP_IRQ_MPIC, and the KVM_IRQ_LINE ioctl,
    which is used to assert and deassert interrupt inputs of the XICS.

    The XICS device has one attribute group, KVM_DEV_XICS_GRP_SOURCES.
    Each attribute within this group corresponds to the state of one
    interrupt source. The attribute number is the same as the interrupt
    source number.

    This does not support irq routing or irqfd yet.

    Signed-off-by: Paul Mackerras
    Acked-by: David Gibson
    Signed-off-by: Alexander Graf

    Paul Mackerras
     

30 Apr, 2013

1 commit


29 Apr, 2013

1 commit


27 Apr, 2013

9 commits

  • This adds the ability for userspace to save and restore the state
    of the XICS interrupt presentation controllers (ICPs) via the
    KVM_GET/SET_ONE_REG interface. Since there is one ICP per vcpu, we
    simply define a new 64-bit register in the ONE_REG space for the ICP
    state. The state includes the CPU priority setting, the pending IPI
    priority, and the priority and source number of any pending external
    interrupt.

    Signed-off-by: Paul Mackerras
    Signed-off-by: Alexander Graf

    Paul Mackerras
     
  • For pseries machine emulation, in order to move the interrupt
    controller code to the kernel, we need to intercept some RTAS
    calls in the kernel itself. This adds an infrastructure to allow
    in-kernel handlers to be registered for RTAS services by name.
    A new ioctl, KVM_PPC_RTAS_DEFINE_TOKEN, then allows userspace to
    associate token values with those service names. Then, when the
    guest requests an RTAS service with one of those token values, it
    will be handled by the relevant in-kernel handler rather than being
    passed up to userspace as at present.

    Signed-off-by: Michael Ellerman
    Signed-off-by: Benjamin Herrenschmidt
    Signed-off-by: Paul Mackerras
    [agraf: fix warning]
    Signed-off-by: Alexander Graf

    Michael Ellerman
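
The name/token indirection described in this commit can be pictured with a small userspace model. This is only a sketch: the types, the function names (`rtas_define_token`, `rtas_dispatch`), the handler signature and the two service names are hypothetical stand-ins, not the kernel's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical handler signature; a real handler would receive vcpu and
 * argument state that this sketch leaves out. */
typedef int (*rtas_handler_fn)(void);

struct rtas_service {
    const char *name;        /* service name the handler is registered under */
    rtas_handler_fn handler;
    int has_token;           /* set once userspace has defined a token */
    uint64_t token;
};

static int handle_set_xive(void) { return 0; }
static int handle_get_xive(void) { return 0; }

/* Handlers registered by name; both names are illustrative. */
static struct rtas_service rtas_services[] = {
    { "ibm,set-xive", handle_set_xive, 0, 0 },
    { "ibm,get-xive", handle_get_xive, 0, 0 },
};
#define RTAS_N_SERVICES (sizeof rtas_services / sizeof rtas_services[0])

/* Userspace side: associate a token value with a service name.
 * Returns 0 on success, -1 if no handler is registered under that name. */
static int rtas_define_token(const char *name, uint64_t token)
{
    assert(name != NULL);
    for (size_t i = 0; i < RTAS_N_SERVICES; i++) {
        if (strcmp(rtas_services[i].name, name) == 0) {
            rtas_services[i].token = token;
            rtas_services[i].has_token = 1;
            return 0;
        }
    }
    return -1;
}

/* Guest side: returns 1 if the request was handled by a registered
 * handler, 0 if it has to be passed up to userspace. */
static int rtas_dispatch(uint64_t token)
{
    for (size_t i = 0; i < RTAS_N_SERVICES; i++) {
        if (rtas_services[i].has_token && rtas_services[i].token == token) {
            rtas_services[i].handler();
            return 1;
        }
    }
    return 0;
}
```

In this model a token that userspace never defined falls through to userspace, which matches the "as at present" behaviour the message describes.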
     
  • Now that all the irq routing and irqfd pieces are generic, we can expose
    real irqchip support to all of KVM's internal helpers.

    This allows us to use irqfd with the in-kernel MPIC.

    Signed-off-by: Alexander Graf

    Alexander Graf
     
  • Enabling this capability connects the vcpu to the designated in-kernel
    MPIC. Using explicit connections between vcpus and irqchips allows
    for flexibility, but the main benefit at the moment is that it
    simplifies the code -- KVM doesn't need vm-global state to remember
    which MPIC object is associated with this vm, and it doesn't need to
    care about ordering between irqchip creation and vcpu creation.

    Signed-off-by: Scott Wood
    [agraf: add stub functions for kvmppc_mpic_{dis,}connect_vcpu]
    Signed-off-by: Alexander Graf

    Scott Wood
     
  • Hook the MPIC code up to the KVM interfaces, add locking, etc.

    Signed-off-by: Scott Wood
    [agraf: add stub function for kvmppc_mpic_set_epr, non-booke, 64bit]
    Signed-off-by: Alexander Graf

    Scott Wood
     
  • Currently, devices that are emulated inside KVM are configured in a
    hardcoded manner based on an assumption that any given architecture
    only has one way to do it. If there's any need to access device state,
    it is done through inflexible one-purpose-only IOCTLs (e.g.
    KVM_GET/SET_LAPIC). Defining new IOCTLs for every little thing is
    cumbersome and depletes a limited numberspace.

    This API provides a mechanism to instantiate a device of a certain
    type, returning an ID that can be used to set/get attributes of the
    device. Attributes may include configuration parameters (e.g.
    register base address), device state, operational commands, etc. It
    is similar to the ONE_REG API, except that it acts on devices rather
    than vcpus.

    Both device types and individual attributes can be tested without having
    to create the device or get/set the attribute, without the need for
    separately managing enumerated capabilities.

    Signed-off-by: Scott Wood
    Signed-off-by: Alexander Graf

    Scott Wood
     
  • EPTCFG register defined by E.PT is accessed unconditionally by Linux guests
    in the presence of MAV 2.0. Emulate it now.

    Signed-off-by: Mihai Caraman
    Signed-off-by: Alexander Graf

    Mihai Caraman
     
  • Add support for TLBnPS registers available in MMU Architecture Version
    (MAV) 2.0.

    Signed-off-by: Mihai Caraman
    Signed-off-by: Alexander Graf

    Mihai Caraman
     
  • MMU registers were exposed to user-space using sregs interface. Add them
    to ONE_REG interface using kvmppc_get_one_reg/kvmppc_set_one_reg delegation
    mechanism.

    Signed-off-by: Mihai Caraman
    Signed-off-by: Alexander Graf

    Mihai Caraman
     

22 Mar, 2013

1 commit

  • If userspace wants to change specific bits of the TSR (timer status
    register), it currently has to use the GET/SET_SREGS ioctl interface.
    The steps are:
    i) userspace issues the get ioctl,
    ii) changes TSR in userspace,
    iii) then issues the set ioctl.
    The kernel can change TSR after step i) and before step iii), and
    that update is then lost.

    To avoid this, this patch adds ONE_REG ioctls for ORing and clearing
    specific bits in TSR. It adds a ONE_REG interface for:
    1) setting specific bits in TSR
    2) clearing specific bits in TSR
    3) setting/getting the TCR register. There are cases where we want to
    change only TCR and not TSR. Although we could use SREGS without the
    KVM_SREGS_E_UPDATE_TSR flag, I think ONE_REG is better. I am open
    if someone feels we should use SREGS only here.
    4) getting/setting the TSR register

    Signed-off-by: Bharat Bhushan
    Signed-off-by: Alexander Graf

    Bharat Bhushan
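
The race between the get and set steps can be replayed in a few lines. The sketch below simulates a TSR in plain memory; `struct sim_vcpu` and the `sim_*` helpers are hypothetical and do not call into KVM. It only contrasts a read-modify-write sequence with dedicated OR/clear operations.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated vcpu holding only a timer status register. */
struct sim_vcpu { uint32_t tsr; };

static uint32_t sim_get_tsr(const struct sim_vcpu *v) { return v->tsr; }
static void sim_set_tsr(struct sim_vcpu *v, uint32_t val) { v->tsr = val; }
static void sim_or_tsr(struct sim_vcpu *v, uint32_t bits) { v->tsr |= bits; }
static void sim_clear_tsr(struct sim_vcpu *v, uint32_t bits) { v->tsr &= ~bits; }

/* get/modify/set: the kernel sets kernel_bit between the user's get and
 * set, and the final set overwrites it. Returns the resulting TSR. */
static uint32_t sim_racy_clear(struct sim_vcpu *v, uint32_t user_bit,
                               uint32_t kernel_bit)
{
    assert(v);
    uint32_t snapshot = sim_get_tsr(v); /* step i: get */
    sim_or_tsr(v, kernel_bit);          /* kernel update in between */
    snapshot &= ~user_bit;              /* step ii: modify the stale copy */
    sim_set_tsr(v, snapshot);           /* step iii: set, kernel_bit is lost */
    return v->tsr;
}

/* Dedicated clear operation: only the requested bit is touched, so the
 * kernel's concurrent update survives. Returns the resulting TSR. */
static uint32_t sim_safe_clear(struct sim_vcpu *v, uint32_t user_bit,
                               uint32_t kernel_bit)
{
    assert(v);
    sim_or_tsr(v, kernel_bit);          /* same kernel update */
    sim_clear_tsr(v, user_bit);         /* clears user_bit only */
    return v->tsr;
}
```

Starting from a TSR of 0x1, clearing bit 0x1 while the kernel sets 0x2 leaves 0x0 on the racy path and 0x2 on the safe path.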
     

12 Mar, 2013

1 commit


06 Mar, 2013

1 commit


25 Feb, 2013

1 commit

  • Pull KVM updates from Marcelo Tosatti:
    "KVM updates for the 3.9 merge window, including x86 real mode
    emulation fixes, stronger memory slot interface restrictions, mmu_lock
    spinlock hold time reduction, improved handling of large page faults
    on shadow, initial APICv HW acceleration support, s390 channel IO
    based virtio, amongst others"

    * tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits)
    Revert "KVM: MMU: lazily drop large spte"
    x86: pvclock kvm: align allocation size to page size
    KVM: nVMX: Remove redundant get_vmcs12 from nested_vmx_exit_handled_msr
    x86 emulator: fix parity calculation for AAD instruction
    KVM: PPC: BookE: Handle alignment interrupts
    booke: Added DBCR4 SPR number
    KVM: PPC: booke: Allow multiple exception types
    KVM: PPC: booke: use vcpu reference from thread_struct
    KVM: Remove user_alloc from struct kvm_memory_slot
    KVM: VMX: disable apicv by default
    KVM: s390: Fix handling of iscs.
    KVM: MMU: cleanup __direct_map
    KVM: MMU: remove pt_access in mmu_set_spte
    KVM: MMU: cleanup mapping-level
    KVM: MMU: lazily drop large spte
    KVM: VMX: cleanup vmx_set_cr0().
    KVM: VMX: add missing exit names to VMX_EXIT_REASONS array
    KVM: VMX: disable SMEP feature when guest is in non-paging mode
    KVM: Remove duplicate text in api.txt
    Revert "KVM: MMU: split kvm_mmu_free_page"
    ...

    Linus Torvalds
     

12 Feb, 2013

2 commits

  • User space defines the model to emulate to a guest and should therefore
    decide which addresses are used for both the virtual CPU interface
    directly mapped in the guest physical address space and for the emulated
    distributor interface, which is mapped in software by the in-kernel VGIC
    support.

    Reviewed-by: Will Deacon
    Signed-off-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Christoffer Dall
     
  • On ARM some bits are specific to the model being emulated for the guest and
    user space needs a way to tell the kernel about those bits. An example is mmio
    device base addresses, where KVM must know the base address for a given device
    to properly emulate mmio accesses within a certain address range or directly
    map a device with virtualization extensions into the guest address space.

    We make this API ARM-specific as we haven't yet reached a consensus for a
    generic API for all KVM architectures that will allow us to do something like
    this.

    Reviewed-by: Will Deacon
    Signed-off-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Christoffer Dall
     

06 Feb, 2013

1 commit


05 Feb, 2013

1 commit

  • As Xiao pointed out, there are a few problems with changing the
    read-only attribute of an existing memory slot:
    - kvm_arch_commit_memory_region() write protects the memory slot only
    for GET_DIRTY_LOG when modifying the flags.
    - FNAME(sync_page) uses the old spte value to set a new one without
    checking KVM_MEM_READONLY flag.

    Since we flush all shadow pages when creating a new slot, the simplest
    fix is to disallow such problematic flag changes: this is safe because
    no one is doing such things.

    Reviewed-by: Gleb Natapov
    Signed-off-by: Takuya Yoshikawa
    Cc: Xiao Guangrong
    Cc: Alex Williamson
    Signed-off-by: Marcelo Tosatti

    Takuya Yoshikawa
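
A minimal sketch of the restriction described above, assuming the check is "the read-only bit of an existing slot may not differ between the old and new flags". The macro and function names are hypothetical; the bit positions are assumed to mirror KVM_MEM_LOG_DIRTY_PAGES and KVM_MEM_READONLY in the KVM userspace headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the KVM memory slot flag bits. */
#define MEM_LOG_DIRTY_PAGES (1u << 0)
#define MEM_READONLY        (1u << 1)

/* Returns 1 if the flag update is acceptable, 0 if it must be rejected.
 * A new slot may use any flags; an existing slot may not toggle the
 * read-only bit, while other bits such as dirty logging may change. */
static int memslot_flags_change_ok(int slot_exists, uint32_t old_flags,
                                   uint32_t new_flags)
{
    if (!slot_exists)
        return 1;
    return ((old_flags ^ new_flags) & MEM_READONLY) == 0;
}

/* Usage example: self-checks for the three interesting cases. */
static void memslot_flags_demo(void)
{
    assert(memslot_flags_change_ok(0, 0, MEM_READONLY));         /* creation */
    assert(memslot_flags_change_ok(1, 0, MEM_LOG_DIRTY_PAGES));  /* dirty log on */
    assert(!memslot_flags_change_ok(1, 0, MEM_READONLY));        /* rejected */
}
```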
     

24 Jan, 2013

6 commits

  • Implement the PSCI specification (ARM DEN 0022A) to control
    virtual CPUs being "powered" on or off.

    PSCI/KVM is detected using the KVM_CAP_ARM_PSCI capability.

    A virtual CPU can now be initialized in a "powered off" state,
    using the KVM_ARM_VCPU_POWER_OFF feature flag.

    The guest can use either SMC or HVC to execute a PSCI function.

    Reviewed-by: Will Deacon
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     
  • We use space #18 for floating point regs.

    Reviewed-by: Will Deacon
    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Rusty Russell
    Signed-off-by: Christoffer Dall

    Rusty Russell
     
  • The Cache Size Selection Register (CSSELR) selects the current Cache
    Size ID Register (CCSIDR). You write which cache you are interested
    in to CSSELR, and read the information out of CCSIDR.

    Which cache numbers are valid is known by reading the Cache Level ID
    Register (CLIDR).

    To export this state to userspace, we add a KVM_REG_ARM_DEMUX
    numberspace (17), which uses 8 bits to represent which register is
    being demultiplexed (0 for CCSIDR), and the lower 8 bits to represent
    this demultiplexing (in our case, the CSSELR value, which is 4 bits).

    Reviewed-by: Will Deacon
    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Rusty Russell
    Signed-off-by: Christoffer Dall

    Christoffer Dall
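
As an illustration of the demultiplexed numbering, the lower 32 bits of such a register ID could be assembled as below. This is a sketch under assumptions: the helper names are hypothetical, the numberspace is taken to sit in bits 16 and up (as for the other ARM register groups), and the architecture and size bits in the upper half of the 64-bit ID are left out.

```c
#include <assert.h>
#include <stdint.h>

#define DEMUX_NUMBERSPACE 17u  /* KVM_REG_ARM_DEMUX numberspace from the message */
#define DEMUX_REG_CCSIDR  0u   /* 0 selects CCSIDR */

/* Lower 32 bits of a demux register ID:
 * | numberspace (31..16) | demuxed register (15..8) | value (7..0) | */
static uint32_t arm_demux_id_low(uint8_t demux_reg, uint8_t value)
{
    return (DEMUX_NUMBERSPACE << 16) | ((uint32_t)demux_reg << 8) | value;
}

/* CCSIDR as selected by a given CSSELR value (4 bits wide). */
static uint32_t arm_ccsidr_id_low(uint8_t csselr)
{
    assert(csselr <= 0xF);
    return arm_demux_id_low(DEMUX_REG_CCSIDR, csselr);
}
```

For example, CCSIDR for CSSELR value 2 would get the lower word 0x00110002 under these assumptions.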
     
  • The following three ioctls are implemented:
    - KVM_GET_REG_LIST
    - KVM_GET_ONE_REG
    - KVM_SET_ONE_REG

    Now that we have a table for all the cp15 registers, we can drive a
    generic API.

    The register IDs carry the following encoding:

    ARM registers are mapped using the lower 32 bits. The upper 16 of that
    is the register group type, or coprocessor number:

    ARM 32-bit CP15 registers have the following id bit patterns:
    0x4002 0000 000F

    ARM 64-bit CP15 registers have the following id bit patterns:
    0x4003 0000 000F

    For futureproofing, we need to tell QEMU about the CP15 registers the
    host lets the guest access.

    It will need this information to restore a current guest on a future
    CPU or perhaps a future KVM which allows some of these to be changed.

    We use a separate table for these, as they're only for the userspace API.

    Reviewed-by: Will Deacon
    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Rusty Russell
    Signed-off-by: Christoffer Dall

    Christoffer Dall
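
The common prefix of the ID patterns quoted above can be composed from its parts. In this sketch, the split of the leading 0x4002 / 0x4003 into an architecture part and a register-size part is an assumption read off those two patterns, and the macro and function names are hypothetical. The register-specific low 16 bits are not covered.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed decomposition of the leading 16 bits of the patterns above. */
#define REG_ARCH_ARM     0x4000000000000000ULL
#define REG_SIZE_32BIT   0x0020000000000000ULL  /* gives 0x4002 ... */
#define REG_SIZE_64BIT   0x0030000000000000ULL  /* gives 0x4003 ... */
#define REG_COPROC_SHIFT 16                     /* upper 16 of the lower 32 bits */

/* Build the part of a register ID shared by every register of one size
 * in one coprocessor; the low 16 bits stay zero here. */
static uint64_t arm_reg_id_base(uint64_t size_bits, unsigned coproc)
{
    assert(coproc <= 0xFFFF);
    return REG_ARCH_ARM | size_bits | ((uint64_t)coproc << REG_COPROC_SHIFT);
}
```

With coprocessor 15 this yields 0x4020_0000_000F_0000 for 32-bit and 0x4030_0000_000F_0000 for 64-bit registers, which lines up with the 0x4002 0000 000F and 0x4003 0000 000F prefixes.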
     
  • All interrupt injection is now based on the VM ioctl KVM_IRQ_LINE. This
    works semantically well for the GIC as we in fact raise/lower a line on
    a machine component (the GIC). The ioctl uses the following struct:

    struct kvm_irq_level {
        union {
            __u32 irq;     /* GSI */
            __s32 status;  /* not used for KVM_IRQ_LEVEL */
        };
        __u32 level;       /* 0 or 1 */
    };

    ARM can signal an interrupt either at the CPU level, or at the in-kernel irqchip
    (GIC), and for in-kernel irqchip can tell the GIC to use PPIs designated for
    specific cpus. The irq field is interpreted like this:

     bits:  | 31 ... 24 | 23  ...  16 | 15  ...   0 |
    field:  | irq_type  | vcpu_index  | irq_number  |

    The irq_type field has the following values:
    - irq_type[0]: out-of-kernel GIC: irq_number 0 is IRQ, irq_number 1 is FIQ
    - irq_type[1]: in-kernel GIC: SPI, irq_number between 32 and 1019 (incl.)
    (the vcpu_index field is ignored)
    - irq_type[2]: in-kernel GIC: PPI, irq_number between 16 and 31 (incl.)

    The irq_number thus corresponds to the IRQ ID as defined in the GICv2 specs.

    This is documented in Documentation/kvm/api.txt.

    Reviewed-by: Will Deacon
    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Christoffer Dall

    Christoffer Dall
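
A short sketch of how the irq field could be packed from the three sub-fields laid out in this commit message. The enum and function names are hypothetical; the shifts and the valid irq_number ranges come from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* irq_type values as listed in the commit message. */
enum arm_irq_type {
    ARM_IRQ_TYPE_CPU = 0, /* out-of-kernel GIC: irq_number 0 = IRQ, 1 = FIQ */
    ARM_IRQ_TYPE_SPI = 1, /* in-kernel GIC: SPI, vcpu_index ignored */
    ARM_IRQ_TYPE_PPI = 2  /* in-kernel GIC: PPI for a specific vcpu */
};

/* Check irq_number against the range stated for each irq_type. */
static int arm_irq_number_valid(enum arm_irq_type type, uint16_t irq_number)
{
    switch (type) {
    case ARM_IRQ_TYPE_CPU: return irq_number <= 1;
    case ARM_IRQ_TYPE_SPI: return irq_number >= 32 && irq_number <= 1019;
    case ARM_IRQ_TYPE_PPI: return irq_number >= 16 && irq_number <= 31;
    }
    return 0;
}

/* Pack | irq_type (31..24) | vcpu_index (23..16) | irq_number (15..0) |. */
static uint32_t arm_irq_pack(enum arm_irq_type type, uint8_t vcpu_index,
                             uint16_t irq_number)
{
    assert(arm_irq_number_valid(type, irq_number));
    return ((uint32_t)type << 24) | ((uint32_t)vcpu_index << 16) | irq_number;
}
```

The packed value would go into the `irq` member of `struct kvm_irq_level`, with `level` set to 0 or 1. For instance, PPI 27 on vcpu 3 packs to 0x0203001B.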
     
  • Targets KVM support for Cortex-A15 processors.

    Contains all the framework components, make files, header files, some
    tracing functionality, and basic user space API.

    Only supported core is Cortex-A15 for now.

    Most functionality is in arch/arm/kvm/* or arch/arm/include/asm/kvm_*.h.

    Reviewed-by: Will Deacon
    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Rusty Russell
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     

14 Jan, 2013

1 commit


10 Jan, 2013

3 commits

  • We need to be able to read and write the contents of the EPR register
    from user space.

    This patch implements that logic through the ONE_REG API and declares
    its (never implemented) SREGS counterpart as deprecated.

    Signed-off-by: Alexander Graf

    Alexander Graf
     
  • The External Proxy Facility in FSL BookE chips allows the interrupt
    controller to automatically acknowledge an interrupt as soon as a
    core gets its pending external interrupt delivered.

    Today, user space implements the interrupt controller, so we need to
    check on it during such a cycle.

    This patch implements logic for user space to enable EPR exiting,
    disable EPR exiting and EPR exiting itself, so that user space can
    acknowledge an interrupt when an external interrupt has successfully
    been delivered into the guest vcpu.

    Signed-off-by: Alexander Graf

    Alexander Graf
     
  • Reflect the uapi folder change in SREGS API documentation.

    Signed-off-by: Mihai Caraman
    Reviewed-by: Amos Kong
    Signed-off-by: Alexander Graf

    Mihai Caraman
     

08 Jan, 2013

4 commits

  • Add a new capability, KVM_CAP_S390_CSS_SUPPORT, which will pass
    intercepts for channel I/O instructions to userspace. Only I/O
    instructions interacting with I/O interrupts need to be handled
    in-kernel:

    - TEST PENDING INTERRUPTION (tpi) dequeues and stores pending
    interrupts entirely in-kernel.
    - TEST SUBCHANNEL (tsch) dequeues pending interrupts in-kernel
    and exits via KVM_EXIT_S390_TSCH to userspace for subchannel-
    related processing.

    Reviewed-by: Marcelo Tosatti
    Reviewed-by: Alexander Graf
    Signed-off-by: Cornelia Huck
    Signed-off-by: Marcelo Tosatti

    Cornelia Huck
     
  • Make s390 support KVM_ENABLE_CAP.

    Reviewed-by: Marcelo Tosatti
    Acked-by: Alexander Graf
    Signed-off-by: Cornelia Huck
    Signed-off-by: Marcelo Tosatti

    Cornelia Huck
     
  • Add support for injecting machine checks (only repressible
    conditions for now).

    This is a bit more involved than I/O interrupts, for these reasons:

    - Machine checks come in both floating and cpu varieties.
    - We don't have a bit for machine checks enabling, but have to use
    a roundabout approach with trapping PSW changing instructions and
    watching for opened machine checks.

    Reviewed-by: Alexander Graf
    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Cornelia Huck
    Signed-off-by: Marcelo Tosatti

    Cornelia Huck
     
  • Add support for handling I/O interrupts (standard, subchannel-related
    ones and rudimentary adapter interrupts).

    The subchannel-identifying parameters are encoded into the interrupt
    type.

    I/O interrupts are floating, so they can't be injected on a specific
    vcpu.

    Reviewed-by: Alexander Graf
    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Cornelia Huck
    Signed-off-by: Marcelo Tosatti

    Cornelia Huck
     

06 Dec, 2012

2 commits

  • Implement ONE_REG interface for EPCR register adding KVM_REG_PPC_EPCR to
    the list of ONE_REG PPC supported registers.

    Signed-off-by: Mihai Caraman
    [agraf: remove HV dependency, use get/put_user]
    Signed-off-by: Alexander Graf

    Mihai Caraman
     
  • A new ioctl, KVM_PPC_GET_HTAB_FD, returns a file descriptor. Reads on
    this fd return the contents of the HPT (hashed page table), writes
    create and/or remove entries in the HPT. There is a new capability,
    KVM_CAP_PPC_HTAB_FD, to indicate the presence of the ioctl. The ioctl
    takes an argument structure with the index of the first HPT entry to
    read out and a set of flags. The flags indicate whether the user is
    intending to read or write the HPT, and whether to return all entries
    or only the "bolted" entries (those with the bolted bit, 0x10, set in
    the first doubleword).

    This is intended for use in implementing qemu's savevm/loadvm and for
    live migration. Therefore, on reads, the first pass returns information
    about all HPTEs (or all bolted HPTEs). When the first pass reaches the
    end of the HPT, it returns from the read. Subsequent reads only return
    information about HPTEs that have changed since they were last read.
    A read that finds no changed HPTEs in the HPT following where the last
    read finished will return 0 bytes.

    The format of the data provides a simple run-length compression of the
    invalid entries. Each block of data starts with a header that indicates
    the index (position in the HPT, which is just an array), the number of
    valid entries starting at that index (may be zero), and the number of
    invalid entries following those valid entries. The valid entries, 16
    bytes each, follow the header. The invalid entries are not explicitly
    represented.

    Signed-off-by: Paul Mackerras
    [agraf: fix documentation]
    Signed-off-by: Alexander Graf

    Paul Mackerras
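
The run-length format described in this commit can be walked with a small parser. The sketch below makes several assumptions that the message does not state: the header field widths (32-bit index, 16-bit counts), host byte order, and the names `struct htab_header` and `htab_count_valid`, which are hypothetical. It operates on a buffer already read from the fd and does not call the ioctl itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed header layout for one block of the stream. */
struct htab_header {
    uint32_t index;     /* position of the first entry in the HPT */
    uint16_t n_valid;   /* valid entries that follow the header */
    uint16_t n_invalid; /* invalid entries after those, not stored */
};

#define HPTE_SIZE 16    /* bytes per valid entry, as stated in the message */

/* Walk every block in buf. Returns the number of valid HPTEs seen, or -1
 * if the buffer ends in the middle of a block. If next_index is non-NULL,
 * it receives the HPT index just past the last block. */
static long htab_count_valid(const unsigned char *buf, size_t len,
                             uint32_t *next_index)
{
    long total = 0;
    size_t off = 0;

    assert(buf != NULL);
    while (off < len) {
        struct htab_header hdr;

        if (len - off < sizeof hdr)
            return -1;
        memcpy(&hdr, buf + off, sizeof hdr);
        off += sizeof hdr;

        /* Only the valid entries carry payload bytes. */
        size_t payload = (size_t)hdr.n_valid * HPTE_SIZE;
        if (len - off < payload)
            return -1;
        off += payload;

        total += hdr.n_valid;
        if (next_index)
            *next_index = hdr.index + hdr.n_valid + hdr.n_invalid;
    }
    return total;
}
```

A block with two valid and five invalid entries therefore occupies one header plus 32 bytes, while the five invalid entries cost nothing beyond the count in the header.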
     

31 Oct, 2012

1 commit


30 Oct, 2012

1 commit