05 Sep, 2013

1 commit


14 Aug, 2013

1 commit

  • The L_PTE_USER bit actually has nothing to do with stage 2 mappings, and
    the L_PTE_S2_RDWR value sets the readable bit, which is what L_PTE_USER
    was used for before proper handling of stage 2 memory defines. (A sketch
    of the stage 2 permission bits follows this entry.)

    Changelog:
    [v3]: Drop call to kvm_set_s2pte_writable in mmu.c
    [v2]: Change default mappings to be r/w instead of r/o, as per Marc
    Zyngier's suggestion.

    Cc: Marc Zyngier
    Signed-off-by: Christoffer Dall
    Signed-off-by: Russell King

    Christoffer Dall
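
    The following is an illustrative sketch only: the stage 2 access-permission
    field (HAP[2:1]) lives in bits [7:6] of an LPAE stage 2 descriptor, and the
    macro values below mirror what arch/arm/include/asm/pgtable-3level.h defined
    around this time. Treat the exact values as an assumption, not a quote of
    the patch.

    #include <stdint.h>
    #include <stdio.h>

    #define L_PTE_S2_RDONLY (UINT64_C(1) << 6)  /* HAP[1]: readable            */
    #define L_PTE_S2_RDWR   (UINT64_C(3) << 6)  /* HAP[2:1]: readable/writable */
    #define L_PTE_USER      (UINT64_C(1) << 6)  /* stage 1 "user" bit; unrelated to stage 2 */

    int main(void)
    {
        /* Default stage 2 mapping after the fix: r/w, expressed with the
         * dedicated stage 2 define rather than the stage 1 L_PTE_USER bit. */
        uint64_t s2_prot = L_PTE_S2_RDWR;

        printf("stage 2 default prot bits: %#llx\n", (unsigned long long)s2_prot);
        return 0;
    }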
     

08 Aug, 2013

2 commits

  • When using 64kB pages, we only have two levels of page tables,
    meaning that PGD, PUD and PMD are fused. In this case, trying
    to refcount PUDs and PMDs independently is a complete disaster,
    as they are the same.

    We manage to get it right for the allocation (stage2_set_pte uses
    {pmd,pud}_none), but the unmapping path clears both pud and pmd
    refcounts, which fails spectacularly with 2-level page tables.

    The fix is to avoid calling clear_pud_entry when both the pmd and
    pud pages are empty. For this, and instead of introducing another
    pud_empty function, consolidate both pte_empty and pmd_empty into
    page_empty (the code is actually identical) and use that to also
    test the validity of the pud. (A sketch of page_empty follows this
    entry.)

    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
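
    Below is a minimal sketch of the consolidation described above (kernel-style,
    not standalone): a table page whose refcount has dropped back to 1 backs no
    counted entries, and the same test works for pte, pmd and (fused) pud pages
    alike.

    static bool page_empty(void *ptr)
    {
        struct page *ptr_page = virt_to_page(ptr);

        /* only the table's own reference is left: nothing mapped through it */
        return page_count(ptr_page) == 1;
    }

    The unmap path can then guard the pud teardown with something along the
    lines of "if (page_empty(pmd) && !page_empty(pud))", which is never true
    when pmd and pud share the same page, avoiding the double refcount drop.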
     
  • The unmap_range function did not properly cover the case when the start
    address was not aligned to PMD_SIZE or PUD_SIZE and an entire pte table
    or pmd table was cleared, causing us to leak memory when incrementing
    the addr.

    The fix is to always move on to the next page table entry boundary
    instead of adding the full size of the VA range covered by the
    corresponding table level entry. (A standalone demonstration of the
    boundary stepping follows this entry.)

    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
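
    A standalone demonstration (plain user-space C, not kernel code) of the
    stepping issue described above, under the assumption of 2MB PMD entries as
    with LPAE: adding the full PMD_SIZE to an unaligned address jumps past the
    next table boundary, so part of the range is never visited and its table
    pages are never freed.

    #include <stdio.h>
    #include <stdint.h>

    #define PMD_SHIFT 21                        /* 2MB per pmd entry (LPAE) */
    #define PMD_SIZE  (UINT64_C(1) << PMD_SHIFT)
    #define PMD_MASK  (~(PMD_SIZE - 1))

    /* Next pmd boundary, clamped to end (the kernel's pmd_addr_end() idea,
     * ignoring wrap-around corner cases). */
    static uint64_t pmd_addr_end(uint64_t addr, uint64_t end)
    {
        uint64_t boundary = (addr + PMD_SIZE) & PMD_MASK;

        return boundary < end ? boundary : end;
    }

    int main(void)
    {
        uint64_t addr = 0x40123000;             /* not 2MB aligned */
        uint64_t end  = 0x40800000;

        /* buggy: overshoots the 0x40200000 boundary, skipping what is left
         * of that 2MB block */
        printf("buggy next addr: %#llx\n", (unsigned long long)(addr + PMD_SIZE));
        /* fixed: step to the boundary itself */
        printf("fixed next addr: %#llx\n",
               (unsigned long long)pmd_addr_end(addr, end));
        return 0;
    }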
     

04 Jul, 2013

1 commit

  • Pull KVM fixes from Paolo Bonzini:
    "On the x86 side, there are some optimizations and documentation
    updates. The big ARM/KVM change for 3.11, support for AArch64, will
    come through Catalin Marinas's tree. s390 and PPC have misc cleanups
    and bugfixes"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (87 commits)
    KVM: PPC: Ignore PIR writes
    KVM: PPC: Book3S PR: Invalidate SLB entries properly
    KVM: PPC: Book3S PR: Allow guest to use 1TB segments
    KVM: PPC: Book3S PR: Don't keep scanning HPTEG after we find a match
    KVM: PPC: Book3S PR: Fix invalidation of SLB entry 0 on guest entry
    KVM: PPC: Book3S PR: Fix proto-VSID calculations
    KVM: PPC: Guard doorbell exception with CONFIG_PPC_DOORBELL
    KVM: Fix RTC interrupt coalescing tracking
    kvm: Add a tracepoint write_tsc_offset
    KVM: MMU: Inform users of mmio generation wraparound
    KVM: MMU: document fast invalidate all mmio sptes
    KVM: MMU: document fast invalidate all pages
    KVM: MMU: document fast page fault
    KVM: MMU: document mmio page fault
    KVM: MMU: document write_flooding_count
    KVM: MMU: document clear_spte_count
    KVM: MMU: drop kvm_mmu_zap_mmio_sptes
    KVM: MMU: init kvm generation close to mmio wrap-around value
    KVM: MMU: add tracepoint for check_mmio_spte
    KVM: MMU: fast invalidate all mmio sptes
    ...

    Linus Torvalds
     

27 Jun, 2013

1 commit

  • S2_PGD_SIZE defines the number of pages used by a stage-2 PGD
    and is unused, except for a VM_BUG_ON check that misuses the
    define.

    As the check is very unlikely to ever be triggered except in
    circumstances where KVM is the least of our worries, just kill
    both the define and the VM_BUG_ON check.

    Acked-by: Catalin Marinas
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     

03 Jun, 2013

1 commit

  • The KVM/ARM MMU code doesn't take care of invalidating TLBs before
    freeing a {pte,pmd} table. This could cause problems if the page
    is reallocated and then speculated into by another CPU. (The
    required ordering is sketched after this entry.)

    Reported-by: Catalin Marinas
    Signed-off-by: Marc Zyngier
    Acked-by: Catalin Marinas
    Signed-off-by: Christoffer Dall

    Marc Zyngier
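
    A sketch of the required ordering (kernel-style; the helper names follow
    the KVM/ARM MMU code of this era and the exact signatures should be read
    as an assumption): unhook the table, invalidate the TLB for that IPA, and
    only then hand the page back to the allocator.

    static void clear_pmd_entry(struct kvm *kvm, pmd_t *pmd, phys_addr_t addr)
    {
        pte_t *pte_table = pte_offset_kernel(pmd, 0);

        pmd_clear(pmd);                         /* 1. unhook the pte table       */
        kvm_tlb_flush_vmid_ipa(kvm, addr);      /* 2. no stale walk can reach it */
        pte_free_kernel(NULL, pte_table);       /* 3. now it is safe to reuse    */
        put_page(virt_to_page(pmd));
    }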
     

29 Apr, 2013

6 commits

  • Now that we have the necessary infrastructure to boot a hotplugged CPU
    at any point in time, wire a CPU notifier that will perform the HYP
    init for the incoming CPU.

    Note that this depends on the platform code and/or firmware to boot the
    incoming CPU with HYP mode enabled and return to the kernel by following
    the normal boot path (HYP stub installed). (A notifier sketch follows
    this entry.)

    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
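
    A sketch of such a notifier, using the CPU notifier API that was current at
    the time; cpu_init_hyp_mode() stands for the existing per-CPU HYP setup
    routine, and the details are an assumption based on the description rather
    than a quote of the patch.

    #include <linux/cpu.h>
    #include <linux/notifier.h>

    static int hyp_init_cpu_notify(struct notifier_block *self,
                                   unsigned long action, void *cpu)
    {
        switch (action) {
        case CPU_STARTING:
        case CPU_STARTING_FROZEN:
            cpu_init_hyp_mode(NULL);    /* runs on the incoming CPU itself */
            break;
        }

        return NOTIFY_OK;
    }

    static struct notifier_block hyp_init_cpu_nb = {
        .notifier_call = hyp_init_cpu_notify,
    };

    /* registered once at init time: register_cpu_notifier(&hyp_init_cpu_nb); */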
     
  • Our HYP init code suffers from two major design issues:
    - it cannot support CPU hotplug, as we tear down the idmap very early
    - it cannot perform a TLB invalidation when switching from init to
    runtime mappings, as pages are manipulated from PL1 exclusively

    The hotplug problem mandates that we keep two sets of page tables
    (boot and runtime). The TLB problem mandates that we're able to
    transition from one PGD to another while in HYP, invalidating the TLBs
    in the process.

    To be able to do this, we need to share a page between the two page
    tables. A page that will have the same VA in both configurations. All we
    need is a VA that has the following properties:
    - This VA can't be used to represent a kernel mapping.
    - This VA will not conflict with the physical address of the kernel text

    The vectors page seems to satisfy this requirement:
    - The kernel never maps anything else there
    - Since the kernel text is copied to the beginning of physical memory,
    it is unlikely to use the last 64kB (I doubt we'll ever support KVM
    on a system with something like 4MB of RAM, but patches are very
    welcome).

    Let's call this VA the trampoline VA.

    Now, we map our init page at 3 locations:
    - idmap in the boot pgd
    - trampoline VA in the boot pgd
    - trampoline VA in the runtime pgd

    The init scenario is now the following:
    - We jump in HYP with four parameters: boot HYP pgd, runtime HYP pgd,
    runtime stack, runtime vectors
    - Enable the MMU with the boot pgd
    - Jump to a target in the trampoline page (remember, this is the same
    physical page!)
    - Now switch to the runtime pgd (same VA, and still the same physical
    page!)
    - Invalidate TLBs
    - Set stack and vectors
    - Profit! (or eret, if you only care about the code).

    Note that we keep the boot mapping permanently (it is not strictly an
    idmap anymore) to allow for CPU hotplug in later patches. (A sketch of
    the three mappings follows this entry.)

    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
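
    A sketch of the three mappings, in the style of the __create_hyp_mappings()
    helper; the wrapper name and exact arguments are an assumption based on the
    description above.

    static int kvm_mmu_init_trampoline(void)   /* hypothetical wrapper */
    {
        int err;

        /* 1. idmap of the HYP init code, boot pgd only */
        err = __create_hyp_mappings(boot_hyp_pgd,
                                    hyp_idmap_start, hyp_idmap_end,
                                    __phys_to_pfn(hyp_idmap_start), PAGE_HYP);
        if (err)
            return err;

        /* 2. the same physical page at the trampoline VA, boot pgd */
        err = __create_hyp_mappings(boot_hyp_pgd,
                                    TRAMPOLINE_VA, TRAMPOLINE_VA + PAGE_SIZE,
                                    __phys_to_pfn(hyp_idmap_start), PAGE_HYP);
        if (err)
            return err;

        /* 3. the same physical page at the trampoline VA, runtime pgd */
        return __create_hyp_mappings(hyp_pgd,
                                     TRAMPOLINE_VA, TRAMPOLINE_VA + PAGE_SIZE,
                                     __phys_to_pfn(hyp_idmap_start), PAGE_HYP);
    }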
     
  • There is no point in freeing HYP page tables differently from Stage-2.
    They now have the same requirements, and should be dealt with the same way.

    Promote unmap_stage2_range to be The One True Way, and get rid of a number
    of nasty bugs in the process (good thing we never actually called free_hyp_pmds
    before...).

    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     
  • After the HYP page table rework, it is pretty easy to let the KVM
    code provide its own idmap, rather than expecting the kernel to
    provide it. It takes actually less code to do so.

    Acked-by: Will Deacon
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     
  • The current code for creating HYP mappings doesn't like to wrap
    around zero, which prevents us from mapping anything into the last
    page of the virtual address space.

    It doesn't take much effort to remove this limitation, making
    the code more consistent with the rest of the kernel in the process.
    (The wrap-around issue is demonstrated after this entry.)

    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
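
    A standalone illustration (plain user-space C, not kernel code) of the
    wrap-around problem: when the last page of a 32-bit address space is part
    of the range, the exclusive end wraps to 0 and a "while (addr < end)" loop
    never runs. A do/while on "addr != end", after page-aligning both ends,
    handles it.

    #include <stdio.h>
    #include <stdint.h>

    #define PAGE_SIZE 4096u

    static unsigned int pages_mapped(uint32_t start, uint32_t end_exclusive)
    {
        uint32_t addr = start & ~(PAGE_SIZE - 1);
        unsigned int n = 0;

        do {
            n++;                    /* "map" one page                  */
            addr += PAGE_SIZE;      /* may wrap to 0, and that is fine */
        } while (addr != end_exclusive);

        return n;
    }

    int main(void)
    {
        uint32_t start = 0xfffff000u;   /* last page of the 32-bit VA space */
        uint32_t end   = 0;             /* 0x100000000 truncated to 32 bits */

        /* broken pattern: (start < end) is false, nothing gets mapped */
        printf("while (addr < end):     %u pages\n", start < end ? 1u : 0u);
        printf("do/while (addr != end): %u pages\n", pages_mapped(start, end));
        return 0;
    }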
     
  • The way we populate HYP mappings is a bit convoluted, to say the least.
    Passing a pointer around to keep track of the current PFN is quite
    odd, and we end up having two different PTE accessors for no good
    reason.

    Simplify the whole thing by unifying the two PTE accessors, passing
    a pgprot_t around, and moving the various validity checks to the
    upper layers. (A sketch of the unified accessor follows this entry.)

    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
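
    A sketch of the unified accessor (kernel-style; the name and argument list
    are an assumption based on the description): one helper fills the ptes for
    both regular and I/O mappings, and the caller only picks the pgprot_t
    (e.g. PAGE_HYP vs PAGE_HYP_DEVICE) and the starting pfn.

    static void create_hyp_pte_mappings(pmd_t *pmd, unsigned long start,
                                        unsigned long end, unsigned long pfn,
                                        pgprot_t prot)
    {
        unsigned long addr;
        pte_t *pte;

        for (addr = start; addr < end; addr += PAGE_SIZE, pfn++) {
            pte = pte_offset_kernel(pmd, addr);
            kvm_set_pte(pte, pfn_pte(pfn, prot));
            get_page(virt_to_page(pte));    /* refcount the pte table page */
        }
    }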
     

07 Mar, 2013

11 commits


25 Feb, 2013

1 commit


24 Jan, 2013

5 commits

  • When the guest accesses I/O memory this will create data abort
    exceptions and they are handled by decoding the HSR information
    (physical address, read/write, length, register) and forwarding reads
    and writes to QEMU which performs the device emulation.

    Certain classes of load/store operations do not support the syndrome
    information provided in the HSR. We don't support decoding these (patches
    are available elsewhere), so we report an error to user space in this case.

    This requires changing the general flow somewhat, since new calls to run
    the VCPU must check whether there is a pending MMIO load and perform the
    write after userspace has made the data available. (The HSR fields used
    for the decoding are sketched after this entry.)

    Reviewed-by: Will Deacon
    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Rusty Russell
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
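
    A standalone sketch (plain user-space C, not kernel code) of the HSR
    syndrome fields used for this decoding, following the ARMv7-A data-abort
    ISS layout: ISV is bit 24, SAS bits 23:22, SSE bit 21, SRT bits 19:16 and
    WnR bit 6. When ISV is clear the HSR does not describe the access and the
    fault is reported to user space instead.

    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    struct mmio_access {
        bool         is_write;      /* WnR, bit 6                       */
        bool         sign_extend;   /* SSE, bit 21                      */
        unsigned int len;           /* from SAS, bits 23:22: 1 << SAS   */
        unsigned int rt;            /* SRT, bits 19:16: register to use */
    };

    static bool decode_hsr(uint32_t hsr, struct mmio_access *mmio)
    {
        if (!(hsr & (1u << 24)))    /* ISV clear: no valid syndrome */
            return false;

        mmio->len         = 1u << ((hsr >> 22) & 0x3);
        mmio->sign_extend = hsr & (1u << 21);
        mmio->rt          = (hsr >> 16) & 0xf;
        mmio->is_write    = hsr & (1u << 6);
        return true;
    }

    int main(void)
    {
        /* example syndrome: ISV=1, SAS=2 (4 bytes), SRT=r3, WnR=1 (write) */
        uint32_t hsr = (1u << 24) | (2u << 22) | (3u << 16) | (1u << 6);
        struct mmio_access m;

        if (decode_hsr(hsr, &m))
            printf("%s of %u bytes via r%u\n",
                   m.is_write ? "write" : "read", m.len, m.rt);
        else
            printf("no syndrome information: report an error to user space\n");
        return 0;
    }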
     
  • Handles the guest faults in KVM by mapping in corresponding user pages
    in the 2nd stage page tables.

    We invalidate the instruction cache by MVA whenever we map a page to the
    guest (no, we cannot only do it when we have an iabt because the guest
    may happily read/write a page before hitting the icache) if the hardware
    uses VIPT or PIPT. In the latter case (PIPT), we can invalidate only that
    physical page. In the former case (VIPT), all bets are off and we simply
    must invalidate the whole affair. Note that VIVT icaches are tagged with
    vmids, and we are out of the woods on that one. Alexander Graf was nice
    enough to remind us of this massive pain. (A sketch of the resulting
    invalidation logic follows this entry.)

    Reviewed-by: Will Deacon
    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
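
    A sketch of the resulting invalidation logic (kernel-style; the cache-type
    predicates and flush helpers follow the ARM cache API of the time, and the
    exact shape is an assumption, not a quote of the patch).

    static void coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
    {
        if (icache_is_pipt()) {
            /* PIPT: invalidating the kernel alias of this page is enough */
            unsigned long hva = gfn_to_hva(kvm, gfn);

            __cpuc_coherent_user_range(hva, hva + PAGE_SIZE);
        } else if (!icache_is_vivt_asid_tagged()) {
            /* VIPT: aliases make a by-MVA invalidate insufficient */
            __flush_icache_all();
        }
        /* ASID-tagged VIVT icaches are also VMID-tagged: nothing to do */
    }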
     
  • This commit introduces the framework for guest memory management
    through the use of 2nd stage translation. Each VM has a pointer
    to a level-1 table (the pgd field in struct kvm_arch) which is
    used for the 2nd stage translations. Entries are added when handling
    guest faults (later patch) and the table itself can be allocated and
    freed through the following functions implemented in
    arch/arm/kvm/arm_mmu.c:
    - kvm_alloc_stage2_pgd(struct kvm *kvm);
    - kvm_free_stage2_pgd(struct kvm *kvm);

    Each entry in the TLBs and caches is tagged with a VMID identifier in
    addition to ASIDs. The VMIDs are assigned consecutively to VMs in the
    order that VMs are executed, and caches and TLBs are invalidated when
    the VMID space has been exhausted, to allow for more than 255 simultaneously
    running guests.

    The 2nd stage pgd is allocated in kvm_arch_init_vm(). The table is
    freed in kvm_arch_destroy_vm(). Both functions are called from the main
    KVM code.

    We pre-allocate page table memory to be able to synchronize using a
    spinlock and be called under rcu_read_lock from the MMU notifiers. We
    steal the mmu_memory_cache implementation from x86 and adapt for our
    specific usage.

    We support MMU notifiers (thanks to Marc Zyngier) through
    kvm_unmap_hva and kvm_set_spte_hva.

    Finally, define kvm_phys_addr_ioremap() to map a device at a guest IPA,
    which is used by VGIC support to map the virtual CPU interface registers
    to the guest. This support is added by Marc Zyngier. (A sketch of the
    stage-2 pgd allocation follows this entry.)

    Reviewed-by: Will Deacon
    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
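
    A sketch of the stage-2 pgd lifetime (kernel-style; S2_PGD_ORDER,
    PTRS_PER_S2_PGD and KVM_PHYS_SIZE mirror the ARM KVM headers of this era
    and should be read as assumptions). The level-1 table hangs off
    kvm->arch.pgd and is what ends up in VTTBR for the guest.

    int kvm_alloc_stage2_pgd(struct kvm *kvm)
    {
        pgd_t *pgd;

        if (kvm->arch.pgd != NULL)
            return -EINVAL;         /* already initialised */

        /* larger than one page: the 40-bit IPA space needs 1024 entries */
        pgd = (pgd_t *)__get_free_pages(GFP_KERNEL, S2_PGD_ORDER);
        if (!pgd)
            return -ENOMEM;

        memset(pgd, 0, PTRS_PER_S2_PGD * sizeof(pgd_t));
        kvm->arch.pgd = pgd;
        return 0;
    }

    void kvm_free_stage2_pgd(struct kvm *kvm)
    {
        if (!kvm->arch.pgd)
            return;

        unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);  /* drop guest mappings */
        free_pages((unsigned long)kvm->arch.pgd, S2_PGD_ORDER);
        kvm->arch.pgd = NULL;
    }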
     
  • Sets up KVM code to handle all exceptions taken to Hyp mode.

    When the kernel is booted in Hyp mode, calling an hvc instruction with r0
    pointing to the new vectors changes the HVBAR to point to those vectors.
    This allows subsystems (like KVM here) to execute code in Hyp mode with the
    MMU disabled.

    We initialize other Hyp-mode registers and enable the MMU for Hyp mode from
    the id-mapped hyp initialization code. Afterwards, the HVBAR is changed to
    point to KVM Hyp vectors used to catch guest faults and to switch to Hyp mode
    to perform a world-switch into a KVM guest.

    Also provides memory mapping code to map required code pages, data structures,
    and I/O regions accessed in Hyp mode at the same virtual address as the host
    kernel virtual addresses, but which conforms to the architectural requirements
    for translations in Hyp mode. This interface is added in arch/arm/kvm/arm_mmu.c
    and comprises:
    - create_hyp_mappings(from, to);
    - create_hyp_io_mappings(from, to, phys_addr);
    - free_hyp_pmds();

    (A usage sketch of these mappings follows this entry.)

    Reviewed-by: Will Deacon
    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
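
    A usage sketch of this interface (kernel-style; the wrapper name is
    hypothetical and the symbols are assumptions based on the description):
    the Hyp code and the per-CPU Hyp stacks have to be mapped into the Hyp
    address space before HVBAR is switched over to the KVM vectors.

    static int map_kvm_into_hyp(void)      /* hypothetical init helper */
    {
        int err, cpu;

        /* the Hyp vectors/code themselves */
        err = create_hyp_mappings(__kvm_hyp_code_start, __kvm_hyp_code_end);
        if (err)
            return err;

        /* one Hyp-mode stack page per CPU */
        for_each_possible_cpu(cpu) {
            void *stack_page = (void *)per_cpu(kvm_arm_hyp_stack_page, cpu);

            err = create_hyp_mappings(stack_page, stack_page + PAGE_SIZE);
            if (err)
                return err;
        }

        return 0;
    }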
     
  • Targets KVM support for Cortex-A15 processors.

    Contains all the framework components, make files, header files, some
    tracing functionality, and basic user space API.

    Only supported core is Cortex-A15 for now.

    Most functionality is in arch/arm/kvm/* or arch/arm/include/asm/kvm_*.h.

    Reviewed-by: Will Deacon
    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Rusty Russell
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall