18 May, 2017

2 commits

  • We were not holding the kvm->slots_lock as required when calling
    kvm_io_bus_unregister_dev().

    This only affects the error path, but still, let's do our due
    diligence.

    Reported-by: Eric Auger
    Signed-off-by: Christoffer Dall
    Reviewed-by: Eric Auger

    Christoffer Dall
     
  • If userspace creates the VCPUs after initializing the VGIC, then we end
    up in a situation where we trigger a bug in kvm_vcpu_get_idx(), because
    it is called prior to adding the VCPU into the vcpus array on the VM.

    There is no tight coupling between the VCPU index and the area of the
    redistributor region used for the VCPU, so we can simply ensure that all
    creations of redistributors are serialized per VM, and increment an
    offset when we successfully add a redistributor.

    The vgic_register_redist_iodev() function can be called from two paths.
    The first is vgic_register_all_redist_iodev(), which is called via the
    kvm_vgic_addr() device attribute handler. This path already holds the
    kvm->lock mutex.

    The other path is via kvm_vgic_vcpu_init, which is called through a
    longer chain from kvm_vm_ioctl_create_vcpu(), which releases the
    kvm->lock mutex just before calling kvm_arch_vcpu_create(), so we can
    simply take this mutex again later for our purposes.

    Fixes: ab6f468c10 ("KVM: arm/arm64: Register iodevs when setting redist base and creating VCPUs")
    Signed-off-by: Christoffer Dall
    Tested-by: Jean-Philippe Brucker
    Reviewed-by: Eric Auger

    Christoffer Dall
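
    The serialization described above can be pictured with a small userspace
    sketch. Everything here is hypothetical: the struct, the function names
    and the 128 KiB stride are illustrative assumptions, and a pthread mutex
    stands in for kvm->lock. The point is only that the offset is claimed and
    advanced under one per-VM lock, and only on success.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stride: two 64 KiB frames per redistributor. */
#define REDIST_STRIDE 0x20000u

/* Hypothetical per-VM state; names are illustrative, not the kernel's. */
struct vm_redist {
    pthread_mutex_t lock;   /* stands in for kvm->lock */
    uint64_t base;          /* redistributor region base set by userspace */
    uint64_t next_offset;   /* offset of the next free redistributor slot */
};

void vm_redist_init(struct vm_redist *vm, uint64_t base)
{
    pthread_mutex_init(&vm->lock, NULL);
    vm->base = base;
    vm->next_offset = 0;
}

/*
 * Claim the next slot for a newly created VCPU. register_ok models whether
 * registering the iodev succeeded; the offset only advances on success.
 * Returns 0 and stores the slot address in *out_addr, or -1 on failure.
 */
int vm_redist_register(struct vm_redist *vm, int register_ok, uint64_t *out_addr)
{
    int ret = -1;

    assert(out_addr != NULL);
    pthread_mutex_lock(&vm->lock);
    if (register_ok) {
        *out_addr = vm->base + vm->next_offset;
        vm->next_offset += REDIST_STRIDE;
        ret = 0;
    }
    pthread_mutex_unlock(&vm->lock);
    return ret;
}
```

    Because the slot is derived from the running offset rather than from the
    VCPU index, the order in which VCPUs and the VGIC are created no longer
    matters in this model.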
     

16 May, 2017

2 commits

  • We yield the kvm->mmu_lock occasionally while performing an operation
    (e.g., unmap or permission changes) on a large area of stage2 mappings.
    However, this could allow another thread to clear and free the
    stage2 page tables while we are waiting to regain the lock, so
    the original thread could end up accessing memory that was
    freed. This patch fixes the problem by making sure that the stage2
    pagetable is still valid after we regain the lock. The fact that
    mmu_notifier->release() could be called twice (via __mmu_notifier_release
    and mmu_notifier_unregister) increases the likelihood of hitting
    this race, where two threads try to unmap the entire guest
    shadow pages.

    While at it, clean up the redundant checks around cond_resched_lock in
    stage2_wp_range(), as cond_resched_lock already does the same checks.

    Cc: Mark Rutland
    Cc: Radim Krčmář
    Cc: andreyknvl@google.com
    Cc: Paolo Bonzini
    Cc: stable@vger.kernel.org
    Acked-by: Marc Zyngier
    Signed-off-by: Suzuki K Poulose
    Reviewed-by: Christoffer Dall
    Signed-off-by: Christoffer Dall

    Suzuki K Poulose
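
    The "re-check after regaining the lock" idea can be sketched in plain C.
    This is a userspace model under stated assumptions: struct s2_mmu, the
    yield callback and the chunked walk are hypothetical stand-ins, not the
    kernel code. The callback marks the point where the lock would be dropped
    and retaken.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the VM's stage2 state. */
struct s2_mmu {
    void *pgd;   /* NULL once the stage2 page tables have been freed */
};

/* Called where the real code would yield the lock; may free the tables. */
typedef void (*yield_fn)(struct s2_mmu *mmu);

/*
 * Process [start, end) in fixed-size chunks, yielding between chunks.
 * The loop condition re-reads mmu->pgd after every yield, so the walk
 * stops as soon as the tables have gone away.
 * Returns the number of chunks processed.
 */
size_t s2_walk_range(struct s2_mmu *mmu, size_t start, size_t end,
                     size_t chunk, yield_fn yield)
{
    size_t done = 0;

    assert(chunk > 0);
    for (size_t addr = start; addr < end && mmu->pgd != NULL; addr += chunk) {
        /* ... operate on one chunk of mappings here ... */
        done++;
        if (yield)
            yield(mmu);   /* lock dropped and retaken; pgd may now be NULL */
    }
    return done;
}

/* Example yield hook that models another thread tearing the tables down. */
void free_tables_on_yield(struct s2_mmu *mmu)
{
    mmu->pgd = NULL;
}
```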
     
  • Make sure we don't use a cached value of the KVM stage2 PGD while
    resetting the PGD.

    Cc: Marc Zyngier
    Cc: stable@vger.kernel.org
    Signed-off-by: Suzuki K Poulose
    Reviewed-by: Christoffer Dall
    Signed-off-by: Christoffer Dall

    Suzuki K Poulose
     

15 May, 2017

4 commits

  • In kvm_free_stage2_pgd() we check the stage2 PGD before taking
    the lock and proceed to take the lock if it is valid. We then unmap
    the page tables and release the lock. We reset the PGD
    only after dropping this lock, which could cause a race condition
    where another thread, waiting on or even holding the lock, could
    see that the PGD is still valid, proceed to perform
    a stage2 operation, and later encounter a NULL PGD.

    [223090.242280] Unable to handle kernel NULL pointer dereference at virtual address 00000040
    [223090.262330] PC is at unmap_stage2_range+0x8c/0x428
    [223090.262332] LR is at kvm_unmap_hva_handler+0x2c/0x3c
    [223090.262531] Call trace:
    [223090.262533] [] unmap_stage2_range+0x8c/0x428
    [223090.262535] [] kvm_unmap_hva_handler+0x2c/0x3c
    [223090.262537] [] handle_hva_to_gpa+0xb0/0x104
    [223090.262539] [] kvm_unmap_hva+0x5c/0xbc
    [223090.262543] [] kvm_mmu_notifier_invalidate_page+0x50/0x8c
    [223090.262547] [] __mmu_notifier_invalidate_page+0x5c/0x84
    [223090.262551] [] try_to_unmap_one+0x1d0/0x4a0
    [223090.262553] [] rmap_walk+0x1cc/0x2e0
    [223090.262555] [] try_to_unmap+0x74/0xa4
    [223090.262557] [] migrate_pages+0x31c/0x5ac
    [223090.262561] [] compact_zone+0x3fc/0x7ac
    [223090.262563] [] compact_zone_order+0x94/0xb0
    [223090.262564] [] try_to_compact_pages+0x108/0x290
    [223090.262569] [] __alloc_pages_direct_compact+0x70/0x1ac
    [223090.262571] [] __alloc_pages_nodemask+0x434/0x9f4
    [223090.262572] [] alloc_pages_vma+0x230/0x254
    [223090.262574] [] do_huge_pmd_anonymous_page+0x114/0x538
    [223090.262576] [] handle_mm_fault+0xd40/0x17a4
    [223090.262577] [] __get_user_pages+0x12c/0x36c
    [223090.262578] [] get_user_pages_unlocked+0xa4/0x1b8
    [223090.262579] [] __gfn_to_pfn_memslot+0x280/0x31c
    [223090.262580] [] gfn_to_pfn_prot+0x4c/0x5c
    [223090.262582] [] kvm_handle_guest_abort+0x240/0x774
    [223090.262584] [] handle_exit+0x11c/0x1ac
    [223090.262586] [] kvm_arch_vcpu_ioctl_run+0x31c/0x648
    [223090.262587] [] kvm_vcpu_ioctl+0x378/0x768
    [223090.262590] [] do_vfs_ioctl+0x324/0x5a4
    [223090.262591] [] SyS_ioctl+0x90/0xa4
    [223090.262595] [] el0_svc_naked+0x38/0x3c

    This patch moves the stage2 PGD manipulation under the lock.

    Reported-by: Alexander Graf
    Cc: Mark Rutland
    Cc: Marc Zyngier
    Cc: Paolo Bonzini
    Cc: Radim Krčmář
    Reviewed-by: Christoffer Dall
    Reviewed-by: Marc Zyngier
    Signed-off-by: Suzuki K Poulose
    Signed-off-by: Christoffer Dall

    Suzuki K Poulose
     
  • The GICv3 documentation is extremely confusing, as it talks about
    the number of priorities represented by the ICH_APxRn_EL2 registers,
    while it should really talk about the number of preemption levels.

    This leads to a bug where we may access undefined ICH_APxRn_EL2
    registers, since PREbits is allowed to be smaller than PRIbits.
    Thankfully, nobody seems to have taken this path so far...

    The fix is to use ICH_VTR_EL2.PREbits instead.

    Signed-off-by: Marc Zyngier
    Reviewed-by: Christoffer Dall
    Signed-off-by: Christoffer Dall

    Marc Zyngier
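
    A minimal sketch of the distinction, assuming the ICH_VTR_EL2 layout in
    which PRIbits sits in bits [31:29] and PREbits in bits [28:26], both
    encoded as "number of bits minus one". The helper names are made up for
    this example. The number of active-priority registers follows the
    preemption bits, not the priority bits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed ICH_VTR_EL2 fields, both stored as "number of bits minus one". */
static inline unsigned vtr_pri_bits(uint32_t vtr) { return ((vtr >> 29) & 7) + 1; }
static inline unsigned vtr_pre_bits(uint32_t vtr) { return ((vtr >> 26) & 7) + 1; }

/*
 * Number of ICH_AP0Rn/ICH_AP1Rn registers to access, derived from the
 * preemption bits: 5 bits -> 1 register, 6 -> 2, 7 -> 4.
 */
unsigned vtr_nr_apr_regs(uint32_t vtr)
{
    unsigned pre = vtr_pre_bits(vtr);

    assert(pre >= 5 && pre <= 7);
    return 1u << (pre - 5);
}
```

    With 7 priority bits but only 5 preemption bits, sizing the loop from
    PRIbits would touch four registers where only one is defined.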
     
  • When an interrupt is injected with the HW bit set (indicating that
    deactivation should be propagated to the physical distributor),
    special care must be taken so that we never mark the corresponding
    LR with the Active+Pending state (as the pending state is kept in
    the physical distributor).

    Cc: stable@vger.kernel.org
    Fixes: 59529f69f504 ("KVM: arm/arm64: vgic-new: Add GICv3 world switch backend")
    Signed-off-by: Marc Zyngier
    Reviewed-by: Christoffer Dall
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     
  • When an interrupt is injected with the HW bit set (indicating that
    deactivation should be propagated to the physical distributor),
    special care must be taken so that we never mark the corresponding
    LR with the Active+Pending state (as the pending state is kept in
    the physical distributor).

    Cc: stable@vger.kernel.org
    Fixes: 140b086dd197 ("KVM: arm/arm64: vgic-new: Add GICv2 world switch backend")
    Signed-off-by: Marc Zyngier
    Reviewed-by: Christoffer Dall
    Signed-off-by: Christoffer Dall

    Marc Zyngier
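
    The rule shared by the GICv3 and GICv2 fixes above can be sketched as
    follows. The bit positions are purely illustrative (real list register
    layouts differ between GICv2 and GICv3), and lr_state_bits() is a
    hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only; not the architectural LR layout. */
#define LR_PENDING (1u << 0)
#define LR_ACTIVE  (1u << 1)
#define LR_HW      (1u << 2)

/* Build the state bits of a list register for one interrupt. */
uint32_t lr_state_bits(bool hw, bool active, bool pending)
{
    uint32_t val = 0;

    if (pending)
        val |= LR_PENDING;
    if (active)
        val |= LR_ACTIVE;
    if (hw) {
        val |= LR_HW;
        /* The pending state lives in the physical distributor, so a
         * HW interrupt must never be written as Active+Pending. */
        if (active && pending)
            val &= ~LR_PENDING;
    }
    /* Postcondition: HW never coincides with Active+Pending. */
    assert(!((val & LR_HW) && (val & LR_ACTIVE) && (val & LR_PENDING)));
    return val;
}
```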
     

09 May, 2017

15 commits

  • …l/git/kvmarm/kvmarm into HEAD

    Second round of KVM/ARM Changes for v4.12.

    Changes include:
    - A fix related to the 32-bit idmap stub
    - A fix to the bitmask used to decode the operands of an AArch32 CP
    instruction
    - We have moved the files shared between arch/arm/kvm and
    arch/arm64/kvm to virt/kvm/arm
    - We add support for saving/restoring the virtual ITS state to
    userspace

    Paolo Bonzini
     
  • When failing to restore the ITT for a DTE, we should remove the failed
    device entry from the list and free the object.

    We slightly refactor vgic_its_destroy to be able to reuse the now
    separate vgic_its_free_dte() function.

    Signed-off-by: Christoffer Dall
    Reviewed-by: Eric Auger

    Christoffer Dall
     
  • The only reason we called kvm_vgic_map_resources() when restoring the
    ITS tables was because we wanted to have the KVM iodevs registered in
    the KVM IO bus framework at the time when the ITS was restored such that
    a restored and active device can inject MSIs prior to otherwise calling
    kvm_vgic_map_resources() from the first run of a VCPU.

    Since we now register the KVM iodevs for the redistributors and ITS as
    soon as possible (when setting the base addresses), we no longer need
    this call and kvm_vgic_map_resources() is again called only when first
    running a VCPU.

    Signed-off-by: Christoffer Dall
    Reviewed-by: Eric Auger

    Christoffer Dall
     
  • We have to register the ITS iodevice before running the VM, because in
    migration scenarios, we may be restoring a live device that wishes to
    inject MSIs before the VCPUs have started.

    All we need to register the ITS io device is the base address of the
    ITS, so we can simply register that when the base address of the ITS is
    set.

    [ Code to fix concurrency issues when setting the ITS base address and
    to fix the undef base address check written by Marc Zyngier ]

    Signed-off-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Reviewed-by: Eric Auger

    Christoffer Dall
     
  • The its->initialized doesn't bring much to the table, and creates
    unnecessary ordering between setting the address and initializing it
    (which amounts to exactly nothing).

    Let's kill it altogether, making KVM_DEV_ARM_VGIC_CTRL_INIT the no-op
    it deserves to be.

    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall
    Reviewed-by: Eric Auger

    Marc Zyngier
     
  • Instead of waiting with registering KVM iodevs until the first VCPU is
    run, we can actually create the iodevs when the redist base address is
    set. The only downside is that we must now also check if we need to do
    this for VCPUs which are created after creating the VGIC, because there
    is no enforced ordering between creating the VGIC (and setting its base
    addresses) and creating the VCPUs.

    Signed-off-by: Christoffer Dall
    Reviewed-by: Eric Auger

    Christoffer Dall
     
  • As we are about to handle setting the address for the redistributor base
    region separately from some of the other base addresses, let's rework
    this function to leave a little more room for being flexible in what
    each type of base address does.

    Signed-off-by: Christoffer Dall
    Reviewed-by: Eric Auger

    Christoffer Dall
     
  • As we are about to fiddle with the IO device registration mechanism,
    let's be a little more careful when setting base addresses as early as
    possible. When setting a base address, we can check that there is
    enough address space for its scope, and when the last of the two
    base addresses (dist and redist) gets set, we can also check whether the
    regions overlap at that time.

    This allows us to provide error messages to the user at the time of trying
    to set the base address, as opposed to later when trying to run the VM.

    To do this, we make vgic_v3_check_base available in the core vgic-v3
    code as well as in the other parts of the GICv3 code, namely the MMIO
    config code.

    We also return true for undefined base addresses so that the function
    can be used before all base addresses are set; all callers already check
    for uninitialized addresses before calling this function.

    Signed-off-by: Christoffer Dall
    Reviewed-by: Eric Auger

    Christoffer Dall
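
    A rough model of the check described above, written as ordinary C. The
    struct, the ADDR_UNDEF marker and the function names are assumptions made
    for this sketch. It returns true while either base is still undefined, and
    only compares the two regions once both are set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ADDR_UNDEF UINT64_MAX   /* assumed marker for "base not set yet" */

struct region {
    uint64_t base;
    uint64_t size;
};

/* True if [base, base + size) does not wrap around the address space. */
bool region_fits(struct region r)
{
    return r.base + r.size >= r.base;
}

/* True if the two half-open ranges share at least one address. */
bool regions_overlap(struct region a, struct region b)
{
    return a.base < b.base + b.size && b.base < a.base + a.size;
}

/* Returns true if the currently known bases are acceptable. */
bool check_base(struct region dist, struct region redist)
{
    if (dist.base != ADDR_UNDEF && !region_fits(dist))
        return false;
    if (redist.base != ADDR_UNDEF && !region_fits(redist))
        return false;
    /* An undefined base cannot conflict with anything yet. */
    if (dist.base == ADDR_UNDEF || redist.base == ADDR_UNDEF)
        return true;
    return !regions_overlap(dist, redist);
}
```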
     
  • Split out the function to register all the redistributor iodevs into a
    function that handles a single redistributor at a time in preparation
    for being able to call this per VCPU as these get created.

    Signed-off-by: Christoffer Dall
    Reviewed-by: Eric Auger

    Christoffer Dall
     
  • …lus/powerpc into HEAD

    The main thing here is a new implementation of the in-kernel
    XICS interrupt controller emulation for POWER9 machines, from Ben
    Herrenschmidt.

    POWER9 has a new interrupt controller called XIVE (eXternal Interrupt
    Virtualization Engine) which is able to deliver interrupts directly
    to guest virtual CPUs in hardware without hypervisor intervention.
    With this new code, the guest still sees the old XICS interface but
    performance is better because the XICS emulation in the host uses the
    XIVE directly rather than going through a XICS emulation in firmware.

    Conflicts:
    arch/powerpc/kernel/cpu_setup_power.S [cherry-picked fix]
    arch/powerpc/kvm/book3s_xive.c [include asm/debugfs.h]

    Paolo Bonzini
     
  • In vm_stat_get_per_vm_fops and vcpu_stat_get_per_vm_fops, since we
    use nonseekable_open() to open, we should use no_llseek() to seek,
    not generic_file_llseek().

    Signed-off-by: Geliang Tang
    Signed-off-by: Paolo Bonzini

    Geliang Tang
     
  • This function really doesn't init anything, it enables the CPU
    interface, so name it as such, which gives us the name to use for actual
    init work later on.

    Signed-off-by: Christoffer Dall
    Reviewed-by: Eric Auger

    Christoffer Dall
     
  • Merge more updates from Andrew Morton:

    - the rest of MM

    - various misc things

    - procfs updates

    - lib/ updates

    - checkpatch updates

    - kdump/kexec updates

    - add kvmalloc helpers, use them

    - time helper updates for Y2038 issues. We're almost ready to remove
    current_fs_time() but that awaits a btrfs merge.

    - add tracepoints to DAX

    * emailed patches from Andrew Morton: (114 commits)
    drivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4
    selftests/vm: add a test for virtual address range mapping
    dax: add tracepoint to dax_insert_mapping()
    dax: add tracepoint to dax_writeback_one()
    dax: add tracepoints to dax_writeback_mapping_range()
    dax: add tracepoints to dax_load_hole()
    dax: add tracepoints to dax_pfn_mkwrite()
    dax: add tracepoints to dax_iomap_pte_fault()
    mtd: nand: nandsim: convert to memalloc_noreclaim_*()
    treewide: convert PF_MEMALLOC manipulations to new helpers
    mm: introduce memalloc_noreclaim_{save,restore}
    mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC
    mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required
    mm/huge_memory.c: use zap_deposited_table() more
    time: delete CURRENT_TIME_SEC and CURRENT_TIME
    gfs2: replace CURRENT_TIME with current_time
    apparmorfs: replace CURRENT_TIME with current_time()
    lustre: replace CURRENT_TIME macro
    fs: ubifs: replace CURRENT_TIME_SEC with current_time
    fs: ufs: use ktime_get_real_ts64() for birthtime
    ...

    Linus Torvalds
     
  • Patch series "kvmalloc", v5.

    There are many open coded kmalloc with vmalloc fallback instances in the
    tree. Most of them are not careful enough or simply do not care about
    the underlying semantic of the kmalloc/page allocator which means that
    a) some vmalloc fallbacks are basically unreachable because the kmalloc
    part will keep retrying until it succeeds b) the page allocator can
    invoke really disruptive steps like the OOM killer to move forward
    which doesn't sound appropriate when we consider that the vmalloc
    fallback is available.

    As can be seen, implementing kvmalloc requires quite an intimate
    knowledge of the page allocator and the memory reclaim internals which
    strongly suggests that a helper should be implemented in the memory
    subsystem proper.

    Most callers I could find have been converted to use the helper
    instead. This is patch 6. There are some more relying on __GFP_REPEAT
    in the networking stack which I have converted as well, and Eric Dumazet
    was not opposed [2] to converting them.

    [1] http://lkml.kernel.org/r/20170130094940.13546-1-mhocko@kernel.org
    [2] http://lkml.kernel.org/r/1485273626.16328.301.camel@edumazet-glaptop3.roam.corp.google.com

    This patch (of 9):

    Using kmalloc with the vmalloc fallback for larger allocations is a
    common pattern in the kernel code. Yet we do not have any common helper
    for that and so users have invented their own helpers. Some of them are
    really creative when doing so. Let's just add kv[mz]alloc and make sure
    it is implemented properly. This implementation makes sure not to create
    large memory pressure for > PAGE_SIZE requests (__GFP_NORETRY) and also
    not to warn about allocation failures. This also rules out the OOM
    killer as the vmalloc is a more appropriate fallback than a disruptive
    user visible action.

    This patch also changes some existing users and removes helpers which
    are specific for them. In some cases this is not possible (e.g.
    ext4_kvmalloc, libcfs_kvzalloc) because those seem to be broken and
    require GFP_NO{FS,IO} context which is not vmalloc compatible in general
    (note that the page table allocation is GFP_KERNEL). Those need to be
    fixed separately.

    While we are at it, document the gfp masks that __vmalloc{_node} does
    not support, because there seems to be a lot of confusion out there.
    kvmalloc_node will warn about GFP_KERNEL incompatible (which are not
    superset) flags to catch new abusers. Existing ones would have to die
    slowly.

    [sfr@canb.auug.org.au: f2fs fixup]
    Link: http://lkml.kernel.org/r/20170320163735.332e64b7@canb.auug.org.au
    Link: http://lkml.kernel.org/r/20170306103032.2540-2-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Signed-off-by: Stephen Rothwell
    Reviewed-by: Andreas Dilger [ext4 part]
    Acked-by: Vlastimil Babka
    Cc: John Hubbard
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Pull KVM updates from Paolo Bonzini:
    "ARM:
    - HYP mode stub supports kexec/kdump on 32-bit
    - improved PMU support
    - virtual interrupt controller performance improvements
    - support for userspace virtual interrupt controller (slower, but
    necessary for KVM on the weird Broadcom SoCs used by the Raspberry
    Pi 3)

    MIPS:
    - basic support for hardware virtualization (ImgTec P5600/P6600/I6400
    and Cavium Octeon III)

    PPC:
    - in-kernel acceleration for VFIO

    s390:
    - support for guests without storage keys
    - adapter interruption suppression

    x86:
    - usual range of nVMX improvements, notably nested EPT support for
    accessed and dirty bits
    - emulation of CPL3 CPUID faulting

    generic:
    - first part of VCPU thread request API
    - kvm_stat improvements"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits)
    kvm: nVMX: Don't validate disabled secondary controls
    KVM: put back #ifndef CONFIG_S390 around kvm_vcpu_kick
    Revert "KVM: Support vCPU-based gfn->hva cache"
    tools/kvm: fix top level makefile
    KVM: x86: don't hold kvm->lock in KVM_SET_GSI_ROUTING
    KVM: Documentation: remove VM mmap documentation
    kvm: nVMX: Remove superfluous VMX instruction fault checks
    KVM: x86: fix emulation of RSM and IRET instructions
    KVM: mark requests that need synchronization
    KVM: return if kvm_vcpu_wake_up() did wake up the VCPU
    KVM: add explicit barrier to kvm_vcpu_kick
    KVM: perform a wake_up in kvm_make_all_cpus_request
    KVM: mark requests that do not need a wakeup
    KVM: remove #ifndef CONFIG_S390 around kvm_vcpu_wake_up
    KVM: x86: always use kvm_make_request instead of set_bit
    KVM: add kvm_{test,clear}_request to replace {test,clear}_bit
    s390: kvm: Cpu model support for msa6, msa7 and msa8
    KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK
    kvm: better MWAIT emulation for guests
    KVM: x86: virtualize cpuid faulting
    ...

    Linus Torvalds
     

08 May, 2017

17 commits

  • This patch adds a new attribute to GICV3 KVM device
    KVM_DEV_ARM_VGIC_GRP_CTRL group. This allows userspace to
    flush all GICR pending tables into guest RAM.

    Signed-off-by: Eric Auger
    Reviewed-by: Christoffer Dall
    Acked-by: Marc Zyngier

    Eric Auger
     
  • In its_sync_lpi_pending_table() we currently ignore the
    target_vcpu of the LPIs. We sync the pending bit found in
    the vcpu pending table even if the LPI is not targeting it.

    Also in vgic_its_cmd_handle_invall() we are supposed to
    read the config table data for the LPIs associated with the
    collection ID. At the moment we refresh all LPI config
    information.

    This patch passes a vcpu to vgic_copy_lpi_list() so that
    the latter returns a snapshot of the LPIs targeting this
    CPU and only those.

    Signed-off-by: Eric Auger
    Reviewed-by: Christoffer Dall
    Acked-by: Marc Zyngier

    Eric Auger
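
    The filtering described above amounts to a snapshot restricted to one
    target. A minimal sketch follows, with a hypothetical struct lpi and a
    made-up copy_lpi_list() signature that does not match the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of an LPI: its INTID and the VCPU it targets. */
struct lpi {
    uint32_t intid;
    int target_vcpu;
};

/*
 * Copy the INTIDs of the LPIs that target `vcpu` into `out`.
 * Returns the number of INTIDs copied, never more than out_len.
 */
size_t copy_lpi_list(const struct lpi *lpis, size_t n, int vcpu,
                     uint32_t *out, size_t out_len)
{
    size_t copied = 0;

    assert(out != NULL || out_len == 0);
    for (size_t i = 0; i < n && copied < out_len; i++) {
        if (lpis[i].target_vcpu != vcpu)
            continue;   /* LPI belongs to another VCPU: skip it */
        out[copied++] = lpis[i].intid;
    }
    return copied;
}
```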
     
  • Implement routines to save and restore device ITT and their
    interrupt table entries (ITE).

    Signed-off-by: Eric Auger
    Reviewed-by: Christoffer Dall

    Eric Auger
     
  • This patch saves the device table entries into guest RAM.
    Both flat table and 2 stage tables are supported. DeviceId
    indexing is used.

    For each device listed in the device table, we also save
    the translation table using the vgic_its_save/restore_itt
    routines. Those functions will be implemented in a subsequent
    patch.

    On restore, devices are re-allocated and their ITTs are
    rebuilt.

    Signed-off-by: Eric Auger
    Reviewed-by: Christoffer Dall

    Eric Auger
     
  • As vgic_its_check_id() computes the device/collection entry's
    GPA, let's return it so that new callers can retrieve it easily.

    Signed-off-by: Eric Auger
    Acked-by: Christoffer Dall
    Acked-by: Marc Zyngier

    Eric Auger
     
  • The save path copies the collection entries into guest RAM
    at the GPA specified in the BASER register. This obviously
    requires the BASER to be set. The last written element is a
    dummy collection table entry.

    We do not index by collection ID as the collection entry
    can fit into 8 bytes while containing the collection ID.

    On the restore path we re-allocate the collection objects.

    Signed-off-by: Eric Auger
    Reviewed-by: Christoffer Dall
    Reviewed-by: Marc Zyngier

    Eric Auger
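
    To illustrate why no indexing by collection ID is needed, here is a
    sketch of an 8-byte entry that carries the ID itself. The layout (valid
    bit 63, target in bits [51:16], collection ID in bits [15:0]) is an
    assumption for this example, and all names are hypothetical. The table is
    terminated by an all-zero dummy entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 8-byte entry layout; see the lead-in for the field positions. */
#define CTE_VALID        (1ULL << 63)
#define CTE_TARGET_SHIFT 16
#define CTE_ICID_MASK    0xFFFFULL

struct collection {
    uint16_t id;       /* collection ID */
    uint32_t target;   /* target redistributor / VCPU number */
};

uint64_t cte_encode(const struct collection *c)
{
    return CTE_VALID | ((uint64_t)c->target << CTE_TARGET_SHIFT) | c->id;
}

bool cte_is_valid(uint64_t cte) { return (cte & CTE_VALID) != 0; }
uint16_t cte_icid(uint64_t cte) { return (uint16_t)(cte & CTE_ICID_MASK); }

/*
 * Write one entry per collection, followed by an all-zero dummy entry that
 * marks the end of the list. Returns the number of entries written, or 0
 * if the table cannot hold the entries plus the terminator.
 */
size_t save_collections(const struct collection *colls, size_t n,
                        uint64_t *table, size_t table_len)
{
    assert(table != NULL);
    if (n + 1 > table_len)
        return 0;
    for (size_t i = 0; i < n; i++)
        table[i] = cte_encode(&colls[i]);
    table[n] = 0;   /* dummy entry: valid bit clear */
    return n + 1;
}
```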
     
  • Add a generic scan_its_table() helper whose role consists in
    scanning a contiguous table located in guest RAM and applying
    a callback on each entry. Entries can be handled as linked lists
    since the callback may return an id offset to the next entry and
    also indicate whether the entry is the last one.

    Helper functions also are added to compute the device/event ID
    offset to the next DTE/ITE.

    compute_next_devid_offset, compute_next_eventid_offset and
    scan_table will become static in subsequent patches.

    Signed-off-by: Eric Auger
    Reviewed-by: Christoffer Dall
    Reviewed-by: Marc Zyngier

    Eric Auger
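
    The scanning scheme can be sketched over an in-memory array instead of
    guest RAM. The return-value convention used here (negative for an error,
    0 when the last entry was seen, a positive ID offset to the next entry,
    and 1 from the scanner when it runs off the end) is an assumption of this
    sketch, as are all the names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Per-entry callback. Returns < 0 on error, 0 if this was the last entry,
 * or a positive offset (in entries) to the next entry to visit.
 */
typedef int (*entry_fn)(uint32_t id, const uint64_t *entry, void *opaque);

/*
 * Walk a contiguous table, letting the callback decide how far to jump.
 * Returns < 0 on error, 0 if a last entry was identified, 1 otherwise.
 */
int scan_table(const uint64_t *table, size_t nr_entries, uint32_t start_id,
               entry_fn fn, void *opaque)
{
    size_t idx = 0;
    uint32_t id = start_id;

    assert(fn != NULL);
    while (idx < nr_entries) {
        int next = fn(id, &table[idx], opaque);

        if (next <= 0)
            return next;        /* error, or last entry reached */
        idx += (size_t)next;
        id += (uint32_t)next;
    }
    return 1;                   /* ran off the end without a last entry */
}

/* Example callback: the low byte of an entry is the offset to the next
 * one (0 means last); opaque points to an int counting visited entries. */
int count_entries(uint32_t id, const uint64_t *entry, void *opaque)
{
    (void)id;
    (*(int *)opaque)++;
    return (int)(*entry & 0xff);
}
```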
     
  • Add two new helpers to allocate an ITS ITE and an ITS device.
    This will avoid duplication on the restore path.

    Signed-off-by: Eric Auger
    Reviewed-by: Christoffer Dall
    Reviewed-by: Marc Zyngier

    Eric Auger
     
  • Introduce new attributes in KVM_DEV_ARM_VGIC_GRP_CTRL group:
    - KVM_DEV_ARM_ITS_SAVE_TABLES: saves the ITS tables into guest RAM
    - KVM_DEV_ARM_ITS_RESTORE_TABLES: restores them into VGIC internal
    structures.

    We hold the vcpus lock during the save and restore to make
    sure no vcpu is running.

    At this stage the functionality is not yet implemented. Only
    the skeleton is put in place.

    Signed-off-by: Eric Auger
    [Given we will move the iodev register until setting the base addr]
    Reviewed-by: Christoffer Dall

    Eric Auger
     
  • When creating the LPI we now ask the redistributor for the state
    of the LPI (priority, enabled, pending).

    Signed-off-by: Eric Auger
    Reviewed-by: Marc Zyngier
    Reviewed-by: Christoffer Dall

    Eric Auger
     
  • This new helper synchronizes the irq pending_latch
    with the LPI pending bit status found in the rdist pending table.
    As the status is consumed, we reset the bit in the pending table.

    As we need the PENDBASER_ADDRESS() in vgic-v3, let's move its
    definition to the irqchip header. We restore the full length
    of the field, i.e. [51:16]. Same for PROPBASER_ADDRESS with full
    field length of [51:12].

    Signed-off-by: Eric Auger
    Reviewed-by: Marc Zyngier
    Reviewed-by: Christoffer Dall

    Eric Auger
     
  • On MAPD we currently check the device id can be stored in the device table.
    Let's first check it can be encoded within the range defined by TYPER
    DEVBITS.

    Also check that the collection ID belongs to the 16 bit range, as the
    GITS_TYPER CIL field equals 0.

    Signed-off-by: Eric Auger
    Reviewed-by: Christoffer Dall
    Reviewed-by: Marc Zyngier

    Eric Auger
     
  • Up to now the MAPD ITT_addr had been ignored. We will need it
    for save/restore. Let's record it in the its_device struct.

    Signed-off-by: Eric Auger
    Reviewed-by: Christoffer Dall
    Reviewed-by: Marc Zyngier

    Eric Auger
     
  • Up to now the MAPD's ITT size field has been ignored. It encodes
    the number of eventid bits minus 1. It should be used to check
    the eventid when a MAPTI command is issued on a device. Let's
    store the number of eventid bits in the its_device and do the
    check on MAPTI. Also make sure the ITT size field does
    not exceed the GITS_TYPER IDBITS field.

    Signed-off-by: Eric Auger
    Reviewed-by: Christoffer Dall
    Reviewed-by: Marc Zyngier

    Eric Auger
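
    As a worked example of the "minus 1" encoding: a size field of 4 means
    5 eventid bits, so event IDs 0 to 31 are valid for that device. The
    helpers below are hypothetical and only model the two checks named in
    the entry above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The MAPD size field encodes the number of eventid bits minus one. */
unsigned itt_size_to_eventid_bits(unsigned size_field)
{
    return size_field + 1;
}

/* MAPTI-time check: does event_id fit in the device's eventid bits? */
bool eventid_in_range(unsigned num_eventid_bits, uint32_t event_id)
{
    assert(num_eventid_bits >= 1 && num_eventid_bits <= 32);
    return (uint64_t)event_id < (1ULL << num_eventid_bits);
}

/* MAPD-time check: the size field must not exceed the IDBITS field. */
bool mapd_size_ok(unsigned size_field, unsigned typer_idbits_field)
{
    return size_field <= typer_idbits_field;
}
```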
     
  • The GITS_IIDR revision field is used to encode the migration ABI
    revision. So we need to restore it to check the table layout is
    readable by the destination.

    By writing the IIDR, userspace thus forces the ABI revision to be
    used and this must be less than or equal to the max revision KVM
    supports.

    Signed-off-by: Eric Auger
    Reviewed-by: Christoffer Dall

    Eric Auger
     
  • We plan to support different migration ABIs, i.e. characterizing
    the ITS table layout format in guest RAM. For example, a new ABI
    will be needed if vLPIs get supported for the nested use case.

    So let's introduce an array of supported ABIs (at the moment a single
    ABI is supported though). The following characteristics are foreseen
    to vary with the ABI: size of table entries, save/restore operation,
    the way abi settings are applied.

    By default the MAX_ABI_REV is applied on its creation. In subsequent
    patches we will introduce a way for the userspace to change the ABI
    in use.

    The entry sizes now are set according to the ABI version and not
    hardcoded anymore.

    Signed-off-by: Eric Auger
    Reviewed-by: Christoffer Dall

    Eric Auger
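
    One way to picture the ABI table is a small array indexed by revision,
    with the newest revision selected by default and a setter that rejects
    anything beyond what is supported. This is only a sketch: the struct
    names, the helpers and the 8-byte entry sizes are assumptions made for
    illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Per-ABI characteristics; only entry sizes are modelled here. */
struct its_abi {
    unsigned cte_esz;   /* collection table entry size, in bytes */
    unsigned dte_esz;   /* device table entry size, in bytes */
    unsigned ite_esz;   /* interrupt translation entry size, in bytes */
};

/* A single supported ABI for now; sizes are assumed for illustration. */
static const struct its_abi its_abis[] = {
    { 8, 8, 8 },
};

#define NR_ITS_ABIS (sizeof(its_abis) / sizeof(its_abis[0]))

struct its {
    unsigned abi_rev;
};

/* On creation, default to the highest supported revision. */
void its_init(struct its *its)
{
    its->abi_rev = (unsigned)(NR_ITS_ABIS - 1);
}

/* Select a revision; fails if it is newer than anything supported. */
int its_set_abi(struct its *its, unsigned rev)
{
    if (rev >= NR_ITS_ABIS)
        return -1;
    its->abi_rev = rev;
    return 0;
}

const struct its_abi *its_get_abi(const struct its *its)
{
    assert(its->abi_rev < NR_ITS_ABIS);
    return &its_abis[its->abi_rev];
}
```

    Code that needs an entry size would then ask its_get_abi() rather than
    using a hardcoded constant.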
     
  • GITS_CREADR needs to be restored so let's implement the associated
    uaccess_write_its callback. The write is only allowed if the ITS
    is disabled.

    Signed-off-by: Eric Auger
    Acked-by: Marc Zyngier
    Reviewed-by: Christoffer Dall

    Eric Auger