13 Aug, 2020

1 commit

  • Rationale:
    Reduces attack surface on kernel devs opening the links for MITM
    as HTTPS traffic is much harder to manipulate.

    Signed-off-by: Alexander A. Klimov
    Signed-off-by: Andrew Morton
    Reviewed-by: Kees Cook
    Link: http://lkml.kernel.org/r/20200726110117.16346-1-grandmaster@al2klimov.de
    Signed-off-by: Linus Torvalds

    Alexander A. Klimov
     

27 Jul, 2020

2 commits

  • Drop the repeated word "the" in a comment.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Marc Zyngier
    Cc: Thomas Gleixner
    Cc: Jason Cooper
    Cc: Marc Zyngier
    Link: https://lore.kernel.org/r/20200719002853.20419-1-rdunlap@infradead.org

    Randy Dunlap
     
  • [maz: The GICv3 spec has evolved quite a bit since the draft the Linux
    driver was written against, and some register definitions are simply gone]

    As per the GICv3 specification, GIC{D,R}_SEIR are not assigned and the
    locations (0x0068) are actually Reserved. GICR_MOV{LPI,ALL}R are two IMP
    DEF registers and might be defined by some specific micro-architecture.

    As they're not used anywhere in the kernel, just drop all of them.

    Signed-off-by: Zenghui Yu
    [maz: added context explanation]
    Signed-off-by: Marc Zyngier
    Link: https://lore.kernel.org/r/20200630134126.880-1-yuzenghui@huawei.com

    Zenghui Yu
     

27 Jun, 2020

2 commits


16 Apr, 2020

1 commit

  • When a vPE is made resident, the GIC starts parsing the virtual pending
    table to deliver pending interrupts. This takes place asynchronously,
    and can at times take a long while. Long enough that the vcpu enters
    the guest and hits WFI before any interrupt has been signaled yet.
    The vcpu then exits, blocks, and now gets a doorbell. Rinse, repeat.

    In order to avoid the above, an (optional on GICv4, mandatory on v4.1)
    feature allows the GIC to feed back to the hypervisor whether it is
    done parsing the VPT by clearing the GICR_VPENDBASER.Dirty bit.
    The hypervisor can then wait until the GIC is ready before actually
    running the vPE.

    Plug in the detection code as well as polling on vPE schedule. While
    at it, tidy up the kernel message that displays the GICv4 optional
    features.

    Reviewed-by: Zenghui Yu
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     

24 Mar, 2020

8 commits

  • Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • Add the SGI configuration entry point for KVM to use.

    Signed-off-by: Marc Zyngier
    Reviewed-by: Zenghui Yu
    Link: https://lore.kernel.org/r/20200304203330.4967-16-maz@kernel.org

    Marc Zyngier
     
  • Allocate per-VPE SGIs when initializing the GIC-specific part of the
    VPE data structure.

    Signed-off-by: Marc Zyngier
    Reviewed-by: Zenghui Yu
    Link: https://lore.kernel.org/r/20200304203330.4967-15-maz@kernel.org

    Marc Zyngier
     
  • In order to hide some of the differences between v4.0 and v4.1, move
    the doorbell management out of the KVM code, and into the GICv4-specific
    layer. This allows the calling code to ask for the doorbell when blocking,
    and otherwise to leave the doorbell permanently disabled.

    This matches the v4.1 code perfectly, and only results in a minor
    refactoring of the v4.0 code.

    Signed-off-by: Marc Zyngier
    Reviewed-by: Zenghui Yu
    Link: https://lore.kernel.org/r/20200304203330.4967-14-maz@kernel.org

    Marc Zyngier
     
  • Just like for vLPIs, there is some configuration information that cannot
    be directly communicated through the normal irqchip API, and we have to
    use our good old friend set_vcpu_affinity as a side-band communication
    mechanism.

    This is used to configure group and priority for a given vSGI.

    Signed-off-by: Marc Zyngier
    Reviewed-by: Zenghui Yu
    Reviewed-by: Eric Auger
    Link: https://lore.kernel.org/r/20200304203330.4967-13-maz@kernel.org

    Marc Zyngier
     
  • To implement the get/set_irqchip_state callbacks (limited to the
    PENDING state), we have to use a particular set of hacks:

    - Reading the pending state is done by using a pair of new redistributor
    registers (GICR_VSGIR, GICR_VSGIPENDR), which allow the 16 interrupts
    state to be retrieved.
    - Setting the pending state is done by generating it as we'd otherwise do
    for a guest (writing to GITS_SGIR).
    - Clearing the pending state is done by emitting a VSGI command with the
    "clear" bit set.

    This requires some interesting locking though:
    - When talking to the redistributor, we must make sure that the VPE
    affinity doesn't change, hence taking the VPE lock.
    - At the same time, we must ensure that nobody accesses the same
    redistributor's GICR_VSGIR registers for a different VPE, which
    would corrupt the reading of the pending bits. We thus take the
    per-RD spinlock. Much fun.

    Signed-off-by: Marc Zyngier
    Reviewed-by: Zenghui Yu
    Link: https://lore.kernel.org/r/20200304203330.4967-12-maz@kernel.org

    Marc Zyngier
     
  • The GICv4.1 ITS has yet another new command (VSGI) which allows
    a VPE-targeted SGI to be configured (or have its pending state
    cleared). Add support for this command and plumb it into the
    activate irqdomain callback so that it is ready to be used.

    Signed-off-by: Marc Zyngier
    Reviewed-by: Zenghui Yu
    Link: https://lore.kernel.org/r/20200304203330.4967-10-maz@kernel.org

    Marc Zyngier
     
  • Since GICv4.1 has the capability to inject 16 SGIs into each VPE,
    and since I'm keen not to invent too many specific interfaces to
    manipulate these interrupts, let's pretend that each of these SGIs
    is an actual Linux interrupt.

    For that matter, let's introduce a minimal irqchip and irqdomain
    setup that will get fleshed out in the following patches.

    Signed-off-by: Marc Zyngier
    Reviewed-by: Zenghui Yu
    Reviewed-by: Eric Auger
    Link: https://lore.kernel.org/r/20200304203330.4967-9-maz@kernel.org

    Marc Zyngier
     

21 Mar, 2020

3 commits

  • There is no special reason to set virtual LPI pending table as
    non-shareable. If we choose to hard code the shareability without
    probing, Inner-Shareable is likely to be a better choice, as the
    VPEs can move around and benefit from having the redistributors
    snooping each other's cache, if that's something they can do.

    Furthermore, Hisilicon hip08 ends up with unspecified errors when
    mixing shareability attributes. So let's move to IS attributes for
    the VPT. This has also been tested on D05 and didn't show any
    regression.

    Signed-off-by: Heyi Guo
    [maz: rewrote commit message]
    Signed-off-by: Marc Zyngier
    Tested-by: Marc Zyngier
    Link: https://lore.kernel.org/r/20191130073849.38378-1-guoheyi@huawei.com

    Heyi Guo
     
  • Tell KVM that we support v4.1. Nothing uses this information so far.

    Signed-off-by: Marc Zyngier
    Reviewed-by: Zenghui Yu
    Reviewed-by: Eric Auger
    Link: https://lore.kernel.org/r/20200304203330.4967-7-maz@kernel.org

    Marc Zyngier
     
  • The GICv4.1 spec says that it is CONSTRAINED UNPREDICTABLE to write to
    any of the GICR_INV{LPI,ALL}R registers if GICR_SYNCR.Busy == 1.

    To deal with it, we must ensure that only a single invalidation can
    happen at a time for a given redistributor. Add a per-RD lock to that
    effect and take it around the invalidation/syncr-read to deal with this.

    Signed-off-by: Marc Zyngier
    Reviewed-by: Zenghui Yu
    Reviewed-by: Eric Auger
    Link: https://lore.kernel.org/r/20200304203330.4967-6-maz@kernel.org

    Marc Zyngier
     

19 Mar, 2020

2 commits

  • Before GICv4.1, all operations would be serialized with the affinity
    changes by virtue of using the same ITS command queue. With v4.1, things
    change, as invalidations (and a number of other operations) are issued
    using the redistributor MMIO frame.

    We must thus make sure that these redistributor accesses cannot race
    against the affinity change, or we may end up talking to the
    wrong redistributor.

    To ensure this, we expand the irq_to_cpuid() helper to take a spinlock
    when the LPI is mapped to a vLPI (a new per-VPE lock) on each operation
    that requires mutual exclusion.

    Signed-off-by: Marc Zyngier
    Reviewed-by: Zenghui Yu
    Link: https://lore.kernel.org/r/20200304203330.4967-4-maz@kernel.org

    Marc Zyngier
     
  • To allow the direct injection of SGIs into a guest, the GICv4.1
    architecture has to sacrifice the Active state so that SGIs look
    a lot like LPIs (they are injected by the same mechanism).

    In order not to break existing software, the architecture offers
    guest OSs the choice: SGIs with or without an active state. It is
    the hypervisor's duty to honor the guest's choice.

    For this, the architecture offers a discovery bit indicating whether
    the GIC supports GICv4.1 SGIs (GICD_TYPER2.nASSGIcap), and another
    bit indicating whether the guest wants Active-less SGIs or not
    (controlled by GICD_CTLR.nASSGIreq).

    A hypervisor not supporting GICv4.1 SGIs would leave nASSGIcap
    clear, and a guest not knowing about GICv4.1 SGIs (or definitely
    wanting an Active state) would leave nASSGIreq clear (both being
    thankfully backward compatible with older revisions of the GIC).

    Since Linux is perfectly happy without an active state on SGIs,
    inform the hypervisor that we'll use that if offered.

    Signed-off-by: Marc Zyngier
    Reviewed-by: Zenghui Yu
    Link: https://lore.kernel.org/r/20200304203330.4967-2-maz@kernel.org

    Marc Zyngier
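
    The guest-side handshake described above boils down to a single
    conditional. A minimal sketch, assuming both flags sit at bit 8 of
    their registers (an assumption for this example) and using a
    hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define TYPER2_NASSGICAP (1U << 8) /* GICD_TYPER2.nASSGIcap */
#define CTLR_NASSGIREQ   (1U << 8) /* GICD_CTLR.nASSGIreq */

/*
 * Given the discovered GICD_TYPER2 value and the GICD_CTLR value about
 * to be written, request Active-less SGIs only if they are offered.
 */
static uint32_t request_activeless_sgis(uint32_t typer2, uint32_t ctlr)
{
	if (typer2 & TYPER2_NASSGICAP)
		ctlr |= CTLR_NASSGIREQ;
	return ctlr;
}
```

    When the capability bit is clear the control value is returned
    unchanged, which matches the backward-compatible behaviour the message
    describes.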
     

08 Feb, 2020

1 commit

  • Currently, we will not set vpe_l1_page for the current RD if we can
    inherit the vPE configuration table from another RD (or ITS), which
    results in an inconsistency between RDs within the same CommonLPIAff
    group.

    Let's rename it to vpe_l1_base to indicate the base address of the
    vPE configuration table of this RD, and set it properly for *all*
    v4.1 redistributors.

    Signed-off-by: Zenghui Yu
    Signed-off-by: Marc Zyngier
    Link: https://lore.kernel.org/r/20200206075711.1275-3-yuzenghui@huawei.com

    Zenghui Yu
     

22 Jan, 2020

8 commits

  • Just like for INVALL, GICv4.1 has grown a VPE-aware INVLPI register.
    Let's plumb it in and make use of the DirectLPI code in that case.

    Signed-off-by: Marc Zyngier
    Reviewed-by: Zenghui Yu
    Link: https://lore.kernel.org/r/20191224111055.11836-16-maz@kernel.org

    Marc Zyngier
     
  • GICv4.1 redistributors have a VPE-aware INVALL register. Progress!
    We can now emulate a guest-requested INVALL without emitting a
    VINVALL command.

    Signed-off-by: Marc Zyngier
    Reviewed-by: Zenghui Yu
    Link: https://lore.kernel.org/r/20191224111055.11836-14-maz@kernel.org

    Marc Zyngier
     
  • Making a VPE resident on GICv4.1 is pretty simple, as it is just a
    single write to the local redistributor. We just need extra information
    about which groups to enable, which the KVM code will have to provide.

    Signed-off-by: Marc Zyngier
    Reviewed-by: Zenghui Yu
    Link: https://lore.kernel.org/r/20191224111055.11836-12-maz@kernel.org

    Marc Zyngier
     
  • Masking/unmasking doorbells on GICv4.1 relies on a new INVDB command,
    which broadcasts the invalidation to all RDs.

    Implement the new command as well as the masking callbacks, and plug
    the whole thing into the v4.1 VPE irqchip.

    Signed-off-by: Marc Zyngier
    Reviewed-by: Zenghui Yu
    Link: https://lore.kernel.org/r/20191224111055.11836-11-maz@kernel.org

    Marc Zyngier
     
  • The ITS VMAPP command gains some new fields with GICv4.1:
    - a default doorbell, which allows a single doorbell to be used for
    all the VLPIs routed to a given VPE
    - a pointer to the configuration table (instead of having it in a register
    that gets context switched)
    - a flag indicating whether this is the first map or the last unmap for
    this particular VPE
    - a flag indicating whether the pending table is known to be zeroed, or not

    Plumb in the new fields in the VMAPP builder, and add the map/unmap
    refcounting so that the ITS can do the right thing.

    Signed-off-by: Marc Zyngier
    Reviewed-by: Zenghui Yu
    Link: https://lore.kernel.org/r/20191224111055.11836-7-maz@kernel.org

    Marc Zyngier
     
  • GICv4.1 defines a new VPE table that is potentially shared between
    both the ITSs and the redistributors, following complicated affinity
    rules.

    To make things more confusing, the programming of this table at
    the redistributor level is reusing the GICv4.0 GICR_VPROPBASER register
    for something completely different.

    The code flow is somewhat complicated by the need to respect the
    affinities required by the HW, meaning that tables can either be
    inherited from a previously discovered ITS or redistributor.

    Signed-off-by: Marc Zyngier
    Reviewed-by: Zenghui Yu
    Link: https://lore.kernel.org/r/20191224111055.11836-6-maz@kernel.org

    Marc Zyngier
     
  • While GICv4.0 mandates 16 bits worth of VPEIDs, GICv4.1 allows smaller
    implementations to be built. Add the required glue to dynamically
    compute the limit.

    Signed-off-by: Marc Zyngier
    Reviewed-by: Zenghui Yu
    Link: https://lore.kernel.org/r/20191224111055.11836-3-maz@kernel.org

    Marc Zyngier
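
    Computing the limit dynamically could look like the sketch below. The
    register layout used here (a "field is valid" flag at bit 7 and a
    5-bit width-minus-one field at bits 4:0 of GICD_TYPER2) is an
    assumption for illustration, as are the helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed GICD_TYPER2 layout, for illustration only. */
#define TYPER2_VIL      (1U << 7) /* VID field is valid */
#define TYPER2_VID_MASK 0x1fU     /* number of VPEID bits, minus one */

/* Number of VPEID bits: 16 unless a v4.1 GIC advertises otherwise. */
static unsigned int vpe_id_bits(bool is_v4_1, uint32_t typer2)
{
	if (is_v4_1 && (typer2 & TYPER2_VIL))
		return (typer2 & TYPER2_VID_MASK) + 1;
	return 16;
}

/* Number of distinct VPEIDs for a given width (up to 32 bits). */
static uint64_t max_vpes(unsigned int bits)
{
	assert(bits <= 32);
	return (uint64_t)1 << bits;
}
```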
     
  • GICv4.1 supports the RVPEID ("Residency per vPE ID"), which allows for
    a much more efficient way of making virtual CPUs resident (to allow
    direct injection of interrupts).

    The functionality needs to be discovered on each and every redistributor
    in the system, and disabled if the settings are inconsistent.

    Signed-off-by: Marc Zyngier
    Reviewed-by: Zenghui Yu
    Link: https://lore.kernel.org/r/20191224111055.11836-2-maz@kernel.org

    Marc Zyngier
     

04 Dec, 2019

1 commit

  • Pull irq updates from Ingo Molnar:
    "Most of the IRQ subsystem changes in this cycle were irq-chip driver
    updates:

    - Qualcomm PDC wakeup interrupt support

    - Layerscape external IRQ support

    - Broadcom bcm7038 PM and wakeup support

    - Ingenic driver cleanup and modernization

    - GICv3 ITS preparation for GICv4.1 updates

    - GICv4 fixes

    There's also the series from Frederic Weisbecker that fixes memory
    ordering bugs for the irq-work logic, whose primary fix is to turn
    work->irq_work.flags into an atomic variable and then convert the
    complex (and buggy) atomic_cmpxchg() loop in irq_work_claim() into a
    much simpler atomic_fetch_or() call.

    There are also various smaller cleanups"

    * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
    pinctrl/sdm845: Add PDC wakeup interrupt map for GPIOs
    pinctrl/msm: Setup GPIO chip in hierarchy
    irqchip/qcom-pdc: Add irqchip set/get state calls
    irqchip/qcom-pdc: Add irqdomain for wakeup capable GPIOs
    irqchip/qcom-pdc: Do not toggle IRQ_ENABLE during mask/unmask
    irqchip/qcom-pdc: Update max PDC interrupts
    of/irq: Document properties for wakeup interrupt parent
    genirq: Introduce irq_chip_get/set_parent_state calls
    irqdomain: Add bus token DOMAIN_BUS_WAKEUP
    genirq: Fix function documentation of __irq_alloc_descs()
    irq_work: Fix IRQ_WORK_BUSY bit clearing
    irqchip/ti-sci-inta: Use ERR_CAST inlined function instead of ERR_PTR(PTR_ERR(...))
    irq_work: Slightly simplify IRQ_WORK_PENDING clearing
    irq_work: Fix irq_work_claim() memory ordering
    irq_work: Convert flags to atomic_t
    irqchip: Ingenic: Add process for more than one irq at the same time.
    irqchip: ingenic: Alloc generic chips from IRQ domain
    irqchip: ingenic: Get virq number from IRQ domain
    irqchip: ingenic: Error out if IRQ domain creation failed
    irqchip: ingenic: Drop redundant irq_suspend / irq_resume functions
    ...

    Linus Torvalds
     

26 Nov, 2019

1 commit

  • Pull KVM updates from Paolo Bonzini:
    "ARM:
    - data abort report and injection
    - steal time support
    - GICv4 performance improvements
    - vgic ITS emulation fixes
    - simplify FWB handling
    - enable halt polling counters
    - make the emulated timer PREEMPT_RT compliant

    s390:
    - small fixes and cleanups
    - selftest improvements
    - yield improvements

    PPC:
    - add capability to tell userspace whether we can single-step the
    guest
    - improve the allocation of XIVE virtual processor IDs
    - rewrite interrupt synthesis code to deliver interrupts in virtual
    mode when appropriate.
    - minor cleanups and improvements.

    x86:
    - XSAVES support for AMD
    - more accurate report of nested guest TSC to the nested hypervisor
    - retpoline optimizations
    - support for nested 5-level page tables
    - PMU virtualization optimizations, and improved support for nested
    PMU virtualization
    - correct latching of INITs for nested virtualization
    - IOAPIC optimization
    - TSX_CTRL virtualization for more TAA happiness
    - improved allocation and flushing of SEV ASIDs
    - many bugfixes and cleanups"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
    kvm: nVMX: Relax guest IA32_FEATURE_CONTROL constraints
    KVM: x86: Grab KVM's srcu lock when setting nested state
    KVM: x86: Open code shared_msr_update() in its only caller
    KVM: Fix jump label out_free_* in kvm_init()
    KVM: x86: Remove a spurious export of a static function
    KVM: x86: create mmu/ subdirectory
    KVM: nVMX: Remove unnecessary TLB flushes on L1L2 switches when L1 use apic-access-page
    KVM: x86: remove set but not used variable 'called'
    KVM: nVMX: Do not mark vmcs02->apic_access_page as dirty when unpinning
    KVM: vmx: use MSR_IA32_TSX_CTRL to hard-disable TSX on guest that lack it
    KVM: vmx: implement MSR_IA32_TSX_CTRL disable RTM functionality
    KVM: x86: implement MSR_IA32_TSX_CTRL effect on CPUID
    KVM: x86: do not modify masked bits of shared MSRs
    KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES
    KVM: PPC: Book3S HV: XIVE: Fix potential page leak on error path
    KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one
    KVM: nVMX: Assume TLB entries of L1 and L2 are tagged differently if L0 use EPT
    KVM: x86: Unexport kvm_vcpu_reload_apic_access_page()
    KVM: nVMX: add CR4_LA57 bit to nested CR4_FIXED1
    KVM: nVMX: Use semi-colon instead of comma for exit-handlers initialization
    ...

    Linus Torvalds
     

11 Nov, 2019

3 commits

  • The same behaviour can be obtained by using the IRQCHIP_MASK_ON_SUSPEND
    flag on the IRQ chip.

    Signed-off-by: Paul Cercueil
    Signed-off-by: Marc Zyngier
    Link: https://lore.kernel.org/r/1570015525-27018-2-git-send-email-zhouyanjie@zoho.com

    Paul Cercueil
     
  • Now that we have a copy of TYPER in the ITS structure, rely on this
    to provide the same service as its->device_ids, which gets axed.
    Errata workarounds are now updating the cached fields instead of
    requiring a separate field in the ITS structure.

    Signed-off-by: Marc Zyngier
    Reviewed-by: Zenghui Yu
    Link: https://lore.kernel.org/r/20191027144234.8395-7-maz@kernel.org
    Link: https://lore.kernel.org/r/20191108165805.3071-7-maz@kernel.org

    Marc Zyngier
     
  • Now that we have a copy of TYPER in the ITS structure, rely on this
    to provide the same service as its->ite_size, which gets axed.
    Errata workarounds are now updating the cached fields instead of
    requiring a separate field in the ITS structure.

    Signed-off-by: Marc Zyngier
    Reviewed-by: Zenghui Yu
    Link: https://lore.kernel.org/r/20191027144234.8395-6-maz@kernel.org
    Link: https://lore.kernel.org/r/20191108165805.3071-6-maz@kernel.org

    Marc Zyngier
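
    The idea shared by the two commits above, deriving values from a cached
    TYPER instead of storing them separately, can be sketched as follows.
    The field positions (ITT entry size at bits 7:4, device ID bits at
    17:13, both encoded minus one), the its_model structure and the helper
    names are assumptions for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed GITS_TYPER field layout, for illustration only. */
#define TYPER_ITT_ENTRY_SIZE_SHIFT 4
#define TYPER_ITT_ENTRY_SIZE_MASK  0xfULL
#define TYPER_DEVBITS_SHIFT        13
#define TYPER_DEVBITS_MASK         0x1fULL

/* Hypothetical ITS state holding only the cached register value. */
struct its_model {
	uint64_t typer;
};

/* Number of device ID bits, decoded from the cached TYPER. */
static unsigned int its_device_id_bits(const struct its_model *its)
{
	return ((its->typer >> TYPER_DEVBITS_SHIFT) & TYPER_DEVBITS_MASK) + 1;
}

/* ITT entry size in bytes, decoded from the cached TYPER. */
static unsigned int its_ite_size(const struct its_model *its)
{
	return ((its->typer >> TYPER_ITT_ENTRY_SIZE_SHIFT) &
		TYPER_ITT_ENTRY_SIZE_MASK) + 1;
}

/* An erratum workaround patches the cached field, not a separate copy. */
static void quirk_set_ite_size(struct its_model *its, unsigned int bytes)
{
	assert(bytes >= 1 && bytes <= 16);
	its->typer &= ~(TYPER_ITT_ENTRY_SIZE_MASK << TYPER_ITT_ENTRY_SIZE_SHIFT);
	its->typer |= (uint64_t)(bytes - 1) << TYPER_ITT_ENTRY_SIZE_SHIFT;
}
```

    With a single source of truth, a workaround that rewrites one field
    leaves every other decoded value untouched.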
     

08 Nov, 2019

1 commit

  • In order to find out whether a vcpu is likely to be the target of
    VLPIs (and to further optimize the way we deal with those), let's
    track the number of VLPIs a vcpu can receive.

    This gets implemented with an atomic variable that gets incremented
    or decremented on map, unmap and move of a VLPI.

    Signed-off-by: Marc Zyngier
    Reviewed-by: Zenghui Yu
    Reviewed-by: Christoffer Dall
    Link: https://lore.kernel.org/r/20191107160412.30301-2-maz@kernel.org

    Marc Zyngier
     

29 Oct, 2019

1 commit

  • When the VHE code was reworked, a lot of the vgic stuff was moved around,
    but the GICv4 residency code did stay untouched, meaning that we come
    in and out of residency on each flush/sync, which is obviously suboptimal.

    To address this, let's move things around a bit:

    - Residency entry (flush) moves to vcpu_load
    - Residency exit (sync) moves to vcpu_put
    - On blocking (entry to WFI), we "put"
    - On unblocking (exit from WFI), we "load"

    Because these can nest (load/block/put/load/unblock/put, for example),
    we now have per-VPE tracking of the residency state.

    Additionally, vgic_v4_put gains a "need doorbell" parameter, which only
    gets set to true when blocking because of a WFI. This allows a finer
    control of the doorbell, which now also gets disabled as soon as
    it gets signaled.

    Signed-off-by: Marc Zyngier
    Link: https://lore.kernel.org/r/20191027144234.8395-2-maz@kernel.org

    Marc Zyngier
     

15 Oct, 2019

1 commit

  • The GICv3 architecture specification is incredibly misleading when it
    comes to PMR and the requirement for a DSB. It turns out that this DSB
    is only required if the CPU interface sends an Upstream Control
    message to the redistributor in order to update the RD's view of PMR.

    This message is only sent when ICC_CTLR_EL1.PMHE is set, which isn't
    the case in Linux. It can still be set from EL3, so some special care
    is required. But the upshot is that in the (hopefully large) majority
    of the cases, we can drop the DSB altogether.

    This relies on a new static key being set if the boot CPU has PMHE
    set. The drawback is that this static key has to be exported to
    modules.

    Cc: Will Deacon
    Cc: James Morse
    Cc: Julien Thierry
    Cc: Suzuki K Poulose
    Signed-off-by: Marc Zyngier
    Signed-off-by: Catalin Marinas

    Marc Zyngier
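
    The boot-time decision described above can be sketched in user-space
    C. Everything here is an illustrative assumption: a plain bool stands
    in for the static key, atomic_thread_fence() stands in for the DSB,
    the PMHE bit position is assumed, and the helper names are
    hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position of ICC_CTLR_EL1.PMHE, for illustration only. */
#define CTLR_PMHE (1U << 6)

static bool pmr_sync_needed;         /* stand-in for the static key */
static unsigned int barriers_issued; /* instrumentation for the sketch */

/* Run once on the boot CPU with the value read from ICC_CTLR_EL1. */
static void detect_pmhe(uint32_t icc_ctlr)
{
	pmr_sync_needed = (icc_ctlr & CTLR_PMHE) != 0;
}

/* Called after each PMR update; only pays for the barrier if needed. */
static void pmr_sync(void)
{
	if (pmr_sync_needed) {
		atomic_thread_fence(memory_order_seq_cst); /* DSB stand-in */
		barriers_issued++;
	}
}
```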
     

20 Aug, 2019

4 commits