26 Apr, 2018

1 commit

  • [ Upstream commit 95a2562590c2f64a0398183f978d5cf3db6d0284 ]

    On some platforms there's an ITS available but it's not enabled
    because reading or writing the registers is denied by the
    firmware. In fact, reading or writing them will cause the system
    to reset. We could remove the node from DT in such a case, but
    it's better to skip nodes that are marked as "disabled" in DT so
    that we can describe the hardware that exists and use the status
    property to indicate how the firmware has configured things.

    Cc: Stuart Yoder
    Cc: Laurentiu Tudor
    Cc: Greg Kroah-Hartman
    Cc: Marc Zyngier
    Cc: Rajendra Nayak
    Signed-off-by: Stephen Boyd
    Signed-off-by: Marc Zyngier
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Stephen Boyd
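
As a rough userspace illustration of the "skip disabled nodes" rule described above: the struct and function names below are hypothetical stand-ins, not kernel APIs. The only modelled behaviour is the usual device tree convention that a node is usable when its status property is absent, "okay" or "ok".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a device tree node; only "status" is modelled. */
struct fake_dt_node {
    const char *name;
    const char *status; /* NULL when the property is absent */
};

/* A node is usable when status is absent, "okay" or "ok". */
static bool fake_node_is_available(const struct fake_dt_node *np)
{
    if (np->status == NULL)
        return true;
    return strcmp(np->status, "okay") == 0 || strcmp(np->status, "ok") == 0;
}

/* Count the ITS nodes that would be probed, skipping disabled ones. */
static size_t count_probed_its(const struct fake_dt_node *nodes, size_t n)
{
    size_t probed = 0;

    for (size_t i = 0; i < n; i++) {
        if (!fake_node_is_available(&nodes[i]))
            continue; /* firmware owns it: never touch its registers */
        probed++;
    }
    return probed;
}
```

With this shape the hardware stays described in DT, and a node marked "disabled" is simply never probed.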
     

21 Mar, 2018

1 commit

  • commit 4f2c7583e33eb08dc09dd2e25574b80175ba7d93 upstream.

    When struct its_device instances are created, the nr_ites member
    will be set to a power of 2 that equals or exceeds the requested
    number of MSIs passed to the msi_prepare() callback. At the same
    time, the LPI map is allocated to be some multiple of 32 in size,
    where the allocated size may be less than the requested size
    depending on whether a contiguous range of sufficient size is
    available in the global LPI bitmap.

    This may result in the situation where the nr_ites < nr_lpis, and
    since nr_ites is what we program into the hardware when we map the
    device, the additional LPIs will be non-functional.

    For bog standard hardware, this does not really matter. However,
    in cases where ITS device IDs are shared between different PCIe
    devices, we may end up allocating these additional LPIs without
    taking into account that they don't actually work.

    So let's make nr_ites at least 32. This ensures that all allocated
    LPIs are 'live', and that its_alloc_device_irq() will fail when
    attempts are made to allocate MSIs beyond what was allocated in
    the first place.

    Signed-off-by: Ard Biesheuvel
    [maz: updated comment]
    Signed-off-by: Marc Zyngier
    Signed-off-by: Greg Kroah-Hartman

    Ard Biesheuvel
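
The sizing rule described above can be sketched in plain C as follows. This is a minimal sketch, not the driver code: `size_nr_ites`, `roundup_pow2` and `MIN_ITES` are names invented for the illustration, and the only facts taken from the commit message are "power of 2" and "at least 32".

```c
#include <stdint.h>

#define MIN_ITES 32u /* LPI allocation granularity quoted in the commit message */

/* Round up to the next power of two; assumes 1 <= v <= 2^31. */
static uint32_t roundup_pow2(uint32_t v)
{
    uint32_t p = 1;

    while (p < v)
        p <<= 1;
    return p;
}

/* Number of ITT entries to program for a request of nvecs MSIs. */
static uint32_t size_nr_ites(uint32_t nvecs)
{
    uint32_t n = roundup_pow2(nvecs);

    return n < MIN_ITES ? MIN_ITES : n;
}
```

A request for a single MSI therefore still yields 32 entries, so every LPI in a 32-wide chunk has a matching event.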
     

13 Oct, 2017

3 commits

  • The current ITS driver works fine as long as normal memory and GICR
    regions are located within the lower 48bit (>=0 &&
    Signed-off-by: Marc Zyngier

    Shanker Donthineni
     
  • The VCPU table consists of vPE entries, and its size provides the number
    of VPEs supported by GICv4 hardware. Unfortunately, the maximum size of
    the VPE table is not discoverable in the way the Device table's is. All
    VLPI commands limit the VPEID, which is the index into the VCPU table,
    to 16 bits. So don't apply the DEVID bits to the VCPU table; instead,
    assume a maximum of 16 bits.

    ITS log messages on QDF2400 without fix:
    allocated 524288 Devices (indirect, esz 8, psz 64K, shr 1)
    allocated 8192 Interrupt Collections (flat, esz 8, psz 64K, shr 1)
    Virtual CPUs Table too large, reduce ids 32->26
    Virtual CPUs too large, reduce ITS pages 8192->256
    allocated 2097152 Virtual CPUs (flat, esz 8, psz 64K, shr 1)

    ITS log messages on QDF2400 with fix:
    allocated 524288 Devices (indirect, esz 8, psz 64K, shr 1)
    allocated 8192 Interrupt Collections (flat, esz 8, psz 64K, shr 1)
    allocated 65536 Virtual CPUs (flat, esz 8, psz 64K, shr 1)

    Signed-off-by: Shanker Donthineni
    Signed-off-by: Marc Zyngier

    Shanker Donthineni
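
The numbers in the two logs above follow from simple arithmetic on a flat table, which the sketch below reproduces. The helper names are made up for the illustration; the entry size (8 bytes) and ITS page size (64K) are the `esz` and `psz` values printed in the log.

```c
#include <stdint.h>

#define ITS_PAGE_SIZE (64u * 1024u) /* psz 64K from the log */
#define ENTRY_SIZE    8u            /* esz 8 from the log */

/* Entries in a flat table indexed by id_bits bits. */
static uint64_t flat_table_entries(unsigned int id_bits)
{
    return 1ULL << id_bits;
}

/* ITS pages needed to hold such a table. */
static uint64_t flat_table_pages(unsigned int id_bits)
{
    uint64_t bytes = flat_table_entries(id_bits) * ENTRY_SIZE;

    return (bytes + ITS_PAGE_SIZE - 1) / ITS_PAGE_SIZE;
}

/* Entries that fit in a given number of ITS pages. */
static uint64_t entries_in_pages(uint64_t pages)
{
    return pages * ITS_PAGE_SIZE / ENTRY_SIZE;
}
```

With 16 bits, the table has 65536 entries and fits in 8 pages. With 26 bits it would need 8192 pages, and capping that at 256 pages leaves room for 2097152 entries, which is the oversized figure in the "without fix" log.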
     
  • The driver probe path hits 'BUG_ON(entries != vpe_proxy.dev->nr_ites)'
    on systems that have VLPI capability, don't support the direct LPI
    feature, and boot with a single CPU.

    Relax the BUG_ON() condition to fix the issue.

    Signed-off-by: Shanker Donthineni
    Signed-off-by: Marc Zyngier

    Shanker Donthineni
     

05 Sep, 2017

1 commit

  • Pull irq updates from Thomas Gleixner:
    "The interrupt subsystem delivers this time:

    - Refactoring of the GIC-V3 driver to prepare for the GIC-V4 support

    - Initial GIC-V4 support

    - Consolidation of the FSL MSI support

    - Utilize the effective affinity interface in various ARM irqchip
    drivers

    - Yet another interrupt chip driver (UniPhier AIDET)

    - Bulk conversion of the irq chip driver to use %pOF

    - The usual small fixes and improvements all over the place"

    * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits)
    irqchip/ls-scfg-msi: Add MSI affinity support
    irqchip/ls-scfg-msi: Add LS1043a v1.1 MSI support
    irqchip/ls-scfg-msi: Add LS1046a MSI support
    arm64: dts: ls1046a: Add MSI dts node
    arm64: dts: ls1043a: Share all MSIs
    arm: dts: ls1021a: Share all MSIs
    arm64: dts: ls1043a: Fix typo of MSI compatible string
    arm: dts: ls1021a: Fix typo of MSI compatible string
    irqchip/ls-scfg-msi: Fix typo of MSI compatible strings
    irqchip/irq-bcm7120-l2: Use correct I/O accessors for irq_fwd_mask
    irqchip/mmp: Make mmp_intc_conf const
    irqchip/gic: Make irq_chip const
    irqchip/gic-v3: Advertise GICv4 support to KVM
    irqchip/gic-v4: Enable low-level GICv4 operations
    irqchip/gic-v4: Add some basic documentation
    irqchip/gic-v4: Add VLPI configuration interface
    irqchip/gic-v4: Add VPE command interface
    irqchip/gic-v4: Add per-VM VPE domain creation
    irqchip/gic-v3-its: Set implementation defined bit to enable VLPIs
    irqchip/gic-v3-its: Allow doorbell interrupts to be injected/cleared
    ...

    Linus Torvalds
     

01 Sep, 2017

1 commit


31 Aug, 2017

17 commits

  • Get the show on the road...

    Reviewed-by: Thomas Gleixner
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • A long time ago, GITS_CTLR[1] used to be called GITC_CTLR.EnableVLPI.
    It has been subsequently deprecated and is now an "Implementation
    Defined" bit that may or may not be set for GICv4. Brilliant.

    And the current crop of the FastModel requires that bit for VLPIs
    to be enabled. Oh well... Let's set it and find out what breaks.

    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • While the doorbell interrupts are usually driven by the HW itself,
    having a way to trigger them independently has proved to be a
    really useful debug feature. As it is actually very little code,
    let's add it to the VPE irqchip operations.

    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • After moving a VPE from a redistributor to another, we're still left
    with a potential pending doorbell interrupt on the old redistributor.
    That interrupt should be moved to the new one to be either cleared
    or taken, depending on what the hypervisor wishes to do.

    So let's move it right after having executed VMOVP. This doesn't
    add much cost in the !DirectLPI case (we trade a DISCARD for a MOVI),
    and the cost of the DIRECTLPI case should be minimal (two extra MMIO
    accesses).

    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • When we don't have the DirectLPI feature, we must work around the
    architecture shortcomings to be able to perform the required
    maintenance (interrupt masking, clearing and injection).

    For this, we create a fake device whose sole purpose is to
    provide a way to issue commands as if we were dealing with LPIs
    coming from that device (while they actually originate from
    the ITS). This fake device doesn't have LPIs allocated to it,
    but instead uses the VPE LPIs.

    Of course, this could be a real bottleneck, and a naive
    implementation would require 6 commands to issue an invalidation.

    Instead, let's allocate at least one event per physical CPU
    (rounded up to the next power of 2), and opportunistically
    map the VPE doorbell to an event. This doorbell will be mapped
    until we roll over and need to reallocate this slot.

    This ensures that most of the time, we only need 2 commands
    to issue an INV, INT or CLEAR, making the performance a lot
    better, given that we always issue a CLEAR on entry, and
    an INV on each side of a trapped WFI.

    Signed-off-by: Marc Zyngier

    Marc Zyngier
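
The opportunistic slot reuse described above can be pictured with the following userspace sketch. It is not the driver's data structure: `struct proxy_dev`, `proxy_get_event` and the fixed `NR_SLOTS` are hypothetical, and the linear search is only there to keep the example short. The point is that a doorbell stays mapped to its event until the round-robin pointer comes back to that slot.

```c
#include <stddef.h>

#define NR_SLOTS 4    /* hypothetical: events on the proxy device (power of 2) */
#define NO_VPE   (-1)

struct proxy_dev {
    int slot_owner[NR_SLOTS]; /* VPE id mapped at each event, or NO_VPE */
    int next_victim;          /* next slot to recycle */
    unsigned long remaps;     /* times a mapping had to be installed */
};

static void proxy_init(struct proxy_dev *p)
{
    for (int i = 0; i < NR_SLOTS; i++)
        p->slot_owner[i] = NO_VPE;
    p->next_victim = 0;
    p->remaps = 0;
}

/* Return the event backing this VPE's doorbell, mapping it if needed. */
static int proxy_get_event(struct proxy_dev *p, int vpe_id)
{
    /* Fast path: the doorbell is still mapped from a previous call. */
    for (int i = 0; i < NR_SLOTS; i++)
        if (p->slot_owner[i] == vpe_id)
            return i;

    /* Slow path: recycle the next slot, evicting its previous owner. */
    int ev = p->next_victim;

    p->slot_owner[ev] = vpe_id;
    p->next_victim = (ev + 1) % NR_SLOTS;
    p->remaps++;
    return ev;
}
```

As long as the set of active VPEs fits in the available slots, every call takes the fast path, which corresponds to the cheap case in the commit message.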
     
  • The normal course of action when allocating the ITS' view of a
    device is to allocate the corresponding LPIs. But we're about
    to introduce devices that borrow their interrupts from
    some other entities.

    So let's make the allocation optional.

    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • When masking/unmasking a doorbell interrupt, it is necessary
    to issue an invalidation to the corresponding redistributor.
    We use the DirectLPI feature by writing directly to the corresponding
    redistributor.

    Reviewed-by: Thomas Gleixner
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • When we're about to run a vcpu, it is crucial that the redistributor
    associated with the physical CPU is being told about the new residency.

    This is abstracted by hijacking the irq_set_affinity method for the
    doorbell interrupt associated with the VPE. It is expected that the
    hypervisor will call this method before scheduling the VPE.

    Reviewed-by: Thomas Gleixner
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • When a guest issues an INVALL command targeting a collection, it must
    be translated into a VINVALL for the VPE that has this collection.

    This patch implements a hook that offers this functionality to the
    hypervisor.

    Reviewed-by: Thomas Gleixner
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • When a VPE is scheduled to run, the corresponding redistributor must
    be told so, by setting VPROPBASER to the VM's property table, and
    VPENDBASER to the vcpu's pending table.

    When scheduled out, we preserve the IDAI and PendingLast bits. The
    latter is especially important, as it tells the hypervisor that
    there are pending interrupts for this vcpu.

    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • On activation, a VPE is mapped using the VMAPP command, followed
    by a VINVALL for good measure. On deactivation, the VPE is
    simply unmapped.

    Reviewed-by: Thomas Gleixner
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • When creating a VM, the low level GICv4 code is responsible for:
    - allocating each VPE a unique VPEID
    - allocating a doorbell interrupt for each VPE
    - allocating the pending tables for each VPE
    - allocating the property table for the VM

    This of course has to be reversed when the VM is brought down.

    All of this is wired into the irq domain alloc/free methods.

    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • Add the basic GICv4 VPE (vcpu in GICv4 parlance) infrastructure
    (irqchip, irq domain) that is going to be populated in the following
    patches.

    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • When a VLPI is reconfigured (enabled, disabled, change in priority),
    the full configuration byte must be written, and the caches invalidated.

    Also, when using the irq_mask/irq_unmask methods, it is necessary
    to disable the doorbell for that particular interrupt (by mapping it
    to 1023) on top of clearing the Enable bit.

    Reviewed-by: Thomas Gleixner
    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • In order to let a VLPI be injected into a guest, the VLPI must
    be mapped using the VMAPTI command. When moved to a different vcpu,
    it must be moved with the VMOVI command.

    These commands are issued via the irq_set_vcpu_affinity method,
    making sure we unmap the corresponding host LPI first.

    The reverse is also done when the VLPI is unmapped from the guest.

    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • Add the skeleton irq_set_vcpu_affinity method that will be used
    to configure VLPIs.

    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • Add the new GICv4 ITS command definitions, most of them being
    defined in terms of their physical counterparts.

    Reviewed-by: Eric Auger
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     

23 Aug, 2017

11 commits

  • We're going to need to change a bit more than just the enable
    bit in the LPI property table in the future. So let's change the
    LPI configuration function to take a set of bits to be cleared,
    and a set of bits to be set.

    This way, we'll be able to use it when a guest updates an LPI
    property (priority, for example).

    Reviewed-by: Eric Auger
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Marc Zyngier

    Marc Zyngier
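
A clear-then-set update of this kind can be sketched as below. The function name and the two masks are assumptions made for the example (enable in bit 0, priority in bits 7:2 of the configuration byte); the driver's own helper and macros may differ.

```c
#include <stdint.h>

#define PROP_ENABLED   (1u << 0) /* assumed: enable bit of the config byte */
#define PROP_PRIO_MASK 0xfcu     /* assumed: priority field, bits 7:2 */

/* Clear the bits in clr, then set the bits in set. */
static uint8_t lpi_update_config(uint8_t cfg, uint8_t clr, uint8_t set)
{
    return (uint8_t)((cfg & ~clr) | set);
}
```

Enabling becomes `lpi_update_config(cfg, 0, PROP_ENABLED)`, disabling becomes `lpi_update_config(cfg, PROP_ENABLED, 0)`, and a priority change clears `PROP_PRIO_MASK` while setting the new priority bits in the same call.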
     
  • As we want to use 2-level tables for VCPUs, let's hack the device
    table allocator in order to make it slightly more generic. It
    will get reused in subsequent patches.

    Reviewed-by: Thomas Gleixner
    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • Rework LPI deallocation so that it can be reused by the v4 support
    code.

    Reviewed-by: Eric Auger
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • Just as for the property table, let's move the pending table
    allocation to a separate function.

    Reviewed-by: Thomas Gleixner
    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • The VCPU tables can be quite sparse as well, and it makes sense
    to use indirect tables as well if possible.

    Reviewed-by: Thomas Gleixner
    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • Move the LPI property table allocation into its own function, as
    this is going to be required for those associated with VMs in
    the future.

    Reviewed-by: Eric Auger
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • Allow the pending state of an LPI to be set or cleared via
    irq_set_irqchip_state.

    Reviewed-by: Thomas Gleixner
    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • Most ITS commands do operate on a collection object, and require
    a SYNC command to be performed on that collection in order to
    guarantee the execution of the first command.

    With GICv4 ITS, another set of commands perform similar operations
    on a VPE object, and a VSYNC operation must be executed to guarantee
    their execution.

    Given the similarities (post a command, perform a synchronization
    operation on a sync object), it makes sense to reuse the same
    mechanism for both classes of commands.

    Let's start with turning its_send_single_command into a huge macro
    that performs the bulk of the work, and a set of helpers that
    make this macro usable for the GICv3 ITS commands.

    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • Add the probing code for the ITS VLPI support. This includes
    configuring the ITS number if not supporting the single VMOVP
    command feature.

    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • The various LPI definitions are in the middle of the code, and
    would be better placed at the beginning, given that we're going
    to use some of them much earlier.

    Reviewed-by: Thomas Gleixner
    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • Now that we have a custom printf format specifier, convert users of
    full_name to use %pOF instead. This is preparation to remove storing
    of the full path string for each node.

    Cc: Thomas Gleixner
    Cc: Jason Cooper
    Cc: Lee Jones
    Cc: Stefan Wahren
    Cc: Florian Fainelli
    Cc: Ray Jui
    Cc: Scott Branden
    Cc: bcm-kernel-feedback-list@broadcom.com
    Cc: Sylvain Lemieux
    Cc: Maxime Coquelin
    Cc: Chen-Yu Tsai
    Cc: Thierry Reding
    Cc: Jonathan Hunter
    Cc: Michal Simek
    Cc: "Sören Brinkmann"
    Cc: linux-rpi-kernel@lists.infradead.org
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-mediatek@lists.infradead.org
    Cc: linux-tegra@vger.kernel.org
    Acked-by: Eric Anholt
    Acked-by: Baruch Siach
    Acked-by: Vladimir Zapolskiy
    Acked-by: Matthias Brugger
    Acked-by: Alexandre Torgue
    Acked-by: Maxime Ripard
    Signed-off-by: Rob Herring
    Signed-off-by: Marc Zyngier

    Rob Herring
     

19 Aug, 2017

1 commit

  • wait_for_range_completion() is nicely busted when handling
    wrapping of the command queue, leading to an early exit
    instead of waiting for the command to have been executed.

    Fortunately, the impact is pretty minor, as it only impairs
    the detection of an ITS that doesn't make any forward progress
    for a whole second. And an ITS should *never* lock up.

    Reported-by: Yang Yingliang
    Signed-off-by: Marc Zyngier

    Marc Zyngier
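
To see why wrapping matters, here is a small sketch of a wrap-aware completion check on a circular queue, next to a naive one. This is an illustration under assumed names (`range_completed`, `naive_completed`, byte offsets `from`, `to`, `rd`), not the code of the fix itself. The commands of interest occupy the half-open range from `from` to `to`, and they are done once the read pointer `rd` has left that range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Naive check that ignores wrapping: exits early when from > to. */
static bool naive_completed(uint64_t from, uint64_t to, uint64_t rd)
{
    return rd >= to || rd < from;
}

/* Wrap-aware check: true once rd is outside the pending range. */
static bool range_completed(uint64_t from, uint64_t to, uint64_t rd)
{
    if (from <= to) {
        /* Direct case: pending range is [from, to). */
        return !(rd >= from && rd < to);
    }
    /* Wrapped case: pending range is [from, end) plus [0, to). */
    return rd >= to && rd < from;
}
```

For a range that wraps, say from 0xf000 to 0x100, a read pointer at 0xf800 is still inside the pending commands. The naive check reports completion there, while the wrap-aware one keeps waiting.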
     

18 Aug, 2017

1 commit

  • The GICv3 ITS driver only targets a single CPU at a time, even if
    the notional affinity is wider. Let's inform the core code
    about this.

    Signed-off-by: Marc Zyngier
    Signed-off-by: Thomas Gleixner
    Cc: Andrew Lunn
    Cc: James Hogan
    Cc: Jason Cooper
    Cc: Paul Burton
    Cc: Chris Zankel
    Cc: Kevin Cernekee
    Cc: Wei Xu
    Cc: Max Filippov
    Cc: Florian Fainelli
    Cc: Gregory Clement
    Cc: Matt Redfearn
    Cc: Sebastian Hesselbarth
    Link: http://lkml.kernel.org/r/20170818083925.10108-6-marc.zyngier@arm.com

    Marc Zyngier
     

14 Aug, 2017

1 commit


10 Aug, 2017

1 commit

  • When enabling ITS NUMA support on D05, I got the boot log:

    [ 0.000000] SRAT: PXM 0 -> ITS 0 -> Node 0
    [ 0.000000] SRAT: PXM 0 -> ITS 1 -> Node 0
    [ 0.000000] SRAT: PXM 0 -> ITS 2 -> Node 0
    [ 0.000000] SRAT: PXM 1 -> ITS 3 -> Node 1
    [ 0.000000] SRAT: ITS affinity exceeding max count[4]

    This is wrong on D05 as we have 8 ITSs with 4 NUMA nodes.

    So allocate the memory dynamically instead of using a fixed
    its_srat_maps[MAX_NUMNODES] array: count the number of ITS
    entries in SRAT, allocate its_srat_maps accordingly, and then
    build the mapping of NUMA node to ITS ID. Of course,
    its_srat_maps will be freed after ITS probing because
    we don't need it after boot.

    After doing this, I got what I wanted:

    [ 0.000000] SRAT: PXM 0 -> ITS 0 -> Node 0
    [ 0.000000] SRAT: PXM 0 -> ITS 1 -> Node 0
    [ 0.000000] SRAT: PXM 0 -> ITS 2 -> Node 0
    [ 0.000000] SRAT: PXM 1 -> ITS 3 -> Node 1
    [ 0.000000] SRAT: PXM 2 -> ITS 4 -> Node 2
    [ 0.000000] SRAT: PXM 2 -> ITS 5 -> Node 2
    [ 0.000000] SRAT: PXM 2 -> ITS 6 -> Node 2
    [ 0.000000] SRAT: PXM 3 -> ITS 7 -> Node 3

    Fixes: dbd2b8267233 ("irqchip/gic-v3-its: Add ACPI NUMA node mapping")
    Signed-off-by: Hanjun Guo
    Reviewed-by: Lorenzo Pieralisi
    Cc: Ganapatrao Kulkarni
    Cc: John Garry
    Signed-off-by: Marc Zyngier

    Hanjun Guo
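
The "count first, then allocate" approach can be sketched in userspace as follows. The two structs and `build_its_maps` are hypothetical, and the sketch assumes that a proximity domain maps one-to-one onto a NUMA node, as it does in the boot log above; real firmware tables need not behave that way.

```c
#include <stdlib.h>

/* Hypothetical, already-parsed SRAT ITS affinity record. */
struct srat_its_entry {
    unsigned int pxm;
    unsigned int its_id;
};

/* One NUMA node to ITS ID mapping. */
struct its_node_map {
    int numa_node;
    unsigned int its_id;
};

/*
 * Allocate exactly as many mappings as there are SRAT entries, instead
 * of relying on a fixed-size array. The caller frees the result once
 * probing is done.
 */
static struct its_node_map *build_its_maps(const struct srat_its_entry *e,
                                           size_t count, size_t *out_count)
{
    struct its_node_map *maps;

    *out_count = 0;
    if (count == 0)
        return NULL;

    maps = calloc(count, sizeof(*maps));
    if (!maps)
        return NULL;

    for (size_t i = 0; i < count; i++) {
        maps[i].numa_node = (int)e[i].pxm; /* assumed 1:1 PXM to node */
        maps[i].its_id = e[i].its_id;
    }
    *out_count = count;
    return maps;
}
```

With eight entries on four nodes, as on D05, all eight mappings fit because the array size no longer depends on the number of NUMA nodes.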
     

02 Aug, 2017

1 commit