20 Jan, 2021

1 commit


06 Jan, 2021

1 commit

  • [ Upstream commit ffa1797040c5da391859a9556be7b735acbe1242 ]

    I noticed that the iounmap() of msgr_block_addr before returning from
    mpic_msgr_probe() in the error handling case is missing. So use
    devm_ioremap() instead of plain ioremap() when remapping the message
    register block, so the mapping is automatically released on probe
    failure.
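    The resource-managed pattern behind the fix can be sketched in plain C:
    allocations are recorded on the device and released in one central
    place, so individual error paths need no explicit iounmap(). This is a
    toy userspace model under invented names (fake_devm_alloc and friends),
    not the kernel's devres implementation:

    ```c
    #include <assert.h>
    #include <stdlib.h>

    /* Minimal stand-in for a device with a managed-resource list. */
    struct fake_device {
        void *res[8];   /* resources released automatically on failure */
        int nres;
    };

    /* devm-style allocation: remembered on the device, freed centrally. */
    static void *fake_devm_alloc(struct fake_device *dev, size_t size)
    {
        void *p = malloc(size);
        if (p && dev->nres < 8)
            dev->res[dev->nres++] = p;
        return p;
    }

    /* Called once when probe fails (or the device is removed). */
    static void fake_devm_release_all(struct fake_device *dev)
    {
        while (dev->nres > 0)
            free(dev->res[--dev->nres]);
    }

    /* A probe that fails after mapping: no per-resource cleanup needed. */
    static int fake_probe(struct fake_device *dev, int fail)
    {
        void *block = fake_devm_alloc(dev, 64); /* like devm_ioremap() */
        if (!block || fail) {
            fake_devm_release_all(dev);  /* one central unwind point */
            return -1;
        }
        return 0;
    }
    ```

    The point of the idiom is that each new error path added to the probe
    function stays correct without its own unwind code.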

    Signed-off-by: Qinglang Miao
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20201028091551.136400-1-miaoqinglang@huawei.com
    Signed-off-by: Sasha Levin

    Qinglang Miao
     

14 Dec, 2020

1 commit

  • In sleep mode, the clocks of the CPU core and unused IP blocks are
    turned off (IP blocks that are allowed to wake up the system keep
    running).

    Some QorIQ SoCs, such as MPC8536, P1022 and T104x, have a deep sleep
    PM mode in addition to the sleep PM mode. In deep sleep mode, the
    power supply is additionally removed from the CPU core and most IP
    blocks. Only the blocks needed to wake the chip out of deep sleep
    stay on.

    This feature supports 32-bit and 36-bit address space.

    The sleep mode corresponds to the Standby state in Linux; the deep
    sleep mode corresponds to the Suspend-to-RAM state of Linux Power
    Management.

    Command to enter sleep mode:
    echo standby > /sys/power/state

    Command to enter deep sleep mode:
    echo mem > /sys/power/state

    Signed-off-by: Dave Liu
    Signed-off-by: Li Yang
    Signed-off-by: Jin Qing
    Signed-off-by: Jerry Huang
    Signed-off-by: Ramneek Mehresh
    Signed-off-by: Zhao Chenhui
    Signed-off-by: Wang Dongsheng
    Signed-off-by: Tang Yuantian
    Signed-off-by: Xie Xiaobo
    Signed-off-by: Zhao Qiang
    Signed-off-by: Shengzhou Liu
    Signed-off-by: Ran Wang

    Ran Wang
     

18 Sep, 2020

1 commit

  • This fixes a compile error with W=1.

    CC arch/powerpc/sysdev/xive/common.o
    ../arch/powerpc/sysdev/xive/common.c:1568:6: error: no previous prototype for ‘xive_debug_show_cpu’ [-Werror=missing-prototypes]
    void xive_debug_show_cpu(struct seq_file *m, int cpu)
    ^~~~~~~~~~~~~~~~~~~
    ../arch/powerpc/sysdev/xive/common.c:1602:6: error: no previous prototype for ‘xive_debug_show_irq’ [-Werror=missing-prototypes]
    void xive_debug_show_irq(struct seq_file *m, u32 hw_irq, struct irq_data *d)
    ^~~~~~~~~~~~~~~~~~~

    Fixes: 930914b7d528 ("powerpc/xive: Add a debugfs file to dump internal XIVE state")
    Signed-off-by: Cédric Le Goater
    Reviewed-by: Christophe Leroy
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20200914211007.2285999-5-clg@kaod.org

    Cédric Le Goater
     

24 Aug, 2020

1 commit

  • Both of_find_compatible_node() and of_find_node_by_type() will return
    a refcounted node on success - thus for the success path the node must
    be explicitly released with a of_node_put().
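    The pairing the fix enforces can be modeled with a toy refcount
    (invented toy_* names, not the real of_* API): every successful find
    returns the node with a reference held, and the caller must drop it.

    ```c
    #include <assert.h>

    /* Toy refcounted node standing in for struct device_node. */
    struct toy_node { int refcount; };

    /* Like of_find_compatible_node(): returns the node with a reference held. */
    static struct toy_node *toy_find_node(struct toy_node *n)
    {
        n->refcount++;          /* caller now owns one reference */
        return n;
    }

    /* Like of_node_put(): drop the reference taken by the find. */
    static void toy_node_put(struct toy_node *n)
    {
        n->refcount--;
    }

    /* Correct usage: every successful find is paired with a put. */
    static int toy_use_node(struct toy_node *n)
    {
        struct toy_node *found = toy_find_node(n);
        /* ... use the node ... */
        toy_node_put(found);    /* without this, the refcount leaks */
        return 0;
    }
    ```

    A missing put leaves the count permanently elevated, which is exactly
    the leak the commit fixes on the success path.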

    Fixes: 0b05ac6e2480 ("powerpc/xics: Rewrite XICS driver")
    Signed-off-by: Nicholas Mc Guire
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/1530691407-3991-1-git-send-email-hofrat@osadl.org

    Nicholas Mc Guire
     

30 Jul, 2020

1 commit

  • Certain warnings are emitted for powerpc code when building with a gcc-10
    toolset:

    WARNING: modpost: vmlinux.o(.text.unlikely+0x377c): Section mismatch in
    reference from the function remove_pmd_table() to the function
    .meminit.text:split_kernel_mapping()
    The function remove_pmd_table() references
    the function __meminit split_kernel_mapping().
    This is often because remove_pmd_table lacks a __meminit
    annotation or the annotation of split_kernel_mapping is wrong.

    Add the appropriate __init and __meminit annotations to make modpost not
    complain. In all cases there is just a single call site from another
    __init or __meminit function:

    __meminit remove_pagetable() -> remove_pud_table() -> remove_pmd_table()
    __init prom_init() -> setup_secure_guest()
    __init xive_spapr_init() -> xive_spapr_disabled()

    Signed-off-by: Vladis Dronov
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20200729133741.62789-1-vdronov@redhat.com

    Vladis Dronov
     

22 Jun, 2020

1 commit

  • xive_native_provision_pages() allocates memory and passes the pointer to
    OPAL so kmemleak cannot find the pointer usage in the kernel memory and
    produces a false positive report (below) (even if the kernel did scan
    OPAL memory, it is unable to deal with __pa() addresses anyway).

    This silences the warning.

    unreferenced object 0xc000200350c40000 (size 65536):
    comm "qemu-system-ppc", pid 2725, jiffies 4294946414 (age 70776.530s)
    hex dump (first 32 bytes):
    02 00 00 00 50 00 00 00 00 00 00 00 00 00 00 00 ....P...........
    01 00 08 07 00 00 00 00 00 00 00 00 00 00 00 00 ................
    backtrace:
    [] xive_native_alloc_vp_block+0x120/0x250
    [] kvmppc_xive_compute_vp_id+0x248/0x350 [kvm]
    [] kvmppc_xive_connect_vcpu+0xc0/0x520 [kvm]
    [] kvm_arch_vcpu_ioctl+0x308/0x580 [kvm]
    [] kvm_vcpu_ioctl+0x19c/0xae0 [kvm]
    [] ksys_ioctl+0x184/0x1b0
    [] sys_ioctl+0x48/0xb0
    [] system_call_exception+0x124/0x1f0
    [] system_call_common+0xe8/0x214

    Signed-off-by: Alexey Kardashevskiy
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20200612043303.84894-1-aik@ozlabs.ru

    Alexey Kardashevskiy
     

19 Jun, 2020

1 commit


18 Jun, 2020

1 commit


10 Jun, 2020

3 commits

  • The replacement of <asm/pgtable.h> with <linux/pgtable.h> made the
    include of the latter in the middle of asm includes. Fix this up with
    the aid of the below script and manual adjustments here and there.

    import sys
    import re

    if len(sys.argv) is not 3:
        print "USAGE: %s <file> <header to move>" % (sys.argv[0])
        sys.exit(1)

    hdr_to_move = "#include <linux/%s>" % sys.argv[2]
    moved = False
    in_hdrs = False

    with open(sys.argv[1], "r") as f:
        lines = f.readlines()
        for _line in lines:
            line = _line.rstrip('\n')
            if line == hdr_to_move:
                continue
            if line.startswith("#include <linux/"):
                in_hdrs = True
            elif not moved and in_hdrs:
                moved = True
                print hdr_to_move
            print line

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Matthew Wilcox
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Nick Hu
    Cc: Paul Walmsley
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200514170327.31389-4-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • The include/linux/pgtable.h is going to be the home of generic page table
    manipulation functions.

    Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and
    make the latter include asm/pgtable.h.

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Matthew Wilcox
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Nick Hu
    Cc: Paul Walmsley
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Patch series "mm: consolidate definitions of page table accessors", v2.

    The low level page table accessors (pXY_index(), pXY_offset()) are
    duplicated across all architectures and sometimes more than once. For
    instance, we have 31 definition of pgd_offset() for 25 supported
    architectures.

    Most of these definitions are actually identical and typically it boils
    down to, e.g.

    static inline unsigned long pmd_index(unsigned long address)
    {
            return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
    }

    static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
    {
            return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
    }
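    The arithmetic is plain bit manipulation. A userspace sketch, assuming
    an example geometry of PMD_SHIFT = 21 and PTRS_PER_PMD = 512 (the real
    values are per-architecture):

    ```c
    #include <assert.h>

    /* Example geometry; the real values come from each architecture. */
    #define PMD_SHIFT     21
    #define PTRS_PER_PMD  512

    /* Same arithmetic as the generic pmd_index() quoted above: shift away
     * the in-PMD offset bits, then mask to the table size (which wraps). */
    static unsigned long toy_pmd_index(unsigned long address)
    {
        return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
    }
    ```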

    These definitions can be shared among 90% of the arches provided
    XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.

    For architectures that really need a custom version there is always
    possibility to override the generic version with the usual ifdefs magic.

    These patches introduce include/linux/pgtable.h that replaces
    include/asm-generic/pgtable.h and add the definitions of the page table
    accessors to the new header.

    This patch (of 12):

    The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
    functions involving page table manipulations, e.g. pte_alloc() and
    pmd_alloc(). So, there is no point to explicitly include <asm/pgtable.h>
    in the files that include <linux/mm.h>.

    The include statements in such cases are removed with a simple loop:

    for f in $(git grep -l "include <asm/pgtable.h>") ; do
        sed -i -e '/include <asm\/pgtable.h>/ d' $f
    done

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Matthew Wilcox
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Mike Rapoport
    Cc: Nick Hu
    Cc: Paul Walmsley
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
    Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

02 Jun, 2020

1 commit

  • Implement rtas_call_reentrant() for reentrant rtas-calls:
    "ibm,int-on", "ibm,int-off", "ibm,get-xive" and "ibm,set-xive".

    On LoPAPR Version 1.1 (March 24, 2016), from 7.3.10.1 to 7.3.10.4,
    items 2 and 3 say:

    2 - For the PowerPC External Interrupt option: The * call must be
    reentrant to the number of processors on the platform.
    3 - For the PowerPC External Interrupt option: The * argument call
    buffer for each simultaneous call must be physically unique.

    So, these rtas-calls can be called in a lockless way, if using
    a different buffer for each cpu doing such rtas call.

    For this, it was suggested to add the buffer (struct rtas_args)
    to the PACA struct, so each CPU can have its own buffer. The PACA
    struct received a pointer to the rtas buffer, which is allocated
    in the memory range available to 32-bit rtas.

    Reentrant rtas calls are useful to avoid deadlocks when crashing,
    where rtas-calls are needed but some other thread crashed while
    holding rtas.lock.
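    The per-CPU buffer idea can be sketched with a toy model (invented
    toy_* names; the real code embeds struct rtas_args reachable via the
    PACA): each CPU writes only its own argument buffer, so no global lock
    is required as long as the underlying call itself is reentrant.

    ```c
    #include <assert.h>

    #define NCPUS 4

    /* Stand-in for struct rtas_args embedded per CPU. */
    struct toy_rtas_args { int token; int in0; int out0; };

    static struct toy_rtas_args percpu_args[NCPUS];

    /* Reentrant call: each CPU fills only its own buffer, so concurrent
     * callers on different CPUs never share state and need no lock. */
    static int toy_rtas_call_reentrant(int cpu, int token, int in0)
    {
        struct toy_rtas_args *args = &percpu_args[cpu];
        args->token = token;
        args->in0 = in0;
        args->out0 = in0 + 1;   /* pretend firmware produced a result */
        return args->out0;
    }
    ```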

    This is a backtrace of a deadlock from a kdump testing environment:

    #0 arch_spin_lock
    #1 lock_rtas ()
    #2 rtas_call (token=8204, nargs=1, nret=1, outputs=0x0)
    #3 ics_rtas_mask_real_irq (hw_irq=4100)
    #4 machine_kexec_mask_interrupts
    #5 default_machine_crash_shutdown
    #6 machine_crash_shutdown
    #7 __crash_kexec
    #8 crash_kexec
    #9 oops_end

    Signed-off-by: Leonardo Bras
    [mpe: Move under #ifdef PSERIES to avoid build breakage]
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20200518234245.200672-3-leobras.c@gmail.com

    Leonardo Bras
     

28 May, 2020

3 commits

  • XIVE interrupt controller uses an Event Queue (EQ) to enqueue event
    notifications when an exception occurs. The EQ is a single memory page
    provided by the O/S defining a circular buffer, one per server and
    priority couple.

    On baremetal, the EQ page is configured with an OPAL call. On pseries,
    an extra hop is necessary and the guest OS uses the hcall
    H_INT_SET_QUEUE_CONFIG to configure the XIVE interrupt controller.

    Since the XIVE controller is hypervisor privileged, it is not allowed
    to enqueue event notifications for a Secure VM unless the EQ pages are
    shared by the Secure VM.

    Hypervisor/Ultravisor still requires support for the TIMA and ESB page
    fault handlers. Until this is complete, QEMU can use the emulated XIVE
    device for Secure VMs, option "kernel_irqchip=off" on the QEMU pseries
    machine.

    Signed-off-by: Ram Pai
    Reviewed-by: Cedric Le Goater
    Reviewed-by: Greg Kurz
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20200426020518.GC5853@oc0525413822.ibm.com

    Ram Pai
     
  • The latest Xilinx design tools, called ISE and EDK, were released in
    October 2013. The new tools don't support any new PPC405/PPC440
    designs, and these platforms are no longer supported and tested.

    The PowerPC 405/440 port has been orphaned since 2013 by
    commit cdeb89943bfc ("MAINTAINERS: Fix incorrect status tag") and
    commit 19624236cce1 ("MAINTAINERS: Update Grant's email address and maintainership"),
    which is why it is time to remove the support for these platforms.

    Signed-off-by: Michal Simek
    Signed-off-by: Christophe Leroy
    Acked-by: Arnd Bergmann
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/8c593895e2cb57d232d85ce4d8c3a1aa7f0869cc.1590079968.git.christophe.leroy@csgroup.eu

    Michal Simek
     
  • The XIVE interrupt mode can be disabled with the "xive=off" kernel
    parameter, in which case there is nothing to present to the user in the
    associated /sys/kernel/debug/powerpc/xive file.

    Fixes: 930914b7d528 ("powerpc/xive: Add a debugfs file to dump internal XIVE state")
    Signed-off-by: Cédric Le Goater
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20200429075122.1216388-4-clg@kaod.org

    Cédric Le Goater
     

26 May, 2020

3 commits

  • Commit 1ca3dec2b2df ("powerpc/xive: Prevent page fault issues in the
    machine crash handler") fixed an issue in the FW assisted dump of
    machines using hash MMU and the XIVE interrupt mode under the POWER
    hypervisor. It forced the mapping of the ESB page of interrupts being
    mapped in the Linux IRQ number space to make sure the 'crash kexec'
    sequence worked during such an event. But it didn't handle the
    un-mapping.

    This mapping is now blocking the removal of a passthrough IO adapter
    under the POWER hypervisor because it expects the guest OS to have
    cleared all page table entries related to the adapter. If some are
    still present, the RTAS call which isolates the PCI slot returns error
    9001 "valid outstanding translations".

    Remove these mappings in the IRQ data cleanup routine.

    Under KVM, this cleanup is not required because the ESB pages for the
    adapter interrupts are un-mapped from the guest by the hypervisor in
    the KVM XIVE native device. This is now redundant but it's harmless.

    Fixes: 1ca3dec2b2df ("powerpc/xive: Prevent page fault issues in the machine crash handler")
    Cc: stable@vger.kernel.org # v5.5+
    Signed-off-by: Cédric Le Goater
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20200429075122.1216388-2-clg@kaod.org

    Cédric Le Goater
     
  • Merge Christophe's large series to use huge pages for the linear
    mapping on 8xx.

    From his cover letter:

    The main purpose of this big series is to:
    - reorganise huge page handling to avoid using mm_slices.
    - use huge pages to map kernel memory on the 8xx.

    The 8xx supports 4 page sizes: 4k, 16k, 512k and 8M.
    It uses 2 Level page tables, PGD having 1024 entries, each entry
    covering 4M address space. Then each page table has 1024 entries.

    At the time being, page sizes are managed in PGD entries, implying
    the use of mm_slices as it can't mix several page sizes in one
    page table.

    The first purpose of this series is to reorganise things so that
    standard page tables can also handle 512k pages. This is done by
    adding a new _PAGE_HUGE flag which will be copied into the Level 1
    entry in the TLB miss handler. That done, we have 2 types of pages:
    - PGD entries to regular page tables handling 4k/16k and 512k pages
    - PGD entries to hugepd tables handling 8M pages.

    There is no need to mix 8M pages with other sizes, because an 8M page
    will use more than what a single PGD entry covers.

    Then comes the second purpose of this series. At the time being, the
    8xx has implemented special handling in the TLB miss handlers in order
    to transparently map kernel linear address space and the IMMR using
    huge pages by building the TLB entries in assembly at the time of the
    exception.

    As mm_slices is only for user space pages, and also because it would
    anyway not be convenient to slice kernel address space, it was not
    possible to use huge pages for kernel address space. But after step
    one of the series, it is now more flexible to use huge pages.

    This series drops all assembly 'just in time' handling of huge pages
    and uses huge pages in page tables instead.

    Once the above is done, then comes icing on the cake:
    - Use huge pages for KASAN shadow mapping
    - Allow pinned TLBs with strict kernel rwx
    - Allow pinned TLBs with debug pagealloc

    Then, last but not least, those modifications for the 8xx allows the
    following improvement on book3s/32:
    - Mapping KASAN shadow with BATs
    - Allowing BATs with debug pagealloc

    All this makes it possible to considerably simplify the TLB miss
    handlers and the associated initialisation. The overhead of reading
    page tables is negligible compared to the reduction of the miss
    handlers.

    While touching pte_update(), some cleanup was done there too.

    Tested widely on 8xx and 832x. Boot tested on QEMU MAC99.

    Michael Ellerman
     
  • Only early debug requires IMMR to be mapped early.

    No need to set it up and pin it in assembly. Map it
    through page tables at udbg init when necessary.

    If CONFIG_PIN_TLB_IMMR is selected, pin it once we
    don't need the 32 Mb pinned RAM anymore.

    Signed-off-by: Christophe Leroy
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/13c1e8539fdf363d3146f4884e5c3c76c6c308b5.1589866984.git.christophe.leroy@csgroup.eu

    Christophe Leroy
     

20 May, 2020

1 commit


07 May, 2020

1 commit

  • When an interrupt has been handled, the OS notifies the interrupt
    controller with a EOI sequence. On a POWER9 system using the XIVE
    interrupt controller, this can be done with a load or a store
    operation on the ESB interrupt management page of the interrupt. The
    StoreEOI operation has less latency and improves interrupt handling
    performance but it was deactivated during the POWER9 DD2.0 timeframe
    because of ordering issues. We use the LoadEOI today but we plan to
    reactivate StoreEOI in future architectures.

    There is usually no need to enforce ordering between ESB load and
    store operations as they should lead to the same result. E.g. a store
    trigger and a load EOI can be executed in any order. Assuming the
    interrupt state is PQ=10, a store trigger followed by a load EOI will
    return a Q bit. In the reverse order, it will create a new interrupt
    trigger from HW. In both cases, the handler processing interrupts is
    notified.

    In some cases, the XIVE_ESB_SET_PQ_10 load operation is used to
    temporarily disable the interrupt source (mask/unmask). When the
    source is reenabled, the OS can detect if interrupts were received
    while the source was disabled and reinject them. This process needs
    special care when StoreEOI is activated. The ESB load and store
    operations should be correctly ordered because a XIVE_ESB_STORE_EOI
    operation could leave the source enabled if it has not completed
    before the loads.

    For those cases, we enforce Load-after-Store ordering with a special
    load operation offset. To avoid performance impact, this ordering is
    only enforced when really needed, that is when interrupt sources are
    temporarily disabled with the XIVE_ESB_SET_PQ_10 load. It should not
    be needed for other loads.
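    The mask-and-replay behaviour described above can be modeled with a
    toy source that latches triggers while masked, standing in for the Q
    bit (this sketch models only the replay logic, not real ESB PQ
    semantics or the StoreEOI ordering; all toy_* names are invented):

    ```c
    #include <assert.h>
    #include <stdbool.h>

    /* Toy interrupt source: while masked, triggers are latched (like the
     * Q bit) and replayed when the source is unmasked. */
    struct toy_source {
        bool masked;
        bool latched;       /* trigger seen while masked */
        int delivered;      /* events handed to the handler */
    };

    static void toy_trigger(struct toy_source *s)
    {
        if (s->masked)
            s->latched = true;      /* remember, don't deliver yet */
        else
            s->delivered++;
    }

    static void toy_unmask(struct toy_source *s)
    {
        s->masked = false;
        if (s->latched) {           /* reinject what arrived while masked */
            s->latched = false;
            s->delivered++;
        }
    }
    ```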

    Signed-off-by: Cédric Le Goater
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20200220081506.31209-1-clg@kaod.org

    Cédric Le Goater
     

20 Apr, 2020

1 commit


26 Mar, 2020

4 commits

  • Like XMON, the debugfs file /sys/kernel/debug/powerpc/xive exposes
    the XIVE internal state of the machine's CPUs and interrupts. It is
    available on the PowerNV and sPAPR platforms.

    Signed-off-by: Cédric Le Goater
    [mpe: Make the debugfs file 0400]
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20200306150143.5551-5-clg@kaod.org

    Cédric Le Goater
     
  • Some firmwares or hypervisors can advertise different source
    characteristics. Track their value under XMON. What we are mostly
    interested in is the StoreEOI flag.

    Signed-off-by: Cédric Le Goater
    Reviewed-by: Greg Kurz
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20200306150143.5551-4-clg@kaod.org

    Cédric Le Goater
     
  • The PowerNV platform has multiple IRQ chips and the xmon command
    dumping the state of the XIVE interrupt should only operate on the
    XIVE IRQ chip.

    Fixes: 5896163f7f91 ("powerpc/xmon: Improve output of XIVE interrupts")
    Cc: stable@vger.kernel.org # v5.4+
    Signed-off-by: Cédric Le Goater
    Reviewed-by: Greg Kurz
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20200306150143.5551-3-clg@kaod.org

    Cédric Le Goater
     
  • When a CPU is brought up, an IPI number is allocated and recorded
    under the XIVE CPU structure. Invalid IPI numbers are tracked with
    interrupt number 0x0.

    On the PowerNV platform, the interrupt number space starts at 0x10 and
    this works fine. However, on the sPAPR platform, it is possible to
    allocate the interrupt number 0x0 and this raises an issue when CPU 0
    is unplugged. The XIVE spapr driver tracks allocated interrupt numbers
    in a bitmask and it is not correctly updated when interrupt number 0x0
    is freed. It stays allocated and it is then impossible to reallocate.

    Fix by using the XIVE_BAD_IRQ value instead of zero on both platforms.
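    The underlying issue, using a valid number as the "invalid" sentinel,
    can be shown with a toy allocator (invented names; the real fix
    defines XIVE_BAD_IRQ outside the valid interrupt number range):

    ```c
    #include <assert.h>
    #include <stdint.h>

    #define NIRQS 64
    #define TOY_BAD_IRQ 0xffffffffU   /* sentinel outside the valid range */

    static uint64_t toy_bitmap;       /* bit n set = irq n allocated */

    static unsigned int toy_alloc_irq(void)
    {
        for (unsigned int n = 0; n < NIRQS; n++) {
            if (!(toy_bitmap & ((uint64_t)1 << n))) {
                toy_bitmap |= (uint64_t)1 << n;
                return n;             /* 0 is a perfectly valid number */
            }
        }
        return TOY_BAD_IRQ;
    }

    static void toy_free_irq(unsigned int n)
    {
        if (n != TOY_BAD_IRQ)         /* using 0 as "invalid" would wrongly
                                         skip freeing a real interrupt 0 */
            toy_bitmap &= ~((uint64_t)1 << n);
    }
    ```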

    Reported-by: David Gibson
    Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
    Cc: stable@vger.kernel.org # v4.14+
    Signed-off-by: Cédric Le Goater
    Reviewed-by: David Gibson
    Tested-by: David Gibson
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20200306150143.5551-2-clg@kaod.org

    Cédric Le Goater
     

04 Feb, 2020

1 commit

  • Pull powerpc updates from Michael Ellerman:
    "A pretty small batch for us, and apologies for it being a bit late, I
    wanted to sneak Christophe's user_access_begin() series in.

    Summary:

    - Implement user_access_begin() and friends for our platforms that
    support controlling kernel access to userspace.

    - Enable CONFIG_VMAP_STACK on 32-bit Book3S and 8xx.

    - Some tweaks to our pseries IOMMU code to allow SVMs ("secure"
    virtual machines) to use the IOMMU.

    - Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE to the 32-bit
    VDSO, and some other improvements.

    - A series to use the PCI hotplug framework to control opencapi
    card's so that they can be reset and re-read after flashing a new
    FPGA image.

    As well as other minor fixes and improvements as usual.

    Thanks to: Alastair D'Silva, Alexandre Ghiti, Alexey Kardashevskiy,
    Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Bai Yingjie, Chen
    Zhou, Christophe Leroy, Frederic Barrat, Greg Kurz, Jason A.
    Donenfeld, Joel Stanley, Jordan Niethe, Julia Lawall, Krzysztof
    Kozlowski, Laurent Dufour, Laurentiu Tudor, Linus Walleij, Michael
    Bringmann, Nathan Chancellor, Nicholas Piggin, Nick Desaulniers,
    Oliver O'Halloran, Peter Ujfalusi, Pingfan Liu, Ram Pai, Randy Dunlap,
    Russell Currey, Sam Bobroff, Sebastian Andrzej Siewior, Shawn
    Anastasio, Stephen Rothwell, Steve Best, Sukadev Bhattiprolu, Thiago
    Jung Bauermann, Tyrel Datwyler, Vaibhav Jain"

    * tag 'powerpc-5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (131 commits)
    powerpc: configs: Cleanup old Kconfig options
    powerpc/configs/skiroot: Enable some more hardening options
    powerpc/configs/skiroot: Disable xmon default & enable reboot on panic
    powerpc/configs/skiroot: Enable security features
    powerpc/configs/skiroot: Update for symbol movement only
    powerpc/configs/skiroot: Drop default n CONFIG_CRYPTO_ECHAINIV
    powerpc/configs/skiroot: Drop HID_LOGITECH
    powerpc/configs: Drop NET_VENDOR_HP which moved to staging
    powerpc/configs: NET_CADENCE became NET_VENDOR_CADENCE
    powerpc/configs: Drop CONFIG_QLGE which moved to staging
    powerpc: Do not consider weak unresolved symbol relocations as bad
    powerpc/32s: Fix kasan_early_hash_table() for CONFIG_VMAP_STACK
    powerpc: indent to improve Kconfig readability
    powerpc: Provide initial documentation for PAPR hcalls
    powerpc: Implement user_access_save() and user_access_restore()
    powerpc: Implement user_access_begin and friends
    powerpc/32s: Prepare prevent_user_access() for user_access_end()
    powerpc/32s: Drop NULL addr verification
    powerpc/kuap: Fix set direction in allow/prevent_user_access()
    powerpc/32s: Fix bad_kuap_fault()
    ...

    Linus Torvalds
     

25 Jan, 2020

1 commit


22 Jan, 2020

1 commit

  • A load on an ESB page returning all 1's means that the underlying
    device has invalidated the access to the PQ state of the interrupt
    through mmio. It may happen, for example when querying a PHB interrupt
    while the PHB is in an error state.

    In that case, we should consider the interrupt to be invalid when
    checking its state in the irq_get_irqchip_state() handler.
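    The check can be sketched as follows, with invented toy_* names
    standing in for the real ESB accessors; the real fix introduces
    XIVE_ESB_INVALID for the all-1's value:

    ```c
    #include <assert.h>
    #include <stdint.h>

    #define TOY_ESB_INVALID ((uint64_t)-1)  /* all 1's: access invalidated */

    /* Stand-in for an MMIO load from the ESB page. */
    static uint64_t toy_esb_load(int phb_in_error)
    {
        return phb_in_error ? TOY_ESB_INVALID : 0x2; /* e.g. PQ = 10 */
    }

    /* Like the fixed handler: treat all 1's as "state unavailable"
     * instead of misreading it as a PQ state. */
    static int toy_get_irq_state(int phb_in_error, int *pending)
    {
        uint64_t val = toy_esb_load(phb_in_error);
        if (val == TOY_ESB_INVALID)
            return -1;              /* interrupt considered invalid */
        *pending = (val & 0x2) != 0;
        return 0;
    }
    ```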

    Fixes: da15c03b047d ("powerpc/xive: Implement get_irqchip_state method for XIVE to fix shutdown race")
    Cc: stable@vger.kernel.org # v5.4+
    Signed-off-by: Frederic Barrat
    [clg: wrote a commit log, introduced XIVE_ESB_INVALID]
    Signed-off-by: Cédric Le Goater
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20200113130118.27969-1-clg@kaod.org

    Frederic Barrat
     

07 Jan, 2020

1 commit

  • The mpic_ipi_chip and mpic_irq_ht_chip structures are only copied
    into other structures, so make them const.

    The opportunity for this change was found using Coccinelle.

    Signed-off-by: Julia Lawall
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/1577864614-5543-10-git-send-email-Julia.Lawall@inria.fr

    Julia Lawall
     

04 Dec, 2019

1 commit

  • The PCI INTx interrupts and other LSI interrupts are handled differently
    under a sPAPR platform. When the interrupt source characteristics are
    queried, the hypervisor returns an H_INT_ESB flag to inform the OS
    that it should be using the H_INT_ESB hcall for interrupt management
    and not loads and stores on the interrupt ESB pages.

    A default -1 value is returned for the addresses of the ESB pages. The
    driver ignores this condition today and performs a bogus IO mapping.
    Recent changes and the DEBUG_VM configuration option make the bug
    visible with:

    kernel BUG at arch/powerpc/include/asm/book3s/64/pgtable.h:612!
    Oops: Exception in kernel mode, sig: 5 [#1]
    LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=1024 NUMA pSeries
    Modules linked in:
    CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.0-0.rc6.git0.1.fc32.ppc64le #1
    NIP: c000000000f63294 LR: c000000000f62e44 CTR: 0000000000000000
    REGS: c0000000fa45f0d0 TRAP: 0700 Not tainted (5.4.0-0.rc6.git0.1.fc32.ppc64le)
    ...
    NIP ioremap_page_range+0x4c4/0x6e0
    LR ioremap_page_range+0x74/0x6e0
    Call Trace:
    ioremap_page_range+0x74/0x6e0 (unreliable)
    do_ioremap+0x8c/0x120
    __ioremap_caller+0x128/0x140
    ioremap+0x30/0x50
    xive_spapr_populate_irq_data+0x170/0x260
    xive_irq_domain_map+0x8c/0x170
    irq_domain_associate+0xb4/0x2d0
    irq_create_mapping+0x1e0/0x3b0
    irq_create_fwspec_mapping+0x27c/0x3e0
    irq_create_of_mapping+0x98/0xb0
    of_irq_parse_and_map_pci+0x168/0x230
    pcibios_setup_device+0x88/0x250
    pcibios_setup_bus_devices+0x54/0x100
    __of_scan_bus+0x160/0x310
    pcibios_scan_phb+0x330/0x390
    pcibios_init+0x8c/0x128
    do_one_initcall+0x60/0x2c0
    kernel_init_freeable+0x290/0x378
    kernel_init+0x2c/0x148
    ret_from_kernel_thread+0x5c/0x80
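    The guard the fix adds can be sketched like this (toy_* names are
    invented; the real driver checks the H_INT_ESB flag and the ESB page
    addresses returned by the hypervisor before mapping anything):

    ```c
    #include <assert.h>
    #include <stdint.h>

    #define TOY_INVALID_ADDR ((uint64_t)-1)  /* hypervisor's "no page" value */

    /* Sketch of the populate path: when H_INT_ESB is advertised, manage
     * the interrupt via the hcall and never map the ESB pages; and never
     * ioremap() the -1 placeholder addresses. */
    static int toy_populate_irq_data(int h_int_esb, uint64_t esb_addr)
    {
        if (h_int_esb)
            return 0;                   /* use the hcall, no IO mapping */
        if (esb_addr == TOY_INVALID_ADDR)
            return -1;                  /* refuse the bogus IO mapping */
        /* ... would ioremap(esb_addr) here ... */
        return 0;
    }
    ```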

    Fixes: bed81ee181dd ("powerpc/xive: introduce H_INT_ESB hcall")
    Cc: stable@vger.kernel.org # v4.14+
    Signed-off-by: Cédric Le Goater
    Tested-by: Daniel Axtens
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20191203163642.2428-1-clg@kaod.org

    Cédric Le Goater
     

01 Dec, 2019

1 commit

  • Pull powerpc updates from Michael Ellerman:
    "Highlights:

    - Infrastructure for secure boot on some bare metal Power9 machines.
    The firmware support is still in development, so the code here
    won't actually activate secure boot on any existing systems.

    - A change to xmon (our crash handler / pseudo-debugger) to restrict
    it to read-only mode when the kernel is lockdown'ed, otherwise it's
    trivial to drop into xmon and modify kernel data, such as the
    lockdown state.

    - Support for KASLR on 32-bit BookE machines (Freescale / NXP).

    - Fixes for our flush_icache_range() and __kernel_sync_dicache()
    (VDSO) to work with memory ranges >4GB.

    - Some reworks of the pseries CMM (Cooperative Memory Management)
    driver to make it behave more like other balloon drivers and enable
    some cleanups of generic mm code.

    - A series of fixes to our hardware breakpoint support to properly
    handle unaligned watchpoint addresses.

    Plus a bunch of other smaller improvements, fixes and cleanups.

    Thanks to: Alastair D'Silva, Andrew Donnellan, Aneesh Kumar K.V,
    Anthony Steinhauser, Cédric Le Goater, Chris Packham, Chris Smart,
    Christophe Leroy, Christopher M. Riedl, Christoph Hellwig, Claudio
    Carvalho, Daniel Axtens, David Hildenbrand, Deb McLemore, Diana
    Craciun, Eric Richter, Geert Uytterhoeven, Greg Kroah-Hartman, Greg
    Kurz, Gustavo L. F. Walbon, Hari Bathini, Harish, Jason Yan, Krzysztof
    Kozlowski, Leonardo Bras, Mathieu Malaterre, Mauro S. M. Rodrigues,
    Michal Suchanek, Mimi Zohar, Nathan Chancellor, Nathan Lynch, Nayna
    Jain, Nick Desaulniers, Oliver O'Halloran, Qian Cai, Rasmus Villemoes,
    Ravi Bangoria, Sam Bobroff, Santosh Sivaraj, Scott Wood, Thomas Huth,
    Tyrel Datwyler, Vaibhav Jain, Valentin Longchamp, YueHaibing"

    * tag 'powerpc-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (144 commits)
    powerpc/fixmap: fix crash with HIGHMEM
    x86/efi: remove unused variables
    powerpc: Define arch_is_kernel_initmem_freed() for lockdep
    powerpc/prom_init: Use -ffreestanding to avoid a reference to bcmp
    powerpc: Avoid clang warnings around setjmp and longjmp
    powerpc: Don't add -mabi= flags when building with Clang
    powerpc: Fix Kconfig indentation
    powerpc/fixmap: don't clear fixmap area in paging_init()
    selftests/powerpc: spectre_v2 test must be built 64-bit
    powerpc/powernv: Disable native PCIe port management
    powerpc/kexec: Move kexec files into a dedicated subdir.
    powerpc/32: Split kexec low level code out of misc_32.S
    powerpc/sysdev: drop simple gpio
    powerpc/83xx: map IMMR with a BAT.
    powerpc/32s: automatically allocate BAT in setbat()
    powerpc/ioremap: warn on early use of ioremap()
    powerpc: Add support for GENERIC_EARLY_IOREMAP
    powerpc/fixmap: Use __fix_to_virt() instead of fix_to_virt()
    powerpc/8xx: use the fixmapped IMMR in cpm_reset()
    powerpc/8xx: add __init to cpm1 init functions
    ...

    Linus Torvalds
     

22 Nov, 2019

1 commit

  • Using a mask to represent bus DMA constraints has a set of
    limitations, the biggest being that it can only encode a power of two
    (minus one). The DMA mapping code is already aware of this and treats
    dev->bus_dma_mask as a limit. This quirk is already used by some
    architectures, although it is still rare.

    With the introduction of the Raspberry Pi 4 we've found a new contender
    for the use of bus DMA limits, as its PCIe bus can only address the
    lower 3GB of memory (of a total of 4GB). This is impossible to represent
    with a mask. To make things worse, the device-tree code rounds
    non-power-of-two bus DMA limits up to the next power of two, which is
    unacceptable in
    this case.

    In the light of this, rename dev->bus_dma_mask to dev->bus_dma_limit all
    over the tree and treat it as such. Note that dev->bus_dma_limit
    should contain the highest accessible DMA address.

    Signed-off-by: Nicolas Saenz Julienne
    Reviewed-by: Robin Murphy
    Signed-off-by: Christoph Hellwig

    Nicolas Saenz Julienne
     

21 Nov, 2019

1 commit

  • There is a config item CONFIG_SIMPLE_GPIO which
    provides simple memory mapped GPIOs specific to powerpc.

    However, the only platform which selects this option is
    mpc5200, and this platform doesn't use it.

    There are three boards calling simple_gpiochip_init(), but
    as they don't select CONFIG_SIMPLE_GPIO, this is just a nop.

    Simple_gpio is redundant with the generic MMIO GPIO
    driver which can be found in drivers/gpio/ and selected via
    CONFIG_GPIO_GENERIC_PLATFORM, so drop the simple_gpio driver.

    Signed-off-by: Christophe Leroy
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/bf930402613b41b42d0441b784e0cc43fc18d1fb.1572529632.git.christophe.leroy@c-s.fr

    Christophe Leroy
     

13 Nov, 2019

1 commit

  • When the machine crash handler is invoked, all interrupts are masked
    but interrupts which have not been started yet do not have an ESB page
    mapped in the Linux address space. This crashes the 'crash kexec'
    sequence on sPAPR guests.

    To fix, force the mapping of the ESB page when an interrupt is being
    mapped in the Linux IRQ number space. This is done by setting the
    initial state of the interrupt to OFF which is not necessarily the
    case on PowerNV.

    Fixes: 243e25112d06 ("powerpc/xive: Native exploitation of the XIVE interrupt controller")
    Cc: stable@vger.kernel.org # v4.12+
    Signed-off-by: Cédric Le Goater
    Reviewed-by: Greg Kurz
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20191031063100.3864-1-clg@kaod.org

    Cédric Le Goater
     

24 Sep, 2019

1 commit

  • On a 2-socket Power9 system with 32 cores/128 threads (SMT4) and 1TB
    of memory running the following guest configs:

    guest A:
    - 224GB of memory
    - 56 VCPUs (sockets=1,cores=28,threads=2), where:
    VCPUs 0-1 are pinned to CPUs 0-3,
    VCPUs 2-3 are pinned to CPUs 4-7,
    ...
    VCPUs 54-55 are pinned to CPUs 108-111

    guest B:
    - 4GB of memory
    - 4 VCPUs (sockets=1,cores=4,threads=1)

    with the following workloads (with KSM and THP enabled in all):

    guest A:
    stress --cpu 40 --io 20 --vm 20 --vm-bytes 512M

    guest B:
    stress --cpu 4 --io 4 --vm 4 --vm-bytes 512M

    host:
    stress --cpu 4 --io 4 --vm 2 --vm-bytes 256M

    the below soft-lockup traces were observed after an hour or so and
    persisted until the host was reset (this was found to be reliably
    reproducible for this configuration, for kernels 4.15, 4.18, 5.0,
    and 5.3-rc5):

    [ 1253.183290] rcu: INFO: rcu_sched self-detected stall on CPU
    [ 1253.183319] rcu: 124-....: (5250 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=1941
    [ 1256.287426] watchdog: BUG: soft lockup - CPU#105 stuck for 23s! [CPU 52/KVM:19709]
    [ 1264.075773] watchdog: BUG: soft lockup - CPU#24 stuck for 23s! [worker:19913]
    [ 1264.079769] watchdog: BUG: soft lockup - CPU#31 stuck for 23s! [worker:20331]
    [ 1264.095770] watchdog: BUG: soft lockup - CPU#45 stuck for 23s! [worker:20338]
    [ 1264.131773] watchdog: BUG: soft lockup - CPU#64 stuck for 23s! [avocado:19525]
    [ 1280.408480] watchdog: BUG: soft lockup - CPU#124 stuck for 22s! [ksmd:791]
    [ 1316.198012] rcu: INFO: rcu_sched self-detected stall on CPU
    [ 1316.198032] rcu: 124-....: (21003 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=8243
    [ 1340.411024] watchdog: BUG: soft lockup - CPU#124 stuck for 22s! [ksmd:791]
    [ 1379.212609] rcu: INFO: rcu_sched self-detected stall on CPU
    [ 1379.212629] rcu: 124-....: (36756 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=14714
    [ 1404.413615] watchdog: BUG: soft lockup - CPU#124 stuck for 22s! [ksmd:791]
    [ 1442.227095] rcu: INFO: rcu_sched self-detected stall on CPU
    [ 1442.227115] rcu: 124-....: (52509 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=21403
    [ 1455.111787] INFO: task worker:19907 blocked for more than 120 seconds.
    [ 1455.111822] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
    [ 1455.111833] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 1455.111884] INFO: task worker:19908 blocked for more than 120 seconds.
    [ 1455.111905] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
    [ 1455.111925] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 1455.111966] INFO: task worker:20328 blocked for more than 120 seconds.
    [ 1455.111986] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
    [ 1455.111998] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 1455.112048] INFO: task worker:20330 blocked for more than 120 seconds.
    [ 1455.112068] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
    [ 1455.112097] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 1455.112138] INFO: task worker:20332 blocked for more than 120 seconds.
    [ 1455.112159] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
    [ 1455.112179] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 1455.112210] INFO: task worker:20333 blocked for more than 120 seconds.
    [ 1455.112231] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
    [ 1455.112242] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 1455.112282] INFO: task worker:20335 blocked for more than 120 seconds.
    [ 1455.112303] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
    [ 1455.112332] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 1455.112372] INFO: task worker:20336 blocked for more than 120 seconds.
    [ 1455.112392] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1

    CPUs 45, 24, and 124 are stuck on spin locks, likely held by
    CPUs 105 and 31.

    CPUs 105 and 31 are stuck in smp_call_function_many(), waiting on
    target CPU 42. For instance:

    # CPU 105 registers (via xmon)
    R00 = c00000000020b20c R16 = 00007d1bcd800000
    R01 = c00000363eaa7970 R17 = 0000000000000001
    R02 = c0000000019b3a00 R18 = 000000000000006b
    R03 = 000000000000002a R19 = 00007d537d7aecf0
    R04 = 000000000000002a R20 = 60000000000000e0
    R05 = 000000000000002a R21 = 0801000000000080
    R06 = c0002073fb0caa08 R22 = 0000000000000d60
    R07 = c0000000019ddd78 R23 = 0000000000000001
    R08 = 000000000000002a R24 = c00000000147a700
    R09 = 0000000000000001 R25 = c0002073fb0ca908
    R10 = c000008ffeb4e660 R26 = 0000000000000000
    R11 = c0002073fb0ca900 R27 = c0000000019e2464
    R12 = c000000000050790 R28 = c0000000000812b0
    R13 = c000207fff623e00 R29 = c0002073fb0ca808
    R14 = 00007d1bbee00000 R30 = c0002073fb0ca800
    R15 = 00007d1bcd600000 R31 = 0000000000000800
    pc = c00000000020b260 smp_call_function_many+0x3d0/0x460
    cfar= c00000000020b270 smp_call_function_many+0x3e0/0x460
    lr = c00000000020b20c smp_call_function_many+0x37c/0x460
    msr = 900000010288b033 cr = 44024824
    ctr = c000000000050790 xer = 0000000000000000 trap = 100

    CPU 42 is running normally, doing VCPU work:

    # CPU 42 stack trace (via xmon)
    [link register ] c00800001be17188 kvmppc_book3s_radix_page_fault+0x90/0x2b0 [kvm_hv]
    [c000008ed3343820] c000008ed3343850 (unreliable)
    [c000008ed33438d0] c00800001be11b6c kvmppc_book3s_hv_page_fault+0x264/0xe30 [kvm_hv]
    [c000008ed33439d0] c00800001be0d7b4 kvmppc_vcpu_run_hv+0x8dc/0xb50 [kvm_hv]
    [c000008ed3343ae0] c00800001c10891c kvmppc_vcpu_run+0x34/0x48 [kvm]
    [c000008ed3343b00] c00800001c10475c kvm_arch_vcpu_ioctl_run+0x244/0x420 [kvm]
    [c000008ed3343b90] c00800001c0f5a78 kvm_vcpu_ioctl+0x470/0x7c8 [kvm]
    [c000008ed3343d00] c000000000475450 do_vfs_ioctl+0xe0/0xc70
    [c000008ed3343db0] c0000000004760e4 ksys_ioctl+0x104/0x120
    [c000008ed3343e00] c000000000476128 sys_ioctl+0x28/0x80
    [c000008ed3343e20] c00000000000b388 system_call+0x5c/0x70
    --- Exception: c00 (System Call) at 00007d545cfd7694
    SP (7d53ff7edf50) is in userspace

    It was subsequently found that ipi_message[PPC_MSG_CALL_FUNCTION]
    was set for CPU 42 by at least 1 of the CPUs waiting in
    smp_call_function_many(), but somehow the corresponding
    call_single_queue entries were never processed by CPU 42, causing the
    callers to spin in csd_lock_wait() indefinitely.

    Nick Piggin suggested something similar to the following sequence as
    a possible explanation (interleaving of CALL_FUNCTION/RESCHEDULE
    IPI messages seems to be most common, but any mix of CALL_FUNCTION and
    !CALL_FUNCTION messages could trigger it):

    CPU
    X: smp_muxed_ipi_set_message():
    X: smp_mb()
    X: message[RESCHEDULE] = 1
    X: doorbell_global_ipi(42):
    X: kvmppc_set_host_ipi(42, 1)
    X: ppc_msgsnd_sync()/smp_mb()
    X: ppc_msgsnd() -> 42
    42: doorbell_exception(): // from CPU X
    42: ppc_msgsync()
    105: smp_muxed_ipi_set_message():
    105: smp_mb()
    // STORE DEFERRED DUE TO RE-ORDERING
    --105: message[CALL_FUNCTION] = 1
    | 105: doorbell_global_ipi(42):
    | 105: kvmppc_set_host_ipi(42, 1)
    | 42: kvmppc_set_host_ipi(42, 0)
    | 42: smp_ipi_demux_relaxed()
    | 42: // returns to executing guest
    | // RE-ORDERED STORE COMPLETES
    ->105: message[CALL_FUNCTION] = 1
    105: ppc_msgsnd_sync()/smp_mb()
    105: ppc_msgsnd() -> 42
    42: local_paca->kvm_hstate.host_ipi == 0 // IPI ignored
    105: // hangs waiting on 42 to process messages/call_single_queue

    This can be prevented with an smp_mb() at the beginning of
    kvmppc_set_host_ipi(), such that stores to message[] (or other
    state indicated by the host_ipi flag) are ordered vs. the store to
    host_ipi.

    However, doing so might still allow for the following scenario (not
    yet observed):

    CPU
    X: smp_muxed_ipi_set_message():
    X: smp_mb()
    X: message[RESCHEDULE] = 1
    X: doorbell_global_ipi(42):
    X: kvmppc_set_host_ipi(42, 1)
    X: ppc_msgsnd_sync()/smp_mb()
    X: ppc_msgsnd() -> 42
    42: doorbell_exception(): // from CPU X
    42: ppc_msgsync()
    // STORE DEFERRED DUE TO RE-ORDERING
    -- 42: kvmppc_set_host_ipi(42, 0)
    | 42: smp_ipi_demux_relaxed()
    | 105: smp_muxed_ipi_set_message():
    | 105: smp_mb()
    | 105: message[CALL_FUNCTION] = 1
    | 105: doorbell_global_ipi(42):
    | 105: kvmppc_set_host_ipi(42, 1)
    | // RE-ORDERED STORE COMPLETES
    -> 42: kvmppc_set_host_ipi(42, 0)
    42: // returns to executing guest
    105: ppc_msgsnd_sync()/smp_mb()
    105: ppc_msgsnd() -> 42
    42: local_paca->kvm_hstate.host_ipi == 0 // IPI ignored
    105: // hangs waiting on 42 to process messages/call_single_queue

    Fixing this scenario would require an smp_mb() *after* clearing
    host_ipi flag in kvmppc_set_host_ipi() to order the store vs.
    subsequent processing of IPI messages.

    To handle both cases, this patch splits kvmppc_set_host_ipi() into
    separate set/clear functions, where we execute smp_mb() prior to
    setting host_ipi flag, and after clearing host_ipi flag. These
    functions pair with each other to synchronize the sender and receiver
    sides.

    With that change in place the above workload ran for 20 hours without
    triggering any lock-ups.

    Fixes: 755563bc79c7 ("powerpc/powernv: Fixes for hypervisor doorbell handling") # v4.0
    Signed-off-by: Michael Roth
    Acked-by: Paul Mackerras
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20190911223155.16045-1-mdroth@linux.vnet.ibm.com

    Michael Roth
     

21 Sep, 2019

1 commit

  • Pull powerpc updates from Michael Ellerman:
    "This is a bit late, partly due to me travelling, and partly due to a
    power outage knocking out some of my test systems *while* I was
    travelling.

    - Initial support for running on a system with an Ultravisor, which
    is software that runs below the hypervisor and protects guests
    against some attacks by the hypervisor.

    - Support for building the kernel to run as a "Secure Virtual
    Machine", ie. as a guest capable of running on a system with an
    Ultravisor.

    - Some changes to our DMA code on bare metal, to allow devices with
    medium sized DMA masks (> 32 && < 59 bits) to use more than 2GB of
    DMA space.

    - Support for firmware assisted crash dumps on bare metal (powernv).

    - Two series fixing bugs in and refactoring our PCI EEH code.

    - A large series refactoring our exception entry code to use gas
    macros, both to make it more readable and also enable some future
    optimisations.

    As well as many cleanups and other minor features & fixups.

    Thanks to: Adam Zerella, Alexey Kardashevskiy, Alistair Popple, Andrew
    Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anshuman Khandual,
    Balbir Singh, Benjamin Herrenschmidt, Cédric Le Goater, Christophe
    JAILLET, Christophe Leroy, Christopher M. Riedl, Christoph Hellwig,
    Claudio Carvalho, Daniel Axtens, David Gibson, David Hildenbrand,
    Desnes A. Nunes do Rosario, Ganesh Goudar, Gautham R. Shenoy, Greg
    Kurz, Guerney Hunt, Gustavo Romero, Halil Pasic, Hari Bathini, Joakim
    Tjernlund, Jonathan Neuschafer, Jordan Niethe, Leonardo Bras, Lianbo
    Jiang, Madhavan Srinivasan, Mahesh Salgaonkar, Mahesh Salgaonkar,
    Masahiro Yamada, Maxiwell S. Garcia, Michael Anderson, Nathan
    Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Oliver
    O'Halloran, Qian Cai, Ram Pai, Ravi Bangoria, Reza Arbab, Ryan Grimm,
    Sam Bobroff, Santosh Sivaraj, Segher Boessenkool, Sukadev Bhattiprolu,
    Thiago Bauermann, Thiago Jung Bauermann, Thomas Gleixner, Tom
    Lendacky, Vasant Hegde"

    * tag 'powerpc-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (264 commits)
    powerpc/mm/mce: Keep irqs disabled during lockless page table walk
    powerpc: Use ftrace_graph_ret_addr() when unwinding
    powerpc/ftrace: Enable HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
    ftrace: Look up the address of return_to_handler() using helpers
    powerpc: dump kernel log before carrying out fadump or kdump
    docs: powerpc: Add missing documentation reference
    powerpc/xmon: Fix output of XIVE IPI
    powerpc/xmon: Improve output of XIVE interrupts
    powerpc/mm/radix: remove useless kernel messages
    powerpc/fadump: support holes in kernel boot memory area
    powerpc/fadump: remove RMA_START and RMA_END macros
    powerpc/fadump: update documentation about option to release opalcore
    powerpc/fadump: consider f/w load area
    powerpc/opalcore: provide an option to invalidate /sys/firmware/opal/core file
    powerpc/opalcore: export /sys/firmware/opal/core for analysing opal crashes
    powerpc/fadump: update documentation about CONFIG_PRESERVE_FA_DUMP
    powerpc/fadump: add support to preserve crash data on FADUMP disabled kernel
    powerpc/fadump: improve how crashed kernel's memory is reserved
    powerpc/fadump: consider reserved ranges while releasing memory
    powerpc/fadump: make crash memory ranges array allocation generic
    ...

    Linus Torvalds
     

13 Sep, 2019

2 commits

  • When dumping the XIVE state of a CPU IPI, xmon does not check
    whether the CPU is started, which can cause an error. Add a check for
    that and change the output to be on one line, just as for the XIVE
    interrupts of the machine.

    Signed-off-by: Cédric Le Goater
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20190910081850.26038-3-clg@kaod.org

    Cédric Le Goater
     
  • When looping on the list of interrupts, also display the current
    value of the PQ bits, retrieved with a load from the ESB page. This
    has the side effect of faulting in the ESB page of each interrupt.

    Signed-off-by: Cédric Le Goater
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20190910081850.26038-2-clg@kaod.org

    Cédric Le Goater
     

12 Sep, 2019

1 commit

  • There's a bug in skiboot that causes the OPAL_XIVE_ALLOCATE_IRQ call
    to return the 32-bit value 0xffffffff when OPAL has run out of IRQs.
    Unfortunately, OPAL return values are signed 64-bit entities and
    errors are supposed to be negative. If that happens, the Linux code
    confusingly treats 0xffffffff as a valid IRQ number and panics at some
    point.

    A fix was recently merged in skiboot:

    e97391ae2bb5 ("xive: fix return value of opal_xive_allocate_irq()")

    but we need a workaround anyway to support older skiboots already
    in the field.

    Internally convert 0xffffffff to OPAL_RESOURCE which is the usual error
    returned upon resource exhaustion.

    Cc: stable@vger.kernel.org # v4.12+
    Signed-off-by: Greg Kurz
    Reviewed-by: Cédric Le Goater
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/156821713818.1985334.14123187368108582810.stgit@bahia.lan

    Greg Kurz