13 Mar, 2020

20 commits

  • The PCI shutdown handler is invoked in response
    to system reboot or shutdown. A data transfer
    might still be in flight when this happens. So
    the very first action we take here is to send
    a link down notification, so that any pending
    data transfer is terminated. The rest of the actions
    are the same as those of the PCI remove handler.

    Signed-off-by: Arindam Nath
    Signed-off-by: Jon Mason

    Arindam Nath
     
  • When the driver on the local side is loaded, it sets
    SIDE_READY bit in SIDE_INFO register. Likewise, when
    it is un-loaded, it clears the bit.

    Also just after being loaded, the driver polls for
    peer SIDE_READY bit to be set. Since that bit is set
    when the peer side driver has loaded, the polling on
    local side breaks as soon as this condition is met.

    But the situation is different when the driver is
    un-loaded. Since the polling has already been stopped
    as mentioned before, if the peer side driver gets
    un-loaded, the driver on the local side is not notified
    implicitly.

    So, we improvise using the existing doorbell mechanism.
    We reserve the highest order bit of the DB register to
    send a notification to peer when the driver on local
    side is un-loaded. This also means that now we are one
    short of 16 DB events and that is taken care of in the
    valid DB mask.

    Signed-off-by: Arindam Nath
    Signed-off-by: Jon Mason

    Arindam Nath
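The effect of reserving the top doorbell bit on the valid DB mask can be sketched as below. This is a minimal userspace illustration, not the driver's code: the macro and function names are hypothetical, and only the count of 16 DB events and the use of the highest-order bit come from the commit message.

```c
#include <stdint.h>

/* Hypothetical constants for illustration; not taken from the driver source. */
#define DB_COUNT       16u                     /* DB events, per the message  */
#define DB_LINK_EVENT  (1u << (DB_COUNT - 1))  /* highest-order bit, reserved */

/* Mask of doorbell bits that remain available once the top bit is reserved. */
static uint16_t db_valid_mask(void)
{
    uint32_t all = (1u << DB_COUNT) - 1u;       /* 0xffff: all 16 bits */
    return (uint16_t)(all & ~DB_LINK_EVENT);    /* 0x7fff: 15 usable bits */
}

/* Returns 1 if a raw doorbell status carries the reserved unload notification. */
static int db_is_unload_event(uint16_t db_status)
{
    return (db_status & DB_LINK_EVENT) != 0;
}
```

With these assumptions the usable mask is 0x7fff, which is the "one short of 16 DB events" the message refers to.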
     
  • db_valid_mask is set at two places, once within
    amd_init_ntb(), and again within amd_init_dev().
    Since amd_init_ntb() is actually called from
    amd_init_dev(), setting db_valid_mask from the
    former does not really make sense. So remove it.

    Signed-off-by: Arindam Nath
    Signed-off-by: Jon Mason

    Arindam Nath
     
  • Since NTB connects two physically separate systems,
    there can be scenarios where one system goes down
    while the other one remains active. In case of NTB
    primary, if the NTB secondary goes down, a Link-Down
    event is received. For the NTB secondary, if the
    NTB primary goes down, the PCIe hotplug mechanism
    ensures that the driver on the secondary side is also
    unloaded.

    But there are other scenarios to consider as well,
    where the physical link remains active, but
    the driver on the primary or secondary side is loaded
    or un-loaded.

    When the driver is loaded, on either side, it sets the
    SIDE_READY bit (bit 1) of the SIDE_INFO register. Similarly,
    when the driver is un-loaded, it resets the same bit.

    We consider the NTB link to be up and operational
    only when the drivers on both sides of the link are loaded
    and ready. But we also need to take into account the
    Link Up and Down events, which signify the physical
    link status. So amd_link_is_up() is modified to take
    care of the above scenarios.

    Signed-off-by: Arindam Nath
    Signed-off-by: Jon Mason

    Arindam Nath
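The link-state rule described above (both drivers ready, and no physical Link-Down recorded) can be sketched as a pure function. This is an assumption-laden illustration: the event bit values and the function name are hypothetical, the way amd_link_is_up() actually combines these inputs may differ, and only SIDE_READY being bit 1 comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIDE_READY     (1u << 1)  /* bit 1 of SIDE_INFO, per the commit message */
/* Hypothetical event bits as they might be recorded in peer_sta. */
#define EVT_LINK_UP    (1u << 0)
#define EVT_LINK_DOWN  (1u << 1)

/* The link counts as up only if both drivers are ready and the
 * physical link has not been reported down. */
static bool link_is_up(uint32_t local_side_info, uint32_t peer_side_info,
                       uint32_t peer_sta)
{
    bool both_ready = (local_side_info & SIDE_READY) &&
                      (peer_side_info & SIDE_READY);
    bool phys_down = (peer_sta & EVT_LINK_DOWN) != 0;

    return both_ready && !phys_down;
}
```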
     
  • We define two new helper functions to set and clear
    sideinfo registers respectively. These functions
    take an additional boolean parameter which signifies
    whether we want to set/clear the sideinfo register
    of the peer(true) or local host(false).

    Signed-off-by: Arindam Nath
    Signed-off-by: Jon Mason

    Arindam Nath
     
  • It does not really make sense to enable or disable
    the bits of NTB_CTRL register only during enable
    and disable link callbacks. They should be done
    independent of these callbacks. The correct placement
    for that is during the amd_init_side_info() and
    amd_deinit_side_info() functions, which are invoked
    during probe and remove respectively.

    Signed-off-by: Arindam Nath
    Signed-off-by: Jon Mason

    Arindam Nath
     
  • Just like for Link-Down event, Link-Up and D3 events
    are also mutually exclusive to Link-Down and D0 events
    respectively. So we clear the bitmasks in peer_sta
    depending on event type.

    Signed-off-by: Arindam Nath
    Signed-off-by: Jon Mason

    Arindam Nath
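One way to picture the mutually exclusive event pairs is a helper that clears the opposite bit before recording the new event. This is only a sketch: the bit assignments and the name record_event() are hypothetical, and the real handler works on the driver's own peer_sta encoding.

```c
#include <stdint.h>

/* Hypothetical event encodings; the driver's actual values may differ. */
#define EVT_D0         (1u << 0)
#define EVT_D3         (1u << 1)
#define EVT_LINK_UP    (1u << 2)
#define EVT_LINK_DOWN  (1u << 3)

/* Record an event in peer_sta, first clearing its mutually exclusive partner. */
static uint32_t record_event(uint32_t peer_sta, uint32_t ev)
{
    switch (ev) {
    case EVT_LINK_UP:   peer_sta &= ~EVT_LINK_DOWN; break;
    case EVT_LINK_DOWN: peer_sta &= ~EVT_LINK_UP;   break;
    case EVT_D3:        peer_sta &= ~EVT_D0;        break;
    case EVT_D0:        peer_sta &= ~EVT_D3;        break;
    default:            break;  /* unknown events are recorded unchanged */
    }
    return peer_sta | ev;
}
```

Keeping this update in the event handler, rather than in a status query, matches the separation of duties the neighbouring commits argue for.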
     
  • Link-Up and Link-Down are mutually exclusive events.
    So when we receive a Link-Down event, we should also
    clear the bitmask for Link-Up event in peer_sta.

    Signed-off-by: Arindam Nath
    Signed-off-by: Jon Mason

    Arindam Nath
     
  • amd_link_is_up() is a callback to inquire whether
    the NTB link is up or not. So it should not be
    clearing the bitmasks of peer_sta.

    Signed-off-by: Arindam Nath
    Signed-off-by: Jon Mason

    Arindam Nath
     
  • amd_ack_smu() should only set the corresponding
    bits into SMUACK register. Setting the bitmask
    of peer_sta should be done within the event handler.
    They are two different things, and so should be
    handled differently and at different places.

    Signed-off-by: Arindam Nath
    Signed-off-by: Jon Mason

    Arindam Nath
     
  • Bit 1 of SIDE_INFO register is an indication that
    the driver on the other side of link is ready. We
    set this bit during driver initialization sequence.
    So rather than having separate macros to return the
    status, we can simply return the status of this bit
    from amd_poll_link(). So a return of 1 or 0 from
    this function will indicate to the caller whether
    the driver on the other side of link is ready or not,
    respectively.

    Signed-off-by: Arindam Nath
    Signed-off-by: Jon Mason

    Arindam Nath
     
  • Since getting the status of link is a logically separate
    operation, we simply create a new function which will
    store the link status to be used later.

    Signed-off-by: Arindam Nath
    Signed-off-by: Jon Mason

    Arindam Nath
     
  • Link-Up and Link-Down events can occur irrespective
    of whether a data transfer is in progress or not.
    So we need to enable the interrupt delivery for
    these events early during driver load.

    Signed-off-by: Arindam Nath
    Signed-off-by: Jon Mason

    Arindam Nath
     
  • The interrupt status register should be cleared
    by driver once the particular event is handled.
    The patch fixes this.

    Signed-off-by: Arindam Nath
    Signed-off-by: Jon Mason

    Arindam Nath
     
  • The design of AMD NTB implementation is such that
    NTB primary acts as an endpoint device and NTB
    secondary is an endpoint device behind a combination
    of Switch Upstream and Switch Downstream. Considering
    that, the link status and control register needs to
    be accessed differently based on the NTB topology.

    So in the case of NTB secondary, we first get the
    pointer to the Switch Downstream device for the NTB
    device. Then we get the pointer to the Switch Upstream
    device. Once we have that, we read the Link Status
    and Control register to get the correct status of
    link at the secondary.

    In the case of NTB primary, simply reading the Link
    Status and Control register of the NTB device itself
    will suffice.

    Suggested-by: Jiasen Lin
    Signed-off-by: Arindam Nath
    Signed-off-by: Jon Mason

    Arindam Nath
     
  • Since snprintf() returns the would-be-output size instead of the
    actual output size, the succeeding calls may go beyond the given
    buffer limit. Fix it by replacing with scnprintf().

    Fixes: fce8a7bb5b4b (PCI-Express Non-Transparent Bridge Support)
    Fixes: 282a2feeb9bf (NTB: Use DMA Engine to Transmit and Receive)
    Fixes: a754a8fcaf38 (NTB: allocate number transport entries depending on size of ring size)
    Fixes: d98ef99e378b (NTB: Clean up QP stats info)
    Fixes: e74bfeedad08 (NTB: Add flow control to the ntb_netdev)
    Fixes: 569410ca756c (NTB: Use unique DMA channels for TX and RX)
    Signed-off-by: Takashi Iwai
    Reviewed-by: Logan Gunthorpe
    Signed-off-by: Jon Mason

    Takashi Iwai
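The difference between the two return conventions can be reproduced in userspace. The kernel's scnprintf() is not available outside the kernel, so the sketch below uses a hypothetical stand-in, bounded_printf(), which mimics the behaviour described above by returning the number of characters actually stored; fill() is likewise an illustrative helper.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-in for scnprintf(): returns the number of characters
 * actually written (excluding the NUL), never the would-be length. */
static int bounded_printf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (size == 0)
        return 0;
    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    if (n < 0)
        return 0;
    return ((size_t)n >= size) ? (int)(size - 1) : n;
}

/* Append two strings the way the stats code accumulates output.
 * The offset can never exceed size - 1, so size - off never wraps. */
static size_t fill(char *buf, size_t size)
{
    size_t off = 0;

    off += bounded_printf(buf + off, size - off, "%s", "abcdefghij");
    off += bounded_printf(buf + off, size - off, "%s", "xyz");
    return off;
}
```

With plain snprintf() the first call on an 8-byte buffer would return 10, so the next `buf + off` would already point past the end of the buffer.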
     
  • ntb_mw_set_trans() should work as ntb_mw_clear_trans() when size == 0 and/or
    addr == 0. But an error in the xlate_pos checking condition prevents this.
    Fix the condition to make ntb_mw_clear_trans() work.

    Fixes: 87d11e645e31 (NTB: switchtec_ntb: Add memory window support)
    Signed-off-by: Alexander Fomichev
    Reviewed-by: Logan Gunthorpe
    Signed-off-by: Jon Mason

    Alexander Fomichev
     
  • The correct printk format is %pa or %pap, but not %pa[p].

    Fixes: 7f46c8b3a5523 ("NTB: ntb_tool: Add full multi-port NTB API support")
    Signed-off-by: Helge Deller
    Signed-off-by: Jon Mason

    Helge Deller
     
  • peer->outbuf is a virtual address obtained from ioremap, so it cannot
    be converted to a physical address by virt_to_page and page_to_phys.
    This conversion results in a DMA error, because the destination address
    produced by page_to_phys is invalid.

    This patch saves the MMIO address of NTB BARx in perf_setup_peer_mw,
    and maps the BAR space to a DMA address after the DMA channel is assigned.
    It then fills the destination address of the DMA descriptor with this DMA
    address, to guarantee that the addresses of memory write requests fall into
    the memory window of NTB BARx with IOMMU both enabled and disabled.

    Fixes: 5648e56d03fa ("NTB: ntb_perf: Add full multi-port NTB API support")
    Signed-off-by: Jiasen Lin
    Reviewed-by: Logan Gunthorpe
    Signed-off-by: Jon Mason

    Jiasen Lin
     
  • The offset of the PCIe Capability Header for AMD and HYGON NTB is 0x64,
    but the macro named "AMD_LINK_STATUS_OFFSET" is defined as 0x68.
    That is the offset of the Device Capabilities register rather than the
    Link Control register.

    This code triggers an error when getting the link status:

    cat /sys/kernel/debug/ntb_hw_amd/0000:43:00.1/info
    LNK STA - 0x8fa1
    Link Status - Up
    Link Speed - PCI-E Gen 0
    Link Width - x0

    This patch uses pcie_capability_read_dword to get the link status.
    With this fix, the link status is reported accurately:

    cat /sys/kernel/debug/ntb_hw_amd/0000:43:00.1/info
    LNK STA - 0x11030042
    Link Status - Up
    Link Speed - PCI-E Gen 3
    Link Width - x16

    Fixes: a1b3695820aa4 ("NTB: Add support for AMD PCI-Express Non-Transparent Bridge")
    Signed-off-by: Jiasen Lin
    Signed-off-by: Jon Mason

    Jiasen Lin
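The two debugfs dumps above can be decoded by hand. A dword read at the Link Control offset carries Link Control in the low 16 bits and Link Status in the high 16 bits; in Link Status, bits 3:0 are the current link speed and bits 9:4 the negotiated width. The sketch below is a userspace illustration with hypothetical names; the mask values mirror the standard PCIe field layout.

```c
#include <stdint.h>

/* Field layout of the PCIe Link Status register. */
#define LNKSTA_SPEED_MASK   0x000fu  /* Current Link Speed, bits 3:0 */
#define LNKSTA_WIDTH_MASK   0x03f0u  /* Negotiated Link Width, bits 9:4 */
#define LNKSTA_WIDTH_SHIFT  4

struct link_info {
    unsigned speed;  /* PCIe generation */
    unsigned width;  /* number of lanes */
};

/* Decode a dword read at Link Control: Link Status is the upper half. */
static struct link_info decode_link(uint32_t lnk_ctl_sta)
{
    uint16_t sta = (uint16_t)(lnk_ctl_sta >> 16);
    struct link_info li;

    li.speed = sta & LNKSTA_SPEED_MASK;
    li.width = (sta & LNKSTA_WIDTH_MASK) >> LNKSTA_WIDTH_SHIFT;
    return li;
}
```

Decoding 0x11030042 this way gives Gen 3 and x16, while 0x8fa1 has an all-zero upper half and so decodes to Gen 0 and x0, matching both dumps.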
     

08 Dec, 2019

2 commits


16 Oct, 2019

1 commit

  • There is no need to check the return value of debugfs_create_atomic_t as
    nothing happens with the error. Also, the code will never return NULL,
    so this check has never caught anything :)

    Fix this by removing the check entirely.

    Cc: Jon Mason
    Cc: Dave Jiang
    Cc: Allen Hubbe
    Cc: linux-ntb@googlegroups.com
    Cc: linux-kernel@vger.kernel.org
    Link: https://lore.kernel.org/r/20191011131919.GA1174815@kroah.com
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

24 Sep, 2019

6 commits

  • Fix typos in drivers/ntb/hw/idt/Kconfig.
    Use consistent spelling and capitalization.

    Fixes: bf2a952d31d2 ("NTB: Add IDT 89HPESxNTx PCIe-switches support")
    Signed-off-by: Randy Dunlap
    Cc: Dave Jiang
    Cc: Allen Hubbe
    Cc: Serge Semin
    Signed-off-by: Jon Mason

    Randy Dunlap
     
  • The new AMD hardware uses BAR23 and BAR45 as memory windows,
    whereas the previous hardware used BAR1, BAR23 and BAR45
    for memory windows.

    This patch adds support for both kinds of AMD hardware.

    Signed-off-by: Sanjay R Mehta
    Signed-off-by: Jon Mason

    Sanjay R Mehta
     
  • Signed-off-by: Sanjay R Mehta
    Signed-off-by: Jon Mason

    Sanjay R Mehta
     
  • Variable rc is initialized to a value that is never read and it
    is re-assigned later. The initialization is redundant and can be
    removed.

    Addresses-Coverity: ("Unused value")
    Signed-off-by: Colin Ian King
    Signed-off-by: Jon Mason

    Colin Ian King
     
  • On a switchtec_ntb_mw_set_trans() call, when (only) address == 0, it acts as
    ntb_mw_clear_trans(). Fix this, since address == 0 and size != 0 is a valid
    combination for setting a translation.

    Signed-off-by: Alexander Fomichev
    Reviewed-by: Logan Gunthorpe
    Signed-off-by: Jon Mason

    Alexander Fomichev
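The intended decision can be written as a small predicate: only a zero size selects the clear path, while a zero address with a non-zero size is a normal translation. This is a sketch of the rule stated in the message, not the driver's code; the enum and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the two paths a set_trans call can take. */
enum mw_action { MW_CLEAR, MW_SET };

/* Choose the path from the size alone; the address plays no part. */
static enum mw_action mw_trans_action(uint64_t addr, size_t size)
{
    (void)addr;  /* address 0 is a legitimate translation target */
    return (size == 0) ? MW_CLEAR : MW_SET;
}
```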
     
  • The second parameter of ntb_peer_mw_get_addr points to the wrong memory
    window index, because "peer gidx" is passed instead of "local gidx".

    For example, suppose the "local gidx" value is '0' and the "peer gidx"
    value is '1'.

    On the peer side, the ntb_mw_set_trans() API is used as below, with gidx
    pointing to the local side gidx, which is '0', so memory window '0' is
    chosen and XLAT '0' will be programmed by the peer side.

    ntb_mw_set_trans(perf->ntb, peer->pidx, peer->gidx, peer->inbuf_xlat,
    peer->inbuf_size);

    Now, on the local side, ntb_peer_mw_get_addr() is used as below, with gidx
    pointing to "peer gidx", which is '1', so it points to memory window '1'
    instead of memory window '0'.

    ntb_peer_mw_get_addr(perf->ntb, peer->gidx, &phys_addr,
    &peer->outbuf_size);

    So this patch passes "local gidx" as the parameter to ntb_peer_mw_get_addr().

    Signed-off-by: Sanjay R Mehta
    Signed-off-by: Jon Mason

    Sanjay R Mehta
     

06 Aug, 2019

1 commit

  • msi.c is not a module in its own right and should not have the
    MODULE_[LICENSE|VERSION|AUTHOR|DESCRIPTION] definitions.

    This caused a regression noticed by lkp with the following back
    trace:

    WARNING: CPU: 0 PID: 1 at kernel/params.c:861 param_sysfs_init+0xb1/0x20a
    Modules linked in:
    CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc1-00018-g26b3a37b928457 #2
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
    RIP: 0010:param_sysfs_init+0xb1/0x20a
    Code: 24 38 e8 ec 17 2e fd 49 8b 7c 24 38 e8 76 fe ff ff 48 85 c0 48 89 c5 74 25 31 d2 4c 89 e6 48 89 c7 e8 6d 6f 3c fd 85 c0 74 02 0b 48 89 ef 31 f6 e8 5d 70 a7 fe 48 89 ef e8 95 52 a7 fe 48 83
    RSP: 0000:ffff88806b0ffe30 EFLAGS: 00010282
    RAX: 00000000ffffffef RBX: ffffffff83774220 RCX: ffff88806a85e880
    RDX: 00000000ffffffef RSI: ffff88806b000400 RDI: ffff88806a8608c0
    RBP: ffff88806b392000 R08: ffffed100d61ff59 R09: ffffed100d61ff59
    R10: 0000000000000001 R11: ffffed100d61ff58 R12: ffffffff83974bc0
    R13: 0000000000000004 R14: 0000000000000028 R15: 00000000000003b9
    FS: 0000000000000000(0000) GS:ffff88806b800000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 000000000380e000 CR4: 00000000000406b0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    ? file_caps_disable+0x10/0x10
    ? locate_module_kobject+0xf2/0xf2
    do_one_initcall+0x47/0x1f0
    kernel_init_freeable+0x1b1/0x243
    ? rest_init+0xd0/0xd0
    kernel_init+0xa/0x130
    ? calculate_sigpending+0x63/0x80
    ? rest_init+0xd0/0xd0
    ret_from_fork+0x1f/0x30
    ---[ end trace 78201497ae74cc91 ]---

    Reported-by: kernel test robot
    Fixes: 26b3a37b9284 ("NTB: Introduce MSI library")
    Signed-off-by: Logan Gunthorpe
    Signed-off-by: Jon Mason

    Logan Gunthorpe
     

22 Jul, 2019

1 commit

  • Pull NTB updates from Jon Mason:
    "New feature to add support for NTB virtual MSI interrupts, the ability
    to test and use this feature in the NTB transport layer.

    Also, bug fixes for the AMD and Switchtec drivers, as well as some
    general patches"

    * tag 'ntb-5.3' of git://github.com/jonmason/ntb: (22 commits)
    NTB: Describe the ntb_msi_test client in the documentation.
    NTB: Add MSI interrupt support to ntb_transport
    NTB: Add ntb_msi_test support to ntb_test
    NTB: Introduce NTB MSI Test Client
    NTB: Introduce MSI library
    NTB: Rename ntb.c to support multiple source files in the module
    NTB: Introduce functions to calculate multi-port resource index
    NTB: Introduce helper functions to calculate logical port number
    PCI/switchtec: Add module parameter to request more interrupts
    PCI/MSI: Support allocating virtual MSI interrupts
    ntb_hw_switchtec: Fix setup MW with failure bug
    ntb_hw_switchtec: Skip unnecessary re-setup of shared memory window for crosslink case
    ntb_hw_switchtec: Remove redundant steps of switchtec_ntb_reinit_peer() function
    NTB: correct ntb_dev_ops and ntb_dev comment typos
    NTB: amd: Silence shift wrapping warning in amd_ntb_db_vector_mask()
    ntb_hw_switchtec: potential shift wrapping bug in switchtec_ntb_init_sndev()
    NTB: ntb_transport: Ensure qp->tx_mw_dma_addr is initaliazed
    NTB: ntb_hw_amd: set peer limit register
    NTB: ntb_perf: Clear stale values in doorbell and command SPAD register
    NTB: ntb_perf: Disable NTB link after clearing peer XLAT registers
    ...

    Linus Torvalds
     

13 Jun, 2019

9 commits

  • Introduce the module parameter 'use_msi' which, when set, uses
    MSI interrupts instead of doorbells for each queue pair (QP). The
    parameter is only available if NTB MSI support is configured into
    the kernel. We also require there to be more than one memory window
    (MW) so that an extra one is available to forward the APIC region.

    To use MSIs, we request one interrupt per QP and forward the MSI address
    and data to the peer using scratch pad registers (SPADS) above the MW
    SPADS. (If there are not enough SPADS the MSI interrupt will not be used.)

    Once registered, we simply use ntb_msi_peer_trigger and the receiving
    ISR simply queues up the rxc_db_work for the queue.

    This addition can significantly improve performance of ntb_transport.
    In a simple, untuned, apples-to-apples comparison using ntb_netdev
    and iperf with switchtec hardware, I see 3.88Gb/s without MSI
    interrupts and 14.1Gb/s with MSI, which is a more than 3x improvement.

    Signed-off-by: Logan Gunthorpe
    Cc: Dave Jiang
    Cc: Allen Hubbe
    Signed-off-by: Jon Mason

    Logan Gunthorpe
     
  • Introduce a tool to test NTB MSI interrupts similar to the other
    NTB test tools. This tool creates a debugfs directory for each
    NTB device with the following files:

    port
    irqX_occurrences
    peerX/port
    peerX/count
    peerX/trigger

    The 'port' file tells the user the local port number and the
    'occurrences' files tell the number of local interrupts that
    have been received for each interrupt.

    For each peer, the 'port' file and the 'count' file tell you the
    peer's port number and number of interrupts respectively. Writing
    the interrupt number to the 'trigger' file triggers the interrupt
    handler for the peer which should increment their corresponding
    'occurrences' file. The 'ready' file indicates if a peer is ready,
    writing to this file blocks until it is ready.

    The module parameter num_irqs can be used to set the number of
    local interrupts. By default this is 4. This is only limited by
    the number of unused MSI interrupts registered by the hardware
    (this will require support of the hardware driver) and there must
    be at least 2*num_irqs + 1 spads registers available.

    Signed-off-by: Logan Gunthorpe
    Cc: Dave Jiang
    Cc: Allen Hubbe
    Signed-off-by: Jon Mason

    Logan Gunthorpe
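The scratchpad requirement stated above is simple arithmetic and can be checked with a couple of helpers. The function names are hypothetical; only the formula 2*num_irqs + 1 and the default of 4 interrupts come from the commit message.

```c
/* Scratchpad registers needed for a given number of local interrupts,
 * using the 2*num_irqs + 1 rule from the commit message. */
static unsigned required_spads(unsigned num_irqs)
{
    return 2u * num_irqs + 1u;
}

/* Returns 1 if the hardware exposes enough scratchpads, 0 otherwise. */
static int have_enough_spads(unsigned available, unsigned num_irqs)
{
    return available >= required_spads(num_irqs);
}
```

With the default num_irqs of 4, this works out to 9 scratchpad registers.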
     
  • The NTB MSI library allows passing MSI interrupts across a memory
    window. This offers similar functionality to doorbells or messages,
    except that it will often have much better latency and the client can
    potentially use significantly more remote interrupts than typical hardware
    provides for doorbells. (Which can be important in high-multiport
    setups.)

    The library utilizes one memory window per peer and uses the highest
    index memory windows. Before any ntb_msi function may be used, the user
    must call ntb_msi_init(). It may then setup and tear down the memory
    windows when the link state changes using ntb_msi_setup_mws() and
    ntb_msi_clear_mws().

    The peer which receives the interrupt must call ntb_msim_request_irq()
    to assign the interrupt handler (this function is functionally
    similar to devm_request_irq()) and the returned descriptor must be
    transferred to the peer which can use it to trigger the interrupt.
    The triggering peer, once having received the descriptor, can
    trigger the interrupt by calling ntb_msi_peer_trigger().

    Signed-off-by: Logan Gunthorpe
    Cc: Dave Jiang
    Cc: Allen Hubbe
    Signed-off-by: Jon Mason

    Logan Gunthorpe
     
  • The kbuild system does not support having multiple source files in
    a module if one of those source files has the same name as the module.

    Therefore, we must rename ntb.c to core.c, while the module remains
    ntb.ko.

    This is similar to the way the nvme modules are structured.

    Signed-off-by: Logan Gunthorpe
    Cc: Dave Jiang
    Cc: Allen Hubbe
    Signed-off-by: Jon Mason

    Logan Gunthorpe
     
  • Switchtec does not support setting multiple MWs simultaneously. The
    driver takes a hardware lock to ensure that two peers are not doing this
    simultaneously and it fails if someone else takes the lock. In most
    cases, this is fine as clients only setup the MWs once on one side of
    the link.

    However, there's a race condition when a re-initialization is caused by
    a link event. The driver will re-setup the shared memory window
    asynchronously and this races with the client setting up its memory
    windows on the link up event.

    To fix this, we do the entire initialization in a work queue and
    signal the client once it's done.

    Signed-off-by: Joey Zhang
    Signed-off-by: Wesley Sheng
    Signed-off-by: Jon Mason

    Joey Zhang
     
  • In case of NTB crosslink topology, the setting of shared memory window in
    the virtual partition doesn't reset on peer's reboot. So skip the
    unnecessary re-setup of shared memory window for that case.

    Signed-off-by: Wesley Sheng
    Signed-off-by: Jon Mason

    Wesley Sheng
     
  • When a re-initialization is caused by a link event, the driver will
    re-setup the shared memory window. But at that time, the shared memory
    is still valid, and it's unnecessary to free, reallocate and then
    initialize it again. We only need to reconfigure the hardware
    registers. Remove the redundant steps from
    switchtec_ntb_reinit_peer() function.

    Signed-off-by: Joey Zhang
    Signed-off-by: Wesley Sheng
    Reviewed-by: Logan Gunthorpe
    Signed-off-by: Jon Mason

    Joey Zhang
     
  • This code triggers a Smatch warning:

    drivers/ntb/hw/amd/ntb_hw_amd.c:336 amd_ntb_db_vector_mask()
    warn: should '(1 << db_vector)' be a 64 bit type?

    I don't think "db_vector" can be higher than 16 so this doesn't affect
    runtime, but it's nice to silence the static checker warning and we
    might increase "ndev->db_count" in the future.

    Signed-off-by: Dan Carpenter
    Acked-by: Shyam Sundar S K
    Signed-off-by: Jon Mason

    Dan Carpenter
     
  • This code triggers a Smatch warning:

    drivers/ntb/hw/mscc/ntb_hw_switchtec.c:884 switchtec_ntb_init_sndev()
    warn: should '(1 << sndev->peer_partition)' be a 64 bit type?

    The "part_map" and "tpart_vec" variables are u64 type so this seems like
    a valid warning.

    Fixes: 3df54c870f52 ("ntb_hw_switchtec: Allow using Switchtec NTB in multi-partition setups")
    Signed-off-by: Dan Carpenter
    Reviewed-by: Logan Gunthorpe
    Signed-off-by: Jon Mason

    Dan Carpenter
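Both Smatch warnings point at the same pattern: a literal `1` is an int, so `1 << n` is evaluated in 32 bits even when the result is stored in a u64. A minimal userspace sketch of the widened form follows; the helper name bit64() is hypothetical, and kernel code would typically use its own 64-bit bit macro or a `1ULL` literal instead.

```c
#include <stdint.h>

/* Build a single-bit mask in 64-bit arithmetic. The cast must be applied
 * to the 1 before shifting; casting the result of (1 << n) is too late.
 * The caller must ensure n < 64. */
static uint64_t bit64(unsigned n)
{
    return (uint64_t)1 << n;
}
```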