30 May, 2018

1 commit

  • [ Upstream commit c9f4c6cf53bfafb639386a4c094929f13f573e04 ]

    smc allocates a certain number of CQ entries for used RoCE devices. For
    mlx5 devices the chosen constant number results in a large allocation
    causing this warning:

    [13355.124656] WARNING: CPU: 3 PID: 16535 at mm/page_alloc.c:3883 __alloc_pages_nodemask+0x2be/0x10c0
    [13355.124657] Modules linked in: smc_diag(O) smc(O) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter mlx5_ib ib_core sunrpc mlx5_core s390_trng rng_core ghash_s390 prng aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 sha_common ptp pps_core eadm_sch dm_multipath dm_mod vhost_net tun vhost tap sch_fq_codel kvm ip_tables x_tables autofs4 [last unloaded: smc]
    [13355.124672] CPU: 3 PID: 16535 Comm: kworker/3:0 Tainted: G O 4.14.0uschi #1
    [13355.124673] Hardware name: IBM 3906 M04 704 (LPAR)
    [13355.124675] Workqueue: events smc_listen_work [smc]
    [13355.124677] task: 00000000e2f22100 task.stack: 0000000084720000
    [13355.124678] Krnl PSW : 0704c00180000000 000000000029da76 (__alloc_pages_nodemask+0x2be/0x10c0)
    [13355.124681] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
    [13355.124682] Krnl GPRS: 0000000000000000 00550e00014080c0 0000000000000000 0000000000000001
    [13355.124684] 000000000029d8b6 00000000f3bfd710 0000000000000000 00000000014080c0
    [13355.124685] 0000000000000009 00000000ec277a00 0000000000200000 0000000000000000
    [13355.124686] 0000000000000000 00000000000001ff 000000000029d8b6 0000000084723720
    [13355.124708] Krnl Code: 000000000029da6a: a7110200 tmll %r1,512
    000000000029da6e: a774ff29 brc 7,29d8c0
    #000000000029da72: a7f40001 brc 15,29da74
    >000000000029da76: a7f4ff25 brc 15,29d8c0
    000000000029da7a: a7380000 lhi %r3,0
    000000000029da7e: a7f4fef1 brc 15,29d860
    000000000029da82: 5820f0c4 l %r2,196(%r15)
    000000000029da86: a53e0048 llilh %r3,72
    [13355.124720] Call Trace:
    [13355.124722] (__alloc_pages_nodemask+0xfe/0x10c0)
    [13355.124724] s390_dma_alloc+0x6e/0x148
    [13355.124733] mlx5_dma_zalloc_coherent_node+0x8e/0xe0 [mlx5_core]
    [13355.124740] mlx5_buf_alloc_node+0x70/0x108 [mlx5_core]
    [13355.124744] mlx5_ib_create_cq+0x558/0x898 [mlx5_ib]
    [13355.124749] ib_create_cq+0x48/0x88 [ib_core]
    [13355.124751] smc_ib_setup_per_ibdev+0x52/0x118 [smc]
    [13355.124753] smc_conn_create+0x65e/0x728 [smc]
    [13355.124755] smc_listen_work+0x2d2/0x540 [smc]
    [13355.124756] process_one_work+0x1be/0x440
    [13355.124758] worker_thread+0x58/0x458
    [13355.124759] kthread+0x14e/0x168
    [13355.124760] kernel_thread_starter+0x6/0xc
    [13355.124762] kernel_thread_starter+0x0/0xc
    [13355.124762] Last Breaking-Event-Address:
    [13355.124764] __alloc_pages_nodemask+0x2ba/0x10c0
    [13355.124764] ---[ end trace 34be38b581c0b585 ]---

    This patch reduces the smc constant for the maximum number of allocated
    completion queue entries, SMC_MAX_CQE, by 2 to avoid high round-up values
    in the mlx5 code. It reduces the number of allocated completion queue
    entries further if the final allocation for an mlx5 device would still
    hit the MAX_ORDER limit.
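
    The effect of the round-up can be sketched in user-space C. This is an
    illustration under stated assumptions, not the code from the patch: the
    page size, the MAX_ORDER value, the "entries + 1, rounded up to a power
    of two" sizing rule, and the halving strategy are all assumptions, and
    every sketch_* name is hypothetical.

```c
#include <assert.h>

/* Assumed values for illustration only; the real ones depend on the
 * kernel configuration and the device. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_MAX_ORDER 11

/* Smallest power of two that is >= n (for n >= 1). */
static unsigned long sketch_roundup_pow2(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Page allocation order needed for a CQ buffer with `cqe` entries of
 * `cqe_size` bytes each, assuming the driver rounds cqe + 1 up to a
 * power of two before sizing the buffer. */
static int sketch_cq_order(unsigned long cqe, unsigned long cqe_size)
{
	unsigned long bytes;
	int order = 0;

	assert(cqe_size > 0);
	bytes = sketch_roundup_pow2(cqe + 1) * cqe_size;
	while ((SKETCH_PAGE_SIZE << order) < bytes)
		order++;
	return order;
}

/* Halve the requested entry count until the buffer stays below the
 * assumed MAX_ORDER limit. */
static unsigned long sketch_clamp_cqe(unsigned long cqe, unsigned long cqe_size)
{
	while (cqe > 1 && sketch_cq_order(cqe, cqe_size) >= SKETCH_MAX_ORDER)
		cqe /= 2;
	return cqe;
}
```

    With these assumed values, a request for 32768 entries of 128 bytes
    rounds up to 65536 entries and needs an order-11 allocation, while a
    request for 32766 entries rounds up to 32768 entries and fits in
    order 10. The entry counts are illustrative and not taken from the
    patch.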

    Reported-by: Ihnken Menssen
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Ursula Braun
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to apply to
    a file was done in a spreadsheet of side-by-side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet and did an initial spot review of a few thousand files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    The criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

22 Sep, 2017

1 commit

  • In the infiniband part, SMC currently uses get_netdev which calls
    dev_hold on the returned net device. However, the SMC code never calls
    dev_put on that net device resulting in a wrong reference count.

    This patch adds a dev_put after the usage of the net device to fix the
    issue.
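
    The hold/put balance can be modelled in user-space C as follows. This is
    a minimal sketch, not the SMC code: the struct, the sketch_* helpers and
    the MAC-copy use case are hypothetical stand-ins for the kernel's net
    device, dev_hold, dev_put and get_netdev.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a net device with a reference counter. */
struct sketch_netdev {
	int refcnt;
	unsigned char mac[6];
};

static void sketch_dev_hold(struct sketch_netdev *dev)
{
	dev->refcnt++;
}

static void sketch_dev_put(struct sketch_netdev *dev)
{
	assert(dev->refcnt > 0);
	dev->refcnt--;
}

/* Models a lookup that returns the device with an extra reference held. */
static struct sketch_netdev *sketch_get_netdev(struct sketch_netdev *dev)
{
	if (dev)
		sketch_dev_hold(dev);
	return dev;
}

/* Use the device, then drop the reference taken by the lookup. */
static int sketch_fill_mac(struct sketch_netdev *dev, unsigned char out[6])
{
	struct sketch_netdev *ndev = sketch_get_netdev(dev);

	if (!ndev)
		return -1;
	memcpy(out, ndev->mac, sizeof(ndev->mac));
	sketch_dev_put(ndev);	/* without this call the count would leak */
	return 0;
}
```

    After sketch_fill_mac() returns, the reference count is the same as
    before the call, which is the invariant the patch restores.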

    Signed-off-by: Hans Wippel
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Hans Wippel
     

30 Jul, 2017

5 commits

  • Usage of send buffer "sndbuf" is synced
    (a) before filling sndbuf for cpu access
    (b) after filling sndbuf for device access

    Usage of receive buffer "RMB" is synced
    (a) before reading RMB content for cpu access
    (b) after reading RMB content for device access
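
    The ordering of the sync calls can be sketched with a small user-space
    model that only tracks buffer ownership. It is an illustration under
    assumptions, not the SMC implementation: all sketch_* names are
    hypothetical, and real code would call the DMA sync functions of the
    kernel instead of flipping a flag.

```c
#include <assert.h>
#include <string.h>

enum sketch_owner { SKETCH_OWNER_DEVICE, SKETCH_OWNER_CPU };

/* Hypothetical DMA-mapped buffer that records who may access it. */
struct sketch_dma_buf {
	enum sketch_owner owner;
	char data[64];
	size_t len;
};

static void sketch_sync_for_cpu(struct sketch_dma_buf *b)
{
	b->owner = SKETCH_OWNER_CPU;
}

static void sketch_sync_for_device(struct sketch_dma_buf *b)
{
	b->owner = SKETCH_OWNER_DEVICE;
}

/* Send side: (a) sync for CPU before filling, (b) sync for device after. */
static size_t sketch_fill_sndbuf(struct sketch_dma_buf *b, const char *src,
				 size_t len)
{
	if (len > sizeof(b->data))
		len = sizeof(b->data);
	sketch_sync_for_cpu(b);
	assert(b->owner == SKETCH_OWNER_CPU);
	memcpy(b->data, src, len);
	b->len = len;
	sketch_sync_for_device(b);
	return len;
}

/* Receive side: (a) sync for CPU before reading, (b) sync for device after. */
static size_t sketch_read_rmb(struct sketch_dma_buf *b, char *dst, size_t len)
{
	if (len > b->len)
		len = b->len;
	sketch_sync_for_cpu(b);
	assert(b->owner == SKETCH_OWNER_CPU);
	memcpy(dst, b->data, len);
	sketch_sync_for_device(b);
	return len;
}
```

    In both helpers the buffer belongs to the device again on return, so
    the CPU only touches the data between the two sync calls.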

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • SMC send buffers are processed the same way as RMBs. Since RMBs have
    been converted to sg-logic, do the same for send buffers.

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • A memory region created for a new RMB must be registered explicitly,
    before the peer can make use of it for remote DMA transfer.

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • SMC currently uses the unsafe_global_rkey of the protection domain,
    which exposes all memory for remote reads and writes once a connection
    is established. This patch introduces separate memory regions with
    separate rkeys for every RMB. Now the unsafe_global_rkey of the
    protection domain is no longer needed.
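
    Why per-RMB rkeys narrow the exposure can be sketched as a bounds check
    in user-space C. This is a simplified model, not the verbs API: the
    struct and the sketch_* function are hypothetical, and real hardware
    performs this check itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory region: one rkey covers exactly one RMB. */
struct sketch_mr {
	uint32_t rkey;
	uint64_t addr;
	uint64_t len;
};

/* Return 1 if a remote access to [addr, addr + len) is covered by the
 * region that owns `rkey`, 0 otherwise. */
static int sketch_remote_access_ok(const struct sketch_mr *mrs, size_t n,
				   uint32_t rkey, uint64_t addr, uint64_t len)
{
	size_t i;

	assert(mrs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (mrs[i].rkey != rkey)
			continue;
		/* Written to avoid overflow in addr + len. */
		return addr >= mrs[i].addr && len <= mrs[i].len &&
		       addr - mrs[i].addr <= mrs[i].len - len;
	}
	return 0;	/* unknown rkey */
}
```

    In this model an rkey handed out for one RMB cannot be used to reach
    another RMB or any other memory, in contrast to a single global rkey
    that covers everything.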

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • The follow-on patch makes use of ib_map_mr_sg() when introducing
    separate memory regions for RMBs. This function is based on
    scatterlists; thus this patch introduces scatterlists for RMBs.

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     

17 May, 2017

1 commit

  • Currently, SMC enables remote access to physical memory from the time a
    user has successfully configured and established an SMC connection until
    ten minutes after the last SMC connection is closed. Because this is
    considered a security risk, drivers are supposed to use
    IB_PD_UNSAFE_GLOBAL_RKEY in such a case.

    This patch changes the current SMC code to use IB_PD_UNSAFE_GLOBAL_RKEY.
    This improves user awareness, but does not remove the security risk itself.

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     

02 May, 2017

2 commits

  • rdma_ah_attr can now be either ib or roce, allowing
    core components to use one type or the other and also
    to define attributes unique to a specific type. struct
    ib_ah is also initialized with the type when it is first
    created. This ensures that calls such as modify_ah
    don't modify the type of the address handle attribute.

    Reviewed-by: Ira Weiny
    Reviewed-by: Don Hiatt
    Reviewed-by: Sean Hefty
    Reviewed-by: Niranjana Vishwanathapura
    Signed-off-by: Dasaratharaman Chandramouli
    Signed-off-by: Doug Ledford

    Dasaratharaman Chandramouli
     
  • Modify core and driver components to use the accessor functions
    introduced to access individual fields of rdma_ah_attr.

    Reviewed-by: Ira Weiny
    Reviewed-by: Don Hiatt
    Reviewed-by: Sean Hefty
    Reviewed-by: Niranjana Vishwanathapura
    Signed-off-by: Dasaratharaman Chandramouli
    Signed-off-by: Doug Ledford

    Dasaratharaman Chandramouli
     

12 Apr, 2017

2 commits

  • smc specifies IB_SEND_INLINE for IB_WR_SEND ib_post_send calls, but
    provides a mapped buffer to be sent. This is inconsistent, since
    IB_SEND_INLINE works without a mapped buffer. The problem was not
    detected in the past because tests had been limited to ConnectX-3 cards
    from Mellanox, whose mlx4 driver just ignored the IB_SEND_INLINE flag.
    For now, the IB_SEND_INLINE flag is removed.

    Signed-off-by: Ursula Braun
    Reviewed-by: Thomas Richter
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • The global event handler is created only if the ib_device has already
    been used by at least one link group, so the corresponding entry in the
    smc_ib_devices list is guaranteed to exist. Remove the superfluous
    check for that entry.

    Signed-off-by: Ursula Braun
    Reviewed-by: Thomas Richter
    Signed-off-by: David S. Miller

    Ursula Braun
     

10 Jan, 2017

5 commits

  • Prepare the link for RDMA transport:
    Create a queue pair (QP) and move it into the state Ready-To-Receive (RTR).
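
    The state sequence can be sketched as a small state machine in
    user-space C. This is a deliberately simplified model: the sketch_*
    names are hypothetical, only single forward steps are allowed, and the
    error and reset transitions of real queue pairs are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified subset of queue pair states, in the order they are entered. */
enum sketch_qp_state {
	SKETCH_QPS_RESET,
	SKETCH_QPS_INIT,
	SKETCH_QPS_RTR,		/* Ready-To-Receive */
	SKETCH_QPS_RTS		/* Ready-To-Send */
};

struct sketch_qp {
	enum sketch_qp_state state;
};

/* Allow only a single forward step; reject everything else. */
static int sketch_modify_qp(struct sketch_qp *qp, enum sketch_qp_state next)
{
	assert(qp != NULL);
	if ((int)next != (int)qp->state + 1)
		return -1;
	qp->state = next;
	return 0;
}

/* Create the queue pair in RESET and move it through INIT to RTR. */
static int sketch_qp_ready_to_receive(struct sketch_qp *qp)
{
	assert(qp != NULL);
	qp->state = SKETCH_QPS_RESET;
	if (sketch_modify_qp(qp, SKETCH_QPS_INIT))
		return -1;
	return sketch_modify_qp(qp, SKETCH_QPS_RTR);
}
```

    The model shows why RTR cannot be requested directly on a fresh queue
    pair: the intermediate INIT step has to come first.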

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • The base containers for RDMA transport are work requests and completion
    queue entries processed through Infiniband verbs:
    * allocate and initialize these areas
    * map these areas to DMA
    * implement the basic communication consisting of work request posting
    and reception of completion queue events

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • * allocate data RMB memory for sending and receiving
    * size depends on the maximum socket send and receive buffers
    * allocated RMBs are kept during the lifetime of the owning link group
    * map the allocated RMBs to DMA

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • Connection creation with SMC-R starts through an internal
    TCP-connection. The Ethernet interface for this TCP-connection is not
    restricted to the Ethernet interface of a RoCE device. Any existing
    Ethernet interface belonging to the same physical net can be used, as
    long as there is a defined relation between the Ethernet interface and
    some RoCE devices. This relation is defined with the help of an
    identification string called "Physical Net ID", or "pnet ID" for short.
    Information about defined pnet IDs and their related Ethernet
    interfaces and RoCE devices is stored in the SMC-R pnet table.

    A pnet table entry consists of the identifying pnet ID and the
    associated network and IB device.
    This patch adds pnet table configuration support using the
    generic netlink message interface referring to network and IB device
    by their names. Commands exist to add, delete, and display pnet table
    entries, and to flush or display the entire pnet table.

    There are cross-checks to verify whether the ethernet interfaces
    or infiniband devices really exist in the system. If either device
    is not available, the pnet ID entry is not created.
    Loss of network devices and IB devices is also monitored;
    a pnet ID entry is removed when an associated network or
    IB device is removed.
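
    The table behaviour described above can be sketched as a fixed-size
    array in user-space C. This is an illustration only, not the netlink
    interface of the patch: the table size, the name length, the sketch_*
    names and the `devices_exist` flag (standing in for the cross-checks
    against the system) are assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_PNET_MAX 8
#define SKETCH_NAME_LEN 16

/* One entry: pnet ID plus the associated network and IB device names. */
struct sketch_pnet_entry {
	int used;
	char pnet_id[SKETCH_NAME_LEN + 1];
	char ndev[SKETCH_NAME_LEN + 1];
	char ibdev[SKETCH_NAME_LEN + 1];
};

struct sketch_pnet_table {
	struct sketch_pnet_entry e[SKETCH_PNET_MAX];
};

/* Add an entry; refuse if a device is missing or the table is full. */
static int sketch_pnet_add(struct sketch_pnet_table *t, const char *pnet_id,
			   const char *ndev, const char *ibdev,
			   int devices_exist)
{
	int i;

	assert(t && pnet_id && ndev && ibdev);
	if (!devices_exist)
		return -1;
	for (i = 0; i < SKETCH_PNET_MAX; i++) {
		struct sketch_pnet_entry *e = &t->e[i];

		if (e->used)
			continue;
		snprintf(e->pnet_id, sizeof(e->pnet_id), "%s", pnet_id);
		snprintf(e->ndev, sizeof(e->ndev), "%s", ndev);
		snprintf(e->ibdev, sizeof(e->ibdev), "%s", ibdev);
		e->used = 1;
		return 0;
	}
	return -1;
}

/* Look up the IB device related to an Ethernet interface name. */
static const char *sketch_pnet_find_ibdev(const struct sketch_pnet_table *t,
					  const char *ndev)
{
	int i;

	for (i = 0; i < SKETCH_PNET_MAX; i++)
		if (t->e[i].used && strcmp(t->e[i].ndev, ndev) == 0)
			return t->e[i].ibdev;
	return NULL;
}

/* Drop every entry that refers to a removed network or IB device. */
static int sketch_pnet_remove_dev(struct sketch_pnet_table *t, const char *name)
{
	int i, removed = 0;

	for (i = 0; i < SKETCH_PNET_MAX; i++) {
		struct sketch_pnet_entry *e = &t->e[i];

		if (e->used && (strcmp(e->ndev, name) == 0 ||
				strcmp(e->ibdev, name) == 0)) {
			e->used = 0;
			removed++;
		}
	}
	return removed;
}
```

    The device names used to exercise this model, such as "eth0" and
    "mlx5_0", are examples only.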

    Signed-off-by: Thomas Richter
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Thomas Richter
     
  • * create a list of SMC IB-devices

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun