10 May, 2019

1 commit

  • Pull rdma updates from Jason Gunthorpe:
    "This has been a smaller cycle than normal. One new driver was
    accepted, which is unusual, and at least one more driver remains in
    review on the list.

    Summary:

    - Driver fixes for hns, hfi1, nes, rxe, i40iw, mlx5, cxgb4,
    vmw_pvrdma

    - Many patches from MatthewW converting radix tree and IDR users to
    use xarray

    - Introduction of tracepoints to the MAD layer

    - Build large SGLs at the start for DMA mapping and get the driver to
    split them

    - Generally clean SGL handling code throughout the subsystem

    - Support for restricting RDMA devices to net namespaces for
    containers

    - Progress to remove object allocation boilerplate code from drivers

    - Change in how the mlx5 driver shows representor ports linked to VFs

    - mlx5 uapi feature to access the on chip SW ICM memory

    - Add a new driver for 'EFA'. This is HW that supports user space
    packet processing through QPs in Amazon's cloud"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (186 commits)
    RDMA/ipoib: Allow user space differentiate between valid dev_port
    IB/core, ipoib: Do not overreact to SM LID change event
    RDMA/device: Don't fire uevent before device is fully initialized
    lib/scatterlist: Remove leftover from sg_page_iter comment
    RDMA/efa: Add driver to Kconfig/Makefile
    RDMA/efa: Add the efa module
    RDMA/efa: Add EFA verbs implementation
    RDMA/efa: Add common command handlers
    RDMA/efa: Implement functions that submit and complete admin commands
    RDMA/efa: Add the ABI definitions
    RDMA/efa: Add the com service API definitions
    RDMA/efa: Add the efa_com.h file
    RDMA/efa: Add the efa.h header file
    RDMA/efa: Add EFA device definitions
    RDMA: Add EFA related definitions
    RDMA/umem: Remove hugetlb flag
    RDMA/bnxt_re: Use core helpers to get aligned DMA address
    RDMA/i40iw: Use core helpers to get aligned DMA address within a supported page size
    RDMA/verbs: Add a DMA iterator to return aligned contiguous memory blocks
    RDMA/umem: Add API to find best driver supported page size in an MR
    ...

    Linus Torvalds
     

03 May, 2019

1 commit

  • Use core provided API to fill the source MAC address and use
    rdma_read_gid_attr_ndev_rcu() to get stable netdev.

    This is preparation patch to allow gid attribute to become NULL when
    associated net device is removed.

    Signed-off-by: Parav Pandit
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe

    Parav Pandit
     

28 Apr, 2019

1 commit

  • Add options to strictly validate messages and dump messages.
    Validating dump messages non-strictly may sometimes be required,
    so add an option for that as well.

    Since none of this can really be applied to existing commands,
    set the options everywhere using the following spatch:

    @@
    identifier ops;
    expression X;
    @@
    struct genl_ops ops[] = {
    ...,
     {
            .cmd = X,
    +       .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
            ...
     },
    ...
    };

    For new commands one should just not copy the .validate 'opt-out'
    flags and thus get strict validation.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

18 Apr, 2019

1 commit


13 Apr, 2019

8 commits

  • Rework smc_conn_create() to always return a valid DECLINE reason code.
    This removes the need to translate the return codes in 4 different
    places and makes it easy to add more detailed return codes by changing
    smc_conn_create() only.

    Signed-off-by: Karsten Graul
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Karsten Graul
     
  • Rework smc_listen_work() to provide improved reason codes when an
    SMC connection is declined. This allows better debugging on user side.
    This also adds 3 more detailed reason codes in smc_clc.h to indicate
    what type of device was not found (ism or rdma or both), or if ism
    cannot talk to the peer.

    Signed-off-by: Karsten Graul
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Karsten Graul
     
  • In smc_listen_work() the variables rc and reason_code are both defined
    and have the same meaning. Eliminate reason_code in favor of the shorter
    name rc. No functional changes.
    Rename the functions smc_check_ism() and smc_check_rdma() to
    smc_find_ism_device() and smc_find_rdma_device() to make their purpose
    clearer. No functional changes.

    Signed-off-by: Karsten Graul
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Karsten Graul
     
  • The vlan_id of the underlying CLC socket was retrieved two times
    during processing of the listen handshaking. Change this to get the
    vlan id one time in connect and in listen processing, and reuse the id.
    And add a new CLC DECLINE return code for the case when the retrieval
    of the vlan id failed.

    Signed-off-by: Karsten Graul
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Karsten Graul
     
  • During initialization of an SMC socket a lot of function parameters need
    to get passed down the function call path. Consolidate the parameters
    in a helper struct so there are few enough parameters for all of them
    to be passed by register.

    Signed-off-by: Karsten Graul
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Karsten Graul
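
    The consolidation described in the entry above can be sketched as
    follows. This is a minimal illustration, not the kernel code: the
    struct demo_init_info, its fields, and both functions are hypothetical
    names chosen for this sketch.

```c
#include <stdbool.h>

/* Hypothetical helper struct; the field names are illustrative only. */
struct demo_init_info {
    bool is_smcd;           /* SMC-D (true) or SMC-R (false) */
    unsigned short vlan_id; /* VLAN id of the CLC socket */
    int ib_port;            /* RoCE port, meaningful for SMC-R only */
    int first_contact;      /* nonzero on first contact with the peer */
};

/* Before: every value travels down the call path as its own argument. */
static int demo_select_port_args(bool is_smcd, unsigned short vlan_id,
                                 int ib_port, int first_contact)
{
    (void)vlan_id;       /* unused here, but still has to be passed along */
    (void)first_contact;
    return is_smcd ? -1 : ib_port;
}

/* After: a single pointer carries the whole set of parameters. */
static int demo_select_port(const struct demo_init_info *ini)
{
    return ini->is_smcd ? -1 : ini->ib_port;
}
```

    Adding a further parameter later then only touches the struct and the
    functions that actually use the new field.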
     
  • The check for a matching ip prefix and subnet was only done for SMC-R
    in smc_listen_rdma_check() but not when an SMC-D connection was
    possible. Rename the function into smc_listen_prfx_check() and move its
    call to a place where it is called for both SMC variants.
    And add a new CLC DECLINE reason for the case when the IP prefix or
    subnet check fails so the reason for the failing SMC connection can be
    found out more easily.

    Signed-off-by: Karsten Graul
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Karsten Graul
     
  • Correct the CLC decline reason codes for internal problems to not have
    the sign bit set, because negative reason codes are interpreted as not
    eligible for TCP fallback.

    Signed-off-by: Karsten Graul
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Karsten Graul
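
    Why the sign bit matters can be shown with a small sketch. The two
    constants and the function below are illustrative assumptions for this
    example and are not copied from smc_clc.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only: the same code with and without bit 31 set. */
#define DEMO_DECL_INTERR_OLD 0x99990000u /* sign bit set when read as int */
#define DEMO_DECL_INTERR_NEW 0x09990000u /* sign bit clear */

/*
 * Hypothetical check: a reason code whose sign bit is set would be
 * negative as a signed 32-bit value and therefore rules out TCP fallback.
 */
static bool demo_fallback_allowed(uint32_t reason_code)
{
    return (reason_code & 0x80000000u) == 0;
}
```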
     
  • For nonblocking sockets move the kernel_connect() from the connect
    worker into the initial smc_connect part to return kernel_connect()
    errors other than -EINPROGRESS to user space.

    Reviewed-by: Karsten Graul
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     

12 Apr, 2019

5 commits

  • Commit
    ("net/smc: move unhash as early as possible in smc_release()")
    fixes one occurrence in the smc code, but the same pattern exists
    in other places. This patch covers the remaining occurrences and
    makes sure the unhash operation is done before the smc->clcsock is
    released. This avoids a potential use-after-free in smc_diag_dump().

    Reviewed-by: Karsten Graul
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • The FLUSH command is used to empty the pnet table. No return code is
    expected from the command. Commit a9d8b0b1e3d6 added namespace support
    for the pnet table and changed the FLUSH command processing to call
    smc_pnet_remove_by_pnetid() to remove the pnet entries. This function
    returns -ENOENT when no entry was deleted, which is now the return code
    of the FLUSH command. As a result the FLUSH command will return an error
    when the pnet table is already empty.
    Restore the expected behavior and let FLUSH always return 0.

    Fixes: a9d8b0b1e3d6 ("net/smc: add pnet table namespace support")
    Signed-off-by: Karsten Graul
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Karsten Graul
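
    The behavior change in the entry above can be sketched as follows.
    The functions are hypothetical stand-ins: demo_remove_all() plays the
    role of smc_pnet_remove_by_pnetid(), and the table is reduced to an
    entry counter.

```c
#include <errno.h>

/* Stand-in for the removal helper: reports -ENOENT if nothing was deleted. */
static int demo_remove_all(int *entry_count)
{
    if (*entry_count == 0)
        return -ENOENT;
    *entry_count = 0;
    return 0;
}

/* FLUSH: an already empty table is not an error, so always return 0. */
static int demo_flush(int *entry_count)
{
    (void)demo_remove_all(entry_count); /* result intentionally ignored */
    return 0;
}
```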
     
  • fcntl(fd, F_SETOWN, getpid()) selects the recipient of SIGURG signals
    that are delivered when out-of-band data arrives on socket fd.
    If an SMC socket program makes use of such an fcntl() call, it fails
    in case of fallback to TCP-mode. In case of fallback the traffic is
    processed with the internal TCP socket. Propagating field "file" from the
    SMC socket to the internal TCP socket fixes the issue.

    Reviewed-by: Karsten Graul
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
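
    For reference, the user space call pattern mentioned in the entry
    above looks roughly like this. The sketch uses a plain AF_INET TCP
    socket so that it runs without SMC support, which is an assumption of
    this example; the helper name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a TCP socket and select the calling process as the recipient
 * of SIGURG signals for it. Returns the socket fd, or -1 on failure.
 */
static int demo_open_socket_with_sigurg_owner(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    if (fcntl(fd, F_SETOWN, getpid()) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

    fcntl(fd, F_GETOWN) can be used afterwards to read back the owner
    that was set.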
     
  • In case alloc_ordered_workqueue fails, the fix returns NULL
    to avoid NULL pointer dereference.

    Signed-off-by: Kangjie Lu
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Kangjie Lu
     
  • When the clcsock has already been released using sock_release() and a
    pending smc_listen_work accesses the clcsock, then that access will fail.
    Solve this by canceling and waiting for the work to complete first.
    Because the work holds the sock_lock, the lock must not be held when
    the new helper smc_clcsock_release() is invoked. And before the
    smc_listen_work starts working, check if the parent listen socket is
    still valid, otherwise stop the work early.

    Signed-off-by: Karsten Graul
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Karsten Graul
     

22 Mar, 2019

1 commit

  • Since maxattr is common, the policy can't really differ sanely,
    so make it common as well.

    The only user that did in fact manage to make a non-common policy
    is taskstats, which has to be really careful about it (since it's
    still using a common maxattr!). This is no longer supported, but
    we can fake it using pre_doit.

    This reduces the size of e.g. nl80211.o (which has lots of commands):

       text    data     bss     dec     hex filename
     398745   14323    2240  415308   6564c net/wireless/nl80211.o (before)
     397913   14331    2240  414484   65314 net/wireless/nl80211.o (after)
    --------------------------------
       -832      +8       0    -824

    Which is obviously just 8 bytes for each command, and an added 8
    bytes for the new policy pointer. I'm not sure why the ops list is
    counted as .text though.

    Most of the code transformations were done using the following spatch:
    @ops@
    identifier OPS;
    expression POLICY;
    @@
    struct genl_ops OPS[] = {
    ...,
     {
    -       .policy = POLICY,
     },
    ...
    };

    @@
    identifier ops.OPS;
    expression ops.POLICY;
    identifier fam;
    expression M;
    @@
    struct genl_family fam = {
            .ops = OPS,
            .maxattr = M,
    +       .policy = POLICY,
            ...
    };

    This also gets rid of devlink_nl_cmd_region_read_dumpit() accessing
    the cb->data as ops, which we want to change in a later genl patch.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

13 Mar, 2019

1 commit

  • Pull misc vfs updates from Al Viro:
    "Assorted fixes (really no common topic here)"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    vfs: Make __vfs_write() static
    vfs: fix preadv64v2 and pwritev64v2 compat syscalls with offset == -1
    pipe: stop using ->can_merge
    splice: don't merge into linked buffers
    fs: move generic stat response attr handling to vfs_getattr_nosec
    orangefs: don't reinitialize result_mask in ->getattr
    fs/devpts: always delete dcache dentry-s in dput()

    Linus Torvalds
     

01 Mar, 2019

1 commit

  • Without hardware pnetid support there must currently be a pnet
    table configured to determine the IB device port to be used for SMC
    RDMA traffic. This patch enables a setup without pnet table, if
    the used handshake interface belongs already to a RoCE port.

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     

25 Feb, 2019

1 commit

  • Three conflicts, one of which, for marvell10g.c is non-trivial and
    requires some follow-up from Heiner or someone else.

    The issue is that Heiner converted the marvell10g driver over to
    use the generic c45 code as much as possible.

    However, in 'net' a bug fix appeared which makes sure that a new
    local mask (MDIO_AN_10GBT_CTRL_ADV_NBT_MASK) with value 0x01e0
    is cleared.

    Signed-off-by: David S. Miller

    David S. Miller
     

22 Feb, 2019

6 commits

  • SMC-D devices are identified by their PCI IDs in the pnet table. In
    order to make usage of the pnet table more consistent for users, this
    patch adds this form of identification for ib devices as well.

    Signed-off-by: Hans Wippel
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Hans Wippel
     
  • This patch adds namespace support to the pnet table code. Each network
    namespace gets its own pnet table. Infiniband and smcd device pnetids
    can only be modified in the initial namespace. In other namespaces they
    can still be used as if they were set by the underlying hardware.

    Signed-off-by: Hans Wippel
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Hans Wippel
     
  • Currently, users can only set pnetids for netdevs and ib devices in the
    pnet table. This patch adds support for smcd devices to the pnet table.

    Signed-off-by: Hans Wippel
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Hans Wippel
     
  • If a device does not have a pnetid, users can set a temporary pnetid for
    said device in the pnet table. This patch reworks the pnet table to make
    it more flexible. Multiple entries with the same pnetid but differing
    devices are now allowed. Additionally, the netlink interface now sends
    each mapping from pnetid to device separately to the user while
    maintaining the message format existing applications might expect. Also,
    the SMC data structure for ib devices already has a pnetid attribute.
    So, it is used to store the user defined pnetids. As a result, the pnet
    table entries are only used for netdevs.

    Signed-off-by: Hans Wippel
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Hans Wippel
     
  • Use local variable pflags from the beginning of function
    smcr_tx_sndbuf_nonempty

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • smc_poll() returns with mask bit EPOLLPRI if the connection urg_state
    is SMC_URG_VALID. Since SMC_URG_VALID is zero, smc_poll signals
    EPOLLPRI erroneously if called in state SMC_INIT before the connection
    is created, for instance in a non-blocking connect scenario.

    This patch switches to non-zero values for the urg states.

    Reviewed-by: Karsten Graul
    Fixes: de8474eb9d50 ("net/smc: urgent data support")
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
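
    The pitfall fixed by the entry above can be reproduced in a few lines.
    Only the fact that SMC_URG_VALID was zero comes from the commit
    message; the enum names, the other values, and the structs are
    assumptions made for this sketch.

```c
#include <stdbool.h>
#include <string.h>

/* Old scheme: the "valid" state shares the value of zeroed memory. */
enum demo_urg_old { DEMO_OLD_URG_VALID = 0, DEMO_OLD_URG_NOTYET, DEMO_OLD_URG_READ };
/* New scheme: all states are non-zero, so zeroed memory matches none. */
enum demo_urg_new { DEMO_NEW_URG_VALID = 1, DEMO_NEW_URG_NOTYET, DEMO_NEW_URG_READ };

struct demo_conn_old { enum demo_urg_old urg_state; };
struct demo_conn_new { enum demo_urg_new urg_state; };

/* A zero-initialized, not yet created connection looks "urgent valid". */
static bool demo_old_signals_pri(void)
{
    struct demo_conn_old conn;

    memset(&conn, 0, sizeof(conn));
    return conn.urg_state == DEMO_OLD_URG_VALID;
}

/* With non-zero states the same zeroed connection no longer matches. */
static bool demo_new_signals_pri(void)
{
    struct demo_conn_new conn;

    memset(&conn, 0, sizeof(conn));
    return conn.urg_state == DEMO_NEW_URG_VALID;
}
```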
     

16 Feb, 2019

1 commit

  • The netfilter conflicts were rather simple overlapping
    changes.

    However, the cls_tcindex.c stuff was a bit more complex.

    On the 'net' side, Cong is fixing several races and memory
    leaks. Whilst on the 'net-next' side we have Vlad adding
    the rtnl-ness support.

    What I've decided to do, in order to resolve this, is revert the
    conversion over to using a workqueue that Cong did, bringing us back
    to pure RCU. I did it this way because I believe that either Cong's
    races don't apply with how Vlad did things, or Cong will have to
    implement the race fix slightly differently.

    Signed-off-by: David S. Miller

    David S. Miller
     

13 Feb, 2019

6 commits

  • For robustness, protect against higher port numbers than expected to
    avoid setting bits beyond our port_event_mask. In case of a DEVICE_FATAL
    event all ports must be checked. The IB_EVENT_GID_CHANGE event is
    provided in the global event handler, so handle it there. And handle a
    QP_FATAL event instead of a DEVICE_FATAL event in the qp handler.

    Signed-off-by: Karsten Graul
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Karsten Graul
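
    The bounds check and the all-ports handling described in the entry
    above can be sketched like this. DEMO_MAX_PORTS, the struct, and the
    function names are hypothetical; the kernel code uses its own bitmap
    helpers and limits.

```c
#include <stdbool.h>

#define DEMO_MAX_PORTS 2 /* hypothetical number of supported ports */

struct demo_ib_dev {
    unsigned long port_event_mask; /* one bit per port, bit 0 = port 1 */
};

/* Record an event for a 1-based port; unexpected port numbers are ignored. */
static bool demo_mark_port_event(struct demo_ib_dev *dev, unsigned int port_num)
{
    if (port_num < 1 || port_num > DEMO_MAX_PORTS)
        return false; /* would set a bit beyond the mask */
    dev->port_event_mask |= 1UL << (port_num - 1);
    return true;
}

/* A device-wide fatal event means every port has to be checked. */
static void demo_mark_all_ports(struct demo_ib_dev *dev)
{
    for (unsigned int port = 1; port <= DEMO_MAX_PORTS; port++)
        demo_mark_port_event(dev, port);
}
```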
     
  • Remove the shortcut that smc_lgr_free() would skip the check for
    existing connections when the link group is not in the link group list.

    Signed-off-by: Karsten Graul
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Karsten Graul
     
  • In smc_cdc_msg_recv_action() the received cdc message is evaluated.
    To reduce the number of messages triggered by this evaluation the logic
    is streamlined. For the write_blocked condition we do not need to send
    a response immediately. The remaining conditions can be put together
    into one if clause.

    Signed-off-by: Karsten Graul
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Karsten Graul
     
  • When no free transfer buffers are available then a work to call
    smc_tx_work() is scheduled. Set the schedule delay to zero, because for
    the out-of-buffers condition the work can start immediately and will
    block in the called function smc_wr_tx_get_free_slot(), waiting for free
    buffers.

    Signed-off-by: Karsten Graul
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Karsten Graul
     
  • Move the call to smc_close_wake_tx_prepared() (which wakes up a possibly
    waiting close processing that might wait for 'all data sent') to
    smc_tx_sndbuf_nonempty() (which is the main function to send data).

    Signed-off-by: Karsten Graul
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Karsten Graul
     
  • When an updated rx_cursor_confirmed field was sent to the peer then
    reset the cons_curs_upd_req flag. And remove the duplicate reset and
    cursor update in smc_tx_consumer_update().

    Signed-off-by: Karsten Graul
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Karsten Graul
     

09 Feb, 2019

2 commits


08 Feb, 2019

4 commits

  • Commit ed75986f4aae ("net/smc: ipv6 support for smc_diag.c") changed the
    value of the diag_family field. The idea was to indicate the family of
    the IP address in the inet_diag_sockid field. But the change makes it
    impossible to distinguish an inet_sock_diag response message from SMC
    sock_diag response. This patch restores the original behaviour and sends
    AF_SMC as value of the diag_family field.

    Fixes: ed75986f4aae ("net/smc: ipv6 support for smc_diag.c")
    Reported-by: Eugene Syromiatnikov
    Signed-off-by: Karsten Graul
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Karsten Graul
     
  • The lgr field of an smc_connection is set in smc_conn_create() and
    should be cleared in smc_conn_free() for consistency reasons, so move
    the responsible code.

    Signed-off-by: Karsten Graul
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Karsten Graul
     
  • If SMC client and server connections are both established at the same
    time, smc_connect_rdma() cannot send a CLC confirm message while
    smc_listen_work() is waiting for one due to lock contention. This can
    result in timeouts in smc_clc_wait_msg() and failed SMC connections.

    In case of SMC-R, there are two types of LGRs (client and server LGRs)
    which can be protected by separate locks. So, this patch splits the LGR
    pending lock into two separate locks for client and server to avoid the
    locking issue for SMC-R.

    Signed-off-by: Hans Wippel
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Hans Wippel
     
  • If SMC client and server connections are both established at the same
    time, smc_connect_ism() cannot send a CLC confirm message while
    smc_listen_work() is waiting for one due to lock contention. This can
    result in timeouts in smc_clc_wait_msg() and failed SMC connections.

    In case of SMC-D, the LGR pending lock is not needed while
    smc_listen_work() is waiting for the CLC confirm message. So, this patch
    releases the lock earlier for SMC-D to avoid the locking issue.

    Signed-off-by: Hans Wippel
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Hans Wippel