21 Dec, 2016

1 commit

  • To make the code clearer, use rb_entry() instead of container_of() to
    deal with rbtree.

    Signed-off-by: Geliang Tang
    Reviewed-by: Leon Romanovsky
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Geliang Tang
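The relationship between the two macros can be shown with a minimal userspace sketch. It uses simplified stand-ins: the `struct rb_node` layout and the `struct item` type are hypothetical, and `container_of()` is the plain `offsetof()`-based form without the kernel's type checking. In the kernel, `rb_entry()` expands to `container_of()`, so the change is about readability: the name tells the reader that the pointer being converted is an rbtree node.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct rb_node (assumed layout). */
struct rb_node {
	struct rb_node *rb_left;
	struct rb_node *rb_right;
};

/* Plain offsetof()-based container_of(), without the kernel's type checks. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* rb_entry() is a thin, self-documenting wrapper around container_of(). */
#define rb_entry(ptr, type, member) container_of(ptr, type, member)

/* Hypothetical object that is linked into an rbtree via 'node'. */
struct item {
	int key;
	struct rb_node node;
};

/* Map a tree node back to its enclosing item and return the item's key. */
static int item_key(struct rb_node *n)
{
	return rb_entry(n, struct item, node)->key;
}
```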
     

16 Dec, 2016

1 commit

  • Pull rdma updates from Doug Ledford:
    "This is the complete update for the rdma stack for this release cycle.

    Most of it is typical driver and core updates, but there is the
    entirely new VMWare pvrdma driver. You may have noticed that there
    were changes in DaveM's pull request to the bnxt Ethernet driver to
    support a RoCE RDMA driver. The bnxt_re driver was tentatively set to
    be pulled in this release cycle, but it simply wasn't ready in time
    and was dropped (a few review comments still to address, and some
    multi-arch build issues like prefetch() not working across all
    arches).

    Summary:

    - shared mlx5 updates with net stack (will drop out on merge if
    Dave's tree has already been merged)

    - driver updates: cxgb4, hfi1, hns-roce, i40iw, mlx4, mlx5, qedr, rxe

    - debug cleanups

    - new connection rejection helpers

    - SRP updates

    - various misc fixes

    - new paravirt driver from vmware"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (210 commits)
    IB: Add vmw_pvrdma driver
    IB/mlx4: fix improper return value
    IB/ocrdma: fix bad initialization
    infiniband: nes: return value of skb_linearize should be handled
    MAINTAINERS: Update Intel RDMA RNIC driver maintainers
    MAINTAINERS: Remove Mitesh Ahuja from emulex maintainers
    IB/core: fix unmap_sg argument
    qede: fix general protection fault may occur on probe
    IB/mthca: Replace pci_pool_alloc by pci_pool_zalloc
    mlx5, calc_sq_size(): Make a debug message more informative
    mlx5: Remove a set-but-not-used variable
    mlx5: Use { } instead of { 0 } to init struct
    IB/srp: Make writing the add_target sysfs attr interruptible
    IB/srp: Make mapping failures easier to debug
    IB/srp: Make login failures easier to debug
    IB/srp: Introduce a local variable in srp_add_one()
    IB/srp: Fix CONFIG_DYNAMIC_DEBUG=n build
    IB/multicast: Check ib_find_pkey() return value
    IPoIB: Avoid reading an uninitialized member variable
    IB/mad: Fix an array index check
    ...

    Linus Torvalds
     

15 Dec, 2016

1 commit


04 Dec, 2016

1 commit

  • Couple conflicts resolved here:

    1) In the MACB driver, a bug fix to properly initialize the
    RX tail pointer overlapped with some changes
    to support variable sized rings.

    2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix
    overlapping with a reorganization of the driver to support
    ACPI, OF, as well as PCI variants of the chip.

    3) In 'net' we had several probe error path bug fixes to the
    stmmac driver, meanwhile a lot of this code was cleaned up
    and reorganized in 'net-next'.

    4) The cls_flower classifier obtained a helper function in
    'net-next' called __fl_delete() and this overlapped with
    Daniel Borkmann's bug fix to use RCU for object destruction
    in 'net'. It also overlapped with Jiri's change to guard
    the rhashtable_remove_fast() call with a check against
    tc_skip_sw().

    5) In mlx4, a revert bug fix in 'net' overlapped with some
    unrelated changes in 'net-next'.

    6) In geneve, a stale header pointer after pskb_expand_head()
    bug fix in 'net' overlapped with a large reorganization of
    the same code in 'net-next'. Since the 'net-next' code no
    longer had the bug in question, there was nothing to do
    other than to simply take the 'net-next' hunks.

    Signed-off-by: David S. Miller

    David S. Miller
     

03 Dec, 2016

1 commit


18 Nov, 2016

4 commits

  • Make struct pernet_operations::id unsigned.

    There are 2 reasons to do so:

    1)
    This field is really an index into a zero-based array and
    thus is an unsigned entity. Using a negative value is an
    out-of-bounds access by definition.

    2)
    On x86_64, unsigned 32-bit data which are mixed with pointers
    via array indexing, or via offsets added to or subtracted from
    pointers, are preferred to signed 32-bit data.

    "int" being used as an array index needs to be sign-extended
    to 64-bit before being used.

    void f(long *p, int i)
    {
        g(p[i]);
    }

    roughly translates to

    movsx rsi, esi
    mov rdi, [rsi+...]
    call g

    MOVSX is a 3-byte instruction which isn't necessary if the variable is
    unsigned, because on x86_64 writes to 32-bit registers zero-extend by default.

    Now, there is the net_generic() function which, you guessed it, uses
    "int" as an array index:

    static inline void *net_generic(const struct net *net, int id)
    {
        ...
        ptr = ng->ptr[id - 1];
        ...
    }

    And this function is used a lot, so those sign extensions add up.

    The patch snipes ~1730 bytes off an allyesconfig kernel (built without
    all the junk that messes with code generation):

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

    Unfortunately, some functions actually grow bigger.
    This is a seemingly random artefact of code generation, with the register
    allocator being used differently. gcc decides that some variable
    needs to live in the new r8+ registers, and every access now requires a REX
    prefix. Or it is shifted into r12, so the [r12+0] addressing mode has to be
    used, which is longer than [r8].

    However, the overall balance is in the negative direction:

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
    function                      old     new   delta
    nfsd4_lock                   3886    3959     +73
    tipc_link_build_proto_msg    1096    1140     +44
    mac80211_hwsim_new_radio     2776    2808     +32
    tipc_mon_rcv                 1032    1058     +26
    svcauth_gss_legacy_init      1413    1429     +16
    tipc_bcbase_select_primary    379     392     +13
    nfsd4_exchange_id            1247    1260     +13
    nfsd4_setclientid_confirm     782     793     +11
    ...
    put_client_renew_locked       494     480     -14
    ip_set_sockfn_get             730     716     -14
    geneve_sock_add               829     813     -16
    nfsd4_sequence_done           721     703     -18
    nlmclnt_lookup_host           708     686     -22
    nfsd4_lockt                  1085    1063     -22
    nfs_get_client               1077    1050     -27
    tcf_bpf_init                 1106    1076     -30
    nfsd4_encode_fattr           5997    5930     -67
    Total: Before=154856051, After=154854321, chg -0.00%

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
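The array-index argument can be made concrete with a small userspace sketch. `struct ng_sketch` and `ng_sketch_get()` are hypothetical simplifications of `struct net_generic` and `net_generic()`, and the bounds check is an illustrative addition, not something the kernel helper is claimed to do. The sketch shows a side benefit of an unsigned, 1-based id: `id == 0` makes `id - 1` wrap around to `UINT_MAX`, so one unsigned comparison rejects both zero and out-of-range ids.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct net_generic:
 * a slot count plus a fixed-size array of per-subsystem pointers. */
struct ng_sketch {
	unsigned int len;
	void *ptr[8];
};

/* ids are 1-based, so slot id - 1 holds the pointer, as in net_generic().
 * With an unsigned id, id == 0 wraps id - 1 around to UINT_MAX, so this
 * single unsigned comparison rejects 0 as well as ids past the end. */
static void *ng_sketch_get(const struct ng_sketch *ng, unsigned int id)
{
	if (id - 1 >= ng->len)
		return NULL;
	return ng->ptr[id - 1];
}
```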
     
  • When 2 RDS peers initiate an RDS-TCP connection simultaneously,
    there is a potential for "duelling syns" on either/both sides.
    See commit 241b271952eb ("RDS-TCP: Reset tcp callbacks if re-using an
    outgoing socket in rds_tcp_accept_one()") for a description of this
    condition, and the arbitration logic which ensures that the
    numerically larger IP address in the TCP connection is bound to the
    RDS_TCP_PORT ("canonical ordering").

    The rds_connection should not be marked as RDS_CONN_UP until the
    arbitration logic has converged, for the following reason. The sender
    may start transmitting RDS datagrams as soon as RDS_CONN_UP is set,
    and the sender removes datagrams from the rds_connection's
    cp_retrans queue based on TCP acks. If the TCP ack was sent from
    a TCP socket that got reset as part of duel arbitration (but
    before data was delivered to the receiver's RDS socket layer),
    the sender may end up prematurely freeing the datagram, and
    the datagram is no longer reliably deliverable.

    This patch remedies that condition by making sure that, upon
    receipt of the TCP_ESTABLISHED state change notification that signals
    completion of the three-way handshake (3WH) in rds_tcp_state_change,
    we mark the rds_connection as RDS_CONN_UP
    if, and only if, the IP addresses and ports for the connection are
    canonically ordered. In all other cases, rds_tcp_state_change will
    force an rds_conn_path_drop(), and rds_queue_reconnect() on
    both peers will restart the connection to ensure canonical ordering.

    A side-effect of enforcing this condition in rds_tcp_state_change()
    is that rds_tcp_accept_one_path() can now be refactored for simplicity.
    It is also no longer possible to encounter an RDS_CONN_UP connection in
    the arbitration logic in rds_tcp_accept_one().

    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
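The canonical-ordering rule and the resulting decision on TCP_ESTABLISHED can be sketched as a pure function. This is a hypothetical illustration, not the kernel code: addresses are IPv4 in host byte order, `RDS_TCP_PORT_SKETCH` stands in for RDS_TCP_PORT with an assumed value, and the two return values simply name the two outcomes described above (mark RDS_CONN_UP, or drop the path so both peers reconnect).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value, standing in for RDS_TCP_PORT in this sketch. */
#define RDS_TCP_PORT_SKETCH 16385

/* Hypothetical names for the two outcomes on TCP_ESTABLISHED. */
enum est_action { MARK_CONN_UP, DROP_AND_RECONNECT };

/* Canonical ordering: the endpoint with the numerically larger IPv4
 * address (host byte order) must be the one bound to the RDS port. */
static bool is_canonical(uint32_t laddr, uint16_t lport,
			 uint32_t faddr, uint16_t fport)
{
	if (laddr > faddr)
		return lport == RDS_TCP_PORT_SKETCH;
	return fport == RDS_TCP_PORT_SKETCH;
}

/* Only a canonically ordered connection may be marked up; anything else
 * is dropped so that both peers reconnect in canonical order. */
static enum est_action on_established(uint32_t laddr, uint16_t lport,
				      uint32_t faddr, uint16_t fport)
{
	return is_canonical(laddr, lport, faddr, fport) ? MARK_CONN_UP
							 : DROP_AND_RECONNECT;
}
```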
     
  • The RDS transport has to be able to distinguish between
    two types of failure events:
    (a) when the transport fails (e.g., TCP connection reset)
    but the RDS socket/connection layer on both sides stays
    the same
    (b) when the peer's RDS layer itself resets (e.g., due to module
    reload or machine reboot at the peer)
    In case (a) both sides must reconnect and continue the RDS messaging
    without any message loss or disruption to the message sequence numbers,
    and this is achieved by rds_send_path_reset().

    In case (b) we should reset all rds_connection state to the
    new incarnation of the peer. Examples of state that needs to
    be reset are next expected rx sequence number from, or messages to be
    retransmitted to, the new incarnation of the peer.

    To achieve this, the RDS handshake probe added as part of
    commit 5916e2c1554f ("RDS: TCP: Enable multipath RDS for TCP")
    is enhanced so that sender and receiver of the RDS ping-probe
    will add a generation number as part of the RDS_EXTHDR_GEN_NUM
    extension header. Each peer stores local and remote generation
    numbers as part of each rds_connection. Changes in generation
    number will be detected via incoming handshake probe ping
    request or response and will allow the receiver to reset rds_connection
    state.

    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
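A minimal sketch of the receiving side of the generation-number check follows. The struct and its fields are hypothetical stand-ins for the rds_connection state named above (next expected rx sequence number, messages queued for retransmission). Treating a stored peer generation of 0 as "not yet learned" is an assumption made so that the very first probe records the number without triggering a reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection state; field names are illustrative only. */
struct conn_sketch {
	uint32_t my_gen;        /* our own generation number, sent in probes */
	uint32_t peer_gen;      /* last generation number seen from the peer */
	uint64_t next_rx_seq;   /* next expected rx sequence number */
	unsigned int n_retrans; /* messages queued for retransmission */
};

/* Called with the generation number carried in an incoming handshake
 * probe (ping request or response). Returns true if state was reset. */
static bool handle_probe_gen(struct conn_sketch *c, uint32_t probe_gen)
{
	bool had_peer;

	if (probe_gen == c->peer_gen)
		return false;   /* same incarnation, case (a): keep state */

	had_peer = c->peer_gen != 0;   /* 0 = not yet learned (assumption) */
	c->peer_gen = probe_gen;
	if (!had_peer)
		return false;   /* first probe: just record the peer's number */

	/* New incarnation of the peer, case (b): drop per-peer state. */
	c->next_rx_seq = 0;
	c->n_retrans = 0;
	return true;
}
```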
     
  • As noted in rds_recv_incoming(), sequence numbers on data packets
    can decrease in the failover case, and the Rx path is equipped
    to recover from this if RDS_FLAG_RETRANSMITTED is set
    on the rds header of an incoming message with a suspect sequence
    number.

    RDS_FLAG_RETRANSMITTED is predicated on the RDS_MSG_RETRANSMITTED
    flag in the rds_message, so make sure the flag is set on messages
    queued for retransmission.

    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
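The link between the per-message flag and the on-wire header flag can be sketched as follows. The struct, helper names and bit values are illustrative assumptions; only the idea that the header flag is derived at transmit time from a flag set when the message is queued for retransmission comes from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values, not the kernel's definitions. */
#define MSG_RETRANSMITTED_BIT   (1u << 0)  /* per-message (rds_message) flag */
#define HDR_FLAG_RETRANSMITTED  0x04       /* flag in the on-wire rds header */

struct msg_sketch {
	unsigned int m_flags;  /* local bookkeeping flags */
	uint8_t h_flags;       /* flags carried in the on-wire header */
};

/* When a message is put back on the retransmit queue, remember that. */
static void queue_for_retransmit(struct msg_sketch *m)
{
	m->m_flags |= MSG_RETRANSMITTED_BIT;
}

/* At transmit time the header flag is derived from the message flag, so
 * the receiver only sees it if queue_for_retransmit() set the bit. */
static void prepare_header(struct msg_sketch *m)
{
	if (m->m_flags & MSG_RETRANSMITTED_BIT)
		m->h_flags |= HDR_FLAG_RETRANSMITTED;
}
```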
     

10 Nov, 2016

2 commits

  • The for() loop in rds_tcp_accept_one() assumes that the 0'th
    rds_tcp_conn_path is UP and starts multipath accepts at index 1.
    But this assumption may not always be true: if the 0'th path
    has failed (ERROR or DOWN state) an incoming connection request
    should be used to resurrect this path.

    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
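The fix amounts to not skipping path 0 when looking for a path to service an incoming connection. Below is a hypothetical sketch of that selection, with made-up state names and a simple "first path that is DOWN or in ERROR" rule; the actual rds_tcp_accept_one() logic is not reproduced here.

```c
#include <assert.h>

/* Hypothetical path states, loosely modelled on RDS connection states. */
enum path_state { PATH_DOWN, PATH_CONNECTING, PATH_UP, PATH_ERROR };

/* Return the index of the first path that can take an incoming
 * connection, or -1 if none can. Scanning from 0 rather than 1 lets a
 * failed path 0 be resurrected by an incoming connection request. */
static int pick_accept_path(const enum path_state *paths, int npaths)
{
	for (int i = 0; i < npaths; i++) {
		if (paths[i] == PATH_DOWN || paths[i] == PATH_ERROR)
			return i;
	}
	return -1;
}
```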
     
  • The socket argument passed to rds_tcp_tc_info() is a PF_RDS socket,
    so it is incorrect to report the address and port info based on
    rds_getname() as part of the TCP state report.

    Invoke inet_getname() for the t_sock associated with the
    rds_tcp_connection instead.

    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

31 Oct, 2016

1 commit


30 Oct, 2016

1 commit

  • rds uses a Kconfig option called "RDS_DEBUG" to enable rds debug messages.
    This option causes the rds Makefile to add -DDEBUG to the rds gcc command
    line.

    When CONFIG_DYNAMIC_DEBUG is enabled, the "DEBUG" macro is used by
    include/linux/dynamic_debug.h to decide if dynamic debug prints should
    be sent by default to the kernel log.

    rds should not enable this macro for production builds. rds dynamic
    debug works as expected following this fix.

    Signed-off-by: Shamir Rabinovitch
    Acked-by: Santosh Shilimkar
    Reviewed-by: Wengang Wang
    Signed-off-by: David S. Miller

    shamir rabinovitch
     

17 Oct, 2016

2 commits


24 Sep, 2016

1 commit

  • Instead of exposing ib_get_dma_mr to ULPs and letting them use it more or
    less unchecked, this moves the capability of creating a global rkey into
    the RDMA core, where it can be easily audited. It also prints a warning
    every time this feature is used.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Jason Gunthorpe
    Reviewed-by: Steve Wise
    Signed-off-by: Doug Ledford

    Christoph Hellwig
     

09 Aug, 2016

1 commit


16 Jul, 2016

3 commits

  • Use RDS probe-ping to compute how many paths may be used with
    the peer, and to synchronously start the multiple paths. If multipath
    RDS (mprds) is supported by the transport, hash outgoing traffic to one
    of the multiple paths in rds_sendmsg().

    CC: Santosh Shilimkar
    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
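A hypothetical sketch of the send-side path choice: when the transport is multipath-capable, a per-socket value is hashed onto one of the negotiated paths, otherwise path 0 is used. Hashing on the socket's bound port with a plain modulo is an assumption for illustration. Hashing a per-socket value (rather than, say, round-robin per datagram) keeps consecutive sends from one socket on one path.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical path selection for an outgoing datagram. The bound port
 * as hash input and the modulo "hash" are illustrative assumptions. */
static unsigned int pick_send_path(uint16_t bound_port, unsigned int npaths,
				   int mp_capable)
{
	if (!mp_capable || npaths <= 1)
		return 0;               /* single-path transport: always path 0 */
	return bound_port % npaths;     /* same socket -> same path every time */
}
```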
     
  • Some code duplication in rds_tcp_reset_callbacks() can be avoided
    by having the function call rds_tcp_restore_callbacks() and
    rds_tcp_set_callbacks().

    Acked-by: Santosh Shilimkar
    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     
  • As the existing comments in rds_tcp_listen_data_ready() indicate,
    it is possible under some race-windows to get to this function with the
    accept() socket. If that happens, we could run into a sequence whereby

    thread 1                              thread 2

    rds_tcp_accept_one() thread
    sets up new_sock via ->accept().
    The sk_user_data is now
    sock_def_readable
                                          data comes in for new_sock,
                                          ->sk_data_ready is called, and
                                          we land in rds_tcp_listen_data_ready
    rds_tcp_set_callbacks()
    takes the sk_callback_lock and
    sets up sk_user_data to be the cp
                                          read_lock sk_callback_lock
                                          ready = cp
                                          unlock sk_callback_lock
                                          page fault on ready

    In the above sequence, we end up with a panic on a bad page reference
    when trying to execute (*ready)(). Instead we need to call
    sock_def_readable() safely, which is what this patch achieves.

    Acked-by: Santosh Shilimkar
    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

07 Jul, 2016

1 commit


05 Jul, 2016

1 commit

  • If register_pernet_subsys() fails, we shouldn't try to call
    unregister_pernet_subsys().

    Fixes: 467fa15356 ("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.")
    Cc: stable@vger.kernel.org
    Cc: Sowmini Varadhan
    Cc: David S. Miller
    Signed-off-by: Vegard Nossum
    Acked-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Vegard Nossum
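The fix follows the usual kernel error-unwinding idiom: each successful init step gets a matching cleanup label, and a failing step jumps to the label that undoes only what already succeeded. Below is a userspace sketch with hypothetical stand-in functions and counters, not the RDS module's actual init code.

```c
#include <assert.h>

/* Hypothetical stand-ins for register_pernet_subsys() and a later init
 * step, with counters so the error path can be observed. */
static int pernet_registered, pernet_unregister_calls;
static int fail_pernet_register, fail_next_step;

static int register_pernet_sketch(void)
{
	if (fail_pernet_register)
		return -1;
	pernet_registered = 1;
	return 0;
}

static void unregister_pernet_sketch(void)
{
	pernet_unregister_calls++;
	pernet_registered = 0;
}

static int next_init_step(void)
{
	return fail_next_step ? -1 : 0;
}

/* Unwind only what actually succeeded: a failed registration jumps past
 * the matching unregister call. */
static int module_init_sketch(void)
{
	int ret;

	ret = register_pernet_sketch();
	if (ret)
		goto out;

	ret = next_init_step();
	if (ret)
		goto out_pernet;

	return 0;

out_pernet:
	unregister_pernet_sketch();
out:
	return ret;
}
```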
     

02 Jul, 2016

9 commits


30 Jun, 2016

1 commit


19 Jun, 2016

1 commit

  • Fix coding style issues in the following files:

    ib_cm.c: add space
    loop.c: convert spaces to tabs
    sysctl.c: add space
    tcp.h: convert spaces to tabs
    tcp_connect.c: remove extra indentation in switch statement
    tcp_recv.c: convert spaces to tabs
    tcp_send.c: convert spaces to tabs
    transport.c: move brace up one line on for statement

    Signed-off-by: Joshua Houghton
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Joshua Houghton
     

18 Jun, 2016

2 commits

  • The state of the rds_connection after rds_tcp_reset_callbacks() would
    be RDS_CONN_RESETTING and this is the value that should be passed
    by rds_tcp_accept_one() to rds_connect_path_complete() to transition
    the socket to RDS_CONN_UP.

    Fixes: b5c21c0947c1 ("RDS: TCP: fix race windows in send-path quiescence by rds_tcp_accept_one()")
    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
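Why the caller must pass the state the connection is actually in can be illustrated with a compare-and-swap transition. This is a hypothetical userspace sketch using C11 atomics; the state names are a made-up subset, and `connect_path_complete()` only mimics the idea that the move to UP succeeds solely when the connection is in the caller's expected state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical subset of connection states, for illustration only. */
enum conn_state { CONN_DOWN, CONN_CONNECTING, CONN_RESETTING, CONN_UP };

/* Move to UP only if the connection is still in the expected state; with
 * a wrong 'expected' the compare-and-swap fails and nothing changes. */
static bool connect_path_complete(_Atomic int *state, int expected)
{
	return atomic_compare_exchange_strong(state, &expected, CONN_UP);
}
```

Passing CONN_CONNECTING while the connection sits in CONN_RESETTING leaves it stuck short of UP, which is the failure mode the fix above addresses.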
     
  • Fixes the following sparse warnings:

    net/rds/tcp.c:59:5: warning:
    symbol 'rds_tcp_min_sndbuf' was not declared. Should it be static?
    net/rds/tcp.c:60:5: warning:
    symbol 'rds_tcp_min_rcvbuf' was not declared. Should it be static?

    Signed-off-by: Wei Yongjun
    Acked-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Wei Yongjun
     

15 Jun, 2016

5 commits