29 May, 2020

6 commits


05 Feb, 2019

1 commit

  • The RDS service type (TOS) is user-defined and must be configured
    via the RDS ioctl interface. It must be set before any traffic is
    initiated, and once set it cannot be changed. All outgoing traffic
    from the socket is associated with its TOS.
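
The set-once rule above can be modelled in a few lines of userspace C. This is only a sketch: struct rds_sock_model, model_set_tos() and the -EINVAL return value are hypothetical names and choices made for illustration, not the kernel's implementation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for per-socket state; not the kernel's struct. */
struct rds_sock_model {
    uint8_t tos;          /* 0 means "not configured yet" in this model */
    bool traffic_started; /* true once the socket has sent anything */
};

/*
 * Set the TOS exactly once, and only before any traffic.
 * Returns 0 on success, -EINVAL otherwise (error code is an assumption).
 */
int model_set_tos(struct rds_sock_model *rs, uint8_t tos)
{
    if (rs->tos != 0 || rs->traffic_started)
        return -EINVAL;
    rs->tos = tos;
    return 0;
}
```

A second call to model_set_tos() on the same socket, or a first call after traffic has started, is rejected, which mirrors the "set before traffic, never changed" behaviour described in the message.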

    Reviewed-by: Sowmini Varadhan
    Signed-off-by: Santosh Shilimkar
    [yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes]
    Signed-off-by: Zhu Yanjun

    Santosh Shilimkar
     

02 Aug, 2018

1 commit


24 Jul, 2018

2 commits

  • This patch enables RDS to use IPv6 addresses. For RDS/TCP, the
    listener is now an IPv6 endpoint which accepts both IPv4 and IPv6
    connection requests. RDS/RDMA/IB uses a private data (struct
    rds_ib_connect_private) exchange between endpoints at RDS connection
    establishment time to support RDMA. This private data exchange uses a
    32 bit integer to represent an IP address. This needs to be changed in
    order to support IPv6. A new private data struct
    rds6_ib_connect_private is introduced to handle this. To ensure
    backward compatibility, an IPv6-capable RDS stack uses another RDMA
    listener port (RDS_CM_PORT) to accept IPv6 connections, and it
    continues to use the original RDS_PORT for IPv4 RDS connections. When
    it needs to communicate with an IPv6 peer, it uses the RDS_CM_PORT to
    send the connection setup request.

    v5: Fixed syntax problem (David Miller).

    v4: Changed port history comments in rds.h (Sowmini Varadhan).

    v3: Added support to set up IPv4 connection using mapped address
    (David Miller).
    Added support to set up connection between link local and non-link
    addresses.
    Various review comments from Santosh Shilimkar and Sowmini Varadhan.

    v2: Fixed bound and peer address scope mismatched issue.
    Added back rds_connect() IPv6 changes.

    Signed-off-by: Ka-Cheong Poon
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Ka-Cheong Poon
     
  • This patch changes the internal representation of an IP address to use
    struct in6_addr. An IPv4 address is stored as an IPv4-mapped address.
    All functions that take an IP address as an argument are also
    changed to use struct in6_addr. The RDS socket layer is not modified,
    so it still does not accept IPv6 addresses from an application, and
    the RDS layer neither accepts nor initiates IPv6 connections.
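
For readers unfamiliar with IPv4-mapped addresses, the storage scheme can be sketched in userspace C as below. The helper names v4_to_mapped() and mapped_to_v4() are hypothetical; the kernel has its own helpers for this, which are not reproduced here.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Store an IPv4 address (network byte order) as ::ffff:a.b.c.d. */
struct in6_addr v4_to_mapped(struct in_addr v4)
{
    struct in6_addr out;

    memset(&out, 0, sizeof(out));
    out.s6_addr[10] = 0xff;
    out.s6_addr[11] = 0xff;
    memcpy(&out.s6_addr[12], &v4.s_addr, sizeof(v4.s_addr));
    return out;
}

/* Recover the IPv4 address; returns 0 on success, -1 if not v4-mapped. */
int mapped_to_v4(const struct in6_addr *in, struct in_addr *v4)
{
    if (!IN6_IS_ADDR_V4MAPPED(in))
        return -1;
    memcpy(&v4->s_addr, &in->s6_addr[12], sizeof(v4->s_addr));
    return 0;
}
```

With this representation a single struct in6_addr field can carry either address family, and IN6_IS_ADDR_V4MAPPED() tells the two apart.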

    v2: Fixed sparse warnings.

    Signed-off-by: Ka-Cheong Poon
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Ka-Cheong Poon
     

02 Mar, 2018

1 commit

  • Commit 0933a578cd55 ("rds: tcp: use sock_create_lite() to create the
    accept socket") has a reference counting issue in TCP socket creation
    when accepting a new connection. The code uses sock_create_lite() to
    create a kernel socket. But it does not do __module_get() on the
    socket owner. When the connection is shut down and sock_release() is
    called to free the socket, the owner's reference count is decremented
    and becomes incorrect. Note that this bug only shows up when the socket
    owner is configured as a kernel module.
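
The imbalance can be illustrated with a toy counter. Everything below (struct owner_model and the model_* functions) is a hypothetical userspace model written for this note, not kernel code; it only shows why a release without a matching get leaves the count one too low.

```c
/* Toy stand-in for a module's use count; not the kernel's struct module. */
struct owner_model {
    int refcnt;
};

/* Model of socket creation: 'take_ref' mirrors doing __module_get(). */
void model_sock_create(struct owner_model *owner, int take_ref)
{
    if (take_ref)
        owner->refcnt++;
}

/* Model of sock_release(): always drops one reference on the owner. */
void model_sock_release(struct owner_model *owner)
{
    owner->refcnt--;
}
```

Starting from a count of 1, a create without a reference followed by a release ends at 0, while a create that takes the reference ends back at 1.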

    v2: Update comments

    Fixes: 0933a578cd55 ("rds: tcp: use sock_create_lite() to create the accept socket")
    Signed-off-by: Ka-Cheong Poon
    Acked-by: Santosh Shilimkar
    Acked-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Ka-Cheong Poon
     

08 Jul, 2017

1 commit

  • There are two problems with calling sock_create_kern() from
    rds_tcp_accept_one():
    1. It sets up a new_sock->sk that is wasteful, because this ->sk
       is going to get replaced by inet_accept() in the subsequent ->accept().
    2. The new_sock->sk is a leaked reference in sock_graft(), which
       expects to find a null parent->sk.

    Avoid these problems by calling sock_create_lite().

    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

22 Jun, 2017

2 commits

  • If we are unloading the rds_tcp module, we can set linger to 1
    and drop pending packets to accelerate reconnect. The peer will
    end up resetting the connection based on new generation numbers
    of the new incarnation, so hanging on to unsent TCP packets via
    linger is mostly pointless in this case.

    Signed-off-by: Sowmini Varadhan
    Tested-by: Jenny Xu
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     
  • The RDS handshake ping probe added by commit 5916e2c1554f
    ("RDS: TCP: Enable multipath RDS for TCP") is sent from rds_sendmsg()
    before the first data packet is sent to a peer. If the conversation
    is not bidirectional (i.e., one side is always passive and never
    invokes rds_sendmsg()) and the passive side restarts its rds_tcp
    module, a new HS ping probe needs to be sent, so that the number
    of paths can be re-established.

    This patch achieves that by sending a HS ping probe from
    rds_tcp_accept_one() when c_npaths is 0 (i.e., we have not done
    a handshake probe with this peer yet).

    Signed-off-by: Sowmini Varadhan
    Tested-by: Jenny Xu
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

17 Jun, 2017

3 commits

  • Each time we get an incoming SYN to the RDS_TCP_PORT, the TCP
    layer accepts the connection and then the rds_tcp_accept_one()
    callback is invoked to process the incoming connection.

    rds_tcp_accept_one() may reject the incoming SYN for a number of
    reasons, e.g., commit 1a0e100fb2c9 ("RDS: TCP: Force every connection
    to be initiated by numerically smaller IP address"), or because
    we are getting spammed by a malicious node that is triggering
    a flood of connection attempts to RDS_TCP_PORT. If the incoming
    SYN is rejected, no data would have been sent on the TCP socket,
    and we do not need to be in TIME_WAIT state, so we set linger on
    the TCP socket before closing, thereby closing the socket efficiently
    with a RST.
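
A userspace analogue of the linger trick, for illustration only: enabling SO_LINGER with a zero timeout makes close() abort the connection with a RST instead of going through the normal shutdown and TIME_WAIT. The function name set_abortive_close() is hypothetical, and the kernel code sets the option through its own in-kernel interfaces rather than setsockopt().

```c
#include <sys/socket.h>

/* Ask the stack to reset (RST) the connection on close(). */
int set_abortive_close(int fd)
{
    /* l_onoff = 1 enables linger; l_linger = 0 means "do not wait". */
    struct linger lg = { .l_onoff = 1, .l_linger = 0 };

    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```

Calling set_abortive_close(fd) just before close(fd) on a connection that never carried data avoids leaving a TIME_WAIT entry behind for it.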

    Signed-off-by: Sowmini Varadhan
    Tested-by: Imanti Mendez
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     
  • Found when testing between sparc and x86 machines on different
    subnets, so the address comparison patterns hit the corner cases and
    brought out some bugs fixed by this patch.

    Signed-off-by: Sowmini Varadhan
    Tested-by: Imanti Mendez
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     
  • After commit 1a0e100fb2c9 ("RDS: TCP: Force every connection to be
    initiated by numerically smaller IP address") we no longer need
    the logic associated with cp_outgoing, so clean up usage of this
    field.

    Signed-off-by: Sowmini Varadhan
    Tested-by: Imanti Mendez
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

10 Mar, 2017

1 commit

  • Lockdep issues a circular dependency warning when AFS issues an operation
    through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.

    The theory lockdep comes up with is as follows:

    (1) If the pagefault handler decides it needs to read pages from AFS, it
    calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
    creating a call requires the socket lock:

    mmap_sem must be taken before sk_lock-AF_RXRPC

    (2) afs_open_socket() opens an AF_RXRPC socket and binds it. rxrpc_bind()
    binds the underlying UDP socket whilst holding its socket lock.
    inet_bind() takes its own socket lock:

    sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET

    (3) Reading from a TCP socket into a userspace buffer might cause a fault
    and thus cause the kernel to take the mmap_sem, but the TCP socket is
    locked whilst doing this:

    sk_lock-AF_INET must be taken before mmap_sem

    However, lockdep's theory is wrong in this instance because it deals only
    with lock classes and not individual locks. The AF_INET lock in (2) isn't
    really equivalent to the AF_INET lock in (3) as the former deals with a
    socket entirely internal to the kernel that never sees userspace. This is
    a limitation in the design of lockdep.

    Fix the general case by:

    (1) Double up all the locking keys used in sockets so that one set is
    used if the socket is created by userspace and the other set is used
    if the socket is created by the kernel.

    (2) Store the kern parameter passed to sk_alloc() in a variable in the
    sock struct (sk_kern_sock). This informs sock_lock_init(),
    sock_init_data() and sk_clone_lock() as to the lock keys to be used.

    Note that the child created by sk_clone_lock() inherits the parent's
    kern setting.

    (3) Add a 'kern' parameter to ->accept() that is analogous to the one
    passed in to ->create() that distinguishes whether kernel_accept() or
    sys_accept4() was the caller and can be passed to sk_alloc().

    Note that a lot of accept functions merely dequeue an already
    allocated socket. I haven't touched these as the new socket already
    exists before we get the parameter.

    Note also that there are a couple of places where I've made the accepted
    socket unconditionally kernel-based:

    irda_accept()
    rds_tcp_accept_one()
    tcp_accept_from_sock()

    because they follow a sock_create_kern() and accept off of that.

    Whilst creating this, I noticed that lustre and ocfs don't create sockets
    through sock_create_kern() and thus they aren't marked as for-kernel,
    though they appear to be internal. I wonder if these should do that so
    that they use the new set of lock keys.

    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     

08 Mar, 2017

1 commit

  • Commit a93d01f5777e ("RDS: TCP: avoid bad page reference in
    rds_tcp_listen_data_ready") added the function
    rds_tcp_listen_sock_def_readable() to handle the case when a
    partially set-up acceptor socket drops into rds_tcp_listen_data_ready().
    However, if the listen socket (rtn->rds_tcp_listen_sock) is itself going
    through a tear-down via rds_tcp_listen_stop(), the (*ready)() will be
    null and we would hit a panic of the form
    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: (null)
    :
    ? rds_tcp_listen_data_ready+0x59/0xb0 [rds_tcp]
    tcp_data_queue+0x39d/0x5b0
    tcp_rcv_established+0x2e5/0x660
    tcp_v4_do_rcv+0x122/0x220
    tcp_v4_rcv+0x8b7/0x980
    :
    In the above case, it is not fatal to encounter a NULL value for
    ready: we should just drop the packet and let the flush of the
    acceptor thread finish gracefully.

    In general, the tear-down sequence for listen() and accept() sockets
    that is ensured by this commit is:
        rtn->rds_tcp_listen_sock = NULL; /* prevent any new accepts */
        In rds_tcp_listen_stop():
            serialize with, and prevent, further callbacks using lock_sock()
            flush rds_wq
            flush acceptor workq
            sock_release(listen socket)

    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

03 Jan, 2017

1 commit


18 Nov, 2016

1 commit

  • When 2 RDS peers initiate an RDS-TCP connection simultaneously,
    there is a potential for "duelling syns" on either/both sides.
    See commit 241b271952eb ("RDS-TCP: Reset tcp callbacks if re-using an
    outgoing socket in rds_tcp_accept_one()") for a description of this
    condition, and the arbitration logic which ensures that the
    numerically larger IP address in the TCP connection is bound to the
    RDS_TCP_PORT ("canonical ordering").

    The rds_connection should not be marked as RDS_CONN_UP until the
    arbitration logic has converged, for the following reason. The sender
    may start transmitting RDS datagrams as soon as RDS_CONN_UP is set,
    and it removes datagrams from the rds_connection's cp_retrans queue
    based on TCP acks. If a TCP ack was sent from a TCP socket that got
    reset as part of duel arbitration (but before the data was delivered
    to the receiver's RDS socket layer), the sender may end up
    prematurely freeing the datagram, and the datagram is no longer
    reliably deliverable.

    This patch remedies that condition by making sure that, upon
    receipt of 3WH completion state change notification of TCP_ESTABLISHED
    in rds_tcp_state_change, we mark the rds_connection as RDS_CONN_UP
    if, and only if, the IP addresses and ports for the connection are
    canonically ordered. In all other cases, rds_tcp_state_change will
    force an rds_conn_path_drop(), and rds_queue_reconnect() on
    both peers will restart the connection to ensure canonical ordering.
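
The canonical-ordering test can be sketched as follows. This is a userspace illustration under stated assumptions: is_canonical() and MODEL_RDS_TCP_PORT are hypothetical names, the port value 16385 is assumed for the example, addresses are IPv4, and the equal-address case is simply treated as non-canonical here.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_RDS_TCP_PORT 16385 /* assumed listener port for this sketch */

/*
 * Addresses are IPv4 in network byte order; ports are in host byte order.
 * "Canonical" here: the numerically larger address owns the listener port.
 */
bool is_canonical(uint32_t laddr, uint16_t lport,
                  uint32_t faddr, uint16_t fport)
{
    if (ntohl(laddr) > ntohl(faddr))
        return lport == MODEL_RDS_TCP_PORT;
    if (ntohl(laddr) < ntohl(faddr))
        return fport == MODEL_RDS_TCP_PORT;
    return false; /* equal addresses: left undecided in this sketch */
}
```

Both peers evaluate the same predicate on the same 4-tuple (with local and foreign swapped), so they reach the same verdict about whether a given TCP connection may be marked up.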

    A side-effect of enforcing this condition in rds_tcp_state_change()
    is that rds_tcp_accept_one_path() can now be refactored for simplicity.
    It is also no longer possible to encounter an RDS_CONN_UP connection in
    the arbitration logic in rds_tcp_accept_one().

    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

10 Nov, 2016

1 commit

  • The for() loop in rds_tcp_accept_one() assumes that the 0'th
    rds_tcp_conn_path is UP and starts multipath accepts at index 1.
    But this assumption may not always be true: if the 0'th path
    has failed (ERROR or DOWN state) an incoming connection request
    should be used to resurrect this path.
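
A minimal model of the corrected scan, assuming a plain array of path states: the enum values and claim_accept_path() are hypothetical, and the real code uses atomic state transitions rather than plain assignments.

```c
#include <stddef.h>

enum path_state { PATH_DOWN, PATH_CONNECTING, PATH_UP, PATH_ERROR };

/*
 * Scan from index 0 and claim the first DOWN or ERROR path by moving it to
 * CONNECTING. Returns the claimed index, or -1 if every path is busy.
 */
int claim_accept_path(enum path_state *paths, size_t npaths)
{
    for (size_t i = 0; i < npaths; i++) {
        if (paths[i] == PATH_DOWN || paths[i] == PATH_ERROR) {
            paths[i] = PATH_CONNECTING;
            return (int)i;
        }
    }
    return -1;
}
```

Because the loop starts at 0, a failed 0'th path is the first candidate to be resurrected by an incoming connection request.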

    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

16 Jul, 2016

2 commits

  • Use RDS probe-ping to compute how many paths may be used with
    the peer, and to synchronously start the multiple paths. If multipath
    RDS (mprds) is supported by the transport, hash outgoing traffic to
    one of the multiple paths in rds_sendmsg().
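
One way to picture the path selection is a hash of some per-socket value reduced to a path index. The sketch below is an assumption-laden illustration: pick_path() is a hypothetical name, hashing on the bound port is assumed, the multiplier is an arbitrary odd constant, and the path count is assumed to be a power of two.

```c
#include <stdint.h>

/*
 * Map a bound port to a path index. npaths must be a power of two so the
 * mask below covers exactly the valid indices 0 .. npaths - 1.
 */
unsigned int pick_path(uint16_t bound_port, unsigned int npaths)
{
    uint32_t h = (uint32_t)bound_port * 2654435761u;

    h ^= h >> 16; /* fold the high bits into the low bits */
    return h & (npaths - 1);
}
```

The property that matters is that the same socket always maps to the same path, so datagrams from one sender stay ordered on a single TCP connection.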

    CC: Santosh Shilimkar
    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     
  • As the existing comments in rds_tcp_listen_data_ready() indicate,
    it is possible under some race-windows to get to this function with the
    accept() socket. If that happens, we could run into a sequence whereby

    thread 1                              thread 2

    rds_tcp_accept_one() thread
    sets up new_sock via ->accept().
    The sk_user_data is now
    sock_def_readable
                                          data comes in for new_sock,
                                          ->sk_data_ready is called, and
                                          we land in rds_tcp_listen_data_ready
    rds_tcp_set_callbacks()
    takes the sk_callback_lock and
    sets up sk_user_data to be the cp
                                          read_lock sk_callback_lock
                                          ready = cp
                                          unlock sk_callback_lock
                                          page fault on ready

    In the above sequence, we end up with a panic on a bad page reference
    when trying to execute (*ready)(). Instead we need to call
    sock_def_readable() safely, which is what this patch achieves.

    Acked-by: Santosh Shilimkar
    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

02 Jul, 2016

2 commits


30 Jun, 2016

1 commit


18 Jun, 2016

1 commit

  • The state of the rds_connection after rds_tcp_reset_callbacks() would
    be RDS_CONN_RESETTING and this is the value that should be passed
    by rds_tcp_accept_one() to rds_connect_path_complete() to transition
    the socket to RDS_CONN_UP.

    Fixes: b5c21c0947c1 ("RDS: TCP: fix race windows in send-path quiescence
    by rds_tcp_accept_one()")
    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

15 Jun, 2016

1 commit

  • In preparation for multipath RDS, split the rds_connection
    structure into a base structure, and a per-path struct rds_conn_path.
    The base structure tracks information and locks common to all
    paths. The workqs for send/recv/shutdown etc are tracked per
    rds_conn_path. Thus the workq callbacks now work with rds_conn_path.

    This commit allows for one rds_conn_path per rds_connection, and will
    be extended into multiple conn_paths in subsequent commits.

    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

08 Jun, 2016

2 commits

  • The send path needs to be quiesced before resetting callbacks from
    rds_tcp_accept_one(), and commit eb192840266f ("RDS:TCP: Synchronize
    rds_tcp_accept_one with rds_send_xmit when resetting t_sock") achieves
    this using the c_state and RDS_IN_XMIT bit following the pattern
    used by rds_conn_shutdown(). However, this leaves the possibility
    of a race window, as shown in the sequence below:
    - take t_conn_lock in rds_tcp_conn_connect
    - send outgoing syn to peer
    - drop t_conn_lock in rds_tcp_conn_connect
    - incoming from peer triggers rds_tcp_accept_one, conn is
      marked CONNECTING
    - wait for RDS_IN_XMIT to quiesce any rds_send_xmit threads
    - call rds_tcp_reset_callbacks
      [.. race-window where incoming syn-ack can cause the conn
      to be marked UP from rds_tcp_state_change ..]
    - lock_sock called from rds_tcp_reset_callbacks, and we set
      t_sock to null
    As soon as the conn is marked UP in the race-window above, rds_send_xmit()
    threads will proceed to rds_tcp_xmit and may encounter a null-pointer
    deref on the t_sock.

    Given that rds_tcp_state_change() is invoked in softirq context, whereas
    rds_tcp_reset_callbacks() is in workq context, and testing for RDS_IN_XMIT
    after lock_sock could result in a deadlock with tcp_sendmsg, this
    commit fixes the race by using a new c_state, RDS_CONN_RESETTING, which
    will prevent a transition to RDS_CONN_UP from rds_tcp_state_change().
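
Why an extra state closes the window can be seen with a compare-and-exchange model of the connection state. The enum and conn_transition() below are hypothetical userspace stand-ins built on C11 atomics; they are a sketch of the idea, not the RDS implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum conn_state { CONN_DOWN, CONN_CONNECTING, CONN_RESETTING, CONN_UP };

/* Move *state from 'from' to 'to' only if it currently equals 'from'. */
bool conn_transition(atomic_int *state, int from, int to)
{
    return atomic_compare_exchange_strong(state, &from, to);
}
```

Once the accept path has moved the state from CONNECTING to RESETTING, a state-change callback that attempts CONNECTING to UP fails and leaves the state alone; only the accept path, which knows it is in RESETTING, can complete the move to UP.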

    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     
  • When rds_tcp_accept_one() has to replace the existing tcp socket
    with a newer tcp socket (duelling-syn resolution), it must lock_sock()
    to suppress the rds_tcp_data_recv() path while callbacks are being
    changed. Also, existing RDS datagram reassembly state must be reset,
    so that the next datagram on the new socket does not have corrupted
    state. Similarly, when resetting the newly accepted socket, appropriate
    locks and synchronization are needed.

    This commit ensures correct synchronization by invoking
    kernel_sock_shutdown to reset a newly accepted sock, and by taking
    appropriate lock_sock()s (for old and new sockets) when resetting
    existing callbacks.

    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

21 May, 2016

2 commits

  • When a rogue SYN is received after the connection arbitration
    algorithm has converged, the incoming SYN should not needlessly
    quiesce the transmit path, and it should not result in needless
    TCP connection resets due to re-execution of the connection
    arbitration logic.

    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     
  • There are two instances where we want to terminate RDS-TCP: when
    exiting the netns or during module unload. In either case, the
    termination sequence is to stop the listen socket, mark the
    rtn->rds_tcp_listen_sock as null, and flush any accept workqs.
    Thus any workqs that get flushed at this point will encounter a
    null rds_tcp_listen_sock, and must exit gracefully to allow
    the RDS-TCP termination to complete successfully.

    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

20 May, 2016

1 commit

  • TCP stack can now run from process context.

    Use read_lock_bh(&sk->sk_callback_lock) variant to restore previous
    assumption.

    Fixes: 5413d1babe8f ("net: do not block BH while processing socket backlog")
    Fixes: d41a69f1d390 ("tcp: make tcp_sendmsg() aware of socket backlog")
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

04 May, 2016

2 commits

  • An arbitration scheme for duelling SYNs is implemented as part of
    commit 241b271952eb ("RDS-TCP: Reset tcp callbacks if re-using an
    outgoing socket in rds_tcp_accept_one()") which ensures that both nodes
    involved will arrive at the same arbitration decision. However, this
    needs to be synchronized with an outgoing SYN to be generated by
    rds_tcp_conn_connect(). This commit achieves the synchronization
    through the t_conn_lock mutex in struct rds_tcp_connection.

    The rds_conn_state is checked in rds_tcp_conn_connect() after acquiring
    the t_conn_lock mutex. A SYN is sent out only if the RDS connection is
    not already UP (an UP would indicate that rds_tcp_accept_one() has
    completed 3WH, so no SYN needs to be generated).

    Similarly, the rds_conn_state is checked in rds_tcp_accept_one() after
    acquiring the t_conn_lock mutex. The only acceptable states (to
    allow continuation of the arbitration logic) are UP (i.e., outgoing SYN
    was SYN-ACKed by peer after it sent us the SYN) or CONNECTING (we sent
    outgoing SYN before we saw incoming SYN).

    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     
  • There is a race condition between rds_send_xmit -> rds_tcp_xmit
    and the code that deals with resolution of duelling syns added
    by commit 241b271952eb ("RDS-TCP: Reset tcp callbacks if re-using an
    outgoing socket in rds_tcp_accept_one()").

    Specifically, we may end up dereferencing a null pointer in rds_send_xmit
    if we have the interleaving sequence:
    rds_tcp_accept_one                    rds_send_xmit

                                          conn is RDS_CONN_UP, so
                                          invoke rds_tcp_xmit

                                          tc = conn->c_transport_data
    rds_tcp_restore_callbacks
        /* reset t_sock */
                                          null ptr deref from tc->t_sock

    The race condition can be avoided without adding the overhead of
    additional locking in the xmit path: have rds_tcp_accept_one wait
    for rds_tcp_xmit threads to complete before resetting callbacks.
    The synchronization can be done in the same manner as rds_conn_shutdown().
    First set the rds_conn_state to something other than RDS_CONN_UP
    (so that new threads cannot get into rds_tcp_xmit()), then wait for
    RDS_IN_XMIT to be cleared in the conn->c_flags indicating that any
    threads in rds_tcp_xmit are done.

    Fixes: 241b271952eb ("RDS-TCP: Reset tcp callbacks if re-using an
    outgoing socket in rds_tcp_accept_one()")
    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

13 Oct, 2015

1 commit

  • Consider the following "duelling syn" sequence between two peers A and B:
    A B
    SYN1 -->

    Note that the SYN/ACK has already been sent out by TCP before
    rds_tcp_accept_one() gets invoked as part of callbacks.

    If the inet_addr(A) is numerically less than inet_addr(B),
    the arbitration scheme in rds_tcp_accept_one() will prefer the
    TCP connection triggered by SYN1, and will send a CLOSE for the
    SYN2 (just after the SYN2ACK was sent).

    Since B also follows the same arbitration scheme, it will send the SYN-ACK
    for SYN1 that will set up a healthy ESTABLISHED connection on both sides.
    B will also get a CLOSE for SYN2, which should result in the cleanup
    of the TCP state machine for SYN2, but it should not trigger any
    stale RDS-TCP callbacks (such as ->writespace, ->state_change etc),
    that would disrupt the progress of the SYN1 based RDS-TCP connection.

    Thus the arbitration scheme in rds_tcp_accept_one() should restore
    rds_tcp callbacks for the winner before setting them up for the
    new accept socket, and also make sure that conn->c_outgoing
    is set to 0 so that we do not trigger any reconnect attempts on the
    passive side of the tcp socket in the future, in conformance with
    commit c82ac7e69efe ("net/rds: RDS-TCP: only initiate reconnect attempt
    on outgoing TCP socket.")

    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

05 Oct, 2015

1 commit

  • Commit f711a6ae062c ("net/rds: RDS-TCP: Always create a new rds_sock
    for an incoming connection.") modified rds-tcp so that an incoming SYN
    would ignore an existing "client" TCP connection which had the local
    port set to the transient port. The motivation for ignoring the existing
    "client" connection in f711a6ae was to avoid race conditions and an
    endless duel of reconnect attempts triggered by a restart/abort of one
    of the nodes in the TCP connection.

    However, having separate sockets for active and passive sides
    is avoidable, and the simpler model of a single TCP socket for
    both send and receives of all RDS connections associated with
    that tcp socket makes for easier observability. We avoid the race
    conditions from f711a6ae by attempting reconnects in rds_conn_shutdown
    if, and only if, the (new) c_outgoing bit is set for RDS_TRANS_TCP.
    The c_outgoing bit is initialized in __rds_conn_create().

    A side-effect of re-using the client rds_connection for an incoming
    SYN is the potential of encountering duelling SYNs, i.e., we
    have an outgoing RDS_CONN_CONNECTING socket when we get the incoming
    SYN. The logic to arbitrate this criss-crossing SYN exchange in
    rds_tcp_accept_one() has been modified to emulate the BGP state
    machine: the smaller IP address should back off from the connection attempt.

    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

08 Aug, 2015

2 commits

  • Register pernet subsys init/stop functions that will set up
    and tear down per-net RDS-TCP listen endpoints. Unregister
    pernet subsys functions on 'modprobe -r' to clean up these
    endpoints.

    Enable keepalive on both accept and connect socket endpoints.
    The keepalive timer expiration will ensure that client socket
    endpoints will be removed as appropriate from the netns when
    an interface is removed from a namespace.
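
For reference, this is roughly what enabling keepalive looks like from userspace on Linux. The function name enable_keepalive() and its parameters are hypothetical, the timing values a caller passes are illustrative, and the kernel code configures its sockets through in-kernel interfaces instead.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable keepalive probes on a TCP socket (Linux-specific TCP_KEEP* options). */
int enable_keepalive(int fd, int idle_secs, int intvl_secs, int probes)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    /* Idle time before the first probe is sent. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_secs, sizeof(idle_secs)) < 0)
        return -1;
    /* Interval between unanswered probes. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &intvl_secs, sizeof(intvl_secs)) < 0)
        return -1;
    /* Number of unanswered probes before the connection is dropped. */
    return setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes));
}
```

With keepalive on, a peer that silently disappears (for example because its interface left the namespace) is detected after roughly idle_secs + intvl_secs * probes seconds, at which point the socket is torn down.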

    Register a device notifier callback that will clean up all
    sockets (and thus avoid the need to wait for keepalive timeout)
    when the loopback device is unregistered from the netns indicating
    that the netns is getting deleted.

    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     
  • Open the sockets calling sock_create_kern() with the correct struct net
    pointer, and use that struct net pointer when verifying the
    address passed to rds_bind().

    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan