29 May, 2020

3 commits

  • Add a helper to directly set the TCP_KEEPCNT sockopt from kernel space
    without going through a fake uaccess.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: David S. Miller

    Christoph Hellwig
     
  • Add a helper to directly set the TCP_NODELAY sockopt from kernel space
    without going through a fake uaccess. Clean up the callers to avoid
    pointless wrappers now that this is a simple function call.

    Signed-off-by: Christoph Hellwig
    Acked-by: Sagi Grimberg
    Acked-by: Jason Gunthorpe
    Signed-off-by: David S. Miller

    Christoph Hellwig
     
  • Add a helper to directly set the SO_LINGER sockopt from kernel space
    with onoff set to true and a linger time of 0 without going through a
    fake uaccess.

    Signed-off-by: Christoph Hellwig
    Acked-by: Sagi Grimberg
    Signed-off-by: David S. Miller

    Christoph Hellwig
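
For comparison, the three options these commits set from kernel space can also be set from user space with plain setsockopt() calls. The sketch below is an illustration only: the helper name `apply_sockopts` and the probe count passed in are assumptions, and the in-kernel helpers operate on a `struct sock` rather than on a file descriptor.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical user-space helper: sets TCP_KEEPCNT, TCP_NODELAY and
 * SO_LINGER (onoff = 1, linger = 0) on fd. Returns 0 on success, -1 on
 * the first failing setsockopt(). */
int apply_sockopts(int fd, int keepcnt)
{
	int one = 1;
	/* l_onoff = 1 with l_linger = 0: close() discards unsent data. */
	struct linger no_linger = { .l_onoff = 1, .l_linger = 0 };

	assert(fd >= 0);
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &keepcnt, sizeof(keepcnt)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)) < 0)
		return -1;
	if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &no_linger, sizeof(no_linger)) < 0)
		return -1;
	return 0;
}
```

In user space each option costs a system call and a copy of the option value from the caller's memory; the kernel helpers described above avoid faking that user-memory access.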
     

24 Jul, 2018

1 commit

  • This patch enables RDS to use IPv6 addresses. For RDS/TCP, the
    listener is now an IPv6 endpoint which accepts both IPv4 and IPv6
    connection requests. RDS/RDMA/IB uses a private data (struct
    rds_ib_connect_private) exchange between endpoints at RDS connection
    establishment time to support RDMA. This private data exchange uses a
    32 bit integer to represent an IP address. This needs to be changed in
    order to support IPv6. A new private data struct
    rds6_ib_connect_private is introduced to handle this. To ensure
    backward compatibility, an IPv6 capable RDS stack uses another RDMA
    listener port (RDS_CM_PORT) to accept IPv6 connections, and it
    continues to use the original RDS_PORT for IPv4 RDS connections. When
    it needs to communicate with an IPv6 peer, it uses the RDS_CM_PORT to
    send the connection set up request.

    v5: Fixed syntax problem (David Miller).

    v4: Changed port history comments in rds.h (Sowmini Varadhan).

    v3: Added support to set up IPv4 connection using mapped address
    (David Miller).
    Added support to set up connections between link-local and
    non-link-local addresses.
    Various review comments from Santosh Shilimkar and Sowmini Varadhan.

    v2: Fixed the scope mismatch between the bound and peer addresses.
    Added back rds_connect() IPv6 changes.

    Signed-off-by: Ka-Cheong Poon
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Ka-Cheong Poon
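
The mapped-address support mentioned in the v3 notes builds on the standard IPv4-mapped IPv6 form (::ffff:a.b.c.d). A minimal user-space sketch of that conversion follows. The names `to_v4mapped` and `from_v4mapped` are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper: embed an IPv4 address (network byte order) in the
 * IPv4-mapped IPv6 form ::ffff:a.b.c.d. */
void to_v4mapped(const struct in_addr *v4, struct in6_addr *v6)
{
	memset(v6, 0, sizeof(*v6));          /* bytes 0..9 stay zero */
	v6->s6_addr[10] = 0xff;
	v6->s6_addr[11] = 0xff;
	memcpy(&v6->s6_addr[12], &v4->s_addr, sizeof(v4->s_addr));
}

/* Returns 1 and fills *v4 if v6 is IPv4-mapped, otherwise returns 0. */
int from_v4mapped(const struct in6_addr *v6, struct in_addr *v4)
{
	if (!IN6_IS_ADDR_V4MAPPED(v6))
		return 0;
	memcpy(&v4->s_addr, &v6->s6_addr[12], sizeof(v4->s_addr));
	return 1;
}
```

With this representation a single IPv6-capable code path can carry both address families, which is the property an IPv6 listener that also accepts IPv4 connections depends on.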
     

24 Jan, 2018

1 commit


23 Jan, 2018

1 commit

  • rds-tcp uses m_ack_seq to track the tcp ack# that indicates
    that the peer has received a rds_message. The m_ack_seq is
    used in rds_tcp_is_acked() to figure out when it is safe to
    drop the rds_message from the RDS retransmit queue.

    The m_ack_seq must be calculated as an offset from the right
    edge of the in-flight tcp buffer, i.e., it should be based on
    the ->write_seq, not the ->snd_nxt.

    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
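
The reasoning above can be modelled in a few lines. This is a simplified sketch under stated assumptions, not the rds-tcp code: the function names are hypothetical, and a message counts as acknowledged once its last byte lies before the sender's oldest unacknowledged sequence number (snd_una). The comparison is wrap-safe because TCP sequence numbers are 32-bit and wrap around.

```c
#include <assert.h>
#include <stdint.h>

/* Wrap-safe "a is before b" on 32-bit TCP sequence numbers. */
int seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Sequence number of the last byte of a msg_len-byte message queued when
 * the right edge of the queued data is write_seq. Using snd_nxt here
 * would be too small whenever earlier data is queued but not yet sent. */
uint32_t msg_ack_seq(uint32_t write_seq, uint32_t msg_len)
{
	assert(msg_len > 0);
	return write_seq + msg_len - 1;
}

/* Nonzero once every byte of the message has been acknowledged. */
int msg_is_acked(uint32_t ack_seq, uint32_t snd_una)
{
	return seq_before(ack_seq, snd_una);
}
```

For example, a 100-byte message queued at write_seq 1000 ends at sequence number 1099 and is safe to drop from the retransmit queue only when snd_una reaches 1100.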
     

02 Dec, 2017

1 commit

  • The rds_tcp_kill_sock() function parses the rds_tcp_conn_list
    to find the rds_connection entries marked for deletion as part
    of the netns deletion under the protection of the rds_tcp_conn_lock.
    Since the rds_tcp_conn_list tracks rds_tcp_connections (which
    have a 1:1 mapping with rds_conn_path), multiple tc entries in
    the rds_tcp_conn_list will map to a single rds_connection, and will
    be deleted as part of the rds_conn_destroy() operation that is
    done outside the rds_tcp_conn_lock.

    The rds_tcp_conn_list traversal done under the protection of
    rds_tcp_conn_lock should not leave any doomed tc entries in
    the list after the rds_tcp_conn_lock is released, else another
    concurrently executing netns delete (for a different netns) thread
    may trip on these entries.

    Reported-by: syzbot
    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
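
The pattern described above (unlink every doomed entry while the lock is held, destroy the entries only after the lock is released) can be sketched in user space as follows. This is an illustrative model with hypothetical names and a pthread mutex standing in for rds_tcp_conn_lock. It is not the kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct entry {
	int netns_id;                 /* namespace this entry belongs to */
	struct entry *next;
};

static struct entry *conn_list;   /* shared list, protected by conn_lock */
static pthread_mutex_t conn_lock = PTHREAD_MUTEX_INITIALIZER;

void conn_add(int netns_id)
{
	struct entry *e = malloc(sizeof(*e));

	assert(e != NULL);
	e->netns_id = netns_id;
	pthread_mutex_lock(&conn_lock);
	e->next = conn_list;
	conn_list = e;
	pthread_mutex_unlock(&conn_lock);
}

int conn_count(void)
{
	int n = 0;

	pthread_mutex_lock(&conn_lock);
	for (struct entry *e = conn_list; e != NULL; e = e->next)
		n++;
	pthread_mutex_unlock(&conn_lock);
	return n;
}

/* Move every entry of netns_id to a private list under the lock, so no
 * doomed entry stays visible after unlock; free them afterwards.
 * Returns the number of entries destroyed. */
int kill_netns(int netns_id)
{
	struct entry *doomed = NULL, **pp, *e;
	int n = 0;

	pthread_mutex_lock(&conn_lock);
	pp = &conn_list;
	while ((e = *pp) != NULL) {
		if (e->netns_id == netns_id) {
			*pp = e->next;        /* unlink from the shared list */
			e->next = doomed;     /* push onto the private list */
			doomed = e;
		} else {
			pp = &e->next;
		}
	}
	pthread_mutex_unlock(&conn_lock);

	while ((e = doomed) != NULL) {        /* teardown runs unlocked */
		doomed = e->next;
		free(e);
		n++;
	}
	return n;
}
```

Because the doomed entries leave the shared list before the lock is dropped, a second thread deleting a different namespace never walks over them.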
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boilerplate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

22 Jun, 2017

1 commit

  • If we are unloading the rds_tcp module, we can set linger to 1
    and drop pending packets to accelerate reconnect. The peer will
    end up resetting the connection based on new generation numbers
    of the new incarnation, so hanging on to unsent TCP packets via
    linger is mostly pointless in this case.

    Signed-off-by: Sowmini Varadhan
    Tested-by: Jenny Xu
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

08 Mar, 2017

1 commit

  • Commit a93d01f5777e ("RDS: TCP: avoid bad page reference in
    rds_tcp_listen_data_ready") added the function
    rds_tcp_listen_sock_def_readable() to handle the case when a
    partially set-up acceptor socket drops into rds_tcp_listen_data_ready().
    However, if the listen socket (rtn->rds_tcp_listen_sock) is itself going
    through a tear-down via rds_tcp_listen_stop(), the (*ready)() will be
    null and we would hit a panic of the form
        BUG: unable to handle kernel NULL pointer dereference at (null)
        IP: (null)
         :
        ? rds_tcp_listen_data_ready+0x59/0xb0 [rds_tcp]
        tcp_data_queue+0x39d/0x5b0
        tcp_rcv_established+0x2e5/0x660
        tcp_v4_do_rcv+0x122/0x220
        tcp_v4_rcv+0x8b7/0x980
         :
    In the above case, it is not fatal to encounter a NULL value for
    ready; we should just drop the packet and let the flush of the
    acceptor thread finish gracefully.

    In general, the tear-down sequence for listen() and accept() sockets
    that is ensured by this commit is:
        rtn->rds_tcp_listen_sock = NULL; /* prevent any new accepts */
        In rds_tcp_listen_stop():
            serialize with, and prevent, further callbacks using lock_sock()
            flush rds_wq
            flush acceptor workq
            sock_release(listen socket)

    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
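
The guard described above (treat a NULL callback as "listener is going away" and drop the event) can be modelled in isolation. The types and function names below are hypothetical stand-ins for the kernel structures, and the locking that the real code relies on is left out of this sketch.

```c
#include <assert.h>
#include <stddef.h>

struct model_sock {
	int wakeups;                      /* times the callback ran */
};

typedef void (*ready_fn)(struct model_sock *sk);

struct model_listener {
	ready_fn saved_ready;             /* NULL once teardown has started */
};

/* Sample callback standing in for the default readable handler. */
void model_def_readable(struct model_sock *sk)
{
	sk->wakeups++;
}

/* Hypothetical data-ready path: returns 1 if the callback ran, 0 if the
 * event was dropped because the listener is being torn down. */
int model_data_ready(const struct model_listener *l, struct model_sock *sk)
{
	ready_fn ready = l->saved_ready;

	if (ready == NULL)
		return 0;                 /* drop, do not call through NULL */
	ready(sk);
	return 1;
}
```

Dropping the event is safe in this model because teardown has already begun, so the data would never be consumed anyway.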
     

16 Jul, 2016

1 commit

  • As the existing comments in rds_tcp_listen_data_ready() indicate,
    it is possible under some race-windows to get to this function with the
    accept() socket. If that happens, we could run into a sequence whereby

    thread 1                              thread 2

    rds_tcp_accept_one() thread
    sets up new_sock via ->accept().
    The sk_user_data is now
    sock_def_readable
                                          data comes in for new_sock,
                                          ->sk_data_ready is called, and
                                          we land in rds_tcp_listen_data_ready
    rds_tcp_set_callbacks()
    takes the sk_callback_lock and
    sets up sk_user_data to be the cp
                                          read_lock sk_callback_lock
                                          ready = cp
                                          unlock sk_callback_lock
                                          page fault on ready

    In the above sequence, we end up with a panic on a bad page reference
    when trying to execute (*ready)(). Instead we need to call
    sock_def_readable() safely, which is what this patch achieves.

    Acked-by: Santosh Shilimkar
    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

02 Jul, 2016

5 commits


19 Jun, 2016

1 commit

  • Fix coding style issues in the following files:

    ib_cm.c: add space
    loop.c: convert spaces to tabs
    sysctl.c: add space
    tcp.h: convert spaces to tabs
    tcp_connect.c: remove extra indentation in switch statement
    tcp_recv.c: convert spaces to tabs
    tcp_send.c: convert spaces to tabs
    transport.c: move brace up one line on for statement

    Signed-off-by: Joshua Houghton
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Joshua Houghton
     

08 Jun, 2016

1 commit

  • When rds_tcp_accept_one() has to replace the existing tcp socket
    with a newer tcp socket (duelling-syn resolution), it must lock_sock()
    to suppress the rds_tcp_data_recv() path while callbacks are being
    changed. Also, existing RDS datagram reassembly state must be reset,
    so that the next datagram on the new socket does not have corrupted
    state. Similarly when resetting the newly accepted socket, appropriate
    locks and synchronization are needed.

    This commit ensures correct synchronization by invoking
    kernel_sock_shutdown to reset a newly accepted sock, and by taking
    appropriate lock_sock()s (for old and new sockets) when resetting
    existing callbacks.

    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

04 May, 2016

1 commit

  • An arbitration scheme for duelling SYNs is implemented as part of
    commit 241b271952eb ("RDS-TCP: Reset tcp callbacks if re-using an
    outgoing socket in rds_tcp_accept_one()") which ensures that both nodes
    involved will arrive at the same arbitration decision. However, this
    needs to be synchronized with an outgoing SYN to be generated by
    rds_tcp_conn_connect(). This commit achieves the synchronization
    through the t_conn_lock mutex in struct rds_tcp_connection.

    The rds_conn_state is checked in rds_tcp_conn_connect() after acquiring
    the t_conn_lock mutex. A SYN is sent out only if the RDS connection is
    not already UP (an UP would indicate that rds_tcp_accept_one() has
    completed 3WH, so no SYN needs to be generated).

    Similarly, the rds_conn_state is checked in rds_tcp_accept_one() after
    acquiring the t_conn_lock mutex. The only acceptable states (to
    allow continuation of the arbitration logic) are UP (i.e., outgoing SYN
    was SYN-ACKed by peer after it sent us the SYN) or CONNECTING (we sent
    outgoing SYN before we saw incoming SYN).

    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
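
The two state checks described above can be sketched as a small model. The enum, struct and function names are hypothetical, and a pthread mutex stands in for t_conn_lock. The sketch holds the lock only around the check itself, whereas the description above implies the lock also covers the action that follows the check.

```c
#include <assert.h>
#include <pthread.h>

enum conn_state { CONN_DOWN, CONN_CONNECTING, CONN_UP };

struct model_conn {
	pthread_mutex_t lock;         /* plays the role of t_conn_lock */
	enum conn_state state;
};

/* Connect side: a SYN is needed only if the connection is not already UP. */
int connect_should_send_syn(struct model_conn *c)
{
	int send;

	pthread_mutex_lock(&c->lock);
	send = (c->state != CONN_UP);
	pthread_mutex_unlock(&c->lock);
	return send;
}

/* Accept side: arbitration may continue only from UP or CONNECTING. */
int accept_may_arbitrate(struct model_conn *c)
{
	int ok;

	pthread_mutex_lock(&c->lock);
	ok = (c->state == CONN_UP || c->state == CONN_CONNECTING);
	pthread_mutex_unlock(&c->lock);
	return ok;
}
```

Since both paths read the state under the same mutex, neither side can act on a state that the other side is in the middle of changing.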
     

08 Aug, 2015

1 commit

  • Register pernet subsys init/stop functions that will set up
    and tear down per-net RDS-TCP listen endpoints. Unregister
    pernet subsys functions on 'modprobe -r' to clean up these
    endpoints.

    Enable keepalive on both accept and connect socket endpoints.
    The keepalive timer expiration will ensure that client socket
    endpoints will be removed as appropriate from the netns when
    an interface is removed from a namespace.

    Register a device notifier callback that will clean up all
    sockets (and thus avoid the need to wait for keepalive timeout)
    when the loopback device is unregistered from the netns indicating
    that the netns is getting deleted.

    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
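
As a rough user-space analogue of enabling keepalive on a TCP endpoint, the sketch below turns on SO_KEEPALIVE and sets the idle time and probe interval. The helper name and the timer values chosen by the caller are assumptions for illustration, and TCP_KEEPIDLE and TCP_KEEPINTVL are Linux-specific option names. The values the commit itself uses are not stated above.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: enable TCP keepalive on fd with the given idle
 * time and probe interval (both in seconds). Returns 0 or -1. */
int enable_keepalive(int fd, int idle_secs, int interval_secs)
{
	int on = 1;

	assert(fd >= 0);
	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
		       &idle_secs, sizeof(idle_secs)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
		       &interval_secs, sizeof(interval_secs)) < 0)
		return -1;
	return 0;
}
```

With keepalive enabled, a peer that silently disappears is eventually detected by failed probes, which is the timeout the device notifier described above lets the cleanup path skip.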
     

24 Nov, 2014

1 commit


12 Apr, 2014

1 commit

  • Several spots in the kernel perform a sequence like:

    skb_queue_tail(&sk->sk_receive_queue, skb);
    sk->sk_data_ready(sk, skb->len);

    But the moment we place the SKB onto the socket receive queue, it
    can be consumed and freed. So this skb->len access may read memory
    that has already been freed.

    Furthermore, the skb->len can be modified by the consumer so it is
    possible that the value isn't accurate.

    And finally, no implementation of this callback actually uses
    the length argument. And since nobody cared about its
    value, lots of call sites pass arbitrary values such as '0' and
    even '1'.

    So just remove the length argument from the callback, that way there
    is no confusion whatsoever and all of these use-after-free cases get
    fixed as a side effect.

    Based upon a patch by Eric Dumazet and his suggestion to audit this
    issue tree-wide.

    Signed-off-by: David S. Miller

    David S. Miller
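
The hazard and the fix can be shown with a small single-threaded model. All names here are hypothetical. The consumer callback frees the buffer as soon as it is notified, which stands in for a concurrent reader, so the producer must not read anything from the buffer after queueing it. A callback without a length argument removes the temptation to do so.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct model_buf {
	size_t len;
};

struct model_sock;
typedef void (*data_ready_fn)(struct model_sock *sk);   /* no length */

struct model_sock {
	struct model_buf *head;       /* one-slot "receive queue" */
	int wakeups;
	data_ready_fn data_ready;
};

/* Consumer: takes the buffer off the queue and frees it right away. */
void model_consume(struct model_sock *sk)
{
	free(sk->head);
	sk->head = NULL;
	sk->wakeups++;
}

/* Producer: ownership of buf passes to the socket when it is queued, so
 * buf is not dereferenced afterwards, not even to compute a length. */
void model_queue_and_notify(struct model_sock *sk, struct model_buf *buf)
{
	sk->head = buf;
	sk->data_ready(sk);
}
```

Had the callback still taken a length, the natural call `sk->data_ready(sk, buf->len)` placed after the queueing step would read `buf` after the consumer may already have freed it.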
     

21 Oct, 2010

1 commit


09 Sep, 2010

3 commits


24 Aug, 2009

1 commit

  • This code allows RDS to be tunneled over a TCP connection.

    RDMA operations are disabled when using TCP transport,
    but this frees RDS from the IB/RDMA stack dependency, and allows
    it to be used with standard Ethernet adapters, or in a VM.

    Signed-off-by: Andy Grover
    Signed-off-by: David S. Miller

    Andy Grover