24 Jul, 2018

1 commit

  • This patch changes the internal representation of an IP address to use
    struct in6_addr. An IPv4 address is stored as an IPv4-mapped address.
    All functions that take an IP address as an argument are also
    changed to use struct in6_addr. The RDS socket layer is not modified,
    however, so it still does not accept IPv6 addresses from an application,
    and the RDS layer neither accepts nor initiates IPv6 connections.
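
    The IPv4-mapped storage described above can be illustrated with a
    small userspace sketch. This is not the patch's code: the helper names
    v4_to_mapped() and mapped_to_v4() are hypothetical, and only the
    standard socket headers are assumed.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical helper: store an IPv4 address (network byte order)
 * in a struct in6_addr as ::ffff:a.b.c.d. */
static void v4_to_mapped(struct in_addr v4, struct in6_addr *out)
{
	assert(out != NULL);
	memset(out, 0, sizeof(*out));	/* first 80 bits are zero */
	out->s6_addr[10] = 0xff;	/* next 16 bits are all ones */
	out->s6_addr[11] = 0xff;
	memcpy(&out->s6_addr[12], &v4.s_addr, sizeof(v4.s_addr));
}

/* Hypothetical helper: recover the IPv4 address.
 * Returns 0 on success, -1 if the address is not IPv4-mapped. */
static int mapped_to_v4(const struct in6_addr *in, struct in_addr *v4)
{
	if (!IN6_IS_ADDR_V4MAPPED(in))
		return -1;
	memcpy(&v4->s_addr, &in->s6_addr[12], sizeof(v4->s_addr));
	return 0;
}
```

    With one representation for both families, code that compares or
    hashes addresses needs a single code path, while a check such as
    IN6_IS_ADDR_V4MAPPED() still tells the two cases apart.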

    v2: Fixed sparse warnings.

    Signed-off-by: Ka-Cheong Poon
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Ka-Cheong Poon
     

09 Feb, 2018

1 commit

  • … connection/workq management

    An rds_connection can get added during netns deletion between lines 528
    and 529 of

    506 static void rds_tcp_kill_sock(struct net *net)
    :
    /* code to pull out all the rds_connections that should be destroyed */
    :
    528 spin_unlock_irq(&rds_tcp_conn_lock);
    529 list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node)
    530 rds_conn_destroy(tc->t_cpath->cp_conn);

    Such an rds_connection would miss the rds_conn_destroy()
    loop (which cancels all pending work) and, if it was scheduled
    after netns deletion, could trigger the use-after-free.

    A similar race-window exists for the module unload path
    in rds_tcp_exit -> rds_tcp_destroy_conns

    Concurrency with netns deletion (rds_tcp_kill_sock()) must be handled
    by checking check_net() before enqueuing new work or adding new
    connections.

    Concurrency with module-unload is handled by maintaining a module
    specific flag that is set at the start of the module exit function,
    and must be checked before enqueuing new work or adding new connections.

    This commit refactors existing RDS_DESTROY_PENDING checks added by
    commit 3db6e0d172c9 ("rds: use RCU to synchronize work-enqueue with
    connection teardown") and consolidates all the concurrency checks
    listed above into the function rds_destroy_pending().
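
    The consolidation described above amounts to one predicate that every
    enqueue or connection-add site consults. A minimal userspace sketch,
    assuming hypothetical mock_* types in place of the real kernel
    structures and modelling no locking or RCU:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel state; not the real RDS structures. */
struct mock_net {
	bool alive;		/* false once netns deletion has begun */
};

struct mock_transport {
	bool unloading;		/* set at the start of the module exit function */
};

struct mock_conn {
	const struct mock_net *net;
	const struct mock_transport *trans;
};

/* One predicate covering both the netns-deletion and module-unload cases. */
static bool mock_destroy_pending(const struct mock_conn *conn)
{
	assert(conn != NULL && conn->net != NULL && conn->trans != NULL);
	return !conn->net->alive || conn->trans->unloading;
}

/* Returns true if the work item was queued, false if teardown has begun. */
static bool mock_queue_work(const struct mock_conn *conn, int *queued)
{
	if (mock_destroy_pending(conn))
		return false;
	(*queued)++;
	return true;
}
```

    Keeping the check in one function means a new teardown condition only
    has to be added in one place rather than at every call site.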

    Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
    Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

    Sowmini Varadhan
     

24 Jan, 2018

1 commit


23 Jan, 2018

1 commit

  • rds-tcp uses m_ack_seq to track the tcp ack# that indicates
    that the peer has received a rds_message. The m_ack_seq is
    used in rds_tcp_is_acked() to figure out when it is safe to
    drop the rds_message from the RDS retransmit queue.

    The m_ack_seq must be calculated as an offset from the right
    edge of the in-flight tcp buffer, i.e., it should be based on
    the ->write_seq, not the ->snd_nxt.
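
    A rough sketch of the arithmetic, assuming 32-bit TCP sequence
    numbers. The function names msg_ack_seq() and msg_is_acked() are
    hypothetical and this is not the rds-tcp implementation. Bytes that
    are queued but not yet sent lie between snd_nxt and write_seq, so a
    newly queued message starts at write_seq:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sequence number of the last byte of a message appended at write_seq.
 * Unsigned arithmetic wraps modulo 2^32, which is what TCP expects. */
static uint32_t msg_ack_seq(uint32_t write_seq, uint32_t msg_len)
{
	assert(msg_len > 0);
	return write_seq + msg_len - 1;
}

/* True once the peer's cumulative ack (snd_una) has moved past ack_seq.
 * The signed difference keeps the comparison correct across wraparound
 * (the conversion to int32_t assumes two's complement, as on gcc/clang). */
static bool msg_is_acked(uint32_t ack_seq, uint32_t snd_una)
{
	return (int32_t)(ack_seq - snd_una) < 0;
}
```

    If the offset were taken from snd_nxt while unsent data was still
    queued, the computed value would fall short of the message's real last
    byte and the message could be dropped from the retransmit queue too
    early.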

    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

06 Jan, 2018

1 commit

  • rds_sendmsg() can enqueue work on cp_send_w from process context, but
    it should not enqueue this work if connection teardown has commenced
    (else we risk enqueuing work after rds_conn_path_destroy() has assumed that
    all work has been cancelled/flushed).

    Similarly some other functions like rds_cong_queue_updates
    and rds_tcp_data_ready are called in softirq context, and may end
    up enqueuing work on rds_wq after rds_conn_path_destroy() has assumed
    that all workqs are quiesced.

    Check the RDS_DESTROY_PENDING bit and use rcu synchronization to avoid
    all these races.

    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

17 Jul, 2017

1 commit

  • We could end up executing rds_conn_shutdown before the rds_recv_worker
    thread, then rds_conn_shutdown -> rds_tcp_conn_shutdown can do a
    sock_release and set sock->sk to null, which may interleave in bad
    ways with rds_recv_worker, e.g., it could result in:

    "BUG: unable to handle kernel NULL pointer dereference at 0000000000000078"
    [ffff881769f6fd70] release_sock at ffffffff815f337b
    [ffff881769f6fd90] rds_tcp_recv at ffffffffa043c888 [rds_tcp]
    [ffff881769f6fdb0] rds_recv_worker at ffffffffa04a4810 [rds]
    [ffff881769f6fde0] process_one_work at ffffffff810a14c1
    [ffff881769f6fe40] worker_thread at ffffffff810a1940
    [ffff881769f6fec0] kthread at ffffffff810a6b1e

    Also, do not enqueue any new shutdown workq items when the connection is
    shutting down (this may happen for rds-tcp in softirq mode, if a FIN
    or CLOSE is received while the module is in the middle of an unload).

    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

01 Jul, 2017

1 commit

  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This helps avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.
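
    The overflow protection can be pictured with a saturating counter.
    The sketch below is a single-threaded model with hypothetical
    sat_ref_* names. The kernel's refcount_t is atomic and differs in
    detail, so treat this only as an illustration of the idea:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of a saturating reference count. */
struct sat_ref {
	unsigned int count;
};

#define SAT_REF_SATURATED UINT_MAX

static void sat_ref_inc(struct sat_ref *r)
{
	/* Pin at the maximum instead of wrapping around to 0. */
	if (r->count != SAT_REF_SATURATED)
		r->count++;
}

/* Returns true when the caller has dropped the last reference. */
static bool sat_ref_dec_and_test(struct sat_ref *r)
{
	assert(r->count > 0);
	if (r->count == SAT_REF_SATURATED)
		return false;	/* saturated: leak the object rather than free it */
	return --r->count == 0;
}
```

    A plain counter that wraps to zero would let the object be freed while
    references remain. A saturated counter trades that use-after-free for
    a memory leak.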

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
     

06 Apr, 2017

1 commit


18 Nov, 2016

1 commit

  • As noted in rds_recv_incoming() sequence numbers on data packets
    can decrease for the failover case, and the Rx path is equipped
    to recover from this, if the RDS_FLAG_RETRANSMITTED is set
    on the rds header of an incoming message with a suspect sequence
    number.

    The RDS_FLAG_RETRANSMITTED header flag is predicated on the
    RDS_MSG_RETRANSMITTED flag in the rds_message, so make sure the flag
    is set on messages queued for retransmission.

    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

16 Jul, 2016

1 commit

  • Use RDS probe-ping to compute how many paths may be used with
    the peer, and to synchronously start the multiple paths. If multipath
    RDS (mprds) is supported by the transport, hash outgoing traffic to one
    of the multiple paths in rds_sendmsg().
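
    A minimal sketch of hashing traffic onto a path, assuming a per-socket
    hash value and a path count learned from the probe-ping exchange. The
    name pick_path() and the modulo mapping are illustrative choices, not
    the hash the commit uses:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: map a per-socket hash value to a path index in
 * [0, npaths). With a single path, every socket uses path 0. */
static unsigned int pick_path(uint32_t sock_hash, unsigned int npaths)
{
	assert(npaths > 0);
	if (npaths == 1)
		return 0;
	return sock_hash % npaths;
}
```

    Because the mapping is deterministic, a given hash value always lands
    on the same path, while different sockets spread across the available
    paths.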

    CC: Santosh Shilimkar
    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

02 Jul, 2016

2 commits


30 Jun, 2016

1 commit


19 Jun, 2016

1 commit

  • Fix coding style issues in the following files:

    ib_cm.c: add space
    loop.c: convert spaces to tabs
    sysctl.c: add space
    tcp.h: convert spaces to tabs
    tcp_connect.c: remove extra indentation in switch statement
    tcp_recv.c: convert spaces to tabs
    tcp_send.c: convert spaces to tabs
    transport.c: move brace up one line on for statement

    Signed-off-by: Joshua Houghton
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Joshua Houghton
     

15 Jun, 2016

1 commit

  • In preparation for multipath RDS, split the rds_connection
    structure into a base structure, and a per-path struct rds_conn_path.
    The base structure tracks information and locks common to all
    paths. The workqs for send/recv/shutdown etc are tracked per
    rds_conn_path. Thus the workq callbacks now work with rds_conn_path.

    This commit allows for one rds_conn_path per rds_connection, and will
    be extended into multiple conn_paths in subsequent commits.
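
    The split can be pictured roughly as below. All sketch_* names and
    fields are hypothetical except cp_conn, which appears elsewhere in
    this history as the path's back-pointer to its connection. The real
    structures carry far more state:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_PATHS 1	/* one path for now, as in this commit */

struct sketch_conn;

/* Per-path state: the work items that used to hang off the connection. */
struct sketch_conn_path {
	struct sketch_conn *cp_conn;	/* back-pointer to the shared base */
	unsigned int cp_index;
	int send_pending;		/* stand-ins for the send/recv/shutdown work */
	int recv_pending;
	int shutdown_pending;
};

/* Base structure: information common to all paths. */
struct sketch_conn {
	unsigned int npaths;
	struct sketch_conn_path path[SKETCH_MAX_PATHS];
};

static void sketch_conn_init(struct sketch_conn *conn)
{
	assert(conn != NULL);
	conn->npaths = SKETCH_MAX_PATHS;
	for (unsigned int i = 0; i < conn->npaths; i++) {
		conn->path[i] = (struct sketch_conn_path){
			.cp_conn = conn,
			.cp_index = i,
		};
	}
}

/* A work callback receives the path and reaches the base through it. */
static struct sketch_conn *sketch_send_worker(struct sketch_conn_path *cp)
{
	cp->send_pending = 0;
	return cp->cp_conn;
}
```

    Raising SKETCH_MAX_PATHS in this sketch leaves the callbacks unchanged,
    since they already take a path rather than a connection.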

    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

20 May, 2016

1 commit

  • TCP stack can now run from process context.

    Use read_lock_bh(&sk->sk_callback_lock) variant to restore previous
    assumption.

    Fixes: 5413d1babe8f ("net: do not block BH while processing socket backlog")
    Fixes: d41a69f1d390 ("tcp: make tcp_sendmsg() aware of socket backlog")
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

05 Oct, 2015

1 commit


18 Apr, 2014

1 commit

  • Mostly scripted conversion of the smp_mb__* barriers.

    Signed-off-by: Peter Zijlstra
    Acked-by: Paul E. McKenney
    Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
    Cc: Linus Torvalds
    Cc: linux-arch@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

23 Aug, 2012

1 commit


21 Oct, 2010

1 commit


27 Sep, 2010

1 commit


25 Sep, 2010

1 commit

  • We have for each socket :

    One spinlock (sk_slock.slock)
    One rwlock (sk_callback_lock)

    Possible scenarios are :

    (A) (this is used in net/sunrpc/xprtsock.c)
    read_lock(&sk->sk_callback_lock) (without blocking BH)

    spin_lock(&sk->sk_slock.slock);
    ...
    read_lock(&sk->sk_callback_lock);
    ...

    (B)
    write_lock_bh(&sk->sk_callback_lock)
    stuff
    write_unlock_bh(&sk->sk_callback_lock)

    (C)
    spin_lock_bh(&sk->sk_slock)
    ...
    write_lock_bh(&sk->sk_callback_lock)
    stuff
    write_unlock_bh(&sk->sk_callback_lock)
    spin_unlock_bh(&sk->sk_slock)

    This (C) case conflicts with (A) :

    CPU1 [A]                        CPU2 [C]
    read_lock(callback_lock)
                                    spin_lock_bh(slock)

    We have one problematic (C) use case in inet_csk_listen_stop() :

    local_bh_disable();
    bh_lock_sock(child); // spin_lock_bh(&sk->sk_slock)
    WARN_ON(sock_owned_by_user(child));
    ...
    sock_orphan(child); // write_lock_bh(&sk->sk_callback_lock)

    lockdep is not happy with this, as reported by Tetsuo Handa

    It seems the only way to deal with this is to use
    read_lock_bh(&sk->sk_callback_lock) everywhere.

    Thanks to Jarek for pointing out a bug in my first attempt and suggesting
    this solution.

    Reported-by: Tetsuo Handa
    Tested-by: Tetsuo Handa
    Signed-off-by: Eric Dumazet
    CC: Jarek Poplawski
    Tested-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

09 Sep, 2010

4 commits


17 Mar, 2010

1 commit


04 Feb, 2010

1 commit


24 Aug, 2009

1 commit

  • This code allows RDS to be tunneled over a TCP connection.

    RDMA operations are disabled when using TCP transport,
    but this frees RDS from the IB/RDMA stack dependency, and allows
    it to be used with standard Ethernet adapters, or in a VM.

    Signed-off-by: Andy Grover
    Signed-off-by: David S. Miller

    Andy Grover