22 Jul, 2018

1 commit

  • commit f1693c63ab133d16994cc50f773982b5905af264 upstream.

    The loop transport is a self-loopback, so a remote port congestion
    update isn't relevant to it. In fact, the xmit path already ignores
    it. The receive path needs to do the same.

    Reported-by: syzbot+4c20b3866171ce8441d2@syzkaller.appspotmail.com
    Reviewed-by: Sowmini Varadhan
    Signed-off-by: Santosh Shilimkar
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Santosh Shilimkar
     

19 May, 2018

1 commit

  • [ Upstream commit eb80ca476ec11f67a62691a93604b405ffc7d80c ]

    syzbot/KMSAN reported an uninit-value in put_cmsg(), originating
    from rds_cmsg_recv().

    Simply clear the structure, since we have holes there, or since
    rx_traces might be smaller than RDS_MSG_RX_DGRAM_TRACE_MAX.
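
    As a rough user-space illustration of the "clear the structure" approach
    (not the kernel code itself), the sketch below zeroes a structure before
    filling only part of it, so neither padding holes nor unused array slots
    carry stale bytes when the whole structure is later copied out. The names
    TRACE_MAX, struct rx_trace and fill_trace() are hypothetical stand-ins.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for RDS_MSG_RX_DGRAM_TRACE_MAX. */
#define TRACE_MAX 3

/* Hypothetical structure: the compiler inserts padding after `count`. */
struct rx_trace {
    uint8_t  count;
    uint64_t stamps[TRACE_MAX];
};

/* Fill `out` with up to TRACE_MAX entries taken from `src`. */
void fill_trace(struct rx_trace *out, const uint64_t *src, uint8_t n)
{
    if (n > TRACE_MAX)
        n = TRACE_MAX;

    /* Clear everything first: padding and unused slots start as zero. */
    memset(out, 0, sizeof(*out));

    out->count = n;
    for (uint8_t i = 0; i < n; i++)
        out->stamps[i] = src[i];
}
```

    Without the memset(), a caller that copies sizeof(struct rx_trace) bytes
    would also copy whatever happened to be in the unused slots.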

    BUG: KMSAN: uninit-value in copy_to_user include/linux/uaccess.h:184 [inline]
    BUG: KMSAN: uninit-value in put_cmsg+0x600/0x870 net/core/scm.c:242
    CPU: 0 PID: 4459 Comm: syz-executor582 Not tainted 4.16.0+ #87
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:17 [inline]
    dump_stack+0x185/0x1d0 lib/dump_stack.c:53
    kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
    kmsan_internal_check_memory+0x135/0x1e0 mm/kmsan/kmsan.c:1157
    kmsan_copy_to_user+0x69/0x160 mm/kmsan/kmsan.c:1199
    copy_to_user include/linux/uaccess.h:184 [inline]
    put_cmsg+0x600/0x870 net/core/scm.c:242
    rds_cmsg_recv net/rds/recv.c:570 [inline]
    rds_recvmsg+0x2db5/0x3170 net/rds/recv.c:657
    sock_recvmsg_nosec net/socket.c:803 [inline]
    sock_recvmsg+0x1d0/0x230 net/socket.c:810
    ___sys_recvmsg+0x3fb/0x810 net/socket.c:2205
    __sys_recvmsg net/socket.c:2250 [inline]
    SYSC_recvmsg+0x298/0x3c0 net/socket.c:2262
    SyS_recvmsg+0x54/0x80 net/socket.c:2257
    do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2

    Fixes: 3289025aedc0 ("RDS: add receive message trace used by application")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Cc: Santosh Shilimkar
    Cc: linux-rdma
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

05 Jul, 2017

1 commit


22 Jun, 2017

1 commit

  • The RDS handshake ping probe added by commit 5916e2c1554f
    ("RDS: TCP: Enable multipath RDS for TCP") is sent from rds_sendmsg()
    before the first data packet is sent to a peer. If the conversation
    is not bidirectional (i.e., one side is always passive and never
    invokes rds_sendmsg()) and the passive side restarts its rds_tcp
    module, a new HS ping probe needs to be sent, so that the number
    of paths can be re-established.

    This patch achieves that by sending a HS ping probe from
    rds_tcp_accept_one() when c_npaths is 0 (i.e., we have not done
    a handshake probe with this peer yet).

    Signed-off-by: Sowmini Varadhan
    Tested-by: Jenny Xu
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

17 Jun, 2017

1 commit

  • Found when testing between sparc and x86 machines on different
    subnets, so the address comparison patterns hit the corner cases and
    brought out some bugs fixed by this patch.

    Signed-off-by: Sowmini Varadhan
    Tested-by: Imanti Mendez
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

22 Apr, 2017

1 commit


03 Jan, 2017

2 commits


18 Nov, 2016

1 commit

  • The RDS transport has to be able to distinguish between
    two types of failure events:
    (a) when the transport fails (e.g., TCP connection reset)
    but the RDS socket/connection layer on both sides stays
    the same
    (b) when the peer's RDS layer itself resets (e.g., due to module
    reload or machine reboot at the peer)
    In case (a) both sides must reconnect and continue the RDS messaging
    without any message loss or disruption to the message sequence numbers,
    and this is achieved by rds_send_path_reset().

    In case (b) we should reset all rds_connection state to the
    new incarnation of the peer. Examples of state that needs to
    be reset are next expected rx sequence number from, or messages to be
    retransmitted to, the new incarnation of the peer.

    To achieve this, the RDS handshake probe added as part of
    commit 5916e2c1554f ("RDS: TCP: Enable multipath RDS for TCP")
    is enhanced so that sender and receiver of the RDS ping-probe
    will add a generation number as part of the RDS_EXTHDR_GEN_NUM
    extension header. Each peer stores local and remote generation
    numbers as part of each rds_connection. Changes in generation
    number will be detected via incoming handshake probe ping
    request or response and will allow the receiver to reset rds_connection
    state.
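
    The generation-number check described above might be sketched roughly as
    follows. This is a simplified user-space model, not the kernel
    implementation: struct conn_state, its fields, handle_peer_gen_num() and
    the convention that 0 means "no generation seen yet" are all assumptions
    made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection state; field names are illustrative only. */
struct conn_state {
    uint32_t peer_gen_num;   /* last generation seen from the peer, 0 = none */
    uint64_t next_rx_seq;    /* next expected receive sequence number */
    unsigned retrans_queued; /* messages waiting to be retransmitted */
};

/*
 * Process the generation number carried in a handshake probe.
 * Returns true when the peer is a new incarnation and state was reset.
 */
bool handle_peer_gen_num(struct conn_state *c, uint32_t gen_in_probe)
{
    if (c->peer_gen_num == 0) {
        /* First probe from this peer: just remember its generation. */
        c->peer_gen_num = gen_in_probe;
        return false;
    }
    if (c->peer_gen_num == gen_in_probe)
        return false; /* case (a): same incarnation, keep all state */

    /* Case (b): the peer's RDS layer restarted, so drop stale state. */
    c->peer_gen_num = gen_in_probe;
    c->next_rx_seq = 0;
    c->retrans_queued = 0;
    return true;
}
```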

    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

16 Jul, 2016

1 commit

    Use RDS probe-ping to compute how many paths may be used with
    the peer, and to synchronously start the multiple paths. If the
    transport supports multipath RDS (mprds), hash outgoing traffic to
    one of the multiple paths in rds_sendmsg().
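
    A minimal sketch of hashing outgoing traffic onto one of several paths
    could look like the following. It assumes the hash key is the sending
    socket's bound port and uses a simple multiplicative mix; both choices,
    and the name pick_path(), are illustrative and not taken from the patch.

```c
#include <stdint.h>

/*
 * Pick a path index in [0, npaths) for a given bound port.
 * npaths of 0 or 1 means the peer is treated as single-path.
 */
unsigned pick_path(uint16_t bound_port, unsigned npaths)
{
    if (npaths <= 1)
        return 0;

    /* Illustrative multiplicative mix, not the kernel's hash function. */
    uint32_t h = (uint32_t)bound_port * 2654435761u;

    return (h >> 16) % npaths;
}
```

    Because the result depends only on the port and the path count, all
    messages from one socket stay on the same path, which keeps their
    ordering intact.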

    CC: Santosh Shilimkar
    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

02 Jul, 2016

1 commit

  • RDS ping messages are sent with a non-zero src port to a zero
    dst port, so that the rds pong messages can be sent back to the
    originators src port. However if a confused/malicious sender
    sends a ping with a 0 src port, we'd have an infinite ping-pong
    loop. To avoid this, the receiver should ignore ping messages
    with a 0 src port.
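
    The receiver-side rule can be sketched as below. This is a hypothetical
    model that looks only at the two port fields; struct ping_hdr, is_ping()
    and should_send_pong() are illustrative names, not kernel symbols.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical header view: only the port fields matter for this check. */
struct ping_hdr {
    uint16_t sport;
    uint16_t dport;
};

/* A message addressed to destination port 0 is treated as a ping. */
bool is_ping(const struct ping_hdr *h)
{
    return h->dport == 0;
}

/* Decide whether an incoming message should be answered with a pong. */
bool should_send_pong(const struct ping_hdr *h)
{
    if (!is_ping(h))
        return false;

    /*
     * The pong is sent to the ping's source port. If that port were 0,
     * the pong would itself look like a ping to the peer, and the two
     * sides would answer each other forever. Ignore such pings.
     */
    return h->sport != 0;
}
```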

    Acked-by: Santosh Shilimkar
    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

15 Jun, 2016

4 commits


03 Jun, 2016

1 commit


03 Mar, 2016

1 commit


03 Mar, 2015

1 commit

    Now that TIPC no longer depends on the iocb argument in its internal
    implementations of the sendmsg() and recvmsg() hooks defined in the
    proto structure, no user of those hooks uses the iocb argument at all.
    We can therefore drop the redundant iocb argument completely from all
    implementations of both sendmsg() and recvmsg() in the entire
    networking stack.

    Cc: Christoph Hellwig
    Suggested-by: Al Viro
    Signed-off-by: Ying Xue
    Signed-off-by: David S. Miller

    Ying Xue
     

10 Dec, 2014

1 commit

  • Note that the code _using_ ->msg_iter at that point will be very
    unhappy with anything other than unshifted iovec-backed iov_iter.
    We still need to convert users to proper primitives.

    Signed-off-by: Al Viro

    Al Viro
     

24 Nov, 2014

1 commit


19 Jan, 2014

1 commit

  • This is a follow-up patch to f3d3342602f8bc ("net: rework recvmsg
    handler msg_name and msg_namelen logic").

    DECLARE_SOCKADDR validates that the structure we use for writing the
    name information to is not larger than the buffer which is reserved
    for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR
    consistently in sendmsg code paths.
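
    The idea of a declaration that also checks the size at compile time can
    be approximated in user space with C11 _Static_assert. The sketch below
    is only an analogue and not the kernel macro: DECLARE_NAME_PTR and
    fill_name() are hypothetical names, and struct sockaddr_storage stands
    in for the reserved name buffer.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>

/*
 * Hypothetical user-space analogue: declare a typed pointer into the name
 * buffer and fail the build if the pointed-to type would not fit in it.
 */
#define DECLARE_NAME_PTR(type, var, buf)                                  \
    _Static_assert(sizeof(*(type)0) <= sizeof(struct sockaddr_storage),   \
                   "address type is larger than the name buffer");        \
    type var = (type)(buf)

/* Write an IPv4 address (host byte order inputs) into the name buffer. */
void fill_name(struct sockaddr_storage *name, uint32_t addr, uint16_t port)
{
    DECLARE_NAME_PTR(struct sockaddr_in *, sin, name);

    sin->sin_family = AF_INET;
    sin->sin_port = htons(port);
    sin->sin_addr.s_addr = htonl(addr);
}
```

    Using a type bigger than the buffer in DECLARE_NAME_PTR would stop the
    build instead of silently writing past the end at run time.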

    Signed-off-by: Steffen Hurrle
    Suggested-by: Hannes Frederic Sowa
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Steffen Hurrle
     

21 Nov, 2013

1 commit


23 Jul, 2012

1 commit

  • Jay Fenlason (fenlason@redhat.com) found a bug,
    that recvfrom() on an RDS socket can return the contents of random kernel
    memory to userspace if it was called with an address length larger than
    sizeof(struct sockaddr_in).
    rds_recvmsg() also fails to set the addr_len parameter properly before
    returning, but that's just a bug.
    There are also a number of cases where recvfrom() can return an entirely bogus
    address. Anything in rds_recvmsg() that returns a non-negative value but does
    not go through the "sin = (struct sockaddr_in *)msg->msg_name;" code path
    at the end of the while(1) loop will return up to 128 bytes of kernel memory
    to userspace.

    I wrote two test programs to reproduce this bug. In rds_server, the
    write to fromAddr overruns it and clobbers the sock_fd that follows it
    on the stack.
    Yes, it is the programmer's fault for setting msg_namelen incorrectly,
    but it is better for the kernel to copy only the real length of the
    address to user space in such a case.
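
    The intended behaviour, reporting the real address length regardless of
    what the caller passed in, can be modelled in user space roughly as
    follows. This is a hypothetical sketch rather than the kernel change;
    report_sender() is an illustrative name.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical model of the receive path's address handling: copy at most
 * the real address size and report that size back to the caller.
 */
void report_sender(struct msghdr *msg, const struct sockaddr_in *from)
{
    size_t n = sizeof(*from);

    if (msg->msg_name == NULL)
        return;

    /* Never write more than the caller said the buffer can hold. */
    if (msg->msg_namelen < n)
        n = msg->msg_namelen;

    memcpy(msg->msg_name, from, n);

    /* Report the real address length, not the caller's oversized value. */
    msg->msg_namelen = sizeof(*from);
}
```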

    How to run the test programs:
    I tested them on a 32-bit x86 system running 3.5.0-rc7.

    1 compile
    gcc -o rds_client rds_client.c
    gcc -o rds_server rds_server.c

    2 run ./rds_server on one console

    3 run ./rds_client on another console

    4 you will see something like:
    server is waiting to receive data...
    old socket fd=3
    server received data from client:data from client
    msg.msg_namelen=32
    new socket fd=-1067277685
    sendmsg()
    : Bad file descriptor

    /***************** rds_client.c ********************/

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    int main(void)
    {
        int sock_fd;
        struct sockaddr_in serverAddr;
        struct sockaddr_in toAddr;
        char recvBuffer[128] = "data from client";
        struct msghdr msg;
        struct iovec iov;

        sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
        if (sock_fd < 0) {
            perror("create socket error\n");
            exit(1);
        }

        /* Bind the client to 127.0.0.1:4001. */
        memset(&serverAddr, 0, sizeof(serverAddr));
        serverAddr.sin_family = AF_INET;
        serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
        serverAddr.sin_port = htons(4001);

        if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) {
            perror("bind() error\n");
            close(sock_fd);
            exit(1);
        }

        /* Send one message to the server at 127.0.0.1:4000. */
        memset(&toAddr, 0, sizeof(toAddr));
        toAddr.sin_family = AF_INET;
        toAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
        toAddr.sin_port = htons(4000);
        msg.msg_name = &toAddr;
        msg.msg_namelen = sizeof(toAddr);
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_iov->iov_base = recvBuffer;
        msg.msg_iov->iov_len = strlen(recvBuffer) + 1;
        msg.msg_control = 0;
        msg.msg_controllen = 0;
        msg.msg_flags = 0;

        if (sendmsg(sock_fd, &msg, 0) == -1) {
            perror("sendto() error\n");
            close(sock_fd);
            exit(1);
        }

        printf("client send data:%s\n", recvBuffer);

        memset(recvBuffer, '\0', 128);

        /* Wait for the server's reply. */
        msg.msg_name = &toAddr;
        msg.msg_namelen = sizeof(toAddr);
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_iov->iov_base = recvBuffer;
        msg.msg_iov->iov_len = 128;
        msg.msg_control = 0;
        msg.msg_controllen = 0;
        msg.msg_flags = 0;
        if (recvmsg(sock_fd, &msg, 0) == -1) {
            perror("recvmsg() error\n");
            close(sock_fd);
            exit(1);
        }

        printf("receive data from server:%s\n", recvBuffer);

        close(sock_fd);

        return 0;
    }

    /***************** rds_server.c ********************/

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    int main(void)
    {
        struct sockaddr_in fromAddr;
        int sock_fd;
        struct sockaddr_in serverAddr;
        unsigned int addrLen;
        char recvBuffer[128];
        struct msghdr msg;
        struct iovec iov;

        sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
        if (sock_fd < 0) {
            perror("create socket error\n");
            exit(1);
        }

        /* Bind the server to 127.0.0.1:4000. */
        memset(&serverAddr, 0, sizeof(serverAddr));
        serverAddr.sin_family = AF_INET;
        serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
        serverAddr.sin_port = htons(4000);
        if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) {
            perror("bind error\n");
            close(sock_fd);
            exit(1);
        }

        printf("server is waiting to receive data...\n");
        msg.msg_name = &fromAddr;

        /*
         * I add 16 to sizeof(fromAddr), i.e. 32,
         * and pay attention to the definition of fromAddr,
         * recvmsg() will overwrite sock_fd,
         * since kernel will copy 32 bytes to userspace.
         *
         * If you just use sizeof(fromAddr), it works fine.
         */
        msg.msg_namelen = sizeof(fromAddr) + 16;
        /* msg.msg_namelen = sizeof(fromAddr); */
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_iov->iov_base = recvBuffer;
        msg.msg_iov->iov_len = 128;
        msg.msg_control = 0;
        msg.msg_controllen = 0;
        msg.msg_flags = 0;

        while (1) {
            printf("old socket fd=%d\n", sock_fd);
            if (recvmsg(sock_fd, &msg, 0) == -1) {
                perror("recvmsg() error\n");
                close(sock_fd);
                exit(1);
            }
            printf("server received data from client:%s\n", recvBuffer);
            printf("msg.msg_namelen=%d\n", msg.msg_namelen);
            printf("new socket fd=%d\n", sock_fd);
            strcat(recvBuffer, "--data from server");
            if (sendmsg(sock_fd, &msg, 0) == -1) {
                perror("sendmsg()\n");
                close(sock_fd);
                exit(1);
            }
        }

        close(sock_fd);
        return 0;
    }

    Signed-off-by: Weiping Pan
    Signed-off-by: David S. Miller

    Weiping Pan
     

20 Mar, 2012

1 commit


01 Nov, 2011

1 commit


21 Oct, 2010

1 commit


09 Sep, 2010

3 commits


19 Aug, 2010

1 commit


21 Apr, 2010

1 commit

  • Define a new function to return the waitqueue of a "struct sock".

    static inline wait_queue_head_t *sk_sleep(struct sock *sk)
    {
        return sk->sk_sleep;
    }

    Change all read occurrences of sk_sleep to a call to this function.

    Needed for a future RCU conversion, after which sk_sleep won't be a
    directly accessible field.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

30 Nov, 2009

1 commit


24 Aug, 2009

1 commit


20 Jul, 2009

1 commit


27 Feb, 2009

1 commit

  • Upon receiving a datagram from the transport, RDS parses the
    headers and potentially queues an ACK.

    Signed-off-by: Andy Grover
    Signed-off-by: David S. Miller

    Andy Grover