05 Oct, 2020

1 commit

  • When a new incoming call arrives at a userspace rxrpc socket on a new
    connection that has a security class set, the code currently pushes it onto
    the accept queue to hold a ref on it for the socket. This doesn't work,
    however, as recvmsg() pops it off, notices that it's in the SERVER_SECURING
    state and discards the ref. This means that the call runs out of refs too
    early and the kernel oopses.

    By contrast, a kernel rxrpc socket manually pre-charges the incoming call
    pool with calls that already have user call IDs assigned, so they are ref'd
    by the call tree on the socket.

    Change the mode of operation for userspace rxrpc server sockets to work
    like this too. Although this is a UAPI change, server sockets aren't
    currently functional.

    Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
    Signed-off-by: David Howells

    David Howells
     

24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and their variants with
    the new pseudo-keyword macro fallthrough[1]. Also remove fall-through
    markings where they are unnecessary.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
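    The replacement can be illustrated with a small userspace sketch. The
    macro below approximates the kernel's definition (which lives in
    include/linux/compiler_attributes.h): it expands to
    __attribute__((__fallthrough__)) when the compiler supports it and to an
    empty statement otherwise. stages_for_level() is hypothetical and exists
    only to show where the marker goes.

```c
#include <assert.h>

/* Approximation of the kernel's fallthrough pseudo-keyword. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0) /* fallthrough */
#endif

/* Hypothetical example: each level also performs the work of the levels below it. */
static int stages_for_level(int level)
{
	int stages = 0;

	switch (level) {
	case 3:
		stages++;
		fallthrough;	/* intentional: level 3 includes level 2 */
	case 2:
		stages++;
		fallthrough;	/* intentional: level 2 includes level 1 */
	case 1:
		stages++;
		break;
	default:
		break;
	}
	return stages;
}
```

    Unlike a comment, the attribute is visible to the compiler, so
    -Wimplicit-fallthrough can tell an intended fall-through from a missing
    break.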

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
     

31 Jul, 2020

1 commit

  • There's a race between rxrpc_sendmsg() setting up a call but then failing
    to send anything on it due to an error, and recvmsg() seeing the call
    complete and trying to return its state to the user.

    An assertion fails in rxrpc_recvmsg() because the call has already been
    released from the socket and is about to be released again as recvmsg deals
    with it. (The recvmsg_q queue on the socket holds a ref, so there's no
    problem with use-after-free.)

    We also have to be careful not to end up reporting an error twice, in such
    a way that both returns indicate to userspace that the user ID supplied
    with the call is no longer in use - which could cause the client to
    malfunction if it recycles the user ID fast enough.

    Fix this by the following means:

    (1) Once sendmsg() has successfully added a newly created call to the
    socket, don't return any errors through sendmsg(); instead, complete
    the call and let recvmsg() retrieve the error. Make sendmsg() return
    0 at this point. Further calls to sendmsg() for that call will fail
    with ESHUTDOWN.

    Note that at this point we haven't sent any packets yet, so the
    server doesn't yet know about the call.

    (2) If sendmsg() returns an error when it was expected to create a new
    call, it means that the user ID wasn't used.

    (3) Mark the call disconnected before marking it completed to prevent an
    oops in rxrpc_release_call().

    (4) recvmsg() will then retrieve the error and set MSG_EOR to indicate
    that the user ID is no longer known by the kernel.

    An oops like the following is produced:

    kernel BUG at net/rxrpc/recvmsg.c:605!
    ...
    RIP: 0010:rxrpc_recvmsg+0x256/0x5ae
    ...
    Call Trace:
    ? __init_waitqueue_head+0x2f/0x2f
    ____sys_recvmsg+0x8a/0x148
    ? import_iovec+0x69/0x9c
    ? copy_msghdr_from_user+0x5c/0x86
    ___sys_recvmsg+0x72/0xaa
    ? __fget_files+0x22/0x57
    ? __fget_light+0x46/0x51
    ? fdget+0x9/0x1b
    do_recvmmsg+0x15e/0x232
    ? _raw_spin_unlock+0xa/0xb
    ? vtime_delta+0xf/0x25
    __x64_sys_recvmmsg+0x2c/0x2f
    do_syscall_64+0x4c/0x78
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Fixes: 357f5ef64628 ("rxrpc: Call rxrpc_release_call() on error in rxrpc_new_client_call()")
    Reported-by: syzbot+b54969381df354936d96@syzkaller.appspotmail.com
    Signed-off-by: David Howells
    Reviewed-by: Marc Dionne
    Signed-off-by: David S. Miller

    David Howells
     

21 Jul, 2020

1 commit

  • rxrpc_sendmsg() returns EPIPE if there's an outstanding error, such as
    when rxrpc_recvmsg() has indicated ENODATA because there was nothing for
    it to read.

    Change rxrpc_recvmsg() to return EAGAIN instead if there's nothing to read
    as this particular error doesn't get stored in ->sk_err by the networking
    core.

    Also change rxrpc_sendmsg() so that it doesn't fail with delayed receive
    errors (there's no way for it to report which call, if any, the error was
    caused by).

    Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     

05 Jun, 2020

2 commits

  • Under some circumstances, rxrpc will fail to transmit a packet through the
    underlying UDP socket (i.e. UDP sendmsg returns an error). This may result
    in a call getting stuck.

    In the instance being seen, where AFS tries to send a probe to the Volume
    Location server, tracepoints show the UDP Tx failure (in this case returning
    error 99 EADDRNOTAVAIL) and then nothing more:

    afs_make_vl_call: c=0000015d VL.GetCapabilities
    rxrpc_call: c=0000015d NWc u=1 sp=rxrpc_kernel_begin_call+0x106/0x170 [rxrpc] a=00000000dd89ee8a
    rxrpc_call: c=0000015d Gus u=2 sp=rxrpc_new_client_call+0x14f/0x580 [rxrpc] a=00000000e20e4b08
    rxrpc_call: c=0000015d SEE u=2 sp=rxrpc_activate_one_channel+0x7b/0x1c0 [rxrpc] a=00000000e20e4b08
    rxrpc_call: c=0000015d CON u=2 sp=rxrpc_kernel_begin_call+0x106/0x170 [rxrpc] a=00000000e20e4b08
    rxrpc_tx_fail: c=0000015d r=1 ret=-99 CallDataNofrag

    The problem is that if the initial packet fails and the retransmission
    timer hasn't been started, the call is set to completed and an error is
    returned from rxrpc_send_data_packet() to rxrpc_queue_packet(). Though
    rxrpc_instant_resend() is called, this does nothing because the call is
    marked completed.

    So rxrpc_notify_socket() isn't called and the error is passed back up to
    rxrpc_send_data(), rxrpc_kernel_send_data() and thence to afs_make_call()
    and afs_vl_get_capabilities() where it is simply ignored because it is
    assumed that the result of a probe will be collected asynchronously.

    Fileserver probing is similarly affected via afs_fs_get_capabilities().

    Fix this by always issuing a notification in __rxrpc_set_call_completion()
    if it shifts a call to the completed state, even if an error is also
    returned to the caller through the function return value.

    Also put in a little bit of optimisation to avoid taking the call
    state_lock and disabling softirqs if the call is already in the completed
    state and remove some now redundant rxrpc_notify_socket() calls.
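    The shape of the fix can be sketched in userspace C, assuming a
    much-simplified call with a pthread mutex standing in for the call's
    state_lock and a counter standing in for rxrpc_notify_socket(). The names
    struct call, notify_socket() and set_call_completion() are hypothetical,
    not the kernel's: the notification depends only on whether the state
    actually changed, and an already-completed call is detected without
    taking the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, much-simplified call; not the kernel's struct rxrpc_call. */
enum { CALL_ACTIVE, CALL_COMPLETE };

struct call {
	pthread_mutex_t state_lock;
	int state;		/* CALL_ACTIVE or CALL_COMPLETE */
	int error;
	int notifications;	/* stands in for rxrpc_notify_socket() */
};

static void notify_socket(struct call *call)
{
	call->notifications++;
}

/*
 * Move the call to the completed state. Returns true only if this invocation
 * made the transition; in that case the socket is always notified, even
 * though the caller may also pass the error back up.
 */
static bool set_call_completion(struct call *call, int error)
{
	bool changed = false;

	/* Fast path: already complete, so skip the lock entirely. */
	if (__atomic_load_n(&call->state, __ATOMIC_ACQUIRE) == CALL_COMPLETE)
		return false;

	pthread_mutex_lock(&call->state_lock);
	if (call->state != CALL_COMPLETE) {
		call->error = error;
		__atomic_store_n(&call->state, CALL_COMPLETE, __ATOMIC_RELEASE);
		changed = true;
	}
	pthread_mutex_unlock(&call->state_lock);

	if (changed)
		notify_socket(call);
	return changed;
}
```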

    Fixes: f5c17aaeb2ae ("rxrpc: Calls should only have one terminal state")
    Reported-by: Gerry Seidman
    Signed-off-by: David Howells
    Reviewed-by: Marc Dionne

    David Howells
     
  • Move the handling of call completion out of line so that the next patch can
    add more code in that area.

    Signed-off-by: David Howells
    Reviewed-by: Marc Dionne

    David Howells
     

01 Nov, 2019

1 commit

  • When rxrpc_recvmsg_data() sets the return value to 1 because it's drained
    all the data for the last packet, it checks the last-packet flag on the
    whole packet - but this is wrong, since the last-packet flag is only set on
    the final subpacket of the last jumbo packet. This means that a call that
    receives its last packet in a jumbo packet won't complete properly.

    Fix this by having rxrpc_locate_data() determine the last-packet state of
    the subpacket it's looking at and passing that back to the caller rather
    than having the caller look in the packet header. The caller then needs to
    cache this in the rxrpc_call struct as rxrpc_locate_data() isn't then
    called again for this packet.
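    A minimal sketch of the per-subpacket decision, assuming a hypothetical
    summary struct (rx_pkt_info) rather than the real skbuff private data: the
    last-packet state is derived from the subpacket index and cached in a
    hypothetical per-call field, so it remains available after the locating
    step has run once for that packet.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of a received DATA packet (jumbo or not). */
struct rx_pkt_info {
	unsigned int nr_subpackets;	/* 1 for a non-jumbo packet */
	bool last_flagged;		/* LAST_PACKET set on the final subpacket */
};

/* Hypothetical per-call cache of the result. */
struct call_rx {
	bool last_seen;
};

/*
 * Only the final subpacket can carry LAST_PACKET, so the answer depends on
 * the subpacket index, not just on a flag read from the packet as a whole.
 */
static bool subpacket_is_last(const struct rx_pkt_info *info, unsigned int ix)
{
	return info->last_flagged && ix == info->nr_subpackets - 1;
}

/*
 * Locate subpacket @ix and cache its last-packet state in the call, since
 * this step is not repeated for the same packet.
 */
static void locate_subpacket(struct call_rx *call,
			     const struct rx_pkt_info *info, unsigned int ix)
{
	call->last_seen = subpacket_is_last(info, ix);
}
```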

    Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
    Fixes: e2de6c404898 ("rxrpc: Use info in skbuff instead of reparsing a jumbo packet")
    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     

07 Oct, 2019

1 commit

  • Fix the cleanup of the crypto state on a call after the call has been
    disconnected. As the call has been disconnected, its connection ref has
    been discarded and so we can't go through that to get to the security ops
    table.

    Fix this by caching the security ops pointer in the rxrpc_call struct and
    using that when freeing the call security state. Also use this in other
    places we're dealing with call-specific security.

    The symptoms look like:

    BUG: KASAN: use-after-free in rxrpc_release_call+0xb2d/0xb60 net/rxrpc/call_object.c:481
    Read of size 8 at addr ffff888062ffeb50 by task syz-executor.5/4764

    Fixes: 1db88c534371 ("rxrpc: Fix -Wframe-larger-than= warnings from on-stack crypto")
    Reported-by: syzbot+eed305768ece6682bb7f@syzkaller.appspotmail.com
    Signed-off-by: David Howells

    David Howells
     

27 Aug, 2019

2 commits

  • Use the previously-added transmit-phase skbuff private flag to simplify the
    socket buffer tracing a bit. Which phase the skbuff comes from can now be
    divined from the skb rather than having to be guessed from the call state.

    We can also reduce the number of rxrpc_skb_trace values by eliminating the
    difference between Tx and Rx in the symbols.

    Signed-off-by: David Howells

    David Howells
     
  • Use the information now cached in the skbuff private data to avoid the need
    to reparse a jumbo packet. We can find all the subpackets by dead
    reckoning, so it's only necessary to note how many there are, whether the
    last one is flagged as LAST_PACKET and whether any have the REQUEST_ACK
    flag set.

    This is necessary as once recvmsg() can see the packet, it can start
    modifying it, such as doing in-place decryption.
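    Dead reckoning works because every subpacket except the last has a fixed
    size. The sketch below assumes 1412 data bytes per non-final subpacket,
    each followed by a 4-byte jumbo header (check these values against
    net/rxrpc/protocol.h before relying on them); the helper names are
    hypothetical. Offsets are measured from the start of the first
    subpacket's data.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes; verify against the kernel's rxrpc protocol header. */
#define JUMBO_DATALEN	1412u	/* data bytes in each non-final subpacket */
#define JUMBO_HDRLEN	4u	/* jumbo header that follows each of them */
#define JUMBO_SUBPKTLEN	(JUMBO_DATALEN + JUMBO_HDRLEN)

/* Offset of subpacket @ix's data, computed rather than parsed. */
static size_t subpacket_offset(unsigned int ix)
{
	return (size_t)ix * JUMBO_SUBPKTLEN;
}

/*
 * Length of subpacket @ix's data: all but the last are exactly
 * JUMBO_DATALEN; the last takes whatever payload remains.
 */
static size_t subpacket_len(size_t payload_len, unsigned int nr_subpackets,
			    unsigned int ix)
{
	if (ix < nr_subpackets - 1)
		return JUMBO_DATALEN;
	return payload_len - subpacket_offset(ix);
}
```

    Only the subpacket count and the two flags then need to be recorded, and
    nothing has to be read back from a packet that recvmsg() may already be
    decrypting in place.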

    Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
    Signed-off-by: David Howells

    David Howells
     

09 Aug, 2019

1 commit


31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

07 Feb, 2019

1 commit

  • When either the "goto wait_interrupted;" or the "goto wait_error;" path
    is taken, the socket lock has already been released.

    This patch fixes the following syzbot splat:

    WARNING: bad unlock balance detected!
    5.0.0-rc4+ #59 Not tainted
    -------------------------------------
    syz-executor223/8256 is trying to release lock (sk_lock-AF_RXRPC) at:
    rxrpc_recvmsg+0x6d3/0x3099 net/rxrpc/recvmsg.c:598
    but there are no more locks to release!

    other info that might help us debug this:
    1 lock held by syz-executor223/8256:
    #0: 00000000fa9ed0f4 (slock-AF_RXRPC){+...}, at: spin_lock_bh include/linux/spinlock.h:334 [inline]
    #0: 00000000fa9ed0f4 (slock-AF_RXRPC){+...}, at: release_sock+0x20/0x1c0 net/core/sock.c:2798

    stack backtrace:
    CPU: 1 PID: 8256 Comm: syz-executor223 Not tainted 5.0.0-rc4+ #59
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x172/0x1f0 lib/dump_stack.c:113
    print_unlock_imbalance_bug kernel/locking/lockdep.c:3391 [inline]
    print_unlock_imbalance_bug.cold+0x114/0x123 kernel/locking/lockdep.c:3368
    __lock_release kernel/locking/lockdep.c:3601 [inline]
    lock_release+0x67e/0xa00 kernel/locking/lockdep.c:3860
    sock_release_ownership include/net/sock.h:1471 [inline]
    release_sock+0x183/0x1c0 net/core/sock.c:2808
    rxrpc_recvmsg+0x6d3/0x3099 net/rxrpc/recvmsg.c:598
    sock_recvmsg_nosec net/socket.c:794 [inline]
    sock_recvmsg net/socket.c:801 [inline]
    sock_recvmsg+0xd0/0x110 net/socket.c:797
    __sys_recvfrom+0x1ff/0x350 net/socket.c:1845
    __do_sys_recvfrom net/socket.c:1863 [inline]
    __se_sys_recvfrom net/socket.c:1859 [inline]
    __x64_sys_recvfrom+0xe1/0x1a0 net/socket.c:1859
    do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x446379
    Code: e8 2c b3 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 2b 09 fc ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007fe5da89fd98 EFLAGS: 00000246 ORIG_RAX: 000000000000002d
    RAX: ffffffffffffffda RBX: 00000000006dbc28 RCX: 0000000000446379
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
    RBP: 00000000006dbc20 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dbc2c
    R13: 0000000000000000 R14: 0000000000000000 R15: 20c49ba5e353f7cf

    Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
    Signed-off-by: Eric Dumazet
    Cc: David Howells
    Reported-by: syzbot
    Signed-off-by: David S. Miller

    Eric Dumazet
     

04 Oct, 2018

1 commit


04 Aug, 2018

1 commit


01 Aug, 2018

2 commits

  • Immediately flush any outstanding ACK on entry to rxrpc_recvmsg_data() -
    which transfers data to the target buffers - if we previously had an Rx
    underrun (ie. we returned -EAGAIN because we ran out of received data).
    This lets the server know what we've managed to receive.

    Also flush any outstanding ACK after calling the function if it hit -EAGAIN
    to let the server know we processed some data.

    It might be better to send more ACKs, possibly on a time-based scheme, but
    that needs some more consideration.

    With this and some additional AFS patches, it is possible to get large
    unencrypted O_DIRECT reads to be almost as fast as NFS over TCP. It looks
    like it might be theoretically possible to improve performance yet more for
    a server running a single operation as investigation of packet timestamps
    indicates that the server keeps stalling.

    The issue appears to be that rxrpc runs into trouble with ACK packets
    getting batched together (up to ~32 at a time) somewhere between the IP
    transmit queue on the client and the ethernet receive queue on the server.

    However, this case isn't too much of a worry as even a lightly loaded
    server should be receiving sufficient packet flux to flush the ACK packets
    to the UDP socket.

    Signed-off-by: David Howells

    David Howells
     
  • The final ACK that closes out an rxrpc call needs to be transmitted by the
    client unless we're going to follow up with a DATA packet for a new call on
    the same channel (which implicitly ACKs the previous call, thereby saving
    an ACK).

    Currently, we don't do that, so if no follow-on call is immediately
    forthcoming, the server will resend the last DATA packet, at which point
    rxrpc_conn_retransmit_call() will be triggered and will (re)send the final
    ACK. But the server has to hold on to the last packet until the ACK is
    received, thereby tying up its resources.

    Fix the client side to propose a delayed final ACK, to be transmitted after
    a short delay, assuming the call isn't superseded by a new one.

    Signed-off-by: David Howells

    David Howells
     

16 Mar, 2018

1 commit

  • The variable 'len' is being initialized with a value that is never
    read and it is re-assigned later, hence the initialization is redundant
    and can be removed.

    Cleans up clang warning:
    net/rxrpc/recvmsg.c:275:15: warning: Value stored to 'len' during its
    initialization is never read

    Signed-off-by: Colin Ian King
    Signed-off-by: David S. Miller

    Colin Ian King
     

17 Feb, 2018

1 commit

  • Due to a check recently added to copy_to_user(), it's now not permitted to
    copy from slab-held data to userspace unless the slab is whitelisted. This
    affects rxrpc_recvmsg() when it attempts to place an RXRPC_USER_CALL_ID
    control message in the userspace control message buffer. A warning is
    generated by usercopy_warn() because the source is the copy of the
    user_call_ID retained in the rxrpc_call struct.

    Work around the issue by copying the user_call_ID to a variable on the
    stack and passing that to put_cmsg().

    The warning generated looks like:

    Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from SLUB object 'dmaengine-unmap-128' (offset 680, size 8)!
    WARNING: CPU: 0 PID: 1401 at mm/usercopy.c:81 usercopy_warn+0x7e/0xa0
    ...
    RIP: 0010:usercopy_warn+0x7e/0xa0
    ...
    Call Trace:
    __check_object_size+0x9c/0x1a0
    put_cmsg+0x98/0x120
    rxrpc_recvmsg+0x6fc/0x1010 [rxrpc]
    ? finish_wait+0x80/0x80
    ___sys_recvmsg+0xf8/0x240
    ? __clear_rsb+0x25/0x3d
    ? __clear_rsb+0x15/0x3d
    ? __clear_rsb+0x25/0x3d
    ? __clear_rsb+0x15/0x3d
    ? __clear_rsb+0x25/0x3d
    ? __clear_rsb+0x15/0x3d
    ? __clear_rsb+0x25/0x3d
    ? __clear_rsb+0x15/0x3d
    ? finish_task_switch+0xa6/0x2b0
    ? trace_hardirqs_on_caller+0xed/0x180
    ? _raw_spin_unlock_irq+0x29/0x40
    ? __sys_recvmsg+0x4e/0x90
    __sys_recvmsg+0x4e/0x90
    do_syscall_64+0x7a/0x220
    entry_SYSCALL_64_after_hwframe+0x26/0x9b

    Reported-by: Jonathan Billings
    Signed-off-by: David Howells
    Acked-by: Kees Cook
    Tested-by: Jonathan Billings
    Signed-off-by: David S. Miller

    David Howells
     

24 Nov, 2017

4 commits

  • Add an extra timeout that is set/updated when we send a DATA packet that
    has the request-ack flag set. This allows us to detect if we don't get an
    ACK in response to the latest flagged packet.

    The ACK packet is adjudged to have been lost if it doesn't turn up within
    2*RTT of the transmission.

    If the timeout occurs, we schedule the sending of a PING ACK to find out
    the state of the other side. If a new DATA packet is ready to go sooner,
    we cancel the sending of the ping and set the request-ack flag on that
    instead.

    If we get back a PING-RESPONSE ACK that indicates a lower tx_top than what
    we had at the time of the ping transmission, we adjudge all the DATA
    packets sent between the response tx_top and the ping-time tx_top to have
    been lost and retransmit immediately.

    Rather than sending a PING ACK, we could just pick a DATA packet and
    speculatively retransmit that with request-ack set. It should result in
    either a REQUESTED ACK or a DUPLICATE ACK which we can then use in lieu of
    the PING-RESPONSE ACK mentioned above.
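    The timer arithmetic and the loss rule above can be sketched as follows.
    The struct and helpers are hypothetical, times are assumed to be in
    microseconds, and the lost range is taken to be the sequence numbers after
    the response's tx_top up to and including the ping-time tx_top, which is
    one reading of "between" in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-call Tx state; times in microseconds. */
struct tx_state {
	uint64_t ack_lost_at;	/* deadline for the ACK to a request-ack DATA packet */
	uint32_t ping_tx_top;	/* tx_top recorded when the PING ACK was sent */
};

/* Arm the "ACK lost" deadline when a request-ack DATA packet goes out. */
static void arm_ack_lost_timer(struct tx_state *tx, uint64_t now, uint64_t rtt)
{
	tx->ack_lost_at = now + 2 * rtt;
}

/* Has the deadline passed without the ACK turning up? */
static int ack_is_lost(const struct tx_state *tx, uint64_t now)
{
	return now >= tx->ack_lost_at;
}

/*
 * Given a PING-RESPONSE carrying @resp_tx_top, return how many packets in
 * (resp_tx_top, ping_tx_top] are judged lost and need immediate
 * retransmission. The signed difference tolerates sequence wraparound.
 */
static uint32_t lost_on_ping_response(const struct tx_state *tx,
				      uint32_t resp_tx_top)
{
	int32_t gap = (int32_t)(tx->ping_tx_top - resp_tx_top);

	return gap > 0 ? (uint32_t)gap : 0;
}
```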

    Signed-off-by: David Howells

    David Howells
     
  • Don't transmit a DELAY ACK immediately on proposal when the Rx window is
    rotated, but rather defer it to the work function. This means that we have
    a chance to queue/consume more received packets before we actually send the
    DELAY ACK, or even cancel it entirely, thereby reducing the number of
    packets transmitted.

    We do, however, want to continue sending other types of packet immediately,
    particularly REQUESTED ACKs, as they may be used for RTT calculation by the
    other side.

    Signed-off-by: David Howells

    David Howells
     
  • Fix the rxrpc call expiration timeouts and make them settable from
    userspace. By analogy with other rx implementations, there should be three
    timeouts:

    (1) "Normal timeout"

    This is set for all calls and is triggered if we haven't received any
    packets from the peer in a while. It is measured from the last time
    we received any packet on that call. This is not reset by any
    connection packets (such as CHALLENGE/RESPONSE packets).

    If a service operation takes a long time, the server should generate
    PING ACKs at a duration that's substantially less than the normal
    timeout so as to keep both sides alive. This is set at 1/6 of the normal
    timeout.

    (2) "Idle timeout"

    This is set only for a service call and is triggered if we stop
    receiving the DATA packets that comprise the request data. It is
    measured from the last time we received a DATA packet.

    (3) "Hard timeout"

    This can be set for a call and specifies the maximum lifetime of that
    call. It should not be specified by default, as some operations (such
    as volume transfer) take a long time.

    Allow userspace to set/change the timeouts on a call with sendmsg, using a
    control message:

    RXRPC_SET_CALL_TIMEOUTS

    The data to the message is a number of 32-bit words, not all of which need
    be given:

    u32 hard_timeout; /* sec from first packet */
    u32 idle_timeout; /* msec from packet Rx */
    u32 normal_timeout; /* msec from data Rx */

    This can be set in combination with any other sendmsg() that affects a
    call.
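    From userspace, building such a control message might look like the
    sketch below. It uses only the standard CMSG_* macros; the control level
    and type are passed in by the caller rather than hard-coded, since in real
    code they would be SOL_RXRPC and the timeout message type from
    <linux/rxrpc.h> (check the exact name and value there). The helper is
    hypothetical and fills the buffer with just this one message; a real
    sendmsg() would normally carry the RXRPC_USER_CALL_ID message as well.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Fill @msg's control buffer (already pointed to by msg->msg_control and
 * @buf_size bytes long) with one message carrying 1-3 u32 timeouts, in the
 * order hard, idle, normal; trailing ones may be omitted.
 * Returns 0 on success, -1 if @nwords is out of range or the buffer is small.
 */
static int add_timeout_cmsg(struct msghdr *msg, size_t buf_size, int level,
			    int type, const uint32_t *timeouts, size_t nwords)
{
	struct cmsghdr *cmsg;
	size_t len = nwords * sizeof(uint32_t);

	if (nwords < 1 || nwords > 3 || CMSG_SPACE(len) > buf_size)
		return -1;

	msg->msg_controllen = CMSG_SPACE(len);
	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = level;
	cmsg->cmsg_type = type;
	cmsg->cmsg_len = CMSG_LEN(len);
	memcpy(CMSG_DATA(cmsg), timeouts, len);
	return 0;
}
```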

    Signed-off-by: David Howells

    David Howells
     
  • Delay terminal ACK transmission on a client call by deferring it to the
    connection processor. This allows it to be skipped if we can send the next
    call instead, the first DATA packet of which will implicitly ack this call.

    Signed-off-by: David Howells

    David Howells
     

02 Nov, 2017

1 commit

  • Place a spinlock around the invocation of call->notify_rx() for a kernel
    service call, and take the lock again when ending the call so that the
    notification pointer can be replaced with a pointer to a dummy function.

    This is required because it's possible for rxrpc_notify_socket() to be
    called after the call has been ended by the kernel service if called from
    the asynchronous work function rxrpc_process_call().

    However, rxrpc_notify_socket() currently only holds the RCU read lock when
    invoking ->notify_rx(), which means that the afs_call struct would need to
    be disposed of by call_rcu() rather than by kfree().

    But we shouldn't see any notifications from a call after calling
    rxrpc_kernel_end_call(), so a lock is required in rxrpc code.

    Without this, we may see the call wait queue as having a corrupt spinlock:

    BUG: spinlock bad magic on CPU#0, kworker/0:2/1612
    general protection fault: 0000 [#1] SMP
    ...
    Workqueue: krxrpcd rxrpc_process_call
    task: ffff88040b83c400 task.stack: ffff88040adfc000
    RIP: 0010:spin_bug+0x161/0x18f
    RSP: 0018:ffff88040adffcc0 EFLAGS: 00010002
    RAX: 0000000000000032 RBX: 6b6b6b6b6b6b6b6b RCX: ffffffff81ab16cf
    RDX: ffff88041fa14c01 RSI: ffff88041fa0ccb8 RDI: ffff88041fa0ccb8
    RBP: ffff88040adffcd8 R08: 00000000ffffffff R09: 00000000ffffffff
    R10: ffff88040adffc60 R11: 000000000000022c R12: ffff88040aca2208
    R13: ffffffff81a58114 R14: 0000000000000000 R15: 0000000000000000
    ....
    Call Trace:
    do_raw_spin_lock+0x1d/0x89
    _raw_spin_lock_irqsave+0x3d/0x49
    ? __wake_up_common_lock+0x4c/0xa7
    __wake_up_common_lock+0x4c/0xa7
    ? __lock_is_held+0x47/0x7a
    __wake_up+0xe/0x10
    afs_wake_up_call_waiter+0x11b/0x122 [kafs]
    rxrpc_notify_socket+0x12b/0x258
    rxrpc_process_call+0x18e/0x7d0
    process_one_work+0x298/0x4de
    ? rescuer_thread+0x280/0x280
    worker_thread+0x1d1/0x2ae
    ? rescuer_thread+0x280/0x280
    kthread+0x12c/0x134
    ? kthread_create_on_node+0x3a/0x3a
    ret_from_fork+0x27/0x40

    In this case, note the corrupt data in EBX. The address of the offending
    afs_call is in R12, plus the offset to the spinlock.
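    The locking pattern can be sketched in userspace C, with a pthread mutex
    standing in for the kernel spinlock. struct kcall, kcall_notify() and
    kcall_end() are hypothetical names: notification always happens under the
    lock, and ending the call swaps in a do-nothing notifier under the same
    lock, so once kcall_end() returns the service's own notifier can no
    longer be entered and its state can be freed directly.

```c
#include <assert.h>
#include <pthread.h>

struct kcall;
typedef void (*notify_rx_t)(struct kcall *call);

/* Hypothetical kernel-service call; not the real struct rxrpc_call. */
struct kcall {
	pthread_mutex_t notify_lock;
	notify_rx_t notify_rx;
	int woken;	/* state the service's notifier touches */
};

static void service_notify(struct kcall *call) { call->woken++; }
static void dummy_notify(struct kcall *call) { (void)call; }

/* Deliver a notification; the lock excludes a concurrent kcall_end(). */
static void kcall_notify(struct kcall *call)
{
	pthread_mutex_lock(&call->notify_lock);
	call->notify_rx(call);
	pthread_mutex_unlock(&call->notify_lock);
}

/* End the call: later notifications hit the dummy, not the service. */
static void kcall_end(struct kcall *call)
{
	pthread_mutex_lock(&call->notify_lock);
	call->notify_rx = dummy_notify;
	pthread_mutex_unlock(&call->notify_lock);
}
```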

    Signed-off-by: David Howells

    David Howells
     

18 Oct, 2017

1 commit

  • Provide support for a kernel service to make use of the service upgrade
    facility. This involves:

    (1) Pass an upgrade request flag to rxrpc_kernel_begin_call().

    (2) Make rxrpc_kernel_recv_data() return the call's current service ID so
    that the caller can detect service upgrade and see what the service
    was upgraded to.

    Signed-off-by: David Howells

    David Howells
     

05 Jun, 2017

1 commit

  • Keep the rxrpc_connection struct's idea of the service ID that is exposed
    in the protocol separate from the service ID that's used as a lookup key.

    This allows the protocol service ID on a client connection to get upgraded
    without making the connection unfindable for other client calls that also
    would like to use the upgraded connection.

    The connection's actual service ID is then returned through recvmsg() by
    way of msg_name.

    Whilst we're at it, we get rid of the last_service_id field from each
    channel. The service ID is per-connection, not per-call and an entire
    connection is upgraded in one go.

    Signed-off-by: David Howells

    David Howells
     

06 Apr, 2017

2 commits

  • Add a tracepoint (rxrpc_rx_proto) to record protocol errors in received
    packets. The following changes are made:

    (1) Add a function, __rxrpc_abort_eproto(), to note a protocol error on a
    call and mark the call aborted. This is wrapped by
    rxrpc_abort_eproto(), which makes the "why" string usable in the trace.

    (2) Add trace_rxrpc_rx_proto() or rxrpc_abort_eproto() to protocol error
    generation points, replacing rxrpc_abort_call() with the latter.

    (3) Only send an abort packet in rxkad_verify_packet*() if we actually
    managed to abort the call.

    Note that a trace event is also emitted if a kernel user (e.g. afs) tries
    to send data through a call when it's not in the transmission phase, though
    it's not technically a receive event.

    Signed-off-by: David Howells

    David Howells
     
  • Use negative error codes in struct rxrpc_call::error because that's what
    the kernel normally deals with and to make the code consistent. We only
    turn them positive when transcribing into a cmsg for userspace recvmsg.

    Signed-off-by: David Howells

    David Howells
     

08 Mar, 2017

1 commit


05 Mar, 2017

1 commit

  • Pull networking fixes from David Miller:

    1) Fix double-free in batman-adv, from Sven Eckelmann.

    2) Fix packet stats for fast-RX path, from Johannes Berg.

    3) Netfilter's ip_route_me_harder() doesn't handle request sockets
    properly, fix from Florian Westphal.

    4) Fix sendmsg deadlock in rxrpc, from David Howells.

    5) Add missing RCU locking to transport hashtable scan, from Xin Long.

    6) Fix potential packet loss in mlxsw driver, from Ido Schimmel.

    7) Fix race in NAPI handling between poll handlers and busy polling,
    from Eric Dumazet.

    8) TX path in vxlan and geneve need proper RCU locking, from Jakub
    Kicinski.

    9) SYN processing in DCCP and TCP need to disable BH, from Eric
    Dumazet.

    10) Properly handle net_enable_timestamp() being invoked from IRQ
    context, also from Eric Dumazet.

    11) Fix crash on device-tree systems in xgene driver, from Alban Bedel.

    12) Do not call sk_free() on a locked socket, from Arnaldo Carvalho de
    Melo.

    13) Fix use-after-free in netvsc driver, from Dexuan Cui.

    14) Fix max MTU setting in bonding driver, from WANG Cong.

    15) xen-netback hash table can be allocated from softirq context, so use
    GFP_ATOMIC. From Anoob Soman.

    16) Fix MAC address change bug in bgmac driver, from Hari Vyas.

    17) strparser needs to destroy strp_wq on module exit, from WANG Cong.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (69 commits)
    strparser: destroy workqueue on module exit
    sfc: fix IPID endianness in TSOv2
    sfc: avoid max() in array size
    rds: remove unnecessary returned value check
    rxrpc: Fix potential NULL-pointer exception
    nfp: correct DMA direction in XDP DMA sync
    nfp: don't tell FW about the reserved buffer space
    net: ethernet: bgmac: mac address change bug
    net: ethernet: bgmac: init sequence bug
    xen-netback: don't vfree() queues under spinlock
    xen-netback: keep a local pointer for vif in backend_disconnect()
    netfilter: nf_tables: don't call nfnetlink_set_err() if nfnetlink_send() fails
    netfilter: nft_set_rbtree: incorrect assumption on lower interval lookups
    netfilter: nf_conntrack_sip: fix wrong memory initialisation
    can: flexcan: fix typo in comment
    can: usb_8dev: Fix memory leak of priv->cmd_msg_buffer
    can: gs_usb: fix coding style
    can: gs_usb: Don't use stack memory for USB transfers
    ixgbe: Limit use of 2K buffers on architectures with 256B or larger cache lines
    ixgbe: update the rss key on h/w, when ethtool ask for it
    ...

    Linus Torvalds
     

02 Mar, 2017

2 commits

  • …hed.h> into <linux/sched/signal.h>

    Fix up affected files that include this signal functionality via sched.h.

    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     
  • All the routines by which rxrpc is accessed from the outside are serialised
    by means of the socket lock (sendmsg, recvmsg, bind,
    rxrpc_kernel_begin_call(), ...) and this presents a problem:

    (1) If a number of calls on the same socket are in the process of
    connecting to the same peer, a maximum of four concurrent live calls
    are permitted before further calls need to wait for a slot.

    (2) If a call is waiting for a slot, it is deep inside sendmsg() or
    rxrpc_kernel_begin_call() and the entry function is holding the socket
    lock.

    (3) sendmsg() and recvmsg() or the in-kernel equivalents are prevented
    from servicing the other calls as they need to take the socket lock to
    do so.

    (4) The socket is stuck until a call is aborted and makes its slot
    available to the waiter.

    Fix this by:

    (1) Provide each call with a mutex ('user_mutex') that arbitrates access
    by the users of rxrpc separately for each specific call.

    (2) Make rxrpc_sendmsg() and rxrpc_recvmsg() unlock the socket as soon as
    they've got a call and taken its mutex.

    Note that I'm returning EWOULDBLOCK from recvmsg() if MSG_DONTWAIT is
    set but someone else has the lock. Should I instead only return
    EWOULDBLOCK if there's nothing currently to be done on a socket, and
    sleep in this particular instance because there is something to be
    done, but we appear to be blocked by the interrupt handler doing its
    ping?

    (3) Make rxrpc_new_client_call() unlock the socket after allocating a new
    call, locking its user mutex and adding it to the socket's call tree.
    The call is returned locked so that sendmsg() can add data to it
    immediately.

    From the moment the call is in the socket tree, it is subject to
    access by sendmsg() and recvmsg() - even if it isn't connected yet.

    (4) Lock new service calls in the UDP data_ready handler (in
    rxrpc_new_incoming_call()) because they may already be in the socket's
    tree and the data_ready handler makes them live immediately if a user
    ID has already been preassigned.

    Note that the new call is locked before any notifications are sent
    that it is live, so doing mutex_trylock() *ought* to always succeed.
    Userspace is prevented from doing sendmsg() on calls that are in a
    too-early state in rxrpc_do_sendmsg().

    (5) Make rxrpc_new_incoming_call() return the call with the user mutex
    held so that a ping can be scheduled immediately under it.

    Note that it might be worth moving the ping call into
    rxrpc_new_incoming_call() and then we can drop the mutex there.

    (6) Make rxrpc_accept_call() take the lock on the call it is accepting and
    release the socket lock after adding the call to the socket's tree. This
    is slightly tricky as we've dequeued the call by that point and have
    to requeue it.

    Note that requeuing emits a trace event.

    (7) Make rxrpc_kernel_send_data() and rxrpc_kernel_recv_data() take the
    new mutex immediately and don't bother with the socket mutex at all.
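
    The locking scheme in items (1) and (2) can be sketched in userspace C, assuming POSIX threads stand in for the kernel's socket lock and per-call user_mutex. The types demo_sock and demo_call and the helpers demo_get_call() and friends are hypothetical, not rxrpc code. In this sketch the socket lock covers only the lookup in the call list and is dropped once the call's mutex is held. A MSG_DONTWAIT-style flag maps a contended call to -EWOULDBLOCK, as described in the note under item (2).

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for an rxrpc call and socket (not kernel code). */
struct demo_call {
	unsigned long user_id;
	pthread_mutex_t user_mutex;   /* arbitrates users of this call only */
	struct demo_call *next;
};

struct demo_sock {
	pthread_mutex_t lock;         /* guards the call list, not call I/O */
	struct demo_call *calls;
};

static void demo_call_init(struct demo_call *call, unsigned long user_id)
{
	call->user_id = user_id;
	pthread_mutex_init(&call->user_mutex, NULL);
	call->next = NULL;
}

static void demo_sock_init(struct demo_sock *sk)
{
	pthread_mutex_init(&sk->lock, NULL);
	sk->calls = NULL;
}

static void demo_sock_add_call(struct demo_sock *sk, struct demo_call *call)
{
	pthread_mutex_lock(&sk->lock);
	call->next = sk->calls;
	sk->calls = call;
	pthread_mutex_unlock(&sk->lock);
}

/* Look up a call and hand it back in *out with its user_mutex held.
 * The socket lock is released before returning, so other threads can
 * then service other calls on the same socket.  Returns 0, -ENOENT if
 * there is no such call, or -EWOULDBLOCK if dontwait is set and another
 * user already holds the call's mutex. */
static int demo_get_call(struct demo_sock *sk, unsigned long user_id,
			 int dontwait, struct demo_call **out)
{
	struct demo_call *call;
	int ret = 0;

	assert(out != NULL);
	pthread_mutex_lock(&sk->lock);
	for (call = sk->calls; call; call = call->next)
		if (call->user_id == user_id)
			break;

	if (!call) {
		ret = -ENOENT;
	} else if (pthread_mutex_trylock(&call->user_mutex) != 0) {
		if (dontwait)
			ret = -EWOULDBLOCK;
		else
			pthread_mutex_lock(&call->user_mutex);
	}
	pthread_mutex_unlock(&sk->lock);

	if (ret == 0)
		*out = call;
	return ret;
}
```

    The caller unlocks call->user_mutex when it has finished with the call. Holding one call's mutex does not stop another thread from looking up and locking a different call on the same socket.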

    This patch has the nice bonus that calls on the same socket are now to some
    extent parallelisable.

    Note that we might want to move rxrpc_service_prealloc() calls out from the
    socket lock and give it its own lock, so that we don't hang progress in
    other calls because we're waiting for the allocator.

    We probably also want to avoid calling rxrpc_notify_socket() from within
    the socket lock (rxrpc_accept_call()).

    Signed-off-by: David Howells
    Tested-by: Marc Dionne
    Signed-off-by: David S. Miller

    David Howells
     

27 Feb, 2017

1 commit

  • Calls made through the in-kernel interface can end up getting stuck because
    of a missed variable update in a loop in rxrpc_recvmsg_data(). The problem
    is like this:

    (1) A new packet comes in and doesn't cause a notification to be given to
    the client as there's still another packet in the ring - the
    assumption being that the client will keep drawing off data until
    the ring is empty.

    (2) The client is in rxrpc_recvmsg_data(), inside the big while loop that
    iterates through the packets. This copies the window pointers into
    variables rather than using the information in the call struct
    because:

    (a) MSG_PEEK might be in effect;

    (b) we need a barrier after reading call->rx_top to pair with the
    barrier in the softirq routine that loads the buffer.

    (3) The reading of call->rx_top is done outside of the loop, and top is
    never updated whilst we're in the loop. This means that even though
    there's a new packet available, we don't see it and may return -EFAULT
    to the caller - who will happily return to the scheduler and await the
    next notification.

    (4) No further notifications are forthcoming until there's an abort as the
    ring isn't empty.

    The fix is to move the read of call->rx_top inside the loop - but it needs
    to be done before the condition is checked.
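
    The shape of the fix can be sketched in userspace C11. The sketch assumes a single producer, a single reader, and a producer that never overruns the ring; the real window checks are omitted. rx_ring, ring_push(), ring_drain() and arrive_once() are hypothetical names. The release store of top and the acquire load in the reader play the role of the paired barriers in (2)(b). The acquire load is repeated at the top of every iteration, before the loop condition is tested.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 8 /* hypothetical ring size */

/* Hypothetical receive ring: one producer (softirq analogue) and one
 * reader (recvmsg analogue). */
struct rx_ring {
	int buf[RING_SIZE];
	_Atomic unsigned int top;   /* last sequence published by the producer */
	unsigned int hard_ack;      /* last sequence consumed by the reader */
};

static void ring_init(struct rx_ring *r)
{
	atomic_init(&r->top, 0);
	r->hard_ack = 0;
}

/* Producer: fill the slot, then publish it with a release store. */
static void ring_push(struct rx_ring *r, int val)
{
	unsigned int seq = atomic_load_explicit(&r->top, memory_order_relaxed) + 1;

	r->buf[seq % RING_SIZE] = val;
	atomic_store_explicit(&r->top, seq, memory_order_release);
}

/* Reader with the fix applied: top is re-read on every pass, before the
 * loop condition is checked, so a packet published while we are inside
 * the loop is picked up rather than missed.  after_each, if non-NULL,
 * runs after each packet is consumed (used to simulate a late arrival). */
static size_t ring_drain(struct rx_ring *r, int *out, size_t max,
			 void (*after_each)(struct rx_ring *))
{
	size_t n = 0;

	for (;;) {
		/* Acquire pairs with the release store in ring_push(). */
		unsigned int top = atomic_load_explicit(&r->top, memory_order_acquire);

		assert(top - r->hard_ack <= RING_SIZE); /* producer must not overrun */
		if (r->hard_ack == top || n == max)
			break;
		r->hard_ack++;
		out[n++] = r->buf[r->hard_ack % RING_SIZE];
		if (after_each)
			after_each(r);
	}
	return n;
}

/* Demo hook: one packet "arrives" while the reader is inside the loop. */
static void arrive_once(struct rx_ring *r)
{
	static int fired;

	if (!fired) {
		fired = 1;
		ring_push(r, 99);
	}
}
```

    If top were read once before the loop, as it was before the fix, ring_drain() would stop after the two packets it saw on entry. The third packet would stay in the ring with no further notification coming, which is the stall described in (3) and (4).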

    Reported-by: Marc Dionne
    Signed-off-by: David Howells
    Tested-by: Marc Dionne
    Signed-off-by: David S. Miller

    David Howells
     

06 Oct, 2016

4 commits

  • We need to generate a DELAY ACK from the service end of an operation if we
    start doing the actual operation work and it takes longer than expected.
    This will hard-ACK the request data and allow the client to release its
    resources.

    To make this work:

    (1) We have to set the ack timer and propose an ACK when the call moves to
    the RXRPC_CALL_SERVER_ACK_REQUEST state and clear the pending ACK and cancel
    the timer when we start transmitting the reply (the first DATA packet
    of the reply implicitly ACKs the request phase).

    (2) It must be possible to set the timer when the caller is holding
    call->state_lock, so split the lock-getting part of the timer function
    out.

    (3) Add trace notes for the ACK we're requesting and the timer we clear.
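
    Points (1) and (2) can be sketched as follows. The sketch assumes a pthread mutex for call->state_lock and a plain expiry value for the ack timer. The state names, DEMO_ACK_DELAY and all demo_* functions are hypothetical, not rxrpc's. The timer setter is split into a _locked core and a locking wrapper, so code that already holds state_lock during the state change can arm the timer directly.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical call states and timing; not rxrpc's real values. */
enum demo_state {
	DEMO_SERVER_RECV_REQUEST,
	DEMO_SERVER_ACK_REQUEST,
	DEMO_SERVER_SEND_REPLY,
};
#define DEMO_ACK_DELAY 200 /* hypothetical delay, arbitrary time units */

struct demo_call {
	pthread_mutex_t state_lock;
	enum demo_state state;
	bool ack_pending;  /* a DELAY ACK has been proposed */
	long ack_at;       /* ack timer expiry; 0 means not armed */
};

/* Core of the timer setter: caller must already hold state_lock. */
static void demo_set_ack_timer_locked(struct demo_call *call, long now, long delay)
{
	assert(delay > 0);
	call->ack_at = now + delay;
}

/* Wrapper for callers that do not hold state_lock. */
static void demo_set_ack_timer(struct demo_call *call, long now, long delay)
{
	pthread_mutex_lock(&call->state_lock);
	demo_set_ack_timer_locked(call, now, delay);
	pthread_mutex_unlock(&call->state_lock);
}

/* On entering ACK_REQUEST, propose an ACK and arm the timer.  state_lock
 * is already held here, hence the _locked variant. */
static void demo_enter_ack_request(struct demo_call *call, long now)
{
	pthread_mutex_lock(&call->state_lock);
	call->state = DEMO_SERVER_ACK_REQUEST;
	call->ack_pending = true;
	demo_set_ack_timer_locked(call, now, DEMO_ACK_DELAY);
	pthread_mutex_unlock(&call->state_lock);
}

/* The first reply DATA packet implicitly ACKs the request phase, so
 * drop the pending ACK and disarm the timer. */
static void demo_start_reply(struct demo_call *call)
{
	pthread_mutex_lock(&call->state_lock);
	call->state = DEMO_SERVER_SEND_REPLY;
	call->ack_pending = false;
	call->ack_at = 0;
	pthread_mutex_unlock(&call->state_lock);
}
```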

    Signed-off-by: David Howells

    David Howells
     
  • In rxrpc_kernel_recv_data(), when we return the error number incurred by a
    failed call, we must negate it before returning it as it's stored as
    positive (that's what we have to pass back to userspace).
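
    A minimal sketch of the sign convention, assuming a simplified call record. demo_call, demo_user_error() and demo_kernel_recv_data() are hypothetical. The error is stored as a positive errno for the userspace path, and the in-kernel path negates it because kernel functions report failure as a negative errno.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical call record: the error is kept as a positive errno, the
 * form that gets passed back to userspace. */
struct demo_call {
	int completed;
	int error;        /* positive errno, e.g. ECONNABORTED, or 0 */
};

/* Userspace-facing view: report the value exactly as stored. */
static int demo_user_error(const struct demo_call *call)
{
	return call->error;
}

/* In-kernel-style receive: kernel APIs return negative errnos, so the
 * stored value must be negated on the way out. */
static int demo_kernel_recv_data(const struct demo_call *call)
{
	assert(call->error >= 0);
	if (call->completed && call->error)
		return -call->error;
	return 0;
}
```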

    Signed-off-by: David Howells

    David Howells
     
  • Separate the output of PING ACKs from the output of other sorts of ACK so
    that if we receive a PING ACK and schedule transmission of a PING RESPONSE
    ACK, the response doesn't get cancelled by a PING ACK that we happen to
    be scheduling for transmission at the same time.

    If a PING RESPONSE gets lost, the other side might just sit there waiting
    for it and refuse to proceed otherwise.
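
    A minimal sketch of the separation, assuming a simplified state with one slot for ordinary ACKs and a separate flag for an outgoing PING. demo_ack_state, demo_propose_ack() and demo_flush_acks() are hypothetical. The priority ordering used for the shared slot is an assumption for the sketch, not rxrpc's actual table.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ACK reasons; for the shared slot, a larger value is
 * treated as higher priority (an assumption for this sketch). */
enum demo_ack { ACK_NONE, ACK_DELAY, ACK_REQUESTED, ACK_PING_RESPONSE, ACK_PING };

struct demo_ack_state {
	enum demo_ack ack_reason;  /* pending non-PING ACK, if any */
	bool ping_pending;         /* pending PING, tracked separately */
};

/* Queue an ACK.  A PING has its own flag, so proposing one can never
 * overwrite or cancel a pending PING RESPONSE. */
static void demo_propose_ack(struct demo_ack_state *s, enum demo_ack reason)
{
	assert(reason != ACK_NONE);
	if (reason == ACK_PING)
		s->ping_pending = true;
	else if (reason > s->ack_reason)
		s->ack_reason = reason;
}

/* "Transmit" everything pending: returns the number of packets and
 * records their reasons in sent[]. */
static int demo_flush_acks(struct demo_ack_state *s, enum demo_ack sent[2])
{
	int n = 0;

	if (s->ack_reason != ACK_NONE) {
		sent[n++] = s->ack_reason;
		s->ack_reason = ACK_NONE;
	}
	if (s->ping_pending) {
		sent[n++] = ACK_PING;
		s->ping_pending = false;
	}
	return n;
}
```

    With a single shared slot, the later PING would have replaced the PING RESPONSE. In this sketch both go out.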

    Signed-off-by: David Howells

    David Howells
     
  • Split rxrpc_send_data_packet() to separate ACK generation (which is more
    complicated) from ABORT generation. This simplifies the code a bit and
    fixes the following warning:

    In file included from ../net/rxrpc/output.c:20:0:
    net/rxrpc/output.c: In function 'rxrpc_send_call_packet':
    net/rxrpc/ar-internal.h:1187:27: error: 'top' may be used uninitialized in this function [-Werror=maybe-uninitialized]
    net/rxrpc/output.c:103:24: note: 'top' was declared here
    net/rxrpc/output.c:225:25: error: 'hard_ack' may be used uninitialized in this function [-Werror=maybe-uninitialized]

    Reported-by: Arnd Bergmann
    Signed-off-by: David Howells

    David Howells
     

30 Sep, 2016

1 commit


25 Sep, 2016

2 commits