04 Sep, 2020

1 commit

  • Pull networking fixes from David Miller:

    1) Use netif_rx_ni() when necessary in the batman-adv stack, from Jussi
    Kivilinna.

    2) Fix loss of RTT samples in rxrpc, from David Howells.

    3) Memory leak in hns_nic_dev_probe(), from Dinghao Liu.

    4) ravb module cannot be unloaded, fix from Yuusuke Ashizuka.

    5) We disable BH for too long in sctp_get_port_local(), so add a
    cond_resched() here as well, from Xin Long.

    6) Fix memory leak in st95hf_in_send_cmd, from Dinghao Liu.

    7) Out of bound access in bpf_raw_tp_link_fill_link_info(), from
    Yonghong Song.

    8) Missing of_node_put() in mt7530 DSA driver, from Sumera
    Priyadarsini.

    9) Fix crash in bnxt_fw_reset_task(), from Michael Chan.

    10) Fix geneve tunnel checksumming bug in hns3, from Yi Li.

    11) Memory leak in rxkad_verify_response, from Dinghao Liu.

    12) In tipc, don't use smp_processor_id() in preemptible context. From
    Tuong Lien.

    13) Fix signedness issue in mlx4 memory allocation, from Shung-Hsi Yu.

    14) Missing clk_disable_unprepare() in gemini driver, from Dan Carpenter.

    15) Fix ABI mismatch between driver and firmware in nfp, from Louis
    Peens.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (110 commits)
    net/smc: fix sock refcounting in case of termination
    net/smc: reset sndbuf_desc if freed
    net/smc: set rx_off for SMCR explicitly
    net/smc: fix toleration of fake add_link messages
    tg3: Fix soft lockup when tg3_reset_task() fails.
    doc: net: dsa: Fix typo in config code sample
    net: dp83867: Fix WoL SecureOn password
    nfp: flower: fix ABI mismatch between driver and firmware
    tipc: fix shutdown() of connectionless socket
    ipv6: Fix sysctl max for fib_multipath_hash_policy
    drivers/net/wan/hdlc: Change the default of hard_header_len to 0
    net: gemini: Fix another missing clk_disable_unprepare() in probe
    net: bcmgenet: fix mask check in bcmgenet_validate_flow()
    amd-xgbe: Add support for new port mode
    net: usb: dm9601: Add USB ID of Keenetic Plus DSL
    vhost: fix typo in error message
    net: ethernet: mlx4: Fix memory allocation in mlx4_buddy_init()
    pktgen: fix error message with wrong function name
    net: ethernet: ti: am65-cpsw: fix rmii 100Mbit link mode
    cxgb4: fix thermal zone device registration
    ...

    Linus Torvalds
     

24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and their variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
    fall-through markings where applicable.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
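    As a minimal illustration (not taken from the patch), the C sketch below
    defines a local fallthrough macro along the lines of the kernel's and uses
    it in a switch. The macro definition is an assumption made so the example
    builds standalone; in-tree code gets fallthrough from the kernel's compiler
    headers instead. steps_remaining() is a hypothetical example function.

```c
/*
 * Local stand-in for the kernel's fallthrough pseudo-keyword (an
 * assumption for a standalone build). Use the compiler attribute when it
 * is available, otherwise expand to an empty statement.
 */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)  /* fallthrough */
#endif

/* Hypothetical example: count the set-up steps left from a given stage. */
static int steps_remaining(int stage)
{
	int steps = 0;

	switch (stage) {
	case 0:
		steps++;
		fallthrough;	/* deliberate: stage 0 also does stage 1's work */
	case 1:
		steps++;
		fallthrough;	/* deliberate: stage 1 also does stage 2's work */
	case 2:
		steps++;
		break;
	default:
		break;
	}
	return steps;
}
```

    Unlike a comment, the macro is visible to the compiler, so
    -Wimplicit-fallthrough can tell intended fall-through from a forgotten
    break.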

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
     

21 Aug, 2020

1 commit

  • The Rx protocol has a mechanism to help generate RTT samples that works by
    a client transmitting a REQUESTED-type ACK when it receives a DATA packet
    that has the REQUEST_ACK flag set.

    The peer, however, may interpose other ACKs before transmitting the
    REQUESTED-ACK, as can be seen in the following trace excerpt:

    rxrpc_tx_data: c=00000044 DATA d0b5ece8:00000001 00000001 q=00000001 fl=07
    rxrpc_rx_ack: c=00000044 00000001 PNG r=00000000 f=00000002 p=00000000 n=0
    rxrpc_rx_ack: c=00000044 00000002 REQ r=00000001 f=00000002 p=00000001 n=0
    ...

    DATA packet 1 (q=xx) has REQUEST_ACK set (bit 1 of fl=xx). The incoming
    ping (labelled PNG) hard-acks the requesting DATA packet (f=xx exceeds the
    sequence number of the DATA packet), causing it to be discarded from the Tx
    ring. The ACK that was requested (labelled REQ, r=xx references the serial
    of the DATA packet) comes after the ping, but the sk_buff holding the
    timestamp has gone and the RTT sample is lost.

    This is particularly noticeable on RPC calls used to probe the service
    offered by the peer. A lot of peers end up with an unknown RTT because we
    only ever send a single RPC to them. This confuses the server rotation
    algorithm.

    Fix this by caching the information about the outgoing packet in RTT
    calculations in the rxrpc_call struct rather than looking in the Tx ring.

    A four-deep buffer is maintained and both REQUEST_ACK-flagged DATA and
    PING-ACK transmissions are recorded in there. When the appropriate
    response ACK is received, the buffer is checked for a match and, if found,
    an RTT sample is recorded.

    If a received ACK refers to a packet with a later serial number than an
    entry in the cache, that entry is presumed lost and the entry is made
    available to record a new transmission.

    ACK types other than REQUESTED-type and PING-type cause any matching
    sample to be cancelled as they don't necessarily represent a useful
    measurement.

    If there's no space in the buffer on ping/data transmission, the sample
    base is discarded.
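    A minimal userspace sketch of the four-deep cache described above,
    assuming a bitmask of free slots and 32-bit serial numbers compared with
    wraparound. All names (struct rtt_cache, enum ack_kind,
    rtt_cache_on_ack() and so on) are hypothetical; this is not the
    rxrpc_call code itself, and it omits the locking and memory barriers a
    concurrent version would need.

```c
#include <stdbool.h>
#include <stdint.h>

#define RTT_SLOTS 4	/* the "four-deep buffer" */

/* Hypothetical ACK classification: only the first two yield samples. */
enum ack_kind { ACK_KIND_REQUESTED, ACK_KIND_PING_RESPONSE, ACK_KIND_OTHER };

struct rtt_cache {
	uint32_t serial[RTT_SLOTS];	/* serial of the probing packet */
	uint64_t sent_us[RTT_SLOTS];	/* transmit time, microseconds */
	unsigned int free_mask;		/* bit i set => slot i is free */
};

/* Wrap-safe "a is later than b" for 32-bit serial numbers. */
static bool serial_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}

static void rtt_cache_init(struct rtt_cache *c)
{
	c->free_mask = (1u << RTT_SLOTS) - 1;
}

/* Record a REQUEST_ACK DATA or PING transmission; false if no room. */
static bool rtt_cache_record_tx(struct rtt_cache *c, uint32_t serial,
				uint64_t now_us)
{
	for (int i = 0; i < RTT_SLOTS; i++) {
		if (c->free_mask & (1u << i)) {
			c->serial[i] = serial;
			c->sent_us[i] = now_us;
			c->free_mask &= ~(1u << i);
			return true;
		}
	}
	return false;	/* buffer full: this sample base is discarded */
}

/* Apply an ACK referencing acked_serial; true if *rtt_us was set. */
static bool rtt_cache_on_ack(struct rtt_cache *c, uint32_t acked_serial,
			     enum ack_kind kind, uint64_t now_us,
			     uint64_t *rtt_us)
{
	bool sampled = false;

	for (int i = 0; i < RTT_SLOTS; i++) {
		if (c->free_mask & (1u << i))
			continue;
		if (c->serial[i] == acked_serial) {
			c->free_mask |= 1u << i;
			if (kind != ACK_KIND_OTHER) {
				*rtt_us = now_us - c->sent_us[i];
				sampled = true;
			}
			/* Other ACK kinds just cancel the matching sample. */
		} else if (serial_after(acked_serial, c->serial[i])) {
			c->free_mask |= 1u << i;	/* presumed lost */
		}
	}
	return sampled;
}
```

    For example, after recording serials 10 and 11, a REQUESTED ACK for 11
    yields a sample and also frees the slot holding 10, which is presumed
    lost.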

    Fixes: 50235c4b5a2f ("rxrpc: Obtain RTT data by requesting ACKs on DATA packets")
    Signed-off-by: David Howells

    David Howells
     

20 Aug, 2020

1 commit


18 Jun, 2020

1 commit

  • The handling of the receive window size (rwind) from a received ACK packet
    is not correct. The rxrpc_input_ackinfo() function currently checks the
    current Tx window size against the rwind from the ACK to see if it has
    changed, but then limits the rwind size before storing it in the tx_winsize
    member and, if it increased, wakes up the transmitting process. This means
    that if rwind > RXRPC_RXTX_BUFF_SIZE - 1, this path will always be
    followed, because the unclamped value from the ACK never matches the
    clamped value that was stored.

    Fix this by limiting rwind before we compare it to tx_winsize.

    The effect of this can be seen by enabling the rxrpc_rx_rwind_change
    tracepoint.
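    The ordering fix can be sketched as follows. The ring size of 64 and the
    names struct tx_state and apply_rwind() are assumptions for illustration,
    not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed ring size for illustration; the kernel constant may differ. */
#define RXTX_BUFF_SIZE 64u

struct tx_state {
	uint32_t tx_winsize;
};

/*
 * Apply the peer's advertised receive window. Returns true if the
 * transmitter should be woken because the usable window grew.
 */
static bool apply_rwind(struct tx_state *t, uint32_t rwind)
{
	bool wake = false;

	/* Clamp first, so an oversized rwind compares equal next time. */
	if (rwind > RXTX_BUFF_SIZE - 1)
		rwind = RXTX_BUFF_SIZE - 1;

	if (rwind != t->tx_winsize) {
		wake = rwind > t->tx_winsize;
		t->tx_winsize = rwind;
	}
	return wake;
}
```

    With the clamp applied before the comparison, a peer that keeps
    advertising rwind = 255 causes a wakeup only once rather than on every
    ACK.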

    Fixes: 702f2ac87a9a ("rxrpc: Wake up the transmitter if Rx window size increases on the peer")
    Signed-off-by: David Howells

    David Howells
     

05 Jun, 2020

1 commit

  • Under some circumstances, rxrpc will fail to transmit a packet through the
    underlying UDP socket (i.e. UDP sendmsg returns an error). This may result
    in a call getting stuck.

    In the instance being seen, where AFS tries to send a probe to the Volume
    Location server, tracepoints show the UDP Tx failure (in this case returning
    error 99 EADDRNOTAVAIL) and then nothing more:

    afs_make_vl_call: c=0000015d VL.GetCapabilities
    rxrpc_call: c=0000015d NWc u=1 sp=rxrpc_kernel_begin_call+0x106/0x170 [rxrpc] a=00000000dd89ee8a
    rxrpc_call: c=0000015d Gus u=2 sp=rxrpc_new_client_call+0x14f/0x580 [rxrpc] a=00000000e20e4b08
    rxrpc_call: c=0000015d SEE u=2 sp=rxrpc_activate_one_channel+0x7b/0x1c0 [rxrpc] a=00000000e20e4b08
    rxrpc_call: c=0000015d CON u=2 sp=rxrpc_kernel_begin_call+0x106/0x170 [rxrpc] a=00000000e20e4b08
    rxrpc_tx_fail: c=0000015d r=1 ret=-99 CallDataNofrag

    The problem is that if the initial packet fails and the retransmission
    timer hasn't been started, the call is set to completed and an error is
    returned from rxrpc_send_data_packet() to rxrpc_queue_packet(). Though
    rxrpc_instant_resend() is called, this does nothing because the call is
    marked completed.

    So rxrpc_notify_socket() isn't called and the error is passed back up to
    rxrpc_send_data(), rxrpc_kernel_send_data() and thence to afs_make_call()
    and afs_vl_get_capabilities() where it is simply ignored because it is
    assumed that the result of a probe will be collected asynchronously.

    Fileserver probing is similarly affected via afs_fs_get_capabilities().

    Fix this by always issuing a notification in __rxrpc_set_call_completion()
    if it shifts a call to the completed state, even if an error is also
    returned to the caller through the function return value.

    Also put in a little bit of optimisation to avoid taking the call
    state_lock and disabling softirqs if the call is already in the completed
    state and remove some now redundant rxrpc_notify_socket() calls.
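    A single-threaded sketch of the completion rule: notify whenever this
    function performs the transition to the completed state, and return early
    if the call is already complete. The enum, struct call and
    notify_socket() are hypothetical simplifications; the real function also
    takes the call's state_lock and disables softirqs, neither of which is
    modeled here.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified call states. */
enum call_state { CALL_ACTIVE, CALL_COMPLETE };

struct call {
	enum call_state state;
	int error;
	int notifications;	/* counts stand-in socket notifications */
};

/* Stand-in for rxrpc_notify_socket(). */
static void notify_socket(struct call *call)
{
	call->notifications++;
}

/* Move the call to completed; returns true if this call made the change. */
static bool set_call_completion(struct call *call, int error)
{
	/* Cheap early-out; a concurrent version must recheck under a lock. */
	if (call->state == CALL_COMPLETE)
		return false;

	call->state = CALL_COMPLETE;
	call->error = error;
	/*
	 * Always notify on the transition, even if the caller will also pass
	 * an error back up through its own return value.
	 */
	notify_socket(call);
	return true;
}
```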

    Fixes: f5c17aaeb2ae ("rxrpc: Calls should only have one terminal state")
    Reported-by: Gerry Seidman
    Signed-off-by: David Howells
    Reviewed-by: Marc Dionne

    David Howells
     

20 May, 2020

2 commits

  • The Rx protocol has a "previousPacket" field in it that is not handled in
    the same way by all protocol implementations. Sometimes it contains the
    serial number of the last DATA packet received, sometimes the sequence
    number of the last DATA packet received and sometimes the highest sequence
    number so far received.

    AF_RXRPC is using this to weed out ACKs that are out of date (it's possible
    for ACK packets to get reordered on the wire), but this does not work with
    OpenAFS which will just stick the sequence number of the last packet seen
    into previousPacket.

    The issue being seen is that big AFS FS.StoreData RPCs (e.g. of ~256MiB) are
    timing out when partly sent. A trace was captured, with an additional
    tracepoint to show ACKs being discarded in rxrpc_input_ack(). Here's an
    excerpt showing the problem.

    52873.203230: rxrpc_tx_data: c=000004ae DATA ed1a3584:00000002 0002449c q=00024499 fl=09

    A DATA packet with sequence number 00024499 has been transmitted (the "q="
    field).

    ...
    52873.243296: rxrpc_rx_ack: c=000004ae 00012a2b DLY r=00024499 f=00024497 p=00024496 n=0
    52873.243376: rxrpc_rx_ack: c=000004ae 00012a2c IDL r=0002449b f=00024499 p=00024498 n=0
    52873.243383: rxrpc_rx_ack: c=000004ae 00012a2d OOS r=0002449d f=00024499 p=0002449a n=2

    The Out-Of-Sequence ACK indicates that the server didn't see DATA sequence
    number 00024499, but did see seq 0002449a (previousPacket, shown as "p=",
    has skipped past that number, but firstPacket, "f=", which marks the
    bottom of the window, is still set to it).

    52873.252663: rxrpc_retransmit: c=000004ae q=24499 a=02 xp=14581537
    52873.252664: rxrpc_tx_data: c=000004ae DATA ed1a3584:00000002 000244bc q=00024499 fl=0b *RETRANS*

    The packet has been retransmitted. Retransmission recurs until the peer
    says it got the packet.

    52873.271013: rxrpc_rx_ack: c=000004ae 00012a31 OOS r=000244a1 f=00024499 p=0002449e n=6

    More OOS ACKs indicate that the other packets that are already in the
    transmission pipeline are being received. The specific-ACK list is up to 6
    ACKs and NAKs.

    ...
    52873.284792: rxrpc_rx_ack: c=000004ae 00012a49 OOS r=000244b9 f=00024499 p=000244b6 n=30
    52873.284802: rxrpc_retransmit: c=000004ae q=24499 a=0a xp=63505500
    52873.284804: rxrpc_tx_data: c=000004ae DATA ed1a3584:00000002 000244c2 q=00024499 fl=0b *RETRANS*
    52873.287468: rxrpc_rx_ack: c=000004ae 00012a4a OOS r=000244ba f=00024499 p=000244b7 n=31
    52873.287478: rxrpc_rx_ack: c=000004ae 00012a4b OOS r=000244bb f=00024499 p=000244b8 n=32

    At this point, the server's receive window is full (n=32) with presumably 1
    NAK'd packet and 31 ACK'd packets. We can't transmit any more packets.

    52873.287488: rxrpc_retransmit: c=000004ae q=24499 a=0a xp=61327980
    52873.287489: rxrpc_tx_data: c=000004ae DATA ed1a3584:00000002 000244c3 q=00024499 fl=0b *RETRANS*
    52873.293850: rxrpc_rx_ack: c=000004ae 00012a4c DLY r=000244bc f=000244a0 p=00024499 n=25

    And now we've received an ACK indicating that a DATA retransmission was
    received. 7 packets have been processed (the occupied part of the window
    moved, as indicated by f= and n=).

    52873.293853: rxrpc_rx_discard_ack: c=000004ae r=00012a4c 000244a0
    Signed-off-by: David Howells

    David Howells
     
  • Add a tracepoint to track received ACKs that are discarded due to being
    outside of the Tx window.

    Signed-off-by: David Howells

    David Howells
     

11 May, 2020

1 commit

  • rxrpc currently uses a fixed 4s retransmission timeout until the RTT is
    sufficiently sampled. This can cause problems with some fileservers with
    calls to the cache manager in the afs filesystem being dropped from the
    fileserver because a packet goes missing and the retransmission timeout is
    greater than the call expiry timeout.

    Fix this by:

    (1) Copying the RTT/RTO calculation code from Linux's TCP implementation
    and altering it to fit rxrpc.

    (2) Altering the various users of the RTT to make use of the new SRTT
    value.

    (3) Replacing the use of rxrpc_resend_timeout with the calculated RTO
    value (which is needed in jiffies), along with a backoff.

    Notes:

    (1) rxrpc provides RTT samples by matching the serial numbers on outgoing
    DATA packets that have the RXRPC_REQUEST_ACK set and PING ACK packets
    against the reference serial number in incoming REQUESTED ACK and
    PING-RESPONSE ACK packets.

    (2) Each packet that is transmitted on an rxrpc connection gets a new
    per-connection serial number, even for retransmissions, so an ACK can
    be cross-referenced to a specific trigger packet. This allows RTT
    information to be drawn from retransmitted DATA packets also.

    (3) rxrpc maintains the RTT/RTO state on the rxrpc_peer record rather than
    on an rxrpc_call because many RPC calls won't live long enough to
    generate more than one sample.

    (4) The calculated SRTT value is in units of 8ths of a microsecond rather
    than nanoseconds.

    The (S)RTT and RTO values are displayed in /proc/net/rxrpc/peers.
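    For orientation, the sketch below is a simplified RFC 6298-style
    estimator of the kind TCP uses, with SRTT kept in 1/8-microsecond units
    as note (4) describes and the variation kept scaled by 4. It is not the
    rxrpc code: the RTO_MIN_US floor is an assumed value, and the backoff and
    jiffies conversion mentioned in point (3) are left out.

```c
#include <stdint.h>

/* Assumed lower bound on the RTO, in microseconds (illustrative only). */
#define RTO_MIN_US 200000u

struct rtt_est {
	uint32_t srtt8;		/* smoothed RTT, in 1/8 microsecond units */
	uint32_t rttvar4;	/* RTT variation, in 1/4 microsecond units */
	int have_sample;
};

/* Fold one RTT sample (microseconds) into the estimate. */
static void rtt_est_add(struct rtt_est *e, uint32_t rtt_us)
{
	if (!e->have_sample) {
		e->srtt8 = rtt_us << 3;		/* SRTT = R */
		e->rttvar4 = rtt_us << 1;	/* RTTVAR = R/2, scaled by 4 */
		e->have_sample = 1;
		return;
	}

	int32_t err = (int32_t)rtt_us - (int32_t)(e->srtt8 >> 3);
	uint32_t abs_err = err < 0 ? (uint32_t)-err : (uint32_t)err;

	/* RTTVAR = 3/4 RTTVAR + 1/4 |err|, computed on the x4 value */
	e->rttvar4 = e->rttvar4 - (e->rttvar4 >> 2) + abs_err;
	/* SRTT = 7/8 SRTT + 1/8 R, so the x8 value simply gains err */
	e->srtt8 += (uint32_t)err;
}

/* RTO = SRTT + 4 * RTTVAR, clamped to a minimum. */
static uint32_t rtt_est_rto_us(const struct rtt_est *e)
{
	uint32_t rto = (e->srtt8 >> 3) + e->rttvar4;

	return rto < RTO_MIN_US ? RTO_MIN_US : rto;
}
```

    Scaling the stored values lets both smoothing steps be done with shifts
    and additions, which is why the SRTT ends up in eighths of a microsecond.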

    Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
    Signed-off-by: David Howells

    David Howells
     

14 Mar, 2020

1 commit

  • Fix the handling of signals in client rxrpc calls made by the afs
    filesystem. Ignore signals completely, leaving call abandonment or
    connection loss to be detected by timeouts inside AF_RXRPC.

    Allowing a filesystem call to be interrupted after the entire request has
    been transmitted and an abort sent means that the server may or may not
    have performed the action, and we don't know which. It may even be worse than that
    for older servers.

    Fixes: bc5e3a546d55 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
    Signed-off-by: David Howells

    David Howells
     

31 Jan, 2020

1 commit

  • At the end of rxrpc_input_data(), rxrpc_notify_socket() is called only if
    the base sequence number of the packet immediately follows the hard-ack
    point. However, this isn't sufficient, since the recvmsg side may have
    been advancing the window and then overrun the position at which we're
    adding, at which point rx_hard_ack >= seq0 and no notification is
    generated.

    Fix this by always generating a notification at the end of the input
    function.

    Without this, a long call may stall, possibly indefinitely.

    Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
    Signed-off-by: David Howells

    David Howells
     

27 Jan, 2020

1 commit

  • The subpacket scanning loop in rxrpc_receive_data() references the
    subpacket count in the private data part of the sk_buff in the loop
    termination condition. However, when the final subpacket is pasted into
    the ring buffer, the function no longer has a ref on the sk_buff and
    should not be looking at sp->* any more. This point is actually marked in
    the code when skb is cleared (but sp is not, which is an error).

    Fix this by caching sp->nr_subpackets in a local variable and using that
    instead.

    Also clear 'sp' to catch accesses after that point.

    This can show up as an oops in rxrpc_get_skb() if sp->nr_subpackets gets
    trashed by the sk_buff getting freed and reused in the meantime.
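    The pattern behind this fix (read the loop bound into a local before
    ownership of the buffer is handed away, and poison the pointer
    afterwards) can be sketched in plain C. struct pkt, struct ring and
    queue_subpackets() are hypothetical stand-ins for the sk_buff, the Rx
    ring and rxrpc_receive_data(); reference counting is reduced to a plain
    integer.

```c
#include <stddef.h>

#define RING_SLOTS 8	/* illustrative ring capacity */

/* Hypothetical stand-in for an sk_buff and its private data. */
struct pkt {
	unsigned int nr_subpackets;
	int refs;
};

struct ring {
	struct pkt *slot[RING_SLOTS];
	unsigned int count;
};

/*
 * Queue one ring entry per subpacket. Each non-final entry takes an extra
 * ref; the final entry consumes the caller's ref, after which a consumer
 * may free the packet, so it must not be touched again.
 */
static int queue_subpackets(struct ring *r, struct pkt *p)
{
	/* Read the loop bound while we still own the packet. */
	unsigned int nr = p->nr_subpackets;

	if (nr > RING_SLOTS - r->count)
		return -1;

	for (unsigned int j = 0; j < nr; j++) {
		if (j + 1 < nr)
			p->refs++;		/* extra ref for a non-final entry */
		r->slot[r->count++] = p;	/* final put hands over our ref */
		if (j + 1 == nr)
			p = NULL;		/* poison: catch any later use */
	}
	return 0;
}
```

    The loop condition only ever consults the local nr, so it stays valid
    even after the last entry has been handed to the ring.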

    Fixes: e2de6c404898 ("rxrpc: Use info in skbuff instead of reparsing a jumbo packet")
    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     

21 Dec, 2019

1 commit


05 Sep, 2019

1 commit

  • There's a misplaced traceline in rxrpc_input_packet() which is looking at a
    packet that just got released rather than the replacement packet.

    Fix this by moving the traceline after the assignment that moves the new
    packet pointer to the actual packet pointer.

    Fixes: d0d5c0cd1e71 ("rxrpc: Use skb_unshare() rather than skb_cow_data()")
    Reported-by: Hillf Danton
    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     

27 Aug, 2019

5 commits

  • The in-place decryption routines in AF_RXRPC's rxkad security module
    currently call skb_cow_data() to make sure the data isn't shared and that
    the skb can be written over. This has a problem, however, as the softirq
    handler may be still holding a ref or the Rx ring may be holding multiple
    refs when skb_cow_data() is called in rxkad_verify_packet() - and so
    skb_shared() returns true and __pskb_pull_tail() dislikes that. If this
    occurs, something like the following report will be generated.

    kernel BUG at net/core/skbuff.c:1463!
    ...
    RIP: 0010:pskb_expand_head+0x253/0x2b0
    ...
    Call Trace:
    __pskb_pull_tail+0x49/0x460
    skb_cow_data+0x6f/0x300
    rxkad_verify_packet+0x18b/0xb10 [rxrpc]
    rxrpc_recvmsg_data.isra.11+0x4a8/0xa10 [rxrpc]
    rxrpc_kernel_recv_data+0x126/0x240 [rxrpc]
    afs_extract_data+0x51/0x2d0 [kafs]
    afs_deliver_fs_fetch_data+0x188/0x400 [kafs]
    afs_deliver_to_call+0xac/0x430 [kafs]
    afs_wait_for_call_to_complete+0x22f/0x3d0 [kafs]
    afs_make_call+0x282/0x3f0 [kafs]
    afs_fs_fetch_data+0x164/0x300 [kafs]
    afs_fetch_data+0x54/0x130 [kafs]
    afs_readpages+0x20d/0x340 [kafs]
    read_pages+0x66/0x180
    __do_page_cache_readahead+0x188/0x1a0
    ondemand_readahead+0x17d/0x2e0
    generic_file_read_iter+0x740/0xc10
    __vfs_read+0x145/0x1a0
    vfs_read+0x8c/0x140
    ksys_read+0x4a/0xb0
    do_syscall_64+0x43/0xf0
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Fix this by using skb_unshare() instead in the input path for DATA packets
    that have a security index != 0. Non-DATA packets don't need in-place
    decryption and neither do unencrypted DATA packets.

    Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
    Reported-by: Julian Wollrath
    Signed-off-by: David Howells

    David Howells
     
  • Use the previously-added transmit-phase skbuff private flag to simplify the
    socket buffer tracing a bit. Which phase the skbuff comes from can now be
    divined from the skb rather than having to be guessed from the call state.

    We can also reduce the number of rxrpc_skb_trace values by eliminating the
    difference between Tx and Rx in the symbols.

    Signed-off-by: David Howells

    David Howells
     
  • Pass the reference held on a DATA skb in the rxrpc input handler into the
    Rx ring rather than getting an additional ref for this and then dropping
    the original ref at the end.

    Signed-off-by: David Howells

    David Howells
     
  • Use the information now cached in the skbuff private data to avoid the need
    to reparse a jumbo packet. We can find all the subpackets by dead
    reckoning, so it's only necessary to note how many there are, whether the
    last one is flagged as LAST_PACKET and whether any have the REQUEST_ACK
    flag set.

    This is necessary as once recvmsg() can see the packet, it can start
    modifying it, such as doing in-place decryption.
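    Dead reckoning works because every non-final subpacket of a jumbo packet
    has a fixed size. The sketch assumes 1412 data bytes per non-final
    subpacket, each followed by a 4-byte secondary header; we believe this
    matches the Rx jumbo format, but this log does not state it. struct
    jumbo_info and jumbo_subpacket() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Layout assumptions for illustration (not stated in this log). */
#define JUMBO_DATALEN   1412u	/* data bytes in each non-final subpacket */
#define JUMBO_HDRLEN    4u	/* secondary header after each of them */
#define JUMBO_SUBPKTLEN (JUMBO_DATALEN + JUMBO_HDRLEN)

/* Summary noted once at input time (hypothetical names). */
struct jumbo_info {
	uint32_t seq;		/* sequence number of the first subpacket */
	size_t data_off;	/* offset of the first subpacket's data */
	size_t data_len;	/* bytes from data_off to the end of the packet */
	unsigned int nr_subpackets;
	bool last_flagged;	/* final subpacket carries LAST_PACKET */
	bool any_req_ack;	/* some subpacket had REQUEST_ACK set */
};

/* Locate subpacket j by arithmetic instead of reparsing the headers. */
static void jumbo_subpacket(const struct jumbo_info *ji, unsigned int j,
			    uint32_t *seq, size_t *off, size_t *len)
{
	*seq = ji->seq + j;
	*off = ji->data_off + (size_t)j * JUMBO_SUBPKTLEN;
	if (j + 1 < ji->nr_subpackets)
		*len = JUMBO_DATALEN;
	else	/* the final subpacket takes whatever remains */
		*len = ji->data_len - (size_t)j * JUMBO_SUBPKTLEN;
}
```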

    Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
    Signed-off-by: David Howells

    David Howells
     
  • Improve the information stored about jumbo packets so that we don't need to
    reparse them so much later.

    Signed-off-by: David Howells
    Reviewed-by: Jeffrey Altman

    David Howells
     

09 Aug, 2019

2 commits

  • Don't bother generating maxSkew in the ACK packet as it has been obsolete
    since AFS 3.1.

    Signed-off-by: David Howells
    Reviewed-by: Jeffrey Altman

    David Howells
     
  • The object lifetime management on the rxrpc_local struct is broken in that
    the rxrpc_local_processor() function is expected to clean up and remove an
    object - but it may get requeued by packets coming in on the backing UDP
    socket once it starts running.

    This may result in the assertion in rxrpc_local_rcu() firing because the
    memory has been scheduled for RCU destruction whilst still queued:

    rxrpc: Assertion failed
    ------------[ cut here ]------------
    kernel BUG at net/rxrpc/local_object.c:468!

    Note that if the processor comes around before the RCU free function, it
    will just do nothing because ->dead is true.

    Fix this by adding a separate refcount to count active users of the
    endpoint that causes the endpoint to be destroyed when it reaches 0.

    The original refcount can then be used to refcount objects through the work
    processor and cause the memory to be RCU-freed when that reaches 0.
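    The split between the two counts can be sketched with C11 atomics. Here
    endpoint_unuse() plays the role of the active-user count reaching 0 (shut
    the endpoint down, then drop the usage ref the active users held
    collectively), while endpoint_put() stands in for the memory being
    RCU-freed. Plain free() replaces RCU, the shared usage ref is an
    assumption of this sketch, and every name is hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical endpoint carrying the two counters described above. */
struct endpoint {
	atomic_int usage;		/* keeps the memory alive */
	atomic_int active_users;	/* keeps the endpoint in service */
	bool dead;			/* single-threaded sketch: not atomic */
};

static struct endpoint *endpoint_new(void)
{
	struct endpoint *ep = calloc(1, sizeof(*ep));

	if (!ep)
		return NULL;
	atomic_init(&ep->usage, 1);	/* held on behalf of active users */
	atomic_init(&ep->active_users, 1);
	return ep;
}

/* Pin the memory, e.g. while the endpoint is queued for processing. */
static void endpoint_get(struct endpoint *ep)
{
	atomic_fetch_add(&ep->usage, 1);
}

/* Returns true if this put freed the memory. */
static bool endpoint_put(struct endpoint *ep)
{
	if (atomic_fetch_sub(&ep->usage, 1) == 1) {
		free(ep);
		return true;
	}
	return false;
}

static void endpoint_use(struct endpoint *ep)
{
	atomic_fetch_add(&ep->active_users, 1);
}

/* Last active user gone: shut down, then drop the users' usage ref. */
static bool endpoint_unuse(struct endpoint *ep)
{
	if (atomic_fetch_sub(&ep->active_users, 1) == 1) {
		ep->dead = true;	/* e.g. close the backing UDP socket */
		return endpoint_put(ep);
	}
	return false;
}
```

    Because queued work holds its own usage ref, shutting the endpoint down
    no longer frees memory that the work processor may still touch.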

    Fixes: 4f95dd78a77e ("rxrpc: Rework local endpoint management")
    Reported-by: syzbot+1e0edc4b8b7494c28450@syzkaller.appspotmail.com
    Signed-off-by: David Howells

    David Howells
     

31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

25 Apr, 2019

1 commit

  • After commit 5271953cad31 ("rxrpc: Use the UDP encap_rcv hook"),
    rxrpc_input_packet() is directly called from lockless UDP receive
    path, under rcu_read_lock() protection.

    It must therefore use RCU rules:

    - udp_sk->sk_user_data can be cleared at any point in this function.
    rcu_dereference_sk_user_data() is what we need here.

    - Also, since sk_user_data might have been set in rxrpc_open_socket()
    we must observe a proper RCU grace period before kfree(local) in
    rxrpc_lookup_local()

    v4: @local can be NULL in rxrpc_lookup_local() as reported by kbuild test robot
    and Julia Lawall, thanks!

    v3, v2: addressed David Howells' feedback, thanks!

    syzbot reported:

    kasan: CONFIG_KASAN_INLINE enabled
    kasan: GPF could be caused by NULL-ptr deref or user memory access
    general protection fault: 0000 [#1] PREEMPT SMP KASAN
    CPU: 0 PID: 19236 Comm: syz-executor703 Not tainted 5.1.0-rc6 #79
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:__lock_acquire+0xbef/0x3fb0 kernel/locking/lockdep.c:3573
    Code: 00 0f 85 a5 1f 00 00 48 81 c4 10 01 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 3c 02 00 0f 85 4a 21 00 00 49 81 7d 00 20 54 9c 89 0f 84 cf f4
    RSP: 0018:ffff88809d7aef58 EFLAGS: 00010002
    RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
    RDX: 0000000000000026 RSI: 0000000000000000 RDI: 0000000000000001
    RBP: ffff88809d7af090 R08: 0000000000000001 R09: 0000000000000001
    R10: ffffed1015d05bc7 R11: ffff888089428600 R12: 0000000000000000
    R13: 0000000000000130 R14: 0000000000000001 R15: 0000000000000001
    FS: 00007f059044d700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000004b6040 CR3: 00000000955ca000 CR4: 00000000001406f0
    Call Trace:
    lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:4211
    __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
    _raw_spin_lock_irqsave+0x95/0xcd kernel/locking/spinlock.c:152
    skb_queue_tail+0x26/0x150 net/core/skbuff.c:2972
    rxrpc_reject_packet net/rxrpc/input.c:1126 [inline]
    rxrpc_input_packet+0x4a0/0x5536 net/rxrpc/input.c:1414
    udp_queue_rcv_one_skb+0xaf2/0x1780 net/ipv4/udp.c:2011
    udp_queue_rcv_skb+0x128/0x730 net/ipv4/udp.c:2085
    udp_unicast_rcv_skb.isra.0+0xb9/0x360 net/ipv4/udp.c:2245
    __udp4_lib_rcv+0x701/0x2ca0 net/ipv4/udp.c:2301
    udp_rcv+0x22/0x30 net/ipv4/udp.c:2482
    ip_protocol_deliver_rcu+0x60/0x8f0 net/ipv4/ip_input.c:208
    ip_local_deliver_finish+0x23b/0x390 net/ipv4/ip_input.c:234
    NF_HOOK include/linux/netfilter.h:289 [inline]
    NF_HOOK include/linux/netfilter.h:283 [inline]
    ip_local_deliver+0x1e9/0x520 net/ipv4/ip_input.c:255
    dst_input include/net/dst.h:450 [inline]
    ip_rcv_finish+0x1e1/0x300 net/ipv4/ip_input.c:413
    NF_HOOK include/linux/netfilter.h:289 [inline]
    NF_HOOK include/linux/netfilter.h:283 [inline]
    ip_rcv+0xe8/0x3f0 net/ipv4/ip_input.c:523
    __netif_receive_skb_one_core+0x115/0x1a0 net/core/dev.c:4987
    __netif_receive_skb+0x2c/0x1c0 net/core/dev.c:5099
    netif_receive_skb_internal+0x117/0x660 net/core/dev.c:5202
    napi_frags_finish net/core/dev.c:5769 [inline]
    napi_gro_frags+0xade/0xd10 net/core/dev.c:5843
    tun_get_user+0x2f24/0x3fb0 drivers/net/tun.c:1981
    tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2027
    call_write_iter include/linux/fs.h:1866 [inline]
    do_iter_readv_writev+0x5e1/0x8e0 fs/read_write.c:681
    do_iter_write fs/read_write.c:957 [inline]
    do_iter_write+0x184/0x610 fs/read_write.c:938
    vfs_writev+0x1b3/0x2f0 fs/read_write.c:1002
    do_writev+0x15e/0x370 fs/read_write.c:1037
    __do_sys_writev fs/read_write.c:1110 [inline]
    __se_sys_writev fs/read_write.c:1107 [inline]
    __x64_sys_writev+0x75/0xb0 fs/read_write.c:1107
    do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Fixes: 5271953cad31 ("rxrpc: Use the UDP encap_rcv hook")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Acked-by: David Howells
    Signed-off-by: David S. Miller

    Eric Dumazet
     

13 Apr, 2019

1 commit

  • The rxrpc packet serial number cannot be safely used to detect
    out-of-order ACK packets for several reasons:

    1. The allocation of serial numbers cannot be assumed to imply the order
    by which acks are populated and transmitted. In some rxrpc
    implementations, delayed acks and ping acks are transmitted
    asynchronously to the receipt of data packets and so may be transmitted
    out of order. As a result, they can race with idle acks.

    2. Serial numbers are allocated by the rxrpc connection and not the call
    and as such may wrap independently if multiple channels are in use.

    In any case, what matters is whether the ack packet provides new
    information relating to the bounds of the window (the firstPacket and
    previousPacket in the ACK data).

    Fix this by discarding ACK packets that appear to wind back the window
    bounds rather than relying on serial number progression.
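    A sketch of the window-bounds test, assuming wrap-safe 32-bit
    comparisons: an ACK is treated as stale if firstPacket moves backwards,
    or if firstPacket is unchanged and previousPacket moves backwards. struct
    ack_window and ack_is_stale() are hypothetical names. The previousPacket
    comparison is the part that the 20 May 2020 entry above reports as
    unreliable against OpenAFS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "a is earlier than b" for 32-bit sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Hypothetical per-call record of the last accepted window bounds. */
struct ack_window {
	uint32_t first_packet;	/* firstPacket from the last accepted ACK */
	uint32_t prev_packet;	/* previousPacket from the last accepted ACK */
};

/*
 * Return true if the ACK would wind the window back and should be
 * discarded; otherwise remember its bounds and return false.
 */
static bool ack_is_stale(struct ack_window *w, uint32_t first, uint32_t prev)
{
	if (seq_before(first, w->first_packet))
		return true;		/* firstPacket regressed */
	if (first == w->first_packet && seq_before(prev, w->prev_packet))
		return true;		/* previousPacket regressed */
	w->first_packet = first;
	w->prev_packet = prev;
	return false;
}
```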

    Fixes: 298bc15b2079 ("rxrpc: Only take the rwind and mtu values from latest ACK")
    Signed-off-by: Jeffrey Altman
    Signed-off-by: David Howells
    Tested-by: Marc Dionne
    Signed-off-by: David S. Miller

    Jeffrey Altman
     

13 Oct, 2018

1 commit


09 Oct, 2018

2 commits

  • The rxrpc_input_packet() function and its call tree were built around the
    assumption that the data_ready() handler called from UDP to inform a kernel
    service that there is data to be had was non-reentrant. This means that
    certain locking could be dispensed with.

    This, however, turns out not to be the case with a multi-queue network card
    that can deliver packets to multiple cpus simultaneously. Each of those
    cpus can be in the rxrpc_input_packet() function at the same time.

    Fix by adding or changing some structure members:

    (1) Add peer->rtt_input_lock to serialise access to the RTT buffer.

    (2) Make conn->service_id into a 32-bit variable so that it can be
    cmpxchg'd on all arches.

    (3) Add call->input_lock to serialise access to the Rx/Tx state. Note
    that although the Rx and Tx states are (almost) entirely separate,
    there's no point completing the separation and having separate locks
    since it's a bi-phasal RPC protocol rather than a bi-directional
    streaming protocol. Data transmission and data reception do not take
    place simultaneously on any particular call.

    and making the following functional changes:

    (1) In rxrpc_input_data(), hold call->input_lock around the core to
    prevent simultaneous producing of packets into the Rx ring and
    updating of tracking state for a particular call.

    (2) In rxrpc_input_ping_response(), only read call->ping_serial once, and
    check it before checking RXRPC_CALL_PINGING as that's a cheaper test.
    The bit test and bit clear can then be combined. No further locking
    is needed here.

    (3) In rxrpc_input_ack(), take call->input_lock after we've parsed much of
    the ACK packet. The superseded ACK check is then done both before and
    after the lock is taken.

    The handling of ackinfo data is split, parsing before the lock is taken
    and processing with it held. This is keyed on rxMTU being non-zero.

    Congestion management is also done within the locked section.

    (4) In rxrpc_input_ackall(), take call->input_lock around the Tx window
    rotation. The ACKALL packet carries no information and is only really
    useful after all packets have been transmitted since it's imprecise.

    (5) In rxrpc_input_implicit_end_call(), we use rx->incoming_lock to
    prevent calls being simultaneously implicitly ended on two cpus and
    also to prevent any races with incoming call setup.

    (6) In rxrpc_input_packet(), use cmpxchg() to effect the service upgrade
    on a connection. It is only permitted to happen once for a
    connection.

    (7) In rxrpc_new_incoming_call(), we have to recheck the routing inside
    rx->incoming_lock to see if someone else set up the call, connection
    or peer whilst we were getting there. We can't trust the values from
    the earlier routing check unless we pin refs on them - which we want
    to avoid.

    Further, we need to allow for an incoming call to have its state
    changed on another CPU between us making it live and us adjusting it
    because the conn is now in the RXRPC_CONN_SERVICE state.

    (8) In rxrpc_peer_add_rtt(), take peer->rtt_input_lock around the access
    to the RTT buffer. There's no need to lock around setting peer->rtt.
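    Points (2) and (6) together amount to a one-shot compare-and-swap on a
    32-bit field. A C11 sketch, with a hypothetical struct conn and arbitrary
    service ID values in the usage, rather than the kernel's own cmpxchg()
    call site:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical connection whose service ID is 32 bits wide so that it
 * can be compare-and-swapped on every architecture. */
struct conn {
	_Atomic uint32_t service_id;
};

/*
 * Upgrade the connection from `from` to `to`. Only a caller that still
 * sees `from` succeeds; later or concurrent attempts fail because the
 * stored value no longer matches.
 */
static bool try_service_upgrade(struct conn *c, uint32_t from, uint32_t to)
{
	uint32_t expected = from;

	return atomic_compare_exchange_strong(&c->service_id, &expected, to);
}
```

    If two CPUs race to upgrade the same connection, exactly one
    compare-and-swap observes the old service ID and succeeds; the other sees
    the new value and fails, so the upgrade happens at most once.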

    For reference, the inventory of state-accessing or state-altering functions
    used by the packet input procedure is:

    > rxrpc_input_packet()
      * PACKET CHECKING

      * ROUTING
        > rxrpc_post_packet_to_local()
        > rxrpc_find_connection_rcu() - uses RCU
          > rxrpc_lookup_peer_rcu() - uses RCU
          > rxrpc_find_service_conn_rcu() - uses RCU
          > idr_find() - uses RCU

      * CONNECTION-LEVEL PROCESSING
        - Service upgrade
          - Can only happen once per conn
          ! Changed to use cmpxchg
        > rxrpc_post_packet_to_conn()
        - Setting conn->hi_serial
          - Probably safe not using locks
          - Maybe use cmpxchg

      * CALL-LEVEL PROCESSING
        > Old-call checking
          > rxrpc_input_implicit_end_call()
            > rxrpc_call_completed()
            > rxrpc_queue_call()
            ! Need to take rx->incoming_lock
            > __rxrpc_disconnect_call()
            > rxrpc_notify_socket()
        > rxrpc_new_incoming_call()
          - Uses rx->incoming_lock for the entire process
            - Might be able to drop this earlier in favour of the call lock
          > rxrpc_incoming_call()
            ! Conflicts with rxrpc_input_implicit_end_call()
        > rxrpc_send_ping()
          - Don't need locks to check rtt state
          > rxrpc_propose_ACK()

      * PACKET DISTRIBUTION
        > rxrpc_input_call_packet()
          > rxrpc_input_data()
            * QUEUE DATA PACKET ON CALL
            > rxrpc_reduce_call_timer()
              - Uses timer_reduce()
            ! Needs call->input_lock
            > rxrpc_receiving_reply()
              ! Needs locking around ack state
              > rxrpc_rotate_tx_window()
              > rxrpc_end_tx_phase()
            > rxrpc_proto_abort()
            > rxrpc_input_dup_data()
            - Fills the Rx buffer
            - rxrpc_propose_ACK()
            - rxrpc_notify_socket()

          > rxrpc_input_ack()
            * APPLY ACK PACKET TO CALL AND DISCARD PACKET
            > rxrpc_input_ping_response()
              - Probably doesn't need any extra locking
              ! Need READ_ONCE() on call->ping_serial
              > rxrpc_input_check_for_lost_ack()
                - Takes call->lock to consult Tx buffer
              > rxrpc_peer_add_rtt()
                ! Needs to take a lock (peer->rtt_input_lock)
                ! Could perhaps manage with cmpxchg() and xadd() instead
            > rxrpc_input_requested_ack()
              - Consults Tx buffer
                ! Probably needs a lock
              > rxrpc_peer_add_rtt()
            > rxrpc_propose_ACK()
            > rxrpc_input_ackinfo()
              - Changes call->tx_winsize
                ! Use cmpxchg to handle change
                ! Should perhaps track serial number
              - Uses peer->lock to record MTU specification changes
            > rxrpc_proto_abort()
            ! Need to take call->input_lock
            > rxrpc_rotate_tx_window()
            > rxrpc_end_tx_phase()
            > rxrpc_input_soft_acks()
              - Consults the Tx buffer
            > rxrpc_congestion_management()
              - Modifies the Tx annotations
              ! Needs call->input_lock
            > rxrpc_queue_call()

          > rxrpc_input_abort()
            * APPLY ABORT PACKET TO CALL AND DISCARD PACKET
            > rxrpc_set_call_completion()
            > rxrpc_notify_socket()

          > rxrpc_input_ackall()
            * APPLY ACKALL PACKET TO CALL AND DISCARD PACKET
            ! Need to take call->input_lock
            > rxrpc_rotate_tx_window()
            > rxrpc_end_tx_phase()

      > rxrpc_reject_packet()

    There are some functions used by the above that queue the packet, after
    which the procedure is terminated:

    - rxrpc_post_packet_to_local()
      - local->event_queue is an sk_buff_head
      - local->processor is a work_struct
    - rxrpc_post_packet_to_conn()
      - conn->rx_queue is an sk_buff_head
      - conn->processor is a work_struct
    - rxrpc_reject_packet()
      - local->reject_queue is an sk_buff_head
      - local->processor is a work_struct

    And some that offload processing to process context:

    - rxrpc_notify_socket()
      - Uses RCU lock
      - Uses call->notify_lock to call call->notify_rx
      - Uses call->recvmsg_lock to queue recvmsg side
    - rxrpc_queue_call()
      - call->processor is a work_struct
    - rxrpc_propose_ACK()
      - Uses call->lock to wrap __rxrpc_propose_ACK()

    And a bunch that complete a call, all of which use call->state_lock to
    protect the call state:

    - rxrpc_call_completed()
    - rxrpc_set_call_completion()
    - rxrpc_abort_call()
    - rxrpc_proto_abort()
      - Also uses rxrpc_queue_call()
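
    The "complete only once" rule that these functions enforce under
    call->state_lock might look like this in miniature. The states, the
    struct and set_call_completion() are hypothetical, and a toy atomic_flag
    spinlock stands in for whatever lock type the kernel actually uses. The
    point is that the check and the transition happen together under the
    lock, so exactly one caller sees true.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical call states; only their ordering matters here. */
enum call_state {
	CALL_SERVER_RECV_REQUEST,
	CALL_SERVER_AWAIT_ACK,
	CALL_COMPLETE,
};

enum call_completion {
	COMPLETION_NONE,
	COMPLETION_SUCCEEDED,
	COMPLETION_LOCALLY_ABORTED,
};

struct call {
	atomic_flag state_lock;		/* toy stand-in for call->state_lock */
	enum call_state state;
	enum call_completion completion;
	unsigned int abort_code;
};

static void toy_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

static void toy_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Move the call to CALL_COMPLETE and record how it ended. Returns true only
 * for the caller that made the transition; later callers get false and leave
 * the recorded outcome alone. */
bool set_call_completion(struct call *call, enum call_completion how,
			 unsigned int abort_code)
{
	bool done = false;

	toy_lock(&call->state_lock);
	if (call->state < CALL_COMPLETE) {
		call->completion = how;
		call->abort_code = abort_code;
		call->state = CALL_COMPLETE;
		done = true;
	}
	toy_unlock(&call->state_lock);
	return done;
}
```

    A caller would typically act on the completion (for instance, by queueing
    the call for further processing) only when it gets true back.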

    Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
    Signed-off-by: David Howells

    David Howells
     
  • Move the out-of-order and duplicate ACK packet check to before the call to
    rxrpc_input_ackinfo() so that the receive window size and MTU size are only
    checked in the latest ACK packet and don't regress.
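
    A sketch of the reordered check, assuming for illustration that ACKs are
    ordered by their packet serial number, compared with wrap-around
    arithmetic. The struct, field and function names are hypothetical.
    Because the stale-ACK test runs first, an old ACK can no longer shrink
    the window or MTU that a newer ACK has already installed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "a is at or before b" for 32-bit serial numbers. */
static bool serial_before_eq(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) <= 0;
}

/* Hypothetical per-call ACK state. */
struct ack_state {
	uint32_t acks_latest;	/* serial of the newest ACK applied so far */
	uint32_t tx_winsize;	/* peer's receive window, from the ACK trailer */
	uint32_t peer_mtu;	/* peer's MTU, from the ACK trailer */
};

/* Apply an ACK's window/MTU information only if the ACK is newer than
 * anything seen before. Returns false if the ACK was discarded. */
bool input_ack(struct ack_state *st, uint32_t serial, uint32_t rwind,
	       uint32_t mtu)
{
	if (serial_before_eq(serial, st->acks_latest))
		return false;		/* out of order or duplicate */
	st->acks_latest = serial;
	st->tx_winsize = rwind;
	st->peer_mtu = mtu;
	return true;
}
```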

    Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
    Signed-off-by: David Howells

    David Howells
     

08 Oct, 2018

4 commits

    Carry the call state out of the locked section in rxrpc_rotate_tx_window()
    rather than sampling it afterwards. This is only used to select tracepoint
    data, but the state could have changed by the time the tracepoint fires.

    Signed-off-by: David Howells

    David Howells
     
  • We should only call the function to end a call's Tx phase if we rotated the
    marked-last packet out of the transmission buffer.

    Make rxrpc_rotate_tx_window() return an indication of whether it just
    rotated the packet marked as the last out of the transmit buffer, carrying
    the information out of the locked section in that function.

    We can then check the return value instead of examining RXRPC_CALL_TX_LAST.
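
    Both rxrpc_rotate_tx_window() changes in this batch follow the same idea:
    work out what you need while the lock is held and return it, instead of
    re-reading shared state after unlocking. A minimal userspace sketch, with
    a hypothetical 16-slot ring, a hypothetical TX_ANNO_LAST annotation bit,
    a printf() standing in for the tracepoint and a toy spinlock:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TX_BUF_SIZE	16	/* hypothetical ring size (power of two) */
#define TX_ANNO_LAST	0x04	/* hypothetical "marked last" annotation bit */

struct call {
	atomic_flag lock;
	uint8_t tx_annotations[TX_BUF_SIZE];
	uint32_t tx_hard_ack;	/* highest sequence number hard-ACK'd so far */
	bool tx_phase_ended;
};

static void toy_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin */
}

static void toy_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Wrap-safe "a comes before b" for 32-bit sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Rotate everything up to and including 'to' out of the Tx ring. Returns true
 * if the packet marked as last was among them. The answer is computed under
 * the lock, so neither the trace nor the caller re-reads shared state. */
bool rotate_tx_window(struct call *call, uint32_t to)
{
	bool rot_last = false;

	toy_lock(&call->lock);
	while (seq_before(call->tx_hard_ack, to)) {
		unsigned int ix;

		call->tx_hard_ack++;
		ix = call->tx_hard_ack & (TX_BUF_SIZE - 1);
		if (call->tx_annotations[ix] & TX_ANNO_LAST)
			rot_last = true;
		call->tx_annotations[ix] = 0;	/* slot is free again */
	}
	toy_unlock(&call->lock);

	/* Stand-in tracepoint: uses the value carried out of the lock. */
	printf("rotate%s to %u\n", rot_last ? " (last)" : "", (unsigned)to);
	return rot_last;
}

/* Hypothetical caller: end the Tx phase only if the last packet rotated. */
void on_hard_ack(struct call *call, uint32_t hard_ack)
{
	if (rotate_tx_window(call, hard_ack))
		call->tx_phase_ended = true;
}
```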

    Fixes: 70790dbe3f66 ("rxrpc: Pass the last Tx packet marker in the annotation buffer")
    Signed-off-by: David Howells

    David Howells
     
  • We don't need to take the RCU read lock in the rxrpc packet receive
    function because it's held further up the stack in the IP input routine
    around the UDP receive routines.

    Fix this by dropping the RCU read lock calls from rxrpc_input_packet().
    This simplifies the code.

    Fixes: 70790dbe3f66 ("rxrpc: Pass the last Tx packet marker in the annotation buffer")
    Signed-off-by: David Howells

    David Howells
     
    Use the UDP encap_rcv hook to cut out the part of rxrpc packet reception
    in which a packet is placed onto the UDP receive queue and then immediately
    removed again by rxrpc. Going via the queue in this manner seems like it
    should be unnecessary.

    This does, however, require the invention of a value to place in encap_type
    as that's one of the conditions to switch packets out to the encap_rcv
    hook. Possibly the value doesn't actually matter for anything other than
    sockopts on the UDP socket, which aren't accessible outside of rxrpc
    anyway.
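
    The dispatch that the encap_rcv hook relies on can be modelled in a few
    lines. The struct below is a hypothetical stand-in, not the kernel's
    struct udp_sock: a packet is offered to the hook only if encap_type is
    nonzero and a hook is installed, and a return of 0 means the hook consumed
    it. The kernel additionally treats negative returns as a request to
    resubmit the packet, which this model leaves out; positive returns fall
    through to normal UDP queueing in both.

```c
#include <stddef.h>

#define MODEL_QUEUE_LEN 8

struct pkt {
	int id;
};

struct udp_sock_model;
typedef int (*encap_rcv_fn)(struct udp_sock_model *sk, struct pkt *p);

/* Hypothetical, heavily simplified stand-in for a UDP socket. */
struct udp_sock_model {
	int encap_type;				/* must be nonzero to use the hook */
	encap_rcv_fn encap_rcv;			/* the hook itself */
	struct pkt *rx_queue[MODEL_QUEUE_LEN];	/* stand-in for the receive queue */
	size_t rx_qlen;
	int hook_calls;				/* bookkeeping for the demo */
};

/* Delivery path: offer the packet to the hook first. If the hook consumes
 * it, it never goes near the receive queue. */
void udp_deliver(struct udp_sock_model *sk, struct pkt *p)
{
	if (sk->encap_type != 0 && sk->encap_rcv != NULL) {
		if (sk->encap_rcv(sk, p) == 0)
			return;
	}
	/* Old path: queue now, dequeue later from the data_ready handler. */
	if (sk->rx_qlen < MODEL_QUEUE_LEN)
		sk->rx_queue[sk->rx_qlen++] = p;
}

/* Hypothetical rxrpc-like hook: process the packet directly, consume it. */
int rxrpc_like_encap_rcv(struct udp_sock_model *sk, struct pkt *p)
{
	(void)p;
	sk->hook_calls++;
	return 0;
}
```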

    This seems to shave some time off the interval between each sk_buff
    being timestamped and its turning up in rxrpc (the final number in the
    following trace excerpt). I measured this by making the rxrpc_rx_packet
    tracepoint print the time elapsed between the skb being timestamped and
    the current time (in ns), e.g.:

    ... 424.278721: rxrpc_rx_packet: ... ACK 25026

    So doing a 512MiB DIO read from my test server, with an unmodified kernel:

    N      min   max    sum          mean     stddev
    27605  2626  7581   7.83992e+07  2840.04  181.029

    and with the patch applied:

    N      min   max    sum          mean     stddev
    27547  1895  12165  6.77461e+07  2459.29  255.02
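
    The mean in each summary is just sum / N. The helpers below, whose names
    are made up for illustration, recompute it from the N and sum columns and
    express the change as a percentage.

```c
#include <stdio.h>

/* One row of a trace summary: sample count and total elapsed time (ns). */
struct latency_summary {
	const char *label;
	long n;
	double sum_ns;
};

double mean_ns(const struct latency_summary *s)
{
	return s->sum_ns / (double)s->n;
}

/* Percentage by which 'after' is lower than 'before'. */
double pct_reduction(double before, double after)
{
	return 100.0 * (before - after) / before;
}

void print_comparison(const struct latency_summary *before,
		      const struct latency_summary *after)
{
	double mb = mean_ns(before), ma = mean_ns(after);

	printf("%s mean: %.2f ns\n", before->label, mb);
	printf("%s mean: %.2f ns\n", after->label, ma);
	printf("reduction: %.1f ns (%.1f%%)\n", mb - ma, pct_reduction(mb, ma));
}
```

    With the figures above, the means come out at about 2840.04 ns and
    2459.29 ns, a drop of roughly 381 ns (about 13.4%). The maximum and the
    standard deviation both went up, though, so the gain is in the typical
    case rather than the worst case.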

    Signed-off-by: David Howells

    David Howells
     

05 Oct, 2018

2 commits

  • Fix the rxrpc_data_ready() function to pick up all packets and to not miss
    any. There are two problems:

    (1) The sk_data_ready pointer on the UDP socket is set *after* it is
    bound. This means that it's open for business before we're ready to
    dequeue packets, and a tiny window exists in which a packet can
    sneak onto the receive queue without our ever knowing about it.

    Fix this by setting the pointers on the socket prior to binding it.

    (2) skb_recv_udp() will return an error (such as ENETUNREACH) if there was
    an error on the transmission side, even though we set the
    sk_error_report hook. Because rxrpc_data_ready() returns immediately
    in such a case, it never actually removes its packet from the receive
    queue.

    Fix this by abstracting out the UDP dequeuing and checksumming into a
    separate function that keeps hammering on skb_recv_udp() until it
    returns -EAGAIN, passing the packets extracted to the remainder of the
    function.

    and two potential problems:

    (3) It might be possible in some circumstances or in the future for
    packets to be added to the UDP receive queue whilst rxrpc is
    consuming them, so the data_ready() handler might get called
    less often than once per packet.

    Allow for this by fully draining the queue on each call, as in (2).

    (4) If a packet fails the checksum check, the code currently returns after
    discarding the packet without checking for more.

    Allow for this by fully draining the queue on each call, as in (2).
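
    A userspace analogue of the fixed dequeue loop, with
    recv(..., MSG_DONTWAIT) on a datagram socket standing in for
    skb_recv_udp(). The checksum test is a hypothetical placeholder. What
    matters is that only EAGAIN ends the loop: a bad checksum, or an error
    reported by the receive call, costs one iteration rather than leaving
    packets stranded on the queue.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical stand-in for checksum verification: a datagram whose first
 * byte is 'X' is treated as corrupt. */
static bool checksum_ok(const char *buf, ssize_t len)
{
	return len > 0 && buf[0] != 'X';
}

/* Keep dequeuing until the socket reports EAGAIN. Returns the number of good
 * packets handed on, or -1 if the descriptor is not a usable socket. */
int drain_queue(int fd, int *dropped)
{
	char buf[2048];
	int processed = 0;

	*dropped = 0;
	for (;;) {
		ssize_t len = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);

		if (len < 0) {
			if (errno == EAGAIN || errno == EWOULDBLOCK)
				break;		/* queue empty: done */
			if (errno == EBADF || errno == ENOTSOCK)
				return -1;	/* nothing we can read from */
			/* e.g. a transmit-side error: on Linux it is reported
			 * once and then cleared, so just try again. */
			continue;
		}
		if (!checksum_ok(buf, len)) {
			(*dropped)++;		/* discard it, but keep draining */
			continue;
		}
		processed++;			/* hand on to the rest of input */
	}
	return processed;
}
```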

    Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
    Signed-off-by: David Howells
    Acked-by: Paolo Abeni

    David Howells
     
  • Fix some refs to init_net that should've been changed to the appropriate
    network namespace.

    Fixes: 2baec2c3f854 ("rxrpc: Support network namespacing")
    Signed-off-by: David Howells
    Acked-by: Paolo Abeni

    David Howells
     

04 Oct, 2018

2 commits


28 Sep, 2018

5 commits

  • Make the following changes to improve the robustness of the code that sets
    up a new service call:

    (1) Cache the rxrpc_sock struct obtained in rxrpc_data_ready() to do a
    service ID check and pass that along to rxrpc_new_incoming_call().
    This means that I can remove the check from rxrpc_new_incoming_call()
    without the need to worry about the socket attached to the local
    endpoint getting replaced - which would invalidate the check.

    (2) Cache the rxrpc_peer struct so that the peer search need only be
    done once. The peer is passed to rxrpc_new_incoming_call(), saving
    the need to repeat the search.

    This also reduces the possibility of rxrpc_publish_service_conn()
    BUG()'ing due to the detection of a duplicate connection, despite the
    initial search done by rxrpc_find_connection_rcu() having turned up
    nothing.

    This BUG() shouldn't ever get hit since rxrpc_data_ready() *should* be
    non-reentrant and the result of the initial search should still hold
    true, but it has proven possible to hit.

    I *think* this may be due to __rxrpc_lookup_peer_rcu() cutting short
    the iteration over the hash table if it finds a matching peer with a
    zero usage count, but I don't know for sure since it's only ever been
    hit once that I know of.

    Another possibility is that a bug in rxrpc_data_ready() that checked
    the wrong byte in the header for the RXRPC_CLIENT_INITIATED flag
    might've let through a packet that caused a spurious and invalid call
    to be set up. That is addressed in another patch.

    (3) Fix __rxrpc_lookup_peer_rcu() to skip peer records that have a zero
    usage count rather than stopping and returning not found, just in case
    there's another peer record behind it in the bucket.

    (4) Don't search the peer records in rxrpc_alloc_incoming_call(), but
    rather either use the peer cached in (2) or, if one wasn't found,
    preemptively install a new one.
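
    Item (3) amounts to turning a "stop at the first match" loop into "skip
    dead matches and keep going". A simplified sketch over a singly linked
    bucket, with hypothetical names; the kernel version walks an
    RCU-protected hash table and compares full transport addresses, neither
    of which is modelled here.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical peer record on a hash chain; 'addr' stands in for the real
 * transport-address comparison. */
struct peer {
	struct peer *next;
	unsigned long addr;
	atomic_int usage;	/* 0 means the record is being torn down */
};

/* Return a live peer matching addr, or NULL. A matching record with a zero
 * usage count is skipped rather than ending the search, because a live
 * record for the same address may sit further along the bucket. */
struct peer *lookup_peer(struct peer *bucket, unsigned long addr)
{
	for (struct peer *p = bucket; p != NULL; p = p->next) {
		if (p->addr != addr)
			continue;
		if (atomic_load(&p->usage) == 0)
			continue;	/* dying record: keep looking */
		return p;
	}
	return NULL;
}
```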

    Fixes: 8496af50eb38 ("rxrpc: Use RCU to access a peer's service connection tree")
    Signed-off-by: David Howells

    David Howells
     
  • Do more up-front checking on incoming packets to weed out invalid ones and
    also ones aimed at services that we don't support.

    Whilst we're at it, replace the clearing of call and skew if we don't find
    a connection with just initialising the variables to zero at the top of the
    function.

    Signed-off-by: David Howells

    David Howells
     
  • In the input path, a received sk_buff can be marked for rejection by
    setting RXRPC_SKB_MARK_* in skb->mark and, if needed, some auxiliary data
    (such as an abort code) in skb->priority. The rejection is handled by
    queueing the sk_buff up for dealing with in process context. The output
    code reads the mark and priority and, theoretically, generates an
    appropriate response packet.

    However, if RXRPC_SKB_MARK_BUSY is set, this isn't noticed and an ABORT
    message with a random abort code is generated (since skb->priority wasn't
    set to anything).

    Fix this by outputting the appropriate sort of packet.

    Also, whilst we're at it, most of the marks are no longer used, so remove
    them and rename the remaining two to something more obvious.
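
    A sketch of the corrected dispatch, with hypothetical mark names and a
    cut-down reply record; the packet type numbers (BUSY = 3, ABORT = 4) are
    those of the Rx protocol. The abort code is taken from the stashed
    priority value only for an ABORT, so a BUSY rejection can no longer
    produce an ABORT carrying whatever happened to be in priority.

```c
#include <arpa/inet.h>	/* htonl() */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the two remaining rejection marks. */
enum reject_mark { MARK_NONE, MARK_REJECT_BUSY, MARK_REJECT_ABORT };

#define PACKET_TYPE_BUSY	3
#define PACKET_TYPE_ABORT	4

struct reject_reply {
	uint8_t type;
	bool has_abort_code;
	uint32_t abort_code_be;	/* network byte order; ABORT only */
};

/* Decide what to send back for a rejected packet. Returns false if the mark
 * is not a rejection, in which case the packet is simply dropped. */
bool build_reject_reply(enum reject_mark mark, uint32_t priority,
			struct reject_reply *out)
{
	switch (mark) {
	case MARK_REJECT_BUSY:
		out->type = PACKET_TYPE_BUSY;
		out->has_abort_code = false;	/* priority is ignored */
		out->abort_code_be = 0;
		return true;
	case MARK_REJECT_ABORT:
		out->type = PACKET_TYPE_ABORT;
		out->has_abort_code = true;
		out->abort_code_be = htonl(priority);
		return true;
	default:
		return false;
	}
}
```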

    Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
    Signed-off-by: David Howells

    David Howells
     
  • Fix RTT information gathering in AF_RXRPC by the following means:

    (1) Enable Rx timestamping on the transport socket with SO_TIMESTAMPNS.

    (2) If the sk_buff doesn't have a timestamp set when rxrpc_data_ready()
    collects it, set it at that point.

    (3) Allow ACKs to be requested on the last packet of a client call, but
    not a service call. We need to be careful lest we undo:

    bf7d620abf22c321208a4da4f435e7af52551a21
    Author: David Howells
    Date: Thu Oct 6 08:11:51 2016 +0100
    rxrpc: Don't request an ACK on the last DATA packet of a call's Tx phase

    but that only really applies to service calls that we're handling,
    since the client side gets to send the final ACK (or not).

    (4) When about to transmit an ACK or DATA packet, record the Tx timestamp
    beforehand only; don't update the timestamp afterwards.

    (5) Switch the ordering between recording the serial and recording the
    timestamp to always set the serial number first. The serial number
    shouldn't be referenced by an ACK packet until we've transmitted
    the packet bearing it - so in the Rx path, we don't need the timestamp
    until we've checked the serial number.
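
    Points (2), (4) and (5) can be illustrated with the single-threaded
    sketch below. The struct and function names are hypothetical, an rx_ns of
    0 stands for "no receive timestamp", and the memory ordering that a real
    concurrent Tx/Rx path would need is not modelled. Point (1) corresponds
    to setting SO_TIMESTAMPNS on the transport socket, which the sketch
    leaves out.

```c
#include <stdint.h>

/* Hypothetical record kept for the most recent probe packet sent. */
struct rtt_probe {
	uint32_t serial;	/* serial number the peer will quote back */
	int64_t tx_ns;		/* taken just before transmission */
};

/* (5) set the serial number first, then (4) take the Tx timestamp once,
 * before transmission; it is not refreshed after the send. */
void note_transmit(struct rtt_probe *p, uint32_t serial, int64_t now_ns)
{
	p->serial = serial;
	p->tx_ns = now_ns;
}

/* Rx side: check the serial before using the timestamp. (2) if the packet
 * arrived without a receive timestamp (rx_ns == 0), fall back to now_ns.
 * Returns the RTT in ns, or -1 if the ACK doesn't refer to the probe. */
int64_t rtt_from_ack(const struct rtt_probe *p, uint32_t acked_serial,
		     int64_t rx_ns, int64_t now_ns)
{
	if (acked_serial != p->serial)
		return -1;
	if (rx_ns == 0)
		rx_ns = now_ns;
	return rx_ns - p->tx_ns;
}
```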

    Fixes: cf1a6474f807 ("rxrpc: Add per-peer RTT tracker")
    Signed-off-by: David Howells

    David Howells
     
  • There's a check in rxrpc_data_ready() that's checking the CLIENT_INITIATED
    flag in the packet type field rather than in the packet flags field.

    Fix this by creating a pair of helper functions to check whether the packet
    is going to the client or to the server and use them generally.
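
    The mix-up can be shown with a cut-down header. In the Rx wire format the
    DATA packet type is 1 and the client-initiated flag is bit 0x01, so
    testing the flag against the type field makes every DATA packet look
    client-initiated. The struct and helper names below are hypothetical;
    only the two header fields involved are modelled.

```c
#include <stdbool.h>
#include <stdint.h>

#define PACKET_TYPE_DATA	1
#define CLIENT_INITIATED	0x01	/* bit in the header's flags field */

/* Hypothetical cut-down wire header. */
struct wire_header {
	uint8_t type;
	uint8_t flags;
};

/* The bug: testing the flag bit against the type field. */
bool buggy_to_server(const struct wire_header *h)
{
	return h->type & CLIENT_INITIATED;
}

/* The fix: a pair of helpers that look at the flags field. */
bool to_server(const struct wire_header *h)
{
	return h->flags & CLIENT_INITIATED;
}

bool to_client(const struct wire_header *h)
{
	return !to_server(h);
}
```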

    Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
    Signed-off-by: David Howells

    David Howells