15 Oct, 2020

2 commits

  • Fix the loss of transmission of a call's final ack when a socket gets shut
    down. Without the final ack, the server will retransmit the last data
    packet or send a ping ack and then get an ICMP indicating the port got
    closed. The server will then view this as a failure.

    Fixes: 3136ef49a14c ("rxrpc: Delay terminal ACK transmission on a client call")
    Signed-off-by: David Howells

    David Howells
     
  • Fix rxrpc_unbundle_conn() to not drop the bundle usage count when cleaning
    up an exclusive connection.

    Based on the suggested fix from Hillf Danton.

    Fixes: 245500d853e9 ("rxrpc: Rewrite the client connection manager")
    Reported-by: syzbot+d57aaf84dd8a550e6d91@syzkaller.appspotmail.com
    Signed-off-by: David Howells
    cc: Hillf Danton

    David Howells
     

09 Oct, 2020

1 commit


06 Oct, 2020

1 commit

  • If someone calls setsockopt() twice to set a server key keyring, the first
    keyring is leaked.

    Fix it to return an error instead if the server key keyring is already set.

    Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
    Signed-off-by: David Howells

    David Howells
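The set-once behaviour described above can be sketched in userspace C. This is a minimal illustration, not the kernel code: the struct names, the `set_server_keyring()` helper and the choice of `-EINVAL` as the error are all assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the keyring and socket objects. */
struct keyring { int refcount; };
struct rx_sock { struct keyring *securities; };

/*
 * Attach a server keyring to the socket. If one is already attached,
 * refuse rather than overwrite it, so the first keyring's ref is never
 * leaked. The error code is an assumption for this sketch.
 */
static int set_server_keyring(struct rx_sock *rx, struct keyring *kr)
{
	assert(rx != NULL && kr != NULL);
	if (rx->securities)
		return -EINVAL;
	kr->refcount++;		/* the socket now holds a ref */
	rx->securities = kr;
	return 0;
}
```

A second call on the same socket leaves the first keyring attached and takes no ref on the rejected one.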
     

05 Oct, 2020

5 commits

  • The keyring containing the server's tokens isn't network-namespaced, so it
    shouldn't be looked up with a network namespace. It is expected to be
    owned specifically by the server, so namespacing is unnecessary.

    Fixes: a58946c158a0 ("keys: Pass the network namespace into request_key mechanism")
    Signed-off-by: David Howells

    David Howells
     
  • When a new incoming call arrives at a userspace rxrpc socket on a new
    connection that has a security class set, the code currently pushes it onto
    the accept queue to hold a ref on it for the socket. This doesn't work,
    however, as recvmsg() pops it off, notices that it's in the SERVER_SECURING
    state and discards the ref. This means that the call runs out of refs too
    early and the kernel oopses.

    By contrast, a kernel rxrpc socket manually pre-charges the incoming call
    pool with calls that already have user call IDs assigned, so they are ref'd
    by the call tree on the socket.

    Change the mode of operation for userspace rxrpc server sockets to work
    like this too. Although this is a UAPI change, server sockets aren't
    currently functional.

    Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
    Signed-off-by: David Howells

    David Howells
     
  • conn->state_lock may be taken in softirq mode, but a previous patch
    replaced an outer lock in the response-packet event handling code, and lost
    the _bh from that when doing so.

    Fix this by applying the _bh annotation to the state_lock locking.

    Fixes: a1399f8bb033 ("rxrpc: Call channels should have separate call number spaces")
    Signed-off-by: David Howells

    David Howells
     
  • If rxrpc_read() (which allows KEYCTL_READ to read a key) sees a token of a
    type it doesn't recognise, it can BUG in a couple of places, which is
    unnecessary as it can easily get back to userspace.

    Fix this to print an error message instead.

    Fixes: 99455153d067 ("RxRPC: Parse security index 5 keys (Kerberos 5)")
    Signed-off-by: David Howells

    David Howells
     
  • The session key should be encoded with just the 8 data bytes and
    no length; ENCODE_DATA precedes it with a 4 byte length, which
    confuses some existing tools that try to parse this format.

    Add an ENCODE_BYTES macro that does not include a length, and use
    it for the key. Also adjust the expected length.

    Note that commit 774521f353e1d ("rxrpc: Fix an assertion in
    rxrpc_read()") had fixed a BUG by changing the length rather than
    fixing the encoding. The original length was correct.

    Fixes: 99455153d067 ("RxRPC: Parse security index 5 keys (Kerberos 5)")
    Signed-off-by: Marc Dionne
    Signed-off-by: David Howells

    Marc Dionne
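The difference between the two encodings can be shown with a small standalone sketch. The functions below are illustrative stand-ins for the ENCODE_DATA and ENCODE_BYTES macros named above, not the kernel macros themselves, and the padding-to-4-bytes behaviour is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value in network (big-endian) byte order. */
static uint8_t *put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
	return p + 4;
}

/* Raw form: the bytes themselves, zero-padded to a 4-byte boundary. */
static uint8_t *encode_bytes(uint8_t *p, const void *data, uint32_t len)
{
	uint32_t padded = (len + 3) & ~3u;

	assert(p != NULL && data != NULL);
	memset(p, 0, padded);
	memcpy(p, data, len);
	return p + padded;
}

/* Length-prefixed form: a 4-byte length followed by the raw form. */
static uint8_t *encode_data(uint8_t *p, const void *data, uint32_t len)
{
	assert(p != NULL);
	return encode_bytes(put_be32(p, len), data, len);
}
```

For an 8-byte session key, the raw form occupies 8 bytes and the length-prefixed form 12, which is why the expected length has to change along with the encoding.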
     

14 Sep, 2020

4 commits

  • When setting up a client connection, a second ref is accidentally obtained
    on the connection bundle (we get one when allocating the conn and a second
    one when adding the conn to the bundle).

    Fix it to only use the ref obtained by rxrpc_alloc_client_connection() and
    not to add a second when adding the candidate conn to the bundle.

    Fixes: 245500d853e9 ("rxrpc: Rewrite the client connection manager")
    Signed-off-by: David Howells

    David Howells
     
  • When the network namespace exits, rxrpc_clean_up_local_conns() needs to
    unbundle each client connection it evicts. Fix it to do this.

    kernel BUG at net/rxrpc/conn_object.c:481!
    RIP: 0010:rxrpc_destroy_all_connections.cold+0x11/0x13 net/rxrpc/conn_object.c:481
    Call Trace:
    rxrpc_exit_net+0x1a4/0x2e0 net/rxrpc/net_ns.c:119
    ops_exit_list+0xb0/0x160 net/core/net_namespace.c:186
    cleanup_net+0x4ea/0xa00 net/core/net_namespace.c:603
    process_one_work+0x94c/0x1670 kernel/workqueue.c:2269
    worker_thread+0x64c/0x1120 kernel/workqueue.c:2415
    kthread+0x3b5/0x4a0 kernel/kthread.c:292
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

    Fixes: 245500d853e9 ("rxrpc: Rewrite the client connection manager")
    Reported-by: syzbot+52071f826a617b9c76ed@syzkaller.appspotmail.com
    Signed-off-by: David Howells

    David Howells
     
  • The alloc_error field in the rxrpc_bundle struct should be signed as it has
    negative error codes assigned to it. Checks directly on it may then fail,
    and may produce a warning like this:

    net/rxrpc/conn_client.c:662 rxrpc_wait_for_channel()
    warn: 'bundle->alloc_error' is unsigned

    Fixes: 245500d853e9 ("rxrpc: Rewrite the client connection manager")
    Reported-by: Dan Carpenter
    Signed-off-by: David Howells

    David Howells
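Why the signedness matters can be seen in a short standalone sketch. The two structs below are hypothetical stand-ins for the field before and after the fix. The helpers widen the stored value the way a comparison would see it, instead of writing `alloc_error < 0` on the unsigned field, which is always false and is what the quoted warning is about.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the field before and after the fix. */
struct bundle_unsigned { unsigned int alloc_error; };
struct bundle_signed   { int alloc_error; };

/* Record an allocation failure as a negative errno in each variant. */
static void fail_unsigned(struct bundle_unsigned *b)
{
	assert(b != NULL);
	b->alloc_error = (unsigned int)-ENOMEM;	/* wraps to a large value */
}

static void fail_signed(struct bundle_signed *b)
{
	assert(b != NULL);
	b->alloc_error = -ENOMEM;
}

/* Widen the stored value so its sign as seen by a check is visible. */
static long long seen_unsigned(const struct bundle_unsigned *b)
{
	return b->alloc_error;
}

static long long seen_signed(const struct bundle_signed *b)
{
	return b->alloc_error;
}
```

With the unsigned field the stored error reads back as a large positive number, so a "less than zero" test never fires. With the signed field it reads back as -ENOMEM.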
     
  • Fix an error-handling goto in rxrpc_connect_call() whereby it will jump to
    free the bundle it failed to allocate.

    Fixes: 245500d853e9 ("rxrpc: Rewrite the client connection manager")
    Reported-by: kernel test robot
    Reported-by: Dan Carpenter
    Signed-off-by: David Howells

    David Howells
     

09 Sep, 2020

3 commits

  • Allow the number of parallel connections to a machine to be expanded from a
    single connection to a maximum of four. This allows up to 16 calls to be
    in progress at the same time to any particular peer instead of 4.

    Signed-off-by: David Howells

    David Howells
     
  • Rewrite the rxrpc client connection manager so that it can support multiple
    connections for a given security key to a peer. The following changes are
    made:

    (1) For each open socket, the code currently maintains an rbtree with the
    connections placed into it, keyed by communications parameters. This
    is tricky to maintain as connections can be culled from the tree or
    replaced within it. Connections can require replacement for a number
    of reasons, e.g. their IDs span too great a range for the IDR data
    type to represent efficiently, the call ID numbers on that conn would
    overflow or the conn got aborted.

    This is changed so that there's now a connection bundle object placed
    in the tree, keyed on the same parameters. The bundle, however, does
    not need to be replaced.

    (2) An rxrpc_bundle object can now manage the available channels for a set
    of parallel connections. The lock that manages this is moved there
    from the rxrpc_connection struct (channel_lock).

    (3) There's a dummy bundle for all incoming connections to share so that
    they have a channel_lock too. It might be better to give each
    incoming connection its own bundle. This bundle is not needed to
    manage which channels incoming calls are made on because that's
    solely at the whim of the client.

    (4) The restrictions on how many client connections are around are
    removed. Instead, a previous patch limits the number of client calls
    that can be allocated. Ordinarily, client connections are reaped
    after 2 minutes on the idle queue, but when more than a certain number
    of connections are in existence, the reaper starts reaping them after
    2s of idleness instead to get the numbers back down.

    It could also be made such that new call allocations are forced to
    wait until the number of outstanding connections subsides.

    Signed-off-by: David Howells

    David Howells
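As a rough illustration of point (2), a bundle can track the free channels of its parallel connections in one bitmask. This is a sketch under stated assumptions, not the kernel implementation: the struct layout and helper names are invented here, and the figures of four connections with four channels each are taken from the 16-calls-instead-of-4 description earlier in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHANS_PER_CONN   4	/* channels on one connection */
#define CONNS_PER_BUNDLE 4	/* parallel connections in a bundle */
#define TOTAL_CHANS      (CHANS_PER_CONN * CONNS_PER_BUNDLE)

/* Hypothetical bundle: one bit per channel, set when the channel is free. */
struct bundle { uint16_t avail_chans; };

static void bundle_init(struct bundle *b)
{
	assert(b != NULL);
	b->avail_chans = (uint16_t)((1u << TOTAL_CHANS) - 1);
}

/*
 * Claim the lowest free channel. Returns the bundle-wide channel index and
 * fills in which connection and which slot on it, or returns -1 if every
 * channel is busy.
 */
static int bundle_get_channel(struct bundle *b, int *conn, int *slot)
{
	assert(b != NULL && conn != NULL && slot != NULL);
	for (int chan = 0; chan < TOTAL_CHANS; chan++) {
		if (b->avail_chans & (1u << chan)) {
			b->avail_chans &= (uint16_t)~(1u << chan);
			*conn = chan / CHANS_PER_CONN;
			*slot = chan % CHANS_PER_CONN;
			return chan;
		}
	}
	return -1;
}

/* Return a channel to the pool when its call ends. */
static void bundle_put_channel(struct bundle *b, int chan)
{
	assert(b != NULL && chan >= 0 && chan < TOTAL_CHANS);
	b->avail_chans |= (uint16_t)(1u << chan);
}
```

In the kernel the mask would be manipulated under the bundle's channel_lock. The sketch leaves locking out.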
     
  • Impose a maximum on the number of client rxrpc calls that are allowed
    simultaneously. This will be in lieu of a maximum number of client
    connections as this is easier to administer as, unlike connections, calls
    aren't reusable (to be changed in a subsequent patch).

    This doesn't affect the limits on service calls and connections.

    Signed-off-by: David Howells

    David Howells
     

08 Sep, 2020

1 commit


04 Sep, 2020

1 commit

  • Pull networking fixes from David Miller:

    1) Use netif_rx_ni() when necessary in batman-adv stack, from Jussi
    Kivilinna.

    2) Fix loss of RTT samples in rxrpc, from David Howells.

    3) Memory leak in hns_nic_dev_probe(), from Dinghao Liu.

    4) ravb module cannot be unloaded, fix from Yuusuke Ashizuka.

    5) We disable BH for too long in sctp_get_port_local(), add a
    cond_resched() here as well, from Xin Long.

    6) Fix memory leak in st95hf_in_send_cmd, from Dinghao Liu.

    7) Out of bound access in bpf_raw_tp_link_fill_link_info(), from
    Yonghong Song.

    8) Missing of_node_put() in mt7530 DSA driver, from Sumera
    Priyadarsini.

    9) Fix crash in bnxt_fw_reset_task(), from Michael Chan.

    10) Fix geneve tunnel checksumming bug in hns3, from Yi Li.

    11) Memory leak in rxkad_verify_response, from Dinghao Liu.

    12) In tipc, don't use smp_processor_id() in preemptible context. From
    Tuong Lien.

    13) Fix signedness issue in mlx4 memory allocation, from Shung-Hsi Yu.

    14) Missing clk_disable_unprepare() in gemini driver, from Dan Carpenter.

    15) Fix ABI mismatch between driver and firmware in nfp, from Louis
    Peens.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (110 commits)
    net/smc: fix sock refcounting in case of termination
    net/smc: reset sndbuf_desc if freed
    net/smc: set rx_off for SMCR explicitly
    net/smc: fix toleration of fake add_link messages
    tg3: Fix soft lockup when tg3_reset_task() fails.
    doc: net: dsa: Fix typo in config code sample
    net: dp83867: Fix WoL SecureOn password
    nfp: flower: fix ABI mismatch between driver and firmware
    tipc: fix shutdown() of connectionless socket
    ipv6: Fix sysctl max for fib_multipath_hash_policy
    drivers/net/wan/hdlc: Change the default of hard_header_len to 0
    net: gemini: Fix another missing clk_disable_unprepare() in probe
    net: bcmgenet: fix mask check in bcmgenet_validate_flow()
    amd-xgbe: Add support for new port mode
    net: usb: dm9601: Add USB ID of Keenetic Plus DSL
    vhost: fix typo in error message
    net: ethernet: mlx4: Fix memory allocation in mlx4_buddy_init()
    pktgen: fix error message with wrong function name
    net: ethernet: ti: am65-cpsw: fix rmii 100Mbit link mode
    cxgb4: fix thermal zone device registration
    ...

    Linus Torvalds
     

28 Aug, 2020

1 commit

  • Fix a memory leak in rxkad_verify_response() whereby the response buffer
    doesn't get freed if we fail to allocate a ticket buffer.

    Fixes: ef68622da9cc ("rxrpc: Handle temporary errors better in rxkad security")
    Signed-off-by: Dinghao Liu
    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    Dinghao Liu
     

24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and their variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
    fall-through markings when it is the case.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
     

21 Aug, 2020

2 commits

  • Fix rxrpc_kernel_get_srtt() to indicate the validity of the returned
    smoothed RTT. If we haven't had any valid samples yet, the SRTT isn't
    useful.

    Fixes: c410bf01933e ("rxrpc: Fix the excessive initial retransmission timeout")
    Signed-off-by: David Howells

    David Howells
     
  • The Rx protocol has a mechanism to help generate RTT samples that works by
    a client transmitting a REQUESTED-type ACK when it receives a DATA packet
    that has the REQUEST_ACK flag set.

    The peer, however, may interpose other ACKs before transmitting the
    REQUESTED-ACK, as can be seen in the following trace excerpt:

    rxrpc_tx_data: c=00000044 DATA d0b5ece8:00000001 00000001 q=00000001 fl=07
    rxrpc_rx_ack: c=00000044 00000001 PNG r=00000000 f=00000002 p=00000000 n=0
    rxrpc_rx_ack: c=00000044 00000002 REQ r=00000001 f=00000002 p=00000001 n=0
    ...

    DATA packet 1 (q=xx) has REQUEST_ACK set (bit 1 of fl=xx). The incoming
    ping (labelled PNG) hard-acks the request DATA packet (f=xx exceeds the
    sequence number of the DATA packet), causing it to be discarded from the Tx
    ring. The ACK that was requested (labelled REQ, r=xx references the serial
    of the DATA packet) comes after the ping, but the sk_buff holding the
    timestamp has gone and the RTT sample is lost.

    This is particularly noticeable on RPC calls used to probe the service
    offered by the peer. A lot of peers end up with an unknown RTT because we
    only ever sent a single RPC. This confuses the server rotation algorithm.

    Fix this by caching the information about the outgoing packet in RTT
    calculations in the rxrpc_call struct rather than looking in the Tx ring.

    A four-deep buffer is maintained and both REQUEST_ACK-flagged DATA and
    PING-ACK transmissions are recorded in there. When the appropriate
    response ACK is received, the buffer is checked for a match and, if found,
    an RTT sample is recorded.

    If a received ACK refers to a packet with a later serial number than an
    entry in the cache, that entry is presumed lost and the entry is made
    available to record a new transmission.

    ACK types other than REQUESTED-type and PING-type cause any matching
    sample to be cancelled as they don't necessarily represent a useful
    measurement.

    If there's no space in the buffer on ping/data transmission, the sample
    base is discarded.

    Fixes: 50235c4b5a2f ("rxrpc: Obtain RTT data by requesting ACKs on DATA packets")
    Signed-off-by: David Howells

    David Howells
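The four-deep cache described above might look roughly like the following. This is a simplified userspace sketch, not the kernel data structure: the struct layout, function names and timestamp units are assumptions, and cancellation on other ACK types is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RTT_SLOTS 4

/* Hypothetical per-call cache of outstanding RTT probes. */
struct rtt_cache {
	bool     in_use[RTT_SLOTS];
	uint32_t serial[RTT_SLOTS];
	uint64_t sent_at[RTT_SLOTS];	/* arbitrary time units */
};

/*
 * Record the transmission of a REQUEST_ACK-flagged DATA packet or a PING.
 * Returns false if the buffer is full, in which case the sample base is
 * simply dropped.
 */
static bool rtt_record_tx(struct rtt_cache *c, uint32_t serial, uint64_t now)
{
	assert(c != NULL);
	for (int i = 0; i < RTT_SLOTS; i++) {
		if (!c->in_use[i]) {
			c->in_use[i] = true;
			c->serial[i] = serial;
			c->sent_at[i] = now;
			return true;
		}
	}
	return false;
}

/*
 * Match a response ACK against the cache. On a match, store the sample in
 * *rtt and return true. Entries with an earlier serial than the one acked
 * are presumed lost and freed for reuse.
 */
static bool rtt_match_ack(struct rtt_cache *c, uint32_t acked_serial,
			  uint64_t now, uint64_t *rtt)
{
	bool matched = false;

	assert(c != NULL && rtt != NULL);
	for (int i = 0; i < RTT_SLOTS; i++) {
		if (!c->in_use[i])
			continue;
		if (c->serial[i] == acked_serial) {
			*rtt = now - c->sent_at[i];
			c->in_use[i] = false;
			matched = true;
		} else if ((int32_t)(c->serial[i] - acked_serial) < 0) {
			c->in_use[i] = false;	/* older entry: presumed lost */
		}
	}
	return matched;
}
```

Because the timestamp lives in the cache rather than in the Tx ring, a ping that hard-acks the DATA packet first no longer destroys the sample.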
     

20 Aug, 2020

1 commit


02 Aug, 2020

1 commit


31 Jul, 2020

1 commit

  • There's a race between rxrpc_sendmsg setting up a call, but then failing to
    send anything on it due to an error, and recvmsg() seeing the call
    completion occur and trying to return the state to the user.

    An assertion fails in rxrpc_recvmsg() because the call has already been
    released from the socket and is about to be released again as recvmsg deals
    with it. (The recvmsg_q queue on the socket holds a ref, so there's no
    problem with use-after-free.)

    We also have to be careful not to end up reporting an error twice, in such
    a way that both returns indicate to userspace that the user ID supplied
    with the call is no longer in use - which could cause the client to
    malfunction if it recycles the user ID fast enough.

    Fix this by the following means:

    (1) When sendmsg() creates a call after the point that the call has been
    successfully added to the socket, don't return any errors through
    sendmsg(), but rather complete the call and let recvmsg() retrieve
    them. Make sendmsg() return 0 at this point. Further calls to
    sendmsg() for that call will fail with ESHUTDOWN.

    Note that at this point, we haven't sent any packets yet, so the
    server doesn't yet know about the call.

    (2) If sendmsg() returns an error when it was expected to create a new
    call, it means that the user ID wasn't used.

    (3) Mark the call disconnected before marking it completed to prevent an
    oops in rxrpc_release_call().

    (4) recvmsg() will then retrieve the error and set MSG_EOR to indicate
    that the user ID is no longer known by the kernel.

    An oops like the following is produced:

    kernel BUG at net/rxrpc/recvmsg.c:605!
    ...
    RIP: 0010:rxrpc_recvmsg+0x256/0x5ae
    ...
    Call Trace:
    ? __init_waitqueue_head+0x2f/0x2f
    ____sys_recvmsg+0x8a/0x148
    ? import_iovec+0x69/0x9c
    ? copy_msghdr_from_user+0x5c/0x86
    ___sys_recvmsg+0x72/0xaa
    ? __fget_files+0x22/0x57
    ? __fget_light+0x46/0x51
    ? fdget+0x9/0x1b
    do_recvmmsg+0x15e/0x232
    ? _raw_spin_unlock+0xa/0xb
    ? vtime_delta+0xf/0x25
    __x64_sys_recvmmsg+0x2c/0x2f
    do_syscall_64+0x4c/0x78
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Fixes: 357f5ef64628 ("rxrpc: Call rxrpc_release_call() on error in rxrpc_new_client_call()")
    Reported-by: syzbot+b54969381df354936d96@syzkaller.appspotmail.com
    Signed-off-by: David Howells
    Reviewed-by: Marc Dionne
    Signed-off-by: David S. Miller

    David Howells
     

26 Jul, 2020

1 commit

  • The UDP reuseport conflict was a little bit tricky.

    The net-next code, via bpf-next, extracted the reuseport handling
    into a helper so that the BPF sk lookup code could invoke it.

    At the same time, the logic for reuseport handling of unconnected
    sockets changed via commit efc6b6f6c3113e8b203b9debfb72d81e0f3dcace
    which changed the logic to carry on the reuseport result into the
    rest of the lookup loop if we do not return immediately.

    This requires moving the reuseport_has_conns() logic into the callers.

    While we are here, get rid of inline directives as they do not belong
    in foo.c files.

    The other changes were cases of more straightforward overlapping
    modifications.

    Signed-off-by: David S. Miller

    David S. Miller
     

25 Jul, 2020

1 commit

  • Rework the remaining setsockopt code to pass a sockptr_t instead of a
    plain user pointer. This removes the last remaining set_fs(KERNEL_DS)
    outside of architecture specific code.

    Signed-off-by: Christoph Hellwig
    Acked-by: Stefan Schmidt [ieee802154]
    Acked-by: Matthieu Baerts
    Signed-off-by: David S. Miller

    Christoph Hellwig
     

21 Jul, 2020

1 commit

  • rxrpc_sendmsg() returns EPIPE if there's an outstanding error, such as
    rxrpc_recvmsg() indicating ENODATA when there's nothing for it to read.

    Change rxrpc_recvmsg() to return EAGAIN instead if there's nothing to read
    as this particular error doesn't get stored in ->sk_err by the networking
    core.

    Also change rxrpc_sendmsg() so that it doesn't fail with delayed receive
    errors (there's no way for it to report which call, if any, the error was
    caused by).

    Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     

14 Jul, 2020

1 commit


21 Jun, 2020

1 commit

  • When preallocated service calls are being discarded, they're passed to
    ->discard_new_call() to have the caller clean up any attached higher-layer
    preallocated pieces before being marked completed. However, the act of
    marking them completed now invokes the call's notification function - which
    causes a problem because that function might assume that the previously
    freed pieces of memory are still there.

    Fix this by setting a dummy notification function on the socket after
    calling ->discard_new_call().

    This results in the following kasan message when the kafs module is
    removed.

    ==================================================================
    BUG: KASAN: use-after-free in afs_wake_up_async_call+0x6aa/0x770 fs/afs/rxrpc.c:707
    Write of size 1 at addr ffff8880946c39e4 by task kworker/u4:1/21

    CPU: 0 PID: 21 Comm: kworker/u4:1 Not tainted 5.8.0-rc1-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Workqueue: netns cleanup_net
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x18f/0x20d lib/dump_stack.c:118
    print_address_description.constprop.0.cold+0xd3/0x413 mm/kasan/report.c:383
    __kasan_report mm/kasan/report.c:513 [inline]
    kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
    afs_wake_up_async_call+0x6aa/0x770 fs/afs/rxrpc.c:707
    rxrpc_notify_socket+0x1db/0x5d0 net/rxrpc/recvmsg.c:40
    __rxrpc_set_call_completion.part.0+0x172/0x410 net/rxrpc/recvmsg.c:76
    __rxrpc_call_completed net/rxrpc/recvmsg.c:112 [inline]
    rxrpc_call_completed+0xca/0xf0 net/rxrpc/recvmsg.c:111
    rxrpc_discard_prealloc+0x781/0xab0 net/rxrpc/call_accept.c:233
    rxrpc_listen+0x147/0x360 net/rxrpc/af_rxrpc.c:245
    afs_close_socket+0x95/0x320 fs/afs/rxrpc.c:110
    afs_net_exit+0x1bc/0x310 fs/afs/main.c:155
    ops_exit_list.isra.0+0xa8/0x150 net/core/net_namespace.c:186
    cleanup_net+0x511/0xa50 net/core/net_namespace.c:603
    process_one_work+0x965/0x1690 kernel/workqueue.c:2269
    worker_thread+0x96/0xe10 kernel/workqueue.c:2415
    kthread+0x3b5/0x4a0 kernel/kthread.c:291
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293

    Allocated by task 6820:
    save_stack+0x1b/0x40 mm/kasan/common.c:48
    set_track mm/kasan/common.c:56 [inline]
    __kasan_kmalloc mm/kasan/common.c:494 [inline]
    __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:467
    kmem_cache_alloc_trace+0x153/0x7d0 mm/slab.c:3551
    kmalloc include/linux/slab.h:555 [inline]
    kzalloc include/linux/slab.h:669 [inline]
    afs_alloc_call+0x55/0x630 fs/afs/rxrpc.c:141
    afs_charge_preallocation+0xe9/0x2d0 fs/afs/rxrpc.c:757
    afs_open_socket+0x292/0x360 fs/afs/rxrpc.c:92
    afs_net_init+0xa6c/0xe30 fs/afs/main.c:125
    ops_init+0xaf/0x420 net/core/net_namespace.c:151
    setup_net+0x2de/0x860 net/core/net_namespace.c:341
    copy_net_ns+0x293/0x590 net/core/net_namespace.c:482
    create_new_namespaces+0x3fb/0xb30 kernel/nsproxy.c:110
    unshare_nsproxy_namespaces+0xbd/0x1f0 kernel/nsproxy.c:231
    ksys_unshare+0x43d/0x8e0 kernel/fork.c:2983
    __do_sys_unshare kernel/fork.c:3051 [inline]
    __se_sys_unshare kernel/fork.c:3049 [inline]
    __x64_sys_unshare+0x2d/0x40 kernel/fork.c:3049
    do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Freed by task 21:
    save_stack+0x1b/0x40 mm/kasan/common.c:48
    set_track mm/kasan/common.c:56 [inline]
    kasan_set_free_info mm/kasan/common.c:316 [inline]
    __kasan_slab_free+0xf7/0x140 mm/kasan/common.c:455
    __cache_free mm/slab.c:3426 [inline]
    kfree+0x109/0x2b0 mm/slab.c:3757
    afs_put_call+0x585/0xa40 fs/afs/rxrpc.c:190
    rxrpc_discard_prealloc+0x764/0xab0 net/rxrpc/call_accept.c:230
    rxrpc_listen+0x147/0x360 net/rxrpc/af_rxrpc.c:245
    afs_close_socket+0x95/0x320 fs/afs/rxrpc.c:110
    afs_net_exit+0x1bc/0x310 fs/afs/main.c:155
    ops_exit_list.isra.0+0xa8/0x150 net/core/net_namespace.c:186
    cleanup_net+0x511/0xa50 net/core/net_namespace.c:603
    process_one_work+0x965/0x1690 kernel/workqueue.c:2269
    worker_thread+0x96/0xe10 kernel/workqueue.c:2415
    kthread+0x3b5/0x4a0 kernel/kthread.c:291
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293

    The buggy address belongs to the object at ffff8880946c3800
    which belongs to the cache kmalloc-1k of size 1024
    The buggy address is located 484 bytes inside of
    1024-byte region [ffff8880946c3800, ffff8880946c3c00)
    The buggy address belongs to the page:
    page:ffffea000251b0c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0
    flags: 0xfffe0000000200(slab)
    raw: 00fffe0000000200 ffffea0002546508 ffffea00024fa248 ffff8880aa000c40
    raw: 0000000000000000 ffff8880946c3000 0000000100000002 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
    ffff8880946c3880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ffff8880946c3900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    >ffff8880946c3980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ^
    ffff8880946c3a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ffff8880946c3a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ==================================================================

    Reported-by: syzbot+d3eccef36ddbd02713e9@syzkaller.appspotmail.com
    Fixes: 5ac0d62226a0 ("rxrpc: Fix missing notification")
    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
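The ordering problem and the fix can be sketched in plain C. Everything here is a hypothetical model: the struct, the callbacks and `discard_prealloc()` only mimic the shape of the sequence described above, and the sketch swaps the callback on the call object for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct call;
typedef void (*notify_fn)(struct call *);

/* Hypothetical preallocated call with a higher-layer piece attached. */
struct call {
	notify_fn notify;
	int *user_data;		/* owned by the higher layer */
	int completed;
};

/* Higher-layer notification: assumes user_data is still allocated. */
static void user_notify(struct call *c)
{
	*c->user_data += 1;
}

/* Replacement that touches nothing. */
static void dummy_notify(struct call *c)
{
	(void)c;
}

/* Stand-in for ->discard_new_call(): frees the higher-layer piece. */
static void discard_new_call(struct call *c)
{
	free(c->user_data);
	c->user_data = NULL;
}

/* Marking a call completed invokes its notification function. */
static void complete_call(struct call *c)
{
	c->completed = 1;
	c->notify(c);
}

/* Discard a preallocated call without notifying into freed memory. */
static void discard_prealloc(struct call *c)
{
	assert(c != NULL && c->notify != NULL);
	discard_new_call(c);
	c->notify = dummy_notify;	/* the fix: neutralise the callback */
	complete_call(c);
}
```

Without the callback swap, `complete_call()` would run `user_notify()` after `user_data` had been freed, which is the use-after-free KASAN reports in the trace.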
     

18 Jun, 2020

2 commits

  • Commit 2ad6691d988c, which moved the modification of the status annotation
    for a packet in the Tx buffer prior to the retransmission, moved the state
    clearance but managed to lose the bit that set it to UNACK.

    Consequently, if a retransmission occurs, the packet is accidentally
    changed to the ACK state (ie. 0) by masking it off, which means that the
    packet isn't counted towards the tally of newly-ACK'd packets if it gets
    hard-ACK'd. This then prevents the congestion control algorithm from
    recovering properly.

    Fix by reinstating the change of state to UNACK.

    Spotted by the generic/460 xfstest.

    Fixes: 2ad6691d988c ("rxrpc: Fix race between incoming ACK parser and retransmitter")
    Signed-off-by: David Howells

    David Howells
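The effect of masking off the state without setting a new one can be shown with a small sketch. The bit layout and names below are assumptions made for illustration (state in the low two bits with ACK equal to 0, as the text says, and flags above them). They are not copied from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed annotation layout: low two bits hold the state. */
enum { ANNO_ACK = 0, ANNO_UNACK = 1, ANNO_NAK = 2, ANNO_RETRANS = 3 };
#define ANNO_MASK   0x03
#define ANNO_LAST   0x04	/* assumed flag bits */
#define ANNO_RESENT 0x08
#define ANNO_ALL    (ANNO_MASK | ANNO_LAST | ANNO_RESENT)

/* Buggy update: clears the state bits, which leaves the state as ACK (0). */
static uint8_t mark_resent_buggy(uint8_t anno)
{
	assert((anno & ~ANNO_ALL) == 0);
	return (uint8_t)((anno & ~ANNO_MASK) | ANNO_RESENT);
}

/* Fixed update: clears the state bits and then sets UNACK. */
static uint8_t mark_resent_fixed(uint8_t anno)
{
	assert((anno & ~ANNO_ALL) == 0);
	return (uint8_t)((anno & ~ANNO_MASK) | ANNO_UNACK | ANNO_RESENT);
}

/* A hard-ACK only adds to the newly-ACK'd tally if not already ACK. */
static bool counts_as_newly_acked(uint8_t anno)
{
	return (anno & ANNO_MASK) != ANNO_ACK;
}
```

A packet passed through the buggy update is skipped by the tally when it is later hard-ACK'd, which is the accounting error that stalls congestion control recovery.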
     
  • The handling of the receive window size (rwind) from a received ACK packet
    is not correct. The rxrpc_input_ackinfo() function currently checks the
    current Tx window size against the rwind from the ACK to see if it has
    changed, but then limits the rwind size before storing it in the tx_winsize
    member and, if it increased, wakes up the transmitting process. This means
    that if rwind > RXRPC_RXTX_BUFF_SIZE - 1, this path will always be
    followed.

    Fix this by limiting rwind before we compare it to tx_winsize.

    The effect of this can be seen by enabling the rxrpc_rx_rwind_change
    tracepoint.

    Fixes: 702f2ac87a9a ("rxrpc: Wake up the transmitter if Rx window size increases on the peer")
    Signed-off-by: David Howells

    David Howells
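The ordering issue can be reproduced with a short sketch. The struct and function names are hypothetical, and `BUFF_SIZE` is only a stand-in for RXRPC_RXTX_BUFF_SIZE with an assumed value. Each function returns true when the "window changed" path is taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BUFF_SIZE 64u			/* stand-in for RXRPC_RXTX_BUFF_SIZE */
#define MAX_WIN   (BUFF_SIZE - 1u)

/* Hypothetical call state holding the current Tx window size. */
struct call { uint32_t tx_winsize; };

/* Pre-fix ordering: compare the raw rwind first, clamp afterwards. */
static bool update_rwind_buggy(struct call *call, uint32_t rwind)
{
	assert(call != NULL);
	if (call->tx_winsize == rwind)
		return false;
	if (rwind > MAX_WIN)
		rwind = MAX_WIN;
	call->tx_winsize = rwind;
	return true;
}

/* Fixed ordering: clamp first, then compare against the stored value. */
static bool update_rwind_fixed(struct call *call, uint32_t rwind)
{
	assert(call != NULL);
	if (rwind > MAX_WIN)
		rwind = MAX_WIN;
	if (call->tx_winsize == rwind)
		return false;
	call->tx_winsize = rwind;
	return true;
}
```

With the window already at its limit, an advertised rwind above that limit takes the change path on every ACK in the first version and never in the second.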
     

12 Jun, 2020

1 commit

  • There's a race between the retransmission code and the received ACK parser.
    The problem is that the retransmission loop has to drop the lock under
    which it is iterating through the transmission buffer in order to transmit
    a packet, but whilst the lock is dropped, the ACK parser can crank the Tx
    window round and discard the packets from the buffer.

    The retransmission code then updated the annotations for the wrong packet
    and a later retransmission thought it had to retransmit a packet that
    wasn't there, leading to a NULL pointer dereference.

    Fix this by:

    (1) Moving the annotation change to before we drop the lock prior to
    transmission. This means we can't vary the annotation depending on
    the outcome of the transmission, but that's fine - we'll retransmit
    again later if it failed now.

    (2) Skipping the packet if the skb pointer is NULL.

    The following oops was seen:

    BUG: kernel NULL pointer dereference, address: 000000000000002d
    Workqueue: krxrpcd rxrpc_process_call
    RIP: 0010:rxrpc_get_skb+0x14/0x8a
    ...
    Call Trace:
    rxrpc_resend+0x331/0x41e
    ? get_vtime_delta+0x13/0x20
    rxrpc_process_call+0x3c0/0x4ac
    process_one_work+0x18f/0x27f
    worker_thread+0x1a3/0x247
    ? create_worker+0x17d/0x17d
    kthread+0xe6/0xeb
    ? kthread_delayed_work_timer_fn+0x83/0x83
    ret_from_fork+0x1f/0x30

    Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     

09 Jun, 2020

1 commit

  • David Howells says:

    ====================
    rxrpc: Fix hang due to missing notification

    Here's a fix for AF_RXRPC. Occasionally calls hang because there are
    circumstances in which rxrpc doesn't generate a notification when a call is
    completed - primarily because initial packet transmission failed and the
    call was killed off and an error returned. But the AFS filesystem driver
    doesn't check this under all circumstances, expecting failure to be
    delivered by asynchronous notification.

    There are two patches: the first moves the problematic bits out-of-line and
    the second contains the fix.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

06 Jun, 2020

1 commit

  • Pull AFS updates from David Howells:
    "There's some core VFS changes which affect a couple of filesystems:

    - Make the inode hash table RCU safe and provide some RCU-safe
    accessor functions. The search can then be done without taking the
    inode_hash_lock. Care must be taken because the object may be being
    deleted and no wait is made.

    - Allow iunique() to avoid taking the inode_hash_lock.

    - Allow AFS's callback processing to avoid taking the inode_hash_lock
    when using the inode table to find an inode to notify.

    - Improve Ext4's time updating. Konstantin Khlebnikov said "For now,
    I've plugged this issue with try-lock in ext4 lazy time update.
    This solution is much better."

    Then there's a set of changes to make a number of improvements to the
    AFS driver:

    - Improve callback (ie. third party change notification) processing
    by:

    (a) Relying more on the fact we're doing this under RCU and by
    using fewer locks. This makes use of the RCU-based inode
    searching outlined above.

    (b) Moving to keeping volumes in a tree indexed by volume ID
    rather than a flat list.

    (c) Making the server and volume records logically part of the
    cell. This means that a server record now points directly at
    the cell and the tree of volumes is there. This removes an N:M
    mapping table, simplifying things.

    - Improve keeping NAT or firewall channels open for the server
    callbacks to reach the client by actively polling the fileserver on
    a timed basis, instead of only doing it when we have an operation
    to process.

    - Improving detection of delayed or lost callbacks by including the
    parent directory in the list of file IDs to be queried when doing a
    bulk status fetch from lookup. We can then check to see if our copy
    of the directory has changed under us without us getting notified.

    - Determine aliasing of cells (such as a cell that is pointed to by a
    DNS alias). This allows us to avoid having ambiguity due to
    apparently different cells using the same volume and file servers.

    - Improve the fileserver rotation to do more probing when it detects
    that all of the addresses to a server are listed as non-responsive.
    It's possible that an address that previously stopped responding
    has become responsive again.

    Beyond that, lay some foundations for making some calls asynchronous:

    - Turn the fileserver cursor struct into a general operation struct
    and hang the parameters off of that rather than keeping them in
    local variables and hang results off of that rather than the call
    struct.

    - Implement some general operation handling code and simplify the
    callers of operations that affect a volume or a volume component
    (such as a file). Most of the operation is now done by core code.

    - Operations are supplied with a table of operations to issue
    different variants of RPCs and to manage the completion, where all
    the required data is held in the operation object, thereby allowing
    these to be called from a workqueue.

    - Put the standard "if (begin), while(select), call op, end" sequence
    into a canned function that just emulates the current behaviour for
    now.
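
    The "operation object plus table of operations" shape described above,
    together with the canned begin/select/call/end sequence, might look
    roughly like the following. This is an illustrative userspace sketch
    under stated assumptions: every name here (struct operation,
    struct operation_ops, op_begin(), op_select_server(), do_sync_operation()
    and the demo RPC) is hypothetical and is not the kernel's API.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct operation;

/* Hypothetical per-operation table: how to issue the RPC, how to finish. */
struct operation_ops {
    int  (*issue_rpc)(struct operation *op);   /* returns 0 or -errno */
    void (*success)(struct operation *op);     /* optional */
};

/* All parameters and results hang off the operation, not local variables. */
struct operation {
    const struct operation_ops *ops;
    int   nr_servers;      /* number of candidate servers */
    int   server_index;    /* server currently being tried */
    int   error;           /* result of the most recent attempt */
    bool  done;            /* set once an attempt has succeeded */
    void *data;            /* operation-specific state */
};

static void op_begin(struct operation *op)
{
    op->server_index = -1;
    op->error = -EHOSTUNREACH;   /* result if there is nothing to try */
    op->done = false;
}

/* Advance to the next candidate; false when finished or out of servers. */
static bool op_select_server(struct operation *op)
{
    if (op->done || op->server_index + 1 >= op->nr_servers)
        return false;
    op->server_index++;
    return true;
}

/* The canned "begin, while (select), call op, end" sequence. */
static int do_sync_operation(struct operation *op)
{
    op_begin(op);
    while (op_select_server(op)) {
        op->error = op->ops->issue_rpc(op);
        if (op->error == 0) {
            op->done = true;
            if (op->ops->success)
                op->ops->success(op);
        }
    }
    return op->error;
}

/* Demo RPC: only the server whose index is stored in op->data answers. */
static int demo_issue_rpc(struct operation *op)
{
    return op->server_index == *(int *)op->data ? 0 : -ETIMEDOUT;
}

static const struct operation_ops demo_ops = {
    .issue_rpc = demo_issue_rpc,
    .success   = NULL,
};
```

    Because nothing the loop needs lives on the caller's stack, the same
    object could in principle be handed to a worker thread, which is the
    property the text above relies on for later asynchronous use.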

    There are also some fixes interspersed:

    - Don't let the EACCES from ICMP6 mapping reach the user as such,
    since it's confusing as to whether it's a filesystem error. Convert
    it to EHOSTUNREACH.

    - Don't use the epoch value acquired through probing a server. If we
    have two servers with the same UUID but in different cells, it's
    hard to draw conclusions from them having different epoch values.

    - Don't interpret the argument to the CB.ProbeUuid RPC as a
    fileserver UUID and look up a fileserver from it.

    - Deal with servers in different cells having the same UUIDs. In the
    event that a CB.InitCallBackState3 RPC is received, we have to
    break the callback promises for every server record matching that
    UUID.

    - Don't let afs_statfs return values that go below 0.

    - Don't base server selection and address selection decisions on
    running fileserver probe state. Only make decisions on the final
    state, as the running state is cleared at the start of probing"
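
    Two of the fixes listed above are small enough to sketch directly. The
    following is a userspace illustration only: remap_net_error() and
    blocks_free() are hypothetical helpers, and the assumption that the
    statfs problem arises from subtracting a usage figure from a smaller
    limit is mine, not something stated in the pull request.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hide -EACCES coming from the network layer, since a user would
 * reasonably read it as a filesystem permission failure. Every other
 * value, including success, passes through unchanged.
 */
static int remap_net_error(int err)
{
    return err == -EACCES ? -EHOSTUNREACH : err;
}

/*
 * Clamp a "space remaining" calculation at zero instead of letting an
 * unsigned subtraction wrap round when usage exceeds the limit.
 */
static uint64_t blocks_free(uint64_t max_blocks, uint64_t blocks_in_use)
{
    return blocks_in_use >= max_blocks ? 0 : max_blocks - blocks_in_use;
}
```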

    Acked-by: Al Viro (fs/inode.c part)

    * tag 'afs-next-20200604' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (27 commits)
    afs: Adjust the fileserver rotation algorithm to reprobe/retry more quickly
    afs: Show more a bit more server state in /proc/net/afs/servers
    afs: Don't use probe running state to make decisions outside probe code
    afs: Fix afs_statfs() to not let the values go below zero
    afs: Fix the by-UUID server tree to allow servers with the same UUID
    afs: Reorganise volume and server trees to be rooted on the cell
    afs: Add a tracepoint to track the lifetime of the afs_volume struct
    afs: Detect cell aliases 3 - YFS Cells with a canonical cell name op
    afs: Detect cell aliases 2 - Cells with no root volumes
    afs: Detect cell aliases 1 - Cells with root volumes
    afs: Implement client support for the YFSVL.GetCellName RPC op
    afs: Retain more of the VLDB record for alias detection
    afs: Fix handling of CB.ProbeUuid cache manager op
    afs: Don't get epoch from a server because it may be ambiguous
    afs: Build an abstraction around an "operation" concept
    afs: Rename struct afs_fs_cursor to afs_operation
    afs: Remove the error argument from afs_protocol_error()
    afs: Set error flag rather than return error from file status decode
    afs: Make callback processing more efficient.
    afs: Show more information in /proc/net/afs/servers
    ...

    Linus Torvalds
     

05 Jun, 2020

2 commits

  • Under some circumstances, rxrpc will fail to transmit a packet through the
    underlying UDP socket (ie. UDP sendmsg returns an error). This may result
    in a call getting stuck.

    In the instance being seen, where AFS tries to send a probe to the Volume
    Location server, tracepoints show the UDP Tx failure (in this case returning
    error 99 EADDRNOTAVAIL) and then nothing more:

    afs_make_vl_call: c=0000015d VL.GetCapabilities
    rxrpc_call: c=0000015d NWc u=1 sp=rxrpc_kernel_begin_call+0x106/0x170 [rxrpc] a=00000000dd89ee8a
    rxrpc_call: c=0000015d Gus u=2 sp=rxrpc_new_client_call+0x14f/0x580 [rxrpc] a=00000000e20e4b08
    rxrpc_call: c=0000015d SEE u=2 sp=rxrpc_activate_one_channel+0x7b/0x1c0 [rxrpc] a=00000000e20e4b08
    rxrpc_call: c=0000015d CON u=2 sp=rxrpc_kernel_begin_call+0x106/0x170 [rxrpc] a=00000000e20e4b08
    rxrpc_tx_fail: c=0000015d r=1 ret=-99 CallDataNofrag

    The problem is that if the initial packet fails and the retransmission
    timer hasn't been started, the call is set to completed and an error is
    returned from rxrpc_send_data_packet() to rxrpc_queue_packet(). Though
    rxrpc_instant_resend() is called, this does nothing because the call is
    marked completed.

    So rxrpc_notify_socket() isn't called and the error is passed back up to
    rxrpc_send_data(), rxrpc_kernel_send_data() and thence to afs_make_call()
    and afs_vl_get_capabilities() where it is simply ignored because it is
    assumed that the result of a probe will be collected asynchronously.

    Fileserver probing is similarly affected via afs_fs_get_capabilities().

    Fix this by always issuing a notification in __rxrpc_set_call_completion()
    if it shifts a call to the completed state, even if an error is also
    returned to the caller through the function return value.

    Also put in a little bit of optimisation to avoid taking the call
    state_lock and disabling softirqs if the call is already in the completed
    state and remove some now redundant rxrpc_notify_socket() calls.
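
    The shape of the fix can be modelled outside the kernel. This is a
    simplified sketch, not the patch itself: struct call, notify_socket()
    and both completion helpers are hypothetical stand-ins, and the lock is
    represented by a counter so that the fast path is observable.

```c
#include <stdbool.h>

enum call_state { CALL_ACTIVE, CALL_COMPLETE };

/* Hypothetical stand-in for a call; counters make the behaviour visible. */
struct call {
    enum call_state state;
    int             error;
    unsigned int    notifications;       /* times the socket was notified */
    unsigned int    lock_acquisitions;   /* times the state lock was taken */
};

static void notify_socket(struct call *call)
{
    call->notifications++;
}

/*
 * Caller holds the state lock. The notification is issued whenever this
 * function moves the call to the completed state, regardless of whether
 * the caller also gets an error back by return value.
 */
static bool set_call_completion_locked(struct call *call, int error)
{
    if (call->state == CALL_COMPLETE)
        return false;
    call->error = error;
    call->state = CALL_COMPLETE;
    notify_socket(call);
    return true;
}

/* Skip the lock entirely if the call has already completed. */
static bool set_call_completion(struct call *call, int error)
{
    bool changed = false;

    if (call->state != CALL_COMPLETE) {
        call->lock_acquisitions++;   /* stands in for lock + softirq disable */
        changed = set_call_completion_locked(call, error);
    }
    return changed;
}
```

    In this model a second completion attempt neither takes the lock nor
    notifies again, and the first recorded error is preserved.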

    Fixes: f5c17aaeb2ae ("rxrpc: Calls should only have one terminal state")
    Reported-by: Gerry Seidman
    Signed-off-by: David Howells
    Reviewed-by: Marc Dionne

    David Howells
     
  • Move the handling of call completion out of line so that the next patch can
    add more code in that area.

    Signed-off-by: David Howells
    Reviewed-by: Marc Dionne

    David Howells
     

31 May, 2020

2 commits


29 May, 2020

1 commit