01 Oct, 2020

2 commits

  • [ Upstream commit 0771d7df819284d46cf5cfb57698621b503ec17f ]

    When a service subscription request is received from a user via a
    topology connection, a 'sub' object is allocated in the kernel so that
    any subsequent events for that service can be sent to the user. Also,
    in case of any failure, the connection is shut down and all of its
    'sub' objects are freed.

    However, the following race condition results in a memory leak:

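    [Diagram: timeline with columns 'receive-work', 'connection' and
    'send-work'; subscriptions from 'sub-1' onward are created by
    'receive-work', and the last one is left as an 'orphan'.]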

    That is, the 'receive-work' may get the last subscription request while
    the 'send-work' is shutting down the connection because the peer closed
    it.

    There is a 'lock' on the connection, so the two actions cannot be
    carried out simultaneously. If the last subscription (e.g. 'sub-n') is
    allocated before the 'send-work' closes the connection, there is no
    issue at all: the 'sub' objects will be freed. Otherwise, the last
    subscription becomes an orphan, since the connection has already been
    closed and all references have been released.

    This commit fixes the issue by adding a test, right after the connection
    lock is obtained, of whether the connection is still in the 'connected'
    state. If it is, the subscription object is created as usual; otherwise
    the request is ignored.
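    The fix is an instance of the check-after-lock pattern. Below is a
    minimal userspace sketch that assumes a pthread mutex as the connection
    lock. `struct conn`, `conn_add_sub()` and `conn_close()` are hypothetical
    names, not the topology server's actual code:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical subscription object. */
struct sub {
    struct sub *next;
};

/* Hypothetical connection: a lock, a state flag and its subscriptions. */
struct conn {
    pthread_mutex_t lock;
    bool connected;
    struct sub *subs;
    int nsubs;
};

void conn_init(struct conn *c)
{
    pthread_mutex_init(&c->lock, NULL);
    c->connected = true;
    c->subs = NULL;
    c->nsubs = 0;
}

/* receive-work: create a subscription only if still connected. */
bool conn_add_sub(struct conn *c)
{
    bool added = false;

    pthread_mutex_lock(&c->lock);
    if (c->connected) {              /* the test added by the fix */
        struct sub *s = malloc(sizeof(*s));
        if (s) {
            s->next = c->subs;
            c->subs = s;
            c->nsubs++;
            added = true;
        }
    }
    pthread_mutex_unlock(&c->lock);
    return added;                    /* false: request ignored */
}

/* send-work: close the connection and free every subscription. */
void conn_close(struct conn *c)
{
    pthread_mutex_lock(&c->lock);
    c->connected = false;
    while (c->subs) {
        struct sub *s = c->subs;
        c->subs = s->next;
        free(s);
        c->nsubs--;
    }
    assert(c->nsubs == 0);
    pthread_mutex_unlock(&c->lock);
}
```

    conn_add_sub() tests the state under the same lock that conn_close()
    takes. A request that loses the race therefore sees `connected == false`
    and allocates nothing, so no 'sub' can be orphaned.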

    Acked-by: Ying Xue
    Acked-by: Jon Maloy
    Reported-by: Thang Ngo
    Signed-off-by: Tuong Lien
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Tuong Lien
     
  • [ Upstream commit 49afb806cb650dd1f06f191994f3aa657d264009 ]

    When a socket is suddenly shut down or released, it rejects all the
    unreceived messages in its receive queue. This applies to a connected
    socket too, although in that case only one 'FIN' message needs to be
    sent back to its peer.

    If there are many messages in the queue and/or several connections
    with such messages are shut down at the same time, the message
    rejections can easily overflow the link layer at the
    'TIPC_SYSTEM_IMPORTANCE' backlog level. As a result, the link is taken
    down. Moreover, as soon as the link is re-established, the socket
    layer can continue to reject the messages and the same issue recurs.

    This commit refactors the '__tipc_shutdown()' function to send only one
    'FIN' in the situation mentioned above. In the connectionless case the
    rejections are unavoidable, but usually there are none for such socket
    messages because they are 'dest-droppable' by default.

    In addition, the new code handles the other socket states explicitly
    (e.g. 'TIPC_LISTEN'), treating each as a separate case to avoid
    misbehavior.
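    Here is a simplified model that counts the messages a shutdown sends
    back to peers under the new rules. The names are hypothetical. This is
    not the real __tipc_shutdown(), and other states such as TIPC_LISTEN are
    left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical socket kinds for this sketch only. */
enum sk_kind { SK_CONNECTED, SK_CONNECTIONLESS };

struct qmsg {
    bool dest_droppable; /* sender asked for a silent drop on failure */
};

/*
 * Number of messages sent back to peers when shutdown flushes a
 * receive queue of n messages.
 */
size_t shutdown_responses(enum sk_kind kind, const struct qmsg *q, size_t n)
{
    size_t sent = 0;

    switch (kind) {
    case SK_CONNECTED:
        /* Queued data is dropped; the peer only needs one FIN. */
        sent = 1;
        break;
    case SK_CONNECTIONLESS:
        /* Each message is rejected unless it is dest-droppable. */
        for (size_t i = 0; i < n; i++)
            if (!q[i].dest_droppable)
                sent++;
        break;
    }
    return sent;
}
```

    With a long queue, the connected case stays at one message instead of
    one rejection per queued message. That per-message flood is what
    overloaded the link backlog.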

    Acked-by: Ying Xue
    Acked-by: Jon Maloy
    Signed-off-by: Tuong Lien
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Tuong Lien
     

27 Sep, 2020

3 commits

  • [ Upstream commit ff48b6222e65ebdba5a403ef1deba6214e749193 ]

    tipc_buf_append() may change an skb's frag_list, which causes problems
    when this skb is cloned: skb_unclone() doesn't really make the skb's
    frag_list safe to change.

    Shuang Li reported a use-after-free caused by this when creating
    quite a few macvlan devices over the same device, where the broadcast
    packets are cloned and passed up the stack:

    [ ] BUG: KASAN: use-after-free in pskb_expand_head+0x86d/0xea0
    [ ] Call Trace:
    [ ] dump_stack+0x7c/0xb0
    [ ] print_address_description.constprop.7+0x1a/0x220
    [ ] kasan_report.cold.10+0x37/0x7c
    [ ] check_memory_region+0x183/0x1e0
    [ ] pskb_expand_head+0x86d/0xea0
    [ ] process_backlog+0x1df/0x660
    [ ] net_rx_action+0x3b4/0xc90
    [ ]
    [ ] Allocated by task 1786:
    [ ] kmem_cache_alloc+0xbf/0x220
    [ ] skb_clone+0x10a/0x300
    [ ] macvlan_broadcast+0x2f6/0x590 [macvlan]
    [ ] macvlan_process_broadcast+0x37c/0x516 [macvlan]
    [ ] process_one_work+0x66a/0x1060
    [ ] worker_thread+0x87/0xb10
    [ ]
    [ ] Freed by task 3253:
    [ ] kmem_cache_free+0x82/0x2a0
    [ ] skb_release_data+0x2c3/0x6e0
    [ ] kfree_skb+0x78/0x1d0
    [ ] tipc_recvmsg+0x3be/0xa40 [tipc]

    So fix it by using skb_unshare() instead, which creates a new skb for
    the cloned frag, so that it is safe to change its frag_list. Something
    similar is also done in sctp_make_reassembled_event(), which uses
    skb_copy().
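    The underlying rule is copy-on-write: before modifying data that a clone
    may still share, take a private copy. The minimal model below assumes a
    refcounted `struct shinfo` as a stand-in for the data (including
    frag_list) that skb clones share. All names are hypothetical, and this
    is not the skb API:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FRAGS 8

/* Hypothetical data shared by clones, including the fragment list. */
struct shinfo {
    int refcnt;
    int nfrags;
    int frags[MAX_FRAGS];
};

struct pkt {
    struct shinfo *sh;
};

struct pkt pkt_new(void)
{
    struct pkt p;

    p.sh = calloc(1, sizeof(*p.sh));
    if (p.sh)
        p.sh->refcnt = 1;
    return p;
}

/* Like a clone: a new handle onto the same underlying data. */
struct pkt pkt_clone(struct pkt *p)
{
    p->sh->refcnt++;
    return *p;
}

/* Copy-on-write: give p a private copy if its data is shared. */
int pkt_unshare(struct pkt *p)
{
    if (p->sh->refcnt > 1) {
        struct shinfo *copy = malloc(sizeof(*copy));
        if (!copy)
            return -1;
        memcpy(copy, p->sh, sizeof(*copy));
        copy->refcnt = 1;
        p->sh->refcnt--;
        p->sh = copy;
    }
    return 0;
}

void pkt_append_frag(struct pkt *p, int frag)
{
    assert(p->sh->refcnt == 1);      /* only ever modify private data */
    assert(p->sh->nfrags < MAX_FRAGS);
    p->sh->frags[p->sh->nfrags++] = frag;
}

void pkt_free(struct pkt *p)
{
    if (--p->sh->refcnt == 0)
        free(p->sh);
    p->sh = NULL;
}
```

    pkt_unshare() plays the role of skb_unshare() in the fix. The assert in
    pkt_append_frag() catches any attempt to modify shared data.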

    Reported-by: Shuang Li
    Fixes: 37e22164a8a3 ("tipc: rename and move message reassembly function")
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
     
  • [ Upstream commit a4b5cc9e10803ecba64a7d54c0f47e4564b4a980 ]

    I confirmed that the problem fixed by commit 2a63866c8b51a3f7 ("tipc: fix
    shutdown() of connectionless socket") also applies to stream sockets.

    ----------
    #include <sys/socket.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        int fds[2] = { -1, -1 };
        socketpair(PF_TIPC, SOCK_STREAM /* or SOCK_DGRAM */, 0, fds);
        if (fork() == 0)
            _exit(read(fds[0], NULL, 1));
        shutdown(fds[0], SHUT_RDWR); /* This must make read() return. */
        wait(NULL); /* To be woken up by _exit(). */
        return 0;
    }
    ----------

    Since shutdown(SHUT_RDWR) should affect all processes sharing that socket,
    unconditionally setting sk->sk_shutdown to SHUTDOWN_MASK is the right
    behavior.

    Signed-off-by: Tetsuo Handa
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Tetsuo Handa
     
  • [ Upstream commit bb3a420d47ab00d7e1e5083286cab15235a96680 ]

    tipc_group_add_to_tree() returns silently if `key` matches `nkey` of an
    existing node, causing tipc_group_create_member() to leak memory. Let
    tipc_group_add_to_tree() return an error in such a case, so that
    tipc_group_create_member() can handle it properly.
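    The sketch below shows the fixed contract, assuming a plain binary
    search tree in place of the kernel rbtree. `struct member`,
    `add_to_tree()` and `create_member()` are hypothetical stand-ins for the
    TIPC group functions:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical group member stored in a tree keyed by 'key'. */
struct member {
    unsigned long long key;
    struct member *left, *right;
};

/* Link m into the tree, or return -EEXIST on a duplicate key. */
int add_to_tree(struct member **root, struct member *m)
{
    struct member **link = root;

    while (*link) {
        if (m->key < (*link)->key)
            link = &(*link)->left;
        else if (m->key > (*link)->key)
            link = &(*link)->right;
        else
            return -EEXIST;          /* was a silent return before */
    }
    m->left = m->right = NULL;
    *link = m;
    return 0;
}

/* The caller frees the new member if it could not be linked in. */
struct member *create_member(struct member **root, unsigned long long key)
{
    struct member *m = calloc(1, sizeof(*m));

    if (!m)
        return NULL;
    m->key = key;
    if (add_to_tree(root, m)) {
        free(m);                     /* no leak on a duplicate key */
        return NULL;
    }
    return m;
}
```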

    Fixes: 75da2163dbb6 ("tipc: introduce communication groups")
    Reported-and-tested-by: syzbot+f95d90c454864b3b5bc9@syzkaller.appspotmail.com
    Cc: Hillf Danton
    Link: https://syzkaller.appspot.com/bug?id=048390604fe1b60df34150265479202f10e13aff
    Signed-off-by: Peilin Ye
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Peilin Ye
     

12 Sep, 2020

1 commit

  • [ Upstream commit 2a63866c8b51a3f72cea388dfac259d0e14c4ba6 ]

    syzbot is reporting a hung task at nbd_ioctl() [1], because there are two
    problems with the shutdown() operation of TIPC connectionless sockets.

    ----------
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <linux/nbd.h>

    int main(int argc, char *argv[])
    {
        const int fd = open("/dev/nbd0", 3);
        alarm(5);
        ioctl(fd, NBD_SET_SOCK, socket(PF_TIPC, SOCK_DGRAM, 0));
        ioctl(fd, NBD_DO_IT, 0); /* To be interrupted by SIGALRM. */
        return 0;
    }
    }
    ----------

    One problem is that wait_for_completion() from flush_workqueue() from
    nbd_start_device_ioctl() from nbd_ioctl() cannot complete when
    nbd_start_device_ioctl() receives a signal at wait_event_interruptible(),
    because tipc_shutdown() from kernel_sock_shutdown(SHUT_RDWR) from
    nbd_mark_nsock_dead() from sock_shutdown() from nbd_start_device_ioctl()
    fails to wake up a WQ thread sleeping at wait_woken() from
    tipc_wait_for_rcvmsg() from sock_recvmsg() from sock_xmit() from
    nbd_read_stat() from recv_work() scheduled by nbd_start_device() from
    nbd_start_device_ioctl(). Fix this problem by always invoking
    sk->sk_state_change() (like inet_shutdown() does) when tipc_shutdown() is
    called.

    The other problem is that tipc_wait_for_rcvmsg() cannot return when
    tipc_shutdown() is called, because tipc_shutdown() sets sk->sk_shutdown
    to SEND_SHUTDOWN (even though "how" is SHUT_RDWR), while
    tipc_wait_for_rcvmsg() needs sk->sk_shutdown set to RCV_SHUTDOWN or
    SHUTDOWN_MASK. Fix this problem by setting sk->sk_shutdown to
    SHUTDOWN_MASK (like inet_shutdown() does) when the socket is
    connectionless.
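    Both fixes restore the usual wait/wake-up contract: the waiter re-checks
    a condition that the shutdown path sets before waking it. The userspace
    sketch below assumes a pthread condition variable in place of the socket
    wait queue, with `pthread_cond_broadcast()` standing in for
    sk->sk_state_change(). `struct msock` and the `msock_*` functions are
    hypothetical. The flag values match the kernel's RCV_SHUTDOWN,
    SEND_SHUTDOWN and SHUTDOWN_MASK:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define RCV_SHUTDOWN  1
#define SEND_SHUTDOWN 2
#define SHUTDOWN_MASK 3

/* Hypothetical socket model: queue length, shutdown flags, wait queue. */
struct msock {
    pthread_mutex_t lock;
    pthread_cond_t  wq;
    int qlen;
    int shutdown;
};

void msock_init(struct msock *sk)
{
    pthread_mutex_init(&sk->lock, NULL);
    pthread_cond_init(&sk->wq, NULL);
    sk->qlen = 0;
    sk->shutdown = 0;
}

/* Like a blocking receive: wait for data or a receive-side shutdown. */
int msock_wait_for_rcvmsg(struct msock *sk)
{
    int ret;

    pthread_mutex_lock(&sk->lock);
    while (sk->qlen == 0 && !(sk->shutdown & RCV_SHUTDOWN))
        pthread_cond_wait(&sk->wq, &sk->lock);
    ret = sk->qlen > 0 ? 1 : 0;      /* 0 means shut down, like EOF */
    pthread_mutex_unlock(&sk->lock);
    return ret;
}

/* Fixed shutdown: set SHUTDOWN_MASK and always wake the waiters. */
void msock_shutdown(struct msock *sk)
{
    pthread_mutex_lock(&sk->lock);
    sk->shutdown = SHUTDOWN_MASK;    /* not just SEND_SHUTDOWN */
    pthread_cond_broadcast(&sk->wq); /* role of sk->sk_state_change() */
    pthread_mutex_unlock(&sk->lock);
}

/* Thread body for demonstrations: returns the wait result. */
void *msock_reader(void *arg)
{
    static int result;

    result = msock_wait_for_rcvmsg(arg);
    return &result;
}
```

    In this model, the reader sleeps forever if msock_shutdown() sets only
    SEND_SHUTDOWN or skips the broadcast. That mirrors the hang described
    above.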

    [1] https://syzkaller.appspot.com/bug?id=3fe51d307c1f0a845485cf1798aa059d12bf18b2

    Reported-by: syzbot
    Signed-off-by: Tetsuo Handa
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Tetsuo Handa
     

03 Sep, 2020

1 commit

  • [ Upstream commit 47733f9daf4fe4f7e0eb9e273f21ad3a19130487 ]

    __tipc_nl_compat_dumpit() has two callers, and it expects them to
    pass a valid nlmsghdr via arg->data. This header is artificial and
    crafted just for __tipc_nl_compat_dumpit().

    tipc_nl_compat_publ_dump() does so by putting a genlmsghdr as well
    as a nested attribute, TIPC_NLA_SOCK. But the other caller,
    tipc_nl_compat_dumpit(), does not, which leaves arg->data uninitialized
    on this call path.

    Fix this by just adding a similar nlmsghdr without any payload in
    tipc_nl_compat_dumpit().
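    In userspace terms, the fix amounts to always handing the callee a fully
    initialized header, even when there is no payload. The sketch below uses
    the real `struct nlmsghdr` and NLMSG_* macros from <linux/netlink.h>.
    `put_empty_nlmsghdr()` is a hypothetical helper, and the kernel fix works
    on an skb rather than a flat buffer:

```c
#include <assert.h>
#include <linux/netlink.h>
#include <stddef.h>
#include <string.h>

/*
 * Write a netlink header with no payload at the start of buf.
 * buf must be suitably aligned for struct nlmsghdr.
 */
struct nlmsghdr *put_empty_nlmsghdr(void *buf, size_t len)
{
    struct nlmsghdr *nlh = buf;

    if (len < (size_t)NLMSG_HDRLEN)
        return NULL;
    memset(nlh, 0, NLMSG_HDRLEN);     /* no field left uninitialized */
    nlh->nlmsg_len = NLMSG_LENGTH(0); /* header only, zero payload */
    return nlh;
}
```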

    This bug has existed since day 1, but the recent commit 6ea67769ff33
    ("net: tipc: prepare attrs in __tipc_nl_compat_dumpit()") makes it
    easier to trigger.

    Reported-and-tested-by: syzbot+0e7181deafa7e0b79923@syzkaller.appspotmail.com
    Fixes: d0796d1ef63d ("tipc: convert legacy nl bearer dump to nl compat")
    Cc: Jon Maloy
    Cc: Ying Xue
    Cc: Richard Alpe
    Signed-off-by: Cong Wang
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     

03 Jun, 2020

1 commit

  • [ Upstream commit 1378817486d6860f6a927f573491afe65287abf1 ]

    dst_cache_get() documents it must be used with BH disabled.

    syzbot reported:

    BUG: using smp_processor_id() in preemptible [00000000] code: /21697
    caller is dst_cache_get+0x3a/0xb0 net/core/dst_cache.c:68
    CPU: 0 PID: 21697 Comm: Not tainted 5.7.0-rc6-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x188/0x20d lib/dump_stack.c:118
    check_preemption_disabled lib/smp_processor_id.c:47 [inline]
    debug_smp_processor_id.cold+0x88/0x9b lib/smp_processor_id.c:57
    dst_cache_get+0x3a/0xb0 net/core/dst_cache.c:68
    tipc_udp_xmit.isra.0+0xb9/0xad0 net/tipc/udp_media.c:164
    tipc_udp_send_msg+0x3e6/0x490 net/tipc/udp_media.c:244
    tipc_bearer_xmit_skb+0x1de/0x3f0 net/tipc/bearer.c:526
    tipc_enable_bearer+0xb2f/0xd60 net/tipc/bearer.c:331
    __tipc_nl_bearer_enable+0x2bf/0x390 net/tipc/bearer.c:995
    tipc_nl_bearer_enable+0x1e/0x30 net/tipc/bearer.c:1003
    genl_family_rcv_msg_doit net/netlink/genetlink.c:673 [inline]
    genl_family_rcv_msg net/netlink/genetlink.c:718 [inline]
    genl_rcv_msg+0x627/0xdf0 net/netlink/genetlink.c:735
    netlink_rcv_skb+0x15a/0x410 net/netlink/af_netlink.c:2469
    genl_rcv+0x24/0x40 net/netlink/genetlink.c:746
    netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
    netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329
    netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918
    sock_sendmsg_nosec net/socket.c:652 [inline]
    sock_sendmsg+0xcf/0x120 net/socket.c:672
    ____sys_sendmsg+0x6bf/0x7e0 net/socket.c:2362
    ___sys_sendmsg+0x100/0x170 net/socket.c:2416
    __sys_sendmsg+0xec/0x1b0 net/socket.c:2449
    do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
    entry_SYSCALL_64_after_hwframe+0x49/0xb3
    RIP: 0033:0x45ca29

    Fixes: e9c1a793210f ("tipc: add dst_cache support for udp media")
    Cc: Xin Long
    Cc: Jon Maloy
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

14 May, 2020

1 commit

  • [ Upstream commit 980d69276f3048af43a045be2925dacfb898a7be ]

    When an application connects to the TIPC topology server and subscribes
    to some services, a new connection is created along with some objects,
    'tipc_subscription', that store the related data.
    However, the connection handling has an omission: when the connection
    or application is shut down in an orderly way (e.g. via SIGQUIT), the
    connection is not closed in the kernel, and the 'tipc_subscription'
    objects are not freed either.
    This results in:
    - The maximum number of subscriptions (65535) is soon reached, and new
      subscriptions are rejected;
    - The TIPC module cannot be removed (unless the objects are somehow
      forcibly released first).

    This commit fixes the issue by closing the connection if 'recvmsg()'
    returns '0', i.e. when the peer has shut down gracefully. It also covers
    the other unexpected cases.
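    The same rule applies to any userspace stream reader: a return value of
    0 from recv() means the peer has shut down, so the connection should be
    closed rather than kept. The sketch below uses POSIX sockets.
    `rx_should_close()` is a hypothetical helper, not the topology server's
    code. Treating every error other than EAGAIN/EINTR as fatal is an
    assumption of the sketch:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read what is available on fd without blocking. Returns true if the
 * caller should close the connection.
 */
bool rx_should_close(int fd)
{
    char buf[256];
    ssize_t n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);

    if (n > 0)
        return false;                /* got data, keep the connection */
    if (n == 0)
        return true;                 /* peer shut down gracefully */
    if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
        return false;                /* nothing to read right now */
    return true;                     /* any other error: close as well */
}
```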

    Acked-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: Tuong Lien
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Tuong Lien
     

18 Mar, 2020

1 commit

  • [ Upstream commit 213320a67962ff6e7b83b704d55cbebc341426db ]

    Add missing attribute validation for TIPC_NLA_PROP_MTU
    to the netlink policy.
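    A netlink policy is a table indexed by attribute type that tells the
    parser what to expect. An attribute without an entry is accepted
    without its payload length being checked. The userspace model below
    illustrates the idea. The enum values, `struct attr_policy` and
    `validate_prop()` are hypothetical and only loosely mirror the kernel's
    `struct nla_policy` with NLA_U32:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute ids, modelled on the TIPC_NLA_PROP_* set. */
enum { PROP_UNSPEC, PROP_PRIO, PROP_TOL, PROP_WIN, PROP_MTU, PROP_MAX };

enum attr_kind { ATTR_UNCHECKED, ATTR_U32 };

struct attr_policy {
    enum attr_kind kind;
};

/* Without the PROP_MTU entry, MTU payloads would go unchecked. */
static const struct attr_policy prop_policy[PROP_MAX] = {
    [PROP_PRIO] = { ATTR_U32 },
    [PROP_TOL]  = { ATTR_U32 },
    [PROP_WIN]  = { ATTR_U32 },
    [PROP_MTU]  = { ATTR_U32 },
};

int validate_prop(int type, size_t payload_len)
{
    if (type <= PROP_UNSPEC || type >= PROP_MAX)
        return -EINVAL;
    if (prop_policy[type].kind == ATTR_U32 &&
        payload_len < sizeof(uint32_t))
        return -EINVAL;              /* too short to read as a u32 */
    return 0;
}
```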

    Fixes: 901271e0403a ("tipc: implement configuration of UDP media MTU")
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Jakub Kicinski
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jakub Kicinski
     

26 Jan, 2020

5 commits

  • commit 12db3c8083fcab4270866a88191933f2d9f24f89 upstream.

    In function __tipc_shutdown(), the timeout value passed to
    tipc_wait_for_cond() is not in jiffies.

    This commit fixes it by converting that value from milliseconds
    to jiffies.
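    The unit mismatch is easy to quantify. Passing a 1000 ms timeout where
    jiffies are expected waits 1000 ticks; at HZ=250 (an example value) that
    is 1000 / 250 = 4 s instead of 1 s. The `ms_to_ticks()` helper below is
    hypothetical, not the kernel's msecs_to_jiffies(), though it rounds up
    in the same way:

```c
#include <assert.h>

/*
 * Convert milliseconds to timer ticks at tick rate hz, rounding up
 * so that a non-zero timeout never becomes zero ticks.
 */
unsigned long ms_to_ticks(unsigned int ms, unsigned int hz)
{
    return ((unsigned long)ms * hz + 999) / 1000;
}
```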

    Fixes: 365ad353c256 ("tipc: reduce risk of user starvation during link congestion")
    Signed-off-by: Tung Nguyen
    Acked-by: Jon Maloy
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Tung Nguyen
     
  • commit 91a4a3eb433e4d786420c41f3c08d1d16c605962 upstream.

    When tipc_sk_timeout() is executed while user space holds ownership of
    the socket, the function rearms itself and returns. However, the socket
    reference counter is not decremented, which may cause unexpected
    behavior.

    This commit fixes it by calling sock_put() before tipc_sk_timeout()
    returns in the above-mentioned case.
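    The rule being restored is that a timer callback owns one socket
    reference and must release it on every return path, including the one
    that rearms the timer. The model below assumes, as the kernel's
    sk_reset_timer() does, that arming an idle timer takes a new reference.
    `struct tsock` and the `tsock_*` functions are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical socket model: a refcount and a one-shot timer. */
struct tsock {
    int refcnt;
    bool owned_by_user;
    bool timer_pending;
    int timeouts_handled;
};

/* Arming an idle timer takes a reference owned by the timer run. */
void tsock_reset_timer(struct tsock *sk)
{
    if (!sk->timer_pending) {
        sk->timer_pending = true;
        sk->refcnt++;
    }
}

void tsock_put(struct tsock *sk)
{
    sk->refcnt--;
}

/* Timer callback: every return path drops this run's reference. */
void tsock_timeout(struct tsock *sk)
{
    sk->timer_pending = false;
    if (sk->owned_by_user) {
        tsock_reset_timer(sk);   /* retry later, with a new reference */
        tsock_put(sk);           /* the fix: release the current one */
        return;
    }
    sk->timeouts_handled++;
    tsock_put(sk);
}
```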

    Fixes: afe8792fec69 ("tipc: refactor function tipc_sk_timeout()")
    Signed-off-by: Tung Nguyen
    Acked-by: Ying Xue
    Acked-by: Jon Maloy
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Tung Nguyen
     
  • commit 2fe97a578d7bad3116a89dc8a6692a51e6fc1d9c upstream.

    When a connection message is initiated towards the server side, it is
    cloned and added to the socket write queue. However, if cloning fails,
    only the socket write queue is purged. This causes a memory leak,
    because the original connection message is not freed.

    This commit fixes it by purging the list of connection messages when
    it cannot be cloned.
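    Below is a model of the fixed error path. It uses a hypothetical message
    list with a live-allocation counter, so that a leak shows up in the
    test. The `fail_clone` hook that simulates allocation failure is also an
    assumption of the sketch:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical message list node. */
struct msg {
    struct msg *next;
};

static int live_msgs;    /* allocations not yet freed */
static bool fail_clone;  /* test hook: make cloning fail */

struct msg *msg_new(struct msg *next)
{
    struct msg *m = malloc(sizeof(*m));

    if (m) {
        m->next = next;
        live_msgs++;
    }
    return m;
}

void msg_purge(struct msg *head)
{
    while (head) {
        struct msg *next = head->next;
        free(head);
        live_msgs--;
        head = next;
    }
}

/* Deep-copy a non-empty list; NULL on (simulated) failure. */
struct msg *msg_clone_list(const struct msg *head)
{
    struct msg *copy = NULL, **tail = &copy;

    for (; head; head = head->next) {
        if (fail_clone || !(*tail = msg_new(NULL))) {
            msg_purge(copy);
            return NULL;
        }
        tail = &(*tail)->next;
    }
    return copy;
}

/* Keep a clone on the write queue and consume pkts either way. */
int send_conn_msg(struct msg *pkts, struct msg **wq)
{
    struct msg *clone = msg_clone_list(pkts);

    if (!clone) {
        msg_purge(pkts);             /* the fix: free the originals too */
        return -ENOMEM;
    }
    *wq = clone;
    msg_purge(pkts);                 /* stands in for transmitting them */
    return 0;
}
```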

    Fixes: 6787927475e5 ("tipc: buffer overflow handling in listener socket")
    Reported-by: Hoang Le
    Signed-off-by: Tung Nguyen
    Acked-by: Ying Xue
    Acked-by: Jon Maloy
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Tung Nguyen
     
  • commit 46cb01eeeb86fca6afe24dda1167b0cb95424e29 upstream.

    In commit 25b0b9c4e835 ("tipc: handle collisions of 32-bit node address
    hash values"), the 32-bit node address is only generated after a
    one-second trial period has expired. However, the self address in struct
    tipc_monitor is not updated when the node address is generated, so it
    always keeps its initial value of zero. As a result, the sorting
    algorithm that uses this value does not work as expected, and neither
    does the neighbor monitoring framework.

    In this commit, we update the self address when the 32-bit node address
    is generated.

    Fixes: 25b0b9c4e835 ("tipc: handle collisions of 32-bit node address hash values")
    Acked-by: Jon Maloy
    Signed-off-by: Hoang Le
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Hoang Le
     
  • commit 426071f1f3995d7e9603246bffdcbf344cd31719 upstream.

    With a huge cluster (e.g. >200 nodes), the flow
    gap -> retransmit packet -> acked can take a long time when STATE_MSGs
    are dropped or delayed because of heavy traffic. With the 1.5 s
    tolerance value as the criterion, the link then fails easily, around
    the 2nd or 3rd failed retransmission attempt.

    Instead of re-introducing the criterion of 99 failed retransmissions to
    fix the issue, we increase the failure detection timer to ten times the
    tolerance value.
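    With the 1.5 s tolerance value mentioned above, the new criterion gives
    up only after 10 x 1.5 s = 15 s without progress. The hypothetical
    sketch below uses plain millisecond timestamps. The real code works in
    jiffies, and what counts as "progress" there is not modelled:

```c
#include <assert.h>
#include <stdbool.h>

/* Multiplier from the description: ten times the tolerance. */
#define FAILURE_TOLERANCE_FACTOR 10

/* Hypothetical check: failed only after 10x tolerance without progress. */
bool link_retransmit_failed(unsigned long now_ms,
                            unsigned long last_progress_ms,
                            unsigned long tolerance_ms)
{
    return now_ms - last_progress_ms >
           FAILURE_TOLERANCE_FACTOR * tolerance_ms;
}
```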

    Fixes: 77cf8edbc0e7 ("tipc: simplify stale link failure criteria")
    Acked-by: Jon Maloy
    Signed-off-by: Hoang Le
    Acked-by: Jon
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Hoang Le
     

23 Jan, 2020

2 commits

  • commit abc9b4e0549b93fdaff56e9532bc49a2d7b04955 upstream.

    When a user message is sent, TIPC checks whether the socket has faced
    congestion at the link layer. If so, it sleeps, waiting for the
    congestion to disappear. Since the socket is released during the sleep,
    this leaves a gap for other users (e.g. other threads) to take over the
    socket. Also, for connectionless sockets (e.g. SOCK_RDM), the user is
    free to send messages to various destinations (e.g. via 'sendto()'), so
    the socket's preformatted header has to be updated accordingly before
    the actual payload message is built.

    Unfortunately, the latter action is done before the former, which
    causes a race: the destination of a given message can be modified
    incorrectly in the middle, leading to the wrong destination when that
    message is built. Consequently, when the message is sent to the link
    layer, it gets stuck there forever because the peer node simply rejects
    it. After a number of retransmission attempts, the link is eventually
    taken down and the retransmission failure is reported.

    This commit fixes the problem by rearranging the order of actions to
    prevent the race condition from occurring, so the message building is
    'atomic' and its header will not be modified by anyone.
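    The race and its fix can be reproduced deterministically. In the sketch
    below, a hook stands in for another thread that runs while the sender
    sleeps. All names are hypothetical, and the real code builds a full
    TIPC header, not a single `dest` field:

```c
#include <assert.h>

/* Hypothetical socket with a preformatted header field. */
struct rdm_sock {
    int hdr_dest;
    void (*during_wait)(struct rdm_sock *); /* simulates another thread */
};

struct built_msg {
    int dest;
};

/* The socket is released while sleeping, so others may run here. */
static void wait_for_congestion(struct rdm_sock *sk)
{
    if (sk->during_wait)
        sk->during_wait(sk);
}

/* Buggy order: update the header, sleep, then build. */
struct built_msg send_buggy(struct rdm_sock *sk, int dest)
{
    struct built_msg m;

    sk->hdr_dest = dest;
    wait_for_congestion(sk);
    m.dest = sk->hdr_dest;           /* may now hold another dest */
    return m;
}

/* Fixed order: sleep first, then update and build without a gap. */
struct built_msg send_fixed(struct rdm_sock *sk, int dest)
{
    struct built_msg m;

    wait_for_congestion(sk);
    sk->hdr_dest = dest;
    m.dest = sk->hdr_dest;
    return m;
}

/* Another sender retargets the shared header during the wait. */
void other_sender(struct rdm_sock *sk)
{
    sk->hdr_dest = 99;
}
```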

    Fixes: 365ad353c256 ("tipc: reduce risk of user starvation during link congestion")
    Acked-by: Jon Maloy
    Signed-off-by: Tuong Lien
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Tuong Lien
     
  • commit dca4a17d24ee9d878836ce5eb8dc25be1ffa5729 upstream.

    In commit c55c8edafa91 ("tipc: smooth change between replicast and
    broadcast"), we allow instant switching between replicast and broadcast
    by sending a dummy 'SYN' packet on the last used link to synchronize
    packets on the links. The 'SYN' message is also subject to link
    congestion, so if that happens, a 'SOCK_WAKEUP' is scheduled to be sent
    back to the socket.
    However, in that commit, we simply use the same socket 'cong_link_cnt'
    counter for both the 'SYN' and the normal payload message sending.
    Therefore, if both the replicast and broadcast links are congested, the
    counter is not updated correctly but is overwritten by the latter
    congestion. Later on, when the 'SOCK_WAKEUP' messages are processed, the
    counter is decremented one by one and eventually wraps around.
    Consequently, further activity on the socket only waits for the false
    congestion signal to disappear, which never happens.

    Because sending the 'SYN' message is vital for the mechanism, it should
    be done anyway. This commit fixes the issue by marking the message with
    an error code, e.g. 'TIPC_ERR_NO_PORT', so that sending it does not face
    link congestion and there is no need to touch the socket's
    'cong_link_cnt' either. In addition, in the event of any error (e.g.
    -ENOBUFS), we purge the entire payload message queue and return
    immediately.
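    Here is a worked example of the counter arithmetic, assuming a 16-bit
    unsigned counter (the helper functions are hypothetical). Two
    congestions recorded by assignment leave the counter at 1, but two
    wakeups then decrement it twice:

```c
#include <assert.h>
#include <stdint.h>

/* Each congested send schedules one wakeup; each wakeup decrements. */
uint16_t cong_cnt_after_buggy(void)
{
    uint16_t cnt = 0;

    cnt = 1;  /* SYN send was congested */
    cnt = 1;  /* payload send was congested: overwrites, should be 2 */
    cnt--;    /* first SOCK_WAKEUP */
    cnt--;    /* second SOCK_WAKEUP: wraps around to 65535 */
    return cnt;
}

uint16_t cong_cnt_after_fixed(void)
{
    uint16_t cnt = 0;

    /* The SYN is marked with an error code and never counted. */
    cnt = 1;  /* payload send was congested */
    cnt--;    /* its SOCK_WAKEUP */
    return cnt;
}
```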

    Fixes: c55c8edafa91 ("tipc: smooth change between replicast and broadcast")
    Acked-by: Jon Maloy
    Signed-off-by: Tuong Lien
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Tuong Lien
     

18 Dec, 2019

2 commits

  • [ Upstream commit 6c8991f41546c3c472503dff1ea9daaddf9331c2 ]

    ipv6_stub uses the ip6_dst_lookup function to allow other modules to
    perform IPv6 lookups. However, this function skips the XFRM layer
    entirely.

    All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the
    ip_route_output_key and ip_route_output helpers) for their IPv4 lookups,
    which calls xfrm_lookup_route(). This patch fixes this inconsistent
    behavior by switching the stub to ip6_dst_lookup_flow, which also calls
    xfrm_lookup_route().

    This requires some changes in all the callers, as these two functions
    take different arguments and have different return types.

    Fixes: 5f81bd2e5d80 ("ipv6: export a stub for IPv6 symbols used by vxlan")
    Reported-by: Xiumei Mu
    Signed-off-by: Sabrina Dubroca
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sabrina Dubroca
     
  • [ Upstream commit 9cf1cd8ee3ee09ef2859017df2058e2f53c5347f ]

    TIPC uses the generic netlink infrastructure for its set/get/dump
    operations, so when the tipc module is inserted, its init function
    calls genl_register_family().
    After genl_register_family(), set/get/dump commands are immediately
    allowed, and these callbacks internally use net_generic.
    net_generic is allocated by register_pernet_device(), but that is
    called after genl_register_family() in the __init function.
    So these callbacks may use an uninitialized net_generic.
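    The usual remedy for this class of bug is to finish setting up shared
    state before registering the interface whose callbacks use it, and to
    unwind in reverse order on failure. The sketch below shows that ordering
    with hypothetical stand-ins for register_pernet_device() and
    genl_register_family(). The log string exists only so the order can be
    tested:

```c
#include <assert.h>
#include <string.h>

static char log_buf[128]; /* records the registration order */
static int fail_step;     /* test hook: which step fails (0 = none) */

static void note(const char *what)
{
    strncat(log_buf, what, sizeof(log_buf) - strlen(log_buf) - 1);
}

static int register_pernet(void)
{
    if (fail_step == 1)
        return -1;
    note("pernet+ ");
    return 0;
}

static void unregister_pernet(void)
{
    note("pernet- ");
}

static int register_family(void)
{
    if (fail_step == 2)
        return -1;
    note("genl+ ");
    return 0;
}

/* Per-namespace state first, then the family whose callbacks use it. */
int module_init_sketch(void)
{
    int err;

    err = register_pernet();
    if (err)
        goto out;
    err = register_family();
    if (err)
        goto out_pernet;
    return 0;

out_pernet:
    unregister_pernet();
out:
    return err;
}
```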

    Test commands:
    #SHELL1
    while :
    do
        modprobe tipc
        modprobe -rv tipc
    done

    #SHELL2
    while :
    do
        tipc link list
    done

    Splat looks like:
    [ 59.616322][ T2788] kasan: CONFIG_KASAN_INLINE enabled
    [ 59.617234][ T2788] kasan: GPF could be caused by NULL-ptr deref or user memory access
    [ 59.618398][ T2788] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
    [ 59.619389][ T2788] CPU: 3 PID: 2788 Comm: tipc Not tainted 5.4.0+ #194
    [ 59.620231][ T2788] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
    [ 59.621428][ T2788] RIP: 0010:tipc_bcast_get_broadcast_mode+0x131/0x310 [tipc]
    [ 59.622379][ T2788] Code: c7 c6 ef 8b 38 c0 65 ff 0d 84 83 c9 3f e8 d7 a5 f2 e3 48 8d bb 38 11 00 00 48 b8 00 00 00 00
    [ 59.622550][ T2780] NET: Registered protocol family 30
    [ 59.624627][ T2788] RSP: 0018:ffff88804b09f578 EFLAGS: 00010202
    [ 59.624630][ T2788] RAX: dffffc0000000000 RBX: 0000000000000011 RCX: 000000008bc66907
    [ 59.624631][ T2788] RDX: 0000000000000229 RSI: 000000004b3cf4cc RDI: 0000000000001149
    [ 59.624633][ T2788] RBP: ffff88804b09f588 R08: 0000000000000003 R09: fffffbfff4fb3df1
    [ 59.624635][ T2788] R10: fffffbfff50318f8 R11: ffff888066cadc18 R12: ffffffffa6cc2f40
    [ 59.624637][ T2788] R13: 1ffff11009613eba R14: ffff8880662e9328 R15: ffff8880662e9328
    [ 59.624639][ T2788] FS: 00007f57d8f7b740(0000) GS:ffff88806cc00000(0000) knlGS:0000000000000000
    [ 59.624645][ T2788] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 59.625875][ T2780] tipc: Started in single node mode
    [ 59.626128][ T2788] CR2: 00007f57d887a8c0 CR3: 000000004b140002 CR4: 00000000000606e0
    [ 59.633991][ T2788] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 59.635195][ T2788] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 59.636478][ T2788] Call Trace:
    [ 59.637025][ T2788] tipc_nl_add_bc_link+0x179/0x1470 [tipc]
    [ 59.638219][ T2788] ? lock_downgrade+0x6e0/0x6e0
    [ 59.638923][ T2788] ? __tipc_nl_add_link+0xf90/0xf90 [tipc]
    [ 59.639533][ T2788] ? tipc_nl_node_dump_link+0x318/0xa50 [tipc]
    [ 59.640160][ T2788] ? mutex_lock_io_nested+0x1380/0x1380
    [ 59.640746][ T2788] tipc_nl_node_dump_link+0x4fd/0xa50 [tipc]
    [ 59.641356][ T2788] ? tipc_nl_node_reset_link_stats+0x340/0x340 [tipc]
    [ 59.642088][ T2788] ? __skb_ext_del+0x270/0x270
    [ 59.642594][ T2788] genl_lock_dumpit+0x85/0xb0
    [ 59.643050][ T2788] netlink_dump+0x49c/0xed0
    [ 59.643529][ T2788] ? __netlink_sendskb+0xc0/0xc0
    [ 59.644044][ T2788] ? __netlink_dump_start+0x190/0x800
    [ 59.644617][ T2788] ? __mutex_unlock_slowpath+0xd0/0x670
    [ 59.645177][ T2788] __netlink_dump_start+0x5a0/0x800
    [ 59.645692][ T2788] genl_rcv_msg+0xa75/0xe90
    [ 59.646144][ T2788] ? __lock_acquire+0xdfe/0x3de0
    [ 59.646692][ T2788] ? genl_family_rcv_msg_attrs_parse+0x320/0x320
    [ 59.647340][ T2788] ? genl_lock_dumpit+0xb0/0xb0
    [ 59.647821][ T2788] ? genl_unlock+0x20/0x20
    [ 59.648290][ T2788] ? genl_parallel_done+0xe0/0xe0
    [ 59.648787][ T2788] ? find_held_lock+0x39/0x1d0
    [ 59.649276][ T2788] ? genl_rcv+0x15/0x40
    [ 59.649722][ T2788] ? lock_contended+0xcd0/0xcd0
    [ 59.650296][ T2788] netlink_rcv_skb+0x121/0x350
    [ 59.650828][ T2788] ? genl_family_rcv_msg_attrs_parse+0x320/0x320
    [ 59.651491][ T2788] ? netlink_ack+0x940/0x940
    [ 59.651953][ T2788] ? lock_acquire+0x164/0x3b0
    [ 59.652449][ T2788] genl_rcv+0x24/0x40
    [ 59.652841][ T2788] netlink_unicast+0x421/0x600
    [ ... ]

    Fixes: 7e4369057806 ("tipc: fix a slab object leak")
    Fixes: a62fbccecd62 ("tipc: make subscriber server support net namespace")
    Signed-off-by: Taehee Yoo
    Acked-by: Jon Maloy
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Taehee Yoo
     

05 Dec, 2019

1 commit

  • [ Upstream commit fd567ac20cb0377ff466d3337e6e9ac5d0cb15e4 ]

    In commit 4f07b80c9733 ("tipc: check msg->req data len in
    tipc_nl_compat_bearer_disable") the same patch code was copied into
    three routines: tipc_nl_compat_bearer_disable(),
    tipc_nl_compat_link_stat_dump() and tipc_nl_compat_link_reset_stats().
    The two occurrences in the link routines should have been modified to
    check against the maximum link name length, not the bearer name length.
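    The sketch below shows the intended check. It assumes limits of 32 and
    68 for the bearer and link name lengths, mirroring TIPC_MAX_BEARER_NAME
    and TIPC_MAX_LINK_NAME; verify these against your kernel headers.
    `name_is_valid()` is a hypothetical helper. A 40-character link name is
    wrongly rejected when checked against the bearer limit:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed limits, mirroring the TIPC uapi constants. */
#define MAX_BEARER_NAME 32
#define MAX_LINK_NAME   68

/*
 * A name taken from a request is valid only if it is NUL-terminated
 * within both the received data and the limit for its kind.
 */
bool name_is_valid(const char *data, size_t data_len, size_t max_len)
{
    size_t len = data_len < max_len ? data_len : max_len;

    return len > 0 && memchr(data, '\0', len) != NULL;
}
```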

    Fixes: 4f07b80c9733 ("tipc: check msg->reg data len in tipc_nl_compat_bearer_disable")
    Signed-off-by: John Rutherford
    Acked-by: Jon Maloy
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    John Rutherford
     

15 Nov, 2019

1 commit

  • The tipc prefix for log messages generated by tipc was
    removed in commit 07f6c4bc048a ("tipc: convert tipc reference
    table to use generic rhashtable").

    This is still a useful prefix so add it back.
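    In the kernel, such a prefix is usually added by defining pr_fmt()
    before the printk headers are included, so every pr_*() call site gets
    it at compile time through string-literal concatenation. The userspace
    sketch below shows the same mechanism. `pr_info_buf()` is a hypothetical
    stand-in for pr_info(), and hard-coding "tipc: " instead of using
    KBUILD_MODNAME is a simplification:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Every format string passed through pr_fmt() gets the prefix. */
#define pr_fmt(fmt) "tipc: " fmt

/* Hypothetical pr_info() stand-in that formats into a buffer. */
#define pr_info_buf(buf, size, fmt, ...) \
    snprintf(buf, size, pr_fmt(fmt), __VA_ARGS__)
```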

    Signed-off-by: Matt Bennett
    Acked-by: Jon Maloy
    Signed-off-by: David S. Miller

    Matt Bennett
     

29 Oct, 2019

1 commit


10 Oct, 2019

2 commits

  • sk->sk_backlog.len can be written by BH handlers, and read
    from process contexts in a lockless way.

    Note the write side should also use WRITE_ONCE() or a variant.
    We need some agreement about the best way to do this.
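    The closest standard-C analogue of READ_ONCE()/WRITE_ONCE() is a relaxed
    atomic access. It gives roughly the same guarantee (no torn, fused or
    repeated accesses) without imposing ordering. The sketch below is
    hypothetical. Its writer does a load followed by a store rather than an
    atomic read-modify-write, which assumes writers are serialized, as the
    socket spinlock serializes them in the kernel case:

```c
#include <assert.h>
#include <stdatomic.h>

/* Field written by one context and read locklessly by another. */
struct backlog {
    _Atomic unsigned int len;
};

/* Writer side; assumes writers are serialized by some lock. */
void backlog_add(struct backlog *b, unsigned int truesize)
{
    unsigned int cur = atomic_load_explicit(&b->len, memory_order_relaxed);

    atomic_store_explicit(&b->len, cur + truesize, memory_order_relaxed);
}

/* Lockless reader: a single explicit load, used consistently. */
unsigned int backlog_len(struct backlog *b)
{
    return atomic_load_explicit(&b->len, memory_order_relaxed);
}
```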

    syzbot reported :

    BUG: KCSAN: data-race in tcp_add_backlog / tcp_grow_window.isra.0

    write to 0xffff88812665f32c of 4 bytes by interrupt on cpu 1:
    sk_add_backlog include/net/sock.h:934 [inline]
    tcp_add_backlog+0x4a0/0xcc0 net/ipv4/tcp_ipv4.c:1737
    tcp_v4_rcv+0x1aba/0x1bf0 net/ipv4/tcp_ipv4.c:1925
    ip_protocol_deliver_rcu+0x51/0x470 net/ipv4/ip_input.c:204
    ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
    NF_HOOK include/linux/netfilter.h:305 [inline]
    NF_HOOK include/linux/netfilter.h:299 [inline]
    ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
    dst_input include/net/dst.h:442 [inline]
    ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
    NF_HOOK include/linux/netfilter.h:305 [inline]
    NF_HOOK include/linux/netfilter.h:299 [inline]
    ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
    __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004
    __netif_receive_skb+0x37/0xf0 net/core/dev.c:5118
    netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208
    napi_skb_finish net/core/dev.c:5671 [inline]
    napi_gro_receive+0x28f/0x330 net/core/dev.c:5704
    receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
    virtnet_receive drivers/net/virtio_net.c:1323 [inline]
    virtnet_poll+0x436/0x7d0 drivers/net/virtio_net.c:1428
    napi_poll net/core/dev.c:6352 [inline]
    net_rx_action+0x3ae/0xa50 net/core/dev.c:6418

    read to 0xffff88812665f32c of 4 bytes by task 7292 on cpu 0:
    tcp_space include/net/tcp.h:1373 [inline]
    tcp_grow_window.isra.0+0x6b/0x480 net/ipv4/tcp_input.c:413
    tcp_event_data_recv+0x68f/0x990 net/ipv4/tcp_input.c:717
    tcp_rcv_established+0xbfe/0xf50 net/ipv4/tcp_input.c:5618
    tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1542
    sk_backlog_rcv include/net/sock.h:945 [inline]
    __release_sock+0x135/0x1e0 net/core/sock.c:2427
    release_sock+0x61/0x160 net/core/sock.c:2943
    tcp_recvmsg+0x63b/0x1a30 net/ipv4/tcp.c:2181
    inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
    sock_recvmsg_nosec net/socket.c:871 [inline]
    sock_recvmsg net/socket.c:889 [inline]
    sock_recvmsg+0x92/0xb0 net/socket.c:885
    sock_read_iter+0x15f/0x1e0 net/socket.c:967
    call_read_iter include/linux/fs.h:1864 [inline]
    new_sync_read+0x389/0x4f0 fs/read_write.c:414
    __vfs_read+0xb1/0xc0 fs/read_write.c:427
    vfs_read fs/read_write.c:461 [inline]
    vfs_read+0x143/0x2c0 fs/read_write.c:446

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 0 PID: 7292 Comm: syz-fuzzer Not tainted 5.3.0+ #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: Jakub Kicinski

    Eric Dumazet
     
  • sk_add_backlog() callers usually read sk->sk_rcvbuf without
    owning the socket lock. This means sk_rcvbuf value can
    be changed by other cpus, and KCSAN complains.

    Add READ_ONCE() annotations to document the lockless nature
    of these reads.

    Note that writes over sk_rcvbuf should also use WRITE_ONCE(),
    but this will be done in separate patches to ease stable
    backports (if we decide this is relevant for stable trees).

    BUG: KCSAN: data-race in tcp_add_backlog / tcp_recvmsg

    write to 0xffff88812ab369f8 of 8 bytes by interrupt on cpu 1:
    __sk_add_backlog include/net/sock.h:902 [inline]
    sk_add_backlog include/net/sock.h:933 [inline]
    tcp_add_backlog+0x45a/0xcc0 net/ipv4/tcp_ipv4.c:1737
    tcp_v4_rcv+0x1aba/0x1bf0 net/ipv4/tcp_ipv4.c:1925
    ip_protocol_deliver_rcu+0x51/0x470 net/ipv4/ip_input.c:204
    ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
    NF_HOOK include/linux/netfilter.h:305 [inline]
    NF_HOOK include/linux/netfilter.h:299 [inline]
    ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
    dst_input include/net/dst.h:442 [inline]
    ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
    NF_HOOK include/linux/netfilter.h:305 [inline]
    NF_HOOK include/linux/netfilter.h:299 [inline]
    ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
    __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004
    __netif_receive_skb+0x37/0xf0 net/core/dev.c:5118
    netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208
    napi_skb_finish net/core/dev.c:5671 [inline]
    napi_gro_receive+0x28f/0x330 net/core/dev.c:5704
    receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
    virtnet_receive drivers/net/virtio_net.c:1323 [inline]
    virtnet_poll+0x436/0x7d0 drivers/net/virtio_net.c:1428
    napi_poll net/core/dev.c:6352 [inline]
    net_rx_action+0x3ae/0xa50 net/core/dev.c:6418

    read to 0xffff88812ab369f8 of 8 bytes by task 7271 on cpu 0:
    tcp_recvmsg+0x470/0x1a30 net/ipv4/tcp.c:2047
    inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
    sock_recvmsg_nosec net/socket.c:871 [inline]
    sock_recvmsg net/socket.c:889 [inline]
    sock_recvmsg+0x92/0xb0 net/socket.c:885
    sock_read_iter+0x15f/0x1e0 net/socket.c:967
    call_read_iter include/linux/fs.h:1864 [inline]
    new_sync_read+0x389/0x4f0 fs/read_write.c:414
    __vfs_read+0xb1/0xc0 fs/read_write.c:427
    vfs_read fs/read_write.c:461 [inline]
    vfs_read+0x143/0x2c0 fs/read_write.c:446
    ksys_read+0xd5/0x1b0 fs/read_write.c:587
    __do_sys_read fs/read_write.c:597 [inline]
    __se_sys_read fs/read_write.c:595 [inline]
    __x64_sys_read+0x4c/0x60 fs/read_write.c:595
    do_syscall_64+0xcf/0x2f0 arch/x86/entry/common.c:296
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 0 PID: 7271 Comm: syz-fuzzer Not tainted 5.3.0+ #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: Jakub Kicinski

    Eric Dumazet
     

02 Oct, 2019

1 commit

  • We have identified a problem with the "oversubscription" policy in the
    link transmission code.

    When small messages are transmitted, and the sending link has reached
    the transmit window limit, those messages will be bundled and put into
    the link backlog queue. However, bundles of data messages are counted
    at the 'CRITICAL' level, so it is the counter for that level, rather
    than the counter for the real, bundled message's level, that is
    increased.
    Subsequent to-be-bundled data messages at non-CRITICAL levels continue
    to be tested against the unchanged counter for their own level, while
    contributing to an unrestrained increase at the CRITICAL backlog level.

    This leaves a gap in the congestion control algorithm for small
    messages that can result in starvation for other users or for a "real"
    CRITICAL user. This can even eventually lead to buffer exhaustion and
    link reset.

    We fix this by keeping a 'target_bskb' buffer pointer at each level,
    so that when bundling, we only bundle messages of the same importance
    level. This way, we know exactly how many slots a certain level has
    occupied in the queue, and so can manage level congestion accurately.
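
    A minimal user-space model of the per-level bundling rule might look
    as follows. It is a sketch, not TIPC's implementation: the level names,
    the 1500-byte MTU, the pool size and the helper queue_small_msg() are
    hypothetical; only the per-level 'target_bskb' idea and the 40-byte
    bundle header come from the text. Each level keeps its own open bundle,
    and a new queue slot is charged to the message's real level rather
    than to CRITICAL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: level names, MTU and pool size are assumptions. */
enum imp { IMP_LOW, IMP_MEDIUM, IMP_HIGH, IMP_CRITICAL, IMP_LEVELS };

#define LINK_MTU   1500
#define BUNDLE_HDR 40   /* bundle header overhead, as in the text */
#define MAX_BUFS   64

struct bundle {
	enum imp level;
	size_t used;   /* bytes used, including the bundle header */
	int nmsgs;     /* payload messages carried */
};

struct backlog_model {
	struct bundle bufs[MAX_BUFS];
	int nbufs;
	int len[IMP_LEVELS];                    /* slots charged per level */
	struct bundle *target_bskb[IMP_LEVELS]; /* open bundle per level */
};

/* Queue one message, bundling it only into the open bundle of its own
 * level. Returns 0 on success, -1 if the model's pool is exhausted. */
static int queue_small_msg(struct backlog_model *q, size_t size, enum imp lvl)
{
	struct bundle *b;

	assert(lvl < IMP_LEVELS && size + BUNDLE_HDR <= LINK_MTU);
	b = q->target_bskb[lvl];
	if (b && b->used + size <= LINK_MTU) {
		b->used += size;   /* joins the open bundle: no new slot */
		b->nmsgs++;
		return 0;
	}
	if (q->nbufs == MAX_BUFS)
		return -1;
	b = &q->bufs[q->nbufs++];
	b->level = lvl;
	b->used = BUNDLE_HDR + size;
	b->nmsgs = 1;
	q->len[lvl]++;         /* new slot charged to the real level */
	q->target_bskb[lvl] = b;
	return 0;
}
```

    Interleaving ten 64-byte CRITICAL messages with ten 1200-byte LOW ones
    (1200 rather than 4096 so each fits the model's MTU unfragmented)
    leaves one CRITICAL bundle holding all ten small messages, while LOW is
    charged its own ten slots.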

    Bundling messages at the same level brings further benefits. Let us
    consider this:
    - One socket sends 64-byte messages at the 'CRITICAL' level;
    - Another sends 4096-byte messages at the 'LOW' level;

    When a 64-byte message arrives and is bundled for the first time, we
    pay the overhead of creating a message bundle for it (+ 40-byte header,
    data copy, etc.) for later use, but the next message can be a
    4096-byte one that cannot be bundled with the previous one. This means
    the last bundle carries only one payload message, which is totally
    inefficient, for the receiver as well! Later on, another 64-byte
    message arrives, we make a new bundle and the same story repeats...

    With the new bundling algorithm this will not happen: the 64-byte
    messages will be bundled together even when 4096-byte message(s) come
    in between. However, if the 4096-byte messages are sent at the same
    level, i.e. 'CRITICAL', the bundling algorithm will again cause the
    same overhead.

    The same will also happen with only one socket sending small messages
    at a rate close to the link's transmit rate, so that when one message
    is bundled, it is transmitted shortly afterwards. Then another message
    arrives, a new bundle is created, and so on...

    We will solve this issue more fundamentally in another patch.

    Fixes: 365ad353c256 ("tipc: reduce risk of user starvation during link congestion")
    Reported-by: Hoang Le
    Acked-by: Jon Maloy
    Signed-off-by: Tuong Lien
    Signed-off-by: David S. Miller

    Tuong Lien
     

15 Sep, 2019

1 commit


05 Sep, 2019

1 commit

  • Unlike kfree(p), kfree_rcu(p, rcu) does not perform a NULL pointer
    check. When tipc_nametbl_remove_publ() returns NULL, the panic below
    happens:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000068
    RIP: 0010:__call_rcu+0x1d/0x290
    Call Trace:

    tipc_publ_notify+0xa9/0x170 [tipc]
    tipc_node_write_unlock+0x8d/0x100 [tipc]
    tipc_node_link_down+0xae/0x1d0 [tipc]
    tipc_node_check_dest+0x3ea/0x8f0 [tipc]
    ? tipc_disc_rcv+0x2c7/0x430 [tipc]
    tipc_disc_rcv+0x2c7/0x430 [tipc]
    ? tipc_rcv+0x6bb/0xf20 [tipc]
    tipc_rcv+0x6bb/0xf20 [tipc]
    ? ip_route_input_slow+0x9cf/0xb10
    tipc_udp_recv+0x195/0x1e0 [tipc]
    ? tipc_udp_is_known_peer+0x80/0x80 [tipc]
    udp_queue_rcv_skb+0x180/0x460
    udp_unicast_rcv_skb.isra.56+0x75/0x90
    __udp4_lib_rcv+0x4ce/0xb90
    ip_local_deliver_finish+0x11c/0x210
    ip_local_deliver+0x6b/0xe0
    ? ip_rcv_finish+0xa9/0x410
    ip_rcv+0x273/0x362
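
    The failure mode follows from how kfree_rcu() works: it is handed the
    embedded rcu_head, effectively &p->rcu, so with p == NULL the RCU code
    receives a small non-NULL address (likely matching the low faulting
    address in __call_rcu above) rather than NULL. The user-space sketch
    below only models the fixed caller pattern; struct publication_model,
    defer_free() and run_deferred_frees() are hypothetical stand-ins, not
    kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins: not the kernel's struct publication or RCU API. */
struct rcu_head_model {
	struct rcu_head_model *next;
};

struct publication_model {
	unsigned int type;
	unsigned int lower;
	struct rcu_head_model rcu;   /* embedded head, as kfree_rcu() expects */
};

static struct rcu_head_model *deferred_list;  /* frees waiting for "grace period" */
static int deferred_count;

/* Like call_rcu(): writes through the head it is given, so it must never
 * receive an address computed from a NULL object pointer. */
static void defer_free(struct rcu_head_model *head)
{
	head->next = deferred_list;
	deferred_list = head;
	deferred_count++;
}

/* The fixed pattern: test the lookup result before deferring the free. */
static void publ_release(struct publication_model *p)
{
	if (!p)
		return;
	defer_free(&p->rcu);
}

/* Stand-in for the end of a grace period: free everything queued. */
static void run_deferred_frees(void)
{
	while (deferred_list) {
		struct rcu_head_model *h = deferred_list;
		struct publication_model *p = (struct publication_model *)
			((char *)h - offsetof(struct publication_model, rcu));

		deferred_list = h->next;
		free(p);
		deferred_count--;
	}
}
```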

    Fixes: 97ede29e80ee ("tipc: convert name table read-write lock to RCU")
    Reported-by: Li Shuang
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     

20 Aug, 2019

1 commit


19 Aug, 2019

1 commit

  • The policy for handling the skb list locks on the send and receive paths
    is simple.

    - On the send path we never need to grab the lock on the 'xmitq' list
    when the destination is an external node.

    - On the receive path we always need to grab the lock on the 'inputq'
    list, irrespective of source node.

    However, when transmitting node local messages, those will eventually
    end up on the receive path of a local socket, meaning that the argument
    'xmitq' in tipc_node_xmit() will become the 'inputq' argument in the
    function tipc_sk_rcv(). This has been handled by always initializing
    the spinlock of the 'xmitq' list at message creation, just in case it
    may end up on the receive path later, and despite knowing that in most
    cases the lock will never be used.

    This approach is inaccurate and confusing, and has also concealed the
    fact that the stated 'no lock grabbing' policy for the send path is
    violated in some cases.

    We now clean this up by never initializing the lock at message
    creation, instead doing this at the moment we find that the message
    actually will enter the receive path. At the same time we fix the four
    locations where we incorrectly access the spinlock on the send/error
    path.

    This patch also reverts commit d12cffe9329f ("tipc: ensure head->lock
    is initialised") which has now become redundant.

    CC: Eric Dumazet
    Reported-by: Chris Packham
    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Reviewed-by: Xin Long
    Signed-off-by: David S. Miller

    Jon Maloy
     

17 Aug, 2019

1 commit

  • This commit eliminates the use of the link 'stale_limit' & 'prev_from'
    (besides the already removed 'stale_cnt') variables in the detection
    of repeated retransmit failures, as there is no proper way to
    initialize them to avoid a false detection, i.e. one that is not really
    a retransmission failure but is due to garbage values in the variables.

    Instead, a jiffies variable will be added to individual skbs (like the
    way we restrict the skb retransmissions) in order to mark the first skb
    retransmit time. Later on, at the next retransmissions, the timestamp
    will be checked to see if the skb in the link transmq is "too stale",
    that is, the link tolerance time has passed, so that a link reset will
    be ordered. Note that checking only the first skb in the queue is
    sufficient, since it must be the oldest one.
    A counter is also added to keep track of the actual number of skb
    retransmissions, for later checking when the failure happens.

    The downside of this approach is that the skb->cb[] buffer is close to
    being exhausted; however, it is always possible to allocate another
    memory area and keep a reference to it when needed.
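
    The timestamp-based staleness check can be sketched as below. The
    comparison mirrors the wrap-safe form of the kernel's time_after()
    macro; the struct skb_cb_model fields, the helper retransmit_head() and
    the use of 0 as "not yet retransmitted" are assumptions of this model,
    not the real TIPC_SKB_CB() layout. Only the head of transmq is
    inspected, since it is always the oldest buffer.

```c
#include <assert.h>
#include <stdbool.h>

/* Wrap-safe "a is after b", in the style of the kernel's time_after(). */
static bool ticks_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical per-skb control-block fields (not the real layout). */
struct skb_cb_model {
	unsigned long retr_stamp;  /* tick of first retransmission, 0 = none */
	unsigned int retr_cnt;     /* retransmissions performed so far */
};

/* Called on each retransmission of the transmq head.
 * Returns true when the link should be reset. */
static bool retransmit_head(struct skb_cb_model *cb, unsigned long now,
			    unsigned long tolerance)
{
	if (!cb->retr_stamp)
		cb->retr_stamp = now;   /* mark the first retransmit time */
	else if (ticks_after(now, cb->retr_stamp + tolerance))
		return true;            /* head is "too stale": reset link */
	cb->retr_cnt++;
	return false;
}
```

    Because the check subtracts before comparing, a stamp taken just before
    the tick counter wraps still expires at the right moment afterwards.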

    Fixes: 77cf8edbc0e7 ("tipc: simplify stale link failure criteria")
    Reported-by: Hoang Le
    Acked-by: Ying Xue
    Acked-by: Jon Maloy
    Signed-off-by: Tuong Lien
    Signed-off-by: David S. Miller

    Tuong Lien
     

12 Aug, 2019

1 commit

  • We set the field 'addr_trial_end' to 'jiffies', instead of the current
    value 0, at the moment the node address is initialized. This guarantees
    we don't inadvertently enter an address trial period when the node
    address is explicitly set by the user.
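
    The commit does not show how the trial period is tested. Assuming a
    wrap-safe comparison in the style of time_before(), the sketch below
    (helper names hypothetical, 32-bit counter chosen for illustration)
    shows why the initial value matters: an end time equal to the current
    tick is never in the future, whereas 0 counts as "in the future"
    whenever the counter sits in the upper half of its range, e.g. shortly
    before wraparound. Linux deliberately starts jiffies a few minutes
    before wraparound on 32-bit systems.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "a is before b" for a 32-bit tick counter, time_before() style. */
static bool ticks32_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Hypothetical helper: the node is in its address trial period while the
 * current tick is still before addr_trial_end. */
static bool in_addr_trial(uint32_t now, uint32_t addr_trial_end)
{
	return ticks32_before(now, addr_trial_end);
}
```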

    Signed-off-by: Chris Packham
    Acked-by: Jon Maloy
    Signed-off-by: David S. Miller

    Chris Packham
     

09 Aug, 2019

1 commit

  • Since node internal messages are passed directly to the socket, it is not
    possible to observe those messages via tcpdump or wireshark.

    We now remedy this by making it possible to clone such messages and send
    the clones to the loopback interface. The clones are dropped at reception
    and have no functional role except making the traffic visible.

    The feature is enabled if network taps are active for the loopback device.
    pcap filtering restrictions require the messages to be presented to the
    receiving side of the loopback device.

    v3 - Function dev_nit_active used to check for network taps.
    - Procedure netif_rx_ni used to send cloned messages to loopback device.

    Signed-off-by: John Rutherford
    Acked-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    John Rutherford
     

07 Aug, 2019

1 commit


02 Aug, 2019

2 commits

  • In commit 365ad353c256 ("tipc: reduce risk of user starvation during
    link congestion") we allowed senders to add exactly one list of extra
    buffers to the link backlog queues during link congestion (aka
    "oversubscription"). However, the criterion for when to stop adding
    wakeup messages to the input queue when the overload abates is
    inaccurate, and may cause starvation problems during very high load.

    Currently, we stop adding wakeup messages after 10 total failed attempts
    where we find that there is no space left in the backlog queue for a
    certain importance level. The counter for this is accumulated across all
    levels, which may lead the algorithm to leave the loop prematurely,
    although there may still be plenty of space available at some levels.
    The result is sometimes that messages near the wakeup queue tail are not
    added to the input queue as they should be.

    We now introduce a more exact algorithm, where we keep adding wakeup
    messages to a level as long as the backlog queue has free slots for
    the corresponding level, and stop at the moment there are no more such
    slots or when there are no more wakeup messages to dequeue.
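
    The per-level rule, which replaces the global limit of 10 failed
    attempts, can be modelled as follows. This is a hedged sketch: LEVELS =
    5, struct wakeup_model and prepare_wakeup() are hypothetical. Free
    slots per level are the limit minus the current backlog length, and a
    full level is skipped without ending the scan for the other levels.

```c
#include <assert.h>

#define LEVELS 5   /* assumption: four data importance levels plus SYSTEM */

struct wakeup_model {
	int level;   /* importance level the waiting sender is blocked on */
	int woken;   /* set once moved to the input queue */
};

/* Wake senders while their level still has free backlog slots.
 * limit[]/len[] model the per-level backlog; returns the number woken. */
static int prepare_wakeup(struct wakeup_model *wq, int n,
			  const int limit[LEVELS], const int len[LEVELS])
{
	int avail[LEVELS];
	int woken = 0;

	for (int imp = 0; imp < LEVELS; imp++)
		avail[imp] = limit[imp] - len[imp];

	for (int i = 0; i < n; i++) {
		int imp = wq[i].level;

		assert(imp >= 0 && imp < LEVELS);
		if (wq[i].woken || avail[imp] <= 0)
			continue;   /* level full: skip it, keep scanning others */
		avail[imp]--;
		wq[i].woken = 1;
		woken++;
	}
	return woken;
}
```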

    Fixes: 365ad35 ("tipc: reduce risk of user starvation during link congestion")
    Reported-by: Tung Nguyen
    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • Commit 2753ca5d9009 ("tipc: fix uninit-value in tipc_nl_compat_doit")
    broke older tipc tools that use compat interface (e.g. tipc-config from
    tipcutils package):

    % tipc-config -p
    operation not supported

    The commit started to reject TIPC netlink compat messages that do not
    have attributes. It is too restrictive because some such messages are
    valid (they don't need any arguments):

    % grep 'tx none' include/uapi/linux/tipc_config.h
    #define TIPC_CMD_NOOP 0x0000 /* tx none, rx none */
    #define TIPC_CMD_GET_MEDIA_NAMES 0x0002 /* tx none, rx media_name(s) */
    #define TIPC_CMD_GET_BEARER_NAMES 0x0003 /* tx none, rx bearer_name(s) */
    #define TIPC_CMD_SHOW_PORTS 0x0006 /* tx none, rx ultra_string */
    #define TIPC_CMD_GET_REMOTE_MNG 0x4003 /* tx none, rx unsigned */
    #define TIPC_CMD_GET_MAX_PORTS 0x4004 /* tx none, rx unsigned */
    #define TIPC_CMD_GET_NETID 0x400B /* tx none, rx unsigned */
    #define TIPC_CMD_NOT_NET_ADMIN 0xC001 /* tx none, rx none */

    This patch relaxes the original fix and rejects messages without
    arguments only if such arguments are expected by a command (req_type
    is non-zero).
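
    The relaxed check reduces to a single condition. In this sketch the
    struct and function are hypothetical; the request type field (req_type
    in this model) is assumed to be 0 for "tx none" commands such as
    TIPC_CMD_NOOP, and a real implementation would also validate the TLV
    contents rather than only their presence.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical subset of a compat request. */
struct compat_req_model {
	unsigned short cmd;
	int req_type;   /* expected argument TLV type, 0 for "tx none" */
	int req_size;   /* size of the attached argument TLV, 0 if absent */
};

/* Reject a missing argument only when the command expects one. */
static int validate_compat_req(const struct compat_req_model *msg)
{
	if (msg->req_type && msg->req_size == 0)
		return -EINVAL;
	return 0;
}
```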

    Fixes: 2753ca5d9009 ("tipc: fix uninit-value in tipc_nl_compat_doit")
    Cc: stable@vger.kernel.org
    Signed-off-by: Taras Kondratiuk
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Taras Kondratiuk
     

31 Jul, 2019

1 commit

  • Our test suite sometimes provokes the following crash:

    Description of problem:
    [ 1092.597234] BUG: unable to handle kernel NULL pointer dereference at 00000000000000e8
    [ 1092.605072] PGD 0 P4D 0
    [ 1092.607620] Oops: 0000 [#1] SMP PTI
    [ 1092.611118] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Not tainted 4.18.0-122.el8.x86_64 #1
    [ 1092.619724] Hardware name: Dell Inc. PowerEdge R740/08D89F, BIOS 1.3.7 02/08/2018
    [ 1092.627215] RIP: 0010:tipc_mcast_filter_msg+0x93/0x2d0 [tipc]
    [ 1092.632955] Code: 0f 84 aa 01 00 00 89 cf 4d 01 ca 4c 8b 26 c1 ef 19 83 e7 0f 83 ff 0c 4d 0f 45 d1 41 8b 6a 10 0f cd 4c 39 e6 0f 84 81 01 00 00 8b 9c 24 e8 00 00 00 45 8b 13 41 0f ca 44 89 d7 c1 ef 13 83 e7
    [ 1092.651703] RSP: 0018:ffff929e5fa83a18 EFLAGS: 00010282
    [ 1092.656927] RAX: ffff929e3fb38100 RBX: 00000000069f29ee RCX: 00000000416c0045
    [ 1092.664058] RDX: ffff929e5fa83a88 RSI: ffff929e31a28420 RDI: 0000000000000000
    [ 1092.671209] RBP: 0000000029b11821 R08: 0000000000000000 R09: ffff929e39b4407a
    [ 1092.678343] R10: ffff929e39b4407a R11: 0000000000000007 R12: 0000000000000000
    [ 1092.685475] R13: 0000000000000001 R14: ffff929e3fb38100 R15: ffff929e39b4407a
    [ 1092.692614] FS: 0000000000000000(0000) GS:ffff929e5fa80000(0000) knlGS:0000000000000000
    [ 1092.700702] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 1092.706447] CR2: 00000000000000e8 CR3: 000000031300a004 CR4: 00000000007606e0
    [ 1092.713579] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 1092.720712] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 1092.727843] PKRU: 55555554
    [ 1092.730556] Call Trace:
    [ 1092.733010]
    [ 1092.735034] tipc_sk_filter_rcv+0x7ca/0xb80 [tipc]
    [ 1092.739828] ? __kmalloc_node_track_caller+0x1cb/0x290
    [ 1092.744974] ? dev_hard_start_xmit+0xa5/0x210
    [ 1092.749332] tipc_sk_rcv+0x389/0x640 [tipc]
    [ 1092.753519] tipc_sk_mcast_rcv+0x23c/0x3a0 [tipc]
    [ 1092.758224] tipc_rcv+0x57a/0xf20 [tipc]
    [ 1092.762154] ? ktime_get_real_ts64+0x40/0xe0
    [ 1092.766432] ? tpacket_rcv+0x50/0x9f0
    [ 1092.770098] tipc_l2_rcv_msg+0x4a/0x70 [tipc]
    [ 1092.774452] __netif_receive_skb_core+0xb62/0xbd0
    [ 1092.779164] ? enqueue_entity+0xf6/0x630
    [ 1092.783084] ? kmem_cache_alloc+0x158/0x1c0
    [ 1092.787272] ? __build_skb+0x25/0xd0
    [ 1092.790849] netif_receive_skb_internal+0x42/0xf0
    [ 1092.795557] napi_gro_receive+0xba/0xe0
    [ 1092.799417] mlx5e_handle_rx_cqe+0x83/0xd0 [mlx5_core]
    [ 1092.804564] mlx5e_poll_rx_cq+0xd5/0x920 [mlx5_core]
    [ 1092.809536] mlx5e_napi_poll+0xb2/0xce0 [mlx5_core]
    [ 1092.814415] ? __wake_up_common_lock+0x89/0xc0
    [ 1092.818861] net_rx_action+0x149/0x3b0
    [ 1092.822616] __do_softirq+0xe3/0x30a
    [ 1092.826193] irq_exit+0x100/0x110
    [ 1092.829512] do_IRQ+0x85/0xd0
    [ 1092.832483] common_interrupt+0xf/0xf
    [ 1092.836147]
    [ 1092.838255] RIP: 0010:cpuidle_enter_state+0xb7/0x2a0
    [ 1092.843221] Code: e8 3e 79 a5 ff 80 7c 24 03 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 d7 01 00 00 31 ff e8 a0 6b ab ff fb 66 0f 1f 44 00 00 b8 ff ff ff ff f3 01 00 00 4c 29 f3 ba ff ff ff 7f 48 39 c3 7f
    [ 1092.861967] RSP: 0018:ffffaa5ec6533e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
    [ 1092.869530] RAX: ffff929e5faa3100 RBX: 000000fe63dd2092 RCX: 000000000000001f
    [ 1092.876665] RDX: 000000fe63dd2092 RSI: 000000003a518aaa RDI: 0000000000000000
    [ 1092.883795] RBP: 0000000000000003 R08: 0000000000000004 R09: 0000000000022940
    [ 1092.890929] R10: 0000040cb0666b56 R11: ffff929e5faa20a8 R12: ffff929e5faade78
    [ 1092.898060] R13: ffffffffb59258f8 R14: 000000fe60f3228d R15: 0000000000000000
    [ 1092.905196] ? cpuidle_enter_state+0x92/0x2a0
    [ 1092.909555] do_idle+0x236/0x280
    [ 1092.912785] cpu_startup_entry+0x6f/0x80
    [ 1092.916715] start_secondary+0x1a7/0x200
    [ 1092.920642] secondary_startup_64+0xb7/0xc0
    [...]

    The reason is that the skb list tipc_socket::mc_method.deferredq is
    only initialized for connectionless sockets, while nothing stops
    arriving multicast messages from being filtered by connection oriented
    sockets, with subsequent access to the said list.

    We fix this by initializing the list unconditionally at socket creation.
    This eliminates the crash, while the message is still dropped further
    down in tipc_sk_filter_rcv() as it should be.
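
    The fix amounts to initializing the list head for every socket,
    whatever its type. The sketch below uses a hypothetical minimal
    circular list (struct queue_model) in place of sk_buff_head, and
    hypothetical sock_create_model()/deferred_len() helpers: a zeroed head
    has a NULL next pointer, so walking an uninitialized queue dereferences
    NULL, while a head that points to itself is simply empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Minimal circular list head, modelled on sk_buff_head (hypothetical). */
struct queue_model {
	struct queue_model *next, *prev;
};

static void queue_init(struct queue_model *q)
{
	q->next = q->prev = q;   /* empty list points to itself */
}

static bool queue_empty(const struct queue_model *q)
{
	return q->next == q;
}

struct sock_model {
	bool connectionless;
	struct queue_model deferredq;   /* multicast deferred queue */
};

/* Fixed creation path: the deferred queue is set up for every socket. */
static struct sock_model *sock_create_model(bool connectionless)
{
	struct sock_model *sk = calloc(1, sizeof(*sk));

	if (!sk)
		return NULL;
	sk->connectionless = connectionless;
	queue_init(&sk->deferredq);   /* no longer only if connectionless */
	return sk;
}

/* Stand-in for the filter's list access: walks the next pointers, which
 * would hit NULL on a zeroed, uninitialized head. */
static int deferred_len(const struct sock_model *sk)
{
	int n = 0;

	for (const struct queue_model *p = sk->deferredq.next;
	     p != &sk->deferredq; p = p->next)
		n++;
	return n;
}
```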

    Reported-by: Li Shuang
    Signed-off-by: Jon Maloy
    Reviewed-by: Xin Long
    Signed-off-by: David S. Miller

    Jon Maloy
     

26 Jul, 2019

2 commits

  • In conjunction with changing the interfaces' MTU (especially in the
    case of bonding), where the TIPC links are brought up and down in a
    short time, a couple of issues were detected with the current link
    changeover mechanism:

    1) When one link is up but immediately forced down again, the failover
    procedure will be carried out in order to fail over all the messages
    in the link's transmq queue onto the other working link. The link and
    node state is also set to FAILINGOVER as part of the process. Each
    message will be transmitted in the form of a FAILOVER_MSG, so its size
    grows by 40 bytes (= the message header size). There is no problem if
    the original message size is not larger than the link's MTU - 40, and
    indeed this is the max size of a normal payload message. However, in
    the situation above, because the link has just come up, the messages
    in the link's transmq are mostly SYNCH_MSGs which had been generated by
    the link synching procedure, so their size might already have reached
    the max value! When the FAILOVER_MSG is built on top of such a
    SYNCH_MSG, its size will exceed the link's MTU. As a result, the
    messages are dropped silently and the failover procedure will never
    complete; the link will not be able to exit the FAILINGOVER state, and
    so cannot be re-established.

    2) The same scenario can happen more easily when the MTUs of the links
    are set differently or are being changed. In that case, as long as a
    large message in the failed link's transmq queue was built and
    fragmented according to its own link's MTU, which is larger than the
    other link's MTU, the issue will happen (no prior link synching is
    needed).

    3) The link synching procedure also faces the same issue, but since
    link synching is only started upon receipt of a SYNCH_MSG, dropping
    the message will not result in a state deadlock, though this is not
    the intended behaviour.

    Issues 1) and 3) are resolved by the previous commit, in which only a
    dummy SYNCH_MSG (i.e. without data) is generated at link synching, so
    the size of a FAILOVER_MSG, if any, will never exceed the link's MTU.

    For issue 2), the only solution is to fragment the messages in the
    failed link's transmq queue according to the working link's MTU, so
    that they can be failed over. A new function is added to accomplish
    this: the message is still sent as a TUNNEL PROTOCOL/FAILOVER MSG, but
    if the original message size is too large, it is fragmented, and then
    reassembled at the receiving side.
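
    The number of tunnel packets a message needs follows from simple
    arithmetic. Assuming each FAILOVER_MSG fragment carries only the
    40-byte header mentioned above (the real fragment layout may add
    more), a message of size S over a working link with MTU M needs one
    packet if S <= M - 40, and ceil(S / (M - 40)) packets otherwise.
    failover_fragments() is a hypothetical helper, not the new TIPC
    function.

```c
#include <assert.h>
#include <stddef.h>

#define TNL_HDR 40   /* tunnel (FAILOVER_MSG) header size, per the text */

/* Packets needed to tunnel one msg_size-byte message over a link with
 * MTU mtu, assuming only TNL_HDR per packet. Returns 0 if mtu is too small. */
static size_t failover_fragments(size_t msg_size, size_t mtu)
{
	size_t room;

	if (mtu <= TNL_HDR)
		return 0;
	room = mtu - TNL_HDR;                 /* payload space per packet */
	if (msg_size <= room)
		return 1;                         /* fits: tunnel it whole */
	return (msg_size + room - 1) / room;  /* ceil(msg_size / room) */
}
```

    For example, a 1500-byte SYNCH_MSG built for a 1500-byte MTU no longer
    fits once the 40-byte header is added and needs two packets, while a
    1460-byte message still travels whole.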

    Acked-by: Ying Xue
    Acked-by: Jon Maloy
    Signed-off-by: Tuong Lien
    Signed-off-by: David S. Miller

    Tuong Lien
     
  • This commit, along with the next one, resolves the issues with the
    link changeover mechanism. See that commit for details.

    Basically, for link synching, from now on we will send only one single
    ("dummy") SYNCH message to the peer. The SYNCH message does not contain
    any data, just a header conveying the synch point to the peer.

    A new node capability flag ("TIPC_TUNNEL_ENHANCED") is introduced for
    backward compatibility.
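
    Gating the new behaviour on a peer capability bit keeps mixed-version
    clusters working. In this sketch TUNNEL_ENHANCED_BIT, its value, the
    enum and choose_synch_mode() are hypothetical stand-ins; only the idea
    of testing the peer's advertised capabilities for TIPC_TUNNEL_ENHANCED
    comes from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit value for illustration only; the real constant is
 * defined in the kernel sources. */
#define TUNNEL_ENHANCED_BIT (1u << 9)

enum synch_mode {
	SYNCH_LEGACY_TUNNEL,   /* old peers: SYNCH messages carry data */
	SYNCH_DUMMY,           /* new peers: one header-only SYNCH message */
};

/* Pick the link synching method from the peer's capability bitmap. */
static enum synch_mode choose_synch_mode(uint16_t peer_capabilities)
{
	if (peer_capabilities & TUNNEL_ENHANCED_BIT)
		return SYNCH_DUMMY;
	return SYNCH_LEGACY_TUNNEL;
}
```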

    Acked-by: Ying Xue
    Acked-by: Jon Maloy
    Suggested-by: Jon Maloy
    Signed-off-by: Tuong Lien
    Signed-off-by: David S. Miller

    Tuong Lien
     

22 Jul, 2019

1 commit


20 Jul, 2019

1 commit

  • Pull networking fixes from David Miller:

    1) Fix AF_XDP cq entry leak, from Ilya Maximets.

    2) Fix handling of PHY power-down on RTL8411B, from Heiner Kallweit.

    3) Add some new PCI IDs to iwlwifi, from Ihab Zhaika.

    4) Fix handling of neigh timers wrt. entries added by userspace, from
    Lorenzo Bianconi.

    5) Various cases of missing of_node_put(), from Nishka Dasgupta.

    6) The new NET_ACT_CT needs to depend upon NF_NAT, from Yue Haibing.

    7) Various RDS layer fixes, from Gerd Rausch.

    8) Fix some more fallout from TCQ_F_CAN_BYPASS generalization, from
    Cong Wang.

    9) Fix FIB source validation checks over loopback, also from Cong Wang.

    10) Use promisc for unsupported number of filters, from Justin Chen.

    11) Missing sibling route unlink on failure in ipv6, from Ido Schimmel.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (90 commits)
    tcp: fix tcp_set_congestion_control() use from bpf hook
    ag71xx: fix return value check in ag71xx_probe()
    ag71xx: fix error return code in ag71xx_probe()
    usb: qmi_wwan: add D-Link DWM-222 A2 device ID
    bnxt_en: Fix VNIC accounting when enabling aRFS on 57500 chips.
    net: dsa: sja1105: Fix missing unlock on error in sk_buff()
    gve: replace kfree with kvfree
    selftests/bpf: fix test_xdp_noinline on s390
    selftests/bpf: fix "valid read map access into a read-only array 1" on s390
    net/mlx5: Replace kfree with kvfree
    MAINTAINERS: update netsec driver
    ipv6: Unlink sibling route in case of failure
    liquidio: Replace vmalloc + memset with vzalloc
    udp: Fix typo in net/ipv4/udp.c
    net: bcmgenet: use promisc for unsupported filters
    ipv6: rt6_check should return NULL if 'from' is NULL
    tipc: initialize 'validated' field of received packets
    selftests: add a test case for rp_filter
    fib: relax source validation check for loopback packets
    mlxsw: spectrum: Do not process learned records with a dummy FID
    ...

    Linus Torvalds