06 Mar, 2019

1 commit

  • Fix regression bug introduced in
    commit 365ad353c256 ("tipc: reduce risk of user starvation during link
    congestion")

    Only signal -EDESTADDRREQ for RDM/DGRAM if we don't have a cached
    sockaddr.

    Fixes: 365ad353c256 ("tipc: reduce risk of user starvation during link congestion")
    Signed-off-by: Erik Hugne
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Erik Hugne
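
The rule stated in this commit can be illustrated with a minimal userspace sketch. The function name dgram_dest_check() and its two boolean parameters are hypothetical stand-ins, not the kernel's code; only the error code comes from the commit message.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the send-path check for connectionless
 * (RDM/DGRAM) sockets.
 *   has_dest:   the caller supplied a destination with this send
 *   has_cached: the socket holds a previously cached destination address
 * Returns 0 when a destination is available, -EDESTADDRREQ otherwise.
 */
int dgram_dest_check(bool has_dest, bool has_cached)
{
	if (has_dest || has_cached)
		return 0;
	return -EDESTADDRREQ;
}
```

The point of the sketch is that a missing per-call destination is not an error by itself; it only becomes one when no cached address exists either.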
     

03 Mar, 2019

1 commit


27 Feb, 2019

1 commit

  • When sending multicast messages via blocking socket,
    if sending link is congested (tsk->cong_link_cnt is set to 1),
    the sending thread will be put into sleeping state. However,
    tipc_sk_filter_rcv() is called under socket spin lock but
    tipc_wait_for_cond() is not. So, there is no guarantee that
    the setting of tsk->cong_link_cnt to 0 in tipc_sk_proto_rcv() in
    CPU-1 will be perceived by CPU-0. If it is not, the sending
    thread on CPU-0, after being woken up, will still see
    tsk->cong_link_cnt as 1 and go back to sleep. The sending
    thread will then sleep forever.

    CPU-0                                | CPU-1
    tipc_wait_for_cond()                 |
    {                                    |
     // condition_ = !tsk->cong_link_cnt |
     while ((rc_ = !(condition_))) {     |
      ...                                |
      release_sock(sk_);                 |
      wait_woken();                      |
                                         | if (!sock_owned_by_user(sk))
                                         |  tipc_sk_filter_rcv()
                                         |  {
                                         |   ...
                                         |   tipc_sk_proto_rcv()
                                         |   {
                                         |    ...
                                         |    tsk->cong_link_cnt--;
                                         |    ...
                                         |    sk->sk_write_space(sk);
                                         |    ...
                                         |   }
                                         |   ...
                                         |  }
      sched_annotate_sleep();            |
      lock_sock(sk_);                    |
      remove_wait_queue();               |
     }                                   |
    }                                    |

    This commit fixes it by adding memory barriers to tipc_sk_proto_rcv()
    and tipc_wait_for_cond().

    Acked-by: Jon Maloy
    Signed-off-by: Tung Nguyen
    Signed-off-by: David S. Miller

    Tung Nguyen
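
The visibility problem described in this commit can be sketched in userspace with C11 atomics. This is only an analogue under stated assumptions: struct sock_model, link_uncongested() and still_congested() are hypothetical names, and the release/acquire pair stands in for the kernel memory barriers the commit adds; it is not the patch itself.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the socket's congestion counter. */
struct sock_model {
	atomic_int cong_link_cnt;
};

/*
 * Receive side (CPU-1 in the diagram): one congested link has recovered.
 * The release ordering publishes the decrement before any later wakeup.
 */
void link_uncongested(struct sock_model *sk)
{
	atomic_fetch_sub_explicit(&sk->cong_link_cnt, 1, memory_order_release);
	/* a real implementation would wake the sleeping sender here */
}

/*
 * Send side (CPU-0): the wait condition, re-evaluated after every wakeup.
 * The acquire load pairs with the release in link_uncongested().
 */
bool still_congested(struct sock_model *sk)
{
	return atomic_load_explicit(&sk->cong_link_cnt,
				    memory_order_acquire) != 0;
}
```

A sender loop would call still_congested() each time it is woken and go back to sleep only while it returns true.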
     

25 Feb, 2019

1 commit

  • Three conflicts, one of which, for marvell10g.c is non-trivial and
    requires some follow-up from Heiner or someone else.

    The issue is that Heiner converted the marvell10g driver over to
    use the generic c45 code as much as possible.

    However, in 'net' a bug fix appeared which makes sure that a new
    local mask (MDIO_AN_10GBT_CTRL_ADV_NBT_MASK) with value 0x01e0
    is cleared.

    Signed-off-by: David S. Miller

    David S. Miller
     

22 Feb, 2019

2 commits

  • This commit replaces schedule_timeout() with wait_woken()
    in function tipc_wait_for_rcvmsg(). wait_woken() uses
    memory barriers in its implementation to avoid potential
    race condition when putting a process into sleeping state
    and then waking it up.

    Acked-by: Ying Xue
    Acked-by: Jon Maloy
    Signed-off-by: Tung Nguyen
    Signed-off-by: David S. Miller

    Tung Nguyen
     
  • Commit 844cf763fba6 ("tipc: make macro tipc_wait_for_cond() smp safe")
    replaced finish_wait() with remove_wait_queue() but still used
    prepare_to_wait(). This causes an unnecessary conditional
    check before adding to the wait queue in prepare_to_wait().

    This commit replaces prepare_to_wait() with add_wait_queue(),
    the counterpart of remove_wait_queue().

    Acked-by: Ying Xue
    Acked-by: Jon Maloy
    Signed-off-by: Tung Nguyen
    Signed-off-by: David S. Miller

    Tung Nguyen
     

16 Feb, 2019

1 commit

  • The netfilter conflicts were rather simple overlapping
    changes.

    However, the cls_tcindex.c stuff was a bit more complex.

    On the 'net' side, Cong is fixing several races and memory
    leaks. Whilst on the 'net-next' side we have Vlad adding
    the rtnl-ness support.

    What I've decided to do, in order to resolve this, is revert the
    conversion over to using a workqueue that Cong did, bringing us back
    to pure RCU. I did it this way because I believe that either Cong's
    races don't apply with how Vlad did things, or Cong will have to
    implement the race fix slightly differently.

    Signed-off-by: David S. Miller

    David S. Miller
     

12 Feb, 2019

2 commits

  • When a link endpoint is re-created (e.g. after a node reboot or
    interface reset), its link session number is chosen at random, and
    the peer endpoint is synced with this new session number before the
    link is re-established.

    However, a shortcoming in this mechanism can leave the link unable to
    re-establish, or cause it to fail soon afterwards. This happens when
    the peer endpoint is already in ESTABLISHING state, with 'peer_session'
    and the 'in_session' flag set, and this link endpoint suddenly leaves.
    When it comes back with a random session number, two situations are
    possible:

    1/ If the random session number is larger than (or equal to) the
    previous one, the peer endpoint will be updated with this new session
    upon receipt of a RESET_MSG from this endpoint, and the link can be re-
    established as normal. Otherwise, all the RESET_MSGs from this endpoint
    will be rejected by the peer. In turn, when this link endpoint receives
    one ACTIVATE_MSG from the peer, it will move to ESTABLISHED and start
    to send STATE_MSGs, but again these messages will be dropped by the
    peer due to wrong session.
    The peer link endpoint can still become ESTABLISHED after receiving a
    traffic message from this endpoint (e.g. a BCAST_PROTOCOL or
    NAME_DISTRIBUTOR), but since all the STATE_MSGs are invalid, the link
    will be forced down sooner or later!

    Even in case the random session number is larger than the previous one,
    it can be that the ACTIVATE_MSG from the peer arrives first, and this
    link endpoint moves quickly to ESTABLISHED without sending out any
    RESET_MSG yet. Consequently, the peer link will not be updated with the
    new session number, and the same link failure scenario as above will
    happen.

    2/ Another situation is that the peer link endpoint was reset for
    any reason in the meantime: its link state was set to RESET from
    ESTABLISHING but it is still in session, i.e. the 'in_session' flag
    is not reset...
    Now, if the random session number from this endpoint is less than the
    previous one, all the RESET_MSGs from this endpoint will be rejected by
    the peer. In the other direction, when this link endpoint receives a
    RESET_MSG from the peer, it moves to ESTABLISHING and starts to send
    ACTIVATE_MSGs, but all these messages will be rejected by the peer too.
    As a result, the link cannot be re-established but gets stuck with this
    link endpoint in state ESTABLISHING and the peer in RESET!

    Solution:
    ===========

    This link endpoint should not go directly to ESTABLISHED when it
    receives an ACTIVATE_MSG from the peer, since that message may belong
    to the old session if the link was re-created. To ensure the session is
    correct before the link is re-established, the peer endpoint in
    ESTABLISHING state sends back the last session number in its
    ACTIVATE_MSG for verification at this endpoint. Then, if needed, a new
    and more appropriate session number is generated to force a re-sync
    first.

    In addition, when a link in ESTABLISHING state is reset, its state
    moves to RESET according to the link FSM, and the 'in_session' flag
    (and the other data) is reset as in a normal link reset. The link is
    also deleted if requested.

    The solution is backward compatible.

    Acked-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: Tuong Lien
    Signed-off-by: David S. Miller

    Tuong Lien
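
The pre-fix acceptance rule described in this commit can be modelled with a short sketch. Everything here is a simplification under stated assumptions: struct peer_view and accept_reset() are hypothetical names, the session number is assumed to be 16 bits wide, and counter wrap-around is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the peer endpoint's view of the link session. */
struct peer_view {
	uint16_t peer_session; /* last session number accepted */
	bool in_session;       /* set once a session has been accepted */
};

/*
 * Acceptance rule as described in the text: once in session, a RESET_MSG
 * is accepted only if its session number is not smaller than the stored
 * one. Returns true if the message was accepted.
 */
bool accept_reset(struct peer_view *p, uint16_t session)
{
	if (p->in_session && session < p->peer_session)
		return false;  /* rejected: looks like an old session */
	p->peer_session = session;
	p->in_session = true;
	return true;
}
```

With this rule, an endpoint that restarts and happens to draw a smaller random number has every RESET_MSG rejected, which is the stuck state the commit addresses.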
     
  • When we free the skb in tipc_data_input(), we return 'false'. As a
    result, the freed skb is then passed on to tipc_link_input() by
    tipc_link_rcv():

    tipc_link_rcv()
    ...
        if (!tipc_data_input(l, skb, l->inputq))
            rc |= tipc_link_input(l, skb, l->inputq);

    Fix this by returning 'true' when the skb is freed, so that
    tipc_link_rcv() skips the call to tipc_link_input() in the condition
    above.

    Acked-by: Ying Xue
    Acked-by: Jon Maloy
    Signed-off-by: Hoang Le
    Signed-off-by: David S. Miller

    Hoang Le
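
The ownership convention behind this fix can be shown with a small model. The types and functions below (struct buf, data_input(), link_input(), link_rcv()) are hypothetical stand-ins that mirror the call pattern quoted in the commit message; the 'freed' flag only simulates freeing the skb.

```c
#include <assert.h>
#include <stdbool.h>

enum buf_kind { BUF_DATA, BUF_UNKNOWN, BUF_PROTO };

/* Hypothetical stand-in for an skb; 'freed' models the buffer being freed. */
struct buf {
	enum buf_kind kind;
	bool freed;
};

/*
 * Returns true when the buffer has been consumed (queued or freed),
 * false when the caller still owns it and must process it further.
 */
bool data_input(struct buf *b)
{
	switch (b->kind) {
	case BUF_DATA:
		return true;      /* queued for the socket layer */
	case BUF_UNKNOWN:
		b->freed = true;  /* dropped */
		return true;      /* the fix: report it as consumed */
	default:
		return false;     /* left for link_input() */
	}
}

/* Must never be handed a freed buffer. */
int link_input(struct buf *b)
{
	assert(!b->freed);
	return 1;
}

/* Returns how many times link_input() ran for this buffer (0 or 1). */
int link_rcv(struct buf *b)
{
	int rc = 0;

	if (!data_input(b))
		rc |= link_input(b);
	return rc;
}
```

If the BUF_UNKNOWN branch returned false instead, link_rcv() would hand the freed buffer to link_input() and trip the assertion, which corresponds to the bug described.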
     

25 Jan, 2019

1 commit


24 Jan, 2019

1 commit

  • In preparation to enabling -Wimplicit-fallthrough, mark switch cases
    where we are expecting to fall through.

    This patch fixes the following warnings:

    net/tipc/link.c:1125:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
    net/tipc/socket.c:736:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
    net/tipc/socket.c:2418:7: warning: this statement may fall through [-Wimplicit-fallthrough=]

    Warning level 3 was used: -Wimplicit-fallthrough=3

    This patch is part of the ongoing efforts to enabling
    -Wimplicit-fallthrough.

    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: David S. Miller

    Gustavo A. R. Silva
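
As an illustration of the kind of annotation involved, the sketch below marks an intentional fall-through with a comment of the form GCC accepts at -Wimplicit-fallthrough=3. The handler and its types are hypothetical and unrelated to the TIPC sources.

```c
/* Hypothetical events: a RESET does everything a STOP does, plus
 * clearing the counters first, so the fall-through is intentional. */
enum event { EV_RESET, EV_STOP, EV_OTHER };

struct state {
	int counters;
	int running;
};

void handle_event(struct state *s, enum event ev)
{
	switch (ev) {
	case EV_RESET:
		s->counters = 0;
		/* fall through */
	case EV_STOP:
		s->running = 0;
		break;
	default:
		break;
	}
}
```

Without the comment, compiling this with -Wimplicit-fallthrough=3 would be expected to produce a "this statement may fall through" warning at the EV_RESET case.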
     

22 Jan, 2019

1 commit


18 Jan, 2019

1 commit


16 Jan, 2019

6 commits

  • BUG: KMSAN: uninit-value in tipc_nl_compat_doit+0x404/0xa10 net/tipc/netlink_compat.c:335
    CPU: 0 PID: 4514 Comm: syz-executor485 Not tainted 4.16.0+ #87
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:17 [inline]
    dump_stack+0x185/0x1d0 lib/dump_stack.c:53
    kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
    __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683
    tipc_nl_compat_doit+0x404/0xa10 net/tipc/netlink_compat.c:335
    tipc_nl_compat_recv+0x164b/0x2700 net/tipc/netlink_compat.c:1153
    genl_family_rcv_msg net/netlink/genetlink.c:599 [inline]
    genl_rcv_msg+0x1686/0x1810 net/netlink/genetlink.c:624
    netlink_rcv_skb+0x378/0x600 net/netlink/af_netlink.c:2447
    genl_rcv+0x63/0x80 net/netlink/genetlink.c:635
    netlink_unicast_kernel net/netlink/af_netlink.c:1311 [inline]
    netlink_unicast+0x166b/0x1740 net/netlink/af_netlink.c:1337
    netlink_sendmsg+0x1048/0x1310 net/netlink/af_netlink.c:1900
    sock_sendmsg_nosec net/socket.c:630 [inline]
    sock_sendmsg net/socket.c:640 [inline]
    ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046
    __sys_sendmsg net/socket.c:2080 [inline]
    SYSC_sendmsg+0x2a3/0x3d0 net/socket.c:2091
    SyS_sendmsg+0x54/0x80 net/socket.c:2087
    do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2
    RIP: 0033:0x43fda9
    RSP: 002b:00007ffd0c184ba8 EFLAGS: 00000213 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fda9
    RDX: 0000000000000000 RSI: 0000000020023000 RDI: 0000000000000003
    RBP: 00000000006ca018 R08: 00000000004002c8 R09: 00000000004002c8
    R10: 00000000004002c8 R11: 0000000000000213 R12: 00000000004016d0
    R13: 0000000000401760 R14: 0000000000000000 R15: 0000000000000000

    Uninit was created at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
    kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
    kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
    kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321
    slab_post_alloc_hook mm/slab.h:445 [inline]
    slab_alloc_node mm/slub.c:2737 [inline]
    __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369
    __kmalloc_reserve net/core/skbuff.c:138 [inline]
    __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206
    alloc_skb include/linux/skbuff.h:984 [inline]
    netlink_alloc_large_skb net/netlink/af_netlink.c:1183 [inline]
    netlink_sendmsg+0x9a6/0x1310 net/netlink/af_netlink.c:1875
    sock_sendmsg_nosec net/socket.c:630 [inline]
    sock_sendmsg net/socket.c:640 [inline]
    ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046
    __sys_sendmsg net/socket.c:2080 [inline]
    SYSC_sendmsg+0x2a3/0x3d0 net/socket.c:2091
    SyS_sendmsg+0x54/0x80 net/socket.c:2087
    do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2

    In tipc_nl_compat_recv(), when the len variable returned by
    nlmsg_attrlen() is 0, the message is still treated as a valid one,
    which is obviously unreasonable. When len is zero, the message
    contains neither a valid TLV payload nor a TLV header. In this
    situation, the tlv_type field in the TLV header is still accessed in
    tipc_nl_compat_dumpit() or tipc_nl_compat_doit(), even though that
    field lies outside the received data and is therefore uninitialized.

    Reported-by: syzbot+bca0dc46634781f08b38@syzkaller.appspotmail.com
    Reported-by: syzbot+6bdb590321a7ae40c1a6@syzkaller.appspotmail.com
    Signed-off-by: Ying Xue
    Signed-off-by: David S. Miller

    Ying Xue
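
The missing precondition can be sketched as follows. The header layout and both function names are hypothetical, and byte-order conversion is left out; the only point taken from the commit is that the type field must not be read unless the received length covers the header.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical TLV header layout: a length field followed by a type field. */
struct tlv_hdr {
	uint16_t tlv_len;
	uint16_t tlv_type;
};

/*
 * True only if 'len' bytes are enough to hold a TLV header, i.e. reading
 * tlv_type would stay inside the received data.
 */
bool tlv_header_present(size_t len)
{
	return len >= sizeof(struct tlv_hdr);
}

/* Returns the TLV type, or -1 if the request is too short to carry one. */
int tlv_type_checked(const struct tlv_hdr *hdr, size_t len)
{
	if (!tlv_header_present(len))
		return -1;
	return hdr->tlv_type;
}
```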
     
  • syzbot reported:

    BUG: KMSAN: uninit-value in __arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline]
    BUG: KMSAN: uninit-value in __fswab32 include/uapi/linux/swab.h:59 [inline]
    BUG: KMSAN: uninit-value in tipc_nl_compat_name_table_dump+0x4a8/0xba0 net/tipc/netlink_compat.c:826
    CPU: 0 PID: 6290 Comm: syz-executor848 Not tainted 4.19.0-rc8+ #70
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x306/0x460 lib/dump_stack.c:113
    kmsan_report+0x1a2/0x2e0 mm/kmsan/kmsan.c:917
    __msan_warning+0x7c/0xe0 mm/kmsan/kmsan_instr.c:500
    __arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline]
    __fswab32 include/uapi/linux/swab.h:59 [inline]
    tipc_nl_compat_name_table_dump+0x4a8/0xba0 net/tipc/netlink_compat.c:826
    __tipc_nl_compat_dumpit+0x59e/0xdb0 net/tipc/netlink_compat.c:205
    tipc_nl_compat_dumpit+0x63a/0x820 net/tipc/netlink_compat.c:270
    tipc_nl_compat_handle net/tipc/netlink_compat.c:1151 [inline]
    tipc_nl_compat_recv+0x1402/0x2760 net/tipc/netlink_compat.c:1210
    genl_family_rcv_msg net/netlink/genetlink.c:601 [inline]
    genl_rcv_msg+0x185c/0x1a20 net/netlink/genetlink.c:626
    netlink_rcv_skb+0x394/0x640 net/netlink/af_netlink.c:2454
    genl_rcv+0x63/0x80 net/netlink/genetlink.c:637
    netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
    netlink_unicast+0x166d/0x1720 net/netlink/af_netlink.c:1343
    netlink_sendmsg+0x1391/0x1420 net/netlink/af_netlink.c:1908
    sock_sendmsg_nosec net/socket.c:621 [inline]
    sock_sendmsg net/socket.c:631 [inline]
    ___sys_sendmsg+0xe47/0x1200 net/socket.c:2116
    __sys_sendmsg net/socket.c:2154 [inline]
    __do_sys_sendmsg net/socket.c:2163 [inline]
    __se_sys_sendmsg+0x307/0x460 net/socket.c:2161
    __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2161
    do_syscall_64+0xbe/0x100 arch/x86/entry/common.c:291
    entry_SYSCALL_64_after_hwframe+0x63/0xe7
    RIP: 0033:0x440179
    Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007ffecec49318 EFLAGS: 00000213 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440179
    RDX: 0000000000000000 RSI: 0000000020000100 RDI: 0000000000000003
    RBP: 00000000006ca018 R08: 0000000000000000 R09: 00000000004002c8
    R10: 0000000000000000 R11: 0000000000000213 R12: 0000000000401a00
    R13: 0000000000401a90 R14: 0000000000000000 R15: 0000000000000000

    Uninit was created at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:255 [inline]
    kmsan_internal_poison_shadow+0xc8/0x1d0 mm/kmsan/kmsan.c:180
    kmsan_kmalloc+0xa4/0x120 mm/kmsan/kmsan_hooks.c:104
    kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan_hooks.c:113
    slab_post_alloc_hook mm/slab.h:446 [inline]
    slab_alloc_node mm/slub.c:2727 [inline]
    __kmalloc_node_track_caller+0xb43/0x1400 mm/slub.c:4360
    __kmalloc_reserve net/core/skbuff.c:138 [inline]
    __alloc_skb+0x422/0xe90 net/core/skbuff.c:206
    alloc_skb include/linux/skbuff.h:996 [inline]
    netlink_alloc_large_skb net/netlink/af_netlink.c:1189 [inline]
    netlink_sendmsg+0xcaf/0x1420 net/netlink/af_netlink.c:1883
    sock_sendmsg_nosec net/socket.c:621 [inline]
    sock_sendmsg net/socket.c:631 [inline]
    ___sys_sendmsg+0xe47/0x1200 net/socket.c:2116
    __sys_sendmsg net/socket.c:2154 [inline]
    __do_sys_sendmsg net/socket.c:2163 [inline]
    __se_sys_sendmsg+0x307/0x460 net/socket.c:2161
    __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2161
    do_syscall_64+0xbe/0x100 arch/x86/entry/common.c:291
    entry_SYSCALL_64_after_hwframe+0x63/0xe7

    We cannot assume that the length of the data contained in the TLV
    is at least the size of struct tipc_name_table_query in
    tipc_nl_compat_name_table_dump().

    Reported-by: syzbot+06e771a754829716a327@syzkaller.appspotmail.com
    Signed-off-by: Ying Xue
    Signed-off-by: David S. Miller

    Ying Xue
     
  • syzbot reports following splat:

    BUG: KMSAN: uninit-value in strlen+0x3b/0xa0 lib/string.c:486
    CPU: 1 PID: 9306 Comm: syz-executor172 Not tainted 4.20.0-rc7+ #2
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x173/0x1d0 lib/dump_stack.c:113
    kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613
    __msan_warning+0x82/0xf0 mm/kmsan/kmsan_instr.c:313
    strlen+0x3b/0xa0 lib/string.c:486
    nla_put_string include/net/netlink.h:1154 [inline]
    __tipc_nl_compat_link_set net/tipc/netlink_compat.c:708 [inline]
    tipc_nl_compat_link_set+0x929/0x1220 net/tipc/netlink_compat.c:744
    __tipc_nl_compat_doit net/tipc/netlink_compat.c:311 [inline]
    tipc_nl_compat_doit+0x3aa/0xaf0 net/tipc/netlink_compat.c:344
    tipc_nl_compat_handle net/tipc/netlink_compat.c:1107 [inline]
    tipc_nl_compat_recv+0x14d7/0x2760 net/tipc/netlink_compat.c:1210
    genl_family_rcv_msg net/netlink/genetlink.c:601 [inline]
    genl_rcv_msg+0x185f/0x1a60 net/netlink/genetlink.c:626
    netlink_rcv_skb+0x444/0x640 net/netlink/af_netlink.c:2477
    genl_rcv+0x63/0x80 net/netlink/genetlink.c:637
    netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
    netlink_unicast+0xf40/0x1020 net/netlink/af_netlink.c:1336
    netlink_sendmsg+0x127f/0x1300 net/netlink/af_netlink.c:1917
    sock_sendmsg_nosec net/socket.c:621 [inline]
    sock_sendmsg net/socket.c:631 [inline]
    ___sys_sendmsg+0xdb9/0x11b0 net/socket.c:2116
    __sys_sendmsg net/socket.c:2154 [inline]
    __do_sys_sendmsg net/socket.c:2163 [inline]
    __se_sys_sendmsg+0x305/0x460 net/socket.c:2161
    __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2161
    do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
    entry_SYSCALL_64_after_hwframe+0x63/0xe7

    The uninitialised access happened in
    nla_put_string(skb, TIPC_NLA_LINK_NAME, lc->name)

    This is because lc->name string is not validated before it's used.

    Reported-by: syzbot+d78b8a29241a195aefb8@syzkaller.appspotmail.com
    Signed-off-by: Ying Xue
    Signed-off-by: David S. Miller

    Ying Xue
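
One way to validate such a name before calling strlen() on it is to check for a terminator within the received length. The helper name below is hypothetical and the sketch is not presented as the actual patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * True if the first 'len' bytes of 's' contain a NUL terminator, so that
 * strlen() on 's' cannot run past the received data.
 */
bool name_is_terminated(const char *s, size_t len)
{
	return memchr(s, '\0', len) != NULL;
}
```

A caller would reject the request when this returns false, instead of passing the name on to a function such as nla_put_string().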
     
  • syzbot reported:

    BUG: KMSAN: uninit-value in strlen+0x3b/0xa0 lib/string.c:484
    CPU: 1 PID: 6371 Comm: syz-executor652 Not tainted 4.19.0-rc8+ #70
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x306/0x460 lib/dump_stack.c:113
    kmsan_report+0x1a2/0x2e0 mm/kmsan/kmsan.c:917
    __msan_warning+0x7c/0xe0 mm/kmsan/kmsan_instr.c:500
    strlen+0x3b/0xa0 lib/string.c:484
    nla_put_string include/net/netlink.h:1011 [inline]
    tipc_nl_compat_bearer_enable+0x238/0x7b0 net/tipc/netlink_compat.c:389
    __tipc_nl_compat_doit net/tipc/netlink_compat.c:311 [inline]
    tipc_nl_compat_doit+0x39f/0xae0 net/tipc/netlink_compat.c:344
    tipc_nl_compat_recv+0x147c/0x2760 net/tipc/netlink_compat.c:1107
    genl_family_rcv_msg net/netlink/genetlink.c:601 [inline]
    genl_rcv_msg+0x185c/0x1a20 net/netlink/genetlink.c:626
    netlink_rcv_skb+0x394/0x640 net/netlink/af_netlink.c:2454
    genl_rcv+0x63/0x80 net/netlink/genetlink.c:637
    netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
    netlink_unicast+0x166d/0x1720 net/netlink/af_netlink.c:1343
    netlink_sendmsg+0x1391/0x1420 net/netlink/af_netlink.c:1908
    sock_sendmsg_nosec net/socket.c:621 [inline]
    sock_sendmsg net/socket.c:631 [inline]
    ___sys_sendmsg+0xe47/0x1200 net/socket.c:2116
    __sys_sendmsg net/socket.c:2154 [inline]
    __do_sys_sendmsg net/socket.c:2163 [inline]
    __se_sys_sendmsg+0x307/0x460 net/socket.c:2161
    __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2161
    do_syscall_64+0xbe/0x100 arch/x86/entry/common.c:291
    entry_SYSCALL_64_after_hwframe+0x63/0xe7
    RIP: 0033:0x440179
    Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007fffef7beee8 EFLAGS: 00000213 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440179
    RDX: 0000000000000000 RSI: 0000000020000100 RDI: 0000000000000003
    RBP: 00000000006ca018 R08: 0000000000000000 R09: 00000000004002c8
    R10: 0000000000000000 R11: 0000000000000213 R12: 0000000000401a00
    R13: 0000000000401a90 R14: 0000000000000000 R15: 0000000000000000

    Uninit was created at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:255 [inline]
    kmsan_internal_poison_shadow+0xc8/0x1d0 mm/kmsan/kmsan.c:180
    kmsan_kmalloc+0xa4/0x120 mm/kmsan/kmsan_hooks.c:104
    kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan_hooks.c:113
    slab_post_alloc_hook mm/slab.h:446 [inline]
    slab_alloc_node mm/slub.c:2727 [inline]
    __kmalloc_node_track_caller+0xb43/0x1400 mm/slub.c:4360
    __kmalloc_reserve net/core/skbuff.c:138 [inline]
    __alloc_skb+0x422/0xe90 net/core/skbuff.c:206
    alloc_skb include/linux/skbuff.h:996 [inline]
    netlink_alloc_large_skb net/netlink/af_netlink.c:1189 [inline]
    netlink_sendmsg+0xcaf/0x1420 net/netlink/af_netlink.c:1883
    sock_sendmsg_nosec net/socket.c:621 [inline]
    sock_sendmsg net/socket.c:631 [inline]
    ___sys_sendmsg+0xe47/0x1200 net/socket.c:2116
    __sys_sendmsg net/socket.c:2154 [inline]
    __do_sys_sendmsg net/socket.c:2163 [inline]
    __se_sys_sendmsg+0x307/0x460 net/socket.c:2161
    __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2161
    do_syscall_64+0xbe/0x100 arch/x86/entry/common.c:291
    entry_SYSCALL_64_after_hwframe+0x63/0xe7

    The root cause is that we don't validate whether the bearer name is
    a valid string in tipc_nl_compat_bearer_enable().

    Meanwhile, we also fix the same issue in the following functions:
    tipc_nl_compat_bearer_disable()
    tipc_nl_compat_link_stat_dump()
    tipc_nl_compat_media_set()
    tipc_nl_compat_bearer_set()

    Reported-by: syzbot+b33d5cae0efd35dbfe77@syzkaller.appspotmail.com
    Signed-off-by: Ying Xue
    Signed-off-by: David S. Miller

    Ying Xue
     
  • syzbot reports following splat:

    BUG: KMSAN: uninit-value in strlen+0x3b/0xa0 lib/string.c:486
    CPU: 1 PID: 11057 Comm: syz-executor0 Not tainted 4.20.0-rc7+ #2
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x173/0x1d0 lib/dump_stack.c:113
    kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613
    __msan_warning+0x82/0xf0 mm/kmsan/kmsan_instr.c:295
    strlen+0x3b/0xa0 lib/string.c:486
    nla_put_string include/net/netlink.h:1154 [inline]
    tipc_nl_compat_link_reset_stats+0x1f0/0x360 net/tipc/netlink_compat.c:760
    __tipc_nl_compat_doit net/tipc/netlink_compat.c:311 [inline]
    tipc_nl_compat_doit+0x3aa/0xaf0 net/tipc/netlink_compat.c:344
    tipc_nl_compat_handle net/tipc/netlink_compat.c:1107 [inline]
    tipc_nl_compat_recv+0x14d7/0x2760 net/tipc/netlink_compat.c:1210
    genl_family_rcv_msg net/netlink/genetlink.c:601 [inline]
    genl_rcv_msg+0x185f/0x1a60 net/netlink/genetlink.c:626
    netlink_rcv_skb+0x444/0x640 net/netlink/af_netlink.c:2477
    genl_rcv+0x63/0x80 net/netlink/genetlink.c:637
    netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
    netlink_unicast+0xf40/0x1020 net/netlink/af_netlink.c:1336
    netlink_sendmsg+0x127f/0x1300 net/netlink/af_netlink.c:1917
    sock_sendmsg_nosec net/socket.c:621 [inline]
    sock_sendmsg net/socket.c:631 [inline]
    ___sys_sendmsg+0xdb9/0x11b0 net/socket.c:2116
    __sys_sendmsg net/socket.c:2154 [inline]
    __do_sys_sendmsg net/socket.c:2163 [inline]
    __se_sys_sendmsg+0x305/0x460 net/socket.c:2161
    __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2161
    do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
    entry_SYSCALL_64_after_hwframe+0x63/0xe7
    RIP: 0033:0x457ec9
    Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007f2557338c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457ec9
    RDX: 0000000000000000 RSI: 00000000200001c0 RDI: 0000000000000003
    RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00007f25573396d4
    R13: 00000000004cb478 R14: 00000000004d86c8 R15: 00000000ffffffff

    Uninit was created at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline]
    kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:158
    kmsan_kmalloc+0xa6/0x130 mm/kmsan/kmsan_hooks.c:176
    kmsan_slab_alloc+0xe/0x10 mm/kmsan/kmsan_hooks.c:185
    slab_post_alloc_hook mm/slab.h:446 [inline]
    slab_alloc_node mm/slub.c:2759 [inline]
    __kmalloc_node_track_caller+0xe18/0x1030 mm/slub.c:4383
    __kmalloc_reserve net/core/skbuff.c:137 [inline]
    __alloc_skb+0x309/0xa20 net/core/skbuff.c:205
    alloc_skb include/linux/skbuff.h:998 [inline]
    netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
    netlink_sendmsg+0xb82/0x1300 net/netlink/af_netlink.c:1892
    sock_sendmsg_nosec net/socket.c:621 [inline]
    sock_sendmsg net/socket.c:631 [inline]
    ___sys_sendmsg+0xdb9/0x11b0 net/socket.c:2116
    __sys_sendmsg net/socket.c:2154 [inline]
    __do_sys_sendmsg net/socket.c:2163 [inline]
    __se_sys_sendmsg+0x305/0x460 net/socket.c:2161
    __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2161
    do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
    entry_SYSCALL_64_after_hwframe+0x63/0xe7

    The uninitialised access happened in tipc_nl_compat_link_reset_stats:
    nla_put_string(skb, TIPC_NLA_LINK_NAME, name)

    This is because name string is not validated before it's used.

    Reported-by: syzbot+e01d94b5a4c266be6e4c@syzkaller.appspotmail.com
    Signed-off-by: Ying Xue
    Signed-off-by: David S. Miller

    Ying Xue
     
  • syzbot reported:

    BUG: KMSAN: uninit-value in tipc_conn_rcv_sub+0x184/0x950 net/tipc/topsrv.c:373
    CPU: 0 PID: 66 Comm: kworker/u4:4 Not tainted 4.17.0-rc3+ #88
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Workqueue: tipc_rcv tipc_conn_recv_work
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x185/0x1d0 lib/dump_stack.c:113
    kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
    __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683
    tipc_conn_rcv_sub+0x184/0x950 net/tipc/topsrv.c:373
    tipc_conn_rcv_from_sock net/tipc/topsrv.c:409 [inline]
    tipc_conn_recv_work+0x3cd/0x560 net/tipc/topsrv.c:424
    process_one_work+0x12c6/0x1f60 kernel/workqueue.c:2145
    worker_thread+0x113c/0x24f0 kernel/workqueue.c:2279
    kthread+0x539/0x720 kernel/kthread.c:239
    ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:412

    Local variable description: ----s.i@tipc_conn_recv_work
    Variable was created at:
    tipc_conn_recv_work+0x65/0x560 net/tipc/topsrv.c:419
    process_one_work+0x12c6/0x1f60 kernel/workqueue.c:2145

    In tipc_conn_rcv_from_sock(), it always supposes the length of message
    received from sock_recvmsg() is not smaller than the size of struct
    tipc_subscr. However, this assumption is false. Especially when the
    length of received message is shorter than struct tipc_subscr size,
    we will end up touching uninitialized fields in tipc_conn_rcv_sub().

    Reported-by: syzbot+8951a3065ee7fd6d6e23@syzkaller.appspotmail.com
    Reported-by: syzbot+75e6e042c5bbf691fc82@syzkaller.appspotmail.com
    Signed-off-by: Ying Xue
    Signed-off-by: David S. Miller

    Ying Xue
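
The short-read case can be sketched like this. struct subscr_model is a hypothetical fixed-size record whose fields do not match the real struct tipc_subscr, and subscr_complete() is an illustrative name.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical fixed-size request record (16 bytes). */
struct subscr_model {
	uint32_t type;
	uint32_t lower;
	uint32_t upper;
	uint32_t timeout;
};

/*
 * 'received' is the byte count returned by a recv-style call (negative
 * on error). Only a complete record may be handed on for parsing.
 */
bool subscr_complete(ssize_t received)
{
	return received == (ssize_t)sizeof(struct subscr_model);
}
```

Treating anything other than a full record as invalid keeps the parser from reading fields that were never written by the receive call.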
     

08 Jan, 2019

1 commit


28 Dec, 2018

2 commits


25 Dec, 2018

2 commits


21 Dec, 2018

1 commit


20 Dec, 2018

6 commits

  • When sending broadcast messages on a heavily loaded system, there
    are a lot of unnecessary packet retransmissions. The issue was caused
    by the retransmission criteria not being initialized.

    To prevent this, initialize the retransmission criteria to the next
    10 milliseconds.

    Fixes: 31c4f4cc32f7 ("tipc: improve broadcast retransmission algorithm")
    Acked-by: Ying Xue
    Acked-by: Jon Maloy
    Signed-off-by: Hoang Le
    Signed-off-by: David S. Miller

    Hoang Le
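
The idea of arming the retransmission criterion at first send can be sketched with plain millisecond timestamps. The struct, the function names and the use of milliseconds instead of kernel time units are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define RETR_INTERVAL_MS 10u /* the "next 10 milliseconds" from the text */

/* Hypothetical per-packet bookkeeping. */
struct pkt_model {
	uint64_t next_retr_ms; /* earliest time a retransmission is allowed */
};

/*
 * Called when the packet is first sent: arm the retransmission criterion
 * immediately instead of leaving it unset.
 */
void pkt_sent(struct pkt_model *p, uint64_t now_ms)
{
	p->next_retr_ms = now_ms + RETR_INTERVAL_MS;
}

/* Returns true and re-arms the timer if a retransmission is allowed now. */
bool may_retransmit(struct pkt_model *p, uint64_t now_ms)
{
	if (now_ms < p->next_retr_ms)
		return false;
	p->next_retr_ms = now_ms + RETR_INTERVAL_MS;
	return true;
}
```

In this model, a retransmission request arriving within 10 ms of the original send is ignored rather than served at once.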
     
  • The commit adds the new trace_event for TIPC bearer, L2 device event:

    trace_tipc_l2_device_event()

    Also, it puts the trace in the tipc_l2_device_event() function, so
    that the device/bearer events and related info can be traced at
    runtime when needed.

    Acked-by: Ying Xue
    Tested-by: Ying Xue
    Acked-by: Jon Maloy
    Signed-off-by: Tuong Lien
    Signed-off-by: David S. Miller

    Tuong Lien
     
  • The commit adds the new trace_events for TIPC node object:

    trace_tipc_node_create()
    trace_tipc_node_delete()
    trace_tipc_node_lost_contact()
    trace_tipc_node_timeout()
    trace_tipc_node_link_up()
    trace_tipc_node_link_down()
    trace_tipc_node_reset_links()
    trace_tipc_node_fsm_evt()
    trace_tipc_node_check_state()

    Also, enables the traces for the following cases:
    - When a node is created/deleted;
    - When a node contact is lost;
    - When a node timer is timed out;
    - When a node link is up/down;
    - When all node links are reset;
    - When node state is changed;
    - When a skb comes and node state needs to be checked/updated.

    Acked-by: Ying Xue
    Tested-by: Ying Xue
    Acked-by: Jon Maloy
    Signed-off-by: Tuong Lien
    Signed-off-by: David S. Miller

    Tuong Lien
     
  • The commit adds the new trace_events for TIPC socket object:

    trace_tipc_sk_create()
    trace_tipc_sk_poll()
    trace_tipc_sk_sendmsg()
    trace_tipc_sk_sendmcast()
    trace_tipc_sk_sendstream()
    trace_tipc_sk_filter_rcv()
    trace_tipc_sk_advance_rx()
    trace_tipc_sk_rej_msg()
    trace_tipc_sk_drop_msg()
    trace_tipc_sk_release()
    trace_tipc_sk_shutdown()
    trace_tipc_sk_overlimit1()
    trace_tipc_sk_overlimit2()

    Also, enables the traces for the following cases:
    - When user creates a TIPC socket;
    - When user calls poll() on TIPC socket;
    - When user sends a dgram/mcast/stream message;
    - When a message is put into the socket 'sk_receive_queue';
    - When a message is released from the socket 'sk_receive_queue';
    - When a message is rejected (e.g. due to no port, invalid, etc.);
    - When a message is dropped (e.g. due to wrong message type);
    - When socket is released;
    - When socket is shutdown;
    - When socket rcvq's allocation is overlimit (> 90%);
    - When socket rcvq + bklq's allocation is overlimit (> 90%);
    - When the 'TIPC_ERR_OVERLOAD/2' issue happens;

    Note:
    a) All the socket traces are designed to be able to trace on a specific
    socket by either using the 'event filtering' feature on a known socket
    'portid' value or the sysctl file:

    /proc/sys/net/tipc/sk_filter

    The file determines a 'tuple' for what socket should be traced:

    (portid, sock type, name type, name lower, name upper)

    where:
    + 'portid' is the socket portid generated at socket creation, which can
    be found in the trace outputs or the 'tipc socket list' command printouts;
    + 'sock type' is the socket type (1 = SOCK_STREAM, ...);
    + 'name type', 'name lower' and 'name upper' are the service name being
    connected to or published by the socket.

    Value '0' means 'ANY'. The default tuple value is (0, 0, 0, 0, 0), i.e.
    the traces happen for every socket with no filter.
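
    The 'ANY' semantics of the tuple can be sketched as below. This is a
    minimal illustration only, not the code from the commit: the struct and
    function names are hypothetical, and per-field equality is an assumption
    (the real filter may treat the name range differently).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the five-field filter tuple; 0 means ANY. */
struct sk_filter_tuple {
    uint32_t portid;
    uint32_t sock_type;
    uint32_t name_type;
    uint32_t name_lower;
    uint32_t name_upper;
};

/* A filter field matches when it is 0 (ANY) or equals the socket's value. */
static bool field_match(uint32_t filter, uint32_t value)
{
    return filter == 0 || filter == value;
}

/* Returns true when the socket described by 'sk' passes filter 'f'. */
static bool sk_filter_match(const struct sk_filter_tuple *f,
                            const struct sk_filter_tuple *sk)
{
    return field_match(f->portid, sk->portid) &&
           field_match(f->sock_type, sk->sock_type) &&
           field_match(f->name_type, sk->name_type) &&
           field_match(f->name_lower, sk->name_lower) &&
           field_match(f->name_upper, sk->name_upper);
}
```

    With the default tuple (0, 0, 0, 0, 0) every field matches, so every
    socket is traced.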

    b) The 'tipc_sk_overlimit1/2' event is also a conditional trace_event
    which happens when the socket receive queue (and backlog queue) is
    about to be overloaded, when the queue allocation is > 90%. Then, when
    the trace is enabled, the last skbs leading to the TIPC_ERR_OVERLOAD/2
    issue can be traced.

    The trace event is designed as an 'upper watermark' notification, so
    that other traces (e.g. 'tipc_sk_advance_rx' vs 'tipc_sk_filter_rcv') or
    actions can be triggered in the meantime to see what is going on with
    the socket queue.

    In addition, the 'trace_tipc_sk_dump()' is also placed at the
    'TIPC_ERR_OVERLOAD/2' case, so the socket and last skb can be dumped
    for post-analysis.
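
    The '> 90%' watermark condition can be sketched as a simple integer
    check. The function name and parameters below are hypothetical and only
    illustrate the threshold, not how the socket code computes its queue
    size or limit.

```c
#include <stdbool.h>

/* True when 'used' exceeds 90% of 'limit'; widened to avoid overflow. */
static bool queue_overlimit(unsigned int used, unsigned int limit)
{
    return used > (unsigned long long)limit * 90 / 100;
}
```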

    Acked-by: Ying Xue
    Tested-by: Ying Xue
    Acked-by: Jon Maloy
    Signed-off-by: Tuong Lien
    Signed-off-by: David S. Miller

    Tuong Lien
     
  • The commit adds the new trace_events for TIPC link object:

    trace_tipc_link_timeout()
    trace_tipc_link_fsm()
    trace_tipc_link_reset()
    trace_tipc_link_too_silent()
    trace_tipc_link_retrans()
    trace_tipc_link_bc_ack()
    trace_tipc_link_conges()

    And the traces for PROTOCOL messages at building and receiving:

    trace_tipc_proto_build()
    trace_tipc_proto_rcv()

    Note:
    a) The 'tipc_link_too_silent' event will only happen when the
    'silent_intv_cnt' is about to reach the 'abort_limit' value (and the
    event is enabled). The benefit of this kind of event is that we get
    an early indication of a TIPC link loss due to timeout, and can then
    take the necessary troubleshooting actions.

    For example: To trigger the 'tipc_proto_rcv' when the 'too_silent'
    event occurs:

    echo 'enable_event:tipc:tipc_proto_rcv' > \
    events/tipc/tipc_link_too_silent/trigger

    And disable it when TIPC link is reset:

    echo 'disable_event:tipc:tipc_proto_rcv' > \
    events/tipc/tipc_link_reset/trigger
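
    The 'about to reach' condition can be pictured as a margin check. This
    is a sketch under assumptions: the 'margin' parameter is hypothetical,
    and the commit text does not state how close to 'abort_limit' the
    counter must be.

```c
#include <stdbool.h>

/*
 * True when the silent-interval counter is within 'margin' intervals of
 * exceeding the abort limit. 'margin' is an illustrative parameter.
 */
static bool link_too_silent(unsigned int silent_intv_cnt,
                            unsigned int abort_limit,
                            unsigned int margin)
{
    return silent_intv_cnt + margin > abort_limit;
}
```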

    b) The 'tipc_link_retrans' or 'tipc_link_bc_ack' event is useful to
    trace TIPC retransmission issues.

    In addition, the commit adds the 'trace_tipc_list/link_dump()' at the
    'retransmission failure' case. Then, if the issue occurs, the link
    'transmq' along with the link data can be dumped for post-analysis.
    These dump events should be enabled by default since they only take
    effect when the failure happens.

    The same approach is also applied to the faulty case where the
    validation of a protocol message fails.

    Acked-by: Ying Xue
    Tested-by: Ying Xue
    Acked-by: Jon Maloy
    Signed-off-by: Tuong Lien
    Signed-off-by: David S. Miller

    Tuong Lien
     
  • For the sake of debugging/tracing, the commit enables tracepoints in
    TIPC along with some general trace_events as shown below. It also
    defines some 'tipc_*_dump()' functions that allow dumping TIPC object
    data whenever needed, that is, for general debug purposes, i.e. not just
    for the trace_events.

    The following trace_events are now available:

    - trace_tipc_skb_dump(): allows to trace and dump TIPC msg & skb data,
    e.g. message type, user, droppable, skb truesize, cloned skb, etc.

    - trace_tipc_list_dump(): allows to trace and dump any TIPC buffers or
    queues, e.g. TIPC link transmq, socket receive queue, etc.

    - trace_tipc_sk_dump(): allows to trace and dump TIPC socket data, e.g.
    sk state, sk type, connection type, rmem_alloc, socket queues, etc.

    - trace_tipc_link_dump(): allows to trace and dump TIPC link data, e.g.
    link state, silent_intv_cnt, gap, bc_gap, link queues, etc.

    - trace_tipc_node_dump(): allows to trace and dump TIPC node data, e.g.
    node state, active links, capabilities, link entries, etc.

    How to use:
    Put the trace functions at any places where we want to dump TIPC data
    or events.

    Note:
    a) The dump functions generate raw data only, in order to offload the
    trace event's processing. Parsing the data may require a tool or
    script, but this should be simple.

    b) The trace_tipc_*_dump() events should be reserved for failure cases
    only (e.g. the retransmission failure case) or for conditions that we
    do not expect to happen often. We can then consider enabling these
    events by default, since they have almost no effect under normal
    conditions, but once the rare condition or failure occurs, we get the
    full dumped data for post-analysis.

    For other trace purposes, we can reuse these trace classes as templates
    for different events.

    c) A trace_event is only effective when we enable it. To enable the
    TIPC trace_events, echo 1 to 'enable' files in the events/tipc/
    directory in the 'debugfs' file system. Normally, they are located at:

    /sys/kernel/debug/tracing/events/tipc/

    For example:

    To enable the tipc_link_dump event:

    echo 1 > /sys/kernel/debug/tracing/events/tipc/tipc_link_dump/enable

    To enable all the TIPC trace_events:

    echo 1 > /sys/kernel/debug/tracing/events/tipc/enable

    To collect the trace data:

    cat trace

    or

    cat trace_pipe > /trace.out &

    To disable all the TIPC trace_events:

    echo 0 > /sys/kernel/debug/tracing/events/tipc/enable

    To clear the trace buffer:

    echo > trace

    d) As with other trace_events, features such as 'filter' and 'trigger'
    can also be used with the tipc trace_events.
    For more details, have a look at:

    Documentation/trace/ftrace.txt

    MAINTAINERS | add two new files 'trace.h' & 'trace.c' in tipc

    Acked-by: Ying Xue
    Tested-by: Ying Xue
    Acked-by: Jon Maloy
    Signed-off-by: Tuong Lien
    Signed-off-by: David S. Miller

    Tuong Lien
     

19 Dec, 2018

2 commits

  • NAME_DISTRIBUTOR messages are transmitted through the unicast link on
    TIPC 2.0, whereas they are delivered through the broadcast link on
    TIPC 1.7. At present, NAME_DISTRIBUTOR messages received on the
    broadcast link cannot be handled in tipc_rcv() until a unicast message
    arrives, which may significantly delay name table updates.

    To avoid this delay, we also handle broadcast NAME_DISTRIBUTOR
    messages on the broadcast receive path.

    Signed-off-by: Zhenbo Gao
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Zhenbo Gao
     
  • Similar to commit 143ece654f9f ("tipc: check tsk->group in tipc_wait_for_cond()")
    we have to reload grp->dests too after we re-take the sock lock.
    This means we need to move the dsts check after tipc_wait_for_cond()
    too.

    Fixes: 75da2163dbb6 ("tipc: introduce communication groups")
    Reported-and-tested-by: syzbot+99f20222fc5018d2b97a@syzkaller.appspotmail.com
    Cc: Ying Xue
    Cc: Jon Maloy
    Signed-off-by: Cong Wang
    Acked-by: Jon Maloy
    Signed-off-by: David S. Miller

    Cong Wang
     

15 Dec, 2018

4 commits

  • tipc_wait_for_cond() drops socket lock before going to sleep,
    but tsk->group could be freed right after that release_sock().
    So we have to re-check and reload tsk->group after it wakes up.

    After this patch, tipc_wait_for_cond() returns -ERESTARTSYS when
    tsk->group is NULL, instead of continuing with the assumption of
    a non-NULL tsk->group.

    (It looks like 'dsts' should be re-checked and reloaded too, but
    it is a different bug.)

    Similarly for tipc_send_group_unicast() and tipc_send_group_anycast().

    Reported-by: syzbot+10a9db47c3a0e13eb31c@syzkaller.appspotmail.com
    Fixes: b7d42635517f ("tipc: introduce flow control for group broadcast messages")
    Fixes: ee106d7f942d ("tipc: introduce group anycast messaging")
    Fixes: 27bd9ec027f3 ("tipc: introduce group unicast messaging")
    Cc: Ying Xue
    Cc: Jon Maloy
    Signed-off-by: Cong Wang
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Cong Wang
     
  • When TIPC_NLA_UDP_REMOTE is an IPv6 mcast address but
    TIPC_NLA_UDP_LOCAL is an IPv4 address, a NULL-ptr deref is triggered
    as the UDP tunnel sock is initialized to IPv4 or IPv6 sock merely
    based on the protocol in local address.

    We should just error out when the remote address and local address
    have different protocols.
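
    The check amounts to comparing the two address families before any
    socket is set up. A minimal userspace sketch follows; the function name
    is hypothetical, and returning -EINVAL is an assumption, since the
    message above only says that we error out.

```c
#include <errno.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Reject a local/remote address pair whose families differ, e.g. an
 * IPv4 local address combined with an IPv6 multicast remote address.
 */
static int check_same_family(const struct sockaddr_storage *local,
                             const struct sockaddr_storage *remote)
{
    if (local->ss_family != remote->ss_family)
        return -EINVAL;
    return 0;
}
```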

    Reported-by: syzbot+eb4da3a20fad2e52555d@syzkaller.appspotmail.com
    Cc: Ying Xue
    Cc: Jon Maloy
    Signed-off-by: Cong Wang
    Acked-by: Jon Maloy
    Signed-off-by: David S. Miller

    Cong Wang
     
  • tipc_udp_xmit() drops the packet on error, so there is no
    need to drop it again.

    Fixes: ef20cd4dd163 ("tipc: introduce UDP replicast")
    Reported-and-tested-by: syzbot+eae585ba2cc2752d3704@syzkaller.appspotmail.com
    Cc: Ying Xue
    Cc: Jon Maloy
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • lock_sock() must be used in process context to be race-free with
    other lock_sock() callers, for example, tipc_release(). Otherwise
    using the spinlock directly can't serialize a parallel tipc_release().

    As it is blocking, we have to hold the sock refcnt before
    rhashtable_walk_stop() and release it after rhashtable_walk_start().

    Fixes: 07f6c4bc048a ("tipc: convert tipc reference table to use generic rhashtable")
    Reported-by: Dmitry Vyukov
    Cc: Ying Xue
    Cc: Jon Maloy
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

06 Dec, 2018

1 commit

  • When setting LINK tolerance, the node timer interval is calculated
    based on the LINK with the lowest tolerance.

    However, during that calculation the old node timer interval is only
    updated if the newly set value (tolerance/4) is less than the old one,
    regardless of the number of links or the links' lowest tolerance value.

    As a result, the following two cases are missed when the tolerance
    changes:
    Case 1:
    1.1/ There is one link (L1) available in the system
    1.2/ Set L1's tolerance from 1500ms => lower (e.g. 500ms)
    1.3/ Then, fall back to default (1500ms) or higher (e.g. 2000ms)

    Expected:
    node timer interval is 1500/4=375ms after 1.3

    Result:
    node timer interval is not updated after changing tolerance at 1.3
    since its value 1500/4=375ms is not less than 500/4=125ms at 1.2.

    Case 2:
    2.1/ There are two links (L1, L2) available in the system
    2.2/ L1 and L2 tolerance value are 2000ms as initial
    2.3/ Set L2's tolerance from 2000ms => lower 1500ms
    2.4/ Disable link L2 (bring down its bearer)

    Expected:
    node timer interval is 2000ms/4=500ms after 2.4

    Result:
    node timer interval is not updated after disabling L2 since
    its value 2000ms/4=500ms is still not less than 1500/4=375ms at 2.3
    although L2 is no longer available in the system.

    To fix this, we start the node interval calculation by initializing it to
    a value larger than any conceivable calculated value. This way, the link
    with the lowest tolerance will always determine the calculated value.
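
    The recalculation can be sketched as a plain minimum over the links that
    are still present. This is an illustration, not the patch itself: the
    struct, the 'active' flag and the use of UINT_MAX as the 'larger than
    any conceivable value' starting point are assumptions.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical per-link record: tolerance in ms, and whether it is up. */
struct link_info {
    unsigned int tolerance_ms;
    int active;
};

/*
 * Recompute the node timer interval from scratch: start from a value
 * larger than any tolerance/4, then let the lowest-tolerance active
 * link win. Returns UINT_MAX when no link is active.
 */
static unsigned int node_timer_interval(const struct link_info *links,
                                        size_t n)
{
    unsigned int intv = UINT_MAX;

    for (size_t i = 0; i < n; i++) {
        unsigned int cand;

        if (!links[i].active)
            continue;
        cand = links[i].tolerance_ms / 4;
        if (cand < intv)
            intv = cand;
    }
    return intv;
}
```

    Because the result no longer depends on the previous interval, Case 1
    yields 375ms after step 1.3 and Case 2 yields 500ms after step 2.4.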

    Acked-by: Jon Maloy
    Signed-off-by: Hoang Le
    Signed-off-by: David S. Miller

    Hoang Le
     

29 Nov, 2018

1 commit


28 Nov, 2018

1 commit

  • We see the following lockdep warning:

    [ 2284.078521] ======================================================
    [ 2284.078604] WARNING: possible circular locking dependency detected
    [ 2284.078604] 4.19.0+ #42 Tainted: G E
    [ 2284.078604] ------------------------------------------------------
    [ 2284.078604] rmmod/254 is trying to acquire lock:
    [ 2284.078604] 00000000acd94e28 ((&n->timer)#2){+.-.}, at: del_timer_sync+0x5/0xa0
    [ 2284.078604]
    [ 2284.078604] but task is already holding lock:
    [ 2284.078604] 00000000f997afc0 (&(&tn->node_list_lock)->rlock){+.-.}, at: tipc_node_stop+0xac/0x190 [tipc]
    [ 2284.078604]
    [ 2284.078604] which lock already depends on the new lock.
    [ 2284.078604]
    [ 2284.078604]
    [ 2284.078604] the existing dependency chain (in reverse order) is:
    [ 2284.078604]
    [ 2284.078604] -> #1 (&(&tn->node_list_lock)->rlock){+.-.}:
    [ 2284.078604] tipc_node_timeout+0x20a/0x330 [tipc]
    [ 2284.078604] call_timer_fn+0xa1/0x280
    [ 2284.078604] run_timer_softirq+0x1f2/0x4d0
    [ 2284.078604] __do_softirq+0xfc/0x413
    [ 2284.078604] irq_exit+0xb5/0xc0
    [ 2284.078604] smp_apic_timer_interrupt+0xac/0x210
    [ 2284.078604] apic_timer_interrupt+0xf/0x20
    [ 2284.078604] default_idle+0x1c/0x140
    [ 2284.078604] do_idle+0x1bc/0x280
    [ 2284.078604] cpu_startup_entry+0x19/0x20
    [ 2284.078604] start_secondary+0x187/0x1c0
    [ 2284.078604] secondary_startup_64+0xa4/0xb0
    [ 2284.078604]
    [ 2284.078604] -> #0 ((&n->timer)#2){+.-.}:
    [ 2284.078604] del_timer_sync+0x34/0xa0
    [ 2284.078604] tipc_node_delete+0x1a/0x40 [tipc]
    [ 2284.078604] tipc_node_stop+0xcb/0x190 [tipc]
    [ 2284.078604] tipc_net_stop+0x154/0x170 [tipc]
    [ 2284.078604] tipc_exit_net+0x16/0x30 [tipc]
    [ 2284.078604] ops_exit_list.isra.8+0x36/0x70
    [ 2284.078604] unregister_pernet_operations+0x87/0xd0
    [ 2284.078604] unregister_pernet_subsys+0x1d/0x30
    [ 2284.078604] tipc_exit+0x11/0x6f2 [tipc]
    [ 2284.078604] __x64_sys_delete_module+0x1df/0x240
    [ 2284.078604] do_syscall_64+0x66/0x460
    [ 2284.078604] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 2284.078604]
    [ 2284.078604] other info that might help us debug this:
    [ 2284.078604]
    [ 2284.078604] Possible unsafe locking scenario:
    [ 2284.078604]
    [ 2284.078604] CPU0 CPU1
    [ 2284.078604] ---- ----
    [ 2284.078604] lock(&(&tn->node_list_lock)->rlock);
    [ 2284.078604] lock((&n->timer)#2);
    [ 2284.078604] lock(&(&tn->node_list_lock)->rlock);
    [ 2284.078604] lock((&n->timer)#2);
    [ 2284.078604]
    [ 2284.078604] *** DEADLOCK ***
    [ 2284.078604]
    [ 2284.078604] 3 locks held by rmmod/254:
    [ 2284.078604] #0: 000000003368be9b (pernet_ops_rwsem){+.+.}, at: unregister_pernet_subsys+0x15/0x30
    [ 2284.078604] #1: 0000000046ed9c86 (rtnl_mutex){+.+.}, at: tipc_net_stop+0x144/0x170 [tipc]
    [ 2284.078604] #2: 00000000f997afc0 (&(&tn->node_list_lock)->rlock){+.-.}, at: tipc_node_stop+0xac/0x19
    [...]

    The reason is that the node timer handler sometimes needs to delete a
    node which has been disconnected for too long. To do this, it grabs
    the lock 'node_list_lock', which may at the same time be held by the
    generic node cleanup function, tipc_node_stop(), during module removal.
    Since the latter is calling del_timer_sync() inside the same lock, we
    have a potential deadlock.

    We fix this by letting the timer cleanup function use spin_trylock()
    instead of just spin_lock(). When it fails to grab the lock, it
    just returns so that the timer handler can terminate its execution.
    This is safe to do, since tipc_node_stop() is about to delete both
    the timer and the node instance anyway.
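
    The back-off pattern can be illustrated in userspace with a C11
    atomic_flag standing in for the kernel spinlock. All names below are
    hypothetical; this is a sketch of the trylock idea, not the TIPC code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal try-lock built on atomic_flag, standing in for a spinlock. */
struct try_lock {
    atomic_flag held;
};

/* Returns true if the lock was free and is now held by the caller. */
static bool try_lock_acquire(struct try_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held,
                                              memory_order_acquire);
}

static void try_lock_release(struct try_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Timer-side cleanup: if the list lock is already held (e.g. by the
 * module teardown path), give up immediately instead of spinning, so
 * the timer handler can finish. Returns true if the cleanup ran.
 */
static bool timer_cleanup(struct try_lock *node_list_lock,
                          int *nodes_deleted)
{
    if (!try_lock_acquire(node_list_lock))
        return false;
    (*nodes_deleted)++;   /* stand-in for deleting the stale node */
    try_lock_release(node_list_lock);
    return true;
}
```

    The holder of the lock can then wait for the timer handler to finish
    without the handler ever blocking on that same lock.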

    Fixes: 6a939f365bdb ("tipc: Auto removal of peer down node instance")
    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy