31 Mar, 2020

1 commit


30 Mar, 2020

2 commits

  • Under certain circumstances, depending on the order of addresses on the
    interfaces, it could be that sctp_v[46]_get_dst() would return a dst
    with a mismatched struct flowi.

    For example, when walking through the bind addresses, if the first
    one is not a match, the code saves the dst as a fallback (added in
    410f03831c07), but not the flowi. Then, if the next one is also not a
    match, the previous dst is returned, but with the flowi information
    for the second address, which is wrong.

    The fix is to use a locally stored flowi that can be used for such
    attempts, and copy it to the parameter only in case it is a possible
    match, together with the corresponding dst entry.

    The patch updates the IPv6 code mostly just to stay in sync. Even though
    the issue is also present there, the fallback is not expected to work
    with IPv6.
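
    The idea of keeping the lookup key local and publishing it only
    together with its route can be sketched in plain C. This is a minimal
    user-space model, not the kernel code: struct flow_key, struct route,
    lookup_route() and pick_route() are hypothetical stand-ins for
    struct flowi, the dst entry and the SCTP lookup helpers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct flowi and struct dst_entry. */
struct flow_key { unsigned int saddr; unsigned int daddr; };
struct route    { unsigned int saddr; };

/* Hypothetical lookup: pretend every source address is routable. */
static bool lookup_route(const struct flow_key *key, struct route *rt)
{
	rt->saddr = key->saddr;
	return true;
}

/*
 * Walk the bound addresses. The lookup key lives in a local variable and
 * is copied to *out_key only together with the route it produced, so the
 * caller never sees a route paired with the key of a later attempt.
 * 'preferred' marks the source address that counts as a real match.
 */
static bool pick_route(const unsigned int *bind_addrs, size_t n,
		       unsigned int daddr, unsigned int preferred,
		       struct flow_key *out_key, struct route *out_rt)
{
	bool have_fallback = false;

	for (size_t i = 0; i < n; i++) {
		struct flow_key key = { .saddr = bind_addrs[i], .daddr = daddr };
		struct route rt;

		if (!lookup_route(&key, &rt))
			continue;

		if (rt.saddr == preferred) {
			*out_key = key;	/* match: key and route together */
			*out_rt = rt;
			return true;
		}
		if (!have_fallback) {
			*out_key = key;	/* first fallback: also both */
			*out_rt = rt;
			have_fallback = true;
		}
	}
	return have_fallback;
}
```

    With two non-matching addresses, the sketch returns the first route
    and the first key, which is the pairing the fix is meant to guarantee.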

    Fixes: 410f03831c07 ("sctp: add routing output fallback")
    Reported-by: Jin Meng
    Signed-off-by: Marcelo Ricardo Leitner
    Tested-by: Xin Long
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     
  • We should iterate over the datamsgs to move
    all chunks(skbs) to newsk.

    The following case causes the bug. The troubled SKB was initially in
    the outq->transmitted list:

    sctp_outq_sack
    sctp_check_transmitted
    SKB was moved to outq->sacked list
    then throw away the sack queue
    SKB was deleted from outq->sacked
    (but it was held by datamsg at sctp_datamsg_to_asoc
    So, sctp_wfree was not called here)

    then migrate happened

    sctp_for_each_tx_datachunk(
    sctp_clear_owner_w);
    sctp_assoc_migrate();
    sctp_for_each_tx_datachunk(
    sctp_set_owner_w);
    SKB was not in the outq, and was not changed to newsk

    finally

    __sctp_outq_teardown
    sctp_chunk_put (for another skb)
    sctp_datamsg_put
    __kfree_skb(msg->frag_list)
    sctp_wfree (for SKB)
    SKB->sk was still oldsk (skb->sk != asoc->base.sk).

    Reported-and-tested-by: syzbot+cea71eec5d6de256d54d@syzkaller.appspotmail.com
    Signed-off-by: Qiujun Huang
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Qiujun Huang
     

13 Mar, 2020

1 commit


09 Mar, 2020

1 commit

  • In commit 1ec17dbd90f8 ("inet_diag: fix reporting cgroup classid and
    fallback to priority") cgroup classid reporting was fixed. But the fix
    works only for TCP sockets, because for other socket types the icsk
    parameter can be NULL and the classid code path is skipped. This change
    moves classid handling to the inet_diag_msg_attrs_fill() function.

    Also, the inet_diag_msg_attrs_size() helper was added, and the addends
    in nlmsg_new() were reordered to follow the order used in
    inet_sk_diag_fill().

    Fixes: 1ec17dbd90f8 ("inet_diag: fix reporting cgroup classid and fallback to priority")
    Signed-off-by: Dmitry Yakunin
    Reviewed-by: Konstantin Khlebnikov
    Signed-off-by: David S. Miller

    Dmitry Yakunin
     

01 Mar, 2020

1 commit

  • Alexei Starovoitov says:

    ====================
    pull-request: bpf-next 2020-02-28

    The following pull-request contains BPF updates for your *net-next* tree.

    We've added 41 non-merge commits during the last 7 day(s) which contain
    a total of 49 files changed, 1383 insertions(+), 499 deletions(-).

    The main changes are:

    1) BPF and Real-Time nicely co-exist.

    2) bpftool feature improvements.

    3) retrieve bpf_sk_storage via INET_DIAG.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

28 Feb, 2020

2 commits

  • The INET_DIAG_REQ_BYTECODE nlattr is currently looked up again every
    time the "dump()" is restarted.

    A later patch will also need to parse the new
    INET_DIAG_REQ_SK_BPF_STORAGES nlattr to learn the map_fds. Thus, this
    patch takes the chance to store the parsed nlattr in cb->data
    at the "start" time of a dump.

    By doing this, the "bc" argument also becomes unnecessary
    and is removed. Also, the two copies of the INET_DIAG_REQ_BYTECODE
    parsing-audit logic in the compat and current versions can be
    consolidated into one.

    Signed-off-by: Martin KaFai Lau
    Signed-off-by: Alexei Starovoitov
    Acked-by: Song Liu
    Link: https://lore.kernel.org/bpf/20200225230415.1975555-1-kafai@fb.com

    Martin KaFai Lau
     
  • In a later patch, there is a need to update "cb->min_dump_alloc"
    in inet_sk_diag_fill() as it learns the different bpf_sk_storages
    stored in a sk while dumping all sk(s) (e.g. tcp_hashinfo).

    inet_sk_diag_fill() currently does not take the "cb" as an argument.
    One of the reasons is that inet_sk_diag_fill() is used by both
    dump_one() and dump() (which belong to "struct inet_diag_handler").
    The dump_one() interface does not pass the "cb" along.

    This patch makes dump_one() pass a "cb". The "cb" is created in
    inet_diag_cmd_exact(). The "nlh" and "in_skb" are stored in "cb" as
    the dump() interface does. The total number of args in
    inet_sk_diag_fill() is also cut from 10 to 7, which
    lets many callers pass fewer args.

    In particular,
    "struct user_namespace *user_ns", "u32 pid", and "u32 seq"
    can be replaced by accessing "cb->nlh" and "cb->skb".

    A similar argument reduction is also made to
    inet_twsk_diag_fill() and inet_req_diag_fill().

    inet_csk_diag_dump() and inet_csk_diag_fill() are also removed.
    They are mostly equivalent to inet_sk_diag_fill(). Their repeated
    usages are very limited. Thus, inet_sk_diag_fill() is directly used
    in those occasions.

    Signed-off-by: Martin KaFai Lau
    Signed-off-by: Alexei Starovoitov
    Acked-by: Song Liu
    Link: https://lore.kernel.org/bpf/20200225230409.1975173-1-kafai@fb.com

    Martin KaFai Lau
     

25 Feb, 2020

3 commits


18 Feb, 2020

1 commit

  • When the T2 timer is to be stopped, the asoc should also be deleted;
    otherwise, there will be no chance to call sctp_association_free()
    and the asoc could stay in memory forever.

    However, in sctp_sf_shutdown_sent_abort(), after adding the cmd
    SCTP_CMD_TIMER_STOP for the T2 timer, it may return an error due to
    the format error from __sctp_sf_do_9_1_abort() and miss adding
    SCTP_CMD_ASSOC_FAILED, where the asoc will be deleted.

    This patch is to fix it by moving the format error check out of
    __sctp_sf_do_9_1_abort(), and do it before adding the cmd
    SCTP_CMD_TIMER_STOP for T2 timer.
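
    The ordering that the fix relies on (validate first, queue side-effect
    commands afterwards) can be sketched as below. This is a simplified
    user-space model under stated assumptions: the command queue, the
    length-based format check and all names here are hypothetical, not the
    SCTP state machine API.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CMDS 4

enum cmd { CMD_TIMER_STOP_T2, CMD_ASSOC_FAILED };

/* Hypothetical, fixed-size command sequence. */
struct cmd_seq { enum cmd cmds[MAX_CMDS]; size_t n; };

static void add_cmd(struct cmd_seq *seq, enum cmd c)
{
	if (seq->n < MAX_CMDS)
		seq->cmds[seq->n++] = c;
}

/* Hypothetical format check: the chunk must cover a minimum length. */
static bool abort_chunk_valid(size_t chunk_len, size_t min_len)
{
	return chunk_len >= min_len;
}

/*
 * Returns 0 on success and -1 on a malformed chunk. The check runs before
 * any command is queued, so an early return can never leave a lone
 * "stop T2" command without the matching "association failed" command.
 */
static int handle_abort_in_shutdown_sent(struct cmd_seq *seq, size_t chunk_len)
{
	if (!abort_chunk_valid(chunk_len, 4))
		return -1;

	add_cmd(seq, CMD_TIMER_STOP_T2);
	add_cmd(seq, CMD_ASSOC_FAILED);
	return 0;
}
```

    In this model a malformed chunk leaves the command sequence empty,
    while a valid one always queues both commands as a pair.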

    Thanks to Hangbin for reporting this issue through fuzz testing.

    v1->v2:
    - improve the comment in the code, per Marcelo's suggestion.

    Fixes: 96ca468b86b0 ("sctp: check invalid value of length parameter in error cause")
    Reported-by: Hangbin Liu
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     

10 Jan, 2020

1 commit


07 Jan, 2020

1 commit

  • This patch fixes a memleak: nothing frees cmd->obj.chunk for an
    unprocessed SCTP_CMD_REPLY. The issue occurs when processing a cmd
    fails while there are still SCTP_CMD_REPLY cmds on the cmd seq
    with an allocated chunk in cmd->obj.chunk.

    So fix it by freeing cmd->obj.chunk for each SCTP_CMD_REPLY cmd left on
    the cmd seq when any cmd returns error. While at it, also remove 'nomem'
    label.
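
    A minimal sketch of the cleanup pattern, assuming a flat array of
    commands instead of the real cmd seq. The types, the VERB_* names and
    the freed_chunks counter are hypothetical and exist only so the
    behaviour can be checked in user space.

```c
#include <stdlib.h>
#include <stddef.h>

enum verb { VERB_REPLY, VERB_OTHER, VERB_FAIL };

/* Hypothetical command: a REPLY owns an allocated chunk until processed. */
struct cmd { enum verb verb; void *chunk; };

static int freed_chunks;	/* chunks released by the error path */

/*
 * Process commands in order. When one fails, walk the commands that were
 * never reached and free the chunk of every remaining REPLY, since no
 * other code path will get a chance to do it.
 */
static int run_cmds(struct cmd *cmds, size_t n)
{
	size_t i;
	int err = 0;

	for (i = 0; i < n; i++) {
		if (cmds[i].verb == VERB_FAIL) {
			err = -1;
			break;
		}
		if (cmds[i].verb == VERB_REPLY) {
			/* Stand-in for sending: processing consumes the chunk. */
			free(cmds[i].chunk);
			cmds[i].chunk = NULL;
		}
	}

	if (err) {
		for (i = i + 1; i < n; i++) {
			if (cmds[i].verb == VERB_REPLY && cmds[i].chunk) {
				free(cmds[i].chunk);
				cmds[i].chunk = NULL;
				freed_chunks++;
			}
		}
	}
	return err;
}
```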

    Reported-by: syzbot+107c4aff5f392bf1517f@syzkaller.appspotmail.com
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     

01 Jan, 2020

1 commit


31 Dec, 2019

1 commit

  • sctp_outq_sack is the main function that handles SACK, and it is called
    very frequently. The commit "move trace_sctp_probe_path into
    sctp_outq_sack" added the code below to this function. The sctp
    tracepoint is disabled most of the time, but the loop over the transport
    list always runs even when the tracepoint is disabled, which is
    unnecessary.

    +	/* SCTP path tracepoint for congestion control debugging. */
    +	list_for_each_entry(transport, transport_list, transports) {
    +		trace_sctp_probe_path(transport, asoc);
    +	}

    This patch adds a tracepoint-enabled check outside the loop over the
    transport list, so the list is not traversed when the trace is
    disabled. It is a small optimization.
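
    The effect of hoisting the check can be modelled in user space as
    below. This is only a sketch: probe_path_enabled stands in for the
    kernel's trace_sctp_probe_path_enabled() test, and the singly linked
    list, the visited counter and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical transport list node. */
struct transport { struct transport *next; };

static bool probe_path_enabled;	/* stand-in for the tracepoint-enabled test */
static int visited;		/* list nodes touched by the walk */

/* Stand-in for emitting one trace event. */
static void trace_probe_path(const struct transport *t)
{
	(void)t;
}

/*
 * Check the enabled state once, outside the loop. When the tracepoint is
 * off, the transport list is not traversed at all.
 */
static void sack_trace_paths(const struct transport *head)
{
	if (!probe_path_enabled)
		return;

	for (const struct transport *t = head; t; t = t->next) {
		visited++;
		trace_probe_path(t);
	}
}
```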

    Signed-off-by: Kevin Kou
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Kevin Kou
     

28 Dec, 2019

1 commit

  • The function sctp_sf_eat_sack_6_2 now performs the Verification
    Tag validation, Chunk length validation, bogus-SACK check, and also
    the detection of out-of-order SACK based on RFC 2960
    Section 6.2 at the beginning, and finally performs the further
    processing of SACK. trace_sctp_probe is currently triggered before
    the above validation and checks.

    This patch does the trace_sctp_probe after the chunk sanity
    tests, but keeps tracing if the received SACK is out of order,
    because an out-of-order SACK is valuable for congestion control
    debugging.

    v1->v2:
    - keep doing SCTP trace if the SACK is out of order, per Marcelo's
    suggestion.
    v2->v3:
    - regenerate the patch, as v2 was generated on top of v1, and add the
    'net-next' tag to the new one, per Marcelo's comments.

    Signed-off-by: Kevin Kou
    Acked-by: Marcelo Ricardo Leitner
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Kevin Kou
     

27 Dec, 2019

1 commit

  • The original patch that brought in the "SCTP ACK tracking trace event"
    feature was committed on Dec. 20, 2017. It replaced jprobe usage
    with trace events and introduced two of them: TRACE_EVENT(sctp_probe)
    and TRACE_EVENT(sctp_probe_path).
    The original patch intended to trigger trace_sctp_probe_path in
    TRACE_EVENT(sctp_probe), as in the code below:

    +TRACE_EVENT(sctp_probe,
    +
    +	TP_PROTO(const struct sctp_endpoint *ep,
    +		 const struct sctp_association *asoc,
    +		 struct sctp_chunk *chunk),
    +
    +	TP_ARGS(ep, asoc, chunk),
    +
    +	TP_STRUCT__entry(
    +		__field(__u64, asoc)
    +		__field(__u32, mark)
    +		__field(__u16, bind_port)
    +		__field(__u16, peer_port)
    +		__field(__u32, pathmtu)
    +		__field(__u32, rwnd)
    +		__field(__u16, unack_data)
    +	),
    +
    +	TP_fast_assign(
    +		struct sk_buff *skb = chunk->skb;
    +
    +		__entry->asoc = (unsigned long)asoc;
    +		__entry->mark = skb->mark;
    +		__entry->bind_port = ep->base.bind_addr.port;
    +		__entry->peer_port = asoc->peer.port;
    +		__entry->pathmtu = asoc->pathmtu;
    +		__entry->rwnd = asoc->peer.rwnd;
    +		__entry->unack_data = asoc->unack_data;
    +
    +		if (trace_sctp_probe_path_enabled()) {
    +			struct sctp_transport *sp;
    +
    +			list_for_each_entry(sp, &asoc->peer.transport_addr_list,
    +					    transports) {
    +				trace_sctp_probe_path(sp, asoc);
    +			}
    +		}
    +	),

    But when testing, I found it did not work: trace_sctp_probe_path
    produced no output. I finally found that there is a trace buffer lock
    operation (trace_event_buffer_reserve) in include/trace/trace_events.h:

    static notrace void \
    trace_event_raw_event_##call(void *__data, proto) \
    { \
    struct trace_event_file *trace_file = __data; \
    struct trace_event_data_offsets_##call __maybe_unused __data_offsets;\
    struct trace_event_buffer fbuffer; \
    struct trace_event_raw_##call *entry; \
    int __data_size; \
    \
    if (trace_trigger_soft_disabled(trace_file)) \
    return; \
    \
    __data_size = trace_event_get_offsets_##call(&__data_offsets, args); \
    \
    entry = trace_event_buffer_reserve(&fbuffer, trace_file, \
    sizeof(*entry) + __data_size); \
    \
    if (!entry) \
    return; \
    \
    tstruct \
    \
    { assign; } \
    \
    trace_event_buffer_commit(&fbuffer); \
    }

    trace_sctp_probe_path produces no output because it is
    written in the TP_fast_assign part of
    TRACE_EVENT(sctp_probe), which is placed ( { assign; } ) after
    trace_event_buffer_reserve() when the macro is expanded:

    entry = trace_event_buffer_reserve(&fbuffer, trace_file, \
    sizeof(*entry) + __data_size); \
    \
    if (!entry) \
    return; \
    \
    tstruct \
    \
    { assign; } \

    so trace_sctp_probe_path cannot acquire the trace_event_buffer
    and produces no output. In other words, nesting tracepoint entry
    functions is not allowed. The function call flow is:

    trace_sctp_probe()
      -> trace_event_raw_event_sctp_probe()
        -> lock buffer
        -> trace_sctp_probe_path()
          -> trace_event_raw_event_sctp_probe_path() --nested
            -> buffer has been locked and return no output.

    This patch removes trace_sctp_probe_path from the TP_fast_assign
    part of TRACE_EVENT(sctp_probe) to avoid nesting entry functions,
    and triggers trace_sctp_probe_path in sctp_outq_sack instead.

    After this patch, you can enable both events individually:
    # cd /sys/kernel/debug/tracing
    # echo 1 > events/sctp/sctp_probe/enable
    # echo 1 > events/sctp/sctp_probe_path/enable

    Or, you can enable all the events under sctp.

    # echo 1 > events/sctp/enable

    Signed-off-by: Kevin Kou
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Kevin Kou
     

25 Dec, 2019

2 commits

  • The MTU update code is supposed to be invoked in response to real
    networking events that update the PMTU. In IPv6 PMTU update function
    __ip6_rt_update_pmtu() we called dst_confirm_neigh() to update neighbor
    confirmed time.

    But the tunnel code calls the PMTU update before xmit, like:
    - tnl_update_pmtu()
    - skb_dst_update_pmtu()
    - ip6_rt_update_pmtu()
    - __ip6_rt_update_pmtu()
    - dst_confirm_neigh()

    If the tunnel remote dst MAC address has changed and we still do the
    neigh confirm, we will not be able to update the neigh cache and
    ping6 to the remote will fail.

    So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we
    should not be invoking dst_confirm_neigh() as we have no evidence
    of successful two-way communication at this point.

    On the other hand it is also important to keep the neigh reachability fresh
    for TCP flows, so we cannot remove this dst_confirm_neigh() call.

    To fix the issue, we have to add a new bool parameter to
    dst_ops.update_pmtu to choose whether we should do the neigh update or
    not. I will add the parameter in this patch and set all the callers to
    true to keep the previous behaviour, and fix the tunnel code one by one
    in later patches.
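
    The shape of the interface change can be sketched in user space as
    below. This is an assumption-laden model, not the kernel definitions:
    struct dst_ops_sketch, struct dst, the neigh_confirms counter and both
    caller functions are hypothetical. Note that this patch itself passes
    true from every caller; the false case shows what the later tunnel
    patches are meant to do.

```c
#include <stdbool.h>

struct dst;

/* Hypothetical ops table with the extra bool parameter. */
struct dst_ops_sketch {
	void (*update_pmtu)(struct dst *dst, unsigned int mtu,
			    bool confirm_neigh);
};

struct dst {
	const struct dst_ops_sketch *ops;
	unsigned int mtu;
	int neigh_confirms;	/* counts neighbour confirmations */
};

/* Confirm the neighbour only when the caller asks for it. */
static void sketch_update_pmtu(struct dst *dst, unsigned int mtu,
			       bool confirm_neigh)
{
	if (confirm_neigh)
		dst->neigh_confirms++;
	if (mtu < dst->mtu)
		dst->mtu = mtu;
}

static const struct dst_ops_sketch sketch_ops = {
	.update_pmtu = sketch_update_pmtu,
};

/* Tunnel transmit path: no evidence of two-way communication yet. */
static void tunnel_update_pmtu(struct dst *dst, unsigned int mtu)
{
	dst->ops->update_pmtu(dst, mtu, false);
}

/* Path reacting to a real PMTU event: keep the previous behaviour. */
static void flow_update_pmtu(struct dst *dst, unsigned int mtu)
{
	dst->ops->update_pmtu(dst, mtu, true);
}
```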

    v5: No change.
    v4: No change.
    v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
    dst_ops.update_pmtu to control whether we should do neighbor confirm.
    Also split the big patch to small ones for each area.
    v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

    Suggested-by: David Miller
    Reviewed-by: Guillaume Nault
    Acked-by: David Ahern
    Signed-off-by: Hangbin Liu
    Signed-off-by: David S. Miller

    Hangbin Liu
     
  • The fix in 951c6db954a1 fixed the issue reported there but introduced
    another. When the allocation fails within sctp_stream_init() it is
    okay/necessary to free the genradix. But it is also called when adding
    new streams, from sctp_send_add_streams() and
    sctp_process_strreset_addstrm_in(), and in those situations it cannot
    just free the genradix because by then it is a fully operational
    association.

    The fix here then is to only free the genradix in sctp_stream_init()
    and, on those other call sites, move on with what it already had and
    let the subsequent error handling deal with it.

    Tested with the reproducers from this report and the previous one,
    with lksctp-tools and sctp-tests.

    Reported-by: syzbot+9a1bc632e78a1a98488b@syzkaller.appspotmail.com
    Fixes: 951c6db954a1 ("sctp: fix memleak on err handling of stream initialization")
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     

23 Dec, 2019

1 commit


18 Dec, 2019

1 commit

  • syzbot reported a memory leak when an allocation fails within
    genradix_prealloc() for output streams. That's because
    genradix_prealloc() leaves initialized members initialized when the
    issue happens and SCTP stack will abort the current initialization but
    without cleaning up such members.

    The fix here is to always call genradix_free() when genradix_prealloc()
    fails, for output and also input streams, as it suffers from the same
    issue.
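
    A minimal sketch of the "free on failed preallocation" pattern,
    assuming a plain array of pages in place of a genradix. struct table,
    table_prealloc(), table_free(), stream_init() and the fail_after test
    knob are all hypothetical.

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical page table standing in for a genradix. */
struct table { void **pages; size_t npages; };

static size_t fail_after = (size_t)-1;	/* test knob: page index that fails */

/* Like the behaviour described above, a failure leaves earlier pages allocated. */
static int table_prealloc(struct table *t, size_t n)
{
	t->pages = NULL;
	t->npages = 0;
	if (n == 0)
		return 0;

	t->pages = calloc(n, sizeof(*t->pages));
	if (!t->pages)
		return -1;
	t->npages = n;

	for (size_t i = 0; i < n; i++) {
		if (i == fail_after)
			return -1;	/* partial state left behind */
		t->pages[i] = malloc(64);
		if (!t->pages[i])
			return -1;
	}
	return 0;
}

static void table_free(struct table *t)
{
	for (size_t i = 0; i < t->npages; i++)
		free(t->pages[i]);
	free(t->pages);
	t->pages = NULL;
	t->npages = 0;
}

/* Initial setup: always free whatever a failed preallocation left over. */
static int stream_init(struct table *out, struct table *in,
		       size_t outcnt, size_t incnt)
{
	if (table_prealloc(out, outcnt)) {
		table_free(out);
		return -1;
	}
	if (table_prealloc(in, incnt)) {
		table_free(in);
		table_free(out);
		return -1;
	}
	return 0;
}
```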

    Reported-by: syzbot+772d9e36c490b18d51d1@syzkaller.appspotmail.com
    Fixes: 2075e50caf5e ("sctp: convert to genradix")
    Signed-off-by: Marcelo Ricardo Leitner
    Tested-by: Xin Long
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     

10 Dec, 2019

2 commits

  • Commit 312434617cb1 ("sctp: cache netns in sctp_ep_common") sets netns
    in the asoc and ep base when they are created, and it never changes.
    Getting netns from the asoc and ep base is better than
    calling sock_net().

    This patch replaces those sock_net() calls.

    v1->v2:
    - no change.

    Suggested-by: Marcelo Ricardo Leitner
    Signed-off-by: Xin Long
    Acked-by: Neil Horman
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
     
  • Syzbot found a crash:

    BUG: KMSAN: uninit-value in crc32_body lib/crc32.c:112 [inline]
    BUG: KMSAN: uninit-value in crc32_le_generic lib/crc32.c:179 [inline]
    BUG: KMSAN: uninit-value in __crc32c_le_base+0x4fa/0xd30 lib/crc32.c:202
    Call Trace:
    crc32_body lib/crc32.c:112 [inline]
    crc32_le_generic lib/crc32.c:179 [inline]
    __crc32c_le_base+0x4fa/0xd30 lib/crc32.c:202
    chksum_update+0xb2/0x110 crypto/crc32c_generic.c:90
    crypto_shash_update+0x4c5/0x530 crypto/shash.c:107
    crc32c+0x150/0x220 lib/libcrc32c.c:47
    sctp_csum_update+0x89/0xa0 include/net/sctp/checksum.h:36
    __skb_checksum+0x1297/0x12a0 net/core/skbuff.c:2640
    sctp_compute_cksum include/net/sctp/checksum.h:59 [inline]
    sctp_packet_pack net/sctp/output.c:528 [inline]
    sctp_packet_transmit+0x40fb/0x4250 net/sctp/output.c:597
    sctp_outq_flush_transports net/sctp/outqueue.c:1146 [inline]
    sctp_outq_flush+0x1823/0x5d80 net/sctp/outqueue.c:1194
    sctp_outq_uncork+0xd0/0xf0 net/sctp/outqueue.c:757
    sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1781 [inline]
    sctp_side_effects net/sctp/sm_sideeffect.c:1184 [inline]
    sctp_do_sm+0x8fe1/0x9720 net/sctp/sm_sideeffect.c:1155
    sctp_primitive_REQUESTHEARTBEAT+0x175/0x1a0 net/sctp/primitive.c:185
    sctp_apply_peer_addr_params+0x212/0x1d40 net/sctp/socket.c:2433
    sctp_setsockopt_peer_addr_params net/sctp/socket.c:2686 [inline]
    sctp_setsockopt+0x189bb/0x19090 net/sctp/socket.c:4672

    The issue was caused by transport->ipaddr being set from an
    uninitialized addr param, which was passed by:

    sctp_transport_init net/sctp/transport.c:47 [inline]
    sctp_transport_new+0x248/0xa00 net/sctp/transport.c:100
    sctp_assoc_add_peer+0x5ba/0x2030 net/sctp/associola.c:611
    sctp_process_param net/sctp/sm_make_chunk.c:2524 [inline]

    where 'addr' is set by sctp_v4_from_addr_param(), and it doesn't initialize
    the padding of addr->v4.

    Later, when calling sctp_make_heartbeat(), hbinfo.daddr(=transport->ipaddr)
    becomes part of the skb, and the issue occurs.

    This patch fixes it by initializing the padding of addr->v4 in
    sctp_v4_from_addr_param(), as well as in other functions that do
    similar things. These functions shouldn't trust the caller to
    initialize the memory, as Marcelo suggested.
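
    Why the padding matters can be shown with a small user-space sketch.
    struct v4_sketch is a hypothetical stand-in with a zero[] member that
    plays the role of the padding in the IPv4 socket address; the function
    names are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical IPv4 address record with explicit padding bytes. */
struct v4_sketch {
	uint16_t family;
	uint16_t port;
	uint32_t addr;
	uint8_t  zero[8];	/* padding that must not stay uninitialized */
};

/* Fill every member, padding included, without trusting the caller. */
static void v4_from_param(struct v4_sketch *out, uint32_t addr, uint16_t port)
{
	out->family = 2;	/* illustrative family value */
	out->port = port;
	out->addr = addr;
	memset(out->zero, 0, sizeof(out->zero));
}

/* Copies the whole record into a packet buffer, padding included. */
static void put_in_packet(uint8_t *pkt, const struct v4_sketch *a)
{
	memcpy(pkt, a, sizeof(*a));
}
```

    Because the whole record is copied into the packet, any byte the
    filler skips would reach the checksum code with whatever the caller's
    buffer happened to contain.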

    Reported-by: syzbot+6dcbfea81cd3d4dd0b02@syzkaller.appspotmail.com
    Signed-off-by: Xin Long
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Xin Long
     

05 Dec, 2019

1 commit

  • This will be used in the conversion of ipv6_stub to ip6_dst_lookup_flow,
    as some modules currently pass a net argument without a socket to
    ip6_dst_lookup. This is equivalent to commit 343d60aada5a ("ipv6: change
    ipv6_stub_impl.ipv6_dst_lookup to take net argument").

    Signed-off-by: Sabrina Dubroca
    Signed-off-by: David S. Miller

    Sabrina Dubroca
     

27 Nov, 2019

2 commits

  • Note that the sysctl write accessor functions guarantee that:
    net->ipv4.sysctl_ip_prot_sock <= net->ipv4.ip_local_ports.range[0]
    invariant is maintained, and as such the max() in selinux hooks is actually spurious.

    ie. even though
    if (snum < max(inet_prot_sock(sock_net(sk)), low) || snum > high) {
    per logic is the same as
    if ((snum < inet_prot_sock(sock_net(sk)) || snum < low) || snum > high) {
    it is actually functionally equivalent to:
    if (snum < low || snum > high) {
    which is equivalent to:
    if (snum < inet_prot_sock(sock_net(sk)) || snum < low || snum > high) {
    even though the first clause is spurious.

    But we want to hold on to it in case we ever want to change what
    inet_port_requires_bind_service() means (for example by changing
    it from a, by default, [0..1024) range to some sort of set).
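
    The claimed equivalence can be checked by brute force in user space.
    The sketch below assumes the invariant prot_sock <= low and compares
    the three forms of the range check over a small domain; every function
    name here is hypothetical.

```c
#include <stdbool.h>

static int max_int(int a, int b)
{
	return a > b ? a : b;
}

/* Form with max(), as in the original hook. */
static bool out_of_range_max(int snum, int prot_sock, int low, int high)
{
	return snum < max_int(prot_sock, low) || snum > high;
}

/* Simplified form: only valid while prot_sock <= low holds. */
static bool out_of_range_simple(int snum, int low, int high)
{
	return snum < low || snum > high;
}

/* Form that keeps an explicit privileged-port clause. */
static bool out_of_range_explicit(int snum, int prot_sock, int low, int high)
{
	return snum < prot_sock || snum < low || snum > high;
}

/* Compare all three forms wherever 0 <= prot_sock <= low <= high <= limit. */
static bool forms_agree(int limit)
{
	for (int prot = 0; prot <= limit; prot++)
		for (int low = prot; low <= limit; low++)
			for (int high = low; high <= limit; high++)
				for (int snum = 0; snum <= limit + 1; snum++) {
					bool a = out_of_range_max(snum, prot, low, high);

					if (a != out_of_range_simple(snum, low, high) ||
					    a != out_of_range_explicit(snum, prot, low, high))
						return false;
				}
	return true;
}
```

    Without the invariant the forms differ, for example with a
    privileged-port limit of 10 and a local range starting at 3.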

    Test: builds, git 'grep inet_prot_sock' finds no other references
    Cc: Eric Dumazet
    Signed-off-by: Maciej Żenczykowski
    Signed-off-by: David S. Miller

    Maciej Żenczykowski
     
  • It already existed in part of the function, but move it
    to a higher level and use it consistently throughout.

    Safe since sk is never written to.

    Signed-off-by: Maciej Żenczykowski
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Maciej Żenczykowski
     

26 Nov, 2019

1 commit


24 Nov, 2019

2 commits

  • This patch is to fix a data-race reported by syzbot:

    BUG: KCSAN: data-race in sctp_assoc_migrate / sctp_hash_obj

    write to 0xffff8880b67c0020 of 8 bytes by task 18908 on cpu 1:
    sctp_assoc_migrate+0x1a6/0x290 net/sctp/associola.c:1091
    sctp_sock_migrate+0x8aa/0x9b0 net/sctp/socket.c:9465
    sctp_accept+0x3c8/0x470 net/sctp/socket.c:4916
    inet_accept+0x7f/0x360 net/ipv4/af_inet.c:734
    __sys_accept4+0x224/0x430 net/socket.c:1754
    __do_sys_accept net/socket.c:1795 [inline]
    __se_sys_accept net/socket.c:1792 [inline]
    __x64_sys_accept+0x4e/0x60 net/socket.c:1792
    do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    read to 0xffff8880b67c0020 of 8 bytes by task 12003 on cpu 0:
    sctp_hash_obj+0x4f/0x2d0 net/sctp/input.c:894
    rht_key_get_hash include/linux/rhashtable.h:133 [inline]
    rht_key_hashfn include/linux/rhashtable.h:159 [inline]
    rht_head_hashfn include/linux/rhashtable.h:174 [inline]
    head_hashfn lib/rhashtable.c:41 [inline]
    rhashtable_rehash_one lib/rhashtable.c:245 [inline]
    rhashtable_rehash_chain lib/rhashtable.c:276 [inline]
    rhashtable_rehash_table lib/rhashtable.c:316 [inline]
    rht_deferred_worker+0x468/0xab0 lib/rhashtable.c:420
    process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
    worker_thread+0xa0/0x800 kernel/workqueue.c:2415
    kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

    It was caused by rhashtable accessing asoc->base.sk while
    sctp_assoc_migrate is changing its value. However, what rhashtable
    wants is the netns from asoc base.sk, and for an asoc, its netns won't
    change once set. So we can simply fix it by caching the netns at
    creation time.
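
    The caching idea can be sketched as follows. This is a simplified
    single-threaded model, so it shows only the data layout, not the race
    itself; struct net, struct sock, struct assoc and the helper names are
    hypothetical stand-ins.

```c
#include <stddef.h>

struct net  { int id; };
struct sock { struct net *net; };

/* Hypothetical association: sk may be swapped, net is fixed at creation. */
struct assoc {
	struct sock *sk;
	struct net *net;
};

static void assoc_init(struct assoc *a, struct sock *sk)
{
	a->sk = sk;
	a->net = sk->net;	/* cached once, never written again */
}

/* Migration rewrites only the socket pointer. */
static void assoc_migrate(struct assoc *a, struct sock *newsk)
{
	a->sk = newsk;
}

/* The hash input reads the cached field and never dereferences a->sk. */
static int hash_seed(const struct assoc *a)
{
	return a->net->id;
}
```

    Since the reader no longer touches the field that migration writes,
    the two paths stop sharing a memory location.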

    Fixes: d6c0256a60e6 ("sctp: add the rhashtable apis for sctp global transport hashtable")
    Reported-by: syzbot+e3b35fe7918ff0ee474e@syzkaller.appspotmail.com
    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: Jakub Kicinski

    Xin Long
     
  • In the implementation of sctp_sf_do_5_2_4_dupcook() the allocated
    new_asoc is leaked if security_sctp_assoc_request() fails. Release it
    via sctp_association_free().
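
    The error path can be sketched in user space as below. It is a model
    under stated assumptions: assoc_new(), assoc_free(), security_check()
    and the live_assocs counter are hypothetical stand-ins for the
    association allocator and the LSM hook.

```c
#include <stdbool.h>
#include <stdlib.h>

struct assoc { int id; };

static int live_assocs;	/* allocations not yet released */

static struct assoc *assoc_new(void)
{
	struct assoc *a = malloc(sizeof(*a));

	if (a) {
		a->id = 0;
		live_assocs++;
	}
	return a;
}

static void assoc_free(struct assoc *a)
{
	if (a) {
		free(a);
		live_assocs--;
	}
}

/* Hypothetical stand-in for the security hook: nonzero means denied. */
static int security_check(bool allow)
{
	return allow ? 0 : -1;
}

/* Release the freshly allocated association when the hook rejects it. */
static struct assoc *handle_dup_cookie(bool allow)
{
	struct assoc *new_asoc = assoc_new();

	if (!new_asoc)
		return NULL;

	if (security_check(allow)) {
		assoc_free(new_asoc);	/* the release the fix adds */
		return NULL;
	}
	return new_asoc;
}
```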

    Fixes: 2277c7cd75e3 ("sctp: Add LSM hooks")
    Signed-off-by: Navid Emamdoost
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: Jakub Kicinski

    Navid Emamdoost
     

09 Nov, 2019

5 commits

  • Section 7.2 of rfc7829: "Peer Address Thresholds (SCTP_PEER_ADDR_THLDS)
    Socket Option" extends 'struct sctp_paddrthlds' with 'spt_pathcpthld',
    added to allow a user to change ps_retrans per sock/asoc/transport, like
    the other two paddrthlds: pf_retrans and pathmaxrxt.

    Note: to avoid breaking users' programs, pf_retrans dump and setting
    are supported by adding a new sockopt SCTP_PEER_ADDR_THLDS_V2 and a new
    structure sctp_paddrthlds_v2, instead of extending sctp_paddrthlds.

    Also, when setting ps_retrans, the value is not allowed to be greater
    than pf_retrans.

    v1->v2:
    - use SCTP_PEER_ADDR_THLDS_V2 to set/get pf_retrans instead,
    as Marcelo and David Laight suggested.

    Signed-off-by: Xin Long
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Xin Long
     
  • This is a new feature defined in section 5 of rfc7829: "Primary Path
    Switchover". It introduces a new tunable parameter:

    Primary.Switchover.Max.Retrans (PSMR)

    The primary path will be changed to another active path when the path
    error counter on the old primary path exceeds PSMR, so that "the SCTP
    sender is allowed to continue data transmission on a new working path
    even when the old primary destination address becomes active again".

    This patch is to add this tunable parameter, 'ps_retrans' per netns,
    sock, asoc and transport. It also allows a user to change ps_retrans
    per netns by sysctl, and ps_retrans per sock/asoc/transport will be
    initialized with it.

    The check will be done in sctp_do_8_2_transport_strike() when this
    feature is enabled.

    Note this feature is disabled by initializing 'ps_retrans' per netns
    as 0xffff by default, and its value can't be less than 'pf_retrans'
    when changing by sysctl.
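
    The switchover rule can be sketched as below. This is a simplified
    user-space model and not the logic of
    sctp_do_8_2_transport_strike(): the structures, the way a new primary
    is picked and the SKETCH_PS_RETRANS_MAX name are assumptions made for
    illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PS_RETRANS_MAX 0xffff	/* default value: feature disabled */

/* Hypothetical per-path state. */
struct transport {
	unsigned int error_count;
	unsigned int ps_retrans;
	bool active;
};

struct assoc {
	struct transport *primary;
	struct transport *paths;
	size_t npaths;
};

/*
 * Count one error on a path. When the path is the primary and its error
 * counter exceeds ps_retrans, make the first other active path primary.
 */
static void transport_strike(struct assoc *a, struct transport *t)
{
	t->error_count++;

	if (t != a->primary || t->error_count <= t->ps_retrans)
		return;

	for (size_t i = 0; i < a->npaths; i++) {
		struct transport *cand = &a->paths[i];

		if (cand != t && cand->active) {
			a->primary = cand;
			return;
		}
	}
}
```

    With ps_retrans left at the 0xffff default, the counter in this model
    does not exceed the threshold in any realistic run, so the primary
    path stays where it is.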

    v3->v4:
    - add define SCTP_PS_RETRANS_MAX 0xffff, and use it on extra2 of
    sysctl 'ps_retrans'.
    - add a new entry for ps_retrans on ip-sysctl.txt.

    Signed-off-by: Xin Long
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Xin Long
     
  • This is a sockopt defined in section 7.3 of rfc7829: "Exposing
    the Potentially Failed Path State", by which users can change
    pf_expose per sock and asoc.

    The new sockopt SCTP_EXPOSE_POTENTIALLY_FAILED_STATE is also
    known as SCTP_EXPOSE_PF_STATE for short.

    v2->v3:
    - return -EINVAL if params.assoc_value > SCTP_PF_EXPOSE_MAX.
    - define SCTP_EXPOSE_PF_STATE SCTP_EXPOSE_POTENTIALLY_FAILED_STATE.
    v3->v4:
    - improve changelog.

    Signed-off-by: Xin Long
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Xin Long
     
  • SCTP Quick failover draft section 5.1, point 5 has been removed
    from rfc7829. Instead, "the sender SHOULD (i) notify the Upper
    Layer Protocol (ULP) about this state transition", as said in
    section 3.2, point 8.

    So this patch is to add SCTP_ADDR_POTENTIALLY_FAILED, defined
    in section 7.1, "which is reported if the affected address
    becomes PF". Also remove transport cwnd's update when moving
    from PF back to ACTIVE , which is no longer in rfc7829 either.

    Note that ulp_notify will be set to false if asoc->expose is
    not 'enabled', according to last patch.

    v2->v3:
    - define SCTP_ADDR_PF SCTP_ADDR_POTENTIALLY_FAILED.
    v3->v4:
    - initialize spc_state with SCTP_ADDR_AVAILABLE, as Marcelo suggested.
    - check asoc->pf_expose in sctp_assoc_control_transport(), as Marcelo
    suggested.

    Signed-off-by: Xin Long
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Xin Long
     
  • As said in rfc7829, section 3, point 12:

    The SCTP stack SHOULD expose the PF state of its destination
    addresses to the ULP as well as provide the means to notify the
    ULP of state transitions of its destination addresses from
    active to PF, and vice versa. However, it is recommended that
    an SCTP stack implementing SCTP-PF also allows for the ULP to be
    kept ignorant of the PF state of its destinations and the
    associated state transitions, thus allowing for retention of the
    simpler state transition model of [RFC4960] in the ULP.

    So the stack may either expose the PF state to the ULP or keep
    SCTP-PF hidden from the ULP.

    So this patch is to add pf_expose per netns, sock and asoc. And in
    sctp_assoc_control_transport(), ulp_notify will be set to false if
    asoc->expose is not 'enabled' in next patch.

    It also allows a user to change pf_expose per netns by sysctl, and
    pf_expose per sock and asoc will be initialized with it.

    Note that pf_expose also works for SCTP_GET_PEER_ADDR_INFO sockopt,
    to not allow a user to query the state of a sctp-pf peer address
    when pf_expose is 'disabled', as said in section 7.3.

    v1->v2:
    - Fix a build warning noticed by Nathan Chancellor.
    v2->v3:
    - set pf_expose to UNUSED by default to keep compatible with old
    applications.
    v3->v4:
    - add a new entry for pf_expose on ip-sysctl.txt, as Marcelo suggested.
    - change this patch to 1/5, and move sctp_assoc_control_transport
    change into 2/5, as Marcelo suggested.
    - use SCTP_PF_EXPOSE_UNSET instead of SCTP_PF_EXPOSE_UNUSED, and
    set SCTP_PF_EXPOSE_UNSET to 0 in enum, as Marcelo suggested.

    Signed-off-by: Xin Long
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Xin Long
     

07 Nov, 2019

3 commits


03 Nov, 2019

1 commit


02 Nov, 2019

1 commit

  • Historically linux tried to stick to RFC 791, 1122, 2003
    for IPv4 ID field generation.

    RFC 6864 made clear that no matter how hard we try,
    we can not ensure uniqueness of IP ID within maximum
    lifetime for all datagrams with a given source
    address/destination address/protocol tuple.

    Linux uses a per socket inet generator (inet_id), initialized
    at connection startup with an XOR of 'jiffies' and other
    fields that appear clear on the wire.

    Thiemo Nagel pointed out that this strategy is a privacy
    concern as this provides 16 bits of entropy to fingerprint
    devices.

    Let's switch to a random starting point, this is just as
    good as far as RFC 6864 is concerned and does not leak
    anything critical.

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Eric Dumazet
    Reported-by: Thiemo Nagel
    Signed-off-by: David S. Miller

    Eric Dumazet