15 Nov, 2020

1 commit

  • A call trace was found in Hangbin's Codenomicon testing with a debug kernel:

    [ 2615.981988] ODEBUG: free active (active state 0) object type: timer_list hint: sctp_generate_proto_unreach_event+0x0/0x3a0 [sctp]
    [ 2615.995050] WARNING: CPU: 17 PID: 0 at lib/debugobjects.c:328 debug_print_object+0x199/0x2b0
    [ 2616.095934] RIP: 0010:debug_print_object+0x199/0x2b0
    [ 2616.191533] Call Trace:
    [ 2616.194265]
    [ 2616.202068] debug_check_no_obj_freed+0x25e/0x3f0
    [ 2616.207336] slab_free_freelist_hook+0xeb/0x140
    [ 2616.220971] kfree+0xd6/0x2c0
    [ 2616.224293] rcu_do_batch+0x3bd/0xc70
    [ 2616.243096] rcu_core+0x8b9/0xd00
    [ 2616.256065] __do_softirq+0x23d/0xacd
    [ 2616.260166] irq_exit+0x236/0x2a0
    [ 2616.263879] smp_apic_timer_interrupt+0x18d/0x620
    [ 2616.269138] apic_timer_interrupt+0xf/0x20
    [ 2616.273711]

    This is because it holds asoc when transport->proto_unreach_timer starts
    and puts asoc when the timer stops, and without holding transport the
    transport could be freed when the timer is still running.

    So fix it by holding/putting transport instead for proto_unreach_timer
    in transport, just like other timers in transport.

    v1->v2:
    - Also use sctp_transport_put() for the "out_unlock:" path in
    sctp_generate_proto_unreach_event(), as Marcelo noticed.

    Fixes: 50b5d6ad6382 ("sctp: Fix a race between ICMP protocol unreachable and connect()")
    Reported-by: Hangbin Liu
    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Link: https://lore.kernel.org/r/102788809b554958b13b95d33440f5448113b8d6.1605331373.git.lucien.xin@gmail.com
    Signed-off-by: Jakub Kicinski

    Xin Long
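
The hold/put pairing described in this commit can be sketched in user space as follows. This is only an illustration: the `mock_*` and `*_proto_unreach_timer` names are hypothetical stand-ins, not kernel APIs. In the kernel, the corresponding pattern would pair the timer arm and cancel calls with sctp_transport_hold() and sctp_transport_put().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a transport with one timer. */
struct mock_transport {
    int refcnt;          /* references held on the transport itself */
    bool timer_pending;  /* true while the mock timer is armed */
};

/* Arm the timer; returns true if it was already pending. */
static bool mock_mod_timer(struct mock_transport *t)
{
    bool was_pending = t->timer_pending;

    t->timer_pending = true;
    return was_pending;
}

/* Disarm the timer; returns true if it was pending. */
static bool mock_del_timer(struct mock_transport *t)
{
    bool was_pending = t->timer_pending;

    t->timer_pending = false;
    return was_pending;
}

/* Take a transport reference only when the timer goes from idle to armed. */
static void start_proto_unreach_timer(struct mock_transport *t)
{
    if (!mock_mod_timer(t))
        t->refcnt++;
}

/* Drop the reference only if a pending timer was actually cancelled. */
static void stop_proto_unreach_timer(struct mock_transport *t)
{
    if (mock_del_timer(t)) {
        assert(t->refcnt > 0);
        t->refcnt--;
    }
}
```

Because the reference is taken on the same object that embeds the timer, the object cannot be freed while its timer is still armed.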
     

01 Jan, 2020

1 commit


25 Dec, 2019

1 commit

  • The MTU update code is supposed to be invoked in response to real
    networking events that update the PMTU. In IPv6 PMTU update function
    __ip6_rt_update_pmtu() we called dst_confirm_neigh() to update neighbor
    confirmed time.

    But the tunnel code calls the PMTU update before xmit, like:
    - tnl_update_pmtu()
    - skb_dst_update_pmtu()
    - ip6_rt_update_pmtu()
    - __ip6_rt_update_pmtu()
    - dst_confirm_neigh()

    If the tunnel remote dst mac address changed and we still do the neigh
    confirm, we will not be able to update the neigh cache and ping6 to the
    remote will fail.

    So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we
    should not be invoking dst_confirm_neigh() as we have no evidence
    of successful two-way communication at this point.

    On the other hand it is also important to keep the neigh reachability fresh
    for TCP flows, so we cannot remove this dst_confirm_neigh() call.

    To fix the issue, we have to add a new bool parameter for dst_ops.update_pmtu
    to choose whether we should do neigh update or not. I will add the parameter
    in this patch and set all the callers to true to comply with the previous
    way, and fix the tunnel code one by one on later patches.

    v5: No change.
    v4: No change.
    v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
    dst_ops.update_pmtu to control whether we should do neighbor confirm.
    Also split the big patch to small ones for each area.
    v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

    Suggested-by: David Miller
    Reviewed-by: Guillaume Nault
    Acked-by: David Ahern
    Signed-off-by: Hangbin Liu
    Signed-off-by: David S. Miller

    Hangbin Liu
     

10 Dec, 2019

1 commit

  • Commit 312434617cb1 ("sctp: cache netns in sctp_ep_common") set netns
    in asoc and ep base when they are created, and it never changes
    afterwards. Getting netns from asoc and ep base is better than
    calling sock_net().

    This patch replaces the sock_net() calls accordingly.

    v1->v2:
    - no change.

    Suggested-by: Marcelo Ricardo Leitner
    Signed-off-by: Xin Long
    Acked-by: Neil Horman
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
     

31 Jul, 2019

1 commit

  • 'addr' passed to sctp_transport_init is not always the full size
    of union sctp_addr, as in the path:

    sctp_sendmsg() ->
    sctp_sendmsg_new_asoc() ->
    sctp_assoc_add_peer() ->
    sctp_transport_new() -> sctp_transport_init()

    In the next patches, we will also pass the address length of data
    only to sctp_assoc_add_peer().

    So sctp_transport_init() should copy only the available data from
    addr to peer->ipaddr, instead of 'peer->ipaddr = *addr', which may
    cause a slab-out-of-bounds access.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
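
A minimal user-space sketch of the length-aware copy described in this commit is shown below. `union demo_addr`, `demo_addr_len()` and `demo_addr_copy()` are hypothetical names chosen for illustration; the commit text does not show the actual kernel change.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical stand-in for union sctp_addr. */
union demo_addr {
    struct sockaddr sa;
    struct sockaddr_in v4;
    struct sockaddr_in6 v6;
};

/* Number of valid bytes for a given address family (0 if unsupported). */
static size_t demo_addr_len(sa_family_t family)
{
    switch (family) {
    case AF_INET:
        return sizeof(struct sockaddr_in);
    case AF_INET6:
        return sizeof(struct sockaddr_in6);
    default:
        return 0;
    }
}

/*
 * Copy only the bytes the family guarantees are present. A plain
 * '*dst = *src' on the union would read sizeof(union demo_addr) bytes
 * even when the caller only supplied a struct sockaddr_in.
 */
static int demo_addr_copy(union demo_addr *dst, const struct sockaddr *src)
{
    size_t len = demo_addr_len(src->sa_family);

    if (len == 0)
        return -1;
    memset(dst, 0, sizeof(*dst));
    memcpy(dst, src, len);
    return 0;
}
```

Zeroing the destination first is a choice made for this sketch so that the bytes beyond the copied length have a defined value.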
     

24 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this sctp implementation is free software you can redistribute it
    and or modify it under the terms of the gnu general public license
    as published by the free software foundation either version 2 or at
    your option any later version this sctp implementation is
    distributed in the hope that it will be useful but without any
    warranty without even the implied warranty of merchantability or
    fitness for a particular purpose see the gnu general public license
    for more details you should have received a copy of the gnu general
    public license along with gnu cc see the file copying if not see
    http www gnu org licenses

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 42 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Kate Stewart
    Reviewed-by: Richard Fontana
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190523091649.683323110@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

23 Feb, 2019

1 commit

  • hb_timer might not start at all for a particular transport because its
    start is conditional. As a result, a node does not send heartbeats.

    Function sctp_transport_reset_hb_timer has two roles:
    - initial start of hb_timer for a given transport,
    - update expire date of hb_timer for a given transport.
    The function is optimized to update the timer's expiry only if it is
    before the newly calculated one, but this comparison is invalid for a
    timer which has not yet started. Such a timer has expire == 0, and if the
    new expire value is (MAX_JIFFIES / 2 + 2) or more, then the "time_before"
    macro evaluates to false and the timer does not start, resulting in no
    heartbeat packets being sent by the node.

    This was found when an association was initialized within the first 5 mins
    after system boot, because the initial jiffies value is close to
    MAX_JIFFIES.

    Test kernel version: 4.9.154 (ARCH=arm)
    hb_timer.expire = 0; //initialized, not started timer
    new_expire = MAX_JIFFIES / 2 + 2; //or more
    time_before(hb_timer.expire, new_expire) == false

    Fixes: ba6f5e33bdbb ("sctp: avoid refreshing heartbeat timer too often")
    Reported-by: Marcin Stojek
    Tested-by: Marcin Stojek
    Signed-off-by: Maciej Kwiecien
    Reviewed-by: Alexander Sverdlin
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Maciej Kwiecien
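
The wraparound behaviour described in this commit can be reproduced in user space with a 32-bit counter. `demo_time_before()` mirrors the signed-difference idea behind time_before(). `demo_should_arm()` is a hypothetical guard shown only to illustrate one way of avoiding the comparison for a timer that is not pending; the commit text above does not spell out the actual change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 32-bit tick counter, as on the 32-bit ARM test system mentioned above. */
typedef uint32_t jiffies32_t;

/*
 * True if a is earlier than b. The comparison is done on the signed
 * difference so that it keeps working across counter wraparound, which
 * also means values more than half the range apart compare "backwards".
 */
static bool demo_time_before(jiffies32_t a, jiffies32_t b)
{
    return (int32_t)(uint32_t)(a - b) < 0;
}

/*
 * Hypothetical guard: a timer that is not pending is always armed; only
 * a pending timer's expiry is meaningful to compare against.
 */
static bool demo_should_arm(bool pending, jiffies32_t cur_expires,
                            jiffies32_t new_expires)
{
    return !pending || demo_time_before(cur_expires, new_expires);
}
```

With expire == 0 and a new expiry of half the counter range plus two, `demo_time_before()` returns false, which is the failing case listed in the commit message.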
     

21 Sep, 2018

1 commit

  • When processing pmtu update from an icmp packet, it calls .update_pmtu
    with sk instead of skb in sctp_transport_update_pmtu.

    However for sctp, the daddr in the transport might be different from
    inet_sock->inet_daddr or sk->sk_v6_daddr, which is used to update or
    create the route cache. The incorrect daddr will cause a different
    route cache created for the path.

    So before calling .update_pmtu, inet_sock->inet_daddr/sk->sk_v6_daddr
    should be updated with the daddr in the transport, and restored
    after it's done.

    The issue has existed since route exceptions were introduced.

    Fixes: 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions.")
    Reported-by: ian.periam@dialogic.com
    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
     

04 Jul, 2018

1 commit

  • After commit b6c5734db070 ("sctp: fix the handling of ICMP Frag Needed
    for too small MTUs"), sctp_transport_update_pmtu would refetch pathmtu
    from the dst and set it to transport's pathmtu without any check.

    The new pathmtu may be lower than MINSEGMENT if the dst is obsolete and
    updated by .get_dst() in sctp_transport_update_pmtu. In this case, it
    could have a smaller MTU as well, and thus we should validate it
    against MINSEGMENT instead.

    Syzbot reported a warning in sctp_mtu_payload caused by this.

    This patch refetches the pathmtu by calling sctp_dst_mtu where it does
    the check against MINSEGMENT.

    v1->v2:
    - refetch the pathmtu by calling sctp_dst_mtu instead as Marcelo's
    suggestion.

    Fixes: b6c5734db070 ("sctp: fix the handling of ICMP Frag Needed for too small MTUs")
    Reported-by: syzbot+f0d9d7cba052f9344b03@syzkaller.appspotmail.com
    Suggested-by: Marcelo Ricardo Leitner
    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Xin Long
     

05 Jun, 2018

1 commit

  • syzbot reported an rcu_sched self-detected stall on CPU, caused by a
    too small value set on rto_min with the SCTP_RTOINFO sockopt. With this
    value, hb_timer gets stuck: its timer handler starts the timer again
    with this value, and then ends up in the timer handler again.

    This problem has been there since the very beginning; thanks to Eric
    for the reproducer shared from a syzbot mail.

    This patch fixes it by not allowing sctp_transport_timeout to return a
    smaller value than HZ/5 for hb_timer, which is based on TCP's min rto.

    Note that it doesn't fix this issue by limiting rto_min, as some users
    are still using small rto and no proper value was found for it yet.

    Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com
    Suggested-by: Marcelo Ricardo Leitner
    Signed-off-by: Xin Long
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Xin Long
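
The lower bound described in this commit amounts to a simple clamp. The sketch below assumes a tick rate of 1000 (`DEMO_HZ`) purely for illustration, and `demo_hb_timeout()` is a hypothetical name, not the kernel function.

```c
#include <assert.h>

#define DEMO_HZ 1000UL  /* assumed tick rate, for illustration only */

/* Never let the heartbeat timeout drop below HZ/5, i.e. 200 ms. */
static unsigned long demo_hb_timeout(unsigned long computed)
{
    unsigned long min_timeout = DEMO_HZ / 5;

    return computed < min_timeout ? min_timeout : computed;
}
```

Clamping the timeout rather than rto_min keeps small RTO values usable for retransmission while still preventing the heartbeat timer from re-firing back to back.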
     

28 Apr, 2018

3 commits


09 Jan, 2018

1 commit

  • syzbot reported a hang involving SCTP, on which it kept flooding dmesg
    with the message:
    [ 246.742374] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too
    low, using default minimum of 512

    That happened because whenever SCTP hits an ICMP Frag Needed, it tries
    to adjust to the new MTU and triggers an immediate retransmission. But
    it didn't consider the fact that MTUs smaller than the SCTP minimum MTU
    allowed (512) would not cause the PMTU to change, and issued the
    retransmission anyway (thus leading to another ICMP Frag Needed, and so
    on).

    As IPv4 (ip_rt_min_pmtu=556) and IPv6 (IPV6_MIN_MTU=1280) minimum MTU
    are higher than that, sctp_transport_update_pmtu() is changed to
    re-fetch the PMTU that got set after our request, and with that, detect
    if there was an actual change or not.

    The fix, thus, skips the immediate retransmission if the received ICMP
    resulted in no change, in the hope that SCTP will select another path.

    Note: The value being used for the minimum MTU (512,
    SCTP_DEFAULT_MINSEGMENT) is not right and instead it should be (576,
    SCTP_MIN_PMTU), but such change belongs to another patch.

    Changes from v1:
    - do not disable PMTU discovery, in the light of commit
    06ad391919b2 ("[SCTP] Don't disable PMTU discovery when mtu is small")
    and as suggested by Xin Long.
    - changed the way to break the rtx loop by detecting if the icmp
    resulted in a change or not
    Changes from v2:
    none

    See-also: https://lkml.org/lkml/2017/12/22/811
    Reported-by: syzbot
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
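
The loop-breaking idea in this commit, retransmitting only when the ICMP actually changed the PMTU, can be sketched as below. This is a simplification under stated assumptions: the `demo_*` names are hypothetical, and the sketch clamps the reported value directly instead of re-fetching the PMTU from the route as the commit describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MIN_SEGMENT 512u  /* the SCTP minimum mentioned above */

struct demo_pmtu_transport {
    uint32_t pathmtu;
};

/* Apply a reported PMTU, clamped to the minimum; report whether it changed. */
static bool demo_update_pmtu(struct demo_pmtu_transport *t, uint32_t reported)
{
    uint32_t old = t->pathmtu;

    if (reported < DEMO_MIN_SEGMENT)
        reported = DEMO_MIN_SEGMENT;
    t->pathmtu = reported;
    return t->pathmtu != old;
}

/* Retransmit immediately only when the ICMP actually changed the PMTU. */
static bool demo_should_retransmit(struct demo_pmtu_transport *t,
                                   uint32_t reported)
{
    return demo_update_pmtu(t, reported);
}
```

A transport already at 512 that receives a report of 508 sees no change, so no immediate retransmission is triggered and the ICMP loop is broken.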
     

25 Oct, 2017

1 commit

  • In preparation for unconditionally passing the struct timer_list pointer to
    all timer callbacks, switch to using the new timer_setup() and from_timer()
    to pass the timer pointer explicitly.

    Cc: Vlad Yasevich
    Cc: Neil Horman
    Cc: "David S. Miller"
    Cc: linux-sctp@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Signed-off-by: Kees Cook
    Signed-off-by: David S. Miller

    Kees Cook
     

07 Aug, 2017

1 commit


05 Jul, 2017

1 commit


26 Jun, 2017

4 commits

  • RFC 4960 Errata 3.27 identifies that ssthresh should be adjusted to cwnd
    because otherwise it could cause the transport to lock into the
    congestion avoidance phase, especially if ssthresh was previously
    reduced by some packet drop, leading to poor performance.

    The Errata says to adjust ssthresh to cwnd only once, though the same
    goal is achieved by updating it every time we update cwnd too. The
    caveat is that we could take longer to get back up to speed but that
    should be compensated by the fact that we don't adjust on RTO basis (as
    RFC says) but based on Heartbeats, which are usually way longer.

    See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01#section-3.27
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     
  • RFC4960 Errata 3.26 identified that at the same time RFC4960 states that
    cwnd should never grow more than 1*MTU per RTT, Section 7.2.2 was
    underspecified and as described could allow increasing cwnd more than
    that.

    This patch updates it so partial_bytes_acked is maxed to cwnd if
    flight_size doesn't reach cwnd, protecting it from such case.

    See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01#section-3.26
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     
  • As per RFC4960 Errata 3.22, this condition is not needed anymore as it
    could cause the partial_bytes_acked to not consider the TSNs acked in
    the Gap Ack Blocks although they were received by the peer successfully.

    This patch thus drops the check for new Cumulative TSN Ack Point,
    leaving just the flight_size < cwnd one.

    See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01#section-3.22
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     
  • RFC4960 Errata 3.12 says RFC4960 is unclear about the order of
    adjustments applied to partial_bytes_acked and cwnd in the congestion
    avoidance phase, and that the actual order should be:
    partial_bytes_acked is reset to (partial_bytes_acked - cwnd). Next, cwnd
    is increased by MTU.

    We were first increasing cwnd, and then subtracting the new value pba,
    which leads to a different result as pba is smaller than what it should
    and could cause cwnd to not grow as much.

    See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01#section-3.12
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
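
The effect of the ordering in Errata 3.12 can be seen with a small worked example. The sketch below is a simplification: the `demo_*` names are hypothetical, the flight-size condition is omitted, and the clamp to zero in the old ordering is an assumption made so that the unsigned subtraction cannot underflow.

```c
#include <assert.h>
#include <stdint.h>

struct demo_cc {
    uint32_t cwnd;                 /* congestion window, bytes */
    uint32_t partial_bytes_acked;  /* pba, bytes */
};

/* Order from Errata 3.12: reduce pba by the old cwnd, then grow cwnd. */
static void demo_ca_step_errata(struct demo_cc *c, uint32_t mtu)
{
    if (c->partial_bytes_acked >= c->cwnd) {
        c->partial_bytes_acked -= c->cwnd;
        c->cwnd += mtu;
    }
}

/* Previous order: grow cwnd first, then subtract the already-grown cwnd. */
static void demo_ca_step_old(struct demo_cc *c, uint32_t mtu)
{
    if (c->partial_bytes_acked >= c->cwnd) {
        c->cwnd += mtu;
        c->partial_bytes_acked = c->partial_bytes_acked >= c->cwnd ?
                                 c->partial_bytes_acked - c->cwnd : 0;
    }
}
```

Starting from cwnd = 3000, pba = 4000 and an MTU of 1500, the errata order leaves pba = 1000, while the old order leaves pba = 0, so the next cwnd increase takes longer to earn.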
     

05 Apr, 2017

1 commit

  • This patch is almost a revert of commit 02f3d4ce9e81 ("sctp: Adjust PMTU
    updates to accomodate route invalidation."). As t->asoc can't be NULL
    in sctp_transport_update_pmtu, it can get sk from asoc, and there is no
    need to pass sk into that function.

    It also removes some duplicated code from that function.

    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
     

28 Feb, 2017

1 commit

  • Fix typos and add the following to the scripts/spelling.txt:

    varible||variable

    While we are here, tidy up the comment blocks that fit in a single line
    for drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c and
    net/sctp/transport.c.

    Link: http://lkml.kernel.org/r/1481573103-11329-11-git-send-email-yamada.masahiro@socionext.com
    Signed-off-by: Masahiro Yamada
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masahiro Yamada
     

08 Feb, 2017

1 commit

  • Add new transport flag to allow sockets to confirm neighbour.
    When same struct dst_entry can be used for many different
    neighbours we can not use it for pending confirmations.
    The flag is propagated from transport to every packet.
    It is reset when cached dst is reset.

    Reported-by: YueHaibing
    Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.")
    Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
    Signed-off-by: Julian Anastasov
    Acked-by: Eric Dumazet
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Julian Anastasov
     

19 Jan, 2017

1 commit

  • This patch is to add a per transport timer based on sctp timer frame
    for stream reconf chunk retransmission. It would start after sending
    a reconf request chunk, and stop after receiving the response chunk.

    If the timer expires, besides retransmitting the reconf request chunk,
    it also does the same thing as the data RTO timer, such as increasing
    the appropriate error counts and performing threshold management, possibly
    destroying the asoc if sctp retransmission thresholds are exceeded, just
    as section 5.1.1 describes.

    This patch is also to add asoc strreset_chunk, it is used to save the
    reconf request chunk, so that it can be retransmitted, and to check if
    the response is really for this request by comparing the information
    inside with the response chunk as well.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     

26 Dec, 2016

1 commit

  • ktime_set(S,N) was required for the timespec storage type and is still
    useful for situations where a Seconds and Nanoseconds part of a time value
    needs to be converted. For anything where the Seconds argument is 0, this
    is pointless and can be replaced with a simple assignment.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra

    Thomas Gleixner
     

22 Sep, 2016

1 commit

  • Rename the WORD_TRUNC and WORD_ROUND macros to something more meaningful
    these days, especially because they work on packet headers or lengths,
    which are not tied to any CPU arch but to the protocol itself.

    So, WORD_TRUNC becomes SCTP_TRUNC4 and WORD_ROUND becomes SCTP_PAD4.

    Reported-by: David Laight
    Reported-by: David Miller
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     

11 Apr, 2016

1 commit

  • Currently on high rate SCTP streams the heartbeat timer refresh can
    consume quite a lot of resources as timer updates are costly and it
    contains a random factor, which a) is also costly and b) invalidates
    mod_timer() optimization for not editing a timer to the same value.
    It may even cause the timer to be slightly advanced, for no good reason.

    As suggested by David Laight this patch now removes this timer update
    from hot path by leaving the timer on and re-evaluating upon its
    expiration if the heartbeat is still needed or not, similarly to what is
    done for TCP. If it's not needed anymore the timer is re-scheduled to
    the new timeout, considering the time already elapsed.

    For this, we now record the last tx timestamp per transport, updated in
    the same spots as hb timer was restarted on tx. Also split up
    sctp_transport_reset_timers into sctp_transport_reset_t3_rtx and
    sctp_transport_reset_hb_timer, so we can re-arm T3 without re-arming the
    heartbeat one.

    On loopback with MTU of 65535 and data chunks with 1636, so that we
    have a considerable amount of chunks without stressing system calls,
    netperf -t SCTP_STREAM -l 30, perf looked like this before:

    Samples: 103K of event 'cpu-clock', Event count (approx.): 25833000000
    Overhead Command Shared Object Symbol
    + 6,15% netperf [kernel.vmlinux] [k] copy_user_enhanced_fast_string
    - 5,43% netperf [kernel.vmlinux] [k] _raw_write_unlock_irqrestore
    - _raw_write_unlock_irqrestore
    - 96,54% _raw_spin_unlock_irqrestore
    - 36,14% mod_timer
    + 97,24% sctp_transport_reset_timers
    + 2,76% sctp_do_sm
    + 33,65% __wake_up_sync_key
    + 28,77% sctp_ulpq_tail_event
    + 1,40% del_timer
    - 1,84% mod_timer
    + 99,03% sctp_transport_reset_timers
    + 0,97% sctp_do_sm
    + 1,50% sctp_ulpq_tail_event

    And after this patch, now with netperf -l 60:

    Samples: 230K of event 'cpu-clock', Event count (approx.): 57707250000
    Overhead Command Shared Object Symbol
    + 5,65% netperf [kernel.vmlinux] [k] memcpy_erms
    + 5,59% netperf [kernel.vmlinux] [k] copy_user_enhanced_fast_string
    - 5,05% netperf [kernel.vmlinux] [k] _raw_spin_unlock_irqrestore
    - _raw_spin_unlock_irqrestore
    + 49,89% __wake_up_sync_key
    + 45,68% sctp_ulpq_tail_event
    - 2,85% mod_timer
    + 76,51% sctp_transport_reset_t3_rtx
    + 23,49% sctp_do_sm
    + 1,55% del_timer
    + 2,50% netperf [sctp] [k] sctp_datamsg_from_user
    + 2,26% netperf [sctp] [k] sctp_sendmsg

    Throughput-wise, from 6800mbps without the patch to 7050mbps with it,
    ~3.7%.

    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
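
The "leave the timer on and re-evaluate on expiry" approach from this commit can be sketched as a small decision function. `demo_hb_rearm_delay()` and its parameters are hypothetical names; the sketch only shows how the time elapsed since the last transmission would be taken into account.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Called when the heartbeat timer fires. Returns 0 if a heartbeat is due
 * now, otherwise the number of ticks to re-arm the timer for. All values
 * are in ticks; the unsigned subtraction stays correct across wraparound.
 */
static uint32_t demo_hb_rearm_delay(uint32_t now, uint32_t last_tx,
                                    uint32_t timeout)
{
    uint32_t idle = now - last_tx;

    if (idle >= timeout)
        return 0;           /* idle long enough: send a HEARTBEAT */
    return timeout - idle;  /* data was sent recently: wait the remainder */
}
```

The transmit hot path then only has to record a timestamp, and the cost of touching the timer is paid once per expiry instead of once per packet.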
     

21 Mar, 2016

1 commit

  • SCTP is a protocol that is aligned to a word (4 bytes). Thus using bare
    MTU can sometimes return values that are not aligned, like for loopback,
    which is 65536 but ipv4_mtu() limits that to 65535. This mis-alignment
    will cause the last non-aligned bytes to never be used and can cause
    issues with congestion control.

    So it's better to just consider a lower MTU and keep congestion control
    calcs saner as they are based on PMTU.

    Same applies to icmp frag needed messages, which is also fixed by this
    patch.

    One other effect of this is the inability to send MTU-sized packet
    without queueing or fragmentation and without hitting Nagle. As the
    check performed at sctp_packet_can_append_data():

    if (chunk->skb->len + q->out_qlen >= transport->pathmtu - packet->overhead)
            /* Enough data queued to fill a packet */
            return SCTP_XMIT_OK;

    with the above example of MTU, if there are no other messages queued,
    one cannot send a packet that just fits one packet (65532 bytes) and
    without causing DATA chunk fragmentation or a delay.

    v2:
    - Added WORD_TRUNC macro

    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
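
The 4-byte alignment discussed in this commit can be illustrated with two helpers. `demo_trunc4()` and `demo_pad4()` are hypothetical user-space equivalents of the rounding that WORD_TRUNC and WORD_ROUND perform, and are not copied from the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Round down to a multiple of 4 bytes. */
static uint32_t demo_trunc4(uint32_t v)
{
    return v & ~(uint32_t)3;
}

/* Round up to a multiple of 4 bytes. */
static uint32_t demo_pad4(uint32_t v)
{
    return (v + 3) & ~(uint32_t)3;
}
```

For the loopback case above, truncating 65535 gives 65532, which is the largest word-aligned size that fits in the reported MTU.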
     

14 Mar, 2016

1 commit

  • Prior to this patch, if we have two paths in one assoc at the beginning,
    they may have the same params other than last_time_heard, and it will
    try the paths like this:

    1st cycle:
    try trans1, fail.
    then trans2 is selected (because its last_time_heard is after trans1's).

    2nd cycle:
    try trans2, fail.
    then trans2 is selected (because its last_time_heard is after trans1's).

    3rd cycle:
    try trans2, fail.
    then trans2 is selected (because its last_time_heard is after trans1's).

    ....

    trans1 will never have a chance to be selected, which is not what we
    expect. We should keep round-robining all the paths if they were just
    added at the beginning.

    So at first every transport's last_time_heard should be initialized to 0,
    so that we ensure they have the same value at the beginning. Only this
    way can all the transports get an equal chance to be selected.

    Then for sctp_trans_elect_best, it should return the trans_next one when
    *trans == *trans_next, so that we can try the next one if it fails, but
    now it always returns trans. So we can fix it by exchanging these two
    params when we call sctp_trans_elect_tie().

    Fixes: 4c47af4d5eb2 ('net: sctp: rework multihoming retransmission path selection to rfc4960')
    Signed-off-by: Xin Long
    Acked-by: Daniel Borkmann
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
     

29 Jan, 2016

2 commits

  • After we use refcnt to check if the transport is alive, the dead flag
    can be removed from sctp_transport.

    The traversal of transport_addr_list in the procfs dump uses
    list_for_each_entry_rcu, so there is no need to check if it has been
    freed.

    sctp_generate_t3_rtx_event and sctp_generate_heartbeat_event are
    protected by the sock lock, so it's not necessary to check dead there
    either. Also, the timers are cancelled when sctp_transport_free() is
    called; it doesn't wait for refcnt to reach 0 to cancel them.

    Signed-off-by: Xin Long
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
     
  • Now when __sctp_lookup_association is running in BH, it will try to
    check if t->dead is set, but meanwhile other CPUs may be freeing this
    transport and this assoc and if it happens that
    __sctp_lookup_association checked t->dead a bit too early, it may think
    that the association is still good while it was already freed.

    So we fix this race by using atomic_add_unless in sctp_transport_hold.
    After we get one transport from hashtable, we will hold it only when
    this transport's refcnt is not 0, so that we can make sure t->asoc
    cannot be freed before we hold the asoc again.

    Note that sctp association is not freed using RCU so we can't use
    atomic_add_unless() with it as it may just be too late for that either.

    Fixes: 4f0087812648 ("sctp: apply rhashtable api to send/recv path")
    Reported-by: Vlad Yasevich
    Signed-off-by: Xin Long
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
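
The "hold only if the refcount is not already zero" rule from this commit can be sketched with C11 atomics. `struct demo_ref_transport` and `demo_transport_hold()` are hypothetical user-space names; the loop emulates the semantics of atomic_add_unless(&refcnt, 1, 0) using a compare-and-exchange.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct demo_ref_transport {
    atomic_int refcnt;
};

/*
 * Take a reference only if the count is still non-zero.
 * Returns false if the object is already on its way to being freed.
 */
static bool demo_transport_hold(struct demo_ref_transport *t)
{
    int old = atomic_load(&t->refcnt);

    while (old != 0) {
        /* On failure, 'old' is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&t->refcnt, &old, old + 1))
            return true;
    }
    return false;
}
```

A lookup that gets false back must treat the transport as gone and must not touch its association.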
     

10 Nov, 2015

1 commit

  • Switch everything to the new and more capable implementation of abs().
    Mainly to give the new abs() a bit of a workout.

    Cc: Michal Nazarewicz
    Cc: John Stultz
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Masami Hiramatsu
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

01 Aug, 2014

1 commit

  • The SCTP socket extensions API document describes the v4mapping option as
    follows:

    8.1.15. Set/Clear IPv4 Mapped Addresses (SCTP_I_WANT_MAPPED_V4_ADDR)

    This socket option is a Boolean flag which turns on or off the
    mapping of IPv4 addresses. If this option is turned on, then IPv4
    addresses will be mapped to V6 representation. If this option is
    turned off, then no mapping will be done of V4 addresses and a user
    will receive both PF_INET6 and PF_INET type addresses on the socket.
    See [RFC3542] for more details on mapped V6 addresses.

    This description isn't really in line with what the code does though.

    Introduce addr_to_user (renamed addr_v4map), which should be called
    before any sockaddr is passed back to user space. The new function
    places the sockaddr into the correct format depending on the
    SCTP_I_WANT_MAPPED_V4_ADDR option.

    Audit all places that touched v4mapped and either sanely construct
    a v4 or v6 address then call addr_to_user, or drop the
    unnecessary v4mapped check entirely.

    Audit all places that call addr_to_user and verify they are on a syscall
    return path.

    Add a custom getname that formats the address properly.

    Several bugs are addressed:
    - SCTP_I_WANT_MAPPED_V4_ADDR=0 often returned garbage for
    addresses to user space
    - The addr_len returned from recvmsg was not correct when
    returning AF_INET on a v6 socket
    - flowlabel and scope_id were not zeroed when promoting
    a v4 to v6
    - Some syscalls like bind and connect behaved differently
    depending on v4mapped

    Tested bind, getpeername, getsockname, connect, and recvmsg for proper
    behaviour in v4mapped = 1 and 0 cases.

    Signed-off-by: Neil Horman
    Tested-by: Jason Gunthorpe
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: David S. Miller

    Jason Gunthorpe
     

03 Jul, 2014

1 commit

  • RFC4960, section 8.3 says:

    On an idle destination address that is allowed to heartbeat,
    it is recommended that a HEARTBEAT chunk is sent once per RTO
    of that destination address plus the protocol parameter
    'HB.interval', with jittering of +/- 50% of the RTO value,
    and exponential backoff of the RTO if the previous HEARTBEAT
    is unanswered.

    Currently, we calculate jitter via sctp_jitter() function first,
    and then add its result to the current RTO for the new timeout:

    TMO = RTO + (RAND() % RTO) - (RTO / 2)
                `------------------------^-=> sctp_jitter()

    Instead, we can just simplify all this by directly calculating:

    TMO = (RTO / 2) + (RAND() % RTO)

    With the help of prandom_u32_max(), we don't need to open code
    our own global PRNG, but can instead just make use of the per
    CPU implementation of prandom with better quality numbers. Also,
    we can now spare us the conditional for divide by zero check
    since no div or mod operation needs to be used. Note that
    prandom_u32_max() won't emit the same result as a mod operation,
    but we really don't care here as we only want to have a random
    number scaled into RTO interval.

    Note, exponential RTO backoff is handled elsewhere, namely in
    sctp_do_8_2_transport_strike().

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
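
The simplified timeout calculation in this commit can be sketched in user space as below. `demo_scale_u32()` is a hypothetical helper that scales a 32-bit random value into a range with a multiply and shift, which is the general technique the commit attributes to prandom_u32_max(); the random value is passed in as a parameter so the arithmetic is easy to check, and the sketch leaves out the HB.interval term.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scale a 32-bit random value into [0, range) with a multiply and shift,
 * so no division or modulo (and no divide-by-zero check) is needed.
 */
static uint32_t demo_scale_u32(uint32_t rnd, uint32_t range)
{
    return (uint32_t)(((uint64_t)rnd * range) >> 32);
}

/* TMO = RTO / 2 + jitter, with jitter in [0, RTO). */
static uint32_t demo_hb_timeout_jitter(uint32_t rto, uint32_t rnd)
{
    return rto / 2 + demo_scale_u32(rnd, rto);
}
```

The result always falls in [RTO/2, RTO/2 + RTO), which matches the +/- 50% jitter around RTO quoted from RFC4960 section 8.3.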
     

12 Jun, 2014

1 commit

  • Be more precise in transport path selection and use ktime
    helpers instead of jiffies to compare and pick the better
    primary and secondary recently used transports. This also
    avoids any side-effects during a possible roll-over, and
    could lead to better path decision-making.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

14 Feb, 2014

1 commit

  • One of my pet coding style peeves is the practice of
    adding extra return; at the end of function.
    Kill several instances of this in network code.

    I suppose some coccinelle wizardry could do this automatically.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    stephen hemminger
     

10 Dec, 2013

1 commit


07 Dec, 2013

1 commit

  • Several files refer to an old address for the Free Software Foundation
    in the file header comment. Resolve by replacing the address with
    the URL so that we do not have to keep
    updating the header comments anytime the address changes.

    CC: Vlad Yasevich
    CC: Neil Horman
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Jeff Kirsher
     

06 Dec, 2013

1 commit

  • As Michael pointed out, when max_burst is 0 it just disables
    max_burst, as declared in rfc6458#section-8.1.24. So add the check
    in sctp_transport_burst_limited: when it is 0, just do nothing.

    Reviewed-by: Daniel Borkmann
    Suggested-by: Vlad Yasevich
    Suggested-by: Michael Tuexen
    Signed-off-by: Wang Weidong
    Acked-by: Neil Horman
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    wangweidong