Eric Lee / smarc-fsl-linux-kernel

15 Nov, 2020

1 commit

057a10fa1 sctp: change to hold/put transport for proto_unreach_timer ... Browse Code »

A call trace was found in Hangbin's Codenomicon testing with debug kernel:

[ 2615.981988] ODEBUG: free active (active state 0) object type: timer_list hint: sctp_generate_proto_unreach_event+0x0/0x3a0 [sctp]
[ 2615.995050] WARNING: CPU: 17 PID: 0 at lib/debugobjects.c:328 debug_print_object+0x199/0x2b0
[ 2616.095934] RIP: 0010:debug_print_object+0x199/0x2b0
[ 2616.191533] Call Trace:
[ 2616.194265]
[ 2616.202068] debug_check_no_obj_freed+0x25e/0x3f0
[ 2616.207336] slab_free_freelist_hook+0xeb/0x140
[ 2616.220971] kfree+0xd6/0x2c0
[ 2616.224293] rcu_do_batch+0x3bd/0xc70
[ 2616.243096] rcu_core+0x8b9/0xd00
[ 2616.256065] __do_softirq+0x23d/0xacd
[ 2616.260166] irq_exit+0x236/0x2a0
[ 2616.263879] smp_apic_timer_interrupt+0x18d/0x620
[ 2616.269138] apic_timer_interrupt+0xf/0x20
[ 2616.273711]

This is because it holds asoc when transport->proto_unreach_timer starts
and puts asoc when the timer stops, and without holding transport the
transport could be freed when the timer is still running.

So fix it by holding/putting transport instead for proto_unreach_timer
in transport, just like other timers in transport.

v1->v2:
- Also use sctp_transport_put() for the "out_unlock:" path in
sctp_generate_proto_unreach_event(), as Marcelo noticed.

Fixes: 50b5d6ad6382 ("sctp: Fix a race between ICMP protocol unreachable and connect()")
Reported-by: Hangbin Liu
Signed-off-by: Xin Long
Acked-by: Marcelo Ricardo Leitner
Link: https://lore.kernel.org/r/102788809b554958b13b95d33440f5448113b8d6.1605331373.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski

Xin Long
2020-11-15 03:57:12 +0800

01 Jan, 2020

1 commit

31d518f35 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net ... Browse Code »

Simple overlapping changes in bpf land wrt. bpf_helper_defs.h
handling.

Signed-off-by: David S. Miller

David S. Miller
2020-01-01 05:37:13 +0800

25 Dec, 2019

1 commit

bd085ef67 net: add bool confirm_neigh parameter for dst_ops.update_pmtu ... Browse Code »

The MTU update code is supposed to be invoked in response to real
networking events that update the PMTU. In IPv6 PMTU update function
__ip6_rt_update_pmtu() we called dst_confirm_neigh() to update neighbor
confirmed time.

But for tunnel code, it will call pmtu before xmit, like:
- tnl_update_pmtu()
- skb_dst_update_pmtu()
- ip6_rt_update_pmtu()
- __ip6_rt_update_pmtu()
- dst_confirm_neigh()

If the tunnel remote dst mac address changed and we still do the neigh
confirm, we will not be able to update neigh cache and ping6 remote
will failed.

So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we
should not be invoking dst_confirm_neigh() as we have no evidence
of successful two-way communication at this point.

On the other hand it is also important to keep the neigh reachability fresh
for TCP flows, so we cannot remove this dst_confirm_neigh() call.

To fix the issue, we have to add a new bool parameter for dst_ops.update_pmtu
to choose whether we should do neigh update or not. I will add the parameter
in this patch and set all the callers to true to comply with the previous
way, and fix the tunnel code one by one on later patches.

v5: No change.
v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
dst_ops.update_pmtu to control whether we should do neighbor confirm.
Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

Suggested-by: David Miller
Reviewed-by: Guillaume Nault
Acked-by: David Ahern
Signed-off-by: Hangbin Liu
Signed-off-by: David S. Miller

Hangbin Liu
2019-12-25 14:28:54 +0800

10 Dec, 2019

1 commit

4e7696d90 sctp: get netns from asoc and ep base ... Browse Code »

Commit 312434617cb1 ("sctp: cache netns in sctp_ep_common") set netns
in asoc and ep base since they're created, and it will never change.
It's a better way to get netns from asoc and ep base, comparing to
calling sock_net().

This patch is to replace them.

v1->v2:
- no change.

Suggested-by: Marcelo Ricardo Leitner
Signed-off-by: Xin Long
Acked-by: Neil Horman
Acked-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller

Xin Long
2019-12-10 12:14:01 +0800

31 Jul, 2019

1 commit

4c31bc6b1 sctp: only copy the available addr data in sctp_transport_init ... Browse Code »

'addr' passed to sctp_transport_init is not always a whole size
of union sctp_addr, like the path:

sctp_sendmsg() ->
sctp_sendmsg_new_asoc() ->
sctp_assoc_add_peer() ->
sctp_transport_new() -> sctp_transport_init()

In the next patches, we will also pass the address length of data
only to sctp_assoc_add_peer().

So sctp_transport_init() should copy the only available data from
addr to peer->ipaddr, instead of 'peer->ipaddr = *addr' which may
cause slab-out-of-bounds.

Signed-off-by: Xin Long
Signed-off-by: David S. Miller

Xin Long
2019-07-31 05:18:14 +0800

24 May, 2019

1 commit

47505b8bc treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 104 ... Browse Code »

Based on 1 normalized pattern(s):

this sctp implementation is free software you can redistribute it
and or modify it under the terms of the gnu general public license
as published by the free software foundation either version 2 or at
your option any later version this sctp implementation is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with gnu cc see the file copying if not see
http www gnu org licenses

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 42 file(s).

Signed-off-by: Thomas Gleixner
Reviewed-by: Kate Stewart
Reviewed-by: Richard Fontana
Reviewed-by: Allison Randal
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190523091649.683323110@linutronix.de
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2019-05-24 23:39:00 +0800

23 Feb, 2019

1 commit

d1f20c03f sctp: don't compare hb_timer expire date before starting it ... Browse Code »

hb_timer might not start at all for a particular transport because its
start is conditional. In a result a node is not sending heartbeats.

Function sctp_transport_reset_hb_timer has two roles:
- initial start of hb_timer for a given transport,
- update expire date of hb_timer for a given transport.
The function is optimized to update timer's expire only if it is before
a new calculated one but this comparison is invalid for a timer which
has not yet started. Such a timer has expire == 0 and if a new expire
value is bigger than (MAX_JIFFIES / 2 + 2) then "time_before" macro will
fail and timer will not start resulting in no heartbeat packets send by
the node.

This was found when association was initialized within first 5 mins
after system boot due to jiffies init value which is near to MAX_JIFFIES.

Test kernel version: 4.9.154 (ARCH=arm)
hb_timer.expire = 0; //initialized, not started timer
new_expire = MAX_JIFFIES / 2 + 2; //or more
time_before(hb_timer.expire, new_expire) == false

Fixes: ba6f5e33bdbb ("sctp: avoid refreshing heartbeat timer too often")
Reported-by: Marcin Stojek
Tested-by: Marcin Stojek
Signed-off-by: Maciej Kwiecien
Reviewed-by: Alexander Sverdlin
Acked-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller

Maciej Kwiecien
2019-02-23 03:11:54 +0800

21 Sep, 2018

1 commit

d7ab5cdce sctp: update dst pmtu with the correct daddr ... Browse Code »

When processing pmtu update from an icmp packet, it calls .update_pmtu
with sk instead of skb in sctp_transport_update_pmtu.

However for sctp, the daddr in the transport might be different from
inet_sock->inet_daddr or sk->sk_v6_daddr, which is used to update or
create the route cache. The incorrect daddr will cause a different
route cache created for the path.

So before calling .update_pmtu, inet_sock->inet_daddr/sk->sk_v6_daddr
should be updated with the daddr in the transport, and update it back
after it's done.

The issue has existed since route exceptions introduction.

Fixes: 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions.")
Reported-by: ian.periam@dialogic.com
Signed-off-by: Xin Long
Acked-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller

Xin Long
2018-09-21 02:29:30 +0800

04 Jul, 2018

1 commit

a65925475 sctp: fix the issue that pathmtu may be set lower than MINSEGMENT ... Browse Code »

After commit b6c5734db070 ("sctp: fix the handling of ICMP Frag Needed
for too small MTUs"), sctp_transport_update_pmtu would refetch pathmtu
from the dst and set it to transport's pathmtu without any check.

The new pathmtu may be lower than MINSEGMENT if the dst is obsolete and
updated by .get_dst() in sctp_transport_update_pmtu. In this case, it
could have a smaller MTU as well, and thus we should validate it
against MINSEGMENT instead.

Syzbot reported a warning in sctp_mtu_payload caused by this.

This patch refetches the pathmtu by calling sctp_dst_mtu where it does
the check against MINSEGMENT.

v1->v2:
- refetch the pathmtu by calling sctp_dst_mtu instead as Marcelo's
suggestion.

Fixes: b6c5734db070 ("sctp: fix the handling of ICMP Frag Needed for too small MTUs")
Reported-by: syzbot+f0d9d7cba052f9344b03@syzkaller.appspotmail.com
Suggested-by: Marcelo Ricardo Leitner
Signed-off-by: Xin Long
Acked-by: Marcelo Ricardo Leitner
Acked-by: Neil Horman
Signed-off-by: David S. Miller

Xin Long
2018-07-04 20:36:34 +0800

05 Jun, 2018

1 commit

1d88ba1eb sctp: not allow transport timeout value less than HZ/5 for hb_timer ... Browse Code »

syzbot reported a rcu_sched self-detected stall on CPU which is caused
by too small value set on rto_min with SCTP_RTOINFO sockopt. With this
value, hb_timer will get stuck there, as in its timer handler it starts
this timer again with this value, then goes to the timer handler again.

This problem is there since very beginning, and thanks to Eric for the
reproducer shared from a syzbot mail.

This patch fixes it by not allowing sctp_transport_timeout to return a
smaller value than HZ/5 for hb_timer, which is based on TCP's min rto.

Note that it doesn't fix this issue by limiting rto_min, as some users
are still using small rto and no proper value was found for it yet.

Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com
Suggested-by: Marcelo Ricardo Leitner
Signed-off-by: Xin Long
Acked-by: Neil Horman
Signed-off-by: David S. Miller

Xin Long
2018-06-05 22:22:45 +0800

28 Apr, 2018

3 commits

6e91b578b sctp: re-use sctp_transport_pmtu in sctp_transport_route ... Browse Code »

sctp_transport_route currently is very similar to sctp_transport_pmtu plus
a few other bits.

This patch reuses sctp_transport_pmtu in sctp_transport_route and removes
the duplicated code.

Also, as all calls to sctp_transport_route were forcing the dst release
before calling it, let's just include such release too.

Signed-off-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller

Marcelo Ricardo Leitner
2018-04-28 02:35:23 +0800
6ff0f871c sctp: introduce sctp_dst_mtu ... Browse Code »

Which makes sure that the MTU respects the minimum value of
SCTP_DEFAULT_MINSEGMENT and that it is correctly aligned.

Signed-off-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller

Marcelo Ricardo Leitner
2018-04-28 02:35:23 +0800
800e00c12 sctp: move transport pathmtu calc away of sctp_assoc_add_peer ... Browse Code »

There was only one case that sctp_assoc_add_peer couldn't handle, which
is when SPP_PMTUD_DISABLE is set and pathmtu not initialized.
So add this situation to sctp_transport_route and reuse what was
already in there.

Signed-off-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller

Marcelo Ricardo Leitner
2018-04-28 02:35:22 +0800

09 Jan, 2018

1 commit

b6c5734db sctp: fix the handling of ICMP Frag Needed for too small MTUs ... Browse Code »

syzbot reported a hang involving SCTP, on which it kept flooding dmesg
with the message:
[ 246.742374] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too
low, using default minimum of 512

That happened because whenever SCTP hits an ICMP Frag Needed, it tries
to adjust to the new MTU and triggers an immediate retransmission. But
it didn't consider the fact that MTUs smaller than the SCTP minimum MTU
allowed (512) would not cause the PMTU to change, and issued the
retransmission anyway (thus leading to another ICMP Frag Needed, and so
on).

As IPv4 (ip_rt_min_pmtu=556) and IPv6 (IPV6_MIN_MTU=1280) minimum MTU
are higher than that, sctp_transport_update_pmtu() is changed to
re-fetch the PMTU that got set after our request, and with that, detect
if there was an actual change or not.

The fix, thus, skips the immediate retransmission if the received ICMP
resulted in no change, in the hope that SCTP will select another path.

Note: The value being used for the minimum MTU (512,
SCTP_DEFAULT_MINSEGMENT) is not right and instead it should be (576,
SCTP_MIN_PMTU), but such change belongs to another patch.

Changes from v1:
- do not disable PMTU discovery, in the light of commit
06ad391919b2 ("[SCTP] Don't disable PMTU discovery when mtu is small")
and as suggested by Xin Long.
- changed the way to break the rtx loop by detecting if the icmp
resulted in a change or not
Changes from v2:
none

See-also: https://lkml.org/lkml/2017/12/22/811
Reported-by: syzbot
Signed-off-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller

Marcelo Ricardo Leitner
2018-01-09 03:19:13 +0800

25 Oct, 2017

1 commit

9c3b57518 net: sctp: Convert timers to use timer_setup() ... Browse Code »

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Vlad Yasevich
Cc: Neil Horman
Cc: "David S. Miller"
Cc: linux-sctp@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook
Signed-off-by: David S. Miller

Kees Cook
2017-10-25 11:02:09 +0800

07 Aug, 2017

1 commit

233e7936c sctp: remove the typedef sctp_lower_cwnd_t ... Browse Code »

This patch is to remove the typedef sctp_lower_cwnd_t, and
replace with enum sctp_lower_cwnd in the places where it's
using this typedef.

Signed-off-by: Xin Long
Signed-off-by: David S. Miller

Xin Long
2017-08-07 12:33:41 +0800

05 Jul, 2017

1 commit

a4b2b58ef net, sctp: convert sctp_transport.refcnt from atomic_t to refcount_t ... Browse Code »

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova
Signed-off-by: Hans Liljestrand
Signed-off-by: Kees Cook
Signed-off-by: David Windsor
Signed-off-by: David S. Miller

Reshetova, Elena
2017-07-05 05:35:19 +0800

26 Jun, 2017

4 commits

a02d036c0 sctp: adjust ssthresh when transport is idle ... Browse Code »

RFC 4960 Errata 3.27 identifies that ssthresh should be adjusted to cwnd
because otherwise it could cause the transport to lock into congestion
avoidance phase specially if ssthresh was previously reduced by some
packet drop, leading to poor performance.

The Errata says to adjust ssthresh to cwnd only once, though the same
goal is achieved by updating it every time we update cwnd too. The
caveat is that we could take longer to get back up to speed but that
should be compensated by the fact that we don't adjust on RTO basis (as
RFC says) but based on Heartbeats, which are usually way longer.

See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01#section-3.27
Signed-off-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller

Marcelo Ricardo Leitner
2017-06-26 02:43:53 +0800
4ccbd0b0b sctp: adjust cwnd increase in Congestion Avoidance phase ... Browse Code »

RFC4960 Errata 3.26 identified that at the same time RFC4960 states that
cwnd should never grow more than 1*MTU per RTT, Section 7.2.2 was
underspecified and as described could allow increasing cwnd more than
that.

This patch updates it so partial_bytes_acked is maxed to cwnd if
flight_size doesn't reach cwnd, protecting it from such case.

See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01#section-3.26
Signed-off-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller

Marcelo Ricardo Leitner
2017-06-26 02:43:53 +0800
e56f777af sctp: allow increasing cwnd regardless of ctsn moving or not ... Browse Code »

As per RFC4960 Errata 3.22, this condition is not needed anymore as it
could cause the partial_bytes_acked to not consider the TSNs acked in
the Gap Ack Blocks although they were received by the peer successfully.

This patch thus drops the check for new Cumulative TSN Ack Point,
leaving just the flight_size < cwnd one.

See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01#section-3.22
Signed-off-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller

Marcelo Ricardo Leitner
2017-06-26 02:43:53 +0800
d0b53f409 sctp: update order of adjustments of partial_bytes_acked and cwnd ... Browse Code »

RFC4960 Errata 3.12 says RFC4960 is unclear about the order of
adjustments applied to partial_bytes_acked and cwnd in the congestion
avoidance phase, and that the actual order should be:
partial_bytes_acked is reset to (partial_bytes_acked - cwnd). Next, cwnd
is increased by MTU.

We were first increasing cwnd, and then subtracting the new value pba,
which leads to a different result as pba is smaller than what it should
and could cause cwnd to not grow as much.

See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01#section-3.12
Signed-off-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller

Marcelo Ricardo Leitner
2017-06-26 02:43:53 +0800

05 Apr, 2017

1 commit

3ebfdf082 sctp: get sock from transport in sctp_transport_update_pmtu ... Browse Code »

This patch is almost to revert commit 02f3d4ce9e81 ("sctp: Adjust PMTU
updates to accomodate route invalidation."). As t->asoc can't be NULL
in sctp_transport_update_pmtu, it could get sk from asoc, and no need
to pass sk into that function.

It is also to remove some duplicated codes from that function.

Signed-off-by: Xin Long
Acked-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller

Xin Long
2017-04-05 22:20:06 +0800

28 Feb, 2017

1 commit

b564d62e6 scripts/spelling.txt: add "varible" pattern and fix typo instances ... Browse Code »

Fix typos and add the following to the scripts/spelling.txt:

varible||variable

While we are here, tidy up the comment blocks that fit in a single line
for drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c and
net/sctp/transport.c.

Link: http://lkml.kernel.org/r/1481573103-11329-11-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Masahiro Yamada
2017-02-28 10:43:47 +0800

08 Feb, 2017

1 commit

c86a773c7 sctp: add dst_pending_confirm flag ... Browse Code »

Add new transport flag to allow sockets to confirm neighbour.
When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
The flag is propagated from transport to every packet.
It is reset when cached dst is reset.

Reported-by: YueHaibing
Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.")
Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
Signed-off-by: Julian Anastasov
Acked-by: Eric Dumazet
Acked-by: Neil Horman
Signed-off-by: David S. Miller

Julian Anastasov
2017-02-08 02:07:46 +0800

19 Jan, 2017

1 commit

7b9438de0 sctp: add stream reconf timer ... Browse Code »

This patch is to add a per transport timer based on sctp timer frame
for stream reconf chunk retransmission. It would start after sending
a reconf request chunk, and stop after receiving the response chunk.

If the timer expires, besides retransmitting the reconf request chunk,
it would also do the same thing with data RTO timer. like to increase
the appropriate error counts, and perform threshold management, possibly
destroying the asoc if sctp retransmission thresholds are exceeded, just
as section 5.1.1 describes.

This patch is also to add asoc strreset_chunk, it is used to save the
reconf request chunk, so that it can be retransmitted, and to check if
the response is really for this request by comparing the information
inside with the response chunk as well.

Signed-off-by: Xin Long
Signed-off-by: David S. Miller

Xin Long
2017-01-19 03:55:10 +0800

26 Dec, 2016

1 commit

8b0e19531 ktime: Cleanup ktime_set() usage ... Browse Code »

ktime_set(S,N) was required for the timespec storage type and is still
useful for situations where a Seconds and Nanoseconds part of a time value
needs to be converted. For anything where the Seconds argument is 0, this
is pointless and can be replaced with a simple assignment.

Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra

Thomas Gleixner
2016-12-26 00:21:22 +0800

22 Sep, 2016

1 commit

e2f036a97 sctp: rename WORD_TRUNC/ROUND macros ... Browse Code »

To something more meaningful these days, specially because this is
working on packet headers or lengths and which are not tied to any CPU
arch but to the protocol itself.

So, WORD_TRUNC becomes SCTP_TRUNC4 and WORD_ROUND becomes SCTP_PAD4.

Reported-by: David Laight
Reported-by: David Miller
Signed-off-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller

Marcelo Ricardo Leitner
2016-09-22 15:13:26 +0800

11 Apr, 2016

1 commit

ba6f5e33b sctp: avoid refreshing heartbeat timer too often ... Browse Code »

Currently on high rate SCTP streams the heartbeat timer refresh can
consume quite a lot of resources as timer updates are costly and it
contains a random factor, which a) is also costly and b) invalidates
mod_timer() optimization for not editing a timer to the same value.
It may even cause the timer to be slightly advanced, for no good reason.

As suggested by David Laight this patch now removes this timer update
from hot path by leaving the timer on and re-evaluating upon its
expiration if the heartbeat is still needed or not, similarly to what is
done for TCP. If it's not needed anymore the timer is re-scheduled to
the new timeout, considering the time already elapsed.

For this, we now record the last tx timestamp per transport, updated in
the same spots as hb timer was restarted on tx. Also split up
sctp_transport_reset_timers into sctp_transport_reset_t3_rtx and
sctp_transport_reset_hb_timer, so we can re-arm T3 without re-arming the
heartbeat one.

On loopback with MTU of 65535 and data chunks with 1636, so that we
have a considerable amount of chunks without stressing system calls,
netperf -t SCTP_STREAM -l 30, perf looked like this before:

Samples: 103K of event 'cpu-clock', Event count (approx.): 25833000000
Overhead Command Shared Object Symbol
+ 6,15% netperf [kernel.vmlinux] [k] copy_user_enhanced_fast_string
- 5,43% netperf [kernel.vmlinux] [k] _raw_write_unlock_irqrestore
- _raw_write_unlock_irqrestore
- 96,54% _raw_spin_unlock_irqrestore
- 36,14% mod_timer
+ 97,24% sctp_transport_reset_timers
+ 2,76% sctp_do_sm
+ 33,65% __wake_up_sync_key
+ 28,77% sctp_ulpq_tail_event
+ 1,40% del_timer
- 1,84% mod_timer
+ 99,03% sctp_transport_reset_timers
+ 0,97% sctp_do_sm
+ 1,50% sctp_ulpq_tail_event

And after this patch, now with netperf -l 60:

Samples: 230K of event 'cpu-clock', Event count (approx.): 57707250000
Overhead Command Shared Object Symbol
+ 5,65% netperf [kernel.vmlinux] [k] memcpy_erms
+ 5,59% netperf [kernel.vmlinux] [k] copy_user_enhanced_fast_string
- 5,05% netperf [kernel.vmlinux] [k] _raw_spin_unlock_irqrestore
- _raw_spin_unlock_irqrestore
+ 49,89% __wake_up_sync_key
+ 45,68% sctp_ulpq_tail_event
- 2,85% mod_timer
+ 76,51% sctp_transport_reset_t3_rtx
+ 23,49% sctp_do_sm
+ 1,55% del_timer
+ 2,50% netperf [sctp] [k] sctp_datamsg_from_user
+ 2,26% netperf [sctp] [k] sctp_sendmsg

Throughput-wise, from 6800mbps without the patch to 7050mbps with it,
~3.7%.

Signed-off-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller

Marcelo Ricardo Leitner
2016-04-11 10:22:34 +0800

21 Mar, 2016

1 commit

3822a5ff4 sctp: align MTU to a word ... Browse Code »

SCTP is a protocol that is aligned to a word (4 bytes). Thus using bare
MTU can sometimes return values that are not aligned, like for loopback,
which is 65536 but ipv4_mtu() limits that to 65535. This mis-alignment
will cause the last non-aligned bytes to never be used and can cause
issues with congestion control.

So it's better to just consider a lower MTU and keep congestion control
calcs saner as they are based on PMTU.

Same applies to icmp frag needed messages, which is also fixed by this
patch.

One other effect of this is the inability to send MTU-sized packet
without queueing or fragmentation and without hitting Nagle. As the
check performed at sctp_packet_can_append_data():

if (chunk->skb->len + q->out_qlen >= transport->pathmtu - packet->overhead)
/* Enough data queued to fill a packet */
return SCTP_XMIT_OK;

with the above example of MTU, if there are no other messages queued,
one cannot send a packet that just fits one packet (65532 bytes) and
without causing DATA chunk fragmentation or a delay.

v2:
- Added WORD_TRUNC macro

Signed-off-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller

Marcelo Ricardo Leitner
2016-03-21 04:31:12 +0800

14 Mar, 2016

1 commit

39d2adebf sctp: fix the transports round robin issue when init is retransmitted ... Browse Code »

prior to this patch, at the beginning if we have two paths in one assoc,
they may have the same params other than the last_time_heard, it will try
the paths like this:

1st cycle
try trans1 fail.
then trans2 is selected.(cause it's last_time_heard is after trans1).

2nd cycle:
try trans2 fail
then trans2 is selected.(cause it's last_time_heard is after trans1).

3rd cycle:
try trans2 fail
then trans2 is selected.(cause it's last_time_heard is after trans1).

....

trans1 will never have change to be selected, which is not what we expect.
we should keeping round robin all the paths if they are just added at the
beginning.

So at first every tranport's last_time_heard should be initialized 0, so
that we ensure they have the same value at the beginning, only by this,
all the transports could get equal chance to be selected.

Then for sctp_trans_elect_best, it should return the trans_next one when
*trans == *trans_next, so that we can try next if it fails, but now it
always return trans. so we can fix it by exchanging these two params when
we calls sctp_trans_elect_tie().

Fixes: 4c47af4d5eb2 ('net: sctp: rework multihoming retransmission path selection to rfc4960')
Signed-off-by: Xin Long
Acked-by: Daniel Borkmann
Acked-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller

Xin Long
2016-03-14 09:52:50 +0800

29 Jan, 2016

2 commits

47faa1e4c sctp: remove the dead field of sctp_transport ... Browse Code »

After we use refcnt to check if transport is alive, the dead can be
removed from sctp_transport.

The traversal of transport_addr_list in procfs dump is using
list_for_each_entry_rcu, no need to check if it has been freed.

sctp_generate_t3_rtx_event and sctp_generate_heartbeat_event is
protected by sock lock, it's not necessary to check dead, either.
also, the timers are cancelled when sctp_transport_free() is
called, that it doesn't wait for refcnt to reach 0 to cancel them.

Signed-off-by: Xin Long
Signed-off-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller

Xin Long
2016-01-29 07:59:32 +0800
1eed67793 sctp: fix the transport dead race check by using atomic_add_unless on refcnt ... Browse Code »

Now when __sctp_lookup_association is running in BH, it will try to
check if t->dead is set, but meanwhile other CPUs may be freeing this
transport and this assoc and if it happens that
__sctp_lookup_association checked t->dead a bit too early, it may think
that the association is still good while it was already freed.

So we fix this race by using atomic_add_unless in sctp_transport_hold.
After we get one transport from hashtable, we will hold it only when
this transport's refcnt is not 0, so that we can make sure t->asoc
cannot be freed before we hold the asoc again.

Note that sctp association is not freed using RCU so we can't use
atomic_add_unless() with it as it may just be too late for that either.

Fixes: 4f0087812648 ("sctp: apply rhashtable api to send/recv path")
Reported-by: Vlad Yasevich
Signed-off-by: Xin Long
Signed-off-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller

Xin Long
2016-01-29 07:59:32 +0800

10 Nov, 2015

1 commit

79211c8ed remove abs64() ... Browse Code »

Switch everything to the new and more capable implementation of abs().
Mainly to give the new abs() a bit of a workout.

Cc: Michal Nazarewicz
Cc: John Stultz
Cc: Ingo Molnar
Cc: Steven Rostedt
Cc: Peter Zijlstra
Cc: Masami Hiramatsu
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2015-11-10 07:11:24 +0800

01 Aug, 2014

1 commit

299ee123e sctp: Fixup v4mapped behaviour to comply with Sock API ... Browse Code »

The SCTP socket extensions API document describes the v4mapping option as
follows:

8.1.15. Set/Clear IPv4 Mapped Addresses (SCTP_I_WANT_MAPPED_V4_ADDR)

This socket option is a Boolean flag which turns on or off the
mapping of IPv4 addresses. If this option is turned on, then IPv4
addresses will be mapped to V6 representation. If this option is
turned off, then no mapping will be done of V4 addresses and a user
will receive both PF_INET6 and PF_INET type addresses on the socket.
See [RFC3542] for more details on mapped V6 addresses.

This description isn't really in line with what the code does though.

Introduce addr_to_user (renamed addr_v4map), which should be called
before any sockaddr is passed back to user space. The new function
places the sockaddr into the correct format depending on the
SCTP_I_WANT_MAPPED_V4_ADDR option.

Audit all places that touched v4mapped and either sanely construct
a v4 or v6 address then call addr_to_user, or drop the
unnecessary v4mapped check entirely.

Audit all places that call addr_to_user and verify they are on a sycall
return path.

Add a custom getname that formats the address properly.

Several bugs are addressed:
- SCTP_I_WANT_MAPPED_V4_ADDR=0 often returned garbage for
addresses to user space
- The addr_len returned from recvmsg was not correct when
returning AF_INET on a v6 socket
- flowlabel and scope_id were not zerod when promoting
a v4 to v6
- Some syscalls like bind and connect behaved differently
depending on v4mapped

Tested bind, getpeername, getsockname, connect, and recvmsg for proper
behaviour in v4mapped = 1 and 0 cases.

Signed-off-by: Neil Horman
Tested-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
Signed-off-by: David S. Miller

Jason Gunthorpe
2014-08-01 12:49:06 +0800

03 Jul, 2014

1 commit

8f61059a9 net: sctp: improve timer slack calculation for transport HBs ... Browse Code »

RFC4960, section 8.3 says:

On an idle destination address that is allowed to heartbeat,
it is recommended that a HEARTBEAT chunk is sent once per RTO
of that destination address plus the protocol parameter
'HB.interval', with jittering of +/- 50% of the RTO value,
and exponential backoff of the RTO if the previous HEARTBEAT
is unanswered.

Currently, we calculate jitter via sctp_jitter() function first,
and then add its result to the current RTO for the new timeout:

TMO = RTO + (RAND() % RTO) - (RTO / 2)
`------------------------^-=> sctp_jitter()

Instead, we can just simplify all this by directly calculating:

TMO = (RTO / 2) + (RAND() % RTO)

With the help of prandom_u32_max(), we don't need to open code
our own global PRNG, but can instead just make use of the per
CPU implementation of prandom with better quality numbers. Also,
we can now spare us the conditional for divide by zero check
since no div or mod operation needs to be used. Note that
prandom_u32_max() won't emit the same result as a mod operation,
but we really don't care here as we only want to have a random
number scaled into RTO interval.

Note, exponential RTO backoff is handeled elsewhere, namely in
sctp_do_8_2_transport_strike().

Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller

Daniel Borkmann
2014-07-03 09:44:07 +0800

12 Jun, 2014

1 commit

e575235fc net: sctp: migrate most recently used transport to ktime ... Browse Code »

Be more precise in transport path selection and use ktime
helpers instead of jiffies to compare and pick the better
primary and secondary recently used transports. This also
avoids any side-effects during a possible roll-over, and
could lead to better path decision-making.

Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller

Daniel Borkmann
2014-06-12 03:23:17 +0800

14 Feb, 2014

1 commit

2045ceaed net: remove unnecessary return's ... Browse Code »

One of my pet coding style peeves is the practice of
adding extra return; at the end of function.
Kill several instances of this in network code.

I suppose some coccinelle wizardy could do this automatically.

Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller

stephen hemminger
2014-02-14 07:33:38 +0800

10 Dec, 2013

1 commit

34f9f4371 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Merge 'net' into 'net-next' to get the AF_PACKET bug fix that
Daniel's direct transmit changes depend upon.

Signed-off-by: David S. Miller

David S. Miller
2013-12-10 09:20:14 +0800

07 Dec, 2013

1 commit

4b2f13a25 sctp: Fix FSF address in file headers ... Browse Code »

Several files refer to an old address for the Free Software Foundation
in the file header comment. Resolve by replacing the address with
the URL so that we do not have to keep
updating the header comments anytime the address changes.

CC: Vlad Yasevich
CC: Neil Horman
Signed-off-by: Jeff Kirsher
Signed-off-by: David S. Miller

Jeff Kirsher
2013-12-07 01:37:56 +0800

06 Dec, 2013

1 commit

78ac814f1 sctp: disable max_burst when the max_burst is 0 ... Browse Code »

As Michael pointed out that when max_burst is 0, it just disable
max_burst. It declared in rfc6458#section-8.1.24. so add the check
in sctp_transport_burst_limited, when it 0, just do nothing.

Reviewed-by: Daniel Borkmann
Suggested-by: Vlad Yasevich
Suggested-by: Michael Tuexen
Signed-off-by: Wang Weidong
Acked-by: Neil Horman
Acked-by: Vlad Yasevich
Signed-off-by: David S. Miller

wangweidong
2013-12-06 09:55:54 +0800