29 Apr, 2016
7 commits
-
This is never called with a NULL "buf" and anyway, we dereference 's' on
the lines before so it would Oops before we reach the check.
Signed-off-by: Dan Carpenter
Acked-by: Ying Xue
Signed-off-by: David S. Miller -
There's no need to calculate the rps hash if rps is not enabled. So this
patch exports rps_needed and checks it before trying to get the rps
hash. Tests (using pktgen to inject packets into the guest) show this can
improve pps by about 13% (when rps is disabled).
Before:
~1150000 pps
After:
~1300000 pps
Cc: Michael S. Tsirkin
Signed-off-by: Jason Wang
----
Changes from V1:
- Fix build when CONFIG_RPS is not set
Signed-off-by: David S. Miller -
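A minimal userspace sketch of the RPS gating idea above, assuming a plain `bool` in place of the kernel's `rps_needed` static key. The names `rps_enabled_hyp`, `compute_flow_hash_hyp()` and `rx_hash_if_needed_hyp()`, and the toy hash itself, are hypothetical and are not the driver's code; the only point shown is that the hash is never computed while RPS is off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's rps_needed static key. */
bool rps_enabled_hyp = false;
/* Counts how often the (expensive) flow hash was actually computed. */
unsigned int hash_computations_hyp = 0;

/* Toy flow hash (hypothetical); 0 is reserved to mean "no hash". */
uint32_t compute_flow_hash_hyp(uint32_t saddr, uint32_t daddr,
                               uint16_t sport, uint16_t dport)
{
    hash_computations_hyp++;
    uint32_t h = saddr ^ (daddr * 2654435761u);
    h ^= ((uint32_t)sport << 16) | dport;
    return h ? h : 1;
}

/* Skip the hash entirely unless RPS has been configured. */
uint32_t rx_hash_if_needed_hyp(uint32_t saddr, uint32_t daddr,
                               uint16_t sport, uint16_t dport)
{
    if (!rps_enabled_hyp)
        return 0;
    return compute_flow_hash_hyp(saddr, daddr, sport, dport);
}
```

In the kernel the check is against a static key, whose disabled branch is patched to be nearly free; the sketch does not try to model that.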
When fragmenting a skb, the next_skb should carry
the eor from prev_skb. The eor of prev_skb should
also be reset.
Packetdrill script for testing:
~~~~~~
+0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
+0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
0.100 < S 0:0(0) win 32792
0.100 > S. 0:0(0) ack 1
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
0.200 sendto(4, ..., 15330, MSG_EOR, ..., ...) = 15330
0.200 sendto(4, ..., 730, 0, ..., ...) = 730
0.200 > . 1:7301(7300) ack 1
0.200 > . 7301:14601(7300) ack 1
0.300 < . 1:1(0) ack 14601 win 257
0.300 > P. 14601:15331(730) ack 1
0.300 > P. 15331:16061(730) ack 1
0.400 < . 1:1(0) ack 16061 win 257
0.400 close(4) = 0
0.400 > F. 16061:16061(0) ack 1
0.400 < F. 1:1(0) ack 16062 win 257
0.400 > . 16062:16062(0) ack 2
Signed-off-by: Martin KaFai Lau
Cc: Eric Dumazet
Cc: Neal Cardwell
Cc: Soheil Hassas Yeganeh
Cc: Willem de Bruijn
Cc: Yuchung Cheng
Acked-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
Signed-off-by: David S. Miller -
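The fragmentation rule above can be pictured with a toy segment type. `struct seg_hyp` and `fragment_seg_hyp()` are hypothetical simplifications of the TCP_SKB_CB() fields involved (sequence range and eor), not the kernel's fragmenting code. Because the record boundary sits at the last byte of the original skb, it has to travel with the tail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for the TCP_SKB_CB() fields. */
struct seg_hyp {
    uint32_t seq;      /* first sequence number */
    uint32_t end_seq;  /* one past the last sequence number */
    bool eor;          /* end of record: nothing may be appended */
};

/* Split prev after len bytes; the tail goes into next. The record
 * boundary was at prev's last byte, which now belongs to next. */
void fragment_seg_hyp(struct seg_hyp *prev, struct seg_hyp *next, uint32_t len)
{
    assert(len > 0 && len < prev->end_seq - prev->seq);
    next->seq = prev->seq + len;
    next->end_seq = prev->end_seq;
    prev->end_seq = next->seq;

    next->eor = prev->eor;  /* next carries the eor from prev */
    prev->eor = false;      /* prev no longer ends the record */
}
```

Splitting the 15330-byte MSG_EOR record from the script at 7300 bytes leaves the eor only on the second piece.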
This patch:
1. Prevent next_skb from coalescing to the prev_skb if
TCP_SKB_CB(prev_skb)->eor is set
2. Update the TCP_SKB_CB(prev_skb)->eor if coalescing is
allowed
Packetdrill script for testing:
~~~~~~
+0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
+0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
0.100 < S 0:0(0) win 32792
0.100 > S. 0:0(0) ack 1
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
0.200 sendto(4, ..., 730, MSG_EOR, ..., ...) = 730
0.200 sendto(4, ..., 730, MSG_EOR, ..., ...) = 730
0.200 write(4, ..., 11680) = 11680
0.200 > P. 1:731(730) ack 1
0.200 > P. 731:1461(730) ack 1
0.200 > . 1461:8761(7300) ack 1
0.200 > P. 8761:13141(4380) ack 1
0.300 < . 1:1(0) ack 1 win 257
0.300 > P. 1:731(730) ack 1
0.300 > P. 731:1461(730) ack 1
0.400 < . 1:1(0) ack 13141 win 257
0.400 close(4) = 0
0.400 > F. 13141:13141(0) ack 1
0.500 < F. 1:1(0) ack 13142 win 257
0.500 > . 13142:13142(0) ack 2
Signed-off-by: Martin KaFai Lau
Cc: Eric Dumazet
Cc: Neal Cardwell
Cc: Soheil Hassas Yeganeh
Cc: Willem de Bruijn
Cc: Yuchung Cheng
Acked-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
Signed-off-by: David S. Miller -
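A toy version of the two coalescing rules listed in the commit above: refuse to merge into a prev_skb that has eor set, and otherwise let prev_skb inherit next_skb's eor. The type and function names are hypothetical and only model the two rules, not the kernel's collapse code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the TCP_SKB_CB() fields. */
struct seg_hyp {
    uint32_t seq;
    uint32_t end_seq;
    bool eor;
};

/* Rule 1: never append to a segment that already ends a record. */
bool can_collapse_hyp(const struct seg_hyp *prev)
{
    return !prev->eor;
}

/* Append next onto prev if allowed. Rule 2: the merged segment ends
 * where next ended, so it inherits next's eor. */
bool collapse_hyp(struct seg_hyp *prev, const struct seg_hyp *next)
{
    if (!can_collapse_hyp(prev) || prev->end_seq != next->seq)
        return false;
    prev->end_seq = next->end_seq;
    prev->eor = next->eor;
    return true;
}
```

This matches the script's retransmission at 0.300, where the two 730-byte MSG_EOR records go out as separate segments instead of one merged 1460-byte segment.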
This patch adds an eor bit to the TCP_SKB_CB. When MSG_EOR
is passed to tcp_sendmsg, the eor bit will be set on the skb
containing the last byte of the userland's msg. The eor bit
will prevent data from being appended to that skb in the future.
The change in do_tcp_sendpages is to honor the eor set
during the previous tcp_sendmsg(MSG_EOR) call.
This patch handles the tcp_sendmsg case. The follow-up patches
will handle other skb coalescing and fragment cases.
One potential use case is to use MSG_EOR with
SOF_TIMESTAMPING_TX_ACK to get more accurate
TCP ack timestamping on application protocols with
multiple outgoing response messages (e.g. HTTP2).
Packetdrill script for testing:
~~~~~~
+0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
+0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
0.100 < S 0:0(0) win 32792
0.100 > S. 0:0(0) ack 1
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
0.200 write(4, ..., 14600) = 14600
0.200 sendto(4, ..., 730, MSG_EOR, ..., ...) = 730
0.200 sendto(4, ..., 730, MSG_EOR, ..., ...) = 730
0.200 > . 1:7301(7300) ack 1
0.200 > P. 7301:14601(7300) ack 1
0.300 < . 1:1(0) ack 14601 win 257
0.300 > P. 14601:15331(730) ack 1
0.300 > P. 15331:16061(730) ack 1
0.400 < . 1:1(0) ack 16061 win 257
0.400 close(4) = 0
0.400 > F. 16061:16061(0) ack 1
0.400 < F. 1:1(0) ack 16062 win 257
0.400 > . 16062:16062(0) ack 2
Signed-off-by: Martin KaFai Lau
Cc: Eric Dumazet
Cc: Neal Cardwell
Cc: Soheil Hassas Yeganeh
Cc: Willem de Bruijn
Cc: Yuchung Cheng
Suggested-by: Eric Dumazet
Acked-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
Signed-off-by: David S. Miller -
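A sketch of how the eor bit changes where newly written bytes land, assuming a toy send queue of fixed-capacity segments. `struct sendq_hyp`, `SEG_CAP_HYP` and `sendmsg_hyp()` are hypothetical, and real skbs are not sized this way; the test replays the writes from the packetdrill script above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SEGS_HYP 16
#define SEG_CAP_HYP 7300  /* hypothetical per-segment byte limit */

struct seg_hyp { size_t len; bool eor; };
struct sendq_hyp { struct seg_hyp segs[MAX_SEGS_HYP]; size_t nsegs; };

/* Queue bytes, appending to the tail segment only if it has room and
 * does not end a record. With msg_eor, mark the segment holding the
 * last byte. Returns false if the toy queue runs out of segments. */
bool sendmsg_hyp(struct sendq_hyp *q, size_t bytes, bool msg_eor)
{
    while (bytes > 0) {
        struct seg_hyp *tail = q->nsegs ? &q->segs[q->nsegs - 1] : NULL;
        if (!tail || tail->eor || tail->len == SEG_CAP_HYP) {
            if (q->nsegs == MAX_SEGS_HYP)
                return false;
            tail = &q->segs[q->nsegs++];
            tail->len = 0;
            tail->eor = false;
        }
        size_t room = SEG_CAP_HYP - tail->len;
        size_t chunk = bytes < room ? bytes : room;
        tail->len += chunk;
        bytes -= chunk;
    }
    if (msg_eor && q->nsegs > 0)
        q->segs[q->nsegs - 1].eor = true;
    return true;
}
```

Without the eor check, the second 730-byte sendto would be appended to the first one's segment; with it, each record keeps its own segment, which is what lets an ACK timestamp be attributed to one message.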
The SKBTX_ACK_TSTAMP flag is set in skb_shinfo->tx_flags when
the timestamp of the TCP acknowledgement should be reported on
the error queue. Since accessing skb_shinfo is likely to incur a
cache-line miss at the time of receiving the ack, the
txstamp_ack bit was added in tcp_skb_cb, which is set iff
the SKBTX_ACK_TSTAMP flag is set for an skb. This makes the
SKBTX_ACK_TSTAMP flag redundant.
Remove SKBTX_ACK_TSTAMP and instead use the txstamp_ack bit
everywhere.
Note that this frees one bit in shinfo->tx_flags.
Signed-off-by: Soheil Hassas Yeganeh
Acked-by: Martin KaFai Lau
Suggested-by: Willem de Bruijn
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller -
Remove the redundant check for sk->sk_tsflags in tcp_tx_timestamp.
tcp_tx_timestamp() receives the tsflags as a parameter. As a
result the "sk->sk_tsflags || tsflags" is redundant, since
tsflags already includes sk->sk_tsflags plus overrides from
control messages.
Signed-off-by: Soheil Hassas Yeganeh
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
28 Apr, 2016
18 commits
-
There is nothing related to BH in SNMP counters anymore,
since linux-3.0.
Rename helpers to use a __ prefix instead of a _BH suffix,
for contexts where preemption is disabled.
This more closely matches the convention used to update
percpu variables.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
IPv6 ICMP stats are atomics anyway.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
Rename IP6_UPD_PO_STATS_BH() to __IP6_UPD_PO_STATS()
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
Rename IP6_INC_STATS_BH() to __IP6_INC_STATS()
and IP6_ADD_STATS_BH() to __IP6_ADD_STATS().
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
Rename NET_INC_STATS_BH() to __NET_INC_STATS()
and NET_ADD_STATS_BH() to __NET_ADD_STATS().
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
Rename IP_UPD_PO_STATS_BH() to __IP_UPD_PO_STATS()
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
Rename IP_ADD_STATS_BH() to __IP_ADD_STATS()
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
Rename ICMP6_INC_STATS_BH() to __ICMP6_INC_STATS()
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
Rename IP_INC_STATS_BH() to __IP_INC_STATS(), to
better express that this is used in non-preemptible context.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
Rename SCTP_INC_STATS_BH() to __SCTP_INC_STATS()
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
Remove misleading _BH suffix.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
Rename TCP_INC_STATS_BH() to __TCP_INC_STATS()
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
Rename UDP_INC_STATS_BH() to __UDP_INC_STATS(),
and UDP6_INC_STATS_BH() to __UDP6_INC_STATS().
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
Rename ICMP_INC_STATS_BH() to __ICMP_INC_STATS()
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
Rename DCCP_INC_STATS_BH() to __DCCP_INC_STATS()
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
In the old days (before linux-3.0), SNMP counters were duplicated,
one for user context, and one for BH context.
After commit 8f0ea0fe3a03 ("snmp: reduce percpu needs by 50%")
we have a single copy, and what really matters is whether preemption is
enabled or disabled, since we use this_cpu_inc() or __this_cpu_inc()
respectively.
We therefore kill SNMP_INC_STATS_USER(), SNMP_ADD_STATS_USER(),
NET_INC_STATS_USER(), NET_ADD_STATS_USER(), SCTP_INC_STATS_USER(),
SNMP_INC_STATS64_USER(), SNMP_ADD_STATS64_USER(), TCP_ADD_STATS_USER(),
UDP_INC_STATS_USER(), UDP6_INC_STATS_USER(), and XFRM_INC_STATS_USER().
Following patches will rename _BH helpers to make clear their
usage is not tied to BH being disabled.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
Minor overlapping changes in the conflicts.
In the macsec case, the change of the default ID macro
name overlapped with the 64-bit netlink attribute alignment
fixes in net-next.
Signed-off-by: David S. Miller
-
Similar to 3bfd847203c6 ("net: Use passed in table for nexthop lookups")
for IPv4, if the route spec contains a table id, use that to look up the
next hop first and fall back to a full lookup if it fails (per the fix
4c9bcd117918b ("net: Fix nexthop lookups")).
Example:
root@kenny:~# ip -6 ro ls table red
local 2100:1::1 dev lo proto none metric 0 pref medium
2100:1::/120 dev eth1 proto kernel metric 256 pref medium
local 2100:2::1 dev lo proto none metric 0 pref medium
2100:2::/120 dev eth2 proto kernel metric 256 pref medium
local fe80::e0:f9ff:fe09:3cac dev lo proto none metric 0 pref medium
local fe80::e0:f9ff:fe1c:b974 dev lo proto none metric 0 pref medium
fe80::/64 dev eth1 proto kernel metric 256 pref medium
fe80::/64 dev eth2 proto kernel metric 256 pref medium
ff00::/8 dev red metric 256 pref medium
ff00::/8 dev eth1 metric 256 pref medium
ff00::/8 dev eth2 metric 256 pref medium
unreachable default dev lo metric 240 error -113 pref medium
root@kenny:~# ip -6 ro add table red 2100:3::/64 via 2100:1::64
RTNETLINK answers: No route to host
Route add fails even though 2100:1::64 is a reachable next hop:
root@kenny:~# ping6 -I red 2100:1::64
ping6: Warning: source address might be selected on device other than red.
PING 2100:1::64(2100:1::64) from 2100:1::1 red: 56 data bytes
64 bytes from 2100:1::64: icmp_seq=1 ttl=64 time=1.33 ms
With this patch:
root@kenny:~# ip -6 ro add table red 2100:3::/64 via 2100:1::64
root@kenny:~# ip -6 ro ls table red
local 2100:1::1 dev lo proto none metric 0 pref medium
2100:1::/120 dev eth1 proto kernel metric 256 pref medium
local 2100:2::1 dev lo proto none metric 0 pref medium
2100:2::/120 dev eth2 proto kernel metric 256 pref medium
2100:3::/64 via 2100:1::64 dev eth1 metric 1024 pref medium
local fe80::e0:f9ff:fe09:3cac dev lo proto none metric 0 pref medium
local fe80::e0:f9ff:fe1c:b974 dev lo proto none metric 0 pref medium
fe80::/64 dev eth1 proto kernel metric 256 pref medium
fe80::/64 dev eth2 proto kernel metric 256 pref medium
ff00::/8 dev red metric 256 pref medium
ff00::/8 dev eth1 metric 256 pref medium
ff00::/8 dev eth2 metric 256 pref medium
unreachable default dev lo metric 240 error -113 pref medium
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
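A simplified model of the lookup order described above: try the table named in the route spec, then fall back. Addresses are shortened to 32 bits for brevity, and `struct route_hyp`, `table_lookup_hyp()` and `nexthop_ifindex_hyp()` are hypothetical; the fallback is modeled as a second table rather than the kernel's real full lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical route entry; addresses shortened to 32 bits. */
struct route_hyp { uint32_t prefix; unsigned int plen; int ifindex; };
struct table_hyp { const struct route_hyp *routes; size_t n; };

int prefix_match_hyp(uint32_t addr, uint32_t prefix, unsigned int plen)
{
    assert(plen <= 32);
    uint32_t mask = plen ? ~(uint32_t)0 << (32 - plen) : 0;
    return (addr & mask) == (prefix & mask);
}

/* Longest-prefix match within one table; returns -1 if nothing matches. */
int table_lookup_hyp(const struct table_hyp *t, uint32_t addr)
{
    int best_ifindex = -1;
    int best_plen = -1;
    for (size_t i = 0; i < t->n; i++) {
        const struct route_hyp *r = &t->routes[i];
        if ((int)r->plen > best_plen && prefix_match_hyp(addr, r->prefix, r->plen)) {
            best_plen = (int)r->plen;
            best_ifindex = r->ifindex;
        }
    }
    return best_ifindex;
}

/* Try the table from the route spec first, then fall back. */
int nexthop_ifindex_hyp(const struct table_hyp *spec_table,
                        const struct table_hyp *fallback, uint32_t gw)
{
    int ifindex = spec_table ? table_lookup_hyp(spec_table, gw) : -1;
    if (ifindex < 0)
        ifindex = table_lookup_hyp(fallback, gw);
    return ifindex;
}
```

In the spirit of the `ip -6 ro add table red ... via 2100:1::64` example, a gateway that is only connected in the VRF's table resolves when that table is consulted first, and fails when only the fallback is searched.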
27 Apr, 2016
8 commits
-
No more users in the tree, remove NETDEV_TX_LOCKED support.
This adds another hole in the softnet_stats struct, but that is better than keeping
the unused collision counter around.
Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller -
For an sctp assoc, when rcvbuf_policy is set, it has its own
rmem_alloc. When we dump asoc info in sctp_diag, we should use that
value for RMEM_ALLOC as well, just like WMEM_ALLOC.
Signed-off-by: Xin Long
Signed-off-by: David S. Miller -
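A sketch of the value selection described in the sctp_diag commit above, with hypothetical minimal structs (`struct sock_hyp`, `struct assoc_hyp`) standing in for the socket and the association. Where the kernel actually stores rcvbuf_policy is not modeled; here it is simply a flag on the association.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal views of a socket and an association. */
struct sock_hyp { int rmem_alloc; };
struct assoc_hyp {
    const struct sock_hyp *sk;
    bool rcvbuf_policy;  /* per-association receive buffer accounting */
    int rmem_alloc;      /* only meaningful when rcvbuf_policy is set */
};

/* Value to report as RMEM_ALLOC for this association. */
int diag_rmem_alloc_hyp(const struct assoc_hyp *asoc)
{
    assert(asoc->sk != NULL);
    return asoc->rcvbuf_policy ? asoc->rmem_alloc : asoc->sk->rmem_alloc;
}
```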
…etooth/bluetooth-next
Johan Hedberg says:
====================
pull request: bluetooth-next 2016-04-26
Here's another set of Bluetooth & 802.15.4 patches for the 4.7 kernel:
- Cleanups & refactoring of ieee802154 & 6lowpan code
- Security related additions to ieee802154 and mrf24j40 driver
- Memory corruption fix to Bluetooth 6lowpan code
- Race condition fix in vhci driver
- Enhancements to the atusb 802.15.4 driver
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller -
Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller -
Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller -
I also fix commit 8b32ab9e6ef1: use nla_total_size_64bit() for
OVS_FLOW_ATTR_USED in ovs_flow_cmd_msg_size().
Fixes: 8b32ab9e6ef1 ("ovs: use nla_put_u64_64bit()")
Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller -
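For reference, a sketch of the size arithmetic behind the nla_total_size_64bit() fix above, as we understand it: a netlink attribute is a 4-byte header plus payload, padded to 4 bytes, and the 64-bit variant also budgets one zero-length pad attribute on architectures without efficient unaligned access, so the u64 payload can be 8-byte aligned. The `_hyp` helpers are stand-ins for nla_total_size() and nla_total_size_64bit(); check the kernel's netlink headers for the real definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Netlink attributes: 4-byte header (struct nlattr), 4-byte alignment.
 * Local copies of the constants so the sketch stands alone. */
enum { ATTR_ALIGNTO_HYP = 4, ATTR_HDRLEN_HYP = 4 };

int attr_align_hyp(int len)
{
    return (len + ATTR_ALIGNTO_HYP - 1) & ~(ATTR_ALIGNTO_HYP - 1);
}

/* Header plus payload, padded (mirrors nla_total_size()). */
int attr_total_size_hyp(int payload)
{
    assert(payload >= 0);
    return attr_align_hyp(ATTR_HDRLEN_HYP + payload);
}

/* A u64 payload may need a zero-length pad attribute in front of it
 * to reach 8-byte alignment; budget for it unless the architecture
 * handles unaligned 64-bit access efficiently. */
int attr_total_size_64bit_hyp(int payload, bool efficient_unaligned)
{
    int size = attr_total_size_hyp(payload);
    if (!efficient_unaligned)
        size += attr_total_size_hyp(0);
    return size;
}
```

For an 8-byte attribute such as OVS_FLOW_ATTR_USED this reserves 16 bytes instead of 12 when padding may be emitted, which is why sizing it with the plain helper can under-estimate the message size.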
I also fix the value of INET_DIAG_MAX. It has been wrong since commit 8f840e47f190,
which is only in net-next right now, so I didn't make a separate patch.
Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file")
Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller
26 Apr, 2016
7 commits
-
It was a simple idea -- save IPv6 configured addresses on a link down
so that IPv6 behaves similarly to IPv4. As always the devil is in the
details, and the IPv6 stack has too many behavioral differences from IPv4,
making the simple idea more complicated than it needs to be.
The current implementation for keeping IPv6 addresses can panic or spit
out a warning in one of many paths:
1. IPv6 route gets an IPv4 route as its 'next', which causes a panic in
rt6_fill_node while handling a route dump request.
2. rt->dst.obsolete is set to DST_OBSOLETE_DEAD, hitting the WARN_ON in
fib6_del.
3. Panic in fib6_purge_rt because the rt6i_ref count is not 1.
The root cause of all of these is references related to the host route for
an address that is retained.
So, this patch deletes the host route every time the ifdown loop runs.
Since the host route is deleted and will be regenerated on up, there is
no longer a need for the l3mdev fix-up. On the 'admin up' side, move
addrconf_permanent_addr into the NETDEV_UP event handling so that it
runs only once rather than on both UP and CHANGE events.
All of the current panics and warnings appear to be related to
addresses on the loopback device, but given the catastrophic nature when
a bug is triggered, this patch takes the conservative approach and evicts
all host routes rather than trying to determine when they can be reused
and when they cannot. That can be a later optimization if desired.
Signed-off-by: David Ahern
Signed-off-by: David S. Miller -
This reverts commit 841645b5f2dfceac69b78fcd0c9050868d41ea61.
Ok, this puts the feature back. I've decided to apply David A.'s
bug fix and run with that rather than make everyone wait another
whole release for this feature.
Signed-off-by: David S. Miller
-
Support checksum neutral ILA as described in the ILA draft. The low
order 16 bits of the identifier are used to contain the checksum
adjustment value.
The csum-mode parameter is added to describe checksum processing. There
are three values:
- adjust transport checksum (previous behavior)
- do checksum neutral mapping
- do nothing
On output, the csum-mode in the ila_params is checked and acted on. If
the mode is checksum neutral mapping, then do the mapping and set the C-bit.
On input, the C-bit is checked. If it is set, checksum-neutral mapping is
done (regardless of csum-mode in ila params) and the C-bit will be cleared.
If it is not set, then the action in csum-mode is taken.
Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller -
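A sketch of the arithmetic behind checksum-neutral mapping as described above: after the locator is rewritten, the difference is folded into the identifier's low-order 16 bits so the one's complement sum of the address stays the same. Because the transport checksum covers the addresses through the pseudo-header, an unchanged sum leaves it valid. The helper names are hypothetical, the address is treated as eight 16-bit values (byte order not modeled), and the C-bit and type field are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit one's complement addition with end-around carry. */
uint16_t ocadd_hyp(uint16_t a, uint16_t b)
{
    uint32_t s = (uint32_t)a + b;
    return (uint16_t)((s & 0xffff) + (s >> 16));
}

/* One's complement sum of the eight 16-bit words of an IPv6 address. */
uint16_t addr_sum_hyp(const uint16_t w[8])
{
    uint16_t s = 0;
    for (int i = 0; i < 8; i++)
        s = ocadd_hyp(s, w[i]);
    return s;
}

/* Replace the locator (words 0..3) with new_loc and fold the difference
 * into the low-order identifier word (word 7), so that addr_sum_hyp()
 * is unchanged modulo 0xffff (0x0000 and 0xffff both mean zero). */
void csum_neutral_map_hyp(uint16_t w[8], const uint16_t new_loc[4])
{
    uint16_t adj = w[7];
    for (int i = 0; i < 4; i++) {
        adj = ocadd_hyp(adj, w[i]);                   /* + old locator word */
        adj = ocadd_hyp(adj, (uint16_t)~new_loc[i]);  /* - new locator word */
        w[i] = new_loc[i];
    }
    w[7] = adj;
}
```

Subtracting in one's complement is adding the bitwise complement, so the loop computes word7 + old_locator - new_locator; this is the same incremental-update idea used for adjusting IP checksums in place.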
Change model of xlat to be used only for input where lookup is done on
the locator part of an address (comparing to locator_match as key
in rhashtable). This is needed for checksum neutral translation
which obfuscates the low order 16 bits of the identifier. It also
permits hosts to be in multiple ILA domains (each locator can map
to a different SIR address). A check is also added to disallow
translating non-ILA addresses (check of type in identifier).
Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller -
Add structures for identifiers, locators, and an ila address which
is composed of a locator and identifier, and to which an in6_addr can be
cast. This includes a three-bit type field and enums for the types defined
in the ILA I-D.
In ILA lwt, don't allow the user to set a translation for a non-ILA
address (type of identifier is zero, meaning it is an IID). This also
requires that the destination prefix is at least 65 bytes (64
bit locator and first byte of identifier).
Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller -
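A sketch of the address layout and the non-ILA check described in the commit above, assuming the three-bit type occupies the top bits of the identifier's first byte. `struct ila_addr_hyp` and the helpers are hypothetical; byte arrays are used so the layout matches the 16 bytes of an in6_addr without depending on bitfield ordering.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical byte-level layout: 64-bit locator, then 64-bit identifier.
 * Byte arrays keep the layout identical to an in6_addr's 16 bytes. */
struct ila_addr_hyp {
    uint8_t locator[8];
    uint8_t ident[8];
};

/* Type 0 means a plain interface identifier (IID), i.e. not an ILA address. */
enum { ILA_TYPE_IID_HYP = 0 };

/* Assumed: the type field is the top three bits of the identifier's first byte. */
unsigned int ila_ident_type_hyp(const struct ila_addr_hyp *a)
{
    return a->ident[0] >> 5;
}

/* Only addresses with a non-IID identifier type may be translated. */
int ila_translatable_hyp(const struct ila_addr_hyp *a)
{
    return ila_ident_type_hyp(a) != ILA_TYPE_IID_HYP;
}
```

This is also why the destination prefix must cover more than the 64-bit locator: the type bits have to be part of the matched prefix for the check to be meaningful.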
The memcpy of the ipv6 header destination address to the skb control block
(skb->cb) in header_create() results in corrupted memory when bt_xmit()
is issued. The skb->cb is "released" on return from header_create(),
making room for the lower layer to manipulate the skb->cb.
The value retrieved in bt_xmit is not persistent across header creation
and sending, and the lower layer will overwrite portions of skb->cb,
making the copied destination address wrong.
The memory corruption will lead to non-working multicast, as the first 4
bytes of the copied destination address are replaced by a value that
resolves into a non-multicast prefix.
This fix removes the dependency on the skb control block between header
creation and send by moving the destination address memcpy to the send
function path (setup_create, which is called from bt_xmit).
Signed-off-by: Glenn Ruben Bakke
Acked-by: Jukka Rissanen
Signed-off-by: Marcel Holtmann
Cc: stable@vger.kernel.org # 4.5+ -
rds-stress experiments with request size 256 bytes, 8K acks,
using 16 threads show a 40% improvement when pskb_extract()
replaces the {skb_clone(..); pskb_pull(..); pskb_trim(..);}
pattern in the Rx path, so we leverage the perf gain with
this commit.
Signed-off-by: Sowmini Varadhan
Signed-off-by: David S. Miller