18 Oct, 2018
1 commit
-
[ Upstream commit 7e823644b60555f70f241274b8d0120dd919269a ]
Commit 2276f58ac589 ("udp: use a separate rx queue for packet reception")
turned static inline __skb_recv_udp() from being a trivial helper around
__skb_recv_datagram() into a UDP specific implementaion, making it
EXPORT_SYMBOL_GPL() at the same time.There are external modules that got broken by __skb_recv_udp() not being
visible to them. Let's unbreak them by making __skb_recv_udp EXPORT_SYMBOL().Rationale (one of those) why this is actually "technically correct" thing
to do: __skb_recv_udp() used to be an inline wrapper around
__skb_recv_datagram(), which itself (still, and correctly so, I believe)
is EXPORT_SYMBOL().Cc: Paolo Abeni
Cc: Eric Dumazet
Fixes: 2276f58ac589 ("udp: use a separate rx queue for packet reception")
Signed-off-by: Jiri Kosina
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
29 Sep, 2018
1 commit
-
[ Upstream commit 2b5a921740a55c00223a797d075b9c77c42cb171 ]
commit 2abb7cdc0dc8 ("udp: Add support for doing checksum
unnecessary conversion") left out the early demux path for
connected sockets. As a result IP_CMSG_CHECKSUM gives wrong
values for such socket when GRO is not enabled/available.This change addresses the issue by moving the csum conversion to a
common helper and using such helper in both the default and the
early demux rx path.Fixes: 2abb7cdc0dc8 ("udp: Add support for doing checksum unnecessary conversion")
Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
26 Jun, 2018
1 commit
-
[ Upstream commit 6c206b20092a3623184cff9470dba75d21507874 ]
After commit 6b229cf77d68 ("udp: add batching to udp_rmem_release()")
the sk_rmem_alloc field does not measure exactly anymore the
receive queue length, because we batch the rmem release. The issue
is really apparent only after commit 0d4a6608f68c ("udp: do rmem bulk
free even if the rx sk queue is empty"): the user space can easily
check for an empty socket with not-0 queue length reported by the 'ss'
tool or the procfs interface.We need to use a custom UDP helper to report the correct queue length,
taking into account the forward allocation deficit.Reported-by: trevor.francis@46labs.com
Fixes: 6b229cf77d68 ("UDP: add batching to udp_rmem_release()")
Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
19 May, 2018
2 commits
-
[ Upstream commit 69678bcd4d2dedbc3e8fcd6d7d99f283d83c531a ]
Damir reported a breakage of SO_BINDTODEVICE for UDP sockets.
In absence of VRF devices, after commit fb74c27735f0 ("net:
ipv4: add second dif to udp socket lookups") the dif mismatch
isn't fatal anymore for UDP socket lookup with non null
sk_bound_dev_if, breaking SO_BINDTODEVICE semantics.This changeset addresses the issue making the dif match mandatory
again in the above scenario.Reported-by: Damir Mansurov
Fixes: fb74c27735f0 ("net: ipv4: add second dif to udp socket lookups")
Fixes: 1801b570dd2a ("net: ipv6: add second dif to udp socket lookups")
Signed-off-by: Paolo Abeni
Acked-by: David Ahern
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 1b97013bfb11d66f041de691de6f0fec748ce016 ]
Fix more memory leaks in ip_cmsg_send() callers. Part of them were fixed
earlier in 919483096bfe.* udp_sendmsg one was there since the beginning when linux sources were
first added to git;
* ping_v4_sendmsg one was copy/pasted in c319b4d76b9e.Whenever return happens in udp_sendmsg() or ping_v4_sendmsg() IP options
have to be freed if they were allocated previously.Add label so that future callers (if any) can use it instead of kfree()
before return that is easy to forget.Fixes: c319b4d76b9e (net: ipv4: add IPPROTO_ICMP socket kind)
Signed-off-by: Andrey Ignatov
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
09 Mar, 2018
1 commit
-
[ Upstream commit 15f35d49c93f4fa9875235e7bf3e3783d2dd7a1b ]
Since UDP-Lite is always using checksum, the following path is
triggered when calculating pseudo header for it:udp4_csum_init() or udp6_csum_init()
skb_checksum_init_zero_check()
__skb_checksum_validate_complete()The problem can appear if skb->len is less than CHECKSUM_BREAK. In
this particular case __skb_checksum_validate_complete() also invokes
__skb_checksum_complete(skb). If UDP-Lite is using partial checksum
that covers only part of a packet, the function will return bad
checksum and the packet will be dropped.It can be fixed if we skip skb_checksum_init_zero_check() and only
set the required pseudo header checksum for UDP-Lite with partial
checksum before udp4_csum_init()/udp6_csum_init() functions return.Fixes: ed70fcfcee95 ("net: Call skb_checksum_init in IPv4")
Fixes: e4f45b7f40bd ("net: Call skb_checksum_init in IPv6")
Signed-off-by: Alexey Kodanev
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
22 Oct, 2017
1 commit
-
Syzkaller stumbled upon a way to trigger
WARNING: CPU: 1 PID: 13881 at net/core/sock_reuseport.c:41
reuseport_alloc+0x306/0x3b0 net/core/sock_reuseport.c:39There are two initialization paths for the sock_reuseport structure in a
socket: Through the udp/tcp bind paths of SO_REUSEPORT sockets or through
SO_ATTACH_REUSEPORT_[CE]BPF before bind. The existing implementation
assumedthat the socket lock protected both of these paths when it actually
only protects the SO_ATTACH_REUSEPORT path. Syzkaller triggered this
double allocation by running these paths concurrently.This patch moves the check for double allocation into the reuseport_alloc
function which is protected by a global spin lock.Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection")
Fixes: c125e80b8868 ("soreuseport: fast reuseport TCP socket selection")
Signed-off-by: Craig Gallek
Signed-off-by: David S. Miller
21 Oct, 2017
1 commit
-
In the UDP code there are two leftover error messages with very few meaning.
Replace them with a more descriptive error message as some users
reported them as "strange network error".Signed-off-by: Matteo Croce
Signed-off-by: David S. Miller
10 Oct, 2017
1 commit
-
The commit bc044e8db796 ("udp: perform source validation for
mcast early demux") does not take into account that broadcast packets
lands in the same code path and they need different checks for the
source address - notably, zero source address are valid for bcast
and invalid for mcast.As a result, 2nd and later broadcast packets with 0 source address
landing to the same socket are dropped. This breaks dhcp servers.Since we don't have stringent performance requirements for ingress
broadcast traffic, fix it by disabling UDP early demux such traffic.Reported-by: Hannes Frederic Sowa
Fixes: bc044e8db796 ("udp: perform source validation for mcast early demux")
Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller
01 Oct, 2017
2 commits
-
The UDP early demux can leverate the rx dst cache even for
multicast unconnected sockets.In such scenario the ipv4 source address is validated only on
the first packet in the given flow. After that, when we fetch
the dst entry from the socket rx cache, we stop enforcing
the rp_filter and we even start accepting any kind of martian
addresses.Disabling the dst cache for unconnected multicast socket will
cause large performace regression, nearly reducing by half the
max ingress tput.Instead we factor out a route helper to completely validate an
skb source address for multicast packets and we call it from
the UDP early demux for mcast packets landing on unconnected
sockets, after successful fetching the related cached dst entry.This still gives a measurable, but limited performance
regression:rp_filter = 0 rp_filter = 1
edmux disabled: 1182 Kpps 1127 Kpps
edmux before: 2238 Kpps 2238 Kpps
edmux after: 2037 Kpps 2019 KppsThe above figures are on top of current net tree.
Applying the net-next commit 6e617de84e87 ("net: avoid a full
fib lookup when rp_filter is disabled.") the delta with
rp_filter == 0 will decrease even more.Fixes: 421b3885bf6d ("udp: ipv4: Add udp early demux")
Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller -
Currently no error is emitted, but this infrastructure will
used by the next patch to allow source address validation
for mcast sockets.
Since early demux can do a route lookup and an ipv4 route
lookup can return an error code this is consistent with the
current ipv4 route infrastructure.Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller
08 Sep, 2017
1 commit
-
After commit 0ddf3fb2c43d ("udp: preserve skb->dst if required
for IP options processing") we clear the skb head state as soon
as the skb carrying them is first processed.Since the same skb can be processed several times when MSG_PEEK
is used, we can end up lacking the required head states, and
eventually oopsing.Fix this clearing the skb head state only when processing the
last skb reference.Reported-by: Eric Dumazet
Fixes: 0ddf3fb2c43d ("udp: preserve skb->dst if required for IP options processing")
Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller
07 Sep, 2017
1 commit
-
Pull networking updates from David Miller:
1) Support ipv6 checksum offload in sunvnet driver, from Shannon
Nelson.2) Move to RB-tree instead of custom AVL code in inetpeer, from Eric
Dumazet.3) Allow generic XDP to work on virtual devices, from John Fastabend.
4) Add bpf device maps and XDP_REDIRECT, which can be used to build
arbitrary switching frameworks using XDP. From John Fastabend.5) Remove UFO offloads from the tree, gave us little other than bugs.
6) Remove the IPSEC flow cache, from Florian Westphal.
7) Support ipv6 route offload in mlxsw driver.
8) Support VF representors in bnxt_en, from Sathya Perla.
9) Add support for forward error correction modes to ethtool, from
Vidya Sagar Ravipati.10) Add time filter for packet scheduler action dumping, from Jamal Hadi
Salim.11) Extend the zerocopy sendmsg() used by virtio and tap to regular
sockets via MSG_ZEROCOPY. From Willem de Bruijn.12) Significantly rework value tracking in the BPF verifier, from Edward
Cree.13) Add new jump instructions to eBPF, from Daniel Borkmann.
14) Rework rtnetlink plumbing so that operations can be run without
taking the RTNL semaphore. From Florian Westphal.15) Support XDP in tap driver, from Jason Wang.
16) Add 32-bit eBPF JIT for ARM, from Shubham Bansal.
17) Add Huawei hinic ethernet driver.
18) Allow to report MD5 keys in TCP inet_diag dumps, from Ivan
Delalande.* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1780 commits)
i40e: point wb_desc at the nvm_wb_desc during i40e_read_nvm_aq
i40e: avoid NVM acquire deadlock during NVM update
drivers: net: xgene: Remove return statement from void function
drivers: net: xgene: Configure tx/rx delay for ACPI
drivers: net: xgene: Read tx/rx delay for ACPI
rocker: fix kcalloc parameter order
rds: Fix non-atomic operation on shared flag variable
net: sched: don't use GFP_KERNEL under spin lock
vhost_net: correctly check tx avail during rx busy polling
net: mdio-mux: add mdio_mux parameter to mdio_mux_init()
rxrpc: Make service connection lookup always check for retry
net: stmmac: Delete dead code for MDIO registration
gianfar: Fix Tx flow control deactivation
cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6
cxgb4: Fix pause frame count in t4_get_port_stats
cxgb4: fix memory leak
tun: rename generic_xdp to skb_xdp
tun: reserve extra headroom only when XDP is set
net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping
net: dsa: bcm_sf2: Advertise number of egress queues
...
04 Sep, 2017
1 commit
-
Conflicts:
mm/page_alloc.cSigned-off-by: Ingo Molnar
02 Sep, 2017
2 commits
-
Three cases of simple overlapping changes.
Signed-off-by: David S. Miller
-
After commit dce4551cb2ad ("udp: preserve head state for IP_CMSG_PASSSEC")
we preserve the secpath for the whole skb lifecycle, but we also
end up leaking a reference to it.We must clear the head state on skb reception, if secpath is
present.Fixes: dce4551cb2ad ("udp: preserve head state for IP_CMSG_PASSSEC")
Signed-off-by: Yossi Kuperman
Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller
26 Aug, 2017
1 commit
-
Currently, in the udp6 code, the dst cookie is not initialized/updated
concurrently with the RX dst used by early demux.As a result, the dst_check() in the early_demux path always fails,
the rx dst cache is always invalidated, and we can't really
leverage significant gain from the demux lookup.Fix it adding udp6 specific variant of sk_rx_dst_set() and use it
to set the dst cookie when the dst entry is really changed.The issue is there since the introduction of early demux for ipv6.
Fixes: 5425077d73e0 ("net: ipv6: Add early demux handler for UDP unicast")
Acked-by: Hannes Frederic Sowa
Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller
25 Aug, 2017
1 commit
-
Signed-off-by: Ingo Molnar
23 Aug, 2017
1 commit
-
Remove two references to ufo in the udp send path that are no longer
reachable now that ufo has been removed.Commit 85f1bd9a7b5a ("udp: consistently apply ufo or fragmentation")
is a fix to ufo. It is safe to revert what remains of it.Also, no skb can enter ip_append_page with skb_is_gso true now that
skb_shinfo(skb)->gso_type is no longer set in ip_append_page/_data.Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller
22 Aug, 2017
1 commit
19 Aug, 2017
1 commit
-
Due to commit e6afc8ace6dd5cef5e812f26c72579da8806f5ac ("udp: remove
headers from UDP packets before queueing"), when udp packets are being
peeked the requested extra offset is always 0 as there is no need to skip
the udp header. However, when the offset is 0 and the next skb is
of length 0, it is only returned once. The behaviour can be seen with
the following python script:from socket import *;
f=socket(AF_INET6, SOCK_DGRAM | SOCK_NONBLOCK, 0);
g=socket(AF_INET6, SOCK_DGRAM | SOCK_NONBLOCK, 0);
f.bind(('::', 0));
addr=('::1', f.getsockname()[1]);
g.sendto(b'', addr)
g.sendto(b'b', addr)
print(f.recvfrom(10, MSG_PEEK));
print(f.recvfrom(10, MSG_PEEK));Where the expected output should be the empty string twice.
Instead, make sk_peek_offset return negative values, and pass those values
to __skb_try_recv_datagram/__skb_try_recv_from_queue. If the passed offset
to __skb_try_recv_from_queue is negative, the checked skb is never skipped.
__skb_try_recv_from_queue will then ensure the offset is reset back to 0
if a peek is requested without an offset, unless no packets are found.Also simplify the if condition in __skb_try_recv_from_queue. If _off is
greater then 0, and off is greater then or equal to skb->len, then
(_off || skb->len) must always be true assuming skb->len >= 0 is always
true.Also remove a redundant check around a call to sk_peek_offset in af_unix.c,
as it double checked if MSG_PEEK was set in the flags.V2:
- Moved the negative fixup into __skb_try_recv_from_queue, and remove now
redundant checks
- Fix peeking in udp{,v6}_recvmsg to report the right value when the
offset is 0V3:
- Marked new branch in __skb_try_recv_from_queue as unlikely.Signed-off-by: Matthew Dawson
Acked-by: Willem de Bruijn
Signed-off-by: David S. Miller
11 Aug, 2017
3 commits
-
Conflicts:
include/linux/mm_types.h
mm/huge_memory.cI removed the smp_mb__before_spinlock() like the following commit does:
8b1b436dd1cc ("mm, locking: Rework {set,clear,mm}_tlb_flush_pending()")
and fixed up the affected commits.
Signed-off-by: Ingo Molnar
-
Mainline had UFO fixes, but UFO is removed in net-next so we
take the HEAD hunks.Minor context conflict in bcmsysport statistics bug fix.
Signed-off-by: David S. Miller
-
When iteratively building a UDP datagram with MSG_MORE and that
datagram exceeds MTU, consistently choose UFO or fragmentation.Once skb_is_gso, always apply ufo. Conversely, once a datagram is
split across multiple skbs, do not consider ufo.Sendpage already maintains the first invariant, only add the second.
IPv6 does not have a sendpage implementation to modify.A gso skb must have a partial checksum, do not follow sk_no_check_tx
in udp_send_skb.Found by syzkaller.
Fixes: e89e9cf539a2 ("[IPv4/IPv6]: UFO Scatter-gather approach")
Reported-by: Andrey Konovalov
Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller
10 Aug, 2017
1 commit
-
Any use of key->enabled (that is static_key_enabled and static_key_count)
outside jump_label_lock should handle its own serialization. The only
two that are not doing so are the UDP encapsulation static keys. Change
them to use static_key_enable, which now correctly tests key->enabled under
the jump label lock.Signed-off-by: Paolo Bonzini
Signed-off-by: Peter Zijlstra (Intel)
Cc: Eric Dumazet
Cc: Jason Baron
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1501601046-35683-3-git-send-email-pbonzini@redhat.com
Signed-off-by: Ingo Molnar
08 Aug, 2017
3 commits
-
Signed-off-by: David Ahern
Signed-off-by: David S. Miller -
Add a second device index, sdif, to inet socket lookups. sdif is the
index for ingress devices enslaved to an l3mdev. It allows the lookups
to consider the enslaved device as well as the L3 domain when searching
for a socket.TCP moves the data in the cb. Prior to tcp_v4_rcv (e.g., early demux) the
ingress index is obtained from IPCB using inet_sdif and after the cb move
in tcp_v4_rcv the tcp_v4_sdif helper is used.Signed-off-by: David Ahern
Signed-off-by: David S. Miller -
Add a second device index, sdif, to udp socket lookups. sdif is the
index for ingress devices enslaved to an l3mdev. It allows the lookups
to consider the enslaved device as well as the L3 domain when searching
for a socket.Early demux lookups are handled in the next patch as part of INET_MATCH
changes.Signed-off-by: David Ahern
Signed-off-by: David S. Miller
07 Aug, 2017
1 commit
-
__ip_options_echo() does not need anymore skb->dst, so we can
avoid explicitly preserving it for its own sake.This is almost a revert of commit 0ddf3fb2c43d ("udp: preserve
skb->dst if required for IP options processing") plus some
lifting to fit later changes.Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller
30 Jul, 2017
1 commit
-
When an early demuxed packet reaches __udp6_lib_lookup_skb(), the
sk reference is retrieved and used, but the relevant reference
count is leaked and the socket destructor is never called.
Beyond leaking the sk memory, if there are pending UDP packets
in the receive queue, even the related accounted memory is leaked.In the long run, this will cause persistent forward allocation errors
and no UDP skbs (both ipv4 and ipv6) will be able to reach the
user-space.Fix this by explicitly accessing the early demux reference before
the lookup, and properly decreasing the socket reference count
after usage.Also drop the skb_steal_sock() in __udp6_lib_lookup_skb(), and
the now obsoleted comment about "socket cache".The newly added code is derived from the current ipv4 code for the
similar path.v1 -> v2:
fixed the __udp6_lib_rcv() return code for resubmission,
as suggested by EricReported-by: Sam Edwards
Reported-by: Marc Haber
Fixes: 5425077d73e0 ("net: ipv6: Add early demux handler for UDP unicast")
Signed-off-by: Paolo Abeni
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
27 Jul, 2017
1 commit
-
We must use pre-processor conditional block or suitable accessors to
manipulate skb->sp elsewhere builds lacking the CONFIG_XFRM will break.Fixes: dce4551cb2ad ("udp: preserve head state for IP_CMSG_PASSSEC")
Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller
26 Jul, 2017
1 commit
-
Paul Moore reported a SELinux/IP_PASSSEC regression
caused by missing skb->sp at recvmsg() time. We need to
preserve the skb head state to process the IP_CMSG_PASSSEC
cmsg.With this commit we avoid releasing the skb head state in the
BH even if a secpath is attached to the current skb, and stores
the skb status (with/without head states) in the scratch area,
so that we can access it at skb deallocation time, without
incurring in cache-miss penalties.This also avoids misusing the skb CB for ipv6 packets,
as introduced by the commit 0ddf3fb2c43d ("udp: preserve
skb->dst if required for IP options processing").Clean a bit the scratch area helpers implementation, to
reduce the code differences between 32 and 64 bits build.Reported-by: Paul Moore
Fixes: 0a463c78d25b ("udp: avoid a cache miss on dequeue")
Fixes: 0ddf3fb2c43d ("udp: preserve skb->dst if required for IP options processing")
Signed-off-by: Paolo Abeni
Tested-by: Paul Moore
Signed-off-by: David S. Miller
19 Jul, 2017
1 commit
-
Eric noticed that in udp_recvmsg() we still need to access
skb->dst while processing the IP options.
Since commit 0a463c78d25b ("udp: avoid a cache miss on dequeue")
skb->dst is no more available at recvmsg() time and bad things
will happen if we enter the relevant code path.This commit address the issue, avoid clearing skb->dst if
any IP options are present into the relevant skb.
Since the IP CB is contained in the first skb cacheline, we can
test it to decide to leverage the consume_stateless_skb()
optimization, without measurable additional cost in the faster
path.v1 -> v2: updated commit message tags
Fixes: 0a463c78d25b ("udp: avoid a cache miss on dequeue")
Reported-by: Andrey Konovalov
Reported-by: Eric Dumazet
Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller
01 Jul, 2017
1 commit
-
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.This patch uses refcount_inc_not_zero() instead of
atomic_inc_not_zero_hint() due to absense of a _hint()
version of refcount API. If the hint() version must
be used, we might need to revisit API.Signed-off-by: Elena Reshetova
Signed-off-by: Hans Liljestrand
Signed-off-by: Kees Cook
Signed-off-by: David Windsor
Signed-off-by: David S. Miller
28 Jun, 2017
1 commit
-
So that they can be later used by the IPv6 code, too.
Also lift the comments a bit.Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller
23 Jun, 2017
1 commit
-
Michael reported an UDP breakage caused by the commit b65ac44674dd
("udp: try to avoid 2 cache miss on dequeue").
The function __first_packet_length() can update the checksum bits
of the pending skb, making the scratched area out-of-sync, and
setting skb->csum, if the skb was previously in need of checksum
validation.On later recvmsg() for such skb, checksum validation will be
invoked again - due to the wrong udp_skb_csum_unnecessary()
value - and will fail, causing the valid skb to be dropped.This change addresses the issue refreshing the scratch area in
__first_packet_length() after the possible checksum update.Fixes: b65ac44674dd ("udp: try to avoid 2 cache miss on dequeue")
Reported-by: Michael Ellerman
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller
21 Jun, 2017
1 commit
-
On UDP packets processing, if the BH is the bottle-neck, it
always sees a cache miss while updating rmem_alloc; try to
avoid it prefetching the value as soon as we have the socket
available.Performances under flood with multiple NIC rx queues used are
unaffected, but when a single NIC rx queue is in use, this
gives ~10% performance improvement.Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller
18 Jun, 2017
1 commit
-
In udp_v4/6_early_demux() code, we try to hold dst->__refcnt for
dst with DST_NOCACHE flag. This is because later in udp_sk_rx_dst_set()
function, we will try to cache this dst in sk for connected case.
However, a better way to achieve this is to not try to hold dst in
early_demux(), but in udp_sk_rx_dst_set(), call dst_hold_safe(). This
approach is also more consistant with how tcp is handling it. And it
will make later changes simpler.Signed-off-by: Wei Wang
Acked-by: Martin KaFai Lau
Signed-off-by: David S. Miller
12 Jun, 2017
2 commits
-
when udp_recvmsg() is executed, on x86_64 and other archs, most skb
fields are on cold cachelines.
If the skb are linear and the kernel don't need to compute the udp
csum, only a handful of skb fields are required by udp_recvmsg().
Since we already use skb->dev_scratch to cache hot data, and
there are 32 bits unused on 64 bit archs, use such field to cache
as much data as we can, and try to prefetch on dequeue the relevant
fields that are left out.This can save up to 2 cache miss per packet.
v1 -> v2:
- changed udp_dev_scratch fields types to u{32,16} variant,
replaced bitfiled with boolSigned-off-by: Paolo Abeni
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller -
Since UDP no more uses sk->destructor, we can clear completely
the skb head state before enqueuing. Amend and use
skb_release_head_state() for that.All head states share a single cacheline, which is not
normally used/accesses on dequeue. We can avoid entirely accessing
such cacheline implementing and using in the UDP code a specialized
skb free helper which ignores the skb head state.This saves a cacheline miss at skb deallocation time.
v1 -> v2:
replaced secpath_reset() with skb_release_head_state()Signed-off-by: Paolo Abeni
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller