24 Jul, 2016
1 commit
-
Just several instances of overlapping changes.
Signed-off-by: David S. Miller
22 Jul, 2016
1 commit
-
sock_cmsg_send() can return different error codes and not only
-EINVAL, and we should properly propagate them.Fixes: c14ac9451c34 ("sock: enable timestamping using control messages")
Signed-off-by: Soheil Hassas Yeganeh
Cc: Willem de Bruijn
Signed-off-by: David S. Miller
20 Jul, 2016
1 commit
-
This patch fixes an issue that a syscall (e.g. sendto syscall) cannot
work correctly. Since the sendto syscall doesn't have msg_control buffer,
the sock_tx_timestamp() in packet_snd() cannot work correctly because
the socks.tsflags is set to 0.
So, this patch sets the socks.tsflags to sk->sk_tsflags as default.Fixes: c14ac9451c34 ("sock: enable timestamping using control messages")
Reported-by: Kazuya Mizuguchi
Reported-by: Keita Kobayashi
Signed-off-by: Yoshihiro Shimoda
Acked-by: Soheil Hassas Yeganeh
Acked-by: Willem de Bruijn
Signed-off-by: David S. Miller
07 Jul, 2016
1 commit
-
Conflicts:
drivers/net/ethernet/mellanox/mlx5/core/en.h
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
drivers/net/usb/r8152.cAll three conflicts were overlapping changes.
Signed-off-by: David S. Miller
02 Jul, 2016
2 commits
-
People who use PACKET_FANOUT_HASH want a symmetric hash, meaning that
they want packets going in both directions on a flow to hash to the
same bucket.The core kernel SKB hash became non-symmetric when the ipv6 flow label
and other entities were incorporated into the standard flow hash order
to increase entropy.But there are no users of PACKET_FANOUT_HASH who want an assymetric
hash, they all want a symmetric one.Therefore, use the flow dissector to compute a flat symmetric hash
over only the protocol, addresses and ports. This hash does not get
installed into and override the normal skb hash, so this change has
no effect whatsoever on the rest of the stack.Reported-by: Eric Leblond
Tested-by: Eric Leblond
Signed-off-by: David S. Miller -
Since bpf_prog_get() and program type check is used in a couple of places,
refactor this into a small helper function that we can make use of. Since
the non RO prog->aux part is not used in performance critical paths and a
program destruction via RCU is rather very unlikley when doing the put, we
shouldn't have an issue just doing the bpf_prog_get() + prog->type != type
check, but actually not taking the ref at all (due to being in fdget() /
fdput() section of the bpf fd) is even cleaner and makes the diff smaller
as well, so just go for that. Callsites are changed to make use of the new
helper where possible.Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller
11 Jun, 2016
1 commit
-
Replace open coded conversion between virtio_net_hdr to skb GSO info with
virtio_net_hdr_from_skbSigned-off-by: Mike Rapoport
Signed-off-by: David S. Miller
10 Jun, 2016
1 commit
-
Socket option PACKET_FANOUT_DATA takes a struct sock_fprog as argument
if PACKET_FANOUT has mode PACKET_FANOUT_CBPF. This structure contains
a pointer into user memory. If userland is 32-bit and kernel is 64-bit
the two disagree about the layout of struct sock_fprog.Add compat setsockopt support to convert a 32-bit compat_sock_fprog to
a 64-bit sock_fprog. This is analogous to compat_sock_fprog support for
SO_REUSEPORT added in commit 1957598840f4 ("soreuseport: add compat
case for setsockopt SO_ATTACH_REUSEPORT_CBPF").Reported-by: Daniel Borkmann
Signed-off-by: Willem de Bruijn
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller
24 Apr, 2016
1 commit
-
Conflicts were two cases of simple overlapping changes,
nothing serious.In the UDP case, we need to add a hlist_add_tail_rcu()
to linux/rculist.h, because we've moved UDP socket handling
away from using nulls lists.Signed-off-by: David S. Miller
15 Apr, 2016
1 commit
-
consume_skb() isn't for error cases that kfree_skb() is more proper
one. At this patch, it fixed tpacket_rcv() and packet_rcv() to be
consistent for error or non-error cases letting perf trace its event
properly.Signed-off-by: Weongyo Jeong
Signed-off-by: David S. Miller
14 Apr, 2016
1 commit
-
Because we miss to wipe the remainder of i->addr[] in packet_mc_add(),
pdiag_put_mclist() leaks uninitialized heap bytes via the
PACKET_DIAG_MCLIST netlink attribute.Fix this by explicitly memset(0)ing the remaining bytes in i->addr[].
Fixes: eea68e2f1a00 ("packet: Report socket mclist info via diag module")
Signed-off-by: Mathias Krause
Cc: Eric W. Biederman
Cc: Pavel Emelyanov
Acked-by: Pavel Emelyanov
Signed-off-by: David S. Miller
10 Apr, 2016
1 commit
07 Apr, 2016
1 commit
-
Trinity and other fuzzers can hit this WARN on far too easily,
resulting in a tainted kernel that hinders automated fuzzing.Replace it with a rate-limited printk.
Signed-off-by: Dave Jones
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller
05 Apr, 2016
1 commit
-
Currently, SOL_TIMESTAMPING can only be enabled using setsockopt.
This is very costly when users want to sample writes to gather
tx timestamps.Add support for enabling SO_TIMESTAMPING via control messages by
using tsflags added in `struct sockcm_cookie` (added in the previous
patches in this series) to set the tx_flags of the last skb created in
a sendmsg. With this patch, the timestamp recording bits in tx_flags
of the skbuff is overridden if SO_TIMESTAMPING is passed in a cmsg.Please note that this is only effective for overriding the recording
timestamps flags. Users should enable timestamp reporting (e.g.,
SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_OPT_ID) using
socket options and then should ask for SOF_TIMESTAMPING_TX_*
using control messages per sendmsg to sample timestamps for each
write.Signed-off-by: Soheil Hassas Yeganeh
Acked-by: Willem de Bruijn
Signed-off-by: David S. Miller
10 Mar, 2016
1 commit
-
Replace link layer header validation check ll_header_truncate with
more generic dev_validate_header.Validation based on hard_header_len incorrectly drops valid packets
in variable length protocols, such as AX25. dev_validate_header
calls header_ops.validate for such protocols to ensure correctness
below hard_header_len.See also http://comments.gmane.org/gmane.linux.network/401064
Fixes 9c7077622dd9 ("packet: make packet_snd fail on len smaller than l2 header")
Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller
26 Feb, 2016
1 commit
-
Signed-off-by: David Decotigny
Signed-off-by: David S. Miller
09 Feb, 2016
4 commits
-
Support socket option PACKET_VNET_HDR together with PACKET_TX_RING.
When enabled, a struct virtio_net_hdr is expected to precede the data
in the ring. The vnet option must be set before the ring is created.The implementation reuses the existing skb_copy_bits code that is used
when dev->hard_header_len is non-zero. Move this ll_header check to
before the skb alloc and combine it with a test for vnet_hdr->hdr_len.
Allocate and copy the max of the two.Verified with test program at
github.com/wdebruij/kerneltools/blob/master/tests/psock_txring_vnet.cSigned-off-by: Willem de Bruijn
Signed-off-by: David S. Miller -
GSO packet headers must be stored in the linear skb segment.
Move tpacket header parsing before sock_alloc_send_skb. The GSO
follow-on patch will later increase the skb linear argument to
sock_alloc_send_skb if needed for large packets.The header parsing code does not require an allocated skb, so is
safe to move. Later pass to tpacket_fill_skb the computed data
start and length.Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller -
Support socket option PACKET_VNET_HDR together with PACKET_RX_RING.
When enabled, a struct virtio_net_hdr will precede the data in the
packet ring slots.Verified with test program at
github.com/wdebruij/kerneltools/blob/master/tests/psock_rxring_vnet.cpkt: 1454269209.798420 len=5066
vnet: gso_type=tcpv4 gso_size=1448 hlen=66 ecn=off
csum: start=34 off=16
eth: proto=0x800
ip: src= dst= proto=6 len=5052Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller -
packet_snd and packet_rcv support virtio net headers for GSO.
Move this logic into helper functions to be able to reuse it in
tpacket_snd and tpacket_rcv.This is a straighforward code move with one exception. Instead of
creating and passing a separate gso_type variable, reuse
vnet_hdr.gso_type after conversion from virtio to kernel gso type.Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller
30 Nov, 2015
1 commit
-
Commit 9c7077622dd91 ("packet: make packet_snd fail on len smaller
than l2 header") added validation for the packet size in packet_snd.
This change enforces that every packet needs a header (with at least
hard_header_len bytes) plus a payload with at least one byte. Before
this change the payload was optional.This fixes PPPoE connections which do not have a "Service" or
"Host-Uniq" configured (which is violating the spec, but is still
widely used in real-world setups). Those are currently failing with the
following message: "pppd: packet size is too short (24
Signed-off-by: David S. Miller
18 Nov, 2015
2 commits
-
Use PAGE_ALIGNED(...) instead of open-coding it.
Signed-off-by: Tobias Klauser
Signed-off-by: David S. Miller -
rb->frames_per_block is an unsigned int, thus can never be negative.
Also fix spacing in the calculation of frames_per_block.
Signed-off-by: Tobias Klauser
Signed-off-by: David S. Miller
16 Nov, 2015
5 commits
-
Since it's introduction in commit 69e3c75f4d54 ("net: TX_RING and
packet mmap"), TX_RING could be used from SOCK_DGRAM and SOCK_RAW
side. When used with SOCK_DGRAM only, the size_max > dev->mtu +
reserve check should have reserve as 0, but currently, this is
unconditionally set (in it's original form as dev->hard_header_len).I think this is not correct since tpacket_fill_skb() would then
take dev->mtu and dev->hard_header_len into account for SOCK_DGRAM,
the extra VLAN_HLEN could be possible in both cases. Presumably, the
reserve code was copied from packet_snd(), but later on missed the
check. Make it similar as we have it in packet_snd().Fixes: 69e3c75f4d54 ("net: TX_RING and packet mmap")
Signed-off-by: Daniel Borkmann
Acked-by: Willem de Bruijn
Signed-off-by: David S. Miller -
In case no struct sockaddr_ll has been passed to packet
socket's sendmsg() when doing a TX_RING flush run, then
skb->protocol is set to po->num instead, which is the protocol
passed via socket(2)/bind(2).Applications only xmitting can go the path of allocating the
socket as socket(PF_PACKET, , 0) and do a bind(2) on the
TX_RING with sll_protocol of 0. That way, register_prot_hook()
is neither called on creation nor on bind time, which saves
cycles when there's no interest in capturing anyway.That leaves us however with po->num 0 instead and therefore
the TX_RING flush run sets skb->protocol to 0 as well. Eric
reported that this leads to problems when using tools like
trafgen over bonding device. I.e. the bonding's hash function
could invoke the kernel's flow dissector, which depends on
skb->protocol being properly set. In the current situation, all
the traffic is then directed to a single slave.Fix it up by inferring skb->protocol from the Ethernet header
when not set and we have ARPHRD_ETHER device type. This is only
done in case of SOCK_RAW and where we have a dev->hard_header_len
length. In case of ARPHRD_ETHER devices, this is guaranteed to
cover ETH_HLEN, and therefore being accessed on the skb after
the skb_store_bits().Reported-by: Eric Dumazet
Signed-off-by: Daniel Borkmann
Acked-by: Willem de Bruijn
Signed-off-by: David S. Miller -
Packet sockets can be used by various net devices and are not
really restricted to ARPHRD_ETHER device types. However, when
currently checking for the extra 4 bytes that can be transmitted
in VLAN case, our assumption is that we generally probe on
ARPHRD_ETHER devices. Therefore, before looking into Ethernet
header, check the device type first.This also fixes the issue where non-ARPHRD_ETHER devices could
have no dev->hard_header_len in TX_RING SOCK_RAW case, and thus
the check would test unfilled linear part of the skb (instead
of non-linear).Fixes: 57f89bfa2140 ("network: Allow af_packet to transmit +4 bytes for VLAN packets.")
Fixes: 52f1454f629f ("packet: allow to transmit +4 byte in TX_RING slot for VLAN case")
Signed-off-by: Daniel Borkmann
Acked-by: Willem de Bruijn
Signed-off-by: David S. Miller -
We concluded that the skb_probe_transport_header() should better be
called unconditionally. Avoiding the call into the flow dissector has
also not really much to do with the direct xmit mode.While it seems that only virtio_net code makes use of GSO from non
RX/TX ring packet socket paths, we should probe for a transport header
nevertheless before they hit devices.Reference: http://thread.gmane.org/gmane.linux.network/386173/
Signed-off-by: Daniel Borkmann
Acked-by: Jason Wang
Signed-off-by: David S. Miller -
In tpacket_fill_skb() commit c1aad275b029 ("packet: set transport
header before doing xmit") and later on 40893fd0fd4e ("net: switch
to use skb_probe_transport_header()") was probing for a transport
header on the skb from a ring buffer slot, but at a time, where
the skb has _not even_ been filled with data yet. So that call into
the flow dissector is pretty useless. Lets do it after we've set
up the skb frags.Fixes: c1aad275b029 ("packet: set transport header before doing xmit")
Reported-by: Eric Dumazet
Signed-off-by: Daniel Borkmann
Acked-by: Jason Wang
Signed-off-by: David S. Miller
06 Nov, 2015
1 commit
-
There is a race conditions between packet_notifier and packet_bind{_spkt}.
It happens if packet_notifier(NETDEV_UNREGISTER) executes between the
time packet_bind{_spkt} takes a reference on the new netdevice and the
time packet_do_bind sets po->ifindex.
In this case the notification can be missed.
If this happens during a dev_change_net_namespace this can result in the
netdevice to be moved to the new namespace while the packet_sock in the
old namespace still holds a reference on it. When the netdevice is later
deleted in the new namespace the deletion hangs since the packet_sock
is not found in the new namespace' &net->packet.sklist.
It can be reproduced with the script below.This patch makes packet_do_bind check again for the presence of the
netdevice in the packet_sock's namespace after the synchronize_net
in unregister_prot_hook.
More in general it also uses the rcu lock for the duration of the bind
to stop dev_change_net_namespace/rollback_registered_many from
going past the synchronize_net following unlist_netdevice, so that
no NETDEV_UNREGISTER notifications can happen on the new netdevice
while the bind is executing. In order to do this some code from
packet_bind{_spkt} is consolidated into packet_do_dev.import socket, os, time, sys
proto=7
realDev='em1'
vlanId=400
if len(sys.argv) > 1:
vlanId=int(sys.argv[1])
dev='vlan%d' % vlanIdos.system('taskset -p 0x10 %d' % os.getpid())
s = socket.socket(socket.PF_PACKET, socket.SOCK_RAW, proto)
os.system('ip link add link %s name %s type vlan id %d' %
(realDev, dev, vlanId))
os.system('ip netns add dummy')pid=os.fork()
if pid == 0:
# dev should be moved while packet_do_bind is in synchronize net
os.system('taskset -p 0x20000 %d' % os.getpid())
os.system('ip link set %s netns dummy' % dev)
os.system('ip netns exec dummy ip link del %s' % dev)
s.close()
sys.exit(0)time.sleep(.004)
try:
s.bind(('%s' % dev, proto+1))
except:
print 'Could not bind socket'
s.close()
os.system('ip netns del dummy')
sys.exit(0)os.waitpid(pid, 0)
s.close()
os.system('ip netns del dummy')
sys.exit(0)Signed-off-by: Francesco Ruggeri
Signed-off-by: David S. Miller
13 Oct, 2015
3 commits
-
The function ip_defrag is called on both the input and the output
paths of the networking stack. In particular conntrack when it is
tracking outbound packets from the local machine calls ip_defrag.So add a struct net parameter and stop making ip_defrag guess which
network namespace it needs to defragment packets in.Signed-off-by: "Eric W. Biederman"
Acked-by: Pablo Neira Ayuso
Signed-off-by: David S. Miller -
Recent TCP listener patches exposed a prior af_packet bug :
match_fanout_group() blindly assumes it is always safe
to cast sk to a packet socket to compare fanout with af_packet_privBut SYNACK packets can be sent while attached to request_sock, which
are smaller than a "struct sock".We can read non existent memory and crash.
Fixes: c0de08d04215 ("af_packet: don't emit packet on orig fanout group")
Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet
Cc: Willem de Bruijn
Cc: Eric Leblond
Signed-off-by: David S. Miller -
Signed-off-by: Edward Hyunkoo Jee
Signed-off-by: Eric Dumazet
Cc: Willem de Bruijn
Signed-off-by: David S. Miller
11 Oct, 2015
1 commit
-
eBPF socket filter programs may see junk in 'u32 cb[5]' area,
since it could have been used by protocol layers earlier.For socket filter programs used in af_packet we need to clean
20 bytes of skb->cb area if it could be used by the program.
For programs attached to TCP/UDP sockets we need to save/restore
these 20 bytes, since it's used by protocol layers.Remove SK_RUN_FILTER macro, since it's no longer used.
Long term we may move this bpf cb area to per-cpu scratch, but that
requires addition of new 'per-cpu load/store' instructions,
so not suitable as a short term fix.Fixes: d691f9e8d440 ("bpf: allow programs to write to certain skb fields")
Reported-by: Eric Dumazet
Signed-off-by: Alexei Starovoitov
Signed-off-by: David S. Miller
05 Oct, 2015
1 commit
-
The current ongoing effort to dump existing cBPF seccomp filters back
to user space requires to hold the pre-transformed instructions like
we do in case of socket filters from sk_attach_filter() side, so they
can be reloaded in original form at a later point in time by utilities
such as criu.To prepare for this, simply extend the bpf_prog_create_from_user()
API to hold a flag that tells whether we should store the original
or not. Also, fanout filters could make use of that in future for
things like diag. While fanout filters already use bpf_prog_destroy(),
move seccomp over to them as well to handle original programs when
present.Signed-off-by: Daniel Borkmann
Cc: Tycho Andersen
Cc: Pavel Emelyanov
Cc: Kees Cook
Cc: Andy Lutomirski
Cc: Alexei Starovoitov
Tested-by: Tycho Andersen
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller
24 Sep, 2015
1 commit
-
Commit 7d82410950aa ("virtio: add explicit big-endian support to memory
accessors") accidentally changed the virtio_net header used by
AF_PACKET with PACKET_VNET_HDR from host-endian to big-endian.Since virtio_legacy_is_little_endian() is a very long identifier,
define a vio_le macro and use that throughout the code instead of the
hard-coded 'false' for little-endian.This restores the ABI to match 4.1 and earlier kernels, and makes my
test program work again.Signed-off-by: David Woodhouse
Signed-off-by: David S. Miller
18 Aug, 2015
2 commits
-
Add fanout mode PACKET_FANOUT_EBPF that accepts an en extended BPF
program to select a socket.Update the internal eBPF program by passing to socket option
SOL_PACKET/PACKET_FANOUT_DATA a file descriptor returned by bpf().Signed-off-by: Willem de Bruijn
Acked-by: Alexei Starovoitov
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller -
Add fanout mode PACKET_FANOUT_CBPF that accepts a classic BPF program
to select a socket.This avoids having to keep adding special case fanout modes. One
example use case is application layer load balancing. The QUIC
protocol, for instance, encodes a connection ID in UDP payload.Also add socket option SOL_PACKET/PACKET_FANOUT_DATA that updates data
associated with the socket group. Fanout mode PACKET_FANOUT_CBPF is the
only user so far.Signed-off-by: Willem de Bruijn
Acked-by: Alexei Starovoitov
Acked-by: Daniel Borkmann
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
01 Aug, 2015
1 commit
-
Conflicts:
arch/s390/net/bpf_jit_comp.c
drivers/net/ethernet/ti/netcp_ethss.c
net/bridge/br_multicast.c
net/ipv4/ip_fragment.cAll four conflicts were cases of simple overlapping
changes.Signed-off-by: David S. Miller
30 Jul, 2015
1 commit
-
Follow e8e85cc5eb57 ("packet: remove handling of tx_ring") and remove
the tx_ring parameter from prb_shutdown_retire_blk_timer() as it is only
called with tx_ring = 0.Signed-off-by: Tobias Klauser
Signed-off-by: David S. Miller
29 Jul, 2015
1 commit
-
tpacket_fill_skb() can return a negative value (-errno) which
is stored in tp_len variable. In that case the following
condition will be (but shouldn't be) true:tp_len > dev->mtu + dev->hard_header_len
as dev->mtu and dev->hard_header_len are both unsigned.
That may lead to just returning an incorrect EMSGSIZE errno
to the user.Fixes: 52f1454f629fa ("packet: allow to transmit +4 byte in TX_RING slot for VLAN case")
Signed-off-by: Alexander Drozdov
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller