Eric Lee / smarc-fsl-linux-kernel

18 Feb, 2017

1 commit

828495418 packet: round up linear to header len

[ Upstream commit 57031eb794906eea4e1c7b31dc1e2429c0af0c66 ]

Link layer protocols may unconditionally pull headers, as Ethernet
does in eth_type_trans. Ensure that the entire link layer header
always lies in the skb linear segment. tpacket_snd has such a check.
Extend this to packet_snd.

Variable length link layer headers complicate the computation
somewhat. Here skb->len may be smaller than dev->hard_header_len.

Round up the linear length to be at least as long as the smaller of
the two.
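
The rounding rule above can be sketched as follows. This is a minimal
user-space model, not the kernel code: the function name and parameters are
hypothetical, and it only assumes the rule as stated in the message (linear
length at least min(packet length, hard_header_len)).

```python
def linear_length(requested_linear, packet_len, hard_header_len):
    """Return how many bytes should live in the linear segment.

    Hypothetical model: the link layer header (or the whole packet, if
    the packet is shorter than the header) must be linear.
    """
    header_bytes = min(packet_len, hard_header_len)
    return max(requested_linear, header_bytes)
```

With a 14-byte Ethernet header and no linear bytes requested, the model
returns 14. With a variable length header where the packet is shorter than
hard_header_len, it returns the packet length instead.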

Reported-by: Dmitry Vyukov
Signed-off-by: Willem de Bruijn
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Willem de Bruijn
2017-02-18 22:11:43 +0800

04 Feb, 2017

1 commit

1e7cbb413 virtio-net: restore VIRTIO_HDR_F_DATA_VALID on receiving

[ Upstream commit 6391a4481ba0796805d6581e42f9f0418c099e34 ]

Commit 501db511397f ("virtio: don't set VIRTIO_NET_HDR_F_DATA_VALID on
xmit") in fact disables VIRTIO_HDR_F_DATA_VALID on the receive path too.
Fix this by adding a hint (has_data_valid) and setting it only on the
receive path.

Cc: Rolf Neugebauer
Signed-off-by: Jason Wang
Acked-by: Rolf Neugebauer
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Jason Wang
2017-02-04 16:47:09 +0800

03 Dec, 2016

1 commit

84ac72602 packet: fix race condition in packet_set_ring

When packet_set_ring creates a ring buffer it will initialize a
struct timer_list if the packet version is TPACKET_V3. This value
can then be raced by a different thread calling setsockopt to
set the version to TPACKET_V1 before packet_set_ring has finished.

This leads to a use-after-free on a function pointer in the
struct timer_list when the socket is closed as the previously
initialized timer will not be deleted.

The bug is fixed by taking lock_sock(sk) in packet_setsockopt when
changing the packet version while also taking the lock at the start
of packet_set_ring.
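
The locking scheme described above can be modelled in user space as below.
This is a rough sketch with hypothetical names: a single lock stands in for
lock_sock(sk), and the -EBUSY result for a version change while a ring
exists is an assumption of the model, not something stated in this message.

```python
import errno
import threading

TPACKET_V1, TPACKET_V2, TPACKET_V3 = 0, 1, 2


class PacketSockModel:
    """Toy model of a packet socket with one lock guarding version and ring."""

    def __init__(self):
        self.lock = threading.Lock()   # stands in for lock_sock(sk)
        self.version = TPACKET_V1
        self.ring = None
        self.timer_armed = False

    def set_version(self, version):
        # Taken under the same lock as set_ring, so the version cannot
        # change while a ring is being set up.
        with self.lock:
            if self.ring is not None:
                return -errno.EBUSY    # assumed behaviour of the model
            self.version = version
            return 0

    def set_ring(self, frames):
        with self.lock:
            self.ring = [None] * frames
            # Only a V3 ring gets a timer; the version is stable here.
            self.timer_armed = self.version == TPACKET_V3
            return 0

    def close(self):
        with self.lock:
            # The timer is deleted iff it was armed for this ring.
            self.timer_armed = False
            self.ring = None
```

Because both paths serialize on one lock, the decision to arm the timer and
the decision to delete it always see the same version.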

Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.")
Signed-off-by: Philip Pettersson
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Philip Pettersson
2016-12-03 01:16:49 +0800

30 Oct, 2016

1 commit

104ba78c9 packet: on direct_xmit, limit tso and csum to supported devices

When transmitting on a packet socket with PACKET_VNET_HDR and
PACKET_QDISC_BYPASS, validate device support for features requested
in vnet_hdr.

Drop TSO packets sent to devices that do not support TSO or have the
feature disabled. Note that the latter currently do process those
packets correctly, regardless of not advertising the feature.

Because of SKB_GSO_DODGY, it is not sufficient to test device features
with netif_needs_gso. Full validate_xmit_skb is needed.

Switch to software checksum for non-TSO packets that request checksum
offload if that device feature is unsupported or disabled. Note that
similar to the TSO case, device drivers may perform checksum offload
correctly even when not advertising it.

When switching to software checksum, packets hit skb_checksum_help,
which has two BUG_ONs for a checksum not in the linear segment. Packet
sockets always allocate at least up to csum_start + csum_off + 2 as linear.

Tested by running github.com/wdebruij/kerneltools/psock_txring_vnet.c

ethtool -K eth0 tso off tx on
psock_txring_vnet -d $dst -s $src -i eth0 -l 2000 -n 1 -q -v
psock_txring_vnet -d $dst -s $src -i eth0 -l 2000 -n 1 -q -v -N

ethtool -K eth0 tx off
psock_txring_vnet -d $dst -s $src -i eth0 -l 1000 -n 1 -q -v -G
psock_txring_vnet -d $dst -s $src -i eth0 -l 1000 -n 1 -q -v -G -N

v2:
- add EXPORT_SYMBOL_GPL(validate_xmit_skb_list)

Fixes: d346a3fae3ff ("packet: introduce PACKET_QDISC_BYPASS socket option")
Signed-off-by: Willem de Bruijn
Acked-by: Eric Dumazet
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller

Willem de Bruijn
2016-10-30 03:02:15 +0800

07 Oct, 2016

1 commit

666449828 packet: call fanout_release, while UNREGISTERING a netdev

If a socket has the FANOUT sockopt set, a new prot_hook is registered
as part of fanout_add(). When processing a NETDEV_UNREGISTER event in
af_packet, __fanout_unlink is called for all sockets, but the prot_hook that
was registered as part of fanout_add is not removed. Call fanout_release on
NETDEV_UNREGISTER, which removes the prot_hook and removes the fanout from
the fanout_list.

This fixes BUG_ON(!list_empty(&dev->ptype_specific)) in netdev_run_todo()

Signed-off-by: Anoob Soman
Signed-off-by: David S. Miller

Anoob Soman
2016-10-07 08:50:18 +0800

24 Jul, 2016

1 commit

de0ba9a0d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Just several instances of overlapping changes.

Signed-off-by: David S. Miller

David S. Miller
2016-07-24 12:53:32 +0800

22 Jul, 2016

1 commit

f8e7718cc packet: propagate sock_cmsg_send() error

sock_cmsg_send() can return different error codes and not only
-EINVAL, and we should properly propagate them.

Fixes: c14ac9451c34 ("sock: enable timestamping using control messages")
Signed-off-by: Soheil Hassas Yeganeh
Cc: Willem de Bruijn
Signed-off-by: David S. Miller

Soheil Hassas Yeganeh
2016-07-22 13:41:48 +0800

20 Jul, 2016

1 commit

edbe77462 packet: fix second argument of sock_tx_timestamp()

This patch fixes an issue where a syscall without a msg_control buffer
(e.g. sendto) does not work correctly: sock_tx_timestamp() in packet_snd()
misbehaves because sockc.tsflags is set to 0.
So, this patch sets sockc.tsflags to sk->sk_tsflags by default.

Fixes: c14ac9451c34 ("sock: enable timestamping using control messages")
Reported-by: Kazuya Mizuguchi
Reported-by: Keita Kobayashi
Signed-off-by: Yoshihiro Shimoda
Acked-by: Soheil Hassas Yeganeh
Acked-by: Willem de Bruijn
Signed-off-by: David S. Miller

Yoshihiro Shimoda
2016-07-20 12:00:50 +0800

07 Jul, 2016

1 commit

30d0844bd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts:
drivers/net/ethernet/mellanox/mlx5/core/en.h
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
drivers/net/usb/r8152.c

All three conflicts were overlapping changes.

Signed-off-by: David S. Miller

David S. Miller
2016-07-07 01:35:22 +0800

02 Jul, 2016

2 commits

eb70db875 packet: Use symmetric hash for PACKET_FANOUT_HASH.

People who use PACKET_FANOUT_HASH want a symmetric hash, meaning that
they want packets going in both directions on a flow to hash to the
same bucket.

The core kernel SKB hash became non-symmetric when the ipv6 flow label
and other entities were incorporated into the standard flow hash order
to increase entropy.

But there are no users of PACKET_FANOUT_HASH who want an asymmetric
hash, they all want a symmetric one.

Therefore, use the flow dissector to compute a flat symmetric hash
over only the protocol, addresses and ports. This hash does not get
installed into and override the normal skb hash, so this change has
no effect whatsoever on the rest of the stack.
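
To illustrate what "symmetric" means here, the sketch below hashes only the
protocol, addresses and ports, after putting the two endpoints in a
canonical order. It is a user-space illustration with hypothetical names;
the hash function (SHA-256) and the modulo bucket selection are stand-ins,
not what the kernel flow dissector uses.

```python
import hashlib


def symmetric_flow_hash(proto, src_addr, src_port, dst_addr, dst_port):
    """Hash a flow so that both directions produce the same value."""
    # Sort the endpoints so that swapping source and destination
    # yields the same hash input.
    low, high = sorted([(src_addr, src_port), (dst_addr, dst_port)])
    key = repr((proto, low, high)).encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big")


def fanout_bucket(flow_hash, num_sockets):
    """Pick a socket index for a flow hash (modulo is an assumption)."""
    return flow_hash % num_sockets
```

A request from 10.0.0.1:40000 to 10.0.0.2:80 and the reply from 10.0.0.2:80
to 10.0.0.1:40000 map to the same bucket, which is the property fanout
users rely on.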

Reported-by: Eric Leblond
Tested-by: Eric Leblond
Signed-off-by: David S. Miller

David S. Miller
2016-07-02 04:07:50 +0800

113214be7 bpf: refactor bpf_prog_get and type check into helper

Since bpf_prog_get() and the program type check are used in a couple of
places, refactor this into a small helper function that we can make use of.
Since the non RO prog->aux part is not used in performance critical paths and
a program destruction via RCU is very unlikely when doing the put, we
shouldn't have an issue just doing the bpf_prog_get() + prog->type != type
check, but actually not taking the ref at all (due to being in fdget() /
fdput() section of the bpf fd) is even cleaner and makes the diff smaller
as well, so just go for that. Callsites are changed to make use of the new
helper where possible.

Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Daniel Borkmann
2016-07-02 04:00:47 +0800

11 Jun, 2016

1 commit

1276f24ee packet: use common code for virtio_net_hdr and skb GSO conversion

Replace open coded conversion between virtio_net_hdr and skb GSO info with
virtio_net_hdr_from_skb.

Signed-off-by: Mike Rapoport
Signed-off-by: David S. Miller

Mike Rapoport
2016-06-11 14:03:56 +0800

10 Jun, 2016

1 commit

719c44d34 packet: compat support for sock_fprog

Socket option PACKET_FANOUT_DATA takes a struct sock_fprog as argument
if PACKET_FANOUT has mode PACKET_FANOUT_CBPF. This structure contains
a pointer into user memory. If userland is 32-bit and kernel is 64-bit
the two disagree about the layout of struct sock_fprog.

Add compat setsockopt support to convert a 32-bit compat_sock_fprog to
a 64-bit sock_fprog. This is analogous to compat_sock_fprog support for
SO_REUSEPORT added in commit 1957598840f4 ("soreuseport: add compat
case for setsockopt SO_ATTACH_REUSEPORT_CBPF").
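
The layout mismatch can be made concrete with a small sketch. It assumes a
little-endian machine with the usual alignment rules (a 16-bit length
followed by a pointer aligned to its own size); the struct formats and the
function name are illustrative, not kernel definitions.

```python
import struct

# u16 len, 2 bytes padding, 32-bit user pointer (32-bit userland view)
COMPAT_SOCK_FPROG = struct.Struct("<H2xI")
# u16 len, 6 bytes padding, 64-bit user pointer (64-bit kernel view)
NATIVE_SOCK_FPROG = struct.Struct("<H6xQ")


def compat_to_native(raw):
    """Convert a 32-bit sock_fprog image into the 64-bit layout."""
    length, filter_ptr = COMPAT_SOCK_FPROG.unpack(raw)
    # The 32-bit pointer is zero-extended to 64 bits.
    return NATIVE_SOCK_FPROG.pack(length, filter_ptr)
```

Under these assumptions the compat structure is 8 bytes and the native one
is 16 bytes, so reading one as the other would pick up the pointer from the
wrong offset.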

Reported-by: Daniel Borkmann
Signed-off-by: Willem de Bruijn
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller

Willem de Bruijn
2016-06-10 14:41:03 +0800

24 Apr, 2016

1 commit

1602f49b5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts were two cases of simple overlapping changes,
nothing serious.

In the UDP case, we need to add a hlist_add_tail_rcu()
to linux/rculist.h, because we've moved UDP socket handling
away from using nulls lists.

Signed-off-by: David S. Miller

David S. Miller
2016-04-24 06:51:33 +0800

15 Apr, 2016

1 commit

da37845fd packet: uses kfree_skb() for errors.

consume_skb() is not meant for error cases; kfree_skb() is the proper
call there. This patch makes tpacket_rcv() and packet_rcv() use the
matching call for error and non-error cases so that perf can trace the
events properly.

Signed-off-by: Weongyo Jeong
Signed-off-by: David S. Miller

Weongyo Jeong
2016-04-15 05:50:44 +0800

14 Apr, 2016

1 commit

309cf37fe packet: fix heap info leak in PACKET_DIAG_MCLIST sock_diag interface

Because we fail to wipe the remainder of i->addr[] in packet_mc_add(),
pdiag_put_mclist() leaks uninitialized heap bytes via the
PACKET_DIAG_MCLIST netlink attribute.

Fix this by explicitly memset(0)ing the remaining bytes in i->addr[].
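
The leak and the fix can be illustrated with a byte buffer. This is a
user-space analogy with hypothetical function names: a bytearray
pre-filled with stale data plays the role of an uninitialized heap
allocation, and the 32-byte slot size is an assumption.

```python
ADDR_SLOT_LEN = 32  # assumed size of the address slot


def store_addr_leaky(slot, addr):
    """Copy only the address bytes; the tail keeps whatever was there."""
    slot[:len(addr)] = addr


def store_addr_fixed(slot, addr):
    """Copy the address bytes and zero the remainder of the slot."""
    slot[:len(addr)] = addr
    slot[len(addr):] = bytes(len(slot) - len(addr))
```

Dumping the whole slot after store_addr_leaky exposes the stale tail, while
after store_addr_fixed only the address and zero bytes are visible.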

Fixes: eea68e2f1a00 ("packet: Report socket mclist info via diag module")
Signed-off-by: Mathias Krause
Cc: Eric W. Biederman
Cc: Pavel Emelyanov
Acked-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Mathias Krause
2016-04-14 12:46:39 +0800

10 Apr, 2016

1 commit

ae95d7126 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

David S. Miller
2016-04-10 05:41:41 +0800

07 Apr, 2016

1 commit

6ae81ced3 af_packet: tone down the Tx-ring unsupported spew.

Trinity and other fuzzers can hit this WARN on far too easily,
resulting in a tainted kernel that hinders automated fuzzing.

Replace it with a rate-limited printk.

Signed-off-by: Dave Jones
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller

Dave Jones
2016-04-07 04:05:20 +0800

05 Apr, 2016

1 commit

c14ac9451 sock: enable timestamping using control messages

Currently, SO_TIMESTAMPING can only be enabled using setsockopt.
This is very costly when users want to sample writes to gather
tx timestamps.

Add support for enabling SO_TIMESTAMPING via control messages by
using tsflags added in `struct sockcm_cookie` (added in the previous
patches in this series) to set the tx_flags of the last skb created in
a sendmsg. With this patch, the timestamp recording bits in tx_flags
of the skbuff are overridden if SO_TIMESTAMPING is passed in a cmsg.

Please note that this is only effective for overriding the recording
timestamps flags. Users should enable timestamp reporting (e.g.,
SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_OPT_ID) using
socket options and then should ask for SOF_TIMESTAMPING_TX_*
using control messages per sendmsg to sample timestamps for each
write.
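
From user space, the per-sendmsg request described above might look like the
sketch below. It only builds the ancillary data and does not open a socket.
The numeric value of SO_TIMESTAMPING (37) is an assumption that holds for
the generic Linux ABI but not for every architecture, and the helper name
is hypothetical.

```python
import socket
import struct

SO_TIMESTAMPING = 37                    # assumed value (generic Linux ABI)
SOF_TIMESTAMPING_TX_SOFTWARE = 1 << 1   # request a software tx timestamp


def timestamping_cmsg(tx_flags):
    """Build the ancillary data list for a single sendmsg() call."""
    payload = struct.pack("=I", tx_flags)  # u32 in host byte order
    return [(socket.SOL_SOCKET, SO_TIMESTAMPING, payload)]
```

The returned list can be passed as the second argument of
sock.sendmsg([data], ancdata) on a socket where timestamp reporting was
already enabled with setsockopt, so that only selected writes are sampled.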

Signed-off-by: Soheil Hassas Yeganeh
Acked-by: Willem de Bruijn
Signed-off-by: David S. Miller

Soheil Hassas Yeganeh
2016-04-05 03:50:30 +0800

10 Mar, 2016

1 commit

9ed988cd5 packet: validate variable length ll headers

Replace link layer header validation check ll_header_truncate with
more generic dev_validate_header.

Validation based on hard_header_len incorrectly drops valid packets
in variable length protocols, such as AX25. dev_validate_header
calls header_ops.validate for such protocols to ensure correctness
below hard_header_len.
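
The validation order described above can be sketched as follows. This is a
simplified model with hypothetical names: it covers only the two cases
mentioned in the message (a full-length header, and a shorter header checked
by a protocol-specific callback) and omits any other branch the kernel
helper may have.

```python
def validate_header(ll_header, hard_header_len, validate=None):
    """Return True if the link layer header is acceptable.

    validate is an optional protocol callback, standing in for
    header_ops.validate in the message above.
    """
    if len(ll_header) >= hard_header_len:
        return True          # fixed-size check is sufficient
    if validate is not None:
        return bool(validate(ll_header))   # protocol decides
    return False             # short header, no way to verify it
```

A protocol with variable length headers supplies a callback and can accept
headers shorter than hard_header_len; a protocol without one still rejects
them.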

See also http://comments.gmane.org/gmane.linux.network/401064

Fixes: 9c7077622dd9 ("packet: make packet_snd fail on len smaller than l2 header")
Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller

Willem de Bruijn
2016-03-10 11:13:01 +0800

26 Feb, 2016

1 commit

7cad1bac9 net: core: use __ethtool_get_ksettings

Signed-off-by: David Decotigny
Signed-off-by: David S. Miller

David Decotigny
2016-02-26 11:06:47 +0800

09 Feb, 2016

4 commits

1d036d25e packet: tpacket_snd gso and checksum offload

Support socket option PACKET_VNET_HDR together with PACKET_TX_RING.

When enabled, a struct virtio_net_hdr is expected to precede the data
in the ring. The vnet option must be set before the ring is created.

The implementation reuses the existing skb_copy_bits code that is used
when dev->hard_header_len is non-zero. Move this ll_header check to
before the skb alloc and combine it with a test for vnet_hdr->hdr_len.
Allocate and copy the max of the two.

Verified with test program at
github.com/wdebruij/kerneltools/blob/master/tests/psock_txring_vnet.c

Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller

Willem de Bruijn
2016-02-09 19:43:50 +0800

8d39b4a6b packet: parse tpacket header before skb alloc

GSO packet headers must be stored in the linear skb segment.
Move tpacket header parsing before sock_alloc_send_skb. The GSO
follow-on patch will later increase the skb linear argument to
sock_alloc_send_skb if needed for large packets.

The header parsing code does not require an allocated skb, so is
safe to move. Later pass to tpacket_fill_skb the computed data
start and length.

Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller

Willem de Bruijn
2016-02-09 19:43:50 +0800

58d19b19c packet: vnet_hdr support for tpacket_rcv

Support socket option PACKET_VNET_HDR together with PACKET_RX_RING.
When enabled, a struct virtio_net_hdr will precede the data in the
packet ring slots.

Verified with test program at
github.com/wdebruij/kerneltools/blob/master/tests/psock_rxring_vnet.c

pkt: 1454269209.798420 len=5066
vnet: gso_type=tcpv4 gso_size=1448 hlen=66 ecn=off
csum: start=34 off=16
eth: proto=0x800
ip: src= dst= proto=6 len=5052

Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller

Willem de Bruijn
2016-02-09 19:43:50 +0800

16cc14004 packet: move vnet_hdr code to helper functions

packet_snd and packet_rcv support virtio net headers for GSO.
Move this logic into helper functions to be able to reuse it in
tpacket_snd and tpacket_rcv.

This is a straightforward code move with one exception. Instead of
creating and passing a separate gso_type variable, reuse
vnet_hdr.gso_type after conversion from virtio to kernel gso type.

Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller

Willem de Bruijn
2016-02-09 19:43:50 +0800

30 Nov, 2015

1 commit

880621c26 packet: Allow packets with only a header (but no payload)

Commit 9c7077622dd91 ("packet: make packet_snd fail on len smaller
than l2 header") added validation for the packet size in packet_snd.
This change enforces that every packet needs a header (with at least
hard_header_len bytes) plus a payload with at least one byte. Before
this change the payload was optional.

This fixes PPPoE connections which do not have a "Service" or
"Host-Uniq" configured (which is violating the spec, but is still
widely used in real-world setups). Those are currently failing with the
following message: "pppd: packet size is too short (24
Signed-off-by: David S. Miller

Martin Blumenstingl
2015-11-30 11:17:17 +0800

18 Nov, 2015

2 commits

90836b67e packet: Use PAGE_ALIGNED macro

Use PAGE_ALIGNED(...) instead of open-coding it.

Signed-off-by: Tobias Klauser
Signed-off-by: David S. Miller

Tobias Klauser
2015-11-18 04:25:44 +0800

4194b4914 packet: Don't check frames_per_block against negative values

rb->frames_per_block is an unsigned int, thus can never be negative.

Also fix spacing in the calculation of frames_per_block.

Signed-off-by: Tobias Klauser
Signed-off-by: David S. Miller

Tobias Klauser
2015-11-18 04:25:44 +0800

16 Nov, 2015

5 commits

5cfb4c8d0 packet: fix tpacket_snd max frame len

Since its introduction in commit 69e3c75f4d54 ("net: TX_RING and
packet mmap"), TX_RING could be used from SOCK_DGRAM and SOCK_RAW
side. When used with SOCK_DGRAM only, the size_max > dev->mtu +
reserve check should have reserve as 0, but currently, this is
unconditionally set (in its original form as dev->hard_header_len).

I think this is not correct since tpacket_fill_skb() would then
take dev->mtu and dev->hard_header_len into account for SOCK_DGRAM,
the extra VLAN_HLEN could be possible in both cases. Presumably, the
reserve code was copied from packet_snd(), but later on missed the
check. Make it similar to what we have in packet_snd().

Fixes: 69e3c75f4d54 ("net: TX_RING and packet mmap")
Signed-off-by: Daniel Borkmann
Acked-by: Willem de Bruijn
Signed-off-by: David S. Miller

Daniel Borkmann
2015-11-16 07:00:35 +0800

c72219b75 packet: infer protocol from ethernet header if unset

In case no struct sockaddr_ll has been passed to packet
socket's sendmsg() when doing a TX_RING flush run, then
skb->protocol is set to po->num instead, which is the protocol
passed via socket(2)/bind(2).

Applications only xmitting can go the path of allocating the
socket as socket(PF_PACKET, , 0) and do a bind(2) on the
TX_RING with sll_protocol of 0. That way, register_prot_hook()
is neither called on creation nor on bind time, which saves
cycles when there's no interest in capturing anyway.

That leaves us however with po->num 0 instead and therefore
the TX_RING flush run sets skb->protocol to 0 as well. Eric
reported that this leads to problems when using tools like
trafgen over bonding device. I.e. the bonding's hash function
could invoke the kernel's flow dissector, which depends on
skb->protocol being properly set. In the current situation, all
the traffic is then directed to a single slave.

Fix it up by inferring skb->protocol from the Ethernet header
when not set and we have ARPHRD_ETHER device type. This is only
done in case of SOCK_RAW and where we have a dev->hard_header_len
length. In case of ARPHRD_ETHER devices, this is guaranteed to
cover ETH_HLEN, and the header can therefore be accessed on the skb
after the skb_store_bits().
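
The inference step can be sketched on a raw frame. This is a user-space
illustration with a hypothetical function name; it assumes an untagged
Ethernet frame, where the EtherType sits at byte offset 12, and it leaves
out the device type and socket type checks mentioned above.

```python
import struct

ETH_HLEN = 14  # destination MAC (6) + source MAC (6) + EtherType (2)


def infer_protocol(frame, configured_proto):
    """Return the protocol to use for a frame that is about to be sent."""
    if configured_proto:
        return configured_proto        # socket already has a protocol
    if len(frame) < ETH_HLEN:
        return 0                       # no complete header to look at
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    return ethertype
```

For a socket created with protocol 0, an IPv4 frame yields 0x0800, which is
what a flow dissector would need in order to spread traffic across slaves.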

Reported-by: Eric Dumazet
Signed-off-by: Daniel Borkmann
Acked-by: Willem de Bruijn
Signed-off-by: David S. Miller

Daniel Borkmann
2015-11-16 07:00:35 +0800

3c70c1324 packet: only allow extra vlan len on ethernet devices

Packet sockets can be used by various net devices and are not
really restricted to ARPHRD_ETHER device types. However, when
currently checking for the extra 4 bytes that can be transmitted
in VLAN case, our assumption is that we generally probe on
ARPHRD_ETHER devices. Therefore, before looking into Ethernet
header, check the device type first.

This also fixes the issue where non-ARPHRD_ETHER devices could
have no dev->hard_header_len in TX_RING SOCK_RAW case, and thus
the check would test unfilled linear part of the skb (instead
of non-linear).
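
The device type check can be sketched as a length limit computation. The
function name and parameters are hypothetical; the constants are the
standard values for Ethernet devices, the 802.1Q EtherType and the VLAN tag
length.

```python
ARPHRD_ETHER = 1       # Ethernet device type
ETH_P_8021Q = 0x8100   # EtherType of a VLAN tagged frame
ETH_HLEN = 14
VLAN_HLEN = 4


def max_frame_len(dev_type, mtu, reserve, frame):
    """Return the largest frame length the model accepts for this frame."""
    limit = mtu + reserve
    # Only look into the Ethernet header on Ethernet devices.
    if dev_type == ARPHRD_ETHER and len(frame) >= ETH_HLEN:
        ethertype = int.from_bytes(frame[12:14], "big")
        if ethertype == ETH_P_8021Q:
            limit += VLAN_HLEN
    return limit
```

The same bytes at offset 12 on a non-Ethernet device do not raise the
limit, because they are not an EtherType there.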

Fixes: 57f89bfa2140 ("network: Allow af_packet to transmit +4 bytes for VLAN packets.")
Fixes: 52f1454f629f ("packet: allow to transmit +4 byte in TX_RING slot for VLAN case")
Signed-off-by: Daniel Borkmann
Acked-by: Willem de Bruijn
Signed-off-by: David S. Miller

Daniel Borkmann
2015-11-16 07:00:35 +0800

8fd6c80d9 packet: always probe for transport header

We concluded that the skb_probe_transport_header() should better be
called unconditionally. Avoiding the call into the flow dissector has
also not really much to do with the direct xmit mode.

While it seems that only virtio_net code makes use of GSO from non
RX/TX ring packet socket paths, we should probe for a transport header
nevertheless before they hit devices.

Reference: http://thread.gmane.org/gmane.linux.network/386173/
Signed-off-by: Daniel Borkmann
Acked-by: Jason Wang
Signed-off-by: David S. Miller

Daniel Borkmann
2015-11-16 07:00:35 +0800

efdfa2f78 packet: do skb_probe_transport_header when we actually have data

In tpacket_fill_skb() commit c1aad275b029 ("packet: set transport
header before doing xmit") and later on 40893fd0fd4e ("net: switch
to use skb_probe_transport_header()") was probing for a transport
header on the skb from a ring buffer slot, but at a time when
the skb has _not even_ been filled with data yet. So that call into
the flow dissector is pretty useless. Let's do it after we've set
up the skb frags.

Fixes: c1aad275b029 ("packet: set transport header before doing xmit")
Reported-by: Eric Dumazet
Signed-off-by: Daniel Borkmann
Acked-by: Jason Wang
Signed-off-by: David S. Miller

Daniel Borkmann
2015-11-16 07:00:35 +0800

06 Nov, 2015

1 commit

30f7ea1c2 packet: race condition in packet_bind

There is a race condition between packet_notifier and packet_bind{_spkt}.

It happens if packet_notifier(NETDEV_UNREGISTER) executes between the
time packet_bind{_spkt} takes a reference on the new netdevice and the
time packet_do_bind sets po->ifindex.
In this case the notification can be missed.
If this happens during a dev_change_net_namespace, the netdevice can be
moved to the new namespace while the packet_sock in the old namespace
still holds a reference on it. When the netdevice is later deleted in
the new namespace, the deletion hangs since the packet_sock is not found
in the new namespace's &net->packet.sklist.
It can be reproduced with the script below.

This patch makes packet_do_bind check again for the presence of the
netdevice in the packet_sock's namespace after the synchronize_net
in unregister_prot_hook.
More generally, it also holds the rcu lock for the duration of the bind
to stop dev_change_net_namespace/rollback_registered_many from
going past the synchronize_net following unlist_netdevice, so that
no NETDEV_UNREGISTER notifications can happen on the new netdevice
while the bind is executing. In order to do this some code from
packet_bind{_spkt} is consolidated into packet_do_bind.

import socket, os, time, sys
proto=7
realDev='em1'
vlanId=400
if len(sys.argv) > 1:
    vlanId=int(sys.argv[1])
dev='vlan%d' % vlanId

os.system('taskset -p 0x10 %d' % os.getpid())

s = socket.socket(socket.PF_PACKET, socket.SOCK_RAW, proto)
os.system('ip link add link %s name %s type vlan id %d' %
          (realDev, dev, vlanId))
os.system('ip netns add dummy')

pid=os.fork()

if pid == 0:
    # dev should be moved while packet_do_bind is in synchronize net
    os.system('taskset -p 0x20000 %d' % os.getpid())
    os.system('ip link set %s netns dummy' % dev)
    os.system('ip netns exec dummy ip link del %s' % dev)
    s.close()
    sys.exit(0)

time.sleep(.004)
try:
    s.bind(('%s' % dev, proto+1))
except:
    print('Could not bind socket')
    s.close()
    os.system('ip netns del dummy')
    sys.exit(0)

os.waitpid(pid, 0)
s.close()
os.system('ip netns del dummy')
sys.exit(0)

Signed-off-by: Francesco Ruggeri
Signed-off-by: David S. Miller

Francesco Ruggeri
2015-11-06 03:48:42 +0800

13 Oct, 2015

3 commits

19bcf9f20 ipv4: Pass struct net into ip_defrag and ip_check_defrag

The function ip_defrag is called on both the input and the output
paths of the networking stack. In particular conntrack when it is
tracking outbound packets from the local machine calls ip_defrag.

So add a struct net parameter and stop making ip_defrag guess which
network namespace it needs to defragment packets in.

Signed-off-by: "Eric W. Biederman"
Acked-by: Pablo Neira Ayuso
Signed-off-by: David S. Miller

Eric W. Biederman
2015-10-13 10:44:16 +0800

161642e24 packet: fix match_fanout_group()

Recent TCP listener patches exposed a prior af_packet bug:
match_fanout_group() blindly assumes it is always safe
to cast sk to a packet socket to compare fanout with af_packet_priv.

But SYNACK packets can be sent while attached to a request_sock, which
is smaller than a "struct sock".

We can read nonexistent memory and crash.

Fixes: c0de08d04215 ("af_packet: don't emit packet on orig fanout group")
Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet
Cc: Willem de Bruijn
Cc: Eric Leblond
Signed-off-by: David S. Miller

Eric Dumazet
2015-10-13 10:42:38 +0800

c7d39e326 packet: support per-packet fwmark for af_packet sendmsg

Signed-off-by: Edward Hyunkoo Jee
Signed-off-by: Eric Dumazet
Cc: Willem de Bruijn
Signed-off-by: David S. Miller

Edward Jee
2015-10-13 10:25:22 +0800

11 Oct, 2015

1 commit

ff936a04e bpf: fix cb access in socket filter programs

eBPF socket filter programs may see junk in 'u32 cb[5]' area,
since it could have been used by protocol layers earlier.

For socket filter programs used in af_packet we need to clean
20 bytes of skb->cb area if it could be used by the program.
For programs attached to TCP/UDP sockets we need to save/restore
these 20 bytes, since it's used by protocol layers.

Remove SK_RUN_FILTER macro, since it's no longer used.

Long term we may move this bpf cb area to per-cpu scratch, but that
requires addition of new 'per-cpu load/store' instructions,
so not suitable as a short term fix.

Fixes: d691f9e8d440 ("bpf: allow programs to write to certain skb fields")
Reported-by: Eric Dumazet
Signed-off-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Alexei Starovoitov
2015-10-11 19:40:05 +0800

05 Oct, 2015

1 commit

bab189918 bpf, seccomp: prepare for upcoming criu support

The current ongoing effort to dump existing cBPF seccomp filters back
to user space requires to hold the pre-transformed instructions like
we do in case of socket filters from sk_attach_filter() side, so they
can be reloaded in original form at a later point in time by utilities
such as criu.

To prepare for this, simply extend the bpf_prog_create_from_user()
API to hold a flag that tells whether we should store the original
or not. Also, fanout filters could make use of that in future for
things like diag. While fanout filters already use bpf_prog_destroy(),
move seccomp over to them as well to handle original programs when
present.

Signed-off-by: Daniel Borkmann
Cc: Tycho Andersen
Cc: Pavel Emelyanov
Cc: Kees Cook
Cc: Andy Lutomirski
Cc: Alexei Starovoitov
Tested-by: Tycho Andersen
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Daniel Borkmann
2015-10-05 21:47:05 +0800

24 Sep, 2015

1 commit

d3869efe7 Fix AF_PACKET ABI breakage in 4.2

Commit 7d82410950aa ("virtio: add explicit big-endian support to memory
accessors") accidentally changed the virtio_net header used by
AF_PACKET with PACKET_VNET_HDR from host-endian to big-endian.

Since virtio_legacy_is_little_endian() is a very long identifier,
define a vio_le macro and use that throughout the code instead of the
hard-coded 'false' for little-endian.

This restores the ABI to match 4.1 and earlier kernels, and makes my
test program work again.
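
The effect of the byte order change on user space can be shown with a small
sketch. It assumes the 10-byte legacy virtio_net_hdr layout (flags,
gso_type, hdr_len, gso_size, csum_start, csum_offset) and a little-endian
host; the helper names are hypothetical.

```python
import struct

# flags (u8), gso_type (u8), then four u16 fields
VNET_HDR_FIELDS = "BBHHHH"


def pack_vnet_hdr(byte_order, flags, gso_type, hdr_len, gso_size,
                  csum_start, csum_offset):
    """Serialize the header with an explicit byte order ('<' or '>')."""
    return struct.pack(byte_order + VNET_HDR_FIELDS, flags, gso_type,
                       hdr_len, gso_size, csum_start, csum_offset)


def unpack_vnet_hdr(byte_order, raw):
    """Parse the header with an explicit byte order ('<' or '>')."""
    return struct.unpack(byte_order + VNET_HDR_FIELDS, raw)
```

If the kernel writes hdr_len=66 in big-endian while a little-endian
application parses it in host order, the application sees 16896 instead of
66, which is the kind of breakage the commit describes.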

Signed-off-by: David Woodhouse
Signed-off-by: David S. Miller

David Woodhouse
2015-09-24 05:33:55 +0800