Eric Lee / smarc-fsl-linux-kernel

26 Dec, 2016

2 commits

1f3a8e49d ktime: Get rid of ktime_equal() ... Browse Code »

No point in going through loops and hoops instead of just comparing the
values.

Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra

Thomas Gleixner
2016-12-26 00:21:23 +0800
2456e8553 ktime: Get rid of the union ... Browse Code »

ktime is a union because the initial implementation stored the time in
scalar nanoseconds on 64 bit machine and in a endianess optimized timespec
variant for 32bit machines. The Y2038 cleanup removed the timespec variant
and switched everything to scalar nanoseconds. The union remained, but
become completely pointless.

Get rid of the union and just keep ktime_t as simple typedef of type s64.

The conversion was done with coccinelle and some manual mopping up.

Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra

Thomas Gleixner
2016-12-26 00:21:22 +0800

25 Dec, 2016

1 commit

7c0f6ba68 Replace <asm/uaccess.h> with <linux/uaccess.h> globally ... Browse Code »

This was entirely automated, using the script by Al:

PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*'
sed -i -e "s!$PATT!#include !" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro
Signed-off-by: Linus Torvalds

Linus Torvalds
2016-12-25 03:46:01 +0800

24 Dec, 2016

2 commits

a98f91758 ipv6: handle -EFAULT from skb_copy_bits ... Browse Code »

By setting certain socket options on ipv6 raw sockets, we can confuse the
length calculation in rawv6_push_pending_frames triggering a BUG_ON.

RIP: 0010:[] [] rawv6_sendmsg+0xc30/0xc40
RSP: 0018:ffff881f6c4a7c18 EFLAGS: 00010282
RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002
RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00
RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009
R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030
R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80

Call Trace:
[] ? unmap_page_range+0x693/0x830
[] inet_sendmsg+0x67/0xa0
[] sock_sendmsg+0x38/0x50
[] SYSC_sendto+0xef/0x170
[] SyS_sendto+0xe/0x10
[] do_syscall_64+0x50/0xa0
[] entry_SYSCALL64_slow_path+0x25/0x25

Handle by jumping to the failure path if skb_copy_bits gets an EFAULT.

Reproducer:

#include
#include
#include
#include
#include
#include
#include

#define LEN 504

int main(int argc, char* argv[])
{
int fd;
int zero = 0;
char buf[LEN];

memset(buf, 0, LEN);

fd = socket(AF_INET6, SOCK_RAW, 7);

setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4);
setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN);

sendto(fd, buf, 1, 0, (struct sockaddr *) buf, 110);
}

Signed-off-by: Dave Jones
Signed-off-by: David S. Miller

Dave Jones
2016-12-24 01:20:39 +0800
39b2dd765 inet: fix IP(V6)_RECVORIGDSTADDR for udp sockets ... Browse Code »

Socket cmsg IP(V6)_RECVORIGDSTADDR checks that port range lies within
the packet. For sockets that have transport headers pulled, transport
offset can be negative. Use signed comparison to avoid overflow.

Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
Reported-by: Nisar Jagabar
Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller

Willem de Bruijn
2016-12-24 01:19:18 +0800

18 Dec, 2016

2 commits

c2ed1880f net: ipv6: check route protocol when deleting routes ... Browse Code »

The protocol field is checked when deleting IPv4 routes, but ignored for
IPv6, which causes problems with routing daemons accidentally deleting
externally set routes (observed by multiple bird6 users).

This can be verified using `ip -6 route del proto something`.

Signed-off-by: Mantas Mikulėnas
Signed-off-by: David S. Miller

Mantas M
2016-12-18 10:37:06 +0800
0643ee4fd inet: Fix get port to handle zero port number with soreuseport set ... Browse Code »

A user may call listen with binding an explicit port with the intent
that the kernel will assign an available port to the socket. In this
case inet_csk_get_port does a port scan. For such sockets, the user may
also set soreuseport with the intent a creating more sockets for the
port that is selected. The problem is that the initial socket being
opened could inadvertently choose an existing and unreleated port
number that was already created with soreuseport.

This patch adds a boolean parameter to inet_bind_conflict that indicates
rather soreuseport is allowed for the check (in addition to
sk->sk_reuseport). In calls to inet_bind_conflict from inet_csk_get_port
the argument is set to true if an explicit port is being looked up (snum
argument is nonzero), and is false if port scan is done.

Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller

Tom Herbert
2016-12-18 00:13:19 +0800

08 Dec, 2016

1 commit

5fccd64aa Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next ... Browse Code »

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains a large Netfilter update for net-next,
to summarise:

1) Add support for stateful objects. This series provides a nf_tables
native alternative to the extended accounting infrastructure for
nf_tables. Two initial stateful objects are supported: counters and
quotas. Objects are identified by a user-defined name, you can fetch
and reset them anytime. You can also use a maps to allow fast lookups
using any arbitrary key combination. More info at:

http://marc.info/?l=netfilter-devel&m=148029128323837&w=2

2) On-demand registration of nf_conntrack and defrag hooks per netns.
Register nf_conntrack hooks if we have a stateful ruleset, ie.
state-based filtering or NAT. The new nf_conntrack_default_on sysctl
enables this from newly created netnamespaces. Default behaviour is not
modified. Patches from Florian Westphal.

3) Allocate 4k chunks and then use these for x_tables counter allocation
requests, this improves ruleset load time and also datapath ruleset
evaluation, patches from Florian Westphal.

4) Add support for ebpf to the existing x_tables bpf extension.
From Willem de Bruijn.

5) Update layer 4 checksum if any of the pseudoheader fields is updated.
This provides a limited form of 1:1 stateless NAT that make sense in
specific scenario, eg. load balancing.

6) Add support to flush sets in nf_tables. This series comes with a new
set->ops->deactivate_one() indirection given that we have to walk
over the list of set elements, then deactivate them one by one.
The existing set->ops->deactivate() performs an element lookup that
we don't need.

7) Two patches to avoid cloning packets, thus speed up packet forwarding
via nft_fwd from ingress. From Florian Westphal.

8) Two IPVS patches via Simon Horman: Decrement ttl in all modes to
prevent infinite loops, patch from Dwip Banerjee. And one minor
refactoring from Gao feng.

9) Revisit recent log support for nf_tables netdev families: One patch
to ensure that we correctly handle non-ethernet packets. Another
patch to add missing logger definition for netdev. Patches from
Liping Zhang.

10) Three patches for nft_fib, one to address insufficient register
initialization and another to solve incorrect (although harmless)
byteswap operation. Moreover update xt_rpfilter and nft_fib to match
lbcast packets with zeronet as source, eg. DHCP Discover packets
(0.0.0.0 -> 255.255.255.255). Also from Liping Zhang.

11) Built-in DCCP, SCTP and UDPlite conntrack and NAT support, from
Davide Caratti. While DCCP is rather hopeless lately, and UDPlite has
been broken in many-cast mode for some little time, let's give them a
chance by placing them at the same level as other existing protocols.
Thus, users don't explicitly have to modprobe support for this and
NAT rules work for them. Some people point to the lack of support in
SOHO Linux-based routers that make deployment of new protocols harder.
I guess other middleboxes outthere on the Internet are also to blame.
Anyway, let's see if this has any impact in the midrun.

12) Skip software SCTP software checksum calculation if the NIC comes
with SCTP checksum offload support. From Davide Caratti.

13) Initial core factoring to prepare conversion to hook array. Three
patches from Aaron Conole.

14) Gao Feng made a wrong conversion to switch in the xt_multiport
extension in a patch coming in the previous batch. Fix it in this
batch.

15) Get vmalloc call in sync with kmalloc flags to avoid a warning
and likely OOM killer intervention from x_tables. From Marcelo
Ricardo Leitner.

16) Update Arturo Borrero's email address in all source code headers.
====================

Signed-off-by: David S. Miller

David S. Miller
2016-12-08 08:16:46 +0800

07 Dec, 2016

5 commits

11583438b netfilter: nft_fib: convert htonl to ntohl properly ... Browse Code »

Acctually ntohl and htonl are identical, so this doesn't affect
anything, but it is conceptually wrong.

Signed-off-by: Liping Zhang
Acked-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Liping Zhang
2016-12-07 04:42:20 +0800
ae0ac0ed6 netfilter: x_tables: pack percpu counter allocations ... Browse Code »

instead of allocating each xt_counter individually, allocate 4k chunks
and then use these for counter allocation requests.

This should speed up rule evaluation by increasing data locality,
also speeds up ruleset loading because we reduce calls to the percpu
allocator.

As Eric points out we can't use PAGE_SIZE, page_allocator would fail on
arches with 64k page size.

Suggested-by: Eric Dumazet
Signed-off-by: Florian Westphal
Acked-by: Eric Dumazet
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2016-12-07 04:42:19 +0800
f28e15bac netfilter: x_tables: pass xt_counters struct to counter allocator ... Browse Code »

Keeps some noise away from a followup patch.

Signed-off-by: Florian Westphal
Acked-by: Eric Dumazet
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2016-12-07 04:42:18 +0800
4d31eef51 netfilter: x_tables: pass xt_counters struct instead of packet counter ... Browse Code »

On SMP we overload the packet counter (unsigned long) to contain
percpu offset. Hide this from callers and pass xt_counters address
instead.

Preparation patch to allocate the percpu counters in page-sized batch
chunks.

Signed-off-by: Florian Westphal
Acked-by: Eric Dumazet
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2016-12-07 04:42:17 +0800
834184b1f netfilter: defrag: only register defrag functionality if needed ... Browse Code »

nf_defrag modules for ipv4 and ipv6 export an empty stub function.
Any module that needs the defragmentation hooks registered simply 'calls'
this empty function to create a phony module dependency -- modprobe will
then load the defrag module too.

This extends netfilter ipv4/ipv6 defragmentation modules to delay the hook
registration until the functionality is requested within a network namespace
instead of module load time for all namespaces.

Hooks are only un-registered on module unload or when a namespace that used
such defrag functionality exits.

We have to use struct net for this as the register hooks can be called
before netns initialization here from the ipv4/ipv6 conntrack module
init path.

There is no unregister functionality support, defrag will always be
active once it was requested inside a net namespace.

The reason is that defrag has impact on nft and iptables rulesets
(without defrag we might see framents).

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2016-12-07 04:42:00 +0800

06 Dec, 2016

2 commits

96d5822c1 ipv6: Allow IPv4-mapped address as next-hop ... Browse Code »

Made kernel accept IPv6 routes with IPv4-mapped address as next-hop.

It is possible to configure IP interfaces with IPv4-mapped addresses, and
one can add IPv6 routes for IPv4-mapped destinations/prefixes, yet prior
to this fix the kernel returned an EINVAL when attempting to add an IPv6
route with an IPv4-mapped address as a nexthop/gateway.

RFC 4798 (a proposed standard RFC) uses IPv4-mapped addresses as nexthops,
thus in order to support that type of address configuration the kernel
needs to allow IPv4-mapped addresses as nexthops.

Signed-off-by: Erik Nordmark
Signed-off-by: Bob Gilligan
Signed-off-by: David S. Miller

Erik Nordmark
2016-12-06 03:52:05 +0800
7aa5470c2 tcp: tsq: move tsq_flags close to sk_wmem_alloc ... Browse Code »

tsq_flags being in the same cache line than sk_wmem_alloc
makes a lot of sense. Both fields are changed from tcp_wfree()
and more generally by various TSQ related functions.

Prior patch made room in struct sock and added sk_tsq_flags,
this patch deletes tsq_flags from struct tcp_sock.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2016-12-06 02:32:24 +0800

05 Dec, 2016

7 commits

0c66dc1ea netfilter: conntrack: register hooks in netns when needed by ruleset ... Browse Code »

This makes use of nf_ct_netns_get/put added in previous patch.
We add get/put functions to nf_conntrack_l3proto structure, ipv4 and ipv6
then implement use-count to track how many users (nft or xtables modules)
have a dependency on ipv4 and/or ipv6 connection tracking functionality.

When count reaches zero, the hooks are unregistered.

This delays activation of connection tracking inside a namespace until
stateful firewall rule or nat rule gets added.

This patch breaks backwards compatibility in the sense that connection
tracking won't be active anymore when the protocol tracker module is
loaded. This breaks e.g. setups that ctnetlink for flow accounting and
the like, without any '-m conntrack' packet filter rules.

Followup patch restores old behavour and makes new delayed scheme
optional via sysctl.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2016-12-05 04:17:24 +0800
20afd4239 netfilter: nf_tables: add conntrack dependencies for nat/masq/redir expressions ... Browse Code »

so that conntrack core will add the needed hooks in this namespace.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2016-12-05 04:17:16 +0800
ecb2421b5 netfilter: add and use nf_ct_netns_get/put ... Browse Code »

currently aliased to try_module_get/_put.
Will be changed in next patch when we add functions to make use of ->net
argument to store usercount per l3proto tracker.

This is needed to avoid registering the conntrack hooks in all netns and
later only enable connection tracking in those that need conntrack.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2016-12-05 04:16:50 +0800
9b91c96c5 netfilter: conntrack: built-in support for UDPlite ... Browse Code »

CONFIG_NF_CT_PROTO_UDPLITE is no more a tristate. When set to y,
connection tracking support for UDPlite protocol is built-in into
nf_conntrack.ko.

footprint test:
$ ls -l net/netfilter/nf_conntrack{_proto_udplite,}.ko \
net/ipv4/netfilter/nf_conntrack_ipv4.ko \
net/ipv6/netfilter/nf_conntrack_ipv6.ko

(builtin)|| udplite| ipv4 | ipv6 |nf_conntrack
---------++--------+--------+--------+--------------
none || 432538 | 828755 | 828676 | 6141434
UDPlite || - | 829649 | 829362 | 6498204

Signed-off-by: Davide Caratti
Signed-off-by: Pablo Neira Ayuso

Davide Caratti
2016-12-05 03:57:36 +0800
a85406afe netfilter: conntrack: built-in support for SCTP ... Browse Code »

CONFIG_NF_CT_PROTO_SCTP is no more a tristate. When set to y, connection
tracking support for SCTP protocol is built-in into nf_conntrack.ko.

footprint test:
$ ls -l net/netfilter/nf_conntrack{_proto_sctp,}.ko \
net/ipv4/netfilter/nf_conntrack_ipv4.ko \
net/ipv6/netfilter/nf_conntrack_ipv6.ko

(builtin)|| sctp | ipv4 | ipv6 | nf_conntrack
---------++--------+--------+--------+--------------
none || 498243 | 828755 | 828676 | 6141434
SCTP || - | 829254 | 829175 | 6547872

Signed-off-by: Davide Caratti
Signed-off-by: Pablo Neira Ayuso

Davide Caratti
2016-12-05 03:55:37 +0800
c51d39010 netfilter: conntrack: built-in support for DCCP ... Browse Code »

CONFIG_NF_CT_PROTO_DCCP is no more a tristate. When set to y, connection
tracking support for DCCP protocol is built-in into nf_conntrack.ko.

footprint test:
$ ls -l net/netfilter/nf_conntrack{_proto_dccp,}.ko \
net/ipv4/netfilter/nf_conntrack_ipv4.ko \
net/ipv6/netfilter/nf_conntrack_ipv6.ko

(builtin)|| dccp | ipv4 | ipv6 | nf_conntrack
---------++--------+--------+--------+--------------
none || 469140 | 828755 | 828676 | 6141434
DCCP || - | 830566 | 829935 | 6533526

Signed-off-by: Davide Caratti
Signed-off-by: Pablo Neira Ayuso

Davide Caratti
2016-12-05 03:53:15 +0800
cd7275146 netfilter: update Arturo Borrero Gonzalez email address ... Browse Code »

The email address has changed, let's update the copyright statements.

Signed-off-by: Arturo Borrero Gonzalez
Signed-off-by: Pablo Neira Ayuso

Arturo Borrero Gonzalez
2016-12-05 03:45:25 +0800

04 Dec, 2016

2 commits

adc176c54 ipv6 addrconf: Implemented enhanced DAD (RFC7527) ... Browse Code »

Implemented RFC7527 Enhanced DAD.
IPv6 duplicate address detection can fail if there is some temporary
loopback of Ethernet frames. RFC7527 solves this by including a random
nonce in the NS messages used for DAD, and if an NS is received with the
same nonce it is assumed to be a looped back DAD probe and is ignored.
RFC7527 is enabled by default. Can be disabled by setting both of
conf/{all,interface}/enhanced_dad to zero.

Signed-off-by: Erik Nordmark
Signed-off-by: Bob Gilligan
Reviewed-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Erik Nordmark
2016-12-04 12:21:37 +0800
2745529ac Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Couple conflicts resolved here:

1) In the MACB driver, a bug fix to properly initialize the
RX tail pointer properly overlapped with some changes
to support variable sized rings.

2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix
overlapping with a reorganization of the driver to support
ACPI, OF, as well as PCI variants of the chip.

3) In 'net' we had several probe error path bug fixes to the
stmmac driver, meanwhile a lot of this code was cleaned up
and reorganized in 'net-next'.

4) The cls_flower classifier obtained a helper function in
'net-next' called __fl_delete() and this overlapped with
Daniel Borkamann's bug fix to use RCU for object destruction
in 'net'. It also overlapped with Jiri's change to guard
the rhashtable_remove_fast() call with a check against
tc_skip_sw().

5) In mlx4, a revert bug fix in 'net' overlapped with some
unrelated changes in 'net-next'.

6) In geneve, a stale header pointer after pskb_expand_head()
bug fix in 'net' overlapped with a large reorganization of
the same code in 'net-next'. Since the 'net-next' code no
longer had the bug in question, there was nothing to do
other than to simply take the 'net-next' hunks.

Signed-off-by: David S. Miller

David S. Miller
2016-12-04 01:29:53 +0800

03 Dec, 2016

5 commits

610236587 bpf: Add new cgroup attach type to enable sock modifications ... Browse Code »

Add new cgroup based program type, BPF_PROG_TYPE_CGROUP_SOCK. Similar to
BPF_PROG_TYPE_CGROUP_SKB programs can be attached to a cgroup and run
any time a process in the cgroup opens an AF_INET or AF_INET6 socket.
Currently only sk_bound_dev_if is exported to userspace for modification
by a bpf program.

This allows a cgroup to be configured such that AF_INET{6} sockets opened
by processes are automatically bound to a specific device. In turn, this
enables the running of programs that do not support SO_BINDTODEVICE in a
specific VRF context / L3 domain.

Signed-off-by: David Ahern
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

David Ahern
2016-12-03 02:46:08 +0800
6b6ebb6b0 ip6_offload: check segs for NULL in ipv6_gso_segment. ... Browse Code »

segs needs to be checked for being NULL in ipv6_gso_segment() before calling
skb_shinfo(segs), otherwise kernel can run into a NULL-pointer dereference:

[ 97.811262] BUG: unable to handle kernel NULL pointer dereference at 00000000000000cc
[ 97.819112] IP: [] ipv6_gso_segment+0x119/0x2f0
[ 97.825214] PGD 0 [ 97.827047]
[ 97.828540] Oops: 0000 [#1] SMP
[ 97.831678] Modules linked in: vhost_net vhost macvtap macvlan nfsv3 rpcsec_gss_krb5
nfsv4 dns_resolver nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4
iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter
bridge stp llc snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel
snd_hda_codec edac_mce_amd snd_hda_core edac_core snd_hwdep kvm_amd snd_seq kvm snd_seq_device
snd_pcm irqbypass snd_timer ppdev parport_serial snd parport_pc k10temp pcspkr soundcore parport
sp5100_tco shpchp sg wmi i2c_piix4 acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc
ip_tables xfs libcrc32c sr_mod cdrom sd_mod ata_generic pata_acpi amdkfd amd_iommu_v2 radeon
broadcom bcm_phy_lib i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
ttm ahci serio_raw tg3 firewire_ohci libahci pata_atiixp drm ptp libata firewire_core pps_core
i2c_core crc_itu_t fjes dm_mirror dm_region_hash dm_log dm_mod
[ 97.927721] CPU: 1 PID: 3504 Comm: vhost-3495 Not tainted 4.9.0-7.el7.test.x86_64 #1
[ 97.935457] Hardware name: AMD Snook/Snook, BIOS ESK0726A 07/26/2010
[ 97.941806] task: ffff880129a1c080 task.stack: ffffc90001bcc000
[ 97.947720] RIP: 0010:[] [] ipv6_gso_segment+0x119/0x2f0
[ 97.956251] RSP: 0018:ffff88012fc43a10 EFLAGS: 00010207
[ 97.961557] RAX: 0000000000000000 RBX: ffff8801292c8700 RCX: 0000000000000594
[ 97.968687] RDX: 0000000000000593 RSI: ffff880129a846c0 RDI: 0000000000240000
[ 97.975814] RBP: ffff88012fc43a68 R08: ffff880129a8404e R09: 0000000000000000
[ 97.982942] R10: 0000000000000000 R11: ffff880129a84076 R12: 00000020002949b3
[ 97.990070] R13: ffff88012a580000 R14: 0000000000000000 R15: ffff88012a580000
[ 97.997198] FS: 0000000000000000(0000) GS:ffff88012fc40000(0000) knlGS:0000000000000000
[ 98.005280] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 98.011021] CR2: 00000000000000cc CR3: 0000000126c5d000 CR4: 00000000000006e0
[ 98.018149] Stack:
[ 98.020157] 00000000ffffffff ffff88012fc43ac8 ffffffffa017ad0a 000000000000000e
[ 98.027584] 0000001300000000 0000000077d59998 ffff8801292c8700 00000020002949b3
[ 98.035010] ffff88012a580000 0000000000000000 ffff88012a580000 ffff88012fc43a98
[ 98.042437] Call Trace:
[ 98.044879] [ 98.046803] [] ? tg3_start_xmit+0x84a/0xd60 [tg3]
[ 98.053156] [] skb_mac_gso_segment+0xb0/0x130
[ 98.059158] [] __skb_gso_segment+0x73/0x110
[ 98.064985] [] validate_xmit_skb+0x12d/0x2b0
[ 98.070899] [] validate_xmit_skb_list+0x42/0x70
[ 98.077073] [] sch_direct_xmit+0xd0/0x1b0
[ 98.082726] [] __dev_queue_xmit+0x486/0x690
[ 98.088554] [] ? cpumask_next_and+0x35/0x50
[ 98.094380] [] dev_queue_xmit+0x10/0x20
[ 98.099863] [] br_dev_queue_push_xmit+0xa7/0x170 [bridge]
[ 98.106907] [] br_forward_finish+0x41/0xc0 [bridge]
[ 98.113430] [] ? nf_iterate+0x52/0x60
[ 98.118735] [] ? nf_hook_slow+0x6b/0xc0
[ 98.124216] [] __br_forward+0x14c/0x1e0 [bridge]
[ 98.130480] [] ? br_dev_queue_push_xmit+0x170/0x170 [bridge]
[ 98.137785] [] br_forward+0x9d/0xb0 [bridge]
[ 98.143701] [] br_handle_frame_finish+0x267/0x560 [bridge]
[ 98.150834] [] br_handle_frame+0x174/0x2f0 [bridge]
[ 98.157355] [] ? sched_clock+0x9/0x10
[ 98.162662] [] ? sched_clock_cpu+0x72/0xa0
[ 98.168403] [] __netif_receive_skb_core+0x1e5/0xa20
[ 98.174926] [] ? timerqueue_add+0x59/0xb0
[ 98.180580] [] __netif_receive_skb+0x18/0x60
[ 98.186494] [] process_backlog+0x95/0x140
[ 98.192145] [] net_rx_action+0x16d/0x380
[ 98.197713] [] __do_softirq+0xd1/0x283
[ 98.203106] [] do_softirq_own_stack+0x1c/0x30
[ 98.209107] [ 98.211029] [] do_softirq+0x50/0x60
[ 98.216166] [] netif_rx_ni+0x33/0x80
[ 98.221386] [] tun_get_user+0x487/0x7f0 [tun]
[ 98.227388] [] tun_sendmsg+0x4b/0x60 [tun]
[ 98.233129] [] handle_tx+0x282/0x540 [vhost_net]
[ 98.239392] [] handle_tx_kick+0x15/0x20 [vhost_net]
[ 98.245916] [] vhost_worker+0x9e/0xf0 [vhost]
[ 98.251919] [] ? vhost_umem_alloc+0x40/0x40 [vhost]
[ 98.258440] [] ? do_syscall_64+0x67/0x180
[ 98.264094] [] kthread+0xd9/0xf0
[ 98.268965] [] ? kthread_park+0x60/0x60
[ 98.274444] [] ret_from_fork+0x25/0x30
[ 98.279836] Code: 8b 93 d8 00 00 00 48 2b 93 d0 00 00 00 4c 89 e6 48 89 df 66 89 93 c2 00 00 00 ff 10 48 3d 00 f0 ff ff 49 89 c2 0f 87 52 01 00 00 8b 92 cc 00 00 00 48 8b 80 d0 00 00 00 44 0f b7 74 10 06 66
[ 98.299425] RIP [] ipv6_gso_segment+0x119/0x2f0
[ 98.305612] RSP
[ 98.309094] CR2: 00000000000000cc
[ 98.312406] ---[ end trace 726a2c7a2d2d78d0 ]---

Signed-off-by: Artem Savkov
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Artem Savkov
2016-12-03 02:34:58 +0800
95a22caee tcp: randomize tcp timestamp offsets for each connection ... Browse Code »

jiffies based timestamps allow for easy inference of number of devices
behind NAT translators and also makes tracking of hosts simpler.

commit ceaa1fef65a7c2e ("tcp: adding a per-socket timestamp offset")
added the main infrastructure that is needed for per-connection ts
randomization, in particular writing/reading the on-wire tcp header
format takes the offset into account so rest of stack can use normal
tcp_time_stamp (jiffies).

So only two items are left:
- add a tsoffset for request sockets
- extend the tcp isn generator to also return another 32bit number
in addition to the ISN.

Re-use of ISN generator also means timestamps are still monotonically
increasing for same connection quadruple, i.e. PAWS will still work.

Includes fixes from Eric Dumazet.

Signed-off-by: Florian Westphal
Acked-by: Eric Dumazet
Acked-by: Yuchung Cheng
Signed-off-by: David S. Miller

Florian Westphal
2016-12-03 01:49:59 +0800
80d1106ae Revert: "ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()" ... Browse Code »

This reverts commit ae148b085876fa771d9ef2c05f85d4b4bf09ce0d
("ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()").

skb->protocol is now set in __ip_local_out() and __ip6_local_out() before
dst_output() is called. It is no longer necessary to do it for each tunnel.

Cc: stable@vger.kernel.org
Signed-off-by: Eli Cooper
Signed-off-by: David S. Miller

Eli Cooper
2016-12-03 01:34:22 +0800
b4e479a96 ipv6: Set skb->protocol properly for local output ... Browse Code »

When xfrm is applied to TSO/GSO packets, it follows this path:

xfrm_output() -> xfrm_output_gso() -> skb_gso_segment()

where skb_gso_segment() relies on skb->protocol to function properly.

This patch sets skb->protocol to ETH_P_IPV6 before dst_output() is called,
fixing a bug where GSO packets sent through an ipip6 tunnel are dropped
when xfrm is involved.

Cc: stable@vger.kernel.org
Signed-off-by: Eli Cooper
Signed-off-by: David S. Miller

Eli Cooper
2016-12-03 01:34:22 +0800

02 Dec, 2016

2 commits

7bbf91ce2 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec ... Browse Code »

Steffen Klassert says:

====================
pull request (net): ipsec 2016-12-01

1) Change the error value when someone tries to run 32bit
userspace on a 64bit host from -ENOTSUPP to the userspace
exported -EOPNOTSUPP. Fix from Yi Zhao.

2) On inbound, ESN sequence numbers are already in network
byte order. So don't try to convert it again, this fixes
integrity verification for ESN. Fixes from Tobias Brunner.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller

David S. Miller
2016-12-02 00:35:49 +0800
3d2dd617f Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf ... Browse Code »

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

This is a large batch of Netfilter fixes for net, they are:

1) Three patches to fix NAT conversion to rhashtable: Switch to rhlist
structure that allows to have several objects with the same key.
Moreover, fix wrong comparison logic in nf_nat_bysource_cmp() as this is
expecting a return value similar to memcmp(). Change location of
the nat_bysource field in the nf_conn structure to avoid zeroing
this as it breaks interaction with SLAB_DESTROY_BY_RCU and lead us
to crashes. From Florian Westphal.

2) Don't allow malformed fragments go through in IPv6, drop them,
otherwise we hit GPF, patch from Florian Westphal.

3) Fix crash if attributes are missing in nft_range, from Liping Zhang.

4) Fix arptables 32-bits userspace 64-bits kernel compat, from Hongxu Jia.

5) Two patches from David Ahern to fix netfilter interaction with vrf.
From David Ahern.

6) Fix element timeout calculation in nf_tables, we take milliseconds
from userspace, but we use jiffies from kernelspace. Patch from
Anders K. Pedersen.

7) Missing validation length netlink attribute for nft_hash, from
Laura Garcia.

8) Fix nf_conntrack_helper documentation, we don't default to off
anymore for a bit of time so let's get this in sync with the code.

I know is late but I think these are important, specifically the NAT
bits, as they are mostly addressing fallout from recent changes. I also
read there are chances to have -rc8, if that is the case, that would
also give us a bit more time to test this.
====================

Signed-off-by: David S. Miller

David S. Miller
2016-12-02 00:04:41 +0800

01 Dec, 2016

1 commit

0382a25af l2tp: lock socket before checking flags in connect() ... Browse Code »

Socket flags aren't updated atomically, so the socket must be locked
while reading the SOCK_ZAPPED flag.

This issue exists for both l2tp_ip and l2tp_ip6. For IPv6, this patch
also brings error handling for __ip6_datagram_connect() failures.

Signed-off-by: Guillaume Nault
Signed-off-by: David S. Miller

Guillaume Nault
2016-12-01 03:14:07 +0800

30 Nov, 2016

2 commits

a55e23864 esp6: Fix integrity verification when ESN are used ... Browse Code »

When handling inbound packets, the two halves of the sequence number
stored on the skb are already in network order.

Fixes: 000ae7b2690e ("esp6: Switch to new AEAD interface")
Signed-off-by: Tobias Brunner
Acked-by: Herbert Xu
Signed-off-by: Steffen Klassert

Tobias Brunner
2016-11-30 18:10:16 +0800
9b57da063 netfilter: ipv6: nf_defrag: drop mangled skb on ream error ... Browse Code »

Dmitry Vyukov reported GPF in network stack that Andrey traced down to
negative nh offset in nf_ct_frag6_queue().

Problem is that all network headers before fragment header are pulled.
Normal ipv6 reassembly will drop the skb when errors occur further down
the line.

netfilter doesn't do this, and instead passed the original fragment
along. That was also fine back when netfilter ipv6 defrag worked with
cloned fragments, as the original, pristine fragment was passed on.

So we either have to undo the pull op, or discard such fragments.
Since they're malformed after all (e.g. overlapping fragment) it seems
preferrable to just drop them.

Same for temporary errors -- it doesn't make sense to accept (and
perhaps forward!) only some fragments of same datagram.

Fixes: 029f7f3b8701cc7ac ("netfilter: ipv6: nf_defrag: avoid/free clone operations")
Reported-by: Dmitry Vyukov
Debugged-by: Andrey Konovalov
Diagnosed-by: Eric Dumazet
Signed-off-by: Florian Westphal
Acked-by: Eric Dumazet
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2016-11-30 03:23:58 +0800

29 Nov, 2016

1 commit

79dc7e3f1 net: handle no dst on skb in icmp6_send ... Browse Code »

Andrey reported the following while fuzzing the kernel with syzkaller:

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 0 PID: 3859 Comm: a.out Not tainted 4.9.0-rc6+ #429
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff8800666d4200 task.stack: ffff880067348000
RIP: 0010:[] []
icmp6_send+0x5fc/0x1e30 net/ipv6/icmp.c:451
RSP: 0018:ffff88006734f2c0 EFLAGS: 00010206
RAX: ffff8800666d4200 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000018
RBP: ffff88006734f630 R08: ffff880064138418 R09: 0000000000000003
R10: dffffc0000000000 R11: 0000000000000005 R12: 0000000000000000
R13: ffffffff84e7e200 R14: ffff880064138484 R15: ffff8800641383c0
FS: 00007fb3887a07c0(0000) GS:ffff88006cc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000000 CR3: 000000006b040000 CR4: 00000000000006f0
Stack:
ffff8800666d4200 ffff8800666d49f8 ffff8800666d4200 ffffffff84c02460
ffff8800666d4a1a 1ffff1000ccdaa2f ffff88006734f498 0000000000000046
ffff88006734f440 ffffffff832f4269 ffff880064ba7456 0000000000000000
Call Trace:
[] icmpv6_param_prob+0x2c/0x40 net/ipv6/icmp.c:557
[< inline >] ip6_tlvopt_unknown net/ipv6/exthdrs.c:88
[] ip6_parse_tlv+0x555/0x670 net/ipv6/exthdrs.c:157
[] ipv6_parse_hopopts+0x199/0x460 net/ipv6/exthdrs.c:663
[] ipv6_rcv+0xfa3/0x1dc0 net/ipv6/ip6_input.c:191
...

icmp6_send / icmpv6_send is invoked for both rx and tx paths. In both
cases the dst->dev should be preferred for determining the L3 domain
if the dst has been set on the skb. Fallback to the skb->dev if it has
not. This covers the case reported here where icmp6_send is invoked on
Rx before the route lookup.

Fixes: 5d41ce29e ("net: icmp6_send should use dst dev to determine L3 domain")
Reported-by: Andrey Konovalov
Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2016-11-29 05:13:01 +0800

28 Nov, 2016

1 commit

8eb4adf60 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec ... Browse Code »

Steffen Klassert says:

====================
pull request (net): ipsec 2016-11-25

1) Fix a refcount leak in vti6.
From Nicolas Dichtel.

2) Fix a wrong if statement in xfrm_sk_policy_lookup.
From Florian Westphal.

3) The flowcache watermarks are per cpu. Take this into
account when comparing to the threshold where we
refusing new allocations. From Miroslav Urbanek.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller

David S. Miller
2016-11-28 09:21:48 +0800

27 Nov, 2016

1 commit

0b42f25d2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

udplite conflict is resolved by taking what 'net-next' did
which removed the backlog receive method assignment, since
it is no longer necessary.

Two entries were added to the non-priv ethtool operations
switch statement, one in 'net' and one in 'net-next, so
simple overlapping changes.

Signed-off-by: David S. Miller

David S. Miller
2016-11-27 12:42:21 +0800

26 Nov, 2016

1 commit

33b486793 net: ipv4, ipv6: run cgroup eBPF egress programs ... Browse Code »

If the cgroup associated with the receiving socket has an eBPF
programs installed, run them from ip_output(), ip6_output() and
ip_mc_output(). From mentioned functions we have two socket contexts
as per 7026b1ddb6b8 ("netfilter: Pass socket pointer down through
okfn()."). We explicitly need to use sk instead of skb->sk here,
since otherwise the same program would run multiple times on egress
when encap devices are involved, which is not desired in our case.

eBPF programs used in this context are expected to either return 1 to
let the packet pass, or != 1 to drop them. The programs have access to
the skb through bpf_skb_load_bytes(), and the payload starts at the
network headers (L3).

Note that cgroup_bpf_run_filter() is stubbed out as static inline nop
for !CONFIG_CGROUP_BPF, and is otherwise guarded by a static key if
the feature is unused.

Signed-off-by: Daniel Mack
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Daniel Mack
2016-11-26 05:26:04 +0800

25 Nov, 2016

2 commits

30c7be26f udplite: call proper backlog handlers ... Browse Code »

In commits 93821778def10 ("udp: Fix rcv socket locking") and
f7ad74fef3af ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into
__udpv6_queue_rcv_skb") UDP backlog handlers were renamed, but UDPlite
was forgotten.

This leads to crashes if UDPlite header is pulled twice, which happens
starting from commit e6afc8ace6dd ("udp: remove headers from UDP packets
before queueing")

Bug found by syzkaller team, thanks a lot guys !

Note that backlog use in UDP/UDPlite is scheduled to be removed starting
from linux-4.10, so this patch is only needed up to linux-4.9

Fixes: 93821778def1 ("udp: Fix rcv socket locking")
Fixes: f7ad74fef3af ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into __udpv6_queue_rcv_skb")
Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
Signed-off-by: Eric Dumazet
Reported-by: Andrey Konovalov
Cc: Benjamin LaHaise
Cc: Herbert Xu
Signed-off-by: David S. Miller

Eric Dumazet
2016-11-25 04:32:14 +0800
764d3be6e ipv6: bump genid when the IFA_F_TENTATIVE flag is clear ... Browse Code »

When an ipv6 address has the tentative flag set, it can't be
used as source for egress traffic, while the associated route,
if any, can be looked up and even stored into some dst_cache.

In the latter scenario, the source ipv6 address selected and
stored in the cache is most probably wrong (e.g. with
link-local scope) and the entity using the dst_cache will
experience lack of ipv6 connectivity until said cache is
cleared or invalidated.

Overall this may cause lack of connectivity over most IPv6 tunnels
(comprising geneve and vxlan), if the first egress packet reaches
the tunnel before the DaD is completed for the used ipv6
address.

This patch bumps a new genid after that the IFA_F_TENTATIVE flag
is cleared, so that dst_cache will be invalidated on
next lookup and ipv6 connectivity restored.

Fixes: 0c1d70af924b ("net: use dst_cache for vxlan device")
Fixes: 468dfffcd762 ("geneve: add dst caching support")
Acked-by: Hannes Frederic Sowa
Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller

Paolo Abeni
2016-11-25 01:04:10 +0800