Eric Lee / smarc-fsl-linux-kernel

16 Jan, 2016

1 commit

9207f9d45 net: preserve IP control block during GSO segmentation

skb_gso_segment() uses the skb control block during segmentation.
This patch adds 32 bytes of room for the previous control block, which
will be copied into all resulting segments.

This patch fixes a kernel crash when fragmenting forwarded packets.
Fragmentation requires a valid IP CB in the skb for clearing IP options.
The patch also removes the custom save/restore in the ovs code, which is now redundant.
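
The idea of reserving room for the previous control block can be sketched in user space as follows. This is only a toy model, not the kernel code: `toy_skb`, `toy_gso_cb`, `toy_segment` and the 48-byte scratch area are hypothetical names and values chosen for illustration; only the 32 bytes of reserved room come from the commit message.

```c
#include <string.h>

#define TOY_CB_SIZE       48  /* total per-packet scratch area (illustrative value) */
#define TOY_SAVED_CB_ROOM 32  /* room kept for the previous owner's control block */

struct toy_skb {
    unsigned char cb[TOY_CB_SIZE];
};

/* Scratch state private to segmentation; it must fit behind the reserved room. */
struct toy_gso_cb {
    int mac_offset;
    int seg_index;
};

/* Split one packet into nsegs segments, keeping the saved control block intact. */
static void toy_segment(const struct toy_skb *orig, struct toy_skb *segs, int nsegs)
{
    for (int i = 0; i < nsegs; i++) {
        struct toy_gso_cb scratch = { .mac_offset = 14, .seg_index = i };

        memset(segs[i].cb, 0, sizeof(segs[i].cb));
        /* Copy the previous control block (for example the IP CB) into every segment. */
        memcpy(segs[i].cb, orig->cb, TOY_SAVED_CB_ROOM);
        /* Segmentation scratch data lives after the reserved bytes. */
        memcpy(segs[i].cb + TOY_SAVED_CB_ROOM, &scratch, sizeof(scratch));
    }
}
```

Because the segmentation state never touches the first 32 bytes, code that runs after segmentation (such as fragmentation) still finds the control block it expects in each segment.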

Signed-off-by: Konstantin Khlebnikov
Link: http://lkml.kernel.org/r/CALYGNiP-0MZ-FExV2HutTvE9U-QQtkKSoE--KN=JQE5STYsjAA@mail.gmail.com
Signed-off-by: David S. Miller

Konstantin Khlebnikov
2016-01-16 03:35:24 +0800

12 Jan, 2016

2 commits

9d367eddf Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts:
drivers/net/bonding/bond_main.c
drivers/net/ethernet/mellanox/mlxsw/spectrum.h
drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c

The bond_main.c and mellanox switch conflicts were cases of
overlapping changes.

Signed-off-by: David S. Miller

David S. Miller
2016-01-12 12:55:43 +0800
40ba33022 udp: disallow UFO for sockets with SO_NO_CHECK option

Commit acf8dd0a9d0b ("udp: only allow UFO for packets from SOCK_DGRAM
sockets") disallows UFO for packets sent from raw sockets. We need to do
the same for SOCK_DGRAM sockets with the SO_NO_CHECK option, albeit
for a slightly different reason: while such a socket would override the
CHECKSUM_PARTIAL set by ip_ufo_append_data(), gso_size is still set and
a bad offloading flags warning is triggered in __skb_gso_segment().

In the IPv6 case, the SO_NO_CHECK option is ignored, but we need to disallow
UFO for packets sent by sockets with the UDP_NO_CHECK6_TX option.
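
The resulting rule can be summarized as a small predicate. This is a simplified sketch under stated assumptions, not the kernel's actual condition (which also considers other factors); `toy_sock`, `no_check_tx` and `toy_may_use_ufo` are hypothetical names.

```c
#include <stdbool.h>

enum toy_sock_type { TOY_SOCK_DGRAM, TOY_SOCK_RAW };

struct toy_sock {
    enum toy_sock_type type;
    /* Stands in for SO_NO_CHECK (IPv4) or UDP_NO_CHECK6_TX (IPv6). */
    bool no_check_tx;
};

/* Decide whether UDP fragmentation offload may be used for this socket. */
static bool toy_may_use_ufo(const struct toy_sock *sk, bool dev_has_ufo)
{
    if (!dev_has_ufo)
        return false;
    if (sk->type != TOY_SOCK_DGRAM)   /* rule from acf8dd0a9d0b: no raw sockets */
        return false;
    if (sk->no_check_tx)              /* rule described in this commit */
        return false;
    return true;
}
```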

Signed-off-by: Michal Kubecek
Tested-by: Shannon Nelson
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Michal Kubeček
2016-01-12 06:40:57 +0800

16 Dec, 2015

1 commit

c8cd0989b net: Eliminate NETIF_F_GEN_CSUM and NETIF_F_V[46]_CSUM

These netif flags are needlessly convoluted. It is more
straightforward to just use NETIF_F_HW_CSUM, NETIF_F_IP_CSUM,
and NETIF_F_IPV6_CSUM directly.

This patch also:
- Cleans up can_checksum_protocol
- Simplifies netdev_intersect_features

Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller

Tom Herbert
2015-12-16 05:50:20 +0800

01 Dec, 2015

1 commit

dfc3b0e89 net: remove unnecessary mroute.h includes

It looks like many files are including mroute.h unnecessarily, so remove
the include. Most importantly remove it from ipv6.

CC: Hideaki YOSHIFUJI
CC: Steffen Klassert
CC: Herbert Xu
Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Nikolay Aleksandrov
2015-12-01 04:26:21 +0800

02 Nov, 2015

2 commits

dbd3393c5 ipv4: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment

CHECKSUM_PARTIAL skbs should never arrive in ip_fragment. If we get one
of those, warn about it once and handle it gracefully by recalculating
the checksum.
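
A user-space sketch of the "warn once, then finish the checksum in software" pattern is shown below. It is a simplification for illustration only: the toy checksum covers the whole buffer, and `toy_pkt`, `toy_csum` and `toy_fragment_precheck` are hypothetical names rather than kernel functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

enum toy_ip_summed { TOY_CHECKSUM_NONE, TOY_CHECKSUM_PARTIAL };

struct toy_pkt {
    enum toy_ip_summed ip_summed;
    uint8_t *data;       /* bytes covered by the checksum */
    size_t len;
    size_t csum_offset;  /* where the 16-bit checksum is stored */
};

/* RFC 1071 style ones' complement sum over big-endian 16-bit words. */
static uint16_t toy_csum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    for (; len > 1; data += 2, len -= 2)
        sum += ((uint32_t)data[0] << 8) | data[1];
    if (len)
        sum += (uint32_t)data[0] << 8;  /* pad an odd trailing byte with zero */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

static int toy_warned;  /* set after the first warning has been printed */

/* Called before fragmenting: complain once, then compute the checksum here. */
static void toy_fragment_precheck(struct toy_pkt *p)
{
    uint16_t csum;

    if (p->ip_summed != TOY_CHECKSUM_PARTIAL)
        return;
    if (!toy_warned) {
        toy_warned = 1;
        fprintf(stderr, "unexpected CHECKSUM_PARTIAL packet in fragmentation\n");
    }
    p->data[p->csum_offset] = 0;
    p->data[p->csum_offset + 1] = 0;
    csum = toy_csum(p->data, p->len);
    p->data[p->csum_offset] = (uint8_t)(csum >> 8);
    p->data[p->csum_offset + 1] = (uint8_t)(csum & 0xff);
    p->ip_summed = TOY_CHECKSUM_NONE;
}
```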

Cc: Eric Dumazet
Cc: Vlad Yasevich
Cc: Benjamin Coddington
Cc: Tom Herbert
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2015-11-02 01:01:27 +0800
d749c9cbf ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked sockets

We cannot reliably calculate the packet size on MSG_MORE corked sockets
and thus cannot decide whether they are going to be fragmented later on,
so it is better not to use CHECKSUM_PARTIAL in the first place.

Cc: Eric Dumazet
Cc: Vlad Yasevich
Cc: Benjamin Coddington
Cc: Tom Herbert
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2015-11-02 01:01:27 +0800

19 Oct, 2015

1 commit

dc6ef6be5 tcp: do not set queue_mapping on SYNACK

At the time of commit fff326990789 ("tcp: reflect SYN queue_mapping into
SYNACK packets") we had few ways to cope with SYN floods.

We no longer need to reflect incoming skb queue mappings, and instead
can pick a TX queue based on cpu cooking the SYNACK, with normal XPS
affinities.

Note that all SYNACK retransmits were picking TX queue 0; this is no longer
a win, given that SYNACK retransmits are now distributed across all CPUs.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-10-19 13:26:02 +0800

08 Oct, 2015

10 commits

ede2059db dst: Pass net into dst->output

The network namespace is already passed into dst_output; pass it into
dst->output, lwt->output, and friends.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-10-08 19:27:03 +0800
33224b16f ipv4, ipv6: Pass net into ip_local_out and ip6_local_out

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-10-08 19:27:02 +0800
cf91a99da ipv4, ipv6: Pass net into __ip_local_out and __ip6_local_out

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-10-08 19:27:02 +0800
77589ce0f ipv4: Cache net in ip_build_and_send_pkt and ip_queue_xmit

Compute net and store it in a variable in the functions
ip_build_and_send_pkt and ip_queue_xmit so that it does not need to be
recomputed next time it is needed.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-10-08 19:26:59 +0800
e2cb77db0 ipv4: Merge ip_local_out and ip_local_out_sk

Hiding a parameter is confusing and silly, so modify all of
the callers to pass in the appropriate socket, or skb->sk if
no socket is known.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-10-08 19:26:57 +0800
b92dacd45 ipv4: Merge __ip_local_out and __ip_local_out_sk

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-10-08 19:26:57 +0800
4ebdfba73 dst: Pass a sk into .local_out

For consistency with the other similar methods in the kernel, pass a
struct sock into the dst_ops .local_out method.

Simplifying the socket passing case is needed as a prequel to passing a
struct net reference into .local_out.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-10-08 19:26:55 +0800
13206b6bf net: Pass net into dst_output and remove dst_output_okfn

Replace dst_output_okfn with dst_output

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-10-08 19:26:54 +0800
850dcc4d4 ipv4: Fix ip_queue_xmit to pass sk into ip_local_out_sk

After a packet has been encapsulated by a tunnel we should use the
tunnel socket's local multicast loopback flag to control whether the
encapsulated packet should be looped back locally.

Pass sk into ip_local_out_sk so that in the rare case we are dealing
with a tunneled packet whose tunnel destination address is a multicast
address the kernel properly decides to loop back this packet.

In practice I don't think this matters, as ip_queue_xmit is used by
tcp, l2tp and sctp, none of which, as far as I am aware, uses IP-level
multicasting, as they are all point-to-point communication protocols.
Let's fix this before someone uses ip_queue_xmit for a tunnel protocol
that does use multicast.

Fixes: aad88724c9d5 ("ipv4: add a sock pointer to dst->output() path.")
Fixes: b0270e91014d ("ipv4: add a sock pointer to ip_queue_xmit()")
Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-10-08 19:26:52 +0800
fd2874b3b ipv4: Fix ip_local_out_sk by passing the sk into __ip_local_out_sk

In the rare case where sk != skb->sk ip_local_out_sk arranges
to call dst->output differently if the skb is queued or not.
This is a bug.

Fix this bug by passing the sk parameter of ip_local_out_sk through
from ip_local_out_sk to __ip_local_out_sk (skipping __ip_local_out).

Fixes: 7026b1ddb6b8 ("netfilter: Pass socket pointer down through okfn().")
Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-10-08 19:26:52 +0800

30 Sep, 2015

2 commits

694869b3c ipv4: Pass struct net through ip_fragment

Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2015-09-30 14:45:03 +0800
007979eaf net: Rename IFF_VRF_MASTER to IFF_L3MDEV_MASTER

Rename IFF_VRF_MASTER to IFF_L3MDEV_MASTER and update the name of the
netif_is_vrf and netif_index_is_vrf macros.

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2015-09-30 11:40:32 +0800

26 Sep, 2015

1 commit

cfe673b0a ip: constify ip_build_and_send_pkt() socket argument

This function is used to build and send SYNACK packets,
possibly on behalf of an unlocked listener socket.

Make sure we did not miss a write by making this socket const.

We can no longer use ip_select_ident() and have to either
set iph->id to 0 or directly call __ip_select_ident().

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-09-26 04:00:38 +0800

18 Sep, 2015

7 commits

0c4b51f00 netfilter: Pass net into okfn

This is immediately motivated by the bridge code that chains functions that
call into netfilter. Without passing net into the okfns the bridge code would
need to guess about the best expression for the network namespace to process
packets in.

As net is frequently one of the first things computed in continuation functions
after netfilter has done its job, passing in the desired network namespace is in
many cases a code simplification.

To support this change the function dst_output_okfn is introduced to
simplify passing dst_output as an okfn. For the moment dst_output_okfn
just silently drops the struct net.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-09-18 08:18:37 +0800
29a26a568 netfilter: Pass struct net into the netfilter hooks

Pass a network namespace parameter into the netfilter hooks. At the
call site of the netfilter hooks the path a packet is taking through
the network stack is well known which allows the network namespace to
be determined easily and reliably.

This allows the replacement of magic code like
"dev_net(state->in?:state->out)" that appears at the start of most
netfilter hooks with "state->net".

In almost all cases the network namespace passed in is derived
from the first network device passed in, guaranteeing those
paths will not see any changes in practice.
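
The difference between the two expressions can be illustrated with toy structures. This is a sketch only: `toy_net`, `toy_dev` and `toy_hook_state` are hypothetical stand-ins, not the kernel's types, and the old form is written with a plain conditional instead of the GNU `?:` shorthand.

```c
#include <stddef.h>

struct toy_net { int id; };
struct toy_dev { struct toy_net *net; };

struct toy_hook_state {
    struct toy_dev *in;
    struct toy_dev *out;
    struct toy_net *net;  /* filled in by whoever invokes the hook */
};

/* Old style: guess the namespace from whichever device is present. */
static struct toy_net *toy_hook_net_old(const struct toy_hook_state *state)
{
    const struct toy_dev *dev = state->in ? state->in : state->out;

    return dev->net;
}

/* New style: the call site already knows the namespace and passes it in. */
static struct toy_net *toy_hook_net_new(const struct toy_hook_state *state)
{
    return state->net;
}
```

When the caller derives `net` from the first device, both functions return the same namespace, which is why most paths see no change in behavior.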

The exceptions are:
xfrm/xfrm_output.c:xfrm_output_resume() xs_net(skb_dst(skb)->xfrm)
ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont() ip_vs_conn_net(cp)
ipvs/ip_vs_xmit.c:ip_vs_send_or_cont() ip_vs_conn_net(cp)
ipv4/raw.c:raw_send_hdrinc() sock_net(sk)
ipv6/ip6_output.c:ip6_xmit() sock_net(sk)
ipv6/ndisc.c:ndisc_send_skb() dev_net(skb->dev) not dev_net(dst->dev)
ipv6/raw.c:raw6_send_hdrinc() sock_net(sk)
br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev

In all cases these exceptions seem to be a better expression for the
network namespace the packet is being processed in than the historic
"dev_net(in?in:out)". I am documenting them in case something odd
pops up and someone starts trying to track down what happened.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-09-18 08:18:37 +0800
4ba1bf429 ipv4: Only compute net once in ip_finish_output2

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-09-18 08:18:34 +0800
9479b0af4 ipv4: Explicitly compute net in ip_fragment

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-09-18 08:18:34 +0800
26a949dbd ipv4: Only compute net once in ip_do_fragment

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-09-18 08:18:34 +0800
88f5cc245 ipv4: Remember the net in ip_output and ip_mc_output

This is a preparatory patch for passing net into the netfilter hooks,
where net will be used again.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-09-18 08:18:33 +0800
5a70649e0 net: Merge dst_output and dst_output_sk

Add a sock parameter to dst_output, making dst_output_sk superfluous.
Add a skb->sk parameter to all of the callers of dst_output.
Have the callers of dst_output_sk call dst_output.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-09-18 08:18:32 +0800

14 Aug, 2015

1 commit

f7ba868b7 net: Use VRF index for oif in ip_send_unicast_reply

If the output device is not specified, use the VRF device if the input device is
enslaved. This is needed to ensure TCP ACKs and resets go out the VRF device.

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2015-08-14 13:43:21 +0800

16 Jun, 2015

1 commit

ada6c1de9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

This is a bit large (and late) patchset that contains Netfilter updates for
net-next. Most relevantly br_netfilter fixes, ipset RCU support, removal of
x_tables percpu ruleset copy and rework of the nf_tables netdev support. More
specifically, they are:

1) Warn the user when there is a better protocol conntracker available, from
Marcelo Ricardo Leitner.

2) Fix forwarding of IPv6 fragmented traffic in br_netfilter, from Bernhard
Thaler. This comes with several patches to prepare the change in the first place.

3) Get rid of special mtu handling of PPPoE/VLAN frames for br_netfilter. This
is not needed anymore since now we use the largest fragment size to
refragment, from Florian Westphal.

4) Restore vlan tag when refragmenting in br_netfilter, also from Florian.

5) Get rid of the percpu ruleset copy in x_tables, from Florian. Plus another
follow up patch to refine it from Eric Dumazet.

6) Several ipset cleanups, fixes and finally RCU support, from Jozsef Kadlecsik.

7) Get rid of parens in Netfilter Kconfig files.

8) Attach the net_device to the basechain as opposed to the initial per table
approach in the nf_tables netdev family.

9) Subscribe to netdev events to detect the removal and registration of a
device that is referenced by a basechain.
====================

Signed-off-by: David S. Miller

David S. Miller
2015-06-16 05:30:32 +0800

13 Jun, 2015

1 commit

b60f2f3d6 net: ipv4: un-inline ip_finish_output2

        text    data     bss     dec     hex filename
old:   16527      44       0   16571    40bb net/ipv4/ip_output.o
new:   14935      44       0   14979    3a83 net/ipv4/ip_output.o

Suggested-by: Eric Dumazet
Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller

Florian Westphal
2015-06-13 05:19:17 +0800

12 Jun, 2015

1 commit

33b1f3139 net: ip_fragment: remove BRIDGE_NETFILTER mtu special handling

Since commit d6b915e29f4adea9
("ip_fragment: don't forward defragmented DF packet"), the largest
fragment size is available in the IPCB.

Therefore we no longer need to care about 'encapsulation'
overhead of stripped PPPOE/VLAN headers since ip_do_fragment
doesn't use device mtu in such cases.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2015-06-12 20:16:46 +0800

28 May, 2015

2 commits

d6b915e29 ip_fragment: don't forward defragmented DF packet

We currently always send fragments without the DF bit set.

Thus, given the following setup:

mtu1500 - mtu1500:1400 - mtu1400:1280 - mtu1280
   A           R1             R2           B

where R1 and R2 run Linux with netfilter defragmentation/conntrack
enabled, then if host A sent a fragmented packet _with_ DF set to B, R1
will respond with an ICMP too big error if one of these fragments exceeded
1400 bytes.

However, if R1 receives fragments of sizes 1200 and 100, it would
forward the reassembled packet without refragmenting, i.e.
R2 will send an ICMP error in response to a packet that was never sent,
citing an MTU that the original sender never exceeded.

The other minor issue is that a refragmentation on R1 will conceal the
MTU of R2-B since refragmentation does not set DF bit on the fragments.

This modifies ip_fragment so that we track the largest fragment size seen
both for DF and non-DF packets, and set frag_max_size to the largest
value.

If the DF fragment size is larger than or equal to the non-DF one, we will
consider the packet a path MTU probe:
We set DF bit on the reassembled skb and also tag it with a new IPCB flag
to force refragmentation even if skb fits outdev mtu.

We will also set DF bit on each fragment in this case.
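
The decision described above can be sketched as follows. This is a toy model based only on the description in this message, not the kernel implementation; `toy_frag_queue`, `toy_frag_seen` and `toy_reasm` are hypothetical names.

```c
#include <stdbool.h>

struct toy_frag_queue {
    unsigned int max_df_size;     /* largest fragment seen with DF set */
    unsigned int max_nondf_size;  /* largest fragment seen without DF */
};

struct toy_reasm_info {
    unsigned int frag_max_size;  /* largest fragment size seen overall */
    bool pmtu_probe;             /* set DF on the packet and each fragment, force refragmentation */
};

/* Record the size of one incoming fragment. */
static void toy_frag_seen(struct toy_frag_queue *q, unsigned int size, bool df)
{
    unsigned int *slot = df ? &q->max_df_size : &q->max_nondf_size;

    if (size > *slot)
        *slot = size;
}

/* Decide how the reassembled packet should be treated on output. */
static struct toy_reasm_info toy_reasm(const struct toy_frag_queue *q)
{
    struct toy_reasm_info info;

    info.pmtu_probe = q->max_df_size >= q->max_nondf_size;
    info.frag_max_size = info.pmtu_probe ? q->max_df_size : q->max_nondf_size;
    return info;
}
```

With the example from the message (DF fragments of 1200 and 100 bytes), the sketch reports a probe with a maximum fragment size of 1200, so the packet would be refragmented with DF set instead of being forwarded whole.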

Joint work with Hannes Frederic Sowa.

Reported-by: Jesse Gross
Signed-off-by: Florian Westphal
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Florian Westphal
2015-05-28 01:03:31 +0800
c5501eb34 net: ipv4: avoid repeated calls to ip_skb_dst_mtu helper

ip_skb_dst_mtu is a small inline helper, but it is called in several places.

before: 17061 44 0 17105 42d1 net/ipv4/ip_output.o
after: 16805 44 0 16849 41d1 net/ipv4/ip_output.o

Signed-off-by: Florian Westphal
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Florian Westphal
2015-05-28 01:03:30 +0800

25 May, 2015

1 commit

be12a1fe2 net: skbuff: add skb_append_pagefrags and use it

Signed-off-by: Hannes Frederic Sowa
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2015-05-25 12:06:58 +0800

19 May, 2015

1 commit

49d16b23c bridge_netfilter: No ICMP packet on IPv4 fragmentation error

When bridge netfilter re-fragments an IP packet for output, all
packets that can not be re-fragmented to their original input size
should be silently discarded.

However, the current bridge netfilter output path generates an ICMP packet
with a 'size exceeded MTU' message for such packets, which is a bug.

This patch refactors the ip_fragment() API to allow two separate
use cases. The bridge netfilter use case will not
send ICMP; the routing output will, as before.
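
One way to picture the split is a shared core with two thin callers. The sketch below is hypothetical: the function names, the drop condition (DF set and packet larger than the MTU) and the ICMP counter are assumptions made for illustration, not the kernel API.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_IP_HLEN 20  /* IPv4 header without options */

static int toy_icmp_sent;  /* counts "fragmentation needed" errors sent */

/*
 * Core: number of fragments for a packet of len bytes (header included).
 * Assumes len >= TOY_IP_HLEN and mtu >= TOY_IP_HLEN + 8.
 */
static int toy_do_fragment(size_t len, size_t mtu)
{
    size_t per_frag, payload;

    if (len <= mtu)
        return 1;
    per_frag = (mtu - TOY_IP_HLEN) & ~(size_t)7;  /* payload per fragment, multiple of 8 */
    payload = len - TOY_IP_HLEN;
    return (int)((payload + per_frag - 1) / per_frag);
}

/* Routed output: tell the sender why the packet was dropped. */
static int toy_route_fragment(size_t len, size_t mtu, bool df)
{
    if (df && len > mtu) {
        toy_icmp_sent++;
        return -EMSGSIZE;
    }
    return toy_do_fragment(len, mtu);
}

/* Bridge netfilter output: discard silently, never generate ICMP. */
static int toy_bridge_fragment(size_t len, size_t mtu, bool df)
{
    if (df && len > mtu)
        return -EMSGSIZE;
    return toy_do_fragment(len, mtu);
}
```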

Signed-off-by: Andy Zhou
Acked-by: Florian Westphal
Signed-off-by: David S. Miller

Andy Zhou
2015-05-19 12:15:39 +0800

14 May, 2015

1 commit

7d771aaac ipv4: __ip_local_out_sk() is static

__ip_local_out_sk() is only used from net/ipv4/ip_output.c

net/ipv4/ip_output.c:94:5: warning: symbol '__ip_local_out_sk' was not
declared. Should it be static?

Fixes: 7026b1ddb6b8 ("netfilter: Pass socket pointer down through okfn().")
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-05-14 03:21:33 +0800

08 Apr, 2015

2 commits

8bc0034cf net: remove extra newlines

Signed-off-by: Sheng Yong
Signed-off-by: David S. Miller

Sheng Yong
2015-04-08 10:24:37 +0800
7026b1ddb netfilter: Pass socket pointer down through okfn().

On the output paths in particular, we sometimes have to deal with two
socket contexts. The first, usually skb->sk, is the local socket that
generated the frame.

The second is potentially the socket used to control a tunneling
socket, such as one that encapsulates using UDP.

We do not want to disassociate skb->sk when encapsulating in order
to fix this, because that would break socket memory accounting.

The most extreme case where this can cause huge problems is an
AF_PACKET socket transmitting over a vxlan device. We hit code
paths doing checks that assume they are dealing with an ipv4
socket, but are actually operating upon the AF_PACKET one.

Signed-off-by: David S. Miller

David Miller
2015-04-08 03:25:55 +0800

04 Apr, 2015

1 commit

00db41243 ipv4: coding style: comparison for inequality with NULL

The ipv4 code uses a mixture of coding styles. In some instances the check
for a non-NULL pointer is written as x != NULL and in others as x. x is
preferred according to checkpatch, and this patch makes the code
consistent by adopting the latter form.
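
As a small illustration of the two spellings, the functions below behave identically; only the form of the pointer test differs. `toy_route` and both function names are hypothetical.

```c
#include <stddef.h>

struct toy_route { int mtu; };

/* Older spelling: explicit comparison against NULL. */
static int toy_mtu_explicit(const struct toy_route *rt)
{
    if (rt != NULL)
        return rt->mtu;
    return 0;
}

/* Spelling preferred by checkpatch: test the pointer directly. */
static int toy_mtu_implicit(const struct toy_route *rt)
{
    if (rt)
        return rt->mtu;
    return 0;
}
```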

No changes detected by objdiff.

Signed-off-by: Ian Morris
Signed-off-by: David S. Miller

Ian Morris
2015-04-04 00:11:15 +0800