08 Oct, 2016

1 commit

  • Pull VFS splice updates from Al Viro:
    "There's a bunch of branches this cycle, both mine and from other folks
    and I'd rather send pull requests separately.

    This one is the conversion of ->splice_read() to ITER_PIPE iov_iter
    (and introduction of such). Gets rid of a lot of code in fs/splice.c
    and elsewhere; there will be followups, but these are for the next
    cycle... Some pipe/splice-related cleanups from Miklos in the same
    branch as well"

    * 'work.splice_read' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    pipe: fix comment in pipe_buf_operations
    pipe: add pipe_buf_steal() helper
    pipe: add pipe_buf_confirm() helper
    pipe: add pipe_buf_release() helper
    pipe: add pipe_buf_get() helper
    relay: simplify relay_file_read()
    switch default_file_splice_read() to use of pipe-backed iov_iter
    switch generic_file_splice_read() to use of ->read_iter()
    new iov_iter flavour: pipe-backed
    fuse_dev_splice_read(): switch to add_to_pipe()
    skb_splice_bits(): get rid of callback
    new helper: add_to_pipe()
    splice: lift pipe_lock out of splice_to_pipe()
    splice: switch get_iovec_page_array() to iov_iter
    splice_to_pipe(): don't open-code wakeup_pipe_readers()
    consistent treatment of EFAULT on O_DIRECT read/write

    Linus Torvalds
     

04 Oct, 2016

2 commits

  • skb_vlan_pop/push were too generic, trying to support the cases where
    skb->data is at mac header, and cases where skb->data is arbitrarily
    elsewhere.

    Supporting an arbitrary skb->data was complex and bogus:
    - It failed to unwind skb->data to its original location after the
    actual pop/push.
    (Also, the unwind semantics are not well defined: if data was within
    the eth header, the same offset from the start should be used; but if
    data was at the network header or beyond, the original offset needs to
    be adjusted according to the push/pull.)
    - It mangled the rcsum after the actual push/pop, without taking into
    account that the eth bytes might already have been pulled out of the
    csum.

    Most callers (ovs, bpf) already had their skb->data at mac_header upon
    invoking skb_vlan_pop/push.
    Last caller that failed to do so (act_vlan) has been recently fixed.

    Therefore, to simplify things, no longer support arbitrary skb->data
    inputs for skb_vlan_pop/push().

    skb->data is expected to be exactly at mac_header; WARN otherwise.
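
    As an illustration of the simplified contract (not part of the patch;
    the example_ helper below is made up for this sketch), callers can now
    rely on skb->data sitting exactly at the mac header, with a WARN
    otherwise:

        #include <linux/skbuff.h>
        #include <linux/if_vlan.h>

        /* Illustrative only: mirrors the skb->data-at-mac-header contract
         * described above; example_vlan_pop() is not a kernel function.
         */
        static int example_vlan_pop(struct sk_buff *skb)
        {
                if (WARN_ONCE(skb->data != skb_mac_header(skb),
                              "skb_vlan_pop: skb->data not at mac header\n"))
                        return -EINVAL;

                return skb_vlan_pop(skb);
        }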

    Signed-off-by: Shmulik Ladkani
    Cc: Daniel Borkmann
    Cc: Pravin Shelar
    Cc: Jiri Pirko
    Signed-off-by: David S. Miller

    Shmulik Ladkani
     
  • since pipe_lock is the outermost now, we don't need to drop/regain
    socket locks around the call of splice_to_pipe() from skb_splice_bits(),
    which kills the need to have a socket-specific callback; we can just
    call splice_to_pipe() and be done with that.

    Signed-off-by: Al Viro

    Al Viro
     

22 Sep, 2016

3 commits

  • Fix 'skb_vlan_pop' to use eth_type_vlan instead of directly comparing
    skb->protocol to ETH_P_8021Q or ETH_P_8021AD.
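
    As an illustration, the open-coded comparison and its eth_type_vlan()
    replacement look roughly like this (the example_ names are made up):

        #include <linux/skbuff.h>
        #include <linux/if_vlan.h>

        /* Before: open-coded ethertype comparison */
        static bool example_is_vlan_open_coded(const struct sk_buff *skb)
        {
                return skb->protocol == htons(ETH_P_8021Q) ||
                       skb->protocol == htons(ETH_P_8021AD);
        }

        /* After: the eth_type_vlan() helper covers both vlan ethertypes */
        static bool example_is_vlan(const struct sk_buff *skb)
        {
                return eth_type_vlan(skb->protocol);
        }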

    Signed-off-by: Shmulik Ladkani
    Reviewed-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Shmulik Ladkani
     
  • In 93515d53b1
    "net: move vlan pop/push functions into common code"
    skb_vlan_pop was moved from its private location in openvswitch to
    skbuff common code.

    In case skb has non hw-accel vlan tag, the original 'pop_vlan()' assured
    that skb->len is sufficient (if skb->len < VLAN_ETH_HLEN then pop was
    considered a no-op).

    This validation was moved as is into the new common 'skb_vlan_pop'.

    Alas, in its original location (openvswitch), there was a guarantee that
    'data' points to the mac_header, therefore the 'skb->len < VLAN_ETH_HLEN'
    condition made sense.
    However there's no such guarantee in the generic 'skb_vlan_pop'.

    For short packets received in the rx path going through 'skb_vlan_pop',
    this causes 'skb_vlan_pop' to fail to pop a valid vlan hdr (in the non
    hw-accel case) or to fail to move the next tag into the hw-accel tag.

    Remove the 'skb->len < VLAN_ETH_HLEN' condition entirely:
    It is superfluous since inner '__skb_vlan_pop' already verifies there
    are VLAN_ETH_HLEN writable bytes at the mac_header.

    Note this presents a slight change to skb_vlan_pop() users:
    In case total length is smaller than VLAN_ETH_HLEN, skb_vlan_pop() now
    returns an error, as opposed to previous "no-op" behavior.
    Existing callers (e.g. tc act vlan, ovs) usually drop the packet if
    'skb_vlan_pop' fails.
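
    A hedged sketch of what that means for callers (the example_ function
    is made up; the drop-on-error behavior matches what tc act_vlan and ovs
    already do):

        #include <linux/skbuff.h>

        static int example_handle_vlan_pop(struct sk_buff *skb)
        {
                int err = skb_vlan_pop(skb);

                if (err) {
                        /* A frame shorter than VLAN_ETH_HLEN is now an
                         * error rather than a silent no-op; drop it.
                         */
                        kfree_skb(skb);
                        return err;
                }
                return 0;
        }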

    Fixes: 93515d53b1 ("net: move vlan pop/push functions into common code")
    Signed-off-by: Shmulik Ladkani
    Cc: Pravin Shelar
    Reviewed-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Shmulik Ladkani
     
  • This exports the functionality of extracting the tag from the payload,
    without moving next vlan tag into hw accel tag.

    Signed-off-by: Shmulik Ladkani
    Signed-off-by: David S. Miller

    Shmulik Ladkani
     

20 Sep, 2016

1 commit

  • Since commit 8a29111c7 ("net: gro: allow to build full sized skb")
    gro may build buffers with a frag_list. This can hurt forwarding
    because most NICs can't offload such packets, so they need to be
    segmented in software. This patch splits buffers with a frag_list
    at the frag_list pointer into buffers that can be TSO offloaded.

    Signed-off-by: Steffen Klassert
    Acked-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Steffen Klassert
     

09 Sep, 2016

1 commit

  • Over the years, TCP BDP has increased by several orders of magnitude,
    and some people are considering reaching the 2 Gbytes limit.

    Even with the current window scale limit of 14, ~1 Gbyte maps to
    ~740,000 MSS.

    In presence of packet losses (or reorders), TCP stores incoming packets
    into an out of order queue, and number of skbs sitting there waiting for
    the missing packets to be received can be in the 10^5 range.

    Most packets are appended to the tail of this queue, and when
    packets can finally be transferred to receive queue, we scan the queue
    from its head.

    However, in presence of heavy losses, we might have to find an arbitrary
    point in this queue, involving a linear scan for every incoming packet,
    throwing away cpu caches.

    This patch converts it to a RB tree, to get bounded latencies.

    Yaogong wrote a preliminary patch about 2 years ago.
    Eric did the rebase, added ofo_last_skb cache, polishing and tests.

    Tested with network dropping between 1 and 10 % packets, with good
    success (about 30 % increase of throughput in stress tests)

    Next step would be to also use an RB tree for the write queue at sender
    side ;)
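
    A hedged sketch of the data structure change (example_ofo_insert() is
    illustrative, not the actual tcp_data_queue_ofo() code): out-of-order
    skbs are keyed by their starting sequence number in an rbtree.

        #include <linux/rbtree.h>
        #include <linux/skbuff.h>
        #include <net/tcp.h>

        /* Insert an skb into an ooo rbtree ordered by TCP_SKB_CB(skb)->seq,
         * giving O(log n) insertion instead of a linear list walk.
         */
        static void example_ofo_insert(struct rb_root *root,
                                       struct sk_buff *skb)
        {
                struct rb_node **p = &root->rb_node, *parent = NULL;
                u32 seq = TCP_SKB_CB(skb)->seq;

                while (*p) {
                        struct sk_buff *cur;

                        cur = rb_entry(*p, struct sk_buff, rbnode);
                        parent = *p;
                        if (before(seq, TCP_SKB_CB(cur)->seq))
                                p = &parent->rb_left;
                        else
                                p = &parent->rb_right;
                }
                rb_link_node(&skb->rbnode, parent, p);
                rb_insert_color(&skb->rbnode, root);
        }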

    Signed-off-by: Yaogong Wang
    Signed-off-by: Eric Dumazet
    Cc: Yuchung Cheng
    Cc: Neal Cardwell
    Cc: Ilpo Järvinen
    Acked-By: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Yaogong Wang
     

02 Jul, 2016

1 commit

  • Similar to commit 9b368814b336 ("net: fix bridge multicast packet checksum validation")
    we need to fixup the checksum for CHECKSUM_COMPLETE when
    pushing skb on RX path. Otherwise we get similar splats.
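
    A hedged sketch of the pattern the fix relies on (the example_ function
    is made up): bytes pushed back in front of skb->data on the RX path
    have to be folded into the complete checksum, e.g. via
    skb_postpush_rcsum().

        #include <linux/skbuff.h>

        static void example_push_and_fix_csum(struct sk_buff *skb,
                                              unsigned int len)
        {
                __skb_push(skb, len);
                /* extend CHECKSUM_COMPLETE to cover the re-exposed bytes */
                skb_postpush_rcsum(skb, skb->data, len);
        }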

    Cc: Jamal Hadi Salim
    Cc: Tom Herbert
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     

04 Jun, 2016

5 commits

  • Signed-off-by: David S. Miller

    David S. Miller
     
  • SCTP has this peculiarity that its packets cannot be just segmented to
    (P)MTU. Its chunks must be contained in IP segments, padding respected.
    So we can't just generate a big skb, set gso_size to the fragmentation
    point and deliver it to IP layer.

    This patch takes a different approach. SCTP will now build a skb as it
    would be if it was received using GRO. That is, there will be a cover
    skb with protocol headers and children ones containing the actual
    segments, already segmented to a way that respects SCTP RFCs.

    With that, we can tell skb_segment() to just split based on frag_list,
    trusting its sizes are already in accordance.

    This way SCTP can benefit from GSO and instead of passing several
    packets through the stack, it can pass a single large packet.

    v2:
    - Added support for receiving GSO frames, as requested by Dave Miller.
    - Clear skb->cb if packet is GSO (otherwise it's not used by SCTP)
    - Added heuristics similar to what we have in TCP for not generating
    single GSO packets that fill cwnd.
    v3:
    - consider sctphdr size in skb_gso_transport_seglen()
    - rebased due to 5c7cdf339af5 ("gso: Remove arbitrary checks for
    unsupported GSO")

    Signed-off-by: Marcelo Ricardo Leitner
    Tested-by: Xin Long
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     
  • skb_gso_network_seglen is not enough for checking fragment sizes if
    the skb is using GSO_BY_FRAGS, as we have to check frag by frag.

    This patch introduces skb_gso_validate_mtu, based on the former, which
    will wrap the use case inside it, as all calls to skb_gso_network_seglen
    were to validate if it fits a given MTU, and improve the check.
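
    A hedged usage sketch (example_fits_mtu() is made up;
    skb_gso_validate_mtu() is the helper this patch introduces):

        #include <linux/skbuff.h>

        /* Can this skb be sent over a path with the given MTU without
         * fragmentation?
         */
        static bool example_fits_mtu(const struct sk_buff *skb,
                                     unsigned int mtu)
        {
                if (skb_is_gso(skb))
                        return skb_gso_validate_mtu(skb, mtu);

                return skb->len <= mtu;
        }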

    Signed-off-by: Marcelo Ricardo Leitner
    Tested-by: Xin Long
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     
  • This patch allows segmenting a skb based on its frags sizes instead of
    based on a fixed value.

    Signed-off-by: Marcelo Ricardo Leitner
    Tested-by: Xin Long
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     
  • sctp GSO requires it and sctp can be compiled as a module, so we need to
    export this function.

    Signed-off-by: Marcelo Ricardo Leitner
    Tested-by: Xin Long
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     

11 May, 2016

1 commit

  • There are two instances of an unused variable, `doff' added by
    commit 6fa01ccd8830 ("skbuff: Add pskb_extract() helper function")
    in pskb_carve_inside_header() and pskb_carve_inside_nonlinear().
    Remove these instances, they are not used.

    Reported-by: Daniel Borkmann
    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

05 May, 2016

2 commits

  • This patch addresses a possible issue that can occur if we get into any odd
    corner cases where we support TSO for a given protocol but not the checksum
    or scatter-gather offload. There are a few drivers floating around that
    set up their tunnels this way, and by enforcing the checksum piece we can
    avoid mangling any frames.

    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • In the event that the number of partial segments is equal to 1 we don't
    really need to perform partial segmentation offload. As such we should
    skip multiplying the MSS and instead just clear the partial_segs value
    since it will not provide any gain to advertise the frame as being GSO when
    it is a single frame.

    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     

26 Apr, 2016

1 commit

  • A pattern of skb usage seen in modules such as RDS-TCP is to
    extract `to_copy' bytes from the received TCP segment, starting
    at some offset `off' into a new skb `clone'. This is done in
    the ->data_ready callback, where the clone skb is queued up for rx on
    the PF_RDS socket, while the parent TCP segment is returned unchanged
    back to the TCP engine.

    The existing code uses the sequence
    clone = skb_clone(..);
    pskb_pull(clone, off, ..);
    pskb_trim(clone, to_copy, ..);
    with the intention of discarding the first `off' bytes. However,
    skb_clone() + pskb_pull() implies pskb_expand_head(), which ends
    up doing a redundant memcpy of bytes that will then get discarded
    in __pskb_pull_tail().

    To avoid this inefficiency, this commit adds pskb_extract() that
    creates the clone, and memcpy's only the relevant header/frag/frag_list
    to the start of `clone'. pskb_trim() is then invoked to trim clone
    down to the requested to_copy bytes.
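
    A hedged sketch of the two patterns side by side (example_extract() is
    made up; pskb_extract() is the helper this commit adds):

        #include <linux/skbuff.h>

        static struct sk_buff *example_extract(struct sk_buff *skb, int off,
                                               int to_copy)
        {
                /* Old pattern, paying for pskb_expand_head():
                 *
                 *      clone = skb_clone(skb, GFP_ATOMIC);
                 *      pskb_pull(clone, off);
                 *      pskb_trim(clone, to_copy);
                 */

                /* New helper: copy only the relevant header/frags/frag_list
                 * into the clone, trimmed to to_copy bytes.
                 */
                return pskb_extract(skb, off, to_copy, GFP_ATOMIC);
        }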

    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

16 Apr, 2016

1 commit

  • When __vlan_insert_tag() fails from skb_vlan_push() path due to the
    skb_cow_head(), we need to undo the __skb_push() in the error path
    as well that was done earlier to move skb->data pointer to mac header.

    Moreover, I noticed that when in the non-error path the __skb_pull()
    is done and the original offset to mac header was non-zero, we fixup
    from a wrong skb->data offset in the checksum complete processing.

    So the skb_postpush_rcsum() really needs to be done before __skb_pull()
    where skb->data still points to the mac header start and thus operates
    under the same conditions as in __vlan_insert_tag().
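
    A simplified sketch of the corrected ordering described above (the
    example_ function is made up and omits details of the real
    skb_vlan_push()):

        #include <linux/skbuff.h>
        #include <linux/if_vlan.h>

        static int example_insert_vlan(struct sk_buff *skb, __be16 proto,
                                       u16 tci)
        {
                unsigned int offset = skb->data - skb_mac_header(skb);
                int err;

                __skb_push(skb, offset);          /* data -> mac header */
                err = __vlan_insert_tag(skb, proto, tci);
                if (err) {
                        __skb_pull(skb, offset);  /* error path: undo push */
                        return err;
                }
                /* the 4 inserted bytes sit right after the MAC addresses;
                 * fix the csum before pulling skb->data away from the
                 * mac header
                 */
                skb_postpush_rcsum(skb, skb->data + 2 * ETH_ALEN, VLAN_HLEN);
                __skb_pull(skb, offset);          /* restore original offset */
                return 0;
        }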

    Fixes: 93515d53b133 ("net: move vlan pop/push functions into common code")
    Signed-off-by: Daniel Borkmann
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

15 Apr, 2016

1 commit

  • This patch adds support for something I am referring to as GSO partial.
    The basic idea is that we can support a broader range of devices for
    segmentation if we use fixed outer headers and have the hardware only
    really deal with segmenting the inner header. The idea behind the naming
    is due to the fact that everything before csum_start will be fixed headers,
    and everything after will be the region that is handled by hardware.

    With the current implementation it allows us to add support for the
    following GSO types with an inner TSO_MANGLEID or TSO6 offload:
    NETIF_F_GSO_GRE
    NETIF_F_GSO_GRE_CSUM
    NETIF_F_GSO_IPIP
    NETIF_F_GSO_SIT
    NETIF_F_UDP_TUNNEL
    NETIF_F_UDP_TUNNEL_CSUM

    In the case of hardware that already supports tunneling we may be able to
    extend this further to support TSO_TCPV4 without TSO_MANGLEID if the
    hardware can support updating inner IPv4 headers.

    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     

21 Mar, 2016

1 commit

  • TCP protocol is still used these days, and TCP uses
    clones in its transmit path. We can not optimize linux
    stack assuming it is mostly used in routers, or that TCP
    is dead.

    Fixes: 795bb1c00d ("net: bulk free infrastructure for NAPI context, use napi_consume_skb")
    Signed-off-by: Eric Dumazet
    Cc: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Eric Dumazet
     

14 Mar, 2016

1 commit

  • Some drivers reuse/share code paths that free SKBs between NAPI
    and non-NAPI calls. Adjust napi_consume_skb to handle this
    use-case.

    Before, calls from netpoll (w/ IRQs disabled) were handled and
    indicated with a budget-zero indication. Use the same zero
    indication to handle calls not originating from NAPI/softirq,
    simply by using dev_consume_skb_any().

    This adds an extra branch+call for the netpoll case (checking
    in_irq() + irqs_disabled()), but that is okay as this is a slowpath.
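
    A hedged usage sketch for such a shared free path (example_free_tx_skb()
    is made up):

        #include <linux/skbuff.h>

        /* Pass the NAPI budget when called from a poll routine and 0
         * otherwise; with budget == 0, napi_consume_skb() falls back to
         * dev_consume_skb_any().
         */
        static void example_free_tx_skb(struct sk_buff *skb, int napi_budget)
        {
                napi_consume_skb(skb, napi_budget);
        }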

    Suggested-by: Alexander Duyck
    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     

02 Mar, 2016

1 commit

  • After commit 52bd2d62ce67 ("net: better skb->sender_cpu and skb->napi_id cohabitation")
    skb_sender_cpu_clear() becomes empty and can be removed.

    Cc: Eric Dumazet
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     

26 Feb, 2016

1 commit

  • We need to update the skb->csum after pulling the skb, otherwise
    an unnecessary checksum (re)computation can occur for IGMP/MLD packets
    in the bridge code. Additionally this fixes the following splats for
    network devices / bridge ports with RX checksum offloading supported
    and enabled:

    [...]
    [ 43.986968] eth0: hw csum failure
    [ 43.990344] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.0 #2
    [ 43.996193] Hardware name: BCM2709
    [ 43.999647] [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
    [ 44.007432] [] (show_stack) from [] (dump_stack+0x80/0x90)
    [ 44.014695] [] (dump_stack) from [] (__skb_checksum_complete+0x6c/0xac)
    [ 44.023090] [] (__skb_checksum_complete) from [] (ipv6_mc_validate_checksum+0x104/0x178)
    [ 44.032959] [] (ipv6_mc_validate_checksum) from [] (skb_checksum_trimmed+0x130/0x188)
    [ 44.042565] [] (skb_checksum_trimmed) from [] (ipv6_mc_check_mld+0x118/0x338)
    [ 44.051501] [] (ipv6_mc_check_mld) from [] (br_multicast_rcv+0x5dc/0xd00)
    [ 44.060077] [] (br_multicast_rcv) from [] (br_handle_frame_finish+0xac/0x51c)
    [...]
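
    A hedged sketch of the general rule behind the fix (the example_
    function is made up): when bytes are pulled from the front of an skb
    carrying CHECKSUM_COMPLETE, they must be subtracted from skb->csum,
    e.g. via skb_postpull_rcsum(), so that a later
    __skb_checksum_complete() does not see a mismatch.

        #include <linux/skbuff.h>

        static void example_pull_and_fix_csum(struct sk_buff *skb,
                                              unsigned int len)
        {
                const void *pulled = skb->data;

                __skb_pull(skb, len);
                /* drop the pulled bytes from CHECKSUM_COMPLETE */
                skb_postpull_rcsum(skb, pulled, len);
        }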

    Fixes: 9afd85c9e455 ("net: Export IGMP/MLD message validation code")
    Reported-by: Álvaro Fernández Rojas
    Signed-off-by: Linus Lüssing
    Signed-off-by: David S. Miller

    Linus Lüssing
     

12 Feb, 2016

2 commits

  • The network stack defers freeing SKBs in case the free happens in IRQ
    context or when IRQs are disabled. This happens in __dev_kfree_skb_irq(),
    which writes SKBs that were freed during IRQ to the softirq completion
    queue (softnet_data.completion_queue).

    These SKBs are naturally delayed, and cleaned up during NET_TX_SOFTIRQ
    in function net_tx_action(). Take advantage of this and use the skb
    defer and flush API, as we are already in softirq context.

    For modern drivers this rarely happens, although most drivers do call
    dev_kfree_skb_any(), which detects the situation and calls
    __dev_kfree_skb_irq() when needed, because netpoll can call from
    IRQ context.

    Signed-off-by: Alexander Duyck
    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • Discovered that the network stack was hitting the kmem_cache/SLUB
    slowpath when freeing SKBs. Doing bulk free with kmem_cache_free_bulk
    can speed up this slowpath.

    NAPI context is a bit special; let's take advantage of that for bulk
    freeing SKBs.

    In NAPI context we are running in softirq, which gives us certain
    protection. A softirq can run on several CPUs at once. BUT the
    important part is a softirq will never preempt another softirq running
    on the same CPU. This gives us the opportunity to access per-cpu
    variables in softirq context.

    Extend napi_alloc_cache (before only contained page_frag_cache) to be
    a struct with a small array based stack for holding SKBs. Introduce a
    SKB defer and flush API for accessing this.

    Introduce napi_consume_skb() as replacement for e.g. dev_consume_skb_any()
    when running in NAPI context. A small trick to handle/detect if we
    are called from netpoll is to see if budget is 0. In that case, we
    need to invoke dev_consume_skb_irq().

    Joint work with Alexander Duyck.

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     

11 Feb, 2016

4 commits

  • This patch enables us to use inner checksum offloads if provided by
    hardware with outer checksums computed by software.

    It basically reduces encap_hdr_csum to an advisory flag for now, but given
    that SCTP may be getting segmentation support before long, I thought we
    may want to keep it, as it is possible we may need to support CRC32c and
    1's complement checksums in the same packet at some point in the future.

    Signed-off-by: Alexander Duyck
    Acked-by: Tom Herbert
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • The call skb_has_shared_frag is used in the GRE path and skb_checksum_help
    to verify that no frags can be modified by an external entity. This check
    really doesn't belong in the GRE path but in the skb_segment function
    itself. This way any protocol that might be segmented will be performing
    this check before attempting to offload a checksum to software.
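
    A hedged sketch of the kind of check involved (the example_ function is
    made up and only illustrates the idea):

        #include <linux/netdevice.h>
        #include <linux/skbuff.h>

        static int example_csum_if_safe(struct sk_buff *skb)
        {
                /* Frag pages shared with an external entity could change
                 * under us while we checksum them; copy them into the
                 * linear area first.
                 */
                if (skb_has_shared_frag(skb)) {
                        int err = skb_linearize(skb);

                        if (err)
                                return err;
                }
                return skb_checksum_help(skb);
        }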

    Signed-off-by: Alexander Duyck
    Acked-by: Tom Herbert
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • This patch addresses two main issues.

    First in the case of remote checksum offload we were avoiding dealing with
    scatter-gather issues. As a result it would be possible to assemble a
    series of frames that used frags instead of being linearized as they should
    have if remote checksum offload was enabled.

    Second I have updated the code so that we now let GSO take care of doing
    the checksum on the data itself and drop the special case that was added
    for remote checksum offload.

    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • This patch moves the checksum maintained by GSO out of skb->csum and into
    the GSO context block, in order to allow us to work on outer checksums
    while maintaining the inner checksum offsets in the case where the inner
    checksum is offloaded and the outer checksums will be computed.

    While updating the code I also did a minor clean-up of gso_make_checksum.
    The change is mostly to make it so that we store the values and compute the
    checksum, instead of computing the checksum and then storing the values we
    needed to update.

    Signed-off-by: Alexander Duyck
    Acked-by: Tom Herbert
    Signed-off-by: David S. Miller

    Alexander Duyck
     

09 Feb, 2016

1 commit

  • Devices may have limits on the number of fragments in an skb they support.
    The current codebase uses a constant as the maximum number of fragments
    one skb can hold and use.
    When enabling scatter/gather and running traffic with many small messages,
    the codebase uses the maximum number of fragments and may thereby violate
    the max for certain devices.
    The patch introduces a global variable for the maximum number of fragments.
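
    A hedged sketch of a caller-side check (example_can_add_frag() is made
    up; sysctl_max_skb_frags is the tunable this patch introduces, exposed
    as net.core.max_skb_frags):

        #include <linux/skbuff.h>

        static bool example_can_add_frag(const struct sk_buff *skb)
        {
                /* respect the runtime limit rather than MAX_SKB_FRAGS */
                return skb_shinfo(skb)->nr_frags < sysctl_max_skb_frags;
        }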

    Signed-off-by: Hans Westgaard Ry
    Reviewed-by: Håkon Bugge
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Hans Westgaard Ry
     

18 Dec, 2015

1 commit

  • Dmitry reported the following out-of-bound access:

    Call Trace:
    [] __asan_report_load4_noabort+0x3e/0x40
    mm/kasan/report.c:294
    [] sock_setsockopt+0x1284/0x13d0 net/core/sock.c:880
    [< inline >] SYSC_setsockopt net/socket.c:1746
    [] SyS_setsockopt+0x1fe/0x240 net/socket.c:1729
    [] entry_SYSCALL_64_fastpath+0x16/0x7a
    arch/x86/entry/entry_64.S:185

    This is because we mistake a raw socket for a tcp socket.
    We should check both sk->sk_type and sk->sk_protocol to ensure
    it is a tcp socket.

    Willem points out __skb_complete_tx_timestamp() needs to be fixed as well.
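
    A hedged sketch of the check described above (example_is_tcp_sock() is
    made up):

        #include <linux/in.h>
        #include <net/sock.h>

        static bool example_is_tcp_sock(const struct sock *sk)
        {
                /* a raw IPPROTO_TCP socket has sk_type == SOCK_RAW, so
                 * checking the protocol alone is not enough
                 */
                return sk->sk_type == SOCK_STREAM &&
                       sk->sk_protocol == IPPROTO_TCP;
        }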

    Reported-by: Dmitry Vyukov
    Cc: Willem de Bruijn
    Cc: Eric Dumazet
    Signed-off-by: Cong Wang
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    WANG Cong
     

15 Dec, 2015

1 commit

  • skb_reorder_vlan_header is called after the vlan header has
    been pulled. As a result the offset of the beginning of
    the mac header has been increased by 4 bytes (VLAN_HLEN).
    When moving the mac addresses, include this increase in
    the offset calculation so that the mac addresses are
    copied correctly.

    Fixes: a6e18ff1117 (vlan: Fix untag operations of stacked vlans with REORDER_HEADER off)
    CC: Nicolas Dichtel
    CC: Patrick McHardy
    Signed-off-by: Vladislav Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

18 Nov, 2015

1 commit

  • When we have multiple stacked vlan devices, all of which have
    turned off the REORDER_HEADER flag, the untag operation does not
    locate the ethernet addresses correctly for nested vlans.
    The reason is that in case of the REORDER_HEADER flag being off,
    the outer vlan headers are put back and the mac_len is adjusted
    to account for the presence of the header. Then, the subsequent
    untag operation, for the next level vlan, always uses VLAN_ETH_HLEN
    to locate the beginning of the ethernet header, and that ends up
    being a multiple of 4 bytes short of the actual beginning
    of the mac header (the multiple depending on how many vlan
    encapsulations there are).

    As a result, if there are multiple levels of vlan devices
    with REORDER_HEADER being off, the received packets end up
    being dropped.

    To solve this, we use skb->mac_len as the offset. The value
    is always set on the receive path and starts out as ETH_HLEN.
    The value is also updated when the vlan header manipulations occur,
    so we know it will be correct.

    Signed-off-by: Vladislav Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich