08 May, 2019
1 commit
-
There was NVMEM support added to of_get_mac_address, so it could now
return ERR_PTR encoded error values, so we need to adjust all current
users of of_get_mac_address to this new fact.While at it, remove superfluous is_valid_ether_addr as the MAC address
returned from of_get_mac_address is always valid and checked by
is_valid_ether_addr anyway.Fixes: d01f449c008a ("of_net: add NVMEM support to of_get_mac_address")
Signed-off-by: Petr Štetiar
Signed-off-by: David S. Miller
06 May, 2019
1 commit
-
Frames get processed by DSA and redirected to switch port net devices
based on the ETH_P_XDSA multiplexed packet_type handler found by the
network stack when calling eth_type_trans().The running assumption is that once the DSA .rcv function is called, DSA
is always able to decode the switch tag in order to change the skb->dev
from its master.However there are tagging protocols (such as the new DSA_TAG_PROTO_SJA1105,
user of DSA_TAG_PROTO_8021Q) where this assumption is not completely
true, since switch tagging piggybacks on the absence of a vlan_filtering
bridge. Moreover, management traffic (BPDU, PTP) for this switch doesn't
rely on switch tagging, but on a different mechanism. So it would make
sense to at least be able to terminate that.Having DSA receive traffic it can't decode would put it in an impossible
situation: the eth_type_trans() function would invoke the DSA .rcv(),
which could not change skb->dev, then eth_type_trans() would be invoked
again, which again would call the DSA .rcv, and the packet would never
be able to exit the DSA filter and would spiral in a loop until the
whole system dies.This happens because eth_type_trans() doesn't actually look at the skb
(so as to identify a potential tag) when it deems it as being
ETH_P_XDSA. It just checks whether skb->dev has a DSA private pointer
installed (therefore it's a DSA master) and that there exists a .rcv
callback (everybody except DSA_TAG_PROTO_NONE has that). This is
understandable as there are many switch tags out there, and exhaustively
checking for all of them is far from ideal.The solution lies in introducing a filtering function for each tagging
protocol. In the absence of a filtering function, all traffic is passed
to the .rcv DSA callback. The tagging protocol should see the filtering
function as a pre-validation that it can decode the incoming skb. The
traffic that doesn't match the filter will bypass the DSA .rcv callback
and be left on the master netdevice, which wasn't previously possible.Signed-off-by: Vladimir Oltean
Reviewed-by: Florian Fainelli
Signed-off-by: David S. Miller
24 Apr, 2019
2 commits
-
Update all users of eth_get_headlen to pass network device, fetch
network namespace from it and pass it down to the flow dissector.
This commit is a noop until administrator inserts BPF flow dissector
program.Cc: Maxim Krasnyansky
Cc: Saeed Mahameed
Cc: Jeff Kirsher
Cc: intel-wired-lan@lists.osuosl.org
Cc: Yisen Zhuang
Cc: Salil Mehta
Cc: Michael Chan
Cc: Igor Russkikh
Signed-off-by: Stanislav Fomichev
Signed-off-by: Daniel Borkmann -
This new argument will be used in the next patches for the
eth_get_headlen use case. eth_get_headlen calls flow dissector
with only data (without skb) so there is currently no way to
pull attached BPF flow dissector program. With this new argument,
we can amend the callers to explicitly pass network namespace
so we can use attached BPF program.Signed-off-by: Stanislav Fomichev
Reviewed-by: Saeed Mahameed
Signed-off-by: Daniel Borkmann
23 Feb, 2019
1 commit
-
The previous commit introduced parse_protocol callback which should
extract the protocol number from the L2 header. Make all Ethernet
devices support it.Signed-off-by: Maxim Mikityanskiy
Signed-off-by: David S. Miller
04 Dec, 2018
1 commit
-
We already have of_get_nvmem_mac_address() but some non-DT systems want
to read the MAC address from NVMEM too. Implement a generalized routine
that takes struct device as argument.Signed-off-by: Bartosz Golaszewski
Signed-off-by: David S. Miller
16 Nov, 2018
1 commit
-
netperf udp stream shows that eth_type_trans takes certain cpu,
so adjust the mac address check order, and firstly check if it
is device address, and only check if it is multicast address
only if not the device address.After this change:
To unicast, and skb dst mac is device mac, this is most of time
reduce a comparision
To unicast, and skb dst mac is not device mac, nothing change
To multicast, increase a comparisionBefore:
1.03% [kernel] [k] eth_type_transAfter:
0.78% [kernel] [k] eth_type_transSigned-off-by: Zhang Yu
Signed-off-by: Li RongQing
Signed-off-by: David S. Miller
26 Jun, 2018
1 commit
-
Manage pending per-NAPI GRO packets via list_head.
Return an SKB pointer from the GRO receive handlers. When GRO receive
handlers return non-NULL, it means that this SKB needs to be completed
at this time and removed from the NAPI queue.Several operations are greatly simplified by this transformation,
especially timing out the oldest SKB in the list when gro_count
exceeds MAX_GRO_SKBS, and napi_gro_flush() which walks the queue
in reverse order.Signed-off-by: David S. Miller
08 May, 2018
1 commit
-
When the core networking needs to detect the transport offset in a given
packet and parse it explicitly, a full-blown flow_keys struct is used for
storage.
This patch introduces a smaller keys store, rework the basic flow dissect
helper to use it, and apply this new helper where possible - namely in
skb_probe_transport_header(). The used flow dissector data structures
are renamed to match more closely the new role.The above gives ~50% performance improvement in micro benchmarking around
skb_probe_transport_header() and ~30% around eth_get_headlen(), mostly due
to the smaller memset. Small, but measurable improvement is measured also
in macro benchmarking.v1 -> v2: use the new helper in eth_get_headlen() and skb_get_poff(),
as per DaveM suggestionSuggested-by: David Miller
Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller
16 Jun, 2017
1 commit
-
It seems like a historic accident that these return unsigned char *,
and in many places that means casts are required, more often than not.Make these functions return void * and remove all the casts across
the tree, adding a (u8 *) cast only where the unsigned char pointer
was used directly, all done with the following spatch:@@
expression SKB, LEN;
typedef u8;
identifier fn = { skb_push, __skb_push, skb_push_rcsum };
@@
- *(fn(SKB, LEN))
+ *(u8 *)fn(SKB, LEN)@@
expression E, SKB, LEN;
identifier fn = { skb_push, __skb_push, skb_push_rcsum };
type T;
@@
- E = ((T *)(fn(SKB, LEN)))
+ E = fn(SKB, LEN)@@
expression SKB, LEN;
identifier fn = { skb_push, __skb_push, skb_push_rcsum };
@@
- fn(SKB, LEN)[0]
+ *(u8 *)fn(SKB, LEN)Note that the last part there converts from push(...)[0] to the
more idiomatic *(u8 *)push(...).Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller
17 Feb, 2017
1 commit
-
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2017-02-161) Make struct xfrm_input_afinfo const, nothing writes to it.
From Florian Westphal.2) Remove all places that write to the afinfo policy backend
and make the struct const then.
From Florian Westphal.3) Prepare for packet consuming gro callbacks and add
ESP GRO handlers. ESP packets can be decapsulated
at the GRO layer then. It saves a round through
the stack for each ESP packet.Please note that this has a merge coflict between commit
63fca65d0863 ("net: add confirm_neigh method to dst_ops")
from net-next and
3d7d25a68ea5 ("xfrm: policy: remove garbage_collect callback")
a2817d8b279b ("xfrm: policy: remove family field")from ipsec-next.
The conflict can be solved as it is done in linux-next.
Please pull or let me know if there are problems.
====================Signed-off-by: David S. Miller
15 Feb, 2017
1 commit
-
Add a skb_gro_flush_final helper to prepare for consuming
skbs in call_gro_receive. We will extend this helper to not
touch the skb if the skb is consumed by a gro callback with
a followup patch. We need this to handle the upcomming IPsec
ESP callbacks as they reinject the skb to the napi_gro_receive
asynchronous. The handler is used in all gro_receive functions
that can call the ESP gro handlers.Signed-off-by: Steffen Klassert
11 Feb, 2017
1 commit
09 Feb, 2017
1 commit
-
The stack must not pass packets to device drivers that are shorter
than the minimum link layer header length.Previously, packet sockets would drop packets smaller than or equal
to dev->hard_header_len, but this has false positives. Zero length
payload is used over Ethernet. Other link layer protocols support
variable length headers. Support for validation of these protocols
removed the min length check for all protocols.Introduce an explicit dev->min_header_len parameter and drop all
packets below this value. Initially, set it to non-zero only for
Ethernet and loopback. Other protocols can follow in a patch to
net-next.Fixes: 9ed988cd5915 ("packet: validate variable length ll headers")
Reported-by: Sowmini Varadhan
Signed-off-by: Willem de Bruijn
Acked-by: Eric Dumazet
Acked-by: Sowmini Varadhan
Signed-off-by: David S. Miller
30 Jan, 2017
1 commit
-
This patch adds devm_alloc_etherdev_mqs function and devm_alloc_etherdev
macro. These can be used for simpler netdev allocation without having to
care about calling free_netdev.Thanks to this change drivers, their error paths and removal paths may
get simpler by a bit.Signed-off-by: Rafał Miłecki
Signed-off-by: David S. Miller
08 Nov, 2016
1 commit
-
The default TX queue length of Ethernet devices have been a magic
constant of 1000, ever since the initial git import.Looking back in historical trees[1][2] the value used to be 100,
with the same comment "Ethernet wants good queues". The commit[3]
that changed this from 100 to 1000 didn't describe why, but from
conversations with Robert Olsson it seems that it was changed
when Ethernet devices went from 100Mbit/s to 1Gbit/s, because the
link speed increased x10 the queue size were also adjusted. This
value later caused much heartache for the bufferbloat community.This patch merely moves the value into a defined constant.
[1] https://git.kernel.org/cgit/linux/kernel/git/davem/netdev-vger-cvs.git/
[2] https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/
[3] https://git.kernel.org/tglx/history/c/98921832c232Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: David S. Miller
31 Oct, 2016
1 commit
-
Mostly simple overlapping changes.
For example, David Ahern's adjacency list revamp in 'net-next'
conflicted with an adjacency list traversal bug fix in 'net'.Signed-off-by: David S. Miller
21 Oct, 2016
1 commit
-
Currently, GRO can do unlimited recursion through the gro_receive
handlers. This was fixed for tunneling protocols by limiting tunnel GRO
to one level with encap_mark, but both VLAN and TEB still have this
problem. Thus, the kernel is vulnerable to a stack overflow, if we
receive a packet composed entirely of VLAN headers.This patch adds a recursion counter to the GRO layer to prevent stack
overflow. When a gro_receive function hits the recursion limit, GRO is
aborted for this skb and it is processed normally. This recursion
counter is put in the GRO CB, but could be turned into a percpu counter
if we run out of space in the CB.Thanks to Vladimír Beneš for the initial bug report.
Fixes: CVE-2016-7039
Fixes: 9b174d88c257 ("net: Add Transparent Ethernet Bridging GRO support.")
Fixes: 66e5133f19e9 ("vlan: Add GRO support for non hardware accelerated vlan")
Signed-off-by: Sabrina Dubroca
Reviewed-by: Jiri Benc
Acked-by: Hannes Frederic Sowa
Acked-by: Tom Herbert
Signed-off-by: David S. Miller
13 Oct, 2016
1 commit
-
With centralized MTU checking, there's nothing productive done by
eth_change_mtu that isn't already done in dev_set_mtu, so mark it as
deprecated and remove all usage of it in the kernel. All callers have been
audited for calls to alloc_etherdev* or ether_setup directly, which means
they all have a valid dev->min_mtu and dev->max_mtu. Now eth_change_mtu
prints out a netdev_warn about being deprecated, for the benefit of
out-of-tree drivers that might be utilizing it.Of note, dvb_net.c actually had dev->mtu = 4096, while using
eth_change_mtu, meaning that if you ever tried changing it's mtu, you
couldn't set it above 1500 anymore. It's now getting dev->max_mtu also set
to 4096 to remedy that.v2: fix up lantiq_etop, missed breakage due to drive not compiling on x86
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson
Signed-off-by: David S. Miller
25 Feb, 2016
1 commit
-
We want to try and pull the L4 header in if it is available in the first
fragment. As such add the flag to indicate we want to pull the headers on
the first fragment in.Signed-off-by: Alexander Duyck
Acked-by: Tom Herbert
Signed-off-by: David S. Miller
07 Jan, 2016
1 commit
-
A repeating pattern in drivers has become to use OF node information
and, if not found, platform specific host information to extract the
ethernet address for a given device.Currently this is done with a call to of_get_mac_address() and then
some ifdef'd stuff for SPARC.Consolidate this into a portable routine, and provide the
arch_get_platform_mac_address() weak function hook for all
architectures to implement if they want.Signed-off-by: David S. Miller
29 Sep, 2015
1 commit
-
Noticed that the compiler (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC))
generated suboptimal assembler code in eth_get_headlen().This early return coding style is usually not an issue, on super scalar CPUs,
but the compiler choose to put the return statement after this very unlikely
branch, thus creating larger jump down to the likely code path.Performance wise, I could measure slightly less L1-icache-load-misses
and less branch-misses, and an improvement of 1 nanosec with an IP-forwarding
use-case with 257 bytes packets with ixgbe (CPU i7-4790K @ 4.00GHz).Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: David S. Miller
02 Sep, 2015
1 commit
-
The flags argument will allow control of the dissection process (for
instance whether to parse beyond L3).Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller
10 Aug, 2015
1 commit
-
This patch fix double word "the the" in
Documentation/DocBook/networking/API-eth-get-headlen.html
Documentation/DocBook/networking/netdev.html
Documentation/DocBook/networking.xmlThese files are generated from comment in source,
so I have to fix comment in net/ethernet/eth.c.Signed-off-by: Masanari Iida
Signed-off-by: David S. Miller
05 Jun, 2015
1 commit
-
This patch adds full IPv6 addresses into flow_keys and uses them as
input to the flow hash function. The implementation supports either
IPv4 or IPv6 addresses in a union, and selector is used to determine
how may words to input to jhash2.We also add flow_get_u32_dst and flow_get_u32_src functions which are
used to get a u32 representation of the source and destination
addresses. For IPv6, ipv6_addr_hash is called. These functions retain
getting the legacy values of src and dst in flow_keys.With this patch, Ethertype and IP protocol are now included in the
flow hash input.Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller
02 Jun, 2015
1 commit
-
When we scan a packet for GRO processing, we want to see the most
common packet types in the front of the offload_base list.So add a priority field so we can handle this properly.
IPv4/IPv6 get the highest priority with the implicit zero priority
field.Next comes ethernet with a priority of 10, and then we have the MPLS
types with a priority of 15.Suggested-by: Eric Dumazet
Suggested-by: Toshiaki Makita
Signed-off-by: David S. Miller
14 May, 2015
2 commits
-
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller -
Since these functions are defined in flow_dissector.c, move header
declarations from skbuff.h into flow_dissector.hSigned-off-by: Jiri Pirko
Signed-off-by: David S. Miller
06 May, 2015
1 commit
-
This change does two things. First it fixes a sparse error for the fact
that the __be16 degrades to an integer. Since that is actually what I am
kind of doing I am simply working around that by forcing both sides of the
comparison to u16.Also I realized on some compilers I was generating another instruction for
big endian systems such as PowerPC since it was masking the value before
doing the comparison. So to resolve that I have simply pulled the mask out
and wrapped it in an #ifndef __BIG_ENDIAN.Lastly I pulled this all out into its own function. I notices there are
similar checks in a number of other places so this function can be reused
there to help reduce overhead in these paths as well.Signed-off-by: Alexander Duyck
Signed-off-by: David S. Miller
04 May, 2015
3 commits
-
Avoid recomputing the Ethernet header location and instead just use the
pointer provided by skb->data. The problem with using eth_hdr is that the
compiler wasn't smart enough to realize that skb->head + skb->mac_header
was the same thing as skb->data before it added ETH_HLEN. By just caching
it off before calling skb_pull_inline we can avoid a few unnecessary
instructions.Signed-off-by: Alexander Duyck
Signed-off-by: David S. Miller -
This change makes it so that we process the address in
is_multicast_ether_addr at the same size as the other calls. This allows
us to avoid duplicate reads when used with other calls such as
is_zero_ether_addr or eth_addr_copy. In addition I have added a 64 bit
version of the function so in eth_type_trans we can process the destination
address as a 64 bit value throughout.Signed-off-by: Alexander Duyck
Signed-off-by: David S. Miller -
This change takes advantage of the fact that ETH_P_802_3_MIN is aligned to
512 so as a result we can actually ignore the lower 8b when comparing the
Ethertype to ETH_P_802_3_MIN. This allows us to avoid a byte swap by simply
masking the value and comparing it to the byte swapped value for
ETH_P_802_3_MIN.Signed-off-by: Alexander Duyck
Signed-off-by: David S. Miller
04 Mar, 2015
1 commit
-
Use the built-in function instead of memset.
Signed-off-by: Joe Perches
Signed-off-by: David S. Miller
03 Mar, 2015
1 commit
-
Now that there are no more users kill dev_rebuild_header and all of it's
implementations.This is long overdue.
Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller
03 Jan, 2015
1 commit
-
Currently the only tunnel protocol that supports GRO with encapsulated
Ethernet is VXLAN. This pulls out the Ethernet code into a proper layer
so that it can be used by other tunnel protocols such as GRE and Geneve.Signed-off-by: Jesse Gross
Signed-off-by: David S. Miller
06 Sep, 2014
1 commit
-
This patch updates some of the flow_dissector api so that it can be used to
parse the length of ethernet buffers stored in fragments. Most of the
changes needed were to __skb_get_poff as it needed to be updated to support
sending a linear buffer instead of a skb.I have split __skb_get_poff into two functions, the first is skb_get_poff
and it retains the functionality of the original __skb_get_poff. The other
function is __skb_get_poff which now works much like __skb_flow_dissect in
relation to skb_flow_dissect in that it provides the same functionality but
works with just a data buffer and hlen instead of needing an skb.Signed-off-by: Alexander Duyck
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller
28 Aug, 2014
1 commit
-
DSA is currently registering one packet_type function per EtherType it
needs to intercept in the receive path of a DSA-enabled Ethernet device.
Right now we have three of them: trailer, DSA and eDSA, and there might
be more in the future, this will not scale to the addition of new
protocols.This patch proceeds with adding a new layer of abstraction and two new
functions:dsa_switch_rcv() which will dispatch into the tag-protocol specific
receive function implemented by net/dsa/tag_*.cdsa_slave_xmit() which will dispatch into the tag-protocol specific
transmit function implemented by net/dsa/tag_*.cWhen we do create the per-port slave network devices, we iterate over
the switch protocol to assign the DSA-specific receive and transmit
operations.A new fake ethertype value is used: ETH_P_XDSA to illustrate the fact
that this is no longer going to look like ETH_P_DSA or ETH_P_TRAILER
like it used to be.This allows us to greatly simplify the check in eth_type_trans() and
always override the skb->protocol with ETH_P_XDSA for Ethernet switches
tagged protocol, while also reducing the number repetitive slave
netdevice_ops assignments.Signed-off-by: Florian Fainelli
Signed-off-by: David S. Miller
16 Jul, 2014
1 commit
-
Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert
all users to pass NET_NAME_UNKNOWN.Coccinelle patch:
@@
expression sizeof_priv, name, setup, txqs, rxqs, count;
@@(
-alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs)
+alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs)
|
-alloc_netdev_mq(sizeof_priv, name, setup, count)
+alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count)
|
-alloc_netdev(sizeof_priv, name, setup)
+alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup)
)v9: move comments here from the wrong commit
Signed-off-by: Tom Gundersen
Reviewed-by: David Herrmann
Signed-off-by: David S. Miller
17 Jan, 2014
1 commit
-
eth_type_trans() can read uninitialized memory as drivers
do not necessarily pull more than 14 bytes in skb->head before
calling it.As David suggested, we can use skb_header_pointer() to
fix this without breaking some drivers that might not expect
eth_type_trans() pulling 2 additional bytes.Signed-off-by: Eric Dumazet
Cc: Ben Hutchings
Signed-off-by: David S. Miller
01 Oct, 2013
1 commit
-
Mark code path's likely/unlikely based on most common usage.
* Very few devices use dsa tags.
* Most traffic is Ethernet (not 802.2)
* No sane person uses trailer type or Novell encapsulationSigned-off-by: Stephen Hemminger
Signed-off-by: David S. Miller