04 Feb, 2017
1 commit
-
[ Upstream commit 7be2c82cfd5d28d7adb66821a992604eb6dd112e ]
Ashizuka reported a highmem oddity and sent a patch for freescale
fec driver.But the problem root cause is that core networking stack
must ensure no skb with highmem fragment is ever sent through
a device that does not assert NETIF_F_HIGHDMA in its features.We need to call illegal_highdma() from harmonize_features()
regardless of CSUM checks.Fixes: ec5f06156423 ("net: Kill link between CSUM and SG features.")
Signed-off-by: Eric Dumazet
Cc: Pravin Shelar
Reported-by: "Ashizuka, Yuusuke"
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
15 Jan, 2017
2 commits
-
[ Upstream commit 7cfd5fd5a9813f1430290d20c0fead9b4582a307 ]
On 32bit arches, (skb->end - skb->data) is not 'unsigned int',
so we shall use min_t() instead of min() to avoid a compiler error.Fixes: 1272ce87fa01 ("gro: Enter slow-path if there is no tailroom")
Reported-by: kernel test robot
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 1272ce87fa017ca4cf32920764d879656b7a005a ]
The GRO path has a fast-path where we avoid calling pskb_may_pull
and pskb_expand by directly accessing frag0. However, this should
only be done if we have enough tailroom in the skb as otherwise
we'll have to expand it later anyway.This patch adds the check by capping frag0_len with the skb tailroom.
Fixes: cb18978cbf45 ("gro: Open-code final pskb_may_pull")
Reported-by: Slava Shwartsman
Signed-off-by: Herbert Xu
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
13 Nov, 2016
1 commit
-
If the bpf program calls bpf_redirect(dev, 0) and dev is
an ipip/ip6tnl, it currently includes the mac header.
e.g. If dev is ipip, the end result is IP-EthHdr-IP instead
of IP-IP.The fix is to pull the mac header. At ingress, skb_postpull_rcsum()
is not needed because the ethhdr should have been pulled once already
and then got pushed back just before calling the bpf_prog.
At egress, this patch calls skb_postpull_rcsum().If bpf_redirect(dev, BPF_F_INGRESS) is called,
it also fails now because it calls dev_forward_skb() which
eventually calls eth_type_trans(skb, dev). The eth_type_trans()
will set skb->type = PACKET_OTHERHOST because the mac address
does not match the redirecting dev->dev_addr. The PACKET_OTHERHOST
will eventually cause the ip_rcv() errors out. To fix this,
____dev_forward_skb() is added.Joint work with Daniel Borkmann.
Fixes: cfc7381b3002 ("ip_tunnel: add collect_md mode to IPIP tunnel")
Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
Acked-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: Martin KaFai Lau
Signed-off-by: David S. Miller
01 Nov, 2016
1 commit
-
Sending zero checksum is ok for TCP, but not for UDP.
UDPv6 receiver should by default drop a frame with a 0 checksum,
and UDPv4 would not verify the checksum and might accept a corrupted
packet.Simply replace such checksum by 0xffff, regardless of transport.
This error was caught on SIT tunnels, but seems generic.
Signed-off-by: Eric Dumazet
Cc: Maciej Żenczykowski
Cc: Willem de Bruijn
Acked-by: Maciej Żenczykowski
Signed-off-by: David S. Miller
30 Oct, 2016
2 commits
-
Pull networking fixes from David Miller:
"Lots of fixes, mostly drivers as is usually the case.1) Don't treat zero DMA address as invalid in vmxnet3, from Alexey
Khoroshilov.2) Fix element timeouts in netfilter's nft_dynset, from Anders K.
Pedersen.3) Don't put aead_req crypto struct on the stack in mac80211, from
Ard Biesheuvel.4) Several uninitialized variable warning fixes from Arnd Bergmann.
5) Fix memory leak in cxgb4, from Colin Ian King.
6) Fix bpf handling of VLAN header push/pop, from Daniel Borkmann.
7) Several VRF semantic fixes from David Ahern.
8) Set skb->protocol properly in ip6_tnl_xmit(), from Eli Cooper.
9) Socket needs to be locked in udp_disconnect(), from Eric Dumazet.
10) Div-by-zero on 32-bit fix in mlx4 driver, from Eugenia Emantayev.
11) Fix stale link state during failover in NCSCI driver, from Gavin
Shan.12) Fix netdev lower adjacency list traversal, from Ido Schimmel.
13) Propvide proper handle when emitting notifications of filter
deletes, from Jamal Hadi Salim.14) Memory leaks and big-endian issues in rtl8xxxu, from Jes Sorensen.
15) Fix DESYNC_FACTOR handling in ipv6, from Jiri Bohac.
16) Several routing offload fixes in mlxsw driver, from Jiri Pirko.
17) Fix broadcast sync problem in TIPC, from Jon Paul Maloy.
18) Validate chunk len before using it in SCTP, from Marcelo Ricardo
Leitner.19) Revert a netns locking change that causes regressions, from Paul
Moore.20) Add recursion limit to GRO handling, from Sabrina Dubroca.
21) GFP_KERNEL in irq context fix in ibmvnic, from Thomas Falcon.
22) Avoid accessing stale vxlan/geneve socket in data path, from
Pravin Shelar"* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (189 commits)
geneve: avoid using stale geneve socket.
vxlan: avoid using stale vxlan socket.
qede: Fix out-of-bound fastpath memory access
net: phy: dp83848: add dp83822 PHY support
enic: fix rq disable
tipc: fix broadcast link synchronization problem
ibmvnic: Fix missing brackets in init_sub_crq_irqs
ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context
Revert "ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context"
arch/powerpc: Update parameters for csum_tcpudp_magic & csum_tcpudp_nofold
net/mlx4_en: Save slave ethtool stats command
net/mlx4_en: Fix potential deadlock in port statistics flow
net/mlx4: Fix firmware command timeout during interrupt test
net/mlx4_core: Do not access comm channel if it has not yet been initialized
net/mlx4_en: Fix panic during reboot
net/mlx4_en: Process all completions in RX rings after port goes up
net/mlx4_en: Resolve dividing by zero in 32-bit system
net/mlx4_core: Change the default value of enable_qos
net/mlx4_core: Avoid setting ports to auto when only one port type is supported
net/mlx4_core: Fix the resource-type enum in res tracker to conform to FW spec
... -
When transmitting on a packet socket with PACKET_VNET_HDR and
PACKET_QDISC_BYPASS, validate device support for features requested
in vnet_hdr.Drop TSO packets sent to devices that do not support TSO or have the
feature disabled. Note that the latter currently do process those
packets correctly, regardless of not advertising the feature.Because of SKB_GSO_DODGY, it is not sufficient to test device features
with netif_needs_gso. Full validate_xmit_skb is needed.Switch to software checksum for non-TSO packets that request checksum
offload if that device feature is unsupported or disabled. Note that
similar to the TSO case, device drivers may perform checksum offload
correctly even when not advertising it.When switching to software checksum, packets hit skb_checksum_help,
which has two BUG_ON checksum not in linear segment. Packet sockets
always allocate at least up to csum_start + csum_off + 2 as linear.Tested by running github.com/wdebruij/kerneltools/psock_txring_vnet.c
ethtool -K eth0 tso off tx on
psock_txring_vnet -d $dst -s $src -i eth0 -l 2000 -n 1 -q -v
psock_txring_vnet -d $dst -s $src -i eth0 -l 2000 -n 1 -q -v -Nethtool -K eth0 tx off
psock_txring_vnet -d $dst -s $src -i eth0 -l 1000 -n 1 -q -v -G
psock_txring_vnet -d $dst -s $src -i eth0 -l 1000 -n 1 -q -v -G -Nv2:
- add EXPORT_SYMBOL_GPL(validate_xmit_skb_list)Fixes: d346a3fae3ff ("packet: introduce PACKET_QDISC_BYPASS socket option")
Signed-off-by: Willem de Bruijn
Acked-by: Eric Dumazet
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller
21 Oct, 2016
1 commit
-
Currently, GRO can do unlimited recursion through the gro_receive
handlers. This was fixed for tunneling protocols by limiting tunnel GRO
to one level with encap_mark, but both VLAN and TEB still have this
problem. Thus, the kernel is vulnerable to a stack overflow, if we
receive a packet composed entirely of VLAN headers.This patch adds a recursion counter to the GRO layer to prevent stack
overflow. When a gro_receive function hits the recursion limit, GRO is
aborted for this skb and it is processed normally. This recursion
counter is put in the GRO CB, but could be turned into a percpu counter
if we run out of space in the CB.Thanks to Vladimír Beneš for the initial bug report.
Fixes: CVE-2016-7039
Fixes: 9b174d88c257 ("net: Add Transparent Ethernet Bridging GRO support.")
Fixes: 66e5133f19e9 ("vlan: Add GRO support for non hardware accelerated vlan")
Signed-off-by: Sabrina Dubroca
Reviewed-by: Jiri Benc
Acked-by: Hannes Frederic Sowa
Acked-by: Tom Herbert
Signed-off-by: David S. Miller
19 Oct, 2016
1 commit
-
Tamir reported the following trace when processing ARP requests received
via a vlan device on top of a VLAN-aware bridge:NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]
[...]
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 4.8.0-rc7 #1
Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016
task: ffff88017edfea40 task.stack: ffff88017ee10000
RIP: 0010:[] [] netdev_all_lower_get_next_rcu+0x33/0x60
[...]
Call Trace:
[] mlxsw_sp_port_lower_dev_hold+0x5a/0xa0 [mlxsw_spectrum]
[] mlxsw_sp_router_netevent_event+0x80/0x150 [mlxsw_spectrum]
[] notifier_call_chain+0x4a/0x70
[] atomic_notifier_call_chain+0x1a/0x20
[] call_netevent_notifiers+0x1b/0x20
[] neigh_update+0x306/0x740
[] neigh_event_ns+0x4e/0xb0
[] arp_process+0x66f/0x700
[] ? common_interrupt+0x8c/0x8c
[] arp_rcv+0x139/0x1d0
[] ? vlan_do_receive+0xda/0x320
[] __netif_receive_skb_core+0x524/0xab0
[] ? dev_queue_xmit+0x10/0x20
[] ? br_forward_finish+0x3d/0xc0 [bridge]
[] ? br_handle_vlan+0xf6/0x1b0 [bridge]
[] __netif_receive_skb+0x18/0x60
[] netif_receive_skb_internal+0x40/0xb0
[] netif_receive_skb+0x1c/0x70
[] br_pass_frame_up+0xc6/0x160 [bridge]
[] ? deliver_clone+0x37/0x50 [bridge]
[] ? br_flood+0xcc/0x160 [bridge]
[] br_handle_frame_finish+0x224/0x4f0 [bridge]
[] br_handle_frame+0x174/0x300 [bridge]
[] __netif_receive_skb_core+0x329/0xab0
[] ? find_next_bit+0x15/0x20
[] ? cpumask_next_and+0x32/0x50
[] ? load_balance+0x178/0x9b0
[] __netif_receive_skb+0x18/0x60
[] netif_receive_skb_internal+0x40/0xb0
[] netif_receive_skb+0x1c/0x70
[] mlxsw_sp_rx_listener_func+0x61/0xb0 [mlxsw_spectrum]
[] mlxsw_core_skb_receive+0x187/0x200 [mlxsw_core]
[] mlxsw_pci_cq_tasklet+0x63a/0x9b0 [mlxsw_pci]
[] tasklet_action+0xf6/0x110
[] __do_softirq+0xf6/0x280
[] irq_exit+0xdf/0xf0
[] do_IRQ+0x54/0xd0
[] common_interrupt+0x8c/0x8cThe problem is that netdev_all_lower_get_next_rcu() never advances the
iterator, thereby causing the loop over the lower adjacency list to run
forever.Fix this by advancing the iterator and avoid the infinite loop.
Fixes: 7ce856aaaf13 ("mlxsw: spectrum: Add couple of lower device helper functions")
Signed-off-by: Ido Schimmel
Reported-by: Tamir Winetroub
Reviewed-by: Jiri Pirko
Acked-by: David Ahern
Signed-off-by: David S. Miller
16 Oct, 2016
1 commit
-
Pull gcc plugins update from Kees Cook:
"This adds a new gcc plugin named "latent_entropy". It is designed to
extract as much possible uncertainty from a running system at boot
time as possible, hoping to capitalize on any possible variation in
CPU operation (due to runtime data differences, hardware differences,
SMP ordering, thermal timing variation, cache behavior, etc).At the very least, this plugin is a much more comprehensive example
for how to manipulate kernel code using the gcc plugin internals"* tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
latent_entropy: Mark functions with __latent_entropy
gcc-plugins: Add latent_entropy plugin
11 Oct, 2016
1 commit
-
The __latent_entropy gcc attribute can be used only on functions and
variables. If it is on a function then the plugin will instrument it for
gathering control-flow entropy. If the attribute is on a variable then
the plugin will initialize it with random contents. The variable must
be an integer, an integer array type or a structure with integer fields.These specific functions have been selected because they are init
functions (to help gather boot-time entropy), are called at unpredictable
times, or they have variable loops, each of which provide some level of
latent entropy.Signed-off-by: Emese Revfy
[kees: expanded commit message]
Signed-off-by: Kees Cook
04 Oct, 2016
1 commit
-
This is a respin of a patch to fix a relatively easily reproducible kernel
panic related to the all_adj_list handling for netdevs in recent kernels.The following sequence of commands will reproduce the issue:
ip link add link eth0 name eth0.100 type vlan id 100
ip link add link eth0 name eth0.200 type vlan id 200
ip link add name testbr type bridge
ip link set eth0.100 master testbr
ip link set eth0.200 master testbr
ip link add link testbr mac0 type macvlan
ip link delete dev testbrThis creates an upper/lower tree of (excuse the poor ASCII art):
/---eth0.100-eth0
mac0-testbr-
\---eth0.200-eth0When testbr is deleted, the all_adj_lists are walked, and eth0 is deleted twice from
the mac0 list. Unfortunately, during setup in __netdev_upper_dev_link, only one
reference to eth0 is added, so this results in a panic.This change adds reference count propagation so things are handled properly.
Matthias Schiffer reported a similar crash in batman-adv:
https://github.com/freifunk-gluon/gluon/issues/680
https://www.open-mesh.org/issues/247which this patch also seems to resolve.
Signed-off-by: Andrew Collins
Signed-off-by: David S. Miller
26 Sep, 2016
1 commit
-
Conflicts:
net/netfilter/core.c
net/netfilter/nf_tables_netdev.cResolve two conflicts before pull request for David's net-next tree:
1) Between c73c24849011 ("netfilter: nf_tables_netdev: remove redundant
ip_hdr assignment") from the net tree and commit ddc8b6027ad0
("netfilter: introduce nft_set_pktinfo_{ipv4, ipv6}_validate()").2) Between e8bffe0cf964 ("net: Add _nf_(un)register_hooks symbols") and
Aaron Conole's patches to replace list_head with single linked list.Signed-off-by: Pablo Neira Ayuso
25 Sep, 2016
1 commit
-
This commit ensures that the rcu read-side lock is held while the
ingress hook is called. This ensures that a call to nf_hook_slow (and
ultimately nf_ingress) will be read protected.Signed-off-by: Aaron Conole
Signed-off-by: Pablo Neira Ayuso
13 Sep, 2016
1 commit
-
Conflicts:
drivers/net/ethernet/mediatek/mtk_eth_soc.c
drivers/net/ethernet/qlogic/qed/qed_dcbx.c
drivers/net/phy/KconfigAll conflicts were cases of overlapping commits.
Signed-off-by: David S. Miller
11 Sep, 2016
1 commit
-
The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either
built-in or as a module, use that macro instead of open coding the same.Using the macro makes the code more readable by helping abstract away some
of the Kconfig built-in and module enable details.Signed-off-by: Javier Martinez Canillas
Signed-off-by: David S. Miller
05 Sep, 2016
1 commit
-
Following few steps will crash kernel -
(a) Create bonding master
> modprobe bonding miimon=50
(b) Create macvlan bridge on eth2
> ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \
type macvlan
(c) Now try adding eth2 into the bond
> echo +eth2 > /sys/class/net/bond0/bonding/slaves
Bonding does lots of things before checking if the device enslaved is
busy or not.In this case when the notifier call-chain sends notifications, the
bond_netdev_event() assumes that the rx_handler /rx_handler_data is
registered while the bond_enslave() hasn't progressed far enough to
register rx_handler for the new slave.This patch adds a rx_handler check that can be performed right at the
beginning of the enslave code to avoid getting into this situation.Signed-off-by: Mahesh Bandewar
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
31 Aug, 2016
1 commit
-
After commit 145dd5f9c88f ("net: flush the softnet backlog in process
context"), we can easily batch calls to flush_all_backlogs() for all
devices processed in rollback_registered_many()Tested:
Before patch, on an idle host.
modprobe dummy numdummies=10000
perf stat -e context-switches -a rmmod dummyPerformance counter stats for 'system wide':
1,211,798 context-switches
1.302137465 seconds time elapsed
After patch:
perf stat -e context-switches -a rmmod dummy
Performance counter stats for 'system wide':
225,523 context-switches
0.721623566 seconds time elapsed
Signed-off-by: Eric Dumazet
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
27 Aug, 2016
2 commits
-
switchdev_port_fwd_mark_set() is used to set the 'offload_fwd_mark' of
port netdevs so that packets being flooded by the device won't be
flooded twice.It works by assigning a unique identifier (the ifindex of the first
bridge port) to bridge ports sharing the same parent ID. This prevents
packets from being flooded twice by the same switch, but will flood
packets through bridge ports belonging to a different switch.This method is problematic when stacked devices are taken into account,
such as VLANs. In such cases, a physical port netdev can have upper
devices being members in two different bridges, thus requiring two
different 'offload_fwd_mark's to be configured on the port netdev, which
is impossible.The main problem is that packet and netdev marking is performed at the
physical netdev level, whereas flooding occurs between bridge ports,
which are not necessarily port netdevs.Instead, packet and netdev marking should really be done in the bridge
driver with the switch driver only telling it which packets it already
forwarded. The bridge driver will mark such packets using the mark
assigned to the ingress bridge port and will prevent the packet from
being forwarded through any bridge port sharing the same mark (i.e.
having the same parent ID).Remove the current switchdev 'offload_fwd_mark' implementation and
instead implement the proposed method. In addition, make rocker - the
sole user of the mark - use the proposed method.Signed-off-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller -
Currently in process_backlog(), the process_queue dequeuing is
performed with local IRQ disabled, to protect against
flush_backlog(), which runs in hard IRQ context.This patch moves the flush operation to a work queue and runs the
callback with bottom half disabled to protect the process_queue
against dequeuing.
Since process_queue is now always manipulated in bottom half context,
the irq disable/enable pair around the dequeue operation are removed.To keep the flush time as low as possible, the flush
works are scheduled on all online cpu simultaneously, using the
high priority work-queue and statically allocated, per cpu,
work structs.Overall this change increases the time required to destroy a device
to improve slightly the packets reinjection performances.Acked-by: Hannes Frederic Sowa
Signed-off-by: Paolo Abeni
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
18 Aug, 2016
1 commit
-
Minor overlapping changes for both merge conflicts.
Resolution work done by Stephen Rothwell was used
as a reference.Signed-off-by: David S. Miller
14 Aug, 2016
1 commit
-
The idea for type_check in dev_get_nest_level() was to count the number
of nested devices of the same type (currently, only macvlan or vlan
devices).
This prevented the false positive lockdep warning on configurations such
as:eth0
Signed-off-by: David S. Miller
11 Aug, 2016
1 commit
-
Convert the per-device linked list into a hashtable. The primary
motivation for this change is that currently, we're not tracking all the
qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup
performed over the linked list by qdisc_match_from_root() is rather
expensive.The ultimate goal is to get rid of hidden qdiscs completely, which will
bring much more determinism in user experience.Reviewed-by: Cong Wang
Signed-off-by: Jiri Kosina
Signed-off-by: David S. Miller
29 Jul, 2016
1 commit
-
This changes the vfs dentry hashing to mix in the parent pointer at the
_beginning_ of the hash, rather than at the end.That actually improves both the hash and the code generation, because we
can move more of the computation to the "static" part of the dcache
setup, and do less at lookup runtime.It turns out that a lot of other hash users also really wanted to mix in
a base pointer as a 'salt' for the hash, and so the slightly extended
interface ends up working well for other cases too.Users that want a string hash that is purely about the string pass in a
'salt' pointer of NULL.* merge branch 'salted-string-hash':
fs/dcache.c: Save one 32-bit multiply in dcache lookup
vfs: make the string hashes salt the hash
20 Jul, 2016
1 commit
-
Add one new netdev op for drivers implementing the BPF_PROG_TYPE_XDP
filter. The single op is used for both setup/query of the xdp program,
modelled after ndo_setup_tc.Signed-off-by: Brenden Blanco
Signed-off-by: David S. Miller
10 Jul, 2016
1 commit
-
An important information for the napi_poll tracepoint is knowing
the work done (packets processed) by the napi_poll() call. Add
both the work done and budget, as they are related.Handle trace_napi_poll() param change in dropwatch/drop_monitor
and in python perf script netdev-times.py in backward compat way,
as python fortunately supports optional parameter handling.Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: David S. Miller
06 Jul, 2016
1 commit
-
L2 upper device needs to propagate neigh_construct/destroy calls down to
lower devices. Do this by defining default ndo functions and use them in
team, bond, bridge and vlan.Signed-off-by: Jiri Pirko
Reviewed-by: Ido Schimmel
Signed-off-by: David S. Miller
05 Jul, 2016
1 commit
-
Add functions that iterate over lower devices and find port device.
As a dependency add netdev_for_each_all_lower_dev and
netdev_for_each_all_lower_dev_rcu macro with
netdev_all_lower_get_next and netdev_all_lower_get_next_rcu shelpers.Also, add functions to return mlxsw struct according to lower device
found and mlxsw_port struct with a reference to lower device.Signed-off-by: Jiri Pirko
Reviewed-by: Ido Schimmel
Signed-off-by: David S. Miller
26 Jun, 2016
1 commit
-
Qdisc performance suffers when packets are dropped at enqueue()
time because drops (kfree_skb()) are done while qdisc lock is held,
delaying a dequeue() draining the queue.Nominal throughput can be reduced by 50 % when this happens,
at a time we would like the dequeue() to proceed as fast as possible.Even FQ is vulnerable to this problem, while one of FQ goals was
to provide some flow isolation.This patch adds a 'struct sk_buff **to_free' parameter to all
qdisc->enqueue(), and in qdisc_drop() helper.I measured a performance increase of up to 12 %, but this patch
is a prereq so that future batches in enqueue() can fly.Signed-off-by: Eric Dumazet
Acked-by: Jesper Dangaard Brouer
Signed-off-by: David S. Miller
17 Jun, 2016
2 commits
-
The space is missing after ',', and this will introduce much more
noise when checking patch around.Signed-off-by: Wei Tang
Signed-off-by: David S. Miller -
This patch fixes the checkpatch.pl error to dev.c:
ERROR: do not initialise statics to 0
Signed-off-by: Wei Tang
Signed-off-by: David S. Miller
11 Jun, 2016
2 commits
-
We always mixed in the parent pointer into the dentry name hash, but we
did it late at lookup time. It turns out that we can simplify that
lookup-time action by salting the hash with the parent pointer early
instead of late.A few other users of our string hashes also wanted to mix in their own
pointers into the hash, and those are updated to use the same mechanism.Hash users that don't have any particular initial salt can just use the
NULL pointer as a no-salt.Cc: Vegard Nossum
Cc: George Spelvin
Cc: Al Viro
Signed-off-by: Linus Torvalds -
Respect the stack's xmit_recursion limit for calls into dev_queue_xmit().
Currently, they are not handeled by the limiter when attached to clsact's
egress parent, for example, and a buggy program redirecting it to the
same device again could run into stack overflow eventually. It would be
good if we could notify an admin to give him a chance to react. We reuse
xmit_recursion instead of having one private to eBPF, so that the stack's
current recursion depth will be taken into account as well. Follow-up to
commit 3896d655f4d4 ("bpf: introduce bpf_clone_redirect() helper") and
27b29f63058d ("bpf: add bpf_redirect() helper").Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller
09 Jun, 2016
1 commit
-
When in kdump kernel, reduce memory usage by only using a single Queue
Set for multiqueue devices. So make netif_get_num_default_rss_queues()
return one, when in kdump kernel.Signed-off-by: Hariprasad Shenai
Signed-off-by: David S. Miller
08 Jun, 2016
2 commits
-
Instead of using a single bit (__QDISC___STATE_RUNNING)
in sch->__state, use a seqcount.This adds lockdep support, but more importantly it will allow us
to sample qdisc/class statistics without having to grab qdisc root lock.Signed-off-by: Eric Dumazet
Cc: Cong Wang
Cc: Jamal Hadi Salim
Signed-off-by: David S. Miller -
Note: Tom Herbert posted almost same patch 3 months back, but for
different reasons.The reasons we want to get rid of this spin_trylock() are :
1) Under high qdisc pressure, the spin_trylock() has almost no
chance to succeed.2) We loop multiple times in softirq handler, eventually reaching
the max retry count (10), and we schedule ksoftirqd.Since we want to adhere more strictly to ksoftirqd being waked up in
the future (https://lwn.net/Articles/687617/), better avoid spurious
wakeups.3) calls to __netif_reschedule() dirty the cache line containing
q->next_sched, slowing down the owner of qdisc.4) RT kernels can not use the spin_trylock() here.
With help of busylock, we get the qdisc spinlock fast enough, and
the trylock trick brings only performance penalty.Depending on qdisc setup, I observed a gain of up to 19 % in qdisc
performance (1016600 pps instead of 853400 pps, using prio+tbf+fq_codel)("mpstat -I SCPU 1" is much happier now)
Signed-off-by: Eric Dumazet
Cc: Tom Herbert
Acked-by: Tom Herbert
Signed-off-by: David S. Miller
17 May, 2016
1 commit
-
Follow-up for 8a3a4c6e7b34 ("net: make sch_handle_ingress() drop
monitor ready") to also make the egress side drop monitor ready.Also here only TC_ACT_SHOT is a clear indication that something
went wrong. Hence don't provide false positives to drop monitors
such as 'perf record -e skb:kfree_skb ...'.Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller
12 May, 2016
1 commit
-
Currently the VRF driver uses the rx_handler to switch the skb device
to the VRF device. Switching the dev prior to the ip / ipv6 layer
means the VRF driver has to duplicate IP/IPv6 processing which adds
overhead and makes features such as retaining the ingress device index
more complicated than necessary.This patch moves the hook to the L3 layer just after the first NF_HOOK
for PRE_ROUTING. This location makes exposing the original ingress device
trivial (next patch) and allows adding other NF_HOOKs to the VRF driver
in the future.dev_queue_xmit_nit is exported so that the VRF driver can cycle the skb
with the switched device through the packet taps to maintain current
behavior (tcpdump can be used on either the vrf device or the enslaved
devices).Signed-off-by: David Ahern
Signed-off-by: David S. Miller
09 May, 2016
1 commit
-
TC_ACT_STOLEN is used when ingress traffic is mirred/redirected
to say ifb.Packet is not dropped, but consumed.
Only TC_ACT_SHOT is a clear indication something went wrong.
Signed-off-by: Eric Dumazet
Cc: Jamal Hadi Salim
Acked-by: Alexei Starovoitov
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
05 May, 2016
1 commit
-
This change makes it so that we will strip the TSO_MANGLEID bit if TSO is
not present. This way we will also handle ECN correctly of TSO is not
present.Signed-off-by: Alexander Duyck
Signed-off-by: David S. Miller