09 Jun, 2016
12 commits
-
Add a helper function: dsa_cpu_port_ethtool_init() which initializes a
custom ethtool_ops structure with custom DSA ethtool operations for CPU
ports. This is a preliminary change to move the initialization outside
of net/dsa/slave.c.
Reviewed-by: Vivien Didelot
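For illustration only, the "custom ethtool_ops for CPU ports" idea can be sketched in plain C as a copy-then-override of a callback table. Everything here is hypothetical: `struct sketch_ethtool_ops` has only two hooks (the real `struct ethtool_ops` has many more), the extra four CPU-port counters are invented, and whether the real dsa_cpu_port_ethtool_init() copies the master's ops itself is an assumption.

```c
#include <string.h>

/* Hypothetical, cut-down ops table; the real struct ethtool_ops has many
 * more callbacks. */
struct sketch_ethtool_ops {
	int (*get_sset_count)(int sset);
	const char *(*get_driver)(void);
};

/* Stand-ins for the master (host) interface's own callbacks. */
static int master_sset_count(int sset)
{
	(void)sset;
	return 10;	/* pretend the master exposes 10 counters */
}

static const char *master_driver(void)
{
	return "master";
}

static const struct sketch_ethtool_ops master_ops = {
	.get_sset_count = master_sset_count,
	.get_driver = master_driver,
};

/* CPU-port override: master counters plus 4 assumed switch CPU-port counters. */
static int cpu_port_sset_count(int sset)
{
	return master_sset_count(sset) + 4;
}

/* Start from the master's ops and override only the CPU-port hooks. */
static void sketch_cpu_port_ethtool_init(struct sketch_ethtool_ops *ops)
{
	memcpy(ops, &master_ops, sizeof(*ops));
	ops->get_sset_count = cpu_port_sset_count;
}
```

After `sketch_cpu_port_ethtool_init(&ops)`, `ops.get_sset_count` reports the master's counters plus the CPU port's own, while untouched hooks such as `get_driver` still point at the master's implementation.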
Signed-off-by: Florian Fainelli
Signed-off-by: David S. Miller -
Mimic what net/dsa/dsa.c does and provide a slave MII bus by default
which will be created if the driver implements a phy_read method.
Reviewed-by: Andrew Lunn
Reviewed-by: Vivien Didelot
Signed-off-by: Florian Fainelli
Signed-off-by: David S. Miller -
Some drivers rely on these two bitmasks containing the correct values
in order to probe and initialize successfully at drv->setup() time.
Calculate the correct values for both masks as early as possible in
dsa_get_ports_dn().
Reviewed-by: Andrew Lunn
Signed-off-by: Florian Fainelli
Signed-off-by: David S. Miller -
In case we have multiple trees and switches with the same index, we
need to add another discriminating ID: the switch tree.
Reviewed-by: Andrew Lunn
Reviewed-by: Vivien Didelot
Signed-off-by: Florian Fainelli
Signed-off-by: David S. Miller -
"make htmldocs" complains otherwise:
.//net/core/gen_stats.c:168: warning: No description found for parameter 'running'
.//include/linux/netdevice.h:1867: warning: No description found for parameter 'qdisc_running_key'
Fixes: f9eb8aea2a1e ("net_sched: transform qdisc running bit into a seqcount")
Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump")
Signed-off-by: Eric Dumazet
Reported-by: kbuild test robot
Signed-off-by: David S. Miller -
When running in a kdump kernel, reduce memory usage by using only a single
Queue Set for multiqueue devices: make netif_get_num_default_rss_queues()
return one in a kdump kernel.
Signed-off-by: Hariprasad Shenai
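A minimal userspace sketch of the behaviour described above. The two globals are hypothetical stand-ins for the kernel's kdump check and online-CPU count, and the cap of 8 default queues is an assumed value for illustration; this is not the kernel's netif_get_num_default_rss_queues().

```c
#include <stdbool.h>

/* Assumed cap on the default queue count (8 here, for illustration). */
#define SKETCH_MAX_DEFAULT_RSS_QUEUES 8

/* Hypothetical stand-ins for the kdump test and the online-CPU count. */
static bool sketch_in_kdump_kernel = false;
static int sketch_online_cpus = 16;

/* Default number of RSS queues a multiqueue driver should allocate. */
static int sketch_num_default_rss_queues(void)
{
	/* A crash-dump kernel runs with little memory: use one queue set. */
	if (sketch_in_kdump_kernel)
		return 1;

	/* Otherwise one queue per online CPU, up to the default cap. */
	return sketch_online_cpus < SKETCH_MAX_DEFAULT_RSS_QUEUES ?
	       sketch_online_cpus : SKETCH_MAX_DEFAULT_RSS_QUEUES;
}
```

The kdump check comes first so that the single-queue answer applies regardless of how many CPUs the crash kernel happens to see.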
Signed-off-by: David S. Miller -
Sudarsana Reddy Kalluru says:
====================
qed/qede support for dcbnl.

This series adds the dcbnl functionality to the driver. Patch (1) adds
the qed infrastructure for querying/configuring the dcbx parameters.
Patch (2) adds the qed infrastructure for dcbnl APIs, and patch (3)
adds the qede support for dcbnl.
====================
Signed-off-by: David S. Miller
-
This patch adds the interfaces for ieee/cee dcbnl callbacks and registers
them with the kernel.
Signed-off-by: Sudarsana Reddy Kalluru
Signed-off-by: Yuval Mintz
Signed-off-by: David S. Miller -
This patch adds the implementation for both cee/ieee dcbnl callbacks by
using the qed query/config APIs.
Signed-off-by: Sudarsana Reddy Kalluru
Signed-off-by: Yuval Mintz
Signed-off-by: David S. Miller -
The query API reads the dcbx data from the device shared memory and returns it
to the caller. The config API configures the user-provided dcbx values on
the device and initiates the dcbx negotiation with the peer.
Signed-off-by: Sudarsana Reddy Kalluru
Signed-off-by: Yuval Mintz
Signed-off-by: David S. Miller -
The CONFIG_ prefix should only be used for options which
can be configured through Kconfig and not for guarding headers.
Signed-off-by: Andreas Ziegler
Signed-off-by: David S. Miller -
The CONFIG_ prefix should only be used for options which
can be configured through Kconfig and not for guarding headers.
Signed-off-by: Andreas Ziegler
Signed-off-by: David S. Miller
08 Jun, 2016
28 commits
-
When setting up ILA in a router we noticed that the encapsulation
is invoked twice: once in the route input path and again upon route
output. To resolve this we add a flag, set_csum_neutral, to
ila_update_ipv6_locator. If this flag is set and the checksum-neutral
bit is also set, we assume that checksum-neutral translation has
already been performed and take no further action. The flag is set
only in the ila_output path; it is not set for ila_input and ila_xlat.

Tested:
Used 3 netns to emulate a router and two hosts. The router
translates SIR addresses between the two destinations in the other two netns.
Verified that ping and netperf are functional.
Signed-off-by: Tom Herbert
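A hypothetical sketch of the guard described above, reduced to the two conditions the text names. `struct sketch_ila_addr` and the function are illustrative stand-ins, not the real ila_update_ipv6_locator() signature, and how the checksum-neutral bit gets set in the first place is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified ILA address: only what this sketch needs. */
struct sketch_ila_addr {
	uint64_t locator;
	bool csum_neutral;	/* checksum-neutral bit already present */
};

/* Rewrite the locator unless the translation has evidently been done.
 * Returns true if the address was modified. */
static bool sketch_update_locator(struct sketch_ila_addr *a, uint64_t new_loc,
				  bool set_csum_neutral)
{
	/* Only the output path passes set_csum_neutral = true. If the
	 * checksum-neutral bit is already set, assume the input path already
	 * translated this packet and leave it alone. */
	if (set_csum_neutral && a->csum_neutral)
		return false;

	a->locator = new_loc;
	return true;
}
```

Because ila_input and ila_xlat never pass the flag, they always translate; only the second (output) pass is suppressed.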
Signed-off-by: David S. Miller -
RFC 5961 advises to only accept RST packets containing a seq number
matching the next expected seq number instead of the whole receive
window in order to avoid spoofing attacks.

However, this situation is not optimal in the case SACK is in use at the
time the RST is sent. I recently ran into a scenario in which packet
losses were high while uploading data to a server, and userspace was
willing to frequently terminate connections by sending a RST. In
this case, the ACK sent on the receiver side (rcv_nxt) is frozen waiting
for a lost packet retransmission and SACK blocks are used to let the
client continue uploading data. At some point later on, the client sends
the RST (snd_nxt), which matches the next expected seq number of the
right-most SACK block on the receiver side, which keeps moving forward
as data is received.

In this scenario, as RFC 5961 defines, the RST SEQ doesn't match the
frozen main ACK at the receiver side and thus gets dropped, and a challenge
ACK is sent, which usually gets lost due to network conditions. The main
consequence is that the connection stays alive for a while even if it
made sense to accept the RST. This can get really bad if lots of
connections like this one are created in a few seconds, easily exhausting
the resources of the server.

For security reasons, not all SACK blocks are checked (there could be a
large number of SACK blocks, and thus of acceptable SEQ numbers). Furthermore, it
wouldn't make sense to check for RST in blocks other than the right-most
received one because the sender is not expected to be sending new data
after the RST. For simplicity, only up to the 4 most recently updated
SACK blocks (selective_acks[4] field) are compared to find the
right-most block, as those are usually the ones most likely
to contain it.

This patch was tested in a 3.18 kernel and proved to improve the
situation in the scenario described above.
Signed-off-by: Pau Espin Pedrol
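The acceptance rule described above can be sketched as follows. This is a simplified userspace illustration, not the tcp_input.c change: `struct sketch_sack_block` and the function are hypothetical, and only the wraparound-safe comparison mirrors the kernel's after() helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sequence-space "a is after b" with 32-bit wraparound (like after()). */
static bool seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}

/* Hypothetical, simplified SACK block as kept by the receiver. */
struct sketch_sack_block {
	uint32_t start_seq;
	uint32_t end_seq;
};

/* Accept a RST if its seq equals rcv_nxt (plain RFC 5961) or the end of
 * the right-most of the (at most 4) most recently updated SACK blocks. */
static bool sketch_rst_acceptable(uint32_t rst_seq, uint32_t rcv_nxt,
				  const struct sketch_sack_block *sacks,
				  int num_sacks)
{
	uint32_t rightmost = rcv_nxt;

	/* Find the right-most right edge among the recent SACK blocks. */
	for (int i = 0; i < num_sacks && i < 4; i++)
		if (seq_after(sacks[i].end_seq, rightmost))
			rightmost = sacks[i].end_seq;

	return rst_seq == rcv_nxt || rst_seq == rightmost;
}
```

Only two sequence numbers are ever acceptable (rcv_nxt and one SACK edge), which keeps the spoofing surface close to what RFC 5961 allows while covering the frozen-rcv_nxt case.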
Acked-by: Eric Dumazet
Acked-by: Neal Cardwell
Tested-by: Neal Cardwell
Signed-off-by: David S. Miller -
In the current code "ent_per_page" could be more than "conn_num", making
"conn_num" negative after the subtraction. In the next iteration
through the loop, the negative value is then treated as a very large positive
one, meaning we don't put a limit on "ent_num". This could lead to memory
corruption.
Fixes: dbb799c39717 ('qed: Initialize hardware for new protocols')
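The underflow pattern, and one way to avoid it, can be sketched with a hypothetical helper (not the qed code). It assumes connections are spread over pages of `ent_per_page` entries and shows that subtracting only the number of entries actually used keeps the remaining count from wrapping around on the last page.

```c
#include <stdint.h>

/* Spread conn_num entries over up to num_pages pages of ent_per_page
 * entries each, recording per-page usage. Returns the entries placed. */
static uint32_t sketch_fill_pages(uint32_t conn_num, uint32_t ent_per_page,
				  uint32_t num_pages, uint32_t *per_page)
{
	uint32_t total = 0;

	for (uint32_t i = 0; i < num_pages && conn_num > 0; i++) {
		/* Never take more entries than are left ... */
		uint32_t ent_num = conn_num < ent_per_page ? conn_num : ent_per_page;

		per_page[i] = ent_num;
		total += ent_num;
		/* ... and subtract what was actually used. Subtracting
		 * ent_per_page here instead would wrap conn_num to a huge
		 * value on the last, partially filled page. */
		conn_num -= ent_num;
	}
	return total;
}
```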
Signed-off-by: Dan Carpenter
Acked-by: Yuval Mintz
Signed-off-by: David S. Miller -
David Ahern says:
====================
net: vrf: Add support for local traffic to local addresses

Add support for locally originated traffic to VRF-local addresses,
be it addresses on enslaved devices or addresses on the VRF device:
$ ip addr show dev red
33: red: mtu 65536 qdisc pfifo_fast state UP group default qlen 1000
link/ether be:00:53:b5:e4:25 brd ff:ff:ff:ff:ff:ff
inet 1.1.1.1/32 scope global red
valid_lft forever preferred_lft forever
inet6 1111:1::1/128 scope global
valid_lft forever preferred_lft forever
$ ip addr show dev eth1
3: eth1: mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000
link/ether 02:e0:f9:79:34:bd brd ff:ff:ff:ff:ff:ff
inet 10.100.1.1/24 brd 10.100.1.255 scope global eth1
valid_lft forever preferred_lft forever
inet6 2100:1::1/120 scope global
valid_lft forever preferred_lft forever
inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
valid_lft forever preferred_lft forever
$ ping -c1 -I red 10.100.1.1
ping: Warning: source address might be selected on device other than red.
PING 10.100.1.1 (10.100.1.1) from 10.100.1.1 red: 56(84) bytes of data.
64 bytes from 10.100.1.1: icmp_seq=1 ttl=64 time=0.057 ms
$ ping -c1 -I red 1.1.1.1
PING 1.1.1.1 (1.1.1.1) from 1.1.1.1 red: 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=0.136 ms

--- 1.1.1.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.136/0.136/0.136/0.000 ms
$ ping6 -c1 -I red 2100:1::1
ping6: Warning: source address might be selected on device other than red.
PING 2100:1::1(2100:1::1) from 2100:1::1 red: 56 data bytes
64 bytes from 2100:1::1: icmp_seq=1 ttl=64 time=0.167 ms

--- 2100:1::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.167/0.167/0.167/0.000 ms
$ ping6 -c1 -I red 1111::1
PING 1111::1(1111::1) from 1111:1::1 red: 56 data bytes
64 bytes from 1111::1: icmp_seq=1 ttl=64 time=0.187 ms

--- 1111::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.187/0.187/0.187/0.000 ms

This change also enables use of loopback address on the VRF device:
$ ip addr add dev red 127.0.0.1/8
$ ping -c1 -I red 127.0.0.1
PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 red: 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.058 ms
====================
Signed-off-by: David S. Miller
-
Add support for locally originated traffic to VRF-local IPv6 addresses.
Similar to IPv4 a local dst is set on the skb and the packet is
reinserted with a call to netif_rx. With this patch, ping, tcp and udp
packets to a local IPv6 address are successfully routed:
$ ip addr show dev eth1
4: eth1: mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000
link/ether 02:e0:f9:1c:b9:74 brd ff:ff:ff:ff:ff:ff
inet 10.100.1.1/24 brd 10.100.1.255 scope global eth1
valid_lft forever preferred_lft forever
inet6 2100:1::1/120 scope global
valid_lft forever preferred_lft forever
inet6 fe80::e0:f9ff:fe1c:b974/64 scope link
valid_lft forever preferred_lft forever
$ ping6 -c1 -I red 2100:1::1
ping6: Warning: source address might be selected on device other than red.
PING 2100:1::1(2100:1::1) from 2100:1::1 red: 56 data bytes
64 bytes from 2100:1::1: icmp_seq=1 ttl=64 time=0.098 ms

ip6_input is exported so the VRF driver can use it for the dst input
function. The dst_alloc function for IPv4 defaults to setting the input and
output functions; IPv6's does not. VRF does not need to duplicate the Rx path
so just export the ipv6 input function.
Signed-off-by: David Ahern
Signed-off-by: David S. Miller -
Add support for locally originated traffic to VRF-local addresses. If the
destination device for an skb is the loopback or VRF device, then set
its dst to a local version of the VRF cached dst_entry and call netif_rx
to insert the packet onto the rx queue, similar to what is done for
loopback. This patch handles IPv4 support; a follow-on patch handles IPv6.

With this patch, ping, tcp and udp packets to a local IPv4 address are
successfully routed:
$ ip addr show dev eth1
4: eth1: mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000
link/ether 02:e0:f9:1c:b9:74 brd ff:ff:ff:ff:ff:ff
inet 10.100.1.1/24 brd 10.100.1.255 scope global eth1
valid_lft forever preferred_lft forever
inet6 2100:1::1/120 scope global
valid_lft forever preferred_lft forever
inet6 fe80::e0:f9ff:fe1c:b974/64 scope link
valid_lft forever preferred_lft forever
$ ping -c1 -I red 10.100.1.1
ping: Warning: source address might be selected on device other than red.
PING 10.100.1.1 (10.100.1.1) from 10.100.1.1 red: 56(84) bytes of data.
64 bytes from 10.100.1.1: icmp_seq=1 ttl=64 time=0.057 ms

This patch also enables use of the IPv4 loopback address on the VRF device:
$ ip addr add dev red 127.0.0.1/8
$ ping -c1 -I red 127.0.0.1
PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 red: 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.058 ms
Signed-off-by: David Ahern
Signed-off-by: David S. Miller -
Move the stripping of the ethernet header from is_ip_tx_frame into the
ipv4 and ipv6 outbound functions and collapse vrf_send_v4_prep into
vrf_process_v4_outbound.
Signed-off-by: David Ahern
Signed-off-by: David S. Miller -
This patch implements direct encapsulation of IPv4 and IPv6 packets
in UDP. This is done as version "1" of GUE, as explained in I-D
draft-ietf-nvo3-gue-03.

Changes here are only in the receive path; fou with IPxIPx already
supports the transmit side. Both the normal receive path and the
GRO path are modified to check the GUE version and, in the case that
the GUE version is "1", to check the IP version.

Tested:
Configuration                  1 TCP_STREAM   200 TCP_RR     TCP_RR latency 90/95/99%
IPIP with direct GUE encap     4530 Mbps      1297625 tps    135/232/444
IP4IP6 with direct GUE encap   4903 Mbps      1184481 tps    149/253/473
IP6IP6 direct GUE encap        5146 Mbps      1202879 tps    146/251/472
SIT with direct GUE encap      6111 Mbps      1250337 tps    139/241/467
Signed-off-by: Tom Herbert
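The receive-side check can be sketched as follows. It relies on a property of the GUE draft: the GUE version occupies the two most significant bits of the first byte, and version 1 (binary 01) coincides with the leading bits of the IPv4 (0100) and IPv6 (0110) version nibble, so the IP version can be read from the same byte. The enum and function names are hypothetical; the kernel's fou receive code is structured differently.

```c
#include <stdint.h>

/* Hypothetical classification of a UDP payload received on a GUE port. */
enum sketch_gue_kind {
	SKETCH_GUE_V0,		/* regular GUE header follows */
	SKETCH_GUE_V1_IPV4,	/* direct IPv4 encapsulation */
	SKETCH_GUE_V1_IPV6,	/* direct IPv6 encapsulation */
	SKETCH_GUE_INVALID,
};

static enum sketch_gue_kind sketch_classify_gue(const uint8_t *payload,
						uint32_t len)
{
	if (len < 1)
		return SKETCH_GUE_INVALID;

	/* GUE version: the two most significant bits of the first byte. */
	switch (payload[0] >> 6) {
	case 0:
		return SKETCH_GUE_V0;
	case 1:
		/* Version 1: the first nibble is the IP version itself. */
		switch (payload[0] >> 4) {
		case 4:
			return SKETCH_GUE_V1_IPV4;
		case 6:
			return SKETCH_GUE_V1_IPV6;
		}
		return SKETCH_GUE_INVALID;
	default:
		return SKETCH_GUE_INVALID;
	}
}
```

Since no extra header is present for version 1, direct encapsulation saves the GUE header bytes on every packet, which is the point of the change.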
Signed-off-by: David S. Miller -
Eric Dumazet says:
====================
net: sched: faster stats gathering

A while back, I sent one RFC patch using lockless stats gathering
on 64bit arches.

This patch series does it more cleanly, using a seqcount.
Since qdisc/class stats are written at dequeue() time,
we can ask the dequeue to change the seqcount, so that
stats readers can avoid taking the root qdisc lock
and instead use the typical read_seqcount_{begin|retry} guarded
loop.

This does not change fast path costs, as the seqcount
increments are not more expensive than the bit manipulation,
and it means readers no longer freeze the fast path.
====================
Signed-off-by: David S. Miller
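To make the reader/writer split concrete, here is a userspace sketch of seqcount-guarded counters using C11 atomics (the fence-based seqlock pattern). It is not the kernel's seqcount_t API: the names are hypothetical, and it assumes a single writer, as with the already-serialized dequeue path.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Counters protected by a sequence number: odd while a write is in progress. */
struct sketch_stats {
	atomic_uint seq;
	_Atomic uint64_t bytes;
	_Atomic uint64_t packets;
};

/* Writer: runs in the (single, serialized) dequeue path. */
static void sketch_stats_update(struct sketch_stats *s, uint64_t len)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&s->bytes,
		atomic_load_explicit(&s->bytes, memory_order_relaxed) + len,
		memory_order_relaxed);
	atomic_store_explicit(&s->packets,
		atomic_load_explicit(&s->packets, memory_order_relaxed) + 1,
		memory_order_relaxed);

	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader: takes no lock; retries if a write overlapped the snapshot. */
static void sketch_stats_read(struct sketch_stats *s, uint64_t *bytes,
			      uint64_t *packets)
{
	unsigned int begin, end;

	do {
		begin = atomic_load_explicit(&s->seq, memory_order_acquire);
		*bytes = atomic_load_explicit(&s->bytes, memory_order_relaxed);
		*packets = atomic_load_explicit(&s->packets, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&s->seq, memory_order_relaxed);
	} while ((begin & 1) || begin != end);
}
```

A stats dump calls `sketch_stats_read()` instead of taking a lock; the writer never waits for readers, which is why the fast path is no longer slowed down by dumps.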
-
Large tc dumps (tc -s {qdisc|class} sh dev ethX) done by Google BwE host
agent [1] are problematic at scale:

For each qdisc/class found in the dump, we currently lock the root qdisc
spinlock in order to get stats. Sampling stats every 5 seconds from
thousands of HTB classes is a challenge when the root qdisc spinlock is
under high pressure. Not only do the dumps take time, they also slow
down the fast path (queue/dequeue packets) by 10 % to 20 % in some cases.

An audit of existing qdiscs showed that sch_fq_codel is the only qdisc
that might need the qdisc lock in fq_codel_dump_stats() and
fq_codel_dump_class_stats().

In v2 of this patch, I now use the Qdisc running seqcount to provide
consistent reads of packets/bytes counters, regardless of 32/64 bit arches.

I also changed rate estimators to use the same infrastructure
so that they no longer need to take the root qdisc lock.

[1]
http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43838.pdf
Signed-off-by: Eric Dumazet
Cc: Cong Wang
Cc: Jamal Hadi Salim
Cc: John Fastabend
Cc: Kevin Athey
Cc: Xiaotian Pei
Signed-off-by: David S. Miller -
Instead of using a single bit (__QDISC___STATE_RUNNING)
in sch->__state, use a seqcount.

This adds lockdep support, but more importantly it will allow us
to sample qdisc/class statistics without having to grab qdisc root lock.
Signed-off-by: Eric Dumazet
Cc: Cong Wang
Cc: Jamal Hadi Salim
Signed-off-by: David S. Miller -
Sathya Perla says:
====================
be2net: patch set

Hi David, the following patch set contains three non-critical fixes that
can go into the net-next tree.

Patch 1 fixes the logic for provisioning queue pairs on VFs to also take into
account the limit on the number of TXQs, as in some profiles the number
of TXQs is less than that of RXQs.

Patch 2 enables WoL support from shutdown on Skyhawk.

Patch 3 enhances the logic for provisioning queue pairs on VFs on
SR-IOV over multi-partition configs. Each PF (partition) on a port has to
compute the number of RSS tables its VFs can use.
====================
Signed-off-by: David S. Miller
-
Currently, we do not distribute queue resources to enable RSS for VFs
in multi-channel/partition configurations.
Fix this by having each SR-IOV-capable PF calculate its share of the
15 RSS Policy Tables available per port before provisioning resources for
all the VFs.
This proportional share is computed by dividing the PF's max VFs by
the total max VFs on that port. The PF also needs to learn the number
of NIC PFs on the port and subtract that from the 15 RSS Policy Tables
on the port.
Signed-off-by: Somnath Kotur
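The proportional split described above can be written as a small formula, sketched here in C. The helper name is hypothetical, and the exact order of operations and the round-down integer division are assumptions; the commit text only says that the NIC PFs are subtracted from the 15 tables and the remainder is shared in proportion to each PF's max VFs over the port's total max VFs.

```c
#include <stdint.h>

#define SKETCH_RSS_TABLES_PER_PORT 15	/* per-port total, from the commit text */

/* Hypothetical: RSS Policy Tables this PF may hand out to its VFs.
 * tables = (15 - nic_pfs_on_port) * pf_max_vfs / port_total_max_vfs */
static uint32_t sketch_pf_rss_tables(uint32_t nic_pfs_on_port,
				     uint32_t pf_max_vfs,
				     uint32_t port_total_max_vfs)
{
	uint32_t avail;

	/* Nothing to share: no VFs on the port, or no tables left over. */
	if (port_total_max_vfs == 0 ||
	    nic_pfs_on_port >= SKETCH_RSS_TABLES_PER_PORT)
		return 0;

	avail = SKETCH_RSS_TABLES_PER_PORT - nic_pfs_on_port;
	return avail * pf_max_vfs / port_total_max_vfs;	/* rounds down */
}
```

For example, with 2 NIC PFs on a port and a PF owning 32 of the port's 64 max VFs, this gives 13 * 32 / 64 = 6 tables (6.5 rounded down).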
Signed-off-by: Sathya Perla
Signed-off-by: David S. Miller -
Skyhawk does support wake-up from the ACPI shutdown state (S5), provided the
platform supports it (e.g. through an auxiliary power source). The changes
listed below are made to fix this:

1) There's no need to defer the HW configuration of WOL to be_suspend().
Remove this from be_suspend() and move it to the be_set_wol() ethtool function
so it is configured directly in the context of ethtool. This automatically
takes care of the shutdown case.

2) The driver incorrectly uses the WOL_CAP field in the FW response to the
get_acpi_wol_cap() command to determine if WOL is enabled. Instead, the
driver must rely on the macaddr field in the response to infer the WOL state.

3) In be_get_config() during init, if we find that WOL is enabled in FW,
call pci_enable_wake() to enable the pmcsr.pme_en bit. This is needed to
support persistent WOL configuration provided by the FW on some platforms.

4) Remove code in be_set_wol() that writes to PCICFG_PM_CONTROL_OFFSET
to set the pme_en bit; pci_enable_wake() sets that.

Fixes: 028991e49 ("Enabling Wake-on-LAN is not supported in S5 state")
Signed-off-by: Sriharsha Basavapatna
Signed-off-by: Sathya Perla
Signed-off-by: David S. Miller -
When the PF driver provisions resources for VFs, it currently only looks
at the max RSS queues available to calculate the number of VF queue pairs.
This logic breaks when there are fewer TX queues than RSS queues.
This patch fixes this problem by using the max TXQs available in the
PF pool in the calculations. As part of this change, the
be_calculate_vf_qs() routine is renamed to be_calculate_vf_res(), and the
code that calculates limits on other related resources is moved there so
that all resource calculation code lives in one routine.
Signed-off-by: Suresh Reddy
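A hypothetical sketch of the corrected limit: since each queue pair consumes one RSS (RX) queue and one TX queue, the pool can supply at most the smaller of the two counts. Splitting that evenly among VFs is an assumption made for illustration; per the text, be_calculate_vf_res() also accounts for other related resources.

```c
#include <stdint.h>

/* Hypothetical: queue pairs each VF can get from the PF pool. */
static uint32_t sketch_vf_queue_pairs(uint32_t max_rss_qs, uint32_t max_txqs,
				      uint32_t num_vfs)
{
	/* A pair needs one RX and one TX queue: the scarcer kind limits us.
	 * The old logic effectively used max_rss_qs alone. */
	uint32_t pairs = max_rss_qs < max_txqs ? max_rss_qs : max_txqs;

	return num_vfs ? pairs / num_vfs : 0;
}
```

With 64 RSS queues but only 32 TXQs shared by 8 VFs, each VF gets 4 pairs rather than the 8 that looking at RSS queues alone would suggest.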
Signed-off-by: Sathya Perla
Signed-off-by: David S. Miller -
This driver adds HDLC support for the Freescale QUICC Engine.
It supports NMSI and TSA modes.
Signed-off-by: Zhao Qiang
Signed-off-by: David S. Miller -
The QE has a module to support TDM, and some other protocols
supported by the QE are based on TDM.
Add a qe-tdm lib, which provides functions for the protocols
using TDM to configure QE-TDM.
Signed-off-by: Zhao Qiang
Signed-off-by: David S. Miller -
Signed-off-by: Zhao Qiang
Signed-off-by: David S. Miller -
Add TDM clock configuration to both the QE clock system and the UCC
fast controller.
Signed-off-by: Zhao Qiang
Signed-off-by: David S. Miller -
The rx_sync and tx_sync fields are used by QE-TDM mode;
add them to struct ucc_fast_info.
Signed-off-by: Zhao Qiang
Signed-off-by: David S. Miller -
Signed-off-by: Jamal Hadi Salim
Acked-by: Cong Wang -
Jamal Hadi Salim says:
====================
net sched action timestamp improvements

Various aggregations of duplicated code, fixes and introduction of firstused
timestamp

v2: add const for source time info per suggestion from Cong
====================
Signed-off-by: David S. Miller
-
Signed-off-by: Jamal Hadi Salim
Signed-off-by: David S. Miller -
Useful to know when the action was first used for accounting
(and debugging).
Signed-off-by: Jamal Hadi Salim
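A hypothetical sketch of what a "first used" timestamp adds next to the usual install and last-use times. The struct and function names are illustrative only (loosely inspired by the idea, not the kernel's struct tcf_t), and times are in arbitrary ticks.

```c
#include <stdint.h>

/* Hypothetical per-action timestamps, in arbitrary ticks. */
struct sketch_action_tm {
	uint64_t install;
	uint64_t lastuse;
	uint64_t firstuse;	/* 0 until the action first processes a packet */
};

/* Called each time the action is applied to a packet. */
static void sketch_action_used(struct sketch_action_tm *tm, uint64_t now)
{
	if (tm->firstuse == 0)
		tm->firstuse = now;	/* recorded once, never overwritten */
	tm->lastuse = now;
}
```

The gap between `install` and `firstuse` shows how long an action sat idle before matching its first packet, which `lastuse` alone cannot reveal.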
Signed-off-by: David S. Miller -
Signed-off-by: Jamal Hadi Salim
Signed-off-by: David S. Miller -
To have a filter processed only by hardware, the skip_sw flag
should be supplied. This is in addition to the already existing skip_hw
flag (filter will be processed by software only). If no flag is
specified, the filter will be processed by both software and hardware.

If only hardware-offloaded filters exist, fl_classify() will return
without doing anything.

A follow-up userspace patch will be sent once the kernel patch is accepted.
Example:
tc filter add dev enp0s9 protocol ip prio 20 parent ffff: \
    flower \
    ip_proto 6 \
    indev enp0s9 \
    skip_sw \
    action skbedit mark 0x1234
Signed-off-by: Amir Vadai
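The three possible flag combinations described above can be sketched as two predicates. The bit values below are local copies for this sketch that are meant to mirror the kernel's TCA_CLS_FLAGS_SKIP_HW and TCA_CLS_FLAGS_SKIP_SW; check include/uapi/linux/pkt_cls.h rather than relying on them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the flag bits for this sketch only. */
#define SKETCH_SKIP_HW (1u << 0)	/* do not offload to hardware */
#define SKETCH_SKIP_SW (1u << 1)	/* do not match in software */

/* Should the filter be offloaded to hardware? */
static bool sketch_in_hw(uint32_t flags)
{
	return !(flags & SKETCH_SKIP_HW);
}

/* Should the software classifier (fl_classify()) consider this filter? */
static bool sketch_in_sw(uint32_t flags)
{
	return !(flags & SKETCH_SKIP_SW);
}
```

With no flag both predicates are true; `skip_sw` leaves only the hardware path and `skip_hw` only the software path.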
Acked-by: Jiri Pirko
Acked-by: John Fastabend
Signed-off-by: David S. Miller -
Yuval Mintz says:
====================
qed: IOV series - relax firmware requirements

In order for VFs to work, the current implementation demands that the VF's
required storm firmware be exactly the version that was loaded by
the PF, which is a very harsh requirement.
This patch series is intended to relax this: the recently submitted
firmware is intended to be forward/backward compatible in its fastpath
[slowpath is configured by the PF on behalf of the VF], and so VFs would
only be required to have the same major fastpath HSI in order to work.

Most of the other patches in this series extend the current forward
compatibility of the driver to reduce the chance of breaking PF/VF
compatibility in the future. A few are unrelated IOV changes.
====================
Signed-off-by: David S. Miller
-
If a future VF were to send the PF an unknown message, the PF today would
not send a reply. This would have 2 bad effects:
a. The VF would have to time out on the request.
b. If the VF were to send an additional message to the PF, firmware would
mark it as malicious.

Instead, if there's a valid reply address on the message, let the PF
answer and tell the VF it doesn't know the message.
Signed-off-by: Yuval Mintz
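A hypothetical sketch of the PF-side decision described above. The message types, status codes and the "non-zero reply address means valid" convention are all invented for illustration and are not the qed mailbox definitions.

```c
#include <stdint.h>

/* Hypothetical message types and outcomes, for this sketch only. */
enum sketch_vf_msg { SKETCH_MSG_ACQUIRE = 1, SKETCH_MSG_RELEASE = 2 };
enum sketch_pf_status {
	SKETCH_STATUS_SUCCESS,
	SKETCH_STATUS_NOT_SUPPORTED,
	SKETCH_STATUS_NO_REPLY,
};

struct sketch_vf_request {
	uint16_t type;
	uint64_t reply_address;	/* assumed: 0 means no valid reply address */
};

/* Returns the PF's answer, or SKETCH_STATUS_NO_REPLY if it cannot answer. */
static enum sketch_pf_status sketch_pf_handle(const struct sketch_vf_request *req)
{
	switch (req->type) {
	case SKETCH_MSG_ACQUIRE:
	case SKETCH_MSG_RELEASE:
		return SKETCH_STATUS_SUCCESS;	/* known message: handled normally */
	default:
		/* Unknown (e.g. newer) message: reply "not supported" when there
		 * is somewhere to send it, so the VF neither times out nor gets
		 * flagged for sending its next request. */
		if (req->reply_address != 0)
			return SKETCH_STATUS_NOT_SUPPORTED;
		return SKETCH_STATUS_NO_REPLY;
	}
}
```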
Signed-off-by: David S. Miller