10 Jan, 2017
40 commits
-
Support for SMC socket monitoring via netlink sockets of protocol
NETLINK_SOCK_DIAG.
Signed-off-by: Ursula Braun
Signed-off-by: David S. Miller -
smc_shutdown() and smc_release() handling
delayed linkgroup cleanup for linkgroups without connections
Signed-off-by: Ursula Braun
Signed-off-by: David S. Miller -
move RMBE data into user space buffer and update managing cursors
Signed-off-by: Ursula Braun
Signed-off-by: David S. Miller -
copy data to kernel send buffer, and trigger RDMA write
Signed-off-by: Ursula Braun
Signed-off-by: David S. Miller -
send and receive CDC messages (via IB message send and CQE)
Signed-off-by: Ursula Braun
Signed-off-by: David S. Miller -
send and receive LLC messages CONFIRM_LINK (via IB message send and CQE)
Signed-off-by: Ursula Braun
Signed-off-by: David S. Miller -
Prepare the link for RDMA transport:
Create a queue pair (QP) and move it into the state Ready-To-Receive (RTR).
Signed-off-by: Ursula Braun
Signed-off-by: David S. Miller -
The base containers for RDMA transport are work requests and completion
queue entries processed through Infiniband verbs:
* allocate and initialize these areas
* map these areas to DMA
* implement the basic communication consisting of work request posting
and receival of completion queue events
Signed-off-by: Ursula Braun
Signed-off-by: David S. Miller -
* allocate data RMB memory for sending and receiving
* size depends on the maximum socket send and receive buffers
* allocated RMBs are kept during life time of the owning link group
* map the allocated RMBs to DMA
Signed-off-by: Ursula Braun
Signed-off-by: David S. Miller -
* create smc_connection for SMC-sockets
* determine suitable link group for a connection
* create a new link group if necessary
Signed-off-by: Ursula Braun
Signed-off-by: David S. Miller -
* CLC (Connection Layer Control) handshake
Signed-off-by: Ursula Braun
Signed-off-by: David S. Miller -
Connection creation with SMC-R starts through an internal
TCP-connection. The Ethernet interface for this TCP-connection is not
restricted to the Ethernet interface of a RoCE device. Any existing
Ethernet interface belonging to the same physical net can be used, as
long as there is a defined relation between the Ethernet interface and
some RoCE devices. This relation is defined with the help of an
identification string called "Physical Net ID" or short "pnet ID".
Information about defined pnet IDs and their related Ethernet
interfaces and RoCE devices is stored in the SMC-R pnet table.
A pnet table entry consists of the identifying pnet ID and the
associated network and IB device.
This patch adds pnet table configuration support using the
generic netlink message interface referring to network and IB device
by their names. Commands exist to add, delete, and display pnet table
entries, and to flush or display the entire pnet table.
There are cross-checks to verify whether the Ethernet interfaces
or InfiniBand devices really exist in the system. If either device
is not available, the pnet ID entry is not created.
Loss of network devices and IB devices is also monitored;
a pnet ID entry is removed when an associated network or
IB device is removed.
Signed-off-by: Thomas Richter
Signed-off-by: Ursula Braun
Signed-off-by: David S. Miller -
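The pnet table behaviour described in the commit above (add with cross-checks, delete, flush, removal on device loss) can be modelled in user space. This is a minimal sketch, not the kernel implementation: the class name, method names, and the choice of one entry per pnet ID are assumptions made for illustration only.

```python
class PnetTable:
    """Toy model of a pnet table: pnet ID -> (network device, IB device)."""

    def __init__(self, netdevs, ibdevs):
        # Device names currently present in the modelled system.
        self.netdevs = set(netdevs)
        self.ibdevs = set(ibdevs)
        # Assumption: a single entry per pnet ID.
        self.entries = {}

    def add(self, pnet_id, netdev, ibdev):
        # Cross-check: both devices must exist, otherwise no entry is created.
        if netdev not in self.netdevs or ibdev not in self.ibdevs:
            return False
        self.entries[pnet_id] = (netdev, ibdev)
        return True

    def delete(self, pnet_id):
        # Returns True if an entry was actually removed.
        return self.entries.pop(pnet_id, None) is not None

    def flush(self):
        self.entries.clear()

    def device_removed(self, name):
        # Losing a network or IB device removes every entry that refers to it.
        self.netdevs.discard(name)
        self.ibdevs.discard(name)
        self.entries = {
            pnet_id: devs
            for pnet_id, devs in self.entries.items()
            if name not in devs
        }
```

In this model, adding an entry that names a missing device is rejected, and removing a device silently drops the entries that depended on it.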
* create a list of SMC IB-devices
Signed-off-by: Ursula Braun
Signed-off-by: David S. Miller -
* enable smc module loading and unloading
* register new socket family
* basic smc socket creation and deletion
* use backing TCP socket to run CLC (Connection Layer Control)
handshake of SMC protocol
* Setup for infiniband traffic is implemented in follow-on patches.
For now fallback to TCP socket is always used.
Signed-off-by: Ursula Braun
Reviewed-by: Utz Bacher
Signed-off-by: David S. Miller -
A direct call of the tcp_set_keepalive() function from the protocol-agnostic
sock_setsockopt() function in net/core/sock.c violates network
layering. The newly introduced protocol (SMC-R) will also need its own
keepalive function. Therefore, add a "keepalive" function pointer
to "struct proto", and call it from sock_setsockopt() via this pointer.
Signed-off-by: Ursula Braun
Reviewed-by: Utz Bacher
Signed-off-by: David S. Miller -
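The layering idea in the keepalive commit above can be illustrated with a small user-space model: generic socket code calls an optional per-protocol hook instead of naming a TCP function directly. The Python types below are stand-ins, not the kernel structures, and the field and helper names other than "keepalive" and tcp_set_keepalive are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Proto:
    """Stand-in for "struct proto": per-protocol operations."""
    name: str
    # Optional hook; protocols without keepalive support leave it unset.
    keepalive: Optional[Callable[..., None]] = None


@dataclass
class Sock:
    proto: Proto
    keepopen: bool = False
    log: list = field(default_factory=list)


def tcp_set_keepalive(sk, on):
    # Protocol-specific work lives with the protocol, not in generic code.
    sk.log.append(("tcp_keepalive", on))


def sock_setsockopt_keepalive(sk, on):
    # Generic code: dispatch through the pointer, no direct TCP reference.
    if sk.proto.keepalive is not None:
        sk.proto.keepalive(sk, on)
    sk.keepopen = on
```

A new protocol would only need to supply its own hook in its Proto instance; the generic setter stays unchanged.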
Niklas Söderlund says:
====================
sh_eth: add wake-on-lan support via magic packet
This series adds support for Wake-on-Lan using Magic Packet for a few
models of the sh_eth driver. Patch 1/6 fixes a naming error, patch 2/6
adds generic support to control and support WoL while patches 3/6 - 6/6
enable different models.
Based on top of net-next master.
Changes since v2.
- Fix bookkeeping for "active_count" and "event_count" reported in
/sys/kernel/debug/wakeup_sources. Thanks Geert for noticing this.
- Add new patch 1/6 which corrects the name of ECMR_MPDE bit, suggested
by Sergei.
- s/sh7743/sh7734/ in patch 5/6. Thanks Geert for spotting this.
- Spelling improvements suggested by Sergei and Geert.
- Add Tested-by to 3/6 and 4/6.
Changes since v1.
- Split generic WoL functionality and device enablement to different
patches.
- Enable more devices than Gen2 after feedback from Geert and
datasheets.
- Do not set mdp->irq_enabled = false and remove specific MagicPacket
interrupt clearing, instead let sh_eth_error() clear the interrupt as
for other EMAC interrupts, thanks Sergei for the suggestion.
- Use the original return logic in sh_eth_resume().
- Moved sh_eth_private variable *clk to top of data structure to avoid
possible gaps due to alignment restrictions.
- Make wol_enabled in sh_eth_private part of the already existing
bitfield instead of a bool.
- Do not initialize mdp->wol_enabled to 0, the struct is kzalloc'ed so
it's already set to 0.
====================
Signed-off-by: David S. Miller
-
This is based on public datasheet for sh7763 which shows it has the
same behavior and registers for WoL as other versions of sh_eth.
Signed-off-by: Niklas Söderlund
Signed-off-by: David S. Miller -
This is based on public datasheet for sh7734 which shows it has the
same behavior and registers for WoL as other versions of sh_eth.
Signed-off-by: Niklas Söderlund
Signed-off-by: David S. Miller -
Geert Uytterhoeven reported WoL worked on his Armadillo board.
Signed-off-by: Niklas Söderlund
Tested-by: Geert Uytterhoeven
Signed-off-by: David S. Miller -
Tested on Gen2 r8a7791/Koelsch.
Signed-off-by: Niklas Söderlund
Tested-by: Geert Uytterhoeven
Signed-off-by: David S. Miller -
Add generic functionality to support Wake-on-LAN using MagicPacket, which
is supported by at least a few versions of sh_eth. Only add
functionality for WoL, no specific sh_eth versions are marked to support
WoL yet.
WoL is enabled in the suspend callback by setting MagicPacket detection
and disabling all interrupts except MagicPacket. In the resume path the
driver needs to reset the hardware to rearm the WoL logic. This prevents
the driver from simply restoring the registers and from taking advantage
of the fact that sh_eth was not suspended to reduce resume time. To reset the
hardware the driver closes and reopens the device just like it would do
in a normal suspend/resume scenario without WoL enabled, but it both
closes and opens the device in the resume callback since the device
needs to be open for WoL to work.
One quirk needed for WoL is that the module clock needs to be prevented
from being switched off by Runtime PM. To keep the clock alive the
suspend callback needs to call clk_enable() directly to increase the
usage count of the clock. Then when Runtime PM decreases the clock usage
count it won't reach 0 and be switched off.
Signed-off-by: Niklas Söderlund
Signed-off-by: David S. Miller -
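The clock quirk in the WoL commit above comes down to usage counting: an extra enable taken in the suspend callback keeps the count above zero when Runtime PM drops its own reference. The sketch below is a toy model under that reading; the Clock class and the suspend() helper are hypothetical and do not mirror the driver's code.

```python
class Clock:
    """Toy usage-counted clock: it runs while the count is above zero."""

    def __init__(self):
        self.enable_count = 0

    def enable(self):
        self.enable_count += 1

    def disable(self):
        if self.enable_count == 0:
            raise RuntimeError("unbalanced clock disable")
        self.enable_count -= 1

    @property
    def running(self):
        return self.enable_count > 0


def suspend(clk, wol_enabled):
    """Hypothetical suspend sequence for a device whose clock Runtime PM holds."""
    if wol_enabled:
        # Extra reference taken directly by the driver for Wake-on-LAN.
        clk.enable()
    # Runtime PM drops the reference it held while the device was active.
    clk.disable()
```

With WoL enabled the count ends at 1 and the clock keeps running; without it the count reaches 0 and the clock is switched off.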
This bit was wrongly named due to a typo. Sergei checked the SH7734/63
manuals and this bit should be named MPDE.
Suggested-by: Sergei Shtylyov
Signed-off-by: Niklas Söderlund
Signed-off-by: David S. Miller -
Jesper Dangaard Brouer says:
====================
net: optimize ICMP-reply code path
This patchset optimizes the ICMP-reply code path for ICMP packets
that get rate limited. A remote party can easily trigger this code
path by sending packets to a port number with no listening service.
Generally the patchset moves the sysctl_icmp_msgs_per_sec ratelimit
checking to earlier in the code path and removes an allocation.
Use-case: The specific case where I experienced this being a bottleneck is
sending UDP packets to a port with no listener, which results
in the kernel replying with ICMP Destination Unreachable (type:3), Port
Unreachable (code:3), which causes the bottleneck.
After Eric and Paolo optimized the UDP socket code, the kernel's PPS
processing capability is lower for no-listen ports than for normal UDP
sockets. This is bad for capacity planning when restarting a service.
UDP no-listen benchmark 8xCPUs using pktgen_sample04_many_flows.sh:
Baseline: 6.6 Mpps
Patch: 14.7 Mpps
Driver mlx5 at 50Gbit/s.
====================
Signed-off-by: David S. Miller
-
It is possible to avoid the atomic operation in icmp{v6,}_xmit_lock,
by checking the sysctl_icmp_msgs_per_sec ratelimit before these calls,
as pointed out by Eric Dumazet, but the BH disabled state must be correct.
The icmp_global_allow() call states it must be called with BH
disabled. This protection was given by the calls icmp_xmit_lock and
icmpv6_xmit_lock. Thus, split out local_bh_disable/enable from these
functions and maintain it explicitly at callers.
Suggested-by: Eric Dumazet
Signed-off-by: Jesper Dangaard Brouer
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller -
This patch splits the global and per (inet)peer ICMP-reply limiter
code, and moves the global limit check to earlier in the packet
processing path. Thus, avoid spending cycles on ICMP replies that
get limited/suppressed anyhow.
The global ICMP rate limiter icmp_global_allow() is a good solution,
it just happens too late in the process. The kernel goes through the
full route lookup (return path) for the ICMP message, before taking
the rate limit decision of not sending the ICMP reply.
Details: The kernel's global rate limiter for ICMP messages got added
in commit 4cdf507d5452 ("icmp: add a global rate limitation"). It is
a token bucket limiter with a global lock. It brilliantly avoids
locking congestion by only updating when 20ms (HZ/50) have elapsed. It
can then avoid taking the lock when credit is exhausted (when under
pressure) and the time constraint for refill is not yet met.
Signed-off-by: Jesper Dangaard Brouer
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller -
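The token bucket behaviour described in the commit above (refill only after HZ/50 ticks, and no lock taken while the credit is exhausted and no refill is due) can be sketched as a user-space model. This is a simplified illustration, not the kernel's icmp_global_allow(): the HZ value, the class name, the explicit "now" argument, and the lock_taken counter are assumptions added for demonstration.

```python
import threading

HZ = 1000  # assumed tick rate (ticks per second)


class GlobalIcmpLimiter:
    """Toy token bucket with a lock-free fast path when credit is exhausted."""

    def __init__(self, msgs_per_sec=1000, burst=50, now=0):
        self.msgs_per_sec = msgs_per_sec
        self.burst = burst
        self.credit = 0
        self.stamp = now           # tick of the last refill
        self.lock = threading.Lock()
        self.lock_taken = 0        # instrumentation only, to show lock avoidance

    def allow(self, now):
        # Fast path: bucket empty and refill interval (HZ/50) not yet elapsed,
        # so the answer is "no" without touching the lock.
        if self.credit == 0:
            delta = min(now - self.stamp, HZ)
            if delta < HZ // 50:
                return False

        with self.lock:
            self.lock_taken += 1
            delta = min(now - self.stamp, HZ)
            incr = 0
            if delta >= HZ // 50:
                incr = self.msgs_per_sec * delta // HZ
                if incr:
                    self.stamp = now
            # Refill, capped at the burst size, then spend one token if possible.
            credit = min(self.credit + incr, self.burst)
            allowed = credit > 0
            if allowed:
                credit -= 1
            self.credit = credit
            return allowed
```

Calling allow() early in the reply path, before any route lookup, is what lets suppressed replies be dropped cheaply; the model only covers the limiter itself.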
This reverts commit 9a99d4a50cb8 ("icmp: avoid allocating large struct
on stack"), because struct icmp_bxm is not really a large struct, and
allocating and freeing this small 112-byte struct hurts performance.
Fixes: 9a99d4a50cb8 ("icmp: avoid allocating large struct on stack")
Signed-off-by: Jesper Dangaard Brouer
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller -
…git/dhowells/linux-fs
David Howells says:
====================
afs: Refcount afs_call struct
These patches provide some tracepoints for AFS and fix a potential leak by
adding refcounting to the afs_call struct.
The patches are:
(1) Add some tracepoints for logging incoming calls and monitoring
notifications from AF_RXRPC and data reception.
(2) Get rid of afs_wait_mode as it didn't turn out to be as useful as
initially expected. It can be brought back later if needed. This
clears some stuff out that I don't then need to fix up in (4).
(3) Allow listen(..., 0) to be used to disable listening. This makes
shutting down the AFS cache manager server in the kernel much easier
and the accounting simpler as we can then be sure that (a) all
preallocated afs_call structs are released and (b) no new incoming
calls are going to be started.
For the moment, listening cannot be reenabled.
(4) Add refcounting to the afs_call struct to fix a potential multiple
release detected by static checking and add a tracepoint to follow the
lifecycle of afs_call objects.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
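As a rough illustration of item (4) above, a reference count makes release happen exactly once, when the last holder drops its reference, and a trace of each step makes the object's lifecycle visible. The sketch below is a generic user-space model; the class, its method names, and the trace format are assumptions and are not taken from the AFS code.

```python
class Call:
    """Toy refcounted object with a lifecycle trace."""

    def __init__(self, trace):
        self.usage = 1            # the creator holds the first reference
        self.released = False
        self.trace = trace
        trace.append(("alloc", self.usage))

    def get(self):
        if self.usage <= 0:
            raise RuntimeError("get on a released call")
        self.usage += 1
        self.trace.append(("get", self.usage))

    def put(self):
        if self.usage <= 0:
            # A second release of the same reference is caught here.
            raise RuntimeError("put on a released call")
        self.usage -= 1
        self.trace.append(("put", self.usage))
        if self.usage == 0:
            # Cleanup runs once, only when the last reference is dropped.
            self.released = True
            self.trace.append(("free", 0))
```

Each path that keeps a pointer to the object takes its own reference with get(), so no path needs to guess whether another one has already released it.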
-
Florian Fainelli says:
====================
net: dsa: Make dsa_switch_ops const
This patch series allows us to annotate dsa_switch_ops with a const
qualifier.
====================
Signed-off-by: David S. Miller
-
Now that we have properly encapsulated and made drivers utilize exported
functions, we can switch dsa_switch_ops to be annotated with const.
Signed-off-by: Florian Fainelli
Signed-off-by: David S. Miller -
In preparation for making struct dsa_switch_ops const, encapsulate it
within a dsa_switch_driver which has a list pointer and a pointer to
dsa_switch_ops. This allows us to take the list_head pointer out of
dsa_switch_ops, which is written to by {un,}register_switch_driver.
Signed-off-by: Florian Fainelli
Signed-off-by: David S. Miller -
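The encapsulation step in the commit above separates the part that registration must write (the list linkage) from the part that should never change (the operations). A user-space analogy, using a frozen dataclass in place of a const struct, is sketched below; the Python types and the "registered" flag standing in for the list_head are assumptions for illustration.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class SwitchOps:
    """Stand-in for a const dsa_switch_ops: read-only after creation."""
    name: str


@dataclass
class SwitchDriver:
    """Stand-in for dsa_switch_driver: mutable linkage plus a pointer to ops."""
    ops: SwitchOps
    registered: bool = False   # plays the role of the list_head


drivers = []


def register_switch_driver(drv):
    # Only the wrapper is written to; the ops object is left untouched.
    drv.registered = True
    drivers.append(drv)


def unregister_switch_driver(drv):
    drivers.remove(drv)
    drv.registered = False
```

Any attempt to assign to a field of SwitchOps raises FrozenInstanceError, which mirrors the intent of the const qualifier at run time rather than at compile time.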
Utilize the b53 exported functions to fill our bcm_sf2_ops structure,
also making it clear what we utilize and what we specifically override.
Signed-off-by: Florian Fainelli
Signed-off-by: David S. Miller -
In preparation for making dsa_switch_ops const, export b53 operations
utilized by other drivers such as bcm_sf2.
Signed-off-by: Florian Fainelli
Signed-off-by: David S. Miller -
Sergei Shtylyov says:
====================
sh_eth: "intelligent checksum" related cleanups
Here's a set of 2 patches against DaveM's 'net.git' repo, as they are based
on a couple patches merged there recently; however, the patches are destined
for 'net-next.git' (once 'net.git' gets merged there next time). I'm cleaning
up the "intelligent checksum" related code (however, the driver only disables
this feature for now, there's no proper offload support yet).
====================
Signed-off-by: David S. Miller
-
The 'struct sh_eth_cpu_data' field indicating the "intelligent checksum"
support was misnamed 'hw_crc' -- rename it to 'hw_checksum'.
Signed-off-by: Sergei Shtylyov
Signed-off-by: David S. Miller -
After checking all the available manuals, I have enough information to
conclude that the 'shift_rd0' flag is only relevant for the Ether cores
supporting so called "intelligent checksum" (and hence having CSMR) which
is indicated by the 'hw_crc' flag. Since all the relevant SoCs now have
both these flags set, we can at last get rid of the former flag...
Signed-off-by: Sergei Shtylyov
Signed-off-by: David S. Miller -
While in RUNNING state, phy_state_machine() checks for link changes by
comparing phydev->link before and after calling phy_read_status().
This works as long as it is guaranteed that phydev->link is never
changed outside the phy_state_machine().
If in some setups this happens, it causes the state machine to miss
a link loss and remain RUNNING despite phydev->link being 0.
This has been observed running a dsa setup with a process continuously
polling the link states over ethtool each second (SNMPD RFC-1213
agent). Disconnecting the link on a phy followed by an ETHTOOL_GSET
causes dsa_slave_get_settings() / dsa_slave_get_link_ksettings() to
call phy_read_status() and with that modify the link status - and
with that bricking the phy state machine.
This patch adds a fail-safe check while in RUNNING, which causes a
move to CHANGELINK when the link is gone and we are still RUNNING.
Signed-off-by: Zefir Kurtisi
Reviewed-by: Florian Fainelli
Signed-off-by: David S. Miller -
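The missed link loss described in the commit above, and the effect of the fail-safe check, can be reproduced with a small user-space model. This is a sketch under simplifying assumptions: the Phy class, the hw_link field, and the failsafe switch are hypothetical, and only the RUNNING branch of the state machine is modelled.

```python
from dataclasses import dataclass

RUNNING, CHANGELINK = "RUNNING", "CHANGELINK"


@dataclass
class Phy:
    hw_link: bool = True    # what the hardware would report
    link: bool = True       # cached value, like phydev->link
    state: str = RUNNING

    def read_status(self):
        # Like phy_read_status(): refresh the cached link from the hardware.
        self.link = self.hw_link


def poll_running(phy, failsafe=True):
    """One poll cycle of the state machine while in RUNNING."""
    if phy.state != RUNNING:
        return
    old_link = phy.link
    phy.read_status()
    if old_link != phy.link:
        phy.state = CHANGELINK
    # Fail-safe: the link is down but nothing moved us out of RUNNING,
    # so somebody else updated the cached link between two poll cycles.
    if failsafe and not phy.link and phy.state == RUNNING:
        phy.state = CHANGELINK
```

If an outside caller runs read_status() after the cable is pulled, the before/after comparison sees no change; only the fail-safe check moves the model to CHANGELINK.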
Pull networking fixes from David Miller:
1) Fix dumping of nft_quota entries, from Pablo Neira Ayuso.
2) Fix out of bounds access in nf_tables discovered by KASAN, from
Florian Westphal.
3) Fix IRQ enabling in dp83867 driver, from Grygorii Strashko.
4) Fix unicast filtering in be2net driver, from Ivan Vecera.
5) tg3_get_stats64() can race with driver close and ethtool
reconfigurations, fix from Michael Chan.
6) Fix error handling when pass limit is reached in bpf code gen on
x86. From Daniel Borkmann.
7) Don't clobber switch ops and use proper MDIO nested reads and writes
in bcm_sf2 driver, from Florian Fainelli.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (21 commits)
net: dsa: bcm_sf2: Utilize nested MDIO read/write
net: dsa: bcm_sf2: Do not clobber b53_switch_ops
net: stmmac: fix maxmtu assignment to be within valid range
bpf: change back to orig prog on too many passes
tg3: Fix race condition in tg3_get_stats64().
be2net: fix unicast list filling
be2net: fix accesses to unicast list
netlabel: add CALIPSO to the list of built-in protocols
vti6: fix device register to report IFLA_INFO_KIND
net: phy: dp83867: fix irq generation
amd-xgbe: Fix IRQ processing when running in single IRQ mode
sh_eth: R8A7740 supports packet shecksumming
sh_eth: fix EESIPR values for SH77{34|63}
r8169: fix the typo in the comment
nl80211: fix sched scan netlink socket owner destruction
bridge: netfilter: Fix dropping packets that moving through bridge interface
netfilter: ipt_CLUSTERIP: check duplicate config when initializing
netfilter: nft_payload: mangle ckecksum if NFT_PAYLOAD_L4CSUM_PSEUDOHDR is set
netfilter: nf_tables: fix oob access
netfilter: nft_queue: use raw_smp_processor_id()
... -
Joao Pinto says:
====================
adding new glue driver dwmac-dwc-qos-eth
This patch set contains the porting of the synopsys/dwc_eth_qos.c driver
to the stmmac structure. This operation resulted in the creation of a new
platform glue driver called dwmac-dwc-qos-eth which was based on the
dwc_eth_qos as is.
dwmac-dwc-qos-eth inherited dwc_eth_qos DT bindings, to ensure that current
and old users can continue to use it as before. We can see this driver as
being deprecated, since all new development will be done in stmmac.
Please check each patch for implementation details.
====================
Tested-by: Niklas Cassel
Reviewed-by: Lars Persson
Acked-by: Alexandre TORGUE
Signed-off-by: David S. Miller -
This patch adds a new glue driver called dwmac-dwc-qos-eth which
was based on the dwc_eth_qos as is. To ensure backward compatibility a slight
tweak was also added to stmmac_platform.
Signed-off-by: Joao Pinto
Tested-by: Niklas Cassel
Reviewed-by: Lars Persson
Acked-by: Alexandre TORGUE
Signed-off-by: David S. Miller