10 Jan, 2017

40 commits

  • Support for SMC socket monitoring via netlink sockets of protocol
    NETLINK_SOCK_DIAG.

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • smc_shutdown() and smc_release() handling;
    delayed link group cleanup for link groups without connections

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • move RMBE data into user space buffer and update managing cursors

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • copy data to kernel send buffer, and trigger RDMA write

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • send and receive CDC messages (via IB message send and CQE)

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • send and receive LLC messages CONFIRM_LINK (via IB message send and CQE)

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • Prepare the link for RDMA transport:
    Create a queue pair (QP) and move it into the state Ready-To-Receive (RTR).

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • The base containers for RDMA transport are work requests and completion
    queue entries processed through InfiniBand verbs:
    * allocate and initialize these areas
    * map these areas to DMA
    * implement the basic communication, consisting of work request posting
    and reception of completion queue events

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • * allocate data RMB memory for sending and receiving
    * size depends on the maximum socket send and receive buffers
    * allocated RMBs are kept during the lifetime of the owning link group
    * map the allocated RMBs to DMA

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • * create smc_connection for SMC-sockets
    * determine suitable link group for a connection
    * create a new link group if necessary

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • * CLC (Connection Layer Control) handshake

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • Connection creation with SMC-R starts through an internal
    TCP-connection. The Ethernet interface for this TCP-connection is not
    restricted to the Ethernet interface of a RoCE device. Any existing
    Ethernet interface belonging to the same physical net can be used, as
    long as there is a defined relation between the Ethernet interface and
    some RoCE devices. This relation is defined with the help of an
    identification string called "Physical Net ID", or "pnet ID" for short.
    Information about defined pnet IDs and their related Ethernet
    interfaces and RoCE devices is stored in the SMC-R pnet table.

    A pnet table entry consists of the identifying pnet ID and the
    associated network and IB device.
    This patch adds pnet table configuration support using the
    generic netlink message interface referring to network and IB device
    by their names. Commands exist to add, delete, and display pnet table
    entries, and to flush or display the entire pnet table.

    There are cross-checks to verify whether the Ethernet interfaces
    or InfiniBand devices really exist in the system. If either device
    is not available, the pnet ID entry is not created.
    Loss of network devices and IB devices is also monitored;
    a pnet ID entry is removed when an associated network or
    IB device is removed.

    Signed-off-by: Thomas Richter
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Thomas Richter
     
  • * create a list of SMC IB-devices

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • * enable smc module loading and unloading
    * register new socket family
    * basic smc socket creation and deletion
    * use backing TCP socket to run CLC (Connection Layer Control)
    handshake of SMC protocol
    * Setup for infiniband traffic is implemented in follow-on patches.
    For now fallback to TCP socket is always used.

    Signed-off-by: Ursula Braun
    Reviewed-by: Utz Bacher
    Signed-off-by: David S. Miller

    Ursula Braun
     
    A direct call of the tcp_set_keepalive() function from the
    protocol-agnostic sock_setsockopt() function in net/core/sock.c violates
    network layering, and the newly introduced protocol (SMC-R) will need
    its own keepalive function. Therefore, add a "keepalive" function pointer
    to "struct proto", and call it from sock_setsockopt() via this pointer.

    Signed-off-by: Ursula Braun
    Reviewed-by: Utz Bacher
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • Niklas Söderlund says:

    ====================
    sh_eth: add wake-on-lan support via magic packet

    This series adds support for Wake-on-LAN using Magic Packet for a few
    models of the sh_eth driver. Patch 1/6 fixes a naming error, patch 2/6
    adds generic support to control and enable WoL, while patches 3/6 - 6/6
    enable it for different models.

    Based on top of net-next master.

    Changes since v2.
    - Fix bookkeeping for "active_count" and "event_count" reported in
    /sys/kernel/debug/wakeup_sources. Thanks Geert for noticing this.
    - Add new patch 1/6 which corrects the name of ECMR_MPDE bit, suggested
    by Sergei.
    - s/sh7743/sh7734/ in patch 5/6. Thanks Geert for spotting this.
    - Spelling improvements suggested by Sergei and Geert.
    - Add Tested-by to 3/6 and 4/6.

    Changes since v1.
    - Split generic WoL functionality and device enablement to different
    patches.
    - Enable more devices than Gen2 after feedback from Geert and
    datasheets.
    - Do not set mdp->irq_enabled = false and remove specific MagicPacket
    interrupt clearing, instead let sh_eth_error() clear the interrupt as
    for other EMAC interrupts, thanks Sergei for the suggestion.
    - Use the original return logic in sh_eth_resume().
    - Moved sh_eth_private variable *clk to top of data structure to avoid
    possible gaps due to alignment restrictions.
    - Make wol_enabled in sh_eth_private part of the already existing
    bitfield instead of a bool.
    - Do not initialize mdp->wol_enabled to 0; the struct is kzalloc'ed, so
    it's already set to 0.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • This is based on public datasheet for sh7763 which shows it has the
    same behavior and registers for WoL as other versions of sh_eth.

    Signed-off-by: Niklas Söderlund
    Signed-off-by: David S. Miller

    Niklas Söderlund
     
  • This is based on public datasheet for sh7734 which shows it has the
    same behavior and registers for WoL as other versions of sh_eth.

    Signed-off-by: Niklas Söderlund
    Signed-off-by: David S. Miller

    Niklas Söderlund
     
  • Geert Uytterhoeven reported WoL worked on his Armadillo board.

    Signed-off-by: Niklas Söderlund
    Tested-by: Geert Uytterhoeven
    Signed-off-by: David S. Miller

    Niklas Söderlund
     
  • Tested on Gen2 r8a7791/Koelsch.

    Signed-off-by: Niklas Söderlund
    Tested-by: Geert Uytterhoeven
    Signed-off-by: David S. Miller

    Niklas Söderlund
     
  • Add generic functionality to support Wake-on-LAN using MagicPacket,
    which is supported by at least a few versions of sh_eth. Only add the
    functionality for WoL; no specific sh_eth versions are marked as
    supporting WoL yet.

    WoL is enabled in the suspend callback by setting MagicPacket detection
    and disabling all interrupts except MagicPacket. In the resume path the
    driver needs to reset the hardware to rearm the WoL logic; this prevents
    the driver from simply restoring the registers and taking advantage of
    the fact that sh_eth was not suspended to reduce resume time. To reset
    the hardware the driver closes and reopens the device just like it would
    do in a normal suspend/resume scenario without WoL enabled, but it does
    both the close and the open in the resume callback, since the device
    needs to be open for WoL to work.

    One quirk needed for WoL is that the module clock needs to be prevented
    from being switched off by Runtime PM. To keep the clock alive, the
    suspend callback needs to call clk_enable() directly to increase the
    usage count of the clock. Then, when Runtime PM decreases the clock
    usage count, it won't reach 0 and the clock won't be switched off.

    Signed-off-by: Niklas Söderlund
    Signed-off-by: David S. Miller

    Niklas Söderlund
     
    This bit was wrongly named due to a typo; Sergei checked the SH7734/63
    manuals, and this bit should be named MPDE.

    Suggested-by: Sergei Shtylyov
    Signed-off-by: Niklas Söderlund
    Signed-off-by: David S. Miller

    Niklas Söderlund
     
  • Jesper Dangaard Brouer says:

    ====================
    net: optimize ICMP-reply code path

    This patchset optimizes the ICMP-reply code path for ICMP packets
    that get rate limited. A remote party can easily trigger this code
    path by sending packets to a port number with no listening service.

    Generally the patchset moves the sysctl_icmp_msgs_per_sec ratelimit
    checking to earlier in the code path and removes an allocation.

    Use-case: The specific case where I experienced this bottleneck is
    sending UDP packets to a port with no listener, which obviously results
    in the kernel replying with ICMP Destination Unreachable (type:3), Port
    Unreachable (code:3), and that reply path is the bottleneck.

    After Eric and Paolo optimized the UDP socket code, the kernel's PPS
    processing capability is lower for no-listen ports than for normal UDP
    sockets. This is bad for capacity planning when restarting a service.

    UDP no-listen benchmark 8xCPUs using pktgen_sample04_many_flows.sh:
    Baseline: 6.6 Mpps
    Patch: 14.7 Mpps
    Driver mlx5 at 50Gbit/s.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • It is possible to avoid the atomic operation in icmp{v6,}_xmit_lock
    by checking the sysctl_icmp_msgs_per_sec ratelimit before these calls,
    as pointed out by Eric Dumazet, but the BH-disabled state must be correct.

    The icmp_global_allow() call states that it must be called with BH
    disabled. This protection was given by the calls icmp_xmit_lock and
    icmpv6_xmit_lock. Thus, split local_bh_disable/enable out of these
    functions and maintain it explicitly at the callers.

    Suggested-by: Eric Dumazet
    Signed-off-by: Jesper Dangaard Brouer
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • This patch splits the global and per-(inet)peer ICMP-reply limiter
    code and moves the global limit check earlier in the packet
    processing path, thus avoiding spending cycles on ICMP replies that
    get limited/suppressed anyhow.

    The global ICMP rate limiter icmp_global_allow() is a good solution,
    it just happens too late in the process. The kernel goes through the
    full route lookup (return path) for the ICMP message, before taking
    the rate limit decision of not sending the ICMP reply.

    Details: The kernel's global rate limiter for ICMP messages was added
    in commit 4cdf507d5452 ("icmp: add a global rate limitation"). It is
    a token bucket limiter with a global lock. It brilliantly avoids
    lock congestion by only updating when 20ms (HZ/50) have elapsed. It
    can then avoid taking the lock when credit is exhausted (when under
    pressure) and the time constraint for a refill is not yet met.

    Signed-off-by: Jesper Dangaard Brouer
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • This reverts commit 9a99d4a50cb8 ("icmp: avoid allocating large struct
    on stack"), because struct icmp_bxm is not really a large struct, and
    allocating and freeing this small 112-byte struct hurts performance.

    Fixes: 9a99d4a50cb8 ("icmp: avoid allocating large struct on stack")
    Signed-off-by: Jesper Dangaard Brouer
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • …git/dhowells/linux-fs

    David Howells says:

    ====================
    afs: Refcount afs_call struct

    These patches provide some tracepoints for AFS and fix a potential leak by
    adding refcounting to the afs_call struct.

    The patches are:

    (1) Add some tracepoints for logging incoming calls and monitoring
    notifications from AF_RXRPC and data reception.

    (2) Get rid of afs_wait_mode as it didn't turn out to be as useful as
    initially expected. It can be brought back later if needed. This
    clears some stuff out that I don't then need to fix up in (4).

    (3) Allow listen(..., 0) to be used to disable listening. This makes
    shutting down the AFS cache manager server in the kernel much easier
    and the accounting simpler as we can then be sure that (a) all
    preallocated afs_call structs are released and (b) no new incoming
    calls are going to be started.

    For the moment, listening cannot be reenabled.

    (4) Add refcounting to the afs_call struct to fix a potential multiple
    release detected by static checking and add a tracepoint to follow the
    lifecycle of afs_call objects.
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     
  • Florian Fainelli says:

    ====================
    net: dsa: Make dsa_switch_ops const

    This patch series allows us to annotate dsa_switch_ops with a const
    qualifier.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Now that we have properly encapsulated things and made drivers utilize
    exported functions, we can switch dsa_switch_ops to be annotated with
    const.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • In preparation for making struct dsa_switch_ops const, encapsulate it
    within a dsa_switch_driver which has a list pointer and a pointer to
    dsa_switch_ops. This allows us to take the list_head pointer out of
    dsa_switch_ops, which is written to by {un,}register_switch_driver.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • Utilize the b53 exported functions to fill our bcm_sf2_ops structure,
    also making it clear what we utilize and what we specifically override.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • In preparation for making dsa_switch_ops const, export b53 operations
    utilized by other drivers such as bcm_sf2.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • Sergei Shtylyov says:

    ====================
    sh_eth: "intelligent checksum" related cleanups

    Here's a set of 2 patches against DaveM's 'net.git' repo, as they are based
    on a couple of patches merged there recently; however, the patches are
    destined for 'net-next.git' (once 'net.git' gets merged there next time).
    I'm cleaning up the "intelligent checksum" related code (however, the
    driver only disables this feature for now; there's no proper offload
    support yet).
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • The 'struct sh_eth_cpu_data' field indicating the "intelligent checksum"
    support was misnamed 'hw_crc' -- rename it to 'hw_checksum'.

    Signed-off-by: Sergei Shtylyov
    Signed-off-by: David S. Miller

    Sergei Shtylyov
     
  • After checking all the available manuals, I have enough information to
    conclude that the 'shift_rd0' flag is only relevant for the Ether cores
    supporting the so-called "intelligent checksum" (and hence having CSMR),
    which is indicated by the 'hw_crc' flag. Since all the relevant SoCs now
    have both these flags set, we can at last get rid of the former flag...

    Signed-off-by: Sergei Shtylyov
    Signed-off-by: David S. Miller

    Sergei Shtylyov
     
  • While in RUNNING state, phy_state_machine() checks for link changes by
    comparing phydev->link before and after calling phy_read_status().
    This works as long as it is guaranteed that phydev->link is never
    changed outside the phy_state_machine().

    If in some setups this happens, it causes the state machine to miss
    a link loss and remain RUNNING despite phydev->link being 0.

    This has been observed running a dsa setup with a process continuously
    polling the link states over ethtool each second (an SNMPD RFC-1213
    agent). Disconnecting the link on a phy followed by an ETHTOOL_GSET
    causes dsa_slave_get_settings() / dsa_slave_get_link_ksettings() to
    call phy_read_status() and thereby modify the link status, breaking
    the phy state machine.

    This patch adds a fail-safe check while in RUNNING, which moves the
    state machine to CHANGELINK when the link is gone and we are still
    RUNNING.

    Signed-off-by: Zefir Kurtisi
    Reviewed-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Zefir Kurtisi
     
  • Pull networking fixes from David Miller:

    1) Fix dumping of nft_quota entries, from Pablo Neira Ayuso.

    2) Fix out of bounds access in nf_tables discovered by KASAN, from
    Florian Westphal.

    3) Fix IRQ enabling in dp83867 driver, from Grygorii Strashko.

    4) Fix unicast filtering in be2net driver, from Ivan Vecera.

    5) tg3_get_stats64() can race with driver close and ethtool
    reconfigurations, fix from Michael Chan.

    6) Fix error handling when pass limit is reached in bpf code gen on
    x86. From Daniel Borkmann.

    7) Don't clobber switch ops and use proper MDIO nested reads and writes
    in bcm_sf2 driver, from Florian Fainelli.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (21 commits)
    net: dsa: bcm_sf2: Utilize nested MDIO read/write
    net: dsa: bcm_sf2: Do not clobber b53_switch_ops
    net: stmmac: fix maxmtu assignment to be within valid range
    bpf: change back to orig prog on too many passes
    tg3: Fix race condition in tg3_get_stats64().
    be2net: fix unicast list filling
    be2net: fix accesses to unicast list
    netlabel: add CALIPSO to the list of built-in protocols
    vti6: fix device register to report IFLA_INFO_KIND
    net: phy: dp83867: fix irq generation
    amd-xgbe: Fix IRQ processing when running in single IRQ mode
    sh_eth: R8A7740 supports packet shecksumming
    sh_eth: fix EESIPR values for SH77{34|63}
    r8169: fix the typo in the comment
    nl80211: fix sched scan netlink socket owner destruction
    bridge: netfilter: Fix dropping packets that moving through bridge interface
    netfilter: ipt_CLUSTERIP: check duplicate config when initializing
    netfilter: nft_payload: mangle ckecksum if NFT_PAYLOAD_L4CSUM_PSEUDOHDR is set
    netfilter: nf_tables: fix oob access
    netfilter: nft_queue: use raw_smp_processor_id()
    ...

    Linus Torvalds
     
  • Joao Pinto says:

    ====================
    adding new glue driver dwmac-dwc-qos-eth

    This patch set contains the porting of the synopsys/dwc_eth_qos.c driver
    to the stmmac structure. This operation resulted in the creation of a new
    platform glue driver called dwmac-dwc-qos-eth, which was based on
    dwc_eth_qos as-is.

    dwmac-dwc-qos-eth inherited the dwc_eth_qos DT bindings, to ensure that
    current and old users can continue to use it as before. The old driver
    can be seen as deprecated, since all new development will be done in
    stmmac.

    Please check each patch for implementation details.
    ====================

    Tested-by: Niklas Cassel
    Reviewed-by: Lars Persson
    Acked-by: Alexandre TORGUE
    Signed-off-by: David S. Miller

    David S. Miller
     
  • This patch adds a new glue driver called dwmac-dwc-qos-eth, which
    was based on dwc_eth_qos as-is. To ensure backward compatibility, a
    slight tweak was also added to stmmac_platform.

    Signed-off-by: Joao Pinto
    Tested-by: Niklas Cassel
    Reviewed-by: Lars Persson
    Acked-by: Alexandre TORGUE
    Signed-off-by: David S. Miller

    jpinto