27 Aug, 2016

26 commits

  • This patch introduces UDP replicast, a concept where we emulate
    multicast by sending multiple unicast messages to configured peers.

    The purpose of replicast is mainly to be able to use TIPC in cloud
    environments where IP multicast is disabled. Using replicas to unicast
    multicast messages is costly as we have to copy each skb and send the
    copies individually.

    Signed-off-by: Richard Alpe
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Richard Alpe
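
As an illustration of the replicast idea described above, here is a minimal user-space sketch in C. It is not the TIPC implementation: `struct peer`, `unicast_send()` and `replicast_send()` are hypothetical names, and "sending" is modelled as copying the message into each peer's buffer. It shows why the approach costs one copy and one send per configured peer.

```c
#include <stddef.h>
#include <string.h>

#define MAX_MSG 64  /* hypothetical message size limit for this sketch */

/* Hypothetical stand-in for one configured unicast peer. */
struct peer {
    char msg[MAX_MSG];  /* last message "delivered" to this peer */
    size_t len;
};

/* Hypothetical unicast transmit: stores a private copy in the peer. */
static int unicast_send(struct peer *p, const char *buf, size_t len)
{
    if (len > MAX_MSG)
        return -1;
    memcpy(p->msg, buf, len);
    p->len = len;
    return 0;
}

/*
 * Emulate multicast by sending one independent copy to every peer.
 * Returns the number of peers reached, or -1 on the first failure.
 */
static int replicast_send(struct peer *peers, size_t npeers,
                          const char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < npeers; i++) {
        if (unicast_send(&peers[i], buf, len) != 0)
            return -1;
    }
    return (int)npeers;
}
```

The cost grows linearly with the number of peers, which matches the commit's remark that replicast is costly compared with true multicast.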
     
  • Add a function to check if a tipc UDP media address is a multicast
    address or not. This is a purely cosmetic change.

    Signed-off-by: Richard Alpe
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Richard Alpe
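
A check of this kind could look roughly like the following user-space sketch. The `struct udp_addr` container and the function name are assumptions made for illustration and do not reflect the TIPC media address layout. The tests themselves are the standard ones: IPv4 multicast is 224.0.0.0/4 and IPv6 multicast is ff00::/8.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Hypothetical address container, not the TIPC media address layout. */
struct udp_addr {
    int family;  /* AF_INET or AF_INET6 */
    union {
        struct in_addr v4;
        struct in6_addr v6;
    } ip;
};

/* Returns true if the address is an IPv4 or IPv6 multicast address. */
static bool udp_addr_is_mcast(const struct udp_addr *a)
{
    if (a->family == AF_INET)
        /* 224.0.0.0/4: the top four bits are 1110 */
        return (ntohl(a->ip.v4.s_addr) & 0xf0000000u) == 0xe0000000u;
    if (a->family == AF_INET6)
        /* ff00::/8: the first byte is 0xff */
        return a->ip.v6.s6_addr[0] == 0xff;
    return false;
}
```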
     
  • Split the UDP send function into two. One callback that prepares the
    skb and one transmit function that sends the skb. This will come in
    handy in later patches, when we introduce UDP replicast.

    Signed-off-by: Richard Alpe
    Reviewed-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Richard Alpe
     
  • Split the UDP netlink parse function so that it only parses one
    netlink attribute at a time. This makes the parse function more
    generic and allows future UDP API functions to use it for parsing.

    Signed-off-by: Richard Alpe
    Reviewed-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Richard Alpe
     
  • And while at it, remove the unnecessary writing of zeroes to the CPU_MASK_CLEAR
    register since it has no functional use.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • Add tso/tso6 support to the alx driver.
    Based on information from the downstream driver at github.com/qca/alx

    Signed-off-by: Tobias Regnery
    Signed-off-by: David S. Miller

    Tobias Regnery
     
  • Nelson Chang says:

    ====================
    net: ethernet: mediatek: modify to use the PDMA for Ethernet RX

    This series has some modifications and refinements to support Ethernet RX by the PDMA.

    changes since v4:
    - Remove the redundant OR operation in mtk_hw_init()

    changes since v3:
    - Add GDM hardware settings to send packets to PDMA for RX

    changes since v2:
    - Fix the bugs of PDMA cpu index and interrupt settings in mtk_poll_rx()

    changes since v1:
    - Modify to use the PDMA instead of the QDMA for Ethernet RX
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Because we now use the PDMA as the Ethernet RX DMA engine,
    this patch sets the GDM to send packets to the PDMA for RX.

    Acked-by: John Crispin
    Signed-off-by: Nelson Chang
    Signed-off-by: David S. Miller

    Nelson Chang
     
  • Because the PDMA has richer features than the QDMA for Ethernet RX
    (such as multiple RX rings, HW LRO, etc.),
    this patch switches Ethernet RX handling to the PDMA.

    Acked-by: John Crispin
    Signed-off-by: Nelson Chang
    Signed-off-by: David S. Miller

    Nelson Chang
     
  • Florian Fainelli says:

    ====================
    net: dsa: Make bcm_sf2 utilize b53_common

    This patch series makes the bcm_sf2 driver utilize a large number of the core
    functions offered by the b53_common driver since the SWITCH_CORE registers are
    mostly register compatible with the switches driven by b53_common.

    In order to accomplish that, we just override the dsa_driver_ops callbacks that
    we need to. There is still integration specific logic from the bcm_sf2 that we
    cannot absorb into b53_common because it is just not there, mostly in the area
    of link management and power management, but most of the features are within
    b53_common now: VLAN, FDB, bridge

    Along the process, we also improve support for the BCM58xx SoCs, since those
    also have the same version of the switching IP that 7445 has (for which bcm_sf2
    was developed).

    Changes in v3:

    - rebase against 145dd5f9c88f6ee645662df0be003e8f04bdae93 ("net: flush the
    softnet backlog in process context")

    Changes in v2:

    - rebased against "net: dsa: rename switch operations structure"
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Now that we are using b53_common for most VLAN, FDB and bridge
    operations, delete all the redundant code that we had in bcm_sf2.c to
    keep only the integration specific logic that we have to deal with:
    power management, link management and the external interfaces (RGMII,
    MDIO).

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • The Broadcom Starfighter2 is almost entirely register compatible with
    B53, yet for historical reasons came up first in the tree and is now
    being updated to utilize b53_common.c to the fullest extent possible. A
    few things need to be adjusted to allow that:

    - the switch "core" registers currently operate on a 32-bit address,
    whereas b53 passes a page + reg pair to offset from, so we need to
    convert that, thankfully there is a generic formula to do that

    - the link management is not self contained with the B53/CORE register
    set, but instead is in the SWITCH_REG block which is part of the
    integration glue logic, so we keep that entirely custom here because
    this really is part of the existing bcm_sf2 implementation

    - there are additional power management constraints on the port's
    memories that make us keep the port_enable/disable callbacks custom
    for now, also, we support tagging whereas b53_common does not support
    that yet

    All the VLAN and bridge code is entirely identical though, so avoid
    duplicating it. Other things will be migrated in the future like EEE and
    possibly Wake-on-LAN.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
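
The commit mentions a generic formula for turning a b53 page + reg pair into a 32-bit core register address, but does not spell it out. The sketch below shows one plausible shape of such a conversion. It assumes each page occupies a 1 KiB window and that registers are spaced at 32-bit intervals, and `page_reg_to_addr()` is a hypothetical helper name. Treat the shift amounts as assumptions to be checked against the driver's register header.

```c
#include <stdint.h>

/*
 * Assumed layout: page selects a 1 KiB window (shift by 10) and the
 * register offset is scaled to a 32-bit word address (shift by 2).
 */
static inline uint32_t page_reg_to_addr(uint8_t page, uint8_t reg)
{
    return ((uint32_t)page << 10) | ((uint32_t)reg << 2);
}
```

Under these assumptions, page 0x02 and register 0x10 map to offset 0x840 within the core register block.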
     
  • In order to migrate the bcm_sf2 driver over to the b53 driver for most
    VLAN/FDB/bridge operations, we need to add support for the "join all
    VLANs" register and behavior which allows us to make a given port join
    all VLANs and avoid setting specific VLAN entries when it is leaving the
    bridge.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • The 58xx and 7445 chips use the Starfighter2 code, define its MIB layout
    and introduce a helper function: is58xx() which checks for both of these
    IDs for now.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • Allocate a device entry for the Broadcom BCM7445 integrated switch
    currently backed by bcm_sf2.c. Since this is the latest generation, it
    has 4 ARL entries, 4K VLANs and uses Port 8 for the CPU/IMP port.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • In order to allow drivers to override specific dsa_switch_driver
    callbacks, initialize ds->ops to b53_switch_ops earlier, which avoids
    having to expose this structure to glue drivers.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • Jiri Pirko says:

    ====================
    mlxsw: Introduce support for offload forward mark

    Ido says:
    This patchset enables the forwarding of certain control packets by the
    device instead of relying on the CPU to do the forwarding.

    The first two patches simplify the current switchdev offload forward
    infrastructure and make it usable for stacked devices. This is done by
    moving the packet and port marking to the bridge driver instead of the
    switch driver.

    Patches 3-5 add the mlxsw specific bits to support the forward mark.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Instead of trapping certain packets to the CPU and then relying on it to
    flood them we can instead make the device mirror them.

    The following packet types are mirrored:

    * DHCP: Broadcast packets that should be flooded by the device, but also
    trapped in case CPU is running the DHCP server.

    * IGMP query: Multicast packets that need to be forwarded to other
    bridge ports, but also trapped so that receiving netdev will be marked
    as a router port by the bridge driver.

    * ARP request: Broadcast packets that should be forwarded to other
    bridge ports, but also trapped in case requested IP is of the local
    machine.

    * ARP response: Unicast packets that should be forwarded by the bridge
    but also trapped in case response is directed at us.

    Set the trap action of such packets to mirror and mark them using
    'offload_fwd_mark' to prevent the bridge driver from forwarding them
    itself.

    Note that OSPF packets are also marked despite their action being trap.
    The reason for this is that the device traps such packets in the
    pipeline after they were already flooded.

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
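
The per-packet-type policy described above can be summarised as a small lookup. This is only a sketch: the enum and struct names are hypothetical, and the default case (plain trap, no mark) is an assumption about packet types the commit does not list.

```c
#include <stdbool.h>

/* Hypothetical packet classes taken from the commit message. */
enum pkt_type {
    PKT_DHCP,
    PKT_IGMP_QUERY,
    PKT_ARP_REQUEST,
    PKT_ARP_RESPONSE,
    PKT_OSPF,
    PKT_OTHER,
};

enum rx_action { ACTION_TRAP, ACTION_MIRROR };

struct rx_policy {
    enum rx_action action;  /* what the device does with the packet */
    bool fwd_mark;          /* set 'offload_fwd_mark' so the bridge skips it */
};

static struct rx_policy rx_policy_for(enum pkt_type t)
{
    switch (t) {
    case PKT_DHCP:
    case PKT_IGMP_QUERY:
    case PKT_ARP_REQUEST:
    case PKT_ARP_RESPONSE:
        /* Device forwards and also hands a copy to the CPU. */
        return (struct rx_policy){ ACTION_MIRROR, true };
    case PKT_OSPF:
        /* Trapped, but only after the device already flooded it. */
        return (struct rx_policy){ ACTION_TRAP, true };
    default:
        /* Assumption: other trapped packets are left for the CPU to forward. */
        return (struct rx_policy){ ACTION_TRAP, false };
    }
}
```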
     
  • Up until now we only trapped packets to CPU, but we are going to allow
    some packets to be mirrored (trap & forward) to CPU.

    Extend the Rx listener with 'action' member.

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • Instead of copying & pasting the same struct initialization for every
    Rx listener, just use a macro.

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
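
The refactoring pattern is the usual one of hiding a repeated designated initializer behind a macro. A self-contained sketch follows. The struct fields, trap IDs and macro name are hypothetical and only loosely modelled on what a listener table might contain.

```c
#include <stddef.h>

typedef void (*rx_handler_t)(const void *pkt, size_t len);

/* Hypothetical listener descriptor. */
struct rx_listener {
    rx_handler_t func;
    int trap_id;
};

/* Hypothetical shared handler used by every table entry. */
static void rx_default_handler(const void *pkt, size_t len)
{
    (void)pkt;
    (void)len;
}

/* Hypothetical trap identifiers. */
enum { TRAP_ID_STP = 1, TRAP_ID_LLDP, TRAP_ID_DHCP };

/* One macro replaces the copy-pasted initializer for each entry. */
#define RX_LISTENER(_trap_id) \
    { .func = rx_default_handler, .trap_id = TRAP_ID_##_trap_id }

static const struct rx_listener rx_listeners[] = {
    RX_LISTENER(STP),
    RX_LISTENER(LLDP),
    RX_LISTENER(DHCP),
};
```

Adding a new field later (such as a per-listener action) then means touching the macro once instead of every table entry.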
     
  • switchdev_port_fwd_mark_set() is used to set the 'offload_fwd_mark' of
    port netdevs so that packets being flooded by the device won't be
    flooded twice.

    It works by assigning a unique identifier (the ifindex of the first
    bridge port) to bridge ports sharing the same parent ID. This prevents
    packets from being flooded twice by the same switch, but will flood
    packets through bridge ports belonging to a different switch.

    This method is problematic when stacked devices are taken into account,
    such as VLANs. In such cases, a physical port netdev can have upper
    devices being members in two different bridges, thus requiring two
    different 'offload_fwd_mark's to be configured on the port netdev, which
    is impossible.

    The main problem is that packet and netdev marking is performed at the
    physical netdev level, whereas flooding occurs between bridge ports,
    which are not necessarily port netdevs.

    Instead, packet and netdev marking should really be done in the bridge
    driver with the switch driver only telling it which packets it already
    forwarded. The bridge driver will mark such packets using the mark
    assigned to the ingress bridge port and will prevent the packet from
    being forwarded through any bridge port sharing the same mark (i.e.
    having the same parent ID).

    Remove the current switchdev 'offload_fwd_mark' implementation and
    instead implement the proposed method. In addition, make rocker - the
    sole user of the mark - use the proposed method.

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
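
The proposed scheme boils down to two small decisions in the bridge: mark the packet on ingress if the switch driver says it was already forwarded, and refuse to forward it through any port carrying the same mark. The sketch below models just that logic. The struct and function names are hypothetical, and a mark of 0 is assumed to mean "not forwarded by hardware".

```c
#include <stdbool.h>

/* Hypothetical bridge port: ports with the same parent ID share a mark. */
struct bridge_port {
    int fwd_mark;
};

/* Hypothetical packet state: 0 means hardware did not forward it. */
struct packet {
    int fwd_mark;
};

/* On ingress, inherit the port's mark only if hardware already forwarded. */
static void mark_on_ingress(struct packet *pkt, const struct bridge_port *in,
                            bool hw_forwarded)
{
    pkt->fwd_mark = hw_forwarded ? in->fwd_mark : 0;
}

/* On egress, skip ports that belong to the switch that already forwarded. */
static bool may_forward(const struct packet *pkt, const struct bridge_port *out)
{
    return pkt->fwd_mark == 0 || pkt->fwd_mark != out->fwd_mark;
}
```

Because the mark lives on bridge ports rather than on physical netdevs, a VLAN device stacked on a physical port can carry a different mark for each bridge it joins.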
     
  • switchdev_port_same_parent_id() currently expects port netdevs, but we
    need it to support stacked devices in the next patch, so drop the
    NO_RECURSE flag.

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • Remove unused and useless priv_size member from struct devlink_ops.

    Cc: Jiri Pirko
    Signed-off-by: Ivan Vecera
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ivan Vecera
     
  • Currently in process_backlog(), the process_queue dequeuing is
    performed with local IRQ disabled, to protect against
    flush_backlog(), which runs in hard IRQ context.

    This patch moves the flush operation to a work queue and runs the
    callback with bottom half disabled to protect the process_queue
    against dequeuing.
    Since process_queue is now always manipulated in bottom half context,
    the irq disable/enable pair around the dequeue operation are removed.

    To keep the flush time as low as possible, the flush
    works are scheduled on all online cpu simultaneously, using the
    high priority work-queue and statically allocated, per cpu,
    work structs.

    Overall, this change increases the time required to destroy a device
    in exchange for slightly better packet reinjection performance.

    Acked-by: Hannes Frederic Sowa
    Signed-off-by: Paolo Abeni
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Paolo Abeni
     
  • When I added support to export the vlan entry flags via xstats I forgot to
    add support for the pvid since it is manually matched, so check if the
    entry matches the vlan_group's pvid and set the flag appropriately.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
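
The fix amounts to one comparison at export time. A minimal sketch, with a hypothetical flag value and struct layout (the real bridge structures differ):

```c
#include <stdint.h>

#define VLAN_FLAG_PVID 0x2u  /* hypothetical flag bit for this sketch */

/* Hypothetical VLAN entry: the PVID is not stored in its flags. */
struct vlan_entry {
    uint16_t vid;
    uint16_t flags;
};

/* Compute the flags to export, adding PVID if the entry matches it. */
static uint16_t vlan_export_flags(const struct vlan_entry *v,
                                  uint16_t group_pvid)
{
    uint16_t flags = v->flags;

    if (v->vid == group_pvid)
        flags |= VLAN_FLAG_PVID;
    return flags;
}
```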
     
  • Commit b17c706987fa ("loopback: sctp: add NETIF_F_SCTP_CSUM to device
    features") added NETIF_F_SCTP_CRC to device features for lo device to
    improve the performance of sctp over lo.

    This patch is to add NETIF_F_SCTP_CRC to device features for veth to
    improve the performance of sctp over veth.

    Before this patch:
    ip netns exec cs_client netperf -H 10.167.12.2 -t SCTP_STREAM -- -m 10K
    Recv Send Send
    Socket Socket Message Elapsed
    Size Size Size Time Throughput
    bytes bytes bytes secs. 10^6bits/sec

    212992 212992 10240 10.00 1117.16

    After this patch:
    ip netns exec cs_client netperf -H 10.167.12.2 -t SCTP_STREAM -- -m 10K
    Recv Send Send
    Socket Socket Message Elapsed
    Size Size Size Time Throughput
    bytes bytes bytes secs. 10^6bits/sec

    212992 212992 10240 10.20 1415.22

    Tested-by: Li Shuang
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     

26 Aug, 2016

7 commits

  • Since we keep shadow copies of which interrupt sources are enabled
    through the intrl2_*_mask_{set,clear} macros, make sure that the
    ordering in which we do these two operations: update the copy, then
    unmask the register is correct.

    This is not currently a problem because we actually do not use them, but
    we will in a subsequent patch optimizing register accesses, so better be
    safe here.

    Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
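
The ordering concern can be illustrated with a simplified model. Here the hardware mask is a single word standing in for the real MASK_SET/MASK_CLEAR register pair, and all names are hypothetical. The unmask path follows the order stated in the commit (copy first, then register); the mask path shown is an assumed mirror image.

```c
#include <stdint.h>

/* Simplified model of an interrupt block with a software shadow copy. */
struct intr_block {
    uint32_t shadow_mask;       /* software copy: set bit = source masked */
    volatile uint32_t hw_mask;  /* stand-in for the MMIO mask state */
};

/* Unmask sources: update the shadow first, then let interrupts through. */
static void intr_mask_clear(struct intr_block *b, uint32_t bits)
{
    b->shadow_mask &= ~bits;
    b->hw_mask &= ~bits;
}

/* Mask sources: assumed reverse order, silence hardware, then the copy. */
static void intr_mask_set(struct intr_block *b, uint32_t bits)
{
    b->hw_mask |= bits;
    b->shadow_mask |= bits;
}
```

With this ordering, an interrupt handler that consults the shadow copy should never see a source as masked while the hardware can already deliver it.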
     
  • per_cpu_inc() is faster (at least on x86) than per_cpu_ptr(xxx)++;

    Signed-off-by: Eric Dumazet
    Acked-by: John Fastabend
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Adds SNMP counter for drops caused by MD5 mismatches.

    The current syslog might help, but a counter is more precise and helps
    monitoring.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • TCP MD5 mismatches do increment sk_drops counter in all states but
    SYN_RECV.

    This is very unlikely to happen in the real world, but worth adding
    to help diagnostics.

    We increase the parent (listener) sk_drops.

    Signed-off-by: Eric Dumazet
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Fixes the following sparse warning:

    drivers/net/vmxnet3/vmxnet3_drv.c:1645:1: warning:
    symbol 'vmxnet3_rq_destroy_all_rxdataring' was not declared. Should it be static?

    Signed-off-by: Wei Yongjun
    Signed-off-by: Shrikrishna Khare
    Signed-off-by: David S. Miller

    Wei Yongjun
     
  • Fix to return error code -ENOMEM from the dma_map_single error
    handling case instead of 0, as done elsewhere in this function.

    Fixes: 032c5e82847a ("Driver for IBM System i/p VNIC protocol")
    Signed-off-by: Wei Yongjun
    Signed-off-by: David S. Miller

    Wei Yongjun
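
This class of bug is easy to reproduce in a goto-based cleanup path: if the return code is not set before jumping to the unwind label, the function reports success. The user-space sketch below uses a hypothetical `map_buffer()` as a stand-in for the DMA mapping call.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical mapping helper: returns NULL to simulate a mapping error. */
static void *map_buffer(size_t len, int fail)
{
    return fail ? NULL : malloc(len);
}

static int setup_buffers(int fail_second)
{
    void *a, *b;
    int rc = 0;

    a = map_buffer(64, 0);
    if (!a)
        return -ENOMEM;

    b = map_buffer(64, fail_second);
    if (!b) {
        /* Without this assignment rc stays 0 and the failure is hidden. */
        rc = -ENOMEM;
        goto free_a;
    }

    free(b);
free_a:
    free(a);
    return rc;
}
```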
     
  • Remove an open coded simple_open() function and replace file
    operations references to the function with simple_open()
    instead.

    Generated by: scripts/coccinelle/api/simple_open.cocci

    Signed-off-by: Wei Yongjun
    Signed-off-by: David S. Miller

    Wei Yongjun
     

25 Aug, 2016

7 commits

  • This allows a privileged process to filter by socket mark when
    dumping sockets via INET_DIAG_BY_FAMILY. This is useful on
    systems that use mark-based routing such as Android.

    The ability to filter socket marks requires CAP_NET_ADMIN, which
    is consistent with other privileged operations allowed by the
    SOCK_DIAG interface such as the ability to destroy sockets and
    the ability to inspect BPF filters attached to packet sockets.

    Tested: https://android-review.googlesource.com/261350
    Signed-off-by: Lorenzo Colitti
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Lorenzo Colitti
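
A mark filter of this sort typically reduces to a masked comparison plus a capability check before the filter is accepted. The following is a sketch only: the struct, the function names, the mask field and the `-EPERM` return value are assumptions for illustration, not the SOCK_DIAG ABI.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical filter condition: match (sk_mark & mask) against mark. */
struct mark_cond {
    uint32_t mark;
    uint32_t mask;
};

/* Does a socket's mark satisfy the condition? */
static bool mark_matches(uint32_t sk_mark, const struct mark_cond *c)
{
    return (sk_mark & c->mask) == c->mark;
}

/* Reject mark filters from callers lacking the admin capability. */
static int mark_filter_audit(bool has_net_admin)
{
    return has_net_admin ? 0 : -EPERM;
}
```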
     
  • This simplifies the code a bit and also allows inet_diag_bc_audit
    to send to userspace an error that isn't EINVAL.

    Signed-off-by: Lorenzo Colitti
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Lorenzo Colitti
     
  • Now that the dsa_switch_driver structure contains only function pointers
    as it is supposed to, rename it to the more appropriate dsa_switch_ops,
    consistent with other operations structures in the kernel.

    No functional changes here, basically just the result of something like:
    s/dsa_switch_driver *drv/dsa_switch_ops *ops/g

    However keep the {un,}register_switch_driver functions and their
    dsa_switch_drivers list as is, since they represent the -- likely to be
    deprecated soon -- legacy DSA registration framework.

    In the meantime, also fix the following checks from checkpatch.pl to
    make it happy with this patch:

    CHECK: Comparison to NULL could be written "!ops"
    #403: FILE: net/dsa/dsa.c:470:
    + if (ops == NULL) {

    CHECK: Comparison to NULL could be written "ds->ops->get_strings"
    #773: FILE: net/dsa/slave.c:697:
    + if (ds->ops->get_strings != NULL)

    CHECK: Comparison to NULL could be written "ds->ops->get_ethtool_stats"
    #824: FILE: net/dsa/slave.c:785:
    + if (ds->ops->get_ethtool_stats != NULL)

    CHECK: Comparison to NULL could be written "ds->ops->get_sset_count"
    #835: FILE: net/dsa/slave.c:798:
    + if (ds->ops->get_sset_count != NULL)

    total: 0 errors, 0 warnings, 4 checks, 784 lines checked

    Signed-off-by: Vivien Didelot
    Acked-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Vivien Didelot
     
  • When ndo_set_rx_mode() is called for bnx2x, as part of the process of
    configuring the new MAC address filters [both unicast & multicast], the
    driver begins by flushing the existing configuration, then iterates over
    the network device's list of addresses and configures those instead.

    This has the side-effect of creating a short gap where traffic wouldn't
    be properly classified, as no filters are configured in HW.
    While for unicasts this is rather insignificant [as unicast MACs don't
    frequently change while interface is actually running],
    for multicast traffic it does pose an issue as there are multicast-based
    networks where new multicast groups would constantly be removed and
    added.

    This patch tries to remedy this [at least for the newer adapters] -
    Instead of flushing & reconfiguring all existing multicast filters,
    the driver would instead create the approximate hash match that would
    result from the required filters. It would then compare it against the
    currently configured approximate hash match, and only add and remove the
    delta between those.

    Signed-off-by: Yuval Mintz
    Signed-off-by: David S. Miller

    Yuval Mintz
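
The delta computation can be pictured as plain bit-vector arithmetic on two approximate-match tables. The sketch below assumes a 256-bin table and uses hypothetical names. How a MAC address maps to a bin is left out, since the commit does not describe it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_BINS  256  /* assumed table size for this sketch */
#define HASH_WORDS (HASH_BINS / 64)

/* Approximate multicast match: one bit per hash bin. */
struct mcast_hash {
    uint64_t bits[HASH_WORDS];
};

static void set_bin(struct mcast_hash *h, unsigned int bin)
{
    h->bits[bin / 64] |= (uint64_t)1 << (bin % 64);
}

static bool test_bin(const struct mcast_hash *h, unsigned int bin)
{
    return (h->bits[bin / 64] >> (bin % 64)) & 1;
}

/*
 * Compare the currently configured table with the requested one and
 * report only the bins that must be added or removed.
 */
static void hash_delta(const struct mcast_hash *cur,
                       const struct mcast_hash *req,
                       struct mcast_hash *to_add,
                       struct mcast_hash *to_del)
{
    size_t i;

    for (i = 0; i < HASH_WORDS; i++) {
        to_add->bits[i] = req->bits[i] & ~cur->bits[i];
        to_del->bits[i] = cur->bits[i] & ~req->bits[i];
    }
}
```

Bins present in both tables are never touched, so traffic for unchanged groups keeps matching while the update is applied.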
     
  • …l/git/dhowells/linux-fs

    David Howells says:

    ====================
    rxrpc: Add better client conn management strategy

    These two patches add a better client connection management strategy. They
    need to be applied on top of the just-posted fixes.

    (1) Duplicate the connection list and separate out procfs iteration from
    garbage collection. This is necessary for the next patch as with that
    client connections no longer appear on a single list and may not
    appear on a list at all - and really don't want to be exposed to the
    old garbage collector.

    (Note that client conns aren't left dangling, they're also in a tree
    rooted in the local endpoint so that they can be found by a user
    wanting to make a new client call. Service conns do not appear in
    this tree.)

    (2) Implement a better lifetime management and garbage collection strategy
    for client connections.

    In this, a client connection can be in one of five cache states
    (inactive, waiting, active, culled and idle). Limits are set on the
    number of client conns that may be active at any one time, and users
    are made to wait if they want to start a new call when there isn't
    capacity available.

    To make capacity available, active and idle connections can be culled,
    after a short delay (to allow for retransmission). The delay is
    reduced if the capacity exceeds a tunable threshold.

    If there is spare capacity, client conns are permitted to hang around
    a fair bit longer (tunable) so as to allow reuse of negotiated
    security contexts.

    After this patch, the client conn strategy is separate from that of
    service conns (which continues to use the old code for the moment).

    This difference in strategy is because the client side retains control
    over when it allows a connection to become active, whereas the service
    side has no control over when it sees a new connection or a new call
    on an old connection.
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     
  • …l/git/dhowells/linux-fs

    David Howells says:

    ====================
    rxrpc: More fixes

    Here are a couple of fix patches:

    (1) Fix the conn-based retransmission patch posted yesterday. This breaks
    if it actually has to retransmit. However, it seems the likelihood of
    this happening is really low, despite the server I'm testing against
    being located >3000 miles away, and some of the time it's handled
    in the call background processor before we manage to disconnect the
    call - hence why I didn't spot it.

    (2) /proc/net/rxrpc_calls can cause a crash if accessed whilst a call is
    being torn down. The window of opportunity is pretty small, however,
    as calls don't stay in this state for long.
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     
  • Jiri Pirko says:

    ====================
    mlxsw: Offload FDB learning configuration

    Ido says:
    This patchset addresses two long standing issues in the mlxsw driver
    concerning FDB learning.

    Patch 1 limits the number of FDB records processed by the driver in a
    single session. This is useful in situations in which many new records
    need to be processed, thereby causing the RTNL mutex to be held for
    long periods of time.

    Patches 2-6 offload the learning configuration (on / off) of bridge
    ports to the device instead of having the driver decide whether a
    record needs to be learned or not.

    The last patch is fallout and removes configuration no longer necessary
    after the first patches are applied.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
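
The idea behind patch 1 in the series above, bounding the work done per session so that a lock is not held for too long, can be sketched as a simple budgeted loop. The budget value, struct and function names are hypothetical. In a real driver the lock would be taken and released around each session.

```c
#include <stddef.h>

#define FDB_SESSION_BUDGET 4  /* hypothetical per-session record limit */

/* Hypothetical queue of FDB records waiting to be processed. */
struct fdb_queue {
    size_t pending;
    size_t processed;
};

/*
 * Process at most FDB_SESSION_BUDGET records.
 * Returns nonzero if records remain and another session is needed.
 */
static int fdb_process_session(struct fdb_queue *q)
{
    size_t n = q->pending < FDB_SESSION_BUDGET ? q->pending
                                               : FDB_SESSION_BUDGET;

    q->pending -= n;
    q->processed += n;
    return q->pending != 0;
}
```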