13 Oct, 2017

34 commits

  • As a preparation for introducing flow control for multicast and datagram
    messaging we need a more strictly defined framework than we have now. A
    socket must be able to keep track of exactly how many and which other
    sockets it is allowed to communicate with at any moment, and keep the
    necessary state for those.

    We therefore introduce a new concept we have named Communication Group.
    Sockets can join a group via a new setsockopt() call TIPC_GROUP_JOIN.
    The call takes four parameters: 'type' serves as group identifier,
    'instance' serves as a logical member identifier, and 'scope' indicates
    the visibility of the group (node/cluster/zone). Finally, 'flags' makes
    it possible to set certain properties for the member. For now, there is
    only one flag, indicating if the creator of the socket wants to receive
    a copy of broadcast or multicast messages it is sending via the socket,
    and if it wants to be eligible as destination for its own anycasts.

    A group is closed, i.e., sockets which have not joined a group will
    not be able to send messages to or receive messages from members of
    the group, and vice versa.

    Any member of a group can send multicast ('group broadcast') messages
    to all group members, optionally including itself, using the primitive
    send(). The messages are received via the recvmsg() primitive. A socket
    can only be a member of one group at a time.

    Signed-off-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Maloy
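
A minimal userspace sketch of the join call described above. It assumes the `struct tipc_group_req` layout and the `TIPC_GROUP_LOOPBACK` flag name found in the `linux/tipc.h` uapi header of later kernels; neither name appears in the commit text itself. The helper names `make_group_req()` and `join_group()` are hypothetical.

```c
#include <linux/tipc.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Fill in the four parameters named in the text: type, instance, scope, flags. */
struct tipc_group_req make_group_req(uint32_t type, uint32_t instance,
                                     uint32_t scope, int loopback)
{
    struct tipc_group_req req;

    memset(&req, 0, sizeof(req));
    req.type = type;          /* group identifier */
    req.instance = instance;  /* logical member identifier */
    req.scope = scope;        /* visibility of the group */
    /* Assumed flag name: ask for copies of our own multicasts/anycasts. */
    req.flags = loopback ? TIPC_GROUP_LOOPBACK : 0;
    return req;
}

/* Join a group on an existing TIPC socket. Returns 0 on success, -1 on error. */
int join_group(int sd, const struct tipc_group_req *req)
{
    return setsockopt(sd, SOL_TIPC, TIPC_GROUP_JOIN, req, sizeof(*req));
}
```

A caller would typically create the socket with `socket(AF_TIPC, SOCK_RDM, 0)`, call `join_group()`, and only then send to the group. The call fails on kernels without group support.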
     
  • We often see a need for a linked list of destination identities,
    sometimes containing a port number, sometimes a node identity, and
    sometimes both. The currently defined struct u32_list is not generic
    enough to cover all cases, so we extend it to contain two u32 integers
    and rename it to struct tipc_dest_list.

    Signed-off-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Maloy
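
The idea of a destination list whose entries carry two u32 values can be sketched in userspace as below. This is an illustration only: `struct dest` and the `dest_*()` helpers are hypothetical names, and the kernel version would be built on the kernel's own list primitives rather than a hand-rolled singly linked list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one destination: a node identity plus a port. */
struct dest {
    struct dest *next;
    uint32_t node;
    uint32_t port;
};

/* Return true if (node, port) is already on the list. */
bool dest_find(const struct dest *head, uint32_t node, uint32_t port)
{
    for (; head; head = head->next)
        if (head->node == node && head->port == port)
            return true;
    return false;
}

/* Add (node, port) unless it is already present. Returns true if added. */
bool dest_push(struct dest **head, uint32_t node, uint32_t port)
{
    struct dest *d;

    if (dest_find(*head, node, port))
        return false;
    d = malloc(sizeof(*d));
    if (!d)
        return false;
    d->node = node;
    d->port = port;
    d->next = *head;
    *head = d;
    return true;
}

/* Count the entries on the list. */
size_t dest_len(const struct dest *head)
{
    size_t n = 0;

    for (; head; head = head->next)
        n++;
    return n;
}

/* Free every entry and leave the list empty. */
void dest_purge(struct dest **head)
{
    while (*head) {
        struct dest *d = *head;

        *head = d->next;
        free(d);
    }
}
```

A caller that only cares about nodes can pass 0 as the port, and one that only cares about ports can pass 0 as the node, which is what makes the two-field entry more general than a single u32.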
     
  • We see an increasing need to send multiple single-buffer messages
    of TIPC_SYSTEM_IMPORTANCE to different individual destination nodes.
    Instead of looping over the send queue and sending each buffer
    individually, as we do now, we add a new help function
    tipc_node_distr_xmit() to do this.

    Signed-off-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • In the following commits we will need to handle multiple incoming and
    rejected/returned buffers in the function socket.c::filter_rcv().
    As a preparation for this, we generalize the function by handling
    buffer queues instead of individual buffers. We also introduce a
    help function tipc_skb_reject(), and rename filter_rcv() to
    tipc_sk_filter_rcv() in line with other functions in socket.c.

    Signed-off-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • In the coming commits, functions at the socket level will need the
    ability to read the availability status of a given node. We therefore
    introduce a new function for this purpose, while renaming the existing
    static function currently having the wanted name.

    Signed-off-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • The address given to tipc_connect() is not completely sanity checked,
    under the assumption that this will be done later in the function
    __tipc_sendmsg() when the address is used there.

    However, the latter function will in the next commits serve as caller
    to several other send functions, so we want to move the corresponding
    sanity check there to the beginning of that function, before we possibly
    need to grab the address stored by tipc_connect(). We must therefore
    be able to trust that this address already has been thoroughly checked.

    We do this in this commit.

    Signed-off-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • As preparation for introducing communication groups, we add the ability
    to issue topology subscriptions and receive topology events from kernel
    space. This will make it possible for group member sockets to keep track
    of other group members.

    Signed-off-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • Signed-off-by: Florian Westphal
    Reviewed-by: Sabrina Dubroca
    Signed-off-by: David S. Miller

    Florian Westphal
     
  • The module clock is used for two purposes:
    - Wake-on-LAN (WoL), which is optional,
    - gPTP Timer Increment (GTI) configuration, which is mandatory.

    As the clock is needed for GTI configuration anyway, WoL is always
    available. Hence remove duplication and repeated obtaining of the clock
    by making GTI use the stored clock for WoL use.

    Signed-off-by: Geert Uytterhoeven
    Reviewed-by: Niklas Söderlund
    Reviewed-by: Sergei Shtylyov
    Signed-off-by: David S. Miller

    Geert Uytterhoeven
     
  • Rafał Miłecki says:

    ====================
    net: support bgmac with B50212E B1 PHY

    I got a report that a board with BCM47189 SoC and B50212E B1 PHY doesn't
    work well with some devices as there is massive ping loss. After analyzing
    the PHY state, it appeared that it runs in slave mode and doesn't auto
    switch to master properly when needed.

    This patchset fixes this by:
    1) Adding new flag support to the PHY driver for setting master mode
    2) Modifying bgmac to request master mode for reported hardware
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • There are 4 very similar PHYs:
    0x600d84a1: BCM54210E (rev B0)
    0x600d84a2: BCM54210E (rev B1)
    0x600d84a5: B50212E (rev B0)
    0x600d84a6: B50212E (rev B1)
    that need setting master mode manually. It's because they run in slave
    mode by default with Automatic Slave/Master configuration disabled which
    can lead to unreliable connection with massive ping loss.

    So far it was reported for a board with BCM47189 SoC and B50212E B1 PHY
    connected to the bgmac supported ethernet device. Telling the PHY driver to
    set up the PHY properly solves this issue.

    Signed-off-by: Rafał Miłecki
    Signed-off-by: David S. Miller

    Rafał Miłecki
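
As an illustration of the ID list above, a driver-side check could look roughly like the following. The four IDs are taken from the commit message; the function name `phy_needs_master_mode()` is hypothetical, and the sketch does not reproduce how bgmac actually passes the request to the PHY driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* PHY IDs listed in the commit message as needing manual master mode. */
static const uint32_t master_mode_phy_ids[] = {
    0x600d84a1, /* BCM54210E (rev B0) */
    0x600d84a2, /* BCM54210E (rev B1) */
    0x600d84a5, /* B50212E (rev B0) */
    0x600d84a6, /* B50212E (rev B1) */
};

/* Hypothetical helper: true if this exact PHY ID is on the list. */
bool phy_needs_master_mode(uint32_t phy_id)
{
    size_t i;

    for (i = 0; i < sizeof(master_mode_phy_ids) / sizeof(master_mode_phy_ids[0]); i++)
        if (master_mode_phy_ids[i] == phy_id)
            return true;
    return false;
}
```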
     
  • Some of Broadcom's PHYs run by default in slave mode with Automatic
    Slave/Master configuration disabled. It stops them from working properly
    with some devices.

    So far it has been verified for BCM54210E and BCM50212E which don't
    work well with Intel's I217-LM and I218-LM:
    http://ark.intel.com/products/60019/Intel-Ethernet-Connection-I217-LM
    http://ark.intel.com/products/71307/Intel-Ethernet-Connection-I218-LM
    I was told there is massive ping loss.

    This commit adds support for a new flag which can be set by an ethernet
    driver to fixup PHY setup.

    Signed-off-by: Rafał Miłecki
    Signed-off-by: David S. Miller

    Rafał Miłecki
     
  • If the underlying master ever changes its L2 (e.g. bonding device),
    then make sure that the IPvlan slaves always emit packets with the
    current L2 of the master instead of the stale mac addr which was
    copied during the device creation. The problem can be seen with
    following script -

    #!/bin/bash
    # Create a vEth pair
    ip link add dev veth0 type veth peer name veth1
    ip link set veth0 up
    ip link set veth1 up
    ip link show veth0
    ip link show veth1
    # Create an IPvlan device on one end of this vEth pair.
    ip link add link veth0 dev ipvl0 type ipvlan mode l2
    ip link show ipvl0
    # Change the mac-address of the vEth master.
    ip link set veth0 address 02:11:22:33:44:55

    Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.")
    Signed-off-by: Mahesh Bandewar
    Signed-off-by: David S. Miller

    Mahesh Bandewar
     
  • Alexander Aring says:

    ====================
    sched: act: ife: UAPI checks and performance tweaks

    This patch series first contains a patch which adds a check for
    IFE_ENCODE and IFE_DECODE when an ife act gets created or updated, and
    which handles only these two cases inside the act callback.

    The second patch uses per-cpu counters and moves the spinlock around so
    that the spinlock is held for less time in the act callback.

    The last patch uses RCU for updating parameters and also moves the
    spinlock for the same purpose as in patch 2.

    Notes:
    - There is still a spinlock around for protecting the metalist and a
    rw-lock for another list. These should be migrated to an RCU list, if
    possible.

    - I still use a plain dereference in the dump callback. What I did not
    fully understand is what rcu_assign_pointer does while the RCU read
    lock is held. I suppose the pointer will simply be updated, in which
    case we don't have any issue here.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
    This patch changes the parameter updating to use RCU instead of being
    protected by a spinlock. This reduces the time that the spinlock is held.

    Signed-off-by: Alexander Aring
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Alexander Aring
     
    This patch migrates the current counter handling, which is protected by a
    spinlock, to per-cpu counter handling. This reduces the time during which
    the spinlock is held.

    Signed-off-by: Alexander Aring
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Alexander Aring
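
The per-cpu counter idea can be modelled in plain C as one counter slot per CPU: each CPU updates only its own slot, so the fast path needs no shared lock, and a reader sums all slots. This is a conceptual model only. The names and the fixed slot count are hypothetical, and the kernel uses its own per-cpu allocation and statistics helpers rather than a plain array.

```c
#include <stdint.h>

#define SKETCH_NR_CPUS 4  /* hypothetical fixed CPU count for the model */

struct pcpu_stats {
    uint64_t packets;
    uint64_t bytes;
};

struct counters {
    struct pcpu_stats cpu[SKETCH_NR_CPUS];
};

/* Fast path: touch only the slot owned by the current CPU, no shared lock. */
void count_packet(struct counters *c, unsigned int cpu, uint64_t len)
{
    if (cpu >= SKETCH_NR_CPUS)
        return;
    c->cpu[cpu].packets++;
    c->cpu[cpu].bytes += len;
}

/* Slow path (e.g. a dump): fold all per-CPU slots into one total. */
struct pcpu_stats read_totals(const struct counters *c)
{
    struct pcpu_stats total = { 0, 0 };
    unsigned int i;

    for (i = 0; i < SKETCH_NR_CPUS; i++) {
        total.packets += c->cpu[i].packets;
        total.bytes += c->cpu[i].bytes;
    }
    return total;
}
```

The cost moves from every packet to the occasional reader, which is the trade the commit message describes.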
     
    This patch adds a check for the two possible ife handlings, encode
    and decode, to the init callback. The decode value exists for usability
    and is used in userspace code only. The current code handles only
    encode or else decode. This patch rejects any option other than these.

    Signed-off-by: Alexander Aring
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Alexander Aring
     
  • Roman Mashak says:

    ====================
    net: sched: Fix IFE meta modules loading

    Adjust module alias names of IFE meta modules and fix the bug that
    prevented auto-loading IFE modules in run-time.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
    The macro __stringify_1() can stringify a macro argument, but the
    IFE_META_* identifiers are enums, so they never expand. request_module()
    expects an integer in the IFE module name, so auto-loading always fails.

    Fixes: ef6980b6becb ("introduce IFE action")
    Signed-off-by: Roman Mashak
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller

    Roman Mashak
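
The preprocessor behaviour behind this bug can be reproduced in a few lines of userspace C. The macro and function names below are hypothetical stand-ins (identifiers starting with a double underscore are reserved outside the kernel), and formatting the number at run time is shown as one plausible way to build the expected name, not necessarily the exact change made by the commit.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define STRINGIFY_1(x) #x
#define STRINGIFY(x) STRINGIFY_1(x)

enum { META_AS_ENUM = 1 };  /* enumerator: invisible to the preprocessor */
#define META_AS_MACRO 1     /* macro: expanded before stringification */

/* Yields "ife-meta-META_AS_ENUM": the enumerator name, not its value. */
const char *alias_from_enum(void)
{
    return "ife-meta-" STRINGIFY(META_AS_ENUM);
}

/* Yields "ife-meta-1": the macro is expanded first. */
const char *alias_from_macro(void)
{
    return "ife-meta-" STRINGIFY(META_AS_MACRO);
}

/* Build the name with the numeric id at run time instead. */
int format_request(char *buf, size_t len, unsigned int id)
{
    return snprintf(buf, len, "ife-meta-%u", id);
}

/* True if a compile-time alias equals the name a numeric request would use. */
bool alias_matches_request(const char *alias, unsigned int id)
{
    char want[32];

    format_request(want, sizeof(want), id);
    return strcmp(alias, want) == 0;
}
```

With these definitions the enum-based string never matches a numeric module name, while the macro-based or run-time formatted one does.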
     
    Make the style of the module alias name consistent with other subsystems
    in the kernel, for example net devices.

    Fixes: 084e2f6566d2 ("Support to encoding decoding skb mark on IFE action")
    Fixes: 200e10f46936 ("Support to encoding decoding skb prio on IFE action")
    Fixes: 408fbc22ef1e ("net sched ife action: Introduce skb tcindex metadata encap decap")
    Signed-off-by: Roman Mashak
    Signed-off-by: David S. Miller

    Roman Mashak
     
  • Delete unused channel variables in vxge-traffic.

    Signed-off-by: Christos Gkekas
    Signed-off-by: David S. Miller

    Christos Gkekas
     
    This file contains unnecessary whitespace in the form of newlines; remove
    it. Found by looking at what struct tc_mirred looks like.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • When af_mpls is built-in but the tunnel support is a module,
    we get a link failure:

    net/mpls/af_mpls.o: In function `mpls_init':
    af_mpls.c:(.init.text+0xdc): undefined reference to `ip_tunnel_encap_add_ops'

    This adds a Kconfig statement to prevent the broken
    configuration and force mpls to be a module as well in
    this case.

    Fixes: bdc476413dcd ("ip_tunnel: add mpls over gre support")
    Signed-off-by: Arnd Bergmann
    Acked-by: Amine Kherbouche
    Signed-off-by: David S. Miller

    Arnd Bergmann
     
  • Ursula Braun says:

    ====================
    net/smc: ib_query_gid() patches

    Triggered by Parav Pandit, here are 2 cleanup patches for usage of
    ib_query_gid() in the smc-code.

    Thanks, Ursula

    v2 changes advised by Parav Pandit:
    extra check is_vlan_dev() in patch 2/2
    "RoCE" spelling
    added "Reported-by"
    added "Reviewed-by"
    added "Fixes"
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
    For RoCE devices, ib_query_gid() takes a reference count on the net_device.
    This reference count must be decreased by the caller.

    Signed-off-by: Ursula Braun
    Reported-by: Parav Pandit
    Reviewed-by: Parav Pandit
    Fixes: 0cfdd8f92cac ("smc: connection and link group creation")
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • SMC should not open code the function pointer get_netdev of the
    IB device. Replacing ib_query_gid(..., NULL) with
    ib_query_gid(..., gid_attr) allows access to the netdev.

    Signed-off-by: Ursula Braun
    Suggested-by: Parav Pandit
    Reviewed-by: Parav Pandit
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • Florian Fainelli says:

    ====================
    Enable ACB for bcm_sf2 and bcmsysport

    This patch series enables Broadcom's Advanced Congestion Buffering mechanism
    which requires cooperation between the CPU/Management Ethernet MAC controller
    and the switch.

    I took the notifier approach because ultimately the information we need to
    carry to the master network device is DSA specific and I saw little room for
    generalizing beyond what DSA requires. Chances are that this is highly specific
    to the Broadcom HW as I don't know of any HW out there that supports something
    nearly similar for similar or identical needs.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Now that we have established the queue mapping between the switch port
    egress queues and the SYSTEMPORT egress queues, we can turn on Advanced
    Congestion Buffering (ACB) at the SYSTEMPORT level. This enables the
    Ethernet MAC controller to get out of band flow control information
    directly from the switch port and queue that it monitors such that its
    internal TDMA can be appropriately backpressured.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • Turn on the out of band Advanced Congestion Buffering (ACB) mechanism at
    the switch level now that we have properly established the queue mapping
    between the switch egress queues and the SYSTEMPORT egress queues. This
    allows the switch to correctly backpressure the host system when one of
    its queues drops below the configured thresholds.

    This also helps achieve so-called "lossless" behavior by adapting
    the TX interrupt pacing to the actual speed and capacity of the switch
    port.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
    Establish a queue mapping between the DSA slave network device queues,
    which correspond to switch port queues, and the transmit queues
    that SYSTEMPORT manages.

    We need to configure the SYSTEMPORT transmit queue with the switch port number
    and switch port queue number in order for the switch and SYSTEMPORT hardware to
    utilize the out of band congestion notification. This hardware mechanism works
    by looking at the switch port egress queue and determining whether there are
    enough buffers for this queue, with that class of service, for a successful
    transmission and, if not, backpressuring the SYSTEMPORT queue that is being used.

    For this to work, we implement a notifier which looks at the
    DSA_PORT_REGISTER event. When DSA network devices are registered, the
    framework calls the DSA notifiers; ours extracts the number of queues
    for these devices and their associated port number, remembers that in
    the driver private structure and linearly maps those queues to
    TX rings/queues that we manage.

    This scheme works because DSA slave network devices always transmit
    through SYSTEMPORT so when DSA slave network devices are
    destroyed/brought down, the corresponding SYSTEMPORT queues are no
    longer used. Also, by design of the DSA framework, the master network
    device (SYSTEMPORT) is registered first.

    For faster lookups we use an array of up to DSA_MAX_PORTS * number of
    queues per port, and then map pointers to bcm_sysport_tx_ring such that
    our ndo_select_queue() implementation can just index into that array to
    locate the corresponding ring index.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
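
The lookup array described above can be sketched as follows. All names and sizes here are hypothetical stand-ins (for example `SKETCH_MAX_PORTS` in place of DSA_MAX_PORTS, and `struct tx_ring` in place of bcm_sysport_tx_ring); the sketch only shows the linear mapping and the index computation, not the driver's real data structures.

```c
#include <stddef.h>

#define SKETCH_MAX_PORTS 12        /* stand-in for DSA_MAX_PORTS */
#define SKETCH_QUEUES_PER_PORT 8   /* assumed queues per switch port */
#define SKETCH_NUM_TX_RINGS 32     /* assumed number of TX rings */

struct tx_ring {
    unsigned int index;         /* ring number on the master device */
    unsigned int switch_port;   /* switch port this ring is tied to */
    unsigned int switch_queue;  /* egress queue on that switch port */
};

struct sysport {
    struct tx_ring rings[SKETCH_NUM_TX_RINGS];
    /* Flat array indexed by port * queues-per-port + queue. */
    struct tx_ring *ring_map[SKETCH_MAX_PORTS * SKETCH_QUEUES_PER_PORT];
    unsigned int next_ring;     /* next unassigned ring */
};

/* On port registration: linearly map the port's queues to free rings. */
int map_port_queues(struct sysport *sp, unsigned int port, unsigned int num_queues)
{
    unsigned int q;
    int mapped = 0;

    if (port >= SKETCH_MAX_PORTS)
        return 0;
    for (q = 0; q < num_queues && q < SKETCH_QUEUES_PER_PORT; q++) {
        struct tx_ring *ring;

        if (sp->next_ring >= SKETCH_NUM_TX_RINGS)
            break; /* no rings left */
        ring = &sp->rings[sp->next_ring];
        ring->index = sp->next_ring++;
        ring->switch_port = port;
        ring->switch_queue = q;
        sp->ring_map[port * SKETCH_QUEUES_PER_PORT + q] = ring;
        mapped++;
    }
    return mapped;
}

/* On transmit: one array index finds the ring, or -1 if nothing is mapped. */
int select_ring(const struct sysport *sp, unsigned int port, unsigned int queue)
{
    const struct tx_ring *ring;

    if (port >= SKETCH_MAX_PORTS || queue >= SKETCH_QUEUES_PER_PORT)
        return -1;
    ring = sp->ring_map[port * SKETCH_QUEUES_PER_PORT + queue];
    return ring ? (int)ring->index : -1;
}
```

Storing pointers in a flat array trades a little memory for a constant-time lookup on the transmit path.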
     
  • We need to tell the DSA master network device doing the actual
    transmission what the desired switch port and queue number is for it to
    resolve that to the internal transmit queue it is mapped to.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • In preparation for communicating a given DSA network device's port
    number and switch index, create a specialized DSA notifier and two
    events: DSA_PORT_REGISTER and DSA_PORT_UNREGISTER that communicate: the
    slave network device (slave_dev), port number and switch number in the
    tree.

    This will later be used by network device drivers like bcmsysport which
    need to cooperate with their DSA network devices to set up queue mapping
    and scheduling.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
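
The notifier pattern described above can be modelled in userspace with a linked chain of callbacks. Everything below is a hypothetical model: the two events stand in for DSA_PORT_REGISTER and DSA_PORT_UNREGISTER, the payload carries only the switch and port numbers (the slave network device is left out), and the kernel would use its own notifier chain infrastructure instead.

```c
#include <stddef.h>

/* Stand-ins for DSA_PORT_REGISTER and DSA_PORT_UNREGISTER. */
enum port_event { PORT_REGISTER, PORT_UNREGISTER };

/* Payload handed to every listener. */
struct port_info {
    unsigned int switch_number;
    unsigned int port_number;
};

typedef void (*port_notifier_fn)(enum port_event ev,
                                 const struct port_info *info, void *priv);

struct notifier {
    port_notifier_fn fn;
    void *priv;
    struct notifier *next;
};

/* Add a listener at the head of the chain. */
void notifier_register(struct notifier **chain, struct notifier *nb)
{
    nb->next = *chain;
    *chain = nb;
}

/* Deliver one event to every listener. Returns how many were called. */
int notifier_call_chain(struct notifier *chain, enum port_event ev,
                        const struct port_info *info)
{
    int called = 0;

    for (; chain; chain = chain->next) {
        chain->fn(ev, info, chain->priv);
        called++;
    }
    return called;
}

/* Example listener: a master-device driver tracking its registered ports. */
struct port_tracker {
    unsigned int registered;
    unsigned int last_switch;
    unsigned int last_port;
};

void track_ports(enum port_event ev, const struct port_info *info, void *priv)
{
    struct port_tracker *t = priv;

    if (ev == PORT_REGISTER) {
        t->registered++;
        t->last_switch = info->switch_number;
        t->last_port = info->port_number;
    } else if (t->registered > 0) {
        t->registered--;
    }
}
```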
     
  • This reverts commit df1ec1b9d0df57e96011f175418dc95b1af46821.

    It turns out that memory allocated via dma_alloc_coherent is always
    aligned to the size of the buffer, so there's no way the RRD and RFD
    can ever be in separate 32-bit regions.

    Signed-off-by: Timur Tabi
    Signed-off-by: David S. Miller

    Timur Tabi
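
The alignment argument can be checked with a little address arithmetic. The sketch below assumes the buffer size is a power of two no larger than 4 GiB; the function names are hypothetical and nothing here calls the DMA API.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if [base, base + size) touches more than one 4 GiB (32-bit) region. */
bool crosses_4g(uint64_t base, uint64_t size)
{
    return size != 0 && (base >> 32) != ((base + size - 1) >> 32);
}

/* True if size is a power of two and base is a multiple of it. */
bool is_size_aligned(uint64_t base, uint64_t size)
{
    return size != 0 && (size & (size - 1)) == 0 && (base & (size - 1)) == 0;
}
```

For example, an 8 KiB buffer at 0xFFFFF000 is not size-aligned and does straddle the 4 GiB boundary, while the nearest size-aligned placement, 0xFFFFE000, ends exactly at 0xFFFFFFFF and stays inside one region.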
     
  • Remove three inline helpers that are no longer needed.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

12 Oct, 2017

6 commits

  • Variable old_flags is being assigned but is never read; it is redundant
    and can be removed.

    Cleans up clang warning: Value stored to 'old_flags' is never read

    Signed-off-by: Colin Ian King
    Acked-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Colin Ian King
     
  • Tariq Toukan says:

    ====================
    mlx4_en XDP TX improvements

    This patchset contains performance improvements
    to the XDP_TX use case in the mlx4 Eth driver.

    Patch 1 is a simple change in a function parameter type.
    Patch 2 replaces a call to a generic function with the
    relevant parts inlined.
    Patch 3 moves the write of descriptors' constant values
    from data path to control path.

    Series generated against net-next commit:
    833e0e2f24fd net: dst: move cpu inside ifdef to avoid compilation warning
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • In XDP_TX, some fields in tx_info and tx_desc are constants across
    all entries of the different XDP_TX rings.
    Assign values to these fields on ring creation time, rather than in
    data-path.

    Patchset performance tests:
    Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
    Single queue no-RSS optimization ON.

    XDP_TX packet rate:
    ------------------------------
    Before    | After     | Gain
    13.7 Mpps | 14.0 Mpps | 2.2%
    ------------------------------

    Signed-off-by: Tariq Toukan
    Signed-off-by: David S. Miller

    Tariq Toukan
     
    Function mlx4_en_tx_write_desc() is not optimized for use by XDP xmit.
    Use the relevant parts inline instead.

    Signed-off-by: Tariq Toukan
    Signed-off-by: David S. Miller

    Tariq Toukan
     
  • The struct net_device parameter was passed only to extract
    struct mlx4_en_priv out of it.
    Here we pass the priv parameter directly.

    Signed-off-by: Tariq Toukan
    Signed-off-by: David S. Miller

    Tariq Toukan
     
  • The function ipgre_mpls_encap_hlen is local to the source and
    does not need to be in global scope, so make it static.

    Cleans up sparse warning:
    symbol 'ipgre_mpls_encap_hlen' was not declared. Should it be static?

    Fixes: bdc476413dcdb ("ip_tunnel: add mpls over gre support")
    Signed-off-by: Colin Ian King
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Colin Ian King