22 Nov, 2016

33 commits

  • Andrew Lunn says:

    ====================
    Start adding support for mv88e6390

    This is the first patchset implementing support for the mv88e6390
    family. This is a new generation of switch devices and has numerous
    incompatible changes to the registers. These patches allow the switch
    to be detected during probe, and make the statistics unit work.

    These patches are insufficient to make the mv88e6390 functional. More
    patches will follow.

    v2:
    Move stats code into global1
    Change DT compatible string to mv88e6190
    Fixed mv88e6351 stats which v1 had broken
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Move the stats functions which access global 1 registers into
    global1.c.

    Signed-off-by: Andrew Lunn
    Signed-off-by: David S. Miller

    Andrew Lunn
     
  • The mv88e6390 uses a different bit to select between bank0 and bank1
    of the statistics. So implement an ops function for this, and pass the
    selector bit to the generic stats read function. Also, the histogram
    selection has moved for the mv88e6390, so abstract its selection as
    well.
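
    A minimal sketch of the mechanism (all names and bit positions here are
    illustrative, not the driver's actual symbols): a per-family op supplies
    the bank 1 selector bit, and the shared read helper simply ORs it into
    the stats operation.

    #include <stdint.h>

    struct example_chip;                        /* opaque device handle */

    /* Hypothetical bank 1 selector bits; the real values differ per family. */
    #define EXAMPLE_BANK1_BIT_LEGACY    (1u << 9)
    #define EXAMPLE_BANK1_BIT_6390      (1u << 10)

    /* Stubbed low-level issue/read of a stats operation. */
    static int example_stats_op(struct example_chip *chip, uint16_t op,
                                uint64_t *val)
    {
        (void)chip; (void)op;
        *val = 0;
        return 0;
    }

    /* Generic reader: the per-family ops implementation passes in whichever
     * bank 1 selector bit its hardware uses.
     */
    static int example_stats_read(struct example_chip *chip, int stat,
                                  uint16_t bank1_select, uint64_t *val)
    {
        uint16_t op = (uint16_t)(stat & 0x1f);

        if (stat >= 32)                         /* statistic lives in bank 1 */
            op |= bank1_select;

        return example_stats_op(chip, op, val);
    }

    A family-specific op would then call example_stats_read() with
    EXAMPLE_BANK1_BIT_LEGACY or EXAMPLE_BANK1_BIT_6390 as appropriate.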

    Signed-off-by: Andrew Lunn
    Signed-off-by: David S. Miller

    Andrew Lunn
     
  • Different families have different sets of statistics. Abstract this
    using a stats_get_stats op. The mv88e6390 needs a different
    implementation, which will be added later.

    Signed-off-by: Andrew Lunn
    Signed-off-by: David S. Miller

    Andrew Lunn
     
  • Different families have different sets of statistics. Abstract this
    using stats_get_sset_count and stats_get_strings ops. Each stat has a
    bitmap, and the ops implementation uses a bitmap mask to count the
    statistics which apply to the family, or to return the list of strings.
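
    A small self-contained sketch of the bitmap idea (the stat names, family
    bits and helper names are made up for illustration): each entry carries a
    bitmap of the families it applies to, and the per-family op filters with
    a mask.

    #include <stdio.h>

    /* Hypothetical family bits; a stat's .type is a bitmap of these. */
    #define STATS_TYPE_BANK0  (1 << 0)          /* e.g. all families */
    #define STATS_TYPE_BANK1  (1 << 1)          /* e.g. only the newer family */

    struct example_hw_stat {
        const char *name;
        int type;                               /* bitmap of STATS_TYPE_* */
    };

    static const struct example_hw_stat example_stats[] = {
        { "in_good_octets", STATS_TYPE_BANK0 },
        { "out_octets",     STATS_TYPE_BANK0 },
        { "in_discards",    STATS_TYPE_BANK1 },
    };

    #define N_STATS (sizeof(example_stats) / sizeof(example_stats[0]))

    /* stats_get_sset_count-style op: count the stats matching the mask. */
    static int example_sset_count(int family_mask)
    {
        size_t i;
        int count = 0;

        for (i = 0; i < N_STATS; i++)
            if (example_stats[i].type & family_mask)
                count++;
        return count;
    }

    /* stats_get_strings-style op: emit only the names that apply. */
    static void example_get_strings(int family_mask)
    {
        size_t i;

        for (i = 0; i < N_STATS; i++)
            if (example_stats[i].type & family_mask)
                printf("%s\n", example_stats[i].name);
    }

    int main(void)
    {
        printf("count=%d\n", example_sset_count(STATS_TYPE_BANK0));
        example_get_strings(STATS_TYPE_BANK0 | STATS_TYPE_BANK1);
        return 0;
    }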

    Signed-off-by: Andrew Lunn
    v2:
    Rename functions to avoid _ prefix.
    Signed-off-by: David S. Miller

    Andrew Lunn
     
  • The statistics unit on the mv88e6390 needs the histogram mode to be
    configured in a different register compared to other devices. Add an
    ops function to do this.

    Signed-off-by: Andrew Lunn
    v2:
    Rename to mv88e6390_g1_stats_set_histogram
    Move into global1.c
    Signed-off-by: David S. Miller

    Andrew Lunn
     
    The MV88E6390 has a control register that determines what the histogram
    statistics actually contain. This means the stats_snapshot method should
    not set this information, so implement the 6390 stats_snapshot function
    without these bits.

    Signed-off-by: Andrew Lunn
    Signed-off-by: David S. Miller

    Andrew Lunn
     
    Knowing the family a device belongs to helps with picking the ops
    implementation which is appropriate to the device. So add a comment to
    each structure of ops.

    Signed-off-by: Andrew Lunn
    Signed-off-by: David S. Miller

    Andrew Lunn
     
    Taking a stats snapshot differs between families. Abstract this
    into an ops member. At the same time, move the code into global1.[ch],
    since the registers are in the global1 range.
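
    Roughly, the per-family dispatch looks like this (struct and function
    names are illustrative only, not the driver's actual ones):

    struct example_chip;                        /* opaque device handle */

    /* Per-family operations table; each family plugs in its own snapshot. */
    struct example_ops {
        int (*stats_snapshot)(struct example_chip *chip, int port);
    };

    static int example_g1_stats_snapshot(struct example_chip *chip, int port)
    {
        (void)chip; (void)port;
        return 0;   /* legacy families: capture and histogram bits set here */
    }

    static int example_6390_stats_snapshot(struct example_chip *chip, int port)
    {
        (void)chip; (void)port;
        return 0;   /* newer family: histogram mode configured elsewhere */
    }

    static const struct example_ops example_6351_ops = {
        .stats_snapshot = example_g1_stats_snapshot,
    };

    static const struct example_ops example_6390_ops = {
        .stats_snapshot = example_6390_stats_snapshot,
    };

    A caller would then invoke the snapshot through the ops pointer without
    caring which family it is driving.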

    Signed-off-by: Andrew Lunn
    Signed-off-by: David S. Miller

    Andrew Lunn
     
  • With the devices added to the tables, the probe will recognize the
    switch. This, however, is not sufficient to make it work properly; other
    changes are needed because of incompatibilities.

    Signed-off-by: Andrew Lunn
    Signed-off-by: David S. Miller

    Andrew Lunn
     
  • _mv88e6xxx_stats_wait() did not check the return value from
    mv88e6xxx_g1_read(), so the compiler complained about err being set but
    not used.
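
    The fix pattern, sketched with stand-in names (not the actual driver
    code): propagate the read error instead of silently dropping it.

    #include <errno.h>
    #include <stdint.h>

    struct example_chip;                        /* opaque device handle */

    #define EXAMPLE_STATS_OP        0x1d
    #define EXAMPLE_STATS_OP_BUSY   (1u << 15)

    /* Stubbed global1 register read. */
    static int example_g1_read(struct example_chip *chip, int reg,
                               uint16_t *val)
    {
        (void)chip; (void)reg;
        *val = 0;
        return 0;
    }

    static int example_stats_wait(struct example_chip *chip)
    {
        uint16_t val;
        int i, err;

        for (i = 0; i < 10; i++) {
            err = example_g1_read(chip, EXAMPLE_STATS_OP, &val);
            if (err)            /* previously 'err' was set but never checked */
                return err;
            if (!(val & EXAMPLE_STATS_OP_BUSY))
                return 0;
        }

        return -ETIMEDOUT;
    }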

    Signed-off-by: Andrew Lunn
    Reviewed-by: Vivien Didelot
    Signed-off-by: David S. Miller

    Andrew Lunn
     
  • The switch needs to be taken out of reset before we can read its ID
    register on the MDIO bus.

    Signed-off-by: Andrew Lunn
    Reviewed-by: Vivien Didelot
    Signed-off-by: David S. Miller

    Andrew Lunn
     
  • Declare the structure ieee802154_ops as const as it is only passed as an
    argument to the function ieee802154_alloc_hw. This argument is of type
    const struct ieee802154_ops *, so ieee802154_ops structures having this
    property can be declared as const.
    Done using Coccinelle:

    @r1 disable optional_qualifier @
    identifier i;
    position p;
    @@
    static struct ieee802154_ops i@p = {...};

    @ok1@
    identifier r1.i;
    position p;
    expression e1;
    @@
    ieee802154_alloc_hw(e1,&i@p)

    @bad@
    position p!={r1.p,ok1.p};
    identifier r1.i;
    @@
    i@p

    @depends on !bad disable optional_qualifier@
    identifier r1.i;
    @@
    static
    +const
    struct ieee802154_ops i={...};

    @depends on !bad disable optional_qualifier@
    identifier r1.i;
    @@
    +const
    struct ieee802154_ops i;

    The before and after size details of the affected files are:

    text data bss dec hex filename
    8669 1176 16 9861 2685 drivers/net/ieee802154/adf7242.o
    8805 1048 16 9869 268d drivers/net/ieee802154/adf7242.o

    text data bss dec hex filename
    7211 2296 32 9539 2543 drivers/net/ieee802154/atusb.o
    7339 2160 32 9531 253b drivers/net/ieee802154/atusb.o

    Signed-off-by: Bhumika Goyal
    Acked-by: Stefan Schmidt
    Signed-off-by: David S. Miller

    Bhumika Goyal
     
  • Pravin B Shelar says:

    ====================
    geneve: Use LWT more effectively.

    The following patch series makes use of the geneve LWT code path for
    the geneve netdev type of device. This allows us to simplify the geneve
    module without changing any functionality.

    v2-v3:
    Rebase against latest net-next.

    v1-v2:
    Fix warning reported by kbuild test robot.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
    Rather than comparing the 64-bit tunnel id, compare the tunnel vni,
    which is a 24-bit id. This also saves a conversion from vni
    to tunnel id on each received tunnel packet.
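
    The idea in a stand-alone sketch (helper names are illustrative, not the
    geneve code): matching on the raw 3-byte vni avoids building a 64-bit
    tunnel id for every received packet.

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    /* Before: convert the 24-bit vni into a 64-bit tunnel id, then compare. */
    static uint64_t example_vni_to_tunnel_id(const uint8_t vni[3])
    {
        return ((uint64_t)vni[0] << 16) | ((uint64_t)vni[1] << 8) | vni[2];
    }

    static bool example_match_by_tunnel_id(const uint8_t pkt_vni[3],
                                           uint64_t dev_tunnel_id)
    {
        return example_vni_to_tunnel_id(pkt_vni) == dev_tunnel_id;
    }

    /* After: compare the three vni bytes directly, no per-packet conversion. */
    static bool example_match_by_vni(const uint8_t pkt_vni[3],
                                     const uint8_t dev_vni[3])
    {
        return memcmp(pkt_vni, dev_vni, 3) == 0;
    }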

    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    pravin shelar
     
    Geneve already has a check for the device socket in the route lookup
    function, so there is no need to check it again in the xmit function.

    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    pravin shelar
     
    There are minimal differences in building the Geneve header between
    ipv4 and ipv6 geneve tunnels. The following patch refactors the code
    to unify them.

    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    pravin shelar
     
    The current geneve implementation has two separate cases to handle.
    1. netdev xmit
    2. LWT xmit.

    In the netdev case, the geneve configuration is stored in various
    struct geneve_dev members, for example geneve_addr, ttl, tos,
    label, flags, dst_cache, etc. In the LWT case, the configuration is
    passed to the device in ip_tunnel_info.

    The following patch uses the ip_tunnel_info struct to store almost all
    of the configuration of a geneve netdevice. This allows us to unify
    most of the geneve driver code around ip_tunnel_info. It dramatically
    simplifies the geneve code, since it no longer needs to handle two
    different configuration cases: duplicate code is removed, and a single
    code path can handle either type of geneve device.

    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    pravin shelar
     
  • Florian Westphal says:

    ====================
    tcp: make undo_cwnd mandatory for congestion modules

    highspeed, illinois, scalable, veno and yeah congestion control algorithms
    don't provide a 'cwnd_undo' function. This makes the stack default to a
    'reno undo' which doubles cwnd. However, the ssthresh implementations of
    these algorithms do not halve the slow-start threshold. This causes a similar
    issue to the one fixed for dctcp in ce6dd23329b1e ("dctcp: avoid bogus
    doubling of cwnd after loss").

    In light of this it seems better to remove the fallback and make undo_cwnd
    mandatory.

    The first patch fixes those spots where a reno undo seems incorrect by
    providing .undo_cwnd functions; the second patch removes the fallback.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • The undo_cwnd fallback in the stack doubles cwnd based on ssthresh,
    which undoes the reno halving behaviour.

    It seems more appropriate to let congctl algorithms pair .ssthresh
    and .undo_cwnd properly. Add a 'tcp_reno_undo_cwnd' function and wire it
    up for all congestion algorithms that used to rely on the fallback.

    Cc: Eric Dumazet
    Cc: Yuchung Cheng
    Cc: Neal Cardwell
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     
    Congestion control algorithms that do not halve cwnd in their .ssthresh
    should provide a .undo_cwnd rather than rely on the current fallback,
    which assumes reno halving (and thus doubles the cwnd).

    All of these do 'something else' in their .ssthresh implementation, thus
    store the cwnd on loss and provide .undo_cwnd to restore it again.

    A follow-up patch will remove the fallback and all algorithms will
    need to provide a .undo_cwnd function.
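
    The pattern, sketched against a simplified socket structure (this is not
    the kernel's tcp code; names and the example reduction are made up):
    .ssthresh records the cwnd at loss time and .undo_cwnd hands it back.

    #include <stdint.h>

    /* Simplified stand-ins for struct tcp_sock and the CA private data. */
    struct example_tcp_sock {
        uint32_t snd_cwnd;
    };

    struct example_ca {
        struct example_tcp_sock *tp;
        uint32_t loss_cwnd;                     /* cwnd remembered at loss */
    };

    /* .ssthresh: does 'something else' than halving, but records the cwnd. */
    static uint32_t example_ssthresh(struct example_ca *ca)
    {
        ca->loss_cwnd = ca->tp->snd_cwnd;

        /* family-specific reduction goes here, e.g. a 20% backoff */
        return (ca->tp->snd_cwnd * 4) / 5;
    }

    /* .undo_cwnd: restore the remembered cwnd instead of doubling ssthresh. */
    static uint32_t example_undo_cwnd(struct example_ca *ca)
    {
        uint32_t cur = ca->tp->snd_cwnd;

        return cur > ca->loss_cwnd ? cur : ca->loss_cwnd;
    }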

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     
  • Nikolay Aleksandrov says:

    ====================
    bridge: add support for IGMPv3 and MLDv2 querier

    This patch-set adds support for IGMPv3 and MLDv2 querier in the bridge.
    Two new options which can be toggled via netlink and sysfs are added that
    control the version per-bridge:
    multicast_igmp_version - default 2, can be set to 3
    multicast_mld_version - default 1, can be set to 2 (this option is
    disabled if CONFIG_IPV6=n)

    Note that the names do not include "querier"; I think these options
    can be re-used later as more IGMPv3 support is added to the bridge, so we
    can avoid adding more options to switch between v2 and v3 behaviour.

    The set uses the already existing br_ip{4,6}_multicast_alloc_query
    functions and adds the appropriate header based on the chosen version.

    For the initial support I have removed the compatibility implementation
    (RFC3376 sec 7.3.1, 7.3.2; RFC3810 sec 8.3.1, 8.3.2), because there are
    some details that we need to sort out.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
    This patch adds basic support for MLDv2 queries; the default is MLDv1
    as before. A new multicast option - multicast_mld_version, adds the
    ability to change it between 1 and 2 via netlink and sysfs.
    The MLD option is disabled if CONFIG_IPV6 is disabled.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
    This patch adds basic support for IGMPv3 queries; the default is IGMPv2
    as before. A new multicast option - multicast_igmp_version, adds the
    ability to change it between 2 and 3 via netlink and sysfs. The option
    struct member is in a 4 byte hole in net_bridge.

    There are also a few minor style adjustments in br_multicast_new_group and
    br_multicast_add_group.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
    The function macvlan_forward_source_one has already checked the flag
    IFF_UP, so there is no need to check it again in macvlan_forward_source.

    Signed-off-by: Gao Feng
    Signed-off-by: David S. Miller

    Gao Feng
     
  • While stressing a 40Gbit mlx4 NIC with busy polling, I found false
    sharing in the mlx4 driver that can be easily avoided.

    This patch brings an additional 7% performance improvement in a UDP_RR
    workload.

    1) If we received no frame during one mlx4_en_process_rx_cq()
    invocation, no need to call mlx4_cq_set_ci() and/or dirty ring->cons

    2) Do not refill rx buffers if we have plenty of them.
    This avoids false sharing and allows some bulk/batch optimizations.
    Page allocator and its locks will thank us.

    Finally, mlx4_en_poll_rx_cq() should not return 0 if it determined that
    the CPU handling the NIC IRQ should be changed. We should return budget-1
    instead, so as not to fool net_rx_action() and its netdev_budget.

    v2: keep AVG_PERF_COUNTER(... polled) even if polled is 0

    Signed-off-by: Eric Dumazet
    Cc: Tariq Toukan
    Reviewed-by: Tariq Toukan
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • barrier() is a big hammer compared to READ_ONCE(),
    and requires comments explaining what is protected.

    READ_ONCE() is more precise and the compiler should generate
    better overall code.
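
    An illustrative before/after (the macro below is a simplified userspace
    stand-in for the kernel's READ_ONCE(), and the ring structure is made
    up):

    /* Simplified stand-in for the kernel's READ_ONCE() macro. */
    #define READ_ONCE(x)  (*(const volatile __typeof__(x) *)&(x))

    struct example_ring {
        unsigned int prod;          /* updated by another CPU */
        unsigned int cons;          /* local consumer index */
    };

    /* Before: a full compiler barrier forces every cached value to be
     * re-read, and a comment has to explain which access actually matters:
     *
     *     cons = ring->cons;
     *     barrier();              // ensure prod is re-read... but why here?
     *     prod = ring->prod;
     *
     * After: READ_ONCE() names the one load that must not be cached,
     * merged or torn by the compiler.
     */
    static int example_ring_has_work(const struct example_ring *ring)
    {
        return READ_ONCE(ring->prod) != ring->cons;
    }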

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
    UDP_SKB_CB(skb)->partial_cov is located at offset 66 in the skb,
    causing a cold cache line to be read into the CPU cache.

    We can avoid this cache line miss for UDP sockets,
    as partial_cov has a meaning only for UDPLite.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Daniel Borkmann says:

    ====================
    Couple of BPF refcount fixes for mlx5

    Various mlx5 bugs in eBPF refcount handling were found during review.
    The last patch in the series adds __must_check to the BPF helpers to make
    sure we won't run into this again without the compiler complaining first.

    v2 -> v3:

    - Just reworked patch 2/4 so we don't need bpf_prog_sub().
    - Rebased, rest as is.

    v1 -> v2:

    - After discussion with Alexei, we agreed upon rebasing the
    patches against net-next.
    - Since this is net-next, I've also added the __must_check to make
    future users check for errors.
    - Fixed up commit message #2.
    - Simplify assignment from patch #1 based on Saeed's feedback
    on previous set.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Helpers like bpf_prog_add(), bpf_prog_inc(), bpf_map_inc() can fail
    with an error, so make sure the caller properly checks their return
    value rather than just ignoring it, which could in the worst case lead
    to a use after free.
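
    For illustration, roughly what a checked call site looks like at this
    point, where bpf_prog_add() hands back the prog or an ERR_PTR() (the
    wrapper function here is hypothetical):

    #include <linux/bpf.h>
    #include <linux/err.h>

    /* Hypothetical helper: take 'count' references up front and propagate
     * failure instead of ignoring it; with the new __must_check annotation
     * the compiler now warns if the result is simply dropped.
     */
    static int example_take_prog_refs(struct bpf_prog *prog, int count)
    {
        prog = bpf_prog_add(prog, count);
        if (IS_ERR(prog))
            return PTR_ERR(prog);

        return 0;
    }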

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
    mlx5e_xdp_set() is currently the only place where we drop the reference on
    the prog sitting in priv->xdp_prog when it's exchanged for a new one. We also
    need to make sure that we eventually release that reference, for example in
    case the netdev is dismantled; otherwise we leak the program.

    Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
    Signed-off-by: Daniel Borkmann
    Acked-by: Saeed Mahameed
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • There are multiple issues in mlx5e_xdp_set():

    1) The batched bpf_prog_add() is currently not checked for errors. When
    doing so, it should be done at an earlier point in time to make sure
    that we cannot fail anymore at the time we want to set the program for
    each channel. The batched refs short-cut can only be performed when we
    don't need to perform a reset for changing the rq type and the device
    was in opened state. In case the device was not in opened state, then
    the next mlx5e_open_locked() will acquire the refs from the control prog
    via mlx5e_create_rq(), same when we need to perform a reset.

    2) When swapping priv->xdp_prog, no extra reference count must be
    taken since we already got one from the call path via dev_change_xdp_fd().
    Otherwise, we'd never be able to release the program. Also, bpf_prog_add()
    can fail, so its return code must not be ignored.

    Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
    Signed-off-by: Daniel Borkmann
    Acked-by: Saeed Mahameed
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • In mlx5e_create_rq(), when creating a new queue, we call bpf_prog_add() but
    without checking the return value. bpf_prog_add() can fail since 92117d8443bc
    ("bpf: fix refcnt overflow"), so we really must check it. Take the reference
    right when we assign it to the rq from priv->xdp_prog, and just drop the
    reference on the error path. Destruction in mlx5e_destroy_rq() looks good,
    though.
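
    A condensed sketch of that pattern (not the actual mlx5e code; the
    structure and helper are hypothetical): take the reference exactly where
    the rq copies the prog, and drop it again if a later step fails.

    #include <linux/bpf.h>
    #include <linux/err.h>

    struct example_rq {
        struct bpf_prog *xdp_prog;
    };

    static int example_setup_hw_queue(struct example_rq *rq)
    {
        (void)rq;
        return 0;       /* stub for the remaining queue-creation steps */
    }

    static int example_create_rq(struct example_rq *rq, struct bpf_prog *prog)
    {
        int err;

        rq->xdp_prog = NULL;
        if (prog) {
            rq->xdp_prog = bpf_prog_add(prog, 1);
            if (IS_ERR(rq->xdp_prog)) {
                err = PTR_ERR(rq->xdp_prog);
                rq->xdp_prog = NULL;
                return err;
            }
        }

        err = example_setup_hw_queue(rq);
        if (err)
            goto err_put_prog;

        return 0;

    err_put_prog:
        if (rq->xdp_prog)
            bpf_prog_put(rq->xdp_prog);     /* drop the reference on error */
        rq->xdp_prog = NULL;
        return err;
    }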

    Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
    Signed-off-by: Daniel Borkmann
    Acked-by: Saeed Mahameed
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

21 Nov, 2016

7 commits