08 Aug, 2017

32 commits

  • Signed-off-by: Yonghong Song
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Yonghong Song
     
  • Currently, bpf programs cannot be attached to sys_enter_* and sys_exit_*
    style tracepoints. The iovisor/bcc issue #748
    (https://github.com/iovisor/bcc/issues/748) documents this issue.
    For example, if you try to attach a bpf program to tracepoints
    syscalls/sys_enter_newfstat, you will get the following error:
    # ./tools/trace.py t:syscalls:sys_enter_newfstat
    Ioctl(PERF_EVENT_IOC_SET_BPF): Invalid argument
    Failed to attach BPF to tracepoint

    The main reason is that syscalls/sys_enter_* and syscalls/sys_exit_*
    tracepoints are treated differently from other tracepoints and there
    is no bpf hook for them.

    This patch adds bpf support for these syscalls tracepoints by
    . permitting bpf attachment in ioctl PERF_EVENT_IOC_SET_BPF
    . calling bpf programs in perf_syscall_enter and perf_syscall_exit

    The legality of bpf program ctx access is also checked.
    Function trace_event_get_offsets returns the correct max offset for each
    specific syscall tracepoint, which is compared against the maximum offset
    accessed by the bpf program.
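
    The offset comparison described above can be sketched roughly as
    follows. This is a minimal illustration, not the kernel code: the two
    structs and ctx_access_ok() are hypothetical names, and the assumption
    is that the program's recorded maximum offset is an exclusive byte
    bound.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the layout of one syscall tracepoint record. */
struct syscall_tp_layout {
    size_t max_offset;      /* end of the last valid field, in bytes */
};

/* Hypothetical stand-in for what is known about a loaded program. */
struct prog_info {
    size_t max_ctx_offset;  /* highest ctx byte (exclusive) the program reads */
};

/* Attachment is legal only if the program never reads past the record. */
bool ctx_access_ok(const struct prog_info *prog,
                   const struct syscall_tp_layout *tp)
{
    return prog->max_ctx_offset <= tp->max_offset;
}
```

    With a 32-byte record, a program reading up to byte 24 would be
    accepted and one reading up to byte 40 would be rejected.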

    Signed-off-by: Yonghong Song
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Yonghong Song
     
  • The "fixed-link" prop support predated of_property_read_u32_array(), so
    basically had to open-code it. Using the modern API saves 24 bytes of the
    object code (ARM gcc 4.8.5); the only behavior change would be that the
    prop length check is now less strict (however the strict pre-check done
    in of_phy_is_fixed_link() is left intact anyway)...

    Signed-off-by: Sergei Shtylyov
    Reviewed-by: Andrew Lunn
    Reviewed-by: Rob Herring
    Signed-off-by: David S. Miller

    Sergei Shtylyov
     
  • Reporting any return code for a receive buffer as an "rx error" only
    produces alarming noise and the only values that have been observed to be
    used in this field are not error conditions. Change this to a netdev_dbg
    with a more descriptive message.

    Signed-off-by: John Allen
    Signed-off-by: David S. Miller

    John Allen
     
  • David Ahern says:

    ====================
    net: l3mdev: Support for sockets bound to enslaved device

    A missing piece to the VRF puzzle is the ability to bind sockets to
    devices enslaved to a VRF. This patch set adds the enslaved device
    index, sdif, to IPv4 and IPv6 socket lookups. The end result for users
    is the following scope options for services:

    1. "global" services - sockets not bound to any device

    Allows 1 service to work across all network interfaces, with
    connected sockets bound to the VRF in which the connection originates
    (requires net.ipv4.tcp_l3mdev_accept=1 for TCP and
    net.ipv4.udp_l3mdev_accept=1 for UDP).

    2. "VRF" local services - sockets bound to a VRF

    Sockets work across all network interfaces enslaved to a VRF but
    are limited to just the one VRF.

    3. "device" services - sockets bound to a specific network interface

    Service works only through the one specific interface.

    v3
    - convert __inet_lookup_established in dccp_v4_err; missed in v2

    v2
    - remove sk_lookup struct and add sdif as an argument to existing
    functions

    Changes since RFC:
    - no significant logic changes; mainly whitespace cleanups
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Add a second device index, sdif, to raw socket lookups. sdif is the
    index for ingress devices enslaved to an l3mdev. It allows the lookups
    to consider the enslaved device as well as the L3 domain when searching
    for a socket.
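
    A minimal sketch of the matching rule this implies, assuming that for
    a packet arriving through an enslaved device dif carries the L3 domain
    index and sdif carries the enslaved ingress device index. The helper
    name bound_dev_matches() is hypothetical and is not the function the
    patch touches.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: does a socket's bound device accept this packet?
 *   bound_dev_if == 0  -> socket is not bound to any device
 *   dif                -> primary device index for the packet
 *   sdif               -> second index (enslaved ingress device), 0 if none
 */
bool bound_dev_matches(int bound_dev_if, int dif, int sdif)
{
    if (bound_dev_if == 0)
        return true;
    return bound_dev_if == dif || bound_dev_if == sdif;
}
```

    Under these assumptions a socket bound to the L3 domain and a socket
    bound to the enslaved device both match, while a socket bound to an
    unrelated device does not.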

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Add a second device index, sdif, to inet6 socket lookups. sdif is the
    index for ingress devices enslaved to an l3mdev. It allows the lookups
    to consider the enslaved device as well as the L3 domain when searching
    for a socket.

    TCP moves the data in the cb. Prior to tcp_v4_rcv (e.g., early demux) the
    ingress index is obtained from IPCB using inet_sdif and after tcp_v4_rcv
    tcp_v4_sdif is used.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Add a second device index, sdif, to udp socket lookups. sdif is the
    index for ingress devices enslaved to an l3mdev. It allows the lookups
    to consider the enslaved device as well as the L3 domain when searching
    for a socket.

    Early demux lookups are handled in the next patch as part of INET_MATCH
    changes.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Add a second device index, sdif, to raw socket lookups. sdif is the
    index for ingress devices enslaved to an l3mdev. It allows the lookups
    to consider the enslaved device as well as the L3 domain when searching
    for a socket.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Add a second device index, sdif, to inet socket lookups. sdif is the
    index for ingress devices enslaved to an l3mdev. It allows the lookups
    to consider the enslaved device as well as the L3 domain when searching
    for a socket.

    TCP moves the data in the cb. Prior to tcp_v4_rcv (e.g., early demux) the
    ingress index is obtained from IPCB using inet_sdif and after the cb move
    in tcp_v4_rcv the tcp_v4_sdif helper is used.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Add a second device index, sdif, to udp socket lookups. sdif is the
    index for ingress devices enslaved to an l3mdev. It allows the lookups
    to consider the enslaved device as well as the L3 domain when searching
    for a socket.

    Early demux lookups are handled in the next patch as part of INET_MATCH
    changes.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • …ub/scm/linux/kernel/git/kvalo/wireless-drivers-next

    Kalle Valo says:

    ====================
    wireless-drivers-next patches for 4.14

    The first wireless-drivers-next pull request for 4.14. I'm submitting
    this unusually late in the cycle as my vacation postponed this. But
    even though this is late, there are still not that many new features,
    mostly cleanup or fixes.

    Major changes:

    ath10k

    * preparation for wcn3990 support

    iwlwifi

    * Reorganization of the code into separate directories continues

    qtnfmac

    * regulatory support updates

    * add get_channel, dump_survey and channel_switch cfg80211 handlers
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     
  • Without CONFIG_PCI_IOV, we get a harmless warning about an
    unused function:

    drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c:2273:13: error: 'hclge_disable_sriov' defined but not used [-Werror=unused-function]

    The #ifdefs in this driver are obviously wrong, so this just
    removes them and uses an IS_ENABLED() check that does the same
    thing correctly in a more readable way.
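
    The pattern can be illustrated outside the kernel with a simplified
    stand-in. MY_IS_ENABLED below is modeled on the preprocessor trick the
    kernel macro relies on, but it is a sketch and not the kernel's
    definition; CONFIG_DEMO_IOV and the demo_* functions are hypothetical.

```c
/* Expands to 1 if the option is defined as 1, and to 0 if it is undefined. */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED_3(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_2(val) IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define MY_IS_ENABLED(option) IS_DEFINED_2(option)

#define CONFIG_DEMO_IOV 1   /* hypothetical option; remove to model =n */

int sriov_disable_calls;

/* Always compiled and always referenced, so no unused-function warning. */
static void demo_disable_sriov(void)
{
    sriov_disable_calls++;
}

void demo_remove(void)
{
    /* A compile-time constant: the compiler can drop the dead branch. */
    if (MY_IS_ENABLED(CONFIG_DEMO_IOV))
        demo_disable_sriov();
}
```

    Because the call site stays visible to the compiler in both
    configurations, the helper is type-checked either way and the warning
    cannot reappear.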

    Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Arnd Bergmann
     
  • Saeed Mahameed says:

    ====================
    mlx5-shared-2017-08-07

    This series includes some mlx5 updates for both net-next and rdma trees.

    From Saeed,
    Core driver updates to allow selectively building the driver with
    or without some large driver components, such as
    - E-Switch (Ethernet SRIOV support).
    - Multi-Physical Function Switch (MPFs) support.
    For that we split E-Switch and MPFs functionalities into separate files.

    From Erez,
    Delay mlx5_core events when mlx5 interfaces, namely mlx5_ib, registration
    is taking place and until it completes.

    From Rabie,
    Increase the maximum supported flow counters.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Jiri Pirko says:

    ====================
    net: sched: summer cleanup part 2, ndo_setup_tc

    This patchset focuses on ndo_setup_tc and its args.
    Currently there are a couple of things that do not make much sense.
    The type is passed in struct tc_to_netdev, but as it is always
    required, it should be an arg of the ndo. Other things are passed as args
    but they are only relevant for cls offloads and not mqprio. Therefore,
    they should be pushed into a struct. As the tc_to_netdev struct in the end
    is just a container for a single pointer, we get rid of it and pass the
    struct according to type. So in the end, we have:
    ndo_setup_tc(dev, type, type_data_struct)

    There are a couple of cosmetic changes done on the way to make things
    smooth. Also, the reported error is consolidated to -EOPNOTSUPP in case
    the requested offload is not supported.
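
    The resulting shape, a per-driver "splitter" that dispatches on the
    type and hands the matching struct to a type-specific function, might
    look roughly like this. All demo_* names and struct fields are
    hypothetical; only the (dev, type, type_data) argument order and the
    -EOPNOTSUPP fallback come from the description above.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical setup types standing in for the real enum. */
enum demo_tc_setup_type {
    DEMO_SETUP_MQPRIO,
    DEMO_SETUP_CLSU32,
    DEMO_SETUP_CLSFLOWER,
};

/* Hypothetical per-type structures passed through type_data. */
struct demo_mqprio { int num_tc; };
struct demo_cls_flower { int command; };

int demo_setup_mqprio(struct demo_mqprio *mqprio)
{
    (void)mqprio;   /* a real driver would program its queues here */
    return 0;
}

int demo_setup_cls_flower(struct demo_cls_flower *flower)
{
    (void)flower;   /* a real driver would install the filter here */
    return 0;
}

/* Splitter: one entry point, one case per supported type. */
int demo_setup_tc(void *dev, enum demo_tc_setup_type type, void *type_data)
{
    (void)dev;
    switch (type) {
    case DEMO_SETUP_MQPRIO:
        return demo_setup_mqprio(type_data);
    case DEMO_SETUP_CLSFLOWER:
        return demo_setup_cls_flower(type_data);
    default:
        return -EOPNOTSUPP;     /* unsupported offload type */
    }
}
```

    In this sketch a driver that does not implement cls_u32 simply falls
    through to the default case and reports -EOPNOTSUPP.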

    v1->v2:
    - added forgotten hns3pf bits
    ====================

    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    David S. Miller
     
  • Get rid of struct tc_to_netdev, which is now just an unnecessary
    container, and instead pass per-type structures down to drivers directly.
    Along with that, consolidate the naming of per-type structure variables
    in cls_*.

    Signed-off-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jiri Pirko
     
  • Change the return value from -EINVAL to -EOPNOTSUPP. The rest of the
    drivers have it like that, so be aligned.

    Signed-off-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jiri Pirko
     
  • prio is not cls_flower specific, but it is meaningful for all
    classifiers. Seems that only mlxsw cares about the value. Obviously,
    cls offload in other drivers is broken.

    Signed-off-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jiri Pirko
     
  • As ndo_setup_tc is a generic offload op for the whole tc subsystem, it
    does not really make sense to have cls-specific args. So move them under
    the cls_common structure, which is embedded in all cls structs.

    Signed-off-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jiri Pirko
     
  • Similar to the rest of the mqprio offloaders, there is no need to check
    the handle.

    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
     
  • Change the flows a bit in preparation of follow-up changes in
    ndo_setup_tc args. Also, change the error code to align with the rest of
    the drivers.

    Signed-off-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jiri Pirko
     
  • Let dsa_slave_setup_tc be a splitter for specific setup_tc types and
    push out cls_matchall specific code into a separate function.

    Signed-off-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Reviewed-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Jiri Pirko
     
  • To sync-up with the naming in the rest of the driver, rename the cls arg.

    Signed-off-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jiri Pirko
     
  • Let mlxsw_sp_setup_tc be a splitter for specific setup_tc types and push
    out cls_flower and cls_matchall specific codes into separate functions.

    Signed-off-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jiri Pirko
     
  • Let mlx5e_rep_setup_tc (former mlx5e_rep_ndo_setup_tc) be a splitter for
    specific setup_tc types and push out cls_flower specific code into
    a separate function.

    Signed-off-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jiri Pirko
     
  • Let mlx5e_setup_tc (former mlx5e_ndo_setup_tc) be a splitter for specific
    setup_tc types and push out cls_flower and mqprio specific codes into
    separate functions. Also change the return values so they are the same
    as in the rest of the drivers.

    Signed-off-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jiri Pirko
     
  • Let __ixgbe_setup_tc be a splitter for specific setup_tc types and push out
    cls_u32 and mqprio specific codes into separate functions. Also change
    the return values so they are the same as in the rest of the drivers.

    Signed-off-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jiri Pirko
     
  • Let cxgb_setup_tc be a splitter for specific setup_tc types and push out
    cls_u32 specific code into a separate function.

    Signed-off-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jiri Pirko
     
  • Since this is specific to flower now, make it part of the flower offload
    struct.

    Signed-off-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jiri Pirko
     
  • In order to be aligned with the rest of the types, rename
    TC_SETUP_MATCHALL to TC_SETUP_CLSMATCHALL.

    Signed-off-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jiri Pirko
     
  • Since the type is always present, push it to be a separate argument to
    ndo_setup_tc. On the way, name the type enum and use it for arg type.

    Signed-off-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jiri Pirko
     

07 Aug, 2017

8 commits

  • Read the new NIC capability field, which represents the 16 MSBs of the
    max number of flow counters supported (max_flow_counter_31_16).

    Backward compatibility with older firmware is preserved, the modified
    driver reads max_flow_counter_31_16 as 0 from the older firmware and
    uses up to 64K counters.

    Changed flow counter id from 16 bits to 32 bits. Backward compatibility
    with older firmware is preserved as we kept the 16 LSBs of the counter
    id in place and added 16 MSBs from reserved field.

    Changed the background bulk reading of flow counters to work in chunks
    of at most 32K counters, to make sure we don't attempt to allocate very
    large buffers.
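
    The id widening and the chunked reading can be sketched as below. The
    function names are hypothetical and the firmware interface is not
    modeled; only the 16+16 bit split and the 32K chunk limit come from
    the description above.

```c
#include <stdint.h>

#define DEMO_MAX_BULK 32768u    /* at most 32K counters per bulk query */

/* Build a 32-bit counter id from the legacy 16 LSBs and the new 16 MSBs. */
uint32_t demo_counter_id(uint16_t lsb, uint16_t msb)
{
    return ((uint32_t)msb << 16) | lsb;
}

/* Number of bulk queries needed to read `total` counters. */
uint32_t demo_bulk_queries(uint32_t total)
{
    return total / DEMO_MAX_BULK + (total % DEMO_MAX_BULK != 0);
}

/* Length of the chunk that starts at `first`; requires first <= total. */
uint32_t demo_chunk_len(uint32_t first, uint32_t total)
{
    uint32_t left = total - first;

    return left < DEMO_MAX_BULK ? left : DEMO_MAX_BULK;
}
```

    With the MSBs read as 0, as older firmware would report them, the id
    is unchanged from its 16-bit value, which is the backward
    compatibility property described above.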

    Signed-off-by: Rabie Loulou
    Reviewed-by: Or Gerlitz
    Signed-off-by: Saeed Mahameed

    Rabie Loulou
     
  • The counter list hardware structure doesn't contain clear and
    num_of_counters fields; remove them.

    These wrong fields were never used by the driver, hence no other driver
    changes are needed.

    Fixes: a351a1b03bf1 ("net/mlx5: Introduce bulk reading of flow counters")
    Signed-off-by: Rabie Loulou
    Reviewed-by: Or Gerlitz
    Signed-off-by: Saeed Mahameed

    Rabie Loulou
     
  • When mlx5_ib registers itself to mlx5_core as an interface, it calls
    mlx5_add_device, which calls the mlx5_ib interface add callback. Only
    if the latter returns successfully does mlx5_core add it to the
    interface list, after which async events are forwarded to mlx5_ib.
    Between the mlx5_ib interface add callback and mlx5_core adding the
    mlx5_ib interface to its devices list, arriving mlx5_core events can be
    missed by the newly registering mlx5_ib interface.

    In other words:
    thread 1: mlx5_ib: mlx5_register_interface(dev)
    thread 1: mlx5_core: mlx5_add_device(dev)
    thread 1: mlx5_core: ctx = dev->add => (mlx5_ib)->mlx5_ib_add
    thread 2: mlx5_core_event: **new event arrives, forward to dev_list
    thread 1: mlx5_core: add_ctx_to_dev_list(ctx)
    /* previous event was missed by the new interface.*/
    It is ok to miss events before dev->add (mlx5_ib)->mlx5_ib_add_device
    but not after.

    We fix this race by accumulating the events that arrive between
    ib_register_device (inside mlx5_add_device->(dev->add)) and the
    completion of adding to the list, and firing them at the newly
    registering interface after that.
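
    A single-threaded sketch of the idea, with locking deliberately left
    out: events seen while the add callback is in flight are parked and
    replayed once the interface is on the list. The struct and the demo_*
    functions are hypothetical and do not mirror the driver's data
    structures.

```c
#include <stdbool.h>

#define DEMO_MAX_EVENTS 16

/* Hypothetical per-interface state. */
struct demo_iface {
    bool adding;                    /* add callback started, not yet listed */
    bool on_list;                   /* interface is on the device list */
    int delayed[DEMO_MAX_EVENTS];   /* events parked during registration */
    int n_delayed;
    int delivered[DEMO_MAX_EVENTS]; /* events the interface actually saw */
    int n_delivered;
};

static void demo_deliver(struct demo_iface *ifc, int event)
{
    if (ifc->n_delivered < DEMO_MAX_EVENTS)
        ifc->delivered[ifc->n_delivered++] = event;
}

void demo_begin_add(struct demo_iface *ifc)
{
    ifc->adding = true;
}

void demo_event(struct demo_iface *ifc, int event)
{
    if (ifc->on_list)
        demo_deliver(ifc, event);
    else if (ifc->adding && ifc->n_delayed < DEMO_MAX_EVENTS)
        ifc->delayed[ifc->n_delayed++] = event;
    /* otherwise the event predates the add callback and may be missed */
}

void demo_finish_add(struct demo_iface *ifc)
{
    ifc->on_list = true;
    ifc->adding = false;
    /* Replay everything that arrived while registration was in flight. */
    for (int i = 0; i < ifc->n_delayed; i++)
        demo_deliver(ifc, ifc->delayed[i]);
    ifc->n_delayed = 0;
}
```

    In this model an event raised before demo_begin_add() is dropped, one
    raised between demo_begin_add() and demo_finish_add() is delivered
    late, and later events are delivered immediately.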

    Fixes: f1ee87fe55c8 ("net/mlx5: Organize device list API in one place")
    Signed-off-by: Erez Shitrit
    Signed-off-by: Saeed Mahameed

    Erez Shitrit
     
  • Allow to selectively build the driver with or without sriov eswitch, VF
    representors and TC offloads.

    Also remove the need for two ndo ops structures (sriov & basic)
    and keep only one unified ndo ops, compiling out VF SRIOV ndos when not
    needed (MLX5_ESWITCH=n). For a VF netdev, calling those ndos will return
    -EPERM.

    Signed-off-by: Saeed Mahameed
    Reviewed-by: Or Gerlitz
    Cc: Jes Sorensen
    Cc: kernel-team@fb.com

    Saeed Mahameed
     
  • Multi-Physical Function Switch (MPFs) is required when multi-PF
    configuration is enabled, to allow passing user-configured unicast MAC
    addresses to the requesting PF.

    Before this patch, eswitch.c managed the HW MPFS L2 table: E-Switch
    always (regardless of sriov) enabled vport(0) (NIC PF) vport context
    updates on unicast mac address list changes, to populate the PF's
    MPFS L2 table accordingly.

    In a downstream patch we would like to allow compiling the driver without
    E-Switch functionalities. For that we move MPFS L2 table logic out
    of eswitch.c into its own file, and provide a Kconfig flag (MLX5_MPFS) to
    allow compiling out MPFS for those who don't want Multi-PF support.

    NIC PF netdevice will now directly update the MPFS L2 table via the new
    MPFS API. VF netdevice has no access to the MPFS L2 table, so E-Switch
    will remain responsible for updating its MPFS L2 table on behalf of its
    VFs.

    Due to this change we also no longer need to enable vport(0) (PF vport)
    unicast mac change events when SRIOV is not enabled. This means
    E-Switch is now activated only on SRIOV activation, and is not
    required otherwise.

    Signed-off-by: Saeed Mahameed
    Cc: Jes Sorensen
    Cc: kernel-team@fb.com

    Saeed Mahameed
     
  • Expose MLX5_VPORT_MANAGER macro to check for strict vport manager
    E-switch and MPFS (Multi Physical Function Switch) abilities.

    VPORT manager must be a PF with an ethernet link and with the FW
    advertised vport group manager capability.

    Replace older checks with the new macro and use it where needed in
    eswitch.c and mlx5e netdev eswitch related flows.

    The same macro will be reused in MPFS separation downstream patch.

    Signed-off-by: Saeed Mahameed

    Saeed Mahameed
     
  • Remove redundant call to unregister vport representor in mlx5e_add error
    flow.

    Hide the representor priv and eswitch internal structures from en_main.c
    as preparation step for downstream patches which would allow building
    the driver without support for representors and eswitch.

    Fixes: 6f08a22c5fb2 ("net/mlx5e: Register/unregister vport representors on interface attach/detach")
    Signed-off-by: Saeed Mahameed
    Reviewed-by: Or Gerlitz

    Saeed Mahameed
     
  • Since we are going to allow building the driver without eswitch support,
    it would be possible to compile out the sriov netdevice ops struct such
    that the basic ops instance will be used for non VF devices too.

    Add missing udp tunnel ndos into mlx5e_netdev_ops_basic.

    While here, rearrange some ndos in the sriov ops struct and put
    vf/eswitch related ndos towards the end of it.

    Signed-off-by: Saeed Mahameed
    Reviewed-by: Or Gerlitz

    Saeed Mahameed