15 Jan, 2017

1 commit

  • [ Upstream commit 4775cc1f2d5abca894ac32774eefc22c45347d1c ]

    We fail to check whether the netlink message is actually big enough to
    contain a struct if_stats_msg.

    Add a check to prevent userland from sending us short messages that would
    make us access memory beyond the end of the message.

    Fixes: 10c9ead9f3c6 ("rtnetlink: add new RTM_GETSTATS message to dump...")
    Signed-off-by: Mathias Krause
    Cc: Roopa Prabhu
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Mathias Krause
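
    The length check described above can be sketched in user space. This is
    only an analogue under stated assumptions: stats_hdr_or_null() is a
    hypothetical helper, not the kernel function, and it relies on the
    Linux UAPI headers for struct nlmsghdr and struct if_stats_msg.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <linux/if_link.h>   /* struct if_stats_msg */
#include <linux/netlink.h>   /* struct nlmsghdr, NLMSG_LENGTH, NLMSG_DATA */

/*
 * Hypothetical user-space analogue of the added check: hand out the stats
 * header only if the message is long enough to contain one.
 */
const struct if_stats_msg *stats_hdr_or_null(const struct nlmsghdr *nlh,
                                             int *err)
{
    assert(nlh != NULL && err != NULL);

    /* nlmsg_len counts the netlink header plus the payload. */
    if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(struct if_stats_msg))) {
        *err = -EINVAL;   /* short message: never touch the payload */
        return NULL;
    }

    *err = 0;
    return (const struct if_stats_msg *)NLMSG_DATA(nlh);
}
```

    A caller would reject the request when NULL comes back instead of
    reading ifindex or filter_mask from memory past the end of the message.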
     

02 Dec, 2016

1 commit


24 Nov, 2016

1 commit


20 Nov, 2016

1 commit


19 Nov, 2016

1 commit


16 Nov, 2016

2 commits

  • rtnl_xdp_size() only considers the size of the actual payload attribute,
    and misses the space taken by the attribute used for nesting (IFLA_XDP).

    Fixes: d1fdd9138682 ("rtnl: add option for setting link xdp prog")
    Signed-off-by: Sabrina Dubroca
    Reviewed-by: Brenden Blanco
    Signed-off-by: David S. Miller

    Sabrina Dubroca
     
  • The size reported by rtnl_vfinfo_size doesn't match the space used by
    rtnl_fill_vfinfo.

    rtnl_vfinfo_size currently doesn't account for the nest attributes
    used by statistics (added in commit 3b766cd83232), nor for struct
    ifla_vf_tx_rate (since commit ed616689a3d9, which added ifla_vf_rate
    to the dump without removing ifla_vf_tx_rate, but replaced
    ifla_vf_tx_rate with ifla_vf_rate in the size computation).

    Fixes: 3b766cd83232 ("net/core: Add reading VF statistics through the PF netdevice")
    Fixes: ed616689a3d9 ("net-next:v4: Add support to configure SR-IOV VF minimum and maximum Tx rate through ip tool")
    Signed-off-by: Sabrina Dubroca
    Signed-off-by: David S. Miller

    Sabrina Dubroca
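
    Both size fixes above come down to the same arithmetic: a nest
    attribute costs its own header on top of the attributes it contains.
    A minimal sketch, assuming the usual 4-byte netlink attribute header
    and 4-byte alignment, and assuming a 1-byte payload for the XDP case.
    The sketch_ and xdp_size_ names are local stand-ins, not the kernel
    helpers.

```c
#include <stddef.h>

/* Local stand-ins for the kernel's attribute size helpers (assumption:
 * 4-byte attribute header, 4-byte alignment). */
#define SKETCH_NLA_HDRLEN   ((size_t)4)
#define SKETCH_NLA_ALIGN(n) (((n) + 3) & ~(size_t)3)

/* Space taken by one attribute: header plus payload, padded to 4 bytes. */
size_t sketch_nla_total_size(size_t payload)
{
    return SKETCH_NLA_ALIGN(SKETCH_NLA_HDRLEN + payload);
}

/* Undersized estimate: counts only the 1-byte payload attribute. */
size_t xdp_size_without_nest(void)
{
    return sketch_nla_total_size(1);
}

/* Corrected estimate: the empty nest header plus the nested attribute. */
size_t xdp_size_with_nest(void)
{
    return sketch_nla_total_size(0) + sketch_nla_total_size(1);
}
```

    Under these assumptions the two estimates differ by exactly one
    attribute header, which is the space the size function has to reserve
    for the nest.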
     

10 Nov, 2016

1 commit

  • To avoid having dangling function pointers left behind, reset calcit in
    rtnl_unregister(), too.

    This is no issue so far, as only the rtnl core registers a netlink
    handler with a calcit hook which won't be unregistered, but may become
    one if new code makes use of the calcit hook.

    Fixes: c7ac8679bec9 ("rtnetlink: Compute and store minimum ifinfo...")
    Cc: Jeff Kirsher
    Cc: Greg Rose
    Signed-off-by: Mathias Krause
    Signed-off-by: David S. Miller

    Mathias Krause
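
    The idea of clearing every hook on unregister can be sketched with a
    small handler table. Everything here is hypothetical: the table, the
    sketch_ functions and the simplified handler signature only echo the
    doit/dumpit/calcit naming and are not the rtnetlink implementation.

```c
#include <stddef.h>

typedef int (*handler_fn)(void *msg);

/* Hypothetical handler slot with the three hooks. */
struct handler_slot {
    handler_fn doit;
    handler_fn dumpit;
    handler_fn calcit;
};

#define SKETCH_NR_MSGTYPES 4
static struct handler_slot handler_tab[SKETCH_NR_MSGTYPES];

/* Trivial handler used for illustration. */
int sketch_noop(void *msg)
{
    (void)msg;
    return 0;
}

int sketch_register(size_t idx, handler_fn doit, handler_fn dumpit,
                    handler_fn calcit)
{
    if (idx >= SKETCH_NR_MSGTYPES)
        return -1;
    handler_tab[idx].doit = doit;
    handler_tab[idx].dumpit = dumpit;
    handler_tab[idx].calcit = calcit;
    return 0;
}

int sketch_unregister(size_t idx)
{
    if (idx >= SKETCH_NR_MSGTYPES)
        return -1;
    handler_tab[idx].doit = NULL;
    handler_tab[idx].dumpit = NULL;
    handler_tab[idx].calcit = NULL;   /* the hook that must not be left behind */
    return 0;
}

/* Look up the calcit hook; NULL means none is registered. */
handler_fn sketch_calcit(size_t idx)
{
    return idx < SKETCH_NR_MSGTYPES ? handler_tab[idx].calcit : NULL;
}
```

    If the last assignment in sketch_unregister() were missing, a later
    lookup could return a pointer into code that may already be gone.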
     

14 Oct, 2016

1 commit

  • The "vf_vlan_info" struct ends with a 2 byte struct hole so we have to
    memset it to ensure that no stack information is revealed to user space.

    Fixes: 79aab093a0b5 ('net: Update API for VF vlan protocol 802.1ad support')
    Signed-off-by: Dan Carpenter
    Signed-off-by: David S. Miller

    Dan Carpenter
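
    A minimal sketch of the memset pattern, assuming a layout of three
    32-bit fields followed by one 16-bit field, which on common ABIs
    leaves a 2-byte hole at the end. The struct and function names are
    stand-ins chosen for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in layout (assumption): the 16-bit tail field is typically
 * followed by 2 bytes of padding. */
struct vf_vlan_info_sketch {
    uint32_t vf;
    uint32_t vlan;
    uint32_t qos;
    uint16_t vlan_proto;
};

void fill_vf_vlan_info(struct vf_vlan_info_sketch *out, uint32_t vf,
                       uint32_t vlan, uint32_t qos, uint16_t proto)
{
    /* Zero the whole object first so the hole holds no stale stack bytes. */
    memset(out, 0, sizeof(*out));

    out->vf = vf;
    out->vlan = vlan;
    out->qos = qos;
    out->vlan_proto = proto;
}
```

    Assigning the four members alone would leave the padding bytes as they
    were, and copying sizeof(struct) bytes to user space would then expose
    whatever was on the stack before.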
     

03 Oct, 2016

1 commit

  • With the newly added support for IFLA_VF_VLAN_LIST netlink messages,
    we get a warning about potential uninitialized variable use in
    the parsing of the user input when enabling the -Wmaybe-uninitialized
    warning:

    net/core/rtnetlink.c: In function 'do_setvfinfo':
    net/core/rtnetlink.c:1756:9: error: 'ivvl$' may be used uninitialized in this function [-Werror=maybe-uninitialized]

    I have not been able to prove whether it is possible to arrive in
    this code with an empty IFLA_VF_VLAN_LIST block, but if we do,
    then ndo_set_vf_vlan gets called with uninitialized arguments.

    This adds an explicit check for an empty list, making it obvious
    to the reader and the compiler that this cannot happen.

    Fixes: 79aab093a0b5 ("net: Update API for VF vlan protocol 802.1ad support")
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Moshe Shemesh
    Signed-off-by: David S. Miller

    Arnd Bergmann
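
    The shape of the added check can be sketched as follows. This is a
    simplified stand-in, not do_setvfinfo(): collect_vlan_list() and its
    types are hypothetical, and the single-entry limit is an assumption
    taken from the list-size limit mentioned for this UAPI.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_VLAN_LIST_LEN 1

struct vlan_entry_sketch {
    unsigned int vlan;
    unsigned int qos;
};

/*
 * Collect at most SKETCH_MAX_VLAN_LIST_LEN entries from `in` into `out`,
 * then reject an empty result explicitly. Returns the number of entries
 * collected or a negative errno value.
 */
int collect_vlan_list(const struct vlan_entry_sketch *in, size_t n_in,
                      const struct vlan_entry_sketch *out[SKETCH_MAX_VLAN_LIST_LEN])
{
    size_t len = 0;

    for (size_t i = 0; i < n_in; i++) {
        if (len >= SKETCH_MAX_VLAN_LIST_LEN)
            return -EOPNOTSUPP;
        out[len++] = &in[i];
    }

    /* Without this check, out[0] could be used while uninitialized. */
    if (len == 0)
        return -EINVAL;

    return (int)len;
}
```

    After a non-negative return the caller knows out[0] was written, which
    is what both the reader and the compiler need to see.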
     

24 Sep, 2016

1 commit

  • Introduce a new rtnl UAPI that exposes a list of vlans per VF, letting
    a user-space application specify it for the VF, as an option to
    support 802.1ad.
    We adjusted the ip link tool to support this option.

    For future use cases, the new UAPI supports multiple vlans. For now we
    limit the list size to a single vlan in kernel.
    Add IFLA_VF_VLAN_LIST in addition to IFLA_VF_VLAN to keep backward
    compatibility with older versions of IP Link tool.

    Add a vlan protocol parameter to the ndo_set_vf_vlan callback.
    We kept 802.1Q as the drivers' default vlan protocol.
    Suitable ip link tool command examples:
    Set vf vlan protocol 802.1ad:
    ip link set eth0 vf 1 vlan 100 proto 802.1ad
    Set vf to VST (802.1Q) mode:
    ip link set eth0 vf 1 vlan 100 proto 802.1Q
    Or by omitting the new parameter
    ip link set eth0 vf 1 vlan 100

    Signed-off-by: Moshe Shemesh
    Signed-off-by: Tariq Toukan
    Signed-off-by: David S. Miller

    Moshe Shemesh
     

19 Sep, 2016

1 commit

  • Add a nested attribute of offload stats to if_stats_msg
    named IFLA_STATS_LINK_OFFLOAD_XSTATS.
    Under it, add SW stats, meaning stats only for packets that went via
    the slowpath to the CPU, named IFLA_OFFLOAD_XSTATS_CPU_HIT.

    Signed-off-by: Nogah Frankel
    Signed-off-by: Jiri Pirko
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nogah Frankel
     

10 Sep, 2016

1 commit


02 Sep, 2016

1 commit

  • fdb dumps spanning multiple skb's currently restart from the first
    interface again for every skb. This results in unnecessary
    iterations on the already visited interfaces and their fdb
    entries. In large scale setups, we have seen this to slow
    down fdb dumps considerably. On a system with 30k macs we
    see fdb dumps spanning across more than 300 skbs.

    To fix the problem, this patch replaces the existing single fdb
    marker with three markers: netdev hash entries, netdevs and fdb
    index to continue where we left off instead of restarting from the
    first netdev. This is consistent with link dumps.

    In the process of fixing the performance issue, this patch also
    re-implements the fix made by
    commit 472681d57a5d ("net: ndo_fdb_dump should report -EMSGSIZE to rtnl_fdb_dump")
    (with an internal fix from Wilson Kok) in the following ways:
    - change ndo_fdb_dump handlers to return error code instead
    of the last fdb index
    - use cb->args strictly for dump frag markers and not error codes.
    This is consistent with other dump functions.

    Below results were taken on a system with 1000 netdevs
    and 35085 fdb entries:
    before patch:
    $time bridge fdb show | wc -l
    15065

    real 1m11.791s
    user 0m0.070s
    sys 1m8.395s

    (existing code does not return all macs)

    after patch:
    $time bridge fdb show | wc -l
    35085

    real 0m2.017s
    user 0m0.113s
    sys 0m1.942s

    Signed-off-by: Roopa Prabhu
    Signed-off-by: Wilson Kok
    Signed-off-by: David S. Miller

    Roopa Prabhu
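
    The resume-marker idea can be sketched with a cursor that survives
    between calls. This is a simplification under stated assumptions: it
    keeps two markers (device and entry index) instead of the three
    described above, and dump_chunk() and struct dump_cursor are
    hypothetical names, with one call standing in for one skb.

```c
#include <stddef.h>

/* Hypothetical resume state, loosely mirroring markers kept in cb->args. */
struct dump_cursor {
    size_t dev;   /* device to resume at */
    size_t fdb;   /* entry within that device to resume at */
};

/*
 * Emit up to `room` entries starting at the cursor. counts[d] is the number
 * of entries on device d. Returns how many entries were emitted and leaves
 * the cursor where the next call should continue.
 */
size_t dump_chunk(const size_t *counts, size_t ndev,
                  struct dump_cursor *cur, size_t room)
{
    size_t emitted = 0;

    while (cur->dev < ndev) {
        while (cur->fdb < counts[cur->dev]) {
            if (emitted == room)
                return emitted;   /* buffer full: both markers are kept */
            emitted++;            /* stands in for filling one entry */
            cur->fdb++;
        }
        cur->dev++;               /* device finished: move to the next one */
        cur->fdb = 0;             /* and start at its first entry */
    }

    return emitted;
}
```

    Because the cursor is never reset between calls, devices that were
    already dumped are not walked again, which is the cost the patch
    removes.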
     

24 Aug, 2016

1 commit

  • Since the features bit field has bits for internal only use as well, it
    may happen that the kernel exports RTAX_FEATURES attribute with zero
    value which is pointless.

    Fix this by making sure the attribute is added only if the exported
    value is non-zero.

    Signed-off-by: Phil Sutter
    Signed-off-by: David S. Miller

    Phil Sutter
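
    A minimal sketch of the "only export when non-zero" rule. The bit
    layout is an assumption made for illustration: the SKETCH_ masks and
    features_to_export() are hypothetical and do not reproduce the
    kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: low bits are visible to user space, the top bit
 * is reserved for internal use. */
#define SKETCH_FEATURE_USER_MASK 0x0000000fu
#define SKETCH_FEATURE_INTERNAL  0x80000000u

/* Returns true and stores the exported value only when it is non-zero. */
bool features_to_export(uint32_t features, uint32_t *exported)
{
    uint32_t user_bits = features & SKETCH_FEATURE_USER_MASK;

    if (user_bits == 0)
        return false;   /* nothing user-visible: skip the attribute */

    *exported = user_bits;
    return true;
}
```

    A features word that has only internal bits set masks down to zero, so
    no attribute is emitted for it.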
     

21 Jul, 2016

1 commit


20 Jul, 2016

1 commit

  • Sets the bpf program represented by fd as an early filter in the rx path
    of the netdev. The fd must have been created as BPF_PROG_TYPE_XDP.
    Providing a negative value as fd clears the program. Getting the fd back
    via rtnl is not possible, therefore reading this value merely
    provides a bool indicating whether the program is valid on the link.

    Signed-off-by: Brenden Blanco
    Signed-off-by: David S. Miller

    Brenden Blanco
     

01 Jul, 2016

1 commit

  • This patch introduces a new event, NETDEV_CHANGE_TX_QUEUE_LEN, which
    is triggered when tx_queue_len is changed. It can be used by net devices
    that want to do some processing at that time. An example is tun, which may
    want to resize its tx array when tx_queue_len is changed.

    Cc: John Fastabend
    Signed-off-by: Jason Wang
    Acked-by: John Fastabend
    Signed-off-by: David S. Miller

    Jason Wang
     

30 Jun, 2016

1 commit

  • This patch adds support for the IFLA_STATS_LINK_XSTATS_SLAVE attribute,
    which allows exporting per-slave statistics if the master device supports
    the linkxstats callback. The attribute is passed down to the linkxstats
    callback and it is up to the callback user to use it (an example has been
    added to the only current user - the bridge). This allows us to query only
    specific slaves of master devices like bridge ports and export only what
    we're interested in instead of having to dump all ports and searching only
    for a single one. This will be used to export per-port IGMP/MLD stats and
    also per-port vlan stats in the future, possibly other statistics as well.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

16 Jun, 2016

1 commit

  • qdisc are changed under RTNL protection and often
    while blocking BH and root qdisc spinlock.

    When lots of skbs need to be dropped, we free
    them under these locks causing TX/RX freezes,
    and more generally latency spikes.

    This commit adds rtnl_kfree_skbs(), used to queue
    skbs for deferred freeing.

    Actual freeing happens right after RTNL is released,
    with appropriate scheduling points.

    rtnl_qdisc_drop() can also be used in place
    of qdisc_drop() when RTNL is held.

    qdisc_reset_queue() and __qdisc_reset_queue() get
    the new behavior, so standard qdiscs like pfifo, pfifo_fast...
    have their ->reset() method automatically handled.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
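
    The deferred-free pattern can be sketched in user space. This is an
    illustration under stated assumptions: defer_free() and
    flush_deferred() are hypothetical names, the lock itself is left out,
    and the kernel's scheduling points are only marked by a comment.

```c
#include <stddef.h>
#include <stdlib.h>

struct buf_sketch {
    struct buf_sketch *next;
};

/* Hypothetical deferred-free list, drained after the lock is released. */
static struct buf_sketch *defer_head;

/* Called while the lock is held: O(1) queueing, nothing is freed here. */
void defer_free(struct buf_sketch *b)
{
    b->next = defer_head;
    defer_head = b;
}

/* Called after the lock is dropped: frees everything queued so far and
 * returns how many buffers were released. */
size_t flush_deferred(void)
{
    struct buf_sketch *b = defer_head;
    size_t n = 0;

    defer_head = NULL;
    while (b != NULL) {
        struct buf_sketch *next = b->next;

        free(b);
        b = next;
        n++;
        /* A kernel version would offer a scheduling point here. */
    }

    return n;
}
```

    The time spent under the lock then no longer grows with the cost of
    freeing each buffer, only with the cost of linking it into the list.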
     

10 May, 2016

1 commit


05 May, 2016

1 commit

  • The stack object “map” has a total size of 32 bytes. Its last 4
    bytes are padding generated by the compiler. These padding bytes are
    not initialized and are sent out via “nla_put”.

    Signed-off-by: Kangjie Lu
    Signed-off-by: David S. Miller

    Kangjie Lu
     

03 May, 2016

2 commits

  • Add callbacks to calculate the size and fill link extended statistics
    which can be split into multiple messages and are dumped via the new
    rtnl stats API (RTM_GETSTATS) with the IFLA_STATS_LINK_XSTATS attribute.
    Also add that attribute to the idx mask check since it is expected to
    be able to save state and resume dumping (e.g. future bridge per-vlan
    stats will be dumped via this attribute and callbacks).
    Each link type should nest its private attributes under the per-link type
    attribute. This allows any number of separate private attributes
    and avoids one call to get the dev link type.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • The new prividx argument allows the device currently being dumped to
    save a private state counter, which enables it to continue dumping from
    where it left off. The idxattr argument records which attribute is
    currently using prividx, so that multiple prividx-using attributes can
    be requested at the same time, as suggested by Roopa Prabhu.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

27 Apr, 2016

1 commit


26 Apr, 2016

1 commit


22 Apr, 2016

1 commit


21 Apr, 2016

1 commit

  • This patch adds a new RTM_GETSTATS message to query link stats via netlink
    from the kernel. RTM_NEWLINK also dumps stats today, but RTM_NEWLINK
    returns a lot more than just stats and is expensive in some cases when
    frequent polling for stats from userspace is a common operation.

    RTM_GETSTATS is an attempt to provide a lightweight netlink message
    to explicitly query only link stats from the kernel on an interface.
    The idea is to also keep it extensible so that new kinds of stats can be
    added to it in the future.

    This patch adds the following attribute for NETDEV stats:
    struct nla_policy ifla_stats_policy[IFLA_STATS_MAX + 1] = {
    [IFLA_STATS_LINK_64] = { .len = sizeof(struct rtnl_link_stats64) },
    };

    Like any other rtnetlink message, RTM_GETSTATS can be used to get stats of
    a single interface or all interfaces with NLM_F_DUMP.

    Future possible new types of stat attributes:
    link af stats:
    - IFLA_STATS_LINK_IPV6 (nested. for ipv6 stats)
    - IFLA_STATS_LINK_MPLS (nested. for mpls/mdev stats)
    extended stats:
    - IFLA_STATS_LINK_EXTENDED (nested. extended software netdev stats like bridge,
    vlan, vxlan etc)
    - IFLA_STATS_LINK_HW_EXTENDED (nested. extended hardware stats which are
    available via ethtool today)

    This patch also declares a filter mask for all stat attributes.
    The user has to provide a mask of stats attributes to query. The filter
    mask can be specified in the new hdr 'struct if_stats_msg' for stats
    messages. The other important field in the header is the ifindex.

    This api can also include attributes for global stats (eg tcp) in the future.
    When global stats are included in a stats msg, the ifindex in the header
    must be zero. A single stats message cannot contain both global and
    netdev specific stats. To easily distinguish them, netdev specific stat
    attribute names are prefixed with IFLA_STATS_LINK_.

    Without any attributes in the filter_mask, no stats will be returned.

    This patch has been tested with a modified iproute2 ifstat.

    Suggested-by: Jamal Hadi Salim
    Signed-off-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Roopa Prabhu
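
    From user space, a request for this message could be assembled roughly
    as below. This is a sketch that assumes Linux UAPI headers recent
    enough to define RTM_GETSTATS and struct if_stats_msg. struct
    getstats_req and build_getstats_req() are hypothetical names, and
    sending the buffer over a NETLINK_ROUTE socket is left out.

```c
#include <string.h>
#include <sys/socket.h>        /* AF_UNSPEC */
#include <linux/if_link.h>     /* struct if_stats_msg, IFLA_STATS_* */
#include <linux/netlink.h>     /* struct nlmsghdr, NLMSG_LENGTH */
#include <linux/rtnetlink.h>   /* RTM_GETSTATS */

/* Hypothetical request layout: netlink header followed by the stats header. */
struct getstats_req {
    struct nlmsghdr nlh;
    struct if_stats_msg ifsm;
};

/*
 * Fill `req` with an RTM_GETSTATS query for the 64-bit link stats of one
 * interface. Pass NLM_F_DUMP in extra_flags to ask for all interfaces.
 */
void build_getstats_req(struct getstats_req *req, int ifindex,
                        unsigned short extra_flags)
{
    memset(req, 0, sizeof(*req));

    req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(req->ifsm));
    req->nlh.nlmsg_type = RTM_GETSTATS;
    req->nlh.nlmsg_flags = NLM_F_REQUEST | extra_flags;

    req->ifsm.family = AF_UNSPEC;
    req->ifsm.ifindex = ifindex;
    /* Without any bit in the filter mask, no stats are returned. */
    req->ifsm.filter_mask = IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_64);
}
```

    The reply is expected to carry an IFLA_STATS_LINK_64 attribute holding
    a struct rtnl_link_stats64, matching the policy entry quoted above.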
     

20 Apr, 2016

2 commits


19 Apr, 2016

1 commit


01 Apr, 2016

1 commit


24 Mar, 2016

1 commit

  • Pull networking bugfixes from David Miller:
    "Several bug fixes rolling in, some for changes introduced in this
    merge window, and some for problems that have existed for some time:

    1) Fix prepare_to_wait() handling in AF_VSOCK, from Claudio Imbrenda.

    2) The new DST_CACHE should be a silent config option, from Dave
    Jones.

    3) inet_current_timestamp() unintentionally truncates timestamps to
    16-bit, from Deepa Dinamani.

    4) Missing reference to netns in ppp, from Guillaume Nault.

    5) Free memory reference in hv_netvsc driver, from Haiyang Zhang.

    6) Missing kernel doc documentation for function arguments in various
    spots around the networking, from Luis de Bethencourt.

    7) UDP stopped receiving broadcast packets properly, due to
    overzealous multicast checks, fix from Paolo Abeni"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits)
    net: ping: make ping_v6_sendmsg static
    hv_netvsc: Fix the order of num_sc_offered decrement
    net: Fix typos and whitespace.
    hv_netvsc: Fix the array sizes to be max supported channels
    hv_netvsc: Fix accessing freed memory in netvsc_change_mtu()
    ppp: take reference on channels netns
    net: Reset encap_level to avoid resetting features on inner IP headers
    net: mediatek: fix checking for NULL instead of IS_ERR() in .probe
    net: phy: at803x: Request 'reset' GPIO only for AT8030 PHY
    at803x: fix reset handling
    AF_VSOCK: Shrink the area influenced by prepare_to_wait
    Revert "vsock: Fix blocking ops call in prepare_to_wait"
    macb: fix PHY reset
    ipv4: initialize flowi4_flags before calling fib_lookup()
    fsl/fman: Workaround for Errata A-007273
    ipv4: fix broadcast packets reception
    net: hns: bug fix about the overflow of mss
    net: hns: adds limitation for debug port mtu
    net: hns: fix the bug about mtu setting
    net: hns: fixes a bug of RSS
    ...

    Linus Torvalds
     

23 Mar, 2016

1 commit

  • Pull more rdma updates from Doug Ledford:
    "Round two of 4.6 merge window patches.

    This is a monster pull request. I held off on the hfi1 driver updates
    (the hfi1 driver is intimately tied to the qib driver and the new
    rdmavt software library that was created to help both of them) in my
    first pull request. The hfi1/qib/rdmavt update is probably 90% of
    this pull request. The hfi1 driver is being left in staging so that
    it can be fixed up in regards to the API that Al and yourself didn't
    like. Intel has agreed to do the work, but in the meantime, this
    clears out 300+ patches in the backlog queue and brings my tree and
    their tree closer to sync.

    This also includes about 10 patches to the core and a few to mlx5 to
    create an infrastructure for configuring SRIOV ports on IB devices.
    That series includes one patch to the net core that we sent to netdev@
    and Dave Miller with each of the three revisions to the series. We
    didn't get any response to the patch, so we took that as implicit
    approval.

    Finally, this series includes Intel's new iWARP driver for their x722
    cards. It's not nearly the beast as the hfi1 driver. It also has a
    linux-next merge issue, but that has been resolved and it now passes
    just fine.

    Summary:

    - A few minor core fixups needed for the next patch series

    - The IB SRIOV series. This has bounced around for several versions.
    Of note is the fact that the first patch in this series affects the
    net core. It was directed to netdev and DaveM for each iteration
    of the series (three versions total). Dave did not object, but did
    not respond either. I've taken this as permission to move forward
    with the series.

    - The new Intel X722 iWARP driver

    - A huge set of updates to the Intel hfi1 driver. Of particular
    interest here is that we have left the driver in staging since it
    still has an API that people object to. Intel is working on a fix,
    but getting these patches in now helps keep me sane as the upstream
    and Intel's trees were over 300 patches apart"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (362 commits)
    IB/ipoib: Allow mcast packets from other VFs
    IB/mlx5: Implement callbacks for manipulating VFs
    net/mlx5_core: Implement modify HCA vport command
    net/mlx5_core: Add VF param when querying vport counter
    IB/ipoib: Add ndo operations for configuring VFs
    IB/core: Add interfaces to control VF attributes
    IB/core: Support accessing SA in virtualized environment
    IB/core: Add subnet prefix to port info
    IB/mlx5: Fix decision on using MAD_IFC
    net/core: Add support for configuring VF GUIDs
    IB/{core, ulp} Support above 32 possible device capability flags
    IB/core: Replace setting the zero values in ib_uverbs_ex_query_device
    net/mlx5_core: Introduce offload arithmetic hardware capabilities
    net/mlx5_core: Refactor device capability function
    net/mlx5_core: Fix caching ATOMIC endian mode capability
    ib_srpt: fix a WARN_ON() message
    i40iw: Replace the obsolete crypto hash interface with shash
    IB/hfi1: Add SDMA cache eviction algorithm
    IB/hfi1: Switch to using the pin query function
    IB/hfi1: Specify mm when releasing pages
    ...

    Linus Torvalds
     

22 Mar, 2016

2 commits

  • Add two new NLAs to support configuration of Infiniband node or port
    GUIDs. New applications can choose to use this interface to configure
    GUIDs with iproute2 with commands such as:

    ip link set dev ib0 vf 0 node_guid 00:02:c9:03:00:21:6e:70
    ip link set dev ib0 vf 0 port_guid 00:02:c9:03:00:21:6e:78

    A new ndo, ndo_set_vf_guid, is introduced to notify the net device of the
    request to change the GUID.

    Signed-off-by: Eli Cohen
    Reviewed-by: Or Gerlitz
    Signed-off-by: Doug Ledford

    Eli Cohen
     
  • It can be useful to report dev->gso_max_segs and dev->gso_max_size
    so that "ip -d link" can display them to help debugging.

    For the moment, these attributes are read-only.

    Signed-off-by: Eric Dumazet
    Cc: Petri Gynther
    Cc: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
     

09 Mar, 2016

1 commit


03 Mar, 2016

1 commit


27 Feb, 2016

1 commit

  • When the send skbuff reaches the end, nlmsg_put and friends return
    -EMSGSIZE but it is silently thrown away in ndo_fdb_dump. It is called
    within a for_each_netdev loop and the first fdb entry of a following
    netdev could fit in the remaining skbuff. This breaks the mechanism
    of cb->args[0] and idx to keep track of the entries that are already
    dumped, which results in missing entries in the bridge fdb show command.

    Signed-off-by: Minoura Makoto
    Signed-off-by: David S. Miller

    MINOURA Makoto / 箕浦 真
     

11 Feb, 2016

1 commit

  • Add support for filtering link dumps by master device and kind, similar
    to the filtering implemented for neighbor dumps.

    Each net_device that exists adds between 1196 bytes (eth) and 1556 bytes
    (bridge) to the link dump. As the number of interfaces increases so does
    the amount of data pushed to user space for a link list. If the user
    only wants to see a list of specific devices (e.g., interfaces enslaved
    to a specific bridge or a list of VRFs) most of that data is thrown away.
    Passing the filters to the kernel to have only relevant data returned
    makes the dump more efficient.

    Signed-off-by: David Ahern
    Acked-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    David Ahern