08 Jun, 2017

1 commit

  • Network devices can allocate resources and private memory using
    netdev_ops->ndo_init(). However, the release of these resources
    can occur in one of two different places.

    Either netdev_ops->ndo_uninit() or netdev->destructor().

    The decision of which operation frees the resources depends upon
    whether it is necessary for all netdev refs to be released before it
    is safe to perform the freeing.

    netdev_ops->ndo_uninit() presumably can occur right after the
    NETDEV_UNREGISTER notifier completes and the unicast and multicast
    address lists are flushed.

    netdev->destructor(), on the other hand, does not run until the
    netdev references all go away.

    Further complicating the situation is that netdev->destructor()
    almost universally also does a free_netdev().

    This creates a problem for the logic in register_netdevice(),
    because all callers of register_netdevice() manage the freeing
    of the netdev, and invoke free_netdev(dev) if register_netdevice()
    fails.

    If netdev_ops->ndo_init() succeeds, but something else fails inside
    of register_netdevice(), it does call netdev_ops->ndo_uninit(). But
    it is not able to invoke netdev->destructor().

    This is because netdev->destructor() will do a free_netdev() and
    then the caller of register_netdevice() will do the same.

    However, this means that the resources that would normally be released
    by netdev->destructor() will not be.

    Over the years drivers have added local hacks to deal with this, by
    invoking their destructor parts by hand when register_netdevice()
    fails.

    Many drivers do not try to deal with this, and instead we have leaks.

    Let's close this hole by formalizing the distinction between what
    private things need to be freed up by netdev->destructor() and whether
    the driver needs unregister_netdevice() to perform the free_netdev().

    netdev->priv_destructor() performs all actions to free up the private
    resources that used to be freed by netdev->destructor(), except for
    free_netdev().

    netdev->needs_free_netdev is a boolean that indicates whether
    free_netdev() should be done at the end of unregister_netdevice().

    Now, register_netdevice() can sanely release all resources after
    netdev_ops->ndo_init() succeeds, by invoking both
    netdev_ops->ndo_uninit() and netdev->priv_destructor().

    And at the end of unregister_netdevice(), we invoke
    netdev->priv_destructor() and optionally call free_netdev().

    Signed-off-by: David S. Miller

    David S. Miller
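
The ownership rules described in this commit can be sketched as a small userspace model. This is only an illustration: `struct model_netdev`, `model_register()`, `model_unregister()` and the `fail_late` flag are hypothetical names, not kernel code, and the model runs the private destructor directly in the unregister path instead of waiting for all references to drop as the kernel does.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct net_device; not kernel code. */
struct model_netdev {
	void *priv_res;                 /* private memory set up by ndo_init */
	bool needs_free_netdev;         /* core frees the device on unregister */
	int  (*ndo_init)(struct model_netdev *);
	void (*ndo_uninit)(struct model_netdev *);
	void (*priv_destructor)(struct model_netdev *); /* must not free dev */
};

static int model_live_resources;    /* outstanding private allocations */

static int model_init(struct model_netdev *dev)
{
	dev->priv_res = malloc(64);
	if (!dev->priv_res)
		return -1;
	model_live_resources++;
	return 0;
}

/* Releases private resources only; never frees the device itself. */
static void model_priv_destructor(struct model_netdev *dev)
{
	free(dev->priv_res);
	dev->priv_res = NULL;
	model_live_resources--;
}

/* fail_late simulates an error after ndo_init() has already succeeded. */
static int model_register(struct model_netdev *dev, bool fail_late)
{
	int err = dev->ndo_init ? dev->ndo_init(dev) : 0;

	if (err)
		return err;
	if (fail_late) {
		/* Both teardown hooks are safe here: neither frees dev. */
		if (dev->ndo_uninit)
			dev->ndo_uninit(dev);
		if (dev->priv_destructor)
			dev->priv_destructor(dev);
		return -1;              /* the caller still owns and frees dev */
	}
	return 0;
}

/* Returns true if dev was freed here, so the caller must not touch it. */
static bool model_unregister(struct model_netdev *dev)
{
	if (dev->ndo_uninit)
		dev->ndo_uninit(dev);
	if (dev->priv_destructor)
		dev->priv_destructor(dev);
	if (dev->needs_free_netdev) {
		free(dev);
		return true;
	}
	return false;
}
```

Because the private destructor never frees the device, the failure path in `model_register()` can call it without risking a double free when the caller then frees the device itself.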
     

16 Feb, 2017

1 commit

  • Commit 91572088e3fd ("net: use core MTU range checking in core net
    infra") changed the openvswitch internal device to use the core net
    infra for controlling the MTU range, but failed to actually set the
    max_mtu as described in the commit message, which now defaults to
    ETH_DATA_LEN.

    This patch fixes this by setting max_mtu to ETH_MAX_MTU after
    ether_setup() call.

    Fixes: 91572088e3fd ("net: use core MTU range checking in core net infra")
    Signed-off-by: Jarno Rajahalme
    Signed-off-by: David S. Miller

    Jarno Rajahalme
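
A minimal userspace sketch of why the missing assignment mattered, assuming the core range check rejects values below min_mtu or above a non-zero max_mtu. The `model_` names are hypothetical; the constants mirror ETH_MIN_MTU (68), ETH_DATA_LEN (1500) and ETH_MAX_MTU (65535).

```c
#include <errno.h>

#define MODEL_ETH_MIN_MTU  68      /* mirrors ETH_MIN_MTU */
#define MODEL_ETH_DATA_LEN 1500    /* mirrors ETH_DATA_LEN */
#define MODEL_ETH_MAX_MTU  0xFFFFU /* mirrors ETH_MAX_MTU */

/* Hypothetical stand-in for the MTU-related fields of a net device. */
struct model_dev {
	unsigned int mtu;
	unsigned int min_mtu;
	unsigned int max_mtu;
};

/* Models the Ethernet defaults: the MTU range is capped at ETH_DATA_LEN. */
static void model_ether_setup(struct model_dev *dev)
{
	dev->mtu = MODEL_ETH_DATA_LEN;
	dev->min_mtu = MODEL_ETH_MIN_MTU;
	dev->max_mtu = MODEL_ETH_DATA_LEN;
}

/* Models the core range check performed before an MTU change. */
static int model_set_mtu(struct model_dev *dev, int new_mtu)
{
	if (new_mtu < 0 || (unsigned int)new_mtu < dev->min_mtu)
		return -EINVAL;
	if (dev->max_mtu > 0 && (unsigned int)new_mtu > dev->max_mtu)
		return -EINVAL;
	dev->mtu = (unsigned int)new_mtu;
	return 0;
}

/* Models the fix: raise max_mtu after the Ethernet defaults are applied. */
static void model_internal_dev_setup(struct model_dev *dev)
{
	model_ether_setup(dev);
	dev->max_mtu = MODEL_ETH_MAX_MTU;
}
```

In this model a jumbo MTU such as 9000 is rejected with only the Ethernet defaults in place and accepted once max_mtu is raised.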
     

09 Jan, 2017

1 commit

  • The network device operation for reading statistics is only called
    in one place, and it ignores the return value. Having a structure
    return value is potentially confusing because some future driver could
    incorrectly assume that the return value was used.

    Fix all drivers with ndo_get_stats64 to have a void function.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    stephen hemminger
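
The signature change can be illustrated with a userspace sketch. The struct and function names below are hypothetical; only the shape of the change (a returned structure pointer becoming void) follows the commit message.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down counters structure. */
struct model_stats64 {
	uint64_t rx_packets;
	uint64_t tx_packets;
};

/* Hypothetical device holding its own counters. */
struct model_dev {
	struct model_stats64 hw;
};

/* Old shape: returns a pointer that the single caller never looks at. */
static struct model_stats64 *old_get_stats64(struct model_dev *dev,
					     struct model_stats64 *storage)
{
	*storage = dev->hw;
	return storage;
}

/* New shape: results are delivered only through the storage argument. */
static void new_get_stats64(struct model_dev *dev,
			    struct model_stats64 *storage)
{
	*storage = dev->hw;
}
```

Both variants fill the caller's storage identically; the void form simply removes a return value that could mislead a driver author into thinking it is consulted.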
     

21 Oct, 2016

1 commit

  • geneve:
    - Merge __geneve_change_mtu back into geneve_change_mtu, set max_mtu
    - This one isn't quite as straight-forward as others, could use some
      closer inspection and testing

    macvlan:
    - set min/max_mtu

    tun:
    - set min/max_mtu, remove tun_net_change_mtu

    vxlan:
    - Merge __vxlan_change_mtu back into vxlan_change_mtu
    - Set max_mtu to IP_MAX_MTU and retain dynamic MTU range checks in
      change_mtu function
    - This one is also not as straight-forward and could use closer inspection
      and testing from vxlan folks

    bridge:
    - set max_mtu of IP_MAX_MTU and retain dynamic MTU range checks in
      change_mtu function

    openvswitch:
    - set min/max_mtu, remove internal_dev_change_mtu
    - note: max_mtu wasn't checked previously, it's been set to 65535, which
      is the largest possible size supported

    sch_teql:
    - set min/max_mtu (note: max_mtu previously unchecked, used max of 65535)

    macsec:
    - min_mtu = 0, max_mtu = 65535

    macvlan:
    - min_mtu = 0, max_mtu = 65535

    ntb_netdev:
    - min_mtu = 0, max_mtu = 65535

    veth:
    - min_mtu = 68, max_mtu = 65535

    8021q:
    - min_mtu = 0, max_mtu = 65535

    CC: netdev@vger.kernel.org
    CC: Nicolas Dichtel
    CC: Hannes Frederic Sowa
    CC: Tom Herbert
    CC: Daniel Borkmann
    CC: Alexander Duyck
    CC: Paolo Abeni
    CC: Jiri Benc
    CC: WANG Cong
    CC: Roopa Prabhu
    CC: Pravin B Shelar
    CC: Sabrina Dubroca
    CC: Patrick McHardy
    CC: Stephen Hemminger
    CC: Pravin Shelar
    CC: Maxim Krasnyansky
    Signed-off-by: Jarod Wilson
    Signed-off-by: David S. Miller

    Jarod Wilson
     

13 Oct, 2016

1 commit


06 Aug, 2016

1 commit

  • net_device->ndo_set_rx_headroom (introduced in
    871b642adebe300be2e50aa5f65a418510f636ec) says

    "Setting a negtaive value reset the rx headroom
    to the default value".

    It seems that the OVS implementation in
    3a927bc7cf9d0fbe8f4a8189dd5f8440228f64e7 overlooked this and sets
    dev->needed_headroom unconditionally.

    This doesn't have an immediate effect, but can mess up later
    LL_RESERVED_SPACE calculations, such as done in
    net/ipv6/mcast.c:mld_newpack. For reference, this issue was found
    from a skb_panic raised there after the length calculations had given
    the wrong result.

    Note that the other current users of this interface
    (drivers/net/tun.c:tun_set_headroom and
    drivers/net/veth.c:veth_set_rx_headroom) both check this
    correctly and thus need no modification.

    Thanks to Ben for some pointers from the crash dumps!

    Cc: Benjamin Poirier
    Cc: Paolo Abeni
    Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1361414
    Signed-off-by: Ian Wienand
    Signed-off-by: David S. Miller

    Ian Wienand
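
A minimal sketch of the failure mode, assuming the headroom field is an unsigned short and that the default headroom in this model is 0. The function names are hypothetical and do not reproduce the OVS code.

```c
/* Hypothetical stand-in for the relevant net device field. */
struct model_dev {
	unsigned short needed_headroom;
};

/* Unconditional store: a negative request wraps to a huge unsigned value. */
static void buggy_set_rx_headroom(struct model_dev *dev, int new_hr)
{
	dev->needed_headroom = (unsigned short)new_hr;
}

/* Negative means "reset to default", modeled here as 0 (an assumption). */
static void fixed_set_rx_headroom(struct model_dev *dev, int new_hr)
{
	if (new_hr < 0)
		new_hr = 0;
	dev->needed_headroom = (unsigned short)new_hr;
}
```

With a 16-bit unsigned short, passing -1 to the unconditional version stores 65535, which is the kind of value that later inflates reserved-space calculations.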
     

03 Jun, 2016

1 commit


17 Apr, 2016

1 commit


19 Mar, 2016

1 commit


02 Mar, 2016

1 commit

  • This patch implements bookkeeping support to compute the maximum
    headroom for all the devices in each datapath. When said value
    changes, the underlying devs are notified via the
    ndo_set_rx_headroom method.

    This also increases the internal vports xmit performance.

    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
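
The bookkeeping can be sketched in userspace as follows. All names are hypothetical, and the plain store into `rx_headroom` stands in for the notification through ndo_set_rx_headroom; this is not the datapath code itself.

```c
#include <stddef.h>

/* Hypothetical port: its own headroom need and the value it was told. */
struct model_port {
	int headroom;
	int rx_headroom;
};

/* Hypothetical datapath owning an array of ports. */
struct model_dp {
	struct model_port *ports;
	size_t n_ports;
	int max_headroom;
};

/*
 * Recompute the maximum headroom over all ports. Ports are notified
 * only when the maximum actually changes. Returns 1 if it changed.
 */
static int model_update_headroom(struct model_dp *dp)
{
	int max = 0;
	size_t i;

	for (i = 0; i < dp->n_ports; i++)
		if (dp->ports[i].headroom > max)
			max = dp->ports[i].headroom;

	if (max == dp->max_headroom)
		return 0;

	dp->max_headroom = max;
	for (i = 0; i < dp->n_ports; i++)
		dp->ports[i].rx_headroom = max; /* stands in for the ndo call */
	return 1;
}
```

Calling `model_update_headroom()` after a port is added or removed keeps every port's view consistent while skipping notifications when the maximum is unchanged.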
     

24 Oct, 2015

1 commit

  • Conflicts:
    net/ipv6/xfrm6_output.c
    net/openvswitch/flow_netlink.c
    net/openvswitch/vport-gre.c
    net/openvswitch/vport-vxlan.c
    net/openvswitch/vport.c
    net/openvswitch/vport.h

    The openvswitch conflicts were overlapping changes. One was
    the egress tunnel info fix in 'net' and the other was the
    vport ->send() op simplification in 'net-next'.

    The xfrm6_output.c conflict was also a simplification
    overlapping a bug fix.

    Signed-off-by: David S. Miller

    David S. Miller
     

22 Oct, 2015

2 commits

  • With the use of lwtunnel, we can directly call dev_queue_xmit()
    rather than calling the netdev vport send operation.
    The following change makes the tunnel vport code a bit cleaner.

    Signed-off-by: Pravin B Shelar
    Acked-by: Thomas Graf
    Acked-by: Jiri Benc
    Signed-off-by: David S. Miller

    Pravin B Shelar
     
  • "openvswitch: Remove vport stats" removed the per-vport statistics, in
    order to use the netdev's statistics fields.
    "openvswitch: Fix ovs_vport_get_stats()" fixed the export of these stats
    to user-space, by using the provided netdev_ops to collate them - but ovs
    internal devices still use an unallocated dev->tstats field to count
    packets, which are no longer exported by this api.

    Allocate the dev->tstats field for ovs internal devices, and wire up
    ndo_get_stats64 with the original implementation of
    ovs_vport_get_stats().

    On its own, "openvswitch: Fix ovs_vport_get_stats()" fixes the OOPs,
    unmasking a full-on panic on arm64:

    =============%
    [] internal_dev_recv+0xa8/0x170 [openvswitch]
    [] do_output.isra.31+0x60/0x19c [openvswitch]
    [] do_execute_actions+0x208/0x11c0 [openvswitch]
    [] ovs_execute_actions+0xc8/0x238 [openvswitch]
    [] ovs_packet_cmd_execute+0x21c/0x288 [openvswitch]
    [] genl_family_rcv_msg+0x1b0/0x310
    [] genl_rcv_msg+0xa4/0xe4
    [] netlink_rcv_skb+0xb0/0xdc
    [] genl_rcv+0x38/0x50
    [] netlink_unicast+0x164/0x210
    [] netlink_sendmsg+0x304/0x368
    [] sock_sendmsg+0x30/0x4c
    [SNIP]
    Kernel panic - not syncing: Fatal exception in interrupt
    =============%
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    James Morse
     

30 Aug, 2015

1 commit


28 Aug, 2015

1 commit


22 Jul, 2015

2 commits


06 Nov, 2014

1 commit


29 Oct, 2014

1 commit

  • The internal and netdev vport remain part of openvswitch.ko. Encap
    vports including vxlan, gre, and geneve can be built as separate
    modules and are loaded on demand. Modules can be unloaded after use.
    Datapath ports keep a reference to the vport module during their
    lifetime.

    This allows removing the error-prone maintenance of the global list
    vport_ops_list.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     

24 Jul, 2014

1 commit


16 Jul, 2014

1 commit

  • Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert
    all users to pass NET_NAME_UNKNOWN.

    Coccinelle patch:

    @@
    expression sizeof_priv, name, setup, txqs, rxqs, count;
    @@

    (
    -alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs)
    +alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs)
    |
    -alloc_netdev_mq(sizeof_priv, name, setup, count)
    +alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count)
    |
    -alloc_netdev(sizeof_priv, name, setup)
    +alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup)
    )

    v9: move comments here from the wrong commit

    Signed-off-by: Tom Gundersen
    Reviewed-by: David Herrmann
    Signed-off-by: David S. Miller

    Tom Gundersen
     

02 Jul, 2014

1 commit


14 May, 2014

1 commit

  • net: get rid of SET_ETHTOOL_OPS

    Dave Miller mentioned he'd like to see SET_ETHTOOL_OPS gone.
    This does that.

    Mostly done via coccinelle script:
    @@
    struct ethtool_ops *ops;
    struct net_device *dev;
    @@
    - SET_ETHTOOL_OPS(dev, ops);
    + dev->ethtool_ops = ops;

    Compile tested only, but I'd seriously wonder if this broke anything.

    Suggested-by: Dave Miller
    Signed-off-by: Wilfried Klaebe
    Acked-by: Felipe Balbi
    Signed-off-by: David S. Miller

    Wilfried Klaebe
     

02 Nov, 2013

1 commit


20 Jun, 2013

1 commit


15 Jun, 2013

1 commit


30 Apr, 2013

1 commit


20 Apr, 2013

1 commit


16 Apr, 2013

1 commit

  • Currently OVS uses a combination of the genl and rtnl locks to protect
    datapath state. This was done because of networking stack locking.
    But it has complicated the locking, and there are a few lock ordering
    issues with new tunneling protocols.
    The following patch simplifies locking by introducing a new ovs mutex,
    which is now used to protect the entire ovs state.

    Signed-off-by: Pravin B Shelar
    Signed-off-by: Jesse Gross

    Pravin B Shelar
     

18 Mar, 2013

1 commit

  • Conflicts:
    net/openvswitch/vport-internal_dev.c

    Jesse Gross says:

    ====================
    A couple of minor enhancements for net-next/3.10. The largest is an
    extension to allow variable length metadata to be passed to userspace
    with packets.

    There is a merge conflict in net/openvswitch/vport-internal_dev.c:
    An existing commit modifies internal_dev_mac_addr() and a new commit
    deletes it. The new one is correct, so you can just remove that function.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

07 Jan, 2013

1 commit

  • Use strlcpy where possible to ensure the string is \0 terminated.
    Always use sizeof(string) instead of 32, ETHTOOL_BUSINFO_LEN
    and custom defines.
    Use snprintf instead of sprintf.
    Remove unnecessary inits of ->fw_version.
    Remove unnecessary inits of drvinfo struct.

    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
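
A userspace sketch of the pattern, under stated assumptions: `struct model_drvinfo` and `model_get_drvinfo()` are hypothetical, and because strlcpy is not available in every C library, `snprintf()` with a "%s" format stands in for it here. The point illustrated is sizing every copy with sizeof on the destination rather than a literal 32.

```c
#include <stdio.h>

/* Hypothetical stand-in for a driver info structure with fixed buffers. */
struct model_drvinfo {
	char driver[32];
	char version[32];
	char bus_info[32];
};

/*
 * Fill the fields with bounded, always-terminated copies. The size
 * comes from the destination itself, so resizing a field cannot
 * silently leave a stale length constant behind.
 */
static void model_get_drvinfo(struct model_drvinfo *info,
			      const char *driver, int major, int minor)
{
	snprintf(info->driver, sizeof(info->driver), "%s", driver);
	snprintf(info->version, sizeof(info->version), "%d.%d", major, minor);
	snprintf(info->bus_info, sizeof(info->bus_info), "%s", "");
}
```

An over-long driver name is truncated to fit the buffer and remains NUL-terminated, which an unbounded sprintf or strcpy would not guarantee.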
     

04 Jan, 2013

1 commit


05 Dec, 2012

1 commit


23 Aug, 2012

1 commit

  • The following patch adds support for network namespaces to openvswitch.
    Since it must release devices when namespaces are destroyed, a
    side effect of this patch is that the module no longer keeps a
    refcount but instead cleans up any state when it is unloaded.

    Signed-off-by: Pravin B Shelar
    Signed-off-by: Jesse Gross

    Pravin B Shelar
     

26 May, 2012

1 commit

  • It's possible that packets that are sent on internal devices (from
    the OVS perspective) have already traversed the local IP stack.
    After they go through the internal device, they will again travel
    through the IP stack which may get confused by the presence of
    existing information in the skb. The problem can be observed
    when switching between namespaces. This clears out that information
    to avoid problems but deliberately leaves other metadata alone.
    This is to provide maximum flexibility in chaining together OVS
    and other Linux components.

    Signed-off-by: Jesse Gross

    Jesse Gross
     

04 May, 2012

1 commit


16 Feb, 2012

1 commit


17 Jan, 2012

1 commit


04 Dec, 2011

1 commit

  • Open vSwitch is a multilayer Ethernet switch targeted at virtualized
    environments. In addition to supporting a variety of features
    expected in a traditional hardware switch, it enables fine-grained
    programmatic extension and flow-based control of the network.
    This control is useful in a wide variety of applications but is
    particularly important in multi-server virtualization deployments,
    which are often characterized by highly dynamic endpoints and the need
    to maintain logical abstractions for multiple tenants.

    The Open vSwitch datapath provides an in-kernel fast path for packet
    forwarding. It is complemented by a userspace daemon, ovs-vswitchd,
    which is able to accept configuration from a variety of sources and
    translate it into packet processing rules.

    See http://openvswitch.org for more information and userspace
    utilities.

    Signed-off-by: Jesse Gross

    Jesse Gross