25 Dec, 2016

1 commit


18 Nov, 2016

1 commit

  • Make struct pernet_operations::id unsigned.

    There are 2 reasons to do so:

    1)
    This field is really an index into a zero-based array and
    thus is an unsigned entity. Using a negative value is an out-of-bounds
    access by definition.

    2)
    On x86_64, unsigned 32-bit data which are mixed with pointers
    via array indexing or offsets added to or subtracted from pointers
    are preferred to signed 32-bit data.

    "int" being used as an array index needs to be sign-extended
    to 64-bit before being used.

    void f(long *p, int i)
    {
    	g(p[i]);
    }

    roughly translates to

    movsx rsi, esi
    mov rdi, [rsi+...]
    call g

    MOVSX is a 3-byte instruction which isn't necessary if the variable is
    unsigned, because x86_64 zero-extends by default.

    Now, there is net_generic() function which, you guessed it right, uses
    "int" as an array index:

    static inline void *net_generic(const struct net *net, int id)
    {
    	...
    	ptr = ng->ptr[id - 1];
    	...
    }

    And this function is used a lot, so those sign extensions add up.
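
The index-width argument can be tried out with a minimal userspace sketch. The names `lookup_signed` and `lookup_unsigned` are hypothetical and are not kernel functions; compiling both with optimization and comparing the disassembly should show whether a sign extension is emitted, although the exact code generation depends on the compiler and flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for illustration only, not kernel code. */

/* Signed index: the 32-bit value has to be sign-extended to 64 bits
 * before it can take part in address arithmetic on x86_64. */
void *lookup_signed(void *const *ptrs, int id)
{
	assert(ptrs != NULL && id >= 1);
	return ptrs[id - 1];
}

/* Unsigned index: writes to a 32-bit register already clear the upper
 * half, so no separate extension step is required. */
void *lookup_unsigned(void *const *ptrs, unsigned int id)
{
	assert(ptrs != NULL && id >= 1);
	return ptrs[id - 1];
}
```

Both functions return the same element for any valid `id`; only the generated instructions differ.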

    Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
    messing with code generation):

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

    Unfortunately some functions actually grow bigger.
    This is a seemingly random artefact of code generation with the register
    allocator being used differently. gcc decides that some variable
    needs to live in the new r8+ registers and every access now requires a REX
    prefix. Or it is shifted into r12, so the [r12+0] addressing mode has to be
    used, which is longer than [r8].

    However, overall balance is in negative direction:

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
    function                                 old     new   delta
    nfsd4_lock                              3886    3959     +73
    tipc_link_build_proto_msg               1096    1140     +44
    mac80211_hwsim_new_radio                2776    2808     +32
    tipc_mon_rcv                            1032    1058     +26
    svcauth_gss_legacy_init                 1413    1429     +16
    tipc_bcbase_select_primary               379     392     +13
    nfsd4_exchange_id                       1247    1260     +13
    nfsd4_setclientid_confirm                782     793     +11
    ...
    put_client_renew_locked                  494     480     -14
    ip_set_sockfn_get                        730     716     -14
    geneve_sock_add                          829     813     -16
    nfsd4_sequence_done                      721     703     -18
    nlmclnt_lookup_host                      708     686     -22
    nfsd4_lockt                             1085    1063     -22
    nfs_get_client                          1077    1050     -27
    tcf_bpf_init                            1106    1076     -30
    nfsd4_encode_fattr                      5997    5930     -67
    Total: Before=154856051, After=154854321, chg -0.00%

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

31 Oct, 2016

1 commit


21 Oct, 2016

2 commits

  • geneve:
    - Merge __geneve_change_mtu back into geneve_change_mtu, set max_mtu
    - This one isn't quite as straight-forward as others, could use some
    closer inspection and testing

    macvlan:
    - set min/max_mtu

    tun:
    - set min/max_mtu, remove tun_net_change_mtu

    vxlan:
    - Merge __vxlan_change_mtu back into vxlan_change_mtu
    - Set max_mtu to IP_MAX_MTU and retain dynamic MTU range checks in
    change_mtu function
    - This one is also not as straight-forward and could use closer inspection
    and testing from vxlan folks

    bridge:
    - set max_mtu of IP_MAX_MTU and retain dynamic MTU range checks in
    change_mtu function

    openvswitch:
    - set min/max_mtu, remove internal_dev_change_mtu
    - note: max_mtu wasn't checked previously, it's been set to 65535, which
    is the largest possible size supported

    sch_teql:
    - set min/max_mtu (note: max_mtu previously unchecked, used max of 65535)

    macsec:
    - min_mtu = 0, max_mtu = 65535

    macvlan:
    - min_mtu = 0, max_mtu = 65535

    ntb_netdev:
    - min_mtu = 0, max_mtu = 65535

    veth:
    - min_mtu = 68, max_mtu = 65535

    8021q:
    - min_mtu = 0, max_mtu = 65535

    CC: netdev@vger.kernel.org
    CC: Nicolas Dichtel
    CC: Hannes Frederic Sowa
    CC: Tom Herbert
    CC: Daniel Borkmann
    CC: Alexander Duyck
    CC: Paolo Abeni
    CC: Jiri Benc
    CC: WANG Cong
    CC: Roopa Prabhu
    CC: Pravin B Shelar
    CC: Sabrina Dubroca
    CC: Patrick McHardy
    CC: Stephen Hemminger
    CC: Pravin Shelar
    CC: Maxim Krasnyansky
    Signed-off-by: Jarod Wilson
    Signed-off-by: David S. Miller

    Jarod Wilson
     
  • Currently, GRO can do unlimited recursion through the gro_receive
    handlers. This was fixed for tunneling protocols by limiting tunnel GRO
    to one level with encap_mark, but both VLAN and TEB still have this
    problem. Thus, the kernel is vulnerable to a stack overflow, if we
    receive a packet composed entirely of VLAN headers.

    This patch adds a recursion counter to the GRO layer to prevent stack
    overflow. When a gro_receive function hits the recursion limit, GRO is
    aborted for this skb and it is processed normally. This recursion
    counter is put in the GRO CB, but could be turned into a percpu counter
    if we run out of space in the CB.
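
The recursion counter described above can be modelled in userspace roughly as follows. This is only a sketch: `struct demo_cb`, `demo_receive` and the limit of 15 are assumptions made for illustration and are not taken from the patch itself.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_RECURSION_LIMIT 15	/* assumed value for this sketch */

/* Per-packet state, standing in for a field in the GRO control block. */
struct demo_cb {
	unsigned int recursion_counter;
	int flush;	/* set when aggregation is abandoned for this packet */
};

/* Count one nested header; returns nonzero once the limit is reached. */
static int demo_recursion_inc_test(struct demo_cb *cb)
{
	return ++cb->recursion_counter == DEMO_RECURSION_LIMIT;
}

/* Recurse through `nested` encapsulation headers (for example a stack of
 * VLAN tags). Returns how many levels were actually processed. */
unsigned int demo_receive(struct demo_cb *cb, unsigned int nested)
{
	assert(cb != NULL);
	if (nested == 0)
		return 0;
	if (demo_recursion_inc_test(cb)) {
		cb->flush = 1;	/* give up; the packet takes the normal path */
		return 0;
	}
	return 1 + demo_receive(cb, nested - 1);
}
```

However many headers the packet claims to carry, the call depth stays below the limit, so a crafted packet cannot exhaust the stack.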

    Thanks to Vladimír Beneš for the initial bug report.

    Fixes: CVE-2016-7039
    Fixes: 9b174d88c257 ("net: Add Transparent Ethernet Bridging GRO support.")
    Fixes: 66e5133f19e9 ("vlan: Add GRO support for non hardware accelerated vlan")
    Signed-off-by: Sabrina Dubroca
    Reviewed-by: Jiri Benc
    Acked-by: Hannes Frederic Sowa
    Acked-by: Tom Herbert
    Signed-off-by: David S. Miller

    Sabrina Dubroca
     

19 Oct, 2016

1 commit


18 Oct, 2016

1 commit

  • args.u.name_type is of type unsigned int and is always >= 0.

    This fixes the following GCC warning:

    net/8021q/vlan.c: In function ‘vlan_ioctl_handler’:
    net/8021q/vlan.c:574:14: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
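
As an illustration of the pattern behind this warning, here is a small sketch with hypothetical names (`struct demo_args`, `DEMO_NAME_TYPE_HIGHEST`); it is not the code from `vlan_ioctl_handler`. For an unsigned field, a lower-bound test against zero carries no information, so only the upper bound needs checking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NAME_TYPE_HIGHEST 4U	/* hypothetical number of valid values */

struct demo_args {
	unsigned int name_type;
};

/* Writing `name_type >= 0 && name_type < HIGHEST` on an unsigned field
 * triggers -Wtype-limits because the first half is always true. */
bool demo_name_type_valid(const struct demo_args *args)
{
	assert(args != NULL);
	return args->name_type < DEMO_NAME_TYPE_HIGHEST;
}
```

A value that was negative before conversion wraps to a large unsigned number and is still rejected by the upper-bound test.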

    Signed-off-by: Tobias Klauser
    Signed-off-by: David S. Miller

    Tobias Klauser
     

14 Aug, 2016

1 commit


24 Jul, 2016

1 commit


17 Jul, 2016

1 commit

    macsec can't cope with mtu frames which need vlan tag insertion, and
    vlan devices set the default mtu equal to the underlying dev's one.
    By default vlan over macsec devices use an invalid mtu, dropping
    all the large packets.
    This patch adds a netif helper to check if an upper vlan device
    needs mtu reduction. The helper is used during vlan device
    initialization to set a valid default and during mtu updating to
    forbid invalid, too big, mtu values.
    The helper currently only checks if the lower dev is a macsec device;
    if we get more users, we need to update only the helper (possibly
    reserving an additional IFF bit).
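
The helper and its two call sites might look roughly like the userspace sketch below. All `demo_` names and the `struct demo_dev` layout are hypothetical; the only fact assumed from outside the message is that one 802.1Q tag occupies 4 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_VLAN_HLEN 4U	/* one 802.1Q tag is 4 bytes */

struct demo_dev {
	unsigned int mtu;
	bool is_macsec;
};

/* Does a vlan stacked on `lower` need room reserved for its tag? */
bool demo_reduces_vlan_mtu(const struct demo_dev *lower)
{
	return lower->is_macsec;
}

/* Largest mtu a vlan device on top of `lower` may use. */
unsigned int demo_vlan_max_mtu(const struct demo_dev *lower)
{
	assert(lower != NULL && lower->mtu >= DEMO_VLAN_HLEN);
	return demo_reduces_vlan_mtu(lower) ? lower->mtu - DEMO_VLAN_HLEN
					    : lower->mtu;
}

/* Returns 0 on success, -1 if the requested mtu is too big. */
int demo_vlan_change_mtu(const struct demo_dev *lower,
			 unsigned int *vlan_mtu, unsigned int new_mtu)
{
	if (new_mtu > demo_vlan_max_mtu(lower))
		return -1;
	*vlan_mtu = new_mtu;
	return 0;
}
```

The same limit serves both as the default at creation time and as the upper bound when the mtu is changed later.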

    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     

06 Jul, 2016

1 commit


01 Jun, 2016

1 commit

    The MAC address of the physical interface is only copied to the VLAN
    when it is first created. After the MAC address changes, this results
    in an inconsistency: only newly created VLANs have an up-to-date MAC.

    The VLANs should continue inheriting the MAC address of the physical
    interface until the VLAN MAC address is explicitly set to any value.
    This allows IPv6 EUI64 addresses for the VLAN to reflect any changes
    to the MAC of the physical interface and thus for DAD to behave as
    expected.
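
A minimal sketch of the intended behaviour, using hypothetical names (`struct demo_vlan`, `addr_set_explicitly`) rather than the kernel's actual data structures: the VLAN tracks the lower device's address until the user assigns one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_ETH_ALEN 6

struct demo_vlan {
	unsigned char addr[DEMO_ETH_ALEN];
	bool addr_set_explicitly;	/* true once the user assigns a MAC */
};

/* User explicitly sets the VLAN MAC: stop inheriting from now on. */
void demo_vlan_set_addr(struct demo_vlan *vlan, const unsigned char *addr)
{
	assert(vlan != NULL && addr != NULL);
	memcpy(vlan->addr, addr, DEMO_ETH_ALEN);
	vlan->addr_set_explicitly = true;
}

/* Called when the physical interface's MAC changes. */
void demo_lower_addr_changed(struct demo_vlan *vlan,
			     const unsigned char *lower_addr)
{
	assert(vlan != NULL && lower_addr != NULL);
	if (!vlan->addr_set_explicitly)
		memcpy(vlan->addr, lower_addr, DEMO_ETH_ALEN);
}
```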

    Signed-off-by: Mike Manning
    Signed-off-by: David S. Miller

    Mike Manning
     

18 Mar, 2016

1 commit


26 Feb, 2016

1 commit


22 Feb, 2016

1 commit

  • Currently vlan device inherits unicast filtering flag from underlying
    device. If underlying device doesn't support unicast filter, this will
    put vlan device into promiscuous mode when it's stacked.

    Turn on IFF_UNICAST_FLT on the vlan device in any case so that it does
    not go into promiscuous mode needlessly. If the underlying device does not
    support unicast filtering, that device will enter promiscuous mode.

    Signed-off-by: Zhang Shengju
    Signed-off-by: David S. Miller

    Zhang Shengju
     

18 Feb, 2016

1 commit


16 Dec, 2015

3 commits

  • The name NETIF_F_ALL_CSUM is a misnomer. This does not correspond to the
    set of features for offloading all checksums. This is a mask of the
    checksum offload related features bits. It is incorrect to set both
    NETIF_F_HW_CSUM and NETIF_F_IP_CSUM or NETIF_F_IPV6_CSUM at the same time
    for features of a device.

    This patch:
    - Changes instances of NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASK (where
    NETIF_F_ALL_CSUM is being used as a mask).
    - Changes bonding, sfc/efx, ipvlan, macvlan, vlan, and team drivers to
    use NETIF_F_HW_CSUM in features list instead of NETIF_F_ALL_CSUM.
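
The difference between a mask and a feature set can be sketched as below. The `DEMO_F_*` bit values and `struct demo_dev` are hypothetical stand-ins, not the kernel's definitions; the sketch only encodes the rule stated above that the generic checksum flag must not be combined with the protocol-specific ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit assignments for illustration only. */
#define DEMO_F_IP_CSUM		(1U << 0)
#define DEMO_F_HW_CSUM		(1U << 1)
#define DEMO_F_IPV6_CSUM	(1U << 2)
#define DEMO_F_SG		(1U << 3)	/* an unrelated feature bit */

/* A mask of all checksum-related bits: useful for clearing or testing,
 * but not a valid feature set to advertise. */
#define DEMO_F_CSUM_MASK \
	(DEMO_F_IP_CSUM | DEMO_F_HW_CSUM | DEMO_F_IPV6_CSUM)

struct demo_dev {
	unsigned int features;
};

/* The generic flag and the protocol-specific flags are mutually exclusive. */
bool demo_csum_features_valid(const struct demo_dev *dev)
{
	bool generic, specific;

	assert(dev != NULL);
	generic = (dev->features & DEMO_F_HW_CSUM) != 0;
	specific = (dev->features & (DEMO_F_IP_CSUM | DEMO_F_IPV6_CSUM)) != 0;
	return !(generic && specific);
}

/* Typical mask usage: strip every checksum bit in one operation. */
unsigned int demo_non_csum_features(const struct demo_dev *dev)
{
	assert(dev != NULL);
	return dev->features & ~DEMO_F_CSUM_MASK;
}
```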

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     
    The SCTP checksum is really a CRC and is very different from the
    standard 1's complement checksum that serves as the checksum
    for IP protocols. The offload interface is also very different.
    Rename NETIF_F_SCTP_CSUM to NETIF_F_SCTP_CRC to highlight these
    differences. The term CSUM should be reserved in the stack to refer
    to the standard 1's complement IP checksum.
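
For reference, the standard 1's complement checksum mentioned above can be computed as in the following sketch. The function name `demo_inet_checksum` is hypothetical and this is a plain userspace version, not the kernel's optimized routines; it assumes inputs short enough that the 32-bit accumulator cannot overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit 1's complement sum over big-endian 16-bit words, as used by
 * IPv4, TCP and UDP. A CRC, by contrast, is a polynomial remainder and
 * cannot be produced by this kind of addition. */
uint16_t demo_inet_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(data != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)			/* pad a trailing odd byte with zero */
		sum += (uint32_t)data[len - 1] << 8;
	while (sum >> 16)		/* fold the carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

Running the function over a header whose checksum field is already filled in yields zero, which is how receivers verify it.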

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     
  • We need to be able to propagate static FDB entries and certain bridge
    port attributes (e.g. learning, flooding) down to the port netdev
    driver when bridge port is a VLAN interface.

    Achieve that by setting ndo_bridge* and ndo_fdb* in vlan_netdev_ops to
    the corresponding switchdev_port* functions. This is consistent with
    team and bond devices.

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
     

18 Nov, 2015

1 commit

  • When a vlan is configured with REORDER_HEADER set to 0, the vlan
    header is put back into the packet and makes it appear that
    the vlan header is still there even after it's been processed.
    This poses a problem for bridge and macvlan ports. The packets
    passed to those devices may be forwarded and at the time of the
    forward, vlan headers end up being unexpectedly present.

    With the patch, we make sure that we do not put the vlan header
    back (when REORDER_HEADER is 0) if a bridge or macvlan has
    been configured on top of the vlan device.

    Signed-off-by: Vladislav Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

04 Nov, 2015

1 commit


19 Aug, 2015

1 commit


02 Jun, 2015

1 commit

  • Currently packets with non-hardware-accelerated vlan cannot be handled
    by GRO. This causes low performance for 802.1ad and stacked vlan, as their
    vlan tags are currently not stripped by hardware.

    This patch adds GRO support for non-hardware-accelerated vlan and
    improves receive performance of them.

    Test Environment:
    vlan device (.1Q) on vlan device (.1ad) on ixgbe (82599)

    Result:

    - Before

    $ netperf -t TCP_STREAM -H 192.168.20.2 -l 60
    Recv   Send    Send
    Socket Socket  Message  Elapsed
    Size   Size    Size     Time     Throughput
    bytes  bytes   bytes    secs.    10^6bits/sec

    87380  16384   16384    60.00    5233.17

    Rx side CPU usage:
    %usr   %sys    %irq    %soft   %idle
    0.27   58.03   0.00    41.70   0.00

    - After

    $ netperf -t TCP_STREAM -H 192.168.20.2 -l 60
    Recv   Send    Send
    Socket Socket  Message  Elapsed
    Size   Size    Size     Time     Throughput
    bytes  bytes   bytes    secs.    10^6bits/sec

    87380  16384   16384    60.00    7586.85

    Rx side CPU usage:
    %usr   %sys    %irq    %soft   %idle
    0.50   25.83   0.00    59.53   14.14

    [ Register VLAN offloads with priority 10 -DaveM ]

    Signed-off-by: Toshiaki Makita
    Signed-off-by: David S. Miller

    Toshiaki Makita
     

14 May, 2015

1 commit

  • Currently vlan notifier handler will try to update all vlans
    for a device when that device comes up. A problem occurs,
    however, when the vlan device was set to promiscuous, but not
    by the user (e.g. by a bridge). In that case, dev->gflags are
    not updated. What results is that the lower device ends
    up with an extra promiscuity count. Here are the
    backtraces that prove this:
    [62852.052179] [] __dev_set_promiscuity+0x38/0x1e0
    [62852.052186] [] ? _raw_spin_unlock_bh+0x1b/0x40
    [62852.052188] [] ? dev_set_rx_mode+0x2e/0x40
    [62852.052190] [] dev_set_promiscuity+0x24/0x50
    [62852.052194] [] vlan_dev_open+0xd5/0x1f0 [8021q]
    [62852.052196] [] __dev_open+0xbf/0x140
    [62852.052198] [] __dev_change_flags+0x9d/0x170
    [62852.052200] [] dev_change_flags+0x29/0x60

    The above comes from setting the vlan device to IFF_UP state.

    [62852.053569] [] __dev_set_promiscuity+0x38/0x1e0
    [62852.053571] [] ? vlan_dev_set_rx_mode+0x2b/0x30 [8021q]
    [62852.053573] [] __dev_change_flags+0xe5/0x170
    [62852.053645] [] dev_change_flags+0x29/0x60
    [62852.053647] [] vlan_device_event+0x18a/0x690 [8021q]
    [62852.053649] [] notifier_call_chain+0x4c/0x70
    [62852.053651] [] raw_notifier_call_chain+0x16/0x20
    [62852.053653] [] call_netdevice_notifiers+0x2d/0x60
    [62852.053654] [] __dev_notify_flags+0x33/0xa0
    [62852.053656] [] dev_change_flags+0x52/0x60
    [62852.053657] [] do_setlink+0x397/0xa40

    And this one comes from the notification code. What we end
    up with is a vlan with a promiscuity count of 1 and a physical
    device with a promiscuity count of 2. They should both have
    a count of 1.

    To resolve this issue, vlan code can use dev_get_flags() api
    which correctly masks promiscuity and allmulti flags.
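
The masking idea can be sketched in userspace as follows. `struct demo_dev`, `demo_get_flags` and the `DEMO_IFF_*` constants are hypothetical stand-ins; the sketch assumes, as the message describes, that `gflags` records only what the user requested while `flags` also reflects promiscuity set internally (for example by a bridge).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for illustration only. */
#define DEMO_IFF_UP		0x001U
#define DEMO_IFF_PROMISC	0x100U
#define DEMO_IFF_ALLMULTI	0x200U

struct demo_dev {
	unsigned int flags;	/* effective state, including internal users */
	unsigned int gflags;	/* promisc/allmulti as requested by the user */
};

/* Report promisc/allmulti only if the user asked for them, so that
 * re-applying the result does not take an extra promiscuity reference. */
unsigned int demo_get_flags(const struct demo_dev *dev)
{
	const unsigned int user_bits = DEMO_IFF_PROMISC | DEMO_IFF_ALLMULTI;

	assert(dev != NULL);
	return (dev->flags & ~user_bits) | (dev->gflags & user_bits);
}
```

Feeding `dev->flags` directly back into a flag-change call would replay the bridge's promiscuity as if the user had requested it; the masked value avoids that double count.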

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

03 Apr, 2015

1 commit


30 Mar, 2015

1 commit

  • Stacked vlan devices currently have few features (GRO, HIGHDMA, LLTX).
    Since we have software fallbacks in case the NIC can not handle some
    features for multiple vlans, we can add the same features as the lower
    vlan devices for stacked vlan devices.

    This allows stacked vlan devices to create large (GSO) packets and not to
    segment packets. Those packets will be segmented by software on the real
    device, or even can be segmented by the NIC once TSO for multiple vlans
    becomes enabled by the following patches.

    The exception is those related to FCoE, which does not have a software
    fallback.

    Signed-off-by: Toshiaki Makita
    Signed-off-by: David S. Miller

    Toshiaki Makita
     

19 Mar, 2015

1 commit

  • When a networking device is taken down that has a non-trivial number
    of VLAN devices configured under it, we eat a full synchronize_net()
    for every such VLAN device.

    This is because of the call chain:

    NETDEV_DOWN notifier
    --> vlan_device_event()
    --> dev_change_flags()
    --> __dev_change_flags()
    --> __dev_close()
    --> __dev_close_many()
    --> dev_deactivate_many()
    --> synchronize_net()

    This is kind of ridiculous because we already have infrastructure for
    batching an operation X over a list of net devices so that we only
    incur one sync.

    So make use of that by exporting dev_close_many() and adjusting its
    interface so that the caller can fully manage the batch list. Use
    this in vlan_device_event() and all the overhead goes away.
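
The cost difference between the two approaches can be shown with a small model. Everything here is hypothetical (`struct demo_dev`, `demo_close_each`, `demo_close_many`), and `demo_synchronize` merely counts calls in place of an expensive wait such as synchronize_net().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_dev {
	bool up;
};

/* Stand-in for an expensive grace-period wait; here it only counts. */
static void demo_synchronize(unsigned int *syncs)
{
	(*syncs)++;
}

/* Old pattern: every device pays for its own sync. */
unsigned int demo_close_each(struct demo_dev *devs, size_t n)
{
	unsigned int syncs = 0;
	size_t i;

	assert(devs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		devs[i].up = false;
		demo_synchronize(&syncs);
	}
	return syncs;
}

/* Batched pattern: deactivate the whole list, then sync once. */
unsigned int demo_close_many(struct demo_dev *devs, size_t n)
{
	unsigned int syncs = 0;
	size_t i;

	assert(devs != NULL || n == 0);
	for (i = 0; i < n; i++)
		devs[i].up = false;
	if (n > 0)
		demo_synchronize(&syncs);
	return syncs;
}
```

With N vlan devices under one interface the first variant waits N times and the second waits once.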

    Reported-by: Salam Noureddine
    Signed-off-by: David S. Miller

    David S. Miller
     

04 Mar, 2015

1 commit


03 Mar, 2015

1 commit


24 Jan, 2015

1 commit


14 Jan, 2015

1 commit


12 Dec, 2014

1 commit


22 Nov, 2014

2 commits


08 Oct, 2014

1 commit

  • Testing xmit_more support with netperf and connected UDP sockets,
    I found strange dst refcount false sharing.

    Current handling of IFF_XMIT_DST_RELEASE is not optimal.

    Dropping dst in validate_xmit_skb() is certainly too late in case
    the packet was queued by cpu X but dequeued by cpu Y.

    The logical point to take care of drop/force is in __dev_queue_xmit()
    before even taking qdisc lock.

    As Julian Anastasov pointed out, need for skb_dst() might come from some
    packet schedulers or classifiers.

    This patch adds new helper to cleanly express needs of various drivers
    or qdiscs/classifiers.

    Drivers that need skb_dst() in their ndo_start_xmit() should call the
    following helper in their setup instead of the prior code:

    dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
    ->
    netif_keep_dst(dev);

    Instead of using a single bit, we use two bits, one being
    eventually rebuilt in bonding/team drivers.

    The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being
    rebuilt in bonding/team. Eventually, we could add something
    smarter later.
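
One way to picture the two-bit scheme is the sketch below. The `DEMO_*` bit values, `demo_keep_dst` and `demo_rebuild` are hypothetical, and the rebuild rule shown (release only if the permanent bit is still set and every lower device releases) is an assumption about how an aggregating driver could use the bits, not a copy of the bonding or team code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit assignments for illustration only. */
#define DEMO_DST_RELEASE	(1U << 0)	/* may be recomputed */
#define DEMO_DST_RELEASE_PERM	(1U << 1)	/* cleared once, never re-set */

/* A driver that needs the dst in its transmit path clears both bits. */
void demo_keep_dst(unsigned int *priv_flags)
{
	assert(priv_flags != NULL);
	*priv_flags &= ~(DEMO_DST_RELEASE | DEMO_DST_RELEASE_PERM);
}

/* An aggregating device recomputes the rebuildable bit from its lower
 * devices; a cleared permanent bit blocks it from coming back. */
void demo_rebuild(unsigned int *master, const unsigned int *lowers, size_t n)
{
	bool release;
	size_t i;

	assert(master != NULL && (lowers != NULL || n == 0));
	release = (*master & DEMO_DST_RELEASE_PERM) != 0;
	for (i = 0; i < n; i++)
		if (!(lowers[i] & DEMO_DST_RELEASE))
			release = false;

	*master &= ~DEMO_DST_RELEASE;
	if (release)
		*master |= DEMO_DST_RELEASE;
}
```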

    Signed-off-by: Eric Dumazet
    Cc: Julian Anastasov
    Signed-off-by: David S. Miller

    Eric Dumazet
     

12 Aug, 2014

1 commit

  • Currently the functionality to untag traffic on input resides
    as part of the vlan module and is built only when VLAN support
    is enabled in the kernel. When VLAN is disabled, the function
    vlan_untag() turns into a stub and doesn't really untag the
    packets. This seems to create an interesting interaction
    between VMs supporting checksum offloading and some network drivers.

    There are some drivers that do not allow the user to change
    tx-vlan-offload feature of the driver. These drivers also seem
    to assume that any VLAN-tagged traffic they transmit will
    have the vlan information in the vlan_tci and not in the vlan
    header already in the skb. When transmitting skbs that already
    have tagged data with partial checksum set, the checksum doesn't
    appear to be updated correctly by the card thus resulting in a
    failure to establish TCP connections.

    The following is a packet trace taken on the receiver where the
    sender is a VM with a VLAN configured. The host the VM is running on
    does not have VLAN support and the outgoing interface on the
    host is tg3:
    10:12:43.503055 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q
    (0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27243,
    offset 0, flags [DF], proto TCP (6), length 60)
    10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect
    -> 0x48d9), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val
    4294837885 ecr 0,nop,wscale 7], length 0
    10:12:44.505556 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q
    (0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27244,
    offset 0, flags [DF], proto TCP (6), length 60)
    10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect
    -> 0x44ee), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val
    4294838888 ecr 0,nop,wscale 7], length 0

    This connection finally times out.

    I only have access to the TG3 hardware in this configuration and thus have
    only tested this with the TG3 driver. There are a lot of other drivers
    that do not permit user changes to vlan acceleration features, and
    I don't know if they all suffer from a similar issue.

    The patch attempts to fix this another way. It moves the vlan header
    stripping code out of the vlan module and always builds it into the
    kernel network core. This way, even if vlan is not supported on
    a virtualization host, the virtual machines running on top of such
    a host will still work with VLANs enabled.

    CC: Patrick McHardy
    CC: Nithin Nayak Sujir
    CC: Michael Chan
    CC: Jiri Pirko
    Signed-off-by: Vladislav Yasevich
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

30 Jul, 2014

1 commit


17 Jul, 2014

1 commit


16 Jul, 2014

1 commit

  • Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert
    all users to pass NET_NAME_UNKNOWN.

    Coccinelle patch:

    @@
    expression sizeof_priv, name, setup, txqs, rxqs, count;
    @@

    (
    -alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs)
    +alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs)
    |
    -alloc_netdev_mq(sizeof_priv, name, setup, count)
    +alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count)
    |
    -alloc_netdev(sizeof_priv, name, setup)
    +alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup)
    )

    v9: move comments here from the wrong commit

    Signed-off-by: Tom Gundersen
    Reviewed-by: David Herrmann
    Signed-off-by: David S. Miller

    Tom Gundersen
     

08 Jul, 2014

1 commit