16 Jun, 2017

1 commit

  • It seems like a historic accident that these return unsigned char *,
    and in many places that means casts are required, more often than not.

    Make these functions return void * and remove all the casts across
    the tree, adding a (u8 *) cast only where the unsigned char pointer
    was used directly, all done with the following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

    @@
    expression SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - fn(SKB, LEN)[0]
    + *(u8 *)fn(SKB, LEN)

    Note that the last part there converts from push(...)[0] to the
    more idiomatic *(u8 *)push(...).

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
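
    The effect of the void * return type can be illustrated outside the
    kernel. The following is a minimal userspace sketch, not kernel code:
    struct toy_skb, struct toy_hdr and the toy_* helpers are hypothetical
    stand-ins, and the headroom checks that the real skb_push() performs
    are omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for struct sk_buff: only tracks where packet data starts. */
struct toy_skb {
	uint8_t buf[64];
	uint8_t *data;	/* current start of packet data */
};

/* Hypothetical two-byte header, used only for this example. */
struct toy_hdr {
	uint8_t type;
	uint8_t len;
};

static void toy_skb_init(struct toy_skb *skb)
{
	/* Everything is headroom; there is no data yet. */
	skb->data = skb->buf + sizeof(skb->buf);
}

/* Like skb_push() after the change: grow the data area, return void *. */
static void *toy_skb_push(struct toy_skb *skb, size_t len)
{
	skb->data -= len;	/* no headroom check in this sketch */
	return skb->data;
}

static void toy_build(struct toy_skb *skb)
{
	struct toy_hdr *h;

	/* Assignment to a typed pointer needs no cast with void *. */
	h = toy_skb_push(skb, sizeof(*h));
	h->type = 0x08;
	h->len = 1;

	/* Using the returned pointer directly still needs the (u8 *) cast. */
	*(uint8_t *)toy_skb_push(skb, 1) = 0xAA;
}
```

    After toy_build(), the data area holds the single pushed byte followed
    by the two header bytes.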
     

17 Feb, 2017

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2017-02-16

    1) Make struct xfrm_input_afinfo const, nothing writes to it.
    From Florian Westphal.

    2) Remove all places that write to the afinfo policy backend
    and make the struct const then.
    From Florian Westphal.

    3) Prepare for packet consuming gro callbacks and add
    ESP GRO handlers. ESP packets can be decapsulated
    at the GRO layer then. It saves a round through
    the stack for each ESP packet.

    Please note that this has a merge conflict between commit

    63fca65d0863 ("net: add confirm_neigh method to dst_ops")

    from net-next and

    3d7d25a68ea5 ("xfrm: policy: remove garbage_collect callback")
    a2817d8b279b ("xfrm: policy: remove family field")

    from ipsec-next.

    The conflict can be solved as it is done in linux-next.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

15 Feb, 2017

1 commit

  • Add a skb_gro_flush_final helper to prepare for consuming
    skbs in call_gro_receive. We will extend this helper to not
    touch the skb if the skb is consumed by a gro callback with
    a followup patch. We need this to handle the upcoming IPsec
    ESP callbacks as they reinject the skb to napi_gro_receive
    asynchronously. The handler is used in all gro_receive functions
    that can call the ESP gro handlers.

    Signed-off-by: Steffen Klassert

    Steffen Klassert
     

11 Feb, 2017

1 commit


09 Feb, 2017

1 commit

  • The stack must not pass packets to device drivers that are shorter
    than the minimum link layer header length.

    Previously, packet sockets would drop packets smaller than or equal
    to dev->hard_header_len, but this has false positives. Zero length
    payload is used over Ethernet. Other link layer protocols support
    variable length headers. Support for validation of these protocols
    removed the min length check for all protocols.

    Introduce an explicit dev->min_header_len parameter and drop all
    packets below this value. Initially, set it to non-zero only for
    Ethernet and loopback. Other protocols can follow in a patch to
    net-next.

    Fixes: 9ed988cd5915 ("packet: validate variable length ll headers")
    Reported-by: Sowmini Varadhan
    Signed-off-by: Willem de Bruijn
    Acked-by: Eric Dumazet
    Acked-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Willem de Bruijn
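
    The length check described above can be sketched as follows. This is
    a userspace illustration under stated assumptions: struct toy_netdev
    and toy_ll_len_ok() are hypothetical names, and 14 is used as the
    Ethernet header length.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the two length fields of struct net_device. */
struct toy_netdev {
	unsigned short hard_header_len;	/* maximum link layer header length */
	unsigned short min_header_len;	/* minimum length, 0 means "not checked" */
};

/* Returns true when a packet of len bytes may be passed to the driver. */
static bool toy_ll_len_ok(const struct toy_netdev *dev, size_t len)
{
	/* Only packets strictly below the minimum are dropped. */
	return len >= dev->min_header_len;
}
```

    With min_header_len set to 14, a 14-byte frame (header only, zero
    length payload) is accepted, which the older comparison against
    hard_header_len rejected.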
     

30 Jan, 2017

1 commit

  • This patch adds devm_alloc_etherdev_mqs function and devm_alloc_etherdev
    macro. These can be used for simpler netdev allocation without having to
    care about calling free_netdev.

    Thanks to this change, the error paths and removal paths of drivers
    may get a bit simpler.

    Signed-off-by: Rafał Miłecki
    Signed-off-by: David S. Miller

    Rafał Miłecki
     

08 Nov, 2016

1 commit

  • The default TX queue length of Ethernet devices has been a magic
    constant of 1000, ever since the initial git import.

    Looking back in historical trees[1][2] the value used to be 100,
    with the same comment "Ethernet wants good queues". The commit[3]
    that changed this from 100 to 1000 didn't describe why, but from
    conversations with Robert Olsson it seems that it was changed
    when Ethernet devices went from 100Mbit/s to 1Gbit/s; because the
    link speed increased x10, the queue size was also adjusted. This
    value later caused much heartache for the bufferbloat community.

    This patch merely moves the value into a defined constant.

    [1] https://git.kernel.org/cgit/linux/kernel/git/davem/netdev-vger-cvs.git/
    [2] https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/
    [3] https://git.kernel.org/tglx/history/c/98921832c232

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     

31 Oct, 2016

1 commit


21 Oct, 2016

1 commit

  • Currently, GRO can do unlimited recursion through the gro_receive
    handlers. This was fixed for tunneling protocols by limiting tunnel GRO
    to one level with encap_mark, but both VLAN and TEB still have this
    problem. Thus, the kernel is vulnerable to a stack overflow, if we
    receive a packet composed entirely of VLAN headers.

    This patch adds a recursion counter to the GRO layer to prevent stack
    overflow. When a gro_receive function hits the recursion limit, GRO is
    aborted for this skb and it is processed normally. This recursion
    counter is put in the GRO CB, but could be turned into a percpu counter
    if we run out of space in the CB.

    Thanks to Vladimír Beneš for the initial bug report.

    Fixes: CVE-2016-7039
    Fixes: 9b174d88c257 ("net: Add Transparent Ethernet Bridging GRO support.")
    Fixes: 66e5133f19e9 ("vlan: Add GRO support for non hardware accelerated vlan")
    Signed-off-by: Sabrina Dubroca
    Reviewed-by: Jiri Benc
    Acked-by: Hannes Frederic Sowa
    Acked-by: Tom Herbert
    Signed-off-by: David S. Miller

    Sabrina Dubroca
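
    The recursion counter idea can be sketched in userspace. This is an
    illustration only: the toy_* names, the limit of 15 and the simplified
    4-byte VLAN header parsing are assumptions made for the example, not
    the kernel implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_RECURSION_LIMIT	15	/* assumed limit for this sketch */
#define TOY_ETH_P_8021Q		0x8100
#define TOY_VLAN_HLEN		4

/* Toy stand-in for the per-packet GRO control block. */
struct toy_gro_cb {
	unsigned int recursion_counter;
	int flush;	/* set when aggregation is abandoned for this packet */
};

/*
 * Handles one VLAN header (2 bytes TCI, 2 bytes encapsulated EtherType)
 * and recurses while the encapsulated protocol is again VLAN.
 * Returns the number of headers that were looked at.
 */
static int toy_vlan_gro_receive(struct toy_gro_cb *cb, const uint8_t *p,
				size_t len)
{
	uint16_t proto;

	if (len < TOY_VLAN_HLEN)
		return 0;

	proto = (uint16_t)(p[2] << 8 | p[3]);
	if (proto != TOY_ETH_P_8021Q)
		return 1;	/* inner protocol is not VLAN: stop here */

	/* Give up on aggregation instead of recursing without bound. */
	if (++cb->recursion_counter == TOY_RECURSION_LIMIT) {
		cb->flush = 1;
		return 1;
	}

	return 1 + toy_vlan_gro_receive(cb, p + TOY_VLAN_HLEN,
					len - TOY_VLAN_HLEN);
}
```

    A packet made only of VLAN headers stops at the limit with flush set,
    so it would take the normal, non-aggregated path.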
     

13 Oct, 2016

1 commit

  • With centralized MTU checking, there's nothing productive done by
    eth_change_mtu that isn't already done in dev_set_mtu, so mark it as
    deprecated and remove all usage of it in the kernel. All callers have been
    audited for calls to alloc_etherdev* or ether_setup directly, which means
    they all have a valid dev->min_mtu and dev->max_mtu. Now eth_change_mtu
    prints out a netdev_warn about being deprecated, for the benefit of
    out-of-tree drivers that might be utilizing it.

    Of note, dvb_net.c actually had dev->mtu = 4096, while using
    eth_change_mtu, meaning that if you ever tried changing its MTU, you
    couldn't set it above 1500 anymore. It's now getting dev->max_mtu also set
    to 4096 to remedy that.

    v2: fix up lantiq_etop, missed breakage due to driver not compiling on x86

    CC: netdev@vger.kernel.org
    Signed-off-by: Jarod Wilson
    Signed-off-by: David S. Miller

    Jarod Wilson
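
    A centralized range check of the kind referred to above might look
    like this sketch. struct toy_netdev and toy_set_mtu() are hypothetical
    userspace stand-ins; the example assumes that a max_mtu of 0 means
    "no upper limit".

```c
#include <errno.h>

/* Toy stand-in for the MTU-related fields of struct net_device. */
struct toy_netdev {
	unsigned int mtu;
	unsigned int min_mtu;
	unsigned int max_mtu;	/* 0 is treated as "no upper limit" here */
};

/* Validates new_mtu against the device limits before applying it. */
static int toy_set_mtu(struct toy_netdev *dev, int new_mtu)
{
	if (new_mtu < 0 || (unsigned int)new_mtu < dev->min_mtu)
		return -EINVAL;
	if (dev->max_mtu > 0 && (unsigned int)new_mtu > dev->max_mtu)
		return -EINVAL;

	dev->mtu = (unsigned int)new_mtu;
	return 0;
}
```

    With max_mtu left at 1500, a device that starts with an MTU of 4096
    cannot be set back to 4096, which is the dvb_net.c situation the
    commit message describes; raising max_mtu to 4096 makes the request
    valid.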
     

25 Feb, 2016

1 commit


07 Jan, 2016

1 commit

  • A recurring pattern in drivers is to use OF node information
    and, if that is not found, platform-specific host information to
    extract the Ethernet address for a given device.

    Currently this is done with a call to of_get_mac_address() and then
    some ifdef'd stuff for SPARC.

    Consolidate this into a portable routine, and provide the
    arch_get_platform_mac_address() weak function hook for all
    architectures to implement if they want.

    Signed-off-by: David S. Miller

    David S. Miller
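
    The weak-hook arrangement can be sketched in userspace with the GCC
    and Clang weak attribute. All toy_* names are hypothetical, and the
    fw_addr parameter merely stands in for whatever the firmware lookup
    returned.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_ETH_ALEN 6

/*
 * Default architecture hook: no platform address available. A strong
 * definition of the same symbol in another object file would replace it.
 */
__attribute__((weak)) const unsigned char *toy_arch_get_platform_mac_address(void)
{
	return NULL;
}

/* Tries the firmware-provided address first, then the architecture hook. */
static int toy_platform_get_mac_address(const unsigned char *fw_addr,
					unsigned char *out)
{
	const unsigned char *addr = fw_addr;

	if (!addr)
		addr = toy_arch_get_platform_mac_address();
	if (!addr)
		return -ENODEV;

	memcpy(out, addr, TOY_ETH_ALEN);
	return 0;
}
```

    Drivers then call a single routine and fall back to their own method,
    such as a random address, when it reports that nothing was found.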
     

29 Sep, 2015

1 commit

  • Noticed that the compiler (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC))
    generated suboptimal assembler code in eth_get_headlen().

    This early-return coding style is usually not an issue on superscalar CPUs,
    but the compiler chose to put the return statement after this very unlikely
    branch, thus creating a larger jump down to the likely code path.

    Performance-wise, I could measure slightly fewer L1-icache-load-misses
    and fewer branch-misses, and an improvement of 1 nanosecond in an IP-forwarding
    use case with 257-byte packets on ixgbe (CPU i7-4790K @ 4.00GHz).

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     

02 Sep, 2015

1 commit


10 Aug, 2015

1 commit

  • This patch fixes the doubled word "the the" in
    Documentation/DocBook/networking/API-eth-get-headlen.html
    Documentation/DocBook/networking/netdev.html
    Documentation/DocBook/networking.xml

    These files are generated from comments in the source,
    so I have to fix the comment in net/ethernet/eth.c.

    Signed-off-by: Masanari Iida
    Signed-off-by: David S. Miller

    Masanari Iida
     

05 Jun, 2015

1 commit

  • This patch adds full IPv6 addresses into flow_keys and uses them as
    input to the flow hash function. The implementation supports either
    IPv4 or IPv6 addresses in a union, and a selector is used to determine
    how many words to input to jhash2.

    We also add flow_get_u32_dst and flow_get_u32_src functions which are
    used to get a u32 representation of the source and destination
    addresses. For IPv6, ipv6_addr_hash is called. These functions retain
    getting the legacy values of src and dst in flow_keys.

    With this patch, Ethertype and IP protocol are now included in the
    flow hash input.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
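
    The union-plus-selector layout can be sketched as below. This is a
    simplified userspace illustration: the toy_* types are hypothetical,
    only addresses are hashed, and toy_hash_words() is a simple FNV-style
    mix standing in for jhash2.

```c
#include <stddef.h>
#include <stdint.h>

enum toy_addr_type { TOY_ADDR_IPV4, TOY_ADDR_IPV6 };

/* Toy flow keys: one selector plus a union of IPv4 and IPv6 addresses. */
struct toy_flow_keys {
	enum toy_addr_type addr_type;	/* selects the valid union member */
	union {
		struct { uint32_t src, dst; } v4;
		struct { uint32_t src[4], dst[4]; } v6;
	} addrs;
};

/* Number of 32-bit address words that should go into the hash. */
static size_t toy_addr_words(const struct toy_flow_keys *k)
{
	if (k->addr_type == TOY_ADDR_IPV6)
		return sizeof(k->addrs.v6) / sizeof(uint32_t);
	return sizeof(k->addrs.v4) / sizeof(uint32_t);
}

/* FNV-style mix over 32-bit words; a stand-in for jhash2 in this sketch. */
static uint32_t toy_hash_words(const uint32_t *w, size_t n)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < n; i++) {
		h ^= w[i];
		h *= 16777619u;
	}
	return h;
}

/* Hashes only as many words as the selected address family occupies. */
static uint32_t toy_flow_hash(const struct toy_flow_keys *k)
{
	return toy_hash_words((const uint32_t *)&k->addrs, toy_addr_words(k));
}
```

    Because the word count follows the selector, stale bytes in the unused
    part of the union do not influence the hash of an IPv4 flow.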
     

02 Jun, 2015

1 commit

  • When we scan a packet for GRO processing, we want to see the most
    common packet types in the front of the offload_base list.

    So add a priority field so we can handle this properly.

    IPv4/IPv6 get the highest priority with the implicit zero priority
    field.

    Next comes ethernet with a priority of 10, and then we have the MPLS
    types with a priority of 15.

    Suggested-by: Eric Dumazet
    Suggested-by: Toshiaki Makita
    Signed-off-by: David S. Miller

    David S. Miller
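
    Keeping the list ordered by priority amounts to a sorted insert. The
    following userspace sketch uses a plain singly linked list and
    hypothetical toy_* names; the kernel uses its own list primitives.

```c
#include <stddef.h>

/* Toy offload entry: a lower priority value is scanned earlier. */
struct toy_offload {
	int priority;
	struct toy_offload *next;
};

/*
 * Inserts po so that the list stays sorted by ascending priority.
 * Entries with equal priority keep their registration order.
 */
static void toy_add_offload(struct toy_offload **head, struct toy_offload *po)
{
	struct toy_offload **pp = head;

	while (*pp && (*pp)->priority <= po->priority)
		pp = &(*pp)->next;

	po->next = *pp;
	*pp = po;
}
```

    Registering MPLS (15), Ethernet (10), IPv4 (0) and IPv6 (0) in that
    order yields the scan order IPv4, IPv6, Ethernet, MPLS.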
     

14 May, 2015

2 commits


06 May, 2015

1 commit

  • This change does two things. First it fixes a sparse error for the fact
    that the __be16 degrades to an integer. Since that is actually what I am
    kind of doing I am simply working around that by forcing both sides of the
    comparison to u16.

    Also I realized on some compilers I was generating another instruction for
    big endian systems such as PowerPC since it was masking the value before
    doing the comparison. So to resolve that I have simply pulled the mask out
    and wrapped it in an #ifndef __BIG_ENDIAN.

    Lastly I pulled this all out into its own function. I noticed there are
    similar checks in a number of other places so this function can be reused
    there to help reduce overhead in these paths as well.

    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
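
    The comparison described above can be sketched in userspace as
    follows. This is an approximation, not the kernel helper:
    toy_proto_ge_802_3_min() is a hypothetical name, 0x0600 is assumed for
    the constant, and the compiler's __BYTE_ORDER__ macro replaces the
    kernel's __BIG_ENDIAN check.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_ETH_P_802_3_MIN 0x0600	/* assumed value of the constant */

/*
 * Takes the EtherType in network byte order and reports whether it is
 * at or above the minimum, without byte swapping the input.
 */
static bool toy_proto_ge_802_3_min(uint16_t proto_be)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	/* On little endian, drop the wire-order low byte before comparing. */
	proto_be &= htons(0xFF00);
#endif
	/* The low byte of the constant is zero, so it can be ignored. */
	return proto_be >= htons(TOY_ETH_P_802_3_MIN);
}
```

    On little endian the mask keeps only the first wire byte, so the test
    reduces to comparing that byte with 0x06; on big endian the values
    compare correctly as they are and the mask is skipped.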
     

04 May, 2015

3 commits

  • Avoid recomputing the Ethernet header location and instead just use the
    pointer provided by skb->data. The problem with using eth_hdr is that the
    compiler wasn't smart enough to realize that skb->head + skb->mac_header
    was the same thing as skb->data before it added ETH_HLEN. By just caching
    it off before calling skb_pull_inline we can avoid a few unnecessary
    instructions.

    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • This change makes it so that we process the address in
    is_multicast_ether_addr at the same size as the other calls. This allows
    us to avoid duplicate reads when used with other calls such as
    is_zero_ether_addr or eth_addr_copy. In addition I have added a 64 bit
    version of the function so in eth_type_trans we can process the destination
    address as a 64 bit value throughout.

    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • This change takes advantage of the fact that ETH_P_802_3_MIN is aligned to
    512 so as a result we can actually ignore the lower 8b when comparing the
    Ethertype to ETH_P_802_3_MIN. This allows us to avoid a byte swap by simply
    masking the value and comparing it to the byte swapped value for
    ETH_P_802_3_MIN.

    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
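
    The 64 bit multicast test from the is_multicast_ether_addr change in
    this group can be approximated in userspace as below. The function
    name is hypothetical, memcpy replaces the kernel's direct unaligned
    load, and the caller must provide two readable bytes after the 6-byte
    address.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Multicast test on an address that is read as one 64-bit word.
 * addr must point to at least 8 readable bytes (6 address + 2 padding).
 */
static bool toy_is_multicast_ether_addr_64bits(const uint8_t *addr)
{
	uint64_t v;

	memcpy(&v, addr, sizeof(v));
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	/* First address byte is the most significant byte of v. */
	return (v >> 56) & 0x01;
#else
	/* First address byte is the least significant byte of v. */
	return v & 0x01;
#endif
}
```

    The multicast bit is the lowest bit of the first address byte, so one
    64-bit load that is already needed for other address checks can also
    answer this question.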
     

04 Mar, 2015

1 commit


03 Mar, 2015

1 commit


03 Jan, 2015

1 commit


06 Sep, 2014

1 commit

  • This patch updates some of the flow_dissector api so that it can be used to
    parse the length of ethernet buffers stored in fragments. Most of the
    changes needed were to __skb_get_poff as it needed to be updated to support
    sending a linear buffer instead of a skb.

    I have split __skb_get_poff into two functions, the first is skb_get_poff
    and it retains the functionality of the original __skb_get_poff. The other
    function is __skb_get_poff which now works much like __skb_flow_dissect in
    relation to skb_flow_dissect in that it provides the same functionality but
    works with just a data buffer and hlen instead of needing an skb.

    Signed-off-by: Alexander Duyck
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexander Duyck
     

28 Aug, 2014

1 commit

  • DSA is currently registering one packet_type function per EtherType it
    needs to intercept in the receive path of a DSA-enabled Ethernet device.
    Right now we have three of them: trailer, DSA and eDSA, and there might
    be more in the future; this will not scale to the addition of new
    protocols.

    This patch proceeds with adding a new layer of abstraction and two new
    functions:

    dsa_switch_rcv() which will dispatch into the tag-protocol specific
    receive function implemented by net/dsa/tag_*.c

    dsa_slave_xmit() which will dispatch into the tag-protocol specific
    transmit function implemented by net/dsa/tag_*.c

    When we do create the per-port slave network devices, we iterate over
    the switch protocol to assign the DSA-specific receive and transmit
    operations.

    A new fake ethertype value is used: ETH_P_XDSA to illustrate the fact
    that this is no longer going to look like ETH_P_DSA or ETH_P_TRAILER
    like it used to be.

    This allows us to greatly simplify the check in eth_type_trans() and
    always override the skb->protocol with ETH_P_XDSA for Ethernet switch
    tagging protocols, while also reducing the number of repetitive slave
    netdevice_ops assignments.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
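
    The dispatch layer can be pictured as a function pointer that is
    chosen once at setup time. The sketch below is a userspace
    illustration with hypothetical toy_* names; it only covers the
    receive side.

```c
#include <stddef.h>

enum toy_tag_proto { TOY_TAG_DSA, TOY_TAG_EDSA, TOY_TAG_TRAILER };

/* Toy packet: records which tag handler processed it. */
struct toy_skb {
	int handled_by;
};

/* Toy switch state: the receive handler is selected once at setup. */
struct toy_switch_tree {
	int (*rcv)(struct toy_skb *skb);
};

static int toy_dsa_rcv(struct toy_skb *skb)
{
	skb->handled_by = TOY_TAG_DSA;
	return 0;
}

static int toy_edsa_rcv(struct toy_skb *skb)
{
	skb->handled_by = TOY_TAG_EDSA;
	return 0;
}

static int toy_trailer_rcv(struct toy_skb *skb)
{
	skb->handled_by = TOY_TAG_TRAILER;
	return 0;
}

/* Assigns the tag-protocol specific receive operation. */
static int toy_switch_setup(struct toy_switch_tree *dst, enum toy_tag_proto proto)
{
	switch (proto) {
	case TOY_TAG_DSA:
		dst->rcv = toy_dsa_rcv;
		return 0;
	case TOY_TAG_EDSA:
		dst->rcv = toy_edsa_rcv;
		return 0;
	case TOY_TAG_TRAILER:
		dst->rcv = toy_trailer_rcv;
		return 0;
	}
	return -1;
}

/* Single entry point: dispatches to whichever handler was set up. */
static int toy_switch_rcv(const struct toy_switch_tree *dst, struct toy_skb *skb)
{
	if (!dst->rcv)
		return -1;
	return dst->rcv(skb);
}
```

    Adding another tagging protocol then means adding one handler and one
    case, rather than registering another packet type.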
     

16 Jul, 2014

1 commit

  • Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert
    all users to pass NET_NAME_UNKNOWN.

    Coccinelle patch:

    @@
    expression sizeof_priv, name, setup, txqs, rxqs, count;
    @@

    (
    -alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs)
    +alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs)
    |
    -alloc_netdev_mq(sizeof_priv, name, setup, count)
    +alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count)
    |
    -alloc_netdev(sizeof_priv, name, setup)
    +alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup)
    )

    v9: move comments here from the wrong commit

    Signed-off-by: Tom Gundersen
    Reviewed-by: David Herrmann
    Signed-off-by: David S. Miller

    Tom Gundersen
     

17 Jan, 2014

1 commit

  • eth_type_trans() can read uninitialized memory as drivers
    do not necessarily pull more than 14 bytes in skb->head before
    calling it.

    As David suggested, we can use skb_header_pointer() to
    fix this without breaking some drivers that might not expect
    eth_type_trans() pulling 2 additional bytes.

    Signed-off-by: Eric Dumazet
    Cc: Ben Hutchings
    Signed-off-by: David S. Miller

    Eric Dumazet
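
    The idea behind using skb_header_pointer() here is a bounded read that
    never touches bytes the packet does not have. The following is a
    simplified userspace sketch with hypothetical toy_* names, one linear
    area and a single fragment; it is not the kernel function.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy packet: a linear head area followed by one optional fragment. */
struct toy_skb {
	const uint8_t *head;
	size_t headlen;
	const uint8_t *frag;
	size_t fraglen;
};

/*
 * Returns a pointer to len bytes at offset off. Bytes inside the linear
 * area are returned in place; otherwise they are copied into buf.
 * Returns NULL when the packet is too short.
 */
static const void *toy_header_pointer(const struct toy_skb *skb, size_t off,
				      size_t len, void *buf)
{
	uint8_t *out = buf;

	if (off + len <= skb->headlen)
		return skb->head + off;
	if (off + len > skb->headlen + skb->fraglen)
		return NULL;

	for (size_t i = 0; i < len; i++) {
		size_t pos = off + i;

		out[i] = pos < skb->headlen ? skb->head[pos]
					    : skb->frag[pos - skb->headlen];
	}
	return buf;
}
```

    A caller that has pulled only the 14 header bytes can still look at
    the two bytes that follow, and gets NULL instead of uninitialized
    memory when they do not exist.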
     

01 Oct, 2013

2 commits


21 Sep, 2013

1 commit

  • removed these checkpatch.pl warnings:
    net/ethernet/eth.c:61: WARNING: Use #include instead of
    net/ethernet/eth.c:136: WARNING: Prefer netdev_dbg(netdev, ... then dev_dbg(dev, ... then pr_debug(... to printk(KERN_DEBUG ...
    net/ethernet/eth.c:181: ERROR: space prohibited before that close parenthesis ')'

    Signed-off-by: Avinash Kumar
    Signed-off-by: David S. Miller

    Avinash Kumar
     

17 Jul, 2013

1 commit


28 Mar, 2013

1 commit

  • Add a new constant ETH_P_802_3_MIN (0x0600), the minimum value of the
    ethernet type field that is treated as an EtherType. Frames with a lower
    value in the ethernet type field carry an 802.3 length there instead.

    Also update all the users of this value that David Miller and
    I could find to use the new constant.

    Also correct a bug in util.c. The comparison with ETH_P_802_3_MIN
    should be >= not >.

    As suggested by Jesse Gross.

    Compile tested only.

    Cc: David Miller
    Cc: Jesse Gross
    Cc: Karsten Keil
    Cc: John W. Linville
    Cc: Johannes Berg
    Cc: Bart De Schuymer
    Cc: Stephen Hemminger
    Cc: Patrick McHardy
    Cc: Marcel Holtmann
    Cc: Gustavo Padovan
    Cc: Johan Hedberg
    Cc: linux-bluetooth@vger.kernel.org
    Cc: netfilter-devel@vger.kernel.org
    Cc: bridge@lists.linux-foundation.org
    Cc: linux-wireless@vger.kernel.org
    Cc: linux1394-devel@lists.sourceforge.net
    Cc: linux-media@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Cc: dev@openvswitch.org
    Acked-by: Mauro Carvalho Chehab
    Acked-by: Stefan Richter
    Signed-off-by: Simon Horman
    Signed-off-by: David S. Miller

    Simon Horman
     

22 Jan, 2013

1 commit

  • When we set mac address, software mac address in system and hardware mac
    address all need to be updated. Current eth_mac_addr() doesn't allow
    callers to implement error handling nicely.

    This patch splits eth_mac_addr() into a prepare part and a commit part,
    so that we can prepare first, try to change the hardware address, and then
    do the real commit if the hardware address is set successfully.

    Signed-off-by: Stefan Hajnoczi
    Signed-off-by: Amos Kong
    Signed-off-by: David S. Miller

    Stefan Hajnoczi
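
    The prepare, program, commit sequence can be sketched as follows. This
    is a userspace illustration: every toy_* name is hypothetical, the
    validity check is reduced to "not multicast and not all zero", and
    hw_ok simulates whether the hardware accepts the new address.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETH_ALEN 6

struct toy_netdev {
	uint8_t dev_addr[TOY_ETH_ALEN];	/* software copy of the address */
	uint8_t hw_addr[TOY_ETH_ALEN];	/* what the "hardware" holds */
	bool hw_ok;			/* simulated hardware behaviour */
};

static bool toy_valid_addr(const uint8_t *a)
{
	static const uint8_t zero[TOY_ETH_ALEN];

	return !(a[0] & 0x01) && memcmp(a, zero, TOY_ETH_ALEN) != 0;
}

/* Prepare part: checks only, no state is changed. */
static int toy_prepare_mac_addr_change(const uint8_t *addr)
{
	return toy_valid_addr(addr) ? 0 : -EADDRNOTAVAIL;
}

/* Commit part: updates the software address, cannot fail. */
static void toy_commit_mac_addr_change(struct toy_netdev *dev,
				       const uint8_t *addr)
{
	memcpy(dev->dev_addr, addr, TOY_ETH_ALEN);
}

/* Hypothetical driver hook that may fail. */
static int toy_hw_set_addr(struct toy_netdev *dev, const uint8_t *addr)
{
	if (!dev->hw_ok)
		return -EIO;
	memcpy(dev->hw_addr, addr, TOY_ETH_ALEN);
	return 0;
}

/* Driver callback: the software address changes only if hardware agreed. */
static int toy_driver_set_mac(struct toy_netdev *dev, const uint8_t *addr)
{
	int err = toy_prepare_mac_addr_change(addr);

	if (err)
		return err;
	err = toy_hw_set_addr(dev, addr);
	if (err)
		return err;
	toy_commit_mac_addr_change(dev, addr);
	return 0;
}
```

    When the hardware step fails, the function returns before the commit,
    so the software and hardware addresses stay in agreement.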
     

04 Jan, 2013

1 commit


20 Jul, 2012

1 commit

  • The Ethernet II wrapper is only used by the IPX protocol; it may once
    have been used by AppleTalk, but not currently. Therefore it makes sense
    to move it to the IPX dust bin and drop the exports.

    Build tested only.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    stephen hemminger
     

11 Jul, 2012

1 commit


30 Jun, 2012

1 commit