14 Dec, 2020

1 commit

  • Ethernet controller drivers call of_get_mac_address() to get
    the MAC address from the device tree; if the relevant properties
    are not present, they then try to read it from nvmem.

    For example, reading the MAC address from nvmem goes through:
    of_get_mac_address()
    of_get_mac_addr_nvmem()
    nvmem_get_mac_address()

    On i.MX6x/7D/8MQ/8MM platforms the Ethernet MAC address is read
    from the nvmem OCOTP eFuses, but the six bytes need to be swapped
    into the correct order.

    The patch adds an optional property "nvmem_macaddr_swap" to swap
    the macaddr byte order.

    Signed-off-by: Fugang Duan


24 May, 2020

1 commit

  • There's currently only a single devres helper in net/ - the devm
    variant of alloc_etherdev. Let's move it to net/devres.c with the
    intention of adding a second one: devm_register_netdev(). This new
    routine will need to know the address of the release function of
    devm_alloc_etherdev() so that it can verify (using devres_find())
    that the struct net_device that's being passed to it is also
    resource managed.

    Signed-off-by: Bartosz Golaszewski
    Signed-off-by: David S. Miller


27 Jan, 2020

1 commit

  • All usage of this function was removed three years ago, and the
    function was marked as deprecated:
    a52ad514fdf3 ("net: deprecate eth_change_mtu, remove usage")
    So I think we can remove it now.

    Signed-off-by: Heiner Kallweit
    Signed-off-by: David S. Miller


08 Nov, 2019

1 commit

  • KCSAN reported a data-race [1]

    While we can use READ_ONCE() on the read sides,
    we need to make sure hh->hh_len is written last.

    [1]

    BUG: KCSAN: data-race in eth_header_cache / neigh_resolve_output

    write to 0xffff8880b9dedcb8 of 4 bytes by task 29760 on cpu 0:
    eth_header_cache+0xa9/0xd0 net/ethernet/eth.c:247
    neigh_hh_init net/core/neighbour.c:1463 [inline]
    neigh_resolve_output net/core/neighbour.c:1480 [inline]
    neigh_resolve_output+0x415/0x470 net/core/neighbour.c:1470
    neigh_output include/net/neighbour.h:511 [inline]
    ip6_finish_output2+0x7a2/0xec0 net/ipv6/ip6_output.c:116
    __ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
    __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
    ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
    NF_HOOK_COND include/linux/netfilter.h:294 [inline]
    ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
    dst_output include/net/dst.h:436 [inline]
    NF_HOOK include/linux/netfilter.h:305 [inline]
    ndisc_send_skb+0x459/0x5f0 net/ipv6/ndisc.c:505
    ndisc_send_ns+0x207/0x430 net/ipv6/ndisc.c:647
    rt6_probe_deferred+0x98/0xf0 net/ipv6/route.c:615
    process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
    worker_thread+0xa0/0x800 kernel/workqueue.c:2415
    kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

    read to 0xffff8880b9dedcb8 of 4 bytes by task 29572 on cpu 1:
    neigh_resolve_output net/core/neighbour.c:1479 [inline]
    neigh_resolve_output+0x113/0x470 net/core/neighbour.c:1470
    neigh_output include/net/neighbour.h:511 [inline]
    ip6_finish_output2+0x7a2/0xec0 net/ipv6/ip6_output.c:116
    __ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
    __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
    ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
    NF_HOOK_COND include/linux/netfilter.h:294 [inline]
    ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
    dst_output include/net/dst.h:436 [inline]
    NF_HOOK include/linux/netfilter.h:305 [inline]
    ndisc_send_skb+0x459/0x5f0 net/ipv6/ndisc.c:505
    ndisc_send_ns+0x207/0x430 net/ipv6/ndisc.c:647
    rt6_probe_deferred+0x98/0xf0 net/ipv6/route.c:615
    process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
    worker_thread+0xa0/0x800 kernel/workqueue.c:2415
    kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 1 PID: 29572 Comm: kworker/1:4 Not tainted 5.4.0-rc6+ #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Workqueue: events rt6_probe_deferred
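    The required write-last/read-once ordering can be sketched with
    C11 atomics (a userspace illustration of the publish pattern, not
    the kernel's READ_ONCE()/WRITE_ONCE() code; names are invented):

```c
#include <stdatomic.h>
#include <string.h>

/* Simplified stand-in for struct hh_cache: writers fill hh_data
 * first, then publish hh_len last with release semantics, so a
 * reader that observes a non-zero length also observes the data. */
struct hh_cache_sketch {
    char hh_data[16];
    _Atomic int hh_len;
};

static void hh_publish(struct hh_cache_sketch *hh, const char *hdr, int len)
{
    memcpy(hh->hh_data, hdr, (size_t)len);          /* write data first */
    atomic_store_explicit(&hh->hh_len, len,
                          memory_order_release);    /* publish len last */
}

static int hh_read_len(struct hh_cache_sketch *hh)
{
    /* Acquire pairs with the release store above. */
    return atomic_load_explicit(&hh->hh_len, memory_order_acquire);
}
```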

    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller


08 Jun, 2019

1 commit


03 Jun, 2019

1 commit


31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman


21 May, 2019

1 commit


08 May, 2019

1 commit

  • NVMEM support was added to of_get_mac_address, so it can now
    return ERR_PTR encoded error values; all current users of
    of_get_mac_address need to be adjusted to this new behavior.

    While at it, remove the superfluous is_valid_ether_addr check, as
    the MAC address returned from of_get_mac_address is always valid,
    having already been checked by is_valid_ether_addr internally.
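    The ERR_PTR convention the callers now have to handle can be
    sketched in userspace C (a minimal reimplementation of the kernel
    macros, for illustration only):

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_ERRNO 4095

/* Encode a small negative errno in the top page of the pointer
 * range, so one return value can carry either a pointer or an error. */
static inline void *ERR_PTR(long error)
{
    return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
    return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}
```

    Callers then check `IS_ERR(addr)` instead of comparing against
    NULL, which is the adjustment this commit makes tree-wide.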

    Fixes: d01f449c008a ("of_net: add NVMEM support to of_get_mac_address")
    Signed-off-by: Petr Štetiar
    Signed-off-by: David S. Miller


06 May, 2019

1 commit

  • Frames get processed by DSA and redirected to switch port net devices
    based on the ETH_P_XDSA multiplexed packet_type handler found by the
    network stack when calling eth_type_trans().

    The running assumption is that once the DSA .rcv function is called, DSA
    is always able to decode the switch tag in order to change the skb->dev
    from its master.

    However there are tagging protocols (such as the new DSA_TAG_PROTO_SJA1105,
    user of DSA_TAG_PROTO_8021Q) where this assumption is not completely
    true, since switch tagging piggybacks on the absence of a vlan_filtering
    bridge. Moreover, management traffic (BPDU, PTP) for this switch doesn't
    rely on switch tagging, but on a different mechanism. So it would make
    sense to at least be able to terminate that.

    Having DSA receive traffic it can't decode would put it in an impossible
    situation: the eth_type_trans() function would invoke the DSA .rcv(),
    which could not change skb->dev, then eth_type_trans() would be invoked
    again, which again would call the DSA .rcv, and the packet would never
    be able to exit the DSA filter and would spiral in a loop until the
    whole system dies.

    This happens because eth_type_trans() doesn't actually look at the skb
    (so as to identify a potential tag) when it deems it as being
    ETH_P_XDSA. It just checks whether skb->dev has a DSA private pointer
    installed (therefore it's a DSA master) and that there exists a .rcv
    callback (everybody except DSA_TAG_PROTO_NONE has that). This is
    understandable as there are many switch tags out there, and exhaustively
    checking for all of them is far from ideal.

    The solution lies in introducing a filtering function for each tagging
    protocol. In the absence of a filtering function, all traffic is passed
    to the .rcv DSA callback. The tagging protocol should see the filtering
    function as a pre-validation that it can decode the incoming skb. The
    traffic that doesn't match the filter will bypass the DSA .rcv callback
    and be left on the master netdevice, which wasn't previously possible.
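    The filter idea can be sketched as a pre-check before dispatching
    to the receive handler (a simplified userspace model; the struct
    and function names here are invented, not the DSA API):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of a tagger: an optional filter pre-validates
 * that rcv() will be able to decode the frame; without a filter,
 * everything is passed to rcv(), as before. */
struct tagger_sketch {
    bool (*filter)(const uint8_t *buf, size_t len);
    int  (*rcv)(const uint8_t *buf, size_t len);
};

/* Returns rcv()'s verdict, or -1 when the frame bypasses the tagger
 * and is left on the master netdevice. */
static int deliver(const struct tagger_sketch *t,
                   const uint8_t *buf, size_t len)
{
    if (t->filter && !t->filter(buf, len))
        return -1; /* not ours: leave it to the master */
    return t->rcv(buf, len);
}

/* Toy tagger for demonstration: only accepts even-length frames. */
static bool only_even_len(const uint8_t *buf, size_t len)
{
    (void)buf;
    return len % 2 == 0;
}

static int count_rcv(const uint8_t *buf, size_t len)
{
    (void)buf; (void)len;
    return 0;
}
```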

    Signed-off-by: Vladimir Oltean
    Reviewed-by: Florian Fainelli
    Signed-off-by: David S. Miller


24 Apr, 2019

2 commits

  • Update all users of eth_get_headlen to pass network device, fetch
    network namespace from it and pass it down to the flow dissector.
    This commit is a noop until administrator inserts BPF flow dissector
    program.

    Cc: Maxim Krasnyansky
    Cc: Saeed Mahameed
    Cc: Jeff Kirsher
    Cc: intel-wired-lan@lists.osuosl.org
    Cc: Yisen Zhuang
    Cc: Salil Mehta
    Cc: Michael Chan
    Cc: Igor Russkikh
    Signed-off-by: Stanislav Fomichev
    Signed-off-by: Daniel Borkmann

  • This new argument will be used in the next patches for the
    eth_get_headlen use case. eth_get_headlen calls flow dissector
    with only data (without skb) so there is currently no way to
    pull attached BPF flow dissector program. With this new argument,
    we can amend the callers to explicitly pass network namespace
    so we can use attached BPF program.

    Signed-off-by: Stanislav Fomichev
    Reviewed-by: Saeed Mahameed
    Signed-off-by: Daniel Borkmann


23 Feb, 2019

1 commit


04 Dec, 2018

1 commit


16 Nov, 2018

1 commit

  • A netperf UDP stream test shows that eth_type_trans() takes a
    noticeable share of CPU, so adjust the MAC address check order:
    first check whether the destination is the device address, and
    only check whether it is a multicast address if it is not.

    After this change:
    For unicast where the skb dst MAC is the device MAC (most of the
    time), this saves one comparison.
    For unicast where the skb dst MAC is not the device MAC, nothing
    changes.
    For multicast, it adds one comparison.
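    The reordered check can be sketched like this (a userspace
    simplification; the real code compares the 6-byte addresses with
    optimized helpers):

```c
#include <stdint.h>
#include <string.h>

enum pkt_kind { PKT_HOST, PKT_MULTICAST, PKT_OTHERHOST };

/* Check the (common) exact device-address match first, and only
 * fall back to the multicast-bit test when it fails, saving one
 * comparison for unicast traffic addressed to this device. */
static enum pkt_kind classify(const uint8_t dst[6], const uint8_t dev[6])
{
    if (memcmp(dst, dev, 6) == 0)
        return PKT_HOST;
    if (dst[0] & 0x01)
        return PKT_MULTICAST;
    return PKT_OTHERHOST;
}
```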

    Before:
    1.03% [kernel] [k] eth_type_trans

    After:
    0.78% [kernel] [k] eth_type_trans

    Signed-off-by: Zhang Yu
    Signed-off-by: Li RongQing
    Signed-off-by: David S. Miller


26 Jun, 2018

1 commit

  • Manage pending per-NAPI GRO packets via list_head.

    Return an SKB pointer from the GRO receive handlers. When GRO receive
    handlers return non-NULL, it means that this SKB needs to be completed
    at this time and removed from the NAPI queue.

    Several operations are greatly simplified by this transformation,
    especially timing out the oldest SKB in the list when gro_count
    exceeds MAX_GRO_SKBS, and napi_gro_flush() which walks the queue
    in reverse order.

    Signed-off-by: David S. Miller


08 May, 2018

1 commit

  • When the core networking needs to detect the transport offset in a given
    packet and parse it explicitly, a full-blown flow_keys struct is used for
    storage.
    This patch introduces a smaller keys store, reworks the basic flow
    dissection helper to use it, and applies this new helper where
    possible - namely in skb_probe_transport_header(). The flow
    dissector data structures involved are renamed to more closely
    match their new role.

    The above gives ~50% performance improvement in micro benchmarking around
    skb_probe_transport_header() and ~30% around eth_get_headlen(), mostly due
    to the smaller memset. A small but measurable improvement shows up
    in macro benchmarking as well.

    v1 -> v2: use the new helper in eth_get_headlen() and skb_get_poff(),
    as per DaveM suggestion

    Suggested-by: David Miller
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller


16 Jun, 2017

1 commit

  • It seems like a historic accident that these return unsigned char *,
    and in many places that means casts are required, more often than not.

    Make these functions return void * and remove all the casts across
    the tree, adding a (u8 *) cast only where the unsigned char pointer
    was used directly, all done with the following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

    @@
    expression SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - fn(SKB, LEN)[0]
    + *(u8 *)fn(SKB, LEN)

    Note that the last part there converts from push(...)[0] to the
    more idiomatic *(u8 *)push(...).

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller


17 Feb, 2017

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2017-02-16

    1) Make struct xfrm_input_afinfo const, nothing writes to it.
    From Florian Westphal.

    2) Remove all places that write to the afinfo policy backend
    and make the struct const then.
    From Florian Westphal.

    3) Prepare for packet consuming gro callbacks and add
    ESP GRO handlers. ESP packets can be decapsulated
    at the GRO layer then. It saves a round through
    the stack for each ESP packet.

    Please note that this has a merge conflict between commit

    63fca65d0863 ("net: add confirm_neigh method to dst_ops")

    from net-next and

    3d7d25a68ea5 ("xfrm: policy: remove garbage_collect callback")
    a2817d8b279b ("xfrm: policy: remove family field")

    from ipsec-next.

    The conflict can be solved as it is done in linux-next.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller


15 Feb, 2017

1 commit

  • Add a skb_gro_flush_final helper to prepare for consuming
    skbs in call_gro_receive. We will extend this helper to not
    touch the skb if the skb is consumed by a gro callback in a
    followup patch. We need this to handle the upcoming IPsec
    ESP callbacks, as they reinject the skb into napi_gro_receive
    asynchronously. The helper is used in all gro_receive functions
    that can call the ESP gro handlers.

    Signed-off-by: Steffen Klassert


11 Feb, 2017

1 commit


09 Feb, 2017

1 commit

  • The stack must not pass packets to device drivers that are shorter
    than the minimum link layer header length.

    Previously, packet sockets would drop packets smaller than or equal
    to dev->hard_header_len, but this has false positives. Zero length
    payload is used over Ethernet. Other link layer protocols support
    variable length headers. Support for validation of these protocols
    removed the min length check for all protocols.

    Introduce an explicit dev->min_header_len parameter and drop all
    packets below this value. Initially, set it to non-zero only for
    Ethernet and loopback. Other protocols can follow in a patch to
    net-next.
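    The validation amounts to a simple length floor per device
    (sketch; the field name mirrors the commit, the surrounding code
    is invented):

```c
#include <stdbool.h>
#include <stddef.h>

#define ETH_HLEN 14

struct dev_sketch {
    unsigned char min_header_len; /* 0 = no check, as for most protocols */
};

/* Drop any packet shorter than the device's minimum link layer
 * header; initially only Ethernet and loopback set min_header_len. */
static bool packet_len_ok(const struct dev_sketch *dev, size_t len)
{
    return dev->min_header_len == 0 || len >= dev->min_header_len;
}
```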

    Fixes: 9ed988cd5915 ("packet: validate variable length ll headers")
    Reported-by: Sowmini Varadhan
    Signed-off-by: Willem de Bruijn
    Acked-by: Eric Dumazet
    Acked-by: Sowmini Varadhan
    Signed-off-by: David S. Miller


30 Jan, 2017

1 commit

  • This patch adds devm_alloc_etherdev_mqs function and devm_alloc_etherdev
    macro. These can be used for simpler netdev allocation without having to
    care about calling free_netdev.

    Thanks to this change drivers, their error paths and removal paths may
    get simpler by a bit.

    Signed-off-by: Rafał Miłecki
    Signed-off-by: David S. Miller


08 Nov, 2016

1 commit

  • The default TX queue length of Ethernet devices has been a magic
    constant of 1000 ever since the initial git import.

    Looking back in historical trees[1][2], the value used to be 100,
    with the same comment "Ethernet wants good queues". The commit[3]
    that changed this from 100 to 1000 didn't describe why, but from
    conversations with Robert Olsson it seems that it was changed
    when Ethernet devices went from 100Mbit/s to 1Gbit/s: because the
    link speed increased 10x, the queue size was adjusted accordingly.
    This value later caused much heartache for the bufferbloat community.

    This patch merely moves the value into a defined constant.

    [1] https://git.kernel.org/cgit/linux/kernel/git/davem/netdev-vger-cvs.git/
    [2] https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/
    [3] https://git.kernel.org/tglx/history/c/98921832c232

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller


31 Oct, 2016

1 commit


21 Oct, 2016

1 commit

  • Currently, GRO can do unlimited recursion through the gro_receive
    handlers. This was fixed for tunneling protocols by limiting tunnel GRO
    to one level with encap_mark, but both VLAN and TEB still have this
    problem. Thus, the kernel is vulnerable to a stack overflow, if we
    receive a packet composed entirely of VLAN headers.

    This patch adds a recursion counter to the GRO layer to prevent stack
    overflow. When a gro_receive function hits the recursion limit, GRO is
    aborted for this skb and it is processed normally. This recursion
    counter is put in the GRO CB, but could be turned into a percpu counter
    if we run out of space in the CB.
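    The guard can be sketched as a counter bumped and checked on each
    nested receive call (userspace model; the limit of 15 is an
    assumption for illustration, the struct is invented):

```c
#define MAX_GRO_RECURSION 15

struct gro_cb_sketch {
    unsigned int recursion_counter;
};

/* Returns nonzero (abort GRO for this skb) once the nesting depth
 * reaches the limit, so a packet made purely of VLAN headers can
 * no longer grow the kernel stack without bound. */
static int gro_recursion_inc_test(struct gro_cb_sketch *cb)
{
    return ++cb->recursion_counter == MAX_GRO_RECURSION;
}
```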

    Thanks to Vladimír Beneš for the initial bug report.

    Fixes: CVE-2016-7039
    Fixes: 9b174d88c257 ("net: Add Transparent Ethernet Bridging GRO support.")
    Fixes: 66e5133f19e9 ("vlan: Add GRO support for non hardware accelerated vlan")
    Signed-off-by: Sabrina Dubroca
    Reviewed-by: Jiri Benc
    Acked-by: Hannes Frederic Sowa
    Acked-by: Tom Herbert
    Signed-off-by: David S. Miller


13 Oct, 2016

1 commit

  • With centralized MTU checking, there's nothing productive done by
    eth_change_mtu that isn't already done in dev_set_mtu, so mark it as
    deprecated and remove all usage of it in the kernel. All callers have been
    audited for calls to alloc_etherdev* or ether_setup directly, which means
    they all have a valid dev->min_mtu and dev->max_mtu. Now eth_change_mtu
    prints out a netdev_warn about being deprecated, for the benefit of
    out-of-tree drivers that might be utilizing it.
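    The centralized check boils down to a range test against the
    per-device bounds (a simplified sketch of the idea, not the
    actual dev_set_mtu source):

```c
#include <errno.h>

struct mtu_dev_sketch {
    unsigned int mtu;
    unsigned int min_mtu;
    unsigned int max_mtu; /* 0 = no upper bound */
};

/* With min_mtu/max_mtu set on the device, one generic range check
 * replaces the per-driver eth_change_mtu-style callbacks. */
static int set_mtu(struct mtu_dev_sketch *dev, unsigned int new_mtu)
{
    if (new_mtu < dev->min_mtu)
        return -EINVAL;
    if (dev->max_mtu && new_mtu > dev->max_mtu)
        return -EINVAL;
    dev->mtu = new_mtu;
    return 0;
}
```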

    Of note, dvb_net.c actually had dev->mtu = 4096 while using
    eth_change_mtu, meaning that if you ever tried changing its MTU,
    you couldn't set it above 1500 anymore. It now gets dev->max_mtu
    set to 4096 as well to remedy that.

    v2: fix up lantiq_etop, missed breakage due to the driver not compiling on x86

    CC: netdev@vger.kernel.org
    Signed-off-by: Jarod Wilson
    Signed-off-by: David S. Miller


25 Feb, 2016

1 commit


07 Jan, 2016

1 commit

  • A repeating pattern in drivers has become to use OF node information
    and, if not found, platform specific host information to extract the
    ethernet address for a given device.

    Currently this is done with a call to of_get_mac_address() and then
    some ifdef'd stuff for SPARC.

    Consolidate this into a portable routine, and provide the
    arch_get_platform_mac_address() weak function hook for all
    architectures to implement if they want.
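    The weak-function hook pattern looks like this in plain C (an
    illustration of the mechanism with GCC/Clang on ELF platforms;
    the "_sketch" names are made up):

```c
#include <stdint.h>

/* Default, overridable definition: an architecture that knows how
 * to fetch a platform MAC address provides its own strong
 * definition, which the linker prefers over this weak fallback. */
__attribute__((weak)) int arch_get_platform_mac_address_sketch(uint8_t *mac)
{
    (void)mac;
    return -1; /* no platform-specific source available */
}

static int get_mac(uint8_t mac[6])
{
    /* Try the (possibly overridden) arch hook; a real caller would
     * fall back to the device tree lookup when this fails. */
    return arch_get_platform_mac_address_sketch(mac);
}
```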

    Signed-off-by: David S. Miller


29 Sep, 2015

1 commit

  • Noticed that the compiler (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC))
    generated suboptimal assembler code in eth_get_headlen().

    This early-return coding style is usually not an issue on superscalar CPUs,
    but the compiler chose to put the return statement after this very unlikely
    branch, thus creating a larger jump down to the likely code path.

    Performance wise, I could measure slightly less L1-icache-load-misses
    and less branch-misses, and an improvement of 1 nanosec with an IP-forwarding
    use-case with 257 bytes packets with ixgbe (CPU i7-4790K @ 4.00GHz).

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller


02 Sep, 2015

1 commit


10 Aug, 2015

1 commit

  • This patch fixes the doubled word "the the" in
    Documentation/DocBook/networking/API-eth-get-headlen.html
    Documentation/DocBook/networking/netdev.html
    Documentation/DocBook/networking.xml

    These files are generated from comments in the source,
    so the fix is to the comment in net/ethernet/eth.c.

    Signed-off-by: Masanari Iida
    Signed-off-by: David S. Miller


05 Jun, 2015

1 commit

  • This patch adds full IPv6 addresses into flow_keys and uses them as
    input to the flow hash function. The implementation supports either
    IPv4 or IPv6 addresses in a union, and a selector is used to determine
    how many words to input to jhash2.

    We also add flow_get_u32_dst and flow_get_u32_src functions which are
    used to get a u32 representation of the source and destination
    addresses. For IPv6, ipv6_addr_hash is called. These functions retain
    getting the legacy values of src and dst in flow_keys.

    With this patch, Ethertype and IP protocol are now included in the
    flow hash input.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller


02 Jun, 2015

1 commit

  • When we scan a packet for GRO processing, we want to see the most
    common packet types in the front of the offload_base list.

    So add a priority field so we can handle this properly.

    IPv4/IPv6 get the highest priority with the implicit zero priority
    field.

    Next comes ethernet with a priority of 10, and then we have the MPLS
    types with a priority of 15.
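    Priority-ordered insertion into the offload list can be sketched
    with a simple singly linked list (userspace model; the struct
    layout is invented):

```c
#include <stddef.h>

struct offload_sketch {
    int priority; /* lower value = scanned earlier during GRO */
    struct offload_sketch *next;
};

/* Insert so the list stays sorted by ascending priority: IPv4/IPv6
 * (implicit priority 0) stay in front, Ethernet (10) next, and the
 * MPLS types (15) last. */
static void offload_add(struct offload_sketch **head,
                        struct offload_sketch *po)
{
    struct offload_sketch **p = head;

    while (*p && (*p)->priority <= po->priority)
        p = &(*p)->next;
    po->next = *p;
    *p = po;
}
```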

    Suggested-by: Eric Dumazet
    Suggested-by: Toshiaki Makita
    Signed-off-by: David S. Miller


14 May, 2015

2 commits


06 May, 2015

1 commit

  • This change does two things. First it fixes a sparse error for the fact
    that the __be16 degrades to an integer. Since that is actually what I am
    kind of doing I am simply working around that by forcing both sides of the
    comparison to u16.

    Also I realized on some compilers I was generating another instruction for
    big endian systems such as PowerPC since it was masking the value before
    doing the comparison. So to resolve that I have simply pulled the mask out
    and wrapped it in an #ifndef __BIG_ENDIAN.

    Lastly I pulled this all out into its own function. I noticed there
    are similar checks in a number of other places, so this function can
    be reused there to help reduce overhead in those paths as well.

    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller


04 May, 2015

3 commits

  • Avoid recomputing the Ethernet header location and instead just use the
    pointer provided by skb->data. The problem with using eth_hdr is that the
    compiler wasn't smart enough to realize that skb->head + skb->mac_header
    was the same thing as skb->data before it added ETH_HLEN. By just caching
    it off before calling skb_pull_inline we can avoid a few unnecessary
    instructions.

    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

  • This change makes it so that we process the address in
    is_multicast_ether_addr at the same size as the other calls. This allows
    us to avoid duplicate reads when used with other calls such as
    is_zero_ether_addr or eth_addr_copy. In addition I have added a 64 bit
    version of the function so in eth_type_trans we can process the destination
    address as a 64 bit value throughout.
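    A multicast Ethernet address is identified by the least
    significant bit of the first octet, and the same bit can be
    tested after loading the address as part of a wider word (sketch;
    the 64-bit form as written assumes a little-endian CPU):

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Byte-wise form: the I/G bit of the first octet marks multicast. */
static bool is_multicast_sketch(const uint8_t *addr)
{
    return addr[0] & 0x01;
}

/* 64-bit form: load 8 bytes covering the address once and test the
 * same bit, so callers can share the load with other checks such as
 * is_zero_ether_addr. (Little-endian: the first octet is the low
 * byte of the loaded word.) */
static bool is_multicast_64bits_sketch(const uint8_t *addr8)
{
    uint64_t v;

    memcpy(&v, addr8, sizeof(v));
    return v & 0x01;
}
```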

    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

  • This change takes advantage of the fact that ETH_P_802_3_MIN is aligned to
    512 so as a result we can actually ignore the lower 8b when comparing the
    Ethertype to ETH_P_802_3_MIN. This allows us to avoid a byte swap by simply
    masking the value and comparing it to the byte swapped value for
    ETH_P_802_3_MIN.
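    The trick can be reproduced in userspace C (a sketch mirroring
    the logic described: because ETH_P_802_3_MIN = 0x0600 is a
    multiple of 512, the low byte of the Ethertype never affects the
    comparison and can be masked off):

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

#define ETH_P_802_3_MIN 0x0600 /* values below this are 802.3 lengths */

/* proto_be is in network byte order, as read from the frame. */
static bool eth_proto_is_802_3_sketch(uint16_t proto_be)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    /* On little endian, mask off the bits holding the low byte so
     * the network-order value can be compared without a byte swap. */
    proto_be &= htons(0xFF00);
#endif
    return proto_be >= htons(ETH_P_802_3_MIN);
}
```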

    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller
