08 Nov, 2019

3 commits


04 Jul, 2019

1 commit

  • dev_init_scheduler() and dev_activate() expect the caller to
    hold RTNL. Since we don't want the blackhole device to be initialized
    per ns, we initialize it at init.

    [ 3.855027] Call Trace:
    [ 3.855034] dump_stack+0x67/0x95
    [ 3.855037] lockdep_rcu_suspicious+0xd5/0x110
    [ 3.855044] dev_init_scheduler+0xe3/0x120
    [ 3.855048] ? net_olddevs_init+0x60/0x60
    [ 3.855050] blackhole_netdev_init+0x45/0x6e
    [ 3.855052] do_one_initcall+0x6c/0x2fa
    [ 3.855058] ? rcu_read_lock_sched_held+0x8c/0xa0
    [ 3.855066] kernel_init_freeable+0x1e5/0x288
    [ 3.855071] ? rest_init+0x260/0x260
    [ 3.855074] kernel_init+0xf/0x180
    [ 3.855076] ? rest_init+0x260/0x260
    [ 3.855078] ret_from_fork+0x24/0x30

    Fixes: 4de83b88c66 ("loopback: create blackhole net device similar to loopack.")
    Reported-by: Geert Uytterhoeven
    Cc: Eric Dumazet
    Signed-off-by: Mahesh Bandewar
    Tested-by: Geert Uytterhoeven
    Signed-off-by: David S. Miller

    Mahesh Bandewar
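
The locking rule described in the commit above can be illustrated with a small userspace model. This is only a sketch: every `*_model` name below is a hypothetical stand-in, not a kernel function, and the assumption that the fix takes the lock around both calls is inferred from the commit text rather than shown in it.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the RTNL mutex: a simple nesting counter. */
static int rtnl_depth;

static void rtnl_lock_model(void)   { rtnl_depth++; }
static void rtnl_unlock_model(void) { rtnl_depth--; }
static bool rtnl_is_locked_model(void) { return rtnl_depth > 0; }

/* Both helpers "expect the caller to hold RTNL": they return -1 when
 * called without it, which is where the kernel would emit the splat. */
static int dev_init_scheduler_model(void)
{
    return rtnl_is_locked_model() ? 0 : -1;
}

static int dev_activate_model(void)
{
    return rtnl_is_locked_model() ? 0 : -1;
}

/* One-time init path for the single, non-per-ns blackhole device. */
static int blackhole_init_model(bool take_lock)
{
    int err;

    if (take_lock)
        rtnl_lock_model();
    err = dev_init_scheduler_model();
    if (!err)
        err = dev_activate_model();
    if (take_lock)
        rtnl_unlock_model();
    return err;
}
```

Calling `blackhole_init_model(false)` models the buggy init path, and `blackhole_init_model(true)` models the path with the lock held.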
     

02 Jul, 2019

1 commit

  • Create a blackhole net device that can be used for "dead"
    dst entries instead of the loopback device. This blackhole device differs
    from loopback in a few aspects: (a) it is not per-ns; (b) the MTU on this
    device is ETH_MIN_MTU; (c) the xmit function is essentially kfree_skb();
    and (d) since it is not registered, it has no ifindex.

    The lower MTU effectively makes the device fail the MTU check during
    the route check when a dst associated with the skb is dead.

    Signed-off-by: Mahesh Bandewar
    Signed-off-by: David S. Miller

    Mahesh Bandewar
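
The behaviour listed in the commit above (a single shared device, minimal MTU, a transmit function that only frees the packet) can be sketched in userspace C. The structures and function names are hypothetical models, and the drop counter is added purely for illustration; only the value of ETH_MIN_MTU (68) is taken from the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define ETH_MIN_MTU 68 /* same value as the kernel constant */

/* Hypothetical stand-ins for struct sk_buff and struct net_device. */
struct pkt_model {
    size_t len;
};

struct dev_model {
    unsigned int mtu;
    unsigned long dropped; /* illustrative counter, not from the commit */
};

/* A single shared instance, mirroring "not per-ns". */
static struct dev_model blackhole = { .mtu = ETH_MIN_MTU, .dropped = 0 };

static struct pkt_model *pkt_alloc_model(size_t len)
{
    struct pkt_model *p = malloc(sizeof(*p));

    if (p)
        p->len = len;
    return p;
}

/* Route-time MTU check: anything larger than the MTU does not pass. */
static bool fits_mtu_model(const struct dev_model *dev,
                           const struct pkt_model *p)
{
    return p->len <= dev->mtu;
}

/* Transmit path: the packet is freed and counted, never delivered. */
static int blackhole_xmit_model(struct dev_model *dev, struct pkt_model *p)
{
    free(p);
    dev->dropped++;
    return 0;
}
```

With an MTU this small, an ordinary 1500-byte packet already fails `fits_mtu_model()`, which is the effect the commit relies on for dead dst entries.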
     

31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

13 Apr, 2019

1 commit


20 Oct, 2018

1 commit

  • At least UDP / TCP stacks can now cook skbs with a tstamp using
    MONOTONIC base (or arbitrary values with SCM_TXTIME)

    Since loopback driver does not call (directly or indirectly)
    skb_scrub_packet(), we need to clear skb->tstamp so that
    net_timestamp_check() can eventually resample the time,
    using ktime_get_real().

    Fixes: 80b14dee2bea ("net: Add a new socket option for a future transmit time.")
    Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC")
    Signed-off-by: Eric Dumazet
    Cc: Willem de Bruijn
    Cc: Soheil Hassas Yeganeh
    Acked-by: Soheil Hassas Yeganeh
    Signed-off-by: David S. Miller

    Eric Dumazet
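
The timestamp handling described in the commit above can be modelled as follows. This is a sketch under stated assumptions: `skb_model`, `loopback_xmit_model()` and `net_timestamp_check_model()` are hypothetical names, the wall-clock value is passed in as a parameter instead of being read via ktime_get_real(), and a zero timestamp is taken to mean "not set yet".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the tstamp field of struct sk_buff. */
struct skb_model {
    int64_t tstamp_ns; /* 0 means "no timestamp yet" */
};

/* Loopback transmit: optionally clear the sender's timestamp, which may
 * be on the MONOTONIC base or an arbitrary SCM_TXTIME value. */
static void loopback_xmit_model(struct skb_model *skb, bool clear_tstamp)
{
    if (clear_tstamp)
        skb->tstamp_ns = 0;
}

/* Receive side: resample only when no timestamp is present. */
static void net_timestamp_check_model(struct skb_model *skb,
                                      int64_t now_real_ns)
{
    if (skb->tstamp_ns == 0)
        skb->tstamp_ns = now_real_ns;
}
```

Without the clearing step, the receive side keeps the sender's monotonic value and reports it as if it were wall-clock time.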
     

14 Sep, 2018

1 commit


08 Jun, 2017

1 commit

  • Network devices can allocate resources and private memory using
    netdev_ops->ndo_init(). However, the release of these resources
    can occur in one of two different places.

    Either netdev_ops->ndo_uninit() or netdev->destructor().

    The decision of which operation frees the resources depends upon
    whether it is necessary for all netdev refs to be released before it
    is safe to perform the freeing.

    netdev_ops->ndo_uninit() presumably can occur right after the
    NETDEV_UNREGISTER notifier completes and the unicast and multicast
    address lists are flushed.

    netdev->destructor(), on the other hand, does not run until the
    netdev references all go away.

    Further complicating the situation is that netdev->destructor()
    almost universally also does a free_netdev().

    This creates a problem for the logic in register_netdevice(),
    because all callers of register_netdevice() manage the freeing
    of the netdev and invoke free_netdev(dev) if register_netdevice()
    fails.

    If netdev_ops->ndo_init() succeeds, but something else fails inside
    of register_netdevice(), it does call netdev_ops->ndo_uninit(). But
    it is not able to invoke netdev->destructor().

    This is because netdev->destructor() will do a free_netdev() and
    then the caller of register_netdevice() will do the same.

    However, this means that the resources that would normally be released
    by netdev->destructor() will not be.

    Over the years drivers have added local hacks to deal with this, by
    invoking their destructor parts by hand when register_netdevice()
    fails.

    Many drivers do not try to deal with this, and instead we have leaks.

    Let's close this hole by formalizing the distinction between what
    private things need to be freed up by netdev->destructor() and whether
    the driver needs unregister_netdevice() to perform the free_netdev().

    netdev->priv_destructor() performs all actions to free up the private
    resources that used to be freed by netdev->destructor(), except for
    free_netdev().

    netdev->needs_free_netdev is a boolean that indicates whether
    free_netdev() should be done at the end of unregister_netdevice().

    Now, register_netdevice() can sanely release all resources after
    netdev_ops->ndo_init() succeeds, by invoking both netdev_ops->ndo_uninit()
    and netdev->priv_destructor().

    And at the end of unregister_netdevice(), we invoke
    netdev->priv_destructor() and optionally call free_netdev().

    Signed-off-by: David S. Miller

    David S. Miller
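
The split described in the commit above can be sketched with a userspace model of the failure path. The structure and the `*_model`/`demo_*` names are hypothetical; only the field names `priv_destructor` and `needs_free_netdev` and the order "ndo_uninit, then priv_destructor, then the caller frees the device" are taken from the commit message.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the relevant parts of struct net_device. */
struct netdev_model {
    bool needs_free_netdev;
    int (*ndo_init)(struct netdev_model *dev);
    void (*ndo_uninit)(struct netdev_model *dev);
    void (*priv_destructor)(struct netdev_model *dev);
    void *priv;           /* private resource allocated by ndo_init */
    int uninit_calls;
    int destructor_calls;
};

static int demo_init(struct netdev_model *dev)
{
    dev->priv = malloc(64);
    return dev->priv ? 0 : -1;
}

static void demo_uninit(struct netdev_model *dev)
{
    dev->uninit_calls++;
}

/* Frees private resources only; it never frees the device itself. */
static void demo_priv_destructor(struct netdev_model *dev)
{
    free(dev->priv);
    dev->priv = NULL;
    dev->destructor_calls++;
}

/* On a late failure, undo ndo_init fully but leave 'dev' to the caller. */
static int register_model(struct netdev_model *dev, bool fail_after_init)
{
    if (dev->ndo_init && dev->ndo_init(dev) != 0)
        return -1;
    if (fail_after_init) {
        if (dev->ndo_uninit)
            dev->ndo_uninit(dev);
        if (dev->priv_destructor)
            dev->priv_destructor(dev);
        return -1;
    }
    return 0;
}

/* Teardown: same two steps, then free the device only if requested. */
static void unregister_model(struct netdev_model *dev)
{
    bool free_it = dev->needs_free_netdev;

    if (dev->ndo_uninit)
        dev->ndo_uninit(dev);
    if (dev->priv_destructor)
        dev->priv_destructor(dev);
    if (free_it)
        free(dev);
}
```

Because the destructor no longer frees the device, the caller's own free after a failed registration cannot turn into a double free, and the private memory is not leaked.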
     

22 Mar, 2017

2 commits

  • Following checkpatch.pl recommendations (which include
    replacing with the , since linux/io.h includes
    it).

    Signed-off-by: Ezequiel Lara Gomez
    Signed-off-by: David S. Miller

    Ezequiel Lara Gomez
     
  • This enables developing code that uses SOF_TIMESTAMPING_TX_SOFTWARE
    by using localhost addresses (without needing to send packets outside),
    as well as enabling unit and functional testing of TX timestamping code
    without needing hardware support or network access.

    It also fulfills the expectation of software network devices supporting
    software-based timestamping.

    Tested on qemu using txtimestamping.c from the kernel selftests, and
    ethtool -T.

    Signed-off-by: Ezequiel Lara Gomez
    Signed-off-by: David S. Miller

    Ezequiel Lara Gomez
     

11 Feb, 2017

1 commit


09 Feb, 2017

1 commit

  • The stack must not pass packets to device drivers that are shorter
    than the minimum link layer header length.

    Previously, packet sockets would drop packets smaller than or equal
    to dev->hard_header_len, but this has false positives. Zero length
    payload is used over Ethernet. Other link layer protocols support
    variable length headers. Support for validation of these protocols
    removed the min length check for all protocols.

    Introduce an explicit dev->min_header_len parameter and drop all
    packets below this value. Initially, set it to non-zero only for
    Ethernet and loopback. Other protocols can follow in a patch to
    net-next.

    Fixes: 9ed988cd5915 ("packet: validate variable length ll headers")
    Reported-by: Sowmini Varadhan
    Signed-off-by: Willem de Bruijn
    Acked-by: Eric Dumazet
    Acked-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Willem de Bruijn
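
The difference between the old and the new length check in the commit above can be sketched as two predicates. The structure and function names are hypothetical models; ETH_HLEN (14) is the usual Ethernet header length, and treating a min_header_len of zero as "no minimum enforced" follows the commit's statement that only Ethernet and loopback set it initially.

```c
#include <stdbool.h>
#include <stddef.h>

#define ETH_HLEN 14 /* Ethernet header length in bytes */

/* Hypothetical model of the two header length fields on a device. */
struct ll_dev_model {
    unsigned short hard_header_len;
    unsigned short min_header_len; /* 0: no minimum enforced */
};

/* Old behaviour: drop packets smaller than or equal to hard_header_len. */
static bool old_len_check_model(const struct ll_dev_model *dev, size_t len)
{
    return len > dev->hard_header_len;
}

/* New behaviour: drop only packets below the explicit minimum. */
static bool new_len_check_model(const struct ll_dev_model *dev, size_t len)
{
    return len >= dev->min_header_len;
}
```

An Ethernet frame with a zero-length payload has exactly ETH_HLEN bytes: the old predicate rejects it (the false positive mentioned above), while the new one accepts it and still rejects anything shorter.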
     

09 Jan, 2017

1 commit

  • The network device operation for reading statistics is only called
    in one place, and it ignores the return value. Having a structure
    return value is potentially confusing because some future driver could
    incorrectly assume that the return value was used.

    Fix all drivers with ndo_get_stats64 to have a void function.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    stephen hemminger
     

25 Dec, 2016

1 commit


04 Jun, 2016

1 commit

  • NETIF_F_GSO_SOFTWARE was defined to list all GSO software types, so let's
    make use of it in loopback code. Note that veth/vxlan/others already
    use it.

    Within this patch series, this patch causes lo to pick up the SCTP GSO
    feature automatically (as it's added to NETIF_F_GSO_SOFTWARE), thus
    avoiding segmentation if possible.

    Signed-off-by: Marcelo Ricardo Leitner
    Tested-by: Xin Long
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     

16 Dec, 2015

1 commit

  • The SCTP checksum is really a CRC and is very different from the
    standard 1's complement checksum that serves as the checksum
    for IP protocols. This offload interface is also very different.
    Rename NETIF_F_SCTP_CSUM to NETIF_F_SCTP_CRC to highlight these
    differences. The term CSUM should be reserved in the stack to refer
    to the standard 1's complement IP checksum.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
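
To make the distinction drawn in the commit above concrete, the two algorithms can be written side by side. This is a plain bitwise sketch for illustration, not the kernel's implementation; the function names are hypothetical, and the byte-order handling SCTP applies on the wire is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-32C (Castagnoli), bit-at-a-time, reflected polynomial 0x82F63B78. */
static uint32_t crc32c_model(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* 1's complement Internet checksum over big-endian 16-bit words. */
static uint16_t inet_csum_model(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (i < len)                 /* odd trailing byte, padded with zero */
        sum += (uint32_t)buf[i] << 8;
    while (sum >> 16)            /* fold carries back into the low 16 bits */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}
```

The first is a polynomial division over the bits of the message, the second an end-around-carry addition of 16-bit words, which is why the commit argues they should not share the CSUM name.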
     

19 Aug, 2015

1 commit


08 Oct, 2014

1 commit

  • Testing xmit_more support with netperf and connected UDP sockets,
    I found strange dst refcount false sharing.

    Current handling of IFF_XMIT_DST_RELEASE is not optimal.

    Dropping dst in validate_xmit_skb() is certainly too late in case
    the packet was queued by cpu X but dequeued by cpu Y.

    The logical point to take care of drop/force is in __dev_queue_xmit()
    before even taking qdisc lock.

    As Julian Anastasov pointed out, need for skb_dst() might come from some
    packet schedulers or classifiers.

    This patch adds new helper to cleanly express needs of various drivers
    or qdiscs/classifiers.

    Drivers that need skb_dst() in their ndo_start_xmit() should call
    the following helper in their setup instead of the prior idiom:

    dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
    ->
    netif_keep_dst(dev);

    Instead of using a single bit, we use two bits, one being
    eventually rebuilt in bonding/team drivers.

    The other one is permanent and blocks IFF_XMIT_DST_RELEASE being
    rebuilt in bonding/team. Eventually, we could add something
    smarter later.

    Signed-off-by: Eric Dumazet
    Cc: Julian Anastasov
    Signed-off-by: David S. Miller

    Eric Dumazet
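
The two-bit scheme described in the commit above can be sketched as a userspace model. Several things here are assumptions: the bit positions are illustrative, the name IFF_XMIT_DST_RELEASE_PERM for the permanent bit does not appear in the commit text, and `rebuild_dst_release_model()` is a simplified, hypothetical version of what a bonding/team master might do when its set of slaves changes.

```c
#include <stdbool.h>
#include <stddef.h>

#define IFF_XMIT_DST_RELEASE      (1u << 0) /* illustrative bit positions */
#define IFF_XMIT_DST_RELEASE_PERM (1u << 1) /* name is an assumption */

struct dev_flags_model {
    unsigned int priv_flags;
};

/* A driver or qdisc that needs skb_dst() clears both bits. */
static void netif_keep_dst_model(struct dev_flags_model *dev)
{
    dev->priv_flags &= ~(IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM);
}

/* Recompute the rebuildable bit on a master from its slaves. The
 * permanent bit on the master is never restored by this function. */
static void rebuild_dst_release_model(struct dev_flags_model *master,
                                      const struct dev_flags_model *slaves,
                                      size_t n)
{
    bool all_release = true;

    for (size_t i = 0; i < n; i++)
        if (!(slaves[i].priv_flags & IFF_XMIT_DST_RELEASE))
            all_release = false;

    master->priv_flags &= ~IFF_XMIT_DST_RELEASE;
    if ((master->priv_flags & IFF_XMIT_DST_RELEASE_PERM) && all_release)
        master->priv_flags |= IFF_XMIT_DST_RELEASE;
}
```

Once `netif_keep_dst_model()` has been called on the master itself, later rebuilds cannot turn dst release back on, which is the "blocking" role of the permanent bit.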
     

16 Jul, 2014

1 commit

  • Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert
    all users to pass NET_NAME_UNKNOWN.

    Coccinelle patch:

    @@
    expression sizeof_priv, name, setup, txqs, rxqs, count;
    @@

    (
    -alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs)
    +alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs)
    |
    -alloc_netdev_mq(sizeof_priv, name, setup, count)
    +alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count)
    |
    -alloc_netdev(sizeof_priv, name, setup)
    +alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup)
    )

    v9: move comments here from the wrong commit

    Signed-off-by: Tom Gundersen
    Reviewed-by: David Herrmann
    Signed-off-by: David S. Miller

    Tom Gundersen
     

15 Mar, 2014

1 commit

  • Replace the bh safe variant with the hard irq safe variant.

    We need a hard irq safe variant to deal with netpoll transmitting
    packets from hard irq context, and we need it in most if not all of
    the places using the bh safe variant.

    Except on 32bit uni-processor the code is exactly the same so don't
    bother with a bh variant, just have a hard irq safe variant that
    everyone can use.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

25 Feb, 2014

1 commit

  • Drivers are allowed to set NETIF_F_SCTP_CSUM if they have
    hardware crc32c checksumming support for the SCTP protocol.
    Currently, NETIF_F_SCTP_CSUM flag is available in igb,
    ixgbe, i40e/i40evf drivers and for vlan devices.

    If we don't have NETIF_F_SCTP_CSUM then crc32c is done
    through CPU instructions, invoked from crypto layer, or
    if not available as slow-path fallback in software.

    Currently, loopback device propagates checksum offloading
    feature flags in dev->features, but is missing SCTP checksum
    offloading. Therefore, account for NETIF_F_SCTP_CSUM as
    well.

    Before patch:

    ./netperf_sctp -H 192.168.0.100 -t SCTP_STREAM_MANY
    SCTP 1-TO-MANY STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.100 () port 0 AF_INET
    Recv Send Send
    Socket Socket Message Elapsed
    Size Size Size Time Throughput
    bytes bytes bytes secs. 10^6bits/sec

    4194304 4194304 4096 10.00 4683.50

    After patch:

    ./netperf_sctp -H 192.168.0.100 -t SCTP_STREAM_MANY
    SCTP 1-TO-MANY STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.100 () port 0 AF_INET
    Recv Send Send
    Socket Socket Message Elapsed
    Size Size Size Time Throughput
    bytes bytes bytes secs. 10^6bits/sec

    4194304 4194304 4096 10.00 15348.26

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

15 Feb, 2014

1 commit


14 Feb, 2014

1 commit

  • We are trying to mirror the local traffic from lo to eth0.
    Allowing the MAC address of lo to be set to that of eth0 would make
    the Ethernet addresses in these packets correct, so that
    we don't have to modify the Ethernet header again.

    Since usually no one cares about its MAC address (all-zero),
    it is safe to allow those who care to set its MAC address.

    Cc: Hannes Frederic Sowa
    Cc: Neil Horman
    Cc: Stephen Hemminger
    Cc: Eric Dumazet
    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    WANG Cong
     

17 Jan, 2014

1 commit

  • None of these files are actually using any __init type directives
    and hence don't need to include <linux/init.h>. Most are just a
    left over from __devinit and __cpuinit removal, or simply due to
    code getting copied from one driver to the next.

    This covers everything under drivers/net except for wireless, which
    has been submitted separately.

    Signed-off-by: Paul Gortmaker
    Signed-off-by: David S. Miller

    Paul Gortmaker
     

06 Nov, 2013

1 commit

  • In order to enable lockdep on seqcount/seqlock structures, we
    must explicitly initialize any locks.

    The u64_stats_sync structure uses a seqcount, and thus we need
    to introduce a u64_stats_init() function and use it to initialize
    the structure.

    This unfortunately adds a lot of fairly trivial initialization code
    to a number of drivers. But the benefit of ensuring correctness makes
    this worthwhile.

    Because these changes are required for lockdep to be enabled, and the
    changes are quite trivial, I've not yet split this patch out into 30-some
    separate patches, as I figured it would be better to get the various
    maintainers' thoughts on how to best merge this change along with
    the seqcount lockdep enablement.

    Feedback would be appreciated!

    Signed-off-by: John Stultz
    Acked-by: Julian Anastasov
    Signed-off-by: Peter Zijlstra
    Cc: Alexey Kuznetsov
    Cc: "David S. Miller"
    Cc: Eric Dumazet
    Cc: Hideaki YOSHIFUJI
    Cc: James Morris
    Cc: Jesse Gross
    Cc: Mathieu Desnoyers
    Cc: "Michael S. Tsirkin"
    Cc: Mirko Lindner
    Cc: Patrick McHardy
    Cc: Roger Luethi
    Cc: Rusty Russell
    Cc: Simon Horman
    Cc: Stephen Hemminger
    Cc: Steven Rostedt
    Cc: Thomas Petazzoni
    Cc: Wensong Zhang
    Cc: netdev@vger.kernel.org
    Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org
    Signed-off-by: Ingo Molnar

    John Stultz
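
The pattern that the commit above asks drivers to follow (initialize the sequence counter explicitly, then update and read under it) can be sketched in userspace. This is a single-threaded model with hypothetical names; it has no memory barriers or lockdep hooks, so it shows only the shape of the protocol, not a usable concurrent implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of per-device stats guarded by a sequence counter. */
struct u64_stats_model {
    unsigned int seq;   /* even: stable, odd: update in progress */
    uint64_t packets;
    uint64_t bytes;
};

/* Explicit initialization, the step the commit adds to each driver. */
static void stats_init_model(struct u64_stats_model *s)
{
    s->seq = 0;
    s->packets = 0;
    s->bytes = 0;
}

/* Writer: bump the counter before and after touching the values. */
static void stats_update_model(struct u64_stats_model *s, uint64_t len)
{
    s->seq++;
    s->packets++;
    s->bytes += len;
    s->seq++;
}

static unsigned int stats_fetch_begin_model(const struct u64_stats_model *s)
{
    return s->seq;
}

/* Retry when an update was in progress or completed during the read. */
static bool stats_fetch_retry_model(const struct u64_stats_model *s,
                                    unsigned int start)
{
    return (start & 1u) || s->seq != start;
}

/* Reader: loop until a consistent snapshot has been taken. */
static void stats_read_model(const struct u64_stats_model *s,
                             uint64_t *packets, uint64_t *bytes)
{
    unsigned int start;

    do {
        start = stats_fetch_begin_model(s);
        *packets = s->packets;
        *bytes = s->bytes;
    } while (stats_fetch_retry_model(s, start));
}
```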
     

18 Sep, 2013

1 commit

  • It has recently turned up that we have a number of long-standing bugs
    in the network stack cleanup code that use the loopback device
    after it has been freed. They have not shown up because in most cases
    the storage allocated to the loopback device is not reused when those
    accesses happen.

    Set loopback_dev to NULL to trigger oopses instead of silent data
    corruption when we hit this class of bug.

    Signed-off-by: "Eric W. Biederman"
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

27 Jan, 2013

1 commit

  • Ben Greear reported crashes in ip_rcv_finish() on a stress
    test involving many macvlans.

    We tracked the bug to a dst use after free. ip_rcv_finish()
    was calling dst->input() and got garbage for dst->input value.

    It appears the bug is in loopback driver, lacking
    a skb_dst_force() before calling netif_rx().

    As a result, a non-refcounted dst, normally protected by an
    RCU read_lock section, was escaping this section and could
    be freed before the packet was processed.

    [] loopback_xmit+0x64/0x83
    [] dev_hard_start_xmit+0x26c/0x35e
    [] dev_queue_xmit+0x2c4/0x37c
    [] ? dev_hard_start_xmit+0x35e/0x35e
    [] ? eth_header+0x28/0xb6
    [] neigh_resolve_output+0x176/0x1a7
    [] ip_finish_output2+0x297/0x30d
    [] ? ip_finish_output2+0x137/0x30d
    [] ip_finish_output+0x63/0x68
    [] ip_output+0x61/0x67
    [] dst_output+0x17/0x1b
    [] ip_local_out+0x1e/0x23
    [] ip_queue_xmit+0x315/0x353
    [] ? ip_send_unicast_reply+0x2cc/0x2cc
    [] tcp_transmit_skb+0x7ca/0x80b
    [] tcp_connect+0x53c/0x587
    [] ? getnstimeofday+0x44/0x7d
    [] ? ktime_get_real+0x11/0x3e
    [] tcp_v4_connect+0x3c2/0x431
    [] __inet_stream_connect+0x84/0x287
    [] ? inet_stream_connect+0x22/0x49
    [] ? _local_bh_enable_ip+0x84/0x9f
    [] ? local_bh_enable+0xd/0x11
    [] ? lock_sock_nested+0x6e/0x79
    [] ? inet_stream_connect+0x22/0x49
    [] inet_stream_connect+0x33/0x49
    [] sys_connect+0x75/0x98

    This bug was introduced in linux-2.6.35, in commit
    7fee226ad2397b (net: add a noref bit on skb dst)

    skb_dst_force() is enforced in dev_queue_xmit() for devices having a
    qdisc.

    Reported-by: Ben Greear
    Signed-off-by: Eric Dumazet
    Tested-by: Ben Greear
    Signed-off-by: David S. Miller

    Eric Dumazet
     

25 Sep, 2012

1 commit

  • The current loopback MTU of 16436 bytes allows no more than 3 MSS TCP
    segments per frame, or 48 Kbytes. Changing the MTU to 64K allows the TCP
    stack to build large frames and significantly reduces stack overhead.

    The performance boost on bulk TCP transfers can be up to 30%, partly
    because we now have one ACK message for two 64KB segments, and a lower
    probability of hitting the /proc/sys/net/ipv4/tcp_reordering default limit.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
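
The "3 MSS segments, or 48 Kbytes" figure in the commit above can be reproduced with a little arithmetic. The sketch below assumes 40 bytes of IPv4 and TCP headers without options and a 65535-byte limit on the total frame length; both constants and all function names are assumptions made for illustration.

```c
#define IP_MAX_TOTAL_LEN 65535u /* assumed upper bound on one large frame */
#define IPV4_TCP_HDR_LEN 40u    /* assumed: 20-byte IPv4 + 20-byte TCP */

/* Payload bytes carried by one MSS-sized segment at a given MTU. */
static unsigned int mss_for_mtu(unsigned int mtu)
{
    return mtu - IPV4_TCP_HDR_LEN;
}

/* Number of whole MSS-sized segments that fit into one large frame. */
static unsigned int full_segments_per_frame(unsigned int mtu)
{
    return (IP_MAX_TOTAL_LEN - IPV4_TCP_HDR_LEN) / mss_for_mtu(mtu);
}

/* Payload bytes carried by those whole segments. */
static unsigned int payload_per_frame(unsigned int mtu)
{
    return full_segments_per_frame(mtu) * mss_for_mtu(mtu);
}
```

For the old MTU of 16436 this gives an MSS of 16396 bytes, three whole segments per frame, and 49188 bytes of payload, which is roughly the 48 Kbytes quoted in the commit.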
     

10 Aug, 2012

1 commit

  • As pointed out, there are places that access net->loopback_dev->ifindex,
    and once ifindex generation is made per-net this value becomes a constant
    equal to 1. So go ahead and introduce the LOOPBACK_IFINDEX constant and use
    it where appropriate.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     

23 Jul, 2012

1 commit

  • Fix race condition in several network drivers when reading stats on 32bit
    UP architectures. These drivers update their stats in a BH context and
    therefore should use u64_stats_fetch_begin_bh/u64_stats_fetch_retry_bh
    instead of u64_stats_fetch_begin/u64_stats_fetch_retry when reading the
    stats.

    Signed-off-by: Kevin Groeneveld
    Signed-off-by: David S. Miller

    Kevin Groeneveld
     

29 Mar, 2012

1 commit


17 Nov, 2011

1 commit

  • The only distinct use is checking whether NETIF_F_NOCACHE_COPY should be
    enabled by default. The check heuristic is altered a bit here,
    so it hits other people than before. The default shouldn't be
    trusted for performance-critical cases anyway.

    For all other uses NETIF_F_NO_CSUM is equivalent to NETIF_F_HW_CSUM.

    Signed-off-by: Michał Mirosław
    Signed-off-by: David S. Miller

    Michał Mirosław
     

09 May, 2011

1 commit

  • This patch enables ethtool to set the loopback mode on a given interface.
    By configuring the interface in loopback mode in conjunction with a policy
    route / rule, a userland application can stress the egress / ingress path,
    exposing the flaws of the change in progress and potentially helping developer(s)
    understand the impact of those changes without even sending a packet out
    on the network.

    The following set of commands illustrates one such example:
    a) ip -4 addr add 192.168.1.1/24 dev eth1
    b) ip -4 rule add from all iif eth1 lookup 250
    c) ip -4 route add local 0/0 dev lo proto kernel scope host table 250
    d) arp -Ds 192.168.1.100 eth1
    e) arp -Ds 192.168.1.200 eth1
    f) sysctl -w net.ipv4.ip_nonlocal_bind=1
    g) sysctl -w net.ipv4.conf.all.accept_local=1
    # Assuming that the machine has 8 cores
    h) taskset 000f netserver -L 192.168.1.200
    i) taskset 00f0 netperf -t TCP_CRR -L 192.168.1.100 -H 192.168.1.200 -l 30

    Signed-off-by: Mahesh Bandewar
    Acked-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Mahesh Bandewar