17 Nov, 2011

1 commit

  • The only distinct use is checking whether NETIF_F_NOCACHE_COPY should be
    enabled by default. The check heuristic is altered a bit here,
    so it affects a different set of users than before. The default shouldn't be
    trusted for performance-critical cases anyway.

    For all other uses NETIF_F_NO_CSUM is equivalent to NETIF_F_HW_CSUM.

    Signed-off-by: Michał Mirosław
    Signed-off-by: David S. Miller

    Michał Mirosław
     

09 May, 2011

1 commit

  • This patch enables ethtool to set the loopback mode on a given interface.
    By configuring the interface in loopback mode in conjunction with a policy
    route / rule, a userland application can stress the egress / ingress path
    exposing the flows of the change in progress and potentially help developer(s)
    understand the impact of those changes without even sending a packet out
    on the network.

    The following set of commands illustrates one such example:
    a) ip -4 addr add 192.168.1.1/24 dev eth1
    b) ip -4 rule add from all iif eth1 lookup 250
    c) ip -4 route add local 0/0 dev lo proto kernel scope host table 250
    d) arp -Ds 192.168.1.100 eth1
    e) arp -Ds 192.168.1.200 eth1
    f) sysctl -w net.ipv4.ip_nonlocal_bind=1
    g) sysctl -w net.ipv4.conf.all.accept_local=1
    # Assuming that the machine has 8 cores
    h) taskset 000f netserver -L 192.168.1.200
    i) taskset 00f0 netperf -t TCP_CRR -L 192.168.1.100 -H 192.168.1.200 -l 30

    Signed-off-by: Mahesh Bandewar
    Acked-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Mahesh Bandewar
     

18 Apr, 2011

1 commit

  • Several tests in the ipv6 routing code check IFF_LOOPBACK, and
    allowing stacking such as VLAN'ing on top of loopback results in a
    netdevice which reports IFF_LOOPBACK but really isn't the loopback
    device.

    Instead of spamming the ipv6 routing code with even more special tests,
    simply disallow VLAN over loopback.

    The result of this patch is:

    # modprobe 8021q
    # vconfig add lo 43
    ERROR: trying to add VLAN #43 to IF -:lo:- error: Operation not supported

    Signed-off-by: Krishna Kumar
    Signed-off-by: David S. Miller

    Krishna Kumar
     

18 Feb, 2011

1 commit


06 Oct, 2010

1 commit

  • In various situations, a device provides a packet to our stack and we
    drop it before it enters the protocol stack:
    - softnet backlog full (accounted in /proc/net/softnet_stat)
    - bad vlan tag (not accounted)
    - unknown/unregistered protocol (not accounted)

    We can handle a per-device counter of such dropped frames at core level,
    and automatically add it to the device-provided stats (rx_dropped), so
    that standard tools can be used (ifconfig, ip link, cat /proc/net/dev).

    This is a generalization of commit 8990f468a (net: rx_dropped
    accounting), thus reverting it.
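
    The idea can be sketched in userspace C as follows. This is a minimal
    sketch, not the kernel code: the struct and function names
    (net_device_sketch, core_drop_frame, get_stats_sketch) are hypothetical
    stand-ins, and C11 atomics stand in for the kernel's atomic counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the device statistics. */
struct dev_stats {
    uint64_t rx_packets;
    uint64_t rx_dropped;
};

/* Hypothetical, simplified stand-in for struct net_device. */
struct net_device_sketch {
    struct dev_stats driver_stats;  /* maintained by the driver */
    atomic_ulong core_rx_dropped;   /* maintained by the core stack */
};

/* Called by the core when it drops a frame before the protocol stack
 * (backlog full, bad vlan tag, unknown protocol). */
static void core_drop_frame(struct net_device_sketch *dev)
{
    atomic_fetch_add_explicit(&dev->core_rx_dropped, 1, memory_order_relaxed);
}

/* Report the driver stats plus the core-level drops, so that tools
 * reading rx_dropped see both. The driver's own counters are untouched. */
static void get_stats_sketch(struct net_device_sketch *dev, struct dev_stats *out)
{
    assert(dev != NULL && out != NULL);
    *out = dev->driver_stats;
    out->rx_dropped += atomic_load_explicit(&dev->core_rx_dropped,
                                            memory_order_relaxed);
}
```

    Keeping the core counter separate from the driver's counters means
    drivers need no changes; the sum is only formed when stats are read.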

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

27 Sep, 2010

1 commit

  • The loopback driver uses dev->ml_priv to store its percpu stats pointer.
    It uses ugly casts "(void __percpu __force *)" to silence sparse
    complaints.

    Define a union to better document that we use ml_priv in the loopback
    driver, and define an lstats field with appropriate types.
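
    A minimal userspace sketch of the union approach, assuming simplified
    stand-in types (net_device_sketch and pcpu_lstats_sketch are
    hypothetical, and the __percpu annotation is omitted because it only
    has meaning to sparse inside the kernel tree):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stats record for the loopback sketch. */
struct pcpu_lstats_sketch {
    uint64_t packets;
    uint64_t bytes;
};

/* Simplified stand-in for struct net_device: one pointer-sized slot,
 * viewed either as an opaque pointer or as a typed stats pointer. */
struct net_device_sketch {
    union {
        void *ml_priv;                      /* generic mid-layer private */
        struct pcpu_lstats_sketch *lstats;  /* loopback statistics */
    };
};

/* Account one frame; no cast is needed because the field is typed. */
static void lstats_account(struct net_device_sketch *dev, uint64_t len)
{
    assert(dev != NULL && dev->lstats != NULL);
    dev->lstats->packets++;
    dev->lstats->bytes += len;
}
```

    The union occupies the same storage as the original pointer, so the
    structure layout does not change; only the type seen by each user does.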

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

08 Jul, 2010

1 commit

  • There is a small possibility that a reader gets incorrect values on 32
    bit arches. SNMP applications could catch incorrect counters when a
    32bit high part is changed by another stats consumer/provider.

    One way to solve this is to add a rtnl_link_stats64 param to all
    ndo_get_stats64() methods, and also add such a parameter to
    dev_get_stats().

    The rule is that we are not allowed to use dev->stats64 as temporary
    storage for 64bit stats; we must use a caller-provided area (usually on
    the stack) instead.

    Old drivers (only providing get_stats() method) need no changes.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

26 Jun, 2010

1 commit

  • Commit 6b10de38f0ef (loopback: Implement 64bit stats on 32bit arches)
    introduced 64bit stats in loopback driver, using a private seqcount and
    private helpers.

    David suggested introducing a generic infrastructure, which was added in
    (net: Introduce u64_stats_sync infrastructure).

    This patch reimplements loopback 64bit stats using the u64_stats_sync
    infrastructure.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

15 Jun, 2010

1 commit

  • Use a seqcount_t to synchronize the stat producer and consumer for the
    packets and bytes counters, which are now u64 types.

    (The dropped counter, being rarely used, stays a native "unsigned long" type.)

    No noticeable performance impact on x86, as it only adds two increments
    per frame. It might be more expensive on arches where smp_wmb() is not
    free.
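
    The producer/consumer protocol can be sketched in userspace with C11
    atomics. This is an illustration under stated assumptions, not the
    driver code: the names (lstats_sketch, lstats_update, lstats_read) are
    hypothetical, a single writer per stats block is assumed, and each
    64-bit counter is stored as two 32-bit halves to model what a 32-bit
    CPU sees.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* A 64-bit counter stored as two 32-bit halves. */
struct u64_split {
    _Atomic uint32_t lo;
    _Atomic uint32_t hi;
};

struct lstats_sketch {
    atomic_uint seq;            /* even: stable, odd: update in progress */
    struct u64_split packets;
    struct u64_split bytes;
};

static uint64_t split_read(struct u64_split *c)
{
    uint64_t hi = atomic_load_explicit(&c->hi, memory_order_relaxed);
    uint64_t lo = atomic_load_explicit(&c->lo, memory_order_relaxed);
    return (hi << 32) | lo;
}

static void split_add(struct u64_split *c, uint64_t delta)
{
    uint64_t v = split_read(c) + delta;
    atomic_store_explicit(&c->lo, (uint32_t)v, memory_order_relaxed);
    atomic_store_explicit(&c->hi, (uint32_t)(v >> 32), memory_order_relaxed);
}

/* Producer: two sequence increments per frame bracket the update. */
static void lstats_update(struct lstats_sketch *s, uint64_t len)
{
    unsigned seq = atomic_load_explicit(&s->seq, memory_order_relaxed);
    atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    split_add(&s->packets, 1);
    split_add(&s->bytes, len);
    atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Consumer: retry until the same even sequence brackets the reads. */
static void lstats_read(struct lstats_sketch *s, uint64_t *packets, uint64_t *bytes)
{
    unsigned start, end;
    do {
        start = atomic_load_explicit(&s->seq, memory_order_acquire);
        *packets = split_read(&s->packets);
        *bytes = split_read(&s->bytes);
        atomic_thread_fence(memory_order_acquire);
        end = atomic_load_explicit(&s->seq, memory_order_relaxed);
    } while ((start & 1) || start != end);
}
```

    The release fence in the producer plays the role that smp_wmb() plays
    in the commit text, which is why the cost depends on the architecture.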

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

17 Feb, 2010

1 commit

  • Add __percpu sparse annotations to net drivers.

    These annotations are to make sparse consider percpu variables to be
    in a different address space and warn if accessed without going
    through percpu accessors. This patch doesn't affect normal builds.

    Signed-off-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Eric Dumazet
    Cc: Arnd Bergmann
    Signed-off-by: David S. Miller

    Tejun Heo
     

15 Dec, 2009

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
    m68k: rename global variable vmalloc_end to m68k_vmalloc_end
    percpu: add missing per_cpu_ptr_to_phys() definition for UP
    percpu: Fix kdump failure if booted with percpu_alloc=page
    percpu: make misc percpu symbols unique
    percpu: make percpu symbols in ia64 unique
    percpu: make percpu symbols in powerpc unique
    percpu: make percpu symbols in x86 unique
    percpu: make percpu symbols in xen unique
    percpu: make percpu symbols in cpufreq unique
    percpu: make percpu symbols in oprofile unique
    percpu: make percpu symbols in tracer unique
    percpu: make percpu symbols under kernel/ and mm/ unique
    percpu: remove some sparse warnings
    percpu: make alloc_percpu() handle array types
    vmalloc: fix use of non-existent percpu variable in put_cpu_var()
    this_cpu: Use this_cpu_xx in trace_functions_graph.c
    this_cpu: Use this_cpu_xx for ftrace
    this_cpu: Use this_cpu_xx in nmi handling
    this_cpu: Use this_cpu operations in RCU
    this_cpu: Use this_cpu ops for VM statistics
    ...

    Fix up trivial (famous last words) global per-cpu naming conflicts in
    arch/x86/kvm/svm.c
    mm/slab.c

    Linus Torvalds
     

02 Dec, 2009

1 commit


26 Nov, 2009

1 commit

  • Generated with the following semantic patch

    @@
    struct net *n1;
    struct net *n2;
    @@
    - n1 == n2
    + net_eq(n1, n2)

    @@
    struct net *n1;
    struct net *n2;
    @@
    - n1 != n2
    + !net_eq(n1, n2)

    applied over {include,net,drivers/net}.
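
    For illustration, the helper that the semantic patch switches to can be
    sketched as below. This is a simplified userspace sketch: struct net is
    reduced to a placeholder, and NET_NS_SKETCH is a hypothetical switch
    standing in for the kernel's CONFIG_NET_NS option.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder; the real struct net holds per-namespace state. */
struct net {
    int id;
};

#define NET_NS_SKETCH 1  /* hypothetical stand-in for CONFIG_NET_NS */

#if NET_NS_SKETCH
/* With namespaces, two handles are equal only if they are the same object. */
static inline bool net_eq(const struct net *n1, const struct net *n2)
{
    return n1 == n2;
}
#else
/* Without namespaces there is only one namespace, so the comparison
 * is constant and the compiler can discard it. */
static inline bool net_eq(const struct net *n1, const struct net *n2)
{
    (void)n1;
    (void)n2;
    return true;
}
#endif
```

    Routing every comparison through one helper lets the namespace-less
    build drop the comparison entirely, which open-coded == cannot do.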

    Signed-off-by: Octavian Purdila
    Signed-off-by: David S. Miller

    Octavian Purdila
     

03 Oct, 2009

1 commit

  • Use this_cpu_ptr and __this_cpu_ptr in locations where straight
    transformations are possible because per_cpu_ptr is used with
    either smp_processor_id() or raw_smp_processor_id().

    cc: David Howells
    Acked-by: Tejun Heo
    cc: Ingo Molnar
    cc: Rusty Russell
    cc: Eric Dumazet
    Signed-off-by: Christoph Lameter
    Signed-off-by: Tejun Heo

    Christoph Lameter
     

01 Sep, 2009

1 commit


06 Jul, 2009

1 commit


19 May, 2009

1 commit

  • One point of contention in high network loads is the dst_release() performed
    when a transmitted skb is freed. This is because NIC tx completion calls
    dev_kfree_skb() long after the original call to dev_queue_xmit(skb).

    CPU cache is cold and the atomic op in dst_release() stalls. On SMP, this is
    quite visible if one CPU is 100% handling softirqs for a network device,
    since dst_clone() is done by other cpus, involving cache line ping pongs.

    It seems the right place to release dst is in dev_hard_start_xmit(), for most
    devices except virtual ones and some exceptions.

    David Miller suggested defining a new device flag, set in alloc_netdev_mq()
    (so that most devices set it at init time), and carefully unset in devices
    which don't want a NULL skb->dst in their ndo_start_xmit().

    The list of devices that must clear this flag is:

    - loopback device, because it calls netif_rx() and, quoting Patrick:
      "ip_route_input() doesn't accept loopback addresses, so loopback packets
      already need to have a dst_entry attached."
    - appletalk/ipddp.c: needs skb->dst in its xmit function
    - all devices that call dev_queue_xmit() again from their xmit function
      (as some classifiers need skb->dst): bonding, vlan, macvlan, eql, ifb, hdlc_fr
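
    The flag-controlled early release described above can be sketched as
    follows. All names here are hypothetical (the commit text does not name
    the flag), the reference count is a plain int rather than an atomic,
    and the driver's xmit call is elided.

```c
#include <assert.h>
#include <stddef.h>

#define XMIT_DST_RELEASE_SKETCH 0x1u  /* hypothetical flag bit */

struct dst_sketch { int refcnt; };
struct skb_sketch { struct dst_sketch *dst; };
struct dev_sketch { unsigned int priv_flags; };

static void dst_release_sketch(struct dst_sketch *dst)
{
    if (dst != NULL)
        dst->refcnt--;  /* the kernel uses an atomic decrement here */
}

/* Before handing the skb to the driver, drop the route reference while
 * the cache line is still warm, unless the device cleared the flag. */
static void hard_start_xmit_sketch(struct dev_sketch *dev, struct skb_sketch *skb)
{
    assert(dev != NULL && skb != NULL);
    if (dev->priv_flags & XMIT_DST_RELEASE_SKETCH) {
        dst_release_sketch(skb->dst);
        skb->dst = NULL;
    }
    /* ... the driver's ndo_start_xmit() would run here ... */
}
```

    A device such as loopback, which still needs skb->dst in its xmit
    function, would simply leave the flag bit cleared.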

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

20 Apr, 2009

1 commit

  • In some situations we can drop packets in netif_rx().

    The loopback driver does not report these (unlikely) drops in its stats,
    and incorrectly changes the packets/bytes counts.

    With this patch applied, "ifconfig lo" can report these drops, as in:

    # ifconfig lo
    lo        Link encap:Local Loopback
              inet addr:127.0.0.1  Mask:255.0.0.0
              UP LOOPBACK RUNNING  MTU:16436  Metric:1
              RX packets:692562900 errors:3228 dropped:3228 overruns:0 frame:0
              TX packets:692562900 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:2865674174 (2.6 GiB)  TX bytes:2865674174 (2.6 GiB)

    I initially chose to reflect those errors only in tx_dropped/tx_errors, but David
    convinced me that they are really RX errors, as loopback_xmit() really starts
    an RX process (calling eth_type_trans(), for example, which itself pulls the ethernet header).

    These errors are accounted in rx_dropped/rx_errors.
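
    The accounting rule can be sketched as below. This is a userspace
    illustration, not the driver code: lo_stats_sketch and
    loopback_xmit_sketch are hypothetical names, and a callback returning
    true or false stands in for the success or failure of netif_rx().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct lo_stats_sketch {
    uint64_t packets;
    uint64_t bytes;
    uint64_t drops;
};

/* Hypothetical stand-in for netif_rx(): true when the frame was queued. */
typedef bool (*rx_fn)(size_t len);

/* Count packets/bytes only for frames the receive path accepted;
 * otherwise count a drop, reported as rx_dropped and rx_errors. */
static void loopback_xmit_sketch(struct lo_stats_sketch *st, size_t len, rx_fn rx)
{
    assert(st != NULL && rx != NULL);
    if (rx(len)) {
        st->packets++;
        st->bytes += len;
    } else {
        st->drops++;
    }
}

/* Two trivial receive paths for demonstration. */
static bool rx_accept(size_t len) { (void)len; return true; }
static bool rx_backlog_full(size_t len) { (void)len; return false; }
```

    Because the same counters feed both the RX and TX columns, a dropped
    frame no longer inflates the packets/bytes totals.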

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

21 Nov, 2008

1 commit

  • This patch moves neigh_setup and hard_start_xmit into the network device ops
    structure. For bisection, fix all the previously converted drivers as well.
    Bonding driver took the biggest hit on this.

    Added a prefetch of the hard_start_xmit in the fast path to try and reduce
    any impact this would have.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     

20 Nov, 2008

1 commit


08 Nov, 2008

2 commits

  • I was recently hunting a bug that occurred in network namespace
    cleanup. In looking at the code it became apparent that we have
    and will continue to have cases where, if we have anything going
    on in a network namespace, there will be assumptions that the
    loopback device is present. Things like sending igmp unsubscribe
    messages when we bring down network devices invoke the routing
    code, which assumes that at least the loopback driver is present.

    Therefore, to avoid magic initcall ordering hackery that is hard
    to follow and hard to get right, insert a call to register the
    loopback device directly from net_dev_init(). This guarantees
    that the loopback device is the first device registered and
    the last network device to go away.

    But do it carefully so we register the loopback device after
    we clear dev_boot_phase.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • This reverts commit ae33bc40c0d96d02f51a996482ea7e41c5152695.

    David S. Miller
     

06 Nov, 2008

1 commit

  • I was recently hunting a bug that occurred in network namespace
    cleanup. In looking at the code it became apparent that we have
    and will continue to have cases where, if we have anything going
    on in a network namespace, there will be assumptions that the
    loopback device is present. Things like sending igmp unsubscribe
    messages when we bring down network devices invoke the routing
    code, which assumes that at least the loopback driver is present.

    Therefore, to avoid magic initcall ordering hackery that is hard
    to follow and hard to get right, insert a call to register the
    loopback device directly from net_dev_init(). This guarantees
    that the loopback device is the first device registered and
    the last network device to go away.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

04 Nov, 2008

1 commit


31 Oct, 2008

1 commit


16 Aug, 2008

3 commits

  • Now that the network stack can handle inbound packets with partial
    checksums, we should no longer clobber the ip_summed field in the
    loopback driver. This is because CHECKSUM_UNNECESSARY implies that
    the checksum field is actually valid which is not true for loopback
    packets since it's only partial (and thus complemented).

    This allows packets from lo to then be SNATed to an external source
    while still preserving the checksum's validity.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • It hasn't been enabled for a long time and the generic GSO
    engine is better documentation of what is expected of a
    device implementing TSO.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • This patch enables TSO since the loopback device is naturally
    capable of handling packets of any size. This also means that
    we won't enable GSO on lo which is good until GSO is fixed to
    preserve netfilter state as netfilter treats loopback packets
    in a special way.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

18 Jul, 2008

1 commit


26 Mar, 2008

1 commit


29 Jan, 2008

1 commit


13 Jan, 2008

1 commit


13 Nov, 2007

1 commit


27 Oct, 2007

1 commit

  • It is not safe to place struct pernet_operations in a special section.
    We need struct pernet_operations to last until we call unregister_pernet_subsys,
    which doesn't happen until module unload.

    So marking struct pernet_operations is a disaster for modules in two ways.
    - We discard it before we call the exit method it points to.
    - Because I keep struct pernet_operations on a linked list discarding
    it for compiled in code removes elements in the middle of a linked
    list and does horrible things for linked insert.

    So this looks safe assuming __exit_refok is not discarded
    for modules.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

16 Oct, 2007

1 commit


11 Oct, 2007

5 commits

  • With the net namespaces, a lot of code left the __init section,
    thus making the kernel occupy more memory than it did before.
    Since we have a config option that prohibits namespace
    creation, the functions that initialize/finalize some netns
    stuff are simply not needed and can be freed after boot.

    Currently, this is almost not noticeable, since few calls
    are no longer in __init, but once the namespaces are
    merged it will be possible to free more code. I propose to
    use the __net_init, __net_exit and __net_initdata "attributes"
    for functions/variables that are not used if CONFIG_NET_NS
    is not set, to save more space in memory.

    The exit functions cannot just reside in the __exit section,
    as noticed by David, since the init section will have
    references to them and the compilation will fail due to modpost
    checks. These references can exist, since the init namespace
    never dies and the exit callbacks are never called. So I
    introduce the __exit_refok attribute, just as is already
    done with __init_refok.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • A hint as to why it is safe to use per cpu variables,
    and note that we actually can have multiple instances
    of the loopback device now.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Loopback device is special. It should be initialized at the very
    beginning. Initialization order has been changed by
    Eric W. Biederman and this change is non-obvious
    and important enough to add proper comment.

    Signed-off-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • Since hardware header operations are part of the protocol class
    not the device instance, make them into a separate object and
    save memory.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • It talks about __get_cpu_var(), which the driver no longer
    uses.

    Signed-off-by: David S. Miller

    David S. Miller