26 Jun, 2016

1 commit

  • Qdisc performance suffers when packets are dropped at enqueue()
    time because drops (kfree_skb()) are done while qdisc lock is held,
    delaying a dequeue() draining the queue.

    Nominal throughput can be reduced by 50 % when this happens,
    at a time we would like the dequeue() to proceed as fast as possible.

    Even FQ is vulnerable to this problem, while one of FQ goals was
    to provide some flow isolation.

    This patch adds a 'struct sk_buff **to_free' parameter to all
    qdisc->enqueue(), and in qdisc_drop() helper.

    I measured a performance increase of up to 12 %, but this patch
    is a prereq so that future batches in enqueue() can fly.

    Signed-off-by: Eric Dumazet
    Acked-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Eric Dumazet
     

14 Jan, 2015

1 commit

  • tc code implicitly considers skb->protocol even in case of accelerated
    vlan paths and expects vlan protocol type here. However, on rx path,
    if the vlan header was already stripped, skb->protocol contains value
    of next header. Similar situation is on tx path.

    So for skbs that use skb->vlan_tci for tagging, use skb->vlan_proto instead.

    Reported-by: Jamal Hadi Salim
    Signed-off-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jiri Pirko
     

13 Jan, 2015

1 commit


08 Oct, 2014

1 commit

  • Testing xmit_more support with netperf and connected UDP sockets,
    I found strange dst refcount false sharing.

    Current handling of IFF_XMIT_DST_RELEASE is not optimal.

    Dropping dst in validate_xmit_skb() is certainly too late in case
    packet was queued by cpu X but dequeued by cpu Y

    The logical point to take care of drop/force is in __dev_queue_xmit()
    before even taking qdisc lock.

    As Julian Anastasov pointed out, need for skb_dst() might come from some
    packet schedulers or classifiers.

    This patch adds new helper to cleanly express needs of various drivers
    or qdiscs/classifiers.

    Drivers that need skb_dst() in their ndo_start_xmit() should call
    following helper in their setup instead of the prior :

    dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
    ->
    netif_keep_dst(dev);

    Instead of using a single bit, we use two bits, one being
    eventually rebuilt in bonding/team drivers.

    The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being
    rebuilt in bonding/team. Eventually, we could add something
    smarter later.

    Signed-off-by: Eric Dumazet
    Cc: Julian Anastasov
    Signed-off-by: David S. Miller

    Eric Dumazet
     

14 Sep, 2014

1 commit

  • Add __rcu notation to qdisc handling by doing this we can make
    smatch output more legible. And anyways some of the cases should
    be using rcu_dereference() see qdisc_all_tx_empty(),
    qdisc_tx_chainging(), and so on.

    Also *wake_queue() API is commonly called from driver timer routines
    without rcu lock or rtnl lock. So I added rcu_read_lock() blocks
    around netif_wake_subqueue and netif_tx_wake_queue.

    Signed-off-by: John Fastabend
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    John Fastabend
     

02 Sep, 2014

2 commits


25 Aug, 2014

1 commit


16 Jul, 2014

1 commit

  • Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert
    all users to pass NET_NAME_UNKNOWN.

    Coccinelle patch:

    @@
    expression sizeof_priv, name, setup, txqs, rxqs, count;
    @@

    (
    -alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs)
    +alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs)
    |
    -alloc_netdev_mq(sizeof_priv, name, setup, count)
    +alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count)
    |
    -alloc_netdev(sizeof_priv, name, setup)
    +alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup)
    )

    v9: move comments here from the wrong commit

    Signed-off-by: Tom Gundersen
    Reviewed-by: David Herrmann
    Signed-off-by: David S. Miller

    Tom Gundersen
     

05 Jul, 2012

1 commit


04 May, 2012

1 commit


06 Dec, 2011

1 commit


03 Dec, 2011

1 commit


01 Dec, 2011

1 commit

  • We need rcu_read_lock() protection before using dst_get_neighbour(), and
    we must cache its value (pass it to __teql_resolve())

    teql_master_xmit() is called under rcu_read_lock_bh() protection, its
    not enough.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

30 Nov, 2011

1 commit

  • Create separate queue state flags so that either the stack or drivers
    can turn on XOFF. Added a set of functions used in the stack to determine
    if a queue is really stopped (either by stack or driver)

    Signed-off-by: Tom Herbert
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Tom Herbert
     

18 Jul, 2011

1 commit


25 Jan, 2011

1 commit


21 Jan, 2011

1 commit

  • In commit 44b8288308ac9d (net_sched: pfifo_head_drop problem), we fixed
    a problem with pfifo_head drops that incorrectly decreased
    sch->bstats.bytes and sch->bstats.packets

    Several qdiscs (CHOKe, SFQ, pfifo_head, ...) are able to drop a
    previously enqueued packet, and bstats cannot be changed, so
    bstats/rates are not accurate (over estimated)

    This patch changes the qdisc_bstats updates to be done at dequeue() time
    instead of enqueue() time. bstats counters no longer account for dropped
    frames, and rates are more correct, since enqueue() bursts dont have
    effect on dequeue() rate.

    Signed-off-by: Eric Dumazet
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
     

20 Jan, 2011

1 commit


14 Jan, 2011

1 commit

  • After recent changes, (percpu stats on vlan/tunnels...), we dont need
    anymore per struct netdev_queue tx_bytes/tx_packets/tx_dropped counters.

    Only remaining users are ixgbe, sch_teql, gianfar & macvlan :

    1) ixgbe can be converted to use existing tx_ring counters.

    2) macvlan incremented txq->tx_dropped, it can use the
    dev->stats.tx_dropped counter.

    3) sch_teql : almost revert ab35cd4b8f42 (Use net_device internal stats)
    Now we have ndo_get_stats64(), use it, even for "unsigned long"
    fields (No need to bring back a struct net_device_stats)

    4) gianfar adds a stats structure per tx queue to hold
    tx_bytes/tx_packets

    This removes a lockdep warning (and possible lockup) in rndis gadget,
    calling dev_get_stats() from hard IRQ context.

    Ref: http://www.spinics.net/lists/netdev/msg149202.html

    Reported-by: Neil Jones
    Signed-off-by: Eric Dumazet
    CC: Jarek Poplawski
    CC: Alexander Duyck
    CC: Jeff Kirsher
    CC: Sandeep Gopalpet
    CC: Michal Nazarewicz
    Signed-off-by: David S. Miller

    Eric Dumazet
     

11 Jan, 2011

1 commit

  • HTB takes into account skb is segmented in stats updates.
    Generalize this to all schedulers.

    They should use qdisc_bstats_update() helper instead of manipulating
    bstats.bytes and bstats.packets

    Add bstats_update() helper too for classes that use
    gnet_stats_basic_packed fields.

    Note : Right now, TCQ_F_CAN_BYPASS shortcurt can be taken only if no
    stab is setup on qdisc.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

29 Nov, 2010

1 commit


12 Oct, 2010

1 commit

  • Add a seqlock in struct neighbour to protect neigh->ha[], and avoid
    dirtying neighbour in stress situation (many different flows / dsts)

    Dirtying takes place because of read_lock(&n->lock) and n->used writes.

    Switching to a seqlock, and writing n->used only on jiffies changes
    permits less dirtying.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

10 Aug, 2010

1 commit


17 Jun, 2010

1 commit

  • https://bugzilla.kernel.org/show_bug.cgi?id=16183

    The sch_teql module, which can be used to load balance over a set of
    underlying interfaces, stopped working after 2.6.30 and has been
    broken in all kernels since then for any underlying interface which
    requires the addition of link level headers.

    The problem is that the transmit routine relies on being able to
    access the destination address in the skb in order to do address
    resolution once it has decided which underlying interface it is going
    to transmit through.

    In 2.6.31 the IFF_XMIT_DST_RELEASE flag was introduced, and set by
    default for all interfaces, which causes the destination address to be
    released before the transmit routine for the interface is called.

    The solution is to clear that flag for teql interfaces.

    Signed-off-by: Tom Hughes
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Tom Hughes
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

30 Nov, 2009

1 commit


01 Sep, 2009

1 commit


06 Jul, 2009

1 commit


13 Jun, 2009

1 commit


03 Jun, 2009

1 commit

  • Define three accessors to get/set dst attached to a skb

    struct dst_entry *skb_dst(const struct sk_buff *skb)

    void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

    void skb_dst_drop(struct sk_buff *skb)
    This one should replace occurrences of :
    dst_release(skb->dst)
    skb->dst = NULL;

    Delete skb->dst field

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

26 May, 2009

1 commit

  • We would like to get rid of netdev->trans_start = jiffies; that about all net
    drivers have to use in their start_xmit() function, and use txq->trans_start
    instead.

    This can be done generically in core network, as suggested by David.

    Some devices, (particularly loopback) dont need trans_start update, because
    they dont have transmit watchdog. We could add a new device flag, or rely
    on fact that txq->tran_start can be updated is txq->xmit_lock_owner is
    different than -1. Use a helper function to hide our choice.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

20 May, 2009

1 commit

  • We can slightly reduce size of teqlN structure, not duplicating stats
    structure in teql_master but using stats field from net_device.stats
    for tx_errors and from netdev_queue for tx_bytes/tx_packets/tx_dropped
    values.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

19 May, 2009

1 commit

  • It is illegal to dereference a skb after a successful ndo_start_xmit()
    call. We must store skb length in a local variable instead.

    Bug was introduced in 2.6.27 by commit 0abf77e55a2459aa9905be4b226e4729d5b4f0cb
    (net_sched: Add accessor function for packet length for qdiscs)

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

07 Jan, 2009

1 commit


14 Nov, 2008

1 commit

  • After implementing qdisc->ops->peek() and changing sch_netem into
    classless qdisc there are no more qdisc->ops->requeue() users. This
    patch removes this method with its wrappers (qdisc_requeue()), and
    also unused qdisc->requeue structure. There are a few minor fixes of
    warnings (htb_enqueue()) and comments btw.

    The idea to kill ->requeue() and a similar patch were first developed
    by David S. Miller.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     

31 Oct, 2008

1 commit


30 Aug, 2008

1 commit

  • Use qdisc_root_sleeping_lock() instead of qdisc_root_lock() where
    appropriate. The only difference is while dev is deactivated, when
    currently we can use a sleeping qdisc with the lock of noop_qdisc.
    This shouldn't be dangerous since after deactivation root lock could
    be used only by gen_estimator code, but looks wrong anyway.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     

01 Aug, 2008

1 commit

  • When support for multiple TX queues were added, the
    netif_tx_lock() routines we converted to iterate over
    all TX queues and grab each queue's spinlock.

    This causes heartburn for lockdep and it's not a healthy
    thing to do with lots of TX queues anyways.

    So modify this to use a top-level lock and a "frozen"
    state for the individual TX queues.

    Signed-off-by: David S. Miller

    David S. Miller
     

20 Jul, 2008

1 commit