06 Jan, 2012

1 commit

  • Not now, but it looks like you are correct. q->qdisc is NULL until an
    additional qdisc is attached (besides tfifo). See 50612537e9ab2969312.
    The following patch should work.

    From: Hagen Paul Pfeifer

    netem: catch NULL pointer by updating the real qdisc statistic

    Reported-by: Vijay Subramanian
    Signed-off-by: Hagen Paul Pfeifer
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Hagen Paul Pfeifer
     

31 Dec, 2011

2 commits

  • Commit 10f6dfcfde (Revert "sch_netem: Remove classful functionality")
    reintroduced classful functionality to netem, but broke basic netem
    behavior:

    netem uses a t(ime)fifo queue and stores timestamps in skb->cb[].

    If the qdisc is changed, time constraints are not respected, and the
    other qdisc can destroy skb->cb[] and block netem at dequeue time.

    Fix this by always using the internal tfifo, and optionally attaching a
    child qdisc (or a tree of qdiscs) to netem.

    Example of use:

    DEV=eth3
    tc qdisc del dev $DEV root
    tc qdisc add dev $DEV root handle 30: est 1sec 8sec netem delay 20ms 10ms
    tc qdisc add dev $DEV handle 40:0 parent 30:0 tbf \
    burst 20480 limit 20480 mtu 1514 rate 32000bps

    qdisc netem 30: root refcnt 18 limit 1000 delay 20.0ms 10.0ms
    Sent 190792 bytes 413 pkt (dropped 0, overlimits 0 requeues 0)
    rate 18416bit 3pps backlog 0b 0p requeues 0
    qdisc tbf 40: parent 30: rate 256000bit burst 20Kb/8 mpu 0b lat 0us
    Sent 190792 bytes 413 pkt (dropped 6, overlimits 10 requeues 0)
    backlog 0b 5p requeues 0
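
    As an illustration only (userspace Python, not the kernel code), the
    time-ordered internal queue described in this commit can be pictured as
    a queue that releases a packet only once its send time has been reached.
    The names TimeFifo, enqueue and dequeue, and the use of a heap, are
    assumptions made for this sketch.

```python
import heapq
import itertools


class TimeFifo:
    """Toy time-ordered FIFO: packets leave in order of their send time."""

    def __init__(self):
        self._heap = []
        # Sequence counter keeps arrival order for equal send times.
        self._seq = itertools.count()

    def enqueue(self, packet, time_to_send):
        heapq.heappush(self._heap, (time_to_send, next(self._seq), packet))

    def dequeue(self, now):
        """Return the next packet if its send time has come, else None."""
        if self._heap and self._heap[0][0] <= now:
            return heapq.heappop(self._heap)[2]
        return None


q = TimeFifo()
q.enqueue("a", time_to_send=30)
q.enqueue("b", time_to_send=20)
print(q.dequeue(now=10))  # None: nothing is due yet
print(q.dequeue(now=25))  # b
```

    In this picture a child qdisc would only ever see packets that the
    time-ordered queue has already released.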

    Signed-off-by: Eric Dumazet
    CC: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • David S. Miller
     

25 Dec, 2011

1 commit

  • Commit 6373a9a286 (netem: use vmalloc for distribution table) introduced
    a regression: vfree() is called while holding a spinlock, with BH
    disabled.

    Fix this by swapping the pointers inside the critical section and
    freeing after the spinlock is released.
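
    The swap-then-free pattern can be sketched in userspace Python as
    follows. This is only an analogy: threading.Lock stands in for the
    spinlock, and DistTable, replace and the free callback are hypothetical
    names chosen for this sketch.

```python
import threading


class DistTable:
    """Toy holder for a table that is replaced under a lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.table = None

    def replace(self, new_table, free):
        # Only the pointer swap happens inside the critical section.
        with self.lock:
            old, self.table = self.table, new_table
        # The potentially slow release runs after the lock is dropped.
        if old is not None:
            free(old)


holder = DistTable()
holder.replace([1, 2, 3], free=lambda old: None)
holder.replace([4, 5, 6], free=lambda old: print("freed", old, "lock held:", holder.lock.locked()))
```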

    Also add __GFP_NOWARN to the kmalloc() attempt, since we fall back to
    vmalloc().

    Signed-off-by: Eric Dumazet
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
     

24 Dec, 2011

1 commit

  • The new netem loss model is configured with nested netlink messages.
    This code is overly strict about sizes and is easily confused by
    padding (or possible future expansion). Also, the message for gemodel
    is incorrect.
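
    A minimal sketch of what "less strict about sizes" can mean for an
    attribute payload: reject payloads that are too short, but tolerate
    trailing padding or future extensions. The five-field little-endian
    layout and the name parse_params are assumptions for illustration, not
    the kernel's definitions.

```python
import struct

# Hypothetical parameter block: five unsigned 32-bit values.
PARAM_FORMAT = "<5I"
PARAM_SIZE = struct.calcsize(PARAM_FORMAT)


def parse_params(payload: bytes):
    """Accept payloads of at least PARAM_SIZE bytes; ignore anything extra."""
    if len(payload) < PARAM_SIZE:
        raise ValueError("parameter block too short")
    return struct.unpack_from(PARAM_FORMAT, payload)


padded = struct.pack(PARAM_FORMAT, 1, 2, 3, 4, 5) + b"\x00" * 4
print(parse_params(padded))  # (1, 2, 3, 4, 5)
```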

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    stephen hemminger
     

13 Dec, 2011

1 commit

  • This extension can be used to simulate special link layer
    characteristics. Simulate because packet data is not modified, only the
    calculation base is changed to delay a packet based on the original
    packet size and artificial cell information.

    packet_overhead can be used to simulate a link layer header compression
    scheme (e.g. set packet_overhead to -20) or with a positive
    packet_overhead value an additional MAC header can be simulated. It is
    also possible to "replace" the 14 byte Ethernet header with something
    else.

    cell_size and cell_overhead can be used to simulate cell-based link
    layer schemes, such as some TDMA schemes. Another application area is
    MAC schemes that use link layer fragmentation with a (small) header per
    fragment. Cell size is the maximum number of data bytes within one cell.
    Cell overhead is an additional variable that sets the per-cell overhead
    (e.g. 5 byte header per fragment).

    Example (5 kbit/s, 20 byte per packet overhead, cell-size 100 byte, per
    cell overhead 5 byte):

    tc qdisc add dev eth0 root netem rate 5kbit 20 100 5
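
    One plausible reading of the calculation, sketched in userspace Python:
    add the per-packet overhead, round up to whole cells, charge the
    per-cell overhead for each cell, and convert the resulting length into
    a delay at the configured rate. The function names, the clamping of
    negative lengths to zero, and 5kbit meaning 5000 bit/s are assumptions
    of this sketch.

```python
def scheduled_length(packet_bytes, packet_overhead=0, cell_size=0, cell_overhead=0):
    """Length in bytes used for the delay calculation (packet data is untouched)."""
    length = max(packet_bytes + packet_overhead, 0)
    if cell_size > 0:
        cells = -(-length // cell_size)  # ceiling division: partial cell counts as one
        length = cells * (cell_size + cell_overhead)
    return length


def packet_delay(packet_bytes, rate_bit_per_s, packet_overhead=0, cell_size=0, cell_overhead=0):
    """Seconds a packet is delayed at the given rate."""
    length = scheduled_length(packet_bytes, packet_overhead, cell_size, cell_overhead)
    return length * 8 / rate_bit_per_s


# Parameters from the example above, applied to a 1000 byte packet:
# 1020 bytes -> 11 cells of (100 + 5) bytes.
print(scheduled_length(1000, 20, 100, 5))    # 1155
print(packet_delay(1000, 5000, 20, 100, 5))  # 1.848
```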

    Signed-off-by: Hagen Paul Pfeifer
    Signed-off-by: Florian Westphal
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Hagen Paul Pfeifer
     

02 Dec, 2011

1 commit


01 Dec, 2011

1 commit

  • Currently netem is not able to emulate channel bandwidth. Only static
    delay (and optional random jitter) can be configured.

    To emulate the channel rate, the token bucket filter (sch_tbf) can be used. But
    TBF has some major emulation flaws. The buffer (token bucket depth/rate) cannot
    be 0. Also, the idea behind TBF is that the credit (tokens in the bucket) fills
    up while no packet is transmitted, so there is always a "positive" credit for
    new packets. In real life this behavior contradicts the law of nature that
    nothing can travel faster than the speed of light. E.g.: on an emulated
    1000 byte/s link a small IPv4/TCP SYN packet of ~50 bytes requires ~0.05
    seconds, not 0 seconds.
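
    The arithmetic behind that example, as a small userspace Python sketch
    (the function name transmission_delay is made up for illustration):

```python
def transmission_delay(packet_bytes, rate_bytes_per_s):
    """Seconds needed to put packet_bytes on a link of the given rate."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return packet_bytes / rate_bytes_per_s


# A ~50 byte SYN on an emulated 1000 byte/s link.
print(transmission_delay(50, 1000))  # 0.05
```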

    Netem is an excellent place to implement a rate limiting feature: static
    delay is already implemented, tfifo already has time information and the
    user can skip TBF configuration completely.

    This patch implements a rate feature which can be configured via tc, e.g.:

    tc qdisc add dev eth0 root netem rate 10kbit

    To emulate a link of 5000 byte/s and add an additional static delay of 10ms:

    tc qdisc add dev eth0 root netem delay 10ms rate 5KBps

    Note: similar to TBF, the rate extension is bound to the kernel timing
    system. Depending on the architecture's timer granularity, higher rates (e.g.
    10mbit/s and higher) tend to produce transmission bursts. Also note that
    further queues live in network adaptors; see ethtool(8).

    Signed-off-by: Hagen Paul Pfeifer
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Hagen Paul Pfeifer
     

22 Jun, 2011

1 commit

  • Remove linux/mm.h inclusion from netdevice.h -- it's unused (I've checked manually).

    To prevent mm.h inclusion via other channels, also extract the "enum dma_data_direction"
    definition into a separate header. This tiny piece is what glues netdevice.h to mm.h
    via "netdevice.h => dmaengine.h => dma-mapping.h => scatterlist.h => mm.h".
    Removal of mm.h from scatterlist.h was tried and was found not feasible
    on most archs, so the link was cut off earlier.

    Hope people are OK with a tiny include file.

    Note that mm_types.h is still dragged in, but that is a separate story.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

31 Mar, 2011

1 commit


25 Feb, 2011

7 commits


25 Jan, 2011

1 commit


21 Jan, 2011

2 commits

  • In commit 44b8288308ac9d (net_sched: pfifo_head_drop problem), we fixed
    a problem with pfifo_head drops that incorrectly decreased
    sch->bstats.bytes and sch->bstats.packets

    Several qdiscs (CHOKe, SFQ, pfifo_head, ...) are able to drop a
    previously enqueued packet, and bstats cannot be changed, so
    bstats/rates are not accurate (overestimated).

    This patch changes the qdisc_bstats updates to be done at dequeue() time
    instead of enqueue() time. bstats counters no longer account for dropped
    frames, and rates are more correct, since enqueue() bursts don't have
    an effect on the dequeue() rate.
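
    The effect can be reproduced with a toy head-drop queue in userspace
    Python. The class HeadDropQueue, its count_at_enqueue switch and the
    helper run are hypothetical and exist only to compare the two
    accounting points.

```python
from collections import deque


class HeadDropQueue:
    """Toy queue that drops its oldest packet when full."""

    def __init__(self, limit, count_at_enqueue):
        self.limit = limit
        self.count_at_enqueue = count_at_enqueue
        self.queue = deque()
        self.bytes = 0
        self.packets = 0

    def _account(self, size):
        self.bytes += size
        self.packets += 1

    def enqueue(self, size):
        if len(self.queue) >= self.limit:
            self.queue.popleft()  # head drop: an already counted packet may vanish
        self.queue.append(size)
        if self.count_at_enqueue:
            self._account(size)

    def dequeue(self):
        if not self.queue:
            return None
        size = self.queue.popleft()
        if not self.count_at_enqueue:
            self._account(size)
        return size


def run(count_at_enqueue):
    q = HeadDropQueue(limit=2, count_at_enqueue=count_at_enqueue)
    for _ in range(5):  # burst of five 100 byte packets into a 2 packet queue
        q.enqueue(100)
    while q.dequeue() is not None:
        pass
    return q.packets, q.bytes


print(run(count_at_enqueue=True))   # (5, 500): counts three dropped packets
print(run(count_at_enqueue=False))  # (2, 200): only what was actually sent
```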

    Signed-off-by: Eric Dumazet
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • In commit 371121057607e (net: QDISC_STATE_RUNNING dont need atomic bit
    ops) I moved QDISC_STATE_RUNNING flag to __state container, located in
    the cache line containing qdisc lock and often dirtied fields.

    I now move the TCQ_F_THROTTLED bit too, so that the first cache line
    stays read-mostly and shared by all cpus. This should speed up HTB/CBQ,
    for example.

    Not using test_bit()/__clear_bit()/__test_and_set_bit() allows using an
    "unsigned int" for the __state container, reducing Qdisc size by 8 bytes.

    Introduce helpers to hide implementation details.

    Signed-off-by: Eric Dumazet
    CC: Patrick McHardy
    CC: Jesper Dangaard Brouer
    CC: Jarek Poplawski
    CC: Jamal Hadi Salim
    CC: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
     

20 Jan, 2011

1 commit


11 Jan, 2011

1 commit

  • HTB takes into account that an skb may be segmented when updating stats.
    Generalize this to all schedulers.

    They should use the qdisc_bstats_update() helper instead of manipulating
    bstats.bytes and bstats.packets directly.

    Add bstats_update() helper too for classes that use
    gnet_stats_basic_packed fields.
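
    The idea of segment-aware accounting can be sketched in userspace
    Python: a segmented packet contributes its full length once, but one
    packet count per segment. The Packet and BasicStats types and the
    convention that gso_segs == 0 means "not segmented" are assumptions of
    this sketch, not the kernel structures.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    length: int
    gso_segs: int = 0  # 0 means "not a segmented packet" in this sketch


@dataclass
class BasicStats:
    bytes: int = 0
    packets: int = 0


def bstats_update(stats, pkt):
    """Single place where byte and packet counters are updated."""
    stats.bytes += pkt.length
    stats.packets += pkt.gso_segs if pkt.gso_segs > 0 else 1


stats = BasicStats()
bstats_update(stats, Packet(length=1500))
bstats_update(stats, Packet(length=4500, gso_segs=3))
print(stats)  # BasicStats(bytes=6000, packets=4)
```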

    Note: Right now, the TCQ_F_CAN_BYPASS shortcut can be taken only if no
    stab is set up on the qdisc.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

21 Oct, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

30 Nov, 2009

1 commit


20 Apr, 2009

1 commit

  • Alex Sidorenko reported:

    "while experimenting with 'netem' we have found some strange behaviour. It
    seemed that ingress delay as measured by 'ping' command shows up on some
    hosts but not on others.

    After some investigation I have found that the problem is that skbuff->tstamp
    field value depends on whether there are any packet sniffers enabled. That
    is:

    - if any ptype_all handler is registered, the tstamp field is as expected
    - if there are no ptype_all handlers, the tstamp field does not show the delay"

    This patch prevents the unnecessary update of tstamp in
    dev_queue_xmit_nit() on the ingress path (with act_mirred) by adding a
    check. The check adds minimal overhead on the fast path, and only when
    sniffers etc. are active.

    Since netem at ingress seems to logically emulate a network before a host,
    tstamp is zeroed to trigger the update and pretend delays are from the
    outside.

    Reported-by: Alex Sidorenko
    Tested-by: Alex Sidorenko
    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     

23 Dec, 2008

1 commit

  • When the TCQ_F_THROTTLED flag was implemented, an smp_wmb() was used
    in qdisc_watchdog(), but since this flag is practically used only in
    sch_netem(), and since it's not even clear what reordering is avoided
    here (TCQ_F_THROTTLED vs. __QDISC_STATE_SCHED?) it seems the barrier
    could be safely removed.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     

16 Dec, 2008

1 commit


15 Dec, 2008

1 commit


20 Nov, 2008

1 commit


14 Nov, 2008

1 commit

  • After implementing qdisc->ops->peek() and changing sch_netem into a
    classless qdisc there are no more qdisc->ops->requeue() users. This
    patch removes this method with its wrappers (qdisc_requeue()), and
    also the unused qdisc->requeue structure. There are also a few minor
    fixes of warnings (htb_enqueue()) and comments.

    The idea to kill ->requeue() and a similar patch were first developed
    by David S. Miller.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     

04 Nov, 2008

1 commit


02 Nov, 2008

2 commits

  • After removing netem classful functionality we are sure its inner
    qdisc is tfifo, so we can replace qdisc->ops->requeue() method with
    open code. After this patch there are no more ops->requeue() users.

    The idea of this patch is by Patrick McHardy.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     
  • Patrick McHardy noticed that: "a lot of the functionality of netem
    requires the inner tfifo anyways and rate-limiting is usually done
    on top of netem. So I would suggest so either hard-wire the tfifo
    qdisc or at least make the assumption that inner qdiscs are
    work-conserving.", and later: "- a lot of other qdiscs still don't
    work as inner qdiscs of netem [...]".

    So, according to his suggestion, this patch removes the classful options
    of netem. The main reason for this change is to remove the ops->requeue()
    method, which is currently used only by netem.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     

31 Oct, 2008

3 commits

  • This patch adds a qdisc_peek_dequeued() wrapper that emulates the peek
    method by calling qdisc->dequeue() and storing the "peeked" skb in
    qdisc->gso_skb until it is dequeued. This is mainly for compatibility,
    so as not to break some strange configs, because peeking is expected to
    be used by non-work-conserving parent qdiscs to query work-conserving
    child qdiscs.

    This implementation requires using the qdisc_dequeue_peeked() wrapper
    instead of directly calling qdisc->dequeue() for all qdiscs ever
    queried with qdisc->ops->peek() or qdisc_peek_dequeued().
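
    The emulation can be pictured with a userspace Python sketch: a peek
    dequeues one packet and parks it, and the matching dequeue wrapper must
    hand out the parked packet first. The Qdisc class and the function names
    peek_dequeued and dequeue_peeked mirror the description above but are
    simplified stand-ins, not the kernel implementation.

```python
from collections import deque


class Qdisc:
    """Toy queue that only knows how to dequeue."""

    def __init__(self, packets=()):
        self._queue = deque(packets)
        self.gso_skb = None  # slot for a packet that was "peeked" via dequeue

    def dequeue(self):
        return self._queue.popleft() if self._queue else None


def peek_dequeued(q):
    """Emulate peek: dequeue once and remember the packet."""
    if q.gso_skb is None:
        q.gso_skb = q.dequeue()
    return q.gso_skb


def dequeue_peeked(q):
    """Return the remembered packet if any, otherwise dequeue normally."""
    pkt = q.gso_skb
    if pkt is not None:
        q.gso_skb = None
        return pkt
    return q.dequeue()


q = Qdisc(["a", "b"])
print(peek_dequeued(q))   # a
print(peek_dequeued(q))   # a (still the same packet)
print(dequeue_peeked(q))  # a
print(dequeue_peeked(q))  # b
```

    Calling q.dequeue() directly after a peek would skip the parked packet,
    which is why the wrapper is required once a queue has been peeked.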

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     
  • Use qdisc->ops->peek() instead of ->dequeue() & ->requeue() pair.
    After this patch the only remaining user of qdisc->ops->requeue() is
    netem_enqueue(). Based on ideas of Herbert Xu, Patrick McHardy and
    David S. Miller.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     
  • Add qdisc->ops->peek() implementation for work-conserving qdiscs.
    With feedback from Patrick McHardy.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     

03 Sep, 2008

1 commit


30 Aug, 2008

1 commit

  • Use qdisc_root_sleeping_lock() instead of qdisc_root_lock() where
    appropriate. The only difference arises while the dev is deactivated,
    when we can currently use a sleeping qdisc with the lock of noop_qdisc.
    This shouldn't be dangerous, since after deactivation the root lock
    could be used only by gen_estimator code, but it looks wrong anyway.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     

05 Aug, 2008

1 commit

  • Patrick McHardy noticed that it would be nice to
    handle NET_XMIT_BYPASS by NET_XMIT_SUCCESS with an internal qdisc flag
    __NET_XMIT_BYPASS and to remove the mapping from dev_queue_xmit().

    David Miller spotted a serious bug in the first
    version of this patch.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski