31 Mar, 2011

1 commit


25 Jan, 2011

1 commit


21 Jan, 2011

2 commits

  • In commit 44b8288308ac9d (net_sched: pfifo_head_drop problem), we fixed
    a problem with pfifo_head drops that incorrectly decreased
    sch->bstats.bytes and sch->bstats.packets

    Several qdiscs (CHOKe, SFQ, pfifo_head, ...) are able to drop a
    previously enqueued packet, and bstats cannot be adjusted afterwards,
    so bstats/rates are not accurate (overestimated)

    This patch changes the qdisc_bstats updates to be done at dequeue()
    time instead of enqueue() time. bstats counters no longer account for
    dropped frames, and rates are more accurate, since enqueue() bursts
    don't have an effect on the dequeue() rate.

    Signed-off-by: Eric Dumazet
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
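
    A minimal, self-contained sketch of the accounting change described
    above (an illustrative userspace model with made-up names such as
    toy_qdisc, not the kernel patch itself):

        #include <stddef.h>
        #include <stdint.h>

        struct toy_skb { unsigned int len; struct toy_skb *next; };

        struct toy_qdisc {
            struct toy_skb *head, *tail;
            uint64_t bytes;        /* bstats.bytes analogue   */
            uint32_t packets;      /* bstats.packets analogue */
        };

        /* enqueue no longer touches the byte/packet counters ... */
        static void toy_enqueue(struct toy_qdisc *q, struct toy_skb *skb)
        {
            skb->next = NULL;
            if (q->tail)
                q->tail->next = skb;
            else
                q->head = skb;
            q->tail = skb;
        }

        /* ... so a later head drop has nothing to undo in the stats */
        static struct toy_skb *toy_drop_head(struct toy_qdisc *q)
        {
            struct toy_skb *skb = q->head;
            if (skb) {
                q->head = skb->next;
                if (!q->head)
                    q->tail = NULL;
            }
            return skb;
        }

        /* counters move only for packets that actually leave the qdisc */
        static struct toy_skb *toy_dequeue(struct toy_qdisc *q)
        {
            struct toy_skb *skb = toy_drop_head(q);
            if (skb) {
                q->bytes += skb->len;
                q->packets++;
            }
            return skb;
        }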
     
  • In commit 371121057607e (net: QDISC_STATE_RUNNING dont need atomic bit
    ops) I moved the QDISC_STATE_RUNNING flag to the __state container,
    located in the cache line containing the qdisc lock and often-dirtied
    fields.

    I now move the TCQ_F_THROTTLED bit too, so that the first cache line
    stays read-mostly and shared by all CPUs. This should speed up HTB/CBQ,
    for example.

    Not using test_bit()/__clear_bit()/__test_and_set_bit() allows using an
    "unsigned int" for the __state container, reducing Qdisc size by
    8 bytes.

    Introduce helpers to hide implementation details.

    Signed-off-by: Eric Dumazet
    CC: Patrick McHardy
    CC: Jesper Dangaard Brouer
    CC: Jarek Poplawski
    CC: Jamal Hadi Salim
    CC: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
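
    A rough sketch of the pattern described above: flag bits kept in a
    plain unsigned int and wrapped in small helpers, so no atomic bitops
    are needed (field and helper names here are illustrative, not the
    exact kernel ones):

        #include <stdbool.h>

        #define TOY_STATE_RUNNING   0x1
        #define TOY_STATE_THROTTLED 0x2

        struct toy_qdisc {
            /* protected by the qdisc lock, so plain loads/stores suffice */
            unsigned int __state;
        };

        static inline bool toy_is_throttled(const struct toy_qdisc *q)
        {
            return (q->__state & TOY_STATE_THROTTLED) != 0;
        }

        static inline void toy_throttled(struct toy_qdisc *q)
        {
            q->__state |= TOY_STATE_THROTTLED;
        }

        static inline void toy_unthrottled(struct toy_qdisc *q)
        {
            q->__state &= ~TOY_STATE_THROTTLED;
        }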
     

20 Jan, 2011

1 commit


11 Jan, 2011

1 commit

  • HTB takes into account that an skb may be segmented when updating
    stats. Generalize this to all schedulers.

    They should use the qdisc_bstats_update() helper instead of
    manipulating bstats.bytes and bstats.packets directly.

    Add a bstats_update() helper too, for classes that use
    gnet_stats_basic_packed fields.

    Note: right now, the TCQ_F_CAN_BYPASS shortcut can be taken only if no
    stab is set up on the qdisc.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
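
    A sketch of what such a helper can look like, counting one "packet"
    per GSO segment instead of per skb (an approximation of the helper's
    intent, with made-up type names):

        #include <stdint.h>

        struct toy_bstats { uint64_t bytes; uint32_t packets; };

        struct toy_skb {
            unsigned int len;       /* stab-adjusted length, as in qdisc_pkt_len() */
            unsigned int gso_segs;  /* 0 if the skb is not segmented */
        };

        static inline void toy_bstats_update(struct toy_bstats *b,
                                             const struct toy_skb *skb)
        {
            b->bytes   += skb->len;
            b->packets += skb->gso_segs ? skb->gso_segs : 1;
        }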
     

21 Oct, 2010

1 commit


10 Oct, 2010

1 commit


07 Jun, 2010

1 commit


30 Mar, 2010

1 commit

  • include cleanup: Update gfp.h and slab.h includes to prepare for
    breaking implicit slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script was used as the basis of the conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h; if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order as the rest -
    alphabetical, Christmas tree, rev-Xmas-tree, or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition, and for others adding it to an
    implementation .h or embedding .c file was more appropriate. This
    step added inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of the
    specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
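
    The practical outcome for an individual source file is simple:
    anything that uses slab or gfp facilities includes the right header
    itself instead of inheriting it through percpu.h. For example (an
    illustrative snippet, not any specific file touched by the patch):

        #include <linux/slab.h>   /* kmalloc()/kfree() users now include this directly */
        #include <linux/gfp.h>    /* enough by itself if only gfp flags / page allocs are used */

        static void *example_buf_alloc(size_t n)
        {
                return kmalloc(n, GFP_KERNEL);
        }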
     

30 Nov, 2009

1 commit


07 Oct, 2009

1 commit

  • Jarek Poplawski wrote:
    >
    >
    > Hmm... So you made me to do some "real" work here, and guess what?:
    > there is one serious checkpatch warning! ;-) Plus, this new parameter
    > should be added to the function description. Otherwise:
    > Signed-off-by: Jarek Poplawski
    >
    > Thanks,
    > Jarek P.
    >
    > PS: I guess full "Don't" would show we really mean it...

    Okay :) Here is the last round, before the night !

    Thanks again

    [RFC] pkt_sched: gen_estimator: Don't report fake rate estimators

    We currently send TCA_STATS_RATE_EST elements to netlink users, even if no estimator
    is running.

    # tc -s -d qdisc
    qdisc pfifo_fast 0: dev eth0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
    Sent 112833764978 bytes 1495081739 pkt (dropped 0, overlimits 0 requeues 0)
    rate 0bit 0pps backlog 0b 0p requeues 0

    The user has no way to tell whether the "rate 0bit 0pps" is a real
    estimate or a fake one (because no estimator is active).

    After this patch, the tc command output is:
    $ tc -s -d qdisc
    qdisc pfifo_fast 0: dev eth0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
    Sent 561075 bytes 1196 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    We add a parameter to the gnet_stats_copy_rate_est() function so that
    it can use gen_estimator_active(bstats, r), as suggested by Jarek.

    This parameter can be NULL if the check is not necessary (HTB, for
    example, has a mandatory rate estimator).

    Signed-off-by: Eric Dumazet
    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Eric Dumazet
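
    The shape of the change can be sketched like this: the stats dump path
    emits a rate record only when an estimator is really attached, unless
    the caller opts out by passing NULL (stand-in names and types below,
    simplified from the netlink dump code):

        #include <stdbool.h>
        #include <stdint.h>

        struct toy_bstats   { uint64_t bytes; uint32_t packets; };
        struct toy_rate_est { uint32_t bps; uint32_t pps; bool active; };

        /* stand-in for gen_estimator_active(); the real one looks the
         * (bstats, rate_est) pair up in the estimator list */
        static bool toy_estimator_active(const struct toy_bstats *b,
                                         const struct toy_rate_est *r)
        {
            (void)b;
            return r->active;
        }

        /* stand-in for gnet_stats_copy_rate_est(): the new bstats argument
         * lets it skip the dump when no estimator runs; passing NULL keeps
         * the old unconditional behaviour (HTB's mandatory estimator). */
        static int toy_copy_rate_est(const struct toy_bstats *b,
                                     const struct toy_rate_est *r,
                                     int (*emit)(const struct toy_rate_est *))
        {
            if (b && !toy_estimator_active(b, r))
                return 0;       /* no rate element at all */
            return emit(r);
        }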
     

06 Sep, 2009

1 commit


18 Aug, 2009

1 commit

  • In 5e140dfc1fe87eae27846f193086724806b33c7d "net: reorder struct Qdisc
    for better SMP performance" the definition of struct gnet_stats_basic
    changed incompatibly, as copies of this struct are shipped to
    userland via netlink.

    Restoring the old behavior is not desirable, for performance reasons.

    The fix is to use a private structure in the kernel, and teach
    gnet_stats_copy_basic() to convert from the kernel layout to userland,
    using the legacy structure (struct gnet_stats_basic).

    Based on a report and initial patch from Michael Spang.

    Reported-by: Michael Spang
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
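
    The fix pattern, sketched with stand-in types: keep a kernel-internal
    layout that is free to change, and copy field by field into the old
    wire-format struct right before it is sent to userspace:

        #include <stdint.h>
        #include <string.h>

        /* legacy layout: what userspace expects on the wire, so it must
         * never change */
        struct legacy_stats_basic {
            uint64_t bytes;
            uint32_t packets;
        };

        /* kernel-private layout: free to be rearranged or extended
         * without breaking the netlink ABI */
        struct kernel_stats_basic {
            uint64_t bytes;
            uint32_t packets;
        };

        /* what a gnet_stats_copy_basic()-style conversion boils down to */
        static void copy_basic_to_user(struct legacy_stats_basic *dst,
                                       const struct kernel_stats_basic *src)
        {
            memset(dst, 0, sizeof(*dst));
            dst->bytes   = src->bytes;
            dst->packets = src->packets;
        }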
     

16 Mar, 2009

1 commit

  • While looking for a possible reason for the bugzilla report on an HTB
    oops:
    http://bugzilla.kernel.org/show_bug.cgi?id=12858
    I found that the code in htb_delete calling htb_destroy_class on a
    zero refcount is very misleading: it can suggest this is a common path
    and that destroy is called under sch_tree_lock. Actually, this can
    never happen that way, because cops->get() is done before deletion,
    and after the delete the class is still used by tclass_notify. The
    class destroy is always called from cops->put(), so without
    sch_tree_lock.

    This doesn't mean much now (since 2.6.27) because all vulnerable calls
    were moved from htb_destroy_class to htb_delete, but there was a bug
    in older kernels. The same change is done for other classful scheds,
    which, it seems, didn't have similar locking problems here.

    Reported-by: m0sia
    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     

01 Feb, 2009

3 commits


13 Jan, 2009

2 commits

  • Currently htb_do_events() breaks events recounting for a level after 2
    jiffies, but there is no reason to repeat this for the next levels and
    increase delays even more (with softirqs disabled). htb_dequeue_tree()
    can add to this too, by the way. In such a case the q->now time is
    invalid anyway.

    Thanks to Patrick McHardy for spotting an error in an earlier version
    of this patch.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
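
    The point of the change can be modelled like this: compute one time
    budget before walking the levels and bail out of the whole walk once
    it is exceeded, rather than restarting the 2-jiffy budget per level
    (a schematic stand-in, not the HTB code itself):

        #include <stdbool.h>

        #define TOY_LEVELS       8
        #define TOY_MAX_JIFFIES  2

        static unsigned long toy_jiffies;      /* clock stand-in */

        /* does one unit of work, returns true while more remains */
        static bool toy_process_one_event(int level)
        {
            (void)level;
            return false;                      /* stub so the sketch compiles */
        }

        static void toy_do_events_all_levels(void)
        {
            /* one budget shared by every level, not restarted per level */
            unsigned long stop = toy_jiffies + TOY_MAX_JIFFIES;
            int level;

            for (level = 0; level < TOY_LEVELS; level++) {
                while (toy_process_one_event(level)) {
                    if (toy_jiffies >= stop)
                        return;  /* out of budget: give the CPU back */
                }
            }
        }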
     
  • The next event time should take into account the jiffies used for
    recounting. Otherwise qdisc_watchdog_schedule() triggers the hrtimer
    immediately with the event in the past, and may cause very high
    ksoftirqd CPU usage (if highres is on).

    This patch also removes the check of "event" for zero in
    htb_dequeue(): it is always true at this point.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
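
    Schematically, the scheduling part reduces to never arming the
    watchdog for a time that is already behind the (possibly advanced)
    current time (stand-in names; this is not the HTB code):

        /* after spending some jiffies on recounting, clamp the target so
         * the hrtimer does not fire again immediately in a tight loop */
        static unsigned long toy_next_event(unsigned long event,
                                            unsigned long now)
        {
            return event > now ? event : now;
        }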
     

10 Dec, 2008

2 commits


04 Dec, 2008

5 commits


26 Nov, 2008

1 commit


20 Nov, 2008

1 commit


14 Nov, 2008

1 commit

  • After implementing qdisc->ops->peek() and changing sch_netem into a
    classless qdisc, there are no more qdisc->ops->requeue() users. This
    patch removes this method together with its wrappers (qdisc_requeue()),
    and also the unused qdisc->requeue structure. There are a few minor
    fixes of warnings (htb_enqueue()) and comments as well.

    The idea to kill ->requeue() and a similar patch were first developed
    by David S. Miller.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     

31 Oct, 2008

1 commit

  • This patch adds a qdisc_peek_dequeued() wrapper that emulates the peek
    method with qdisc->dequeue(), storing the "peeked" skb in
    qdisc->gso_skb until it is dequeued. This is mainly for compatibility
    reasons, so as not to break some unusual configs, because
    non-work-conserving parent qdiscs are expected to peek at (query)
    work-conserving child qdiscs.

    This implementation requires using the qdisc_dequeue_peeked() wrapper
    instead of calling qdisc->dequeue() directly for all qdiscs ever
    queried with qdisc->ops->peek() or qdisc_peek_dequeued().

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
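
    A compact model of the two wrappers: peek dequeues once and parks the
    skb, and the matching dequeue wrapper hands back the parked skb first
    (stand-in names; the real helpers park the skb in qdisc->gso_skb as
    described above):

        #include <stddef.h>

        struct toy_skb { int id; };

        struct toy_qdisc {
            struct toy_skb *(*dequeue)(struct toy_qdisc *q);
            struct toy_skb *peeked;     /* plays the role of qdisc->gso_skb */
            unsigned int qlen;
        };

        /* emulate ->peek(): pull one skb out but keep it logically queued */
        static struct toy_skb *toy_peek_dequeued(struct toy_qdisc *q)
        {
            if (!q->peeked) {
                q->peeked = q->dequeue(q);
                if (q->peeked)
                    q->qlen++;          /* still counts as queued */
            }
            return q->peeked;
        }

        /* must be used instead of ->dequeue() once the qdisc may be peeked */
        static struct toy_skb *toy_dequeue_peeked(struct toy_qdisc *q)
        {
            struct toy_skb *skb = q->peeked;

            if (skb) {
                q->peeked = NULL;
                q->qlen--;
            } else {
                skb = q->dequeue(q);
            }
            return skb;
        }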
     

30 Aug, 2008

1 commit

  • Use qdisc_root_sleeping_lock() instead of qdisc_root_lock() where
    appropriate. The only difference occurs while the device is
    deactivated, when we can currently end up using a sleeping qdisc with
    the lock of noop_qdisc. This shouldn't be dangerous, since after
    deactivation the root lock could be used only by gen_estimator code,
    but it looks wrong anyway.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     

27 Aug, 2008

1 commit

  • While passing a qdisc root lock to gen_new_estimator() and
    gen_replace_estimator(), the device could be deactivated, or the
    proper root qdisc might not yet be grafted as qdisc_sleeping (e.g. in
    qdisc_create), so using qdisc_root_lock() is not enough. This patch
    adds qdisc_root_sleeping_lock() for this purpose, plus additional
    checks where necessary.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
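
    The underlying idea, sketched with stand-in types: the device keeps
    both the active root qdisc (which is swapped for a noop qdisc while
    the device is down) and the configured "sleeping" one; locks used for
    estimator bookkeeping should come from the sleeping qdisc so they stay
    stable across deactivation:

        #include <pthread.h>

        struct toy_qdisc {
            pthread_mutex_t lock;
            /* ... */
        };

        struct toy_netdev_queue {
            struct toy_qdisc *qdisc;           /* active root; a noop qdisc while down */
            struct toy_qdisc *qdisc_sleeping;  /* what was actually configured */
        };

        /* like qdisc_root_lock(): follows the active pointer,
         * which is the wrong lock while the device is deactivated */
        static pthread_mutex_t *toy_root_lock(struct toy_netdev_queue *txq)
        {
            return &txq->qdisc->lock;
        }

        /* like qdisc_root_sleeping_lock(): always the configured root's lock */
        static pthread_mutex_t *toy_root_sleeping_lock(struct toy_netdev_queue *txq)
        {
            return &txq->qdisc_sleeping->lock;
        }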
     

18 Aug, 2008

1 commit

  • Based upon a bug report by Josip Rodin.

    Packet schedulers should return NET_XMIT_DROP only if
    the packet really was dropped. If the packet does reach
    the device after we return NET_XMIT_DROP, then TCP can
    crash, because it depends upon the enqueue path return
    values being accurate.

    Signed-off-by: David S. Miller

    David S. Miller
     

14 Aug, 2008

1 commit

  • Recent changes introduced a bug in htb_delete(): the
    cl->parent->children counter update misses a check of cl->parent for
    NULL; the parent is NULL for root classes, so deleting them causes an
    oops.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     

05 Aug, 2008

2 commits

  • Patrick McHardy noticed that it would be nice to replace
    NET_XMIT_BYPASS with NET_XMIT_SUCCESS plus an internal qdisc flag
    __NET_XMIT_BYPASS, and to remove the mapping from dev_queue_xmit().

    David Miller spotted a serious bug in the first
    version of this patch.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     
  • Patrick McHardy noticed:
    "The other problem that affects all qdiscs supporting actions is
    TC_ACT_QUEUED/TC_ACT_STOLEN getting mapped to NET_XMIT_SUCCESS
    even though the packet is not queued, corrupting upper qdiscs'
    qlen counters."

    and later explained:
    "The reason why it translates it at all seems to be to not increase
    the drops counter. Within a single qdisc this could be avoided by
    other means easily, upper qdiscs would still increase the counter
    when we return anything besides NET_XMIT_SUCCESS though.

    This means we need a new NET_XMIT return value to indicate this to
    the upper qdiscs. So I'd suggest to introduce NET_XMIT_STOLEN,
    return that to upper qdiscs and translate it to NET_XMIT_SUCCESS
    in dev_queue_xmit, similar to NET_XMIT_BYPASS."

    David Miller noticed:
    "Maybe these NET_XMIT_* values being passed around should be a set of
    bits. They could be composed of base meanings, combined with specific
    attributes.

    So you could say "NET_XMIT_DROP | __NET_XMIT_NO_DROP_COUNT"

    The attributes get masked out by the top-level ->enqueue() caller,
    such that the base meanings are the only thing that make their
    way up into the stack. If it's only about communication within the
    qdisc tree, let's simply code it that way."

    This patch is trying to realize these ideas.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
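
    A schematic of the bit-composition idea quoted above: base NET_XMIT
    codes in the low bits, qdisc-internal attribute flags in the high
    bits, and a mask applied before the value escapes the qdisc tree
    (constants and helpers below are illustrative, not the exact kernel
    definitions):

        /* base meanings, visible to the rest of the stack */
        #define TOY_XMIT_SUCCESS   0x00
        #define TOY_XMIT_DROP      0x01
        #define TOY_XMIT_CN        0x02
        #define TOY_XMIT_MASK      0xff

        /* attributes, meaningful only inside the qdisc tree */
        #define __TOY_XMIT_STOLEN  0x00010000
        #define __TOY_XMIT_BYPASS  0x00020000

        /* child qdisc: the packet was absorbed by an action; report it
         * upwards without making parents bump their drop counters */
        static int toy_enqueue_stolen(void)
        {
            return TOY_XMIT_SUCCESS | __TOY_XMIT_STOLEN;
        }

        /* parent qdisc: count a drop only when the attribute is absent */
        static int toy_drop_count(int ret)
        {
            return (ret & __TOY_XMIT_STOLEN) ? 0 : 1;
        }

        /* top-level caller: strip internal attributes so only the base
         * meaning propagates up into the stack */
        static int toy_xmit_strip(int ret)
        {
            return ret & TOY_XMIT_MASK;
        }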
     

26 Jul, 2008

1 commit

  • Removes a legacy reinvent-the-wheel type thing. The generic
    machinery integrates much better with automated debugging aids
    such as kerneloops.org (and others), and is unambiguous due to
    better naming. Non-intuitively, BUG_TRAP() is actually equivalent to
    WARN_ON() rather than BUG_ON(), though some instances might later be
    promoted to BUG_ON(); I left that for the future.

    I could make at least one BUILD_BUG_ON conversion.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
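
    For reference, the legacy macro behaved roughly as below, which is why
    the mechanical conversion is BUG_TRAP(cond) -> WARN_ON(!cond) (a
    simplified userspace rendering, not the kernel definitions):

        #include <stdio.h>

        /* roughly what BUG_TRAP() did: complain, but do not crash, when
         * the asserted condition is false */
        #define BUG_TRAP(x) \
            do { \
                if (!(x)) \
                    fprintf(stderr, "KERNEL: assertion (%s) failed at %s (%d)\n", \
                            #x, __FILE__, __LINE__); \
            } while (0)

        /* old style:           BUG_TRAP(skb != NULL);
         * generic equivalent:  WARN_ON(skb == NULL);   (plus a backtrace,
         * which is what tools like kerneloops.org pick up) */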
     

20 Jul, 2008

2 commits