08 Jun, 2016

1 commit

  • Large tc dumps (tc -s {qdisc|class} sh dev ethX) done by Google BwE host
    agent [1] are problematic at scale :

    For each qdisc/class found in the dump, we currently lock the root qdisc
    spinlock in order to get stats. Sampling stats every 5 seconds from
    thousands of HTB classes is a challenge when the root qdisc spinlock is
    under high pressure. Not only the dumps take time, they also slow
    down the fast path (queue/dequeue packets) by 10 % to 20 % in some cases.

    An audit of existing qdiscs showed that sch_fq_codel is the only qdisc
    that might need the qdisc lock in fq_codel_dump_stats() and
    fq_codel_dump_class_stats()

    In v2 of this patch, I now use the Qdisc running seqcount to provide
    consistent reads of packets/bytes counters, regardless of 32/64 bit arches.

    I also changed rate estimators to use the same infrastructure
    so that they no longer need to lock root qdisc lock.

    [1]
    http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43838.pdf

    Signed-off-by: Eric Dumazet
    Cc: Cong Wang
    Cc: Jamal Hadi Salim
    Cc: John Fastabend
    Cc: Kevin Athey
    Cc: Xiaotian Pei
    Signed-off-by: David S. Miller

    Eric Dumazet
     

27 Apr, 2016

1 commit


30 Sep, 2014

3 commits

  • After previous patches to simplify qstats the qstats can be
    made per cpu with a packed union in Qdisc struct.

    Signed-off-by: John Fastabend
    Signed-off-by: David S. Miller

    John Fastabend
     
  • This removes the use of qstats->qlen variable from the classifiers
    and makes it an explicit argument to gnet_stats_copy_queue().

    The qlen represents the qdisc queue length and is packed into
    the qstats at the last moment before passnig to user space. By
    handling it explicitely we avoid, in the percpu stats case, having
    to figure out which per_cpu variable to put it in.

    It would probably be best to remove it from qstats completely
    but qstats is a user space ABI and can't be broken. A future
    patch could make an internal only qstats structure that would
    avoid having to allocate an additional u32 variable on the
    Qdisc struct. This would make the qstats struct 128bits instead
    of 128+32.

    Signed-off-by: John Fastabend
    Signed-off-by: David S. Miller

    John Fastabend
     
  • In order to run qdisc's without locking statistics and estimators
    need to be handled correctly.

    To resolve bstats make the statistics per cpu. And because this is
    only needed for qdiscs that are running without locks which is not
    the case for most qdiscs in the near future only create percpu
    stats when qdiscs set the TCQ_F_CPUSTATS flag.

    Next because estimators use the bstats to calculate packets per
    second and bytes per second the estimator code paths are updated
    to use the per cpu statistics.

    Signed-off-by: John Fastabend
    Signed-off-by: David S. Miller

    John Fastabend
     

21 Sep, 2013

1 commit

  • There are a mix of function prototypes with and without extern
    in the kernel sources. Standardize on not using extern for
    function prototypes.

    Function prototypes don't need to be written with extern.
    extern is assumed by the compiler. Its use is as unnecessary as
    using auto to declare automatic/local variables in a block.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

11 Jun, 2013

1 commit

  • struct gnet_stats_rate_est contains u32 fields, so the bytes per second
    field can wrap at 34360Mbit.

    Add a new gnet_stats_rate_est64 structure to get 64bit bps/pps fields,
    and switch the kernel to use this structure natively.

    This structure is dumped to user space as a new attribute :

    TCA_STATS_RATE_EST64

    Old tc command will now display the capped bps (to 34360Mbit), instead
    of wrapped values, and updated tc command will display correct
    information.

    Old tc command output, after patch :

    eric:~# tc -s -d qd sh dev lo
    qdisc pfifo 8001: root refcnt 2 limit 1000p
    Sent 80868245400 bytes 1978837 pkt (dropped 0, overlimits 0 requeues 0)
    rate 34360Mbit 189696pps backlog 0b 0p requeues 0

    This patch carefully reorganizes "struct Qdisc" layout to get optimal
    performance on SMP.

    Signed-off-by: Eric Dumazet
    Cc: Ben Hutchings
    Signed-off-by: David S. Miller

    Eric Dumazet
     

31 Mar, 2011

1 commit


04 Nov, 2009

1 commit

  • This cleanup patch puts struct/union/enum opening braces,
    in first line to ease grep games.

    struct something
    {

    becomes :

    struct something {

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

07 Oct, 2009

1 commit

  • Jarek Poplawski a écrit :
    >
    >
    > Hmm... So you made me to do some "real" work here, and guess what?:
    > there is one serious checkpatch warning! ;-) Plus, this new parameter
    > should be added to the function description. Otherwise:
    > Signed-off-by: Jarek Poplawski
    >
    > Thanks,
    > Jarek P.
    >
    > PS: I guess full "Don't" would show we really mean it...

    Okay :) Here is the last round, before the night !

    Thanks again

    [RFC] pkt_sched: gen_estimator: Don't report fake rate estimators

    We currently send TCA_STATS_RATE_EST elements to netlink users, even if no estimator
    is running.

    # tc -s -d qdisc
    qdisc pfifo_fast 0: dev eth0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
    Sent 112833764978 bytes 1495081739 pkt (dropped 0, overlimits 0 requeues 0)
    rate 0bit 0pps backlog 0b 0p requeues 0

    User has no way to tell if the "rate 0bit 0pps" is a real estimation, or a fake
    one (because no estimator is active)

    After this patch, tc command output is :
    $ tc -s -d qdisc
    qdisc pfifo_fast 0: dev eth0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
    Sent 561075 bytes 1196 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    We add a parameter to gnet_stats_copy_rate_est() function so that
    it can use gen_estimator_active(bstats, r), as suggested by Jarek.

    This parameter can be NULL if check is not necessary, (htb for
    example has a mandatory rate estimator)

    Signed-off-by: Eric Dumazet
    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Eric Dumazet
     

18 Aug, 2009

1 commit

  • In 5e140dfc1fe87eae27846f193086724806b33c7d "net: reorder struct Qdisc
    for better SMP performance" the definition of struct gnet_stats_basic
    changed incompatibly, as copies of this struct are shipped to
    userland via netlink.

    Restoring old behavior is not welcome, for performance reason.

    Fix is to use a private structure for kernel, and
    teach gnet_stats_copy_basic() to convert from kernel to user land,
    using legacy structure (struct gnet_stats_basic)

    Based on a report and initial patch from Michael Spang.

    Reported-by: Michael Spang
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

27 Nov, 2008

1 commit

  • Since all other gen_estimator functions use bstats and rate_est params
    together, and searching for them is optimized now, let's use this also
    in gen_estimator_active(). The return type of gen_estimator_active()
    is changed to bool, and gen_find_node() parameters to const, btw.

    In tcf_act_police_locate() a check for ACT_P_CREATED is added before
    calling gen_estimator_active().

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     

26 Nov, 2008

1 commit

  • Found that while trying average rate policing, it was possible to
    request average rate policing without a rate estimator. This results
    in no policing which is harmless but incorrect.

    Since policing could be setup in two steps, need to check
    in the kernel.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     

29 Jan, 2008

1 commit

  • Convert packet schedulers to use the netlink API. Unfortunately a gradual
    conversion is not possible without breaking compilation in the middle or
    adding lots of casts, so this patch converts them all in one step. The
    patch has been mostly generated automatically with some minor edits to
    at least allow seperate conversion of classifiers and actions.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds