02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boilerplate text.
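
    As an illustration (not taken from this patch), the identifier is a
    single comment on the first line of a file; for a C source file it
    typically looks like:

        // SPDX-License-Identifier: GPL-2.0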

    This patch is based on work done by Thomas Gleixner, Kate Stewart, and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier should be applied
    to a file was done in a spreadsheet of side-by-side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files, created by Philippe Ombredanne. Philippe prepared the
    base worksheet and did an initial spot review of a few thousand files.

    The 4.13 kernel was the starting point of the analysis, with 60,537 files
    assessed. Kate Stewart did a file-by-file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    should be applied to each file. She confirmed any determination that was
    not immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
      lines of source.
    - File already had some variant of a license header in it (even if <5
      lines).

    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

12 Aug, 2017

1 commit


18 May, 2017

1 commit


13 Mar, 2017

1 commit

  • The original reason [1] for having hidden qdiscs (potential scalability
    issues in qdisc_match_from_root() with a single linked list in the case
    of a large number of qdiscs) has been invalidated by 59cc1f61f0 ("net:
    sched: convert qdisc linked list to hashtable").

    This allows us to bring more clarity and determinism into the dump by
    making the default pfifo qdiscs visible.

    We're not turning this on by default though, as it was deemed [2] too
    intrusive / an unnecessary change of default behavior towards userspace.
    Instead, a TCA_DUMP_INVISIBLE netlink attribute is introduced, which
    allows applications to request a complete qdisc hierarchy dump, including
    the ones that have always been implicit/invisible.
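
    A minimal sketch of how such a dump-time filter could look; the flag and
    helper names below are assumptions made for illustration, not verbatim
    kernel code:

        /* Illustrative only: skip qdiscs marked invisible unless the dump
         * request carried TCA_DUMP_INVISIBLE. */
        static bool sketch_dump_skip(const struct Qdisc *q, bool dump_invisible)
        {
                if ((q->flags & TCQ_F_INVISIBLE) && !dump_invisible)
                        return true;    /* hidden default qdisc, not requested */
                return false;
        }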

    Singleton noop_qdisc stays invisible, as teaching the whole infrastructure
    about singletons would require quite some surgery with very little gain
    (seeing no qdisc or seeing noop qdisc in the dump is probably setting
    the same user expectation).

    [1] http://lkml.kernel.org/r/1460732328.10638.74.camel@edumazet-glaptop3.roam.corp.google.com
    [2] http://lkml.kernel.org/r/20161021.105935.1907696543877061916.davem@davemloft.net

    Signed-off-by: Jiri Kosina
    Signed-off-by: David S. Miller

    Jiri Kosina
     

08 Nov, 2016

1 commit

  • The default TX queue length of Ethernet devices has been a magic
    constant of 1000 ever since the initial git import.

    Looking back in historical trees[1][2], the value used to be 100,
    with the same comment "Ethernet wants good queues". The commit[3]
    that changed this from 100 to 1000 didn't describe why, but from
    conversations with Robert Olsson it seems that it was changed
    when Ethernet devices went from 100Mbit/s to 1Gbit/s: since the
    link speed increased 10x, the queue size was also adjusted. This
    value later caused much heartache for the bufferbloat community.

    This patch merely moves the value into a defined constant.
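
    A minimal sketch of what such a change amounts to; the macro name is
    illustrative (check the tree for the actual definition):

        /* Illustrative only: hoist the magic number into a named constant
         * so the policy lives in one documented place. */
        #define DEFAULT_TX_QUEUE_LEN    1000

        dev->tx_queue_len = DEFAULT_TX_QUEUE_LEN;       /* was a bare 1000 */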

    [1] https://git.kernel.org/cgit/linux/kernel/git/davem/netdev-vger-cvs.git/
    [2] https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/
    [3] https://git.kernel.org/tglx/history/c/98921832c232

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     

11 Aug, 2016

1 commit

  • Convert the per-device linked list into a hashtable. The primary
    motivation for this change is that currently, we're not tracking all the
    qdiscs in the hierarchy (e.g. default qdiscs are excluded), as the lookup
    performed over the linked list by qdisc_match_from_root() is rather
    expensive.

    The ultimate goal is to get rid of hidden qdiscs completely, which will
    bring much more determinism into the user experience.
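
    A minimal sketch of the hashtable pattern described, using the kernel's
    linux/hashtable.h helpers; the struct and names below are assumptions for
    illustration, not the actual code in the tree:

        #include <linux/hashtable.h>

        struct sketch_qdisc {
                u32 handle;
                struct hlist_node hash_node;
        };

        static DEFINE_HASHTABLE(sketch_qdisc_hash, 8);  /* 2^8 buckets */

        /* insert keyed by handle instead of list_add() */
        static void sketch_hash_add(struct sketch_qdisc *q)
        {
                hash_add(sketch_qdisc_hash, &q->hash_node, q->handle);
        }

        /* bucket-local lookup instead of walking the whole list */
        static struct sketch_qdisc *sketch_lookup(u32 handle)
        {
                struct sketch_qdisc *q;

                hash_for_each_possible(sketch_qdisc_hash, q, hash_node, handle)
                        if (q->handle == handle)
                                return q;
                return NULL;
        }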

    Reviewed-by: Cong Wang
    Signed-off-by: Jiri Kosina
    Signed-off-by: David S. Miller

    Jiri Kosina
     

11 Jun, 2016

1 commit

  • __QDISC_STATE_THROTTLED bit manipulation is rather expensive
    for HTB and a few others.

    I already removed it for sch_fq in commit f2600cf02b5b
    ("net: sched: avoid costly atomic operation in fq_dequeue()")
    and so far nobody complained.

    When one or more packets are stuck in one or more throttled
    HTB classes, an htb dequeue() performs two atomic operations
    to clear/set the __QDISC_STATE_THROTTLED bit, while the root qdisc
    lock is held.
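
    A minimal sketch of that per-dequeue pattern; the helper and the shape of
    the function are assumptions for illustration, not the actual htb code:

        /* Illustrative only: two atomic RMW bit operations on shared qdisc
         * state on every dequeue pass, with the root qdisc lock already held. */
        static struct sk_buff *sketch_dequeue(struct Qdisc *sch)
        {
                struct sk_buff *skb;

                clear_bit(__QDISC_STATE_THROTTLED, &sch->state); /* atomic #1 */
                skb = sketch_try_classes(sch);  /* hypothetical helper */
                if (!skb)
                        set_bit(__QDISC_STATE_THROTTLED, &sch->state); /* atomic #2 */
                return skb;
        }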

    Removing this pair of atomic operations brings me an 8% performance
    increase on 200 TCP_RR tests, in the presence of throttled classes.

    This patch has no side effect, since nothing actually uses
    qdisc_is_throttled() anymore.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

25 May, 2016

1 commit

  • I found a serious performance bug in packet schedulers using hrtimers.

    sch_htb and sch_fq are definitely impacted by this problem.

    We constantly rearm high resolution timers if some packets are throttled
    in one (or more) class, while other packets are flying through the qdisc
    on another (non-throttled) class.

    hrtimer_start() does not have the mod_timer() trick of doing nothing if
    the expires value does not change:

        if (timer_pending(timer) &&
            timer->expires == expires)
                return 1;

    This issue is particularly visible when multiple cpus can queue/dequeue
    packets on the same qdisc, as hrtimer code has to lock a remote base.

    I used the following fix (see the sketch after this list):

    1) Change htb to use qdisc_watchdog_schedule_ns() instead of open-coding
    it.

    2) Cache watchdog prior expiration. hrtimer might provide this, but I
    prefer to not rely on some hrtimer internal.
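
    A minimal sketch of item 2; the cached field and function shape are
    assumptions for illustration, not necessarily the actual kernel code:

        /* Illustrative only: skip the (possibly remote) hrtimer_start() when
         * the requested expiry did not change since the last call. */
        static void sketch_watchdog_schedule_ns(struct qdisc_watchdog *wd,
                                                u64 expires)
        {
                if (wd->last_expires == expires)
                        return;                 /* timer is already armed for this */

                wd->last_expires = expires;
                hrtimer_start(&wd->timer, ns_to_ktime(expires),
                              HRTIMER_MODE_ABS_PINNED);
        }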

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

28 Aug, 2015

1 commit

  • For classifiers getting invoked via tc_classify(), we always need an
    extra function call into tc_classify_compat(), as both are being
    exported as symbols and tc_classify() itself doesn't do much except
    handling of reclassifications when tp->classify() returned with
    TC_ACT_RECLASSIFY.

    CBQ and ATM are the only qdiscs that directly call into tc_classify_compat(),
    all others use tc_classify(). When tc actions are being configured
    out in the kernel, tc_classify() effectively does nothing besides
    delegating.

    We could spare this layer and consolidate both functions. pktgen on a
    single CPU, constantly pushing skbs directly into the netif_receive_skb()
    path with a dummy classifier attached to the ingress qdisc, improves
    slightly from 22.3Mpps to 23.1Mpps.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

14 Jan, 2015

2 commits


06 Oct, 2014

1 commit

  • The standard qdisc API to set up a timer implies an atomic operation on
    every packet dequeue: qdisc_unthrottled().

    It turns out this is not really needed for FQ, as FQ has no concept of
    global qdisc throttling: being a qdisc handling many different flows,
    some of them can be throttled while others are not.

    The fix is straightforward: add a 'bool throttle' parameter to
    qdisc_watchdog_schedule_ns(), and remove the calls to qdisc_unthrottled()
    in sch_fq.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

04 Oct, 2014

1 commit

  • Validation of an skb can be pretty expensive:

    GSO segmentation and/or checksum computations.

    We can do this without holding the qdisc lock, so that other cpus
    can queue additional packets.

    The trick is that requeued packets were already validated, so we carry
    a boolean so that sch_direct_xmit() can either validate a fresh skb list
    or directly use an old one.

    Tested on a 40Gb NIC (8 TX queues) with 200 concurrent flows on a
    48-thread host.

    Turning TSO on or off had no effect on throughput, only a few more cpu
    cycles. Lock contention on the qdisc lock disappeared.

    Same when disabling TX checksum offload.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

23 Aug, 2014

1 commit


12 Jun, 2014

1 commit

  • The DRR scheduler requires that items on the active list are work
    conserving, i.e. do not hold on to skbs for throttling purposes, etc.
    Attaching e.g. tbf renders DRR useless because all other classes on the
    active list are delayed as well.

    So, warn users that this configuration won't work as expected; we
    already do this in a couple of other qdiscs, see e.g.

    commit b00355db3f88d96810a60011a30cfb2c3469409d
    ('pkt_sched: sch_hfsc: sch_htb: Add non-work-conserving warning handler')

    The 'const' change is needed to avoid compiler warning ("discards 'const'
    qualifier from pointer target type").

    tested with:

    drr_hier() {
        parent=$1
        classes=$2
        for i in $(seq 1 $classes); do
            classid=$parent$(printf %x $i)
            tc class add dev eth0 parent $parent classid $classid drr
            tc qdisc add dev eth0 parent $classid tbf rate 64kbit burst 256kbit limit 64kbit
        done
    }

    tc qdisc add dev eth0 root handle 1: drr
    drr_hier 1: 32
    tc filter add dev eth0 protocol all pref 1 parent 1: handle 1 flow hash keys dst perturb 1 divisor 32

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     

10 Dec, 2013

1 commit

  • Commit 6da7c8fcbcbd ("qdisc: allow setting default queuing discipline")
    added the ability to change the default qdisc from pfifo_fast to, say, fq.

    But as most modern ethernet devices are multiqueue, we can't really
    see all the statistics from "tc -s qdisc show", as the default root
    qdisc is mq.

    This patch adds the calls to qdisc_list_add() to mq and mqprio.

    Signed-off-by: Eric Dumazet
    Cc: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
     

31 Aug, 2013

1 commit

  • The pfifo_fast queue discipline has been used by default
    for all devices. But we have better choices now.

    This patch allows setting the default queueing discipline with sysctl.
    This allows easy use of better queueing disciplines on all devices
    without having to use tc qdisc scripts. It is intended to provide
    an easy path for distributions to make fq_codel or sfq the default
    qdisc.

    This patch also makes pfifo_fast more of a first-class qdisc, since
    it is now possible to manually override the default and explicitly
    use pfifo_fast. The behavior for systems that do not use the sysctl
    is unchanged; they still get pfifo_fast.

    Also removes leftover random # in sysctl net core.

    Signed-off-by: Stephen Hemminger
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    stephen hemminger
     

01 Aug, 2013

1 commit

  • There is a mix of function prototypes with and without extern
    in the kernel sources. Standardize on not using extern for
    function prototypes.

    Function prototypes don't need to be written with extern.
    extern is assumed by the compiler. Its use is as unnecessary as
    using auto to declare automatic/local variables in a block.

    Reflow modified prototypes to 80 columns.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

13 Feb, 2013

1 commit


16 Apr, 2012

1 commit


06 Jul, 2011

1 commit


31 Mar, 2011

1 commit


02 Jun, 2010

1 commit


02 Apr, 2010

1 commit

  • One of my test machines got a deadlock during "tc" sessions,
    adding/deleting classes & filters, using traffic estimators.

    After some analysis, I believe we have a potential use-after-free case
    in est_timer():

        spin_lock(e->stats_lock);   << HERE >>
        read_lock(&est_lock);
        if (e->bstats == NULL)      << TEST >>
                goto skip;

    The test is done a bit late, because after the estimator is killed, and
    before the rcu grace period has elapsed, we might already have freed or
    reused the memory that e->stats_lock points to (some qdisc->q.lock).

    A possible fix is to respect an rcu grace period at Qdisc dismantle time.

    On 64bit, sizeof(struct Qdisc) is exactly 192 bytes. Adding 16 bytes to
    it (for struct rcu_head) is a problem because it might change
    performance, given QDISC_ALIGNTO is 32 bytes.

    This is why I also change QDISC_ALIGNTO to 64 bytes, to satisfy most
    current alignment requirements.
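
    A minimal sketch of an RCU-deferred free like the one described above;
    the struct and function names are illustrative, not the actual code:

        #include <linux/rcupdate.h>
        #include <linux/slab.h>
        #include <linux/spinlock.h>

        /* Illustrative only: free the qdisc one RCU grace period after it is
         * dismantled, so est_timer() readers never touch reused memory. */
        struct sketch_qdisc {
                spinlock_t q_lock;              /* what e->stats_lock points to */
                struct rcu_head rcu_head;       /* the extra 16 bytes on 64bit */
        };

        static void sketch_qdisc_rcu_free(struct rcu_head *head)
        {
                kfree(container_of(head, struct sketch_qdisc, rcu_head));
        }

        static void sketch_qdisc_destroy(struct sketch_qdisc *q)
        {
                call_rcu(&q->rcu_head, sketch_qdisc_rcu_free);
        }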

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

29 Jan, 2010

1 commit

  • This adds an additional queuing strategy, called pfifo_head_drop,
    to remove the oldest skb in the case of an overflow within the queue -
    the head element - instead of the last skb (tail). Removing the oldest
    skb in congested situations is useful for sensor network environments
    where the newer packets carry the more valuable information.
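
    A minimal sketch of the head-drop idea using kernel skb queue helpers;
    this illustrates the strategy, not the actual sch_fifo implementation:

        #include <linux/skbuff.h>

        /* Illustrative only: on overflow, drop the oldest packet (the queue
         * head) and admit the newest at the tail. */
        static void sketch_head_drop_enqueue(struct sk_buff_head *q,
                                             struct sk_buff *skb,
                                             unsigned int limit)
        {
                if (skb_queue_len(q) >= limit)
                        kfree_skb(__skb_dequeue(q));    /* oldest goes away */
                __skb_queue_tail(q, skb);               /* newest is kept */
        }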

    Reviewed-by: Florian Westphal
    Acked-by: Patrick McHardy
    Signed-off-by: Hagen Paul Pfeifer
    Signed-off-by: David S. Miller

    Hagen Paul Pfeifer
     

04 Nov, 2009

1 commit

  • This cleanup patch puts struct/union/enum opening braces
    on the first line to ease grep games.

    struct something
    {

    becomes:

    struct something {

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

07 Aug, 2009

1 commit

  • dev_queue_xmit enqueues a skb and calls qdisc_run, which
    dequeues the skb and xmits it. In most cases, the skb that
    is enqueued is the same one that is dequeued (unless the
    queue gets stopped or multiple cpus write to the same queue
    and end up racing with qdisc_run). For default qdiscs, we
    can remove the redundant enqueue/dequeue and simply xmit the
    skb, since the default qdisc is work-conserving.

    The patch uses a new flag - TCQ_F_CAN_BYPASS - to identify the
    default fast queue. The controversial part of the patch is
    incrementing qlen when a skb is requeued - this is to avoid
    checks like the second line below:

        + } else if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q) &&
        >>           !q->gso_skb &&
        +            !test_and_set_bit(__QDISC_STATE_RUNNING, &q->state)) {

    Results of 2 hours of testing for multiple netperf sessions (1,
    2, 4, 8, 12 sessions on a 4-cpu system-X). The BW numbers are
    aggregate Mb/s across iterations tested with this version on
    System-X boxes with Chelsio 10Gbps cards:

    ----------------------------------
    Size | ORG BW NEW BW |
    ----------------------------------
    128K | 156964 159381 |
    256K | 158650 162042 |
    ----------------------------------

    Changes from ver1:

    1. Move sch_direct_xmit declaration from sch_generic.h to
    pkt_sched.h
    2. Update qdisc basic statistics for direct xmit path.
    3. Set qlen to zero in qdisc_reset.
    4. Changed some function names to more meaningful ones.

    Signed-off-by: Krishna Kumar
    Signed-off-by: David S. Miller

    Krishna Kumar
     

15 Jun, 2009

1 commit


09 Jun, 2009

2 commits

  • Change PSCHED_SHIFT from 10 to 6 to increase the schedulers' time
    resolution. This increases the number of (internal) ticks per nanosecond
    16x, and is needed to improve the accuracy of schedulers based on
    rate tables, like HTB, TBF or CBQ, with rates above 100Mbit. It is
    assumed this change is safe for 32bit accounting of time diffs up
    to 2 minutes, which should be enough for common use (extremely low
    rate values may overflow, so they get inaccurate instead). To make full
    use of this change an updated iproute2 will be needed. (But using
    older iproute2 should be safe too.)

    This change breaks the ticks - microseconds similarity, so some minor
    code fixes might be needed. It is also planned to change the naming
    accordingly, e.g. to PSCHED_TICKS2NS() etc., in the near future.

    Reported-by: Antonio Almeida
    Tested-by: Antonio Almeida
    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     
  • Use PSCHED_SHIFT constant instead of '10' in PSCHED_US2NS() and
    PSCHED_NS2US() macros to enable changing this value later.

    Additionally, use PSCHED_SHIFT in the sch_hfsc SM_SHIFT and ISM_SHIFT
    definitions. This part of the patch is based on feedback from
    Patrick McHardy.
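
    A minimal sketch of the parameterization described; the definitions are
    reconstructed for illustration and may not match the kernel source
    exactly:

        /* Illustrative only: express the conversion macros in terms of
         * PSCHED_SHIFT so the resolution can be changed in one place
         * (10 is the pre-change value discussed above). */
        #define PSCHED_SHIFT        10
        #define PSCHED_US2NS(x)     ((s64)(x) << PSCHED_SHIFT)
        #define PSCHED_NS2US(x)     ((x) >> PSCHED_SHIFT)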

    Reported-by: Antonio Almeida
    Tested-by: Antonio Almeida
    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     

01 Feb, 2009

1 commit


23 Sep, 2008

1 commit

  • The current check wrongly uses the state of one (currently the first)
    tx queue for all tx queues in the case of non-default qdiscs. This check
    mainly prevented a requeuing loop with __netif_schedule(), but now this
    is controlled inside __qdisc_run(), while dequeuing. The wrongness of
    this check was first noticed by Herbert Xu.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     

22 Aug, 2008

1 commit

  • Since some qdiscs call qdisc_tree_decrease_qlen() (so qdisc_lookup())
    without rtnl_lock(), adding and deleting from a qdisc list needs
    additional locking. This patch adds a global spinlock, qdisc_list_lock,
    and wrapper functions for modifying the list. It is considered a
    temporary solution until hfsc_dequeue(), netem_dequeue() and
    tbf_dequeue() (or qdisc_tree_decrease_qlen()) are redone.
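
    A minimal sketch of that pattern (a global lock plus add/del wrappers);
    the list head, struct, and names are illustrative, not the actual code
    in sch_api.c:

        #include <linux/list.h>
        #include <linux/spinlock.h>

        /* Illustrative only: serialize list updates that may run without
         * rtnl_lock(). */
        static DEFINE_SPINLOCK(qdisc_list_lock);
        static LIST_HEAD(sketch_qdisc_list);

        struct sketch_qdisc {
                struct list_head list;
        };

        static void sketch_list_add(struct sketch_qdisc *q)
        {
                spin_lock_bh(&qdisc_list_lock);
                list_add_tail(&q->list, &sketch_qdisc_list);
                spin_unlock_bh(&qdisc_list_lock);
        }

        static void sketch_list_del(struct sketch_qdisc *q)
        {
                spin_lock_bh(&qdisc_list_lock);
                list_del(&q->list);
                spin_unlock_bh(&qdisc_list_lock);
        }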

    With feedback from Herbert Xu and David S. Miller.

    Signed-off-by: Jarek Poplawski
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Jarek Poplawski
     

13 Aug, 2008

1 commit

  • Based upon a bug report by Andrew Gallatin on netdev
    with subject "CPU utilization increased in 2.6.27rc"

    In commit 37437bb2e1ae8af470dfcd5b4ff454110894ccaf
    ("pkt_sched: Schedule qdiscs instead of netdev_queue.")
    the test of the queue being stopped was erroneously
    removed from qdisc_run().

    When the TX queue of the device fills up, this omission
    causes lots of extraneous useless work to be queued up
    to softirq context, where we'll just return immediately
    because the device is still stuffed up.

    Signed-off-by: David S. Miller

    David S. Miller
     

20 Jul, 2008

1 commit


18 Jul, 2008

3 commits

  • When we have shared qdiscs, packets come out of the qdiscs
    for multiple transmit queues.

    Therefore it doesn't make any sense to schedule the transmit
    queue when logically we cannot know ahead of time the TX
    queue of the SKB that the qdisc->dequeue() will give us.

    Just for sanity I added a BUG check to make sure we never
    get into a state where the noop_qdisc is scheduled.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Currently it is associated with a netdev_queue, but when we have
    qdisc sharing that no longer makes any sense.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • This effectively "flips the switch" by making the core networking
    and multiqueue-aware drivers use the new TX multiqueue structures.

    Non-multiqueue drivers need no changes. The interfaces they use such
    as netif_stop_queue() degenerate into an operation on TX queue zero.
    So everything "just works" for them.

    Code that really wants to do "X" to all TX queues now invokes a
    routine that does so, such as netif_tx_wake_all_queues(),
    netif_tx_stop_all_queues(), etc.

    pktgen and netpoll required a little bit more surgery than the others.

    In particular the pktgen changes, whilst functional, could be largely
    improved. The initial check in pktgen_xmit() will sometimes check the
    wrong queue, which is mostly harmless. The thing to do is probably to
    invoke fill_packet() earlier.

    The bulk of the netpoll changes is to make the code operate solely on
    the TX queue indicated by the SKB queue mapping.

    Setting of the SKB queue mapping is entirely confined inside of
    net/core/dev.c:dev_pick_tx(). If we end up needing any kind of
    special semantics (drops, for example) it will be implemented here.

    Finally, we now have a "real_num_tx_queues" which is where the driver
    indicates how many TX queues are actually active.

    With IGB changes from Jeff Kirsher.

    Signed-off-by: David S. Miller

    David S. Miller
     

09 Jul, 2008

2 commits