04 Jan, 2012

1 commit

  • When trying to allocate ~32768 qdiscs using autohandle mechanism, we can
    fill the space managed by kernel (handles in [8000-FFFF]:0000 range)

    But O(N^2) qdisc_alloc_handle() loops 0x10000 times instead of 0x8000

    time tc add qdisc add dev eth0 parent 10:7fff pfifo limit 10
    RTNETLINK answers: Cannot allocate memory
    real 1m54.826s
    user 0m0.000s
    sys 0m0.004s

    INFO: rcu_sched_state detected stall on CPU 0 (t=60000 jiffies)

    Half number of loops, and add a cond_resched() call.
    We hold rtnl at this point.

    Signed-off-by: Eric Dumazet
    CC: Dave Taht
    Signed-off-by: David S. Miller

    Eric Dumazet
     

06 Jul, 2011

1 commit


10 Jun, 2011

1 commit

  • The message size allocated for rtnl ifinfo dumps was limited to
    a single page. This is not enough for additional interface info
    available with devices that support SR-IOV and caused a bug in
    which VF info would not be displayed if more than approximately
    40 VFs were created per interface.

    Implement a new function pointer for the rtnl_register service that will
    calculate the amount of data required for the ifinfo dump and allocate
    enough data to satisfy the request.

    Signed-off-by: Greg Rose
    Signed-off-by: Jeff Kirsher

    Greg Rose
     

20 May, 2011

1 commit


26 Feb, 2011

1 commit


21 Jan, 2011

2 commits

  • This patch converts stab qdisc management to RCU, so that we can perform
    the qdisc_calculate_pkt_len() call before getting qdisc lock.

    This shortens the lock's held time in __dev_xmit_skb().

    This permits more qdiscs to get TCQ_F_CAN_BYPASS status, avoiding lot of
    cache misses and so reducing latencies.

    Signed-off-by: Eric Dumazet
    CC: Patrick McHardy
    CC: Jesper Dangaard Brouer
    CC: Jarek Poplawski
    CC: Jamal Hadi Salim
    CC: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • In commit 371121057607e (net: QDISC_STATE_RUNNING dont need atomic bit
    ops) I moved QDISC_STATE_RUNNING flag to __state container, located in
    the cache line containing qdisc lock and often dirtied fields.

    I now move TCQ_F_THROTTLED bit too, so that we let first cache line read
    mostly, and shared by all cpus. This should speedup HTB/CBQ for example.

    Not using test_bit()/__clear_bit()/__test_and_set_bit allows to use an
    "unsigned int" for __state container, reducing by 8 bytes Qdisc size.

    Introduce helpers to hide implementation details.

    Signed-off-by: Eric Dumazet
    CC: Patrick McHardy
    CC: Jesper Dangaard Brouer
    CC: Jarek Poplawski
    CC: Jamal Hadi Salim
    CC: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
     

20 Jan, 2011

1 commit


05 Oct, 2010

1 commit

  • ingress being not used very much, and net_device->ingress_queue being
    quite a big object (128 or 256 bytes), use a dynamic allocation if
    needed (tc qdisc add dev eth0 ingress ...)

    dev_ingress_queue(dev) helper should be used only with RTNL taken.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

30 Sep, 2010

1 commit


19 Aug, 2010

1 commit


11 Aug, 2010

1 commit


10 Aug, 2010

1 commit


24 May, 2010

1 commit

  • Ben Pfaff reported a kernel oops and provided a test program to
    reproduce it.

    https://kerneltrap.org/mailarchive/linux-netdev/2010/5/21/6277805

    tc_fill_qdisc() should not be called for builtin qdisc, or it
    dereference a NULL pointer to get device ifindex.

    Fix is to always use tc_qdisc_dump_ignore() before calling
    tc_fill_qdisc().

    Reported-by: Ben Pfaff
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

18 May, 2010

1 commit

  • If the user has a bad classification configuration, and gets a packet
    that goes through too many steps. Chances are more packets will arrive,
    and the message spew will overrun syslog because it is not rate limited.
    And because it is not tagged with appropriate priority it can't not be screened.

    Added the qdisc to the message to try and give some more context when
    the message does arrive.

    Signed-off-by: Stephen Hemminger
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    stephen hemminger
     

12 Apr, 2010

1 commit


31 Mar, 2010

1 commit

  • These changes were suggested by Alexey Dobriyan :

    - psched_show() does not use any private data so just pass NULL to
    psched_open()

    - remove unnecessary return statement

    Signed-off-by: Tom Goff
    Signed-off-by: David S. Miller

    Tom Goff
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

23 Mar, 2010

1 commit


29 Jan, 2010

1 commit

  • This adds an additional queuing strategy, called pfifo_head_drop,
    to remove the oldest skb in the case of an overflow within the queue -
    the head element - instead of the last skb (tail). To remove the oldest
    skb in congested situations is useful for sensor network environments
    where newer packets reflect the superior information.

    Reviewed-by: Florian Westphal
    Acked-by: Patrick McHardy
    Signed-off-by: Hagen Paul Pfeifer
    Signed-off-by: David S. Miller

    Hagen Paul Pfeifer
     

26 Nov, 2009

1 commit

  • Generated with the following semantic patch

    @@
    struct net *n1;
    struct net *n2;
    @@
    - n1 == n2
    + net_eq(n1, n2)

    @@
    struct net *n1;
    struct net *n2;
    @@
    - n1 != n2
    + !net_eq(n1, n2)

    applied over {include,net,drivers/net}.

    Signed-off-by: Octavian Purdila
    Signed-off-by: David S. Miller

    Octavian Purdila
     

11 Nov, 2009

1 commit


07 Oct, 2009

1 commit

  • Jarek Poplawski a écrit :
    >
    >
    > Hmm... So you made me to do some "real" work here, and guess what?:
    > there is one serious checkpatch warning! ;-) Plus, this new parameter
    > should be added to the function description. Otherwise:
    > Signed-off-by: Jarek Poplawski
    >
    > Thanks,
    > Jarek P.
    >
    > PS: I guess full "Don't" would show we really mean it...

    Okay :) Here is the last round, before the night !

    Thanks again

    [RFC] pkt_sched: gen_estimator: Don't report fake rate estimators

    We currently send TCA_STATS_RATE_EST elements to netlink users, even if no estimator
    is running.

    # tc -s -d qdisc
    qdisc pfifo_fast 0: dev eth0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
    Sent 112833764978 bytes 1495081739 pkt (dropped 0, overlimits 0 requeues 0)
    rate 0bit 0pps backlog 0b 0p requeues 0

    User has no way to tell if the "rate 0bit 0pps" is a real estimation, or a fake
    one (because no estimator is active)

    After this patch, tc command output is :
    $ tc -s -d qdisc
    qdisc pfifo_fast 0: dev eth0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
    Sent 561075 bytes 1196 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    We add a parameter to gnet_stats_copy_rate_est() function so that
    it can use gen_estimator_active(bstats, r), as suggested by Jarek.

    This parameter can be NULL if check is not necessary, (htb for
    example has a mandatory rate estimator)

    Signed-off-by: Eric Dumazet
    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Eric Dumazet
     

16 Sep, 2009

1 commit


15 Sep, 2009

3 commits

  • After the recent mq change there is the new select_queue qdisc class
    method used in tc_modify_qdisc, but it works OK only for direct child
    qdiscs of mq qdisc. Grandchildren always get the first tx queue, which
    would give wrong qdisc_root etc. results (e.g. for sch_htb as child of
    sch_prio). This patch fixes it by using parent's dev_queue for such
    grandchildren qdiscs. The select_queue method's return type is changed
    BTW.

    With feedback from: Patrick McHardy

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     
  • After the recent mq change using ingress qdisc overwrites dev->qdisc;
    there is also a wrong old qdisc pointer passed to notify_and_destroy.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1623 commits)
    netxen: update copyright
    netxen: fix tx timeout recovery
    netxen: fix file firmware leak
    netxen: improve pci memory access
    netxen: change firmware write size
    tg3: Fix return ring size breakage
    netxen: build fix for INET=n
    cdc-phonet: autoconfigure Phonet address
    Phonet: back-end for autoconfigured addresses
    Phonet: fix netlink address dump error handling
    ipv6: Add IFA_F_DADFAILED flag
    net: Add DEVTYPE support for Ethernet based devices
    mv643xx_eth.c: remove unused txq_set_wrr()
    ucc_geth: Fix hangs after switching from full to half duplex
    ucc_geth: Rearrange some code to avoid forward declarations
    phy/marvell: Make non-aneg speed/duplex forcing work for 88E1111 PHYs
    drivers/net/phy: introduce missing kfree
    drivers/net/wan: introduce missing kfree
    net: force bridge module(s) to be GPL
    Subject: [PATCH] appletalk: Fix skb leak when ipddp interface is not loaded
    ...

    Fixed up trivial conflicts:

    - arch/x86/include/asm/socket.h

    converted to in the x86 tree. The generic
    header has the same new #define's, so that works out fine.

    - drivers/net/tun.c

    fix conflict between 89f56d1e9 ("tun: reuse struct sock fields") that
    switched over to using 'tun->socket.sk' instead of the redundantly
    available (and thus removed) 'tun->sk', and 2b980dbd ("lsm: Add hooks
    to the TUN driver") which added a new 'tun->sk' use.

    Noted in 'next' by Stephen Rothwell.

    Linus Torvalds
     

10 Sep, 2009

1 commit

  • When new child qdiscs are attached to the mq qdisc, they are actually
    attached as root qdiscs to the device queues. The lock selection for
    new estimators incorrectly picks the root lock of the existing and
    to be replaced qdisc, which results in a use-after-free once the old
    qdisc has been destroyed.

    Mark mq qdisc instances with a new flag and treat qdiscs attached to
    mq as children similar to regular root qdiscs.

    Additionally prevent estimators from being attached to the mq qdisc
    itself since it only updates its byte and packet counters during dumps.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

06 Sep, 2009

4 commits

  • This patch adds a classful dummy scheduler which can be used as root qdisc
    for multiqueue devices and exposes each device queue as a child class.

    This allows to address queues individually and graft them similar to regular
    classes. Additionally it presents an accumulated view of the statistics of
    all real root qdiscs in the dummy root.

    Two new callbacks are added to the qdisc_ops and qdisc_class_ops:

    - cl_ops->select_queue selects the tx queue number for new child classes.

    - qdisc_ops->attach() overrides root qdisc device grafting to attach
    non-shared qdiscs to the queues.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    David S. Miller
     
  • It will be used in a following patch by the multiqueue qdisc.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Currently the multiqueue integration with the qdisc API suffers from
    a few problems:

    - with multiple queues, all root qdiscs use the same handle. This means
    they can't be exposed to userspace in a backwards compatible fashion.

    - all API operations always refer to queue number 0. Newly created
    qdiscs are automatically shared between all queues, its not possible
    to address individual queues or restore multiqueue behaviour once a
    shared qdisc has been attached.

    - Dumps only contain the root qdisc of queue 0, in case of non-shared
    qdiscs this means the statistics are incomplete.

    This patch reintroduces dev->qdisc, which points to the (single) root qdisc
    from userspace's point of view. Currently it either points to the first
    (non-shared) default qdisc, or a qdisc shared between all queues. The
    following patches will introduce a classful dummy qdisc, which will be used
    as root qdisc and contain the per-queue qdiscs as children.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Some schedulers don't support creating, changing or deleting classes.
    Make the respective callbacks optionally and consistently return
    -EOPNOTSUPP for unsupported operations, instead of currently either
    -EOPNOTSUPP, -ENOSYS or no error.

    In case of sch_prio and sch_multiq, the removed operations additionally
    checked for an invalid class. This is not necessary since the class
    argument can only orginate from ->get() or in case of ->change is 0
    for creation of new classes, in which case ->change() incorrectly
    returned -ENOENT.

    As a side-effect, this patch fixes a possible (root-only) NULL pointer
    function call in sch_ingress, which didn't implement a so far mandatory
    ->delete() operation.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

05 Sep, 2009

1 commit

  • If the parent qdisc doesn't support classes, use EOPNOTSUPP.
    If the parent class doesn't exist, use ENOENT. Currently EINVAL
    is returned in both cases.

    Additionally check whether grafting is supported and remove a now
    unnecessary graft function from sch_ingress.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

03 Sep, 2009

1 commit


02 Sep, 2009

1 commit

  • These are full of unresolved problems, mainly that conversions don't
    work 1-1 from hrtimers to tasklet_hrtimers because unlike hrtimers
    tasklets can't be killed from softirq context.

    And when a qdisc gets reset, that's exactly what we need to do here.

    We'll work this out in the net-next-2.6 tree and if warranted we'll
    backport that work to -stable.

    This reverts the following 3 changesets:

    a2cb6a4dd470d7a64255a10b843b0d188416b78f
    ("pkt_sched: Fix bogon in tasklet_hrtimer changes.")

    38acce2d7983632100a9ff3fd20295f6e34074a8
    ("pkt_sched: Convert CBQ to tasklet_hrtimer.")

    ee5f9757ea17759e1ce5503bdae2b07e48e32af9
    ("pkt_sched: Convert qdisc_watchdog to tasklet_hrtimer")

    Signed-off-by: David S. Miller

    David S. Miller
     

25 Aug, 2009

1 commit

  • Reported by Stephen Rothwell, luckily it's harmless:

    net/sched/sch_api.c: In function 'qdisc_watchdog':
    net/sched/sch_api.c:460: warning: initialization from incompatible pointer type
    net/sched/sch_cbq.c: In function 'cbq_undelay':
    net/sched/sch_cbq.c:595: warning: initialization from incompatible pointer type

    Signed-off-by: David S. Miller

    David S. Miller
     

23 Aug, 2009

1 commit


15 Jun, 2009

1 commit


01 Feb, 2009

1 commit


23 Dec, 2008

1 commit

  • While implementing a TCQ_F_THROTTLED flag there was used an smp_wmb()
    in qdisc_watchdog(), but since this flag is practically used only in
    sch_netem(), and since it's not even clear what reordering is avoided
    here (TCQ_F_THROTTLED vs. __QDISC_STATE_SCHED?) it seems the barrier
    could be safely removed.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski