04 Jan, 2012

1 commit

  • grp->slot_shift is between 22 and 41, so using 32-bit wide variables is
    probably a typo.

    This could explain the QFQ hangs Dave reported to me after 2^23 packets
    (23 = 64 - 41).
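
    For illustration, a minimal standalone sketch (not the kernel patch
    itself) of why a 32-bit intermediate breaks here: rounding a timestamp
    down to a slot boundary with a shift as large as 41 only works in a
    64-bit type.

    #include <stdint.h>
    #include <stdio.h>

    /* round ts down to a multiple of 2^shift, keeping all 64 bits */
    static uint64_t round_down64(uint64_t ts, unsigned int shift)
    {
            return ts & ~((1ULL << shift) - 1);
    }

    /* buggy variant: the result is truncated to 32 bits, as storing it
     * in a 32-bit wide variable would */
    static uint32_t round_down32(uint64_t ts, unsigned int shift)
    {
            return (uint32_t)(ts & ~((1ULL << shift) - 1));
    }

    int main(void)
    {
            unsigned int shift = 41;  /* upper bound from the message */
            uint64_t ts = 1ULL << 45; /* a plausible virtual timestamp */

            printf("64-bit: %llu\n",
                   (unsigned long long)round_down64(ts, shift));
            printf("32-bit: %u\n", round_down32(ts, shift)); /* prints 0 */
            return 0;
    }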

    Reported-by: Dave Taht
    Signed-off-by: Eric Dumazet
    CC: Stephen Hemminger
    CC: Dave Taht
    Signed-off-by: David S. Miller

    Eric Dumazet
     

25 Dec, 2011

1 commit

  • commit 6373a9a286 (netem: use vmalloc for distribution table) added a
    regression, since vfree() is called while holding a spinlock, with BHs
    disabled.

    Fix this by swapping the pointers inside the critical section and
    freeing the old table after the spinlock is released.

    Also add __GFP_NOWARN to the kmalloc() attempt, since we fall back to
    vmalloc().
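
    A minimal userspace analogue of the swap-then-free pattern described
    above (a pthread spinlock stands in for the qdisc lock; all names are
    illustrative):

    #include <pthread.h>
    #include <stdlib.h>

    /* assumed initialized elsewhere with pthread_spin_init() */
    static pthread_spinlock_t table_lock;
    static int *dist_table;

    void replace_table(int *new_table)
    {
            int *old;

            pthread_spin_lock(&table_lock);
            old = dist_table;
            dist_table = new_table; /* swap inside the critical section */
            pthread_spin_unlock(&table_lock);

            free(old); /* free only after the lock is released */
    }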

    Signed-off-by: Eric Dumazet
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
     

02 Dec, 2011

1 commit

  • On Wednesday, 30 November 2011 at 14:36 -0800, Stephen Hemminger wrote:

    > (Almost) nobody uses RED because they can't figure it out.
    > According to Wikipedia, VJ says that:
    > "there are not one, but two bugs in classic RED."

    RED is useful for high-throughput routers; I doubt many linux machines
    act as such devices.

    I was considering adding Adaptive RED (Sally Floyd, Ramakrishna
    Gummadi, Scott Shenker, August 2001).

    In this version, maxp is dynamic (from 1% to 50%), and the user only
    has to set min_th (the target average queue size); max_th and wq
    (burst in linux RED) are set up automatically.

    By the way, it seems we have a small bug in red_change():

    if (skb_queue_empty(&sch->q))
            red_end_of_idle_period(&q->parms);

    First, if the queue is empty, we should call
    red_start_of_idle_period(&q->parms) instead.

    Second, since we no longer use sch->q but q->qdisc, the test is
    meaningless.

    Oh well...

    [PATCH] sch_red: fix red_change()

    Now that RED is classful, we must check q->qdisc->q.qlen, and if the
    queue is empty, we start an idle period rather than ending it.
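
    A hedged sketch of the corrected check, per the description above:

    if (q->qdisc->q.qlen == 0)
            red_start_of_idle_period(&q->parms);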

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

01 Dec, 2011

1 commit

  • We need rcu_read_lock() protection before using dst_get_neighbour(), and
    we must cache its value (pass it to __teql_resolve()).

    teql_master_xmit() is called under rcu_read_lock_bh() protection, which
    is not enough.
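
    A minimal sketch of the pattern the message describes (the extra
    parameter to __teql_resolve() is assumed from the text, not copied
    from the patch):

    rcu_read_lock();
    n = dst_get_neighbour(dst); /* cache the value under rcu protection */
    res = __teql_resolve(skb, skb_res, dev, n);
    rcu_read_unlock();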

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

25 Oct, 2011

1 commit

  • Dan Siemon would like to add tunnelling support to cls_flow

    This preliminary patch introduces the use of skb_header_pointer() to
    help with this task, while avoiding skb head reallocation caused by
    deep packet inspection.
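
    For context, a minimal sketch of the skb_header_pointer() pattern
    (nhoff, the network header offset, is illustrative): the header bytes
    are copied into a stack buffer when they are not in the linear area,
    instead of forcing a reallocation of the skb head.

    struct iphdr _iph;
    const struct iphdr *iph;

    iph = skb_header_pointer(skb, nhoff, sizeof(_iph), &_iph);
    if (iph == NULL)
            return 0; /* truncated packet */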

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

22 Sep, 2011

1 commit

  • Conflicts:
    MAINTAINERS
    drivers/net/Kconfig
    drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
    drivers/net/ethernet/broadcom/tg3.c
    drivers/net/wireless/iwlwifi/iwl-pci.c
    drivers/net/wireless/iwlwifi/iwl-trans-tx-pcie.c
    drivers/net/wireless/rt2x00/rt2800usb.c
    drivers/net/wireless/wl12xx/main.c

    David S. Miller
     

10 Aug, 2011

1 commit

  • commit 07bd8df5df4369487812bf85a237322ff3569b77
    (sch_sfq: fix peek() implementation) changed sfq to use the generic
    peek helper.

    This makes HFSC complain about a non-work-conserving child qdisc if
    prio with an sfq child is used within hfsc:

    hfsc peeks into the prio qdisc, which then peeks into sfq. The
    returned skb is stashed in sch->gso_skb.

    Next, hfsc tries to dequeue from prio, but prio calls sfq's dequeue
    directly, which may return NULL instead of the previously peeked-at
    skb.

    Have prio call qdisc_dequeue_peeked() instead, so sfq's dequeue() is
    not called in this case.
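
    A hedged sketch of what qdisc_dequeue_peeked() does (the body is
    illustrative, not the exact kernel helper): consume the skb stashed by
    a previous peek before falling back to ->dequeue().

    static struct sk_buff *dequeue_peeked(struct Qdisc *sch)
    {
            struct sk_buff *skb = sch->gso_skb;

            if (skb) {
                    sch->gso_skb = NULL; /* hand out the peeked skb */
                    sch->q.qlen--;
            } else {
                    skb = sch->dequeue(sch);
            }
            return skb;
    }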

    Cc: Eric Dumazet
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     

01 Aug, 2011

1 commit

  • commit 8efa88540635 (sch_sfq: avoid giving spurious NET_XMIT_CN signals)
    forgot to call qdisc_tree_decrease_qlen() to signal upper levels that a
    packet (from another flow) was dropped, leading to various problems.

    With help from Michal Soltys and Michal Pokrywka, who did a bisection.

    Bugzilla ref: https://bugzilla.kernel.org/show_bug.cgi?id=39372
    Debian ref: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=631945
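
    A hedged sketch of the missing bookkeeping (the condition and slot
    names are illustrative): when sfq drops a packet from a different flow
    while still queuing ours, parent qdiscs must be told that the tree's
    qlen shrank.

    if (dropped_slot != this_slot) {
            /* tell upper levels one packet left the tree */
            qdisc_tree_decrease_qlen(sch, 1);
            return NET_XMIT_SUCCESS; /* our own packet was queued */
    }
    return NET_XMIT_CN;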

    Reported-by: Lucas Bocchi
    Reported-and-bisected-by: Michal Pokrywka
    Signed-off-by: Eric Dumazet
    CC: Michal Soltys
    Acked-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Eric Dumazet
     

27 Jun, 2011

1 commit

  • Results on a dummy device can be seen in my netconf 2011 slides. The
    results below are for a 10GigE Intel IXGBE nic on another i5 machine
    with very similar specs to the one used in the netconf 2011 results.
    It turns out this is a hell of a lot worse than dummy, so this patch
    is even more beneficial for 10G.

    Test setup:
    ----------

    The system under test sends packets out; an additional box connected
    directly drops the packets. A prio qdisc was installed on the eth
    device, with the default netdev queue length of 1000 used as-is. Each
    of the 3 prio bands was set to 100 (this did not factor into the
    results).

    5 packet runs were made and the middle 3 picked.

    results
    -------

    The "cpu" column indicates the which cpu the sample
    was taken on,
    The "Pkt runx" carries the number of packets a cpu
    dequeued when forced to be in the "dequeuer" role.
    The "avg" for each run is the number of times each
    cpu should be a "dequeuer" if the system was fair.

    3.0-rc4 (plain)
    cpu Pkt run1 Pkt run2 Pkt run3
    ================================================
    cpu0 21853354 21598183 22199900
    cpu1 431058 473476 393159
    cpu2 481975 477529 458466
    cpu3 23261406 23412299 22894315
    avg 11506948 11490372 11486460

    3.0-rc4 with patch and default weight 64
    cpu Pkt run1 Pkt run2 Pkt run3
    ================================================
    cpu0 13205312 13109359 13132333
    cpu1 10189914 10159127 10122270
    cpu2 10213871 10124367 10168722
    cpu3 13165760 13164767 13096705
    avg 11693714 11639405 11630008

    As you can see the system is still not perfect but
    is a lot better than what it was before...

    At the moment we use the old backlog weight, weight_p, which is 64
    packets. It seems reasonably fine at that value. The system could be
    made fairer by reducing weight_p (as per my presentation), but that
    would also affect the shared backlog weight. Unless deemed necessary,
    I think the default value is fine; if not, we could add yet another
    knob.
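
    A condensed sketch of the bounded dequeue loop described above
    (weight_p is real; the rest is illustrative, not the exact patch):

    void __qdisc_run_sketch(struct Qdisc *q)
    {
            int quota = weight_p; /* 64 packets by default */

            while (qdisc_restart(q)) {
                    /* yield the "dequeuer" role once the quota is
                     * spent, so other cpus get a fair turn */
                    if (--quota <= 0) {
                            __netif_schedule(q);
                            break;
                    }
            }
    }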

    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    jamal
     

22 Jun, 2011

2 commits

  • There are enough instances of this:

    iph->frag_off & htons(IP_MF | IP_OFFSET)

    that a helper function is probably warranted.
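
    Such a helper would plausibly look like this (a sketch; the kernel
    ended up naming it ip_is_fragment()):

    static inline bool ip_is_fragment(const struct iphdr *iph)
    {
            /* true if More-Fragments is set or the fragment offset is
             * non-zero, i.e. this is any fragment of a datagram */
            return (iph->frag_off & htons(IP_MF | IP_OFFSET)) != 0;
    }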

    Signed-off-by: Paul Gortmaker
    Signed-off-by: David S. Miller

    Paul Gortmaker
     
  • Remove the linux/mm.h inclusion from netdevice.h -- it's unused (I've
    checked manually).

    To prevent mm.h inclusion via other channels, also extract the
    "enum dma_data_direction" definition into a separate header. This tiny
    piece is what glues netdevice.h to mm.h via the chain
    "netdevice.h => dmaengine.h => dma-mapping.h => scatterlist.h => mm.h".
    Removing mm.h from scatterlist.h was tried and found not feasible on
    most archs, so the link was cut earlier in the chain.

    Hopefully people are OK with a tiny include file.

    Note that mm_types.h is still dragged in, but that is a separate story.
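
    The extracted header really is tiny; a sketch of it (the kernel named
    it linux/dma-direction.h, and these are the standard DMA API values):

    #ifndef _LINUX_DMA_DIRECTION_H
    #define _LINUX_DMA_DIRECTION_H

    enum dma_data_direction {
            DMA_BIDIRECTIONAL = 0,
            DMA_TO_DEVICE = 1,
            DMA_FROM_DEVICE = 2,
            DMA_NONE = 3,
    };

    #endif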

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

10 Jun, 2011

1 commit

  • The message size allocated for rtnl ifinfo dumps was limited to
    a single page. This is not enough for additional interface info
    available with devices that support SR-IOV and caused a bug in
    which VF info would not be displayed if more than approximately
    40 VFs were created per interface.

    Implement a new function pointer for the rtnl_register service that will
    calculate the amount of data required for the ifinfo dump and allocate
    enough data to satisfy the request.
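
    A hedged sketch of the idea (the callback type and its use are
    assumptions based on the description, not copied from the patch):

    /* per-family callback reporting the minimum dump allocation */
    typedef u16 (*rtnl_calcit_func)(struct sk_buff *skb,
                                    struct nlmsghdr *nlh);

    /* in the dump path, size the reply from the callback instead of
     * assuming a single page is always enough */
    min_dump_alloc = calcit ? calcit(skb, nlh) : 0;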

    Signed-off-by: Greg Rose
    Signed-off-by: Jeff Kirsher

    Greg Rose
     

26 May, 2011

1 commit

  • Since commit eeaeb068f139 (sch_sfq: allow big packets and be fair),
    sfq_peek() can return a different skb than the one that would normally
    be dequeued by sfq_dequeue() [ if the current slot->allot is negative ].

    Use the generic qdisc_peek_dequeued() instead of the custom
    implementation, to get consistent results.
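
    A hedged sketch of how the generic helper keeps peek and dequeue
    consistent (illustrative body, not the exact kernel source): it
    dequeues once and stashes the skb in sch->gso_skb, so a later dequeue
    returns exactly what was peeked.

    static struct sk_buff *peek_dequeued(struct Qdisc *sch)
    {
            if (!sch->gso_skb) {
                    sch->gso_skb = sch->dequeue(sch);
                    if (sch->gso_skb)
                            sch->q.qlen++; /* still counted as queued */
            }
            return sch->gso_skb; /* may be NULL */
    }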

    Signed-off-by: Eric Dumazet
    CC: Jarek Poplawski
    CC: Patrick McHardy
    CC: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Eric Dumazet
     

24 May, 2011

1 commit

  • While chasing a possible net_sched bug, I found that IP fragments have
    little chance of passing through a congested SFQ qdisc:

    - Say the SFQ qdisc is full because one flow is non-responsive.
    - ip_fragment() wants to send two fragments belonging to an idle flow.
    - sfq_enqueue() queues the first packet, but sees the queue limit
      reached.
    - sfq_enqueue() drops one packet from the 'big consumer' and returns
      NET_XMIT_CN.
    - ip_fragment() cancels the remaining fragments.

    This patch restores fairness, making sure we return NET_XMIT_CN only if
    we dropped a packet from the same flow.
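
    A hedged sketch of the rule this introduces in sfq_enqueue() (the
    variables and helper are illustrative):

    x = hash(skb);                   /* slot our packet went into */
    d = drop_from_biggest_flow(sch); /* slot a packet was dropped from */
    if (d == x)
            return NET_XMIT_CN;      /* our own flow was hit */
    return NET_XMIT_SUCCESS;         /* another flow paid; ours is queued */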

    Signed-off-by: Eric Dumazet
    CC: Patrick McHardy
    CC: Jarek Poplawski
    CC: Jamal Hadi Salim
    CC: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
     

23 May, 2011

1 commit

  • dev_deactivate_many() issues one synchronize_rcu() call after the
    qdiscs are set to noop_qdisc.

    This call is here to make sure there are no outstanding qdisc-less
    dev_queue_xmit calls before returning to the caller.

    But in the dismantle phase, we don't have to wait, because we won't
    activate the device again, and we are going to wait one rcu grace
    period later in rollback_registered_many().

    After this patch, device dismantle uses only one synchronize_net() and
    one rcu_barrier() call, so we get a ~30% speedup and smaller RTNL
    latency.
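
    A hedged sketch of the resulting condition (the dismantle flag name is
    an assumption):

    /* wait for outstanding qdisc-less dev_queue_xmit calls only if the
     * device may run again; a dismantled device gets its rcu grace
     * period later, in rollback_registered_many() */
    if (!dev->dismantle)
            synchronize_net();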

    Signed-off-by: Eric Dumazet
    CC: Patrick McHardy
    CC: Ben Greear
    Signed-off-by: David S. Miller

    Eric Dumazet
     

21 May, 2011

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1446 commits)
    macvlan: fix panic if lowerdev in a bond
    tg3: Add braces around 5906 workaround.
    tg3: Fix NETIF_F_LOOPBACK error
    macvlan: remove one synchronize_rcu() call
    networking: NET_CLS_ROUTE4 depends on INET
    irda: Fix error propagation in ircomm_lmp_connect_response()
    irda: Kill set but unused variable 'bytes' in irlan_check_command_param()
    irda: Kill set but unused variable 'clen' in ircomm_connect_indication()
    rxrpc: Fix set but unused variable 'usage' in rxrpc_get_transport()
    be2net: Kill set but unused variable 'req' in lancer_fw_download()
    irda: Kill set but unused vars 'saddr' and 'daddr' in irlan_provider_connect_indication()
    atl1c: atl1c_resume() is only used when CONFIG_PM_SLEEP is defined.
    rxrpc: Fix set but unused variable 'usage' in rxrpc_get_peer().
    rxrpc: Kill set but unused variable 'local' in rxrpc_UDP_error_handler()
    rxrpc: Kill set but unused variable 'sp' in rxrpc_process_connection()
    rxrpc: Kill set but unused variable 'sp' in rxrpc_rotate_tx_window()
    pkt_sched: Kill set but unused variable 'protocol' in tc_classify()
    isdn: capi: Use pr_debug() instead of ifdefs.
    tg3: Update version to 3.119
    tg3: Apply rx_discards fix to 5719/5720
    ...

    Fix up trivial conflicts in arch/x86/Kconfig and net/mac80211/agg-tx.c
    as per Davem.

    Linus Torvalds
     

05 Apr, 2011

1 commit

  • This is an implementation of the Quick Fair Queue scheduler developed
    by Fabio Checconi. The same algorithm is already implemented in ipfw
    in FreeBSD. Fabio had an earlier version developed on Linux; I just
    cleaned it up. Thanks to Eric Dumazet for testing this under load.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    stephen hemminger
     
