26 May, 2011

1 commit

  • Since commit eeaeb068f139 (sch_sfq: allow big packets and be fair),
    sfq_peek() can return a different skb than the one sfq_dequeue() would
    normally dequeue (when the current slot->allot is negative).

    Use the generic qdisc_peek_dequeued() instead of a custom
    implementation, to get a consistent result.

    Signed-off-by: Eric Dumazet
    CC: Jarek Poplawski
    CC: Patrick McHardy
    CC: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Eric Dumazet
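
A minimal sketch of the peek-by-dequeue idea behind this change, written as a user-space toy rather than kernel code. All `peekq_*` names and the fixed-size ring are assumptions made for illustration; the point is only that a peek which dequeues once and stashes the result can never disagree with the following dequeue.

```c
#include <assert.h>
#include <stddef.h>

#define PEEKQ_CAP 8

/* Toy stand-ins for an skb and a qdisc; none of these are kernel APIs. */
struct peekq_pkt { int id; };

struct peekq {
    struct peekq_pkt *ring[PEEKQ_CAP]; /* FIFO backing store */
    unsigned head, count;
    struct peekq_pkt *stash;           /* packet already chosen by a peek */
};

static int peekq_enqueue(struct peekq *q, struct peekq_pkt *p)
{
    assert(p != NULL);
    if (q->count == PEEKQ_CAP)
        return -1;                     /* ring is full */
    q->ring[(q->head + q->count) % PEEKQ_CAP] = p;
    q->count++;
    return 0;
}

/* The qdisc's own selection logic; here simply a FIFO pop. */
static struct peekq_pkt *peekq_raw_dequeue(struct peekq *q)
{
    struct peekq_pkt *p;

    if (q->count == 0)
        return NULL;
    p = q->ring[q->head];
    q->head = (q->head + 1) % PEEKQ_CAP;
    q->count--;
    return p;
}

/* Peek by running the real dequeue once and remembering its answer. */
static struct peekq_pkt *peekq_peek(struct peekq *q)
{
    if (!q->stash)
        q->stash = peekq_raw_dequeue(q);
    return q->stash;
}

/* Dequeue hands out the stashed packet first, so it matches the peek. */
static struct peekq_pkt *peekq_dequeue(struct peekq *q)
{
    struct peekq_pkt *p = q->stash;

    if (p) {
        q->stash = NULL;
        return p;
    }
    return peekq_raw_dequeue(q);
}
```

Because the selection logic runs only in `peekq_raw_dequeue()`, any state it depends on (such as a per-slot allotment) is consulted exactly once per packet.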
     

24 May, 2011

1 commit

  • While chasing a possible net_sched bug, I found that IP fragments have
    little chance of passing a congested SFQ qdisc:

    - Say the SFQ qdisc is full because one flow is non-responsive.
    - ip_fragment() wants to send two fragments belonging to an idle flow.
    - sfq_enqueue() queues the first packet, but sees that the queue limit
      is reached.
    - sfq_enqueue() drops one packet from the 'big consumer' and returns
      NET_XMIT_CN.
    - ip_fragment() cancels the remaining fragments.

    This patch restores fairness, making sure we return NET_XMIT_CN only if
    we dropped a packet from the same flow.

    Signed-off-by: Eric Dumazet
    CC: Patrick McHardy
    CC: Jarek Poplawski
    CC: Jamal Hadi Salim
    CC: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
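
The fairness rule described above can be sketched in a small user-space model. This is an illustration under assumptions, not the sch_sfq code: the `sfqm_*` names, the per-flow counters, and the "drop from the longest flow" policy are stand-ins, and the two return codes only mirror the roles of NET_XMIT_SUCCESS and NET_XMIT_CN.

```c
#include <assert.h>

#define SFQM_FLOWS 4

/* Hypothetical return codes standing in for NET_XMIT_SUCCESS / NET_XMIT_CN. */
enum sfqm_xmit { SFQM_XMIT_SUCCESS = 0, SFQM_XMIT_CN = 2 };

struct sfqm {
    unsigned qlen[SFQM_FLOWS]; /* packets queued per flow */
    unsigned total;            /* packets queued overall */
    unsigned limit;            /* queue limit */
};

/* Drop one packet from the longest flow and return that flow's index. */
static unsigned sfqm_drop_longest(struct sfqm *q)
{
    unsigned i, victim = 0;

    for (i = 1; i < SFQM_FLOWS; i++)
        if (q->qlen[i] > q->qlen[victim])
            victim = i;
    q->qlen[victim]--;
    q->total--;
    return victim;
}

static enum sfqm_xmit sfqm_enqueue(struct sfqm *q, unsigned flow)
{
    assert(flow < SFQM_FLOWS);
    q->qlen[flow]++;
    if (++q->total <= q->limit)
        return SFQM_XMIT_SUCCESS;
    /*
     * Over the limit, so one packet must go. Report congestion only
     * when the dropped packet belonged to the caller's own flow.
     */
    return sfqm_drop_longest(q) == flow ? SFQM_XMIT_CN : SFQM_XMIT_SUCCESS;
}
```

With this rule, a sender on an idle flow sees success when the drop lands on the heavy flow, so a caller such as a fragmenting sender has no reason to abandon the rest of its packets.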
     

23 May, 2011

1 commit

  • dev_deactivate_many() issues one synchronize_rcu() call after the qdiscs
    are set to noop_qdisc.

    This call is here to make sure there are no outstanding qdisc-less
    dev_queue_xmit calls before returning to the caller.

    But in the dismantle phase we don't have to wait, because we won't
    activate the device again, and we are going to wait one RCU grace period
    later in rollback_registered_many().

    After this patch, device dismantle uses only one synchronize_net() and
    one rcu_barrier() call, so we get a ~30% speedup and a smaller RTNL
    latency.

    Signed-off-by: Eric Dumazet
    CC: Patrick McHardy
    CC: Ben Greear
    Signed-off-by: David S. Miller

    Eric Dumazet
     

21 May, 2011

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1446 commits)
    macvlan: fix panic if lowerdev in a bond
    tg3: Add braces around 5906 workaround.
    tg3: Fix NETIF_F_LOOPBACK error
    macvlan: remove one synchronize_rcu() call
    networking: NET_CLS_ROUTE4 depends on INET
    irda: Fix error propagation in ircomm_lmp_connect_response()
    irda: Kill set but unused variable 'bytes' in irlan_check_command_param()
    irda: Kill set but unused variable 'clen' in ircomm_connect_indication()
    rxrpc: Fix set but unused variable 'usage' in rxrpc_get_transport()
    be2net: Kill set but unused variable 'req' in lancer_fw_download()
    irda: Kill set but unused vars 'saddr' and 'daddr' in irlan_provider_connect_indication()
    atl1c: atl1c_resume() is only used when CONFIG_PM_SLEEP is defined.
    rxrpc: Fix set but unused variable 'usage' in rxrpc_get_peer().
    rxrpc: Kill set but unused variable 'local' in rxrpc_UDP_error_handler()
    rxrpc: Kill set but unused variable 'sp' in rxrpc_process_connection()
    rxrpc: Kill set but unused variable 'sp' in rxrpc_rotate_tx_window()
    pkt_sched: Kill set but unused variable 'protocol' in tc_classify()
    isdn: capi: Use pr_debug() instead of ifdefs.
    tg3: Update version to 3.119
    tg3: Apply rx_discards fix to 5719/5720
    ...

    Fix up trivial conflicts in arch/x86/Kconfig and net/mac80211/agg-tx.c
    as per Davem.

    Linus Torvalds
     

20 May, 2011

2 commits


08 May, 2011

3 commits


23 Apr, 2011

1 commit


12 Apr, 2011

1 commit


05 Apr, 2011

1 commit

  • This is an implementation of the Quick Fair Queue scheduler developed
    by Fabio Checconi. The same algorithm is already implemented in ipfw
    in FreeBSD. Fabio had an earlier version developed on Linux, I just
    cleaned it up. Thanks to Eric Dumazet for testing this under load.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    stephen hemminger
     

31 Mar, 2011

1 commit


05 Mar, 2011

1 commit


04 Mar, 2011

2 commits


26 Feb, 2011

1 commit


25 Feb, 2011

8 commits


24 Feb, 2011

3 commits

  • gfp_t needs to be cast to integer.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    stephen hemminger
     
  • * make qdisc_ops local
    * add sparse annotation about expected unlock/lock in dump_class_stats
    * fix indentation

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    stephen hemminger
     
  • This is the Stochastic Fair Blue scheduler, based on work from :

    W. Feng, D. Kandlur, D. Saha, K. Shin. Blue: A New Class of Active Queue
    Management Algorithms. U. Michigan CSE-TR-387-99, April 1999.

    http://www.thefengs.com/wuchang/blue/CSE-TR-387-99.pdf

    This implementation is based on work done by Juliusz Chroboczek.

    The general SFB algorithm can be found in figure 14, page 15:

    B[l][n] : L x N array of bins (L levels, N bins per level)
    enqueue()
      Calculate hash function values h{0}, h{1}, .. h{L-1}
      Update bins at each level
      for i = 0 to L - 1
         if (B[i][h{i}].qlen > bin_size)
            B[i][h{i}].p_mark += p_increment;
         else if (B[i][h{i}].qlen == 0)
            B[i][h{i}].p_mark -= p_decrement;
      p_min = min(B[0][h{0}].p_mark ... B[L-1][h{L-1}].p_mark);
      if (p_min == 1.0)
          ratelimit();
      else
          mark/drop with probability p_min;

    I adapted Juliusz's code to meet current kernel standards, and made
    various changes to address previous comments:

    http://thread.gmane.org/gmane.linux.network/90225
    http://thread.gmane.org/gmane.linux.network/90375

    The default flow classifier is the rxhash introduced by RPS in 2.6.35,
    but an external flow classifier can be used if wanted.

    tc qdisc add dev $DEV parent 1:11 handle 11: \
        est 0.5sec 2sec sfb limit 128

    tc filter add dev $DEV protocol ip parent 11: handle 3 \
        flow hash keys dst divisor 1024

    Notes:

    1) The SFB default child qdisc is pfifo_fast. It can be replaced by
    another qdisc, but a child qdisc MUST NOT drop a previously queued
    packet. This is because SFB needs to handle a dequeued packet in order
    to maintain its virtual queue states. pfifo_head_drop or CHOKe should
    not be used.

    2) ECN is enabled by default, unlike RED/CHOKe/GRED.

    With help from Patrick McHardy & Andi Kleen

    Signed-off-by: Eric Dumazet
    CC: Juliusz Chroboczek
    CC: Stephen Hemminger
    CC: Patrick McHardy
    CC: Andi Kleen
    CC: John W. Linville
    Signed-off-by: David S. Miller

    Eric Dumazet
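
The bin update from the pseudocode in this commit message can be sketched as plain C. This is a toy under stated assumptions, not the sch_sfb implementation: the `sfbm_*` names, the tiny 2 x 4 bin array, the use of `double` for probabilities, and the clamping of `p_mark` to [0, 1] are all choices made for the sketch. The caller is assumed to supply the per-level hash values.

```c
#include <assert.h>

#define SFBM_LEVELS 2  /* L: assumed small value for the sketch */
#define SFBM_BINS   4  /* N: assumed small value for the sketch */

struct sfbm_bin {
    unsigned qlen;     /* packets currently accounted to this bin */
    double p_mark;     /* marking probability, kept in [0, 1] */
};

struct sfbm_params {
    unsigned bin_size;
    double p_increment;
    double p_decrement;
};

/*
 * Update the bin selected by h[i] at every level and return p_min,
 * the smallest marking probability across the levels.
 */
static double sfbm_update(struct sfbm_bin bins[SFBM_LEVELS][SFBM_BINS],
                          const unsigned h[SFBM_LEVELS],
                          const struct sfbm_params *prm)
{
    double p_min = 1.0;
    unsigned i;

    for (i = 0; i < SFBM_LEVELS; i++) {
        struct sfbm_bin *b;

        assert(h[i] < SFBM_BINS);
        b = &bins[i][h[i]];
        if (b->qlen > prm->bin_size) {
            b->p_mark += prm->p_increment;
            if (b->p_mark > 1.0)
                b->p_mark = 1.0;       /* clamping is an assumption */
        } else if (b->qlen == 0) {
            b->p_mark -= prm->p_decrement;
            if (b->p_mark < 0.0)
                b->p_mark = 0.0;       /* clamping is an assumption */
        }
        if (b->p_mark < p_min)
            p_min = b->p_mark;
    }
    return p_min;
}
```

A caller would then rate-limit the flow when the returned value is 1.0 and otherwise mark or drop the packet with that probability, as in the pseudocode.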
     

23 Feb, 2011

1 commit


21 Feb, 2011

1 commit

  • From: Eric W. Biederman

    In the beginning with batching unreg_list was a list that was used only
    once in the lifetime of a network device (I think). Now we have calls
    using the unreg_list that can happen multiple times in the life of a
    network device like dev_deactivate and dev_close that are also using the
    unreg_list. In addition in unregister_netdevice_queue we also do a
    list_move because for devices like veth pairs it is possible that
    unregister_netdevice_queue will be called multiple times.

    So I think the change below to fix dev_deactivate which Eric D. missed
    will fix this problem. Now to go test that.

    Signed-off-by: David S. Miller

    Eric W. Biederman
     

15 Feb, 2011

1 commit


03 Feb, 2011

3 commits

  • Signed-off-by: David S. Miller

    David S. Miller
     
  • CHOKe ("CHOose and Kill" or "CHOose and Keep") is an alternative
    packet scheduler based on the Random Early Detection (RED) algorithm.

    The core idea is:
      For every packet arrival:
        Calculate Qave
        if (Qave < minth)
             Queue the new packet
        else
             Select randomly a packet from the queue
             if (both packets from same flow)
             then Drop both the packets
             else if (Qave > maxth)
                  Drop packet
             else
                  Admit packet with probability p (same as RED)

    See also:
    Rong Pan, Balaji Prabhakar, Konstantinos Psounis, "CHOKe: a stateless active
    queue management scheme for approximating fair bandwidth allocation",
    Proceedings of INFOCOM'2000, March 2000.

    Help from:
    Eric Dumazet
    Patrick McHardy

    Signed-off-by: Stephen Hemminger
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    stephen hemminger
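
The decision tree in the CHOKe pseudocode above can be written as a pure function. This is a sketch only: the `chokem_*` names are hypothetical, flows are reduced to integer identifiers, and the random victim selection and the RED probability computation are left to the caller.

```c
#include <assert.h>

enum chokem_verdict {
    CHOKEM_QUEUE,        /* Qave below minth: queue the new packet */
    CHOKEM_DROP_BOTH,    /* new packet and random victim share a flow */
    CHOKEM_DROP,         /* Qave above maxth: drop the new packet */
    CHOKEM_ADMIT_WITH_P  /* otherwise: admit with probability p, as in RED */
};

/*
 * new_flow is the flow of the arriving packet and victim_flow the flow
 * of a packet picked at random from the queue by the caller.
 */
static enum chokem_verdict chokem_decide(unsigned qave, unsigned minth,
                                         unsigned maxth, unsigned new_flow,
                                         unsigned victim_flow)
{
    assert(minth <= maxth);
    if (qave < minth)
        return CHOKEM_QUEUE;
    if (new_flow == victim_flow)
        return CHOKEM_DROP_BOTH;
    if (qave > maxth)
        return CHOKEM_DROP;
    return CHOKEM_ADMIT_WITH_P;
}
```

Keeping the decision separate from queue manipulation makes the ordering of the checks easy to see: the same-flow comparison is tried before the maxth test, exactly as in the pseudocode.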
     
  • The change to allow divisor to be a parameter (in 2.6.38-rc1)
    commit 817fb15dfd988d8dda916ee04fa506f0c466b9d6
    introduced a possible deadlock caught by sparse.

    The scheduler tree lock was left locked in the case of an incorrect
    divisor value. The simplest fix is to move the test outside of the
    lock, which also solves the problem of partial updates.

    Signed-off-by: Stephen Hemminger
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    stephen hemminger
     

27 Jan, 2011

1 commit


25 Jan, 2011

2 commits


22 Jan, 2011

1 commit

  • Now that qdisc stab is handled before the TCQ_F_CAN_BYPASS test in
    __dev_xmit_skb(), we can generalize TCQ_F_CAN_BYPASS to qdiscs other
    than pfifo_fast: pfifo, bfifo, pfifo_head_drop and sfq.

    SFQ is special because it can have external classifiers, and in these
    cases we cannot bypass the queue discipline (the packet could be dropped
    by the classifier) without the admin asking for it, or further changes.

    It's worth doing this, especially for SFQ, to avoid dirtying memory when
    no packets are already waiting in the queue.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
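
The bypass condition described above can be illustrated with a small model. Everything here is an assumption made for the sketch: the `bypm_*` names, the single flag bit, and the reduction of "has an external classifier" to a boolean. The real transmit path checks more state than this.

```c
#include <assert.h>
#include <stddef.h>

#define BYPM_F_CAN_BYPASS 1u  /* hypothetical flag bit for the sketch */

struct bypm_qdisc {
    unsigned flags;
    unsigned qlen;            /* packets currently waiting */
};

/*
 * Configuration time: a qdisc with an external classifier must not
 * advertise bypass, because the classifier might drop the packet.
 */
static void bypm_configure(struct bypm_qdisc *q, int has_ext_classifier)
{
    assert(q != NULL);
    if (has_ext_classifier)
        q->flags &= ~BYPM_F_CAN_BYPASS;
    else
        q->flags |= BYPM_F_CAN_BYPASS;
}

/* Transmit time: skip the queue only if allowed and nothing is waiting. */
static int bypm_can_send_directly(const struct bypm_qdisc *q)
{
    return (q->flags & BYPM_F_CAN_BYPASS) != 0 && q->qlen == 0;
}
```

When the function returns true, the packet never touches the queue's data structures, which is the memory-dirtying saving the commit message refers to.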
     

21 Jan, 2011

2 commits

  • In commit 44b8288308ac9d (net_sched: pfifo_head_drop problem), we fixed
    a problem with pfifo_head drops that incorrectly decreased
    sch->bstats.bytes and sch->bstats.packets.

    Several qdiscs (CHOKe, SFQ, pfifo_head, ...) are able to drop a
    previously enqueued packet, and bstats cannot be changed, so
    bstats/rates are not accurate (overestimated).

    This patch changes the qdisc_bstats updates to be done at dequeue() time
    instead of enqueue() time. bstats counters no longer account for dropped
    frames, and rates are more correct, since enqueue() bursts don't affect
    the dequeue() rate.

    Signed-off-by: Eric Dumazet
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
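
A toy queue shows why counting at dequeue time keeps the statistics honest when already-queued packets can be dropped. The `statq_*` names, the fixed-size ring, and the head-drop helper are assumptions for illustration and do not correspond to kernel functions.

```c
#include <assert.h>
#include <stddef.h>

#define STATQ_CAP 8

struct statq {
    unsigned len[STATQ_CAP];  /* packet lengths, FIFO order */
    unsigned head, count;
    unsigned long bytes;      /* bytes actually dequeued */
    unsigned long packets;    /* packets actually dequeued */
};

/* Enqueue does not touch the byte/packet counters. */
static int statq_enqueue(struct statq *q, unsigned pkt_len)
{
    if (q->count == STATQ_CAP)
        return -1;
    q->len[(q->head + q->count) % STATQ_CAP] = pkt_len;
    q->count++;
    return 0;
}

/* Drop the oldest packet; nothing to undo in the counters. */
static void statq_drop_head(struct statq *q)
{
    if (q->count == 0)
        return;
    q->head = (q->head + 1) % STATQ_CAP;
    q->count--;
}

/* Counters are updated here, so only delivered packets are counted. */
static int statq_dequeue(struct statq *q, unsigned *pkt_len)
{
    assert(pkt_len != NULL);
    if (q->count == 0)
        return -1;
    *pkt_len = q->len[q->head];
    q->head = (q->head + 1) % STATQ_CAP;
    q->count--;
    q->bytes += *pkt_len;
    q->packets++;
    return 0;
}
```

Had the counters been bumped in `statq_enqueue()`, the dropped packet would remain in the totals and the derived rate would be overestimated.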
     
  • This patch converts stab qdisc management to RCU, so that we can perform
    the qdisc_calculate_pkt_len() call before taking the qdisc lock.

    This shortens the time the lock is held in __dev_xmit_skb().

    This permits more qdiscs to get TCQ_F_CAN_BYPASS status, avoiding a lot
    of cache misses and so reducing latencies.

    Signed-off-by: Eric Dumazet
    CC: Patrick McHardy
    CC: Jesper Dangaard Brouer
    CC: Jarek Poplawski
    CC: Jamal Hadi Salim
    CC: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet