21 Feb, 2011

1 commit

  • From: Eric W. Biederman

    In the beginning with batching unreg_list was a list that was used only
    once in the lifetime of a network device (I think). Now we have calls
    using the unreg_list that can happen multiple times in the life of a
    network device like dev_deactivate and dev_close that are also using the
    unreg_list. In addition in unregister_netdevice_queue we also do a
    list_move because for devices like veth pairs it is possible that
    unregister_netdevice_queue will be called multiple times.

    So I think the change below to fix dev_deactivate which Eric D. missed
    will fix this problem. Now to go test that.

    Signed-off-by: David S. Miller

    Eric W. Biederman
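
The reason a move is safer than a plain add when a device can be queued more than once can be sketched with a toy circular list. This is a minimal userspace sketch modelled loosely on the kernel's doubly linked list; every `toy_` name is hypothetical and it assumes each node is initialised to point at itself before first use.

```c
#include <assert.h>
#include <stddef.h>

struct toy_list { struct toy_list *next, *prev; };

/* An unlinked node (or an empty head) points at itself. */
static void toy_list_init(struct toy_list *n) { n->next = n->prev = n; }

/* Insert n right after head. n must not already be on a list. */
static void toy_list_add(struct toy_list *n, struct toy_list *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink n and leave it self-linked so it can be unlinked again safely. */
static void toy_list_del_init(struct toy_list *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	toy_list_init(n);
}

/* Unlink n from wherever it currently is, then add it to head. */
static void toy_list_move(struct toy_list *n, struct toy_list *head)
{
	assert(n != head);
	toy_list_del_init(n);
	toy_list_add(n, head);
}

/* Count the entries on a list, excluding the head itself. */
static size_t toy_list_len(const struct toy_list *head)
{
	size_t len = 0;

	for (const struct toy_list *p = head->next; p != head; p = p->next)
		len++;
	return len;
}
```

Calling `toy_list_move()` twice for the same node leaves exactly one entry on the list, whereas calling `toy_list_add()` twice would corrupt the links.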
     

25 Jan, 2011

1 commit


21 Jan, 2011

1 commit

  • In commit 44b8288308ac9d (net_sched: pfifo_head_drop problem), we fixed
    a problem with pfifo_head drops that incorrectly decreased
    sch->bstats.bytes and sch->bstats.packets

    Several qdiscs (CHOKe, SFQ, pfifo_head, ...) are able to drop a
    previously enqueued packet, and bstats cannot be changed, so
    bstats/rates are not accurate (over estimated)

    This patch changes the qdisc_bstats updates to be done at dequeue() time
    instead of enqueue() time. bstats counters no longer account for dropped
    frames, and rates are more correct, since enqueue() bursts don't affect
    the dequeue() rate.

    Signed-off-by: Eric Dumazet
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
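
The effect of counting at dequeue() time can be illustrated with a toy head-drop FIFO. This is a minimal userspace sketch, not the kernel code: the `toy_` names, the ring buffer and the limit of 4 packets are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_LIMIT 4	/* hypothetical queue limit, in packets */

struct toy_qdisc {
	uint32_t len[TOY_LIMIT];	/* lengths of queued packets (ring) */
	size_t head, count;
	uint64_t bytes;			/* stand-in for bstats.bytes */
	uint32_t packets;		/* stand-in for bstats.packets */
	uint32_t drops;
};

/* Enqueue never touches bytes/packets; a full queue drops its oldest packet. */
static void toy_enqueue(struct toy_qdisc *q, uint32_t pkt_len)
{
	assert(pkt_len > 0);	/* 0 is reserved to mean "empty" on dequeue */
	if (q->count == TOY_LIMIT) {
		q->head = (q->head + 1) % TOY_LIMIT;	/* head drop */
		q->count--;
		q->drops++;
	}
	q->len[(q->head + q->count) % TOY_LIMIT] = pkt_len;
	q->count++;
}

/* Dequeue is the only place where the counters grow. Returns 0 if empty. */
static uint32_t toy_dequeue(struct toy_qdisc *q)
{
	uint32_t pkt_len;

	if (q->count == 0)
		return 0;
	pkt_len = q->len[q->head];
	q->head = (q->head + 1) % TOY_LIMIT;
	q->count--;
	q->bytes += pkt_len;
	q->packets++;
	return pkt_len;
}
```

With this layout a burst of six enqueues into the four-slot queue produces two drops, and the counters only ever reflect the four packets that are later dequeued.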
     

15 Jan, 2011

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits)
    GRETH: resolve SMP issues and other problems
    GRETH: handle frame error interrupts
    GRETH: avoid writing bad speed/duplex when setting transfer mode
    GRETH: fixed skb buffer memory leak on frame errors
    GRETH: GBit transmit descriptor handling optimization
    GRETH: fix opening/closing
    GRETH: added raw AMBA vendor/device number to match against.
    cassini: Fix build bustage on x86.
    e1000e: consistent use of Rx/Tx vs. RX/TX/rx/tx in comments/logs
    e1000e: update Copyright for 2011
    e1000: Avoid unhandled IRQ
    r8169: keep firmware in memory.
    netdev: tilepro: Use is_unicast_ether_addr helper
    etherdevice.h: Add is_unicast_ether_addr function
    ks8695net: Use default implementation of ethtool_ops::get_link
    ks8695net: Disable non-working ethtool operations
    USB CDC NCM: Don't deref NULL in cdc_ncm_rx_fixup() and don't use uninitialized variable.
    vxge: Remember to release firmware after upgrading firmware
    netdev: bfin_mac: Remove is_multicast_ether_addr use in netdev_for_each_mc_addr
    ipsec: update MAX_AH_AUTH_LEN to support sha512
    ...

    Linus Torvalds
     

14 Jan, 2011

2 commits

  • After recent changes (percpu stats on vlan/tunnels...), we no longer
    need the per struct netdev_queue tx_bytes/tx_packets/tx_dropped counters.

    Only remaining users are ixgbe, sch_teql, gianfar & macvlan :

    1) ixgbe can be converted to use existing tx_ring counters.

    2) macvlan incremented txq->tx_dropped, it can use the
    dev->stats.tx_dropped counter.

    3) sch_teql : almost revert ab35cd4b8f42 (Use net_device internal stats)
    Now we have ndo_get_stats64(), use it, even for "unsigned long"
    fields (No need to bring back a struct net_device_stats)

    4) gianfar adds a stats structure per tx queue to hold
    tx_bytes/tx_packets

    This removes a lockdep warning (and possible lockup) in rndis gadget,
    calling dev_get_stats() from hard IRQ context.

    Ref: http://www.spinics.net/lists/netdev/msg149202.html

    Reported-by: Neil Jones
    Signed-off-by: Eric Dumazet
    CC: Jarek Poplawski
    CC: Alexander Duyck
    CC: Jeff Kirsher
    CC: Sandeep Gopalpet
    CC: Michal Nazarewicz
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
    Documentation/trace/events.txt: Remove obsolete sched_signal_send.
    writeback: fix global_dirty_limits comment runtime -> real-time
    ppc: fix comment typo singal -> signal
    drivers: fix comment typo diable -> disable.
    m68k: fix comment typo diable -> disable.
    wireless: comment typo fix diable -> disable.
    media: comment typo fix diable -> disable.
    remove doc for obsolete dynamic-printk kernel-parameter
    remove extraneous 'is' from Documentation/iostats.txt
    Fix spelling milisec -> ms in snd_ps3 module parameter description
    Fix spelling mistakes in comments
    Revert conflicting V4L changes
    i7core_edac: fix typos in comments
    mm/rmap.c: fix comment
    sound, ca0106: Fix assignment to 'channel'.
    hrtimer: fix a typo in comment
    init/Kconfig: fix typo
    anon_inodes: fix wrong function name in comment
    fix comment typos concerning "consistent"
    poll: fix a typo in comment
    ...

    Fix up trivial conflicts in:
    - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c)
    - fs/ext4/ext4.h

    Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.

    Linus Torvalds
     

11 Jan, 2011

1 commit

  • HTB takes into account that an skb may be segmented when it updates
    stats. Generalize this to all schedulers.

    They should use qdisc_bstats_update() helper instead of manipulating
    bstats.bytes and bstats.packets

    Add bstats_update() helper too for classes that use
    gnet_stats_basic_packed fields.

    Note : Right now, the TCQ_F_CAN_BYPASS shortcut can be taken only if no
    stab is set up on the qdisc.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
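
The idea of a single stats helper that is aware of segmentation can be sketched as follows. This is a simplified userspace model, not the kernel helper itself: `toy_skb`, `toy_bstats` and their fields are hypothetical stand-ins, and a segment count of 0 is assumed to mean "not segmented".

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins; the field names are illustrative, not the kernel layout. */
struct toy_skb {
	uint32_t len;		/* total bytes carried by this buffer */
	uint16_t gso_segs;	/* number of wire segments, 0 if not segmented */
};

struct toy_bstats {
	uint64_t bytes;
	uint32_t packets;
};

/* One helper that every scheduler calls instead of touching the fields. */
static void toy_bstats_update(struct toy_bstats *b, const struct toy_skb *skb)
{
	assert(b && skb);
	b->bytes += skb->len;
	/* A segmented buffer counts as one packet per segment. */
	b->packets += skb->gso_segs ? skb->gso_segs : 1;
}
```

Keeping the rule in one helper means a scheduler cannot forget the per-segment packet count when it handles large segmented buffers.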
     

06 Jan, 2011

1 commit

  • commit 57dbb2d83d100ea (sched: add head drop fifo queue)
    introduced pfifo_head_drop, and broke the invariant that
    sch->bstats.bytes and sch->bstats.packets are COUNTER (increasing
    counters only)

    This can break estimators because est_timer() handles unsigned deltas
    only. A decreasing counter can then give a huge unsigned delta.

    My mid term suggestion would be to change things so that
    sch->bstats.bytes and sch->bstats.packets are incremented in dequeue()
    only, not at enqueue() time. We also could add drop_bytes/drop_packets
    and provide estimations of drop rates.

    It would be more sensible anyway for very low speeds, and big bursts.
    Right now, if we drop packets, they are still accounted in the
    byte/packet absolute counters and rate estimators.

    Before this mid term change, this patch makes pfifo_head_drop behavior
    similar to other qdiscs in case of drops :
    Don't decrement sch->bstats.bytes and sch->bstats.packets

    Signed-off-by: Eric Dumazet
    Acked-by: Hagen Paul Pfeifer
    Signed-off-by: David S. Miller

    Eric Dumazet
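
The "huge unsigned delta" can be reproduced in a few lines. This is a minimal sketch that assumes a 64-bit byte counter sampled at two points in time; the function names are hypothetical and only stand in for the per-interval subtraction an estimator performs.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes seen since the previous sample, computed as an unsigned delta. */
static uint64_t sample_delta(uint64_t prev_bytes, uint64_t now_bytes)
{
	return now_bytes - prev_bytes;	/* wraps modulo 2^64 if now < prev */
}

/* Debug-style variant that enforces the COUNTER invariant instead. */
static uint64_t sample_delta_checked(uint64_t prev_bytes, uint64_t now_bytes)
{
	assert(now_bytes >= prev_bytes);	/* counters must never decrease */
	return now_bytes - prev_bytes;
}
```

For example, `sample_delta(2500, 1000)` does not yield -1500 but a value just below 2^64, which a rate estimator would then treat as an enormous burst.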
     

04 Jan, 2011

1 commit

  • Provide child qdisc backlog (byte count) information so that "tc -s
    qdisc" can report it to user.

    packet count is already correctly provided.

    qdisc red 11: parent 1:11 limit 60Kb min 15Kb max 45Kb ecn
    Sent 3116427684 bytes 1415782 pkt (dropped 8, overlimits 7866 requeues 0)
    rate 242385Kbit 13630pps backlog 13560b 8p requeues 0
    marked 7865 early 1 pdrop 7 other 0

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

01 Jan, 2011

2 commits

  • slot_dequeue_head() should make sure slot skb chain is correct in both
    ways, or we can crash if all possible flows are in use.

    Jarek pointed out slot_queue_init() can now be done once in sfq_init(),
    instead of each time a flow is set up.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • SFQ is currently 'limited' to small packets, because it uses a 15bit
    allotment number per flow. Introduce a scale by 8, so that we can handle
    full size TSO/GRO packets.

    Use appropriate handling to make sure allot is positive before a new
    packet is dequeued, so that fairness is respected.

    Signed-off-by: Eric Dumazet
    Acked-by: Jarek Poplawski
    Cc: Patrick McHardy
    Signed-off-by: David S. Miller

    Eric Dumazet
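
The scaling argument can be checked with a small sketch. It assumes the allotment is held in a signed 16-bit field and that byte lengths are converted to units of 8 bytes, rounding up; the `TOY_`/`toy_` names are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_ALLOT_SHIFT 3	/* hypothetical name: scale factor 2^3 = 8 */

/* Convert a byte length into allotment units, rounding up. */
static int16_t toy_allot_units(uint32_t bytes)
{
	uint32_t units = (bytes + (1u << TOY_ALLOT_SHIFT) - 1) >> TOY_ALLOT_SHIFT;

	assert(units <= INT16_MAX);	/* must fit the 15-bit magnitude */
	return (int16_t)units;
}

/* Charge one dequeued packet against a flow's credit. */
static int16_t toy_charge(int16_t allot, uint32_t pkt_bytes)
{
	return (int16_t)(allot - toy_allot_units(pkt_bytes));
}
```

A 64 KB packet costs 8192 units, which fits comfortably in 15 bits, whereas the raw byte count of 65536 would not. A flow whose credit goes negative after such a packet has to be topped up before it may send again, which is the fairness handling the message refers to.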
     

23 Dec, 2010

2 commits

  • sfq_walk() runs without qdisc lock. By the time it selects a non empty
    hash slot and sfq_dump_class_stats() is run (with lock held), slot might
    have been freed : We then access q->slots[SFQ_EMPTY_SLOT], out of
    bounds, and crash in slot_queue_walk()

    On previous kernels the bug is also present, but the out of bounds
    qs[SFQ_DEPTH] and allot[SFQ_DEPTH] are located in struct sfq_sched_data,
    so no illegal memory access happens, only possibly wrong data reported
    to user.

    Also, slot_dequeue_tail() should make sure slot skb chain is correctly
    terminated, or sfq_dump_class_stats() can access freed skbs.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Conflicts:
    MAINTAINERS
    arch/arm/mach-omap2/pm24xx.c
    drivers/scsi/bfa/bfa_fcpim.c

    Needed to update to apply fixes for which the old branch was too
    outdated.

    Jiri Kosina
     

21 Dec, 2010

4 commits

  • Here is a respin of patch.

    I'll send a short patch to make SFQ more fair in presence of large
    packets as well.

    Thanks

    [PATCH v3 net-next-2.6] net_sched: sch_sfq: better struct layouts

    This patch shrinks sizeof(struct sfq_sched_data)
    from 0x14f8 (or more if spinlocks are bigger) to 0x1180 bytes, and
    reduces text size as well.

       text    data     bss     dec     hex filename
       4821     152       0    4973    136d old/net/sched/sch_sfq.o
       4627     136       0    4763    129b new/net/sched/sch_sfq.o

    All data for a slot/flow is now grouped in a compact and cache friendly
    structure, instead of being spread across many different places.

    struct sfq_slot {
            struct sk_buff  *skblist_next;
            struct sk_buff  *skblist_prev;
            sfq_index       qlen;   /* number of skbs in skblist */
            sfq_index       next;   /* next slot in sfq chain */
            struct sfq_head dep;    /* anchor in dep[] chains */
            unsigned short  hash;   /* hash value (index in ht[]) */
            short           allot;  /* credit for this slot */
    };

    Signed-off-by: Eric Dumazet
    Cc: Jarek Poplawski
    Cc: Patrick McHardy
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • David S. Miller
     
  • When deploying SFQ/IFB here at work, I found the allot management was
    pretty wrong in sfq, even changing allot from short to int...

    We should init allot for each new flow, not using a previous value found
    in slot.

    Before patch, I saw bursts of several packets per flow, apparently
    denying the default "quantum 1514" limit I had on my SFQ class.

    class sfq 11:1 parent 11:
    (dropped 0, overlimits 0 requeues 0)
    backlog 0b 7p requeues 0
    allot 11546

    class sfq 11:46 parent 11:
    (dropped 0, overlimits 0 requeues 0)
    backlog 0b 1p requeues 0
    allot -23873

    class sfq 11:78 parent 11:
    (dropped 0, overlimits 0 requeues 0)
    backlog 0b 5p requeues 0
    allot 11393

    After patch, better fairness among each flow, allot limit being
    respected, allot is positive :

    class sfq 11:e parent 11:
    (dropped 0, overlimits 0 requeues 86)
    backlog 0b 3p requeues 86
    allot 596

    class sfq 11:94 parent 11:
    (dropped 0, overlimits 0 requeues 0)
    backlog 0b 3p requeues 0
    allot 1468

    class sfq 11:a4 parent 11:
    (dropped 0, overlimits 0 requeues 0)
    backlog 0b 4p requeues 0
    allot 650

    class sfq 11:bb parent 11:
    (dropped 0, overlimits 0 requeues 0)
    backlog 0b 3p requeues 0
    allot 596

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • We currently return for each active SFQ slot the number of packets in
    queue. We can also give number of bytes accounted for these packets.

    tc -s class show dev ifb0

    Before patch :

    class sfq 11:3d9 parent 11:
    (dropped 0, overlimits 0 requeues 0)
    backlog 0b 3p requeues 0
    allot 1266

    After patch :

    class sfq 11:3e4 parent 11:
    (dropped 0, overlimits 0 requeues 0)
    backlog 4380b 3p requeues 0
    allot 1212

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

17 Dec, 2010

1 commit

  • Add dev_close_many and dev_deactivate_many to factorize another
    sync-rcu operation on the netdevice unregister path.

    $ modprobe dummy numdummies=10000
    $ ip link set dev dummy* up
    $ time rmmod dummy

                Without the patch    With the patch

    real        0m 24.63s            0m 5.15s
    user        0m 0.00s             0m 0.00s
    sys         0m 6.05s             0m 5.14s

    Signed-off-by: Octavian Purdila
    Signed-off-by: David S. Miller

    Octavian Purdila
     

02 Dec, 2010

1 commit

  • Allocate qdisc memory according to NUMA properties of cpus included in
    xps map.

    To be effective, qdisc should be (re)setup after changes
    of /sys/class/net/eth/queues/tx-/xps_cpus

    I added a numa_node field in struct netdev_queue, containing NUMA node
    if all cpus included in xps_cpus share same node, else -1.

    Signed-off-by: Eric Dumazet
    Cc: Ben Hutchings
    Cc: Tom Herbert
    Signed-off-by: David S. Miller

    Eric Dumazet
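
The rule for the numa_node value ("the node if all cpus share it, else -1") can be sketched in userspace. This is only an illustration: the CPU list, the `cpu_to_node` lookup table and the `toy_` names are hypothetical stand-ins for the real cpumask and topology helpers.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NO_NODE (-1)

/* cpu_to_node[] is a hypothetical CPU -> NUMA node lookup table. */
static int toy_common_node(const int *cpus, size_t ncpus, const int *cpu_to_node)
{
	int node = TOY_NO_NODE;

	assert(cpus && cpu_to_node);
	for (size_t i = 0; i < ncpus; i++) {
		int n = cpu_to_node[cpus[i]];

		if (node == TOY_NO_NODE)
			node = n;		/* first CPU fixes the candidate */
		else if (node != n)
			return TOY_NO_NODE;	/* CPUs span several nodes */
	}
	return node;	/* the shared node, or -1 for an empty map */
}
```

An allocator can then use the returned node as a placement hint and fall back to its default policy when the result is -1.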
     

29 Nov, 2010

1 commit


16 Nov, 2010

1 commit

  • Some of the documentation refers to web pages under
    the domain `osdl.org'. However, `osdl.org' now
    redirects to `linuxfoundation.org'.

    Rather than rely on redirections, this patch updates
    the addresses appropriately; for the most part, only
    documentation that is meant to be current has been
    updated.

    The patch should be pretty quick to scan and check;
    each new web-page URL was obtained by trying out the
    original URL in a browser and then simply copying the
    redirected URL (formatting as necessary).

    There is some conflict as to which one of these domain
    names is preferred:

    linuxfoundation.org
    linux-foundation.org

    So, I wrote:

    info@linuxfoundation.org

    and got this reply:

    Message-ID:
    Date: Mon, 15 Nov 2010 10:41:42 -0800
    From: David Ames

    ...

    linuxfoundation.org is preferred. The canonical name for our web site is
    www.linuxfoundation.org. Our list site is actually
    lists.linux-foundation.org.

    Regarding email linuxfoundation.org is preferred there are a few people
    who choose to use linux-foundation.org for their own reasons.

    Consequently, I used `linuxfoundation.org' for web pages and
    `lists.linux-foundation.org' for mailing-list web pages and email addresses;
    the only personal email address I updated from `@osdl.org' was that of
    Andrew Morton, who prefers `linux-foundation.org' according to `git log'.

    Signed-off-by: Michael Witten
    Signed-off-by: Jiri Kosina

    Michael Witten
     

09 Nov, 2010

1 commit


04 Nov, 2010

1 commit

  • Somewhere along the line net_cls_subsys_id became a macro when
    cls_cgroup is built as a module. Not only did it make cls_cgroup
    completely useless, it also caused it to crash on module unload.

    This patch fixes this by removing that macro.

    Thanks to Eric Dumazet for diagnosing this problem.

    Reported-by: Randy Dunlap
    Signed-off-by: Herbert Xu
    Reviewed-by: Li Zefan
    Signed-off-by: David S. Miller

    Herbert Xu
     

01 Nov, 2010

1 commit


24 Oct, 2010

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1699 commits)
    bnx2/bnx2x: Unsupported Ethtool operations should return -EINVAL.
    vlan: Calling vlan_hwaccel_do_receive() is always valid.
    tproxy: use the interface primary IP address as a default value for --on-ip
    tproxy: added IPv6 support to the socket match
    cxgb3: function namespace cleanup
    tproxy: added IPv6 support to the TPROXY target
    tproxy: added IPv6 socket lookup function to nf_tproxy_core
    be2net: Changes to use only priority codes allowed by f/w
    tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled
    tproxy: added tproxy sockopt interface in the IPV6 layer
    tproxy: added udp6_lib_lookup function
    tproxy: added const specifiers to udp lookup functions
    tproxy: split off ipv6 defragmentation to a separate module
    l2tp: small cleanup
    nf_nat: restrict ICMP translation for embedded header
    can: mcp251x: fix generation of error frames
    can: mcp251x: fix endless loop in interrupt handler if CANINTF_MERRF is set
    can-raw: add msg_flags to distinguish local traffic
    9p: client code cleanup
    rds: make local functions/variables static
    ...

    Fix up conflicts in net/core/dev.c, drivers/net/pcmcia/smc91c92_cs.c and
    drivers/net/wireless/ath/ath9k/debug.c as per David

    Linus Torvalds
     

21 Oct, 2010

3 commits

  • David S. Miller
     
  • The first parameter dev isn't in use in qdisc_create_dflt().

    Signed-off-by: Changli Gao
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Changli Gao
     
  • Under network load, doing :

    tc qdisc del dev eth0 root

    triggers :

    [ 167.193087] BUG: spinlock bad magic on CPU#3, udpflood/4928
    [ 167.193139] lock: c15bc324, .magic: 00000000, .owner: /-1, .owner_cpu: -1
    [ 167.193193] Pid: 4928, comm: udpflood Not tainted 2.6.36-rc7-11417-g215340c-dirty #323
    [ 167.193245] Call Trace:
    [ 167.193292] [] ? printk+0x18/0x20
    [ 167.193342] [] spin_bug+0xa3/0xf0
    [ 167.193389] [] do_raw_spin_lock+0x7d/0x160
    [ 167.193440] [] ? __dev_xmit_skb+0x27e/0x2b0
    [ 167.193496] [] ? trace_hardirqs_on+0xb/0x10
    [ 167.193545] [] _raw_spin_lock+0x3a/0x40
    [ 167.193593] [] ? __dev_xmit_skb+0x27e/0x2b0
    [ 167.193641] [] __dev_xmit_skb+0x27e/0x2b0

    commit 79640a4ca695 (add additional lock to qdisc to increase
    throughput) forgot to initialize noop_qdisc and noqueue_qdisc busylock

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

19 Oct, 2010

1 commit

  • Peter Zijlstra found a bug in the way softirq time is accounted in
    VIRT_CPU_ACCOUNTING on this thread:

    http://lkml.indiana.edu/hypermail//linux/kernel/1009.2/01366.html

    The problem is that softirq processing uses local_bh_disable internally.
    There is no way, later in the flow, to tell whether a softirq is being
    processed or bh has merely been disabled. So a hardirq that arrives while
    bh is disabled results in time being wrongly accounted as softirq.

    Looking at the code a bit more, the problem exists in !VIRT_CPU_ACCOUNTING
    as well, because account_system_time() in normal tick based accounting also
    uses softirq_count, which will be set even when not in softirq with bh
    disabled.

    Peter also suggested the solution of using 2*SOFTIRQ_OFFSET as irq count
    for local_bh_{disable,enable} and using just SOFTIRQ_OFFSET while processing
    softirqs. The patch below does that and adds an in_serving_softirq() API,
    which returns whether we are currently processing a softirq or not.

    Also changes one of the usages of softirq_count in net/sched/cls_cgroup.c
    to in_serving_softirq.

    Looks like many usages of in_softirq really want in_serving_softirq. Those
    changes can be made individually on a case by case basis.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Venkatesh Pallipadi
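
The counting scheme described above can be modelled in a few lines. This is a toy userspace sketch, not the kernel implementation: the bit position of the softirq field, the global counter and all `toy_`/`TOY_` names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy layout: an 8-bit softirq field starting at bit 8 (an assumption). */
#define TOY_SOFTIRQ_SHIFT	8
#define TOY_SOFTIRQ_OFFSET	(1u << TOY_SOFTIRQ_SHIFT)
#define TOY_SOFTIRQ_MASK	(0xffu << TOY_SOFTIRQ_SHIFT)
#define TOY_BH_DISABLE_OFFSET	(2 * TOY_SOFTIRQ_OFFSET)

static unsigned int toy_count;	/* stand-in for the per-task irq count */

/* Disabling bh always moves the field in even-sized steps. */
static void toy_bh_disable(void) { toy_count += TOY_BH_DISABLE_OFFSET; }

static void toy_bh_enable(void)
{
	assert((toy_count & TOY_SOFTIRQ_MASK) >= TOY_BH_DISABLE_OFFSET);
	toy_count -= TOY_BH_DISABLE_OFFSET;
}

/* Processing a softirq moves the field by a single odd-sized step. */
static void toy_softirq_enter(void) { toy_count += TOY_SOFTIRQ_OFFSET; }
static void toy_softirq_exit(void)  { toy_count -= TOY_SOFTIRQ_OFFSET; }

/* True whenever bh is disabled or a softirq is running. */
static bool toy_in_softirq(void) { return toy_count & TOY_SOFTIRQ_MASK; }

/* True only while a softirq handler is running: the lowest bit of the
 * field is set by processing and never by the bh-disable steps. */
static bool toy_in_serving_softirq(void)
{
	return toy_count & TOY_SOFTIRQ_OFFSET;
}
```

Because bh-disable nesting only ever adds multiples of two, the lowest bit of the field stays free to signal "actually serving a softirq", which is what lets time accounting tell the two states apart.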
     

14 Oct, 2010

2 commits


12 Oct, 2010

1 commit

  • Add a seqlock in struct neighbour to protect neigh->ha[], and avoid
    dirtying neighbour in stress situation (many different flows / dsts)

    Dirtying takes place because of read_lock(&n->lock) and n->used writes.

    Switching to a seqlock, and writing n->used only on jiffies changes
    permits less dirtying.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

10 Oct, 2010

1 commit


07 Oct, 2010

1 commit


05 Oct, 2010

2 commits

  • skb_headroom() is unsigned so "skb_headroom(skb) + toff" is also
    unsigned and can't be less than zero. This test was added in 66d50d25:
    "u32: negative offset fix" It was supposed to fix a regression.

    Signed-off-by: Dan Carpenter
    Signed-off-by: David S. Miller

    Dan Carpenter
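
Why the sum can never be negative follows from C's arithmetic conversions, as this small sketch shows. The function names are hypothetical. The second function only illustrates a comparison that can be true and does not describe what the patch itself changes.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* What "headroom + toff" evaluates to: toff is converted to unsigned
 * first, so a negative offset wraps to a very large positive value. */
static unsigned int toy_sum_unsigned(unsigned int headroom, int toff)
{
	return headroom + (unsigned int)toff;
}

/* A check that can actually fire: do the sum in a wider signed type. */
static bool toy_offset_is_negative(unsigned int headroom, int toff)
{
	assert(sizeof(long long) > sizeof(unsigned int));
	return (long long)headroom + toff < 0;
}
```

With a headroom of 16 and an offset of -32, the unsigned sum is `UINT_MAX - 15` rather than -16, so a "less than zero" test on it is dead code.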
     
  • Since ingress is not used very much and net_device->ingress_queue is
    quite a big object (128 or 256 bytes), allocate it dynamically only
    when needed (tc qdisc add dev eth0 ingress ...)

    dev_ingress_queue(dev) helper should be used only with RTNL taken.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

30 Sep, 2010

1 commit


27 Sep, 2010

1 commit


13 Sep, 2010

1 commit


10 Sep, 2010

1 commit