18 Jun, 2009

2 commits

  • commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
    (net: No more expensive sock_hold()/sock_put() on each tx)
    changed initial sk_wmem_alloc value.

    We need to take into account this offset when reporting
    sk_wmem_alloc to user, in PROC_FS files or various
    ioctls (SIOCOUTQ/TIOCOUTQ)

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
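
The offset described above can be illustrated with a small user-space sketch. It assumes the counter now starts at 1 instead of 0, so the reported value is the raw counter minus one. The struct and function names are hypothetical stand-ins, not the kernel's definitions.

```c
/* Hypothetical stand-in for the socket field; not the kernel's struct sock. */
struct fake_sock {
	int wmem_alloc;   /* assumed to start at 1 instead of 0 */
};

/* Initialise the counter with the assumed offset of 1. */
void fake_sock_init(struct fake_sock *sk)
{
	sk->wmem_alloc = 1;
}

/* Value to report to user space: raw counter minus the initial offset. */
int fake_wmem_alloc_get(const struct fake_sock *sk)
{
	return sk->wmem_alloc - 1;
}
```

With this helper, an idle socket reports 0 queued bytes even though the raw counter is nonzero.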
     
  • Action police statistics could be misleading because drops are not
    shown when expected.

    With feedback from: Jamal Hadi Salim

    Reported-by: Pawel Staszewski
    Signed-off-by: Jarek Poplawski
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jarek Poplawski
     

15 Jun, 2009

2 commits


13 Jun, 2009

1 commit


09 Jun, 2009

2 commits

  • Use PSCHED_SHIFT constant instead of '10' in PSCHED_US2NS() and
    PSCHED_NS2US() macros to enable changing this value later.

    Additionally use PSCHED_SHIFT in sch_hfsc SM_SHIFT and ISM_SHIFT
    definitions. This part of the patch is based on feedback from
    Patrick McHardy.

    Reported-by: Antonio Almeida
    Tested-by: Antonio Almeida
    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
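
A minimal sketch of what the macros might look like once the literal '10' is replaced by a named constant. The macro bodies are an assumption based on the description above, and `int64_t` stands in for the kernel's own 64-bit type.

```c
#include <stdint.h>

/* Shift used by both conversions; 10 is the value that was previously hard-coded. */
#define PSCHED_SHIFT 10

/* Assumed shape of the conversions: a shift left to get ns, a shift right to get back. */
#define PSCHED_US2NS(x) ((int64_t)(x) << PSCHED_SHIFT)
#define PSCHED_NS2US(x) ((x) >> PSCHED_SHIFT)
```

Changing the resolution later would then only require touching `PSCHED_SHIFT`.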
     
  • I found a bug in cls_cgroup_change() in cls_cgroup.c.
    cls_cgroup_change() expects tca[TCA_OPTIONS] to be set properly from user
    space, but tc in iproute2-2.6.29-1 (which I used) does not set it.

    The current tc source code in git does set tca[TCA_OPTIONS]:

    git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git

    If we always use the newest iproute2 from git with cls_cgroup, we will
    probably not hit this oops. But I think the kernel should not panic
    regardless of the user program's behaviour.

    Signed-off-by: Minoru Usui
    Signed-off-by: David S. Miller

    Minoru Usui
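
The kind of defensive check this message argues for can be sketched in user space as follows. The attribute index, array size, and function name are hypothetical, and returning `-EINVAL` is an assumption about how such a handler would reject the request.

```c
#include <errno.h>
#include <stddef.h>

#define FAKE_TCA_OPTIONS 1   /* hypothetical index, for illustration only */
#define FAKE_TCA_MAX     4   /* hypothetical table size */

/* Simplified change() handler: reject a request whose options attribute is absent. */
int fake_cgroup_change(void *tca[FAKE_TCA_MAX + 1])
{
	if (tca[FAKE_TCA_OPTIONS] == NULL)
		return -EINVAL;   /* fail cleanly instead of dereferencing NULL */

	/* ... parsing of the nested options would go here ... */
	return 0;
}
```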
     

03 Jun, 2009

3 commits


02 Jun, 2009

1 commit

  • … when we use cls_cgroup

    This patch fixes a bug in which an unconfigured struct tcf_proto stays
    chained in tc_ctl_tfilter(), and avoids a kernel panic in
    cls_cgroup_classify() when we use cls_cgroup.

    When we execute 'tc filter add', tcf_proto is allocated, initialized
    by the classifier's init(), and chained. After it is chained,
    tc_ctl_tfilter() calls the classifier's change(). When the classifier's
    change() fails, tc_ctl_tfilter() does not free tcf_proto and keeps it
    chained.

    In addition, cls_cgroup is initialized in change(), not in init(). It
    accesses the unconfigured struct tcf_proto, which is chained before
    change(), and then hits an Oops.

    Signed-off-by: Minoru Usui <usui@mxm.nes.nec.co.jp>
    Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
    Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
    Tested-by: Minoru Usui <usui@mxm.nes.nec.co.jp>
    Signed-off-by: David S. Miller <davem@davemloft.net>

    Minoru Usui
     

27 May, 2009

1 commit

  • Avoid reading the unsynchronized value cs->classid multiple times,
    since it could change concurrently from non-zero to zero; this would
    result in the classifier returning a positive result with a bogus
    (zero) classid.

    Signed-off-by: Paul Menage
    Reviewed-by: Li Zefan
    Signed-off-by: David S. Miller

    Paul Menage
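
The pattern described above, reading a concurrently changing field once into a local, can be sketched like this. The struct and function names are hypothetical, and `volatile` is used only to model a field that may change between reads; it is not a claim about how the kernel synchronizes it.

```c
#include <stdint.h>

/* Hypothetical per-cgroup state; classid may be changed by another thread. */
struct fake_cls_state {
	volatile uint32_t classid;
};

/*
 * Returns 1 and stores the class ID in *res on a match, 0 otherwise.
 * The shared field is read exactly once, so the zero test and the stored
 * value cannot disagree.
 */
int fake_classify(const struct fake_cls_state *cs, uint32_t *res)
{
	uint32_t classid = cs->classid;   /* single read of the shared field */

	if (classid == 0)
		return 0;
	*res = classid;
	return 1;
}
```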
     

26 May, 2009

1 commit

  • We would like to get rid of the netdev->trans_start = jiffies; assignment
    that nearly all net drivers have to use in their start_xmit() function, and
    use txq->trans_start instead.

    This can be done generically in the core network code, as suggested by David.

    Some devices (particularly loopback) don't need a trans_start update, because
    they don't have a transmit watchdog. We could add a new device flag, or rely
    on the fact that txq->trans_start can be updated if txq->xmit_lock_owner is
    different from -1. Use a helper function to hide our choice.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
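
A user-space sketch of the helper idea described above. The struct is a trimmed-down, hypothetical stand-in for the transmit queue, and the timestamp is passed in explicitly instead of reading jiffies.

```c
/* Hypothetical, trimmed-down transmit queue. */
struct fake_txq {
	int xmit_lock_owner;        /* -1 when no CPU holds the xmit lock */
	unsigned long trans_start;  /* time of last transmit, in ticks */
};

/* Update trans_start only for queues that are currently locked for transmit. */
void fake_txq_trans_update(struct fake_txq *txq, unsigned long now)
{
	if (txq->xmit_lock_owner != -1)
		txq->trans_start = now;
}
```

A lockless device such as loopback would keep `xmit_lock_owner` at -1 and so never pay for the store.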
     

20 May, 2009

1 commit

  • We can slightly reduce the size of the teqlN structure by not duplicating
    the stats structure in teql_master, and instead using net_device.stats
    for tx_errors and the netdev_queue counters for
    tx_bytes/tx_packets/tx_dropped.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

19 May, 2009

2 commits


18 May, 2009

2 commits

  • The struct net_device trans_start field is a hot spot on SMP and high
    performance devices, particularly multi-queue ones, because every
    transmitter dirties it. Its main uses are the tx watchdog and bonding
    alive checks.

    But as most devices don't use NETIF_F_LLTX, we have to lock
    a netdev_queue before calling their ndo_start_xmit(). So it makes
    sense to move trans_start from net_device to netdev_queue. Its update
    will occur on an already present (and in exclusive state) cache line, for
    free.

    We can do this transition smoothly. An old driver continues to
    update dev->trans_start, while an updated one updates txq->trans_start.

    Further patches could also put tx_bytes/tx_packets counters in
    netdev_queue to avoid dirtying dev->stats (vlan device comes to mind).

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • We can remove this lock here, since we are in cgroup write handler and
    thus the cgrp is guaranteed to be valid, and no lock is needed when
    writing a u32 variable.

    Signed-off-by: Li Zefan
    Acked-by: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Li Zefan
     

07 May, 2009

1 commit

  • When no limit is given, the bfifo uses a default of tx_queue_len * mtu.
    Packets handled by qdiscs include the link layer header, so this should
    be taken into account, similar to what other qdiscs do.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
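
As a rough illustration of the default described above: the sketch assumes that "taking the link layer header into account" means adding the header length to the MTU before multiplying by the queue length. The function and parameter names are hypothetical.

```c
/* Default byte limit when none is given: queue length times (MTU + link-layer header). */
unsigned int fake_bfifo_default_limit(unsigned int tx_queue_len,
				      unsigned int mtu,
				      unsigned int hard_header_len)
{
	return tx_queue_len * (mtu + hard_header_len);
}
```

For example, a queue length of 1000 with a 1500-byte MTU and a 14-byte Ethernet header gives 1,514,000 bytes, not 1,500,000.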
     

03 May, 2009

1 commit

  • The kernel should only be using the high 16 bits of a kernel
    generated priority. Filter priorities in all other cases only
    use the upper 16 bits of the u32 'prio' field of 'struct tcf_proto',
    but when the kernel generates the priority of a filter it saves all
    32 bits, which can result in incorrect lookup failures when a filter
    needs to be deleted or modified.

    Signed-off-by: Robert Love
    Signed-off-by: David S. Miller

    Robert Love
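
The masking this message calls for can be sketched as below. The function name is hypothetical, and the sample value is only an illustration of a generated priority that has its low 16 bits set.

```c
#include <stdint.h>

/* Keep only the upper 16 bits of a 32-bit priority value. */
uint32_t fake_prio_major(uint32_t prio)
{
	return prio & 0xFFFF0000U;
}
```

A generated value such as 0xBFFFFFFF would be stored as 0xBFFF0000, so a later lookup that compares only the upper 16 bits would find the filter.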
     

20 Apr, 2009

1 commit

  • Alex Sidorenko reported:

    "while experimenting with 'netem' we have found some strange behaviour. It
    seemed that ingress delay as measured by 'ping' command shows up on some
    hosts but not on others.

    After some investigation I have found that the problem is that skbuff->tstamp
    field value depends on whether there are any packet sniffers enabled. That
    is:

    - if any ptype_all handler is registered, the tstamp field is as expected
    - if there are no ptype_all handlers, the tstamp field does not show the delay"

    This patch prevents an unnecessary update of tstamp in dev_queue_xmit_nit()
    on the ingress path (with act_mirred) by adding a check. The overhead on
    the fast path is minimal, and is incurred only when sniffers etc. are active.

    Since netem at ingress seems to logically emulate a network before a host,
    tstamp is zeroed to trigger the update and pretend delays are from the
    outside.

    Reported-by: Alex Sidorenko
    Tested-by: Alex Sidorenko
    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     

14 Apr, 2009

1 commit


22 Mar, 2009

1 commit

  • tcp_sack_swap seems unnecessary, so I pushed swap to the caller.
    Also removed a comment that then seemed pointless, and added the include
    where it was not already there. Compile tested.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
     

16 Mar, 2009

1 commit

  • While looking for a possible cause of the bugzilla report on an HTB oops:
    http://bugzilla.kernel.org/show_bug.cgi?id=12858
    I found the code in htb_delete calling htb_destroy_class on zero
    refcount is very misleading: it can suggest this is a common path, and
    destroy is called under sch_tree_lock. Actually, this can never happen
    like this because before deletion cops->get() is done, and after
    delete a class is still used by tclass_notify. The class destroy is
    always called from cops->put(), so without sch_tree_lock.

    This doesn't mean much now (since 2.6.27) because all vulnerable calls
    were moved from htb_destroy_class to htb_delete, but there was a bug
    in older kernels. The same change is done for other classful scheds,
    which, it seems, didn't have similar locking problems here.

    Reported-by: m0sia
    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     

05 Mar, 2009

2 commits


02 Mar, 2009

1 commit


27 Feb, 2009

1 commit


10 Feb, 2009

1 commit


01 Feb, 2009

3 commits


13 Jan, 2009

2 commits

  • Currently htb_do_events() breaks events recounting for a level after 2
    jiffies, but there is no reason to repeat this for next levels and
    increase delays even more (with softirqs disabled). htb_dequeue_tree()
    can add to this too, btw. In such a case q->now time is invalid anyway.

    Thanks to Patrick McHardy for spotting an error around earlier version
    of this patch.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     
  • Next event time should consider jiffies used for recounting. Otherwise
    qdisc_watchdog_schedule() triggers hrtimer immediately with the event
    in the past, and may cause very high ksoftirqd cpu usage (if highres
    is on).

    Also removed is the check of "event" for zero in htb_dequeue(): it is
    always true at this point.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     

09 Jan, 2009

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (84 commits)
    wimax: fix kernel-doc for debufs_dentry member of struct wimax_dev
    net: convert pegasus driver to net_device_ops
    bnx2x: Prevent eeprom set when driver is down
    net: switch kaweth driver to netdevops
    pcnet32: round off carrier watch timer
    i2400m/usb: wrap USB power saving in #ifdef CONFIG_PM
    wimax: testing for rfkill support should also test for CONFIG_RFKILL_MODULE
    wimax: fix kconfig interactions with rfkill and input layers
    wimax: fix '#ifndef CONFIG_BUG' layout to avoid warning
    r6040: bump release number to 0.20
    r6040: warn about MAC address being unset
    r6040: check PHY status when bringing interface up
    r6040: make printks consistent with DRV_NAME
    gianfar: Fixup use of BUS_ID_SIZE
    mlx4_en: Returning real Max in get_ringparam
    mlx4_en: Consider inline packets on completion
    netdev: bfin_mac: enable bfin_mac net dev driver for BF51x
    qeth: convert to net_device_ops
    vlan: add neigh_setup
    dm9601: warn on invalid mac address
    ...

    Linus Torvalds
     
  • Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Acked-by: Theodore Ts'o
    Acked-by: Mark Fasheh
    Acked-by: David S. Miller
    Cc: James Morris
    Acked-by: Casey Schaufler
    Acked-by: Takashi Iwai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fernando Carrijo
     

07 Jan, 2009

1 commit


06 Jan, 2009

2 commits

  • New nodes are inserted in u32_change() under rtnl_lock() with wmb(),
    so without tcf_tree_lock() like in other classifiers (e.g. cls_fw).
    This isn't enough without rmb() on the read side, but on the other
    hand adding such barriers doesn't give any savings, so the lock is
    added instead.

    Reported-by: m0sia
    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     
  • This reverts commit 22604c866889c4b2e12b73cbf1683bda1b72a313.

    We can't fix this issue in this way, because we now can try
    to take the dev_base_lock rwlock as a writer in software interrupt
    context and that is not allowed without major surgery elsewhere.

    This initial link state problem needs to be solved in some other
    way.

    Signed-off-by: David S. Miller

    David S. Miller
     

05 Jan, 2009

1 commit

  • From: Michael Marineau

    Commit b47300168e770b60ab96c8924854c3b0eb4260eb "Do not fire linkwatch
    events until the device is registered." was made as a workaround for
    drivers that call netif_carrier_off before registering the device.
    Unfortunately this causes these drivers to incorrectly report their
    link status as IF_OPER_UNKNOWN which can falsely set the IFF_RUNNING
    flag when the interface is first brought up. This issue was
    previously pointed out[1] but was dismissed saying that IFF_RUNNING is
    not related to the link status. From my digging IFF_RUNNING, as
    reported to userspace, is based on the link state. It is set based on
    __LINK_STATE_START and IF_OPER_UP or IF_OPER_UNKNOWN. See [2], [3],
    and [4]. (Whether or not the kernel has IFF_RUNNING set in flags is
    not reported to user space so it may well be independent of the link,
    I don't know if and when it may get set.)

    The end result depends slightly on the driver. The two I
    tested were e1000e and b44. With e1000e if the system is booted
    without a network cable attached the interface will falsely report
    RUNNING when it is brought up causing NetworkManager to attempt to
    start it and eventually time out. With b44 when the system is booted
    with a network cable attached and brought up with dhcpcd it will time
    out the first time.

    The attached patch will still set the operstate variable
    correctly to IF_OPER_UP/DOWN/etc when linkwatch_fire_event is called,
    but then return, rather than skipping the linkwatch_fire_event call
    entirely as the previous fix did. (Sorry it isn't inline; I don't have
    a patch-friendly email client at the moment.)

    Signed-off-by: David S. Miller

    Michael Marineau