18 Jun, 2009
2 commits
-
commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
(net: No more expensive sock_hold()/sock_put() on each tx)
changed initial sk_wmem_alloc value.
We need to take into account this offset when reporting
sk_wmem_alloc to user, in PROC_FS files or various
ioctls (SIOCOUTQ/TIOCOUTQ)
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
-
Action police statistics could be misleading because drops are not
shown when expected.
With feedback from: Jamal Hadi Salim
Reported-by: Pawel Staszewski
Signed-off-by: Jarek Poplawski
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
15 Jun, 2009
2 commits
-
Conflicts:
Documentation/feature-removal-schedule.txt
drivers/scsi/fcoe/fcoe.c
net/core/drop_monitor.c
net/core/net-traces.c
-
Let's use TICKS instead of US, i.e. PSCHED_TICKS2NS and PSCHED_NS2TICKS
(as PSCHED_TICKS_PER_SEC already does), to avoid misleading names.
Signed-off-by: Jarek Poplawski
Signed-off-by: David S. Miller
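
The conversion pair named above can be pictured with a small user-space sketch. It is not the kernel's definition: the `sketch_` names are hypothetical, and the shift of 10 is simply the literal value that the PSCHED_SHIFT entry dated 09 Jun in this log mentions, taken here as an assumption.

```c
#include <stdint.h>

/* Assumed shift between scheduler ticks and nanoseconds (illustration only). */
#define SKETCH_PSCHED_SHIFT 10

/* Convert scheduler ticks to nanoseconds: one tick is 2^SHIFT ns. */
static inline uint64_t sketch_ticks2ns(uint64_t ticks)
{
	return ticks << SKETCH_PSCHED_SHIFT;
}

/* Convert nanoseconds to scheduler ticks, rounding down. */
static inline uint64_t sketch_ns2ticks(uint64_t ns)
{
	return ns >> SKETCH_PSCHED_SHIFT;
}
```

With this shift one tick is 1024 ns, so the unit is close to, but not exactly, a microsecond, which is presumably why the "US" naming was considered misleading.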
13 Jun, 2009
1 commit
-
Convert magic values 1 and -1 to NETDEV_TX_BUSY and NETDEV_TX_LOCKED respectively.
0 (NETDEV_TX_OK) is not changed to keep the noise down, except in very few cases
where it's in direct proximity to one of the other values.
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller
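
A minimal sketch of the kind of conversion described above, using the numeric values the message gives (0, 1 and -1). The enum, the `sketch_` names and the ring-full condition are hypothetical stand-ins, not the kernel's own definitions.

```c
/* Named transmit return codes with the values quoted in the commit message. */
enum sketch_tx_code {
	SKETCH_TX_OK     = 0,	/* packet accepted */
	SKETCH_TX_BUSY   = 1,	/* no room right now, caller should retry */
	SKETCH_TX_LOCKED = -1,	/* transmit lock could not be taken */
};

/* Hypothetical transmit routine: refuses the packet when the ring is full. */
static int sketch_xmit(int ring_free_slots)
{
	if (ring_free_slots <= 0)
		return SKETCH_TX_BUSY;	/* previously: return 1; */

	return SKETCH_TX_OK;		/* previously: return 0; */
}
```

The behaviour is unchanged by such a conversion; only the intent at each return site becomes readable.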
09 Jun, 2009
2 commits
-
Use PSCHED_SHIFT constant instead of '10' in PSCHED_US2NS() and
PSCHED_NS2US() macros to enable changing this value later.
Additionally use PSCHED_SHIFT in sch_hfsc SM_SHIFT and ISM_SHIFT
definitions. This part of the patch is based on feedback from
Patrick McHardy.
Reported-by: Antonio Almeida
Tested-by: Antonio Almeida
Signed-off-by: Jarek Poplawski
Signed-off-by: David S. Miller
-
I found a bug in cls_cgroup_change() in cls_cgroup.c.
cls_cgroup_change() expected that tca[TCA_OPTIONS] was set properly from user space,
but tc in iproute2-2.6.29-1 (which I used) didn't set it.
The current source code of tc in git does set tca[TCA_OPTIONS]:
git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git
If we always use the newest iproute2 from git when we use cls_cgroup,
we probably won't hit this oops.
But I think the kernel shouldn't panic regardless of the user program's behaviour.
Signed-off-by: Minoru Usui
Signed-off-by: David S. Miller
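
The defensive pattern this message argues for can be sketched in user space as follows. The attribute table, the index values and the choice of -EINVAL are assumptions made for illustration; the message itself only says that the kernel must not dereference an options attribute that user space never supplied.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical attribute indices; the real table layout is not shown here. */
enum { SKETCH_TCA_OPTIONS = 2, SKETCH_TCA_MAX = 8 };

/* Stand-in for a parsed netlink attribute. */
struct sketch_attr {
	int len;
};

/* Hypothetical change() handler: reject the request when the options
 * attribute is missing instead of dereferencing a NULL pointer. */
static int sketch_change(struct sketch_attr *tca[])
{
	if (tca[SKETCH_TCA_OPTIONS] == NULL)
		return -EINVAL;	/* assumed error code */

	/* ... parse tca[SKETCH_TCA_OPTIONS] and configure the filter ... */
	return 0;
}
```

An older tool that omits the attribute then gets an error back rather than triggering an oops.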
03 Jun, 2009
3 commits
-
Define three accessors to get/set dst attached to a skb
struct dst_entry *skb_dst(const struct sk_buff *skb)
void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
void skb_dst_drop(struct sk_buff *skb)
This one should replace occurrences of:
dst_release(skb->dst)
skb->dst = NULL;
Delete skb->dst field
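
To illustrate why accessors help here, the following user-space model mirrors the three signatures listed above with simplified stand-in types. The `sketch_` structures, the private field name and the reference-count handling are assumptions for illustration, not the kernel's layout.

```c
#include <stddef.h>

/* Simplified stand-ins for struct dst_entry and struct sk_buff. */
struct sketch_dst {
	int refcnt;
};

struct sketch_skb {
	struct sketch_dst *dst_priv;	/* only touched through the accessors */
};

/* Drop one reference on a destination entry; NULL is tolerated. */
static void sketch_dst_release(struct sketch_dst *dst)
{
	if (dst)
		dst->refcnt--;
}

/* Get the destination attached to the buffer. */
static inline struct sketch_dst *sketch_skb_dst(const struct sketch_skb *skb)
{
	return skb->dst_priv;
}

/* Attach a destination to the buffer. */
static inline void sketch_skb_dst_set(struct sketch_skb *skb,
				      struct sketch_dst *dst)
{
	skb->dst_priv = dst;
}

/* Release the attached destination and clear the pointer in one step,
 * replacing the open-coded release-then-NULL pair. */
static inline void sketch_skb_dst_drop(struct sketch_skb *skb)
{
	sketch_dst_release(skb->dst_priv);
	skb->dst_priv = NULL;
}
```

Once every caller goes through the accessors, the underlying field can be renamed or re-encoded without touching the call sites.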
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
-
Define skb_rtable(const struct sk_buff *skb) accessor to get rtable from skb
Delete skb->rtable field
Setting rtable is not allowed, just set dst instead as rtable is an alias.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
-
Conflicts:
drivers/net/forcedeth.c
02 Jun, 2009
1 commit
-
… when we use cls_cgroup
This patch fixes a bug in which an unconfigured struct tcf_proto stays
chained in tc_ctl_tfilter(), and avoids a kernel panic in
cls_cgroup_classify() when we use cls_cgroup.
When we execute 'tc filter add', tcf_proto is allocated, initialized
by classifier's init(), and chained. After it's chained,
tc_ctl_tfilter() calls classifier's change(). When classifier's
change() fails, tc_ctl_tfilter() does not free tcf_proto and keeps it chained.
In addition, cls_cgroup is initialized in change(), not in init(). It
accesses the unconfigured struct tcf_proto which is chained before
change(), then hits an Oops.
Signed-off-by: Minoru Usui <usui@mxm.nes.nec.co.jp>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Tested-by: Minoru Usui <usui@mxm.nes.nec.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
27 May, 2009
1 commit
-
Avoid reading the unsynchronized value cs->classid multiple times,
since it could change concurrently from non-zero to zero; this would
result in the classifier returning a positive result with a bogus
(zero) classid.
Signed-off-by: Paul Menage
Reviewed-by: Li Zefan
Signed-off-by: David S. Miller
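
The read-once pattern described above can be sketched as follows. The structures and the return convention are hypothetical; the point is only that the shared value is copied into a local exactly once, and that both the test and the result use that copy.

```c
#include <stdint.h>

/* Hypothetical per-cgroup state; classid may be changed concurrently. */
struct sketch_cls_state {
	volatile uint32_t classid;
};

/* Hypothetical classification result. */
struct sketch_result {
	uint32_t classid;
};

/* Returns 0 on a match and -1 on no match. The shared field is read once,
 * so a concurrent change to zero cannot slip in between the check and the
 * assignment. */
static int sketch_classify(const struct sketch_cls_state *cs,
			   struct sketch_result *res)
{
	uint32_t classid = cs->classid;	/* single read of the shared value */

	if (classid == 0)
		return -1;

	res->classid = classid;
	return 0;
}
```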
26 May, 2009
1 commit
-
We would like to get rid of netdev->trans_start = jiffies; which nearly all net
drivers have to set in their start_xmit() function, and use txq->trans_start
instead.
This can be done generically in core network, as suggested by David.
Some devices (particularly loopback) don't need a trans_start update, because
they don't have a transmit watchdog. We could add a new device flag, or rely
on the fact that txq->trans_start can be updated if txq->xmit_lock_owner is
different from -1. Use a helper function to hide our choice.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
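
A user-space sketch of the helper idea described above: the timestamp is refreshed only when the queue's transmit lock has an owner, which is the second of the two options the message lists. The structure, the function name and the explicit `now` parameter (standing in for jiffies) are assumptions for illustration.

```c
/* Simplified stand-in for a transmit queue. */
struct sketch_txq {
	int xmit_lock_owner;		/* -1 means the lock is not held */
	unsigned long trans_start;	/* time of the last transmit */
};

/* Update the transmit timestamp only for queues whose lock is held.
 * Lockless devices such as loopback never set an owner and are skipped. */
static inline void sketch_txq_trans_update(struct sketch_txq *txq,
					   unsigned long now)
{
	if (txq->xmit_lock_owner != -1)
		txq->trans_start = now;
}
```

Hiding the test in one helper means the criterion could later be swapped for a device flag without touching the callers.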
20 May, 2009
1 commit
-
We can slightly reduce size of teqlN structure, not duplicating stats
structure in teql_master but using stats field from net_device.stats
for tx_errors and from netdev_queue for tx_bytes/tx_packets/tx_dropped
values.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
19 May, 2009
2 commits
-
Conflicts:
drivers/scsi/fcoe/fcoe.c
-
It is illegal to dereference a skb after a successful ndo_start_xmit()
call. We must store skb length in a local variable instead.
Bug was introduced in 2.6.27 by commit 0abf77e55a2459aa9905be4b226e4729d5b4f0cb
(net_sched: Add accessor function for packet length for qdiscs)
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
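
The rule stated above can be shown with a small user-space model in which the transmit routine frees the buffer on success. All names are hypothetical, and the always-succeeding driver stub is a simplification; the sketch only demonstrates capturing the length before ownership is handed over.

```c
#include <stdlib.h>

/* Simplified stand-ins for a packet buffer and transmit statistics. */
struct sketch_skb {
	unsigned int len;
};

struct sketch_stats {
	unsigned long tx_bytes;
	unsigned long tx_packets;
};

/* Hypothetical driver transmit hook: on success (0) it owns the buffer
 * and may free it at any time. This stub frees it immediately. */
static int sketch_start_xmit(struct sketch_skb *skb)
{
	free(skb);
	return 0;
}

/* Send a buffer and account for it without touching it after the handoff. */
static int sketch_send(struct sketch_skb *skb, struct sketch_stats *st)
{
	unsigned int length = skb->len;	/* capture before handing over */

	if (sketch_start_xmit(skb) == 0) {
		st->tx_bytes += length;	/* skb must not be read here */
		st->tx_packets++;
		return 0;
	}

	return 1;	/* not reached with the stub above */
}
```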
18 May, 2009
2 commits
-
struct net_device trans_start field is a hot spot on SMP and high performance
devices, particularly multiqueue ones, because every transmitter dirties
it. Its main use is tx watchdog and bonding alive checks.
But as most devices don't use NETIF_F_LLTX, we have to lock
a netdev_queue before calling their ndo_start_xmit(). So it makes
sense to move trans_start from net_device to netdev_queue. Its update
will occur on an already present (and in exclusive state) cache line, for
free.
We can do this transition smoothly. An old driver continues to
update dev->trans_start, while an updated one updates txq->trans_start.
Further patches could also put tx_bytes/tx_packets counters in
netdev_queue to avoid dirtying dev->stats (vlan device comes to mind)
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
-
We can remove this lock here, since we are in cgroup write handler and
thus the cgrp is guaranteed to be valid, and no lock is needed when
writing a u32 variable.
Signed-off-by: Li Zefan
Acked-by: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: David S. Miller
07 May, 2009
1 commit
-
When no limit is given, the bfifo uses a default of tx_queue_len * mtu.
Packets handled by qdiscs include the link layer header, so this should
be taken into account, similar to what other qdiscs do.
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller
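
The arithmetic behind this change can be sketched as below. The function name and parameters are hypothetical, and the fallback to a queue length of 1 when tx_queue_len is zero is an assumption added so the sketch never returns a zero limit.

```c
/* Default byte limit for a byte-based FIFO: queue length times the largest
 * frame the qdisc will see, i.e. MTU plus link layer header. */
static unsigned int sketch_bfifo_default_limit(unsigned int tx_queue_len,
					       unsigned int mtu,
					       unsigned int hard_header_len)
{
	unsigned int qlen = tx_queue_len ? tx_queue_len : 1; /* assumed fallback */

	return qlen * (mtu + hard_header_len);
}
```

For an Ethernet-like device with a queue length of 1000, an MTU of 1500 and a 14-byte header, this gives 1000 * 1514 = 1514000 bytes instead of 1500000, so the queue really holds 1000 full-size frames.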
03 May, 2009
1 commit
-
The kernel should only be using the high 16 bits of a kernel
generated priority. Filter priorities in all other cases only
use the upper 16 bits of the u32 'prio' field of 'struct tcf_proto',
but when the kernel generates the priority of a filter it saves all
32 bits, which can result in incorrect lookup failures when a filter
needs to be deleted or modified.
Signed-off-by: Robert Love
Signed-off-by: David S. Miller
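
A small sketch of the mismatch described above. The mask and helper names are hypothetical; the example value 0xBFFFFFFF is just an arbitrary generated priority with non-zero low bits.

```c
#include <stdint.h>

/* Only the upper 16 bits of the 32-bit prio field carry the priority. */
#define SKETCH_PRIO_MASK 0xFFFF0000u

/* Store a kernel-generated priority the same way user-supplied ones are
 * stored: upper 16 bits only. */
static inline uint32_t sketch_store_prio(uint32_t generated)
{
	return generated & SKETCH_PRIO_MASK;
}

/* Lookup compares the stored value against the masked request. */
static inline int sketch_prio_matches(uint32_t stored, uint32_t requested)
{
	return stored == (requested & SKETCH_PRIO_MASK);
}
```

If the generated value were stored unmasked, a later request for the same priority would compare 0xBFFF0000 against 0xBFFFFFFF and fail, which is the lookup failure the message describes.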
20 Apr, 2009
1 commit
-
Alex Sidorenko reported:
"while experimenting with 'netem' we have found some strange behaviour. It
seemed that ingress delay as measured by 'ping' command shows up on some
hosts but not on others.
After some investigation I have found that the problem is that skbuff->tstamp
field value depends on whether there are any packet sniffers enabled. That
is:
- if any ptype_all handler is registered, the tstamp field is as expected
- if there are no ptype_all handlers, the tstamp field does not show the delay"
This patch prevents unnecessary update of tstamp in dev_queue_xmit_nit()
on ingress path (with act_mirred) adding a check, so minimal overhead on
the fast path, but only when sniffers etc. are active.
Since netem at ingress seems to logically emulate a network before a host,
tstamp is zeroed to trigger the update and pretend delays are from the
outside.
Reported-by: Alex Sidorenko
Tested-by: Alex Sidorenko
Signed-off-by: Jarek Poplawski
Signed-off-by: David S. Miller
14 Apr, 2009
1 commit
-
When vlan acceleration is used on receive, the vlan tag is maintained
outside of the skb data. The existing vlan tag match only works on TX
path because it uses vlan_get_tag which tests for VLAN_HW_TX_ACCEL.
Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller
22 Mar, 2009
1 commit
-
tcp_sack_swap seems unnecessary so I pushed swap to the caller.
Also removed comment that seemed then pointless, and added include
when not already there. Compile tested.
Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller
16 Mar, 2009
1 commit
-
While looking for a possible reason of bugzilla report on HTB oops:
http://bugzilla.kernel.org/show_bug.cgi?id=12858
I found the code in htb_delete calling htb_destroy_class on zero
refcount is very misleading: it can suggest this is a common path, and
destroy is called under sch_tree_lock. Actually, this can never happen
like this because before deletion cops->get() is done, and after
delete a class is still used by tclass_notify. The class destroy is
always called from cops->put(), so without sch_tree_lock.
This doesn't mean much now (since 2.6.27) because all vulnerable calls
were moved from htb_destroy_class to htb_delete, but there was a bug
in older kernels. The same change is done for other classful scheds,
which, it seems, didn't have similar locking problems here.
Reported-by: m0sia
Signed-off-by: Jarek Poplawski
Signed-off-by: David S. Miller
05 Mar, 2009
2 commits
-
Conflicts:
drivers/net/tokenring/tmspci.c
drivers/net/ucc_geth_mii.c
-
Commit c1b56878fb68e9c14070939ea4537ad4db79ffae "tc: policing requires
a rate estimator" introduced a test which invalidates previously working
configs, based on examples from iproute2: doc/actions/actions-general.
This is too rigorous: a rate estimator is needed only when police's
"avrate" option is used.
Reported-by: Joao Correia
Diagnosed-by: John Dykstra
Signed-off-by: Jarek Poplawski
Signed-off-by: David S. Miller
02 Mar, 2009
1 commit
-
Conflicts:
drivers/net/wireless/iwlwifi/iwl-tx.c
net/8021q/vlan_core.c
net/core/dev.c
27 Feb, 2009
1 commit
-
drr_change_class lacks a check for NULL of tca[TCA_OPTIONS], so oops
is possible.
Reported-by: Denys Fedoryschenko
Signed-off-by: Jarek Poplawski
Signed-off-by: David S. Miller
10 Feb, 2009
1 commit
-
Current "RTNETLINK answers: Invalid argument" warning, while trying to
add multiq qdisc to non-multiqueue device, isn't very helpful and some
of these devs can be changed btw., so let's use a better errno.
With feedback from Stephen Hemminger
Reported-by: Badalian Vyacheslav
Signed-off-by: Jarek Poplawski
Signed-off-by: David S. Miller
01 Feb, 2009
3 commits
-
Patrick McHardy suggested using a workqueue instead
of hrtimers to trigger netif_schedule() when there is a problem with
setting exact time of this event: 'The difference - yeah, it shouldn't
make much, mainly wake up the qdisc earlier (but not too early) after
"too many events" occurred _and_ no further enqueue events wake up the
qdisc anyways.'
Signed-off-by: Jarek Poplawski
Signed-off-by: David S. Miller
-
Let's get some info on possible config problems. This patch brings
back an old warning, but is printed only once now.
With feedback from Patrick McHardy
Signed-off-by: Jarek Poplawski
Signed-off-by: David S. Miller
-
Patrick McHardy suggested:
> How about making this flag and the warning message (in a out-of-line
> function) globally available? Other qdiscs (f.i. HFSC) can't deal with
> inner non-work-conserving qdiscs as well.
This patch uses qdisc->flags field of "suspected" child qdisc.
Signed-off-by: Jarek Poplawski
Signed-off-by: David S. Miller
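
The warn-once mechanism suggested in this entry can be sketched as follows: a flag bit on the suspected child records that the warning was already issued, and a shared out-of-line helper checks it. The flag value, structure, function name and message wording are all hypothetical.

```c
#include <stdio.h>

/* Hypothetical flag bit: "already warned about this child". */
#define SKETCH_F_WARN_NONWC 0x1u

/* Simplified stand-in for a child qdisc. */
struct sketch_qdisc {
	unsigned int flags;
	const char *name;
};

/* Shared helper usable by any parent qdisc. Prints the warning the first
 * time it is called for a given child and returns 1; returns 0 afterwards. */
static int sketch_warn_nonwc(const char *caller, struct sketch_qdisc *child)
{
	if (child->flags & SKETCH_F_WARN_NONWC)
		return 0;

	fprintf(stderr, "%s: child qdisc %s looks non-work-conserving\n",
		caller, child->name);
	child->flags |= SKETCH_F_WARN_NONWC;
	return 1;
}
```

Keeping the flag on the child rather than on the parent means the warning is issued once per offending qdisc, whichever parent notices it first.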
13 Jan, 2009
2 commits
-
Currently htb_do_events() breaks events recounting for a level after 2
jiffies, but there is no reason to repeat this for next levels and
increase delays even more (with softirqs disabled). htb_dequeue_tree()
can add to this too, btw. In such a case q->now time is invalid anyway.
Thanks to Patrick McHardy for spotting an error around earlier version
of this patch.
Signed-off-by: Jarek Poplawski
Signed-off-by: David S. Miller
-
Next event time should consider jiffies used for recounting. Otherwise
qdisc_watchdog_schedule() triggers hrtimer immediately with the event
in the past, and may cause very high ksoftirqd cpu usage (if highres
is on).
The check of "event" for zero in htb_dequeue() is also removed: it's
always true in this place.
Signed-off-by: Jarek Poplawski
Signed-off-by: David S. Miller
09 Jan, 2009
2 commits
-
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (84 commits)
wimax: fix kernel-doc for debufs_dentry member of struct wimax_dev
net: convert pegasus driver to net_device_ops
bnx2x: Prevent eeprom set when driver is down
net: switch kaweth driver to netdevops
pcnet32: round off carrier watch timer
i2400m/usb: wrap USB power saving in #ifdef CONFIG_PM
wimax: testing for rfkill support should also test for CONFIG_RFKILL_MODULE
wimax: fix kconfig interactions with rfkill and input layers
wimax: fix '#ifndef CONFIG_BUG' layout to avoid warning
r6040: bump release number to 0.20
r6040: warn about MAC address being unset
r6040: check PHY status when bringing interface up
r6040: make printks consistent with DRV_NAME
gianfar: Fixup use of BUS_ID_SIZE
mlx4_en: Returning real Max in get_ringparam
mlx4_en: Consider inline packets on completion
netdev: bfin_mac: enable bfin_mac net dev driver for BF51x
qeth: convert to net_device_ops
vlan: add neigh_setup
dm9601: warn on invalid mac address
...
-
Cc: Ingo Molnar
Cc: Thomas Gleixner
Acked-by: Theodore Ts'o
Acked-by: Mark Fasheh
Acked-by: David S. Miller
Cc: James Morris
Acked-by: Casey Schaufler
Acked-by: Takashi Iwai
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Jan, 2009
1 commit
-
Convert this driver to net_device_ops.
Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller
06 Jan, 2009
2 commits
-
New nodes are inserted in u32_change() under rtnl_lock() with wmb(),
so without tcf_tree_lock() like in other classifiers (e.g. cls_fw).
This isn't enough without rmb() on the read side, but on the other
hand adding such barriers doesn't give any savings, so the lock is
added instead.
Reported-by: m0sia
Signed-off-by: Jarek Poplawski
Signed-off-by: David S. Miller
-
This reverts commit 22604c866889c4b2e12b73cbf1683bda1b72a313.
We can't fix this issue in this way, because we now can try
to take the dev_base_lock rwlock as a writer in software interrupt
context and that is not allowed without major surgery elsewhere.
This initial link state problem needs to be solved in some other
way.
Signed-off-by: David S. Miller
05 Jan, 2009
1 commit
-
From: Michael Marineau
Commit b47300168e770b60ab96c8924854c3b0eb4260eb "Do not fire linkwatch
events until the device is registered." was made as a workaround for
drivers that call netif_carrier_off before registering the device.
Unfortunately this causes these drivers to incorrectly report their
link status as IF_OPER_UNKNOWN which can falsely set the IFF_RUNNING
flag when the interface is first brought up. This issue was
previously pointed out[1] but was dismissed saying that IFF_RUNNING is
not related to the link status. From my digging IFF_RUNNING, as
reported to userspace, is based on the link state. It is set based on
__LINK_STATE_START and IF_OPER_UP or IF_OPER_UNKNOWN. See [2], [3],
and [4]. (Whether or not the kernel has IFF_RUNNING set in flags is
not reported to user space so it may well be independent of the link,
I don't know if and when it may get set.)
The end result differs slightly depending on the driver. The two I
tested were e1000e and b44. With e1000e, if the system is booted
without a network cable attached, the interface will falsely report
RUNNING when it is brought up, causing NetworkManager to attempt to
start it and eventually time out. With b44, when the system is booted
with a network cable attached and brought up with dhcpcd, it will time
out the first time.
The attached patch will still set the operstate variable
correctly to IF_OPER_UP/DOWN/etc when linkwatch_fire_event is called
but then return rather than skipping the linkwatch_fire_event call
entirely as the previous fix did. (sorry it isn't inline, I don't have
a patch friendly email client at the moment)
Signed-off-by: David S. Miller