12 Jul, 2007
3 commits
-
Drivers need to validate the initial addresses in their netlink attribute
validation function or manually reject them if they can't support this.

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller -
All drivers need to unregister their devices in the module unload function.
While doing so they must hold the rtnl and atomically unregister the
rtnl_link ops as well. This makes the rtnl_link_unregister function that
takes the rtnl itself completely useless.

Provide default newlink/dellink functions, make __rtnl_link_unregister and
rtnl_link_unregister unregister all devices with matching rtnl_link_ops and
change the existing users to take advantage of that.

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller -
Keep netpoll/poll_napi from messing with the poll_list.
Only net_rx_action is allowed to manipulate the list.

Signed-off-by: Olaf Kirch
Signed-off-by: David S. Miller
11 Jul, 2007
21 commits
-
As noticed by Jarek Poplawski, the timer removal in
gen_kill_estimator races with the timer function rearming the timer.

Check whether the timer list is empty before rearming the timer
in the timer function to fix this.

Signed-off-by: Patrick McHardy
Acked-by: Jarek Poplawski
Signed-off-by: David S. Miller -
93ec2c723e3f8a216dde2899aeb85c648672bc6b applied excessive duct tape to
the netpoll beast's netpoll_cleanup(), thus substituting one leak with
another, and opening up a little buglet :-)

net_device->npinfo (netpoll_info) is a shared and refcounted object and
cannot simply be set NULL the first time netpoll_cleanup() is called.
Otherwise, further netpoll_cleanup()'s see np->dev->npinfo == NULL and
become no-ops, thus leaking. And it's a bug too: the first call to
netpoll_cleanup() would thus (annoyingly) "disable" other (still alive)
netpolls too. Maybe nobody noticed this because netconsole (only user
of netpoll) never supported multiple netpoll objects earlier.

This is a trivial and obvious one-line fixlet.
Signed-off-by: Satyam Sharma
Signed-off-by: David S. Miller -
- save 4 bytes
- it's read-mostly.
Signed-off-by: Andrew Morton
Acked-by: Vasily Averin
Signed-off-by: David S. Miller -
This includes /proc/net/protocols, /proc/net/rxrpc_calls and
/proc/net/rxrpc_connections files.

All three need seq_list_start_head to show some header.
Signed-off-by: Pavel Emelianov
Signed-off-by: Andrew Morton
Signed-off-by: David S. Miller -
The TRACE target can be used to follow IP and IPv6 packets through
the ruleset.

Signed-off-by: Jozsef Kadlecsik
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller -
Added transport mode ESP support for starters. I will send more of
these modes and types once I have resolved the tunnel mode issues.

Signed-off-by: Jamal Hadi Salim
Signed-off-by: Robert Olsson
Signed-off-by: David S. Miller -
By default all flows in pktgen are randomly selected.
This patch introduces the ability to send all defined flows
sequentially. Robert defined randomness to be the
default behavior.

Signed-off-by: Jamal Hadi Salim
Signed-off-by: Robert Olsson
Signed-off-by: David S. Miller -
Track the extra packet overhead for VLAN tags, MPLS, IPSEC etc
Signed-off-by: Jamal Hadi Salim
Signed-off-by: Robert Olsson
Signed-off-by: David S. Miller -
When a reference to an existing address is increased or decreased without
hitting zero, the address count is incorrectly adjusted.

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller -
Add the multiqueue hardware device support API to the core network
stack. Allow drivers to allocate multiple queues and manage them at
the netdev level if they choose to do so.

Added a new field to sk_buff, namely queue_mapping, for drivers to
know which tx_ring to select based on OS classification of the flow.

Signed-off-by: Peter P Waskiewicz Jr
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller -
This patch fixes a boolean error in the new TX checksum check
that causes bogus TSO packets to be generated.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller -
Add support for configuring secondary unicast addresses on network
devices. To support this, devices capable of filtering multiple
unicast addresses need to change their set_multicast_list function
to configure unicast filters as well and assign it to dev->set_rx_mode
instead of dev->set_multicast_list. Other devices are put into promiscuous
mode when secondary unicast addresses are present.

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller -
Use generic net_device address lists for multicast list handling.
Some defines are used to keep drivers working.

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller -
Introduce struct dev_addr_list and list maintenance functions
based on dev_mc_list and the related functions. This will be
used by follow-up patches for both multicast and secondary
unicast addresses.

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller -
dev_mc_add/dev_mc_delete take care of uploading the list when
necessary, and that's the only interface other code should use.
Also remove two incorrect calls in DECnet.

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller -
The existing model for checksum offload does not correctly handle
devices that can offload IPV4 and IPV6 only. The NETIF_F_HW_CSUM flag
implies the device can checksum any arbitrary protocol.

This patch:
* adds NETIF_F_IPV6_CSUM for those devices
* fixes bnx2 and tg3 devices that need it
* adds NETIF_F_IPV6_CSUM to ipv6 output (incl GSO)
* fixes assumptions about NETIF_F_ALL_CSUM in nat
* adjusts bridge union of checksumming computation

Signed-off-by: David S. Miller
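The distinction between the checksum flags can be sketched in userspace C. This is only an illustration: the flag names mirror the ones in the message above, but the bit values, the `l3_proto` enum and the `can_offload_csum` helper are assumptions made for this example and are not taken from the kernel sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; the names follow the commit message,
 * the numeric values are local to this sketch. */
#define NETIF_F_IP_CSUM   (1u << 0)  /* TCP/UDP over IPv4 only */
#define NETIF_F_HW_CSUM   (1u << 1)  /* any arbitrary protocol */
#define NETIF_F_IPV6_CSUM (1u << 2)  /* TCP/UDP over IPv6 */

/* Hypothetical protocol tag for the packet being sent. */
enum l3_proto { L3_IPV4, L3_IPV6, L3_OTHER };

/* Returns true if a device with these feature bits could offload the
 * checksum for a packet of the given protocol. */
static bool can_offload_csum(uint32_t features, enum l3_proto proto)
{
    if (features & NETIF_F_HW_CSUM)
        return true;                 /* generic engine covers everything */

    switch (proto) {
    case L3_IPV4:
        return (features & NETIF_F_IP_CSUM) != 0;
    case L3_IPV6:
        return (features & NETIF_F_IPV6_CSUM) != 0;
    default:
        return false;                /* protocol-specific engines cannot help */
    }
}
```

A device that handles only IPv4 and IPV6 would set both protocol-specific bits and leave NETIF_F_HW_CSUM clear, so `can_offload_csum` returns false for any other protocol.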
-
Sent the wrong patch previously.
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller -
Add a nested compat attribute type that can be used to convert
attributes that contain a structure to nested attributes in a
backwards compatible way.

The attribute looks like this:

struct {
        [ compat contents ]
        struct rtattr {
                .rta_len        = total size,
                .rta_type       = type,
        } rta;
        struct old_structure struct;

        [ nested top-level attribute ]
        struct rtattr {
                .rta_len        = nest size,
                .rta_type       = type,
        } nest_attr;

                [ optional 0 .. n nested attributes ]
                struct rtattr {
                        .rta_len        = private attribute len,
                        .rta_type       = private attribute type,
                } nested_attr;
                struct nested_data data;
};

Since both userspace and kernel deal correctly with attributes that are
larger than expected, old versions will just parse the compat part and
ignore the rest.

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller -
Currently NAT (and others) that want to modify cloned skbs copy them,
even if in the vast majority of cases it's not necessary because the
skb is a clone made by TCP and the portion NAT wants to modify is
actually writable because TCP releases the header reference before
cloning.

The problem is that there is no clean way for NAT to find out how
long the writable header area is, so this patch introduces skb->hdr_len
to hold this length. When a headerless skb is cloned skb->hdr_len
is set to the current headroom, for regular clones it is copied from
the original. A new function skb_clone_writable(skb, len) returns
whether the skb is writable up to len bytes from skb->data. To avoid
enlarging the skb the mac_len field is reduced to 16 bits and the
new hdr_len field is put in the remaining 16 bits.

I've done a few rough benchmarks of NAT (not with this exact patch,
but a very similar one). As expected it saves huge amounts of system
time in case of sendfile, bringing it down to basically the same
amount as without NAT, with sendmsg it only helps on loopback,
probably because of the large MTU.

Transmit a 1GB file using sendfile/sendmsg over eth0/lo with and
without NAT:

- sendfile eth0, no NAT: sys 0m0.388s
- sendfile eth0, NAT: sys 0m1.835s
- sendfile eth0, NAT + patch: sys 0m0.370s (~ -80%)

- sendfile lo, no NAT: sys 0m0.258s
- sendfile lo, NAT: sys 0m2.609s
- sendfile lo, NAT + patch: sys 0m0.260s (~ -90%)

- sendmsg eth0, no NAT: sys 0m2.508s
- sendmsg eth0, NAT: sys 0m2.539s
- sendmsg eth0, NAT + patch: sys 0m2.445s (no change)

- sendmsg lo, no NAT: sys 0m2.151s
- sendmsg lo, NAT: sys 0m3.557s
- sendmsg lo, NAT + patch: sys 0m2.159s (~ -40%)

I expect other users can see a similar performance improvement,
packet mangling iptables targets, ipip and ip_gre come to mind ..

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller -
Add rtnetlink API for creating, changing and deleting software devices.
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller -
Split up rtnl_setlink into a function performing validation and a function
performing the actual changes. This allows the modification logic to be
shared with rtnl_newlink, which is introduced by the next patch.

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller
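The validation/modification split described in the rtnl_setlink entry above can be sketched in plain C. Everything here is hypothetical: `toy_link`, `toy_request`, the MTU bounds and all function names are invented for illustration and do not correspond to the rtnetlink code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a device and a netlink request. */
struct toy_link    { unsigned mtu; bool exists; };
struct toy_request { unsigned mtu; };

/* Validation only: never touches the link. The bounds are arbitrary. */
static int validate_request(const struct toy_request *req)
{
    return (req->mtu >= 68 && req->mtu <= 65535) ? 0 : -1;
}

/* Modification only: assumes the request was already validated. */
static void apply_request(struct toy_link *link, const struct toy_request *req)
{
    link->mtu = req->mtu;
}

/* Changing an existing link: validate, then apply. */
static int toy_setlink(struct toy_link *link, const struct toy_request *req)
{
    if (!link->exists || validate_request(req) != 0)
        return -1;
    apply_request(link, req);
    return 0;
}

/* Creating a link reuses the same two helpers after setting up the slot. */
static int toy_newlink(struct toy_link *slot, const struct toy_request *req)
{
    if (slot->exists || validate_request(req) != 0)
        return -1;
    slot->exists = true;
    apply_request(slot, req);
    return 0;
}
```

Because validation runs before any state is touched, a rejected request leaves the link unchanged on both paths.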
06 Jul, 2007
3 commits
-
From my recent patch:
> > #1
> > Until kernel ver. 2.6.21 (including) cancel_rearming_delayed_work()
> > required a work function should always (unconditionally) rearm with
> > delay > 0 - otherwise it would endlessly loop. This patch replaces
> > this function with cancel_delayed_work(). Later kernel versions don't
> > require this, so here it's only for uniformity.

But Oleg Nesterov found:
> But 2.6.22 doesn't need this change, why it was merged?
>
> In fact, I suspect this change adds a race,
...

His description was right (thanks), so this patch reverts #1.
Signed-off-by: Jarek Poplawski
Signed-off-by: David S. Miller -
Every file should include the headers containing the prototypes for
its global functions.

Signed-off-by: Adrian Bunk
Signed-off-by: David S. Miller -
skb_clone_fraglist is static so it shouldn't be exported.
Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller
29 Jun, 2007
1 commit
-
#1
Up to and including kernel ver. 2.6.21, cancel_rearming_delayed_work()
required that a work function always (unconditionally) rearm with
delay > 0 - otherwise it would endlessly loop. This patch replaces
this function with cancel_delayed_work(). Later kernel versions don't
require this, so here it's only for uniformity.

#2
After deleting a timer in cancel_[rearming_]delayed_work(), a last skb
could remain queued in npinfo->txq, causing a memory leak after
kfree(npinfo).

Initial patch & testing by: Jason Wessel
Signed-off-by: Jarek Poplawski
Signed-off-by: Andrew Morton
Signed-off-by: David S. Miller
27 Jun, 2007
1 commit
-
If the sky2 device poll routine is called from netpoll_send_skb, it will
deadlock: netpoll_send_skb holds the netif_tx_lock, and the poll
routine may acquire it to clean up skb's. Other drivers might use
the same locking model.

The driver is correct; netpoll should not introduce more locking
problems than it causes already. So change the code to drop the lock
before calling the poll handler.

Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller
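The locking problem in the netpoll entry above can be modelled with a toy non-recursive lock. This is a sketch under stated assumptions: `toy_lock`, `driver_poll` and the two send functions are hypothetical names, and a failed `toy_trylock` stands in for what would be an endless spin on a real lock.

```c
#include <stdbool.h>

/* Toy non-recursive lock: a second acquisition fails instead of
 * spinning forever, so the deadlock becomes observable. */
struct toy_lock { bool held; };

static bool toy_trylock(struct toy_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

static void toy_unlock(struct toy_lock *l)
{
    l->held = false;
}

/* Hypothetical driver poll routine: needs the tx lock to reclaim skbs. */
static bool driver_poll(struct toy_lock *tx_lock)
{
    if (!toy_trylock(tx_lock))
        return false;          /* a real lock would deadlock here */
    /* ... reclaim completed transmit buffers ... */
    toy_unlock(tx_lock);
    return true;
}

/* Old shape: the poll routine is called with the tx lock still held. */
static bool send_polling_with_lock_held(struct toy_lock *tx_lock)
{
    bool ok;

    (void)toy_trylock(tx_lock);
    ok = driver_poll(tx_lock);
    toy_unlock(tx_lock);
    return ok;
}

/* New shape: drop the tx lock first, then call the poll routine. */
static bool send_polling_with_lock_dropped(struct toy_lock *tx_lock)
{
    (void)toy_trylock(tx_lock);
    /* ... attempt to transmit, find the queue full ... */
    toy_unlock(tx_lock);
    return driver_poll(tx_lock);
}
```

In this model the first variant reports failure because the poll routine cannot take the lock its caller already holds, while the second variant succeeds.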
24 Jun, 2007
3 commits
-
Having walked through the entire skbuff, skb_seq_read would leave the
last fragment mapped. As a consequence, the unwary caller would leak
kmaps, and proceed with preempt_count off by one. The only (kind of
non-intuitive) workaround is to use skb_seq_read_abort.

This patch makes sure skb_seq_read always unmaps frag_data after
having cycled through the skb's paged part.

Signed-off-by: Olaf Kirch
Signed-off-by: David S. Miller -
This moves the local_irq_enable() call in net_rx_action() to before
calling the CONFIG_NET_DMA's dma_async_memcpy_issue_pending() rather
than after. This shortens the irq disabled window and allows for DMA
drivers that need to do their own irq hold.

Signed-off-by: Shannon Nelson
Signed-off-by: David S. Miller -
secmark doesn't depend on CONFIG_NET_SCHED.
Signed-off-by: Patrick McHardy
Acked-by: James Morris
Signed-off-by: David S. Miller
08 Jun, 2007
4 commits
-
When the link state is changed from userspace without affecting any
other flags, two duplicate notifications are sent: once as an action
in the NETDEV_UP/NETDEV_DOWN notification chain and a second time
when comparing old and new device flags after the change has been
completed. Although harmless, the duplicates should be avoided.

Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller -
ifindex == 0 does not exist and implies we should do a lookup by name if
one was given.

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller -
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller -
Signed-off-by: Denis Cheng
Signed-off-by: David S. Miller
04 Jun, 2007
1 commit
-
This isn't a bug just yet as only TCP uses sk_setup_caps for GSO.
However, if and when UDP or something else starts using it this is
likely to cause a problem if we forget to add software emulation
for it at the same time.

The problem is that right now we translate GSO emulation to the
bitmask NETIF_F_GSO_MASK, which includes every protocol, even
ones that we cannot emulate.

This patch makes it provide only the ones that we can emulate.
Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller
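The mask change in the GSO entry above amounts to OR-ing in a narrower set of bits. The sketch below uses made-up names and values: `GSO_TCPV4`, `GSO_UDP`, `GSO_TCPV6`, `GSO_ALL`, `GSO_SOFTWARE` and `effective_gso` are assumptions for illustration, and which protocols software can actually segment is assumed here, not taken from the patch.

```c
#include <stdint.h>

/* Hypothetical per-protocol GSO feature bits for this sketch. */
#define GSO_TCPV4 (1u << 0)
#define GSO_UDP   (1u << 1)
#define GSO_TCPV6 (1u << 2)

/* Every protocol, whether or not software emulation exists for it. */
#define GSO_ALL      (GSO_TCPV4 | GSO_UDP | GSO_TCPV6)
/* Only the protocols assumed to have a software fallback. */
#define GSO_SOFTWARE (GSO_TCPV4 | GSO_TCPV6)

/* Features a socket may rely on: what the hardware offers, plus what
 * software can emulate when emulation is enabled. */
static uint32_t effective_gso(uint32_t hw_features, int sw_gso_enabled)
{
    if (sw_gso_enabled)
        hw_features |= GSO_SOFTWARE;   /* previously the full mask */
    return hw_features;
}
```

With the full mask, a protocol without a software fallback would be advertised even when the hardware lacks it; with the narrower mask it only appears if the hardware bit is set.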
31 May, 2007
2 commits
-
in4_pton converts a textual representation of an ip4 address
into an integer representation. However, when the textual representation
is of the form ip:port, e.g. 192.168.1.1:5060, and 'delim' is set to
-1, the function bails out with an error when reading the colon.

It makes sense to allow the colon as a delimiting character without
explicitly having to set it through the 'delim' variable as there can be
no ambiguity in the point where the ip address is completely parsed. This
function is indeed called from nf_conntrack_sip.c in this way to parse
textual ip:port combinations, which fails for the reason stated above.

Signed-off-by: Jerome Borsboom
Signed-off-by: David S. Miller -
Signed-off-by: David S. Miller
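The in4_pton change in this group (the ip:port case) can be illustrated with a small userspace parser. This is not the kernel's in4_pton: `toy_in4_pton`, its signature and its return convention are assumptions made for this sketch, which only shows the idea of treating ':' as an unambiguous end of the address even when no delimiter was requested.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy dotted-quad parser (hypothetical, not the kernel function).
 * Parses src[0..len), stores four bytes in dst and sets *end to the
 * first unconsumed character. Returns 1 on success, 0 on error.
 * delim is an explicit stop character or -1 for none; ':' always stops. */
static int toy_in4_pton(const char *src, size_t len, uint8_t dst[4],
                        int delim, const char **end)
{
    size_t i;
    int octet = 0;       /* index of the octet being parsed */
    int digits = 0;      /* digits seen in the current octet */
    unsigned value = 0;  /* value of the current octet */

    for (i = 0; i < len; i++) {
        char c = src[i];

        if (c >= '0' && c <= '9') {
            value = value * 10 + (unsigned)(c - '0');
            if (++digits > 3 || value > 255)
                return 0;
        } else if (c == '.' && digits > 0 && octet < 3) {
            dst[octet++] = (uint8_t)value;
            value = 0;
            digits = 0;
        } else if (c == ':' || (delim != -1 && c == (char)delim)) {
            break;       /* address finished; the rest is not ours */
        } else {
            return 0;    /* unexpected character */
        }
    }

    if (octet != 3 || digits == 0)
        return 0;        /* fewer than four octets */
    dst[3] = (uint8_t)value;
    if (end)
        *end = src + i;
    return 1;
}
```

Called on "192.168.1.1:5060" with delim set to -1, the sketch stops at the colon and leaves `*end` pointing at it, so a caller can go on to parse the port.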
25 May, 2007
1 commit
-
The current IPSEC rule resolution behavior we have does not work for a
lot of people, even though technically it's an improvement from the
-EAGAIN business we had before.

Right now we'll block until the key manager resolves the route. That
works for simple cases, but many folks would rather packets get
silently dropped until the key manager resolves the IPSEC rules.

We can't tell these folks to "set the socket non-blocking" because
they don't have control over the non-block setting of things like the
sockets used to resolve DNS deep inside of the resolver libraries in
libc.

With that in mind I coded up the patch below with some help from
Herbert Xu which provides packet-drop behavior during larval state
resolution, controllable via sysctl and off by default.

This lays the framework to either:

1) Make this default at some point or...
2) Move this logic into xfrm{4,6}_policy.c and implement the
   ARP-like resolution queue we've all been dreaming of.
   The idea would be to queue packets to the policy, then
   once the larval state is resolved by the key manager we
   re-resolve the route and push the packets out. The
   packets would time out if the rule didn't get resolved
   in a certain amount of time.

Signed-off-by: David S. Miller