12 Oct, 2007
10 commits
-
This addition of lost_retrans_low to tcp_sock might be
unnecessary; it's not clear how often the lost_retrans worker is
executed when there is no work to do.

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller -
Detection implemented with lost_retrans must also work when the
fastpath is taken, yet most of the queue is skipped, including
(very likely) the retransmitted skbs we're interested in.
This problem appeared when the hints were added, which removed
the need to always walk the whole write queue from its head.
Therefore the decision for the lost_retrans worker loop entry must
be separated from the sacktag processing more than was
necessary before.

It turns out to be problematic to optimize the worker loop
very heavily because the ack_seqs of skbs may have a number of
discontinuity points. Maybe an approach similar to the current one
could be attempted, but that is becoming more and
more complex because the trend is towards less skb walking
in the sacktag marker. This patch tries a simple approach instead:
work until all retransmitted skbs have been processed.

Maybe the after(highest_sack_end_seq, tp->high_seq) check is not
sufficiently accurate and causes entry too often in no-work-to-do
cases. Since that's not known, I've separated the solution to that
from this patch.

Noticed because of a report against a related problem from TAKANO
Ryousei. He also provided a patch for
that part of the problem. This patch includes a solution to it
(though this patch has to use a somewhat different placement).
TAKANO's description and patch are available here:
http://marc.info/?l=linux-netdev&m=119149311913288&w=2
...In short, TAKANO's problem is that the end_seq the loop is using is
not necessarily the largest SACK block's end_seq, because the
current ACK may still have higher SACK blocks which are processed later
by the loop.

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller -
Both sacked_out and fackets_out are directly known from the how
parameter. Since fackets_out is accurate, there's no need to
recount them (sacked_out was previously counted unnecessarily
in the loop anyway).

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller -
This is necessary for an upcoming DSACK bugfix. It also reduces sacktag
length, which is not a sad thing at all... :-)

Notice that out-of-memory now has to be handled at the caller's
place.

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller -
This paves the way for further cutting of that function in the future.
Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller -
This condition (plain R) can arise at least in recovery that
is triggered after tcp_undo_loss. There isn't any reason why
such segments should not be marked as lost; not marking them makes the
in_flight estimator return values that are too large.

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller -
I was reading tcp_enter_loss while looking for Cedric's bug and
noticed that the bytes_acked adjustment is missing from the FRTO side.

Since bytes_acked will only be used in tcp_cong_avoid, I think
it's safe to assume the RTO was spurious. During FRTO, cwnd
will not be controlled by tcp_cong_avoid, and if FRTO calls for
conventional recovery, cwnd is adjusted and the result of the wrong
assumption is cleared from bytes_acked. If the RTO was in fact
spurious, we did normal ABC already and can continue without
any additional adjustments.

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller -
From RFC 3493, Section 5.2:
IPV6_MULTICAST_IF
Set the interface to use for outgoing multicast packets. The
argument is the index of the interface to use. If the
interface index is specified as zero, the system selects the
interface (for example, by looking up the address in a routing
table and using the resulting interface).

This patch adds support for (index == 0) to reset the value to its
original state, allowing the system to choose the best interface. IPv4
already behaves this way.

Signed-off-by: Brian Haley
Acked-by: David L Stevens
Signed-off-by: David S. Miller -
The patch adds MODULE_ALIAS("ip6t_") where missing;
otherwise you get

ip6tables: No chain/target/match by that name

when xt_ is not already loaded.
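The failure mode can be illustrated with a toy alias resolver, assuming a module is found first by its own name and then by the aliases it declares (a simplification of what modprobe does with MODULE_ALIAS entries). The struct, the function and the module names xt_example and xt_fixed are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the aliases a module declares with
 * MODULE_ALIAS(); at most 3 aliases plus a terminating NULL. */
struct module_info {
    const char *name;
    const char *aliases[4];
};

/* Resolve a requested module name: exact module name first, then any
 * declared alias. Returns NULL when nothing matches, which is the case
 * where the iptables-style tool reports "No chain/target/match". */
static const char *resolve_module(const struct module_info *mods, size_t n,
                                  const char *request)
{
    assert(request != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(mods[i].name, request) == 0)
            return mods[i].name;
        for (size_t j = 0; mods[i].aliases[j] != NULL; j++) {
            if (strcmp(mods[i].aliases[j], request) == 0)
                return mods[i].name;
        }
    }
    return NULL;
}
```

A module that declares only its "ipt_" alias resolves for IPv4 requests but not for "ip6t_" ones, which is the gap the added aliases close.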
Signed-off-by: Jan Engelhardt
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller -
With your description I could reproduce the bug and actually you were
completely right: the code above is incorrect. Somehow I was able to
misread RFC1122 and mixed the roles :-(

When a connection is >>closed actively<<, it MUST linger in TIME-WAIT
state for a time 2xMSL (Maximum Segment Lifetime). However, it MAY
>>accept<< a new SYN from the remote TCP to reopen the connection
directly from TIME-WAIT state, if it:
[...]

The fix is as follows: if the receiver initiated an active close, then the
sender may reopen the connection; otherwise try to figure out if we hold
a dead connection.

Signed-off-by: Jozsef Kadlecsik
Tested-by: Krzysztof Piotr Oledzki
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller
11 Oct, 2007
30 commits
-
1) fibnl needs to be declared outside of config ifdefs,
and also should not be explicitly initialized to NULL
2) nl_fib_input() args are wrong for netlink_kernel_create()
input method

Signed-off-by: David S. Miller
-
As discussed before, this patch provides userland with a way to access
relevant options in Router Advertisements, after they are processed
and validated by the kernel. Extra options are processed in a generic
way; this patch only exports RDNSS options described in RFC5006, but
support to control which options are exported could easily be added.

A new rtnetlink message type is defined to transport Neighbor
Discovery options, along with optional context information. At the
moment only the address of the router sending an RDNSS option is
included, but additional attributes may be defined later, if needed by
new use cases.

Signed-off-by: Pierre Ynard
Signed-off-by: David S. Miller -
This patch makes processing of netlink user -> kernel messages synchronous.
This change was inspired by a talk with Alexey Kuznetsov about the current
netlink message processing. He says that he was badly wrong when he
introduced asynchronous user -> kernel communication.

The netlink_unicast call is the only path to send a message to the kernel
netlink socket. But, unfortunately, it is also used to send data to the
user.

Before this change the user message was attached to the socket queue
and sk->sk_data_ready was called. The process was blocked until all
pending messages were processed. The bad thing is that this processing
may occur in an arbitrary process context.

This patch changes the nlk->data_ready callback to get 1 skb and forces
packet processing right in netlink_unicast.

The kernel -> user path in netlink_unicast remains untouched.
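A toy model of the new delivery rule, assuming a heavily simplified socket type: sending to a kernel-side socket runs its input handler immediately in the sender's context, while sending to a user-side socket still only queues the message. All types and functions here are hypothetical, not the kernel's struct sock or netlink_unicast.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message and socket types for illustration only. */
struct toy_msg {
    int payload;
};

#define TOY_QUEUE_LEN 8

struct toy_sock {
    int is_kernel;                          /* kernel-side socket? */
    void (*input)(const struct toy_msg *);  /* kernel input handler */
    struct toy_msg queue[TOY_QUEUE_LEN];    /* user-side receive queue */
    size_t qlen;
};

/* Example kernel handler: accumulates payloads so effects are visible. */
static int toy_handled;

static void toy_count_input(const struct toy_msg *msg)
{
    toy_handled += msg->payload;
}

/* Deliver one message. For a kernel socket the handler runs right here,
 * before toy_unicast returns (synchronous processing); for a user socket
 * the message is only queued for the reader. Returns 0 on success, -1 if
 * the user queue is full. */
static int toy_unicast(struct toy_sock *dst, const struct toy_msg *msg)
{
    assert(dst != NULL && msg != NULL);
    if (dst->is_kernel) {
        dst->input(msg);
        return 0;
    }
    if (dst->qlen == TOY_QUEUE_LEN)
        return -1;
    dst->queue[dst->qlen++] = *msg;
    return 0;
}
```

The point of the model is the ordering: when toy_unicast returns for a kernel socket, the message has already been handled in the caller's context rather than in whatever process happens to drain a queue later.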
EINTR processing in netlink_run_queue was changed. It forces an rtnl_lock
drop, but the process remains in the cycle until the message is fully
processed. So, there is no need to use these kludges now.

Signed-off-by: Denis V. Lunev
Acked-by: Alexey Kuznetsov
Signed-off-by: David S. Miller -
There are currently two ways to determine whether the netlink socket is a
kernel one or a user one. This patch creates a single inline call for
this purpose and unifies all the calls in af_netlink.c.

No similar calls are found outside af_netlink.c.
Signed-off-by: Denis V. Lunev
Acked-by: Alexey Kuznetsov
Signed-off-by: David S. Miller -
netlink_sendskb does not use its third argument. Clean it up and save a
couple of bytes.

Signed-off-by: Denis V. Lunev
Acked-by: Alexey Kuznetsov
Signed-off-by: David S. Miller -
The code in netfilter/nfnetlink.c and in ./net/netlink/genetlink.c looks
like an outdated copy/paste from rtnetlink.c. Push them into sync with the
original.

Changes from v1:
- deleted comment in nfnetlink_rcv_msg by request of Patrick McHardy

Signed-off-by: Denis V. Lunev
Acked-by: Patrick McHardy
Signed-off-by: David S. Miller -
There is no need to process outstanding netlink user->kernel packets
during rtnl_unlock now. There is no rtnl_trylock in rtnetlink_rcv
anymore.

Normal code path is the following:
netlink_sendmsg
netlink_unicast
netlink_sendskb
skb_queue_tail
netlink_data_ready
rtnetlink_rcv
mutex_lock(&rtnl_mutex);
netlink_run_queue(sk, qlen, &rtnetlink_rcv_msg);
mutex_unlock(&rtnl_mutex);

So, it is possible that packets are present in rtnl->sk_receive_queue
during rtnl_unlock, but there is no need to process them at that moment,
as rtnetlink_rcv for that packet is pending.

Signed-off-by: Denis V. Lunev
Acked-by: Alexey Kuznetsov
Signed-off-by: David S. Miller -
If kernel_accept() returns an error, it may pass back a pointer to
freed memory (which the caller should ignore). Make it pass back NULL
instead for better safety.

Signed-off-by: Tony Battersby
Signed-off-by: David S. Miller -
Expansion of original idea from Denis V. Lunev
Add robustness and locking to the local_port_range sysctl.
1. Enforce that low < high when setting.
2. Use seqlock to ensure atomic update.

The locking might seem like overkill, but there are
cases where a sysadmin might want to change the value in the
middle of a DoS attack.

Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller -
Add port randomization rather than a simple fixed rover
for use with SCTP. This makes it act similarly to TCP, UDP and DCCP
when allocating ports.

port_alloc_lock is no longer needed either (suggestion by Brian Haley).
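A sketch of randomized port selection in the spirit described above, assuming a caller-supplied "port in use" callback: start at a random point in [low, high] and scan upward with wraparound until a free port is found. rand() stands in for the kernel's random source, and every name here (pick_random_port, toy_port_table, toy_port_in_use) is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical callback: nonzero if the port is already taken. */
typedef int (*port_in_use_fn)(int port, void *ctx);

/* Pick a free port in [low, high]: start at a random offset and scan
 * upward, wrapping from high back to low, so allocation does not follow
 * a fixed rover. Returns -1 if every port in the range is in use. */
static int pick_random_port(int low, int high, port_in_use_fn in_use, void *ctx)
{
    assert(low <= high);
    int remaining = high - low + 1;
    int port = low + rand() % remaining;

    for (int i = 0; i < remaining; i++) {
        if (!in_use(port, ctx))
            return port;
        if (++port > high)
            port = low;
    }
    return -1;
}

/* Example table for the callback below: used[i] marks port low + i. */
struct toy_port_table {
    int low;
    unsigned char used[64];
};

static int toy_port_in_use(int port, void *ctx)
{
    const struct toy_port_table *t = ctx;
    assert(port >= t->low && port - t->low < 64);
    return t->used[port - t->low];
}
```

Since the search keeps no rover between calls, there is no shared rover state to protect, which is consistent with dropping port_alloc_lock.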
Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller -
The fourth parameter of /proc/net/psched is supposed to show the timer
resolution and is used by HTB userspace to calculate the necessary
burst rate. Currently we show the clock resolution, which results in a
burst rate that is too low when the two differ.

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller -
This patch makes the IPv4 x->type->input functions return the next protocol
instead of setting it directly. This is identical to how we do things in
IPv6 and will help us merge common code on the input path.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller -
This patch moves the setting of the IP length and checksum fields out of
the transforms and into the xfrmX_output functions. This would help future
efforts in merging the transforms themselves.

It also adds an optimisation to ipcomp due to the fact that the transport
offset is guaranteed to be zero.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller -
This patch removes the duplicate ipv6_{auth,esp,comp}_hdr structures since
they're identical to the IPv4 versions. Duplicating them would only create
problems for ourselves later when we need to add things like extended
sequence numbers.

I've also added transport header type conversion headers for these types
which are now used by the transforms.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller -
The IPv6 calling convention for x->mode->output is more general and could
help an eventual protocol-generic x->type->output implementation. This
patch adopts it for IPv4 as well and modifies the IPv4 type output functions
accordingly.

It also rewrites the IPv6 mac/transport header calculation to be based off
the network header where practical.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller -
This patch changes the calling convention so that on entry from
x->mode->output and before entry into x->type->output skb->data
will point to the payload instead of the IP header.

This is essentially a redistribution of skb_push/skb_pull calls
with the aim of minimising them on the common path of tunnel +
ESP.

It'll also let us use the same calling convention between IPv4
and IPv6 with the next patch.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller -
The beet output function completely kills any extension headers by replacing
them with the IPv6 header. This is because it essentially ignores the
result of ip6_find_1stfragopt by simply acting as if there aren't any
extension headers.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller -
I pointed this out back when this patch was first proposed but it looks like
it got lost along the way.

The checksum only needs to be ignored for NAT-T in transport mode, where
we lose the original inner addresses due to NAT. With BEET the inner
addresses will be intact, so the checksum remains valid.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller -
To judge the timing for DAD, netif_carrier_ok() is used. However,
there is a possibility that dev->qdisc stays noop_qdisc even if
netif_carrier_ok() returns true. In that case, DAD NS is not sent out.
We need to defer the IPv6 device initialization until a valid qdisc
is specified.

Signed-off-by: Mitsuru Chinen
Signed-off-by: YOSHIFUJI Hideaki
Signed-off-by: David S. Miller -
Both unregister_netdevice() and dev_change_net_namespace()
check for IFF_UP in dev->flags before calling
dev_close(), but dev_close() checks for IFF_UP itself,
so remove those unneeded checks.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller -
This follows the own-function-for-each-task principle; this is really
a somewhat separate task being done in sacktag. It also reduces
indentation.

In addition, I added an ack_seq local var to break some long
lines and fixed coding style things.

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller -
Just switch to the consolidated code.
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller -
Just switch to the consolidated code.
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller -
Just switch to the consolidated code.
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller -
Just switch to the consolidated calls.
ipt_recent() has to initialize the private data, so use
the __seq_open_private() helper.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller -
This concerns the ipv4 and ipv6 code mostly, but also the netlink
and unix sockets.

The netlink code is an example of how to use the __seq_open_private()
call: it saves the net namespace in this private data.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller -
The decryption handlers will skip the frame if the RX_FLAG_DECRYPTED
flag is set, so the early flag setting introduced by Johannes breaks
decryption. To work around this, call the handlers first and then set
the flag.

Signed-off-by: Mattias Nissler
Signed-off-by: John W. Linville -
Problem description by Daniel Drake:
"This sequence of events causes loss of connectivity:
ifconfig eth7 down
iwconfig eth7 mode monitor
ifconfig eth7 up
ifconfig eth7 down
iwconfig eth7 mode managed

At this point you are associated but TX does not work. This is because
the eth7 hard_start_xmit is still ieee80211_monitor_start_xmit."

The problem is caused by ieee80211_if_set_type checking for a non-zero
hard_start_xmit pointer value in order to avoid changing that value for
master devices. The fix is to link that check explicitly to
master devices rather than simply checking whether the value has been
previously set.

CC: Daniel Drake
Acked-by: Michael Wu
Signed-off-by: John W. Linville -
This patch releases the lock on the state before calling x->type->output.
It also adds the lock to the spots where it's currently needed.

Most of those places (all except mip6) are expected to disappear with
async crypto.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller -
This patch adds locking so that when we're copying non-atomic fields such as
lifetime or coaddr to user-space we don't get a partial result.

For af_key I've changed every instance of pfkey_xfrm_state2msg apart from
expiration notification to include the keys and lifetimes. This is in line
with XFRM behaviour.

The actual cases affected are:
* pfkey_getspi: No change as we don't have any keys to copy.
* key_notify_sa:
  + ADD/UPD: This wouldn't work otherwise.
  + DEL: It can't hurt.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller