Eric Lee / smarc-fsl-linux-kernel

12 Oct, 2007

10 commits

b08d6cb22 [TCP]: Limit processing lost_retrans loop to work-to-do cases

This addition of lost_retrans_low to tcp_sock might be
unnecessary; it's not clear how often the lost_retrans worker is
executed when there is no work to do.

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Ilpo Järvinen
2007-10-12 08:36:13 +0800
f785a8e28 [TCP]: Fix lost_retrans loop vs fastpath problems

Detection implemented with lost_retrans must work also when
fastpath is taken, yet most of the queue is skipped including
(very likely) those retransmitted skb's we're interested in.
This problem appeared when the hints got added, which removed
a need to always walk over the whole write queue head.
Therefore the decision for the lost_retrans worker loop entry must
be separated from the sacktag processing more than it was
necessary before.

It turns out to be problematic to optimize the worker loop
very heavily because ack_seqs of skb may have a number of
discontinuity points. Maybe similar approach as currently is
implemented could be attempted but that's becoming more and
more complex because the trend is towards less skb walking
in sacktag marker. Trying a simple "work until all rexmitted
skbs have been processed" approach.

Maybe after(highest_sack_end_seq, tp->high_seq) checking is not
sufficiently accurate and causes entry too often in no-work-to-do
cases. Since that's not known, I've separated solution to that
from this patch.
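The after() test mentioned above is a wraparound-safe comparison of 32-bit TCP sequence numbers. A minimal userspace sketch of that kind of comparison, assuming the usual signed-difference trick; the names seq_before and seq_after are illustrative stand-ins, not the kernel's own definitions:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if seq1 precedes seq2 in 32-bit sequence space (modulo 2^32). */
static inline bool seq_before(uint32_t seq1, uint32_t seq2)
{
    /* The unsigned difference wraps; its sign bit tells the order. */
    return (int32_t)(seq1 - seq2) < 0;
}

/* True if seq1 follows seq2; simply the mirrored comparison. */
static inline bool seq_after(uint32_t seq1, uint32_t seq2)
{
    return seq_before(seq2, seq1);
}
```

With this definition a sequence number just past the 2^32 wrap still compares as "after" one just before it, which is why a plain ">" would not be sufficient for such a check.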

Noticed because of report against a related problem from TAKANO
Ryousei. He also provided a patch to
that part of the problem. This patch includes solution to it
(though this patch has to use somewhat different placement).
TAKANO's description and patch is available here:

http://marc.info/?l=linux-netdev&m=119149311913288&w=2

...In short, TAKANO's problem is that the end_seq the loop is using
is not necessarily the largest SACK block's end_seq, because the
current ACK may still have higher SACK blocks which are processed
later by the loop.

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Ilpo Järvinen
2007-10-12 08:35:41 +0800
4cd829995 [TCP]: No need to re-count fackets_out/sacked_out at RTO

Both sacked_out and fackets_out are directly known from how
parameter. Since fackets_out is accurate, there's no need for
recounting (sacked_out was previously unnecessarily counted
in the loop anyway).

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Ilpo Järvinen
2007-10-12 08:34:57 +0800
d19359429 [TCP]: Extract tcp_match_queue_to_sack from sacktag code

This is necessary for upcoming DSACK bugfix. Reduces sacktag
length which is not very sad thing at all... :-)

Notice that there's a need to handle out-of-mem at caller's
place.

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Ilpo Järvinen
2007-10-12 08:34:25 +0800
f6fb128d2 [TCP]: Kill almost unused variable pcount from sacktag

It's on the way for future cutting of that function.

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Ilpo Järvinen
2007-10-12 08:33:55 +0800
3eec0047d [TCP]: Fix mark_head_lost to ignore R-bit when trying to mark L

This condition (plain R) can arise at least in recovery that
is triggered after tcp_undo_loss. There isn't any reason why
they should not be marked as lost; not marking them makes the
in_flight estimator return too large values.

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Ilpo Järvinen
2007-10-12 08:33:11 +0800
16e906812 [TCP]: Add bytes_acked (ABC) clearing to FRTO too

I was reading tcp_enter_loss while looking for Cedric's bug and
noticed bytes_acked adjustment is missing from FRTO side.

Since bytes_acked will only be used in tcp_cong_avoid, I think
it's safe to assume RTO would be spurious. During FRTO cwnd
will not be controlled by tcp_cong_avoid and if FRTO calls for
conventional recovery, cwnd is adjusted and the result of wrong
assumption is cleared from bytes_acked. If RTO was in fact
spurious, we did normal ABC already and can continue without
any additional adjustments.

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Ilpo Järvinen
2007-10-12 08:32:31 +0800
4953f0fcc [IPv6]: Update setsockopt(IPV6_MULTICAST_IF) to support RFC 3493, try2

From RFC 3493, Section 5.2:

IPV6_MULTICAST_IF

Set the interface to use for outgoing multicast packets. The
argument is the index of the interface to use. If the
interface index is specified as zero, the system selects the
interface (for example, by looking up the address in a routing
table and using the resulting interface).

This patch adds support for (index == 0) to reset the value to its
original state, allowing the system to choose the best interface. IPv4
already behaves this way.
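From userspace, the behaviour described above could be exercised roughly as follows. This is a minimal sketch assuming an already opened AF_INET6 datagram socket; the helper name reset_multicast_if is made up for illustration, while setsockopt(), IPPROTO_IPV6 and IPV6_MULTICAST_IF are the standard sockets API names from RFC 3493:

```c
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Ask the system to choose the outgoing multicast interface again by
 * passing interface index 0, as RFC 3493 section 5.2 describes.
 * Returns 0 on success, -1 with errno set on failure.
 */
int reset_multicast_if(int fd)
{
    unsigned int ifindex = 0;   /* 0 = let the system select */

    return setsockopt(fd, IPPROTO_IPV6, IPV6_MULTICAST_IF,
                      &ifindex, sizeof(ifindex));
}
```

A caller that earlier pinned multicast traffic to a specific interface index can use such a helper to return to system-selected routing without closing the socket.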

Signed-off-by: Brian Haley
Acked-by: David L Stevens
Signed-off-by: David S. Miller

Brian Haley
2007-10-12 05:39:29 +0800
73aaf9355 [NETFILTER]: x_tables: add missing ip6t_modulename aliases

The patch will add MODULE_ALIAS("ip6t_<modulename>") where missing,
otherwise you will get

ip6tables: No chain/target/match by that name

when xt_<modulename> is not already loaded.

Signed-off-by: Jan Engelhardt
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller

Jan Engelhardt
2007-10-12 05:36:40 +0800
17311393f [NETFILTER]: nf_conntrack_tcp: fix connection reopening

With your description I could reproduce the bug and actually you were
completely right: the code above is incorrect. Somehow I was able to
misread RFC1122 and mixed the roles :-(:

When a connection is >>closed actively<<, it MUST linger in
TIME-WAIT state for a time 2xMSL (Maximum Segment Lifetime).
However, it MAY >>accept<< a new SYN from the remote TCP to
reopen the connection directly from TIME-WAIT state, if it:
[...]

The fix is as follows: if the receiver initiated an active close, then the
sender may reopen the connection - otherwise try to figure out if we hold
a dead connection.

Signed-off-by: Jozsef Kadlecsik
Tested-by: Krzysztof Piotr Oledzki
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller

Jozsef Kadlecsik
2007-10-12 05:35:52 +0800

11 Oct, 2007

30 commits

28f7b0360 [NETLINK]: fib_frontend build fixes

1) fibnl needs to be declared outside of config ifdefs,
and also should not be explicitly initialized to NULL
2) nl_fib_input() args are wrong for netlink_kernel_create()
input method

Signed-off-by: David S. Miller

David S. Miller
2007-10-11 12:32:39 +0800
31910575a [IPv6]: Export userland ND options through netlink (RDNSS support)

As discussed before, this patch provides userland with a way to access
relevant options in Router Advertisements, after they are processed
and validated by the kernel. Extra options are processed in a generic
way; this patch only exports RDNSS options described in RFC5006, but
support to control which options are exported could be easily added.

A new rtnetlink message type is defined, to transport Neighbor
Discovery options, along with optional context information. At the
moment only the address of the router sending an RDNSS option is
included, but additional attributes may be later defined, if needed by
new use cases.

Signed-off-by: Pierre Ynard
Signed-off-by: David S. Miller

Pierre Ynard
2007-10-11 12:22:05 +0800
cd40b7d39 [NET]: make netlink user -> kernel interface synchronious

This patch makes the processing of netlink user -> kernel messages
synchronous. This change was inspired by a talk with Alexey Kuznetsov
about current netlink message processing. He says that he was badly
wrong when he introduced asynchronous user -> kernel communication.

The call netlink_unicast is the only path to send message to the kernel
netlink socket. But, unfortunately, it is also used to send data to the
user.

Before this change the user message was attached to the socket queue
and sk->sk_data_ready was called. The process was blocked until all
pending messages were processed. The bad thing is that this processing
may occur in an arbitrary process context.

This patch changes nlk->data_ready callback to get 1 skb and force packet
processing right in the netlink_unicast.

Kernel -> user path in netlink_unicast remains untouched.

EINTR processing in netlink_run_queue was changed. It forces rtnl_lock
drop, but the process remains in the cycle until the message is fully
processed. So, there is no need to use this kludge now.

Signed-off-by: Denis V. Lunev
Acked-by: Alexey Kuznetsov
Signed-off-by: David S. Miller

Denis V. Lunev
2007-10-11 12:15:29 +0800
aed815601 [NET]: unify netlink kernel socket recognition

There are currently two ways to determine whether the netlink socket is a
kernel one or a user one. This patch creates a single inline call for
this purpose and unifies all the calls in af_netlink.c.

No similar calls are found outside af_netlink.c.

Signed-off-by: Denis V. Lunev
Acked-by: Alexey Kuznetsov
Signed-off-by: David S. Miller

Denis V. Lunev
2007-10-11 12:14:32 +0800
7ee015e0f [NET]: cleanup 3rd argument in netlink_sendskb

netlink_sendskb does not use third argument. Clean it and save a couple of
bytes.

Signed-off-by: Denis V. Lunev
Acked-by: Alexey Kuznetsov
Signed-off-by: David S. Miller

Denis V. Lunev
2007-10-11 12:14:03 +0800
3b71535f3 [NET]: Make netlink processing routines semi-synchronious (inspired by rtnl) v2

The code in netfilter/nfnetlink.c and in ./net/netlink/genetlink.c looks
like outdated copy/paste from rtnetlink.c. Push them into sync with the
original.

Changes from v1:
- deleted comment in nfnetlink_rcv_msg by request of Patrick McHardy

Signed-off-by: Denis V. Lunev
Acked-by: Patrick McHardy
Signed-off-by: David S. Miller

Denis V. Lunev
2007-10-11 12:13:32 +0800
1536cc0d5 [NET]: rtnl_unlock cleanups

There is no need to process outstanding netlink user->kernel packets
during rtnl_unlock now. There is no rtnl_trylock in the rtnetlink_rcv
anymore.

Normal code path is the following:
netlink_sendmsg
   netlink_unicast
      netlink_sendskb
         skb_queue_tail
         netlink_data_ready
            rtnetlink_rcv
               mutex_lock(&rtnl_mutex);
               netlink_run_queue(sk, qlen, &rtnetlink_rcv_msg);
               mutex_unlock(&rtnl_mutex);

So, it is possible, that packets can be present in the rtnl->sk_receive_queue
during rtnl_unlock, but there is no need to process them at that moment as
rtnetlink_rcv for that packet is pending.

Signed-off-by: Denis V. Lunev
Acked-by: Alexey Kuznetsov
Signed-off-by: David S. Miller

Denis V. Lunev
2007-10-11 12:12:58 +0800
fa8705b00 [NET]: sanitize kernel_accept() error path

If kernel_accept() returns an error, it may pass back a pointer to
freed memory (which the caller should ignore). Make it pass back NULL
instead for better safety.
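The pattern can be sketched in plain userspace C. Everything in this block (struct conn, conn_accept, the simulate_error switch) is a hypothetical stand-in chosen for illustration and is not the kernel_accept() code itself:

```c
#include <errno.h>
#include <stdlib.h>

struct conn { int fd; };    /* hypothetical stand-in for a socket object */

/*
 * Allocate a connection object and hand it back through *out.
 * On failure the object is freed and *out is reset to NULL, so the
 * caller never sees a pointer to freed memory.
 */
int conn_accept(int simulate_error, struct conn **out)
{
    *out = malloc(sizeof(**out));
    if (*out == NULL)
        return -ENOMEM;

    if (simulate_error) {
        free(*out);
        *out = NULL;        /* do not leave a dangling pointer behind */
        return -ECONNABORTED;
    }

    (*out)->fd = 3;         /* placeholder value for the sketch */
    return 0;
}
```

A caller that mistakenly inspects the output pointer after an error then sees NULL rather than a dangling pointer, which makes the mistake easier to catch.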

Signed-off-by: Tony Battersby
Signed-off-by: David S. Miller

Tony Battersby
2007-10-11 12:09:04 +0800
227b60f51 [INET]: local port range robustness

Expansion of original idea from Denis V. Lunev

Add robustness and locking to the local_port_range sysctl.
1. Enforce that low < high when setting.
2. Use seqlock to ensure atomic update.

The locking might seem like overkill, but there are
cases where sysadmin might want to change value in the
middle of a DoS attack.
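The two rules above can be illustrated with a small userspace sketch of a sequence-counter ("seqlock"-style) update built on C11 atomics. It is an assumption-laden model, not the kernel's seqlock API: struct port_range and the port_range_* functions are invented for this example, and writers are assumed to be serialized by the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the sysctl's port range state. */
struct port_range {
    atomic_uint seq;    /* even: stable, odd: update in progress */
    atomic_int  low;
    atomic_int  high;
};

void port_range_init(struct port_range *r, int low, int high)
{
    assert(low < high);
    atomic_init(&r->seq, 0);
    atomic_init(&r->low, low);
    atomic_init(&r->high, high);
}

/* Writer side; assumes writers are already serialized by the caller. */
bool port_range_set(struct port_range *r, int low, int high)
{
    if (low >= high)
        return false;               /* rule 1: enforce low < high */

    atomic_fetch_add(&r->seq, 1);   /* sequence becomes odd */
    atomic_store(&r->low, low);
    atomic_store(&r->high, high);
    atomic_fetch_add(&r->seq, 1);   /* sequence becomes even again */
    return true;
}

/* Reader side: retry until both values come from one complete update. */
void port_range_get(struct port_range *r, int *low, int *high)
{
    unsigned int start;

    do {
        start = atomic_load(&r->seq);
        *low = atomic_load(&r->low);
        *high = atomic_load(&r->high);
    } while ((start & 1u) || atomic_load(&r->seq) != start);
}
```

Readers never block the writer; they simply retry if the sequence counter was odd or changed while they were reading, so they cannot observe a low value from one setting paired with a high value from another.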

Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller

Stephen Hemminger
2007-10-11 08:30:46 +0800
063930090 [SCTP]: port randomization

Add port randomization rather than a simple fixed rover
for use with SCTP. This makes it act similar to TCP, UDP, DCCP
when allocating ports.

No longer need port_alloc_lock as well (suggestion by Brian Haley).

Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller

Stephen Hemminger
2007-10-11 08:30:18 +0800
3c0cfc135 [NET_SCHED]: Show timer resolution instead of clock resolution in /proc/net/psched

The fourth parameter of /proc/net/psched is supposed to show the timer
resolution and is used by HTB userspace to calculate the necessary
burst rate. Currently we show the clock resolution, which results in a
too low burst rate when the two differ.

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller

Patrick McHardy
2007-10-11 07:55:59 +0800
631a6698d [IPSEC]: Move IP protocol setting from transforms into xfrm4_input.c

This patch makes the IPv4 x->type->input functions return the next protocol
instead of setting it directly. This is identical to how we do things in
IPv6 and will help us merge common code on the input path.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2007-10-11 07:55:56 +0800
ceb1eec82 [IPSEC]: Move IP length/checksum setting out of transforms

This patch moves the setting of the IP length and checksum fields out of
the transforms and into the xfrmX_output functions. This would help future
efforts in merging the transforms themselves.

It also adds an optimisation to ipcomp due to the fact that the transport
offset is guaranteed to be zero.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2007-10-11 07:55:56 +0800
87bdc48d3 [IPSEC]: Get rid of ipv6_{auth,esp,comp}_hdr

This patch removes the duplicate ipv6_{auth,esp,comp}_hdr structures since
they're identical to the IPv4 versions. Duplicating them would only create
problems for ourselves later when we need to add things like extended
sequence numbers.

I've also added transport header type conversion headers for these types
which are now used by the transforms.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2007-10-11 07:55:55 +0800
37fedd3aa [IPSEC]: Use IPv6 calling convention as the convention for x->mode->output

The IPv6 calling convention for x->mode->output is more general and could
help an eventual protocol-generic x->type->output implementation. This
patch adopts it for IPv4 as well and modifies the IPv4 type output functions
accordingly.

It also rewrites the IPv6 mac/transport header calculation to be based off
the network header where practical.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2007-10-11 07:55:54 +0800
7b277b1a5 [IPSEC]: Set skb->data to payload in x->mode->output

This patch changes the calling convention so that on entry from
x->mode->output and before entry into x->type->output skb->data
will point to the payload instead of the IP header.

This is essentially a redistribution of skb_push/skb_pull calls
with the aim of minimising them on the common path of tunnel +
ESP.

It'll also let us use the same calling convention between IPv4
and IPv6 with the next patch.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2007-10-11 07:55:54 +0800
bee0b40c0 [IPSEC] beet: Fix extension header support on output

The beet output function completely kills any extension headers by replacing
them with the IPv6 header. This is because it essentially ignores the
result of ip6_find_1stfragopt by simply acting as if there aren't any
extension headers.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2007-10-11 07:55:53 +0800
8bd170750 [IPSEC] esp: Remove NAT-T checksum invalidation for BEET

I pointed this out back when this patch was first proposed but it looks like
it got lost along the way.

The checksum only needs to be ignored for NAT-T in transport mode where
we lose the original inner addresses due to NAT. With BEET the inner
addresses will be intact so the checksum remains valid.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2007-10-11 07:55:53 +0800
f24e3d658 [IPV6]: Defer IPv6 device initialization until a valid qdisc is specified

To judge the timing for DAD, netif_carrier_ok() is used. However,
there is a possibility that dev->qdisc stays noop_qdisc even if
netif_carrier_ok() returns true. In that case, DAD NS is not sent out.
We need to defer the IPv6 device initialization until a valid qdisc
is specified.

Signed-off-by: Mitsuru Chinen
Signed-off-by: YOSHIFUJI Hideaki
Signed-off-by: David S. Miller

Mitsuru Chinen
2007-10-11 07:55:52 +0800
9b7726523 [NET]: Remove double dev->flags checking when calling dev_close()

The unregister_netdevice() and dev_change_net_namespace()
both check for dev->flags to be IFF_UP before calling the
dev_close(), but the dev_close() checks for IFF_UP itself,
so remove those unneeded checks.
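The idea is that a close routine which checks the "up" flag itself can be called unconditionally. A minimal sketch with invented names (DEMO_IFF_UP, struct demo_dev, demo_dev_close, demo_unregister), none of which are the kernel's definitions:

```c
#include <stdbool.h>

#define DEMO_IFF_UP 0x1u    /* hypothetical flag bit for this sketch */

struct demo_dev {
    unsigned int flags;
    int close_calls;        /* counts how often real close work ran */
};

/* Closes the device only if it is up; safe to call unconditionally. */
bool demo_dev_close(struct demo_dev *dev)
{
    if (!(dev->flags & DEMO_IFF_UP))
        return false;       /* already down: nothing to do */

    dev->flags &= ~DEMO_IFF_UP;
    dev->close_calls++;
    return true;
}

/* The caller needs no flag check of its own. */
void demo_unregister(struct demo_dev *dev)
{
    demo_dev_close(dev);
}
```

Calling demo_unregister() twice on the same device performs the close work only once, which is why a duplicate check in the caller adds nothing.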

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Pavel Emelyanov
2007-10-11 07:55:52 +0800
1c1e87edb [TCP]: Separate lost_retrans loop into own function

Follows the own-function-for-each-task principle; this is really
a somewhat separate task being done in sacktag. Also reduces
indentation.

In addition, added ack_seq local var to break some long
lines & fixed coding style things.

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Ilpo Järvinen
2007-10-11 07:55:51 +0800
ec9310351 [SUNRPC]: Make the sunrpc use the seq_open_private()

Just switch to the consolidated code.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Pavel Emelyanov
2007-10-11 07:55:36 +0800
a662d4cb5 [IRDA]: Make the IRDA use the seq_open_private()

Just switch to the consolidated code.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Pavel Emelyanov
2007-10-11 07:55:35 +0800
31164088d [DECNET]: Make decnet code use the seq_open_private()

Just switch to the consolidated code.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Pavel Emelyanov
2007-10-11 07:55:34 +0800
e2da59133 [NETFILTER]: Make netfilter code use the seq_open_private

Just switch to the consolidated calls.

ipt_recent() has to initialize the private, so use
the __seq_open_private() helper.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Pavel Emelyanov
2007-10-11 07:55:34 +0800
cf7732e4c [NET]: Make core networking code use seq_open_private

This concerns the ipv4 and ipv6 code mostly, but also the netlink
and unix sockets.

The netlink code is an example of how to use the __seq_open_private()
call - it saves the net namespace on this private.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Pavel Emelyanov
2007-10-11 07:55:33 +0800
e2f036da2 [PATCH] mac80211: Defer setting of RX_FLAG_DECRYPTED.

The decryption handlers will skip the frame if the RX_FLAG_DECRYPTED
flag is set, so the early flag setting introduced by Johannes breaks
decryption. To work around this, call the handlers first and then set
the flag.

Signed-off-by: Mattias Nissler
Signed-off-by: John W. Linville

Mattias Nissler
2007-10-11 07:55:23 +0800
0654ff055 [PATCH] ieee80211_if_set_type: make check for master dev more explicit

Problem description by Daniel Drake:

"This sequence of events causes loss of connectivity:

ifconfig eth7 down
iwconfig eth7 mode monitor
ifconfig eth7 up
ifconfig eth7 down
iwconfig eth7 mode managed

At this point you are associated but TX does not work. This is because
the eth7 hard_start_xmit is still ieee80211_monitor_start_xmit."

The problem is caused by ieee80211_if_set_type checking for a non-zero
hard_start_xmit pointer value in order to avoid changing that value for
master devices. The fix is to make that check more explicitly linked to
master devices rather than simply checking if the value has been
previously set.
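The difference between the two checks can be modelled in a few lines. This is a hypothetical sketch: struct demo_if, its is_master field and the *_xmit functions are invented for illustration and do not mirror the mac80211 data structures.

```c
#include <stdbool.h>
#include <stddef.h>

typedef int (*xmit_fn)(void);

/* Placeholder transmit handlers; the return values only tell them apart. */
static int monitor_xmit(void) { return 1; }
static int subif_xmit(void)   { return 2; }
static int master_xmit(void)  { return 3; }

enum demo_if_type { DEMO_IF_MANAGED, DEMO_IF_MONITOR };

struct demo_if {
    bool    is_master;  /* explicit marker instead of "xmit != NULL" */
    xmit_fn xmit;
};

/*
 * Pick the transmit handler for the new interface type. Only master
 * devices are skipped; a handler set by an earlier type change is
 * always replaced, so leaving monitor mode restores normal TX.
 */
void demo_set_type(struct demo_if *ifp, enum demo_if_type type)
{
    if (ifp->is_master)
        return;

    ifp->xmit = (type == DEMO_IF_MONITOR) ? monitor_xmit : subif_xmit;
}
```

With a "handler already set" test in place of the is_master test, the second call in a monitor-then-managed sequence would leave monitor_xmit in place, which matches the loss of TX described in the report.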

CC: Daniel Drake
Acked-by: Michael Wu
Signed-off-by: John W. Linville

John W. Linville
2007-10-11 07:55:23 +0800
b7c6538cd [IPSEC]: Move state lock into x->type->output

This patch releases the lock on the state before calling x->type->output.
It also adds the lock to the spots where they're currently needed.

Most of those places (all except mip6) are expected to disappear with
async crypto.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2007-10-11 07:55:03 +0800
050f009e1 [IPSEC]: Lock state when copying non-atomic fields to user-space

This patch adds locking so that when we're copying non-atomic fields such as
life-time or coaddr to user-space we don't get a partial result.

For af_key I've changed every instance of pfkey_xfrm_state2msg apart from
expiration notification to include the keys and life-times. This is in-line
with XFRM behaviour.

The actual cases affected are:

* pfkey_getspi: No change as we don't have any keys to copy.
* key_notify_sa:
	+ ADD/UPD: This wouldn't work otherwise.
	+ DEL: It can't hurt.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2007-10-11 07:55:02 +0800