01 Nov, 2007
16 commits
-
Signed-off-by: David S. Miller
-
Documentation updates for network interfaces.
1. Add doc for netif_napi_add
2. Remove doc for unused returns from netif_rx
3. Add doc for netif_receive_skb
[ Incorporated minor mods from Randy Dunlap -DaveM ]
Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller
-
This cache is only required to create new namespaces,
but we won't have them in the CONFIG_NET_NS=n case.
Hide it under the appropriate ifdef.
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
-
The setup_net is called for the init net namespace
only (in the CONFIG_NET_NS=n case, of course) from the __init
function, so mark it as __net_init to disappear with the
caller after the boot.
Yet again, in a perfect world this would be under
#ifdef CONFIG_NET_NS, but it isn't guaranteed that every
subsystem is registered *after* the init_net_ns is set
up. Once we are sure that we don't start registering
them before the init net setup, we'll be able to move
this code under the ifdef.
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
-
The namespace creation/destruction code is never called
if CONFIG_NET_NS is n, so it's OK to move it under the
appropriate ifdef.
The copy_net_ns() in the "n" case checks the flags and
returns -EINVAL when a new net ns is requested. In a perfect
world this stub would be in net_namespace.h, but this
function needs to know the CLONE_NEWNET value and thus
requires sched.h. On the other hand, this header is to be
injected into almost every .c file in the networking code,
and making all this code depend on sched.h is a
suicidal attempt.
Signed-off-by: Pavel Emelyanov
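The stub described above can be sketched in user space roughly as follows. This is a minimal illustration, not the kernel code: struct demo_net, demo_copy_net_ns() and the err out-parameter are hypothetical, and the kernel would typically encode the error in the returned pointer instead.

```c
#include <errno.h>
#include <stddef.h>

/* Mirrors the value of CLONE_NEWNET from the kernel's sched.h. */
#define DEMO_CLONE_NEWNET 0x40000000UL

/* Hypothetical stand-in for struct net. */
struct demo_net { int id; };

/*
 * Stub for the CONFIG_NET_NS=n case: refuse a request for a new
 * namespace, otherwise keep using the existing one.
 */
static struct demo_net *demo_copy_net_ns(unsigned long flags,
					 struct demo_net *old_net, int *err)
{
	if (flags & DEMO_CLONE_NEWNET) {
		*err = -EINVAL;
		return NULL;
	}
	*err = 0;
	return old_net;
}
```

The dependency on the flag value is what ties the stub to sched.h in the description above.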
Signed-off-by: David S. Miller
-
When a new pernet something (subsys, device or operations) is
being registered, the init callback is to be called for each
namespace that currently exists in the system. During
unregister, the same is to be done with the exit callback.
However, not every pernet something has both calls, yet the
check that the appropriate pointer is not NULL is performed
inside the for_each_net() loop.
This is (at least) strange, so tune this.
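A minimal user-space sketch of the idea, assuming simplified stand-ins for the kernel types: struct demo_net, struct demo_pernet_ops and the plain array of namespaces are hypothetical (the real code walks the namespaces with for_each_net()), and error unwinding is omitted. The pointer is tested once before the walk rather than on every iteration.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; illustration only. */
struct demo_net { int id; int inited; };

struct demo_pernet_ops {
	int  (*init)(struct demo_net *net);   /* may be NULL */
	void (*exit)(struct demo_net *net);   /* may be NULL */
};

/* Call ops->init for every namespace; test the pointer once, not per net. */
static int demo_register_pernet(struct demo_pernet_ops *ops,
				struct demo_net *nets, size_t n)
{
	size_t i;
	int err;

	if (!ops->init)
		return 0;               /* nothing to call, skip the walk */

	for (i = 0; i < n; i++) {
		err = ops->init(&nets[i]);
		if (err)
			return err;     /* unwinding left out of this sketch */
	}
	return 0;
}

/* Same shape for the exit callback on unregister. */
static void demo_unregister_pernet(struct demo_pernet_ops *ops,
				   struct demo_net *nets, size_t n)
{
	size_t i;

	if (!ops->exit)
		return;

	for (i = 0; i < n; i++)
		ops->exit(&nets[i]);
}

/* Example callbacks used to exercise the sketch. */
static int demo_init(struct demo_net *net) { net->inited = 1; return 0; }
static void demo_exit(struct demo_net *net) { net->inited = 0; }
```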
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
-
Finally, the zero_it argument can be completely removed from
the callers and from the function prototype.
Besides, fix the checkpatch.pl warnings about using
assignments inside if-s.
This patch is rather big, and it is a part of the previous one.
I split it to make the patches more readable. Hope
this particular split helped.
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
-
At this point nobody calls sk_alloc() with zero_it == 0,
so remove the unneeded checks from it.
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
-
The sk_prot_alloc() already performs all the stuff needed by the
sk_clone(). Besides, the sk_prot_alloc() requires almost half as
many arguments as the sk_alloc() does, so call the sk_prot_alloc(),
saving the stack a bit.
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
-
The security_sk_alloc() and the module_get are a part of the
object allocation - move them to the proper place.
Note that since we do not reset the newly allocated sock
in the sk_alloc() (memset() is removed with the previous
patch) we can safely do this.
Also fix the error path in sk_prot_alloc() - release the security
context if needed.
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
-
We have a __GFP_ZERO flag that allocates a zeroed chunk of memory.
Use it in the sk_alloc() and avoid a hand-made memset().
This is a temporary patch that will help us in the near future :)
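As a rough user-space analogy (not the kernel code), the change has the same shape as replacing malloc() plus memset() with calloc(). Here calloc() plays the role of passing __GFP_ZERO to the allocator, and struct demo_sock is a hypothetical stand-in for struct sock.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical object standing in for struct sock. */
struct demo_sock { int family; void *prot; char pad[32]; };

/* Before: allocate, then clear by hand. */
static struct demo_sock *demo_alloc_then_memset(void)
{
	struct demo_sock *sk = malloc(sizeof(*sk));

	if (sk)
		memset(sk, 0, sizeof(*sk));
	return sk;
}

/* After: ask the allocator for zeroed memory in a single call. */
static struct demo_sock *demo_alloc_zeroed(void)
{
	return calloc(1, sizeof(struct demo_sock));
}
```

Both helpers return an all-zero object; the second simply leaves the clearing to the allocator.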
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
-
The sock object is allocated either from the generic cache with
the kmalloc, or from the prot->slab cache.
Move this logic into an isolated set of helpers and make the
sk_alloc/sk_free look a bit nicer.
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
-
The sock_copy() is supposed to just clone the socket. In a perfect
world it would be just a memcpy, but we have to handle the security
mark correctly. All the extra setup must be performed in the sk_clone()
call, so move the get_net() into a more proper place.
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
-
The sock_copy() call is not used outside the sock.c file,
so just move it into sock.c
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
-
Similar to commit 3eec0047d9bdd, the point of this is to avoid
skipping R-bit skbs.
Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller
-
DSACKs inside another SACK block were missed if the start_seq of the DSACK
was larger than the SACK block's, because sorting prioritizes full
processing of the SACK block before the DSACK. After SACK block
sorting the situation is like this:
    SSSSSSSSS
         D
              SSSSSS
                     SSSSSSS
Because write_queue is walked in-order, when the first SACK block
has been processed, TCP is already past the skb for which the
DSACK arrived and we haven't taught it to backtrack (nor should
we), so TCP just continues processing by going to the next SACK
block after the DSACK (if any).
Whenever such a DSACK is present, do an embedded check during
the previous SACK block.
If the DSACK is below snd_una, there won't be an overlapping SACK
block, and thus no problem in that case. Also, if the start_seq of
the DSACK is equal to the actual block's, it will be processed
first.
Tested this by using netem to duplicate 15% of packets, and
by printing the SACK blocks when found_dup_sack is true and the
selected skb in the dup_sack = 1 branch (if taken):
SACK block 0: 4344-5792 (relative to snd_una 2019137317)
SACK block 1: 4344-5792 (relative to snd_una 2019137317)
equal start seqnos => next_dup = 0, dup_sack = 1 won't occur...
SACK block 0: 5792-7240 (relative to snd_una 2019214061)
SACK block 1: 2896-7240 (relative to snd_una 2019214061)
DSACK skb match 5792-7240 (relative to snd_una)
...and next_dup = 1 case (after the not shown start_seq sort),
went to dup_sack = 1 branch.
Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller
31 Oct, 2007
7 commits
-
On PowerPC allmodconfig build we get this:
net/key/af_key.c:400: warning: comparison is always false due to limited range of data type
Signed-off-by: Stephen Rothwell
Signed-off-by: David S. Miller
-
This fixes scatterlist corruptions added by
commit 68e3f5dd4db62619fdbe520d36c9ebf62e672256
[CRYPTO] users: Fix up scatterlist conversion errors
The issue is that the code calls sg_mark_end() which clobbers the
sg_page() pointer of the final scatterlist entry.
The first part of the fix makes skb_to_sgvec() do __sg_mark_end().
After considering all skb_to_sgvec() call sites the most correct
solution is to call __sg_mark_end() in skb_to_sgvec() since that is
what all of the callers would end up doing anyways.
I suspect this might have fixed some problems in virtio_net which is
the sole non-crypto user of skb_to_sgvec().
Other similar sg_mark_end() cases were converted over to
__sg_mark_end() as well.
Arguably sg_mark_end() is a poorly named function because it doesn't
just "mark", it clears out the page pointer as a side effect, which is
what led to these bugs in the first place.
The one remaining plain sg_mark_end() call is in scsi_alloc_sgtable()
and arguably it could be converted to __sg_mark_end() if only so that
we can delete this confusing interface from linux/scatterlist.h
Signed-off-by: David S. Miller
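The difference between the two helpers can be modelled in user space as below. This is a simplified sketch, not the header's code: it assumes the end marker lives in the low bits of the same word as the page pointer, and struct demo_sg together with all demo_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified model: low bits of page_link are flags, the rest is the pointer. */
struct demo_sg { uintptr_t page_link; };

#define DEMO_SG_END 0x02u

/* Clobbering variant: overwrites the whole word, losing the page pointer. */
static void demo_mark_end_clobber(struct demo_sg *sgl, unsigned int nents)
{
	sgl[nents - 1].page_link = DEMO_SG_END;
}

/* Preserving variant: only sets the end bit on the given entry. */
static void demo_mark_end_keep(struct demo_sg *sg)
{
	sg->page_link |= DEMO_SG_END;
}

/* Recover the page pointer by masking off the flag bits. */
static void *demo_sg_page(const struct demo_sg *sg)
{
	return (void *)(sg->page_link & ~(uintptr_t)0x03);
}

/* Report whether the entry carries the end marker. */
static int demo_sg_is_last(const struct demo_sg *sg)
{
	return (sg->page_link & DEMO_SG_END) != 0;
}
```

With the preserving variant the last entry is both marked and still points at its page; with the clobbering variant the marker is set but the page lookup yields NULL.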
-
It's under CONFIG_IP_VS_LBLCR_DEBUG option which never existed.
Signed-off-by: Alexey Dobriyan
Signed-off-by: David S. Miller
-
The file /proc/net/if_inet6 is removed twice.
First time in:
inet6_exit
->addrconf_cleanup
And followed a few lines after by:
inet6_exit
-> if6_proc_exit
Signed-off-by: Daniel Lezcano
Signed-off-by: David S. Miller
-
When a network namespace reference is held by a network subsystem,
and when this reference is decremented in a rcu update callback, we
must ensure that there is no more outstanding rcu update before
trying to free the network namespace.
In the normal case, the rcu_barrier is called when the network namespace
is exiting in the cleanup_net function.
But when a network namespace creation fails, and the subsystems are
undone (like the cleanup), the rcu_barrier is missing.
This patch adds the missing rcu_barrier.
Signed-off-by: Daniel Lezcano
Signed-off-by: David S. Miller
-
Point 1:
The unregistering of a network device schedules a netdev_run_todo.
This function calls dev->destructor when it is set and the
destructor calls free_netdev.
Point 2:
In the case of an initialization of a network device the usual code
is:
* alloc_netdev
* register_netdev
-> if this one fails, call free_netdev and exit with error.
Point 3:
In the register_netdevice function at a later stage, when the device
is in the registered state, a call to the netdevice_notifiers is made.
If one of the notifications results in an error, a rollback of the
registered state is done using unregister_netdevice.
Conclusion:
When a network device fails to register during initialization because
one network subsystem returned an error during a notification call
chain, the network device is freed twice because of points 1 and 2.
The second free_netdev will be done with an invalid pointer.
Proposed solution:
The following patch moves all the code of unregister_netdevice *except*
the call to net_set_todo to a new function, "rollback_registered".
The following functions are changed in this way:
* register_netdevice: calls rollback_registered when a notification fails
* unregister_netdevice: calls rollback_registered + net_set_todo; the call
order of net_set_todo is changed because it is the
last step now. Since it just adds an element to a list,
that should not break anything.
Signed-off-by: Daniel Lezcano
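The split described above can be modelled in user space as follows. This is only a sketch under simplifying assumptions: struct demo_dev and every demo_* function are hypothetical, and frees are counted rather than performed so that a double free would show up as free_count == 2.

```c
#include <stdbool.h>

/* Hypothetical model of a net device. */
struct demo_dev {
	bool registered;
	bool todo_queued;   /* models the effect of net_set_todo() */
	int  free_count;    /* number of times the device was "freed" */
};

static void demo_free_netdev(struct demo_dev *dev) { dev->free_count++; }

/* Undo registration only; never queues the deferred destructor. */
static void demo_rollback_registered(struct demo_dev *dev)
{
	dev->registered = false;
}

/* Full unregister: roll back first, queue the todo work last. */
static void demo_unregister(struct demo_dev *dev)
{
	demo_rollback_registered(dev);
	dev->todo_queued = true;
}

/* Deferred work: runs the destructor, which frees the device. */
static void demo_run_todo(struct demo_dev *dev)
{
	if (dev->todo_queued) {
		dev->todo_queued = false;
		demo_free_netdev(dev);
	}
}

/* On notifier failure only roll back; the caller does the single free. */
static int demo_register(struct demo_dev *dev, int notifier_err)
{
	dev->registered = true;
	if (notifier_err) {
		demo_rollback_registered(dev);
		return notifier_err;
	}
	return 0;
}
```

Because the failure path no longer queues the deferred work, the caller's free_netdev after a failed register is the only free of the device.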
Signed-off-by: David S. Miller
-
Fix links to files in Documentation/* in various Kconfig files
Signed-off-by: Dirk Hohndel
Signed-off-by: Linus Torvalds
30 Oct, 2007
11 commits
-
Commit baa3a2a0d24ebcf1c451bec8e5bee3d3467f4cbb, by removing initialization
of the ctl_name field, broke this conditional, preventing the display of
rpc_tasks that you previously got when turning on rpc debugging.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: J. Bruce Fields
Acked-by: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: David S. Miller
-
On systems with a very large amount of memory, the heuristics in
alloc_large_system_hash() result in a very large TCP established hash
table: 16 million entries for a 128 GB ia64 system. This makes
reading from /proc/net/tcp pretty slow (well over a second) and as a
result netstat is slow on these machines. I know that /proc/net/tcp is
deprecated in favor of tcp_diag, however at the moment netstat only
knows of the former.
I am skeptical that such a large TCP established hash is often needed.
Just because a system has a lot of memory doesn't imply that it will
have several million concurrent TCP connections. Thus I believe
that we should put an arbitrary high limit on the size of the TCP
established hash by default. Users who really need a bigger hash can
always use the thash_entries boot parameter to get more.
I propose 2 million entries as the arbitrary high limit. This
makes /proc/net/tcp reasonably fast on the system in question (0.2 s)
while still being large enough for me to be confident that network
performance won't suffer.
This is just one way to limit the hash size, there are others; I am not
familiar enough with the TCP code to decide which is best. Thus, I
would welcome proposals of alternatives.
[ 2 million is still too large, thus I've modified the limit in the
change to be '512 * 1024'. -DaveM ]
Signed-off-by: Jean Delvare
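The sizing policy described above might be sketched like this. The helper demo_hash_entries() and its parameters are hypothetical and do not reflect the signature of alloc_large_system_hash(); treating a limit of 0 as "no cap" is likewise an assumption of this sketch.

```c
/*
 * Pick the number of hash entries: an explicit request (for example
 * from the thash_entries boot parameter) wins, otherwise the
 * memory-scaled default is capped at the given limit.
 */
static unsigned long demo_hash_entries(unsigned long requested,
				       unsigned long scaled_default,
				       unsigned long limit)
{
	if (requested)
		return requested;
	if (limit && scaled_default > limit)
		return limit;
	return scaled_default;
}
```

With a scaled default of 16 million entries and a limit of 512 * 1024, the helper returns 512 * 1024 unless an explicit request is given.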
Signed-off-by: David S. Miller
-
as some architectures have unsigned long for u64.
net/sunrpc/xprtrdma/rpc_rdma.c: In function 'rpcrdma_create_chunks':
net/sunrpc/xprtrdma/rpc_rdma.c:222: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64'
net/sunrpc/xprtrdma/rpc_rdma.c:234: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64'
net/sunrpc/xprtrdma/rpc_rdma.c: In function 'rpcrdma_count_chunks':
net/sunrpc/xprtrdma/rpc_rdma.c:577: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64'
Noticed on PowerPC pseries_defconfig build.
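The usual way to silence this class of warning is an explicit cast to unsigned long long at the call site. A small user-space sketch, with uint64_t standing in for u64 and demo_format_u64() being a hypothetical helper:

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit value as hex regardless of whether the platform's
 * 64-bit type is 'unsigned long' or 'unsigned long long'.
 */
static int demo_format_u64(char *buf, size_t len, uint64_t val)
{
	return snprintf(buf, len, "%llx", (unsigned long long)val);
}
```

The cast makes the argument type match the format string on every architecture, so the compiler has nothing to warn about.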
Signed-off-by: Stephen Rothwell
Signed-off-by: David S. Miller
-
While displaying ICMP outgoing statistics as Out counters in
/proc/net/snmp, the memory location for ICMP incoming statistics
was referenced by mistake.
Signed-off-by: Mitsuru Chinen
Acked-by: David L Stevens
Signed-off-by: David S. Miller
-
If either of the two sock_alloc_fd() calls fails, we
forget to update 'err' and thus we'll erroneously
return zero in these cases.
Based upon a report and patch from Rich Paul, and
commentary from Chuck Ebbert.
Signed-off-by: David S. Miller
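The bug pattern can be reproduced in a small user-space sketch. All names here are hypothetical: demo_alloc_fd() merely stands in for sock_alloc_fd(), and releasing the first descriptor on failure is left out for brevity.

```c
#include <errno.h>

/* Hypothetical stand-in: returns a new fd, or a negative errno on failure. */
static int demo_alloc_fd(int fail)
{
	static int next_fd = 3;

	return fail ? -EMFILE : next_fd++;
}

/* Buggy shape: a failed allocation jumps out without touching err. */
static int demo_socketpair_buggy(int fail_second, int fds[2])
{
	int err = 0;

	fds[0] = demo_alloc_fd(0);
	if (fds[0] < 0)
		goto out;
	fds[1] = demo_alloc_fd(fail_second);
	if (fds[1] < 0)
		goto out;
out:
	return err;     /* still 0 even when an allocation failed */
}

/* Fixed shape: copy the negative return value into err before bailing out. */
static int demo_socketpair_fixed(int fail_second, int fds[2])
{
	int err = 0;

	fds[0] = demo_alloc_fd(0);
	if (fds[0] < 0) {
		err = fds[0];
		goto out;
	}
	fds[1] = demo_alloc_fd(fail_second);
	if (fds[1] < 0) {
		err = fds[1];
		goto out;
	}
out:
	return err;
}
```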
-
This allocation is expected to fail and we handle it by falling back to vmalloc().
So don't scare people with nasty messages like
http://bugzilla.kernel.org/show_bug.cgi?id=9190
Signed-off-by: Andrew Morton
Signed-off-by: David S. Miller
-
netpoll_poll_lock() synchronizes the ->poll() invocation
code paths, but once we have the lock we have to make
sure that NAPI_STATE_SCHED is still set. Otherwise we
get:
cpu 0                     cpu 1
net_rx_action()           poll_napi()
netpoll_poll_lock()       ... spin on ->poll_lock
->poll()
  netif_rx_complete
netpoll_poll_unlock()     acquire ->poll_lock()
                          ->poll()
                            netif_rx_complete()
                          CRASH
Based upon a bug report from Tina Yang.
Signed-off-by: David S. Miller
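The re-check can be illustrated with a single-threaded model in which the second poller runs only after the first has completed, as in the interleaving above. Everything here is hypothetical: struct demo_napi and the demo_* functions are stand-ins, the lock is only implied by comments, and the crash is modelled as a counter.

```c
#include <stdbool.h>

/* Hypothetical model of the NAPI state involved in the race. */
struct demo_napi {
	bool sched;        /* stands in for NAPI_STATE_SCHED */
	int  polls;        /* ->poll() runs with the state set */
	int  bad_polls;    /* ->poll() runs without it (the crash case) */
};

/* ->poll(): processes work and completes, clearing the scheduled state. */
static void demo_poll(struct demo_napi *n)
{
	if (!n->sched) {
		n->bad_polls++;
		return;
	}
	n->polls++;
	n->sched = false;  /* models netif_rx_complete() */
}

/* Poller that re-checks the state once it holds the lock. */
static void demo_poll_checked(struct demo_napi *n)
{
	/* lock would be taken here */
	if (n->sched)
		demo_poll(n);
	/* lock would be released here */
}

/* Poller without the re-check, for contrast. */
static void demo_poll_unchecked(struct demo_napi *n)
{
	demo_poll(n);
}
```

Running the checked poller twice yields one poll and no bad polls; running the unchecked one twice yields one poll and one bad poll.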
-
While reviewing the tcp_md5-related code further I came across
another two of these casts which you probably have missed. I don't
actually think that they pose a problem for now, but as you said we
should remove them.
Signed-off-by: Matthias M. Dellweg
Signed-off-by: David S. Miller
-
TCP Vegas implementation has a bug in the process of disabling
slow-start with gamma parameter. The bug may lead to extreme
unfairness in the presence of early packet loss. See details in:
http://www.cs.caltech.edu/~weixl/technical/ns2linux/known_linux/index.html#vegas
Switch the order of "if (tp->snd_cwnd <= tp->snd_ssthresh)" statement
and "if (diff > gamma)" statement to eliminate the problem.
Signed-off-by: Xiaoliang (David) Wei
Signed-off-by: David S. Miller
-
Instead of using the default timeout of 3 minutes, this uses the timeout
specific to the protocol used for the connection. The 3 minute timeout
seems somewhat arbitrary (though I know it is used in other places in the
ipvs code) and when failing over it would be much nicer to use one of
the configured timeout values.
Signed-off-by: Andy Gospodarek
Acked-by: Simon Horman
Signed-off-by: David S. Miller
-
This bug was introduced by the commit
d12af679bcf8995a237560bdf7a4d734f8df5dbb (sysctl: fix neighbour table
sysctls).
Signed-off-by: YOSHIFUJI Hideaki
Signed-off-by: David S. Miller
29 Oct, 2007
2 commits
-
Signed-off-by: Al Viro
Acked-by: David S. Miller
Signed-off-by: Linus Torvalds
-
rpcrdma stuff lacks endianness annotations for on-the-wire data.
Signed-off-by: Al Viro
Acked-by: David S. Miller
Signed-off-by: Linus Torvalds
27 Oct, 2007
4 commits
-
This patch fixes the errors made in the users of the crypto layer during
the sg_init_table conversion. It also adds a few conversions that were
missing altogether.
Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller
-
The pid namespace patches changed the semantics of
find_task_by_pid without breaking the compile resulting
in get_net_ns_by_pid doing the wrong thing.
So switch to using the intended find_task_by_vpid.
Combined with Denis' earlier patch to make netlink traffic
fully synchronous the inadvertent race I introduced with
accessing current is actually removed.
Signed-off-by: Eric W. Biederman
Signed-off-by: David S. Miller
-
It is not safe to place struct pernet_operations in a special section.
We need struct pernet_operations to last until we call unregister_pernet_subsys,
which doesn't happen until module unload.
So marking struct pernet_operations is a disaster for modules in two ways.
- We discard it before we call the exit method it points to.
- Because I keep struct pernet_operations on a linked list, discarding
it for compiled-in code removes elements in the middle of a linked
list and does horrible things for linked insert.
So this looks safe assuming __exit_refok is not discarded
for modules.
Signed-off-by: Eric W. Biederman
Signed-off-by: David S. Miller
-
This patch fixes the following compile errors in some configurations:
...
CC net/ipv4/esp4.o
/home/bunk/linux/kernel-2.6/git/linux-2.6/net/ipv4/esp4.c: In function 'esp_output':
/home/bunk/linux/kernel-2.6/git/linux-2.6/net/ipv4/esp4.c:113: error: implicit declaration of function 'sg_init_table'
make[3]: *** [net/ipv4/esp4.o] Error 1
...
/home/bunk/linux/kernel-2.6/git/linux-2.6/net/ipv6/esp6.c: In function 'esp6_output':
/home/bunk/linux/kernel-2.6/git/linux-2.6/net/ipv6/esp6.c:112: error: implicit declaration of function 'sg_init_table'
make[3]: *** [net/ipv6/esp6.o] Error 1
Signed-off-by: Adrian Bunk
Signed-off-by: David S. Miller