14 Dec, 2010
1 commit
-
ip_default_ttl should be between 1 and 255
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
13 Dec, 2010
1 commit
-
Always go through a new ip4_dst_hoplimit() helper, just like ipv6.
This allowed several simplifications:
1) The interim dst_metric_hoplimit() can go as it's no longer
userd.2) The sysctl_ip_default_ttl entry no longer needs to use
ipv4_doint_and_flush, since the sysctl is not cached in
routing cache metrics any longer.3) ipv4_doint_and_flush no longer needs to be exported and
therefore can be marked static.When ipv4_doint_and_flush_strategy was removed some time ago,
the external declaration in ip.h was mistakenly left around
so kill that off too.We have to move the sysctl_ip_default_ttl declaration into
ipv4's route cache definition header net/route.h, because
currently net/ip.h (where the declaration lives now) has
a back dependency on net/route.hSigned-off-by: David S. Miller
29 Nov, 2010
1 commit
-
tcp_win_from_space() does the following:
if (sysctl_tcp_adv_win_scale > (-sysctl_tcp_adv_win_scale);
else
return space - (space >> sysctl_tcp_adv_win_scale);"space" is int.
As per C99 6.5.7 (3) shifting int for 32 or more bits is
undefined behaviour.Indeed, if sysctl_tcp_adv_win_scale is exactly 32,
space >> 32 equals space and function returns 0.Which means we busyloop in tcp_fixup_rcvbuf().
Restrict net.ipv4.tcp_adv_win_scale to [-31, 31].
Fix https://bugzilla.kernel.org/show_bug.cgi?id=20312
Steps to reproduce:
echo 32 >/proc/sys/net/ipv4/tcp_adv_win_scale
wget www.kernel.org
[softlockup]Signed-off-by: Alexey Dobriyan
Signed-off-by: David S. Miller
11 Nov, 2010
1 commit
-
Robin Holt tried to boot a 16TB machine and found some limits were
reached : sysctl_tcp_mem[2], sysctl_udp_mem[2]We can switch infrastructure to use long "instead" of "int", now
atomic_long_t primitives are available for free.Signed-off-by: Eric Dumazet
Reported-by: Robin Holt
Reviewed-by: Robin Holt
Signed-off-by: Andrew Morton
Signed-off-by: David S. Miller
16 May, 2010
1 commit
-
(Dropped the infiniband part, because Tetsuo modified the related code,
I will send a separate patch for it once this is accepted.)This patch introduces /proc/sys/net/ipv4/ip_local_reserved_ports which
allows users to reserve ports for third-party applications.The reserved ports will not be used by automatic port assignments
(e.g. when calling connect() or bind() with port number 0). Explicit
port allocation behavior is unchanged.Signed-off-by: Octavian Purdila
Signed-off-by: WANG Cong
Cc: Neil Horman
Cc: Eric Dumazet
Cc: Eric W. Biederman
Signed-off-by: David S. Miller
30 Mar, 2010
1 commit
-
…it slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
19 Feb, 2010
2 commits
-
This patch enables fast retransmissions after one dupACK for
TCP if the stream is identified as thin. This will reduce
latencies for thin streams that are not able to trigger fast
retransmissions due to high packet interarrival time. This
mechanism is only active if enabled by iocontrol or syscontrol
and the stream is identified as thin.Signed-off-by: Andreas Petlund
Signed-off-by: David S. Miller -
This patch will make TCP use only linear timeouts if the
stream is thin. This will help to avoid the very high latencies
that thin stream suffer because of exponential backoff. This
mechanism is only active if enabled by iocontrol or syscontrol
and the stream is identified as thin. A maximum of 6 linear
timeouts is tried before exponential backoff is resumed.Signed-off-by: Andreas Petlund
Signed-off-by: David S. Miller
08 Dec, 2009
1 commit
-
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits)
mac80211: fix reorder buffer release
iwmc3200wifi: Enable wimax core through module parameter
iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter
iwmc3200wifi: Coex table command does not expect a response
iwmc3200wifi: Update wiwi priority table
iwlwifi: driver version track kernel version
iwlwifi: indicate uCode type when fail dump error/event log
iwl3945: remove duplicated event logging code
b43: fix two warnings
ipw2100: fix rebooting hang with driver loaded
cfg80211: indent regulatory messages with spaces
iwmc3200wifi: fix NULL pointer dereference in pmkid update
mac80211: Fix TX status reporting for injected data frames
ath9k: enable 2GHz band only if the device supports it
airo: Fix integer overflow warning
rt2x00: Fix padding bug on L2PAD devices.
WE: Fix set events not propagated
b43legacy: avoid PPC fault during resume
b43: avoid PPC fault during resume
tcp: fix a timewait refcnt race
...Fix up conflicts due to sysctl cleanups (dead sysctl_check code and
CTL_UNNUMBERED removed) in
kernel/sysctl_check.c
net/ipv4/sysctl_net_ipv4.c
net/ipv6/addrconf.c
net/sctp/sysctl.c
03 Dec, 2009
1 commit
-
Define sysctl (tcp_cookie_size) to turn on and off the cookie option
default globally, instead of a compiled configuration option.Define per socket option (TCP_COOKIE_TRANSACTIONS) for setting constant
data values, retrieving variable cookie values, and other facilities.Move inline tcp_clear_options() unchanged from net/tcp.h to linux/tcp.h,
near its corresponding struct tcp_options_received (prior to changes).This is a straightforward re-implementation of an earlier (year-old)
patch that no longer applies cleanly, with permission of the original
author (Adam Langley):http://thread.gmane.org/gmane.linux.network/102586
These functions will also be used in subsequent patches that implement
additional features.Requires:
net: TCP_MSS_DEFAULT, TCP_MSS_DESIREDSigned-off-by: William.Allen.Simpson@gmail.com
Signed-off-by: David S. Miller
26 Nov, 2009
1 commit
-
Generated with the following semantic patch
@@
struct net *n1;
struct net *n2;
@@
- n1 == n2
+ net_eq(n1, n2)@@
struct net *n1;
struct net *n2;
@@
- n1 != n2
+ !net_eq(n1, n2)applied over {include,net,drivers/net}.
Signed-off-by: Octavian Purdila
Signed-off-by: David S. Miller
12 Nov, 2009
1 commit
-
Now that sys_sysctl is a compatiblity wrapper around /proc/sys
all sysctl strategy routines, and all ctl_name and strategy
entries in the sysctl tables are unused, and can be
revmoed.In addition neigh_sysctl_register has been modified to no longer
take a strategy argument and it's callers have been modified not
to pass one.Cc: "David Miller"
Cc: Hideaki YOSHIFUJI
Cc: netdev@vger.kernel.org
Signed-off-by: Eric W. Biederman
24 Sep, 2009
1 commit
-
It's unused.
It isn't needed -- read or write flag is already passed and sysctl
shouldn't care about the rest.It _was_ used in two places at arch/frv for some reason.
Signed-off-by: Alexey Dobriyan
Cc: David Howells
Cc: "Eric W. Biederman"
Cc: Al Viro
Cc: Ralf Baechle
Cc: Martin Schwidefsky
Cc: Ingo Molnar
Cc: "David S. Miller"
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
04 Nov, 2008
1 commit
-
I want to compile out proc_* and sysctl_* handlers totally and
stub them to NULL depending on config options, however usage of &
will prevent this, since taking adress of NULL pointer will break
compilation.So, drop & in front of every ->proc_handler and every ->strategy
handler, it was never needed in fact.Signed-off-by: Alexey Dobriyan
Signed-off-by: David S. Miller
28 Oct, 2008
1 commit
-
This is a patch to provide on demand route cache rebuilding. Currently, our
route cache is rebulid periodically regardless of need. This introduced
unneeded periodic latency. This patch offers a better approach. Using code
provided by Eric Dumazet, we compute the standard deviation of the average hash
bucket chain length while running rt_check_expire. Should any given chain
length grow to larger that average plus 4 standard deviations, we trigger an
emergency hash table rebuild for that net namespace. This allows for the common
case in which chains are well behaved and do not grow unevenly to not incur any
latency at all, while those systems (which may be being maliciously attacked),
only rebuild when the attack is detected. This patch take 2 other factors into
account:
1) chains with multiple entries that differ by attributes that do not affect the
hash value are only counted once, so as not to unduly bias system to rebuilding
if features like QOS are heavily used
2) if rebuilding crosses a certain threshold (which is adjustable via the added
sysctl in this patch), route caching is disabled entirely for that net
namespace, since constant rebuilding is less efficient that no caching at allTested successfully by me.
Signed-off-by: Neil Horman
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
17 Oct, 2008
1 commit
-
name and nlen parameters passed to ->strategy hook are unused, remove
them. In general ->strategy hook should know what it's doing, and don't
do something tricky for which, say, pointer to original userspace array
may be needed (name).Signed-off-by: Alexey Dobriyan
Acked-by: David S. Miller [ networking bits ]
Cc: Ralf Baechle
Cc: David Howells
Cc: Matt Mackall
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 Oct, 2008
1 commit
-
I noticed sysctl_local_port_range[] and its associated seqlock
sysctl_local_port_range_lock were on separate cache lines.
Moreover, sysctl_local_port_range[] was close to unrelated
variables, highly modified, leading to cache misses.Moving these two variables in a structure can help data
locality and moving this structure to read_mostly section
helps sharing of this data among cpus.Cleanup of extern declarations (moved in include file where
they belong), and use of inet_get_local_port_range()
accessor instead of direct access to ports values.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
04 Aug, 2008
1 commit
-
Commit 76e6ebfb40a2455c18234dcb0f9df37533215461 ("netns: add namespace
parameter to rt_cache_flush") acceses the extra2 parameter of the
ip_default_ttl ctl_table, but it is never set to a meaningful
value. When e84f84f276473dcc673f360e8ff3203148bdf0e2 ("netns: place
rt_genid into struct net") is applied, we'll oops in
rt_cache_invalidate(). Set extra2 to init_net, to avoid that.Reported-by: Marcin Slusarz
Signed-off-by: Sven Wegener
Tested-by: Marcin Slusarz
Acked-by: Denis V. Lunev
Signed-off-by: David S. Miller
28 Jul, 2008
1 commit
-
Piss-poor sysctl registration API strikes again, film at 11...
What we really need is _pathname_ required to be present in already
registered table, so that kernel could warn about bad order. That's the
next target for sysctl stuff (and generally saner and more explicit
order of initialization of ipv[46] internals wouldn't hurt either).For the time being, here are full fixups required by ..._rotable()
stuff; we make per-net sysctl sets descendents of "ro" one and make sure
that sufficient skeleton is there before we start registering per-net
sysctls.Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds
27 Jul, 2008
1 commit
-
Massage ipv4 initialization - make sure that net.ipv4 appears as
non-per-net-namespace before it shows up in per-net-namespace sysctls.
That's the only change outside of sysctl.c needed to get sane ordering
rules and data structures for sysctls (esp. for procfs side of that
mess).Signed-off-by: Al Viro
02 Jul, 2008
1 commit
-
Convert the sysctl values for icmp ratelimit to use milliseconds instead
of jiffies which is based on kernel configured HZ.
Internal kernel jiffies are not a proper unit for any userspace API.Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller
12 Jun, 2008
1 commit
-
This patch removes CVS keywords that weren't updated for a long time
from comments.Signed-off-by: Adrian Bunk
Signed-off-by: David S. Miller
26 Mar, 2008
3 commits
-
Add some flesh to ipv4_sysctl_init_net and ipv4_sysctl_exit_net,
i.e. copy the table, alter .data pointers and register it per-net.Other ipv4_table's sysctls are now global, but this is going to
change once sysctl permissions patches migrate from -mm tree to
mainline in 2.6.26 merge window :)Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller -
Initialization is moved to icmp_sk_init, all the places, that
refer to them use init_net for now.Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller -
This includes adding pernet_operations, empty init and exit
hooks and a bit of changes in sysctl_ipv4_init just not to
have this part in next patches.Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
01 Feb, 2008
1 commit
-
In strategy_allowed_congestion_control of the 2.6.24 kernel, when
sysctl_string return 1 on success,it should call
tcp_set_allowed_congestion_control to set the allowed congestion
control.But, it don't. the sysctl_string return 1 on success,
otherwise return negative, never return 0.The patch fix the problem.Signed-off-by: Shan Wei
Acked-by: Stephen Hemminger
Signed-off-by: David S. Miller
29 Jan, 2008
6 commits
-
This is a preparation for sysctl netns-ization.
Move the ctl tables to the files, where the tuning
variables reside. Plus make the helpers to register
the tables.This will simplify the later patches and will keep
similar things closer to each other.ipv4, ipv6 and conntrack_reasm are patched differently,
but the result is all the tables are in appropriate files.Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller -
This includes the most simple cases for netfilter.
The first part is tne queue modules for ipv4 and ipv6,
on which the net/ipv4/ and net/ipv6/ paths are reused
from the appropriate ipv4 and ipv6 code.The conntrack module is also patched, but this hunk is
very small and simple.Signed-off-by: Pavel Emelyanov
Acked-by: Patrick McHardy
Signed-off-by: David S. Miller -
Signed-off-by: Takahiro Yasui
Signed-off-by: Hideo Aoki
Signed-off-by: David S. Miller -
AFAIS these two entries should do the same thing - change the
forwarding state on ipv4_devconf and on all the devices.I propose to merge the handlers together using ctl paths.
The inet_forward_change() is static after this and I move
it higher to be closer to other "propagation" helpers and
to avoid diff making patches based on { and } matching :)
i.e. - make them easier to read.Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller -
This is the same as I did for the net/core/ table in the
second patch in his series: use the paths and isolate the
whole table in the .c file.Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller -
This includes several cleanups:
* tune Makefile to compile out this file when SYSCTL=n. Now
it looks like net/core/sysctl_net_core.c one;
* move the ipv4_config to af_inet.c to exist all the time;
* remove additional sysctl_ip_nonlocal_bind declaration
(it is already declared in net/ip.h);
* remove no nonger needed ifdefs from this file.This is a preparation for using ctl paths for net/ipv4/
sysctl table.Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
20 Nov, 2007
1 commit
-
From: "Sam Jansen"
sysctl_tcp_congestion_control seems to have a bug that prevents it
from actually calling the tcp_set_default_congestion_control
function. This is not so apparent because it does not return an error
and generally the /proc interface is used to configure the default TCP
congestion control algorithm. This is present in 2.6.18 onwards and
probably earlier, though I have not inspected 2.6.15--2.6.17.sysctl_tcp_congestion_control calls sysctl_string and expects a successful
return code of 0. In such a case it actually sets the congestion control
algorithm with tcp_set_default_congestion_control. Otherwise, it returns the
value returned by sysctl_string. This was correct in 2.6.14, as sysctl_string
returned 0 on success. However, sysctl_string was updated to return 1 on
success around about 2.6.15 and sysctl_tcp_congestion_control was not updated.
Even though sysctl_tcp_congestion_control returns 1, do_sysctl_strategy
converts this return code to '0', so the caller never notices the error.Signed-off-by: David S. Miller
19 Oct, 2007
2 commits
-
There is a justifying patch for Stephen's patches. Stephen's patches
disallows using a port range of one single port and brakes the meaning
of the 'remaining' variable, in some places it has different meaning.
My patch gives back the sense of 'remaining' variable. It should mean
how many ports are remaining and nothing else. Also my patch allows
using a single port.I sure we must be able to use mentioned port range, this does not
restricted by documentation and does not brake current behavior.usefull links:
Patches posted by Stephen Hemminger
http://marc.info/?l=linux-netdev&m=119206106218187&w=2
http://marc.info/?l=linux-netdev&m=119206109918235&w=2Andrew Morton's comment
http://marc.info/?l=linux-kernel&m=119248225007737&w=21. Allows using a port range of one single port.
2. Gives back sense of 'remaining' variable.Signed-off-by: Anton Arapov
Acked-by: Stephen Hemminger
Signed-off-by: David S. Miller -
Currently tcp_available_congestion_control does not even attempt being read
from sys_sysctl, and ipfrag_max_dist while it works allows setting of invalid
values using sys_sysctl.So just kill the binary sys_sysctl support for these sysctls. If the support
is not important enough to test and get right it probably isn't important
enough to keep.Signed-off-by: Eric W. Biederman
Cc: Alexey Dobriyan
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
16 Oct, 2007
1 commit
-
Some sysctl variables are used to tune the frag queues
management and it will be useful to work with them in
a common way in the future, so move them into one
structure, moreover they are the same for all the frag
management codes.I don't place them in the existing inet_frags object,
introduced in the previous patch for two reasons:1. to keep them in the __read_mostly section;
2. not to export the whole inet_frags objects outside.Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
11 Oct, 2007
1 commit
-
Expansion of original idea from Denis V. Lunev
Add robustness and locking to the local_port_range sysctl.
1. Enforce that low < high when setting.
2. Use seqlock to ensure atomic update.The locking might seem like overkill, but there are
cases where sysadmin might want to change value in the
middle of a DoS attack.Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller
08 Jun, 2007
1 commit
-
This patch converts the ipv4_devconf config members (everything except
sysctl) to an array. This allows easier manipulation which will be
needed later on to provide better management of default config values.Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller
26 Apr, 2007
2 commits
-
New sysctl tcp_frto_response is added to select amongst these
responses:
- Rate halving based; reuses CA_CWR state (default)
- Very conservative; used to be the only one available (=1)
- Undo cwr; undoes ssthresh and cwnd reductions (=2)The response with rate halving requires a new parameter to
tcp_enter_cwr because FRTO has already reduced ssthresh and
doing a second reduction there has to be prevented. In addition,
to keep things nice on 80 cols screen, a local variable was
added.Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller -
Signed-off-by: John Heffner
Signed-off-by: David S. Miller