Eric Lee / smarc-fsl-linux-kernel

15 Dec, 2011

1 commit

e6560d4df net: ping: remove some sparse errors

net/ipv4/sysctl_net_ipv4.c:78:6: warning: symbol 'inet_get_ping_group_range_table'
was not declared. Should it be static?

net/ipv4/sysctl_net_ipv4.c:119:31: warning: incorrect type in argument 2
(different signedness)
net/ipv4/sysctl_net_ipv4.c:119:31: expected int *range
net/ipv4/sysctl_net_ipv4.c:119:31: got unsigned int *

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-12-15 02:34:55 +0800

13 Dec, 2011

2 commits

3aaabe234 tcp buffer limitation: per-cgroup limit

This patch uses the "tcp.limit_in_bytes" field of the kmem_cgroup to
effectively control the amount of kernel memory pinned by a cgroup.

This value is ignored in the root cgroup, and in all others,
caps the value specified by the admin in the net namespaces'
view of tcp_sysctl_mem.

If namespaces are being used, the admin is allowed to set a
value bigger than cgroup's maximum, the same way it is allowed
to set pretty much unlimited values in a real box.
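A minimal sketch of the capping rule described above, assuming 4 KiB pages and a cgroup limit expressed in bytes while tcp_mem is counted in pages. `effective_tcp_mem_pages()` and `PAGE_SHIFT_ASSUMED` are hypothetical names for illustration, not the kernel's code.

```c
#include <stdbool.h>

#define PAGE_SHIFT_ASSUMED 12 /* assumption: 4 KiB pages */

/*
 * Hypothetical helper: combine the namespace's tcp_mem value (pages,
 * non-negative) with the cgroup's tcp.limit_in_bytes. The root cgroup's
 * limit is ignored; any other cgroup caps the namespace value.
 */
long effective_tcp_mem_pages(long ns_tcp_mem_pages,
                             unsigned long long cg_limit_bytes,
                             bool is_root_cgroup)
{
    unsigned long long cg_pages;

    if (is_root_cgroup)
        return ns_tcp_mem_pages;          /* root: namespace value as-is */

    cg_pages = cg_limit_bytes >> PAGE_SHIFT_ASSUMED;
    if ((unsigned long long)ns_tcp_mem_pages > cg_pages)
        return (long)cg_pages;            /* cgroup is tighter: cap it */
    return ns_tcp_mem_pages;              /* admin's value already fits */
}
```

An admin inside the namespace can still write a large value; the sketch only shows that, outside the root cgroup, the smaller of the two wins.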

Signed-off-by: Glauber Costa
Reviewed-by: Hiroyuki Kamezawa
CC: David S. Miller
CC: Eric W. Biederman
Signed-off-by: David S. Miller

Glauber Costa
2011-12-13 08:04:11 +0800
3dc43e3e4 per-netns ipv4 sysctl_tcp_mem

This patch allows each namespace to independently set up
its levels for tcp memory pressure thresholds. This patch
alone does not buy much: we need to make these values
per group of processes somehow. This is achieved in the
patches that follow in this patchset.
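To illustrate what per-namespace thresholds mean, here is a simplified sketch in which each namespace carries its own three-value tcp_mem array (low, pressure, high, in pages) and pressure is evaluated against that namespace's values only. The struct and function names are hypothetical, and the hysteresis shown (enter pressure above tcp_mem[1], leave it below tcp_mem[0], refuse above tcp_mem[2]) is a simplified model, not the kernel's accounting code.

```c
#include <stdbool.h>

/* Hypothetical per-namespace state: each netns owns its thresholds. */
struct netns_tcp_sketch {
    long tcp_mem[3];      /* [0] low, [1] pressure, [2] high, in pages */
    bool memory_pressure; /* current pressure state for this namespace */
};

enum tcp_mem_verdict { TCP_MEM_OK, TCP_MEM_PRESSURE, TCP_MEM_REFUSE };

/* Evaluate an allocation that would bring usage to 'allocated' pages. */
enum tcp_mem_verdict tcp_mem_check(struct netns_tcp_sketch *ns, long allocated)
{
    if (allocated < ns->tcp_mem[0])
        ns->memory_pressure = false;   /* comfortably below: leave pressure */
    else if (allocated > ns->tcp_mem[1])
        ns->memory_pressure = true;    /* above the pressure threshold */

    if (allocated > ns->tcp_mem[2])
        return TCP_MEM_REFUSE;         /* hard limit for this namespace */
    return ns->memory_pressure ? TCP_MEM_PRESSURE : TCP_MEM_OK;
}
```

The same usage level can put one namespace under pressure while another namespace with larger thresholds is unaffected.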

Signed-off-by: Glauber Costa
Reviewed-by: KAMEZAWA Hiroyuki
CC: David S. Miller
CC: Eric W. Biederman
Signed-off-by: David S. Miller

Glauber Costa
2011-12-13 08:04:11 +0800

09 Jun, 2011

1 commit

4b9d9be83 inetpeer: remove unused list

Andi Kleen and Tim Chen reported huge contention on inetpeer
unused_peers.lock, on memcached workload on a 40 core machine, with
disabled route cache.

It appears we constantly flip peers' refcnt between 0 and 1 values, and
we must insert/remove peers from unused_peers.list, holding a contended
spinlock.

Remove this list completely and perform a garbage collection on-the-fly,
at lookup time, using the expired nodes we met during the tree
traversal.

This removes a lot of code, makes locking more standard, and obsoletes
two sysctls (inet_peer_gc_mintime and inet_peer_gc_maxtime). This also
removes two pointers in inet_peer structure.

There is still a false sharing effect because refcnt is in the first cache
line of the object [where the links and keys used by lookups are located]; we
might move it to the end of the inet_peer structure so that this first cache
line is mostly read by CPUs.
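The on-the-fly garbage collection can be sketched as follows. To keep it short, this sketch swaps the inetpeer AVL tree for a singly linked list and uses hypothetical names (`struct peer_sketch`, `peer_lookup_gc()`). It shows only the idea stated above: unused (refcnt 0) and expired nodes met while searching are unlinked and freed during the lookup itself, so no separate unused list is needed.

```c
#include <stdlib.h>

struct peer_sketch {
    unsigned int key;          /* e.g. an IPv4 address */
    int refcnt;                /* 0 means no user holds this peer */
    long dtime;                /* time the refcnt last dropped to 0 */
    struct peer_sketch *next;
};

/*
 * Look up 'key'. Every node passed on the way that is unused and older
 * than 'ttl' at time 'now' is unlinked and freed. Returns the match with
 * a reference taken, or NULL. '*freed' counts nodes collected.
 */
struct peer_sketch *peer_lookup_gc(struct peer_sketch **head, unsigned int key,
                                   long now, long ttl, int *freed)
{
    struct peer_sketch **pp = head;

    *freed = 0;
    while (*pp) {
        struct peer_sketch *p = *pp;

        if (p->key == key) {
            p->refcnt++;                 /* found: take a reference */
            return p;
        }
        if (p->refcnt == 0 && now - p->dtime > ttl) {
            *pp = p->next;               /* unlink expired, unused node */
            free(p);
            (*freed)++;
            continue;
        }
        pp = &p->next;
    }
    return NULL;
}
```

Because collection piggybacks on a traversal the lookup performs anyway, there is no separate list whose lock every refcnt 0/1 transition has to take.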

Signed-off-by: Eric Dumazet
CC: Andi Kleen
CC: Tim Chen
Signed-off-by: David S. Miller

Eric Dumazet
2011-06-09 08:05:30 +0800

18 May, 2011

1 commit

f56e03e8d net: ping: fix build failure

If CONFIG_PROC_SYSCTL=n the building process fails:

ping.c:(.text+0x52af3): undefined reference to `inet_get_ping_group_range_net'

Moved inet_get_ping_group_range_net() to ping.c.

Reported-by: Randy Dunlap
Signed-off-by: Vasiliy Kulikov
Acked-by: Eric Dumazet
Acked-by: Randy Dunlap
Signed-off-by: David S. Miller

Vasiliy Kulikov
2011-05-18 02:16:58 +0800

14 May, 2011

1 commit

c319b4d76 net: ipv4: add IPPROTO_ICMP socket kind

This patch adds IPPROTO_ICMP socket kind. It makes it possible to send
ICMP_ECHO messages and receive the corresponding ICMP_ECHOREPLY messages
without any special privileges. In other words, the patch makes it
possible to implement setuid-less and CAP_NET_RAW-less /bin/ping. In
order not to increase the kernel's attack surface, the new functionality
is disabled by default, but is enabled at bootup by supporting Linux
distributions, optionally with restriction to a group or a group range
(see below).

Similar functionality is implemented in Mac OS X:
http://www.manpagez.com/man/4/icmp/

A new ping socket is created with

socket(PF_INET, SOCK_DGRAM, IPPROTO_ICMP)

Message identifiers (octets 4-5 of the ICMP header) are interpreted as local
ports. Addresses are stored in struct sockaddr_in. No port numbers are
reserved for privileged processes; port 0 is reserved for the API ("let the
kernel pick a free number"). There is no notion of remote ports; remote
port numbers provided by the user (e.g. in connect()) are ignored.

Data sent and received include ICMP headers. This is deliberate to:
1) Avoid the need to transport header values like sequence numbers by
other means.
2) Make it easier to port existing programs using raw sockets.

ICMP headers given to send() are checked and sanitized. The type must be
ICMP_ECHO and the code must be zero (future extensions might relax this,
see below). The id is set to the number (local port) of the socket, and the
checksum is always recomputed.
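For illustration, a userspace sketch of the ICMP Echo header a program would place at the start of the buffer passed to send(): type ICMP_ECHO (8) and code 0. The identifier is left at 0 because, as described above, the kernel overwrites it with the socket's local port and recomputes the checksum; the checksum is still filled in here with the standard RFC 1071 Internet checksum. `build_echo_request()` and `inet_checksum()` are hypothetical helpers, and the header bytes are laid out by hand rather than through a system header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ICMP_ECHO_TYPE 8 /* ICMP Echo Request */

/* RFC 1071 Internet checksum over 'len' bytes (big-endian 16-bit words). */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(data[i] << 8 | data[i + 1]);
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;     /* pad the odd byte */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);      /* fold carries */
    return (uint16_t)~sum;
}

/*
 * Fill 'buf' (at least 8 + payload_len bytes) with an ICMP Echo Request.
 * Returns the total length to hand to send()/sendto().
 */
size_t build_echo_request(uint8_t *buf, uint16_t seq,
                          const uint8_t *payload, size_t payload_len)
{
    uint16_t csum;

    buf[0] = ICMP_ECHO_TYPE;          /* type: must be ICMP_ECHO */
    buf[1] = 0;                       /* code: must be zero */
    buf[2] = buf[3] = 0;              /* checksum placeholder */
    buf[4] = buf[5] = 0;              /* identifier: set by the kernel */
    buf[6] = (uint8_t)(seq >> 8);     /* sequence number, network order */
    buf[7] = (uint8_t)(seq & 0xff);
    memcpy(buf + 8, payload, payload_len);

    csum = inet_checksum(buf, 8 + payload_len);
    buf[2] = (uint8_t)(csum >> 8);
    buf[3] = (uint8_t)(csum & 0xff);
    return 8 + payload_len;
}
```

The resulting buffer would then be sent with sendto() on a `socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP)` descriptor to a `struct sockaddr_in` destination; socket creation fails with a permission error unless the caller's group is allowed by ping_group_range.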

ICMP reply packets received from the network are demultiplexed according
to their ids, and are returned by recv() without any modifications.
IP header information and ICMP errors of those packets may be obtained
via ancillary data (IP_RECVTTL, IP_RETOPTS, and IP_RECVERR). ICMP source
quenches and redirects are reported as fake errors via the error queue
(IP_RECVERR); the next hop address for redirects is saved to ee_info (in
network order).

socket(2) is restricted to the group range specified in
"/proc/sys/net/ipv4/ping_group_range". It is "1 0" by default, meaning
that nobody (not even root) may create ping sockets. Setting it to "100
100" would grant permissions to the single group (to either make
/sbin/ping g+s and owned by this group or to grant permissions to the
"netadmins" group), "0 4294967295" would enable it for the world, "100
4294967295" would enable it for the users, but not daemons.
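The permission rule reduces to an inclusive range test on the caller's group IDs. A minimal sketch with hypothetical helpers, assuming both the effective and supplementary group IDs are considered (passed in here as a plain array). It shows why the default "1 0" admits nobody: with low greater than high, no gid can satisfy both bounds.

```c
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int gid_sketch_t;

/* True if 'gid' lies within the inclusive range [low, high]. */
bool gid_in_ping_range(gid_sketch_t gid, gid_sketch_t low, gid_sketch_t high)
{
    return low <= gid && gid <= high;
}

/* True if any of the caller's gids falls inside the configured range. */
bool may_create_ping_socket(const gid_sketch_t *gids, size_t n,
                            gid_sketch_t low, gid_sketch_t high)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (gid_in_ping_range(gids[i], low, high))
            return true;
    return false;
}
```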

The existing code might be (in the unlikely case anyone needs it)
extended rather easily to handle other similar pairs of ICMP messages
(Timestamp/Reply, Information Request/Reply, Address Mask Request/Reply
etc.).

Userspace ping util & patch for it:
http://openwall.info/wiki/people/segoon/ping

For Openwall GNU/*/Linux it was the last step on the road to the
setuid-less distro. A revision of this patch (for RHEL5/OpenVZ kernels)
is in use in Owl-current, such as in the 2011/03/12 LiveCD ISOs:
http://mirrors.kernel.org/openwall/Owl/current/iso/

Initially this functionality was written by Pavel Kankovsky for
Linux 2.4.32, but unfortunately it was never made public.

All ping options (-b, -p, -Q, -R, -s, -t, -T, -M, -I) were tested with
the patch.

PATCH v3:
- switched to flowi4.
- minor changes to be consistent with raw sockets code.

PATCH v2:
- changed ping_debug() to pr_debug().
- removed CONFIG_IP_PING.
- removed ping_seq_fops.owner field (unused for procfs).
- switched to proc_net_fops_create().
- switched to %pK in seq_printf().

PATCH v1:
- fixed checksumming bug.
- CAP_NET_RAW may not create icmp sockets anymore.

RFC v2:
- minor cleanups.
- introduced sysctl'able group range to restrict socket(2).

Signed-off-by: Vasiliy Kulikov
Signed-off-by: David S. Miller

Vasiliy Kulikov
2011-05-14 04:08:13 +0800

13 Apr, 2011

1 commit

192910a6c net: Do not wrap sysctl igmp_max_memberships in IP_MULTICAST

Controlling igmp_max_memberships is useful even when IP_MULTICAST
is off.
Quagga (an OSPF daemon) uses multicast addresses for all interfaces
using a single socket and hits the igmp_max_memberships limit when
there are 20 interfaces or more.
Always export sysctl igmp_max_memberships in proc, just like
igmp_max_msf.

Signed-off-by: Joakim Tjernlund
Signed-off-by: David S. Miller

Joakim Tjernlund
2011-04-13 04:59:33 +0800

14 Dec, 2010

1 commit

249fab773 net: add limits to ip_default_ttl

ip_default_ttl should be between 1 and 255
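The IPv4 TTL field is 8 bits wide, so 255 is the largest representable value, and a TTL of 0 leaves no hops at all. In the kernel such bounds are commonly enforced through proc_dointvec_minmax() with the table entry's extra1/extra2 pointing at the minimum and maximum. The standalone sketch below only mimics that min/max check; `ttl_min`, `ttl_max` and `set_default_ttl()` are hypothetical stand-ins.

```c
#include <errno.h>

static const int ttl_min = 1;   /* hypothetical lower bound */
static const int ttl_max = 255; /* hypothetical upper bound */

int sysctl_ip_default_ttl_sketch = 64;

/* Accept 'val' only if it lies in [ttl_min, ttl_max]; otherwise -EINVAL. */
int set_default_ttl(int val)
{
    if (val < ttl_min || val > ttl_max)
        return -EINVAL;                 /* out of range: keep old value */
    sysctl_ip_default_ttl_sketch = val;
    return 0;
}
```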

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2010-12-14 04:16:14 +0800

13 Dec, 2010

1 commit

323e126f0 ipv4: Don't pre-seed hoplimit metric.

Always go through a new ip4_dst_hoplimit() helper, just like ipv6.

This allowed several simplifications:

1) The interim dst_metric_hoplimit() can go as it's no longer
used.

2) The sysctl_ip_default_ttl entry no longer needs to use
ipv4_doint_and_flush, since the sysctl is not cached in
routing cache metrics any longer.

3) ipv4_doint_and_flush no longer needs to be exported and
therefore can be marked static.

When ipv4_doint_and_flush_strategy was removed some time ago,
the external declaration in ip.h was mistakenly left around
so kill that off too.

We have to move the sysctl_ip_default_ttl declaration into
ipv4's route cache definition header net/route.h, because
currently net/ip.h (where the declaration lives now) has
a back dependency on net/route.h

Signed-off-by: David S. Miller

David S. Miller
2010-12-13 14:08:17 +0800

29 Nov, 2010

1 commit

0147fc058 tcp: restrict net.ipv4.tcp_adv_win_scale (#20312)

tcp_win_from_space() does the following:

    if (sysctl_tcp_adv_win_scale <= 0)
        return space >> (-sysctl_tcp_adv_win_scale);
    else
        return space - (space >> sysctl_tcp_adv_win_scale);

"space" is int.

As per C99 6.5.7 (3), shifting an int by 32 or more bits is
undefined behaviour.

Indeed, if sysctl_tcp_adv_win_scale is exactly 32,
space >> 32 equals space and the function returns 0.

This means we busy-loop in tcp_fixup_rcvbuf().

Restrict net.ipv4.tcp_adv_win_scale to [-31, 31].
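The restricted arithmetic can be checked in isolation. This sketch mirrors the two branches of tcp_win_from_space() in a standalone function with the scale passed as a parameter; the parameter and the -1 return for out-of-range scales are illustrative choices, not kernel code. With the scale limited to [-31, 31], every shift count stays below the 32-bit width of int.

```c
/*
 * Standalone version of the window computation. 'scale' plays the role
 * of sysctl_tcp_adv_win_scale; 'space' is assumed non-negative.
 */
int win_from_space(int space, int scale)
{
    if (scale < -31 || scale > 31)
        return -1;                          /* rejected, like the new sysctl range */
    if (scale <= 0)
        return space >> (-scale);           /* shift count 0..31: well defined */
    else
        return space - (space >> scale);    /* shift count 1..31: well defined */
}
```

On x86 the hardware masks a 32-bit shift count to its low 5 bits, so a shift by 32 behaves like a shift by 0; that is how space >> 32 came to equal space and the old code returned 0.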

Fix https://bugzilla.kernel.org/show_bug.cgi?id=20312

Steps to reproduce:

echo 32 >/proc/sys/net/ipv4/tcp_adv_win_scale
wget www.kernel.org
[softlockup]

Signed-off-by: Alexey Dobriyan
Signed-off-by: David S. Miller

Alexey Dobriyan
2010-11-29 02:39:45 +0800

11 Nov, 2010

1 commit

8d987e5c7 net: avoid limits overflow

Robin Holt tried to boot a 16TB machine and found some limits were
reached: sysctl_tcp_mem[2], sysctl_udp_mem[2].

We can switch infrastructure to use "long" instead of "int", now that
atomic_long_t primitives are available for free.
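A quick calculation shows why int runs out on such a machine. tcp_mem and udp_mem are counted in pages; assuming 4 KiB pages (an assumption for illustration), 16 TiB is 2^44 / 2^12 = 2^32 pages, which exceeds INT_MAX (2^31 - 1). The helper names are hypothetical.

```c
#include <limits.h>

#define PAGE_SHIFT_ASSUMED 12   /* assumption: 4 KiB pages */

/* Pages needed to describe 'bytes' of memory. */
long long bytes_to_pages(long long bytes)
{
    return bytes >> PAGE_SHIFT_ASSUMED;
}

/* Would a page count of this size survive being stored in an int? */
int fits_in_int(long long pages)
{
    return pages <= INT_MAX;
}
```

On 64-bit Linux, long is 64 bits wide, so counters kept as long and atomic_long_t have ample headroom for such page counts.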

Signed-off-by: Eric Dumazet
Reported-by: Robin Holt
Reviewed-by: Robin Holt
Signed-off-by: Andrew Morton
Signed-off-by: David S. Miller

Eric Dumazet
2010-11-11 04:12:00 +0800

16 May, 2010

1 commit

e3826f1e9 net: reserve ports for applications using fixed port numbers

(Dropped the infiniband part, because Tetsuo modified the related code,
I will send a separate patch for it once this is accepted.)

This patch introduces /proc/sys/net/ipv4/ip_local_reserved_ports which
allows users to reserve ports for third-party applications.

The reserved ports will not be used by automatic port assignments
(e.g. when calling connect() or bind() with port number 0). Explicit
port allocation behavior is unchanged.
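The effect on automatic port selection can be sketched with a bitmap over the 16-bit port space: the automatic search skips reserved ports, while an explicit bind to a reserved port is not affected. The bitmap layout, `pick_ephemeral_port()` and its linear search are hypothetical simplifications; a real allocator also skips ports already in use and does not simply scan from the bottom of the range.

```c
#include <stdbool.h>
#include <stdint.h>

static uint8_t reserved_ports[65536 / 8];   /* one bit per port */

void reserve_port(uint16_t port)
{
    reserved_ports[port / 8] |= (uint8_t)(1u << (port % 8));
}

bool port_is_reserved(uint16_t port)
{
    return reserved_ports[port / 8] & (1u << (port % 8));
}

/*
 * Automatic assignment (bind/connect with port 0): return the first
 * non-reserved port in [low, high], or 0 if every port is reserved.
 */
uint16_t pick_ephemeral_port(uint16_t low, uint16_t high)
{
    uint32_t p;   /* wider than uint16_t so high == 65535 terminates */

    for (p = low; p <= high; p++)
        if (!port_is_reserved((uint16_t)p))
            return (uint16_t)p;
    return 0;
}
```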

Signed-off-by: Octavian Purdila
Signed-off-by: WANG Cong
Cc: Neil Horman
Cc: Eric Dumazet
Cc: Eric W. Biederman
Signed-off-by: David S. Miller

Amerigo Wang
2010-05-16 14:28:40 +0800

30 Mar, 2010

1 commit

5a0e3ad6a include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch a large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there, i.e. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

Tejun Heo
2010-03-30 21:02:32 +0800

19 Feb, 2010

2 commits

7e3801755 net: TCP thin dupack

This patch enables fast retransmissions after one dupACK for
TCP if the stream is identified as thin. This will reduce
latencies for thin streams that are not able to trigger fast
retransmissions due to high packet interarrival time. This
mechanism is only active if enabled by iocontrol or syscontrol
and the stream is identified as thin.
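The change comes down to the duplicate-ACK threshold that triggers fast retransmit. In this minimal sketch the classic threshold of three duplicate ACKs is standard TCP behaviour (RFC 5681), while `stream_is_thin()` and its "fewer than four segments in flight" test are assumptions for illustration, since the commit text does not define how a stream is identified as thin.

```c
#include <stdbool.h>

/* Hypothetical thin-stream test: very few segments in flight. */
bool stream_is_thin(unsigned int packets_out)
{
    return packets_out < 4;
}

/*
 * Should this duplicate ACK trigger a fast retransmit?
 * 'thin_dupack_enabled' stands for the per-socket or system-wide switch.
 */
bool should_fast_retransmit(unsigned int dupacks, unsigned int packets_out,
                            bool thin_dupack_enabled)
{
    unsigned int threshold = 3;              /* classic dupACK threshold */

    if (thin_dupack_enabled && stream_is_thin(packets_out))
        threshold = 1;                       /* thin: act on the first dupACK */
    return dupacks >= threshold;
}
```

With only two segments in flight, losing the first can produce at most one duplicate ACK, so the classic threshold is never reached and the sender has to wait for the retransmission timer.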

Signed-off-by: Andreas Petlund
Signed-off-by: David S. Miller

Andreas Petlund
2010-02-19 07:43:09 +0800
36e31b0af net: TCP thin linear timeouts

This patch will make TCP use only linear timeouts if the
stream is thin. This will help to avoid the very high latencies
that thin streams suffer because of exponential backoff. This
mechanism is only active if enabled by iocontrol or syscontrol
and the stream is identified as thin. A maximum of 6 linear
timeouts is tried before exponential backoff is resumed.
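The timeout schedule described above can be written down directly. In this simplified sketch `base_rto_ms` stands for the current RTO estimate and `max_rto_ms` for an upper cap (both hypothetical parameters). A regular stream doubles the timeout after every retransmission; a thin stream keeps the base value for the first six retransmissions and only then starts doubling.

```c
#define THIN_LINEAR_RETRIES 6   /* "a maximum of 6 linear timeouts" */

/*
 * Timeout to arm after the 'retransmits'-th retransmission (1-based).
 */
unsigned long rto_after(unsigned long base_rto_ms, unsigned int retransmits,
                        int thin, unsigned long max_rto_ms)
{
    unsigned int doublings = retransmits;
    unsigned long rto = base_rto_ms;

    if (thin)   /* linear phase first, then exponential backoff */
        doublings = retransmits > THIN_LINEAR_RETRIES ?
                    retransmits - THIN_LINEAR_RETRIES : 0;

    while (doublings-- > 0 && rto < max_rto_ms)
        rto *= 2;
    return rto < max_rto_ms ? rto : max_rto_ms;
}
```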

Signed-off-by: Andreas Petlund
Signed-off-by: David S. Miller

Andreas Petlund
2010-02-19 07:43:08 +0800

08 Dec, 2009

1 commit

d7fc02c7b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits)
mac80211: fix reorder buffer release
iwmc3200wifi: Enable wimax core through module parameter
iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter
iwmc3200wifi: Coex table command does not expect a response
iwmc3200wifi: Update wiwi priority table
iwlwifi: driver version track kernel version
iwlwifi: indicate uCode type when fail dump error/event log
iwl3945: remove duplicated event logging code
b43: fix two warnings
ipw2100: fix rebooting hang with driver loaded
cfg80211: indent regulatory messages with spaces
iwmc3200wifi: fix NULL pointer dereference in pmkid update
mac80211: Fix TX status reporting for injected data frames
ath9k: enable 2GHz band only if the device supports it
airo: Fix integer overflow warning
rt2x00: Fix padding bug on L2PAD devices.
WE: Fix set events not propagated
b43legacy: avoid PPC fault during resume
b43: avoid PPC fault during resume
tcp: fix a timewait refcnt race
...

Fix up conflicts due to sysctl cleanups (dead sysctl_check code and
CTL_UNNUMBERED removed) in
kernel/sysctl_check.c
net/ipv4/sysctl_net_ipv4.c
net/ipv6/addrconf.c
net/sctp/sysctl.c

Linus Torvalds
2009-12-08 23:55:01 +0800

03 Dec, 2009

1 commit

519855c50 TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS

Define sysctl (tcp_cookie_size) to turn on and off the cookie option
default globally, instead of a compiled configuration option.

Define per socket option (TCP_COOKIE_TRANSACTIONS) for setting constant
data values, retrieving variable cookie values, and other facilities.

Move inline tcp_clear_options() unchanged from net/tcp.h to linux/tcp.h,
near its corresponding struct tcp_options_received (prior to changes).

This is a straightforward re-implementation of an earlier (year-old)
patch that no longer applies cleanly, with permission of the original
author (Adam Langley):

http://thread.gmane.org/gmane.linux.network/102586

These functions will also be used in subsequent patches that implement
additional features.

Requires:
net: TCP_MSS_DEFAULT, TCP_MSS_DESIRED

Signed-off-by: William.Allen.Simpson@gmail.com
Signed-off-by: David S. Miller

William Allen Simpson
2009-12-03 14:07:24 +0800

26 Nov, 2009

1 commit

09ad9bc75 net: use net_eq to compare nets

Generated with the following semantic patch:

@@
struct net *n1;
struct net *n2;
@@
- n1 == n2
+ net_eq(n1, n2)

@@
struct net *n1;
struct net *n2;
@@
- n1 != n2
+ !net_eq(n1, n2)

applied over {include,net,drivers/net}.
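Routing every comparison through one helper gives a single place to define what namespace equality means. A sketch of such a helper, assuming (as a simplification) that when namespace support is compiled out only one namespace exists, so any two pointers compare equal. `struct net_sketch`, `net_eq_sketch()` and the `NET_NS_SKETCH` switch are hypothetical stand-ins for the kernel's type, helper and config option.

```c
#include <stdbool.h>

#define NET_NS_SKETCH 1   /* set to 0 to model a build without netns support */

struct net_sketch { int id; };

static inline bool net_eq_sketch(const struct net_sketch *n1,
                                 const struct net_sketch *n2)
{
#if NET_NS_SKETCH
    return n1 == n2;      /* distinct namespaces are distinct objects */
#else
    (void)n1;
    (void)n2;
    return true;          /* only one namespace exists: always equal */
#endif
}
```

With the single-namespace variant the compiler can fold such comparisons to a constant, which open-coded `n1 == n2` would not allow.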

Signed-off-by: Octavian Purdila
Signed-off-by: David S. Miller

Octavian Purdila
2009-11-26 07:14:13 +0800

12 Nov, 2009

1 commit

f8572d8f2 sysctl net: Remove unused binary sysctl code

Now that sys_sysctl is a compatibility wrapper around /proc/sys,
all sysctl strategy routines, and all ctl_name and strategy
entries in the sysctl tables are unused, and can be
removed.

In addition neigh_sysctl_register has been modified to no longer
take a strategy argument and its callers have been modified not
to pass one.

Cc: "David Miller"
Cc: Hideaki YOSHIFUJI
Cc: netdev@vger.kernel.org
Signed-off-by: Eric W. Biederman

Eric W. Biederman
2009-11-12 18:05:06 +0800

24 Sep, 2009

1 commit

8d65af789 sysctl: remove "struct file *" argument of ->proc_handler

It's unused.

It isn't needed -- read or write flag is already passed and sysctl
shouldn't care about the rest.

It _was_ used in two places at arch/frv for some reason.

Signed-off-by: Alexey Dobriyan
Cc: David Howells
Cc: "Eric W. Biederman"
Cc: Al Viro
Cc: Ralf Baechle
Cc: Martin Schwidefsky
Cc: Ingo Molnar
Cc: "David S. Miller"
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2009-09-24 22:21:04 +0800

04 Nov, 2008

1 commit

6d9f239a1 net: '&' redux

I want to compile out proc_* and sysctl_* handlers totally and
stub them to NULL depending on config options; however, usage of &
will prevent this, since taking the address of a NULL pointer will break
compilation.

So, drop & in front of every ->proc_handler and every ->strategy
handler; it was never needed in fact.
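Why the & was never needed: in C, a function name used as a value already decays to a pointer to that function, so `handler` and `&handler` yield the same pointer. Once a handler may be replaced by a NULL macro, only the form without & still compiles, because NULL is not an lvalue and `&NULL` is rejected. A small standalone illustration; `my_handler`, `struct ctl_entry_sketch` and the `HANDLER_STUB` switch are hypothetical.

```c
#include <stddef.h>

typedef int (*handler_fn)(int write);

struct ctl_entry_sketch {
    const char *name;
    handler_fn proc_handler;
};

int my_handler(int write)
{
    return write ? 1 : 0;
}

/* Hypothetical config switch: when defined, the handler is stubbed out. */
#ifdef HANDLER_STUB
#define my_handler_ref NULL
#else
#define my_handler_ref my_handler
#endif

/* Writing '&my_handler_ref' here would not compile once it expands to
 * '&NULL'; the plain name works in both configurations. */
struct ctl_entry_sketch entry = {
    .name = "example",
    .proc_handler = my_handler_ref,
};
```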

Signed-off-by: Alexey Dobriyan
Signed-off-by: David S. Miller

Alexey Dobriyan
2008-11-04 10:21:05 +0800

28 Oct, 2008

1 commit

1080d709f net: implement emergency route cache rebulds when gc_elasticity is exceeded

This is a patch to provide on-demand route cache rebuilding. Currently, our
route cache is rebuilt periodically regardless of need. This introduced
unneeded periodic latency. This patch offers a better approach. Using code
provided by Eric Dumazet, we compute the standard deviation of the average hash
bucket chain length while running rt_check_expire. Should any given chain
length grow larger than the average plus 4 standard deviations, we trigger an
emergency hash table rebuild for that net namespace. This allows the common
case, in which chains are well behaved and do not grow unevenly, to incur no
latency at all, while systems that may be under malicious attack only
rebuild when the attack is detected. This patch takes 2 other factors into
account:
1) chains with multiple entries that differ by attributes that do not affect the
hash value are only counted once, so as not to unduly bias the system toward
rebuilding if features like QOS are heavily used
2) if rebuilding crosses a certain threshold (which is adjustable via the added
sysctl in this patch), route caching is disabled entirely for that net
namespace, since constant rebuilding is less efficient than no caching at all
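The detection rule (a chain longer than the average plus four standard deviations) can be sketched over an array of bucket chain lengths. This is a simplified floating-point version with a hypothetical `needs_emergency_rebuild()`; it ignores the refinement in point 1 and makes no claim about how the kernel computes these statistics.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if any chain is longer than mean + 4 * standard deviation.
 * Squared quantities are compared so no sqrt() is needed:
 *   len > mean + 4*sd  <=>  len - mean > 0 && (len - mean)^2 > 16 * var
 */
bool needs_emergency_rebuild(const unsigned int *chain_len, size_t nbuckets)
{
    double sum = 0.0, sumsq = 0.0, mean, var;
    size_t i;

    if (nbuckets == 0)
        return false;
    for (i = 0; i < nbuckets; i++) {
        sum += chain_len[i];
        sumsq += (double)chain_len[i] * chain_len[i];
    }
    mean = sum / nbuckets;
    var = sumsq / nbuckets - mean * mean;    /* population variance */

    for (i = 0; i < nbuckets; i++) {
        double d = chain_len[i] - mean;
        if (d > 0 && d * d > 16.0 * var)
            return true;                     /* outlier chain: rebuild */
    }
    return false;
}
```

Evenly filled or mildly uneven tables never trigger, while a single chain far above the rest does.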

Tested successfully by me.

Signed-off-by: Neil Horman
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Neil Horman
2008-10-28 08:06:14 +0800

17 Oct, 2008

1 commit

f221e726b sysctl: simplify ->strategy

name and nlen parameters passed to ->strategy hook are unused; remove
them. In general, a ->strategy hook should know what it's doing, and
shouldn't do something tricky for which, say, a pointer to the original
userspace array may be needed (name).

Signed-off-by: Alexey Dobriyan
Acked-by: David S. Miller [ networking bits ]
Cc: Ralf Baechle
Cc: David Howells
Cc: Matt Mackall
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2008-10-17 02:21:47 +0800

09 Oct, 2008

1 commit

3c689b732 inet: cleanup of local_port_range

I noticed sysctl_local_port_range[] and its associated seqlock
sysctl_local_port_range_lock were on separate cache lines.
Moreover, sysctl_local_port_range[] was close to unrelated
variables, highly modified, leading to cache misses.

Moving these two variables into a structure can help data
locality and moving this structure to read_mostly section
helps sharing of this data among cpus.

Cleanup of extern declarations (moved to the include file where
they belong), and use of inet_get_local_port_range()
accessor instead of direct access to ports values.
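A userspace sketch of the grouped structure and its accessor, using C11 atomics to model a sequence lock: the reader retries if a writer was active or the sequence changed while it copied the two values. The names are hypothetical, and the sketch assumes a single writer (the kernel's write side additionally serializes writers with a lock). It illustrates the pattern rather than reproducing inet_get_local_port_range().

```c
#include <stdatomic.h>

/* Lock and data kept together so they share cache lines. */
struct local_ports_sketch {
    atomic_uint seq;        /* odd while a writer is updating */
    atomic_int range[2];    /* [0] low, [1] high */
};

void set_local_port_range(struct local_ports_sketch *lp, int low, int high)
{
    unsigned int s = atomic_load_explicit(&lp->seq, memory_order_relaxed);

    atomic_store_explicit(&lp->seq, s + 1, memory_order_relaxed); /* now odd */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&lp->range[0], low, memory_order_relaxed);
    atomic_store_explicit(&lp->range[1], high, memory_order_relaxed);
    atomic_store_explicit(&lp->seq, s + 2, memory_order_release); /* even again */
}

void get_local_port_range(struct local_ports_sketch *lp, int *low, int *high)
{
    unsigned int s1, s2;

    do {
        s1 = atomic_load_explicit(&lp->seq, memory_order_acquire);
        *low = atomic_load_explicit(&lp->range[0], memory_order_relaxed);
        *high = atomic_load_explicit(&lp->range[1], memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s2 = atomic_load_explicit(&lp->seq, memory_order_relaxed);
    } while ((s1 & 1) || s1 != s2);   /* writer active or raced: retry */
}
```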

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2008-10-09 05:18:04 +0800

04 Aug, 2008

1 commit

adf044c87 net: Add missing extra2 parameter for ip_default_ttl sysctl

Commit 76e6ebfb40a2455c18234dcb0f9df37533215461 ("netns: add namespace
parameter to rt_cache_flush") accesses the extra2 parameter of the
ip_default_ttl ctl_table, but it is never set to a meaningful
value. When e84f84f276473dcc673f360e8ff3203148bdf0e2 ("netns: place
rt_genid into struct net") is applied, we'll oops in
rt_cache_invalidate(). Set extra2 to init_net, to avoid that.

Reported-by: Marcin Slusarz
Signed-off-by: Sven Wegener
Tested-by: Marcin Slusarz
Acked-by: Denis V. Lunev
Signed-off-by: David S. Miller

Sven Wegener
2008-08-04 05:06:44 +0800

28 Jul, 2008

1 commit

eeb61f719 missing bits of net-namespace / sysctl

Piss-poor sysctl registration API strikes again, film at 11...

What we really need is _pathname_ required to be present in already
registered table, so that kernel could warn about bad order. That's the
next target for sysctl stuff (and generally saner and more explicit
order of initialization of ipv[46] internals wouldn't hurt either).

For the time being, here are full fixups required by ..._rotable()
stuff; we make per-net sysctl sets descendants of "ro" one and make sure
that sufficient skeleton is there before we start registering per-net
sysctls.

Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds

Al Viro
2008-07-28 00:45:34 +0800

27 Jul, 2008

1 commit

bd7b1533c [PATCH] sysctl: make sure that /proc/sys/net/ipv4 appears before per-ns ones

Massage ipv4 initialization - make sure that net.ipv4 appears as
non-per-net-namespace before it shows up in per-net-namespace sysctls.
That's the only change outside of sysctl.c needed to get sane ordering
rules and data structures for sysctls (esp. for procfs side of that
mess).

Signed-off-by: Al Viro

Al Viro
2008-07-27 08:53:10 +0800

02 Jul, 2008

1 commit

6dbf4bcac icmp: fix units for ratelimit

Convert the sysctl values for icmp ratelimit to use milliseconds instead
of jiffies which is based on kernel configured HZ.
Internal kernel jiffies are not a proper unit for any userspace API.
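Jiffies make a poor userspace unit because the same stored number means different wall-clock times depending on HZ: 100 jiffies is one second at HZ=100 but a tenth of a second at HZ=1000. A small sketch of converting a millisecond value into ticks for a given HZ, rounding up so a non-zero interval never becomes zero; `ms_to_ticks()` is a hypothetical helper, not the kernel's conversion routine.

```c
/*
 * Convert milliseconds to timer ticks at 'hz' ticks per second, rounding
 * up. Overflow of ms * hz for very large inputs is ignored in this sketch.
 */
unsigned long ms_to_ticks(unsigned long ms, unsigned long hz)
{
    return (ms * hz + 999) / 1000;
}
```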

Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller

Stephen Hemminger
2008-07-02 10:29:07 +0800

12 Jun, 2008

1 commit

0b0408299 net: remove CVS keywords

This patch removes CVS keywords that weren't updated for a long time
from comments.

Signed-off-by: Adrian Bunk
Signed-off-by: David S. Miller

Adrian Bunk
2008-06-12 12:00:38 +0800

26 Mar, 2008

3 commits

68528f099 [NETNS][ICMP]: Make ctl tables for ICMP sysctls per-net.

Add some flesh to ipv4_sysctl_init_net and ipv4_sysctl_exit_net,
i.e. copy the table, alter .data pointers and register it per-net.

Other ipv4_table's sysctls are now global, but this is going to
change once sysctl permissions patches migrate from -mm tree to
mainline in 2.6.26 merge window :)
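The "copy the table, alter .data pointers" step can be illustrated in userspace: duplicate a template table whose entries point at global variables, then retarget each entry's data pointer at the matching field of a per-namespace structure. Everything here (`struct ctl_sketch`, `struct net_icmp_sketch`, `clone_table_for_net()`, the sample values) is hypothetical and only mirrors the shape of the idea.

```c
#include <stdlib.h>
#include <string.h>

struct ctl_sketch {
    const char *procname;
    int *data;
};

/* Per-namespace copies of the tunables. */
struct net_icmp_sketch {
    int icmp_echo_ignore_all;
    int icmp_ratelimit;
};

/* Globals referenced by the template table. */
static int global_echo_ignore_all;
static int global_ratelimit = 1000;

static const struct ctl_sketch icmp_template[] = {
    { "icmp_echo_ignore_all", &global_echo_ignore_all },
    { "icmp_ratelimit",       &global_ratelimit },
};

/* Duplicate the template and point each entry at this namespace's fields. */
struct ctl_sketch *clone_table_for_net(struct net_icmp_sketch *net)
{
    struct ctl_sketch *t = malloc(sizeof(icmp_template));

    if (!t)
        return NULL;
    memcpy(t, icmp_template, sizeof(icmp_template));
    t[0].data = &net->icmp_echo_ignore_all;
    t[1].data = &net->icmp_ratelimit;
    return t;
}
```

Writes through the cloned table then land in one namespace's structure and leave the template and other namespaces untouched.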

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Pavel Emelyanov
2008-03-26 16:56:24 +0800
a24022e18 [NETNS][ICMP]: Move ICMP sysctls on struct net.

Initialization is moved to icmp_sk_init; all the places that
refer to them use init_net for now.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Pavel Emelyanov
2008-03-26 16:55:37 +0800
1577519d6 [NETNS][ICMP]: Register pernet subsys to make ICMP sysctls per-net.

This includes adding pernet_operations, empty init and exit
hooks and a few changes in sysctl_ipv4_init, so as not to
have this part in the next patches.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Pavel Emelyanov
2008-03-26 16:54:18 +0800

01 Feb, 2008

1 commit

16ca3f913 [TCP]: Fix a bug in strategy_allowed_congestion_control

In strategy_allowed_congestion_control of the 2.6.24 kernel, when
sysctl_string returns 1 (success), it should call
tcp_set_allowed_congestion_control to set the allowed congestion
control. But it doesn't: sysctl_string returns 1 on success and a
negative value otherwise, and never returns 0. This patch fixes the problem.

Signed-off-by: Shan Wei
Acked-by: Stephen Hemminger
Signed-off-by: David S. Miller

Shan Wei
2008-02-01 11:28:23 +0800

29 Jan, 2008

6 commits

8d8354d2f [NETNS][FRAGS]: Move ctl tables around.

This is a preparation for sysctl netns-ization.
Move the ctl tables to the files, where the tuning
variables reside. Plus make the helpers to register
the tables.

This will simplify the later patches and will keep
similar things closer to each other.

ipv4, ipv6 and conntrack_reasm are patched differently,
but the result is all the tables are in appropriate files.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Pavel Emelyanov
2008-01-29 07:10:34 +0800
3d7cc2ba6 [NETFILTER]: Switch to using ctl_paths in nf_queue and conntrack modules

This includes the most simple cases for netfilter.

The first part is the queue modules for ipv4 and ipv6,
on which the net/ipv4/ and net/ipv6/ paths are reused
from the appropriate ipv4 and ipv6 code.

The conntrack module is also patched, but this hunk is
very small and simple.

Signed-off-by: Pavel Emelyanov
Acked-by: Patrick McHardy
Signed-off-by: David S. Miller

Pavel Emelyanov
2008-01-29 07:01:10 +0800
95766fff6 [UDP]: Add memory accounting.

Signed-off-by: Takahiro Yasui
Signed-off-by: Hideo Aoki
Signed-off-by: David S. Miller

Hideo Aoki
2008-01-29 07:00:19 +0800
68dd299bc [INET]: Merge sys.net.ipv4.ip_forward and sys.net.ipv4.conf.all.forwarding

AFAIS these two entries should do the same thing - change the
forwarding state on ipv4_devconf and on all the devices.

I propose to merge the handlers together using ctl paths.

The inet_forward_change() is static after this and I move
it higher to be closer to other "propagation" helpers and
to avoid diff making patches based on { and } matching :)
i.e. - make them easier to read.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Pavel Emelyanov
2008-01-29 06:56:31 +0800
3e37c3f99 [IPV4]: Use ctl paths to register net/ipv4/ table

This is the same as I did for the net/core/ table in the
second patch in this series: use the paths and isolate the
whole table in the .c file.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Pavel Emelyanov
2008-01-29 06:56:27 +0800
9ba639797 [IPV4]: Cleanup the sysctl_net_ipv4.c file

This includes several cleanups:

* tune Makefile to compile out this file when SYSCTL=n. Now
it looks like net/core/sysctl_net_core.c one;
* move the ipv4_config to af_inet.c to exist all the time;
* remove additional sysctl_ip_nonlocal_bind declaration
(it is already declared in net/ip.h);
* remove no longer needed ifdefs from this file.

This is a preparation for using ctl paths for net/ipv4/
sysctl table.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Pavel Emelyanov
2008-01-29 06:56:27 +0800

20 Nov, 2007

1 commit

5487796f0 [TCP]: Problem bug with sysctl_tcp_congestion_control function

From: "Sam Jansen"

sysctl_tcp_congestion_control seems to have a bug that prevents it
from actually calling the tcp_set_default_congestion_control
function. This is not so apparent because it does not return an error
and generally the /proc interface is used to configure the default TCP
congestion control algorithm. This is present in 2.6.18 onwards and
probably earlier, though I have not inspected 2.6.15--2.6.17.

sysctl_tcp_congestion_control calls sysctl_string and expects a successful
return code of 0. In such a case it actually sets the congestion control
algorithm with tcp_set_default_congestion_control. Otherwise, it returns the
value returned by sysctl_string. This was correct in 2.6.14, as sysctl_string
returned 0 on success. However, sysctl_string was updated to return 1 on
success around 2.6.15 and sysctl_tcp_congestion_control was not updated.
Even though sysctl_tcp_congestion_control returns 1, do_sysctl_strategy
converts this return code to '0', so the caller never notices the error.
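The bug pattern is easy to reproduce in miniature: a helper changes its success value from 0 to 1, and a caller that still tests for 0 silently skips its follow-up work. In this sketch `copy_string_sketch()` stands in for sysctl_string() (1 on success, negative on error, as described above) and `apply_default()` for tcp_set_default_congestion_control(); all names are hypothetical.

```c
#include <errno.h>
#include <string.h>

static char current_default[16] = "reno";

/* Stand-in for sysctl_string(): 1 on success, negative errno on failure. */
int copy_string_sketch(char *dst, size_t dstlen, const char *src)
{
    if (strlen(src) >= dstlen)
        return -EINVAL;
    strcpy(dst, src);
    return 1;
}

static void apply_default(const char *name)
{
    strcpy(current_default, name);   /* length already checked by caller */
}

/* Buggy caller: written when success was 0, so nothing is ever applied. */
int set_default_buggy(const char *name)
{
    char buf[16];
    int ret = copy_string_sketch(buf, sizeof(buf), name);

    if (ret == 0)
        apply_default(buf);
    return ret;
}

/* Fixed caller: treats 1 as success. */
int set_default_fixed(const char *name)
{
    char buf[16];
    int ret = copy_string_sketch(buf, sizeof(buf), name);

    if (ret == 1)
        apply_default(buf);
    return ret;
}
```

The buggy caller returns a non-error status, so nothing upstream reports a failure even though the setting never changes.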

Signed-off-by: David S. Miller

Sam Jansen
2007-11-20 15:28:21 +0800