14 Dec, 2010

1 commit


13 Dec, 2010

1 commit

  • Always go through a new ip4_dst_hoplimit() helper, just like ipv6.

    This allowed several simplifications:

    1) The interim dst_metric_hoplimit() can go as it's no longer
    used.

    2) The sysctl_ip_default_ttl entry no longer needs to use
    ipv4_doint_and_flush, since the sysctl is not cached in
    routing cache metrics any longer.

    3) ipv4_doint_and_flush no longer needs to be exported and
    therefore can be marked static.

    When ipv4_doint_and_flush_strategy was removed some time ago,
    the external declaration in ip.h was mistakenly left around
    so kill that off too.

    We have to move the sysctl_ip_default_ttl declaration into
    ipv4's route cache definition header net/route.h, because
    currently net/ip.h (where the declaration lives now) has
    a back dependency on net/route.h.

    Signed-off-by: David S. Miller

    David S. Miller
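
The idea behind the helper can be sketched as follows. This is a minimal illustration, not the kernel code: `struct route_entry`, `default_ttl`, and `route_hoplimit()` are hypothetical stand-ins, and treating a metric of 0 as "not set" is an assumption of the sketch. Because the default is read at lookup time rather than copied into each cached route, changing it needs no cache flush.

```c
/* Hypothetical stand-in for a cached route and its hop limit metric. */
struct route_entry {
    int hoplimit_metric;   /* assumption: 0 means "no per-route value set" */
};

/* Hypothetical stand-in for the sysctl_ip_default_ttl variable. */
int default_ttl = 64;

/* Resolve the hop limit when it is needed instead of caching the default. */
int route_hoplimit(const struct route_entry *rt)
{
    int hoplimit = rt->hoplimit_metric;

    if (hoplimit == 0)
        hoplimit = default_ttl;   /* fall back to the current default */
    return hoplimit;
}
```

A route without its own metric picks up a new default immediately, while a route with an explicit metric keeps it.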
     

29 Nov, 2010

1 commit

  • tcp_win_from_space() does the following:

    if (sysctl_tcp_adv_win_scale <= 0)
            return space >> (-sysctl_tcp_adv_win_scale);
    else
            return space - (space >> sysctl_tcp_adv_win_scale);

    "space" is int.

    As per C99 6.5.7 (3) shifting int for 32 or more bits is
    undefined behaviour.

    Indeed, if sysctl_tcp_adv_win_scale is exactly 32,
    space >> 32 equals space and function returns 0.

    Which means we busyloop in tcp_fixup_rcvbuf().

    Restrict net.ipv4.tcp_adv_win_scale to [-31, 31].

    Fix https://bugzilla.kernel.org/show_bug.cgi?id=20312

    Steps to reproduce:

    echo 32 >/proc/sys/net/ipv4/tcp_adv_win_scale
    wget www.kernel.org
    [softlockup]

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
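
The restriction described in this commit can be sketched in plain C. The function names below are hypothetical, and the sketch assumes a 32-bit int; the point is only that every shift count stays below the width of int once the value has been validated against [-31, 31].

```c
/* Bounds named in the commit message. */
#define ADV_WIN_SCALE_MIN (-31)
#define ADV_WIN_SCALE_MAX 31

/* Returns 1 if a proposed scale value keeps every shift well defined. */
int adv_win_scale_valid(int scale)
{
    return scale >= ADV_WIN_SCALE_MIN && scale <= ADV_WIN_SCALE_MAX;
}

/*
 * Same arithmetic as the quoted tcp_win_from_space() logic.
 * The caller is expected to pass only a validated scale.
 */
int win_from_space(int space, int scale)
{
    if (scale <= 0)
        return space >> (-scale);
    return space - (space >> scale);
}
```

With a validated scale the result is never forced to 0 by an out-of-range shift: for example, a scale of 2 leaves three quarters of the space.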
     

11 Nov, 2010

1 commit

  • Robin Holt tried to boot a 16TB machine and found that some limits were
    reached: sysctl_tcp_mem[2], sysctl_udp_mem[2].

    We can switch the infrastructure to use "long" instead of "int", now that
    atomic_long_t primitives are available for free.

    Signed-off-by: Eric Dumazet
    Reported-by: Robin Holt
    Reviewed-by: Robin Holt
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Eric Dumazet
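
One way to see why an int can run out on such a machine is to count pages. The sketch below is only an illustration: the 4 KiB page size and the helper names are assumptions, and the commit does not say which exact computation hit the limit.

```c
#include <limits.h>
#include <stdint.h>

/* Number of pages needed to cover 'bytes' of memory. */
uint64_t pages_for_memory(uint64_t bytes, uint64_t page_size)
{
    return bytes / page_size;
}

/* Returns 1 if the value can be stored in a signed int without overflow. */
int fits_in_int(uint64_t value)
{
    return value <= (uint64_t)INT_MAX;
}
```

With 4 KiB pages, 16 TiB is 2^32 pages, which no longer fits in a 32-bit int, while 1 TiB (2^28 pages) still does.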
     

16 May, 2010

1 commit

  • (Dropped the infiniband part, because Tetsuo modified the related code,
    I will send a separate patch for it once this is accepted.)

    This patch introduces /proc/sys/net/ipv4/ip_local_reserved_ports which
    allows users to reserve ports for third-party applications.

    The reserved ports will not be used by automatic port assignments
    (e.g. when calling connect() or bind() with port number 0). Explicit
    port allocation behavior is unchanged.

    Signed-off-by: Octavian Purdila
    Signed-off-by: WANG Cong
    Cc: Neil Horman
    Cc: Eric Dumazet
    Cc: Eric W. Biederman
    Signed-off-by: David S. Miller

    Amerigo Wang
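
The behaviour described above (automatic assignment skips reserved ports, explicit requests are unaffected) can be sketched with a bitmap. Everything here is hypothetical: the bitmap layout, the function names, and the linear scan are assumptions of the sketch, and it does not model ports that are already in use.

```c
#include <limits.h>

#define PORT_COUNT 65536
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* One bit per port; a set bit means "reserved". */
struct port_bitmap {
    unsigned long words[PORT_COUNT / (sizeof(unsigned long) * CHAR_BIT)];
};

void reserve_port(struct port_bitmap *map, unsigned int port)
{
    map->words[port / BITS_PER_WORD] |= 1UL << (port % BITS_PER_WORD);
}

int port_is_reserved(const struct port_bitmap *map, unsigned int port)
{
    return (int)((map->words[port / BITS_PER_WORD] >> (port % BITS_PER_WORD)) & 1UL);
}

/*
 * Automatic assignment: scan [low, high] once, beginning at 'start'
 * (which must lie inside the range), and skip reserved ports.
 * Returns the chosen port, or -1 if every port in the range is reserved.
 */
int pick_automatic_port(const struct port_bitmap *map, int low, int high, int start)
{
    int span = high - low + 1;
    int i;

    for (i = 0; i < span; i++) {
        int port = low + (start - low + i) % span;

        if (!port_is_reserved(map, (unsigned int)port))
            return port;
    }
    return -1;
}
```

An explicit bind to a reserved port would simply not consult `port_is_reserved()`, which matches the statement that explicit allocation is unchanged.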
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

19 Feb, 2010

2 commits

  • This patch enables fast retransmissions after one dupACK for
    TCP if the stream is identified as thin. This will reduce
    latencies for thin streams that are not able to trigger fast
    retransmissions due to high packet interarrival time. This
    mechanism is only active if enabled by iocontrol or syscontrol
    and the stream is identified as thin.

    Signed-off-by: Andreas Petlund
    Signed-off-by: David S. Miller

    Andreas Petlund
     
  • This patch will make TCP use only linear timeouts if the
    stream is thin. This will help to avoid the very high latencies
    that thin streams suffer because of exponential backoff. This
    mechanism is only active if enabled by iocontrol or syscontrol
    and the stream is identified as thin. A maximum of 6 linear
    timeouts is tried before exponential backoff is resumed.

    Signed-off-by: Andreas Petlund
    Signed-off-by: David S. Miller

    Andreas Petlund
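
The difference between linear and exponential timeouts can be sketched as below. The limit of 6 linear retries comes from the commit message; the function name, the millisecond units, and the 120000 ms cap are assumptions made for the illustration, and the base value is assumed not to exceed the cap.

```c
#define THIN_LINEAR_RETRIES 6     /* from the commit message */
#define RTO_MAX_MS 120000u        /* hypothetical upper bound for this sketch */

/*
 * Retransmission timer value after 'timeouts' consecutive timeouts.
 * With thin_linear set, the first six timeouts keep the base interval;
 * after that the interval doubles on each further timeout.
 */
unsigned int rto_after_timeouts(unsigned int base_ms, unsigned int timeouts,
                                int thin_linear)
{
    unsigned int rto = base_ms;
    unsigned int i;

    for (i = 1; i <= timeouts; i++) {
        if (thin_linear && i <= THIN_LINEAR_RETRIES)
            rto = base_ms;            /* linear phase: same interval again */
        else if (rto >= RTO_MAX_MS / 2)
            rto = RTO_MAX_MS;         /* clamp instead of overflowing the cap */
        else
            rto *= 2;                 /* exponential backoff */
    }
    return rto;
}
```

For a 200 ms base, a regular stream waits 1600 ms after three timeouts, while a thin stream in linear mode still waits 200 ms after six.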
     

08 Dec, 2009

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits)
    mac80211: fix reorder buffer release
    iwmc3200wifi: Enable wimax core through module parameter
    iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter
    iwmc3200wifi: Coex table command does not expect a response
    iwmc3200wifi: Update wiwi priority table
    iwlwifi: driver version track kernel version
    iwlwifi: indicate uCode type when fail dump error/event log
    iwl3945: remove duplicated event logging code
    b43: fix two warnings
    ipw2100: fix rebooting hang with driver loaded
    cfg80211: indent regulatory messages with spaces
    iwmc3200wifi: fix NULL pointer dereference in pmkid update
    mac80211: Fix TX status reporting for injected data frames
    ath9k: enable 2GHz band only if the device supports it
    airo: Fix integer overflow warning
    rt2x00: Fix padding bug on L2PAD devices.
    WE: Fix set events not propagated
    b43legacy: avoid PPC fault during resume
    b43: avoid PPC fault during resume
    tcp: fix a timewait refcnt race
    ...

    Fix up conflicts due to sysctl cleanups (dead sysctl_check code and
    CTL_UNNUMBERED removed) in
    kernel/sysctl_check.c
    net/ipv4/sysctl_net_ipv4.c
    net/ipv6/addrconf.c
    net/sctp/sysctl.c

    Linus Torvalds
     

03 Dec, 2009

1 commit

  • Define sysctl (tcp_cookie_size) to turn on and off the cookie option
    default globally, instead of a compiled configuration option.

    Define per socket option (TCP_COOKIE_TRANSACTIONS) for setting constant
    data values, retrieving variable cookie values, and other facilities.

    Move inline tcp_clear_options() unchanged from net/tcp.h to linux/tcp.h,
    near its corresponding struct tcp_options_received (prior to changes).

    This is a straightforward re-implementation of an earlier (year-old)
    patch that no longer applies cleanly, with permission of the original
    author (Adam Langley):

    http://thread.gmane.org/gmane.linux.network/102586

    These functions will also be used in subsequent patches that implement
    additional features.

    Requires:
    net: TCP_MSS_DEFAULT, TCP_MSS_DESIRED

    Signed-off-by: William.Allen.Simpson@gmail.com
    Signed-off-by: David S. Miller

    William Allen Simpson
     

26 Nov, 2009

1 commit

  • Generated with the following semantic patch

    @@
    struct net *n1;
    struct net *n2;
    @@
    - n1 == n2
    + net_eq(n1, n2)

    @@
    struct net *n1;
    struct net *n2;
    @@
    - n1 != n2
    + !net_eq(n1, n2)

    applied over {include,net,drivers/net}.

    Signed-off-by: Octavian Purdila
    Signed-off-by: David S. Miller

    Octavian Purdila
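
The effect of the semantic patch on a call site can be sketched in C. The `struct net` below is an empty placeholder and `net_eq()` is shown as a plain pointer comparison, which is an assumption of the sketch; routing every comparison through one helper presumably lets the helper's definition vary by configuration without touching callers.

```c
/* Placeholder for the kernel's network namespace structure. */
struct net {
    int id;
};

/* Sketch of the helper: compare two namespaces by identity. */
int net_eq(const struct net *a, const struct net *b)
{
    return a == b;
}

/* A call site as it looked before the conversion. */
int same_namespace_before(const struct net *n1, const struct net *n2)
{
    return n1 == n2;
}

/* The same call site after the conversion. */
int same_namespace_after(const struct net *n1, const struct net *n2)
{
    return net_eq(n1, n2);
}
```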
     

12 Nov, 2009

1 commit

  • Now that sys_sysctl is a compatibility wrapper around /proc/sys,
    all sysctl strategy routines, and all ctl_name and strategy
    entries in the sysctl tables, are unused and can be
    removed.

    In addition, neigh_sysctl_register has been modified to no longer
    take a strategy argument, and its callers have been modified not
    to pass one.

    Cc: "David Miller"
    Cc: Hideaki YOSHIFUJI
    Cc: netdev@vger.kernel.org
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

24 Sep, 2009

1 commit

  • It's unused.

    It isn't needed -- read or write flag is already passed and sysctl
    shouldn't care about the rest.

    It _was_ used in two places at arch/frv for some reason.

    Signed-off-by: Alexey Dobriyan
    Cc: David Howells
    Cc: "Eric W. Biederman"
    Cc: Al Viro
    Cc: Ralf Baechle
    Cc: Martin Schwidefsky
    Cc: Ingo Molnar
    Cc: "David S. Miller"
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

04 Nov, 2008

1 commit

  • I want to compile out proc_* and sysctl_* handlers totally and
    stub them to NULL depending on config options; however, usage of &
    will prevent this, since taking the address of a NULL pointer will break
    compilation.

    So, drop & in front of every ->proc_handler and every ->strategy
    handler, it was never needed in fact.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
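
The reasoning above can be sketched with a small table entry. The types, the `SKETCH_NO_HANDLERS` switch, and the handler names are hypothetical. In C a function name already converts to a pointer to that function, so the `&` adds nothing, and once the name is a macro for NULL the `&` form stops compiling.

```c
#include <stddef.h>

typedef int (*handler_fn)(int write);

struct table_entry {
    const char *procname;
    handler_fn proc_handler;
};

int real_handler(int write)
{
    return write ? 1 : 0;
}

/* Hypothetical config switch: stub the handler out entirely when set. */
#ifdef SKETCH_NO_HANDLERS
#define my_handler NULL
#else
#define my_handler real_handler
#endif

/* Works in both configurations because no '&' is applied to my_handler. */
struct table_entry entry_plain = { "example", my_handler };

/*
 * struct table_entry entry_amp = { "example", &my_handler };
 * The line above compiles only while my_handler names a function;
 * with the NULL stub it would expand to &NULL, which is not valid C.
 */
```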
     

28 Oct, 2008

1 commit

  • This is a patch to provide on-demand route cache rebuilding. Currently, our
    route cache is rebuilt periodically regardless of need. This introduces
    unneeded periodic latency. This patch offers a better approach. Using code
    provided by Eric Dumazet, we compute the standard deviation of the average hash
    bucket chain length while running rt_check_expire. Should any given chain
    length grow to larger than the average plus 4 standard deviations, we trigger an
    emergency hash table rebuild for that net namespace. This allows for the common
    case in which chains are well behaved and do not grow unevenly to not incur any
    latency at all, while those systems (which may be being maliciously attacked)
    only rebuild when the attack is detected. This patch takes 2 other factors into
    account:
    1) chains with multiple entries that differ by attributes that do not affect the
    hash value are only counted once, so as not to unduly bias the system toward
    rebuilding if features like QOS are heavily used
    2) if rebuilding crosses a certain threshold (which is adjustable via the added
    sysctl in this patch), route caching is disabled entirely for that net
    namespace, since constant rebuilding is less efficient than no caching at all

    Tested successfully by me.

    Signed-off-by: Neil Horman
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Neil Horman
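
The trigger condition (a chain longer than the average plus 4 standard deviations) can be sketched with integer arithmetic only. This is an illustration under assumptions: the function name is hypothetical, the population standard deviation is used, and the comparison is done on squared values to avoid a square root. It does not model the once-only counting of entries or the rebuild threshold.

```c
#include <stddef.h>

/*
 * Returns 1 if 'len' exceeds avg + 4 * sd for the given chain lengths.
 *
 * With sum = sum of lengths and sumsq = sum of squared lengths:
 *   len - avg = (n * len - sum) / n
 *   sd        = sqrt(n * sumsq - sum * sum) / n
 * so for a positive deviation, len > avg + 4 * sd is equivalent to
 *   (n * len - sum)^2 > 16 * (n * sumsq - sum * sum).
 */
int chain_is_outlier(const unsigned int *lengths, size_t n, unsigned int len)
{
    long long sum = 0, sumsq = 0, dev, spread;
    size_t i;

    if (n == 0)
        return 0;
    for (i = 0; i < n; i++) {
        sum += lengths[i];
        sumsq += (long long)lengths[i] * lengths[i];
    }
    dev = (long long)n * len - sum;             /* n * (len - avg) */
    if (dev <= 0)
        return 0;                               /* at or below the average */
    spread = (long long)n * sumsq - sum * sum;  /* n^2 * variance */
    return dev * dev > 16 * spread;
}
```

With 32 chains, 31 of length 1 and one of length 100, the long chain sits about 5.6 standard deviations above the mean and is flagged; evenly filled chains never are.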
     

17 Oct, 2008

1 commit

  • The name and nlen parameters passed to the ->strategy hook are unused, so remove
    them. In general the ->strategy hook should know what it's doing, and shouldn't
    do something tricky for which, say, a pointer to the original userspace array
    may be needed (name).

    Signed-off-by: Alexey Dobriyan
    Acked-by: David S. Miller [ networking bits ]
    Cc: Ralf Baechle
    Cc: David Howells
    Cc: Matt Mackall
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

09 Oct, 2008

1 commit

  • I noticed sysctl_local_port_range[] and its associated seqlock
    sysctl_local_port_range_lock were on separate cache lines.
    Moreover, sysctl_local_port_range[] was close to unrelated
    variables, highly modified, leading to cache misses.

    Moving these two variables in a structure can help data
    locality and moving this structure to read_mostly section
    helps sharing of this data among cpus.

    Cleanup of extern declarations (moved in include file where
    they belong), and use of inet_get_local_port_range()
    accessor instead of direct access to ports values.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

04 Aug, 2008

1 commit

  • Commit 76e6ebfb40a2455c18234dcb0f9df37533215461 ("netns: add namespace
    parameter to rt_cache_flush") accesses the extra2 parameter of the
    ip_default_ttl ctl_table, but it is never set to a meaningful
    value. When e84f84f276473dcc673f360e8ff3203148bdf0e2 ("netns: place
    rt_genid into struct net") is applied, we'll oops in
    rt_cache_invalidate(). Set extra2 to init_net, to avoid that.

    Reported-by: Marcin Slusarz
    Signed-off-by: Sven Wegener
    Tested-by: Marcin Slusarz
    Acked-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Sven Wegener
     

28 Jul, 2008

1 commit

  • Piss-poor sysctl registration API strikes again, film at 11...

    What we really need is _pathname_ required to be present in already
    registered table, so that kernel could warn about bad order. That's the
    next target for sysctl stuff (and generally saner and more explicit
    order of initialization of ipv[46] internals wouldn't hurt either).

    For the time being, here are full fixups required by ..._rotable()
    stuff; we make per-net sysctl sets descendents of "ro" one and make sure
    that sufficient skeleton is there before we start registering per-net
    sysctls.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

27 Jul, 2008

1 commit


02 Jul, 2008

1 commit

  • Convert the sysctl values for icmp ratelimit to use milliseconds instead
    of jiffies which is based on kernel configured HZ.
    Internal kernel jiffies are not a proper unit for any userspace API.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
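
Why jiffies make a poor userspace unit can be seen from the conversion itself. The helper below is a hypothetical sketch (rounding up is an assumption): the same millisecond value maps to a different tick count for each HZ, so a raw tick count written by userspace would mean a different duration on differently configured kernels.

```c
/*
 * Convert milliseconds to timer ticks for a given tick rate (ticks per
 * second), rounding up so a non-zero delay never becomes zero ticks.
 */
unsigned long ms_to_ticks(unsigned long ms, unsigned long hz)
{
    return (ms * hz + 999) / 1000;
}
```

One second is 100 ticks at HZ=100 but 1000 ticks at HZ=1000, while the millisecond value stays the same.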
     

12 Jun, 2008

1 commit


26 Mar, 2008

3 commits


01 Feb, 2008

1 commit

  • In strategy_allowed_congestion_control of the 2.6.24 kernel, when
    sysctl_string returns 1 on success, it should call
    tcp_set_allowed_congestion_control to set the allowed congestion
    control, but it doesn't. sysctl_string returns 1 on success and a
    negative value otherwise; it never returns 0. The patch fixes the problem.

    Signed-off-by: Shan Wei
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Shan Wei
     

29 Jan, 2008

6 commits

  • This is a preparation for sysctl netns-ization.
    Move the ctl tables to the files, where the tuning
    variables reside. Plus make the helpers to register
    the tables.

    This will simplify the later patches and will keep
    similar things closer to each other.

    ipv4, ipv6 and conntrack_reasm are patched differently,
    but the result is all the tables are in appropriate files.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • This includes the most simple cases for netfilter.

    The first part is the queue modules for ipv4 and ipv6,
    on which the net/ipv4/ and net/ipv6/ paths are reused
    from the appropriate ipv4 and ipv6 code.

    The conntrack module is also patched, but this hunk is
    very small and simple.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • Signed-off-by: Takahiro Yasui
    Signed-off-by: Hideo Aoki
    Signed-off-by: David S. Miller

    Hideo Aoki
     
  • AFAIS these two entries should do the same thing - change the
    forwarding state on ipv4_devconf and on all the devices.

    I propose to merge the handlers together using ctl paths.

    The inet_forward_change() is static after this and I move
    it higher to be closer to other "propagation" helpers and
    to avoid diff making patches based on { and } matching :)
    i.e. - make them easier to read.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • This is the same as I did for the net/core/ table in the
    second patch in this series: use the paths and isolate the
    whole table in the .c file.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • This includes several cleanups:

    * tune Makefile to compile out this file when SYSCTL=n. Now
    it looks like net/core/sysctl_net_core.c one;
    * move the ipv4_config to af_inet.c to exist all the time;
    * remove additional sysctl_ip_nonlocal_bind declaration
    (it is already declared in net/ip.h);
    * remove no longer needed ifdefs from this file.

    This is a preparation for using ctl paths for net/ipv4/
    sysctl table.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     

20 Nov, 2007

1 commit

  • From: "Sam Jansen"

    sysctl_tcp_congestion_control seems to have a bug that prevents it
    from actually calling the tcp_set_default_congestion_control
    function. This is not so apparent because it does not return an error
    and generally the /proc interface is used to configure the default TCP
    congestion control algorithm. This is present in 2.6.18 onwards and
    probably earlier, though I have not inspected 2.6.15--2.6.17.

    sysctl_tcp_congestion_control calls sysctl_string and expects a successful
    return code of 0. In such a case it actually sets the congestion control
    algorithm with tcp_set_default_congestion_control. Otherwise, it returns the
    value returned by sysctl_string. This was correct in 2.6.14, as sysctl_string
    returned 0 on success. However, sysctl_string was updated to return 1 on
    success around about 2.6.15 and sysctl_tcp_congestion_control was not updated.
    Even though sysctl_tcp_congestion_control returns 1, do_sysctl_strategy
    converts this return code to '0', so the caller never notices the error.

    Signed-off-by: David S. Miller

    Sam Jansen
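
The return-code mismatch described above can be reproduced with stubs. All names below are hypothetical stand-ins (they are not the kernel functions); the only behaviour taken from the commit message is that the string helper returns 1 on success while the caller tests for 0.

```c
#include <string.h>

/* Stand-in for the currently selected default algorithm. */
char current_algo[16] = "reno";

/* Stand-in for the string helper: 1 on success, negative on error. */
int copy_string_stub(char *dst, size_t dstlen, const char *src)
{
    if (src == NULL || strlen(src) >= dstlen)
        return -1;
    strcpy(dst, src);
    return 1;
}

/* Stand-in for the function that actually applies the new default. */
int set_default_algo(const char *name)
{
    strcpy(current_algo, name);   /* name is at most 15 characters here */
    return 0;
}

/* Buggy variant: expects 0 on success, so the setter is never called. */
int strategy_buggy(const char *newval)
{
    char val[16];
    int ret = copy_string_stub(val, sizeof(val), newval);

    if (ret == 0)
        ret = set_default_algo(val);
    return ret;
}

/* Fixed variant: treats 1 as success and applies the value. */
int strategy_fixed(const char *newval)
{
    char val[16];
    int ret = copy_string_stub(val, sizeof(val), newval);

    if (ret == 1)
        ret = set_default_algo(val);
    return ret;
}
```

The buggy variant returns the helper's 1 and leaves the setting untouched, which is why the failure is silent once a caller maps that return code to success.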
     

19 Oct, 2007

2 commits

  • This is a patch that adjusts Stephen's patches. Stephen's patches
    disallow using a port range of one single port and break the meaning
    of the 'remaining' variable, which has a different meaning in some places.
    My patch restores the sense of the 'remaining' variable. It should mean
    how many ports are remaining and nothing else. My patch also allows
    using a single port.

    I am sure we must be able to use the mentioned port range; this is not
    restricted by the documentation and does not break current behavior.

    Useful links:
    Patches posted by Stephen Hemminger
    http://marc.info/?l=linux-netdev&m=119206106218187&w=2
    http://marc.info/?l=linux-netdev&m=119206109918235&w=2

    Andrew Morton's comment
    http://marc.info/?l=linux-kernel&m=119248225007737&w=2

    1. Allows using a port range of one single port.
    2. Gives back sense of 'remaining' variable.

    Signed-off-by: Anton Arapov
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Anton Arapov
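
The two points of this commit can be sketched as a pair of helpers. The names are hypothetical, and the inclusive count is an assumption consistent with "how many ports are remaining": a range whose ends are equal is valid and contains exactly one port.

```c
/* A range is acceptable when it is not inverted; low == high is allowed. */
int port_range_valid(int low, int high)
{
    return low <= high;
}

/* Number of ports in the inclusive range [low, high]. */
int ports_remaining(int low, int high)
{
    return high - low + 1;
}
```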
     
  • Currently tcp_available_congestion_control does not even attempt being read
    from sys_sysctl, and ipfrag_max_dist, while it works, allows setting of invalid
    values using sys_sysctl.

    So just kill the binary sys_sysctl support for these sysctls. If the support
    is not important enough to test and get right it probably isn't important
    enough to keep.

    Signed-off-by: Eric W. Biederman
    Cc: Alexey Dobriyan
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

16 Oct, 2007

1 commit

  • Some sysctl variables are used to tune the frag queues
    management and it will be useful to work with them in
    a common way in the future, so move them into one
    structure, moreover they are the same for all the frag
    management codes.

    I don't place them in the existing inet_frags object,
    introduced in the previous patch for two reasons:

    1. to keep them in the __read_mostly section;
    2. not to export the whole inet_frags objects outside.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     

11 Oct, 2007

1 commit

  • Expansion of original idea from Denis V. Lunev

    Add robustness and locking to the local_port_range sysctl.
    1. Enforce that low < high when setting.
    2. Use seqlock to ensure atomic update.

    The locking might seem like overkill, but there are
    cases where sysadmin might want to change value in the
    middle of a DoS attack.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
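
The combination of validation and an atomic two-value update can be sketched with a sequence counter in C11. This is not the kernel's seqlock API: the structure and function names are hypothetical, a single writer is assumed (a real seqlock also serializes writers), and sequentially consistent atomics are used for simplicity.

```c
#include <errno.h>
#include <stdatomic.h>

struct port_range {
    atomic_uint seq;    /* even: stable, odd: update in progress */
    atomic_int low;
    atomic_int high;
};

void port_range_init(struct port_range *r, int low, int high)
{
    atomic_init(&r->seq, 0u);
    atomic_init(&r->low, low);
    atomic_init(&r->high, high);
}

/* Writer: reject ranges that are not low < high, then update both ends. */
int port_range_set(struct port_range *r, int low, int high)
{
    if (low >= high)
        return -EINVAL;
    atomic_fetch_add(&r->seq, 1u);      /* becomes odd: readers will retry */
    atomic_store(&r->low, low);
    atomic_store(&r->high, high);
    atomic_fetch_add(&r->seq, 1u);      /* becomes even: update is visible */
    return 0;
}

/* Reader: retry until both values were read under one stable sequence. */
void port_range_get(struct port_range *r, int *low, int *high)
{
    unsigned int start;

    do {
        start = atomic_load(&r->seq);
        *low = atomic_load(&r->low);
        *high = atomic_load(&r->high);
    } while ((start & 1u) || start != atomic_load(&r->seq));
}
```

A reader therefore never sees the low end of one setting paired with the high end of another, which is the property the seqlock is meant to provide here.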
     

08 Jun, 2007

1 commit

  • This patch converts the ipv4_devconf config members (everything except
    sysctl) to an array. This allows easier manipulation which will be
    needed later on to provide better management of default config values.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
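
The benefit of an array over individual members can be sketched as follows. The enum values, structure, and helper names are hypothetical; the point is that defaults can be applied with one loop instead of one assignment per field.

```c
/* Hypothetical indices standing in for individual config members. */
enum devconf_index {
    DEVCONF_FORWARDING,
    DEVCONF_ACCEPT_REDIRECTS,
    DEVCONF_RP_FILTER,
    DEVCONF_MAX
};

struct devconf {
    int data[DEVCONF_MAX];
};

int devconf_get(const struct devconf *cnf, enum devconf_index idx)
{
    return cnf->data[idx];
}

void devconf_set(struct devconf *cnf, enum devconf_index idx, int value)
{
    cnf->data[idx] = value;
}

/* Copy every setting from the defaults in a single loop. */
void devconf_copy_defaults(struct devconf *dst, const struct devconf *defaults)
{
    int i;

    for (i = 0; i < DEVCONF_MAX; i++)
        dst->data[i] = defaults->data[i];
}
```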
     

26 Apr, 2007

2 commits