28 Jun, 2016

1 commit

  • This works in exactly the same way as the CIPSO label cache.
    The idea is to allow the lsm to cache the result of a secattr
    lookup so that it doesn't need to perform the lookup for
    every skbuff.

    It introduces two sysctl controls:
    calipso_cache_enable - enables/disables the cache.
    calipso_cache_bucket_size - sets the size of a cache bucket.
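
    As a rough illustration of the idea (not the kernel implementation), a
    bucketed label cache with an on/off switch and a per-bucket size limit
    could look like the sketch below. The class name, the bucket count, the
    default sizes and the least-recently-used eviction policy are all
    assumptions made for this example; only the two controls come from the
    commit message.

```python
from collections import OrderedDict


class LabelCache:
    """Toy model of a label cache: raw option bytes -> cached secattr."""

    def __init__(self, enabled=True, bucket_size=10, buckets=128):
        self.enabled = enabled          # models calipso_cache_enable
        self.bucket_size = bucket_size  # models calipso_cache_bucket_size
        # Hypothetical layout: a fixed number of hash buckets.
        self._buckets = [OrderedDict() for _ in range(buckets)]

    def _bucket(self, key):
        return self._buckets[hash(key) % len(self._buckets)]

    def lookup(self, key):
        """Return the cached secattr, or None on a miss or when disabled."""
        if not self.enabled:
            return None
        bucket = self._bucket(key)
        if key not in bucket:
            return None
        bucket.move_to_end(key)         # mark as most recently used
        return bucket[key]

    def add(self, key, secattr):
        """Cache a lookup result, evicting the oldest entry of a full bucket."""
        if not self.enabled or self.bucket_size <= 0:
            return
        bucket = self._bucket(key)
        bucket[key] = secattr
        bucket.move_to_end(key)
        while len(bucket) > self.bucket_size:
            bucket.popitem(last=False)  # drop the least recently used entry
```

    A caller would try lookup() first and only fall back to the full
    secattr lookup, followed by add(), on a miss.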

    Signed-off-by: Huw Davies
    Signed-off-by: Paul Moore

    Huw Davies
     

01 Aug, 2015

1 commit

  • Change the meaning of net.ipv6.auto_flowlabels to provide a mode for
    automatic flow labels generation. There are four modes:

    0: flow labels are disabled
    1: flow labels are enabled, sockets can opt-out
    2: flow labels are allowed, sockets can opt-in
    3: flow labels are enabled and enforced, no opt-out for sockets

    np->autoflowlabel is initialized according to the sysctl value.
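
    The four modes can be read as a small decision table. The sketch below
    models them in Python; the function names are hypothetical, and the
    assumption that a socket which never touched the option simply keeps the
    sysctl-derived default is an interpretation of the text above, not a
    copy of the kernel logic.

```python
from typing import Optional


def default_autoflowlabel(mode: int) -> bool:
    """Initial per-socket setting derived from the sysctl mode (0..3)."""
    if mode not in (0, 1, 2, 3):
        raise ValueError("auto_flowlabels mode must be 0, 1, 2 or 3")
    # Modes 1 and 3 start enabled; modes 0 and 2 start disabled.
    return mode in (1, 3)


def uses_auto_flowlabel(mode: int, socket_choice: Optional[bool] = None) -> bool:
    """Whether a socket ends up with automatic flow labels.

    socket_choice is None when the socket never set the option, otherwise
    the value the application asked for.
    """
    default = default_autoflowlabel(mode)
    if mode == 0:
        return False            # disabled system wide
    if mode == 3:
        return True             # enforced, sockets cannot opt out
    if socket_choice is None:
        return default
    return socket_choice        # mode 1: opt-out, mode 2: opt-in
```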

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

10 Jul, 2015

1 commit

  • Add support to allow non-local binds similar to how this was done for IPv4.
    Non-local binds are very useful in emulating the Internet in a box, etc.

    This adds the ip_nonlocal_bind sysctl under ipv6.

    Testing:

    Set up nonlocal binding and receive routing on a host, e.g.:

    ip -6 rule add from ::/0 iif eth0 lookup 200
    ip -6 route add local 2001:0:0:1::/64 dev lo proto kernel scope host table 200
    sysctl -w net.ipv6.ip_nonlocal_bind=1

    Set up routing to 2001:0:0:1::/64 on peer to go to first host

    ping6 -I 2001:0:0:1::1 peer-address -- to verify
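
    For scripted test setups it can help to know which file the sysctl -w
    call above touches. The helper below is a minimal sketch that maps a
    dotted sysctl name to its path under /proc/sys; the function name is
    hypothetical, and the helper only builds the path, it does not write
    to it.

```python
def sysctl_path(name: str, root: str = "/proc/sys") -> str:
    """Map a dotted sysctl name to the corresponding /proc/sys path."""
    if not name or name.startswith(".") or name.endswith(".") or ".." in name:
        raise ValueError("malformed sysctl name: %r" % name)
    # Each dot-separated component becomes one directory level.
    return root + "/" + name.replace(".", "/")
```

    Writing "1" to the returned path as root should have the same effect as
    the sysctl -w command used in the test description.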

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

04 May, 2015

1 commit

  • This patch divides the IPv6 flow label space into two ranges:
    0-7ffff is reserved for flow label manager, 80000-fffff will be
    used for creating auto flow labels (per RFC6438). This only affects how
    labels are set on transmit, it does not affect receive. This range split
    can be disabled by sysctl.
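
    The split can be pictured as forcing the top bit of the 20-bit label for
    automatically generated values. The sketch below assumes that an auto
    label is derived by masking a flow hash to 20 bits; the constant and
    function names are illustrative and not taken from the patch.

```python
FLOWLABEL_MASK = 0xFFFFF   # the flow label is a 20-bit field
STATELESS_BIT = 0x80000    # set for every label in the range 80000-fffff


def auto_flowlabel(flow_hash: int, split_ranges: bool = True) -> int:
    """Derive an automatic flow label from a flow hash."""
    label = flow_hash & FLOWLABEL_MASK
    if split_ranges:
        # Keep auto labels out of 0-7ffff, which the flow label manager owns.
        label |= STATELESS_BIT
    return label


def in_manager_range(label: int) -> bool:
    """True if the label lies in the range reserved for the manager."""
    return 0 <= label <= 0x7FFFF
```

    With the split disabled, the full 20-bit space is used for auto labels
    and a collision with a manager-assigned label becomes possible again.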

    Background:

    IPv6 flow labels have been an unmitigated disappointment thus far
    in the lifetime of IPv6. Support in HW devices to use them for ECMP
    is lacking, and OSes don't turn them on by default. If we had these
    we could get much better hashing in IPv6 networks without resorting
    to DPI, possibly eliminating some of the motivations to define new
    encaps in UDP just for getting ECMP.

    Unfortunately, the initial specifications of IPv6 did not clarify
    how they are to be used. There has always been a vague concept that
    these can be used for ECMP, flow hashing, etc. and we do now have a
    good standard for how to do this in RFC6438. The problem is that flow labels
    can be either stateful or stateless (as in RFC6438), and we are
    presented with the possibility that a stateless label may collide
    with a stateful one. Attempts to split the flow label space were
    rejected in IETF. When we added support in Linux for RFC6438, we
    could not turn on flow labels by default due to this conflict.

    This patch splits the flow label space and should give us
    a path to enabling auto flow labels by default for all IPv6 packets.
    This is an API change so we need to consider compatibility with
    existing deployment. The stateful range is chosen to be the lower
    values in hopes that most uses would have chosen small numbers.

    Once we resolve the stateless/stateful issue, we can proceed to
    look at enabling RFC6438 flow labels by default (starting with
    scaled testing).

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

01 Apr, 2015

1 commit

  • The ipv6 code uses a mixture of coding styles. In some instances the check for a
    NULL pointer is done as x == NULL and sometimes as !x. !x is preferred according to
    checkpatch and this patch makes the code consistent by adopting the latter
    form.

    No changes detected by objdiff.

    Signed-off-by: Ian Morris
    Signed-off-by: David S. Miller

    Ian Morris
     

24 Mar, 2015

1 commit


05 Sep, 2014

1 commit

  • This patch adds a new sysctl_mld_qrv knob to configure the mldv1/v2 query
    robustness variable. It specifies how many times unsolicited mld reports
    are retransmitted. Admins might want to tune this on lossy links.

    Also reset mld state on interface down/up, so we pick up new sysctl
    settings during interface up event.

    IPv6 certification requests this knob to be available.

    I didn't make this knob netns specific, as it is mostly a setting in a
    physical environment and should be per host.

    Cc: Flavio Leitner
    Signed-off-by: Hannes Frederic Sowa
    Acked-by: Flavio Leitner
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

06 Aug, 2014

1 commit

  • Conflicts:
    drivers/net/Makefile
    net/ipv6/sysctl_net_ipv6.c

    Two ipv6_table_template[] additions overlap, so the index
    of the ipv6_table[x] assignments needed to be adjusted.

    In the drivers/net/Makefile case, we've gotten rid of the
    garbage whereby we had to list every single USB networking
    driver in the top-level Makefile, there is just one
    "USB_NETWORKING" that guards everything.

    Signed-off-by: David S. Miller

    David S. Miller
     

03 Aug, 2014

1 commit


08 Jul, 2014

1 commit

  • Automatically generate flow labels for IPv6 packets on transmit.
    The flow label is computed based on skb_get_hash. The flow label will
    only automatically be set when it is zero otherwise (i.e. flow label
    manager hasn't set one). This supports the transmit side functionality
    of RFC 6438.
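
    The precedence rule described above (an explicitly set label wins, the
    hash is only used when the label is zero) can be sketched as follows.
    The function and parameter names are hypothetical, and masking the hash
    to 20 bits is an assumption about how the hash is folded into the
    label field.

```python
def transmit_flowlabel(current_label: int, skb_hash: int, auto_enabled: bool) -> int:
    """Pick the flow label to put on an outgoing packet."""
    if current_label != 0 or not auto_enabled:
        # A label set by the flow label manager is left untouched, and
        # nothing is generated when the feature is switched off.
        return current_label
    return skb_hash & 0xFFFFF   # fold the hash into the 20-bit label field
```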

    Added an IPv6 sysctl auto_flowlabels to enable/disable this behavior
    system wide, and added IPV6_AUTOFLOWLABEL socket option to enable this
    functionality per socket.

    By default, auto flowlabels are disabled to avoid possible conflicts
    with flow label manager, however if this feature proves useful we
    may want to enable it by default.

    It should also be noted that FreeBSD has already implemented automatic
    flow labels (including the sysctl and socket option). In FreeBSD,
    automatic flow labels default to enabled.

    Performance impact:

    Running super_netperf with 200 flows for TCP_RR and UDP_RR for
    IPv6. Note that in the UDP case, __skb_get_hash will be called for
    every packet, which explains the slight regression. In the TCP case
    the hash is saved in the socket so there is no regression.

    Automatic flow labels disabled:

    TCP_RR:
    86.53% CPU utilization
    127/195/322 90/95/99% latencies
    1.40498e+06 tps

    UDP_RR:
    90.70% CPU utilization
    118/168/243 90/95/99% latencies
    1.50309e+06 tps

    Automatic flow labels enabled:

    TCP_RR:
    85.90% CPU utilization
    128/199/337 90/95/99% latencies
    1.40051e+06 tps

    UDP_RR:
    92.61% CPU utilization
    115/164/236 90/95/99% latencies
    1.4687e+06 tps

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

14 May, 2014

1 commit

  • Kernel-originated IP packets that have no user socket associated
    with them (e.g., ICMP errors and echo replies, TCP RSTs, etc.)
    are emitted with a mark of zero. Add a sysctl to make them have
    the same mark as the packet they are replying to.

    This allows an administrator that wishes to do so to use
    mark-based routing, firewalling, etc. for these replies by
    marking the original packets inbound.
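
    In other words, the mark of a kernel-generated reply becomes a function
    of the packet that triggered it. A minimal sketch of that rule, with a
    hypothetical function name and a boolean standing in for the new
    sysctl:

```python
def reply_mark(incoming_mark: int, reflect: bool) -> int:
    """Mark used for a kernel-originated reply (ICMP error, TCP RST, ...)."""
    # Without reflection the reply keeps the historical mark of zero.
    return incoming_mark if reflect else 0
```

    With reflection on, a mark-based routing or firewall rule that matches
    the inbound packet also matches the reply.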

    Tested using user-mode linux:
    - ICMP/ICMPv6 echo replies and errors.
    - TCP RST packets (IPv4 and IPv6).

    Signed-off-by: Lorenzo Colitti
    Signed-off-by: David S. Miller

    Lorenzo Colitti
     

20 Jan, 2014

1 commit

  • With the introduction of IPV6_FL_F_REFLECT, there is no guarantee of
    flow label uniqueness. This patch introduces a new sysctl to preserve the old
    behaviour, enabled by default.

    Changelog of V3:
    * rename ip6_flowlabel_consistency to flowlabel_consistency
    * use net_info_ratelimited()
    * checkpatch cleanups

    Signed-off-by: Florent Fourcot
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Florent Fourcot
     

15 Jan, 2014

1 commit


08 Jan, 2014

1 commit

  • This change makes it possible to follow a recommendation of RFC4942.

    - Add "anycast_src_echo_reply" sysctl to control the use of anycast addresses
    as source addresses for ICMPv6 echo reply. This sysctl is false by default
    to preserve existing behavior.
    - Add inline check ipv6_anycast_destination().
    - Use them in icmpv6_echo_reply().

    Reference:
    RFC4942 - IPv6 Transition/Coexistence Security Considerations
    (http://tools.ietf.org/html/rfc4942#section-2.1.6)

    2.1.6. Anycast Traffic Identification and Security

    [...]
    To avoid exposing knowledge about the internal structure of the
    network, it is recommended that anycast servers now take advantage of
    the ability to return responses with the anycast address as the
    source address if possible.

    Signed-off-by: Francois-Xavier Le Bail
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    FX Le Bail
     

13 Jun, 2013

1 commit

  • Reduce the uses of this unnecessary typedef.

    Done via perl script:

    $ git grep --name-only -w ctl_table net | \
    xargs perl -p -i -e '\
    sub trim { my ($local) = @_; $local =~ s/(^\s+|\s+$)//g; return $local; } \
    s/\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)/"struct ctl_table " . trim($1)/ge'

    Reflow the modified lines that now exceed 80 columns.
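
    For readers less familiar with perl, the substitution above can be
    approximated in Python as shown below. This is a sketch for a single
    line of text, not the script that was actually run; the function name
    is made up for the example.

```python
import re

# A bare ctl_table (not already preceded by "struct ") followed by either
# a pointer star or a declarator name.
CTL_TABLE = re.compile(r"\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)")


def add_struct_keyword(line: str) -> str:
    """Rewrite the ctl_table typedef to 'struct ctl_table' in one line."""
    # strip() plays the role of the trim helper in the perl one-liner.
    return CTL_TABLE.sub(lambda m: "struct ctl_table " + m.group(1).strip(), line)
```

    Lines that already spell out struct ctl_table are left alone because of
    the negative lookbehind.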

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

21 Apr, 2012

5 commits

  • We don't use struct ctl_path anymore so delete the exported constants.

    Signed-off-by: Eric W. Biederman
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • The sysctl core no longer natively understands sysctl tables
    with .child entries.

    Split the ipv6_table to remove the .child entries.

    Signed-off-by: Eric W. Biederman
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • sysctl no longer requires explicit creation of directories. The neigh
    directory is always populated with at least a default entry so this
    should cause no user visible changes.

    Signed-off-by: Eric W. Biederman
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • This makes it clearer which sysctls are relative to your current network
    namespace.

    This makes it a little less error prone by not exposing sysctls for the
    initial network namespace in other namespaces.

    This is the same way we handle all of our other network interfaces to
    userspace and I can't honestly remember why we didn't do this for
    sysctls right from the start.

    Signed-off-by: Eric W. Biederman
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • register_sysctl_rotable never caught on as an interesting way to
    register sysctls. My take on the situation is that what we want are
    sysctls that we can only see in the initial network namespace. What we
    have implemented with register_sysctl_rotable are sysctls that we can
    see in all of the network namespaces and can only change in the initial
    network namespace.

    That is a very silly way to go. Just register the network sysctls
    in the initial network namespace and we don't have any weird special
    cases to deal with.

    The sysctls affected are:
    /proc/sys/net/ipv4/ipfrag_secret_interval
    /proc/sys/net/ipv4/ipfrag_max_dist
    /proc/sys/net/ipv6/ip6frag_secret_interval
    /proc/sys/net/ipv6/mld_max_msf

    I really don't expect anyone will miss them if they can't read them in a
    child user namespace.

    CC: Pavel Emelyanov
    Signed-off-by: Eric W. Biederman
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

01 Nov, 2011

1 commit


22 Mar, 2011

1 commit

  • When I was fixing issues with unregistering tables under /proc/sys/net/ipv6/neigh
    by adding a mount point it appears I missed a critical ordering issue in the
    ipv6 initialization. I had not realized that ipv6_sysctl_register is called
    at the very end of the ipv6 initialization and in particular after we call
    neigh_sysctl_register from ndisc_init.

    "neigh" needs to be initialized in ipv6_static_sysctl_register which is
    the first ipv6 table to be initialized, and definitely before ndisc_init.
    This removes the weirdness of duplicate tables while still providing a
    "neigh" mount point which prevents races in sysctl unregistering.

    This was initially reported at https://bugzilla.kernel.org/show_bug.cgi?id=31232
    Reported-by: sunkan@zappa.cx
    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

01 Feb, 2011

1 commit

  • In my testing of 2.6.37 I was occasionally getting a warning about
    sysctl table entries being unregistered in the wrong order. Digging
    in it turns out this dates back to the last great sysctl reorg done
    where Al Viro introduced the requirement that sysctl directories
    needed to be created before and destroyed after the files in them.

    It turns out that in that great reorg /proc/sys/net/ipv6/neigh was
    overlooked. So this patch fixes that oversight and makes an annoying
    warning message go away.

    >------------[ cut here ]------------
    >WARNING: at kernel/sysctl.c:1992 unregister_sysctl_table+0x134/0x164()
    >Pid: 23951, comm: kworker/u:3 Not tainted 2.6.37-350888.2010AroraKernelBeta.fc14.x86_64 #1
    >Call Trace:
    > [] warn_slowpath_common+0x80/0x98
    > [] warn_slowpath_null+0x15/0x17
    > [] unregister_sysctl_table+0x134/0x164
    > [] ? kfree+0xc4/0xd1
    > [] neigh_sysctl_unregister+0x22/0x3a
    > [] addrconf_ifdown+0x33f/0x37b [ipv6]
    > [] ? skb_dequeue+0x5f/0x6b
    > [] addrconf_notify+0x69b/0x75c [ipv6]
    > [] ? ip6mr_device_event+0x98/0xa9 [ipv6]
    > [] notifier_call_chain+0x32/0x5e
    > [] raw_notifier_call_chain+0xf/0x11
    > [] call_netdevice_notifiers+0x45/0x4a
    > [] rollback_registered_many+0x118/0x201
    > [] unregister_netdevice_many+0x16/0x6d
    > [] default_device_exit_batch+0xa4/0xb8
    > [] ? cleanup_net+0x0/0x194
    > [] ops_exit_list+0x4e/0x56
    > [] cleanup_net+0xf4/0x194
    > [] process_one_work+0x187/0x280
    > [] worker_thread+0xff/0x19f
    > [] ? worker_thread+0x0/0x19f
    > [] kthread+0x7d/0x85
    > [] kernel_thread_helper+0x4/0x10
    > [] ? kthread+0x0/0x85
    > [] ? kernel_thread_helper+0x0/0x10
    >---[ end trace 8a7e9310b35e9486 ]---

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

18 Jan, 2010

1 commit


12 Nov, 2009

1 commit

  • Now that sys_sysctl is a compatibility wrapper around /proc/sys
    all sysctl strategy routines, and all ctl_name and strategy
    entries in the sysctl tables are unused, and can be
    removed.

    In addition neigh_sysctl_register has been modified to no longer
    take a strategy argument and its callers have been modified not
    to pass one.

    Cc: "David Miller"
    Cc: Hideaki YOSHIFUJI
    Cc: netdev@vger.kernel.org
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

03 Aug, 2009

1 commit

  • This renames away a variable clash:
    * ipv6_table[] is declared as a static global table;
    * ipv6_sysctl_net_init() uses ipv6_table to refer to/destroy dynamic memory;
    * ipv6_sysctl_net_exit() also uses ipv6_table for the same purpose;
    * both of the last two functions call kfree() on ipv6_table.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     

09 Jan, 2009

1 commit

  • Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Acked-by: Theodore Ts'o
    Acked-by: Mark Fasheh
    Acked-by: David S. Miller
    Cc: James Morris
    Acked-by: Casey Schaufler
    Acked-by: Takashi Iwai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fernando Carrijo
     

04 Nov, 2008

1 commit

  • I want to compile out proc_* and sysctl_* handlers totally and
    stub them to NULL depending on config options, however usage of &
    will prevent this, since taking the address of a NULL pointer will break
    compilation.

    So, drop & in front of every ->proc_handler and every ->strategy
    handler, it was never needed in fact.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

26 Aug, 2008

1 commit


28 Jul, 2008

1 commit

  • Piss-poor sysctl registration API strikes again, film at 11...

    What we really need is _pathname_ required to be present in already
    registered table, so that kernel could warn about bad order. That's the
    next target for sysctl stuff (and generally saner and more explicit
    order of initialization of ipv[46] internals wouldn't hurt either).

    For the time being, here are full fixups required by ..._rotable()
    stuff; we make per-net sysctl sets descendents of "ro" one and make sure
    that sufficient skeleton is there before we start registering per-net
    sysctls.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

20 May, 2008

1 commit


04 Mar, 2008

1 commit


28 Feb, 2008

1 commit


29 Jan, 2008

6 commits

  • This is a preparation for sysctl netns-ization.
    Move the ctl tables to the files, where the tuning
    variables reside. Plus make the helpers to register
    the tables.

    This will simplify the later patches and will keep
    similar things closer to each other.

    ipv4, ipv6 and conntrack_reasm are patched differently,
    but the result is all the tables are in appropriate files.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • Fix the following sparse warnings:
    | net/ipv6/route.c:2491:18: warning: symbol 'ipv6_route_sysctl_init' was not declared. Should it be static?
    | net/ipv6/icmp.c:922:18: warning: symbol 'ipv6_icmp_sysctl_init' was not declared. Should it be static?
    | net/ipv6/reassembly.c:628:6: warning: symbol 'ipv6_frag_sysctl_init' was not declared. Should it be static?

    Signed-off-by: YOSHIFUJI Hideaki

    YOSHIFUJI Hideaki
     
  • This patch moves the icmpv6_time sysctl to the network namespace
    structure.

    Because the ipv6 protocol is not yet per namespace, the variable is
    accessed relative to the initial network namespace.

    Signed-off-by: Daniel Lezcano
    Signed-off-by: David S. Miller

    Daniel Lezcano
     
  • All the sysctl concerning the routes are moved to the network
    namespace structure. A helper function is called to initialize the
    variables.

    Because the ipv6 protocol is not yet per namespace, the variables are
    accessed relative to the network namespace.

    Signed-off-by: Daniel Lezcano
    Signed-off-by: David S. Miller

    Daniel Lezcano
     
  • The mld_max_msf protects the system with a maximum allowed multicast
    source filters. Making this variable per namespace can be potentially
    a problem if someone inside a namespace sets it to a big value, that
    will impact the whole system including other namespaces.

    I don't see any benefits to have it per namespace for now, so in order
    to keep a directory entry in a newly created namespace, I make it
    read-only when we are not in the initial network namespace.

    Signed-off-by: Daniel Lezcano
    Signed-off-by: David S. Miller

    Daniel Lezcano
     
  • The ip6_frags is moved to the network namespace structure. Because
    there can be multiple instances of the network namespaces, and the
    ip6_frags is no longer a global static variable, a helper function has
    been added to facilitate the initialization of the variables.

    Until the ipv6 protocol is per namespace, the variables are
    accessed relative to the initial network namespace.

    Signed-off-by: Daniel Lezcano
    Signed-off-by: David S. Miller

    Daniel Lezcano