15 Jan, 2012

1 commit

  • * 'for-linus' of git://selinuxproject.org/~jmorris/linux-security:
    capabilities: remove __cap_full_set definition
    security: remove the security_netlink_recv hook as it is equivalent to capable()
    ptrace: do not audit capability check when outputing /proc/pid/stat
    capabilities: remove task_ns_* functions
    capabitlies: ns_capable can use the cap helpers rather than lsm call
    capabilities: style only - move capable below ns_capable
    capabilites: introduce new has_ns_capabilities_noaudit
    capabilities: call has_ns_capability from has_capability
    capabilities: remove all _real_ interfaces
    capabilities: introduce security_capable_noaudit
    capabilities: reverse arguments to security_capable
    capabilities: remove the task from capable LSM hook entirely
    selinux: sparse fix: fix several warnings in the security server cod
    selinux: sparse fix: fix warnings in netlink code
    selinux: sparse fix: eliminate warnings for selinuxfs
    selinux: sparse fix: declare selinux_disable() in security.h
    selinux: sparse fix: move selinux_complete_init
    selinux: sparse fix: make selinux_secmark_refcount static
    SELinux: Fix RCU deref check warning in sel_netport_insert()

    Manually fix up a semantic mis-merge wrt security_netlink_recv():

    - the interface was removed in commit fd7784615248 ("security: remove
    the security_netlink_recv hook as it is equivalent to capable()")

    - a new user of it appeared in commit a38f7907b926 ("crypto: Add
    userspace configuration API")

    causing no automatic merge conflict, but Eric Paris pointed out the
    issue.

    Linus Torvalds
     

13 Jan, 2012

1 commit

  • commit a9b3cd7f32 (rcu: convert uses of rcu_assign_pointer(x, NULL) to
    RCU_INIT_POINTER) made many incorrect changes, since it converted every
    rcu_assign_pointer(x, y) to RCU_INIT_POINTER(x, y), not only the cases
    where y is NULL.

    When y is not NULL, this drops needed barriers, even on x86.
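The difference can be sketched in userspace C11, assuming a release store as the analogue of rcu_assign_pointer() and a relaxed store as the analogue of RCU_INIT_POINTER(). This is an analogy, not the kernel's RCU implementation; `gp`, `publish()`, `clear()` and `read_sum()` are hypothetical names.

```c
#include <stdatomic.h>
#include <stddef.h>

struct cfg {
    int a;
    int b;
};

/* Hypothetical shared pointer that readers dereference. */
static _Atomic(struct cfg *) gp;

/* Analogue of rcu_assign_pointer(gp, p): the release store orders the
 * initialisation of *p before the pointer becomes visible to readers. */
static void publish(struct cfg *p, int a, int b)
{
    p->a = a;
    p->b = b;
    atomic_store_explicit(&gp, p, memory_order_release);
}

/* Analogue of RCU_INIT_POINTER(gp, NULL): storing NULL publishes no new
 * data, so there is nothing to order and a relaxed store is enough.
 * A relaxed store of a non-NULL pointer could let a reader see the
 * pointer before the fields written in publish(). */
static void clear(void)
{
    atomic_store_explicit(&gp, NULL, memory_order_relaxed);
}

/* Reader side: the acquire load pairs with the release in publish(). */
static int read_sum(void)
{
    struct cfg *p = atomic_load_explicit(&gp, memory_order_acquire);
    return p ? p->a + p->b : -1;
}
```

Only the NULL case is safe to downgrade, which is why a blanket conversion of every rcu_assign_pointer() was wrong.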

    Signed-off-by: Eric Dumazet
    CC: Stephen Hemminger
    CC: Paul E. McKenney
    Signed-off-by: David S. Miller

    Eric Dumazet
     

06 Jan, 2012

1 commit


05 Jan, 2012

2 commits

  • This ensures linear behaviour when filling /proc/net/if_inet6, making
    ifconfig run really fast on IPv6-only addresses. In fact, with this patch and
    the IPv4 one sent a while ago, ifconfig will run in linear time regardless of
    address type.

    IPv4 related patch: f04565ddf52e401880f8ba51de0dff8ba51c99fd
    dev: use name hash for dev_seq_ops
    ...

    Some statistics (running ifconfig > /dev/null on a different setup):

    iface count | IPv6 time, unpatched | IPv6 time, patched | IPv4 time
    ------------+----------------------+--------------------+----------
           6250 |               0.23 s |             0.13 s |    0.11 s
          12500 |               0.62 s |             0.28 s |    0.22 s
          25000 |               2.91 s |             0.57 s |    0.46 s
          50000 |              11.37 s |             1.21 s |    0.94 s
         128000 |              86.78 s |             3.05 s |    2.54 s
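A sketch of why remembering a (bucket, offset) position helps when a /proc reader restarts its walk: the naive seek re-walks the whole table from the first entry, while the cursor only re-walks the saved bucket. The hash table, `NBUCKETS`, `seek_linear()`, `seek_cursor()` and `read_all()` are hypothetical; this illustrates the general technique, not the kernel's if_inet6 seq code.

```c
#include <stddef.h>

#define NBUCKETS 16

struct node {
    int val;
    struct node *next;
};

static struct node *table[NBUCKETS];
static struct node pool[4096];
static long visits; /* nodes touched while seeking */

/* Insert vals 0..n-1 (n <= 4096), each into bucket val % NBUCKETS. */
static void fill(int n)
{
    for (int b = 0; b < NBUCKETS; b++)
        table[b] = NULL;
    for (int i = 0; i < n; i++) {
        pool[i].val = i;
        pool[i].next = table[i % NBUCKETS];
        table[i % NBUCKETS] = &pool[i];
    }
}

/* Naive seek: walk from the first bucket until the pos-th entry. */
static struct node *seek_linear(long pos)
{
    for (int b = 0; b < NBUCKETS; b++)
        for (struct node *n = table[b]; n; n = n->next) {
            visits++;
            if (pos-- == 0)
                return n;
        }
    return NULL;
}

struct cursor {
    int bucket;
    long offset; /* index within table[bucket] */
};

/* Resumable seek: only walk the saved bucket, moving on when it runs out. */
static struct node *seek_cursor(struct cursor *c)
{
    while (c->bucket < NBUCKETS) {
        long skip = c->offset;
        for (struct node *n = table[c->bucket]; n; n = n->next) {
            visits++;
            if (skip-- == 0)
                return n;
        }
        c->bucket++;
        c->offset = 0;
    }
    return NULL;
}

/* Read every entry, restarting the seek once per entry (the worst case
 * for a seq_file reader). Returns the number of entries seen. */
static long read_all(int use_cursor)
{
    struct cursor c = { 0, 0 };
    long count = 0;

    visits = 0;
    for (;;) {
        struct node *n = use_cursor ? seek_cursor(&c) : seek_linear(count);
        if (!n)
            break;
        count++;
        c.offset++;
    }
    return count;
}
```

With `fill(1600)`, both modes return 1600 entries, but in this toy the naive seek touches 1,282,400 nodes and the cursor 82,400.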

    Signed-off-by: Mihai Maruseac
    Cc: Daniel Baluta
    Signed-off-by: David S. Miller

    Mihai Maruseac
     
  • Recently Dave noticed that a test we did in ipv6_add_addr to see if we had a
    next hop route for the interface we're adding an address to was wrong (see
    commit 7ffbcecbeed91e5874e9a1cfc4c0cbb07dac3069). For one, it never
    triggers, and two, it was completely wrong to begin with. This test was
    meant to cover this section of RFC 4429:

    3.3 Modifications to RFC 2462 Stateless Address Autoconfiguration

    * (modifies section 5.5) A host MAY choose to configure a new address
    as an Optimistic Address. A host that does not know the SLLAO
    of its router SHOULD NOT configure a new address as Optimistic.
    A router SHOULD NOT configure an Optimistic Address.

    This patch should bring us into proper compliance with the above clause. Since
    we only add a SLAAC address after we've received an RA, which may or may not
    contain a source link-layer address option, we can pass a pointer to that
    option to addrconf_prefix_rcv (which may be NULL if the option is not
    present), and only set the optimistic flag if the option was found in the RA.
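The rule from RFC 4429 section 3.3 can be written as a small predicate. This is a hypothetical helper, not the kernel's addrconf code: `slaac_addr_flags()`, its parameters and `ADDR_F_OPTIMISTIC` are assumptions, with `ra_has_sllao` playing the role of the bool that v2 passes to addrconf_prefix_rcv.

```c
#include <stdbool.h>

/* Hypothetical flag bit standing in for the address's optimistic flag. */
#define ADDR_F_OPTIMISTIC 0x04u

/*
 * RFC 4429 section 3.3 applied to a new SLAAC address:
 *  - a host MAY configure it as Optimistic (optimistic_dad enabled),
 *  - a router SHOULD NOT (is_router),
 *  - a host that does not know its router's SLLAO SHOULD NOT
 *    (ra_has_sllao is false when the RA carried no such option).
 */
static unsigned int slaac_addr_flags(bool optimistic_dad, bool is_router,
                                     bool ra_has_sllao)
{
    if (optimistic_dad && !is_router && ra_has_sllao)
        return ADDR_F_OPTIMISTIC;
    return 0;
}
```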

    Change notes:
    (v2) modified the new parameter to addrconf_prefix_rcv to be a bool rather than
    a pointer to make its use more clear as per request from davem.

    Signed-off-by: Neil Horman
    CC: "David S. Miller"
    CC: Hideaki YOSHIFUJI
    Signed-off-by: David S. Miller

    Neil Horman
     

31 Dec, 2011

1 commit

  • During some debugging I needed to look into how /proc/net/ipv6_route
    operated, and in my digging I found that it calls fib6_clean_all(), which
    uses "write_lock_bh(&table->tb6_lock)" before walking the table. I found
    this on 2.6.32, but from reading the code I believe the same basic idea
    exists currently. Looking at the rtnetlink code, it only calls
    "read_lock_bh(&table->tb6_lock);" via fib6_dump_table(). While I realize
    reading from proc isn't the recommended way of fetching the IPv6 route
    table, taking a write lock seems unnecessary and would probably cause
    network performance issues.

    To verify this I loaded up the ipv6 route table and then ran iperf in 3
    cases:
    * doing nothing
    * reading ipv6 route table via proc
    (while :; do cat /proc/net/ipv6_route > /dev/null; done)
    * reading ipv6 route table via rtnetlink
    (while :; do ip -6 route show table all > /dev/null; done)

    * Load the ipv6 route table up with:
      for ((i = 0;i < 4000;i++)); do ip route add unreachable 2000::$i; done

    * iperf commands:
      * client: iperf -i 1 -V -c
      * server: iperf -V -s

    * iperf results - 3 runs each (in Mbits/sec)
      * nothing: client: 927,927,927 server: 927,927,927
      * proc: client: 179,97,96,113 server: 142,112,133
      * iproute: client: 928,927,928 server: 927,927,927

    lock_stat shows taking the write lock is causing the slowdown. Using this
    info I decided to write a version of fib6_clean_all() which replaces
    write_lock_bh(&table->tb6_lock) with read_lock_bh(&table->tb6_lock). With
    this new function I see the same results as with my rtnetlink iperf test.

    Signed-off-by: Josh Hunt
    Signed-off-by: David S. Miller

    Josh Hunt
     

30 Dec, 2011

2 commits


29 Dec, 2011

4 commits


27 Dec, 2011

1 commit


25 Dec, 2011

1 commit


24 Dec, 2011

1 commit


23 Dec, 2011

1 commit

  • Chris Boot reported crashes occurring in ipv6_select_ident().

    [ 461.457562] RIP: 0010: ipv6_select_ident+0x31/0xa7

    [ 461.578229] Call Trace:
    [ 461.580742]
    [ 461.582870] ? udp6_ufo_fragment+0x124/0x1a2
    [ 461.589054] ? ipv6_gso_segment+0xc0/0x155
    [ 461.595140] ? skb_gso_segment+0x208/0x28b
    [ 461.601198] ? ipv6_confirm+0x146/0x15e [nf_conntrack_ipv6]
    [ 461.608786] ? nf_iterate+0x41/0x77
    [ 461.614227] ? dev_hard_start_xmit+0x357/0x543
    [ 461.620659] ? nf_hook_slow+0x73/0x111
    [ 461.626440] ? br_parse_ip_options+0x19a/0x19a [bridge]
    [ 461.633581] ? dev_queue_xmit+0x3af/0x459
    [ 461.639577] ? br_dev_queue_push_xmit+0x72/0x76 [bridge]
    [ 461.646887] ? br_nf_post_routing+0x17d/0x18f [bridge]
    [ 461.653997] ? nf_iterate+0x41/0x77
    [ 461.659473] ? br_flood+0xfa/0xfa [bridge]
    [ 461.665485] ? nf_hook_slow+0x73/0x111
    [ 461.671234] ? br_flood+0xfa/0xfa [bridge]
    [ 461.677299] ? nf_bridge_update_protocol+0x20/0x20 [bridge]
    [ 461.684891] ? nf_ct_zone+0xa/0x17 [nf_conntrack]
    [ 461.691520] ? br_flood+0xfa/0xfa [bridge]
    [ 461.697572] ? NF_HOOK.constprop.8+0x3c/0x56 [bridge]
    [ 461.704616] ? nf_bridge_push_encap_header+0x1c/0x26 [bridge]
    [ 461.712329] ? br_nf_forward_finish+0x8a/0x95 [bridge]
    [ 461.719490] ? nf_bridge_pull_encap_header+0x1c/0x27 [bridge]
    [ 461.727223] ? br_nf_forward_ip+0x1c0/0x1d4 [bridge]
    [ 461.734292] ? nf_iterate+0x41/0x77
    [ 461.739758] ? __br_deliver+0xa0/0xa0 [bridge]
    [ 461.746203] ? nf_hook_slow+0x73/0x111
    [ 461.751950] ? __br_deliver+0xa0/0xa0 [bridge]
    [ 461.758378] ? NF_HOOK.constprop.4+0x56/0x56 [bridge]

    This is caused by the bridge netfilter's special dst_entry (fake_rtable), a
    shared entry to which attaching an inetpeer makes no sense.

    The problem has been present since commit 87c48fa3b46 (ipv6: make fragment
    identifications less predictable).

    Introduce a DST_NOPEER dst flag and make sure ipv6_select_ident() and
    __ip_select_ident() fall back to the 'no peer attached' handling.
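A simplified sketch of the fallback, assuming hypothetical structures: the flag bit value, `struct peer`, `struct dst`, `dst_get_peer()`, `select_ident()` and the global counter are stand-ins, and the kernel's inet_peer and ipv6_select_ident() differ in detail.

```c
#include <stddef.h>
#include <stdint.h>

#define DST_NOPEER 0x0040u /* hypothetical bit value */

struct peer {
    uint32_t ip_id_count;
};

struct dst {
    unsigned int flags;
    struct peer *peer; /* shared entries such as fake_rtable keep this NULL */
};

static uint32_t global_ident; /* used when no peer may be attached */

/* Hypothetical stand-in for binding a peer to a dst on first use. */
static struct peer *dst_get_peer(struct dst *d, struct peer *candidate)
{
    if (d->flags & DST_NOPEER)
        return NULL; /* never attach a peer to a shared entry */
    if (!d->peer)
        d->peer = candidate;
    return d->peer;
}

static uint32_t select_ident(struct dst *d, struct peer *candidate)
{
    struct peer *p = dst_get_peer(d, candidate);

    if (p)
        return ++p->ip_id_count; /* per-destination counter */
    return ++global_ident;       /* 'no peer attached' handling */
}
```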

    Reported-by: Chris Boot
    Tested-by: Chris Boot
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

20 Dec, 2011

1 commit

  • module_param(bool) used to counter-intuitively take an int. In
    fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
    trick.

    It's time to remove the int/unsigned int option. For this version
    it'll simply give a warning, but it will break in the next kernel version.

    (Thanks to Joe Perches for suggesting coccinelle for 0/1 -> true/false).

    Cc: "David S. Miller"
    Cc: netdev@vger.kernel.org
    Signed-off-by: Rusty Russell
    Signed-off-by: David S. Miller

    Rusty Russell
     

16 Dec, 2011

1 commit


14 Dec, 2011

2 commits

  • After commit 8e2ec639173f325977818c45011ee176ef2b11f6 ("ipv6: don't
    use inetpeer to store metrics for routes.") the test in rt6_alloc_cow()
    for setting the ANYCAST flag is now wrong.

    'rt' will now always have a plen of 128, because it is set explicitly
    to 128 by ip6_rt_copy.

    So to restore the semantics of the test, check the destination prefix
    length of 'ort'.
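A simplified sketch of the restored test, assuming a hypothetical `struct rt_key` and `is_anycast()` helper rather than the kernel's rt6_info: a packet sent to the prefix address itself of a non-host route is treated as anycast, and the prefix length must come from the original route, because the copy always has plen 128.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified route key: destination prefix and its length. */
struct rt_key {
    unsigned char addr[16];
    int plen;
};

/* Check the ORIGINAL route 'ort'. Testing the copy instead would always
 * see plen == 128, so the anycast flag would never be set. */
static bool is_anycast(const struct rt_key *ort, const unsigned char daddr[16])
{
    return ort->plen != 128 && memcmp(ort->addr, daddr, 16) == 0;
}
```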

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Don't just succeed with a route that has a NULL neighbour attached.
    This follows the behavior of addrconf_dst_alloc().

    Allowing this kind of route to end up with a NULL neigh attached will
    result in packet drops on output until the route is somehow
    invalidated, since nothing will meanwhile try to look up the neigh
    again.

    A statistic is bumped for the case where we see a neigh-less route on
    output, but the resulting packet drop is otherwise silent. Frankly, it's
    a hard error for this to happen, and ipv6 should do what ipv4 does,
    which is to say something in the kernel logs.
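A sketch of the fail-early pattern described above, with hypothetical types, error codes and function names (`route_alloc()`, `route_output()`, `out_no_neigh`); it mirrors the description, not the actual ip6_route code. Allocation refuses to return a neigh-less route, and output logs when it meets one anyway.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

struct neigh { int id; };

struct route {
    struct neigh *n;
};

static long out_no_neigh; /* statistic bumped on output */

/* Allocate a route bound to 'n'; never hand out a route with n == NULL. */
static int route_alloc(struct neigh *n, struct route **out)
{
    *out = NULL;
    if (!n)
        return -ENOBUFS; /* hypothetical choice of error code */
    struct route *rt = malloc(sizeof(*rt));
    if (!rt)
        return -ENOMEM;
    rt->n = n;
    *out = rt;
    return 0;
}

/* Output path: a missing neighbour is a hard error, so log it. */
static int route_output(const struct route *rt)
{
    if (!rt->n) {
        out_no_neigh++;
        fprintf(stderr, "route_output: no neighbour, dropping packet\n");
        return -EINVAL;
    }
    return 0;
}
```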

    Signed-off-by: David S. Miller

    David S. Miller
     

13 Dec, 2011

6 commits

  • This is not merged with the ipv4 match into xt_rpfilter.c
    to avoid ipv6 module dependency issues.

    Signed-off-by: Florian Westphal
    Acked-by: David S. Miller
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • This patch allows each namespace to independently set up
    its levels for tcp memory pressure thresholds. This patch
    alone does not buy much: we need to make these values
    per group of processes somehow. This is achieved in the
    patches that follow in this patchset.
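A sketch of per-namespace thresholds, assuming the tcp_mem semantics documented in tcp(7): enter memory pressure above the pressure mark, leave it only once below the low mark, and never exceed the high mark. `struct netns_tcp` and `tcp_mem_update()` are hypothetical names, and the real accounting is more involved.

```c
#include <stdbool.h>

/* Hypothetical per-namespace copy of the tcp_mem triple, in pages. */
struct netns_tcp {
    long mem[3]; /* [0] low, [1] pressure, [2] high */
    bool under_pressure;
};

/* Update the pressure state for 'allocated' pages using this namespace's
 * own thresholds. Returns false if the hard limit mem[2] is exceeded. */
static bool tcp_mem_update(struct netns_tcp *ns, long allocated)
{
    if (allocated > ns->mem[1])
        ns->under_pressure = true;
    else if (allocated < ns->mem[0])
        ns->under_pressure = false;
    return allocated <= ns->mem[2];
}
```

Two namespaces with different triples react differently to the same allocation, which is the point of moving the values out of a single global.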

    Signed-off-by: Glauber Costa
    Reviewed-by: KAMEZAWA Hiroyuki
    CC: David S. Miller
    CC: Eric W. Biederman
    Signed-off-by: David S. Miller

    Glauber Costa
     
  • This patch introduces memory pressure controls for the tcp
    protocol. It uses the generic socket memory pressure code
    introduced in earlier patches, and fills in the
    necessary data in cg_proto struct.

    Signed-off-by: Glauber Costa
    Reviewed-by: KAMEZAWA Hiroyuki
    CC: Eric W. Biederman
    Signed-off-by: David S. Miller

    Glauber Costa
     
  • This patch replaces all uses of struct sock fields' memory_pressure,
    memory_allocated, sockets_allocated, and sysctl_mem with accessor
    macros. Those macros can either receive a socket argument, or a mem_cgroup
    argument, depending on the context they live in.

    Since we're only doing a macro wrapping here, no performance impact at all is
    expected in the case where we don't have cgroups disabled.
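A hypothetical sketch of the accessor idea: callers go through one macro, either socket-based or cgroup-based, and a single helper decides whether the counter lives in the protocol-wide structure or in a per-cgroup one. `struct cg_proto`, `struct proto`, `struct sock` and both macros here are simplified assumptions, not the kernel's definitions.

```c
#include <stddef.h>

/* Hypothetical per-cgroup accounting, used only when cgroups are in play. */
struct cg_proto {
    long memory_allocated;
};

/* Hypothetical protocol-wide accounting. */
struct proto {
    long memory_allocated;
};

struct sock {
    struct proto *prot;
    struct cg_proto *cgrp; /* NULL when not charged to a cgroup */
};

/* One place decides where the counter lives. */
static long *memory_allocated_ptr(struct proto *prot, struct cg_proto *cg)
{
    return cg ? &cg->memory_allocated : &prot->memory_allocated;
}

/* Socket-argument and cgroup-argument accessors; both yield an lvalue. */
#define SOCK_MEM_ALLOCATED(sk) \
    (*memory_allocated_ptr((sk)->prot, (sk)->cgrp))
#define CG_MEM_ALLOCATED(prot, cg) \
    (*memory_allocated_ptr((prot), (cg)))
```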

    Signed-off-by: Glauber Costa
    Reviewed-by: Hiroyouki Kamezawa
    CC: David S. Miller
    CC: Eric W. Biederman
    CC: Eric Dumazet
    Signed-off-by: David S. Miller

    Glauber Costa
     
  • Same fix as 731abb9cb2, for the ipip and sit tunnels.
    Commit 1c5cae815d removed an explicit call to dev_alloc_name in
    ipip_tunnel_locate and ipip6_tunnel_locate, because register_netdevice
    will now create a valid name. However, the tunnel keeps a copy of the
    name in the private parms structure. Fix this by copying the name back
    after register_netdevice has successfully returned.

    This shows up if you do a simple tunnel add, followed by a tunnel show:

    $ sudo ip tunnel add mode ipip remote 10.2.20.211
    $ ip tunnel
    tunl0: ip/ip remote any local any ttl inherit nopmtudisc
    tunl%d: ip/ip remote 10.2.20.211 local any ttl inherit
    $ sudo ip tunnel add mode sit remote 10.2.20.212
    $ ip tunnel
    sit0: ipv6/ip remote any local any ttl 64 nopmtudisc 6rd-prefix 2002::/16
    sit%d: ioctl 89f8 failed: No such device
    sit%d: ipv6/ip remote 10.2.20.212 local any ttl inherit
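The bug and the fix, reduced to strings: the device name is only resolved from the "%d" template at registration time, so a copy cached before registration goes stale. `struct dev`, `struct tunnel_parms`, `register_dev()` and `tunnel_create()` are hypothetical stand-ins for the net_device, the tunnel's private parms and register_netdevice().

```c
#include <stdio.h>
#include <string.h>

#define NAMSIZ 16 /* hypothetical, mirrors IFNAMSIZ */

struct dev { char name[NAMSIZ]; };          /* stand-in for net_device */
struct tunnel_parms { char name[NAMSIZ]; }; /* stand-in for private parms */

/* Stand-in for register_netdevice(): resolves a "%d" template in place. */
static int register_dev(struct dev *d, int unit)
{
    char *pct = strstr(d->name, "%d");

    if (pct) {
        char buf[NAMSIZ];
        int n = snprintf(buf, sizeof(buf), "%.*s%d%s",
                         (int)(pct - d->name), d->name, unit, pct + 2);
        if (n < 0 || n >= (int)sizeof(buf))
            return -1;
        memcpy(d->name, buf, (size_t)n + 1);
    }
    return 0;
}

static int tunnel_create(struct dev *d, struct tunnel_parms *p,
                         const char *tmpl, int unit)
{
    snprintf(p->name, sizeof(p->name), "%s", tmpl);
    strcpy(d->name, p->name);
    if (register_dev(d, unit) != 0)
        return -1;
    /* The fix: without this copy, p->name would still read "tunl%d". */
    strcpy(p->name, d->name);
    return 0;
}
```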

    Cc: stable@vger.kernel.org
    Signed-off-by: Ted Feng
    Signed-off-by: David S. Miller

    Ted Feng
     
  • There is no obvious reason to add a default multicast route for loopback
    devices; doing so creates a route entry whose dst.error is set to
    -ENETUNREACH, which blocks all multicast packets.

    ====================

    [ more detailed explanation ]

    The problem is that the resulting routing table depends on the order in
    which interfaces are initialized, and in some situations that blocks all
    multicast packets. Suppose there are two interfaces on my computer
    (lo and eth0). If we initialize 'lo' before 'eth0', the resulting routing
    table (for multicast) would be

    # ip -6 route show | grep ff00::
    unreachable ff00::/8 dev lo metric 256 error -101
    ff00::/8 dev eth0 metric 256

    When sending multicast packets, the routing subsystem will return the first
    route entry, which has its error set to -101 (ENETUNREACH).

    I know the kernel will set the default IPv6 address for 'lo' when it is up
    and won't set the default multicast route for it, but there is no reason to
    stop the 'init' program from setting an address for 'lo', and that is
    exactly what systemd did.

    I am sure something is wrong with either the kernel or systemd; currently
    I believe the kernel caused this problem.

    ====================
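Why initialization order matters can be shown with a first-match lookup over the ff00::/8 entries. The table, `add_mcast_route()`, `lookup_mcast()` and the rule that the loopback entry carries -ENETUNREACH are simplified, hypothetical stand-ins for the fib6 behaviour described above; the fix corresponds to skipping the route for loopback devices.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ROUTES 8

struct mroute {
    const char *dev;
    int error; /* 0, or -ENETUNREACH (-101 on Linux) */
};

struct mtable {
    struct mroute r[MAX_ROUTES];
    size_t n;
};

/* Add the default ff00::/8 route for a device as it comes up.
 * With skip_loopback set (the fix), loopback devices get none. */
static void add_mcast_route(struct mtable *t, const char *dev,
                            bool is_loopback, bool skip_loopback)
{
    if (is_loopback && skip_loopback)
        return;
    if (t->n < MAX_ROUTES) {
        t->r[t->n].dev = dev;
        t->r[t->n].error = is_loopback ? -ENETUNREACH : 0;
        t->n++;
    }
}

/* First-match lookup: the earliest entry wins. */
static const struct mroute *lookup_mcast(const struct mtable *t)
{
    return t->n ? &t->r[0] : NULL;
}
```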

    Signed-off-by: Li Wei
    Signed-off-by: David S. Miller

    Li Wei
     

10 Dec, 2011

1 commit


07 Dec, 2011

2 commits


06 Dec, 2011

1 commit


05 Dec, 2011

1 commit

  • Like rt6_lookup, but allows the caller to pass in a flowi6 structure.
    Will be used by the upcoming ipv6 netfilter reverse path filter
    match.

    Signed-off-by: Florian Westphal
    Acked-by: David S. Miller
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

04 Dec, 2011

5 commits


03 Dec, 2011

1 commit


02 Dec, 2011

1 commit

  • This reverts commit 81d54ec8479a2c695760da81f05b5a9fb2dbe40a.

    If we take the "try_again" goto due to a checksum error,
    'len' has already been truncated, so we won't compute
    the same values as the original code did.
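The bug class behind the revert, in miniature: if the retry label sits before code that clamps 'len' in place, the retry starts from the clamped value. The functions below are hypothetical and only model the control flow described in the message; `checksum_ok()` is a stub that fails on the first attempt, and the next datagram is assumed to be larger.

```c
#include <stddef.h>

static int attempts;

/* Hypothetical stub: the first checksum check fails, later ones pass. */
static int checksum_ok(void)
{
    return ++attempts > 1;
}

/* Buggy shape: 'len' is clamped in place, so the retry reuses it. */
static size_t recv_buggy(size_t len, size_t avail)
{
try_again:
    if (len > avail)
        len = avail;
    if (!checksum_ok()) {
        avail += 100; /* the next datagram is larger */
        goto try_again;
    }
    return len;
}

/* Fixed shape: keep the caller's 'len' and derive a separate 'copied'. */
static size_t recv_fixed(size_t len, size_t avail)
{
    size_t copied;

try_again:
    copied = len > avail ? avail : len;
    if (!checksum_ok()) {
        avail += 100;
        goto try_again;
    }
    return copied;
}
```

With a 1000-byte buffer and a 200-byte bad datagram followed by a 300-byte good one, the buggy version returns 200 and the fixed version 300.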

    Reported-by: paul bilke
    Signed-off-by: David S. Miller

    David S. Miller
     

01 Dec, 2011

2 commits