17 Dec, 2011

4 commits


16 Dec, 2011

2 commits


15 Dec, 2011

2 commits

  • net/ipv4/sysctl_net_ipv4.c:78:6: warning: symbol 'inet_get_ping_group_range_table'
    was not declared. Should it be static?

    net/ipv4/sysctl_net_ipv4.c:119:31: warning: incorrect type in argument 2
    (different signedness)
    net/ipv4/sysctl_net_ipv4.c:119:31: expected int *range
    net/ipv4/sysctl_net_ipv4.c:119:31: got unsigned int *

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • It's better to use a predefined size for this small automatic variable.

    Removes a sparse error as well:

    net/sched/cls_flow.c:288:13: error: bad constant expression

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

14 Dec, 2011

6 commits

  • Commit 6d4cdf47d2 (vlan: add 802.1q netpoll support) forgot to declare
    some private functions as static.

    Signed-off-by: Eric Dumazet
    CC: Benjamin LaHaise
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • The original code generates a Sparse warning:
    net/8021q/vlan_core.c:336:9:
    error: incompatible types in comparison expression (different address spaces)

    It's ok to dereference __rcu pointers here because we are holding the
    RTNL lock. I've added some calls to rtnl_dereference() to silence the
    warning.

    Signed-off-by: Dan Carpenter
    Acked-by: Eric Dumazet
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Dan Carpenter
     
  • Before adding a struct rtnl_link_ops into the link_ops list, check that it
    doesn't clash with a prior one.

    Based on a previous patch from Alexander Smirnov

    Signed-off-by: Eric Dumazet
    CC: Alexander Smirnov
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • After commit 8e2ec639173f325977818c45011ee176ef2b11f6 ("ipv6: don't
    use inetpeer to store metrics for routes.") the test in rt6_alloc_cow()
    for setting the ANYCAST flag is now wrong.

    'rt' will now always have a plen of 128, because it is set explicitly
    to 128 by ip6_rt_copy.

    So to restore the semantics of the test, check the destination prefix
    length of 'ort'.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Don't just succeed with a route that has a NULL neighbour attached.
    This follows the behavior of addrconf_dst_alloc().

    Allowing this kind of route to end up with a NULL neigh attached will
    result in packet drops on output until the route is somehow
    invalidated, since nothing will meanwhile try to lookup the neigh
    again.

    A statistic is bumped for the case where we see a neigh-less route on
    output, but the resulting packet drop is otherwise silent in nature,
    and frankly it's a hard error for this to happen and ipv6 should do
    what ipv4 does which is say something in the kernel logs.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • It's simpler to just keep these things out until there is a real user
    of them, so we can see what the needs actually are, rather than keep
    these things around as useless overhead.

    Signed-off-by: David S. Miller

    David S. Miller
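
The rtnl_link_ops entry in the list above describes checking for a clash before adding to the link_ops list. A minimal userspace sketch of that idea follows. The type `struct link_ops`, the helpers `link_ops_find()` and `link_ops_register()`, and the choice of `-EEXIST` as the error value are all assumptions made for illustration; this is not the kernel's code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct rtnl_link_ops: only the fields the
 * clash check needs. */
struct link_ops {
    const char *kind;        /* identifies the link type, e.g. "vlan" */
    struct link_ops *next;   /* singly linked registry */
};

static struct link_ops *link_ops_head;

/* Return the registered entry with the same kind, or NULL if none. */
static struct link_ops *link_ops_find(const char *kind)
{
    for (struct link_ops *op = link_ops_head; op; op = op->next)
        if (strcmp(op->kind, kind) == 0)
            return op;
    return NULL;
}

/* Refuse to register an entry whose kind clashes with a prior one.
 * The -EEXIST return value is an assumption of this sketch. */
static int link_ops_register(struct link_ops *ops)
{
    assert(ops && ops->kind);
    if (link_ops_find(ops->kind))
        return -EEXIST;
    ops->next = link_ops_head;
    link_ops_head = ops;
    return 0;
}
```

Registering two entries with kind "vlan" would succeed for the first and fail for the second, while an entry with a different kind would still be accepted.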
     

13 Dec, 2011

14 commits

  • This extension can be used to simulate special link layer
    characteristics. Simulate because packet data is not modified, only the
    calculation base is changed to delay a packet based on the original
    packet size and artificial cell information.

    packet_overhead can be used to simulate a link layer header compression
    scheme (e.g. set packet_overhead to -20) or with a positive
    packet_overhead value an additional MAC header can be simulated. It is
    also possible to "replace" the 14 byte Ethernet header with something
    else.

    cell_size and cell_overhead can be used to simulate link layer schemes
    based on cells, like some TDMA schemes. Another application area is MAC
    schemes using link layer fragmentation with a (small) header each.
    Cell size is the maximum amount of data bytes within one cell. Cell
    overhead is an additional variable to change the per-cell-overhead
    (e.g. 5 byte header per fragment).

    Example (5 kbit/s, 20 byte per packet overhead, cell-size 100 byte, per
    cell overhead 5 byte):

    tc qdisc add dev eth0 root netem rate 5kbit 20 100 5

    Signed-off-by: Hagen Paul Pfeifer
    Signed-off-by: Florian Westphal
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Hagen Paul Pfeifer
     
  • David S. Miller
     
  • gred_change_vq() is called under sch_tree_lock(sch).

    This means a spinlock is held, and we are not allowed to sleep in this
    context.

    We might pre-allocate memory using GFP_KERNEL before taking spinlock,
    but this is not suitable for stable material.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • This patch introduces kmem.tcp.max_usage_in_bytes file, living in the
    kmem_cgroup filesystem. The root cgroup will display a value equal
    to RESOURCE_MAX. This is to avoid introducing any locking schemes in
    the network paths when cgroups are not being actively used.

    All other cgroups will see the maximum memory ever used by that cgroup.

    Signed-off-by: Glauber Costa
    Reviewed-by: Hiroyouki Kamezawa
    CC: David S. Miller
    CC: Eric W. Biederman
    Signed-off-by: David S. Miller

    Glauber Costa
     
  • This patch introduces the kmem.tcp.failcnt file, living in the
    kmem_cgroup filesystem. Following the pattern in the other
    memcg resources, this file keeps a counter of how many times
    allocation failed due to limits being hit in this cgroup.
    The root cgroup will always show a failcnt of 0.

    Signed-off-by: Glauber Costa
    Reviewed-by: Hiroyouki Kamezawa
    CC: David S. Miller
    CC: Eric W. Biederman
    Signed-off-by: David S. Miller

    Glauber Costa
     
  • This patch introduces kmem.tcp.usage_in_bytes file, living in the
    kmem_cgroup filesystem. It is a simple read-only file that displays the
    amount of kernel memory currently consumed by the cgroup.

    Signed-off-by: Glauber Costa
    Reviewed-by: Hiroyouki Kamezawa
    CC: David S. Miller
    CC: Eric W. Biederman
    Signed-off-by: David S. Miller

    Glauber Costa
     
  • This patch uses the "tcp.limit_in_bytes" field of the kmem_cgroup to
    effectively control the amount of kernel memory pinned by a cgroup.

    This value is ignored in the root cgroup, and in all others,
    caps the value specified by the admin in the net namespaces'
    view of tcp_sysctl_mem.

    If namespaces are being used, the admin is allowed to set a
    value bigger than cgroup's maximum, the same way it is allowed
    to set pretty much unlimited values in a real box.

    Signed-off-by: Glauber Costa
    Reviewed-by: Hiroyouki Kamezawa
    CC: David S. Miller
    CC: Eric W. Biederman
    Signed-off-by: David S. Miller

    Glauber Costa
     
  • This patch allows each namespace to independently set up
    its levels for tcp memory pressure thresholds. This patch
    alone does not buy much: we need to make these values
    per group of processes somehow. This is achieved in the
    patches that follow in this patchset.

    Signed-off-by: Glauber Costa
    Reviewed-by: KAMEZAWA Hiroyuki
    CC: David S. Miller
    CC: Eric W. Biederman
    Signed-off-by: David S. Miller

    Glauber Costa
     
  • This patch introduces memory pressure controls for the tcp
    protocol. It uses the generic socket memory pressure code
    introduced in earlier patches, and fills in the
    necessary data in cg_proto struct.

    Signed-off-by: Glauber Costa
    Reviewed-by: KAMEZAWA Hiroyuki
    CC: Eric W. Biederman
    Signed-off-by: David S. Miller

    Glauber Costa
     
  • The goal of this work is to move the memory pressure tcp
    controls to a cgroup, instead of just relying on global
    conditions.

    To avoid excessive overhead in the network fast paths,
    the code that accounts allocated memory to a cgroup is
    hidden inside a static_branch(). This branch is patched out
    until the first non-root cgroup is created. So when nobody
    is using cgroups, even if it is mounted, no significant performance
    penalty should be seen.

    This patch handles the generic part of the code, and has nothing
    tcp-specific.

    Signed-off-by: Glauber Costa
    Reviewed-by: KAMEZAWA Hiroyuki
    CC: Kirill A. Shutemov
    CC: David S. Miller
    CC: Eric W. Biederman
    CC: Eric Dumazet
    Signed-off-by: David S. Miller

    Glauber Costa
     
  • This patch replaces all uses of the struct sock fields memory_pressure,
    memory_allocated, sockets_allocated, and sysctl_mem with accessor
    macros. Those macros can receive either a socket argument or a mem_cgroup
    argument, depending on the context they live in.

    Since we're only doing a macro wrapping here, no performance impact at all is
    expected in the case where we don't have cgroups disabled.

    Signed-off-by: Glauber Costa
    Reviewed-by: Hiroyouki Kamezawa
    CC: David S. Miller
    CC: Eric W. Biederman
    CC: Eric Dumazet
    Signed-off-by: David S. Miller

    Glauber Costa
     
  • Same fix as 731abb9cb2 for ipip and sit tunnel.
    Commit 1c5cae815d removed an explicit call to dev_alloc_name in
    ipip_tunnel_locate and ipip6_tunnel_locate, because register_netdevice
    will now create a valid name, however the tunnel keeps a copy of the
    name in the private parms structure. Fix this by copying the name back
    after register_netdevice has successfully returned.

    This shows up if you do a simple tunnel add, followed by a tunnel show:

    $ sudo ip tunnel add mode ipip remote 10.2.20.211
    $ ip tunnel
    tunl0: ip/ip remote any local any ttl inherit nopmtudisc
    tunl%d: ip/ip remote 10.2.20.211 local any ttl inherit
    $ sudo ip tunnel add mode sit remote 10.2.20.212
    $ ip tunnel
    sit0: ipv6/ip remote any local any ttl 64 nopmtudisc 6rd-prefix 2002::/16
    sit%d: ioctl 89f8 failed: No such device
    sit%d: ipv6/ip remote 10.2.20.212 local any ttl inherit

    Cc: stable@vger.kernel.org
    Signed-off-by: Ted Feng
    Signed-off-by: David S. Miller

    Ted Feng
     
  • There is no obvious reason to add a default multicast route for loopback
    devices. Otherwise there would be a route entry with dst.error set to
    -ENETUNREACH that would block all multicast packets.

    ====================

    [ more detailed explanation ]

    The problem is that the resulting routing table depends on the order in
    which interfaces are initialized, and in some situations that would block all
    multicast packets. Suppose there are two interfaces on my computer
    (lo and eth0). If we initialize 'lo' before 'eth0', the resulting routing
    table (for multicast) would be

    # ip -6 route show | grep ff00::
    unreachable ff00::/8 dev lo metric 256 error -101
    ff00::/8 dev eth0 metric 256

    When sending multicast packets, the routing subsystem will return the first
    route entry, which has its error set to -101 (ENETUNREACH).

    I know the kernel will set the default ipv6 address for 'lo' when it is up
    and won't set the default multicast route for it, but there is no reason to
    stop the 'init' program from setting an address for 'lo', and that is exactly
    what systemd did.

    I am sure there is something wrong with either the kernel or systemd; currently
    I lean toward the kernel being the cause.

    ====================

    Signed-off-by: Li Wei
    Signed-off-by: David S. Miller

    Li Wei
     
  • …wireless-next into for-davem

    John W. Linville
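
The netem rate entry in the list above describes how packet_overhead, cell_size and cell_overhead change the calculation base for the delay without touching packet data. The sketch below is one way to read that description in userspace C. The struct `rate_params`, the function names, the round-up to whole cells, and the clamping of negative lengths to zero are assumptions of this sketch, not the qdisc's actual implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical parameter bundle; field names follow the commit text. */
struct rate_params {
    uint64_t rate_bytes_per_sec; /* link rate in bytes per second */
    int32_t  packet_overhead;    /* added to every packet, may be negative */
    uint32_t cell_size;          /* max data bytes per cell, 0 = no cells */
    int32_t  cell_overhead;      /* extra bytes charged per cell */
};

/* Length used as the calculation base for the delay. */
static uint64_t effective_len(uint32_t len, const struct rate_params *p)
{
    int64_t n = (int64_t)len + p->packet_overhead;

    if (n < 0)          /* assumption: never charge a negative length */
        n = 0;
    if (p->cell_size) {
        /* Round up: a partly filled cell still costs a whole cell. */
        int64_t cells = (n + p->cell_size - 1) / p->cell_size;

        n = cells * ((int64_t)p->cell_size + p->cell_overhead);
        if (n < 0)
            n = 0;
    }
    return (uint64_t)n;
}

/* Time in nanoseconds to send the effective length at the given rate. */
static uint64_t tx_time_ns(uint32_t len, const struct rate_params *p)
{
    assert(p->rate_bytes_per_sec > 0);
    return effective_len(len, p) * 1000000000ULL / p->rate_bytes_per_sec;
}
```

With the values from the commit's example (5 kbit/s is 625 bytes per second, 20 bytes packet overhead, 100 byte cells, 5 bytes per-cell overhead), a 1500 byte packet is charged as 1520 bytes, which needs 16 cells of 105 bytes each, so 1680 bytes enter the delay calculation.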
     
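
The ipip/sit entry in the list above fixes a stale "tunl%d" name kept in the tunnel's private parms after registration. The following userspace sketch mimics that situation. `fake_register()` is a hypothetical stand-in for register_netdevice() that expands a "%d" template, and `struct fake_dev`, `struct fake_parms` and `tunnel_create()` are illustrative names only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 16

struct fake_dev   { char name[NAME_LEN]; };
struct fake_parms { char name[NAME_LEN]; };  /* the tunnel's private copy */

/* Stand-in for registration: replaces a "%d" template with a unit number. */
static int fake_register(struct fake_dev *dev, int unit)
{
    char buf[NAME_LEN];
    char *p = strstr(dev->name, "%d");

    if (p) {
        int prefix = (int)(p - dev->name);
        int n = snprintf(buf, sizeof(buf), "%.*s%d", prefix, dev->name, unit);

        if (n < 0 || n >= NAME_LEN)
            return -1;          /* resolved name would not fit */
        strcpy(dev->name, buf);
    }
    return 0;
}

static int tunnel_create(struct fake_dev *dev, struct fake_parms *parms,
                         int unit)
{
    assert(dev && parms);
    strcpy(dev->name, parms->name);     /* may still be a template */
    if (fake_register(dev, unit) < 0)
        return -1;
    /* The step the fix adds: copy the resolved name back into the parms
     * only after registration has succeeded. */
    strcpy(parms->name, dev->name);
    return 0;
}
```

Without the copy-back step, the parms would keep the literal template and a later listing would show it instead of the resolved device name.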

12 Dec, 2011

4 commits


11 Dec, 2011

2 commits

  • Wrap the udp6 lookup into the proper ifdef-s.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • Eric Dumazet reported that when inet_diag is built-in, udp_diag also goes
    built-in, and when ipv6 is a module the udp6 lookup symbol is not found.

    LD .tmp_vmlinux1
    net/built-in.o: In function `udp_dump_one':
    udp_diag.c:(.text+0xa2b40): undefined reference to `__udp6_lib_lookup'
    make: *** [.tmp_vmlinux1] Erreur 1

    Fix this by making udp diag build mode depend on both -- inet diag and ipv6.

    Reported-by: Eric Dumazet
    Signed-off-by: Pavel Emelyanov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     

10 Dec, 2011

6 commits