10 Jan, 2012

3 commits


09 Jan, 2012

1 commit

  • so move it there. Fixes build errors when CONFIG_INET is not defined:

    In file included from include/linux/tcp.h:211:0,
    from include/linux/ipv6.h:221,
    from include/net/ipv6.h:16,
    from include/linux/sunrpc/clnt.h:26,
    from include/linux/nfs_fs.h:50,
    from init/do_mounts.c:20:
    include/net/sock.h: In function 'sk_update_clone':
    include/net/sock.h:1109:3: error: implicit declaration of function 'sock_update_memcg' [-Werror=implicit-function-declaration]

    Signed-off-by: Stephen Rothwell
    Signed-off-by: David S. Miller

    Stephen Rothwell
     

08 Jan, 2012

2 commits

  • In 882716604ec "pktgen: fix multiple queue warning" we added special
    logic to handle the case where ntxq is zero. It's not clear to me that
    ntxq can actually be zero. But if it were then we would set
    ->queue_map_min and ->queue_map_max to USHRT_MAX when probably we want
    to set them to zero?

    Signed-off-by: David S. Miller

    Dan Carpenter
     
  • Sockets can also be created through sock_clone. Because it copies
    all data in the sock structure, it also copies the memcg-related pointer,
    and all should be fine. However, since we now use reference counts in
    socket creation, we are left with some sockets that have no reference
    counts. It matters when we destroy them, since it leads to a mismatch.

    Signed-off-by: Glauber Costa
    CC: David S. Miller
    CC: Greg Thelen
    CC: Hiroyouki Kamezawa
    CC: Laurent Chavey
    Signed-off-by: David S. Miller

    Glauber Costa
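
The reference-count mismatch described in the sock_clone commit above can be sketched in a few lines of userspace C. This is only a toy model: the `demo_*` structures and functions are hypothetical stand-ins, not the kernel's own, and "taking a reference" is reduced to incrementing an integer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the memcg and socket structures. */
struct demo_cgroup { int refs; };
struct demo_sock { struct demo_cgroup *cg; int payload; };

/* Creation takes one reference on the cgroup. */
static void demo_sock_create(struct demo_sock *sk, struct demo_cgroup *cg)
{
    assert(sk != NULL && cg != NULL);
    sk->cg = cg;
    sk->payload = 0;
    cg->refs++;
}

/* Byte-wise clone: the cgroup pointer is copied, but no reference is taken. */
static void demo_sock_clone_buggy(struct demo_sock *dst, const struct demo_sock *src)
{
    memcpy(dst, src, sizeof(*dst));
}

/* Fixed clone: also take a reference for the copied pointer. */
static void demo_sock_clone_fixed(struct demo_sock *dst, const struct demo_sock *src)
{
    memcpy(dst, src, sizeof(*dst));
    if (dst->cg)
        dst->cg->refs++;
}

/* Destruction always drops one reference. */
static void demo_sock_destroy(struct demo_sock *sk)
{
    if (sk->cg)
        sk->cg->refs--;
    sk->cg = NULL;
}
```

With the buggy clone, destroying both the original and the copy drops two references where only one was taken; with the fixed clone the count returns to zero.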
     

05 Jan, 2012

2 commits

  • All implementations have been converted to implement set_rxnfc
    instead.

    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Ben Hutchings
     
  • Define special location values for RX NFC that request the driver to
    select the actual rule location. This allows for implementation on
    devices that use hash-based filter lookup, whereas currently the API is
    more suited to devices with TCAM lookup or linear search.

    In ethtool_set_rxnfc() and the compat wrapper ethtool_ioctl(), copy
    the structure back to user-space after insertion so that the actual
    location is returned.

    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Ben Hutchings
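
As a rough illustration of the "driver selects the location" idea in the RX NFC commit above, here is a toy filter table in userspace C. Everything named `demo_*` or `DEMO_*` is hypothetical, including the special value and the hash (key modulo table size with linear probing); the point is only that the caller passes a placeholder and reads the actual slot back afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_TABLE_SIZE 8u
/* Hypothetical special value meaning "driver, pick the slot for me". */
#define DEMO_LOC_ANY 0xffffffffu

struct demo_rule {
    uint32_t flow_key;  /* what the filter matches on */
    uint32_t location;  /* in: requested slot or DEMO_LOC_ANY; out: actual slot */
};

struct demo_table {
    struct demo_rule slots[DEMO_TABLE_SIZE];
    uint8_t used[DEMO_TABLE_SIZE];
};

/* Insert a rule; returns 0 on success, -1 if no usable slot exists.
 * An explicitly requested slot replaces whatever rule was there. */
static int demo_insert_rule(struct demo_table *t, struct demo_rule *r)
{
    uint32_t loc = r->location;

    assert(t != NULL && r != NULL);
    if (loc == DEMO_LOC_ANY) {
        /* Hash-based lookup: the slot follows from the key, with linear probing. */
        uint32_t start = r->flow_key % DEMO_TABLE_SIZE;
        uint32_t i;

        for (i = 0; i < DEMO_TABLE_SIZE; i++) {
            loc = (start + i) % DEMO_TABLE_SIZE;
            if (!t->used[loc])
                break;
        }
        if (i == DEMO_TABLE_SIZE)
            return -1;
    } else if (loc >= DEMO_TABLE_SIZE) {
        return -1;
    }

    r->location = loc;  /* report the actual location back to the caller */
    t->slots[loc] = *r;
    t->used[loc] = 1;
    return 0;
}
```

Writing the chosen slot into `r->location` plays the role of copying the structure back to user-space after insertion.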
     

31 Dec, 2011

1 commit

  • Add a routine that dumps memory-related values of a socket.
    It's made as an array to make it possible to add more stuff
    here later without breaking compatibility.

    Since v1: The SK_MEMINFO_ constants are in the userspace-visible
    part of sock_diag.h; the rest is under __KERNEL__.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
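
The reason an array helps with compatibility can be shown with a small sketch. The index names below are hypothetical (prefixed `DEMO_`), and the rule that missing entries read as zero is an assumption of this example, not something the commit states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical indices; new ones are only ever appended before DEMO_MEMINFO_VARS. */
enum {
    DEMO_MEMINFO_RMEM_ALLOC,
    DEMO_MEMINFO_RCVBUF,
    DEMO_MEMINFO_WMEM_ALLOC,
    DEMO_MEMINFO_SNDBUF,
    DEMO_MEMINFO_VARS   /* number of values currently defined */
};

struct demo_sock_mem { uint32_t rmem_alloc, rcvbuf, wmem_alloc, sndbuf; };

/* Producer: fill the array, one slot per index. */
static void demo_dump_meminfo(const struct demo_sock_mem *sk, uint32_t *out)
{
    assert(sk != NULL && out != NULL);
    out[DEMO_MEMINFO_RMEM_ALLOC] = sk->rmem_alloc;
    out[DEMO_MEMINFO_RCVBUF]     = sk->rcvbuf;
    out[DEMO_MEMINFO_WMEM_ALLOC] = sk->wmem_alloc;
    out[DEMO_MEMINFO_SNDBUF]     = sk->sndbuf;
}

/* Consumer: read one value from an array of n entries.
 * Entries the producer did not send read as 0 (an assumption of this sketch). */
static uint32_t demo_meminfo_get(const uint32_t *vals, size_t n, size_t idx)
{
    return idx < n ? vals[idx] : 0;
}
```

A consumer built against a longer enum can still read a shorter array from an older producer, and an older consumer simply ignores trailing entries it does not know about.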
     

29 Dec, 2011

1 commit


25 Dec, 2011

1 commit

  • The aim of this patch is to provide the full range of rps_flow_cnt on
    64-bit arches.

    The theoretical limit on the number of flows is 2^32.

    Fix some buggy RPS/RFS macros as well.

    Signed-off-by: Eric Dumazet
    CC: Tom Herbert
    CC: Xi Wang
    CC: Laurent Chavey
    Signed-off-by: David S. Miller

    Eric Dumazet
     

24 Dec, 2011

1 commit


23 Dec, 2011

2 commits

  • skb->truesize might be big even for a small packet.

    It's even bigger after commit 87fb4b7b533 (net: more accurate skb
    truesize) and big MTU.

    We should allow queueing at least one packet per receiver, even with a
    low RCVBUF setting.

    Reported-by: Michal Simek
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
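  • Setting a large rps_flow_cnt like (1 << 30) on a 32-bit platform will
    cause a kernel oops due to insufficient bounds checking.

    The existing check only rejects counts above (1 << 30), but the table
    size computation multiplies the count by the entry size of 8 bytes,
    and (1 << 30) * 8 will overflow 32 bits.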

    This patch replaces the magic number (1 << 30) with a symbolic bound.

    Suggested-by: Eric Dumazet
    Signed-off-by: Xi Wang
    Signed-off-by: David S. Miller

    Xi Wang
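
The overflow in the rps_flow_cnt commit above is easy to reproduce with fixed-width arithmetic. The sketch below models a 32-bit size computation with `uint32_t`; the 4-byte header and the `demo_*` names are assumptions made for illustration, while the 8-byte entry size and the (1 << 30) limit come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: a small header followed by `count` 8-byte entries. */
#define DEMO_HDR_SIZE   4u
#define DEMO_ENTRY_SIZE 8u

/* Symbolic bound: the largest count whose table size still fits in 32 bits. */
#define DEMO_MAX_COUNT ((UINT32_MAX - DEMO_HDR_SIZE) / DEMO_ENTRY_SIZE)

/* The old magic number admits counts the 32-bit size computation cannot handle. */
static_assert(DEMO_MAX_COUNT < (1u << 30), "magic bound is too generous");

/* Table size as a 32-bit platform would compute it; wraps silently on overflow. */
static uint32_t demo_table_size32(uint32_t count)
{
    return DEMO_HDR_SIZE + count * DEMO_ENTRY_SIZE;
}

/* Old check: compares against the magic number. */
static int demo_count_ok_magic(uint32_t count)
{
    return count <= (1u << 30);
}

/* New check: compares against a bound derived from the size computation. */
static int demo_count_ok_symbolic(uint32_t count)
{
    return count <= DEMO_MAX_COUNT;
}
```

A count of (1 << 30) passes the old check, yet the computed size wraps around to just the header size, so the allocation would be far too small for the entries written into it.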
     

22 Dec, 2011

1 commit

  • flow_cache_flush() might sleep but can be called from
    atomic context via the xfrm garbage collector. So add
    a flow_cache_flush_deferred() function and use this if
    the xfrm garbage collector is invoked from within the
    packet path.

    Signed-off-by: Steffen Klassert
    Acked-by: Timo Teräs
    Signed-off-by: David S. Miller

    Steffen Klassert
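
The split between a direct flush and a deferred one can be modelled without any kernel machinery. In this sketch the `demo_*` names are hypothetical, "may sleep" is represented by an assertion that the caller is not atomic, and the deferred variant only records a request that a later worker call carries out.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_cache {
    int entries;         /* cached items waiting to be flushed */
    bool flush_pending;  /* set when a flush was requested from atomic context */
};

/* Direct flush: stands for work that may sleep, so it must not be
 * called while in_atomic is true. */
static void demo_cache_flush(struct demo_cache *c, bool in_atomic)
{
    assert(!in_atomic);
    c->entries = 0;
    c->flush_pending = false;
}

/* Deferred flush: only records the request; safe in atomic context. */
static void demo_cache_flush_deferred(struct demo_cache *c)
{
    c->flush_pending = true;
}

/* Worker: runs later in a context that may sleep and performs pending flushes. */
static void demo_worker_run(struct demo_cache *c)
{
    if (c->flush_pending)
        demo_cache_flush(c, false);
}

/* Garbage collector: pick the flush variant based on the caller's context. */
static void demo_garbage_collect(struct demo_cache *c, bool from_packet_path)
{
    if (from_packet_path)
        demo_cache_flush_deferred(c);
    else
        demo_cache_flush(c, false);
}
```

The entries stay in place until the worker runs, which is the price of never sleeping in the packet path.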
     

20 Dec, 2011

1 commit


17 Dec, 2011

6 commits

  • Use IS_ENABLED(CONFIG_FOO)
    instead of defined(CONFIG_FOO) || defined (CONFIG_FOO_MODULE)

    Signed-off-by: Igor Maravić
    Signed-off-by: David S. Miller

    Igor Maravić
     
  • We can't scan the proto_list to initialize sock cgroups, as it
    holds a rwlock, and we also want to keep the code generic enough to
    avoid calling the initialization functions of protocols directly.

    Convert proto_list_lock into a mutex, so we can sleep and do the
    necessary allocations. This lock is seldom taken, so there shouldn't
    be any performance penalties associated with that.

    Signed-off-by: Glauber Costa
    CC: Hiroyouki Kamezawa
    CC: David S. Miller
    CC: Eric Dumazet
    CC: Stephen Rothwell
    CC: Randy Dunlap
    Signed-off-by: David S. Miller

    Glauber Costa
     
  • All drivers that support modification of the RX flow hash indirection
    table initialise it in the same way: RX rings are assigned to table
    entries in rotation. Make that default policy explicit by having them
    call an ethtool_rxfh_indir_default() function.

    In the ethtool core, add support for a zero size value for
    ETHTOOL_SRXFHINDIR, which resets the table to this default.

    Partly-suggested-by: Matt Carlson
    Signed-off-by: Ben Hutchings
    Acked-by: Shreyas N Bhatewara
    Signed-off-by: David S. Miller

    Ben Hutchings
     
  • Add a new ethtool operation (get_rxfh_indir_size) to get the
    indirection table size. Use this to validate the user buffer size
    before calling get_rxfh_indir or set_rxfh_indir. Use get_rxnfc to get
    the number of RX rings, and validate the contents of the new
    indirection table before calling set_rxfh_indir. Remove this
    validation from drivers.

    Signed-off-by: Ben Hutchings
    Acked-by: Dimitris Michailidis
    Signed-off-by: David S. Miller

    Ben Hutchings
     
  • The sk address is used as a cookie between dump/get_exact calls.
    It will be required for unix socket dumping, so move it from
    inet_diag to sock_diag.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • I've made a mistake when fixing the sock_/inet_diag aliases :(

    1. The sock_diag layer should request the family-based alias,
    not just the IPPROTO_IP one;
    2. The inet_diag layer should request the AF_INET+protocol alias,
    not just the protocol one.

    Thus fix this.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
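
For the two RX flow hash indirection table commits in this group, the default policy and the core-side validation can be sketched as follows. The `demo_*` names are hypothetical, and reading "assigned to table entries in rotation" as index modulo the number of RX rings is this example's interpretation of the commit text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Default policy: table entry `index` points at ring (index mod n_rx_rings). */
static uint32_t demo_rxfh_indir_default(uint32_t index, uint32_t n_rx_rings)
{
    assert(n_rx_rings > 0);
    return index % n_rx_rings;
}

/* Reset a whole table to the default, as a zero-size request would. */
static void demo_rxfh_indir_reset(uint32_t *table, size_t size, uint32_t n_rx_rings)
{
    for (size_t i = 0; i < size; i++)
        table[i] = demo_rxfh_indir_default((uint32_t)i, n_rx_rings);
}

/* Core-side validation: every entry must name an existing ring.
 * Returns 1 if the table is acceptable, 0 otherwise. */
static int demo_rxfh_indir_valid(const uint32_t *table, size_t size, uint32_t n_rx_rings)
{
    for (size_t i = 0; i < size; i++)
        if (table[i] >= n_rx_rings)
            return 0;
    return 1;
}
```

Doing the validation once in common code is what lets the per-driver checks be removed.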
     

14 Dec, 2011

2 commits


13 Dec, 2011

3 commits

  • This patch introduces memory pressure controls for the tcp
    protocol. It uses the generic socket memory pressure code
    introduced in earlier patches, and fills in the
    necessary data in cg_proto struct.

    Signed-off-by: Glauber Costa
    Reviewed-by: KAMEZAWA Hiroyuki
    CC: Eric W. Biederman
    Signed-off-by: David S. Miller

    Glauber Costa
     
  • The goal of this work is to move the memory pressure tcp
    controls to a cgroup, instead of just relying on global
    conditions.

    To avoid excessive overhead in the network fast paths,
    the code that accounts allocated memory to a cgroup is
    hidden inside a static_branch(). This branch is patched out
    until the first non-root cgroup is created. So when nobody
    is using cgroups, even if it is mounted, no significant performance
    penalty should be seen.

    This patch handles the generic part of the code, and has nothing
    tcp-specific.

    Signed-off-by: Glauber Costa
    Reviewed-by: KAMEZAWA Hiroyuki
    CC: Kirill A. Shutemov
    CC: David S. Miller
    CC: Eric W. Biederman
    CC: Eric Dumazet
    Signed-off-by: David S. Miller

    Glauber Costa
     
  • This patch replaces all uses of the struct sock fields memory_pressure,
    memory_allocated, sockets_allocated, and sysctl_mem with accessor
    macros. Those macros can either receive a socket argument, or a mem_cgroup
    argument, depending on the context they live in.

    Since we're only doing a macro wrapping here, no performance impact at all is
    expected in the case where we don't have cgroups disabled.

    Signed-off-by: Glauber Costa
    Reviewed-by: Hiroyouki Kamezawa
    CC: David S. Miller
    CC: Eric W. Biederman
    CC: Eric Dumazet
    Signed-off-by: David S. Miller

    Glauber Costa
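
To show why wrapping field accesses in accessors prepares the ground for per-cgroup accounting, here is a much-reduced sketch. All `demo_*` types and functions are hypothetical, and charging exactly one counter (the cgroup's if present, otherwise the global one) is a simplification chosen for this example rather than a description of the kernel's behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-reduced versions of the structures involved. */
struct demo_cg_proto { long memory_allocated; };
struct demo_proto    { long memory_allocated; };

struct demo_sock {
    struct demo_proto    *prot;  /* global, per-protocol accounting */
    struct demo_cg_proto *cgrp;  /* per-cgroup accounting, or NULL */
};

/* Callers no longer touch the field directly, so this helper can decide
 * which counter applies to the given socket. */
static long *demo_sk_memory_allocated_ptr(struct demo_sock *sk)
{
    assert(sk != NULL && sk->prot != NULL);
    return sk->cgrp ? &sk->cgrp->memory_allocated
                    : &sk->prot->memory_allocated;
}

/* Read accessor. */
static long demo_sk_memory_allocated(struct demo_sock *sk)
{
    return *demo_sk_memory_allocated_ptr(sk);
}

/* Update accessor. */
static void demo_sk_memory_allocated_add(struct demo_sock *sk, long amt)
{
    *demo_sk_memory_allocated_ptr(sk) += amt;
}
```

Once every call site goes through the accessors, the choice of counter can change in one place without touching the callers.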
     

12 Dec, 2011

1 commit


10 Dec, 2011

1 commit

  • This reverts commit 865d9f9f748fdc1943679ea65d9ee1dc55e4a6ae.

    This commit breaks the build with CONFIG_NETPRIO_CGROUP=y so
    revert it. It does build as a module though. The SUBSYS macro
    in the cgroup core code automatically defines a subsys structure
    as extern. Long term we should fix the macro. And I need to
    fully build test things.

    Tested with CONFIG_NETPRIO_CGROUP={y|m|n} with and without
    CONFIG_CGROUPS defined.

    Signed-off-by: John Fastabend
    CC: Neil Horman
    Reported-By: Eric Dumazet
    Signed-off-by: David S. Miller

    John Fastabend
     

09 Dec, 2011

2 commits


07 Dec, 2011

4 commits


06 Dec, 2011

2 commits


05 Dec, 2011

1 commit

  • We discovered that the TCP stack could retransmit misaligned skbs if a
    malicious peer acknowledged a sub-MSS frame. This currently can happen
    only if the output interface is not SG enabled: if SG is enabled, tcp
    builds headless skbs (all payload is included in fragments), so the tcp
    trimming process only removes parts of skb fragments and the header
    stays aligned.

    Some arches can't handle misalignments, so force a head reallocation and
    shrink headroom to MAX_TCP_HEADER.

    Don't care about misalignments on x86 and PPC (or other arches setting
    NET_IP_ALIGN to 0).

    This patch introduces __pskb_copy(), which can specify the headroom of
    the new head, and pskb_copy() becomes a wrapper on top of __pskb_copy().

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
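
The relationship between a copy routine that takes an explicit headroom and a wrapper that keeps the old behaviour can be sketched with a plain byte buffer. The `demo_buf` type and functions are hypothetical and far simpler than an skb (no fragments, no shared data); the sketch only shows the parameterised-core-plus-wrapper shape.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical linear buffer: `headroom` spare bytes, then `len` bytes of data. */
struct demo_buf {
    unsigned char *head;
    size_t headroom;
    size_t len;
};

/* Start of the payload inside the buffer. */
static unsigned char *demo_buf_data(const struct demo_buf *b)
{
    return b->head + b->headroom;
}

/* Core routine: copy the data into a fresh allocation with a caller-chosen
 * headroom. Returns 0 on success, -1 on allocation failure. */
static int demo_buf_copy_headroom(struct demo_buf *dst, const struct demo_buf *src,
                                  size_t headroom)
{
    size_t total = headroom + src->len;

    assert(dst != NULL && src != NULL);
    dst->head = malloc(total ? total : 1);
    if (!dst->head)
        return -1;
    dst->headroom = headroom;
    dst->len = src->len;
    memcpy(demo_buf_data(dst), demo_buf_data(src), src->len);
    return 0;
}

/* Wrapper: the previous behaviour, keeping the source's headroom. */
static int demo_buf_copy(struct demo_buf *dst, const struct demo_buf *src)
{
    return demo_buf_copy_headroom(dst, src, src->headroom);
}
```

Callers that need a particular alignment choose the headroom themselves, while existing callers keep using the wrapper unchanged. The caller owns `dst->head` and releases it with `free()`.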
     

03 Dec, 2011

1 commit


02 Dec, 2011

1 commit