12 Jan, 2015

1 commit

  • Pablo Neira Ayuso says:

    ====================
    netfilter/ipvs fixes for net

    The following patchset contains netfilter/ipvs fixes, they are:

    1) Small fix for the FTP helper in IPVS, a diff variable may be left
    unset when CONFIG_IP_VS_IPV6 is set. Patch from Dan Carpenter.

    2) Fix nf_tables port NAT in little endian archs, patch from leroy
    christophe.

    3) Fix race condition between conntrack confirmation and flush from
    userspace. This is the second reincarnation to resolve this problem.

    4) Make sure inner messages in the batch come with the nfnetlink header.

    5) Relax strict check from nfnetlink_bind() that may break old userspace
    applications using all 1s group mask.

    6) Schedule removal of chains once no sets and rules refer to them in
    the new nf_tables ruleset flush command. Reported by Asbjoern Sloth
    Toennesen.

    Note that this batch comes later than usual because of the short
    winter holidays.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

23 Dec, 2014

2 commits

  • Make sure this fetches 16-bits port data from the register.
    Remove casting to make sparse happy, not needed anymore.
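
    As an illustration only (this is not the kernel code), the sketch below
    contrasts two ways of reading a 16-bit network-order port that sits in
    the first two bytes of a 32-bit register. The function names are
    hypothetical. Copying the first two bytes gives the same answer on any
    host; truncating the 32-bit value returns the first two bytes on a
    little-endian host and the last two on a big-endian one.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Byte-order independent: copy exactly the first two bytes of the
 * register, then convert from network to host order. */
static uint16_t port_from_register(const uint32_t *reg)
{
    uint16_t be_port;

    memcpy(&be_port, reg, sizeof(be_port));
    return ntohs(be_port);
}

/* Byte-order dependent: truncating the 32-bit value keeps its low
 * 16 bits, which are the first two bytes in memory only on a
 * little-endian host. */
static uint16_t port_by_truncation(const uint32_t *reg)
{
    return ntohs((uint16_t)*reg);
}

/* Hypothetical helper: place a port, in network order, in the first
 * two bytes of an otherwise zeroed register. */
static uint32_t make_register(uint16_t host_port)
{
    uint32_t reg = 0;
    uint16_t be_port = htons(host_port);

    memcpy(&reg, &be_port, sizeof(be_port));
    return reg;
}
```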

    Signed-off-by: leroy christophe
    Signed-off-by: Pablo Neira Ayuso

    leroy christophe
     
  • When xfrm6_policy_check() is used, _decode_session6() is called after some
    intermediate functions. This function uses IP6CB(), thus TCP_SKB_CB() must be
    prepared after the call of xfrm6_policy_check().

    Before this patch, scenarios with IPv6 + TCP + IPsec Transport are broken.
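
    The ordering problem can be pictured with a toy model in which two
    layers overlay different structures on the same control buffer. All
    types and function names below are hypothetical and much simpler than
    the kernel's real IP6CB()/TCP_SKB_CB() layouts; the sketch only shows
    why the IPv6 view has to be consumed before the TCP view is written.

```c
#include <stdint.h>
#include <string.h>

/* Toy stand-ins, not the kernel's structures. */
struct fake_skb   { uint8_t cb[48]; };
struct fake_ip6cb { uint16_t nhoff; uint16_t flags; };
struct fake_tcpcb { uint32_t seq; uint32_t end_seq; };

static void ip6cb_store(struct fake_skb *skb, const struct fake_ip6cb *v)
{
    memcpy(skb->cb, v, sizeof(*v));
}

static struct fake_ip6cb ip6cb_load(const struct fake_skb *skb)
{
    struct fake_ip6cb v;

    memcpy(&v, skb->cb, sizeof(v));
    return v;
}

static void tcpcb_store(struct fake_skb *skb, const struct fake_tcpcb *v)
{
    memcpy(skb->cb, v, sizeof(*v));
}

/* Safe order: the "policy check" reads the IPv6 view first, then the
 * TCP view overwrites the shared bytes. */
static uint16_t check_then_fill(struct fake_skb *skb,
                                const struct fake_tcpcb *tcp)
{
    uint16_t nhoff = ip6cb_load(skb).nhoff;

    tcpcb_store(skb, tcp);
    return nhoff;
}

/* Broken order: the TCP view is written first, so the later read of
 * the IPv6 view sees TCP data instead. */
static uint16_t fill_then_check(struct fake_skb *skb,
                                const struct fake_tcpcb *tcp)
{
    tcpcb_store(skb, tcp);
    return ip6cb_load(skb).nhoff;
}
```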

    Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
    Reported-by: Huaibin Wang
    Suggested-by: Eric Dumazet
    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     

12 Dec, 2014

1 commit

  • Pull networking updates from David Miller:

    1) New offloading infrastructure and example 'rocker' driver for
    offloading of switching and routing to hardware.

    This work was done by a large group of dedicated individuals, not
    limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend,
    Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu

    2) Start making the networking operate on IOV iterators instead of
    modifying iov objects in-situ during transfers. Thanks to Al Viro
    and Herbert Xu.

    3) A set of new netlink interfaces for the TIPC stack, from Richard
    Alpe.

    4) Remove unnecessary looping during ipv6 routing lookups, from Martin
    KaFai Lau.

    5) Add PAUSE frame generation support to gianfar driver, from Matei
    Pavaluca.

    6) Allow for larger reordering levels in TCP, which are easily
    achievable in the real world right now, from Eric Dumazet.

    7) Add a variant of napi_schedule that doesn't need to disable cpu
    interrupts, from Eric Dumazet.

    8) Use a doubly linked list to optimize neigh_parms_release(), from
    Nicolas Dichtel.

    9) Various enhancements to the kernel BPF verifier, and allow eBPF
    programs to actually be attached to sockets. From Alexei
    Starovoitov.

    10) Support TSO/LSO in sunvnet driver, from David L Stevens.

    11) Allow controlling ECN usage via routing metrics, from Florian
    Westphal.

    12) Remote checksum offload, from Tom Herbert.

    13) Add split-header receive, BQL, and xmit_more support to amd-xgbe
    driver, from Thomas Lendacky.

    14) Add MPLS support to openvswitch, from Simon Horman.

    15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen
    Klassert.

    16) Do gro flushes on a per-device basis using a timer, from Eric
    Dumazet. This tries to resolve the conflicting goals between the
    desired handling of bulk vs. RPC-like traffic.

    17) Allow userspace to ask for the CPU on which a packet was
    received/steered, via SO_INCOMING_CPU. From Eric Dumazet.

    18) Limit GSO packets to half the current congestion window, from Eric
    Dumazet.

    19) Add a generic helper so that all drivers set their RSS keys in a
    consistent way, from Eric Dumazet.

    20) Add xmit_more support to enic driver, from Govindarajulu
    Varadarajan.

    21) Add VLAN packet scheduler action, from Jiri Pirko.

    22) Support configurable RSS hash functions via ethtool, from Eyal
    Perry.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits)
    Fix race condition between vxlan_sock_add and vxlan_sock_release
    net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header
    net/mlx4: Add support for A0 steering
    net/mlx4: Refactor QUERY_PORT
    net/mlx4_core: Add explicit error message when rule doesn't meet configuration
    net/mlx4: Add A0 hybrid steering
    net/mlx4: Add mlx4_bitmap zone allocator
    net/mlx4: Add a check if there are too many reserved QPs
    net/mlx4: Change QP allocation scheme
    net/mlx4_core: Use tasklet for user-space CQ completion events
    net/mlx4_core: Mask out host side virtualization features for guests
    net/mlx4_en: Set csum level for encapsulated packets
    be2net: Export tunnel offloads only when a VxLAN tunnel is created
    gianfar: Fix dma check map error when DMA_API_DEBUG is enabled
    cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call
    net: fec: only enable mdio interrupt before phy device link up
    net: fec: clear all interrupt events to support i.MX6SX
    net: fec: reset fep link status in suspend function
    net: sock: fix access via invalid file descriptor
    net: introduce helper macro for_each_cmsghdr
    ...

    Linus Torvalds
     

11 Dec, 2014

3 commits

  • Introduce helper macro for_each_cmsghdr as a wrapper of the enumerating
    cmsghdr from msghdr, just cleanup.
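
    A userspace sketch of what such a wrapper can look like, built on the
    standard CMSG_FIRSTHDR()/CMSG_NXTHDR() macros. The macro body here is
    an assumption modelled on the description above, not a copy of the
    kernel definition, and count_cmsgs() and fill_two_int_cmsgs() are
    hypothetical helpers added for the demonstration.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Assumed shape of the helper: walk every control message in msg. */
#define for_each_cmsghdr(cmsg, msg) \
    for ((cmsg) = CMSG_FIRSTHDR(msg); (cmsg); \
         (cmsg) = CMSG_NXTHDR((msg), (cmsg)))

/* Count the control messages attached to a msghdr. */
static size_t count_cmsgs(struct msghdr *msg)
{
    struct cmsghdr *cmsg;
    size_t n = 0;

    for_each_cmsghdr(cmsg, msg)
        n++;
    return n;
}

/* Build two int-sized SCM_RIGHTS control messages in buf, which must
 * be suitably aligned and at least 2 * CMSG_SPACE(sizeof(int)) bytes.
 * The payloads are left zeroed; only the headers matter here. */
static void fill_two_int_cmsgs(struct msghdr *msg, void *buf)
{
    struct cmsghdr *cmsg;
    int i;

    memset(msg, 0, sizeof(*msg));
    memset(buf, 0, 2 * CMSG_SPACE(sizeof(int)));
    msg->msg_control = buf;
    msg->msg_controllen = 2 * CMSG_SPACE(sizeof(int));

    cmsg = CMSG_FIRSTHDR(msg);
    for (i = 0; i < 2 && cmsg; i++) {
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        if (i == 0)
            cmsg = CMSG_NXTHDR(msg, cmsg);
    }
}
```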

    Signed-off-by: Gu Zheng
    Signed-off-by: David S. Miller

    Gu Zheng
     
  • Pull VFS changes from Al Viro:
    "First pile out of several (there _definitely_ will be more). Stuff in
    this one:

    - unification of d_splice_alias()/d_materialize_unique()

    - iov_iter rewrite

    - killing a bunch of ->f_path.dentry users (and f_dentry macro).

    Getting that completed will make life much simpler for
    unionmount/overlayfs, since then we'll be able to limit the places
    sensitive to file _dentry_ to reasonably few. Which allows to have
    file_inode(file) pointing to inode in a covered layer, with dentry
    pointing to (negative) dentry in union one.

    Still not complete, but much closer now.

    - crapectomy in lustre (dead code removal, mostly)

    - "let's make seq_printf return nothing" preparations

    - assorted cleanups and fixes

    There _definitely_ will be more piles"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    copy_from_iter_nocache()
    new helper: iov_iter_kvec()
    csum_and_copy_..._iter()
    iov_iter.c: handle ITER_KVEC directly
    iov_iter.c: convert copy_to_iter() to iterate_and_advance
    iov_iter.c: convert copy_from_iter() to iterate_and_advance
    iov_iter.c: get rid of bvec_copy_page_{to,from}_iter()
    iov_iter.c: convert iov_iter_zero() to iterate_and_advance
    iov_iter.c: convert iov_iter_get_pages_alloc() to iterate_all_kinds
    iov_iter.c: convert iov_iter_get_pages() to iterate_all_kinds
    iov_iter.c: convert iov_iter_npages() to iterate_all_kinds
    iov_iter.c: iterate_and_advance
    iov_iter.c: macros for iterating over iov_iter
    kill f_dentry macro
    dcache: fix kmemcheck warning in switch_names
    new helper: audit_file()
    nfsd_vfs_write(): use file_inode()
    ncpfs: use file_inode()
    kill f_dentry uses
    lockd: get rid of ->f_path.dentry->d_sb
    ...

    Linus Torvalds
     
  • Conflicts:
    drivers/net/ethernet/amd/xgbe/xgbe-desc.c
    drivers/net/ethernet/renesas/sh_eth.c

    Overlapping changes in both conflict cases.

    Signed-off-by: David S. Miller

    David S. Miller
     

10 Dec, 2014

5 commits


09 Dec, 2014

4 commits

  • Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2014-12-03

    1) Fix a set but not used warning. From Fabian Frederick.

    2) Currently we make sequence number values available to userspace
    only if we use ESN. Make the sequence number values also available
    for non ESN states. From Zhi Ding.

    3) Remove socket policy hashing. We don't need it because socket
    policies are always looked up via a linked list. From Herbert Xu.

    4) After removing socket policy hashing, we can use __xfrm_policy_link
    in xfrm_policy_insert. From Herbert Xu.

    5) Add a lookup method for vti6 tunnels with wildcard endpoints.
    I forgot this when I initially implemented vti6.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Al Viro
     
  • The compute_score functions are a bit difficult to read.

    Neaten them a bit to reduce object sizes and make them a
    bit more intelligible.

    Return early to avoid indentation and avoid unnecessary
    initializations.

    (allyesconfig, but w/ -O2 and no profiling)

    $ size net/ipv[46]/udp.o.*
       text    data     bss     dec     hex filename
      28680    1184      25   29889    74c1 net/ipv4/udp.o.new
      28756    1184      25   29965    750d net/ipv4/udp.o.old
      17600    1010       2   18612    48b4 net/ipv6/udp.o.new
      17632    1010       2   18644    48d4 net/ipv6/udp.o.old

    Signed-off-by: Joe Perches
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Joe Perches
     
  • Allow reading of timestamps and cmsg at the same time on all relevant
    socket families. One use is to correlate timestamps with egress
    device, by asking for cmsg IP_PKTINFO.

    On AF_INET sockets, call the relevant function (ip_cmsg_recv). To
    avoid changing legacy expectations, only do so if the caller sets a
    new timestamping flag SOF_TIMESTAMPING_OPT_CMSG.

    On AF_INET6 sockets, IPV6_PKTINFO and all other recv cmsg are already
    returned for all origins. The only change is to set ifindex, which is
    not initialized for all error origins.

    In both cases, only generate the pktinfo message if an ifindex is
    known. This is not the case for ACK timestamps.

    The difference between the protocol families is probably a historical
    accident as a result of the different conditions for generating cmsg
    in the relevant ip(v6)_recv_error function:

    ipv4: if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP) {
    ipv6: if (serr->ee.ee_origin != SO_EE_ORIGIN_LOCAL) {

    At one time, this was the same test bar for the ICMP/ICMP6
    distinction. This is no longer true.
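
    From userspace, opting in on an AF_INET socket could look roughly like
    the sketch below. The function name is hypothetical, and the choice of
    software transmit timestamps is an assumption made for the example;
    only SOF_TIMESTAMPING_OPT_CMSG and IP_PKTINFO come from the text above.

```c
#include <linux/net_tstamp.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Ask for software TX timestamps and, via SOF_TIMESTAMPING_OPT_CMSG,
 * for the regular receive cmsgs (here IP_PKTINFO) to accompany them.
 * Returns 0 on success, -1 on failure with errno set by setsockopt(). */
static int enable_tx_tstamp_with_pktinfo(int fd)
{
    int on = 1;
    int flags = SOF_TIMESTAMPING_TX_SOFTWARE |
                SOF_TIMESTAMPING_SOFTWARE |
                SOF_TIMESTAMPING_OPT_CMSG;

    /* Request the pktinfo cmsg that carries the interface index. */
    if (setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on)) < 0)
        return -1;

    /* Request timestamps together with the new opt-in flag. */
    if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
                   &flags, sizeof(flags)) < 0)
        return -1;

    return 0;
}
```

    The timestamps and accompanying cmsgs would then be read with
    recvmsg() on the socket error queue (MSG_ERRQUEUE).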

    Signed-off-by: Willem de Bruijn

    ----

    Changes
    v1 -> v2
    large rewrite
    - integrate with existing pktinfo cmsg generation code
    - on ipv4: only send with new flag, to maintain legacy behavior
    - on ipv6: send at most a single pktinfo cmsg
    - on ipv6: initialize fields if not yet initialized

    The recv cmsg interfaces are also relevant to the discussion of
    whether looping packet headers is problematic. For v6, cmsgs that
    identify many headers are already returned. This patch expands
    that to v4. If it sounds reasonable, I will follow with patches

    1. request timestamps without payload with SOF_TIMESTAMPING_OPT_TSONLY
    (http://patchwork.ozlabs.org/patch/366967/)
    2. sysctl to conditionally drop all timestamps that have payload or
    cmsg from users without CAP_NET_RAW.
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

06 Dec, 2014

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following batch contains netfilter updates for net-next. Basically,
    enhancements for xt_recent, skip zeroing of timer in conntrack, fix
    linking problem with recent redirect support for nf_tables, ipset
    updates and a couple of cleanups. More specifically, they are:

    1) Raise maximum number per IP address to be remembered in xt_recent
    while retaining backward compatibility, from Florian Westphal.

    2) Skip zeroing timer area in nf_conn objects, also from Florian.

    3) Inspect IPv4 and IPv6 traffic from the bridge to allow filtering using
    meta l4proto and transport layer header, from Alvaro Neira.

    4) Fix linking problems in the new redirect support when CONFIG_IPV6=n
    and IP6_NF_IPTABLES=n.

    And ipset updates from Jozsef Kadlecsik:

    5) Support updating element extensions when the set is full (fixes
    netfilter bugzilla id 880).

    6) Fix set match with 32-bits userspace / 64-bits kernel.

    7) Indicate explicitly when /0 networks are supported in ipset.

    8) Simplify cidr handling for hash:*net* types.

    9) Allocate the proper size of memory when /0 networks are supported.

    10) Explicitly add padding elements to hash:net,net and hash:net,port,
    because the elements must be u32 sized for the used hash function.

    Jozsef is also cooking the ipset RCU conversion, which should land soon if
    it reaches the merge window in time.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

30 Nov, 2014

1 commit


29 Nov, 2014

1 commit


27 Nov, 2014

2 commits

  • This resolves linking problems with CONFIG_IPV6=n:

    net/built-in.o: In function `redirect_tg6':
    xt_REDIRECT.c:(.text+0x6d021): undefined reference to `nf_nat_redirect_ipv6'

    Reported-by: Andreas Ruprecht
    Reported-by: Or Gerlitz
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • The "init_net" test in function addrconf_exit_net is introduced
    in commit 44a6bd29 [Create ipv6 devconf-s for namespaces] to avoid freeing
    init_net. In commit c900a800 [ipv6: fix bad free of addrconf_init_net],
    function addrconf_init_net will allocate memory for every net regardless of
    init_net. In this case, the "init_net" test is unnecessary.

    CC: Hong Zhiguo
    CC: Octavian Purdila
    CC: Pavel Emelyanov
    CC: Cong Wang
    Suggested-by: David S. Miller
    Signed-off-by: Zhu Yanjun
    Signed-off-by: David S. Miller

    zhuyj
     

26 Nov, 2014

3 commits

  • More work from Al Viro to move away from modifying iovecs
    by using iov_iter instead.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • After commit ca777eff51f7 ("tcp: remove dst refcount false sharing for
    prequeue mode") we have to relax check against skb dst in
    tcp_v[46]_send_reset() if prequeue dropped the dst.

    If a socket is provided, a full lookup was done to find this socket,
    so the dst test can be skipped.

    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=88191
    Reported-by: Jaša Bartelj
    Signed-off-by: Eric Dumazet
    Reported-by: Daniel Borkmann
    Fixes: ca777eff51f7 ("tcp: remove dst refcount false sharing for prequeue mode")
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • The UDP checksum calculation for VXLAN tunnels is currently using the
    socket addresses instead of the actual packet source and destination
    addresses. As a result the checksum calculated is incorrect in some
    cases.

    Also uh->check was being set twice, first it was set to 0, and then it is
    set again in udp6_set_csum. This change removes the redundant assignment
    to 0.
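
    To see why the choice of addresses matters, here is a self-contained
    sketch of a UDP-over-IPv6 checksum, which covers a pseudo-header built
    from the source and destination addresses. This is not the kernel's
    udp6_set_csum(); the function names are hypothetical and the code
    favours clarity over speed.

```c
#include <stddef.h>
#include <stdint.h>

/* Add len bytes to a running one's-complement sum, treating the data
 * as 16-bit big-endian words (an odd trailing byte is zero-padded). */
static uint32_t csum_add_bytes(uint32_t sum, const uint8_t *p, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)p[i] << 8 | p[i + 1];
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8;
    return sum;
}

/* Fold the carries back in and return the complement. */
static uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Checksum over the IPv6 pseudo-header (source, destination, upper
 * layer length, next header 17 for UDP) followed by the UDP header
 * and payload. The checksum field inside udp[] must be zero. A real
 * sender would transmit 0xffff if the result is 0. Intended for small
 * segments; the 32-bit accumulator is not guarded against overflow. */
static uint16_t udp6_checksum(const uint8_t src[16], const uint8_t dst[16],
                              const uint8_t *udp, uint32_t udp_len)
{
    uint32_t sum = 0;

    sum = csum_add_bytes(sum, src, 16);
    sum = csum_add_bytes(sum, dst, 16);
    sum += udp_len >> 16;
    sum += udp_len & 0xffff;
    sum += 17;
    sum = csum_add_bytes(sum, udp, udp_len);
    return csum_fold(sum);
}
```

    Feeding the same segment through udp6_checksum() with a different
    source address generally yields a different value, so a checksum
    computed from the socket addresses will not verify when the packet
    actually carries other addresses.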

    Fixes: acbf74a7 ("vxlan: Refactor vxlan driver to make use of the common UDP tunnel functions.")

    Cc: Andy Zhou
    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     

25 Nov, 2014

2 commits

  • When using GRE redirection in WCCP, it sets the wrong skb->protocol,
    that is, ETH_P_IP instead of ETH_P_IPV6 for the encapsulated traffic.

    Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
    Cc: Dmitry Kozlov
    Signed-off-by: Yuri Chislov
    Tested-by: Yuri Chislov
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Yuri Chislov
     
  • Pablo Neira Ayuso says:

    ====================
    netfilter/ipvs updates for net-next

    The following patchset contains Netfilter updates for your net-next
    tree, this includes the NAT redirection support for nf_tables, the
    cgroup support for nft meta and conntrack zone support for the connlimit
    match. Coming after those, a bunch of sparse warning fixes, missing
    netns bits and cleanups. More specifically, they are:

    1) Prepare IPv4 and IPv6 NAT redirect code to use it from nf_tables,
    patches from Arturo Borrero.

    2) Introduce the nf_tables redir expression, from Arturo Borrero.

    3) Remove an unnecessary assignment in ip_vs_xmit/__ip_vs_get_out_rt().
    Patch from Alex Gartrell.

    4) Add nft_log_dereference() macro to the nf_log infrastructure, patch
    from Marcelo Leitner.

    5) Add some extra validation when registering logger families, also
    from Marcelo.

    6) Some spelling cleanups from stephen hemminger.

    7) Fix sparse warning in nf_logger_find_get().

    8) Add cgroup support to nf_tables meta, patch from Ana Rey.

    9) A Kconfig fix for the new redir expression and fix sparse warnings in
    the new redir expression.

    10) Fix several sparse warnings in the netfilter tree, from
    Florian Westphal.

    11) Reduce verbosity when OOM in nfnetlink_log. User can basically do
    nothing when this situation occurs.

    12) Add conntrack zone support to xt_connlimit, again from Florian.

    13) Add netnamespace support to the h323 conntrack helper, contributed
    by Vasily Averin.

    14) Remove unnecessary null-pointer checks before free_percpu() and
    module_put(), from Markus Elfring.

    15) Use pr_fmt in nfnetlink_log, again patch from Marcelo Leitner.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

24 Nov, 2014

4 commits

  • Signed-off-by: Al Viro

    Al Viro
     
  • Currently vti_link_ops does not set .dellink, so for the fb tunnel device
    (ip_vti0) the net_device is removed through the default .dellink,
    unregister_netdevice_queue, but the tunnel stays in the tunnel list.
    If we then add a new vti tunnel, in ip_tunnel_find():

    hlist_for_each_entry_rcu(t, head, hash_node) {
            if (local == t->parms.iph.saddr &&
                remote == t->parms.iph.daddr &&
                link == t->parms.link &&
    ==>         type == t->dev->type &&
                ip_tunnel_key_match(&t->parms, flags, key))
                    break;
    }

    a panic occurs because the dev of ip_tunnel *t is NULL:
    [ 3835.072977] IP: [] ip_tunnel_find+0x9d/0xc0 [ip_tunnel]
    [ 3835.073008] PGD b2c21067 PUD b7277067 PMD 0
    [ 3835.073008] Oops: 0000 [#1] SMP
    .....
    [ 3835.073008] Stack:
    [ 3835.073008] ffff8800b72d77f0 ffffffffa0411924 ffff8800bb956000 ffff8800b72d78e0
    [ 3835.073008] ffff8800b72d78a0 0000000000000000 ffffffffa040d100 ffff8800b72d7858
    [ 3835.073008] ffffffffa040b2e3 0000000000000000 0000000000000000 0000000000000000
    [ 3835.073008] Call Trace:
    [ 3835.073008] [] ip_tunnel_newlink+0x64/0x160 [ip_tunnel]
    [ 3835.073008] [] vti_newlink+0x43/0x70 [ip_vti]
    [ 3835.073008] [] rtnl_newlink+0x4fa/0x5f0
    [ 3835.073008] [] ? nla_strlcpy+0x5b/0x70
    [ 3835.073008] [] ? rtnl_link_ops_get+0x40/0x60
    [ 3835.073008] [] ? rtnl_newlink+0x13f/0x5f0
    [ 3835.073008] [] rtnetlink_rcv_msg+0xa4/0x270
    [ 3835.073008] [] ? sock_has_perm+0x75/0x90
    [ 3835.073008] [] ? rtnetlink_rcv+0x30/0x30
    [ 3835.073008] [] netlink_rcv_skb+0xa9/0xc0
    [ 3835.073008] [] rtnetlink_rcv+0x28/0x30
    ....

    modprobe ip_vti
    ip link del ip_vti0 type vti
    ip link add ip_vti0 type vti
    rmmod ip_vti

    Do that one or more times and the kernel will panic.

    Fix it by assigning ip_tunnel_dellink to vti_link_ops' dellink, in
    which we skip the unregister of the fb tunnel device. Do the same on ip6_vti.

    Signed-off-by: Xin Long
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    lucien
     
  • This change has no functional impact and simply addresses some coding
    style issues detected by checkpatch. Specifically this change
    adjusts "if" statements which also include the assignment of a
    variable.

    No changes to the resultant object files result as determined by objdiff.

    Signed-off-by: Ian Morris
    Signed-off-by: David S. Miller

    Ian Morris
     
  • This patch adds SKB_GSO_TCPV4 to the list of supported GSO types handled by
    the IPv6 GSO offloads. Without this change VXLAN tunnels running over IPv6
    do not currently handle IPv4 TCP TSO requests correctly and end up handing
    the non-segmented frame off to the device.

    Below is the before and after for a simple netperf TCP_STREAM test between
    two endpoints tunneling IPv4 over a VXLAN tunnel running on IPv6 on top of
    a 1Gb/s network adapter.

    Recv   Send    Send
    Socket Socket  Message  Elapsed
    Size   Size    Size     Time     Throughput
    bytes  bytes   bytes    secs.    10^6bits/sec

     87380  16384  16384    10.29       0.88  Before
     87380  16384  16384    10.03     895.69  After

    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     

22 Nov, 2014

1 commit

  • Conflicts:
    drivers/net/ieee802154/fakehard.c

    A bug fix went into 'net' for ieee802154/fakehard.c, which is removed
    in 'net-next'.

    Add build fix into the merge from Stephen Rothwell in openvswitch, the
    logging macros take a new initial 'log' argument, a new call was added
    in 'net' so when we merge that in here we have to explicitly add the
    new 'log' arg to it else the build fails.

    Signed-off-by: David S. Miller

    David S. Miller
     

20 Nov, 2014

2 commits


19 Nov, 2014

1 commit


17 Nov, 2014

2 commits

  • It has been reported that generating an MLD listener report on
    devices with large MTUs (e.g. 9000) and a high number of IPv6
    addresses can trigger a skb_over_panic():

    skbuff: skb_over_panic: text:ffffffff80612a5d len:3776 put:20
    head:ffff88046d751000 data:ffff88046d751010 tail:0xed0 end:0xec0
    dev:port1
    ------------[ cut here ]------------
    kernel BUG at net/core/skbuff.c:100!
    invalid opcode: 0000 [#1] SMP
    Modules linked in: ixgbe(O)
    CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 3.14.23+ #4
    [...]
    Call Trace:

    [] ? skb_put+0x3a/0x3b
    [] ? add_grhead+0x45/0x8e
    [] ? add_grec+0x394/0x3d4
    [] ? mld_ifc_timer_expire+0x195/0x20d
    [] ? mld_dad_timer_expire+0x45/0x45
    [] ? call_timer_fn.isra.29+0x12/0x68
    [] ? run_timer_softirq+0x163/0x182
    [] ? __do_softirq+0xe0/0x21d
    [] ? irq_exit+0x4e/0xd3
    [] ? smp_apic_timer_interrupt+0x3b/0x46
    [] ? apic_timer_interrupt+0x6a/0x70

    mld_newpack() skb allocations are usually requested with dev->mtu
    in size, since commit 72e09ad107e7 ("ipv6: avoid high order allocations")
    we have changed the limit in order to be less likely to fail.

    However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb)
    macros, which determine if we may end up doing an skb_put() for
    adding another record. To avoid possible fragmentation, we check
    the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong
    assumption as the actual max allocation size can be much smaller.

    The IGMP case doesn't have this issue as commit 57e1ab6eaddc
    ("igmp: refine skb allocations") stores the allocation size in
    the cb[].

    Set a reserved_tailroom to make it fit into the MTU and use
    skb_availroom() helper instead. This also allows to get rid of
    igmp_skb_size().
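
    The difference between the two checks can be sketched with a toy
    buffer model. The structure and function names are hypothetical and
    only mimic the idea of skb_availroom(); the point is that an
    MTU-based estimate can claim room that the allocated buffer does not
    have.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a linear buffer, not the kernel's struct sk_buff. */
struct fake_skb {
    size_t len;               /* bytes of data already in the buffer */
    size_t tail;              /* offset where the next put would start */
    size_t end;               /* offset of the end of the allocation */
    size_t reserved_tailroom; /* bytes at the end that must stay free */
};

/* Old-style check: assumes the buffer was allocated with mtu bytes. */
static bool room_by_mtu(const struct fake_skb *skb, size_t mtu, size_t need)
{
    return mtu > skb->len && mtu - skb->len >= need;
}

/* Room actually left in the allocation, minus the reserved tail. */
static size_t fake_availroom(const struct fake_skb *skb)
{
    size_t used = skb->tail + skb->reserved_tailroom;

    return skb->end > used ? skb->end - used : 0;
}

/* New-style check: based on the real allocation. */
static bool room_by_buffer(const struct fake_skb *skb, size_t need)
{
    return fake_availroom(skb) >= need;
}
```

    With a 9000-byte MTU but an allocation that ends 16 bytes after the
    current tail, room_by_mtu() accepts a 20-byte record while
    room_by_buffer() rejects it, which is the situation that leads to
    the skb_over_panic() described above.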

    Reported-by: Wei Liu
    Fixes: 72e09ad107e7 ("ipv6: avoid high order allocations")
    Signed-off-by: Daniel Borkmann
    Cc: Eric Dumazet
    Cc: Hannes Frederic Sowa
    Cc: David L Stevens
    Acked-by: Eric Dumazet
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Pablo Neira Ayuso says:

    ====================
    Netfilter/IPVS fixes for net

    The following patchset contains Netfilter updates for your net tree,
    they are:

    1) Fix missing initialization of the range structure (allocated in the
    stack) in nft_masq_{ipv4, ipv6}_eval, from Daniel Borkmann.

    2) Make sure the data we receive from userspace contains the req_version
    structure, otherwise return an error on incomplete or truncated input.
    From Dan Carpenter.

    3) Fix handling of skb->sk which may cause incorrect handling
    of connections from a local process. Via Simon Horman, patch from
    Calvin Owens.

    4) Fix wrong netns in nft_compat when setting target and match params
    structure.

    5) Relax chain type validation in nft_compat that was recently included,
    this broke the matches that need to be run from the route chain type.
    Now iptables-test.py automated regression tests report success again
    and we avoid the only possible problematic case, which is the use of
    nat targets out of nat chain type.

    6) Use match->table to validate the tablename, instead of the match->name.
    Again patch for nft_compat.

    7) Restore the synchronous release of objects from the commit and abort
    path in nf_tables. This is causing two major problems: splats when using
    nft_compat, given that matches and targets may sleep and call_rcu is
    invoked from softirq context. Moreover Patrick reported possible event
    notification reordering when rules refer to anonymous sets.

    8) Fix race condition in between packets that are being confirmed by
    conntrack and the ctnetlink flush operation. This happens since the
    removal of the central spinlock. Thanks to Jesper D. Brouer for looking
    into this.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

13 Nov, 2014

1 commit

  • net/bridge/br_netfilter.c:870:6: symbol 'br_netfilter_enable' was not declared. Should it be static?
    no; add include
    net/ipv4/netfilter/nft_reject_ipv4.c:22:6: symbol 'nft_reject_ipv4_eval' was not declared. Should it be static?
    yes
    net/ipv6/netfilter/nf_reject_ipv6.c:16:6: symbol 'nf_send_reset6' was not declared. Should it be static?
    no; add include
    net/ipv6/netfilter/nft_reject_ipv6.c:22:6: symbol 'nft_reject_ipv6_eval' was not declared. Should it be static?
    yes
    net/netfilter/core.c:33:32: symbol 'nf_ipv6_ops' was not declared. Should it be static?
    no; add include
    net/netfilter/xt_DSCP.c:40:57: cast truncates bits from constant value (ffffff03 becomes 3)
    net/netfilter/xt_DSCP.c:57:59: cast truncates bits from constant value (ffffff03 becomes 3)
    add __force, 3 is what we want.
    net/ipv4/netfilter/nf_log_arp.c:77:6: symbol 'nf_log_arp_packet' was not declared. Should it be static?
    yes
    net/ipv4/netfilter/nf_reject_ipv4.c:17:6: symbol 'nf_send_reset' was not declared. Should it be static?
    no; add include

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

12 Nov, 2014

3 commits

  • Currently there are only three neigh tables in the whole kernel:
    arp table, ndisc table and decnet neigh table. What's more,
    we don't support registering multiple tables per family.
    Therefore we can just make these tables statically built-in.

    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     
  • Use the more common dynamic_debug capable net_dbg_ratelimited
    and remove the LIMIT_NETDEBUG macro.

    All messages are still ratelimited.

    Some KERN_ uses are changed to KERN_DEBUG.

    This may have some negative impact on messages that were
    emitted at KERN_INFO that are not enabled at all unless
    DEBUG is defined or dynamic_debug is enabled. Even so,
    these messages are now _not_ emitted by default.

    This also eliminates the use of the net_msg_warn sysctl
    "/proc/sys/net/core/warnings". For backward compatibility,
    the sysctl is not removed, but it has no function. The extern
    declaration of net_msg_warn is removed from sock.h and made
    static in net/core/sysctl_net_core.c

    Miscellanea:

    o Update the sysctl documentation
    o Remove the embedded uses of pr_fmt
    o Coalesce format fragments
    o Realign arguments

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     
  • An alternative to RPS/RFS is to use hardware support for multiple
    queues.

    Then split a set of millions of sockets into worker threads, each
    one using epoll() to manage events on its own socket pool.

    Ideally, we want one thread per RX/TX queue/cpu, but we have no way to
    know after accept() or connect() on which queue/cpu a socket is managed.

    We normally use one cpu per RX queue (IRQ smp_affinity being properly
    set), so remembering on socket structure which cpu delivered last packet
    is enough to solve the problem.

    After accept(), connect(), or even file descriptor passing around
    processes, applications can use:

    int cpu;
    socklen_t len = sizeof(cpu);

    getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len);

    And use this information to put the socket into the right silo
    for optimal performance, as all networking stack should run
    on the appropriate cpu, without need to send IPI (RPS/RFS).
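
    A slightly fuller sketch of the same idea, with error handling and a
    simple mapping from CPU to worker. The helper names and the modulo
    mapping are hypothetical, and the fallback value 49 for
    SO_INCOMING_CPU is the asm-generic one, assumed here for older
    headers (some architectures use a different number).

```c
#include <sys/socket.h>

#ifndef SO_INCOMING_CPU
#define SO_INCOMING_CPU 49 /* asm-generic value; assumption for old headers */
#endif

/* Return the CPU that handled the last packet for this socket,
 * or -1 if the option cannot be read. */
static int socket_incoming_cpu(int fd)
{
    int cpu = -1;
    socklen_t len = sizeof(cpu);

    if (getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len) < 0)
        return -1;
    return cpu;
}

/* Hypothetical policy: map the CPU to one of nr_workers silos, and
 * fall back to worker 0 when the CPU is unknown. */
static unsigned int pick_worker(int cpu, unsigned int nr_workers)
{
    if (cpu < 0 || nr_workers == 0)
        return 0;
    return (unsigned int)cpu % nr_workers;
}
```

    After accept(), a server could call
    pick_worker(socket_incoming_cpu(fd), nr_workers) and hand the
    descriptor to that worker's epoll set.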

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet