09 Feb, 2016

1 commit

  • Petr Novopashenniy reported that ICMP redirects on SYN_RECV sockets
    were leading to RST.

    This is of course incorrect.

    Only a specific list of ICMP messages should be able to drop a SYN_RECV.

    For instance, a REDIRECT on SYN_RECV shall be ignored, as we do
    not hold a dst per SYN_RECV pseudo request.

    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111751
    Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
    Reported-by: Petr Novopashenniy
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

08 Feb, 2016

1 commit

  • Silence lockdep false positive about rcu_dereference() being
    used in the wrong context.

    First one should use rcu_dereference_protected() as we own the spinlock.

    Second one should be a normal assignment, as no barrier is needed.

    Fixes: 18367681a10bd ("ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.")
    Reported-by: Dave Jones
    Signed-off-by: Eric Dumazet
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Eric Dumazet
     

06 Feb, 2016

1 commit

  • An RCU stall with the following backtrace was seen on a system with
    forwarding, optimistic_dad and use_optimistic set. To reproduce,
    set these flags and allow IPv6 autoconf.

    This occurs because the device write_lock is acquired while already
    holding the read_lock. Back trace below -

    INFO: rcu_preempt self-detected stall on CPU { 1} (t=2100 jiffies
    g=3992 c=3991 q=4471)
    Task dump for CPU 1:
    kworker/1:0 R running task 12168 15 2 0x00000002
    Workqueue: ipv6_addrconf addrconf_dad_work
    Call trace:
    el1_irq+0x68/0xdc
    _raw_write_lock_bh+0x20/0x30
    __ipv6_dev_ac_inc+0x64/0x1b4
    addrconf_join_anycast+0x9c/0xc4
    __ipv6_ifa_notify+0x160/0x29c
    ipv6_ifa_notify+0x50/0x70
    addrconf_dad_work+0x314/0x334
    process_one_work+0x244/0x3fc
    worker_thread+0x2f8/0x418
    kthread+0xe0/0xec

    v2: do addrconf_dad_kick inside read lock and then acquire write
    lock for ipv6_ifa_notify as suggested by Eric

    Fixes: 7fd2561e4ebdd ("net: ipv6: Add a sysctl to make optimistic
    addresses useful candidates")

    Cc: Eric Dumazet
    Cc: Erik Kline
    Cc: Hannes Frederic Sowa
    Signed-off-by: Subash Abhinov Kasiviswanathan
    Acked-by: Hannes Frederic Sowa
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    subashab@codeaurora.org
     

02 Feb, 2016

1 commit

  • Pull networking fixes from David Miller:
    "This looks like a lot but it's a mixture of regression fixes as well
    as fixes for longer standing issues.

    1) Fix on-channel cancellation in mac80211, from Johannes Berg.

    2) Handle CHECKSUM_COMPLETE properly in xt_TCPMSS netfilter xtables
    module, from Eric Dumazet.

    3) Avoid infinite loop in UDP SO_REUSEPORT logic, also from Eric
    Dumazet.

    4) Avoid a NULL deref if we try to set SO_REUSEPORT after a socket is
    bound, from Craig Gallek.

    5) GRO key comparisons don't take lightweight tunnels into account,
    from Jesse Gross.

    6) Fix struct pid leak via SCM credentials in AF_UNIX, from Eric
    Dumazet.

    7) We need to set the rtnl_link_ops of ipv6 SIT tunnels before we
    register them, otherwise the NEWLINK netlink message is missing
    the proper attributes. From Thadeu Lima de Souza Cascardo.

    8) Several Spectrum chip bug fixes for mlxsw switch driver, from Ido
    Schimmel

    9) Handle fragments properly in ipv4 early socket demux, from Eric
    Dumazet.

    10) Don't ignore the ifindex key specifier on ipv6 output route
    lookups, from Paolo Abeni"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (128 commits)
    tcp: avoid cwnd undo after receiving ECN
    irda: fix a potential use-after-free in ircomm_param_request
    net: tg3: avoid uninitialized variable warning
    net: nb8800: avoid uninitialized variable warning
    net: vxge: avoid unused function warnings
    net: bgmac: clarify CONFIG_BCMA dependency
    net: hp100: remove unnecessary #ifdefs
    net: davinci_cpdma: use dma_addr_t for DMA address
    ipv6/udp: use sticky pktinfo egress ifindex on connect()
    ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail()
    netlink: not trim skb for mmaped socket when dump
    vxlan: fix a out of bounds access in __vxlan_find_mac
    net: dsa: mv88e6xxx: fix port VLAN maps
    fib_trie: Fix shift by 32 in fib_table_lookup
    net: moxart: use correct accessors for DMA memory
    ipv4: ipconfig: avoid unused ic_proto_used symbol
    bnxt_en: Fix crash in bnxt_free_tx_skbs() during tx timeout.
    bnxt_en: Exclude rx_drop_pkts hw counter from the stack's rx_dropped counter.
    bnxt_en: Ring free response from close path should use completion ring
    net_sched: drr: check for NULL pointer in drr_dequeue
    ...

    Linus Torvalds
     

30 Jan, 2016

2 commits

  • Currently, the egress interface index specified via IPV6_PKTINFO
    is ignored by __ip6_datagram_connect(), so that RFC 3542 section 6.7
    can be subverted when the user space application calls connect()
    before sendmsg().
    Fix it by properly initializing flowi6_oif in connect() before
    performing the route lookup.

    Signed-off-by: Paolo Abeni
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Paolo Abeni
     
  • The current implementation of ip6_dst_lookup_tail() basically
    ignores the egress ifindex match. If the saddr is set,
    ip6_route_output() purposefully ignores flowi6_oif, due
    to commit d46a9d678e4c ("net: ipv6: Dont add RT6_LOOKUP_F_IFACE
    flag if saddr set"). If the saddr is 'any', the first route lookup
    in ip6_dst_lookup_tail() fails, but upon failure a second lookup is
    performed with saddr set, thus ignoring the ifindex constraint.

    This commit adds an output route lookup function variant, which
    allows the caller to specify lookup flags, and modifies
    ip6_dst_lookup_tail() to enforce the ifindex match on the second
    lookup via said helper.

    ip6_route_output() now becomes a static inline function built on
    top of ip6_route_output_flags(); as a side effect, out-of-tree
    modules now need a GPL license to access the output route lookup
    functionality.

    Signed-off-by: Paolo Abeni
    Acked-by: Hannes Frederic Sowa
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Paolo Abeni
     

26 Jan, 2016

2 commits

  • When creating a SIT tunnel with ip tunnel, rtnl_link_ops is not set before
    ipip6_tunnel_create is called. When register_netdevice is called, there is
    no linkinfo attribute in the NEWLINK message because of that.

    Setting rtnl_link_ops before calling register_netdevice fixes that.

    Signed-off-by: Thadeu Lima de Souza Cascardo
    Signed-off-by: David S. Miller

    Thadeu Lima de Souza Cascardo
     
  • The ESP algorithms using CBC mode require echainiv. Hence INET*_ESP have
    to select CRYPTO_ECHAINIV in order to work properly. This solves the
    issues caused by a misconfiguration as described in [1].
    The original approach, patching crypto/Kconfig was turned down by
    Herbert Xu [2].

    [1] https://lists.strongswan.org/pipermail/users/2015-December/009074.html
    [2] http://marc.info/?l=linux-crypto-vger&m=145224655809562&w=2

    Signed-off-by: Thomas Egerer
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Thomas Egerer
     

21 Jan, 2016

1 commit

  • tcp_memcontrol.c only contains legacy memory.tcp.kmem.* file definitions
    and mem_cgroup->tcp_mem init/destroy stuff. This doesn't belong to
    network subsys. Let's move it to memcontrol.c. This also allows us to
    reuse generic code for handling legacy memcg files.

    Signed-off-by: Vladimir Davydov
    Acked-by: Johannes Weiner
    Cc: "David S. Miller"
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

20 Jan, 2016

1 commit

  • Using a combination of connected and un-connected sockets, Dmitry
    was able to trigger soft lockups with his fuzzer.

    The problem is that sockets in the SO_REUSEPORT array might have
    different scores.

    Right after sk2=socket(), setsockopt(sk2,...,SO_REUSEPORT, on) and
    bind(sk2, ...), but _before_ the connect(sk2) is done, sk2 is added into
    the soreuseport array, with a score which is smaller than the score of
    first socket sk1 found in hash table (I am speaking of the regular UDP
    hash table), if sk1 had the connect() done, giving a +8 to its score.

    hash bucket [X] -> sk1 -> sk2 -> NULL

    sk1 score = 14 (because it did a connect())
    sk2 score = 6

    SO_REUSEPORT fast selection is an optimization. If it turns out the
    score of the selected socket does not match the score of the first
    socket, just fall back to the old SO_REUSEPORT logic instead of trying
    to be too smart.

    Normal SO_REUSEPORT users do not mix different kinds of sockets, as this
    mechanism is used to load-balance traffic.

    Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection")
    Reported-by: Dmitry Vyukov
    Signed-off-by: Eric Dumazet
    Cc: Craig Gallek
    Acked-by: Craig Gallek
    Signed-off-by: David S. Miller

    Eric Dumazet
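
The fallback described in this commit can be modelled in a few lines of userspace C. This is a simplified sketch under stated assumptions, not the kernel's lookup code: struct model_sock, slow_select() and select_sock() are hypothetical names, the scores are plain integers, and the "old logic" is reduced to picking the first socket with the highest score.

```c
#include <stddef.h>

/* Hypothetical stand-in for a UDP socket in one hash bucket. */
struct model_sock {
    int id;
    int score;
};

/* Simplified "old" logic: first socket with the highest score. */
const struct model_sock *slow_select(const struct model_sock *bucket, size_t n)
{
    const struct model_sock *best = NULL;

    for (size_t i = 0; i < n; i++)
        if (!best || bucket[i].score > best->score)
            best = &bucket[i];
    return best;
}

/*
 * fast_pick is whatever the reuseport array proposed. It is accepted only
 * when its score matches the best score in the bucket; otherwise the
 * slow-path result is used instead.
 */
const struct model_sock *select_sock(const struct model_sock *bucket, size_t n,
                                     const struct model_sock *fast_pick)
{
    const struct model_sock *best = slow_select(bucket, n);

    if (best && fast_pick && fast_pick->score == best->score)
        return fast_pick;
    return best;
}
```

With the bucket from the message (sk1 score 14, sk2 score 6), proposing sk2 as the fast pick yields sk1, because the scores differ. When both sockets have the same score, the fast pick is kept.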
     

16 Jan, 2016

2 commits

  • Pull networking fixes from David Miller:
    "A quick set of bug fixes after their initial networking merge:

    1) Netlink multicast group storage allocator only was tested with
    nr_groups equal to 1, make it work for other values too. From
    Matti Vaittinen.

    2) Check build_skb() return value in macb and hip04_eth drivers, from
    Weidong Wang.

    3) Don't leak x25_asy on x25_asy_open() failure.

    4) More DMA map/unmap fixes in 3c59x from Neil Horman.

    5) Don't clobber IP skb control block during GSO segmentation, from
    Konstantin Khlebnikov.

    6) ECN helpers for ipv6 don't fixup the checksum, from Eric Dumazet.

    7) Fix SKB segment utilization estimation in xen-netback, from David
    Vrabel.

    8) Fix lockdep splat in bridge addrlist handling, from Nikolay
    Aleksandrov"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
    bgmac: Fix reversed test of build_skb() return value.
    bridge: fix lockdep addr_list_lock false positive splat
    net: smsc: Add support h8300
    xen-netback: free queues after freeing the net device
    xen-netback: delete NAPI instance when queue fails to initialize
    xen-netback: use skb to determine number of required guest Rx requests
    net: sctp: Move sequence start handling into sctp_transport_get_idx()
    ipv6: update skb->csum when CE mark is propagated
    net: phy: turn carrier off on phy attach
    net: macb: clear interrupts when disabling them
    sctp: support to lookup with ep+paddr in transport rhashtable
    net: hns: fixes no syscon error when init mdio
    dts: hisi: fixes no syscon fault when init mdio
    net: preserve IP control block during GSO segmentation
    fsl/fman: Delete one function call "put_device" in dtsec_config()
    hip04_eth: fix missing error handle for build_skb failed
    3c59x: fix another page map/single unmap imbalance
    3c59x: balance page maps and unmaps
    x25_asy: Free x25_asy on x25_asy_open() failure.
    mlxsw: fix SWITCHDEV_OBJ_ID_PORT_MDB
    ...

    Linus Torvalds
     
  • When a tunnel decapsulates the outer header, it has to comply
    with RFC 6040 and eventually propagate CE mark into inner header.

    It turns out IP6_ECN_set_ce() does not correctly update skb->csum
    for CHECKSUM_COMPLETE packets, triggering infamous "hw csum failure"
    messages and stack traces.

    Signed-off-by: Eric Dumazet
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Eric Dumazet
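
To illustrate why the stored checksum must follow the CE mark, here is a minimal userspace sketch in C. It is not the kernel's implementation: fold16(), ones_sum(), csum_replace16() and set_ce_and_fix_sum() are hypothetical helpers, the running sum is modelled as a 16-bit ones' complement sum over big-endian words, and the sketch does not check that the packet is ECN-capable before marking it.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones' complement sum. */
uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Ones' complement sum over big-endian 16-bit words (len assumed even). */
uint16_t ones_sum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(buf[i] << 8 | buf[i + 1]);
    return fold16(sum);
}

/* sum - from + to in ones' complement arithmetic (subtract = add complement). */
uint16_t csum_replace16(uint16_t sum, uint16_t from, uint16_t to)
{
    return fold16((uint32_t)sum + (uint16_t)~from + to);
}

/*
 * Set the CE codepoint (both ECN bits) in an IPv6 header and return the
 * adjusted running sum. The ECN field is the low two bits of the traffic
 * class, which land in bits 5..4 of the second header byte.
 */
uint16_t set_ce_and_fix_sum(uint8_t *hdr, uint16_t sum)
{
    uint16_t from = (uint16_t)(hdr[0] << 8 | hdr[1]);

    hdr[1] |= 0x30;

    uint16_t to = (uint16_t)(hdr[0] << 8 | hdr[1]);

    return csum_replace16(sum, from, to);
}
```

If the header were modified without adjusting the sum, a later comparison against a full recomputation would fail, which is the situation the "hw csum failure" message reports.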
     

15 Jan, 2016

1 commit


12 Jan, 2016

2 commits

  • Conflicts:
    drivers/net/bonding/bond_main.c
    drivers/net/ethernet/mellanox/mlxsw/spectrum.h
    drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c

    The bond_main.c and mellanox switch conflicts were cases of
    overlapping changes.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Commit acf8dd0a9d0b ("udp: only allow UFO for packets from SOCK_DGRAM
    sockets") disallows UFO for packets sent from raw sockets. We need to do
    the same also for SOCK_DGRAM sockets with SO_NO_CHECK options, even if
    for a bit different reason: while such socket would override the
    CHECKSUM_PARTIAL set by ip_ufo_append_data(), gso_size is still set and
    bad offloading flags warning is triggered in __skb_gso_segment().

    In the IPv6 case, SO_NO_CHECK option is ignored but we need to disallow
    UFO for packets sent by sockets with UDP_NO_CHECK6_TX option.

    Signed-off-by: Michal Kubecek
    Tested-by: Shannon Nelson
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Michal Kubeček
     

11 Jan, 2016

2 commits

  • When first SYNACK is sent, we already hold rcu_read_lock(), but this
    is not true if a SYNACK is retransmitted, as a timer (soft) interrupt
    does not hold rcu_read_lock()

    Fixes: 45f6fad84cc30 ("ipv6: add complete rcu protection around np->opt")
    Reported-by: Dave Jones
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Userspace needs to know why the address is being removed so that it can
    perhaps obtain a new address.

    Without the DADFAILED flag it's impossible to distinguish removal of a
    temporary and tentative address due to DAD failure from other reasons (device
    removed, manual address removal).

    Signed-off-by: Lubomir Rintel
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Lubomir Rintel
     

09 Jan, 2016

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains Netfilter updates for net-next, they are:

    1) Release nf_tables objects on netns destructions via
    nft_release_afinfo().

    2) Destroy basechain and rules on netdevice removal in the new netdev
    family.

    3) Get rid of defensive check against removal of inactive objects in
    nf_tables.

    4) Pass down netns pointer to our existing nfnetlink callbacks, as well
    as commit() and abort() nfnetlink callbacks.

    5) Allow to invert limit expression in nf_tables, so we can throttle
    overlimit traffic.

    6) Add packet duplication for the netdev family.

    7) Add forward expression for the netdev family.

    8) Define pr_fmt() in conntrack helpers.

    9) Don't leave nfqueue configuration on inconsistent state in case of
    errors, from Ken-ichirou MATSUZAWA, follow up patches are also from
    him.

    10) Skip queue option handling after unbind.

    11) Return an error on unknown commands in both nfqueue and nflog.

    12) Autoload ctnetlink when NFQA_CFG_F_CONNTRACK is set.

    13) Add new NFTA_SET_USERDATA attribute to store user data in sets,
    from Carlos Falgueras.

    14) Add support for 64-bit byte-order changes to nf_tables, from Florian
    Westphal.

    15) Add conntrack byte/packet counter matching support to nf_tables,
    also from Florian.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

06 Jan, 2016

2 commits

  • This socket-lookup path did not pass along the skb in question
    in my original BPF-based socket selection patch. The skb in the
    udpN_lib_lookup2 path can be used for BPF-based socket selection just
    like it is in the 'traditional' udpN_lib_lookup path.

    udpN_lib_lookup2 kicks in when there are more than 10 sockets in
    the same hlist slot. Coincidentally, I chose 10 sockets per
    reuseport group in my functional test, so the lookup2 path was not
    exercised. This adds an additional set of tests with 20 sockets.

    Fixes: 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
    Fixes: 3ca8e4029969 ("soreuseport: BPF selection functional test")
    Suggested-by: Eric Dumazet
    Signed-off-by: Craig Gallek
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Craig Gallek
     
  • The only user was removed in commit
    029f7f3b8701cc7a ("netfilter: ipv6: nf_defrag: avoid/free clone operations").

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     

05 Jan, 2016

3 commits

  • Expose socket options for setting a classic or extended BPF program
    for use when selecting sockets in an SO_REUSEPORT group. These options
    can be used on the first socket to belong to a group before bind or
    on any socket in the group after bind.

    This change includes refactoring of the existing sk_filter code to
    allow reuse of the existing BPF filter validation checks.

    Signed-off-by: Craig Gallek
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Craig Gallek
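
From userspace, the classic BPF variant of these options can be used roughly as follows. This is a hedged sketch, not code from the commit: build_cpu_mod_prog() and attach_reuseport_cbpf() are hypothetical helper names, the program (select a socket index as the receiving CPU number modulo the group size) is only one possible policy, and the fallback value 51 for SO_ATTACH_REUSEPORT_CBPF is an assumption that holds only on architectures using the asm-generic socket constants.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <linux/filter.h>

#ifndef SO_ATTACH_REUSEPORT_CBPF
/* Assumption: asm-generic value, used only if the libc headers lack it. */
#define SO_ATTACH_REUSEPORT_CBPF 51
#endif

/*
 * Fill a three-instruction classic BPF program that returns
 * (current CPU) % group_size as the index of the socket to use.
 */
struct sock_fprog build_cpu_mod_prog(struct sock_filter code[3],
                                     uint32_t group_size)
{
    /* A = number of the CPU processing the packet */
    code[0] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                                           SKF_AD_OFF + SKF_AD_CPU);
    /* A = A % group_size */
    code[1] = (struct sock_filter)BPF_STMT(BPF_ALU | BPF_MOD | BPF_K,
                                           group_size);
    /* return A */
    code[2] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_A, 0);

    struct sock_fprog prog = { .len = 3, .filter = code };
    return prog;
}

/* Attach the program to a socket that already has SO_REUSEPORT set. */
int attach_reuseport_cbpf(int fd, const struct sock_fprog *prog)
{
    return setsockopt(fd, SOL_SOCKET, SO_ATTACH_REUSEPORT_CBPF,
                      prog, sizeof(*prog));
}
```

As the message states, the attach call would be made either on the first socket of the group before bind, or on any member after bind. The return value of setsockopt() should be checked, since kernels without this feature reject the option.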
     
  • Include a struct sock_reuseport instance when a UDP socket binds to
    a specific address for the first time with the reuseport flag set.
    When selecting a socket for an incoming UDP packet, use the information
    available in sock_reuseport if present.

    This required adding an additional field to the UDP source address
    equality function to differentiate between exact and wildcard matches.
    The original use case allowed wildcard matches when checking for
    existing port uses during bind. The new use case of adding a socket
    to a reuseport group requires exact address matching.

    Performance test (using a machine with 2 CPU sockets and a total of
    48 cores): Create reuseport groups of varying size. Use one socket
    from this group per user thread (pinning each thread to a different
    core) calling recvmmsg in a tight loop. Record number of messages
    received per second while saturating a 10G link.
    10 sockets: 18% increase (~2.8M -> 3.3M pkts/s)
    20 sockets: 14% increase (~2.9M -> 3.3M pkts/s)
    40 sockets: 13% increase (~3.0M -> 3.4M pkts/s)

    This work is based off a similar implementation written by
    Ying Cai for implementing policy-based reuseport
    selection.

    Signed-off-by: Craig Gallek
    Signed-off-by: David S. Miller

    Craig Gallek
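
The distinction between wildcard and exact matching mentioned above can be sketched as a small standalone model. This is an illustration under assumptions, not the kernel's equality function: model_rcv_saddr_equal() and MODEL_ADDR_ANY are hypothetical names, and addresses are reduced to 32-bit integers with 0 standing for the wildcard address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical wildcard value, playing the role of INADDR_ANY. */
#define MODEL_ADDR_ANY 0u

/*
 * Compare two bound addresses.
 * match_wildcard = true:  a wildcard on either side counts as a match
 *                         (bind-time port conflict check).
 * match_wildcard = false: only identical addresses match
 *                         (joining a reuseport group).
 */
bool model_rcv_saddr_equal(uint32_t a, uint32_t b, bool match_wildcard)
{
    if (a == b)
        return true;
    return match_wildcard && (a == MODEL_ADDR_ANY || b == MODEL_ADDR_ANY);
}
```

In this model a socket bound to the wildcard address conflicts with one bound to 127.0.0.1 during bind, but the two would not be placed in the same reuseport group.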
     
  • Backport of this upstream commit into stable kernels :
    89c22d8c3b27 ("net: Fix skb csum races when peeking")
    exposed a bug in udp stack vs MSG_PEEK support, when user provides
    a buffer smaller than skb payload.

    In this case,
    skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr),
    msg->msg_iov);
    returns -EFAULT.

    This bug does not happen in upstream kernels since Al Viro did a great
    job replacing this with:
    skb_copy_and_csum_datagram_msg(skb, sizeof(struct udphdr), msg);
    This variant is safe vs short buffers.

    For the time being, instead of reverting Herbert Xu's patch and adding
    back the invalid skb->ip_summed changes, simply store the result of
    udp_lib_checksum_complete() so that we avoid computing the checksum a
    second time, and avoid the problematic
    skb_copy_and_csum_datagram_iovec() call.

    This patch can be applied on recent kernels as it avoids a double
    checksumming, then backported to stable kernels as a bug fix.

    Signed-off-by: Eric Dumazet
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Eric Dumazet
     

01 Jan, 2016

1 commit


29 Dec, 2015

1 commit

  • We have to release the existing objects on netns removal, otherwise we
    leak them. Chains are unregistered first to make sure no
    packets are walking on our rules and sets anymore.

    The object release happens when we unregister the family via
    nft_release_afinfo(), which is called from nft_unregister_afinfo() from
    the corresponding __net_exit path in every family.

    Reported-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

26 Dec, 2015

1 commit


24 Dec, 2015

1 commit

  • Marc Haber reported we don't honor interface indexes when we receive link
    local router addresses in router advertisements. Luckily the non-strict
    version of ipv6_chk_addr already does the correct job here, so we can
    simply use it to lighten the checks and use those addresses by default
    without any configuration change.

    Link:
    Reported-by: Marc Haber
    Cc: Marc Haber
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

23 Dec, 2015

5 commits

  • Hannes points out that when we generate tcp reset for timewait sockets we
    pretend we found no socket and pass NULL sk to tcp_vX_send_reset().

    Make it cope with inet tw sockets and then provide tw sk.

    This makes RSTs appear on correct interface when SO_BINDTODEVICE is used.

    Packetdrill test case:
    // want default route to be used, we rely on BINDTODEVICE
    `ip route del 192.0.2.0/24 via 192.168.0.2 dev tun0`

    0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
    // test case still works due to BINDTODEVICE
    0.001 setsockopt(3, SOL_SOCKET, SO_BINDTODEVICE, "tun0", 4) = 0
    0.100...0.200 connect(3, ..., ...) = 0

    0.100 > S 0:0(0)
    0.200 < S. 0:0(0) ack 1 win 32792
    0.200 > . 1:1(0) ack 1

    0.210 close(3) = 0

    0.210 > F. 1:1(0) ack 1 win 29200
    0.300 < . 1:1(0) ack 2 win 46

    // more data while in FIN_WAIT2, expect RST
    1.300 < P. 1:1001(1000) ack 1 win 46

    // fails without this change -- default route is used
    1.301 > R 1:1(0) win 0

    Reported-by: Hannes Frederic Sowa
    Signed-off-by: Florian Westphal
    Acked-by: Eric Dumazet
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Florian Westphal
     
  • tcp_md5_do_lookup requires a full socket, so once we extend
    _send_reset() to also accept timewait socket we would have to change

    if (!sk && hash_location)

    to something like

    if ((!sk || !sk_fullsock(sk)) && hash_location) {
    ...
    } else {
    (sk && sk_fullsock(sk)) tcp_md5_do_lookup()
    }

    Switch the two branches: check if we have a socket first, then
    fall back to a listener lookup if we saw a md5 option (hash_location).

    Signed-off-by: Florian Westphal
    Acked-by: Eric Dumazet
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Florian Westphal
     
  • When sysctl performs restrict writes, it allows writing from
    a middle position of a sysctl file, which requires us to initialize
    the table data before calling proc_dostring() for the write case.

    Fixes: 3d1bec99320d ("ipv6: introduce secret_stable to ipv6_devconf")
    Reported-by: Sasha Levin
    Acked-by: Hannes Frederic Sowa
    Tested-by: Sasha Levin
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     
  • Steffen Klassert says:

    ====================
    pull request (net): ipsec 2015-12-22

    Just one patch to fix dst_entries_init with multiple namespaces.
    From Dan Streetman.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • ip6addrlbl_get() has never worked. If ip6addrlbl_hold() succeeded,
    ip6addrlbl_get() will exit with '-ESRCH'. If ip6addrlbl_hold() failed,
    ip6addrlbl_get() will use an about-to-be-freed ip6addrlbl_entry pointer.

    Fix this by inverting ip6addrlbl_hold() check.

    Fixes: 2a8cc6c89039 ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.")
    Signed-off-by: Andrey Ryabinin
    Reviewed-by: Cong Wang
    Acked-by: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    Andrey Ryabinin
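
The inverted check is easy to see in a small userspace model. The sketch below is an assumption-laden illustration, not the kernel code: struct entry, entry_hold() and entry_get() are hypothetical names, and the reference count uses C11 atomics with "take a reference unless the count is already zero" semantics, so entry_hold() returns true on success.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted table entry. */
struct entry {
    atomic_int refcnt;
};

/* Take a reference unless the count already dropped to zero. */
bool entry_hold(struct entry *e)
{
    int old = atomic_load(&e->refcnt);

    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&e->refcnt, &old, old + 1))
            return true;
    }
    return false;
}

/*
 * Corrected lookup tail: discard the entry only when the hold FAILED.
 * The buggy form tested "found && entry_hold(found)", which dropped
 * live entries and kept dying ones.
 */
struct entry *entry_get(struct entry *found)
{
    if (found && !entry_hold(found))
        found = NULL;
    return found;
}
```

A caller treats a NULL result as "not found" and must drop the extra reference once it is done with a non-NULL result.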
     

19 Dec, 2015

4 commits

  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains the first batch of Netfilter updates for
    the upcoming 4.5 kernel. This batch contains userspace netfilter header
    compilation fixes, support for packet mangling in nf_tables, the new
    tracing infrastructure for nf_tables and cgroup2 support for iptables.
    More specifically, they are:

    1) Two patches to include dependencies in our netfilter userspace
    headers to resolve compilation problems, from Mikko Rapeli.

    2) Four cosmetic cleanup patches for the ebtables codebase, from Ian Morris.

    3) Remove duplicate include in the netfilter reject infrastructure,
    from Stephen Hemminger.

    4) Two patches to simplify the netfilter defragmentation code for IPv6,
    patch from Florian Westphal.

    5) Fix root ownership of /proc/net netfilter for unprivileged net
    namespaces, from Philip Whineray.

    6) Get rid of unused fields in struct nft_pktinfo, from Florian Westphal.

    7) Add mangling support to our nf_tables payload expression, from
    Patrick McHardy.

    8) Introduce a new netlink-based tracing infrastructure for nf_tables,
    from Florian Westphal.

    9) Change setter functions in nfnetlink_log to be void, from
    Rami Rosen.

    10) Add netns support to the cttimeout infrastructure.

    11) Add cgroup2 support to iptables, from Tejun Heo.

    12) Introduce nfnl_dereference_protected() in nfnetlink, from Florian.

    13) Add support for mangling pkttype in the nf_tables meta expression,
    also from Florian.

    BTW, I need that you pull net into net-next, I have another batch that
    requires changes that I don't yet see in net.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Allow accepted sockets to derive their sk_bound_dev_if setting from the
    l3mdev domain in which the packets originated. A sysctl setting is added
    to control the behavior which is similar to sk_mark and
    sysctl_tcp_fwmark_accept.

    This effectively allows a process to have a "VRF-global" listen socket,
    with child sockets bound to the VRF device in which the packet originated.
    A similar behavior can be achieved using sk_mark, but a solution using marks
    is incomplete as it does not handle duplicate addresses in different L3
    domains/VRFs. Allowing sockets to inherit the sk_bound_dev_if from l3mdev
    domain provides a complete solution.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Add a new address generator mode, using the stable address generator
    with an automatically generated secret. This is intended as a default
    address generator mode for device types with no EUI64 implementation.
    The new generator is used for ARPHRD_NONE interfaces initially, adding
    default IPv6 autoconf support to e.g. tun interfaces.

    If the addrgenmode is set to 'random', either by default or manually,
    and no stable secret is available, then a random secret is used as
    input for the stable-privacy address generator. The secret can be
    read and modified like manually configured secrets, using the proc
    interface. Modifying the secret will change the addrgen mode to
    'stable-privacy' to indicate that it operates on a known secret.

    Existing behaviour of the 'stable-privacy' mode is kept unchanged. If
    a known secret is available when the device is created, then the mode
    will default to 'stable-privacy' as before. The mode can be manually
    set to 'random' but it will behave exactly like 'stable-privacy' in
    this case. The secret will not change.

    Cc: Hannes Frederic Sowa
    Cc: 吉藤英明
    Signed-off-by: Bjørn Mork
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Bjørn Mork
     
  • The recently added generic ILA translation facility fails to
    build when CONFIG_NETFILTER is disabled:

    net/ipv6/ila/ila_xlat.c:229:20: warning: 'struct nf_hook_state' declared inside parameter list
    net/ipv6/ila/ila_xlat.c:235:27: error: array type has incomplete element type 'struct nf_hook_ops'
    static struct nf_hook_ops ila_nf_hook_ops[] __read_mostly = {

    This adds an explicit Kconfig dependency to avoid that case.

    Signed-off-by: Arnd Bergmann
    Fixes: 7f00feaf1076 ("ila: Add generic ILA translation facility")
    Signed-off-by: David S. Miller

    Arnd Bergmann
     

18 Dec, 2015

3 commits


16 Dec, 2015

1 commit

  • Bjørn reported that while we switch all interfaces to privacy stable mode
    when setting the secret, we don't set this mode for new interfaces. This
    does not make sense, so change this behaviour.

    Fixes: 622c81d57b392cc ("ipv6: generation of stable privacy addresses for link-local and autoconf")
    Reported-by: Bjørn Mork
    Cc: Bjørn Mork
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa