27 Nov, 2020

1 commit

  • kmemleak reports a memory leak as follows:

    BUG: memory leak
    unreferenced object 0xffff8880759ea000 (size 256):
    backtrace:
    kmem_cache_zalloc include/linux/slab.h:656 [inline]
    __proc_create+0x23d/0x7d0 fs/proc/generic.c:421
    proc_create_reg+0x8e/0x140 fs/proc/generic.c:535
    proc_create_net_data+0x8c/0x1b0 fs/proc/proc_net.c:126
    ip_vs_control_net_init+0x308/0x13a0 net/netfilter/ipvs/ip_vs_ctl.c:4169
    __ip_vs_init+0x211/0x400 net/netfilter/ipvs/ip_vs_core.c:2429
    ops_init+0xa8/0x3c0 net/core/net_namespace.c:151
    setup_net+0x2de/0x7e0 net/core/net_namespace.c:341
    copy_net_ns+0x27d/0x530 net/core/net_namespace.c:482
    create_new_namespaces+0x382/0xa30 kernel/nsproxy.c:110
    copy_namespaces+0x2e6/0x3b0 kernel/nsproxy.c:179
    copy_process+0x220a/0x5f00 kernel/fork.c:2072
    _do_fork+0xc7/0xda0 kernel/fork.c:2428
    __do_sys_clone3+0x18a/0x280 kernel/fork.c:2703
    do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    In the error path of ip_vs_control_net_init(), remove_proc_entry() needs
    to be called to remove the added proc entry, otherwise a memory leak
    will occur.

    Also, add '#ifdef CONFIG_PROC_FS' guards, because the proc_create_net*
    functions return NULL when procfs is not enabled.

    Fixes: b17fc9963f83 ("IPVS: netns, ip_vs_stats and its procfs")
    Fixes: 61b1ab4583e2 ("IPVS: netns, add basic init per netns.")
    Reported-by: Hulk Robot
    Signed-off-by: Wang Hai
    Acked-by: Julian Anastasov
    Signed-off-by: Pablo Neira Ayuso

    Wang Hai
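
The error-path rule described in this commit can be sketched in userspace C. This is only an illustration under stated assumptions: `entry_create()`, `entry_remove()`, `control_init()` and `control_cleanup()` are hypothetical stand-ins, not the kernel's proc API, and `live_entries` exists only to make a leak observable.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_ENTRIES 3

/* Hypothetical stand-ins for proc entries; live_entries makes leaks visible. */
static void *entries[NUM_ENTRIES];
static int live_entries;

static void *entry_create(int fail)
{
    void *e = fail ? NULL : malloc(1);

    if (e)
        live_entries++;
    return e;
}

static void entry_remove(void *e)
{
    assert(e != NULL);
    free(e);
    live_entries--;
}

/* Create all entries; on failure, remove the ones already created. */
static int control_init(int fail_at)
{
    int i;

    for (i = 0; i < NUM_ENTRIES; i++) {
        entries[i] = entry_create(i == fail_at);
        if (!entries[i])
            goto err;
    }
    return 0;

err:
    /* Unwind in reverse order so nothing created so far is leaked. */
    while (i-- > 0) {
        entry_remove(entries[i]);
        entries[i] = NULL;
    }
    return -ENOMEM;
}

static void control_cleanup(void)
{
    int i;

    for (i = 0; i < NUM_ENTRIES; i++) {
        if (entries[i]) {
            entry_remove(entries[i]);
            entries[i] = NULL;
        }
    }
}
```

Calling `control_init(2)` makes the third creation fail; the unwind loop then removes the first two entries, so `live_entries` returns to zero.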
     

30 Oct, 2020

1 commit

  • If netfilter changes the packet mark when mangling, the packet is
    rerouted using the route_me_harder set of functions. Prior to this
    commit, there's one big difference between route_me_harder and the
    ordinary initial routing functions, described in the comment above
    __ip_queue_xmit():

    /* Note: skb->sk can be different from sk, in case of tunnels */
    int __ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl,

    That function goes on to correctly make use of sk->sk_bound_dev_if,
    rather than skb->sk->sk_bound_dev_if. And indeed the comment is true: a
    tunnel will receive a packet in ndo_start_xmit with an initial skb->sk.
    It will make some transformations to that packet, and then it will send
    the encapsulated packet out of a *new* socket. That new socket will
    basically always have a different sk_bound_dev_if (otherwise there'd be
    a routing loop). So for the purposes of routing the encapsulated packet,
    the routing information as it pertains to the socket should come from
    that socket's sk, rather than the packet's original skb->sk. For that
    reason __ip_queue_xmit() and related functions all do the right thing.

    One might argue that all tunnels should just call skb_orphan(skb) before
    transmitting the encapsulated packet into the new socket. But tunnels do
    *not* do this -- and this is wisely avoided in skb_scrub_packet() too --
    because features like TSQ rely on skb->destructor() being called when
    that buffer space is truly available again. Calling skb_orphan(skb) too
    early would result in buffers filling up unnecessarily and accounting
    info being all wrong. Instead, additional routing must take into account
    the new sk, just as __ip_queue_xmit() notes.

    So, this commit addresses the problem by fishing the correct sk out of
    state->sk -- it's already set properly in the call to nf_hook() in
    __ip_local_out(), which receives the sk as part of its normal
    functionality. So we make sure to plumb state->sk through the various
    route_me_harder functions, and then make correct use of it following the
    example of __ip_queue_xmit().

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Jason A. Donenfeld
    Reviewed-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Jason A. Donenfeld
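
A minimal sketch of the distinction this commit draws, using simplified structs rather than the kernel's real `struct sock` and `struct sk_buff`. The helper names `oif_from_skb()` and `oif_from_sk()` are hypothetical; only the field names `sk` and `sk_bound_dev_if` come from the commit text.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures named in the commit. */
struct sock {
    int sk_bound_dev_if;
};

struct sk_buff {
    struct sock *sk; /* the packet's original socket */
};

/* Old behaviour: routing input comes from the packet's original socket. */
static int oif_from_skb(const struct sk_buff *skb)
{
    return skb->sk ? skb->sk->sk_bound_dev_if : 0;
}

/*
 * New behaviour: routing input comes from the socket handed down by the
 * caller (state->sk in the commit text), which for a tunnel is the new
 * socket that sends the encapsulated packet.
 */
static int oif_from_sk(const struct sock *sk)
{
    return sk ? sk->sk_bound_dev_if : 0;
}
```

With an application socket bound to device 3 and a tunnel socket bound to device 7, the old helper yields 3 for the encapsulated packet while the new one yields 7.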
     

20 Oct, 2020

1 commit


16 Oct, 2020

1 commit


12 Oct, 2020

2 commits

  • fq qdisc requires tstamp to be cleared in forwarding path

    Reported-by: Evgeny B
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=209427
    Suggested-by: Eric Dumazet
    Fixes: 8203e2d844d3 ("net: clear skb->tstamp in forwarding paths")
    Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC")
    Fixes: 80b14dee2bea ("net: Add a new socket option for a future transmit time.")
    Signed-off-by: Julian Anastasov
    Reviewed-by: Simon Horman
    Signed-off-by: Pablo Neira Ayuso

    Julian Anastasov
     
  • Just like for MASQ, inspect the reply packets coming from DR/TUN
    real servers and alter the connection's state and timeout
    according to the protocol.

    It is IPVS's duty to keep traffic statistics for packets that hit a
    connection, regardless of the forwarding mode.

    Signed-off-by: longguang.yue
    Signed-off-by: Julian Anastasov
    Signed-off-by: Pablo Neira Ayuso

    longguang.yue
     

05 Oct, 2020

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains Netfilter updates for net-next:

    1) Rename 'searched' column to 'clashres' in conntrack /proc/ stats
    to amend a recent patch, from Florian Westphal.

    2) Remove unused nft_data_debug(), from YueHaibing.

    3) Remove unused definitions in IPVS, also from YueHaibing.

    4) Fix user data memleak in tables and objects, this is also amending
    a recent patch, from Jose M. Guisado.

    5) Use nla_memdup() to allocate user data in table and objects, also
    from Jose M. Guisado

    6) User data support for chains, from Jose M. Guisado

    7) Remove unused definition in nf_tables_offload, from YueHaibing.

    8) Use kvzalloc() in ip_set_alloc(), from Vasily Averin.

    9) Fix false positive reported by lockdep in nfnetlink mutexes,
    from Florian Westphal.

    10) Extend fast variant of cmp for neq operation, from Phil Sutter.

    11) Implement fast bitwise variant, also from Phil Sutter.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

03 Oct, 2020

1 commit


22 Sep, 2020

1 commit

  • They are not used since commit e4ff67513096 ("ipvs: add
    sync_maxlen parameter for the sync daemon")

    Signed-off-by: YueHaibing
    Acked-by: Simon Horman
    Signed-off-by: Pablo Neira Ayuso

    YueHaibing
     

10 Sep, 2020

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains Netfilter updates for net-next:

    1) Rewrite inner header IPv6 in ICMPv6 messages in ip6t_NPT,
    from Michael Zhou.

    2) do_ip_vs_set_ctl() dereferences uninitialized value,
    from Peilin Ye.

    3) Support for userdata in tables, from Jose M. Guisado.

    4) Do not increment ct error and invalid stats at the same time,
    from Florian Westphal.

    5) Remove ct ignore stats, also from Florian.

    6) Add ct stats for clash resolution, from Florian Westphal.

    7) Bump reference counter on ct clash resolution only,
    this is safe because bucket lock is held, again from Florian.

    8) Use ip_is_fragment() in xt_HMARK, from YueHaibing.

    9) Add wildcard support for nft_socket, from Balazs Scheidler.

    10) Remove superfluous IPVS dependency on iptables, from
    Yaroslav Bolyukin.

    11) Remove unused definition in ebt_stp, from Wang Hai.

    12) Replace CONFIG_NFT_CHAIN_NAT_{IPV4,IPV6} by CONFIG_NFT_NAT
    in selftests/net, from Fabian Frederick.

    13) Add userdata support for nft_object, from Jose M. Guisado.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

01 Sep, 2020

1 commit

  • This dependency was added because ipv6_find_hdr() was in iptables-specific
    code, but it is no longer required.

    Fixes: f8f626754ebe ("ipv6: Move ipv6_find_hdr() out of Netfilter code.")
    Fixes: 63dca2c0b0e7 ("ipvs: Fix faulty IPv6 extension header handling in IPVS")
    Signed-off-by: Yaroslav Bolyukin
    Acked-by: Julian Anastasov
    Signed-off-by: Pablo Neira Ayuso

    Yaroslav Bolyukin
     

29 Aug, 2020

1 commit

  • do_ip_vs_set_ctl() references an uninitialized stack value when `len` is
    zero. Fix it.

    Reported-by: syzbot+23b5f9e7caf61d9a3898@syzkaller.appspotmail.com
    Link: https://syzkaller.appspot.com/bug?id=46ebfb92a8a812621a001ef04d90dfa459520fe2
    Suggested-by: Julian Anastasov
    Signed-off-by: Peilin Ye
    Acked-by: Julian Anastasov
    Reviewed-by: Simon Horman
    Signed-off-by: Pablo Neira Ayuso

    Peilin Ye
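
The class of bug described here can be sketched as follows. This is not the actual patch: `read_first_arg_byte()` is a hypothetical helper, and `memcpy()` stands in for the copy from userspace. The point is only that a stack buffer holds no valid data when zero bytes were copied into it.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns the first argument byte, or -EINVAL when the length is zero or
 * too large. The buffer is left uninitialized on purpose, like a stack
 * buffer that is only filled by a copy from userspace.
 */
static int read_first_arg_byte(const unsigned char *ubuf, size_t len)
{
    unsigned char arg[16];

    if (len == 0 || len > sizeof(arg))
        return -EINVAL; /* nothing valid to read, or it would overflow */

    memcpy(arg, ubuf, len); /* stands in for copy_from_user() */
    return arg[0];
}
```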
     

24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and their variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
    fall-through markings where applicable.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
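
For illustration, a userspace sketch of how the `fallthrough` pseudo-keyword reads in a switch statement. The macro definition follows the attribute-based form used for compilers that support it; the `defined(__has_attribute)` guard and the `stages_from()` function are assumptions made so the example builds outside the kernel tree.

```c
#include <assert.h>

/* Use the statement attribute when the compiler supports it. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0) /* fallthrough */
#endif

/* Count how many stages run when starting at a given stage. */
static int stages_from(int start)
{
    int n = 0;

    assert(start >= 0);
    switch (start) {
    case 0:
        n++;
        fallthrough;
    case 1:
        n++;
        fallthrough;
    case 2:
        n++;
        break;
    default:
        break;
    }
    return n;
}
```

Unlike a comment, the macro is visible to the compiler, so a missing marker can be diagnosed with `-Wimplicit-fallthrough`.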
     

04 Aug, 2020

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    1) UAF in chain binding support from previous batch, from Dan Carpenter.

    2) Queue up delayed work to expire connections with no destination,
    from Andrew Sy Kim.

    3) Use fallthrough pseudo-keyword, from Gustavo A. R. Silva.

    4) Replace HTTP links with HTTPS, from Alexander A. Klimov.

    5) Remove superfluous null header checks in ip6tables, from
    Gaurav Singh.

    6) Add extended netlink error reporting for expression.

    7) Report EEXIST on overlapping chain, set elements and flowtable
    devices.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

26 Jul, 2020

1 commit

  • The UDP reuseport conflict was a little bit tricky.

    The net-next code, via bpf-next, extracted the reuseport handling
    into a helper so that the BPF sk lookup code could invoke it.

    At the same time, the logic for reuseport handling of unconnected
    sockets changed via commit efc6b6f6c3113e8b203b9debfb72d81e0f3dcace
    which changed the logic to carry on the reuseport result into the
    rest of the lookup loop if we do not return immediately.

    This requires moving the reuseport_has_conns() logic into the callers.

    While we are here, get rid of inline directives as they do not belong
    in foo.c files.

    The other changes were cases of more straightforward overlapping
    modifications.

    Signed-off-by: David S. Miller

    David S. Miller
     

25 Jul, 2020

1 commit


22 Jul, 2020

2 commits

  • sync_thread_backup only checks whether sk_receive_queue is empty.
    Connection entries cannot be synced when sk_receive_queue is empty
    and sk_rmem_alloc is larger than sk_rcvbuf: the sync packets are
    dropped in __udp_enqueue_schedule_skb because the packets in
    reader_queue have not been read, so the rmem is not reclaimed.

    Add a check of whether the reader_queue of the udp sock is empty
    to solve this problem.

    Fixes: 2276f58ac589 ("udp: use a separate rx queue for packet reception")
    Reported-by: zhouxudong
    Signed-off-by: guodeqing
    Acked-by: Julian Anastasov
    Signed-off-by: Pablo Neira Ayuso

    guodeqing
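
The difference between the old and new wake-up conditions can be sketched with a simplified model. `struct udp_sock_sketch` and both helpers are hypothetical; only the queue names come from the commit text.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of a UDP socket with its two receive queues. */
struct udp_sock_sketch {
    int receive_queue_len; /* packets in sk_receive_queue */
    int reader_queue_len;  /* packets in reader_queue */
};

/* Old condition: only sk_receive_queue is consulted. */
static bool has_work_old(const struct udp_sock_sketch *s)
{
    return s->receive_queue_len > 0;
}

/* New condition: unread packets in reader_queue also count as work. */
static bool has_work_new(const struct udp_sock_sketch *s)
{
    assert(s->receive_queue_len >= 0 && s->reader_queue_len >= 0);
    return s->receive_queue_len > 0 || s->reader_queue_len > 0;
}
```

With an empty `sk_receive_queue` and a non-empty `reader_queue`, the old condition reports no work, so the backup thread never drains the packets that are holding the receive memory.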
     
  • When expire_nodest_conn=1 and a destination is deleted, IPVS does not
    expire the existing connections until the next matching incoming packet.
    If there are many connection entries from a single client to a single
    destination, many packets may get dropped before all the connections are
    expired (more likely with lots of UDP traffic). An optimization can be
    made where upon deletion of a destination, IPVS queues up delayed work
    to immediately expire any connections with a deleted destination. This
    ensures any reused source ports from a client (within the IPVS timeouts)
    are scheduled to new real servers instead of silently dropped.

    Signed-off-by: Andrew Sy Kim
    Signed-off-by: Julian Anastasov
    Signed-off-by: Pablo Neira Ayuso

    Andrew Sy Kim
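
A rough sketch of the optimization, assuming a flat connection table instead of the real IPVS hash table and leaving out the delayed-work machinery. All type and function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct dest_sketch {
    bool available; /* false once the destination has been deleted */
};

struct conn_sketch {
    struct dest_sketch *dest;
    bool expired;
};

/*
 * Walk the table once and expire every connection whose destination is
 * gone, instead of waiting for the next matching packet on each one.
 * Returns the number of connections expired by this pass.
 */
static size_t expire_nodest_conns(struct conn_sketch *tab, size_t n)
{
    size_t i, expired = 0;

    assert(tab != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (tab[i].expired || tab[i].dest->available)
            continue;
        tab[i].expired = true;
        expired++;
    }
    return expired;
}
```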
     

04 Jul, 2020

1 commit

  • YangYuxi reports that connection reuse
    causes a one-second delay when a SYN hits
    an existing connection in TIME_WAIT state.
    This delay was added to give time to expire
    both the IPVS connection and the corresponding
    conntrack. This was considered a rare case
    at that time, but it causes problems in
    some environments such as Kubernetes.

    As nf_conntrack_tcp_packet() can decide to
    release the conntrack in TIME_WAIT state and
    to replace it with a fresh NEW conntrack, we
    can use this to allow rescheduling just by
    tuning our check: if the conntrack is
    confirmed, we cannot schedule it to a different
    real server and the one-second delay still
    applies, but if a new conntrack was created,
    we are free to select a new real server without
    any delay.

    YangYuxi lists some of the problem reports:

    - One second connection delay in masquerading mode:
    https://marc.info/?t=151683118100004&r=1&w=2

    - IPVS low throughput #70747
    https://github.com/kubernetes/kubernetes/issues/70747

    - Apache Bench can fill up ipvs service proxy in seconds #544
    https://github.com/cloudnativelabs/kube-router/issues/544

    - Additional 1s latency in `host -> service IP -> pod`
    https://github.com/kubernetes/kubernetes/issues/90854

    Fixes: f719e3754ee2 ("ipvs: drop first packet to redirect conntrack")
    Co-developed-by: YangYuxi
    Signed-off-by: YangYuxi
    Signed-off-by: Julian Anastasov
    Reviewed-by: Simon Horman
    Signed-off-by: Pablo Neira Ayuso

    Julian Anastasov
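
The tuned check can be summarized as a small decision function. This is a sketch of the rule as stated in the commit message, not the kernel code; the enum and `on_syn()` are hypothetical names.

```c
#include <stdbool.h>

enum reuse_action {
    REUSE_KEEP,       /* no reuse conflict: keep the connection */
    REUSE_DELAYED,    /* conntrack confirmed: the one-second delay applies */
    REUSE_RESCHEDULE  /* fresh conntrack: pick a new real server at once */
};

/* Decide what to do when a SYN arrives for an existing connection. */
static enum reuse_action on_syn(bool conn_in_time_wait, bool ct_confirmed)
{
    if (!conn_in_time_wait)
        return REUSE_KEEP;
    return ct_confirmed ? REUSE_DELAYED : REUSE_RESCHEDULE;
}
```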
     

01 Jul, 2020

2 commits

  • Add new functions ip_vs_conn_del() and ip_vs_conn_del_put()
    to release many IPVS connections in process context.
    They are suitable for connections found in table
    when we do not want to overload the timers.

    Currently, the change is useful for the dropentry delayed
    work, but it will also be used in a following patch
    when flushing connections to failed destinations.

    Signed-off-by: Julian Anastasov
    Reviewed-by: Simon Horman
    Signed-off-by: Pablo Neira Ayuso

    Julian Anastasov
     
  • Keep the IPVS hooks registered in Netfilter only
    while there are configured virtual services. This
    saves CPU cycles while IPVS is loaded but not used.

    Signed-off-by: Julian Anastasov
    Reviewed-by: Simon Horman
    Signed-off-by: Pablo Neira Ayuso

    Julian Anastasov
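
One way to picture this change is a service counter that registers the hooks on the first service and unregisters them on the last. The sketch below is an assumption about the shape of the logic, not the kernel implementation; `struct ipvs_sketch`, `add_service()` and `del_service()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct ipvs_sketch {
    int num_services;
    bool hooks_registered;
    int register_calls; /* counts how often registration happened */
};

/* Register the hooks only when the first service is configured. */
static void add_service(struct ipvs_sketch *s)
{
    if (s->num_services++ == 0) {
        s->hooks_registered = true;
        s->register_calls++;
    }
}

/* Unregister the hooks when the last service goes away. */
static void del_service(struct ipvs_sketch *s)
{
    assert(s->num_services > 0);
    if (--s->num_services == 0)
        s->hooks_registered = false;
}
```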
     

14 Jun, 2020

1 commit

  • Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over
    '---help---'"), the number of '---help---' has been gradually
    decreasing, but there are still more than 2400 instances.

    This commit finishes the conversion. While I touched the lines,
    I also fixed the indentation.

    There are a variety of indentation styles found.

    a) 4 spaces + '---help---'
    b) 7 spaces + '---help---'
    c) 8 spaces + '---help---'
    d) 1 space + 1 tab + '---help---'
    e) 1 tab + '---help---' (correct indentation)
    f) 1 tab + 1 space + '---help---'
    g) 1 tab + 2 spaces + '---help---'

    In order to convert all of them to 1 tab + 'help', I ran the
    following command:

    $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'

    Signed-off-by: Masahiro Yamada

    Masahiro Yamada
     

27 Apr, 2020

1 commit

  • Instead of having all the sysctl handlers deal with user pointers, which
    is rather hairy in terms of the BPF interaction, copy the input to and
    from userspace in common code. This also means that the strings are
    always NUL-terminated by the common code, making the API a little bit
    safer.

    As most handlers just pass the data through to one of the common
    handlers, a lot of the changes are mechanical.

    Signed-off-by: Christoph Hellwig
    Acked-by: Andrey Ignatov
    Signed-off-by: Al Viro

    Christoph Hellwig
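
The idea of copying in common code can be sketched in userspace. Everything here is an assumption for illustration: `common_write()` plays the role of the common code, `memcpy()` stands in for the copy from userspace, and `parse_int_handler()` is a made-up handler.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef int (*handler_fn)(char *kbuf, size_t len);

/*
 * Common code: copy the caller's bytes into a private buffer, always
 * NUL-terminate it, then hand the buffer to the handler.
 */
static int common_write(const char *ubuf, size_t len, handler_fn handler)
{
    char *kbuf = malloc(len + 1);
    int ret;

    if (!kbuf)
        return -ENOMEM;
    memcpy(kbuf, ubuf, len); /* stands in for copy_from_user() */
    kbuf[len] = '\0';
    ret = handler(kbuf, len);
    free(kbuf);
    return ret;
}

/* Example handler: it can rely on the buffer being a valid C string. */
static int parse_int_handler(char *kbuf, size_t len)
{
    (void)len;
    return atoi(kbuf);
}
```

The handler never touches the caller's pointer, and because the terminator is added in one place, string functions such as `atoi()` are safe even when the input itself is not terminated.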
     

31 Mar, 2020

1 commit

  • If outer_proto is not set, GCC warns as follows:

    In file included from net/netfilter/ipvs/ip_vs_core.c:52:
    net/netfilter/ipvs/ip_vs_core.c: In function 'ip_vs_in_icmp':
    include/net/ip_vs.h:233:4: warning: 'outer_proto' may be used uninitialized in this function [-Wmaybe-uninitialized]
    233 | printk(KERN_DEBUG pr_fmt(msg), ##__VA_ARGS__); \
    | ^~~~~~
    net/netfilter/ipvs/ip_vs_core.c:1666:8: note: 'outer_proto' was declared here
    1666 | char *outer_proto;
    | ^~~~~~~~~~~

    Fixes: 73348fed35d0 ("ipvs: optimize tunnel dumps for icmp errors")
    Signed-off-by: Haishuang Yan
    Acked-by: Julian Anastasov
    Signed-off-by: Pablo Neira Ayuso

    Haishuang Yan
     

28 Mar, 2020

1 commit


24 Jan, 2020

1 commit


25 Dec, 2019

1 commit

  • The MTU update code is supposed to be invoked in response to real
    networking events that update the PMTU. In the IPv6 PMTU update function
    __ip6_rt_update_pmtu(), we call dst_confirm_neigh() to update the neighbor
    confirmed time.

    But the tunnel code calls the PMTU update before xmit, like:
    - tnl_update_pmtu()
    - skb_dst_update_pmtu()
    - ip6_rt_update_pmtu()
    - __ip6_rt_update_pmtu()
    - dst_confirm_neigh()

    If the tunnel remote dst mac address changed and we still do the neigh
    confirm, we will not be able to update the neigh cache and ping6 to the
    remote will fail.

    So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we
    should not be invoking dst_confirm_neigh() as we have no evidence
    of successful two-way communication at this point.

    On the other hand it is also important to keep the neigh reachability fresh
    for TCP flows, so we cannot remove this dst_confirm_neigh() call.

    To fix the issue, we have to add a new bool parameter to dst_ops.update_pmtu
    to choose whether we should do the neigh update or not. I will add the
    parameter in this patch and set all the callers to true to preserve the
    previous behavior, and fix the tunnel code one by one in later patches.

    v5: No change.
    v4: No change.
    v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
    dst_ops.update_pmtu to control whether we should do neighbor confirm.
    Also split the big patch to small ones for each area.
    v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

    Suggested-by: David Miller
    Reviewed-by: Guillaume Nault
    Acked-by: David Ahern
    Signed-off-by: Hangbin Liu
    Signed-off-by: David S. Miller

    Hangbin Liu
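
A minimal sketch of the new parameter, with a simplified destination structure. `struct dst_sketch` and this `update_pmtu()` are hypothetical; the counter stands in for the dst_confirm_neigh() call.

```c
#include <stdbool.h>

struct dst_sketch {
    unsigned int mtu;
    int neigh_confirms; /* how often the neighbour was confirmed */
};

/*
 * The caller decides whether the PMTU update also counts as evidence
 * that the neighbour is reachable.
 */
static void update_pmtu(struct dst_sketch *dst, unsigned int mtu,
                        bool confirm_neigh)
{
    if (confirm_neigh)
        dst->neigh_confirms++; /* stands in for dst_confirm_neigh() */
    if (mtu < dst->mtu)
        dst->mtu = mtu;
}
```

A tunnel transmit path would pass `false`, since it has no evidence of two-way communication, while callers that keep the previous behavior pass `true`.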
     

27 Nov, 2019

1 commit

  • Note that the sysctl write accessor functions guarantee that:
    net->ipv4.sysctl_ip_prot_sock <= net->ipv4.ip_local_ports.range[0]
    invariant is maintained, and as such the max() in selinux hooks is actually spurious.

    ie. even though
    if (snum < max(inet_prot_sock(sock_net(sk)), low) || snum > high) {
    per logic is the same as
    if ((snum < inet_prot_sock(sock_net(sk)) && snum < low) || snum > high) {
    it is actually functionally equivalent to:
    if (snum < low || snum > high) {
    which is equivalent to:
    if (snum < inet_prot_sock(sock_net(sk)) || snum < low || snum > high) {
    even though the first clause is spurious.

    But we want to hold on to it in case we ever want to change what
    inet_port_requires_bind_service() means (for example by changing
    it from a, by default, [0..1024) range to some sort of set).

    Test: builds, git 'grep inet_prot_sock' finds no other references
    Cc: Eric Dumazet
    Signed-off-by: Maciej Żenczykowski
    Signed-off-by: David S. Miller

    Maciej Żenczykowski
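
The claim that the max() is spurious can be checked by brute force over a small range. The sketch below assumes the invariant that the privileged-port limit never exceeds the low end of the local port range; the function names are hypothetical, and plain integers stand in for inet_prot_sock() and the port range.

```c
#include <stdbool.h>

/* The form with max(), as in the selinux hook quoted above. */
static bool with_max(int snum, int prot_sock, int low, int high)
{
    int m = prot_sock > low ? prot_sock : low;

    return snum < m || snum > high;
}

/* The simplified form that only looks at the local port range. */
static bool low_only(int snum, int low, int high)
{
    return snum < low || snum > high;
}

/* The form that keeps the spurious first clause. */
static bool three_clause(int snum, int prot_sock, int low, int high)
{
    return snum < prot_sock || snum < low || snum > high;
}

/* Count disagreements over all inputs with prot_sock <= low <= high. */
static int count_mismatches(int limit)
{
    int prot, low, high, snum, mismatches = 0;

    for (prot = 0; prot <= limit; prot++)
        for (low = prot; low <= limit; low++)
            for (high = low; high <= limit; high++)
                for (snum = 0; snum <= limit + 1; snum++) {
                    bool a = with_max(snum, prot, low, high);
                    bool b = low_only(snum, low, high);
                    bool c = three_clause(snum, prot, low, high);

                    if (a != b || b != c)
                        mismatches++;
                }
    return mismatches;
}
```

The three forms agree only because of the invariant: with prot_sock = 10 and low = 3, the max() form rejects port 5 while the simplified form accepts it.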
     

03 Nov, 2019

1 commit


29 Oct, 2019

1 commit


24 Oct, 2019

2 commits

  • syzbot reported the following issue:

    BUG: KCSAN: data-race in update_defense_level / update_defense_level

    read to 0xffffffff861a6260 of 4 bytes by task 3006 on cpu 1:
    update_defense_level+0x621/0xb30 net/netfilter/ipvs/ip_vs_ctl.c:177
    defense_work_handler+0x3d/0xd0 net/netfilter/ipvs/ip_vs_ctl.c:225
    process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
    worker_thread+0xa0/0x800 kernel/workqueue.c:2415
    kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

    write to 0xffffffff861a6260 of 4 bytes by task 7333 on cpu 0:
    update_defense_level+0xa62/0xb30 net/netfilter/ipvs/ip_vs_ctl.c:205
    defense_work_handler+0x3d/0xd0 net/netfilter/ipvs/ip_vs_ctl.c:225
    process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
    worker_thread+0xa0/0x800 kernel/workqueue.c:2415
    kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 0 PID: 7333 Comm: kworker/0:5 Not tainted 5.4.0-rc3+ #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Workqueue: events defense_work_handler

    Indeed, old_secure_tcp is currently a static variable, while it
    needs to be a per netns variable.

    Fixes: a0840e2e165a ("IPVS: netns, ip_vs_ctl local vars moved to ipvs struct.")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: Simon Horman

    Eric Dumazet
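
Why the variable has to live in the per-namespace structure can be sketched as follows. `struct netns_sketch` and this simplified `update_defense_level()` are hypothetical; the real function does considerably more.

```c
#include <stdbool.h>

/* Per-namespace state: each namespace remembers its own previous mode. */
struct netns_sketch {
    int secure_tcp;
    int old_secure_tcp;
};

/* Returns true when this namespace's mode changed since its last run. */
static bool update_defense_level(struct netns_sketch *ns)
{
    bool changed = ns->old_secure_tcp != ns->secure_tcp;

    ns->old_secure_tcp = ns->secure_tcp;
    return changed;
}
```

With a single function-local static instead, the work items of two namespaces would read and write the same word concurrently, and one namespace's update could hide a mode change in the other.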
     
  • If the IPVS module is removed while the sync daemon is starting, there is
    a small gap where try_module_get() might fail getting the refcount inside
    ip_vs_use_count_inc(). Then, the refcounts of IPVS module are unbalanced,
    and the subsequent call to stop_sync_thread() causes the following splat:

    WARNING: CPU: 0 PID: 4013 at kernel/module.c:1146 module_put.part.44+0x15b/0x290
    Modules linked in: ip_vs(-) nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ext4 mbcache jbd2 ghash_clmulni_intel snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_nhlt snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper joydev pcspkr snd_timer virtio_balloon snd soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net net_failover virtio_blk failover virtio_console qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ata_piix ttm crc32c_intel serio_raw drm virtio_pci libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv6]
    CPU: 0 PID: 4013 Comm: modprobe Tainted: G W 5.4.0-rc1.upstream+ #741
    Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
    RIP: 0010:module_put.part.44+0x15b/0x290
    Code: 04 25 28 00 00 00 0f 85 18 01 00 00 48 83 c4 68 5b 5d 41 5c 41 5d 41 5e 41 5f c3 89 44 24 28 83 e8 01 89 c5 0f 89 57 ff ff ff 0b e9 78 ff ff ff 65 8b 1d 67 83 26 4a 89 db be 08 00 00 00 48
    RSP: 0018:ffff888050607c78 EFLAGS: 00010297
    RAX: 0000000000000003 RBX: ffffffffc1420590 RCX: ffffffffb5db0ef9
    RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffffc1420590
    RBP: 00000000ffffffff R08: fffffbfff82840b3 R09: fffffbfff82840b3
    R10: 0000000000000001 R11: fffffbfff82840b2 R12: 1ffff1100a0c0f90
    R13: ffffffffc1420200 R14: ffff88804f533300 R15: ffff88804f533ca0
    FS: 00007f8ea9720740(0000) GS:ffff888053800000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f3245abe000 CR3: 000000004c28a006 CR4: 00000000001606f0
    Call Trace:
    stop_sync_thread+0x3a3/0x7c0 [ip_vs]
    ip_vs_sync_net_cleanup+0x13/0x50 [ip_vs]
    ops_exit_list.isra.5+0x94/0x140
    unregister_pernet_operations+0x29d/0x460
    unregister_pernet_device+0x26/0x60
    ip_vs_cleanup+0x11/0x38 [ip_vs]
    __x64_sys_delete_module+0x2d5/0x400
    do_syscall_64+0xa5/0x4e0
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x7f8ea8bf0db7
    Code: 73 01 c3 48 8b 0d b9 80 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 3d 01 f0 ff ff 73 01 c3 48 8b 0d 89 80 2c 00 f7 d8 64 89 01 48
    RSP: 002b:00007ffcd38d2fe8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
    RAX: ffffffffffffffda RBX: 0000000002436240 RCX: 00007f8ea8bf0db7
    RDX: 0000000000000000 RSI: 0000000000000800 RDI: 00000000024362a8
    RBP: 0000000000000000 R08: 00007f8ea8eba060 R09: 00007f8ea8c658a0
    R10: 00007ffcd38d2a60 R11: 0000000000000206 R12: 0000000000000000
    R13: 0000000000000001 R14: 00000000024362a8 R15: 0000000000000000
    irq event stamp: 4538
    hardirqs last enabled at (4537): quarantine_put+0x9e/0x170
    hardirqs last disabled at (4538): trace_hardirqs_off_thunk+0x1a/0x20
    softirqs last enabled at (4522): sk_common_release+0x169/0x2d0
    softirqs last disabled at (4520): sk_common_release+0xbe/0x2d0

    Check the return value of ip_vs_use_count_inc() and let its caller return
    proper error. Inside do_ip_vs_set_ctl() the module is already refcounted,
    we don't need refcount/derefcount there. Finally, in register_ip_vs_app()
    and start_sync_thread(), take the module refcount earlier and ensure it's
    released in the error path.

    Changes since v1:
    - better return values in case of failure of ip_vs_use_count_inc(),
    thanks to Julian Anastasov
    - no need to increase/decrease the module refcount in ip_vs_set_ctl(),
    thanks to Julian Anastasov

    Signed-off-by: Davide Caratti
    Signed-off-by: Julian Anastasov
    Signed-off-by: Simon Horman

    Davide Caratti
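
The refcount discipline described above can be sketched with a toy module object. All names are hypothetical, and the error codes are chosen for illustration only; they are not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct module_sketch {
    bool going_away; /* set while the module is being removed */
    int refcnt;
};

/* Like try_module_get(): fails once removal has started. */
static bool try_get(struct module_sketch *m)
{
    if (m->going_away)
        return false;
    m->refcnt++;
    return true;
}

static void put(struct module_sketch *m)
{
    assert(m->refcnt > 0);
    m->refcnt--;
}

/*
 * Take the reference first, report failure to the caller, and release
 * the reference on every later error path.
 */
static int start_sync_thread(struct module_sketch *m, bool setup_fails)
{
    if (!try_get(m))
        return -ENOENT; /* illustrative error code */
    if (setup_fails) {
        put(m);
        return -ENOMEM; /* illustrative error code */
    }
    return 0;
}
```

Because a failed get is reported instead of ignored, the later stop path never drops a reference that was not taken, which is the imbalance behind the splat shown in the commit message.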
     

08 Oct, 2019

3 commits


02 Oct, 2019

1 commit

  • commit 174e23810cd31
    ("sk_buff: drop all skb extensions on free and skb scrubbing") made napi
    recycle always drop skb extensions. The additional skb_ext_del() that is
    performed via nf_reset on napi skb recycle is not needed anymore.

    Most nf_reset() calls in the stack are there so queued skb won't block
    'rmmod nf_conntrack' indefinitely.

    This removes the skb_ext_del from nf_reset, and renames it to a more
    fitting nf_reset_ct().

    In a few selected places, add a call to skb_ext_reset to make sure that
    no active extensions remain.

    I am submitting this for "net", because we're still early in the release
    cycle. The patch applies to net-next too, but I think the rename causes
    needless divergence between those trees.

    Suggested-by: Eric Dumazet
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

26 Sep, 2019

1 commit


14 Aug, 2019

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter/IPVS updates for net-next

    The following patchset contains Netfilter/IPVS updates for net-next:

    1) Rename mss field to mss_option field in synproxy, from Fernando Mancera.

    2) Use SYSCTL_{ZERO,ONE} definitions in conntrack, from Matteo Croce.

    3) More strict validation of IPVS sysctl values, from Junwei Hu.

    4) Remove unnecessary spaces on the right hand side of assignments,
    from yangxingwu.

    5) Add offload support for bitwise operation.

    6) Extend the nft_offload_reg structure to store immediate data.

    7) Collapse several ip_set header files into ip_set.h, from
    Jeremy Sowden.

    8) Make netfilter headers compile with CONFIG_KERNEL_HEADER_TEST=y,
    from Jeremy Sowden.

    9) Fix several sparse warnings due to missing prototypes, from
    Valdis Kletnieks.

    10) Use static lock initialiser to ensure connlabel spinlock is
    initialized on boot time to fix sched/act_ct.c, patch
    from Florian Westphal.
    ====================

    Signed-off-by: Jakub Kicinski

    Jakub Kicinski
     

13 Aug, 2019

1 commit


09 Aug, 2019

1 commit