21 Dec, 2018

6 commits

  • Needless copy&paste, just handle all in one. Next patch will handle
    acct and timestamp, which have similar functions.

    Intentionally leaves cruft behind, will be cleaned up in a followup
    patch.

    The obsolete sysctl pointers in netns_ct struct are left in place and
    removed in a single change, as changes to netns trigger rebuild of
    almost all files.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • It's a bit hard to see what table[3] really lines up with, so add
    human-readable mnemonics and use them for initialisation.

    This makes it easier to see e.g. which sysctls are not exported to
    unprivileged userns.

    objdiff shows no changes.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Only one caller, just place it where it's needed.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • This patch adds two sysctl knobs for GRE:

    net.netfilter.nf_conntrack_gre_timeout = 30
    net.netfilter.nf_conntrack_gre_timeout_stream = 180

    Update the Documentation as well.

    Signed-off-by: Yafang Shao
    Signed-off-by: Pablo Neira Ayuso

    Yafang Shao
     
  • We have no explicit signal when a UDP stream has terminated, peers just
    stop sending.

    For suspected stream connections, a timeout of two minutes is sane to keep
    the NAT mapping alive a while longer.

    It matches the default value of TCP conntrack's 'timewait' timeout.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Currently, DNS resolvers that send both A and AAAA queries from the same source port
    can trigger stream mode prematurely, which results in a non-early-evictable conntrack entry
    for three minutes, even though DNS requests are done in a few milliseconds.

    Add a two-second grace period during which we continue to use the ordinary
    30-second default timeout. It's enough for DNS request/response traffic,
    even if two request/reply packets are involved.

    ASSURED is still set; otherwise the conntrack (and thus a possible
    NAT mapping ...) would get zapped too when the conntrack table runs full.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
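
To illustrate the mnemonics commit (second entry under 21 Dec), here is a minimal sketch of replacing a bare index such as `table[3]` with named enum slots and designated initializers. The enum names, `struct knob` and the choice of which entry is hidden from unprivileged user namespaces are hypothetical; this is not the kernel's `struct ctl_table` code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mnemonics for the slots of a sysctl-like table. */
enum knob_index {
	KNOB_CT_MAX,
	KNOB_CT_COUNT,
	KNOB_CT_BUCKETS,
	KNOB_CT_CHECKSUM,
	NKNOBS /* table size, not a knob */
};

/* Simplified stand-in for a sysctl table entry (not struct ctl_table). */
struct knob {
	const char *procname;
	bool exported; /* visible to unprivileged user namespaces? */
};

/* Designated initializers: the slot of each entry is visible at a glance. */
static const struct knob knob_template[NKNOBS] = {
	[KNOB_CT_MAX]      = { "nf_conntrack_max",      true },
	[KNOB_CT_COUNT]    = { "nf_conntrack_count",    true },
	[KNOB_CT_BUCKETS]  = { "nf_conntrack_buckets",  true },
	[KNOB_CT_CHECKSUM] = { "nf_conntrack_checksum", true },
};

/* Reads better than "table[0].exported = false". Which knob is hidden
 * here is purely illustrative. */
static void hide_from_unprivileged(struct knob *table)
{
	table[KNOB_CT_MAX].exported = false;
}
```

Because every slot has a name, a line such as `table[KNOB_CT_MAX].exported = false` shows directly which knob is affected, where `table[0]` would require counting entries.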
     

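The UDP timeout behaviour described in the last two commits listed under 21 Dec (a two-minute timeout for suspected streams, plus a two-second grace period that keeps the ordinary 30-second timeout) can be sketched as a small pure function. The struct, its field names and the whole-second clock are assumptions made for illustration, and `udp_refresh` is a hypothetical name; the kernel works with jiffies and conntrack status bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Values taken from the commit messages above, in seconds. */
#define UDP_TIMEOUT_UNREPLIED 30  /* ordinary default timeout */
#define UDP_TIMEOUT_STREAM    120 /* two minutes for suspected streams */
#define UDP_STREAM_GRACE      2   /* grace period before stream mode */

/* Hypothetical, simplified per-flow state. */
struct udp_flow {
	long created;    /* time the first packet was seen */
	bool seen_reply; /* traffic has been seen in both directions */
	bool assured;    /* protected from early eviction */
};

/* Returns the timeout to apply after a packet arriving at time 'now'. */
static int udp_refresh(struct udp_flow *f, long now)
{
	if (!f->seen_reply)
		return UDP_TIMEOUT_UNREPLIED;

	/* Bidirectional traffic: mark the flow assured even inside the
	 * grace period, so it is not evicted early when the table fills. */
	f->assured = true;

	/* Still active after the grace period? Treat it as a stream. */
	if (now - f->created > UDP_STREAM_GRACE)
		return UDP_TIMEOUT_STREAM;

	return UDP_TIMEOUT_UNREPLIED;
}
```

Under these assumptions, a DNS exchange that finishes within the grace period keeps the 30-second timeout while still being marked assured, and a flow that is still active after two seconds is switched to the 120-second stream timeout.
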
18 Dec, 2018

17 commits

  • If a config for the same destination IP address already exists, that config is
    simply reused. In that case the MAC address should also be the same.
    However, there was no routine checking the MAC address,
    so this patch adds one.

    test commands:
    %iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \
    -j CLUSTERIP --new --hashmode sourceip \
    --clustermac 01:00:5e:00:00:20 --total-nodes 2 --local-node 1
    %iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \
    -j CLUSTERIP --new --hashmode sourceip \
    --clustermac 01:00:5e:00:00:21 --total-nodes 2 --local-node 1

    After this patch, the above command sequence is disallowed.

    Signed-off-by: Taehee Yoo
    Signed-off-by: Pablo Neira Ayuso

    Taehee Yoo
     
  • proc_remove() can sleep, so it must not be called inside a spin_lock.
    Hence proc_remove() is moved outside of the spin_lock, and a mutex
    is added to synchronize creation and removal of the proc entry (config->pde).

    test commands:
    SHELL#1
    %while :; do iptables -A INPUT -p udp -i enp2s0 -d 192.168.1.100 \
    --dport 9000 -j CLUSTERIP --new --hashmode sourceip \
    --clustermac 01:00:5e:00:00:21 --total-nodes 3 --local-node 3; \
    iptables -F; done

    SHELL#2
    %while :; do echo +1 > /proc/net/ipt_CLUSTERIP/192.168.1.100; \
    echo -1 > /proc/net/ipt_CLUSTERIP/192.168.1.100; done

    [ 2949.569864] BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
    [ 2949.579944] in_atomic(): 1, irqs_disabled(): 0, pid: 5472, name: iptables
    [ 2949.587920] 1 lock held by iptables/5472:
    [ 2949.592711] #0: 000000008f0ebcf2 (&(&cn->lock)->rlock){+...}, at: refcount_dec_and_lock+0x24/0x50
    [ 2949.603307] CPU: 1 PID: 5472 Comm: iptables Tainted: G W 4.19.0-rc5+ #16
    [ 2949.604212] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
    [ 2949.604212] Call Trace:
    [ 2949.604212] dump_stack+0xc9/0x16b
    [ 2949.604212] ? show_regs_print_info+0x5/0x5
    [ 2949.604212] ___might_sleep+0x2eb/0x420
    [ 2949.604212] ? set_rq_offline.part.87+0x140/0x140
    [ 2949.604212] ? _rcu_barrier_trace+0x400/0x400
    [ 2949.604212] wait_for_completion+0x94/0x710
    [ 2949.604212] ? wait_for_completion_interruptible+0x780/0x780
    [ 2949.604212] ? __kernel_text_address+0xe/0x30
    [ 2949.604212] ? __lockdep_init_map+0x10e/0x5c0
    [ 2949.604212] ? __lockdep_init_map+0x10e/0x5c0
    [ 2949.604212] ? __init_waitqueue_head+0x86/0x130
    [ 2949.604212] ? init_wait_entry+0x1a0/0x1a0
    [ 2949.604212] proc_entry_rundown+0x208/0x270
    [ 2949.604212] ? proc_reg_get_unmapped_area+0x370/0x370
    [ 2949.604212] ? __lock_acquire+0x4500/0x4500
    [ 2949.604212] ? complete+0x18/0x70
    [ 2949.604212] remove_proc_subtree+0x143/0x2a0
    [ 2949.708655] ? remove_proc_entry+0x390/0x390
    [ 2949.708655] clusterip_tg_destroy+0x27a/0x630 [ipt_CLUSTERIP]
    [ ... ]

    Fixes: b3e456fce9f5 ("netfilter: ipt_CLUSTERIP: fix a race condition of proc file creation")
    Signed-off-by: Taehee Yoo
    Signed-off-by: Pablo Neira Ayuso

    Taehee Yoo
     
  • When a network namespace is destroyed, both clusterip_tg_destroy() and
    clusterip_net_exit() are called, and clusterip_net_exit() is called
    before clusterip_tg_destroy().
    Hence the cleanup check code in clusterip_net_exit() doesn't make sense.

    test commands:
    %ip netns add vm1
    %ip netns exec vm1 bash
    %ip link set lo up
    %iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \
    -j CLUSTERIP --new --hashmode sourceip \
    --clustermac 01:00:5e:00:00:20 --total-nodes 2 --local-node 1
    %exit
    %ip netns del vm1

    splat looks like:
    [ 341.184508] WARNING: CPU: 1 PID: 87 at net/ipv4/netfilter/ipt_CLUSTERIP.c:840 clusterip_net_exit+0x319/0x380 [ipt_CLUSTERIP]
    [ 341.184850] Modules linked in: ipt_CLUSTERIP nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_tcpudp iptable_filter bpfilter ip_tables x_tables
    [ 341.184850] CPU: 1 PID: 87 Comm: kworker/u4:2 Not tainted 4.19.0-rc5+ #16
    [ 341.227509] Workqueue: netns cleanup_net
    [ 341.227509] RIP: 0010:clusterip_net_exit+0x319/0x380 [ipt_CLUSTERIP]
    [ 341.227509] Code: 0f 85 7f fe ff ff 48 c7 c2 80 64 2c c0 be a8 02 00 00 48 c7 c7 a0 63 2c c0 c6 05 18 6e 00 00 01 e8 bc 38 ff f5 e9 5b fe ff ff 0b e9 33 ff ff ff e8 4b 90 50 f6 e9 2d fe ff ff 48 89 df e8 de
    [ 341.227509] RSP: 0018:ffff88011086f408 EFLAGS: 00010202
    [ 341.227509] RAX: dffffc0000000000 RBX: 1ffff1002210de85 RCX: 0000000000000000
    [ 341.227509] RDX: 1ffff1002210de85 RSI: ffff880110813be8 RDI: ffffed002210de58
    [ 341.227509] RBP: ffff88011086f4d0 R08: 0000000000000000 R09: 0000000000000000
    [ 341.227509] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff1002210de81
    [ 341.227509] R13: ffff880110625a48 R14: ffff880114cec8c8 R15: 0000000000000014
    [ 341.227509] FS: 0000000000000000(0000) GS:ffff880116600000(0000) knlGS:0000000000000000
    [ 341.227509] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 341.227509] CR2: 00007f11fd38e000 CR3: 000000013ca16000 CR4: 00000000001006e0
    [ 341.227509] Call Trace:
    [ 341.227509] ? __clusterip_config_find+0x460/0x460 [ipt_CLUSTERIP]
    [ 341.227509] ? default_device_exit+0x1ca/0x270
    [ 341.227509] ? remove_proc_entry+0x1cd/0x390
    [ 341.227509] ? dev_change_net_namespace+0xd00/0xd00
    [ 341.227509] ? __init_waitqueue_head+0x130/0x130
    [ 341.227509] ops_exit_list.isra.10+0x94/0x140
    [ 341.227509] cleanup_net+0x45b/0x900
    [ ... ]

    Fixes: 613d0776d3fe ("netfilter: exit_net cleanup check added")
    Signed-off-by: Taehee Yoo
    Signed-off-by: Pablo Neira Ayuso

    Taehee Yoo
     
  • When a network namespace is destroyed, cleanup_net() is called.
    cleanup_net() holds pernet_ops_rwsem and then calls each ->exit callback,
    so clusterip_tg_destroy() is called from cleanup_net(),
    and clusterip_tg_destroy() calls unregister_netdevice_notifier().

    But unregister_netdevice_notifier() takes the same lock
    (pernet_ops_rwsem) that cleanup_net() already holds, hence a deadlock occurs.

    After this patch, only one notifier is registered, when the module is inserted,
    and all configs are added to a per-net list.

    test commands:
    %ip netns add vm1
    %ip netns exec vm1 bash
    %ip link set lo up
    %iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \
    -j CLUSTERIP --new --hashmode sourceip \
    --clustermac 01:00:5e:00:00:20 --total-nodes 2 --local-node 1
    %exit
    %ip netns del vm1

    splat looks like:
    [ 341.809674] ============================================
    [ 341.809674] WARNING: possible recursive locking detected
    [ 341.809674] 4.19.0-rc5+ #16 Tainted: G W
    [ 341.809674] --------------------------------------------
    [ 341.809674] kworker/u4:2/87 is trying to acquire lock:
    [ 341.809674] 000000005da2d519 (pernet_ops_rwsem){++++}, at: unregister_netdevice_notifier+0x8c/0x460
    [ 341.809674]
    [ 341.809674] but task is already holding lock:
    [ 341.809674] 000000005da2d519 (pernet_ops_rwsem){++++}, at: cleanup_net+0x119/0x900
    [ 341.809674]
    [ 341.809674] other info that might help us debug this:
    [ 341.809674] Possible unsafe locking scenario:
    [ 341.809674]
    [ 341.809674] CPU0
    [ 341.809674] ----
    [ 341.809674] lock(pernet_ops_rwsem);
    [ 341.809674] lock(pernet_ops_rwsem);
    [ 341.809674]
    [ 341.809674] *** DEADLOCK ***
    [ 341.809674]
    [ 341.809674] May be due to missing lock nesting notation
    [ 341.809674]
    [ 341.809674] 3 locks held by kworker/u4:2/87:
    [ 341.809674] #0: 00000000d9df6c92 ((wq_completion)"%s""netns"){+.+.}, at: process_one_work+0xafe/0x1de0
    [ 341.809674] #1: 00000000c2cbcee2 (net_cleanup_work){+.+.}, at: process_one_work+0xb60/0x1de0
    [ 341.809674] #2: 000000005da2d519 (pernet_ops_rwsem){++++}, at: cleanup_net+0x119/0x900
    [ 341.809674]
    [ 341.809674] stack backtrace:
    [ 341.809674] CPU: 1 PID: 87 Comm: kworker/u4:2 Tainted: G W 4.19.0-rc5+ #16
    [ 341.809674] Workqueue: netns cleanup_net
    [ 341.809674] Call Trace:
    [ ... ]
    [ 342.070196] down_write+0x93/0x160
    [ 342.070196] ? unregister_netdevice_notifier+0x8c/0x460
    [ 342.070196] ? down_read+0x1e0/0x1e0
    [ 342.070196] ? sched_clock_cpu+0x126/0x170
    [ 342.070196] ? find_held_lock+0x39/0x1c0
    [ 342.070196] unregister_netdevice_notifier+0x8c/0x460
    [ 342.070196] ? register_netdevice_notifier+0x790/0x790
    [ 342.070196] ? __local_bh_enable_ip+0xe9/0x1b0
    [ 342.070196] ? __local_bh_enable_ip+0xe9/0x1b0
    [ 342.070196] ? clusterip_tg_destroy+0x372/0x650 [ipt_CLUSTERIP]
    [ 342.070196] ? trace_hardirqs_on+0x93/0x210
    [ 342.070196] ? __bpf_trace_preemptirq_template+0x10/0x10
    [ 342.070196] ? clusterip_tg_destroy+0x372/0x650 [ipt_CLUSTERIP]
    [ 342.123094] clusterip_tg_destroy+0x3ad/0x650 [ipt_CLUSTERIP]
    [ 342.123094] ? clusterip_net_init+0x3d0/0x3d0 [ipt_CLUSTERIP]
    [ 342.123094] ? cleanup_match+0x17d/0x200 [ip_tables]
    [ 342.123094] ? xt_unregister_table+0x215/0x300 [x_tables]
    [ 342.123094] ? kfree+0xe2/0x2a0
    [ 342.123094] cleanup_entry+0x1d5/0x2f0 [ip_tables]
    [ 342.123094] ? cleanup_match+0x200/0x200 [ip_tables]
    [ 342.123094] __ipt_unregister_table+0x9b/0x1a0 [ip_tables]
    [ 342.123094] iptable_filter_net_exit+0x43/0x80 [iptable_filter]
    [ 342.123094] ops_exit_list.isra.10+0x94/0x140
    [ 342.123094] cleanup_net+0x45b/0x900
    [ ... ]

    Fixes: 202f59afd441 ("netfilter: ipt_CLUSTERIP: do not hold dev")
    Signed-off-by: Taehee Yoo
    Signed-off-by: Pablo Neira Ayuso

    Taehee Yoo
     
  • If just a table name was given, nf_tables_dump_rules() continued over
    the list of tables even after a match was found. The simple fix is to
    exit the loop if it reached the bottom and ctx->table was not NULL.

    When iterating over the table's chains, the same problem as above
    existed. But worse than that, if a chain name was given the hash table
    wasn't used to find the corresponding chain. Fix this by introducing a
    helper function iterating over a chain's rules (and taking care of the
    cb->args handling), then introduce a shortcut to it if a chain name was
    given.

    Signed-off-by: Phil Sutter
    Signed-off-by: Pablo Neira Ayuso

    Phil Sutter
     
  • Each media stream negotiation between 2 SIP peers will trigger creation
    of 4 different expectations (2 RTP and 2 RTCP):
    - INVITE will create expectations for the media packets sent by the
    called peer
    - reply to the INVITE will create expectations for media packets sent
    by the caller

    The dport used by these expectations usually matches the one selected
    by the SIP peers, but it might get translated due to conflicts with
    another expectation. When such an event occurs, it is important to do
    this translation in both directions: dport translation on the receiving
    path and sport translation on the sending path.

    This commit fixes the sport translation when the peer requiring it is
    also the one that starts the media stream. In this scenario, first media
    stream packet is forwarded from LAN to WAN and will rely on
    nf_nat_sip_expected() to do the necessary sport translation. However, the
    expectation matched by this packet does not contain the necessary information
    for doing SNAT, this data being stored in the paired expectation created by
    the sender's SIP message (INVITE or reply to it).

    Signed-off-by: Alin Nastac
    Signed-off-by: Pablo Neira Ayuso

    Alin Nastac
     
  • This removes the (now empty) nf_nat_l4proto struct, all its instances
    and all the no longer needed runtime (un)register functionality.

    nf_nat_need_gre() can be axed as well: the module that calls it (to
    load the no-longer-existing nat_gre module) also calls other nat core
    functions. GRE nat is now always available if kernel is built with it.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • This removes the last l4proto indirection: the two callers, the l3proto
    packet mangling helpers for ipv4 and ipv6, now call the
    nf_nat_l4proto_manip_pkt() helper.

    nf_nat_proto_{dccp,tcp,sctp,gre,icmp,icmpv6} are left behind, to avoid
    cluttering this patch, even though they no longer contain any functionality.

    Next patch will remove the empty files and the nf_nat_l4proto
    struct.

    nf_nat_proto_udp.c is renamed to nf_nat_proto.c, as it now contains the
    other nat manip functionality as well, not just udp and udplite.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • All protocols set this to nf_nat_l4proto_nlattr_to_range, so
    just call it directly.

    The important difference is that we'll now also call it for
    protocols that we don't support (i.e., nf_nat_proto_unknown did
    not provide .nlattr_to_range).

    However, there should be no harm; even icmp provided this callback.
    If we don't implement a specific l4nat for this, nothing would make
    use of this information, so adding a big switch/case construct listing
    all supported l4protocols seems a bit pointless.

    This change leaves a single function pointer in the l4proto struct.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • With exception of icmp, all of the l4 nat protocols set this to
    nf_nat_l4proto_in_range.

    Get rid of this and just check the l4proto in the caller.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • No need for indirections here, we only support ipv4 and ipv6
    and the called functions are very small.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Fold the remaining users (icmp, icmpv6, gre) into nf_nat_l4proto_unique_tuple.
    The static variable that saved the previously resolved key in gre and icmp is
    removed as well; just use the prandom-based offset like the others.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Almost all l4proto->unique_tuple implementations just call this helper,
    so make ->unique_tuple() optional and call the helper directly if the
    l4proto doesn't override it.

    This is an intermediate step to get rid of ->unique_tuple completely.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Historically this was net_random() based, and was then converted to
    a hash based algorithm (private boot seed + hash of endpoint addresses)
    due to concerns of leaking net_random() bits.

    RANDOM_FULLY mode was added later to avoid problems with hash
    based mode (see commit 34ce324019e76,
    "netfilter: nf_nat: add full port randomization support" for details).

    Just make prandom_u32() the default search starting point and get rid of
    ->secure_port() altogether.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • These parameters are no longer used,
    so remove them.

    Signed-off-by: Yafang Shao
    Acked-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Yafang Shao
     
  • In case almost or all available ports are taken, clash resolution can
    take a very long time, resulting in soft lockup.

    This can happen when many to-be-natted hosts connect to same
    destination:port (e.g. a proxy) and all connections pass the same SNAT.

    Pick a random offset in the acceptable range, then try an ever smaller
    number of adjacent port numbers, until either the limit is reached or a
    usable port is found. This results in at most 248 attempts
    (128 + 64 + 32 + 16 + 8, i.e. 4 restarts with a new search offset)
    instead of 64000+.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Since a pseudo-random starting point is used in finding a port in
    the default case, that 'else if' branch above is no longer a necessity.
    So remove it to simplify code.

    Signed-off-by: Xiaozhou Liu
    Signed-off-by: Pablo Neira Ayuso

    Xiaozhou Liu
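
The CLUSTERIP MAC check from the first 18 Dec commit amounts to this: when a config for the same cluster IP already exists, reuse it only if the requested cluster MAC matches as well. The sketch below uses a hypothetical `struct cluster_cfg`, a hypothetical `cluster_cfg_check()` and its own return convention; it is not the ipt_CLUSTERIP code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6 /* length of an Ethernet MAC address */

/* Hypothetical, simplified cluster configuration. */
struct cluster_cfg {
	uint32_t clusterip;           /* IPv4 address, host byte order */
	uint8_t clustermac[ETH_ALEN];
};

/* Returns 0 if 'req' may reuse 'existing', -EINVAL if the IP matches but
 * the MAC differs, and -ENOENT if the configs are unrelated. */
static int cluster_cfg_check(const struct cluster_cfg *existing,
			     const struct cluster_cfg *req)
{
	if (existing->clusterip != req->clusterip)
		return -ENOENT;
	if (memcmp(existing->clustermac, req->clustermac, ETH_ALEN) != 0)
		return -EINVAL; /* same IP, different MAC: reject */
	return 0;
}
```

Applied to the test commands in that commit (192.168.0.5 with 01:00:5e:00:00:20, then with 01:00:5e:00:00:21), the second request would be rejected with -EINVAL in this sketch.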
     

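The bounded port search from the soft-lockup fix under 18 Dec (random offset, then ever smaller windows of 128, 64, 32, 16 and 8 adjacent ports) can be sketched in plain C. `pick_port`, the `port_in_use_fn` callback, the `all_ports_taken` helper and the use of `rand()` in place of the kernel's prandom_u32() are assumptions for this example, not the actual nf_nat code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ATTEMPTS 128 /* first search window, per the commit message */
#define MIN_ATTEMPTS 16  /* no new round once the window drops below this */

/* Hypothetical predicate: is 'port' already taken by another mapping? */
typedef bool (*port_in_use_fn)(uint16_t port, void *ctx);

/* Searches [min, min + range_size) for a free port. Returns the port, or
 * 0 if the bounded search gives up (assumes min > 0). */
static uint16_t pick_port(uint16_t min, uint32_t range_size,
			  port_in_use_fn in_use, void *ctx)
{
	uint32_t attempts = range_size < MAX_ATTEMPTS ? range_size : MAX_ATTEMPTS;
	uint32_t off = (uint32_t)rand(); /* random starting point */

	for (;;) {
		for (uint32_t i = 0; i < attempts; i++, off++) {
			uint16_t port = (uint16_t)(min + off % range_size);

			if (!in_use(port, ctx))
				return port;
		}
		/* Whole range covered, or smallest window done: give up. */
		if (attempts >= range_size || attempts < MIN_ATTEMPTS)
			return 0;
		attempts /= 2;          /* ever smaller window ... */
		off = (uint32_t)rand(); /* ... at a fresh random offset */
	}
}

/* Example predicate for experiments: every port is taken; counts probes. */
static bool all_ports_taken(uint16_t port, void *ctx)
{
	(void)port;
	++*(unsigned *)ctx;
	return true;
}
```

With every port taken, this sketch gives up after 128 + 64 + 32 + 16 + 8 = 248 probes instead of walking a range of 64000 ports; a range smaller than the first window is simply scanned once.
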
14 Dec, 2018

2 commits

  • To make overflows as obvious as possible and to prevent code from blithely
    proceeding with a truncated string. This also has a side-effect to fix a
    compilation warning when using GCC 8.2.1.

    net/netfilter/ipset/ip_set_core.c: In function 'ip_set_sockfn_get':
    net/netfilter/ipset/ip_set_core.c:2027:3: warning: 'strncpy' writing 32 bytes into a region of size 2 overflows the destination [-Wstringop-overflow=]

    Signed-off-by: Qian Cai
    Signed-off-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Qian Cai
     
  • New function added by "Introduction of new commands and protocol
    version 7" is not working, since we return skb2 to user

    Signed-off-by: Victorien Molle
    Signed-off-by: Florent Fourcot
    Signed-off-by: Pablo Neira Ayuso

    Florent Fourcot
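
The strscpy() conversion in the first 14 Dec commit relies on a copy that always NUL-terminates and reports truncation instead of silently producing a cut-off string. The userspace function below, `copy_str`, is a hypothetical stand-in that mimics that contract (return the copied length, or -E2BIG on truncation); it is not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Copies 'src' into 'dst' (capacity 'size'), always NUL-terminating when
 * size > 0. Returns the number of characters copied, or -E2BIG if 'src'
 * did not fit (the destination then holds a terminated prefix). */
static long copy_str(char *dst, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -E2BIG;

	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	if (src[i] != '\0')
		return -E2BIG; /* truncated: report it, do not hide it */
	return (long)i;
}
```

Callers can then check for a negative return value, which makes an overflow visible where strncpy() would have left a possibly unterminated, truncated buffer.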
     

01 Dec, 2018

3 commits

  • This is a leftover from days where single-cpu systems were common:
    Store last port used to resolve a clash to use it as a starting point when
    the next conflict needs to be resolved.

    When we have parallel attempts to connect to the same address:port pair,
    it's likely that both cores end up computing the same "available" port,
    as both use the same starting port, and newly used ports won't become
    visible to other cores until the conntrack gets confirmed later.

    One of the cores then has to drop the packet at insertion time because
    the chosen new tuple turns out to be in use after all.

    Let's simplify this: remove the port rover and use a pseudo-random starting
    point.

    Note that this doesn't make netfilter default to 'fully random' mode;
    the 'rover' was only used if NAT could not reuse source port as-is.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • These caching infrastructure definitions have been unused for a very long
    time; remove them. They have nothing to do with the NFC subsystem.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Now that call_rcu()'s callback is not invoked until after bh-disable
    regions of code have completed (in addition to explicitly marked
    RCU read-side critical sections), call_rcu() can be used in place
    of call_rcu_bh(). Similarly, rcu_barrier() can be used in place of
    rcu_barrier_bh() and synchronize_rcu() in place of synchronize_rcu_bh().
    This commit therefore makes these changes.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Pablo Neira Ayuso

    Paul E. McKenney
     

12 Nov, 2018

12 commits

  • nf_flow_offload_gc_step() and nf_flow_table_iterate() are very similar,
    so a lot of duplicated code can be removed.
    After this patch, nf_flow_offload_gc_step() is a simple callback function of
    nf_flow_table_iterate(), like nf_flow_table_do_cleanup().

    Signed-off-by: Taehee Yoo
    Signed-off-by: Pablo Neira Ayuso

    Taehee Yoo
     
  • nf_flow_table_iterate() is a local function, make it static.

    Signed-off-by: Taehee Yoo
    Signed-off-by: Pablo Neira Ayuso

    Taehee Yoo
     
  • Useful to only set a particular range of the conntrack mark while
    leaving existing parts of the value alone, e.g. when updating
    conntrack marks via netlink from userspace.

    For NFQUEUE it was already implemented in commit 534473c6080e
    ("netfilter: ctnetlink: honor CTA_MARK_MASK when setting ctmark").

    This now adds the same functionality also for the other netlink
    conntrack mark changes.

    Signed-off-by: Andreas Jaggi
    Signed-off-by: Pablo Neira Ayuso

    Andreas Jaggi
     
  • Jozsef Kadlecsik says:

    ====================
    - Introduction of new commands and thus protocol version 7. The
    new commands make it possible to eliminate the getsockopt interface
    of ipset and use solely netlink to communicate with the kernel.
    Due to the strict attribute checking in both user and kernel space,
    a new protocol number was introduced. Both kernel and userspace are
    fully backward compatible.
    - Make invalid MAC address checks consistent, from Stefano Brivio.
    The patch depends on the next one.
    - Allow matching on destination MAC address for mac and ipmac sets,
    also from Stefano Brivio.
    ====================

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Fixes gcc '-Wunused-but-set-variable' warning:

    drivers/net/phy/marvell.c: In function 'm88e1510_config_init':
    drivers/net/phy/marvell.c:850:7: warning:
    variable 'pause' set but not used [-Wunused-but-set-variable]

    It is not used any more after commit 3c1bcc8614db ("net: ethernet: Convert phydev
    advertize and supported from u32 to link mode")

    Signed-off-by: YueHaibing
    Reviewed-by: Andrew Lunn
    Signed-off-by: David S. Miller

    YueHaibing
     
  • David S. Miller
     
  • Linus Torvalds
     
  • Pull networking fixes from David Miller:
    "One last pull request before heading to Vancouver for LPC, here we have:

    1) Don't forget to free VSI contexts during ice driver unload, from
    Victor Raj.

    2) Don't forget napi delete calls during device remove in ice driver,
    from Dave Ertman.

    3) Don't request VLAN tag insertion of ibmvnic device when SKB
    doesn't have VLAN tags at all.

    4) IPV4 frag handling code has to accommodate the situation where two
    threads try to insert the same fragment into the hash table at the
    same time. From Eric Dumazet.

    5) Relatedly, don't flow separate on protocol ports for fragmented
    frames, also from Eric Dumazet.

    6) Memory leaks in qed driver, from Denis Bolotin.

    7) Correct valid MTU range in smsc95xx driver, from Stefan Wahren.

    8) Validate cls_flower nested policies properly, from Jakub Kicinski.

    9) Clearing of stats counters in mv88e6xxx driver doesn't retain
    important bits in the G1_STATS_OP register causing the chip to
    hang. Fix from Andrew Lunn"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
    act_mirred: clear skb->tstamp on redirect
    net: dsa: mv88e6xxx: Fix clearing of stats counters
    tipc: fix link re-establish failure
    net: sched: cls_flower: validate nested enc_opts_policy to avoid warning
    net: mvneta: correct typo
    flow_dissector: do not dissect l4 ports for fragments
    net: qualcomm: rmnet: Fix incorrect assignment of real_dev
    net: aquantia: allow rx checksum offload configuration
    net: aquantia: invalid checksumm offload implementation
    net: aquantia: fixed enable unicast on 32 macvlan
    net: aquantia: fix potential IOMMU fault after driver unbind
    net: aquantia: synchronized flow control between mac/phy
    net: smsc95xx: Fix MTU range
    net: stmmac: Fix RX packet size > 8191
    qed: Fix potential memory corruption
    qed: Fix SPQ entries not returned to pool in error flows
    qed: Fix blocking/unlimited SPQ entries leak
    qed: Fix memory/entry leak in qed_init_sp_request()
    inet: frags: better deal with smp races
    net: hns3: bugfix for not checking return value
    ...

    Linus Torvalds
     
  • …masahiroy/linux-kbuild

    Pull Kbuild fixes from Masahiro Yamada:

    - fix build errors in binrpm-pkg and bindeb-pkg targets

    - fix false positive matches in merge_config.sh

    - fix build version mismatch in deb-pkg target

    - fix dtbs_install handling in (bin)deb-pkg target

    - revert a commit that allows setlocalversion to write to source tree

    * tag 'kbuild-fixes-v4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
    builddeb: Fix inclusion of dtbs in debian package
    Revert "scripts/setlocalversion: git: Make -dirty check more robust"
    kbuild: deb-pkg: fix too low build version number
    kconfig: merge_config: avoid false positive matches from comment lines
    kbuild: deb-pkg: fix bindeb-pkg breakage when O= is used
    kbuild: rpm-pkg: fix binrpm-pkg breakage when O= is used

    Linus Torvalds
     
  • Pull btrfs fixes from David Sterba:
    "Several fixes to recent release (4.19, fixes tagged for stable) and
    other fixes"

    * tag 'for-4.20-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
    Btrfs: fix missing delayed iputs on unmount
    Btrfs: fix data corruption due to cloning of eof block
    Btrfs: fix infinite loop on inode eviction after deduplication of eof block
    Btrfs: fix deadlock on tree root leaf when finding free extent
    btrfs: avoid link error with CONFIG_NO_AUTO_INLINE
    btrfs: tree-checker: Fix misleading group system information
    Btrfs: fix missing data checksums after a ranged fsync (msync)
    btrfs: fix pinned underflow after transaction aborted
    Btrfs: fix cur_offset in the error case for nocow

    Linus Torvalds
     
  • Pull ext4 fixes from Ted Ts'o:
    "A large number of ext4 bug fixes, mostly buffer and memory leaks on
    error return cleanup paths"

    * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: missing !bh check in ext4_xattr_inode_write()
    ext4: fix buffer leak in __ext4_read_dirblock() on error path
    ext4: fix buffer leak in ext4_expand_extra_isize_ea() on error path
    ext4: fix buffer leak in ext4_xattr_move_to_block() on error path
    ext4: release bs.bh before re-using in ext4_xattr_block_find()
    ext4: fix buffer leak in ext4_xattr_get_block() on error path
    ext4: fix possible leak of s_journal_flag_rwsem in error path
    ext4: fix possible leak of sbi->s_group_desc_leak in error path
    ext4: remove unneeded brelse call in ext4_xattr_inode_update_ref()
    ext4: avoid possible double brelse() in add_new_gdb() on error path
    ext4: avoid buffer leak in ext4_orphan_add() after prior errors
    ext4: avoid buffer leak on shutdown in ext4_mark_iloc_dirty()
    ext4: fix possible inode leak in the retry loop of ext4_resize_fs()
    ext4: fix missing cleanup if ext4_alloc_flex_bg_array() fails while resizing
    ext4: add missing brelse() update_backups()'s error path
    ext4: add missing brelse() add_new_gdb_meta_bg()'s error path
    ext4: add missing brelse() in set_flexbg_block_bitmap()'s error path
    ext4: avoid potential extra brelse in setup_new_flex_group_blocks()

    Linus Torvalds
     
  • Pull x86 fixes from Thomas Gleixner:
    "A set of x86 fixes:

    - Cure the LDT remapping to user space on 5 level paging which ended
    up in the KASLR space

    - Remove LDT mapping before freeing the LDT pages

    - Make NFIT MCE handling more robust

    - Unbreak the VSMP build by removing the dependency on paravirt ops

    - Support broken PIT emulation on Microsoft hyperV

    - Don't trace vmware_sched_clock() to avoid tracer recursion

    - Remove -pipe from KBUILD CFLAGS which breaks clang and is also
    slower on GCC

    - Trivial coding style and typo fixes"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/cpu/vmware: Do not trace vmware_sched_clock()
    x86/vsmp: Remove dependency on pv_irq_ops
    x86/ldt: Remove unused variable in map_ldt_struct()
    x86/ldt: Unmap PTEs for the slot before freeing LDT pages
    x86/mm: Move LDT remap out of KASLR region on 5-level paging
    acpi/nfit, x86/mce: Validate a MCE's address before using it
    acpi/nfit, x86/mce: Handle only uncorrectable machine checks
    x86/build: Remove -pipe from KBUILD_CFLAGS
    x86/hyper-v: Fix indentation in hv_do_fast_hypercall16()
    Documentation/x86: Fix typo in zero-page.txt
    x86/hyper-v: Enable PIT shutdown quirk
    clockevents/drivers/i8253: Add support for PIT shutdown quirk

    Linus Torvalds
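
The CTA_MARK_MASK change in the 12 Nov list (Andreas Jaggi's commit) lets userspace update only part of the conntrack mark. A minimal masked-update helper is sketched below, assuming the intended semantics are "bits inside the mask come from the new value, bits outside keep their old value"; `update_mark` is a hypothetical name, and the exact expression ctnetlink uses may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Bits set in 'mask' are taken from 'value'; all other bits of 'old'
 * are preserved. A mask of 0xffffffff replaces the whole mark. */
static uint32_t update_mark(uint32_t old, uint32_t value, uint32_t mask)
{
	return (old & ~mask) | (value & mask);
}
```

For example, with an existing mark of 0xAABBCCDD, a value of 0x00001200 and a mask of 0x0000FF00, only the second-lowest byte changes and the result is 0xAABB12DD.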