13 Jan, 2019

3 commits

  • [ Upstream commit 542fbda0f08f1cbbc250f9e59f7537649651d0c8 ]

    The dst entry might already have a zero refcount, waiting on rcu list
    to be free'd. Using dst_hold() transitions its reference count to 1, and
    next dst release will try to free it again -- resulting in a double free:

    WARNING: CPU: 1 PID: 0 at include/net/dst.h:239 nf_xfrm_me_harder+0xe7/0x130 [nf_nat]
    RIP: 0010:nf_xfrm_me_harder+0xe7/0x130 [nf_nat]
    Code: 48 8b 5c 24 60 65 48 33 1c 25 28 00 00 00 75 53 48 83 c4 68 5b 5d 41 5c c3 85 c0 74 0d 8d 48 01 f0 0f b1 0a 74 86 85 c0 75 f3 0b e9 7b ff ff ff 29 c6 31 d2 b9 20 00 48 00 4c 89 e7 e8 31 27
    Call Trace:
    nf_nat_ipv4_out+0x78/0x90 [nf_nat_ipv4]
    nf_hook_slow+0x36/0xd0
    ip_output+0x9f/0xd0
    ip_forward+0x328/0x440
    ip_rcv+0x8a/0xb0

    Use dst_hold_safe instead and bail out if we cannot take a reference.
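
    The difference between an unconditional hold and a "hold unless zero"
    can be sketched in userspace C11. This is only an illustration of the
    idea, not the kernel's dst code; struct obj, obj_hold() and
    obj_hold_safe() are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted object such as a dst entry. */
struct obj {
    atomic_int refcnt;
};

/* Unconditional hold: resurrects an object whose count already hit zero. */
static void obj_hold(struct obj *o)
{
    atomic_fetch_add(&o->refcnt, 1);
}

/* Take a reference only while the count is still non-zero. */
static bool obj_hold_safe(struct obj *o)
{
    int old = atomic_load(&o->refcnt);

    while (old != 0) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
            return true;
    }
    return false; /* already waiting to be freed: caller must bail out */
}
```

    A caller that gets false back must not touch the object, which mirrors
    the "bail out if we cannot take a reference" step above.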

    Fixes: a4c2fd7f7891 ("net: remove DST_NOCACHE flag")
    Reported-by: Martin Zaharinov
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Florian Westphal
     
  • [ Upstream commit 708abf74dd87f8640871b814faa195fb5970b0e3 ]

    In the error handling block, nla_nest_cancel(skb, atd) is called to
    cancel the nest operation. But then, ipset_nest_end(skb, atd) is
    unexpectedly called to end the nest operation. This patch calls
    ipset_nest_end only on the branch where nla_nest_cancel is not called.

    Fixes: 45040978c899 ("netfilter: ipset: Fix set:list type crash when flush/dump set in parallel")
    Signed-off-by: Pan Bian
    Signed-off-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Pan Bian
     
  • [ Upstream commit 530aad77010b81526586dfc09130ec875cd084e4 ]

    When adjusting sack block sequence numbers, skb_make_writable() gets
    called to make sure tcp options are all in the linear area, and buffer
    is not shared.

    This can cause the tcp header pointer to get reallocated, so we must
    reload it to avoid memory corruption.

    This bug pre-dates git history.

    Reported-by: Neel Mehta
    Reported-by: Shane Huntley
    Reported-by: Heather Adkins
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Florian Westphal
     

10 Jan, 2019

5 commits

  • commit 4cd273bb91b3001f623f516ec726c49754571b1a upstream.

    (not in Linus's tree now, but in nf.git + linux-next.git already.)

    age is signed integer, so result can be negative when the timestamps
    have a large delta. In this case we want to discard the entry.

    Instead of using age >= 2 || age < 0, just make it unsigned.
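
    A minimal userspace sketch of why the unsigned type is enough, assuming
    32-bit timestamps; the function names are hypothetical and this is not
    the patched kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Signed age: a large timestamp delta turns negative, so "age < 2" wrongly
 * reports the entry as recent. (The uint32_t to int32_t conversion wraps on
 * mainstream two's-complement compilers.)
 */
static bool is_recent_signed(uint32_t now, uint32_t stamp)
{
    int32_t age = (int32_t)(now - stamp);

    return age < 2;
}

/* Unsigned age: the same delta is simply a very large number. */
static bool is_recent_unsigned(uint32_t now, uint32_t stamp)
{
    uint32_t age = now - stamp;

    return age < 2;
}
```

    With a delta of 0x90000000 the signed variant treats the entry as
    recent, while the unsigned variant treats it as old, which is the
    "discard the entry" outcome described above.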

    Fixes: b36e4523d4d56 ("netfilter: nf_conncount: fix garbage collection confirm race")
    Reviewed-by: Shawn Bohrer
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    [mfo: backport: use older file name, nf_conncount.c -> xt_connlimit.c]
    Signed-off-by: Mauricio Faria de Oliveira
    Signed-off-by: Sasha Levin

    Florian Westphal
     
  • commit b36e4523d4d56e2595e28f16f6ccf1cd6a9fc452 upstream.

    Yi-Hung Wei and Justin Pettit found a race in the garbage collection scheme
    used by nf_conncount.

    When doing list walk, we lookup the tuple in the conntrack table.
    If the lookup fails we remove this tuple from our list because
    the conntrack entry is gone.

    This is the common cause, but it turns out it's not the only one.
    The list entry could have been created just before by another cpu, i.e. the
    conntrack entry might not yet have been inserted into the global hash.

    To avoid this, we introduce a timestamp and the owning cpu.
    If the entry appears to be stale, evict only if:
    1. The current cpu is the one that added the entry, or,
    2. The timestamp is older than two jiffies

    The second constraint allows GC to be taken over by other
    cpu too (e.g. because a cpu was offlined or napi got moved to another
    cpu).
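
    The two eviction conditions can be written as a small predicate. This is
    a sketch under assumptions: struct list_entry and may_evict() are
    hypothetical names, and "older than two jiffies" is taken to mean an
    unsigned age of at least 2.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical list entry carrying the owning cpu and a 32-bit timestamp. */
struct list_entry {
    int cpu;
    uint32_t stamp;
};

/* Decide whether an entry that looks stale may be evicted. */
static bool may_evict(const struct list_entry *e, int this_cpu, uint32_t now)
{
    uint32_t age = now - e->stamp; /* unsigned, so wraparound is harmless */

    if (e->cpu == this_cpu)
        return true;  /* rule 1: we added it, so it cannot be in flight */
    return age >= 2;  /* rule 2: old enough for any cpu to take over */
}
```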

    We can't pretend the 'doubtful' entry wasn't in our list.
    Instead, when we don't find an entry, indicate via IS_ERR
    whether the entry was removed ('did not exist') or withheld
    ('might-be-unconfirmed').

    This most likely also fixes a xt_connlimit imbalance earlier reported by
    Dmitry Andrianov.

    Cc: Dmitry Andrianov
    Reported-by: Justin Pettit
    Reported-by: Yi-Hung Wei
    Signed-off-by: Florian Westphal
    Acked-by: Yi-Hung Wei
    Signed-off-by: Pablo Neira Ayuso

    [mfo: backport: refresh context lines and use older symbol/file names:
    - nf_conncount.c -> xt_connlimit.c.
    - nf_conncount_rb -> xt_connlimit_rb
    - nf_conncount_tuple -> xt_connlimit_conn
    - conncount_conn_cachep -> connlimit_conn_cachep]
    Signed-off-by: Mauricio Faria de Oliveira

    Signed-off-by: Sasha Levin

    Florian Westphal
     
  • commit 21ba8847f857028dc83a0f341e16ecc616e34740 upstream.

    Currently, we use check_hlist() for garbage collection. However, we
    use the ‘zone’ from the counted entry to query the existence of
    existing entries in the hlist. This could be wrong when they are in
    different zones, and this patch fixes this issue.

    Fixes: e59ea3df3fc2 ("netfilter: xt_connlimit: honor conntrack zone if available")
    Signed-off-by: Yi-Hung Wei
    Signed-off-by: Pablo Neira Ayuso

    [mfo: backport: refresh context lines and use older symbol/file names, note hunk 5:
    - nf_conncount.c -> xt_connlimit.c
    - nf_conncount_rb -> xt_connlimit_rb
    - nf_conncount_tuple -> xt_connlimit_conn
    - hunk 5: remove check for non-NULL 'tuple', that isn't required as it's introduced
    by upstream commit 35d8deb80 ("netfilter: conncount: Support count only use case")
    which addresses nf_conncount_count() that does not exist yet -- it's introduced by
    upstream commit 625c556118f3 ("netfilter: connlimit: split xt_connlimit into front
    and backend"), a refactor change.
    - nft_connlimit.c -> removed, not used/doesn't exist yet.]
    Signed-off-by: Mauricio Faria de Oliveira

    Signed-off-by: Sasha Levin

    Yi-Hung Wei
     
  • commit 5e5cbc7b23eaf13e18652c03efbad5be6995de6a upstream.

    This patch provides an interface to maintain the list of connections and
    the lookup function to obtain the number of connections in the list.

    Signed-off-by: Pablo Neira Ayuso

    [mfo: backport: refresh context lines and use older symbol/file names:
    - nf_conntrack_count.h: new file, add include guards.
    - nf_conncount.c -> xt_connlimit.c.
    - nf_conncount_rb -> xt_connlimit_rb
    - nf_conncount_tuple -> xt_connlimit_conn
    - conncount_rb_cachep -> connlimit_rb_cachep
    - conncount_conn_cachep -> connlimit_conn_cachep]
    Signed-off-by: Mauricio Faria de Oliveira

    Signed-off-by: Sasha Levin

    Pablo Neira Ayuso
     
  • commit ce49480dba8666cba0106e8e31a942c9ce4c438a upstream.

    Only stored, never read. This is a leftover from commit 7d08487777c8
    ("netfilter: connlimit: use rbtree for per-host conntrack obj storage"),
    which added the rbtree node struct that stores the address instead.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    [mfo: backport: refresh context lines and use older symbol/file names:
    - nf_conncount.c -> xt_connlimit.c.
    - nf_conncount_rb -> xt_connlimit_rb
    - nf_conncount_tuple -> xt_connlimit_conn
    - additionally, remove the add_hlist() 'addr' parameter that isn't used and removed
    later upstream with commit 625c556118f3 ("netfilter: connlimit: split xt_connlimit
    into front and backend") in the rename from 'xt_connlimit.c' to 'nf_conncount.c',
    a big refactor, so do it here, while still here in this related patch.]
    Signed-off-by: Mauricio Faria de Oliveira

    Signed-off-by: Sasha Levin

    Florian Westphal
     

21 Dec, 2018

1 commit

  • [ Upstream commit 0b8d9073539e217f79ec1bff65eb205ac796723d ]

    Fix wraparound bug which could lead to memory exhaustion when adding an
    x.x.x.x-255.255.255.255 range to any hash:*net* types.
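
    The general shape of such a wraparound can be shown in userspace C. This
    is not the ipset code; count_buggy() and count_fixed() are hypothetical,
    and the 'limit' argument exists only so the buggy loop terminates here.

```c
#include <stdint.h>

/*
 * Buggy shape: when 'to' is 0xFFFFFFFF, ip++ wraps to 0 and "ip <= to"
 * never becomes false, so the loop keeps adding entries.
 */
static uint64_t count_buggy(uint32_t from, uint32_t to, uint64_t limit)
{
    uint64_t n = 0;

    for (uint32_t ip = from; ip <= to && n < limit; ip++)
        n++;
    return n;
}

/* Fixed shape: test for the last address before incrementing. */
static uint64_t count_fixed(uint32_t from, uint32_t to)
{
    uint64_t n = 0;
    uint32_t ip = from; /* caller guarantees from <= to */

    for (;;) {
        n++;
        if (ip == to)
            break;
        ip++;
    }
    return n;
}
```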

    Fixes Netfilter's bugzilla id #1212, reported by Thomas Schwark.

    Fixes: 48596a8ddc46 ("netfilter: ipset: Fix adding an IPv4 range containing more than 2^31 addresses")
    Signed-off-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Jozsef Kadlecsik
     

17 Dec, 2018

4 commits

  • [ Upstream commit ca08987885a147643817d02bf260bc4756ce8cd4 ]

    There is no expression deactivation call from the rule replacement path,
    hence, chain counter is not decremented. A few steps to reproduce the
    problem:

    %nft add table ip filter
    %nft add chain ip filter c1
    %nft add chain ip filter c2
    %nft add rule ip filter c1 jump c2
    %nft replace rule ip filter c1 handle 3 accept
    %nft flush ruleset

    expression means immediate NFT_JUMP to chain c2.
    Reference count of chain c2 is increased when the rule is added.

    When rule is deleted or replaced, the reference counter of c2 should be
    decreased via nft_rule_expr_deactivate() which calls
    nft_immediate_deactivate().

    Splat looks like:
    [ 214.396453] WARNING: CPU: 1 PID: 21 at net/netfilter/nf_tables_api.c:1432 nf_tables_chain_destroy.isra.38+0x2f9/0x3a0 [nf_tables]
    [ 214.398983] Modules linked in: nf_tables nfnetlink
    [ 214.398983] CPU: 1 PID: 21 Comm: kworker/1:1 Not tainted 4.20.0-rc2+ #44
    [ 214.398983] Workqueue: events nf_tables_trans_destroy_work [nf_tables]
    [ 214.398983] RIP: 0010:nf_tables_chain_destroy.isra.38+0x2f9/0x3a0 [nf_tables]
    [ 214.398983] Code: 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 8e 00 00 00 48 8b 7b 58 e8 e1 2c 4e c6 48 89 df e8 d9 2c 4e c6 eb 9a 0b eb 96 0f 0b e9 7e fe ff ff e8 a7 7e 4e c6 e9 a4 fe ff ff e8
    [ 214.398983] RSP: 0018:ffff8881152874e8 EFLAGS: 00010202
    [ 214.398983] RAX: 0000000000000001 RBX: ffff88810ef9fc28 RCX: ffff8881152876f0
    [ 214.398983] RDX: dffffc0000000000 RSI: 1ffff11022a50ede RDI: ffff88810ef9fc78
    [ 214.398983] RBP: 1ffff11022a50e9d R08: 0000000080000000 R09: 0000000000000000
    [ 214.398983] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff11022a50eba
    [ 214.398983] R13: ffff888114446e08 R14: ffff8881152876f0 R15: ffffed1022a50ed6
    [ 214.398983] FS: 0000000000000000(0000) GS:ffff888116400000(0000) knlGS:0000000000000000
    [ 214.398983] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 214.398983] CR2: 00007fab9bb5f868 CR3: 000000012aa16000 CR4: 00000000001006e0
    [ 214.398983] Call Trace:
    [ 214.398983] ? nf_tables_table_destroy.isra.37+0x100/0x100 [nf_tables]
    [ 214.398983] ? __kasan_slab_free+0x145/0x180
    [ 214.398983] ? nf_tables_trans_destroy_work+0x439/0x830 [nf_tables]
    [ 214.398983] ? kfree+0xdb/0x280
    [ 214.398983] nf_tables_trans_destroy_work+0x5f5/0x830 [nf_tables]
    [ ... ]

    Fixes: bb7b40aecbf7 ("netfilter: nf_tables: bogus EBUSY in chain deletions")
    Reported-by: Christoph Anton Mitterer
    Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914505
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=201791
    Signed-off-by: Taehee Yoo
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Taehee Yoo
     
  • [ Upstream commit 2a31e4bd9ad255ee40809b5c798c4b1c2b09703b ]

    ip_vs_dst_event is supposed to clean up all dst used in ipvs'
    destinations when a net dev is going down. But it works only
    when the dst's dev is the same as the dev from the event.

    Now with the same priority but late registration,
    ip_vs_dst_notifier is always called later than ipv6_dev_notf
    where the dst's dev is set to lo for NETDEV_DOWN event.

    As the dst's dev lo is not the same as the dev from the event
    in ip_vs_dst_event, ip_vs_dst_notifier doesn't actually work.
    Also, as these dst have to wait for dest_trash_timer to clean
    them up, it would cause some non-permanent kernel warnings:

    unregister_netdevice: waiting for br0 to become free. Usage count = 3

    To fix it, call ip_vs_dst_notifier earlier than ipv6_dev_notf
    by increasing its priority to ADDRCONF_NOTIFY_PRIORITY + 5.

    Note that for ipv4 route fib_netdev_notifier doesn't set dst's
    dev to lo in NETDEV_DOWN event, so this fix is only needed when
    IP_VS_IPV6 is defined.

    Fixes: 7a4f0761fce3 ("IPVS: init and cleanup restructuring")
    Reported-by: Li Shuang
    Signed-off-by: Xin Long
    Acked-by: Julian Anastasov
    Acked-by: Simon Horman
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Xin Long
     
  • [ Upstream commit b4e955e9f372035361fbc6f07b21fe2cc6a5be4a ]

    In htable_create(), hinfo is allocated by vmalloc(),
    so if an error occurs, hinfo should be freed.

    Fixes: 11d5f15723c9 ("netfilter: xt_hashlimit: Create revision 2 to support higher pps rates")
    Signed-off-by: Taehee Yoo
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Taehee Yoo
     
  • [ Upstream commit 29e3880109e357fdc607b4393f8308cef6af9413 ]

    nft_compat ops do not have static storage duration, unlike all other
    expressions.

    When nf_tables_expr_destroy() returns, expr->ops might have been
    free'd already, so we need to store next address before calling
    expression destructor.

    For the same reason, we can't deref match pointer after nft_xt_put().

    This can be easily reproduced by adding msleep() before
    nft_match_destroy() returns.
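
    The "store next address before calling the destructor" pattern looks
    roughly like this in userspace C. It is a simplified linked-list sketch
    with hypothetical names, not the nf_tables expression walker.

```c
#include <stdlib.h>

/* Hypothetical node whose storage is released by its own destructor. */
struct node {
    struct node *next;
    int id;
};

/* Prepend a node; on allocation failure the list is left unchanged. */
static struct node *push(struct node *head, int id)
{
    struct node *n = malloc(sizeof(*n));

    if (!n)
        return head;
    n->id = id;
    n->next = head;
    return n;
}

/* Destroy every node, returning how many were destroyed. */
static int destroy_all(struct node *head)
{
    int destroyed = 0;
    struct node *cur = head;

    while (cur) {
        /* Read everything we still need before the memory goes away. */
        struct node *next = cur->next;

        free(cur); /* after this, cur must not be dereferenced */
        cur = next;
        destroyed++;
    }
    return destroyed;
}
```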

    Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables")
    Reported-by: Pablo Neira Ayuso
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Florian Westphal
     

27 Nov, 2018

3 commits

  • [ Upstream commit 54451f60c8fa061af9051a53be9786393947367c ]

    When IDLETIMER rule is added, sysfs file is created under
    /sys/class/xt_idletimer/timers/
    But some label names must not be used, such as
    ".", "..", "power", "uevent", "subsystem", etc.
    So a sysfs filename checking routine is needed.
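
    Such a check could look like the following userspace sketch. The
    function name label_is_allowed() is hypothetical, and the reserved list
    contains only the names quoted above, so it is not necessarily complete.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Reject labels that would collide with entries sysfs already provides. */
static bool label_is_allowed(const char *label)
{
    /* Only the names listed in the commit message; assumed incomplete. */
    static const char *const reserved[] = {
        ".", "..", "power", "uevent", "subsystem",
    };

    for (size_t i = 0; i < sizeof(reserved) / sizeof(reserved[0]); i++) {
        if (strcmp(label, reserved[i]) == 0)
            return false;
    }
    return true;
}
```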

    test commands:
    %iptables -I INPUT -j IDLETIMER --timeout 1 --label "power"

    splat looks like:
    [95765.423132] sysfs: cannot create duplicate filename '/devices/virtual/xt_idletimer/timers/power'
    [95765.433418] CPU: 0 PID: 8446 Comm: iptables Not tainted 4.19.0-rc6+ #20
    [95765.449755] Call Trace:
    [95765.449755] dump_stack+0xc9/0x16b
    [95765.449755] ? show_regs_print_info+0x5/0x5
    [95765.449755] sysfs_warn_dup+0x74/0x90
    [95765.449755] sysfs_add_file_mode_ns+0x352/0x500
    [95765.449755] sysfs_create_file_ns+0x179/0x270
    [95765.449755] ? sysfs_add_file_mode_ns+0x500/0x500
    [95765.449755] ? idletimer_tg_checkentry+0x3e5/0xb1b [xt_IDLETIMER]
    [95765.449755] ? rcu_read_lock_sched_held+0x114/0x130
    [95765.449755] ? __kmalloc_track_caller+0x211/0x2b0
    [95765.449755] ? memcpy+0x34/0x50
    [95765.449755] idletimer_tg_checkentry+0x4e2/0xb1b [xt_IDLETIMER]
    [ ... ]

    Fixes: 0902b469bd25 ("netfilter: xtables: idletimer target implementation")
    Signed-off-by: Taehee Yoo
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Taehee Yoo
     
  • [ Upstream commit 886503f34d63e681662057448819edb5b1057a97 ]

    Allow /0 as advertised for hash:net,port,net sets.

    For "hash:net,port,net", ipset(8) says that "either subnet
    is permitted to be a /0 should you wish to match port
    between all destinations."

    Make that statement true.

    Before:

    # ipset create cidrzero hash:net,port,net
    # ipset add cidrzero 0.0.0.0/0,12345,0.0.0.0/0
    ipset v6.34: The value of the CIDR parameter of the IP address is invalid

    # ipset create cidrzero6 hash:net,port,net family inet6
    # ipset add cidrzero6 ::/0,12345,::/0
    ipset v6.34: The value of the CIDR parameter of the IP address is invalid

    After:

    # ipset create cidrzero hash:net,port,net
    # ipset add cidrzero 0.0.0.0/0,12345,0.0.0.0/0
    # ipset test cidrzero 192.168.205.129,12345,172.16.205.129
    192.168.205.129,tcp:12345,172.16.205.129 is in set cidrzero.

    # ipset create cidrzero6 hash:net,port,net family inet6
    # ipset add cidrzero6 ::/0,12345,::/0
    # ipset test cidrzero6 fe80::1,12345,ff00::1
    fe80::1,tcp:12345,ff00::1 is in set cidrzero6.

    See also:

    https://bugzilla.kernel.org/show_bug.cgi?id=200897
    https://github.com/ewestbrook/linux/commit/df7ff6efb0934ab6acc11f003ff1a7580d6c1d9c

    Signed-off-by: Eric Westbrook
    Signed-off-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Eric Westbrook
     
  • [ Upstream commit 439cd39ea136d2c026805264d58a91f36b6b64ca ]

    Commit 45040978c899 ("netfilter: ipset: Fix set:list type crash
    when flush/dump set in parallel") postponed decreasing set
    reference counters to the RCU callback.

    An 'ipset del' command can terminate before the RCU grace period
    is elapsed, and if sets are listed before then, the reference
    counter shown in userspace will be wrong:

    # ipset create h hash:ip; ipset create l list:set; ipset add l h
    # ipset del l h; ipset list h
    Name: h
    Type: hash:ip
    Revision: 4
    Header: family inet hashsize 1024 maxelem 65536
    Size in memory: 88
    References: 1
    Number of entries: 0
    Members:
    # sleep 1; ipset list h
    Name: h
    Type: hash:ip
    Revision: 4
    Header: family inet hashsize 1024 maxelem 65536
    Size in memory: 88
    References: 0
    Number of entries: 0
    Members:

    Fix this by making the reference count update synchronous again.

    As a result, when sets are listed, ip_set_name_byindex() might
    now fetch a set whose reference count is already zero. Instead
    of relying on the reference count to protect against concurrent
    set renaming, grab ip_set_ref_lock as reader and copy the name,
    while holding the same lock in ip_set_rename() as writer
    instead.

    Reported-by: Li Shuang
    Fixes: 45040978c899 ("netfilter: ipset: Fix set:list type crash when flush/dump set in parallel")
    Signed-off-by: Stefano Brivio
    Signed-off-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Stefano Brivio
     

21 Nov, 2018

1 commit

  • commit f393808dc64149ccd0e5a8427505ba2974a59854 upstream.

    If there's no entry to drop in bucket that corresponds to the hash,
    early_drop() should look for it in other buckets. But since it increments
    hash instead of bucket number, it actually looks in the same bucket 8
    times: hsize is 16k by default (14 bits) and hash is 32-bit value, so
    reciprocal_scale(hash, hsize) returns the same value for hash..hash+7 in
    most cases.

    Fix it by increasing bucket number instead of hash and rename _hash
    to bucket to avoid future confusion.
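
    The effect is easy to reproduce in userspace. reciprocal_scale() below
    follows the kernel helper's multiply-and-shift definition; the two
    probe_*() wrappers are hypothetical and only contrast the two ways of
    picking the next bucket.

```c
#include <stdint.h>

/* Map a 32-bit value into [0, ep_ro) without a division. */
static uint32_t reciprocal_scale(uint32_t val, uint32_t ep_ro)
{
    return (uint32_t)(((uint64_t)val * ep_ro) >> 32);
}

/* Old behaviour: bump the hash, then scale. With hsize = 16384 this is
 * hash >> 18, so small increments rarely change the bucket. */
static uint32_t probe_by_hash(uint32_t hash, uint32_t i, uint32_t hsize)
{
    return reciprocal_scale(hash + i, hsize);
}

/* New behaviour: scale once, then step through neighbouring buckets. */
static uint32_t probe_by_bucket(uint32_t hash, uint32_t i, uint32_t hsize)
{
    return (reciprocal_scale(hash, hsize) + i) % hsize;
}
```

    For hash 0x12340000 and hsize 16384, all eight probe_by_hash() attempts
    land in bucket 0x48d, while probe_by_bucket() visits eight distinct
    buckets.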

    Fixes: 3e86638e9a0b ("netfilter: conntrack: consider ct netns in early_drop logic")
    Cc: # v4.7+
    Signed-off-by: Vasily Khoruzhick
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Vasily Khoruzhick
     

10 Oct, 2018

2 commits

  • [ Upstream commit 7acfda539c0b9636a58bfee56abfb3aeee806d96 ]

    When an element of a verdict map is deleted, the delete routine should
    release the chain. However, the routine that flushes elements of a
    verdict map doesn't release the chain.

    test commands:
    %nft add table ip filter
    %nft add chain ip filter c1
    %nft add map ip filter map1 { type ipv4_addr : verdict \; }
    %nft add element ip filter map1 { 1 : jump c1 }
    %nft flush map ip filter map1
    %nft flush ruleset

    splat looks like:
    [ 4895.170899] kernel BUG at net/netfilter/nf_tables_api.c:1415!
    [ 4895.178114] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
    [ 4895.178880] CPU: 0 PID: 1670 Comm: nft Not tainted 4.18.0+ #55
    [ 4895.178880] RIP: 0010:nf_tables_chain_destroy.isra.28+0x39/0x220 [nf_tables]
    [ 4895.178880] Code: fc ff df 53 48 89 fb 48 83 c7 50 48 89 fa 48 c1 ea 03 0f b6 04 02 84 c0 74 09 3c 03 7f 05 e8 3e 4c 25 e1 8b 43 50 85 c0 74 02 0b 48 89 da 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02
    [ 4895.228342] RSP: 0018:ffff88010b98f4c0 EFLAGS: 00010202
    [ 4895.234841] RAX: 0000000000000001 RBX: ffff8801131c6968 RCX: ffff8801146585b0
    [ 4895.234841] RDX: 1ffff10022638d37 RSI: ffff8801191a9348 RDI: ffff8801131c69b8
    [ 4895.234841] RBP: ffff8801146585a8 R08: 1ffff1002323526a R09: 0000000000000000
    [ 4895.234841] R10: 0000000000000000 R11: 0000000000000000 R12: dead000000000200
    [ 4895.234841] R13: dead000000000100 R14: ffffffffa3638af8 R15: dffffc0000000000
    [ 4895.234841] FS: 00007f6d188e6700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000
    [ 4895.234841] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 4895.234841] CR2: 00007ffe72b8df88 CR3: 000000010e2d4000 CR4: 00000000001006f0
    [ 4895.234841] Call Trace:
    [ 4895.234841] nf_tables_commit+0x2704/0x2c70 [nf_tables]
    [ 4895.234841] ? nfnetlink_rcv_batch+0xa4f/0x11b0 [nfnetlink]
    [ 4895.234841] ? nf_tables_setelem_notify.constprop.48+0x1a0/0x1a0 [nf_tables]
    [ 4895.323824] ? __lock_is_held+0x9d/0x130
    [ 4895.323824] ? kasan_unpoison_shadow+0x30/0x40
    [ 4895.333299] ? kasan_kmalloc+0xa9/0xc0
    [ 4895.333299] ? kmem_cache_alloc_trace+0x2c0/0x310
    [ 4895.333299] ? nfnetlink_rcv_batch+0xa4f/0x11b0 [nfnetlink]
    [ 4895.333299] nfnetlink_rcv_batch+0xdb9/0x11b0 [nfnetlink]
    [ 4895.333299] ? debug_show_all_locks+0x290/0x290
    [ 4895.333299] ? nfnetlink_net_init+0x150/0x150 [nfnetlink]
    [ 4895.333299] ? sched_clock_cpu+0xe5/0x170
    [ 4895.333299] ? sched_clock_local+0xff/0x130
    [ 4895.333299] ? sched_clock_cpu+0xe5/0x170
    [ 4895.333299] ? find_held_lock+0x39/0x1b0
    [ 4895.333299] ? sched_clock_local+0xff/0x130
    [ 4895.333299] ? memset+0x1f/0x40
    [ 4895.333299] ? nla_parse+0x33/0x260
    [ 4895.333299] ? ns_capable_common+0x6e/0x110
    [ 4895.333299] nfnetlink_rcv+0x2c0/0x310 [nfnetlink]
    [ ... ]

    Fixes: 591054469b3e ("netfilter: nf_tables: revisit chain/object refcounting from elements")
    Signed-off-by: Taehee Yoo
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Taehee Yoo
     
  • [ Upstream commit c1dc2912059901f97345d9e10c96b841215fdc0f ]

    The cluster match requires conntrack for matching packets. If the
    netns does not have conntrack hooks registered, the match does not
    work at all.

    Implicitly load the conntrack hook for the family, exactly as many
    other extensions do. This ensures that the match works even if the
    hooks have not been registered by other means.

    Signed-off-by: Martin Willi
    Acked-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Martin Willi
     

15 Sep, 2018

2 commits

  • [ Upstream commit 3e673b23b541b8e7f773b2d378d6eb99831741cd ]

    Shaochun Chen points out we leak dumper filter state allocations
    stored in dump_control->data in case there is an error before netlink sets
    cb_running (after which ->done will be called at some point).

    In order to fix this, add .start functions and move allocations there.

    Same pattern as used in commit 90fd131afc565159c9e0ea742f082b337e10f8c6
    ("netfilter: nf_tables: move dumper state allocation into ->start").

    Reported-by: shaochun chen
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
  • [ Upstream commit a53b42c11815d2357e31a9403ae3950517525894 ]

    We came across an infinite loop in ipvs when using ipvs in a docker
    env.

    When ipvs receives new packets and cannot find an ipvs connection,
    it will create a new connection, then if the dest is unavailable
    (i.e. IP_VS_DEST_F_AVAILABLE is not set), the packet will be dropped silently.

    But if the dropped packet is the first packet of this connection,
    the connection control timer never has a chance to start and the
    ipvs connection cannot be released. This will lead to memory leak, or
    infinite loop in cleanup_net() when net namespace is released like
    this:

    ip_vs_conn_net_cleanup at ffffffffa0a9f31a [ip_vs]
    __ip_vs_cleanup at ffffffffa0a9f60a [ip_vs]
    ops_exit_list at ffffffff81567a49
    cleanup_net at ffffffff81568b40
    process_one_work at ffffffff810a851b
    worker_thread at ffffffff810a9356
    kthread at ffffffff810b0b6f
    ret_from_fork at ffffffff81697a18

    race condition:
    CPU1                                   CPU2
    ip_vs_in()
      ip_vs_conn_new()
                                           ip_vs_del_dest()
                                             __ip_vs_unlink_dest()
                                               ~IP_VS_DEST_F_AVAILABLE
      cp->dest && !IP_VS_DEST_F_AVAILABLE
      __ip_vs_conn_put
    ...
    cleanup_net ---> infinite looping

    Fix this by checking whether the timer already started.

    Signed-off-by: Tan Hu
    Reviewed-by: Jiang Biao
    Acked-by: Julian Anastasov
    Acked-by: Simon Horman
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Tan Hu
     

05 Sep, 2018

3 commits

  • [ Upstream commit c6cc94df65c3174be92afbee638f11cbb5e606a7 ]

    It's possible to rename two chains to the same name in one
    transaction:

    nft add chain t c1
    nft add chain t c2
    nft 'rename chain t c1 c3;rename chain t c2 c3'

    This creates two chains named 'c3'.

    This appears to be harmless, as both chains can still be deleted
    by either name or handle, but it is nevertheless a bug.

    Walk transaction log and also compare vs. the pending renames.

    Both chains can still be deleted, but nevertheless it is a bug as
    we don't allow creating chains with identical names, so we should
    prevent this from happening-by-rename too.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
  • [ Upstream commit 9f8aac0be21ed5f99bd5ba0ff315d710737d1794 ]

    The new name is stored in the transaction metadata; on commit,
    the pointers to the old and new names are swapped.

    Therefore in abort and commit case we have to free the
    pointer in the chain_trans container.

    In commit case, the pointer can be used by another cpu that
    is currently dumping the renamed chain, thus kfree needs to
    happen after waiting for rcu readers to complete.

    Fixes: b7263e071a ("netfilter: nf_tables: Allow chain name of up to 255 chars")
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
  • [ Upstream commit 9970a8e40d4c39e23d62d32540366d1d7d2cce9b ]

    GC of sets uses call_rcu() to destroy elements, so elements may be
    destroyed after sets and chains have been destroyed. But elements
    should be destroyed before sets and chains. To wait for pending
    call_rcu() callbacks to finish, an rcu_barrier() is added.

    In order to test correctly, the patch below should be applied.
    https://patchwork.ozlabs.org/patch/940883/

    test scripts:
    %cat test.nft
    table ip aa {
    map map1 {
    type ipv4_addr : verdict; flags timeout;
    elements = {
    0 : jump a0,
    1 : jump a0,
    2 : jump a0,
    3 : jump a0,
    4 : jump a0,
    5 : jump a0,
    6 : jump a0,
    7 : jump a0,
    8 : jump a0,
    9 : jump a0,
    }
    timeout 1s;
    }
    chain a0 {
    }
    }
    flush ruleset

    [ ... ]

    table ip aa {
    map map1 {
    type ipv4_addr : verdict; flags timeout;
    elements = {
    0 : jump a0,
    1 : jump a0,
    2 : jump a0,
    3 : jump a0,
    4 : jump a0,
    5 : jump a0,
    6 : jump a0,
    7 : jump a0,
    8 : jump a0,
    9 : jump a0,
    }
    timeout 1s;
    }
    chain a0 {
    }
    }
    flush ruleset

    Splat looks like:
    [ 200.795603] kernel BUG at net/netfilter/nf_tables_api.c:1363!
    [ 200.806944] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
    [ 200.812253] CPU: 1 PID: 1582 Comm: nft Not tainted 4.17.0+ #24
    [ 200.820297] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
    [ 200.830309] RIP: 0010:nf_tables_chain_destroy.isra.34+0x62/0x240 [nf_tables]
    [ 200.838317] Code: 43 50 85 c0 74 26 48 8b 45 00 48 8b 4d 08 ba 54 05 00 00 48 c7 c6 60 6d 29 c0 48 c7 c7 c0 65 29 c0 4c 8b 40 08 e8 58 e5 fd f8 0b 48 89 da 48 b8 00 00 00 00 00 fc ff
    [ 200.860366] RSP: 0000:ffff880118dbf4d0 EFLAGS: 00010282
    [ 200.866354] RAX: 0000000000000061 RBX: ffff88010cdeaf08 RCX: 0000000000000000
    [ 200.874355] RDX: 0000000000000061 RSI: 0000000000000008 RDI: ffffed00231b7e90
    [ 200.882361] RBP: ffff880118dbf4e8 R08: ffffed002373bcfb R09: ffffed002373bcfa
    [ 200.890354] R10: 0000000000000000 R11: ffffed002373bcfb R12: dead000000000200
    [ 200.898356] R13: dead000000000100 R14: ffffffffbb62af38 R15: dffffc0000000000
    [ 200.906354] FS: 00007fefc31fd700(0000) GS:ffff88011b800000(0000) knlGS:0000000000000000
    [ 200.915533] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 200.922355] CR2: 0000557f1c8e9128 CR3: 0000000106880000 CR4: 00000000001006e0
    [ 200.930353] Call Trace:
    [ 200.932351] ? nf_tables_commit+0x26f6/0x2c60 [nf_tables]
    [ 200.939525] ? nf_tables_setelem_notify.constprop.49+0x1a0/0x1a0 [nf_tables]
    [ 200.947525] ? nf_tables_delchain+0x6e0/0x6e0 [nf_tables]
    [ 200.952383] ? nft_add_set_elem+0x1700/0x1700 [nf_tables]
    [ 200.959532] ? nla_parse+0xab/0x230
    [ 200.963529] ? nfnetlink_rcv_batch+0xd06/0x10d0 [nfnetlink]
    [ 200.968384] ? nfnetlink_net_init+0x130/0x130 [nfnetlink]
    [ 200.975525] ? debug_show_all_locks+0x290/0x290
    [ 200.980363] ? debug_show_all_locks+0x290/0x290
    [ 200.986356] ? sched_clock_cpu+0x132/0x170
    [ 200.990352] ? find_held_lock+0x39/0x1b0
    [ 200.994355] ? sched_clock_local+0x10d/0x130
    [ 200.999531] ? memset+0x1f/0x40

    Fixes: 9d0982927e79 ("netfilter: nft_hash: add support for timeouts")
    Signed-off-by: Taehee Yoo
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Taehee Yoo
     

24 Aug, 2018

5 commits

  • commit 6613b6173dee098997229caf1f3b961c49da75e6 upstream.

    When first DCCP packet is SYNC or SYNCACK, we insert a new conntrack
    that has an un-initialized timeout value, i.e. such entry could be
    reaped at any time.

    Mark them as INVALID and only ignore SYNC/SYNCACK when connection had
    an old state.

    Reported-by: syzbot+6f18401420df260e37ed@syzkaller.appspotmail.com
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
  • [ Upstream commit 2045cdfa1b40d66f126f3fd05604fc7c754f0022 ]

    Loading the nf_conntrack module with doubled hashsize parameter, i.e.
    modprobe nf_conntrack hashsize=12345 hashsize=12345
    causes NULL-ptr deref.

    If 'hashsize' is specified twice, the nf_conntrack_set_hashsize()
    function will also be called twice.
    The first nf_conntrack_set_hashsize() call will set the
    'nf_conntrack_htable_size' variable:

    nf_conntrack_set_hashsize()
    ...
    /* On boot, we can set this without any fancy locking. */
    if (!nf_conntrack_htable_size)
    return param_set_uint(val, kp);

    But on the second invocation, the nf_conntrack_htable_size is already set,
    so the nf_conntrack_set_hashsize() will take a different path and call
    the nf_conntrack_hash_resize() function. Which will crash on the attempt
    to dereference 'nf_conntrack_hash' pointer:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    RIP: 0010:nf_conntrack_hash_resize+0x255/0x490 [nf_conntrack]
    Call Trace:
    nf_conntrack_set_hashsize+0xcd/0x100 [nf_conntrack]
    parse_args+0x1f9/0x5a0
    load_module+0x1281/0x1a50
    __se_sys_finit_module+0xbe/0xf0
    do_syscall_64+0x7c/0x390
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Fix this by checking !nf_conntrack_hash instead of
    !nf_conntrack_htable_size. nf_conntrack_hash is initialized only
    after the module has loaded, so the second invocation of
    nf_conntrack_set_hashsize() no longer crashes; it just sets
    nf_conntrack_htable_size again.
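
    The ordering problem above can be modelled in userspace. This is only a
    sketch: htable_size, hash_table, resize_table() and set_hashsize() are
    hypothetical stand-ins for the kernel symbols named in the message, and
    an assert() stands in for the NULL dereference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

unsigned int htable_size;   /* models nf_conntrack_htable_size */
void **hash_table;          /* models nf_conntrack_hash; NULL until init runs */

/* Hypothetical resize helper: it relies on the table already existing. */
int resize_table(unsigned int new_size)
{
    void **t;

    assert(hash_table != NULL);   /* stands in for the NULL dereference */
    if (new_size == 0)
        return -1;
    t = realloc(hash_table, new_size * sizeof(*t));
    if (t == NULL)
        return -1;
    hash_table = t;
    htable_size = new_size;
    return 0;
}

/* Setter keyed on the table pointer rather than on the size variable. */
int set_hashsize(unsigned int val)
{
    if (hash_table == NULL) {     /* table not allocated yet: record the size only */
        htable_size = val;
        return 0;
    }
    return resize_table(val);
}
```

    Calling set_hashsize() twice before the table exists takes the simple
    path both times. A test on htable_size instead would send the second
    call into resize_table() with a NULL table.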

    Signed-off-by: Andrey Ryabinin
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Andrey Ryabinin
     
  • [ Upstream commit 21d5e078192d244df3d6049f9464fff2f72cfd68 ]

    iptables-nft never requests these, but make this explicitly illegal.
    If one were requested, the kernel could oops because ->eval is NULL.
    Furthermore, the builtin targets have no owning module, so it is possible
    to rmmod the eb/ip/ip6_tables module even if these targets were loaded.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
  • [ Upstream commit dffd22aed2aa1e804bccf19b30a421e89ee2ae61 ]

    When proc_dostring() is called with a non-zero offset in strict mode, it
    doesn't just write to the ->data buffer, it also reads. Make sure it
    doesn't read uninitialized data.

    Fixes: c6ac37d8d884 ("netfilter: nf_log: fix error on write NONE to [...]")
    Signed-off-by: Jann Horn
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
     
  • [ Upstream commit ad9852af97587b8abe8102f9ddcb05c9769656f6 ]

    The helper module can be unloaded right after
    nf_conntrack_helper_unregister, which may cause a panic due to a race.

    nf_ct_iterate_destroy(unhelp, me) resets the helper of the conntrack to
    NULL, but someone may already have fetched the helper pointer by then.
    It then panics when it accesses the helper after the module was unloaded.

    Take the following example:
    CPU0                                        CPU1
    ctnetlink_dump_helpinfo
    helper = rcu_dereference(help->helper);
                                                unhelp
                                                set helper as NULL
                                                unload helper module
    helper->to_nlattr(skb, ct);

    As above, CPU0 tries to access the helper after its module has been
    unloaded, and the panic happens.

    Signed-off-by: Gao Feng
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Gao Feng
     

03 Aug, 2018

2 commits

  • [ Upstream commit 9c7f96fd77b0dbe1fe7ed1f9c462c45dc48a1076 ]

    The patch moves the "trans->msg_type == NFT_MSG_NEWSET" check before
    the use of nft_trans_set(trans). Otherwise we can get an out-of-bounds read.
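
    A minimal userspace sketch of why the order of the two tests matters.
    All names here (struct trans, trans_set_data(), lookup_pending_set())
    are hypothetical stand-ins, not the nf_tables definitions. The payload
    size depends on the message type, so reading the set fields of a
    smaller payload would run past the allocation.

```c
#include <stddef.h>
#include <stdlib.h>

enum msg_type { MSG_NEWTABLE, MSG_NEWSET };

struct set { const char *name; };

/* Type-specific payloads of different sizes. */
struct trans_table { int update; };
struct trans_set   { struct set *set; unsigned int set_id; };

struct trans {
    enum msg_type msg_type;
    max_align_t data[];       /* payload; its size depends on msg_type */
};

struct trans *trans_alloc(enum msg_type type, size_t payload_size)
{
    struct trans *t = calloc(1, sizeof(*t) + payload_size);

    if (t != NULL)
        t->msg_type = type;
    return t;
}

/* Only meaningful when msg_type == MSG_NEWSET. */
struct trans_set *trans_set_data(struct trans *t)
{
    return (struct trans_set *)t->data;
}

struct set *lookup_pending_set(struct trans *t, unsigned int id)
{
    /* Tag first: && short-circuits, so the payload is never read
     * for other message types. */
    if (t->msg_type == MSG_NEWSET && trans_set_data(t)->set_id == id)
        return trans_set_data(t)->set;
    return NULL;
}
```

    With the operands swapped, a MSG_NEWTABLE transaction would have its
    smaller payload read as a struct trans_set before the type is checked.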

    For example, KASAN reported one when running the 0001_cache_handling_0
    nft test. In this case "trans->msg_type" was NFT_MSG_NEWTABLE:

    [75517.177808] BUG: KASAN: slab-out-of-bounds in nft_set_lookup_global+0x22f/0x270 [nf_tables]
    [75517.279094] Read of size 8 at addr ffff881bdb643fc8 by task nft/7356
    ...
    [75517.375605] CPU: 26 PID: 7356 Comm: nft Tainted: G E 4.17.0-rc7.1.x86_64 #1
    [75517.489587] Hardware name: Oracle Corporation SUN SERVER X4-2
    [75517.618129] Call Trace:
    [75517.648821] dump_stack+0xd1/0x13b
    [75517.691040] ? show_regs_print_info+0x5/0x5
    [75517.742519] ? kmsg_dump_rewind_nolock+0xf5/0xf5
    [75517.799300] ? lock_acquire+0x143/0x310
    [75517.846738] print_address_description+0x85/0x3a0
    [75517.904547] kasan_report+0x18d/0x4b0
    [75517.949892] ? nft_set_lookup_global+0x22f/0x270 [nf_tables]
    [75518.019153] ? nft_set_lookup_global+0x22f/0x270 [nf_tables]
    [75518.088420] ? nft_set_lookup_global+0x22f/0x270 [nf_tables]
    [75518.157689] nft_set_lookup_global+0x22f/0x270 [nf_tables]
    [75518.224869] nf_tables_newsetelem+0x1a5/0x5d0 [nf_tables]
    [75518.291024] ? nft_add_set_elem+0x2280/0x2280 [nf_tables]
    [75518.357154] ? nla_parse+0x1a5/0x300
    [75518.401455] ? kasan_kmalloc+0xa6/0xd0
    [75518.447842] nfnetlink_rcv+0xc43/0x1bdf [nfnetlink]
    [75518.507743] ? nfnetlink_rcv+0x7a5/0x1bdf [nfnetlink]
    [75518.569745] ? nfnl_err_reset+0x3c0/0x3c0 [nfnetlink]
    [75518.631711] ? lock_acquire+0x143/0x310
    [75518.679133] ? netlink_deliver_tap+0x9b/0x1070
    [75518.733840] ? kasan_unpoison_shadow+0x31/0x40
    [75518.788542] netlink_unicast+0x45d/0x680
    [75518.837111] ? __isolate_free_page+0x890/0x890
    [75518.891913] ? netlink_attachskb+0x6b0/0x6b0
    [75518.944542] netlink_sendmsg+0x6fa/0xd30
    [75518.993107] ? netlink_unicast+0x680/0x680
    [75519.043758] ? netlink_unicast+0x680/0x680
    [75519.094402] sock_sendmsg+0xd9/0x160
    [75519.138810] ___sys_sendmsg+0x64d/0x980
    [75519.186234] ? copy_msghdr_from_user+0x350/0x350
    [75519.243118] ? lock_downgrade+0x650/0x650
    [75519.292738] ? do_raw_spin_unlock+0x5d/0x250
    [75519.345456] ? _raw_spin_unlock+0x24/0x30
    [75519.395065] ? __handle_mm_fault+0xbde/0x3410
    [75519.448830] ? sock_setsockopt+0x3d2/0x1940
    [75519.500516] ? __lock_acquire.isra.25+0xdc/0x19d0
    [75519.558448] ? lock_downgrade+0x650/0x650
    [75519.608057] ? __audit_syscall_entry+0x317/0x720
    [75519.664960] ? __fget_light+0x58/0x250
    [75519.711325] ? __sys_sendmsg+0xde/0x170
    [75519.758850] __sys_sendmsg+0xde/0x170
    [75519.804193] ? __ia32_sys_shutdown+0x90/0x90
    [75519.856725] ? syscall_trace_enter+0x897/0x10e0
    [75519.912354] ? trace_event_raw_event_sys_enter+0x920/0x920
    [75519.979432] ? __audit_syscall_entry+0x720/0x720
    [75520.036118] do_syscall_64+0xa3/0x3d0
    [75520.081248] ? prepare_exit_to_usermode+0x47/0x1d0
    [75520.139904] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [75520.201680] RIP: 0033:0x7fc153320ba0
    [75520.245772] RSP: 002b:00007ffe294c3638 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    [75520.337708] RAX: ffffffffffffffda RBX: 00007ffe294c4820 RCX: 00007fc153320ba0
    [75520.424547] RDX: 0000000000000000 RSI: 00007ffe294c46b0 RDI: 0000000000000003
    [75520.511386] RBP: 00007ffe294c47b0 R08: 0000000000000004 R09: 0000000002114090
    [75520.598225] R10: 00007ffe294c30a0 R11: 0000000000000246 R12: 00007ffe294c3660
    [75520.684961] R13: 0000000000000001 R14: 00007ffe294c3650 R15: 0000000000000001

    [75520.790946] Allocated by task 7356:
    [75520.833994] kasan_kmalloc+0xa6/0xd0
    [75520.878088] __kmalloc+0x189/0x450
    [75520.920107] nft_trans_alloc_gfp+0x20/0x190 [nf_tables]
    [75520.983961] nf_tables_newtable+0xcd0/0x1bd0 [nf_tables]
    [75521.048857] nfnetlink_rcv+0xc43/0x1bdf [nfnetlink]
    [75521.108655] netlink_unicast+0x45d/0x680
    [75521.157013] netlink_sendmsg+0x6fa/0xd30
    [75521.205271] sock_sendmsg+0xd9/0x160
    [75521.249365] ___sys_sendmsg+0x64d/0x980
    [75521.296686] __sys_sendmsg+0xde/0x170
    [75521.341822] do_syscall_64+0xa3/0x3d0
    [75521.386957] entry_SYSCALL_64_after_hwframe+0x44/0xa9

    [75521.467867] Freed by task 23454:
    [75521.507804] __kasan_slab_free+0x132/0x180
    [75521.558137] kfree+0x14d/0x4d0
    [75521.596005] free_rt_sched_group+0x153/0x280
    [75521.648410] sched_autogroup_create_attach+0x19a/0x520
    [75521.711330] ksys_setsid+0x2ba/0x400
    [75521.755529] __ia32_sys_setsid+0xa/0x10
    [75521.802850] do_syscall_64+0xa3/0x3d0
    [75521.848090] entry_SYSCALL_64_after_hwframe+0x44/0xa9

    [75521.929000] The buggy address belongs to the object at ffff881bdb643f80
    which belongs to the cache kmalloc-96 of size 96
    [75522.079797] The buggy address is located 72 bytes inside of
    96-byte region [ffff881bdb643f80, ffff881bdb643fe0)
    [75522.221234] The buggy address belongs to the page:
    [75522.280100] page:ffffea006f6d90c0 count:1 mapcount:0 mapping:0000000000000000 index:0x0
    [75522.377443] flags: 0x2fffff80000100(slab)
    [75522.426956] raw: 002fffff80000100 0000000000000000 0000000000000000 0000000180200020
    [75522.521275] raw: ffffea006e6fafc0 0000000c0000000c ffff881bf180f400 0000000000000000
    [75522.615601] page dumped because: kasan: bad access detected

    Fixes: 37a9cc525525 ("netfilter: nf_tables: add generation mask to sets")
    Signed-off-by: Alexey Kodanev
    Acked-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Alexey Kodanev
     
  • [ Upstream commit cbdebe481a14b42c45aa9f4ceb5ff19b55de2c57 ]

    Userspace `ipset` command forbids family option for hash:mac type:

    ipset create test hash:mac family inet4
    ipset v6.30: Unknown argument: `family'

    However, this check is not done in the kernel itself. When someone uses
    an external netlink application (the pyroute2 Python library, for
    example), they can create a hash:mac set with an invalid family and get
    inconsistent results from userspace (the `ipset` command cannot read the
    set content anymore).

    This patch enforces the logic in the kernel and forbids insertion of
    hash:mac with a family set.

    Since IP_SET_PROTO_UNDEF is defined only for hash:mac, this patch has no
    impact on other hash:* sets.

    Signed-off-by: Florent Fourcot
    Signed-off-by: Victorien Molle
    Signed-off-by: Jozsef Kadlecsik
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Florent Fourcot
     

17 Jul, 2018

1 commit

  • commit ba062ebb2cd561d404e0fba8ee4b3f5ebce7cbfc upstream.

    Three attributes are currently not verified and can thus trigger KMSAN
    warnings such as:

    BUG: KMSAN: uninit-value in __arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline]
    BUG: KMSAN: uninit-value in __fswab32 include/uapi/linux/swab.h:59 [inline]
    BUG: KMSAN: uninit-value in nfqnl_recv_config+0x939/0x17d0 net/netfilter/nfnetlink_queue.c:1268
    CPU: 1 PID: 4521 Comm: syz-executor120 Not tainted 4.17.0+ #5
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x185/0x1d0 lib/dump_stack.c:113
    kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1117
    __msan_warning_32+0x70/0xc0 mm/kmsan/kmsan_instr.c:620
    __arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline]
    __fswab32 include/uapi/linux/swab.h:59 [inline]
    nfqnl_recv_config+0x939/0x17d0 net/netfilter/nfnetlink_queue.c:1268
    nfnetlink_rcv_msg+0xb2e/0xc80 net/netfilter/nfnetlink.c:212
    netlink_rcv_skb+0x37e/0x600 net/netlink/af_netlink.c:2448
    nfnetlink_rcv+0x2fe/0x680 net/netfilter/nfnetlink.c:513
    netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
    netlink_unicast+0x1680/0x1750 net/netlink/af_netlink.c:1336
    netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901
    sock_sendmsg_nosec net/socket.c:629 [inline]
    sock_sendmsg net/socket.c:639 [inline]
    ___sys_sendmsg+0xec8/0x1320 net/socket.c:2117
    __sys_sendmsg net/socket.c:2155 [inline]
    __do_sys_sendmsg net/socket.c:2164 [inline]
    __se_sys_sendmsg net/socket.c:2162 [inline]
    __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
    do_syscall_64+0x15b/0x230 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x43fd59
    RSP: 002b:00007ffde0e30d28 EFLAGS: 00000213 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fd59
    RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000003
    RBP: 00000000006ca018 R08: 00000000004002c8 R09: 00000000004002c8
    R10: 00000000004002c8 R11: 0000000000000213 R12: 0000000000401680
    R13: 0000000000401710 R14: 0000000000000000 R15: 0000000000000000

    Uninit was created at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline]
    kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:189
    kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:315
    kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan.c:322
    slab_post_alloc_hook mm/slab.h:446 [inline]
    slab_alloc_node mm/slub.c:2753 [inline]
    __kmalloc_node_track_caller+0xb35/0x11b0 mm/slub.c:4395
    __kmalloc_reserve net/core/skbuff.c:138 [inline]
    __alloc_skb+0x2cb/0x9e0 net/core/skbuff.c:206
    alloc_skb include/linux/skbuff.h:988 [inline]
    netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
    netlink_sendmsg+0x76e/0x1350 net/netlink/af_netlink.c:1876
    sock_sendmsg_nosec net/socket.c:629 [inline]
    sock_sendmsg net/socket.c:639 [inline]
    ___sys_sendmsg+0xec8/0x1320 net/socket.c:2117
    __sys_sendmsg net/socket.c:2155 [inline]
    __do_sys_sendmsg net/socket.c:2164 [inline]
    __se_sys_sendmsg net/socket.c:2162 [inline]
    __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
    do_syscall_64+0x15b/0x230 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Fixes: fdb694a01f1f ("netfilter: Add fail-open support")
    Fixes: 829e17a1a602 ("[NETFILTER]: nfnetlink_queue: allow changing queue length through netlink")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

11 Jul, 2018

1 commit

  • commit ce00bf07cc95a57cd20b208e02b3c2604e532ae8 upstream.

    The old code would indefinitely block other users of nf_log_mutex if
    a userspace access in proc_dostring() blocked e.g. due to a userfaultfd
    region. Fix it by moving proc_dostring() out of the locked region.

    This is a followup to commit 266d07cb1c9a ("netfilter: nf_log: fix
    sleeping function called from invalid context"), which changed this code
    from using rcu_read_lock() to taking nf_log_mutex.

    Fixes: 266d07cb1c9a ("netfilter: nf_log: fix sleeping function calle[...]")
    Signed-off-by: Jann Horn
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
     

08 Jul, 2018

7 commits

  • [ Upstream commit 52f96757905bbf0edef47f3ee6c7c784e7f8ff8a ]

    syzkaller reports a buffer overflow for the interface name
    when starting sync daemons [1].

    We copy the user structure into a larger stack buffer, but
    later we search for the NUL past the stack buffer.
    The same happens for sched_name when adding/editing a virtual server.

    We are restricted by IP_VS_SCHEDNAME_MAXLEN and IP_VS_IFNAME_MAXLEN
    being used as size in include/uapi/linux/ip_vs.h, so they
    include the space for NUL.

    Since strlcpy is wrong for an unsafe source, replace it with
    strscpy and add checks to return EINVAL if the source string is not
    NUL-terminated. The incomplete strlcpy fix comes from 2.6.13.

    For the netlink interface reduce the len parameter for
    IPVS_DAEMON_ATTR_MCAST_IFN and IPVS_SVC_ATTR_SCHED_NAME,
    so that we get proper EINVAL.
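
    The check described above can be sketched in portable C. strscpy is a
    kernel helper and is not available in userspace, so this sketch uses
    memchr() as a stand-in. copy_bounded_name() and the IFNAME_MAXLEN
    value are hypothetical.

```c
#include <errno.h>
#include <string.h>

#define IFNAME_MAXLEN 16   /* illustrative size, includes space for the NUL */

/*
 * Copy a fixed-size, untrusted field into dst.
 * Returns 0 on success, or -EINVAL if src has no NUL within src_size
 * bytes or the string does not fit into dst.
 */
int copy_bounded_name(char *dst, size_t dst_size,
                      const char *src, size_t src_size)
{
    /* Never look past src_size: the field may lack a terminator. */
    const char *nul = memchr(src, '\0', src_size);
    size_t len;

    if (nul == NULL)
        return -EINVAL;
    len = (size_t)(nul - src);
    if (len >= dst_size)
        return -EINVAL;
    memcpy(dst, src, len + 1);   /* copy the string and its NUL */
    return 0;
}
```

    A field filled entirely with non-NUL bytes is rejected up front, so no
    code ever searches for its end beyond the field.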

    [1]
    kernel BUG at lib/string.c:1052!
    invalid opcode: 0000 [#1] SMP KASAN
    Dumping ftrace buffer:
    (ftrace buffer empty)
    Modules linked in:
    CPU: 1 PID: 373 Comm: syz-executor936 Not tainted 4.17.0-rc4+ #45
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
    Google 01/01/2011
    RIP: 0010:fortify_panic+0x13/0x20 lib/string.c:1051
    RSP: 0018:ffff8801c976f800 EFLAGS: 00010282
    RAX: 0000000000000022 RBX: 0000000000000040 RCX: 0000000000000000
    RDX: 0000000000000022 RSI: ffffffff8160f6f1 RDI: ffffed00392edef6
    RBP: ffff8801c976f800 R08: ffff8801cf4c62c0 R09: ffffed003b5e4fb0
    R10: ffffed003b5e4fb0 R11: ffff8801daf27d87 R12: ffff8801c976fa20
    R13: ffff8801c976fae4 R14: ffff8801c976fae0 R15: 000000000000048b
    FS: 00007fd99f75e700(0000) GS:ffff8801daf00000(0000)
    knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000200001c0 CR3: 00000001d6843000 CR4: 00000000001406e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    strlen include/linux/string.h:270 [inline]
    strlcpy include/linux/string.h:293 [inline]
    do_ip_vs_set_ctl+0x31c/0x1d00 net/netfilter/ipvs/ip_vs_ctl.c:2388
    nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
    nf_setsockopt+0x7d/0xd0 net/netfilter/nf_sockopt.c:115
    ip_setsockopt+0xd8/0xf0 net/ipv4/ip_sockglue.c:1253
    udp_setsockopt+0x62/0xa0 net/ipv4/udp.c:2487
    ipv6_setsockopt+0x149/0x170 net/ipv6/ipv6_sockglue.c:917
    tcp_setsockopt+0x93/0xe0 net/ipv4/tcp.c:3057
    sock_common_setsockopt+0x9a/0xe0 net/core/sock.c:3046
    __sys_setsockopt+0x1bd/0x390 net/socket.c:1903
    __do_sys_setsockopt net/socket.c:1914 [inline]
    __se_sys_setsockopt net/socket.c:1911 [inline]
    __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1911
    do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x447369
    RSP: 002b:00007fd99f75dda8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
    RAX: ffffffffffffffda RBX: 00000000006e39e4 RCX: 0000000000447369
    RDX: 000000000000048b RSI: 0000000000000000 RDI: 0000000000000003
    RBP: 0000000000000000 R08: 0000000000000018 R09: 0000000000000000
    R10: 00000000200001c0 R11: 0000000000000246 R12: 00000000006e39e0
    R13: 75a1ff93f0896195 R14: 6f745f3168746576 R15: 0000000000000001
    Code: 08 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b 48 89 df e8 d2 8f 48 fa eb
    de 55 48 89 fe 48 c7 c7 60 65 64 88 48 89 e5 e8 91 dd f3 f9 0b 90 90
    90 90 90 90 90 90 90 90 90 55 48 89 e5 41 57 41 56
    RIP: fortify_panic+0x13/0x20 lib/string.c:1051 RSP: ffff8801c976f800

    Reported-and-tested-by: syzbot+aac887f77319868646df@syzkaller.appspotmail.com
    Fixes: e4ff67513096 ("ipvs: add sync_maxlen parameter for the sync daemon")
    Fixes: 4da62fc70d7c ("[IPVS]: Fix for overflows")
    Signed-off-by: Julian Anastasov
    Acked-by: Simon Horman
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Julian Anastasov
     
  • [ Upstream commit 3e0f64b7dd3149f75e8652ff1df56cffeedc8fc1 ]

    Credit calculations for packet ratelimiting are not correct: with the
    applied ratelimit of 25/second and burst 8, a total of 33 packets
    should have been accepted. This is true in iptables (33) but not in
    nftables (~65). For packet ratelimiting, use:

    div_u64(limit->nsecs, limit->rate) * limit->burst;

    to calculate credit, just like iptables' xt_limit does.

    Moreover, use the default burst from iptables, since users expect
    similar behaviour.
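
    A worked version of the arithmetic, as a userspace sketch. The function
    names are hypothetical, plain division stands in for div_u64(), and
    reading the 33 as "burst plus one second of the steady rate" is an
    assumption about how the figure in the message is derived.

```c
#include <stdint.h>

/* Credit, in nanoseconds, that one packet costs: nsecs / rate. */
uint64_t limit_cost_ns(uint64_t nsecs, uint64_t rate)
{
    return nsecs / rate;
}

/* Bucket size, following the formula quoted in the commit message. */
uint64_t limit_credit_ns(uint64_t nsecs, uint64_t rate, uint32_t burst)
{
    return (nsecs / rate) * burst;
}

/* Packets that fit into the first interval: the initial burst plus the
 * packets refilled at the steady rate (assumed interpretation). */
uint64_t limit_first_interval_packets(uint64_t nsecs, uint64_t rate,
                                      uint32_t burst)
{
    return limit_credit_ns(nsecs, rate, burst) / limit_cost_ns(nsecs, rate)
           + rate;
}
```

    For 25/second and burst 8, one packet costs 1000000000 / 25 = 40000000
    ns of credit, the bucket holds 8 * 40000000 = 320000000 ns, and
    8 + 25 = 33 packets fit into the first second.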

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Pablo Neira Ayuso
     
  • commit adc972c5b88829d38ede08b1069718661c7330ae upstream.

    When the chain depth exceeds NFT_JUMP_STACK_SIZE, nft_do_chain
    crashes. But there is no need to crash hard here.
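
    A sketch of a bounded jump stack that fails softly. The message does
    not say what replaces the crash, so dropping the packet is an
    assumption here, and JUMP_STACK_SIZE, jump_push() and eval_nested()
    are hypothetical names.

```c
#define JUMP_STACK_SIZE 16   /* illustrative, stands in for NFT_JUMP_STACK_SIZE */

enum verdict { VERDICT_ACCEPT, VERDICT_DROP };

struct jump_stack {
    int chain_id[JUMP_STACK_SIZE];
    unsigned int depth;
};

/* Push a return point; refuse instead of overrunning the array. */
int jump_push(struct jump_stack *s, int chain_id)
{
    if (s->depth >= JUMP_STACK_SIZE)
        return -1;
    s->chain_id[s->depth++] = chain_id;
    return 0;
}

/* Follow `jumps` nested jumps. Nesting that is too deep yields a drop
 * verdict rather than aborting. */
enum verdict eval_nested(unsigned int jumps)
{
    struct jump_stack s = { .depth = 0 };

    for (unsigned int i = 0; i < jumps; i++) {
        if (jump_push(&s, (int)i) != 0)
            return VERDICT_DROP;
    }
    return VERDICT_ACCEPT;
}
```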

    Suggested-by: Florian Westphal
    Signed-off-by: Taehee Yoo
    Acked-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Taehee Yoo
     
  • commit 360cc79d9d299ce297b205508276285ceffc5fa8 upstream.

    The table field in nft_obj_filter is not an array. In order to check
    tablename, we should check if the pointer is set.
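
    The difference between the two checks, as a userspace sketch. struct
    obj_filter and filter_matches() are hypothetical and only mirror the
    shape described above: a name held through a pointer that may be NULL.

```c
#include <stddef.h>
#include <string.h>

struct obj_filter {
    char *table;          /* allocated name, or NULL when no table was given */
    unsigned int type;
};

/* Returns 1 if an object in `table_name` passes the filter, 0 otherwise. */
int filter_matches(const struct obj_filter *f, const char *table_name)
{
    if (f == NULL)
        return 1;
    /* Test the pointer itself. Testing f->table[0], as one would for an
     * embedded array, dereferences NULL when no table name was given. */
    if (f->table != NULL && strcmp(f->table, table_name) != 0)
        return 0;
    return 1;
}
```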

    Test commands:

    %nft add table ip filter
    %nft add counter ip filter ct1
    %nft reset counters

    Splat looks like:

    [ 306.510504] kasan: CONFIG_KASAN_INLINE enabled
    [ 306.516184] kasan: GPF could be caused by NULL-ptr deref or user memory access
    [ 306.524775] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
    [ 306.528284] Modules linked in: nft_objref nft_counter nf_tables nfnetlink ip_tables x_tables
    [ 306.528284] CPU: 0 PID: 1488 Comm: nft Not tainted 4.17.0-rc4+ #17
    [ 306.528284] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
    [ 306.528284] RIP: 0010:nf_tables_dump_obj+0x52c/0xa70 [nf_tables]
    [ 306.528284] RSP: 0018:ffff8800b6cb7520 EFLAGS: 00010246
    [ 306.528284] RAX: 0000000000000000 RBX: ffff8800b6c49820 RCX: 0000000000000000
    [ 306.528284] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffffed0016d96e9a
    [ 306.528284] RBP: ffff8800b6cb75c0 R08: ffffed00236fce7c R09: ffffed00236fce7b
    [ 306.528284] R10: ffffffff9f6241e8 R11: ffffed00236fce7c R12: ffff880111365108
    [ 306.528284] R13: 0000000000000000 R14: ffff8800b6c49860 R15: ffff8800b6c49860
    [ 306.528284] FS: 00007f838b007700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000
    [ 306.528284] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 306.528284] CR2: 00007ffeafabcf78 CR3: 00000000b6cbe000 CR4: 00000000001006f0
    [ 306.528284] Call Trace:
    [ 306.528284] netlink_dump+0x470/0xa20
    [ 306.528284] __netlink_dump_start+0x5ae/0x690
    [ 306.528284] ? nf_tables_getobj+0x1b3/0x740 [nf_tables]
    [ 306.528284] nf_tables_getobj+0x2f5/0x740 [nf_tables]
    [ 306.528284] ? nft_obj_notify+0x100/0x100 [nf_tables]
    [ 306.528284] ? nf_tables_getobj+0x740/0x740 [nf_tables]
    [ 306.528284] ? nf_tables_dump_flowtable_done+0x70/0x70 [nf_tables]
    [ 306.528284] ? nft_obj_notify+0x100/0x100 [nf_tables]
    [ 306.528284] nfnetlink_rcv_msg+0x8ff/0x932 [nfnetlink]
    [ 306.528284] ? nfnetlink_rcv_msg+0x216/0x932 [nfnetlink]
    [ 306.528284] netlink_rcv_skb+0x1c9/0x2f0
    [ 306.528284] ? nfnetlink_bind+0x1d0/0x1d0 [nfnetlink]
    [ 306.528284] ? debug_check_no_locks_freed+0x270/0x270
    [ 306.528284] ? netlink_ack+0x7a0/0x7a0
    [ 306.528284] ? ns_capable_common+0x6e/0x110
    [ ... ]

    Fixes: e46abbcc05aa8 ("netfilter: nf_tables: Allow table names of up to 255 chars")
    Signed-off-by: Taehee Yoo
    Acked-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Florian Westphal
    Signed-off-by: Greg Kroah-Hartman

    Taehee Yoo
     
  • commit 467697d289e7e6e1b15910d99096c0da08c56d5b upstream.

    Fixes: 8aeff920dcc9 ("netfilter: nf_tables: add stateful object reference to set elements")
    Fixes: f25ad2e907f1 ("netfilter: nf_tables: prepare for expressions associated to set elements")
    Fixes: 1a94e38d254b ("netfilter: nf_tables: add NFTA_RULE_ID attribute")
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
  • commit f0dfd7a2b35b02030949100247d851b793cb275f upstream.

    Currently the -EBUSY error return path is not free'ing resources
    allocated earlier, leaving a memory leak. Fix this by exiting via the
    error exit label err5 that performs the necessary resource clean
    up.
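
    The cleanup pattern in a self-contained form. This is a sketch, not
    the nf_tables code: add_element(), the tracked_* helpers and the label
    names are hypothetical, and live_allocations exists only to make a
    leak observable.

```c
#include <errno.h>
#include <stdlib.h>

int live_allocations;   /* buffers allocated and not yet freed */

void *tracked_alloc(size_t n)
{
    void *p = malloc(n);

    if (p != NULL)
        live_allocations++;
    return p;
}

void tracked_free(void *p)
{
    if (p != NULL) {
        free(p);
        live_allocations--;
    }
}

int add_element(int already_present)
{
    void *key, *data;
    int err;

    key = tracked_alloc(16);
    if (key == NULL)
        return -ENOMEM;
    data = tracked_alloc(32);
    if (data == NULL) {
        err = -ENOMEM;
        goto err_free_key;
    }
    if (already_present) {
        err = -EBUSY;
        goto err_free_data;   /* a bare `return -EBUSY` here would leak both */
    }
    /* Success: a real caller would keep key/data; they are released
     * here only to keep the sketch self-contained. */
    tracked_free(data);
    tracked_free(key);
    return 0;

err_free_data:
    tracked_free(data);
err_free_key:
    tracked_free(key);
    return err;
}
```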

    Detected by CoverityScan, CID#1432975 ("Resource leak")

    Fixes: 9744a6fcefcb ("netfilter: nf_tables: check if same extensions are set when adding elements")
    Signed-off-by: Colin Ian King
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Colin Ian King
     
  • commit bbb8c61f97e3a2dd91b30d3e57b7964a67569d11 upstream.

    When a chain is updated, a counter can be attached. If so,
    nft_counters_enabled should be incremented.

    test commands:

    %nft add table ip filter
    %nft add chain ip filter input { type filter hook input priority 4\; }
    %iptables-compat -Z input
    %nft delete chain ip filter input

    We can see the messages below:

    [ 286.443720] jump label: negative count!
    [ 286.448278] WARNING: CPU: 0 PID: 1459 at kernel/jump_label.c:197 __static_key_slow_dec_cpuslocked+0x6f/0xf0
    [ 286.449144] Modules linked in: nf_tables nfnetlink ip_tables x_tables
    [ 286.449144] CPU: 0 PID: 1459 Comm: nft Tainted: G W 4.17.0-rc2+ #12
    [ 286.449144] RIP: 0010:__static_key_slow_dec_cpuslocked+0x6f/0xf0
    [ 286.449144] RSP: 0018:ffff88010e5176f0 EFLAGS: 00010286
    [ 286.449144] RAX: 000000000000001b RBX: ffffffffc0179500 RCX: ffffffffb8a82522
    [ 286.449144] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88011b7e5eac
    [ 286.449144] RBP: 0000000000000000 R08: ffffed00236fce5c R09: ffffed00236fce5b
    [ 286.449144] R10: ffffffffc0179503 R11: ffffed00236fce5c R12: 0000000000000000
    [ 286.449144] R13: ffff88011a28e448 R14: ffff88011a28e470 R15: dffffc0000000000
    [ 286.449144] FS: 00007f0384328700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000
    [ 286.449144] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 286.449144] CR2: 00007f038394bf10 CR3: 0000000104a86000 CR4: 00000000001006f0
    [ 286.449144] Call Trace:
    [ 286.449144] static_key_slow_dec+0x6a/0x70
    [ 286.449144] nf_tables_chain_destroy+0x19d/0x210 [nf_tables]
    [ 286.449144] nf_tables_commit+0x1891/0x1c50 [nf_tables]
    [ 286.449144] nfnetlink_rcv+0x1148/0x13d0 [nfnetlink]
    [ ... ]

    Signed-off-by: Taehee Yoo
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Taehee Yoo