10 Oct, 2018

2 commits

  • [ Upstream commit 7acfda539c0b9636a58bfee56abfb3aeee806d96 ]

    When an element of a verdict map is deleted, the delete routine
    releases the chain it refers to. However, the routine that flushes
    the elements of a verdict map does not release the chain.
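
    The reference counting described above can be sketched in user space
    as follows. This is only an illustration: struct chain, struct map_elem
    and all function names are hypothetical stand-ins, not the nf_tables
    data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a chain and a verdict-map element. */
struct chain {
    int use;                /* number of references held on this chain */
};

struct map_elem {
    struct chain *target;   /* chain a "jump" verdict points at, or NULL */
    int active;             /* 1 while the element is part of the map */
};

/* Adding a jump element takes a reference on the target chain. */
static void elem_add(struct map_elem *e, struct chain *c)
{
    e->target = c;
    e->active = 1;
    if (c)
        c->use++;
}

/* Deleting one element must drop that reference again. */
static void elem_del(struct map_elem *e)
{
    if (!e->active)
        return;
    if (e->target) {
        assert(e->target->use > 0);
        e->target->use--;
    }
    e->active = 0;
}

/* Flushing is "delete every element", so it has to release chains too. */
static void map_flush(struct map_elem *elems, size_t n)
{
    for (size_t i = 0; i < n; i++)
        elem_del(&elems[i]);
}

/* A chain may only be destroyed once nothing refers to it. */
static int chain_can_destroy(const struct chain *c)
{
    return c->use == 0;
}
```

    In this model, a flush that skipped the release step would leave
    chain.use above zero, which is the condition the BUG in the splat
    complains about when the chain is destroyed.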

    test commands:
    %nft add table ip filter
    %nft add chain ip filter c1
    %nft add map ip filter map1 { type ipv4_addr : verdict \; }
    %nft add element ip filter map1 { 1 : jump c1 }
    %nft flush map ip filter map1
    %nft flush ruleset

    splat looks like:
    [ 4895.170899] kernel BUG at net/netfilter/nf_tables_api.c:1415!
    [ 4895.178114] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
    [ 4895.178880] CPU: 0 PID: 1670 Comm: nft Not tainted 4.18.0+ #55
    [ 4895.178880] RIP: 0010:nf_tables_chain_destroy.isra.28+0x39/0x220 [nf_tables]
    [ 4895.178880] Code: fc ff df 53 48 89 fb 48 83 c7 50 48 89 fa 48 c1 ea 03 0f b6 04 02 84 c0 74 09 3c 03 7f 05 e8 3e 4c 25 e1 8b 43 50 85 c0 74 02 0b 48 89 da 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02
    [ 4895.228342] RSP: 0018:ffff88010b98f4c0 EFLAGS: 00010202
    [ 4895.234841] RAX: 0000000000000001 RBX: ffff8801131c6968 RCX: ffff8801146585b0
    [ 4895.234841] RDX: 1ffff10022638d37 RSI: ffff8801191a9348 RDI: ffff8801131c69b8
    [ 4895.234841] RBP: ffff8801146585a8 R08: 1ffff1002323526a R09: 0000000000000000
    [ 4895.234841] R10: 0000000000000000 R11: 0000000000000000 R12: dead000000000200
    [ 4895.234841] R13: dead000000000100 R14: ffffffffa3638af8 R15: dffffc0000000000
    [ 4895.234841] FS: 00007f6d188e6700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000
    [ 4895.234841] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 4895.234841] CR2: 00007ffe72b8df88 CR3: 000000010e2d4000 CR4: 00000000001006f0
    [ 4895.234841] Call Trace:
    [ 4895.234841] nf_tables_commit+0x2704/0x2c70 [nf_tables]
    [ 4895.234841] ? nfnetlink_rcv_batch+0xa4f/0x11b0 [nfnetlink]
    [ 4895.234841] ? nf_tables_setelem_notify.constprop.48+0x1a0/0x1a0 [nf_tables]
    [ 4895.323824] ? __lock_is_held+0x9d/0x130
    [ 4895.323824] ? kasan_unpoison_shadow+0x30/0x40
    [ 4895.333299] ? kasan_kmalloc+0xa9/0xc0
    [ 4895.333299] ? kmem_cache_alloc_trace+0x2c0/0x310
    [ 4895.333299] ? nfnetlink_rcv_batch+0xa4f/0x11b0 [nfnetlink]
    [ 4895.333299] nfnetlink_rcv_batch+0xdb9/0x11b0 [nfnetlink]
    [ 4895.333299] ? debug_show_all_locks+0x290/0x290
    [ 4895.333299] ? nfnetlink_net_init+0x150/0x150 [nfnetlink]
    [ 4895.333299] ? sched_clock_cpu+0xe5/0x170
    [ 4895.333299] ? sched_clock_local+0xff/0x130
    [ 4895.333299] ? sched_clock_cpu+0xe5/0x170
    [ 4895.333299] ? find_held_lock+0x39/0x1b0
    [ 4895.333299] ? sched_clock_local+0xff/0x130
    [ 4895.333299] ? memset+0x1f/0x40
    [ 4895.333299] ? nla_parse+0x33/0x260
    [ 4895.333299] ? ns_capable_common+0x6e/0x110
    [ 4895.333299] nfnetlink_rcv+0x2c0/0x310 [nfnetlink]
    [ ... ]

    Fixes: 591054469b3e ("netfilter: nf_tables: revisit chain/object refcounting from elements")
    Signed-off-by: Taehee Yoo
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Taehee Yoo
     
  • [ Upstream commit c1dc2912059901f97345d9e10c96b841215fdc0f ]

    The cluster match requires conntrack for matching packets. If the
    netns does not have conntrack hooks registered, the match does not
    work at all.

    Implicitly load the conntrack hook for the family, exactly as many
    other extensions do. This ensures that the match works even if the
    hooks have not been registered by other means.

    Signed-off-by: Martin Willi
    Acked-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Martin Willi
     

15 Sep, 2018

2 commits

  • [ Upstream commit 3e673b23b541b8e7f773b2d378d6eb99831741cd ]

    Shaochun Chen points out we leak dumper filter state allocations
    stored in dump_control->data in case there is an error before netlink sets
    cb_running (after which ->done will be called at some point).

    In order to fix this, add .start functions and move allocations there.
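
    A minimal user-space sketch of why allocating in a start callback
    avoids the leak. The struct dump_ctl type, its fields and the
    fail_early flag are assumptions made for illustration and do not
    reproduce the netlink dump API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a dump control block; not the kernel's netlink API. */
struct dump_ctl {
    int (*start)(struct dump_ctl *ctl);
    void (*done)(struct dump_ctl *ctl);
    void *data;          /* filter state owned by the dump */
    int running;         /* set once the dump is committed to run */
};

static int live_allocs;  /* counts outstanding filter allocations */

static int filter_start(struct dump_ctl *ctl)
{
    ctl->data = malloc(16);
    if (!ctl->data)
        return -1;
    live_allocs++;
    return 0;
}

static void filter_done(struct dump_ctl *ctl)
{
    assert(live_allocs > 0);
    free(ctl->data);
    ctl->data = NULL;
    live_allocs--;
}

/*
 * Model of starting a dump: an early failure returns before ->start runs,
 * so nothing has been allocated and there is nothing for ->done to free.
 */
static int dump_start(struct dump_ctl *ctl, int fail_early)
{
    if (fail_early)
        return -1;            /* no allocation has happened yet */
    if (ctl->start && ctl->start(ctl) < 0)
        return -1;
    ctl->running = 1;         /* from here on ->done will be called */
    return 0;
}

static void dump_finish(struct dump_ctl *ctl)
{
    if (ctl->running && ctl->done)
        ctl->done(ctl);
    ctl->running = 0;
}
```

    Had the caller allocated ctl->data before calling dump_start(), the
    early-failure path would return with the allocation still live and
    no callback left to free it.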

    Same pattern as used in commit 90fd131afc565159c9e0ea742f082b337e10f8c6
    ("netfilter: nf_tables: move dumper state allocation into ->start").

    Reported-by: shaochun chen
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
  • [ Upstream commit a53b42c11815d2357e31a9403ae3950517525894 ]

    We came across an infinite loop in ipvs when using ipvs in a docker
    environment.

    When ipvs receives new packets and cannot find an ipvs connection,
    it will create a new connection. Then, if the dest is unavailable
    (i.e. IP_VS_DEST_F_AVAILABLE is not set), the packet will be dropped
    silently.

    But if the dropped packet is the first packet of this connection,
    the connection control timer never has a chance to start and the
    ipvs connection cannot be released. This will lead to a memory leak, or
    to an infinite loop in cleanup_net() when the net namespace is released,
    like this:

    ip_vs_conn_net_cleanup at ffffffffa0a9f31a [ip_vs]
    __ip_vs_cleanup at ffffffffa0a9f60a [ip_vs]
    ops_exit_list at ffffffff81567a49
    cleanup_net at ffffffff81568b40
    process_one_work at ffffffff810a851b
    worker_thread at ffffffff810a9356
    kthread at ffffffff810b0b6f
    ret_from_fork at ffffffff81697a18

    race condition:
    CPU1                                  CPU2
    ip_vs_in()
      ip_vs_conn_new()
                                          ip_vs_del_dest()
                                            __ip_vs_unlink_dest()
                                              ~IP_VS_DEST_F_AVAILABLE
      cp->dest && !IP_VS_DEST_F_AVAILABLE
      __ip_vs_conn_put
    ...
    cleanup_net ---> infinite looping

    Fix this by checking whether the timer already started.
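
    The idea of checking the timer on the drop path might look roughly
    like this in a simplified model. struct conn, its fields and the
    helper names are hypothetical; the sketch only assumes that one
    "put" variant arms the expiry timer and the other does not.

```c
#include <assert.h>

/* Hypothetical connection entry. */
struct conn {
    int refcnt;
    int timer_pending;   /* 1 once the expiry timer is armed */
};

/* Drop a reference without touching the timer. */
static void conn_put_notimer(struct conn *cp)
{
    assert(cp->refcnt > 0);
    cp->refcnt--;
}

/* Drop a reference and arm the timer so the entry can expire later. */
static void conn_put(struct conn *cp)
{
    conn_put_notimer(cp);
    cp->timer_pending = 1;
}

/* Drop path for a packet whose destination is unavailable. */
static void drop_packet(struct conn *cp)
{
    if (cp->timer_pending)
        conn_put_notimer(cp);   /* timer already running, nothing to arm */
    else
        conn_put(cp);           /* first packet: make sure expiry can happen */
}
```

    With this check, a connection whose very first packet is dropped
    still ends up with a pending timer, so cleanup is not left waiting
    for an entry that can never expire.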

    Signed-off-by: Tan Hu
    Reviewed-by: Jiang Biao
    Acked-by: Julian Anastasov
    Acked-by: Simon Horman
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Tan Hu
     

05 Sep, 2018

3 commits

  • [ Upstream commit c6cc94df65c3174be92afbee638f11cbb5e606a7 ]

    It's possible to rename two chains to the same name in one
    transaction:

    nft add chain t c1
    nft add chain t c2
    nft 'rename chain t c1 c3;rename chain t c2 c3'

    This creates two chains named 'c3'.

    This appears to be harmless: both chains can still be deleted,
    either by name or by handle. It is nevertheless a bug, because
    we don't allow creating chains with identical names, so we should
    prevent this from happening by rename too.

    Walk the transaction log and also compare against the pending renames.
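
    A sketch of the duplicate check, assuming a flat array as the
    transaction log. struct rename_trans, name_in_use() and
    queue_rename() are hypothetical names, and the model deliberately
    ignores the case where an existing name is itself being renamed away.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pending-rename record in a transaction log. */
struct rename_trans {
    const char *old_name;
    const char *new_name;
};

/*
 * Return 1 if name is already taken, either by a committed chain or by
 * a rename queued earlier in the same transaction.
 */
static int name_in_use(const char *const *chains, size_t nchains,
                       const struct rename_trans *log, size_t nlog,
                       const char *name)
{
    for (size_t i = 0; i < nchains; i++)
        if (strcmp(chains[i], name) == 0)
            return 1;
    for (size_t i = 0; i < nlog; i++)
        if (strcmp(log[i].new_name, name) == 0)
            return 1;
    return 0;
}

/* Queue a rename; returns 0 on success, -1 if the name would collide. */
static int queue_rename(const char *const *chains, size_t nchains,
                        struct rename_trans *log, size_t *nlog, size_t cap,
                        const char *old_name, const char *new_name)
{
    assert(nlog != NULL);
    if (*nlog >= cap ||
        name_in_use(chains, nchains, log, *nlog, new_name))
        return -1;
    log[*nlog].old_name = old_name;
    log[*nlog].new_name = new_name;
    (*nlog)++;
    return 0;
}
```

    In this model the second rename from the example above
    ("rename chain t c2 c3") is rejected, because "c3" is already
    claimed by a pending rename in the same log.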

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
  • [ Upstream commit 9f8aac0be21ed5f99bd5ba0ff315d710737d1794 ]

    The new name is stored in the transaction metadata; on commit,
    the pointers to the old and new names are swapped.

    Therefore in abort and commit case we have to free the
    pointer in the chain_trans container.

    In commit case, the pointer can be used by another cpu that
    is currently dumping the renamed chain, thus kfree needs to
    happen after waiting for rcu readers to complete.

    Fixes: b7263e071a ("netfilter: nf_tables: Allow chain name of up to 255 chars")
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
  • [ Upstream commit 9970a8e40d4c39e23d62d32540366d1d7d2cce9b ]

    GC of a set uses call_rcu() to destroy elements, so elements may be
    destroyed after their sets and chains have been destroyed. However,
    elements must be destroyed before sets and chains. Add an
    rcu_barrier() to wait for the pending call_rcu() callbacks to finish.

    To test this correctly, the patch below should be applied:
    https://patchwork.ozlabs.org/patch/940883/

    test scripts:
    %cat test.nft
    table ip aa {
            map map1 {
                    type ipv4_addr : verdict; flags timeout;
                    elements = {
                            0 : jump a0,
                            1 : jump a0,
                            2 : jump a0,
                            3 : jump a0,
                            4 : jump a0,
                            5 : jump a0,
                            6 : jump a0,
                            7 : jump a0,
                            8 : jump a0,
                            9 : jump a0,
                    }
                    timeout 1s;
            }
            chain a0 {
            }
    }
    flush ruleset

    [ ... ]

    table ip aa {
            map map1 {
                    type ipv4_addr : verdict; flags timeout;
                    elements = {
                            0 : jump a0,
                            1 : jump a0,
                            2 : jump a0,
                            3 : jump a0,
                            4 : jump a0,
                            5 : jump a0,
                            6 : jump a0,
                            7 : jump a0,
                            8 : jump a0,
                            9 : jump a0,
                    }
                    timeout 1s;
            }
            chain a0 {
            }
    }
    flush ruleset

    Splat looks like:
    [ 200.795603] kernel BUG at net/netfilter/nf_tables_api.c:1363!
    [ 200.806944] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
    [ 200.812253] CPU: 1 PID: 1582 Comm: nft Not tainted 4.17.0+ #24
    [ 200.820297] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
    [ 200.830309] RIP: 0010:nf_tables_chain_destroy.isra.34+0x62/0x240 [nf_tables]
    [ 200.838317] Code: 43 50 85 c0 74 26 48 8b 45 00 48 8b 4d 08 ba 54 05 00 00 48 c7 c6 60 6d 29 c0 48 c7 c7 c0 65 29 c0 4c 8b 40 08 e8 58 e5 fd f8 0b 48 89 da 48 b8 00 00 00 00 00 fc ff
    [ 200.860366] RSP: 0000:ffff880118dbf4d0 EFLAGS: 00010282
    [ 200.866354] RAX: 0000000000000061 RBX: ffff88010cdeaf08 RCX: 0000000000000000
    [ 200.874355] RDX: 0000000000000061 RSI: 0000000000000008 RDI: ffffed00231b7e90
    [ 200.882361] RBP: ffff880118dbf4e8 R08: ffffed002373bcfb R09: ffffed002373bcfa
    [ 200.890354] R10: 0000000000000000 R11: ffffed002373bcfb R12: dead000000000200
    [ 200.898356] R13: dead000000000100 R14: ffffffffbb62af38 R15: dffffc0000000000
    [ 200.906354] FS: 00007fefc31fd700(0000) GS:ffff88011b800000(0000) knlGS:0000000000000000
    [ 200.915533] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 200.922355] CR2: 0000557f1c8e9128 CR3: 0000000106880000 CR4: 00000000001006e0
    [ 200.930353] Call Trace:
    [ 200.932351] ? nf_tables_commit+0x26f6/0x2c60 [nf_tables]
    [ 200.939525] ? nf_tables_setelem_notify.constprop.49+0x1a0/0x1a0 [nf_tables]
    [ 200.947525] ? nf_tables_delchain+0x6e0/0x6e0 [nf_tables]
    [ 200.952383] ? nft_add_set_elem+0x1700/0x1700 [nf_tables]
    [ 200.959532] ? nla_parse+0xab/0x230
    [ 200.963529] ? nfnetlink_rcv_batch+0xd06/0x10d0 [nfnetlink]
    [ 200.968384] ? nfnetlink_net_init+0x130/0x130 [nfnetlink]
    [ 200.975525] ? debug_show_all_locks+0x290/0x290
    [ 200.980363] ? debug_show_all_locks+0x290/0x290
    [ 200.986356] ? sched_clock_cpu+0x132/0x170
    [ 200.990352] ? find_held_lock+0x39/0x1b0
    [ 200.994355] ? sched_clock_local+0x10d/0x130
    [ 200.999531] ? memset+0x1f/0x40

    Fixes: 9d0982927e79 ("netfilter: nft_hash: add support for timeouts")
    Signed-off-by: Taehee Yoo
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Taehee Yoo
     

24 Aug, 2018

5 commits

  • commit 6613b6173dee098997229caf1f3b961c49da75e6 upstream.

    When the first DCCP packet is SYNC or SYNCACK, we insert a new conntrack
    that has an uninitialized timeout value, i.e. such an entry could be
    reaped at any time.

    Mark them as INVALID and only ignore SYNC/SYNCACK when connection had
    an old state.

    Reported-by: syzbot+6f18401420df260e37ed@syzkaller.appspotmail.com
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
  • [ Upstream commit 2045cdfa1b40d66f126f3fd05604fc7c754f0022 ]

    Loading the nf_conntrack module with doubled hashsize parameter, i.e.
    modprobe nf_conntrack hashsize=12345 hashsize=12345
    causes NULL-ptr deref.

    If 'hashsize' is specified twice, the nf_conntrack_set_hashsize() function
    will also be called twice.
    The first nf_conntrack_set_hashsize() call will set the
    'nf_conntrack_htable_size' variable:

    nf_conntrack_set_hashsize()
        ...
        /* On boot, we can set this without any fancy locking. */
        if (!nf_conntrack_htable_size)
            return param_set_uint(val, kp);

    But on the second invocation, nf_conntrack_htable_size is already set,
    so nf_conntrack_set_hashsize() will take a different path and call
    the nf_conntrack_hash_resize() function, which will crash on the attempt
    to dereference the 'nf_conntrack_hash' pointer:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    RIP: 0010:nf_conntrack_hash_resize+0x255/0x490 [nf_conntrack]
    Call Trace:
    nf_conntrack_set_hashsize+0xcd/0x100 [nf_conntrack]
    parse_args+0x1f9/0x5a0
    load_module+0x1281/0x1a50
    __se_sys_finit_module+0xbe/0xf0
    do_syscall_64+0x7c/0x390
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Fix this by checking !nf_conntrack_hash instead of
    !nf_conntrack_htable_size. nf_conntrack_hash will be initialized only
    after the module has loaded, so the second invocation of
    nf_conntrack_set_hashsize() won't crash; it will just set
    nf_conntrack_htable_size again.
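
    The difference between the two checks can be modelled in a few lines
    of user-space C. The variable and function names below are
    hypothetical and only mirror the ones in the message; the point is
    that the setter keys off "does the table exist yet" rather than
    "has a size been recorded".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical module state; names only mirror the commit message. */
static unsigned int htable_size;   /* requested number of buckets */
static int *hash_table;            /* NULL until module init allocates it */
static int resize_calls;           /* how often the resize path was taken */

static int hash_resize(unsigned int size)
{
    assert(hash_table != NULL);    /* resizing needs an existing table */
    resize_calls++;
    htable_size = size;
    return 0;
}

/*
 * Parameter setter: before the table exists, just record the value,
 * no matter how many times the parameter is given.
 */
static int set_hashsize(unsigned int size)
{
    if (!hash_table) {             /* the buggy variant tested !htable_size */
        htable_size = size;
        return 0;
    }
    return hash_resize(size);
}
```

    Calling set_hashsize() twice before the table is allocated simply
    records the value twice; the resize path is reached only once
    hash_table is non-NULL.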

    Signed-off-by: Andrey Ryabinin
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Andrey Ryabinin
     
  • [ Upstream commit 21d5e078192d244df3d6049f9464fff2f72cfd68 ]

    iptables-nft never requests these, but make this explicitly illegal.
    If it were requested, the kernel could oops as ->eval is NULL. Furthermore,
    the builtin targets have no owning module, so it's possible to rmmod the
    eb/ip/ip6_tables module even if they would be loaded.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
  • [ Upstream commit dffd22aed2aa1e804bccf19b30a421e89ee2ae61 ]

    When proc_dostring() is called with a non-zero offset in strict mode, it
    doesn't just write to the ->data buffer, it also reads. Make sure it
    doesn't read uninitialized data.

    Fixes: c6ac37d8d884 ("netfilter: nf_log: fix error on write NONE to [...]")
    Signed-off-by: Jann Horn
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
     
  • [ Upstream commit ad9852af97587b8abe8102f9ddcb05c9769656f6 ]

    The helper module would be unloaded after nf_conntrack_helper_unregister,
    so a race may cause a panic.

    nf_ct_iterate_destroy(unhelp, me) resets the helper of the conntrack to
    NULL, but another CPU may already have fetched the helper pointer by then.
    It would then panic when it accesses the helper after the module was
    unloaded.

    Take the following example:
    CPU0                                      CPU1
    ctnetlink_dump_helpinfo
    helper = rcu_dereference(help->helper);
                                              unhelp
                                              set helper as NULL
                                              unload helper module
    helper->to_nlattr(skb, ct);

    As shown above, CPU0 tries to access the helper after its module has been
    unloaded, and the panic happens.

    Signed-off-by: Gao Feng
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Gao Feng
     

03 Aug, 2018

2 commits

  • [ Upstream commit 9c7f96fd77b0dbe1fe7ed1f9c462c45dc48a1076 ]

    The patch moves the "trans->msg_type == NFT_MSG_NEWSET" check before
    the use of nft_trans_set(trans). Otherwise we can get an out-of-bounds read.
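
    The ordering matters because the payload of a transaction record is
    only meaningful for its own message type. A simplified sketch,
    assuming a tagged union instead of per-type allocations (so it shows
    the ordering, not the out-of-bounds read itself); all type and
    function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

enum msg_type { MSG_NEWTABLE, MSG_NEWSET };

struct set { unsigned int id; };

/* Hypothetical transaction record: the payload is only valid for its type. */
struct trans {
    enum msg_type type;
    union {
        struct { int flags; } table;
        struct { struct set *set; unsigned int set_id; } set;
    } u;
};

/* Find a set created earlier in the same transaction by its id. */
static struct set *lookup_set_by_id(const struct trans *log, size_t n,
                                    unsigned int id)
{
    assert(log != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* Check the type first; only then touch the set payload. */
        if (log[i].type == MSG_NEWSET && log[i].u.set.set_id == id)
            return log[i].u.set.set;
    }
    return NULL;
}
```

    Because of the short-circuit evaluation, the set payload of a
    MSG_NEWTABLE record is never read in this version.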

    For example, KASAN reported one when running the 0001_cache_handling_0 nft
    test. In this case "trans->msg_type" was NFT_MSG_NEWTABLE:

    [75517.177808] BUG: KASAN: slab-out-of-bounds in nft_set_lookup_global+0x22f/0x270 [nf_tables]
    [75517.279094] Read of size 8 at addr ffff881bdb643fc8 by task nft/7356
    ...
    [75517.375605] CPU: 26 PID: 7356 Comm: nft Tainted: G E 4.17.0-rc7.1.x86_64 #1
    [75517.489587] Hardware name: Oracle Corporation SUN SERVER X4-2
    [75517.618129] Call Trace:
    [75517.648821] dump_stack+0xd1/0x13b
    [75517.691040] ? show_regs_print_info+0x5/0x5
    [75517.742519] ? kmsg_dump_rewind_nolock+0xf5/0xf5
    [75517.799300] ? lock_acquire+0x143/0x310
    [75517.846738] print_address_description+0x85/0x3a0
    [75517.904547] kasan_report+0x18d/0x4b0
    [75517.949892] ? nft_set_lookup_global+0x22f/0x270 [nf_tables]
    [75518.019153] ? nft_set_lookup_global+0x22f/0x270 [nf_tables]
    [75518.088420] ? nft_set_lookup_global+0x22f/0x270 [nf_tables]
    [75518.157689] nft_set_lookup_global+0x22f/0x270 [nf_tables]
    [75518.224869] nf_tables_newsetelem+0x1a5/0x5d0 [nf_tables]
    [75518.291024] ? nft_add_set_elem+0x2280/0x2280 [nf_tables]
    [75518.357154] ? nla_parse+0x1a5/0x300
    [75518.401455] ? kasan_kmalloc+0xa6/0xd0
    [75518.447842] nfnetlink_rcv+0xc43/0x1bdf [nfnetlink]
    [75518.507743] ? nfnetlink_rcv+0x7a5/0x1bdf [nfnetlink]
    [75518.569745] ? nfnl_err_reset+0x3c0/0x3c0 [nfnetlink]
    [75518.631711] ? lock_acquire+0x143/0x310
    [75518.679133] ? netlink_deliver_tap+0x9b/0x1070
    [75518.733840] ? kasan_unpoison_shadow+0x31/0x40
    [75518.788542] netlink_unicast+0x45d/0x680
    [75518.837111] ? __isolate_free_page+0x890/0x890
    [75518.891913] ? netlink_attachskb+0x6b0/0x6b0
    [75518.944542] netlink_sendmsg+0x6fa/0xd30
    [75518.993107] ? netlink_unicast+0x680/0x680
    [75519.043758] ? netlink_unicast+0x680/0x680
    [75519.094402] sock_sendmsg+0xd9/0x160
    [75519.138810] ___sys_sendmsg+0x64d/0x980
    [75519.186234] ? copy_msghdr_from_user+0x350/0x350
    [75519.243118] ? lock_downgrade+0x650/0x650
    [75519.292738] ? do_raw_spin_unlock+0x5d/0x250
    [75519.345456] ? _raw_spin_unlock+0x24/0x30
    [75519.395065] ? __handle_mm_fault+0xbde/0x3410
    [75519.448830] ? sock_setsockopt+0x3d2/0x1940
    [75519.500516] ? __lock_acquire.isra.25+0xdc/0x19d0
    [75519.558448] ? lock_downgrade+0x650/0x650
    [75519.608057] ? __audit_syscall_entry+0x317/0x720
    [75519.664960] ? __fget_light+0x58/0x250
    [75519.711325] ? __sys_sendmsg+0xde/0x170
    [75519.758850] __sys_sendmsg+0xde/0x170
    [75519.804193] ? __ia32_sys_shutdown+0x90/0x90
    [75519.856725] ? syscall_trace_enter+0x897/0x10e0
    [75519.912354] ? trace_event_raw_event_sys_enter+0x920/0x920
    [75519.979432] ? __audit_syscall_entry+0x720/0x720
    [75520.036118] do_syscall_64+0xa3/0x3d0
    [75520.081248] ? prepare_exit_to_usermode+0x47/0x1d0
    [75520.139904] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [75520.201680] RIP: 0033:0x7fc153320ba0
    [75520.245772] RSP: 002b:00007ffe294c3638 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    [75520.337708] RAX: ffffffffffffffda RBX: 00007ffe294c4820 RCX: 00007fc153320ba0
    [75520.424547] RDX: 0000000000000000 RSI: 00007ffe294c46b0 RDI: 0000000000000003
    [75520.511386] RBP: 00007ffe294c47b0 R08: 0000000000000004 R09: 0000000002114090
    [75520.598225] R10: 00007ffe294c30a0 R11: 0000000000000246 R12: 00007ffe294c3660
    [75520.684961] R13: 0000000000000001 R14: 00007ffe294c3650 R15: 0000000000000001

    [75520.790946] Allocated by task 7356:
    [75520.833994] kasan_kmalloc+0xa6/0xd0
    [75520.878088] __kmalloc+0x189/0x450
    [75520.920107] nft_trans_alloc_gfp+0x20/0x190 [nf_tables]
    [75520.983961] nf_tables_newtable+0xcd0/0x1bd0 [nf_tables]
    [75521.048857] nfnetlink_rcv+0xc43/0x1bdf [nfnetlink]
    [75521.108655] netlink_unicast+0x45d/0x680
    [75521.157013] netlink_sendmsg+0x6fa/0xd30
    [75521.205271] sock_sendmsg+0xd9/0x160
    [75521.249365] ___sys_sendmsg+0x64d/0x980
    [75521.296686] __sys_sendmsg+0xde/0x170
    [75521.341822] do_syscall_64+0xa3/0x3d0
    [75521.386957] entry_SYSCALL_64_after_hwframe+0x44/0xa9

    [75521.467867] Freed by task 23454:
    [75521.507804] __kasan_slab_free+0x132/0x180
    [75521.558137] kfree+0x14d/0x4d0
    [75521.596005] free_rt_sched_group+0x153/0x280
    [75521.648410] sched_autogroup_create_attach+0x19a/0x520
    [75521.711330] ksys_setsid+0x2ba/0x400
    [75521.755529] __ia32_sys_setsid+0xa/0x10
    [75521.802850] do_syscall_64+0xa3/0x3d0
    [75521.848090] entry_SYSCALL_64_after_hwframe+0x44/0xa9

    [75521.929000] The buggy address belongs to the object at ffff881bdb643f80
    which belongs to the cache kmalloc-96 of size 96
    [75522.079797] The buggy address is located 72 bytes inside of
    96-byte region [ffff881bdb643f80, ffff881bdb643fe0)
    [75522.221234] The buggy address belongs to the page:
    [75522.280100] page:ffffea006f6d90c0 count:1 mapcount:0 mapping:0000000000000000 index:0x0
    [75522.377443] flags: 0x2fffff80000100(slab)
    [75522.426956] raw: 002fffff80000100 0000000000000000 0000000000000000 0000000180200020
    [75522.521275] raw: ffffea006e6fafc0 0000000c0000000c ffff881bf180f400 0000000000000000
    [75522.615601] page dumped because: kasan: bad access detected

    Fixes: 37a9cc525525 ("netfilter: nf_tables: add generation mask to sets")
    Signed-off-by: Alexey Kodanev
    Acked-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Alexey Kodanev
     
  • [ Upstream commit cbdebe481a14b42c45aa9f4ceb5ff19b55de2c57 ]

    The userspace `ipset` command forbids the family option for the hash:mac type:

    ipset create test hash:mac family inet4
    ipset v6.30: Unknown argument: `family'

    However, this check is not done in the kernel itself. When someone uses
    external netlink applications (the pyroute2 python library, for example),
    one can create a hash:mac set with an invalid family and get inconsistent
    results from userspace (the `ipset` command cannot read the set content
    anymore).

    This patch enforces the logic in the kernel and forbids the insertion of
    hash:mac with a family set.

    Since IP_SET_PROTO_UNDEF is defined only for hash:mac, this patch has no
    impact on other hash:* sets.
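
    The kind of check being added can be sketched as below. The enum,
    the function name and the use of -EINVAL as the error value are all
    placeholders chosen for this example; the real patch works on the
    kernel's own set type descriptors and may return a different error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical address-family values for this sketch. */
enum family { FAMILY_UNSPEC = 0, FAMILY_INET, FAMILY_INET6 };

/*
 * Validate a "create set" request: a hash:mac set stores link-layer
 * addresses, so it must not carry an address family.
 */
static int check_create(const char *type_name, enum family family)
{
    assert(type_name != NULL);
    if (strcmp(type_name, "hash:mac") == 0 && family != FAMILY_UNSPEC)
        return -EINVAL;   /* placeholder error value */
    return 0;
}
```

    Other set types pass through unchanged in this model, matching the
    statement that only hash:mac is affected.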

    Signed-off-by: Florent Fourcot
    Signed-off-by: Victorien Molle
    Signed-off-by: Jozsef Kadlecsik
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Florent Fourcot
     

17 Jul, 2018

1 commit

  • commit ba062ebb2cd561d404e0fba8ee4b3f5ebce7cbfc upstream.

    Three attributes are currently not verified and can thus trigger KMSAN
    warnings such as:

    BUG: KMSAN: uninit-value in __arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline]
    BUG: KMSAN: uninit-value in __fswab32 include/uapi/linux/swab.h:59 [inline]
    BUG: KMSAN: uninit-value in nfqnl_recv_config+0x939/0x17d0 net/netfilter/nfnetlink_queue.c:1268
    CPU: 1 PID: 4521 Comm: syz-executor120 Not tainted 4.17.0+ #5
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x185/0x1d0 lib/dump_stack.c:113
    kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1117
    __msan_warning_32+0x70/0xc0 mm/kmsan/kmsan_instr.c:620
    __arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline]
    __fswab32 include/uapi/linux/swab.h:59 [inline]
    nfqnl_recv_config+0x939/0x17d0 net/netfilter/nfnetlink_queue.c:1268
    nfnetlink_rcv_msg+0xb2e/0xc80 net/netfilter/nfnetlink.c:212
    netlink_rcv_skb+0x37e/0x600 net/netlink/af_netlink.c:2448
    nfnetlink_rcv+0x2fe/0x680 net/netfilter/nfnetlink.c:513
    netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
    netlink_unicast+0x1680/0x1750 net/netlink/af_netlink.c:1336
    netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901
    sock_sendmsg_nosec net/socket.c:629 [inline]
    sock_sendmsg net/socket.c:639 [inline]
    ___sys_sendmsg+0xec8/0x1320 net/socket.c:2117
    __sys_sendmsg net/socket.c:2155 [inline]
    __do_sys_sendmsg net/socket.c:2164 [inline]
    __se_sys_sendmsg net/socket.c:2162 [inline]
    __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
    do_syscall_64+0x15b/0x230 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x43fd59
    RSP: 002b:00007ffde0e30d28 EFLAGS: 00000213 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fd59
    RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000003
    RBP: 00000000006ca018 R08: 00000000004002c8 R09: 00000000004002c8
    R10: 00000000004002c8 R11: 0000000000000213 R12: 0000000000401680
    R13: 0000000000401710 R14: 0000000000000000 R15: 0000000000000000

    Uninit was created at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline]
    kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:189
    kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:315
    kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan.c:322
    slab_post_alloc_hook mm/slab.h:446 [inline]
    slab_alloc_node mm/slub.c:2753 [inline]
    __kmalloc_node_track_caller+0xb35/0x11b0 mm/slub.c:4395
    __kmalloc_reserve net/core/skbuff.c:138 [inline]
    __alloc_skb+0x2cb/0x9e0 net/core/skbuff.c:206
    alloc_skb include/linux/skbuff.h:988 [inline]
    netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
    netlink_sendmsg+0x76e/0x1350 net/netlink/af_netlink.c:1876
    sock_sendmsg_nosec net/socket.c:629 [inline]
    sock_sendmsg net/socket.c:639 [inline]
    ___sys_sendmsg+0xec8/0x1320 net/socket.c:2117
    __sys_sendmsg net/socket.c:2155 [inline]
    __do_sys_sendmsg net/socket.c:2164 [inline]
    __se_sys_sendmsg net/socket.c:2162 [inline]
    __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
    do_syscall_64+0x15b/0x230 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Fixes: fdb694a01f1f ("netfilter: Add fail-open support")
    Fixes: 829e17a1a602 ("[NETFILTER]: nfnetlink_queue: allow changing queue length through netlink")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

11 Jul, 2018

1 commit

  • commit ce00bf07cc95a57cd20b208e02b3c2604e532ae8 upstream.

    The old code would indefinitely block other users of nf_log_mutex if
    a userspace access in proc_dostring() blocked e.g. due to a userfaultfd
    region. Fix it by moving proc_dostring() out of the locked region.

    This is a followup to commit 266d07cb1c9a ("netfilter: nf_log: fix
    sleeping function called from invalid context"), which changed this code
    from using rcu_read_lock() to taking nf_log_mutex.

    Fixes: 266d07cb1c9a ("netfilter: nf_log: fix sleeping function calle[...]")
    Signed-off-by: Jann Horn
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
     

08 Jul, 2018

14 commits

  • [ Upstream commit 52f96757905bbf0edef47f3ee6c7c784e7f8ff8a ]

    syzkaller reports a buffer overflow for the interface name
    when starting sync daemons [1].

    What we do is copy the user structure into a larger stack
    buffer, but later we search for the NUL past the stack buffer.
    The same happens for sched_name when adding/editing a virtual server.

    We are restricted by IP_VS_SCHEDNAME_MAXLEN and IP_VS_IFNAME_MAXLEN
    being used as sizes in include/uapi/linux/ip_vs.h, so they
    include the space for NUL.

    As using strlcpy is wrong for an unsafe source, replace it with
    strscpy and add checks to return EINVAL if the source string is not
    NUL-terminated. The incomplete strlcpy fix comes from 2.6.13.

    For the netlink interface, reduce the len parameter for
    IPVS_DAEMON_ATTR_MCAST_IFN and IPVS_SVC_ATTR_SCHED_NAME,
    so that we get a proper EINVAL.
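
    A user-space sketch of the "reject unterminated source" rule. It does
    not use the kernel's strscpy(); instead it uses memchr() to make the
    bounded search explicit. IFNAME_MAXLEN and copy_name() are
    hypothetical names for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define IFNAME_MAXLEN 16   /* hypothetical; includes space for the NUL */

/*
 * Copy a fixed-size, possibly unterminated field into dst.
 * Returns 0 on success, -EINVAL if src has no NUL within src_size
 * bytes or does not fit into dst.
 */
static int copy_name(char *dst, size_t dst_size,
                     const char *src, size_t src_size)
{
    assert(dst_size > 0);

    /* Bounded search: never reads past the end of the source field. */
    const char *nul = memchr(src, '\0', src_size);
    if (!nul)
        return -EINVAL;

    size_t len = (size_t)(nul - src);
    if (len >= dst_size)
        return -EINVAL;

    memcpy(dst, src, len + 1);   /* include the terminating NUL */
    return 0;
}
```

    The key property is that the length of the source is never computed
    with an unbounded strlen(), which is where the fortify check in the
    trace below fired.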

    [1]
    kernel BUG at lib/string.c:1052!
    invalid opcode: 0000 [#1] SMP KASAN
    Dumping ftrace buffer:
    (ftrace buffer empty)
    Modules linked in:
    CPU: 1 PID: 373 Comm: syz-executor936 Not tainted 4.17.0-rc4+ #45
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:fortify_panic+0x13/0x20 lib/string.c:1051
    RSP: 0018:ffff8801c976f800 EFLAGS: 00010282
    RAX: 0000000000000022 RBX: 0000000000000040 RCX: 0000000000000000
    RDX: 0000000000000022 RSI: ffffffff8160f6f1 RDI: ffffed00392edef6
    RBP: ffff8801c976f800 R08: ffff8801cf4c62c0 R09: ffffed003b5e4fb0
    R10: ffffed003b5e4fb0 R11: ffff8801daf27d87 R12: ffff8801c976fa20
    R13: ffff8801c976fae4 R14: ffff8801c976fae0 R15: 000000000000048b
    FS: 00007fd99f75e700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000200001c0 CR3: 00000001d6843000 CR4: 00000000001406e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    strlen include/linux/string.h:270 [inline]
    strlcpy include/linux/string.h:293 [inline]
    do_ip_vs_set_ctl+0x31c/0x1d00 net/netfilter/ipvs/ip_vs_ctl.c:2388
    nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
    nf_setsockopt+0x7d/0xd0 net/netfilter/nf_sockopt.c:115
    ip_setsockopt+0xd8/0xf0 net/ipv4/ip_sockglue.c:1253
    udp_setsockopt+0x62/0xa0 net/ipv4/udp.c:2487
    ipv6_setsockopt+0x149/0x170 net/ipv6/ipv6_sockglue.c:917
    tcp_setsockopt+0x93/0xe0 net/ipv4/tcp.c:3057
    sock_common_setsockopt+0x9a/0xe0 net/core/sock.c:3046
    __sys_setsockopt+0x1bd/0x390 net/socket.c:1903
    __do_sys_setsockopt net/socket.c:1914 [inline]
    __se_sys_setsockopt net/socket.c:1911 [inline]
    __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1911
    do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x447369
    RSP: 002b:00007fd99f75dda8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
    RAX: ffffffffffffffda RBX: 00000000006e39e4 RCX: 0000000000447369
    RDX: 000000000000048b RSI: 0000000000000000 RDI: 0000000000000003
    RBP: 0000000000000000 R08: 0000000000000018 R09: 0000000000000000
    R10: 00000000200001c0 R11: 0000000000000246 R12: 00000000006e39e0
    R13: 75a1ff93f0896195 R14: 6f745f3168746576 R15: 0000000000000001
    Code: 08 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b 48 89 df e8 d2 8f 48 fa eb de 55 48 89 fe 48 c7 c7 60 65 64 88 48 89 e5 e8 91 dd f3 f9 0b 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 41 57 41 56
    RIP: fortify_panic+0x13/0x20 lib/string.c:1051 RSP: ffff8801c976f800

    Reported-and-tested-by: syzbot+aac887f77319868646df@syzkaller.appspotmail.com
    Fixes: e4ff67513096 ("ipvs: add sync_maxlen parameter for the sync daemon")
    Fixes: 4da62fc70d7c ("[IPVS]: Fix for overflows")
    Signed-off-by: Julian Anastasov
    Acked-by: Simon Horman
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Julian Anastasov
     
  • [ Upstream commit 3e0f64b7dd3149f75e8652ff1df56cffeedc8fc1 ]

    Credit calculations for packet ratelimiting are not correct: with
    an applied ratelimit of 25/second and burst 8, a total of 33 packets
    should be accepted. This is true in iptables (33) but not in
    nftables (~65). For packet ratelimiting, use:

    div_u64(limit->nsecs, limit->rate) * limit->burst;

    to calculate credit, just as iptables' xt_limit does.

    Moreover, use the default burst from iptables, since users expect
    similar behaviour.
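
    As a worked example of the formula, using plain 64-bit division in
    place of div_u64(): with a one-second interval, rate 25 and burst 8,
    one packet corresponds to 1000000000 / 25 = 40000000 ns and the
    initial credit is 8 such units. struct limit below is a hypothetical
    mirror of the fields named in the formula, and reading nsecs / rate
    as the per-packet cost is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the fields named in the commit message. */
struct limit {
    uint64_t nsecs;   /* length of the rate interval in nanoseconds */
    uint64_t rate;    /* packets allowed per interval */
    uint32_t burst;   /* extra packets allowed in a burst */
};

/* Initial credit for packet ratelimiting: (nsecs / rate) * burst. */
static uint64_t packet_credit(const struct limit *l)
{
    assert(l->rate != 0);
    return (l->nsecs / l->rate) * l->burst;
}
```

    For 25/second with burst 8 this yields 320000000 ns of credit, i.e.
    room for the 8 burst packets on top of the 25 per second, which
    matches the total of 33 quoted above.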

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Pablo Neira Ayuso
     
  • commit adc972c5b88829d38ede08b1069718661c7330ae upstream.

    When the depth of a chain is bigger than NFT_JUMP_STACK_SIZE, nft_do_chain
    crashes. But there is no need to crash hard here.
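
    "Not crashing hard" can be pictured as a bounded push that reports
    failure instead of asserting. JUMP_STACK_SIZE, struct jump_stack and
    jump_push() are hypothetical; what the caller does on failure (for
    example, giving up on the packet) is left open in this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define JUMP_STACK_SIZE 16   /* hypothetical limit for this sketch */

struct jump_stack {
    int entries[JUMP_STACK_SIZE];
    unsigned int depth;
};

/* Returns 0 on success, -1 when the stack is already full. */
static int jump_push(struct jump_stack *s, int chain_id)
{
    assert(s != NULL);
    if (s->depth >= JUMP_STACK_SIZE)
        return -1;               /* report the overflow, do not crash */
    s->entries[s->depth++] = chain_id;
    return 0;
}
```

    The stack is left untouched on overflow, so the caller can unwind
    or bail out in a controlled way.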

    Suggested-by: Florian Westphal
    Signed-off-by: Taehee Yoo
    Acked-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Taehee Yoo
     
  • commit 360cc79d9d299ce297b205508276285ceffc5fa8 upstream.

    The table field in nft_obj_filter is not an array. In order to check
    the table name, we should check whether the pointer is set.
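
    The distinction between testing the pointer and testing its first
    character can be shown with a small stand-alone model. struct
    obj_filter and filter_matches() are hypothetical and only mimic the
    shape of the filter described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical filter: table is a pointer, not an embedded array. */
struct obj_filter {
    char *table;          /* NULL means "no table filter" */
    unsigned int type;
};

/* Return 1 if an object in table_name passes the filter, 0 otherwise. */
static int filter_matches(const struct obj_filter *f, const char *table_name)
{
    assert(table_name != NULL);
    /* Test the pointer itself; f->table[0] would dereference NULL. */
    if (f && f->table && strcmp(f->table, table_name) != 0)
        return 0;
    return 1;
}
```

    With an embedded array, checking table[0] is a valid "is a name set"
    test; with a pointer that may be NULL, the same expression is the
    NULL dereference seen in the splat below.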

    Test commands:

    %nft add table ip filter
    %nft add counter ip filter ct1
    %nft reset counters

    Splat looks like:

    [ 306.510504] kasan: CONFIG_KASAN_INLINE enabled
    [ 306.516184] kasan: GPF could be caused by NULL-ptr deref or user memory access
    [ 306.524775] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
    [ 306.528284] Modules linked in: nft_objref nft_counter nf_tables nfnetlink ip_tables x_tables
    [ 306.528284] CPU: 0 PID: 1488 Comm: nft Not tainted 4.17.0-rc4+ #17
    [ 306.528284] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
    [ 306.528284] RIP: 0010:nf_tables_dump_obj+0x52c/0xa70 [nf_tables]
    [ 306.528284] RSP: 0018:ffff8800b6cb7520 EFLAGS: 00010246
    [ 306.528284] RAX: 0000000000000000 RBX: ffff8800b6c49820 RCX: 0000000000000000
    [ 306.528284] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffffed0016d96e9a
    [ 306.528284] RBP: ffff8800b6cb75c0 R08: ffffed00236fce7c R09: ffffed00236fce7b
    [ 306.528284] R10: ffffffff9f6241e8 R11: ffffed00236fce7c R12: ffff880111365108
    [ 306.528284] R13: 0000000000000000 R14: ffff8800b6c49860 R15: ffff8800b6c49860
    [ 306.528284] FS: 00007f838b007700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000
    [ 306.528284] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 306.528284] CR2: 00007ffeafabcf78 CR3: 00000000b6cbe000 CR4: 00000000001006f0
    [ 306.528284] Call Trace:
    [ 306.528284] netlink_dump+0x470/0xa20
    [ 306.528284] __netlink_dump_start+0x5ae/0x690
    [ 306.528284] ? nf_tables_getobj+0x1b3/0x740 [nf_tables]
    [ 306.528284] nf_tables_getobj+0x2f5/0x740 [nf_tables]
    [ 306.528284] ? nft_obj_notify+0x100/0x100 [nf_tables]
    [ 306.528284] ? nf_tables_getobj+0x740/0x740 [nf_tables]
    [ 306.528284] ? nf_tables_dump_flowtable_done+0x70/0x70 [nf_tables]
    [ 306.528284] ? nft_obj_notify+0x100/0x100 [nf_tables]
    [ 306.528284] nfnetlink_rcv_msg+0x8ff/0x932 [nfnetlink]
    [ 306.528284] ? nfnetlink_rcv_msg+0x216/0x932 [nfnetlink]
    [ 306.528284] netlink_rcv_skb+0x1c9/0x2f0
    [ 306.528284] ? nfnetlink_bind+0x1d0/0x1d0 [nfnetlink]
    [ 306.528284] ? debug_check_no_locks_freed+0x270/0x270
    [ 306.528284] ? netlink_ack+0x7a0/0x7a0
    [ 306.528284] ? ns_capable_common+0x6e/0x110
    [ ... ]

    Fixes: e46abbcc05aa8 ("netfilter: nf_tables: Allow table names of up to 255 chars")
    Signed-off-by: Taehee Yoo
    Acked-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Florian Westphal
    Signed-off-by: Greg Kroah-Hartman

    Taehee Yoo
     
  • commit 467697d289e7e6e1b15910d99096c0da08c56d5b upstream.

    Fixes: 8aeff920dcc9 ("netfilter: nf_tables: add stateful object reference to set elements")
    Fixes: f25ad2e907f1 ("netfilter: nf_tables: prepare for expressions associated to set elements")
    Fixes: 1a94e38d254b ("netfilter: nf_tables: add NFTA_RULE_ID attribute")
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
  • commit f0dfd7a2b35b02030949100247d851b793cb275f upstream.

    Currently the -EBUSY error return path is not free'ing resources
    allocated earlier, leaving a memory leak. Fix this by exiting via the
    error exit label err5 that performs the necessary resource clean
    up.
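
    The general pattern, leaving through a cleanup label instead of
    returning directly, can be sketched in userspace as below. All names
    here (add_element, demo_alloc, the err_free label) are hypothetical;
    the label in the patched function is err5.

    ```c
    #include <assert.h>
    #include <errno.h>
    #include <stdbool.h>
    #include <stdlib.h>

    static int live_allocs;  /* outstanding allocations, for the demo */

    static void *demo_alloc(size_t n)
    {
        void *p = malloc(n);

        if (p != NULL)
            live_allocs++;
        return p;
    }

    static void demo_free(void *p)
    {
        if (p != NULL) {
            free(p);
            live_allocs--;
        }
    }

    /* Returns 0 on success or a negative errno; never leaks on failure. */
    static int add_element(bool already_present, void **out)
    {
        void *elem;
        int err;

        assert(out != NULL);
        elem = demo_alloc(64);
        if (elem == NULL)
            return -ENOMEM;

        if (already_present) {
            err = -EBUSY;
            goto err_free;  /* a bare "return -EBUSY" would leak elem */
        }

        *out = elem;
        return 0;

    err_free:
        demo_free(elem);
        return err;
    }
    ```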

    Detected by CoverityScan, CID#1432975 ("Resource leak")

    Fixes: 9744a6fcefcb ("netfilter: nf_tables: check if same extensions are set when adding elements")
    Signed-off-by: Colin Ian King
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Colin Ian King
     
  • commit bbb8c61f97e3a2dd91b30d3e57b7964a67569d11 upstream.

    When a chain is updated, a counter can be attached. If so,
    nft_counters_enabled should be increased.

    test commands:

    %nft add table ip filter
    %nft add chain ip filter input { type filter hook input priority 4\; }
    %iptables-compat -Z input
    %nft delete chain ip filter input

    We can see the messages below.

    [ 286.443720] jump label: negative count!
    [ 286.448278] WARNING: CPU: 0 PID: 1459 at kernel/jump_label.c:197 __static_key_slow_dec_cpuslocked+0x6f/0xf0
    [ 286.449144] Modules linked in: nf_tables nfnetlink ip_tables x_tables
    [ 286.449144] CPU: 0 PID: 1459 Comm: nft Tainted: G W 4.17.0-rc2+ #12
    [ 286.449144] RIP: 0010:__static_key_slow_dec_cpuslocked+0x6f/0xf0
    [ 286.449144] RSP: 0018:ffff88010e5176f0 EFLAGS: 00010286
    [ 286.449144] RAX: 000000000000001b RBX: ffffffffc0179500 RCX: ffffffffb8a82522
    [ 286.449144] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88011b7e5eac
    [ 286.449144] RBP: 0000000000000000 R08: ffffed00236fce5c R09: ffffed00236fce5b
    [ 286.449144] R10: ffffffffc0179503 R11: ffffed00236fce5c R12: 0000000000000000
    [ 286.449144] R13: ffff88011a28e448 R14: ffff88011a28e470 R15: dffffc0000000000
    [ 286.449144] FS: 00007f0384328700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000
    [ 286.449144] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 286.449144] CR2: 00007f038394bf10 CR3: 0000000104a86000 CR4: 00000000001006f0
    [ 286.449144] Call Trace:
    [ 286.449144] static_key_slow_dec+0x6a/0x70
    [ 286.449144] nf_tables_chain_destroy+0x19d/0x210 [nf_tables]
    [ 286.449144] nf_tables_commit+0x1891/0x1c50 [nf_tables]
    [ 286.449144] nfnetlink_rcv+0x1148/0x13d0 [nfnetlink]
    [ ... ]

    Signed-off-by: Taehee Yoo
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Taehee Yoo
     
  • commit ad9d9e85072b668731f356be0a3750a3ba22a607 upstream.

    This patch fixes the following splat.

    [118709.054937] BUG: using smp_processor_id() in preemptible [00000000] code: test/1571
    [118709.054970] caller is nft_update_chain_stats.isra.4+0x53/0x97 [nf_tables]
    [118709.054980] CPU: 2 PID: 1571 Comm: test Not tainted 4.17.0-rc6+ #335
    [...]
    [118709.054992] Call Trace:
    [118709.055011] dump_stack+0x5f/0x86
    [118709.055026] check_preemption_disabled+0xd4/0xe4

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Pablo Neira Ayuso
     
  • commit 97a0549b15a0b466c47f6a0143a490a082c64b4e upstream.

    In nft_meta_set_eval, the nftrace value is read from sreg as u32, but
    the correct type is u8, so an incorrect value is sometimes read.
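
    The effect of reading with the wrong width can be reproduced in
    userspace. In this sketch, struct reg_sketch and its helpers are
    hypothetical; a 32-bit slot holds stale data, one byte is stored,
    and the slot is then read back with both widths.

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* A 32-bit register slot shared by a producer and a consumer. */
    struct reg_sketch {
        uint32_t data;
    };

    /* Producer stores one byte; the other three bytes keep old data. */
    static void store_u8(struct reg_sketch *reg, uint8_t value)
    {
        assert(reg != NULL);
        memcpy(&reg->data, &value, sizeof(value));
    }

    /* Correct consumer: reads back exactly the byte that was written. */
    static uint8_t load_u8(const struct reg_sketch *reg)
    {
        uint8_t value;

        memcpy(&value, &reg->data, sizeof(value));
        return value;
    }

    /* Buggy consumer: reads all four bytes, including the stale ones. */
    static uint32_t load_u32(const struct reg_sketch *reg)
    {
        return reg->data;
    }
    ```

    After storing 0 into a slot that previously held 0xdeadbeef,
    load_u8() yields 0 while load_u32() yields a non-zero value, which
    matches the "nftrace set 0" rule still producing trace messages.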

    Steps to reproduce:

    %nft add table ip filter
    %nft add chain ip filter input { type filter hook input priority 4\; }
    %nft add rule ip filter input nftrace set 0
    %nft monitor

    Sometimes, we can see trace messages.

    trace id 16767227 ip filter input packet: iif "enp2s0"
    ether saddr xx:xx:xx:xx:xx:xx ether daddr xx:xx:xx:xx:xx:xx
    ip saddr 192.168.0.1 ip daddr 255.255.255.255 ip dscp cs0
    ip ecn not-ect ip
    trace id 16767227 ip filter input rule nftrace set 0 (verdict continue)
    trace id 16767227 ip filter input verdict continue
    trace id 16767227 ip filter input

    Signed-off-by: Taehee Yoo
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Taehee Yoo
     
  • commit bb7b40aecbf778c0c83a5bd62b0f03ca9f49a618 upstream.

    When removing a rule that jumps to a chain and that chain in the same
    batch, this bogusly hits EBUSY. Add activate and deactivate operations
    to expressions that can be called from the preparation and the
    commit/abort phases.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Pablo Neira Ayuso
     
  • commit 009240940e84c1c089af88b454f7e804a4c5bd1b upstream.

    nft_chain_stats_replace() and all other spots assume ->stats can be
    NULL, but nft_update_chain_stats does not. It must do this check,
    just because the jump label is set doesn't mean all basechains have stats
    assigned.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
  • commit 732a8049f365f514d0607e03938491bf6cb0d620 upstream.

    Currently matchinfo gets stored in the expression, but some xt matches
    are very large.

    To handle those we either need to switch nft core to kvmalloc and increase
    size limit, or allocate the info blob of large matches separately.

    This does the latter, which limits the scope of the changes to
    nft_compat.

    I picked a threshold of 192, this allows most matches to work as before and
    handle only a few via separate allocation (cgroup, u32, sctp, rt).
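
    The inline-or-separate decision can be sketched in userspace as
    follows. This is a simplification under stated assumptions: struct
    match_sketch uses a fixed inline buffer, and every name except the
    threshold value of 192 is hypothetical.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdlib.h>
    #include <string.h>

    #define INLINE_THRESHOLD 192  /* threshold named in the message */

    struct match_sketch {
        void *info;      /* points at inline_buf or at heap memory */
        bool separate;   /* true when info is a separate allocation */
        unsigned char inline_buf[INLINE_THRESHOLD];
    };

    /* Copy the blob inline when it fits, otherwise allocate separately. */
    static int match_init(struct match_sketch *m, const void *blob, size_t size)
    {
        assert(m != NULL && blob != NULL);
        m->separate = size > INLINE_THRESHOLD;
        m->info = m->separate ? malloc(size) : (void *)m->inline_buf;
        if (m->info == NULL)
            return -1;
        memcpy(m->info, blob, size);
        return 0;
    }

    /* Release the blob only when it was allocated separately. */
    static void match_destroy(struct match_sketch *m)
    {
        assert(m != NULL);
        if (m->separate)
            free(m->info);
        m->info = NULL;
    }
    ```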

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
  • commit 8bdf164744b2c7f63561846c01cff3db597f282d upstream.

    Next patch will make it possible for *info to be stored in
    a separate allocation instead of the expr private area.

    This removes the 'expr priv area is info blob' assumption
    from the match init/destroy/eval functions.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
  • commit b8e9dc1c75714ceb53615743e1036f76e00f5a17 upstream.

    Taehee Yoo reported following bug:
    iptables-compat -I OUTPUT -m cpu --cpu 0
    iptables-compat -F
    lsmod |grep xt_cpu
    xt_cpu 16384 1

    Quote:
    "When above command is given, a netlink message has two expressions that
    are the cpu compat and the nft_counter.
    The nft_expr_type_get() in the nf_tables_expr_parse() successes
    first expression then, calls select_ops callback.
    (allocates memory and holds module)
    But, second nft_expr_type_get() in the nf_tables_expr_parse()
    returns -EAGAIN because of request_module().
    In that point, by the 'goto err1',
    the 'module_put(info[i].ops->type->owner)' is called.
    There is no release routine."

    The core problem is that, unlike all other expressions,
    nft_compat select_ops has side effects.

    1. it allocates dynamic memory which holds an nft ops struct.
    In all other expressions, ops has static storage duration.
    2. It grabs references to the xt module that it is supposed to
    invoke.

    Depending on where things go wrong, error unwinding doesn't
    always do the right thing.

    In the above scenario, a new nft_compat_expr is created and
    xt_cpu module gets loaded with a refcount of 1.

    Due to -EAGAIN, the netlink messages get re-parsed.
    When that happens, nft_compat finds that xt_cpu is already present
    and increments module refcount again.

    This fixes the problem by making select_ops have no visible
    side effects and by removing all extra module_get/put.

    When select_ops creates a new nft_compat expression, the new
    expression has a refcount of 0, and the xt module gets its refcount
    incremented.

    When an error happens, the next call finds the existing entry, but will
    no longer increase the reference count -- the presence of an existing
    nft_xt means we already hold a module reference.

    Because nft_xt_put is only called from nft_compat destroy hook,
    it will never see the initial zero reference count.
    ->destroy can only be called after ->init(), and that will increase the
    refcount.
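
    The reference counting described above can be modelled in userspace
    roughly as follows. This is a sketch under assumptions: the one-slot
    cache, the module_refs counter and all function names are
    hypothetical and only mirror the behaviour the message describes.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdlib.h>

    struct xt_entry_sketch {
        unsigned int refcnt;
    };

    static int module_refs;                /* stand-in for the xt module refcount */
    static struct xt_entry_sketch *cache;  /* one-slot cache for brevity */

    /* Lookup-or-create. Only creation takes a module reference. */
    static struct xt_entry_sketch *select_entry(void)
    {
        if (cache != NULL)
            return cache;  /* existing entry: no extra module reference */
        cache = calloc(1, sizeof(*cache));  /* refcnt starts at 0 */
        if (cache != NULL)
            module_refs++;
        return cache;
    }

    /* Models ->init(): the expression starts using the entry. */
    static void entry_init(struct xt_entry_sketch *e)
    {
        assert(e != NULL);
        e->refcnt++;
    }

    /* Models ->destroy(): drop one use; release everything at zero. */
    static bool entry_put(struct xt_entry_sketch *e)
    {
        assert(e != NULL && e->refcnt > 0);
        if (--e->refcnt > 0)
            return false;
        module_refs--;
        if (cache == e)
            cache = NULL;
        free(e);
        return true;
    }
    ```

    Calling select_entry() twice, as happens when the message is
    re-parsed, returns the same entry and leaves module_refs at 1.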

    Lastly, we now free nft_xt struct with kfree_rcu.
    Else, we get use-after free in nf_tables_rule_destroy:

    while (expr != nft_expr_last(rule) && expr->ops) {
        nf_tables_expr_destroy(ctx, expr);
        expr = nft_expr_next(expr); // here
    }

    nft_expr_next() dereferences expr->ops. This is safe
    for all users, as ops have static storage duration.
    In nft_compat case however, its ->destroy callback can
    free the memory that holds the ops structure.

    Tested-by: Taehee Yoo
    Reported-by: Taehee Yoo
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     

21 Jun, 2018

1 commit

  • [ Upstream commit d71efb599ad42ef1e564c652d8084252bdc85edf ]

    When a chain name is changed, nft_chain_commit_update is called.
    In nft_chain_commit_update, trans->ctx.chain->name holds the old chain
    name and nft_trans_chain_name(trans) holds the new chain name.
    If the new chain name is longer than the old one, KASAN warns about
    slab-out-of-bounds.
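
    One way to avoid copying into a buffer that is too small is to
    exchange the two name pointers. The userspace sketch below shows
    that idea; struct chain_sketch, dup_name() and commit_rename() are
    hypothetical names for this example.

    ```c
    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    struct chain_sketch {
        char *name;  /* heap buffer sized for the current name only */
    };

    /* Duplicate a string on the heap (portable stand-in for strdup). */
    static char *dup_name(const char *s)
    {
        size_t n = strlen(s) + 1;
        char *p = malloc(n);

        if (p != NULL)
            memcpy(p, s, n);
        return p;
    }

    /*
     * Commit a rename by exchanging the two pointers. On return *new_name
     * holds the old buffer, which the caller frees. A strcpy() into
     * chain->name would overflow whenever the new name is longer.
     */
    static void commit_rename(struct chain_sketch *chain, char **new_name)
    {
        char *old;

        assert(chain != NULL && new_name != NULL && *new_name != NULL);
        old = chain->name;
        chain->name = *new_name;
        *new_name = old;
    }
    ```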

    [ 175.015012] BUG: KASAN: slab-out-of-bounds in strcpy+0x9e/0xb0
    [ 175.022735] Write of size 1 at addr ffff880114e022da by task iptables-compat/1458

    [ 175.031353] CPU: 0 PID: 1458 Comm: iptables-compat Not tainted 4.16.0-rc7+ #146
    [ 175.031353] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
    [ 175.031353] Call Trace:
    [ 175.031353] dump_stack+0x68/0xa0
    [ 175.031353] print_address_description+0xd0/0x260
    [ 175.031353] ? strcpy+0x9e/0xb0
    [ 175.031353] kasan_report+0x234/0x350
    [ 175.031353] __asan_report_store1_noabort+0x1c/0x20
    [ 175.031353] strcpy+0x9e/0xb0
    [ 175.031353] nf_tables_commit+0x1ccc/0x2990
    [ 175.031353] nfnetlink_rcv+0x141e/0x16c0
    [ 175.031353] ? nfnetlink_net_init+0x150/0x150
    [ 175.031353] ? lock_acquire+0x370/0x370
    [ 175.031353] ? lock_acquire+0x370/0x370
    [ 175.031353] netlink_unicast+0x444/0x640
    [ 175.031353] ? netlink_attachskb+0x700/0x700
    [ 175.031353] ? _copy_from_iter_full+0x180/0x740
    [ 175.031353] ? kasan_check_write+0x14/0x20
    [ 175.031353] ? _copy_from_user+0x9b/0xd0
    [ 175.031353] netlink_sendmsg+0x845/0xc70
    [ ... ]

    Steps to reproduce:
    iptables-compat -N 1
    iptables-compat -E 1 aaaaaaaaa

    Signed-off-by: Taehee Yoo
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Taehee Yoo
     

16 Jun, 2018

1 commit

  • commit b71534583f22d08c3e3563bf5100aeb5f5c9fbe5 upstream.

    In nft_ct_helper_obj_dump(), priv->helper4 is always dereferenced.
    But if the family is ipv6, priv->helper6 should be dereferenced.
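
    The selection by family can be sketched in userspace as below. The
    enum, the structs and pick_helper() are hypothetical stand-ins; only
    the helper4 and helper6 field names come from the message.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    enum family_sketch { FAMILY_IPV4, FAMILY_IPV6 };

    struct helper_sketch {
        const char *name;
    };

    struct ct_helper_obj_sketch {
        const struct helper_sketch *helper4;  /* set for IPv4 tables */
        const struct helper_sketch *helper6;  /* set for IPv6 tables */
    };

    /* Pick the helper that matches the family; NULL when none is set. */
    static const struct helper_sketch *
    pick_helper(const struct ct_helper_obj_sketch *priv,
                enum family_sketch family)
    {
        assert(priv != NULL);
        return family == FAMILY_IPV6 ? priv->helper6 : priv->helper4;
    }
    ```

    For an ip6 table only helper6 is set, so always reading helper4
    hands a NULL pointer to the dump code.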

    Steps to reproduce:

    #test.nft
    table ip6 filter {
        ct helper ftp {
            type "ftp" protocol tcp
        }
        chain input {
            type filter hook input priority 4;
            ct helper set "ftp"
        }
    }

    %nft -f test.nft
    %nft list ruleset

    we can see the below messages:

    [ 916.286233] kasan: GPF could be caused by NULL-ptr deref or user memory access
    [ 916.294777] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
    [ 916.302613] Modules linked in: nft_objref nf_conntrack_sip nf_conntrack_snmp nf_conntrack_broadcast nf_conntrack_ftp nft_ct nf_conntrack nf_tables nfnetlink [last unloaded: nfnetlink]
    [ 916.318758] CPU: 1 PID: 2093 Comm: nft Not tainted 4.17.0-rc4+ #181
    [ 916.326772] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
    [ 916.338773] RIP: 0010:strlen+0x1a/0x90
    [ 916.342781] RSP: 0018:ffff88010ff0f2f8 EFLAGS: 00010292
    [ 916.346773] RAX: dffffc0000000000 RBX: ffff880119b26ee8 RCX: ffff88010c150038
    [ 916.354777] RDX: 0000000000000002 RSI: ffff880119b26ee8 RDI: 0000000000000010
    [ 916.362773] RBP: 0000000000000010 R08: 0000000000007e88 R09: ffff88010c15003c
    [ 916.370773] R10: ffff88010c150037 R11: ffffed002182a007 R12: ffff88010ff04040
    [ 916.378779] R13: 0000000000000010 R14: ffff880119b26f30 R15: ffff88010ff04110
    [ 916.387265] FS: 00007f57a1997700(0000) GS:ffff88011b800000(0000) knlGS:0000000000000000
    [ 916.394785] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 916.402778] CR2: 00007f57a0ac80f0 CR3: 000000010ff02000 CR4: 00000000001006e0
    [ 916.410772] Call Trace:
    [ 916.414787] nft_ct_helper_obj_dump+0x94/0x200 [nft_ct]
    [ 916.418779] ? nft_ct_set_eval+0x560/0x560 [nft_ct]
    [ 916.426771] ? memset+0x1f/0x40
    [ 916.426771] ? __nla_reserve+0x92/0xb0
    [ 916.434774] ? memcpy+0x34/0x50
    [ 916.434774] nf_tables_fill_obj_info+0x484/0x860 [nf_tables]
    [ 916.442773] ? __nft_release_basechain+0x600/0x600 [nf_tables]
    [ 916.450779] ? lock_acquire+0x193/0x380
    [ 916.454771] ? lock_acquire+0x193/0x380
    [ 916.458789] ? nf_tables_dump_obj+0x148/0xcb0 [nf_tables]
    [ 916.462777] nf_tables_dump_obj+0x5f0/0xcb0 [nf_tables]
    [ 916.470769] ? __alloc_skb+0x30b/0x500
    [ 916.474779] netlink_dump+0x752/0xb50
    [ 916.478775] __netlink_dump_start+0x4d3/0x750
    [ 916.482784] nf_tables_getobj+0x27a/0x930 [nf_tables]
    [ 916.490774] ? nft_obj_notify+0x100/0x100 [nf_tables]
    [ 916.494772] ? nf_tables_getobj+0x930/0x930 [nf_tables]
    [ 916.502579] ? nf_tables_dump_flowtable_done+0x70/0x70 [nf_tables]
    [ 916.506774] ? nft_obj_notify+0x100/0x100 [nf_tables]
    [ 916.514808] nfnetlink_rcv_msg+0x8ab/0xa86 [nfnetlink]
    [ 916.518771] ? nfnetlink_rcv_msg+0x550/0xa86 [nfnetlink]
    [ 916.526782] netlink_rcv_skb+0x23e/0x360
    [ 916.530773] ? nfnetlink_bind+0x200/0x200 [nfnetlink]
    [ 916.534778] ? debug_check_no_locks_freed+0x280/0x280
    [ 916.542770] ? netlink_ack+0x870/0x870
    [ 916.546786] ? ns_capable_common+0xf4/0x130
    [ 916.550765] nfnetlink_rcv+0x172/0x16c0 [nfnetlink]
    [ 916.554771] ? sched_clock_local+0xe2/0x150
    [ 916.558774] ? sched_clock_cpu+0x144/0x180
    [ 916.566575] ? lock_acquire+0x380/0x380
    [ 916.570775] ? sched_clock_local+0xe2/0x150
    [ 916.574765] ? nfnetlink_net_init+0x130/0x130 [nfnetlink]
    [ 916.578763] ? sched_clock_cpu+0x144/0x180
    [ 916.582770] ? lock_acquire+0x193/0x380
    [ 916.590771] ? lock_acquire+0x193/0x380
    [ 916.594766] ? lock_acquire+0x380/0x380
    [ 916.598760] ? netlink_deliver_tap+0x262/0xa60
    [ 916.602766] ? lock_acquire+0x193/0x380
    [ 916.606766] netlink_unicast+0x3ef/0x5a0
    [ 916.610771] ? netlink_attachskb+0x630/0x630
    [ 916.614763] netlink_sendmsg+0x72a/0xb00
    [ 916.618769] ? netlink_unicast+0x5a0/0x5a0
    [ 916.626766] ? _copy_from_user+0x92/0xc0
    [ 916.630773] __sys_sendto+0x202/0x300
    [ 916.634772] ? __ia32_sys_getpeername+0xb0/0xb0
    [ 916.638759] ? lock_acquire+0x380/0x380
    [ 916.642769] ? lock_acquire+0x193/0x380
    [ 916.646761] ? finish_task_switch+0xf4/0x560
    [ 916.650763] ? __schedule+0x582/0x19a0
    [ 916.655301] ? __sched_text_start+0x8/0x8
    [ 916.655301] ? up_read+0x1c/0x110
    [ 916.655301] ? __do_page_fault+0x48b/0xaa0
    [ 916.655301] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
    [ 916.655301] __x64_sys_sendto+0xdd/0x1b0
    [ 916.655301] do_syscall_64+0x96/0x3d0
    [ 916.655301] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 916.655301] RIP: 0033:0x7f57a0ff5e03
    [ 916.655301] RSP: 002b:00007fff6367e0a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
    [ 916.655301] RAX: ffffffffffffffda RBX: 00007fff6367f1e0 RCX: 00007f57a0ff5e03
    [ 916.655301] RDX: 0000000000000020 RSI: 00007fff6367e110 RDI: 0000000000000003
    [ 916.655301] RBP: 00007fff6367e100 R08: 00007f57a0ce9160 R09: 000000000000000c
    [ 916.655301] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff6367e110
    [ 916.655301] R13: 0000000000000020 R14: 00007f57a153c610 R15: 0000562417258de0
    [ 916.655301] Code: ff ff ff 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 fa 53 48 c1 ea 03 48 b8 00 00 00 00 00 fc ff df 48 89 fd 48 83 ec 08 b6 04 02 48 89 fa 83 e2 07 38 d0 7f
    [ 916.655301] RIP: strlen+0x1a/0x90 RSP: ffff88010ff0f2f8
    [ 916.771929] ---[ end trace 1065e048e72479fe ]---
    [ 916.777204] Kernel panic - not syncing: Fatal exception
    [ 916.778158] Kernel Offset: 0x14000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

    Signed-off-by: Taehee Yoo
    Acked-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Taehee Yoo
     

30 May, 2018

1 commit

  • [ Upstream commit 8a949fff0302b50063f74bb345a66190015528d0 ]

    The IPS_NAT_MASK check in 4.12 replaced previous check for nfct_nat()
    which was needed to fix a crash in 2.6.36-rc, see
    commit 7bcbf81a2296 ("ipvs: avoid oops for passive FTP").
    But as IPVS does not set the IPS_SRC_NAT and IPS_DST_NAT bits,
    checking for IPS_NAT_MASK prevents the PASV response from being
    properly mangled and blocks the transfer. Remove the check as it is
    not needed after 3.12 commit 41d73ec053d2 ("netfilter: nf_conntrack:
    make sequence number adjustments usuable without NAT") which
    replaces nfct_nat() with nfct_seqadj() and especially after 3.13
    commit b25adce16064 ("ipvs: correct usage/allocation of seqadj
    ext in ipvs").

    Thanks to Li Shuang and Florian Westphal for reporting the problem!

    Reported-by: Li Shuang
    Fixes: be7be6e161a2 ("netfilter: ipvs: fix incorrect conflict resolution")
    Signed-off-by: Julian Anastasov
    Acked-by: Simon Horman
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Julian Anastasov
     

23 May, 2018

2 commits

  • commit 569ccae68b38654f04b6842b034aa33857f605fe upstream.

    Rules in nftables are freed using kfree, but protected by RCU, i.e. we
    must wait for a grace period to elapse.

    The normal removal path does this, but nf_tables_newrule() doesn't obey
    this rule during error handling.

    It calls nft_trans_rule_add() *after* linking the rule, and, if that
    fails to allocate memory, it unlinks the rule and then kfree()s it --
    this is unsafe.

    Switch order -- first add rule to transaction list, THEN link it
    to public list.

    Note: nft_trans_rule_add() uses GFP_KERNEL; it will not fail so this
    is not a problem in practice (spotted only during code review).

    Fixes: 0628b123c96d12 ("netfilter: nfnetlink: add batch support and use it from nf_tables")
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
  • commit 2f6adf481527c8ab8033c601f55bfb5b3712b2ac upstream.

    set->name must be free'd here in case ops->init fails.

    Fixes: 387454901bd6 ("netfilter: nf_tables: Allow set names of up to 255 chars")
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     

16 May, 2018

1 commit

  • commit 5c64576a77894a50be80be0024bed27171b55989 upstream.

    syzkaller reports wrong rtnl_lock usage in sync code [1] and [2].

    We have 2 problems in start_sync_thread if the error path is
    taken, e.g. on memory allocation error or failure to configure
    sockets for mcast group or addr/port binding:

    1. recursive locking: holding rtnl_lock while calling sock_release
    which in turn calls again rtnl_lock in ip_mc_drop_socket to leave
    the mcast group, as noticed by Florian Westphal. Additionally,
    sock_release can not be called while holding sync_mutex (ABBA
    deadlock).

    2. task hung: holding rtnl_lock while calling kthread_stop to
    stop the running kthreads. As the kthreads do the same to leave
    the mcast group (sock_release -> ip_mc_drop_socket -> rtnl_lock)
    they hang.

    Fix the problems by calling rtnl_unlock early in the error path,
    so that sock_release is called after unlocking both mutexes.

    Problem 3 (task hung reported by syzkaller [2]) is a variant of
    problem 2: use _trylock to prevent one user from calling rtnl_lock
    and then, while waiting for sync_mutex, blocking kthreads that
    execute sock_release when they are stopped by stop_sync_thread.

    [1]
    IPVS: stopping backup sync thread 4500 ...
    WARNING: possible recursive locking detected
    4.16.0-rc7+ #3 Not tainted
    --------------------------------------------
    syzkaller688027/4497 is trying to acquire lock:
    (rtnl_mutex){+.+.}, at: [] rtnl_lock+0x17/0x20
    net/core/rtnetlink.c:74

    but task is already holding lock:
    IPVS: stopping backup sync thread 4495 ...
    (rtnl_mutex){+.+.}, at: [] rtnl_lock+0x17/0x20
    net/core/rtnetlink.c:74

    other info that might help us debug this:
    Possible unsafe locking scenario:

    CPU0
    ----
    lock(rtnl_mutex);
    lock(rtnl_mutex);

    *** DEADLOCK ***

    May be due to missing lock nesting notation

    2 locks held by syzkaller688027/4497:
    #0: (rtnl_mutex){+.+.}, at: [] rtnl_lock+0x17/0x20
    net/core/rtnetlink.c:74
    #1: (ipvs->sync_mutex){+.+.}, at: []
    do_ip_vs_set_ctl+0x10f8/0x1cc0 net/netfilter/ipvs/ip_vs_ctl.c:2388

    stack backtrace:
    CPU: 1 PID: 4497 Comm: syzkaller688027 Not tainted 4.16.0-rc7+ #3
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
    Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:17 [inline]
    dump_stack+0x194/0x24d lib/dump_stack.c:53
    print_deadlock_bug kernel/locking/lockdep.c:1761 [inline]
    check_deadlock kernel/locking/lockdep.c:1805 [inline]
    validate_chain kernel/locking/lockdep.c:2401 [inline]
    __lock_acquire+0xe8f/0x3e00 kernel/locking/lockdep.c:3431
    lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920
    __mutex_lock_common kernel/locking/mutex.c:756 [inline]
    __mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893
    mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
    rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74
    ip_mc_drop_socket+0x88/0x230 net/ipv4/igmp.c:2643
    inet_release+0x4e/0x1c0 net/ipv4/af_inet.c:413
    sock_release+0x8d/0x1e0 net/socket.c:595
    start_sync_thread+0x2213/0x2b70 net/netfilter/ipvs/ip_vs_sync.c:1924
    do_ip_vs_set_ctl+0x1139/0x1cc0 net/netfilter/ipvs/ip_vs_ctl.c:2389
    nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
    nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115
    ip_setsockopt+0x97/0xa0 net/ipv4/ip_sockglue.c:1261
    udp_setsockopt+0x45/0x80 net/ipv4/udp.c:2406
    sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2975
    SYSC_setsockopt net/socket.c:1849 [inline]
    SyS_setsockopt+0x189/0x360 net/socket.c:1828
    do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x42/0xb7
    RIP: 0033:0x446a69
    RSP: 002b:00007fa1c3a64da8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
    RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000446a69
    RDX: 000000000000048b RSI: 0000000000000000 RDI: 0000000000000003
    RBP: 00000000006e29fc R08: 0000000000000018 R09: 0000000000000000
    R10: 00000000200000c0 R11: 0000000000000246 R12: 00000000006e29f8
    R13: 00676e697279656b R14: 00007fa1c3a659c0 R15: 00000000006e2b60

    [2]
    IPVS: sync thread started: state = BACKUP, mcast_ifn = syz_tun, syncid = 4,
    id = 0
    IPVS: stopping backup sync thread 25415 ...
    INFO: task syz-executor7:25421 blocked for more than 120 seconds.
    Not tainted 4.16.0-rc6+ #284
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    syz-executor7 D23688 25421 4408 0x00000004
    Call Trace:
    context_switch kernel/sched/core.c:2862 [inline]
    __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
    schedule+0xf5/0x430 kernel/sched/core.c:3499
    schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
    do_wait_for_common kernel/sched/completion.c:86 [inline]
    __wait_for_common kernel/sched/completion.c:107 [inline]
    wait_for_common kernel/sched/completion.c:118 [inline]
    wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
    kthread_stop+0x14a/0x7a0 kernel/kthread.c:530
    stop_sync_thread+0x3d9/0x740 net/netfilter/ipvs/ip_vs_sync.c:1996
    do_ip_vs_set_ctl+0x2b1/0x1cc0 net/netfilter/ipvs/ip_vs_ctl.c:2394
    nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
    nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115
    ip_setsockopt+0x97/0xa0 net/ipv4/ip_sockglue.c:1253
    sctp_setsockopt+0x2ca/0x63e0 net/sctp/socket.c:4154
    sock_common_setsockopt+0x95/0xd0 net/core/sock.c:3039
    SYSC_setsockopt net/socket.c:1850 [inline]
    SyS_setsockopt+0x189/0x360 net/socket.c:1829
    do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x42/0xb7
    RIP: 0033:0x454889
    RSP: 002b:00007fc927626c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
    RAX: ffffffffffffffda RBX: 00007fc9276276d4 RCX: 0000000000454889
    RDX: 000000000000048c RSI: 0000000000000000 RDI: 0000000000000017
    RBP: 000000000072bf58 R08: 0000000000000018 R09: 0000000000000000
    R10: 0000000020000000 R11: 0000000000000246 R12: 00000000ffffffff
    R13: 000000000000051c R14: 00000000006f9b40 R15: 0000000000000001

    Showing all locks held in the system:
    2 locks held by khungtaskd/868:
    #0: (rcu_read_lock){....}, at: []
    check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
    #0: (rcu_read_lock){....}, at: [] watchdog+0x1c5/0xd60
    kernel/hung_task.c:249
    #1: (tasklist_lock){.+.+}, at: []
    debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470
    1 lock held by rsyslogd/4247:
    #0: (&f->f_pos_lock){+.+.}, at: []
    __fdget_pos+0x12b/0x190 fs/file.c:765
    2 locks held by getty/4338:
    #0: (&tty->ldisc_sem){++++}, at: []
    ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
    #1: (&ldata->atomic_read_lock){+.+.}, at: []
    n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
    2 locks held by getty/4339:
    #0: (&tty->ldisc_sem){++++}, at: []
    ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
    #1: (&ldata->atomic_read_lock){+.+.}, at: []
    n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
    2 locks held by getty/4340:
    #0: (&tty->ldisc_sem){++++}, at: []
    ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
    #1: (&ldata->atomic_read_lock){+.+.}, at: []
    n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
    2 locks held by getty/4341:
    #0: (&tty->ldisc_sem){++++}, at: []
    ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
    #1: (&ldata->atomic_read_lock){+.+.}, at: []
    n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
    2 locks held by getty/4342:
    #0: (&tty->ldisc_sem){++++}, at: []
    ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
    #1: (&ldata->atomic_read_lock){+.+.}, at: []
    n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
    2 locks held by getty/4343:
    #0: (&tty->ldisc_sem){++++}, at: []
    ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
    #1: (&ldata->atomic_read_lock){+.+.}, at: []
    n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
    2 locks held by getty/4344:
    #0: (&tty->ldisc_sem){++++}, at: []
    ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
    #1: (&ldata->atomic_read_lock){+.+.}, at: []
    n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
    3 locks held by kworker/0:5/6494:
    #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at:
    [] work_static include/linux/workqueue.h:198 [inline]
    #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at:
    [] set_work_data kernel/workqueue.c:619 [inline]
    #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at:
    [] set_work_pool_and_clear_pending kernel/workqueue.c:646
    [inline]
    #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at:
    [] process_one_work+0xb12/0x1bb0 kernel/workqueue.c:2084
    #1: ((addr_chk_work).work){+.+.}, at: []
    process_one_work+0xb89/0x1bb0 kernel/workqueue.c:2088
    #2: (rtnl_mutex){+.+.}, at: [] rtnl_lock+0x17/0x20
    net/core/rtnetlink.c:74
    1 lock held by syz-executor7/25421:
    #0: (ipvs->sync_mutex){+.+.}, at: []
    do_ip_vs_set_ctl+0x277/0x1cc0 net/netfilter/ipvs/ip_vs_ctl.c:2393
    2 locks held by syz-executor7/25427:
    #0: (rtnl_mutex){+.+.}, at: [] rtnl_lock+0x17/0x20
    net/core/rtnetlink.c:74
    #1: (ipvs->sync_mutex){+.+.}, at: []
    do_ip_vs_set_ctl+0x10f8/0x1cc0 net/netfilter/ipvs/ip_vs_ctl.c:2388
    1 lock held by syz-executor7/25435:
    #0: (rtnl_mutex){+.+.}, at: [] rtnl_lock+0x17/0x20
    net/core/rtnetlink.c:74
    1 lock held by ipvs-b:2:0/25415:
    #0: (rtnl_mutex){+.+.}, at: [] rtnl_lock+0x17/0x20
    net/core/rtnetlink.c:74

    Reported-and-tested-by: syzbot+a46d6abf9d56b1365a72@syzkaller.appspotmail.com
    Reported-and-tested-by: syzbot+5fe074c01b2032ce9618@syzkaller.appspotmail.com
    Fixes: e0b26cc997d5 ("ipvs: call rtnl_lock early")
    Signed-off-by: Julian Anastasov
    Signed-off-by: Simon Horman
    Signed-off-by: Pablo Neira Ayuso
    Cc: Zubin Mithra
    Cc: Guenter Roeck
    Signed-off-by: Greg Kroah-Hartman

    Julian Anastasov
     

26 Apr, 2018

4 commits