24 Nov, 2016

1 commit

  • As Liping Zhang reports, after commit a8b1e36d0d1d ("netfilter: nft_dynset:
    fix element timeout for HZ != 1000"), priv->timeout was stored in jiffies,
    while set->timeout was stored in milliseconds. This is inconsistent and
    incorrect.

    Firstly, we already call msecs_to_jiffies in nft_set_elem_init, so
    priv->timeout will be converted to jiffies twice.

    Secondly, if the user did not specify the NFTA_DYNSET_TIMEOUT attr,
    set->timeout will be used, but we forget to call msecs_to_jiffies
    when updating elements.

    Fix this by using jiffies internally for traditional sets and doing the
    conversions to/from msec when interacting with userspace - as dynset
    already does.

    This is preferable to doing the conversions when elements are inserted or
    updated, because this can happen very frequently on busy dynsets.
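
The two failure modes described above can be illustrated with a small userspace sketch. The helper names and the fixed tick rate of 250 are assumptions made for illustration, and the rounding only approximates what the kernel's msecs_to_jiffies does.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a fixed tick rate; in the kernel HZ is a build-time option. */
#define SKETCH_HZ 250

/* Simplified stand-in for msecs_to_jiffies(): round up to whole ticks. */
static uint64_t sketch_msecs_to_jiffies(uint64_t msecs)
{
    return (msecs * SKETCH_HZ + 999) / 1000;
}

/* Simplified stand-in for the reverse conversion. */
static uint64_t sketch_jiffies_to_msecs(uint64_t jiffies)
{
    return jiffies * 1000 / SKETCH_HZ;
}

/* First problem: a value already in jiffies is converted a second time. */
static uint64_t sketch_double_convert(uint64_t msecs)
{
    return sketch_msecs_to_jiffies(sketch_msecs_to_jiffies(msecs));
}
```

With a 10 second timeout, the double conversion shrinks 2500 ticks to 625, while treating the raw millisecond value as ticks (the second problem) stretches the timeout to 40 seconds.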

    Fixes: a8b1e36d0d1d ("netfilter: nft_dynset: fix element timeout for HZ != 1000")
    Reported-by: Liping Zhang
    Signed-off-by: Anders K. Pedersen
    Acked-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Anders K. Pedersen
     

09 Nov, 2016

1 commit

  • Dalegaard says:
    The following ruleset, when loaded with 'nft -f bad.txt'
    ----snip----
    flush ruleset
    table ip inlinenat {
    map sourcemap {
    type ipv4_addr : verdict;
    }

    chain postrouting {
    ip saddr vmap @sourcemap accept
    }
    }
    add chain inlinenat test
    add element inlinenat sourcemap { 100.123.10.2 : jump test }
    ----snip----

    results in a kernel oops:
    BUG: unable to handle kernel paging request at 0000000000001344
    IP: [] nf_tables_check_loops+0x114/0x1f0 [nf_tables]
    [...]
    Call Trace:
    [] ? nft_data_init+0x13e/0x1a0 [nf_tables]
    [] nft_validate_register_store+0x60/0xb0 [nf_tables]
    [] nft_add_set_elem+0x545/0x5e0 [nf_tables]
    [] ? nft_table_lookup+0x30/0x60 [nf_tables]
    [] ? nla_strcmp+0x40/0x50
    [] nf_tables_newsetelem+0x11e/0x210 [nf_tables]
    [] ? nla_validate+0x60/0x80
    [] nfnetlink_rcv+0x354/0x5a7 [nfnetlink]

    This happens because we forget to fill in the net pointer in bind_ctx,
    so dereferencing it may crash the kernel.

    Reported-by: Dalegaard
    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     

31 Oct, 2016

1 commit


28 Oct, 2016

2 commits

  • Commit 36b701fae12ac ("netfilter: nf_tables: validate maximum value of
    u32 netlink attributes") introduced nft_parse_u32_check with a return
    value of "unsigned int", yet on error it returns "-ERANGE".

    This patch corrects the mismatch by changing the return value to "int",
    which happens to match the actual users of nft_parse_u32_check already.

    Found by Coverity, CID 1373930.

    Note that commit 21a9e0f1568ea ("netfilter: nft_exthdr: fix error
    handling in nft_exthdr_init()") attempted to address the issue, but
    did not address the return type of nft_parse_u32_check.
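
A minimal userspace sketch of the mismatch follows. The function names are hypothetical and only mirror the shape of a range-checking helper; they are not the kernel's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical pre-fix shape: the unsigned return type hides the sign. */
static unsigned int sketch_check_unsigned(uint32_t val, uint32_t max,
                                          uint32_t *dest)
{
    if (val > max)
        return -ERANGE; /* converted to a large positive value */
    *dest = val;
    return 0;
}

/* Hypothetical post-fix shape: a negative errno survives as an int. */
static int sketch_check_int(uint32_t val, uint32_t max, uint32_t *dest)
{
    if (val > max)
        return -ERANGE;
    *dest = val;
    return 0;
}
```

A caller that keeps the unsigned result in an unsigned variable can never observe a value below zero, which is the kind of mismatch a static analyzer flags. Callers that store the result in an int happen to recover the negative value.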

    Signed-off-by: John W. Linville
    Cc: Laura Garcia Liebana
    Cc: Pablo Neira Ayuso
    Cc: Dan Carpenter
    Fixes: 36b701fae12ac ("netfilter: nf_tables: validate maximum value...")
    Signed-off-by: Pablo Neira Ayuso

    John W. Linville
     
  • When nft_expr_clone fails, a series of problems occurs:

    1. The module refcnt leaks: we call __module_get at the beginning but
    forget to put it back if ops->clone fails.
    2. Memory leaks: if the clone fails, we just return NULL and forget
    to free the allocated element.
    3. set->nelems becomes incorrect when set->size is specified. If the
    clone fails, we should decrease set->nelems.

    This patch fixes these problems. Fortunately, a clone failure can
    only happen on the counter expression when memory is exhausted.
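
The three cleanups listed above follow the usual unwind-in-reverse pattern. Here is a userspace sketch of that pattern; the struct, the counter standing in for the module reference, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a set with an optional size limit. */
struct sketch_set {
    int nelems; /* elements currently accounted for */
    int size;   /* 0 means unlimited */
};

/* Stands in for the module reference count. */
static int sketch_module_refs;

/* Stand-in for ops->clone(): fails on request to simulate exhaustion. */
static int sketch_clone(int should_fail)
{
    return should_fail ? -1 : 0;
}

/* Returns 0 and stores the element in *out, or -1 with every step undone. */
static int sketch_add_elem(struct sketch_set *set, int clone_fails, void **out)
{
    void *elem;

    if (set->size && set->nelems >= set->size)
        return -1;
    set->nelems++;

    elem = malloc(64);
    if (!elem)
        goto err_nelems;

    sketch_module_refs++; /* taken before the clone is attempted */
    if (sketch_clone(clone_fails) < 0)
        goto err_put;

    *out = elem;
    return 0;

err_put:
    sketch_module_refs--; /* problem 1: put the reference back */
    free(elem);           /* problem 2: free the allocated element */
err_nelems:
    set->nelems--;        /* problem 3: restore the element count */
    return -1;
}
```

After a failed call the reference count, the heap and the element count are all back where they started, so a later successful call behaves as if the failure never happened.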

    Fixes: 086f332167d6 ("netfilter: nf_tables: add clone interface to expression operations")
    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     

17 Oct, 2016

1 commit


23 Sep, 2016

1 commit

  • Fetch value and validate u32 netlink attribute. This validation is
    usually required when the u32 netlink attributes are being stored in a
    field whose size is smaller.

    This patch revisits 4da449ae1df9 ("netfilter: nft_exthdr: Add size check
    on u8 nft_exthdr attributes").

    Fixes: 96518518cc41 ("netfilter: add nftables")
    Suggested-by: Pablo Neira Ayuso
    Signed-off-by: Laura Garcia Liebana
    Signed-off-by: Pablo Neira Ayuso

    Laura Garcia Liebana
     

26 Aug, 2016

1 commit

  • If the NLM_F_EXCL flag is set, then new elements that clash with an
    existing one return EEXIST. In case you try to add an element whose
    data area differs from what we have, then this returns EBUSY. If no
    flag is specified at all, then this returns success to userspace.

    This patch also updates the set insert operation so we can fetch the
    existing element that clashes with the one you want to add; we need
    this to make sure the element data doesn't differ.
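
The return values described above can be summarised as a small decision function. This is a sketch under assumptions: the function and flag names are hypothetical, and checking for differing data before checking the exclusive flag is one reading of the message rather than a confirmed detail.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag standing in for NLM_F_EXCL. */
#define SKETCH_F_EXCL 0x200u

/* Result of adding an element, given what the set already holds. */
static int sketch_insert_result(bool key_exists, bool data_differs,
                                unsigned int flags)
{
    if (!key_exists)
        return 0;       /* no clash at all */
    if (data_differs)
        return -EBUSY;  /* same key, different data area */
    if (flags & SKETCH_F_EXCL)
        return -EEXIST; /* identical element, exclusive add requested */
    return 0;           /* identical element, no flag: report success */
}
```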

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

23 Aug, 2016

2 commits

  • Currently, if you add a base chain whose name clashes with an existing
    non-base chain, nf_tables doesn't complain about this. The same applies
    if you update the chain type, the hook number or the priority.

    With this patch, nf_tables bails out if any of these unsupported
    operations occur by returning EBUSY.

    # nft add table x
    # nft add chain x y
    # nft add chain x y { type nat hook input priority 0\; }
    :1:1-49: Error: Could not process rule: Device or resource busy
    add chain x y { type nat hook input priority 0; }
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Introduce a new function to wrap the code that parses the chain hook
    configuration so we can reuse this code to validate chain updates.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

25 Jul, 2016

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter/IPVS updates for net-next

    The following patchset contains Netfilter/IPVS updates for net-next,
    they are:

    1) Count pre-established connections as active in "least connection"
    schedulers to avoid overloading backend servers on peak demands,
    from Michal Kubecek via Simon Horman.

    2) Address a race condition when resizing the conntrack table by caching
    the bucket size when fully iterating over the hashtable in these
    three possible scenarios: 1) dump via /proc/net/nf_conntrack,
    2) unlinking userspace helper and 3) unlinking custom conntrack timeout.
    From Liping Zhang.

    3) Revisit early_drop() path to perform lockless traversal on conntrack
    eviction under stress, use del_timer() as synchronization point to
    avoid two CPUs evicting the same entry, from Florian Westphal.

    4) Move NAT hlist_head to nf_conn object, this simplifies the existing
    NAT extension and it doesn't increase size since recent patches to
    align nf_conn, from Florian.

    5) Use rhashtable for the by-source NAT hashtable, also from Florian.

    6) Don't allow --physdev-is-out from OUTPUT chain, just as
    --physdev-out is not allowed either, from Hangbin Liu.

    7) Automagically set on nf_conntrack counters if the user tries to
    match ct bytes/packets from nftables, from Liping Zhang.

    8) Remove possible_net_t fields in nf_tables set objects since we just
    simply pass the net pointer to the backend set type implementations.

    9) Fix possible off-by-one in h323, from Toby DiPasquale.

    10) early_drop() may be called from the ctnetlink path, so we must hold
    the rcu read side lock from there too, this amends Florian's patch #3
    coming in this batch, from Liping Zhang.

    11) Use binary search to validate jump offset in x_tables, this
    addresses the O(n!) validation that was introduced recently to
    resolve security issues with unprivileged namespaces, from Florian.

    12) Fix reference leak to connlabel in error path of nft_ct, from Zhang.

    13) Three updates for nft_log: Fix log prefix leak in error path. Bail
    out on loglevel larger than debug in nft_log and set the new
    NF_LOG_F_COPY_LEN flag when snaplen is specified. Again from Zhang.

    14) Allow to filter rule dumps in nf_tables based on table and chain
    names.

    15) Simplify connlabel to always use 128 bits to store labels and
    get rid of unused function in xt_connlabel, from Florian.

    16) Replace set_expect_timeout() by mod_timer() from the h323 conntrack
    helper, by Gao Feng.

    17) Put back x_tables module reference in nft_compat on error, from
    Liping Zhang.

    18) Add a reference count to the x_tables extensions cache in
    nft_compat, so we can remove them when unused and avoid a crash
    if the extensions are rmmod, again from Zhang.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

24 Jul, 2016

1 commit


21 Jul, 2016

1 commit


11 Jul, 2016

1 commit


07 Jul, 2016

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains Netfilter updates for net-next,
    they are:

    1) Don't use userspace datatypes in bridge netfilter code, from
    Tobin Harding.

    2) Iterate only once over the expectation table when removing the
    helper module, instead of once per-netns, from Florian Westphal.

    3) Extra sanitization in xt_hook_ops_alloc() to return error in case
    we ever pass zero hooks.

    4) Handle NFPROTO_INET from the logging core infrastructure, from
    Liping Zhang.

    5) Autoload loggers when TRACE target is used from rules, this doesn't
    change the behaviour in case the user already selected nfnetlink_log
    as preferred way to print tracing logs, also from Liping Zhang.

    6) Allocate conntrack slabs with SLAB_HWCACHE_ALIGN to allow rearranging
    fields by cache lines, this increases the size of each entry by 11%.
    From Florian Westphal.

    7) Skip zone comparison if CONFIG_NF_CONNTRACK_ZONES=n, from Florian.

    8) Remove useless defensive check in nf_logger_find_get() from Shivani
    Bhardwaj.

    9) Remove the zone extension and place it in the conntrack object, this
    is always included in the hashing and we expect more intensive use of
    zones since containers are in place. Also from Florian Westphal.

    10) Owner match now works from any namespace, from Eric Biederman.

    11) Make sure we only reply with TCP reset to TCP traffic from
    nf_reject_ipv4, patch from Liping Zhang.

    12) Introduce --nflog-size to indicate amount of network packet bytes
    that are copied to userspace via log message, from Vishwanath Pai.
    This obsoletes --nflog-range, which was designed to achieve this
    but has never worked.

    13) Introduce generic macros for nf_tables object generation masks.

    14) Use generation mask in table, chain and set objects in nf_tables.
    This fixes interference between the ongoing preparation phase of
    the commit protocol and object listings going on at the same time.
    This update is introduced in three patches, one per object.

    15) Check if the object is active in the next generation for element
    deactivation in the rbtree implementation, given that deactivation
    happens from the commit phase path we have to observe the future
    status of the object.

    16) Support for deletion of just added elements in the hash set type.

    17) Allow to resize hashtable from /proc entry, not only from the
    obscure /sys entry that maps to the module parameter, from Florian
    Westphal.

    18) Get rid of NFT_BASECHAIN_DISABLED, this code is not exercised
    anymore since we tear down the ruleset whenever the netdevice
    goes away.

    19) Support for matching inverted set lookups, from Arturo Borrero.

    20) Simplify the iptables_mangle_hook() by removing a superfluous
    extra branch.

    21) Introduce ether_addr_equal_masked() and use it from the netfilter
    codebase, from Joe Perches.

    22) Remove references to "Use netfilter MARK value as routing key"
    from the Netfilter Kconfig description given that this toggle
    hasn't existed for 10 years, from Moritz Sichert.

    23) Introduce generic NF_INVF() and use it from the xtables codebase,
    from Joe Perches.

    24) Setting logger to NONE via /proc was not working unless explicit
    nul-termination was included in the string. This fix seems to
    leave the former behaviour there, so we don't break backward
    compatibility.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

24 Jun, 2016

5 commits

  • This flag was introduced to restore rulesets from the new netdev
    family, but since 5ebe0b0eec9d6f7 ("netfilter: nf_tables: destroy
    basechain and rules on netdevice removal") the ruleset is released
    once the netdev is gone.

    This also removes nft_register_basechain() and
    nft_unregister_basechain() since they have no clients anymore after
    this rework.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Similar to ("netfilter: nf_tables: add generation mask to tables").

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Similar to ("netfilter: nf_tables: add generation mask to tables").

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • This patch addresses two problems:

    1) The netlink dump is inconsistent when interfering with an ongoing
    transaction update for several reasons:

    1.a) We don't honor the internal NFT_TABLE_INACTIVE flag, and we should
    be skipping these inactive objects in the dump.

    1.b) We perform speculative deletion during the preparation phase, that
    may result in skipping active objects.

    1.c) The listing order changes, which generates noise when tracking
    incremental ruleset update via tools like git or our own
    testsuite.

    2) We don't allow adding and updating an object in the same batch,
    e.g. add table x; add table x { flags dormant\; }.

    In order to resolve these problems:

    1) If the user requests a deletion, the object becomes inactive in the
    next generation. Then, ignore objects that are scheduled to be deleted
    from the lookup path, as they will be effectively removed in the
    next generation.

    2) From the get/dump path, if the object is not currently active, we
    skip it.

    3) Support 'add X -> update X' sequence from a transaction.

    After this update, we obtain a consistent list as long as we stay
    in the same generation. The userspace side can detect interferences
    through the generation counter so it can restart the dumping.
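
The lifecycle described above can be modelled with a two-bit mask per object. The sketch below assumes that a set bit means "inactive in that generation" and that a one-bit cursor selects the current generation; every name in it is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: a one-bit cursor selects which mask bit is "current". */
static unsigned int sketch_gencursor;

static unsigned int sketch_genmask_cur(void)
{
    return 1u << sketch_gencursor;
}

static unsigned int sketch_genmask_next(void)
{
    return 1u << !sketch_gencursor;
}

/* Assumption: a set bit marks the object inactive in that generation. */
struct sketch_obj {
    unsigned int genmask;
};

static bool sketch_is_active(const struct sketch_obj *o, unsigned int genmask)
{
    return (o->genmask & genmask) == 0;
}

/* Newly added object: hidden now, visible once the batch commits. */
static void sketch_activate_next(struct sketch_obj *o)
{
    o->genmask = sketch_genmask_cur();
}

/* Deletion request: still visible now, gone in the next generation. */
static void sketch_deactivate_next(struct sketch_obj *o)
{
    o->genmask = sketch_genmask_next();
}

/* Commit: the next generation becomes the current one. */
static void sketch_commit_flip(void)
{
    sketch_gencursor = !sketch_gencursor;
}

/* After a commit, drop the stale bit left over from the old generation. */
static void sketch_clear(struct sketch_obj *o)
{
    o->genmask &= ~sketch_genmask_next();
}
```

In this model a dump would test objects against the current mask, while lookups made from inside a transaction would test against the next mask, so an object added earlier in the same batch can be found and updated.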

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Thus, we can reuse these to check the genmask of any object type, not
    only rules. This is required now that tables, chain and sets will get a
    generation mask field too in follow up patches.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

23 Jun, 2016

1 commit


15 Jun, 2016

3 commits

  • When we add an nft rule as follows:
    # nft add rule filter test tcp dport vmap {1: jump test}
    an -ELOOP error will be returned, and the anonymous set will be
    destroyed.

    But after that, nf_tables_abort will also try to remove the
    element and destroy the set, which was already destroyed and
    freed.

    If we add a wrong nft rule, nf_tables_abort will do the cleanup
    work correctly, so the nf_tables_set_destroy call here is redundant
    and wrong; remove it.

    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     
  • Liping Zhang says:

    "Users may add such a wrong nft rules successfully, which will cause an
    endless jump loop:

    # nft add rule filter test tcp dport vmap {1: jump test}

    This is because before we commit, the element in the current anonymous
    set is inactive, so ops->walk will skip this element and miss the
    validate check."

    To resolve this problem, this patch passes the generation mask to the
    walk function through the iter container structure depending on the code
    path:

    1) If we're dumping the elements, then we have to check if the element
    is active in the current generation. Thus, we check for the current
    bit in the genmask.

    2) If we're checking for loops, then we have to check if the element is
    active in the next generation, as we're in the middle of a
    transaction. Thus, we check for the next bit in the genmask.
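
The effect of choosing the mask per code path can be shown with a small userspace sketch. The element layout, the two mask constants and the walk helper are hypothetical; a set bit is assumed to mean "inactive in that generation".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed bit layout for illustration only. */
#define SKETCH_GEN_CUR  1u
#define SKETCH_GEN_NEXT 2u

/* Hypothetical verdict-map element: a jump target plus its genmask. */
struct sketch_elem {
    int jump_to;
    unsigned int genmask;
};

/* Walk the elements visible under genmask and report a jump to chain. */
static bool sketch_walk_finds_jump(const struct sketch_elem *elems, size_t n,
                                   unsigned int genmask, int chain)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (elems[i].genmask & genmask)
            continue; /* inactive in the generation being inspected */
        if (elems[i].jump_to == chain)
            return true;
    }
    return false;
}
```

An element added in the running transaction is inactive in the current generation, so a walk using the current bit never sees its jump and the loop goes unnoticed. The same walk using the next bit does see it.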

    Based on original patch from Liping Zhang.

    Reported-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso
    Tested-by: Liping Zhang

    Pablo Neira Ayuso
     
  • We should check whether "i" is used as a dictionary or not; "binding"
    is already checked before.

    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     

02 Jun, 2016

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    The following patchset contains Netfilter fixes for your net tree,
    they are:

    1) Fix incorrect timestamp in nfnetlink_queue introduced when addressing
    y2038 safe timestamp, from Florian Westphal.

    2) Get rid of leftover conntrack definition from the previous merge
    window, oneliner from Florian.

    3) Make nf_queue handler pernet to resolve race on dereferencing the
    hook state structure with netns removal, from Eric Biederman.

    4) Ensure clean exit on unregistered helper ports, from Taehee Yoo.

    5) Restore FLOWI_FLAG_KNOWN_NH in nf_dup_ipv6. This got lost while
    generalizing xt_TEE to add packet duplication support in nf_tables,
    from Paolo Abeni.

    6) Insufficient netlink NFTA_SET_TABLE attribute check in
    nf_tables_getset(), from Phil Turnbull.

    7) Reject helper registration on duplicated ports via modparams.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

30 May, 2016

1 commit


05 May, 2016

1 commit


25 Apr, 2016

2 commits


24 Apr, 2016

1 commit


08 Jan, 2016

1 commit


29 Dec, 2015

5 commits


19 Dec, 2015

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains the first batch of Netfilter updates for
    the upcoming 4.5 kernel. This batch contains userspace netfilter header
    compilation fixes, support for packet mangling in nf_tables, the new
    tracing infrastructure for nf_tables and cgroup2 support for iptables.
    More specifically, they are:

    1) Two patches to include dependencies in our netfilter userspace
    headers to resolve compilation problems, from Mikko Rapeli.

    2) Four cosmetic cleanup patches for the ebtables codebase, from Ian Morris.

    3) Remove duplicate include in the netfilter reject infrastructure,
    from Stephen Hemminger.

    4) Two patches to simplify the netfilter defragmentation code for IPv6,
    from Florian Westphal.

    5) Fix root ownership of /proc/net netfilter for unprivileged net
    namespaces, from Philip Whineray.

    6) Get rid of unused fields in struct nft_pktinfo, from Florian Westphal.

    7) Add mangling support to our nf_tables payload expression, from
    Patrick McHardy.

    8) Introduce a new netlink-based tracing infrastructure for nf_tables,
    from Florian Westphal.

    9) Change setter functions in nfnetlink_log to be void, from
    Rami Rosen.

    10) Add netns support to the cttimeout infrastructure.

    11) Add cgroup2 support to iptables, from Tejun Heo.

    12) Introduce nfnl_dereference_protected() in nfnetlink, from Florian.

    13) Add support for mangling pkttype in the nf_tables meta expression,
    also from Florian.

    BTW, I need you to pull net into net-next, I have another batch that
    requires changes that I don't yet see in net.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

14 Dec, 2015

1 commit

  • When we use 'nft -f' to submit rules, it builds multiple rules into
    one netlink skb to send to the kernel, and the kernel processes them
    one by one. Meanwhile, it adds each trans to commit_list to record
    every commit. If the return value of one of them is -EAGAIN,
    status |= NFNL_BATCH_REPLAY is set. After all the processing is done,
    it rolls back all the commits.

    Currently the kernel uses list_add_tail to add a trans to commit_list,
    and uses list_for_each_entry_safe to roll back, which means the order
    of adding and rollback is the same. That causes some cases to
    misbehave, or even trigger a call trace, like:

    1. add a set into table foo [return -EAGAIN]:
    commit_list = 'add set trans'
    2. del foo:
    commit_list = 'add set trans' -> 'del set trans' -> 'del tab trans'
    then nf_tables_abort will be called to roll back:
    firstly process 'add set trans':
    case NFT_MSG_NEWSET:
    trans->ctx.table->use--;
    list_del_rcu(&nft_trans_set(trans)->list);

    It will delete the set from table foo, but the set was already removed
    when table foo was deleted [step 2], so the kernel will panic.

    The right order of rollback should be:
    'del tab trans' -> 'del set trans' -> 'add set trans',
    which is the opposite of the commit_list order.

    So fix it by rolling back commits in reverse order in nf_tables_abort.
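
The ordering argument can be checked with a tiny userspace sketch that computes the undo order for a queue of transactions. The enum and the function are hypothetical and model only the ordering, not the undo work itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transaction kinds matching the example in the message. */
enum sketch_op {
    SKETCH_ADD_SET,
    SKETCH_DEL_SET,
    SKETCH_DEL_TABLE
};

/*
 * Fill out[] with the order in which queued transactions should be
 * undone: newest first, so each undo sees the state its transaction left.
 */
static void sketch_abort_order(const enum sketch_op *commit_list, size_t n,
                               enum sketch_op *out)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = commit_list[n - 1 - i];
}
```

For the queue from the example, the table deletion is undone first, then the set deletion, and only then the set addition, at which point the table and set both exist again.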

    Signed-off-by: Xin Long
    Signed-off-by: Pablo Neira Ayuso

    Xin Long
     

10 Dec, 2015

1 commit


09 Dec, 2015

1 commit

  • nft monitor mode can then decode and display this trace data.

    Parts of LL/Network/Transport headers are provided as separate
    attributes.

    Otherwise, printing IP address data becomes virtually impossible
    for userspace since in the case of the netdev family we really don't
    want userspace to have to know all the possible link layer types
    and/or sizes just to display/print an ip address.

    We also don't want userspace to have to follow ipv6 header chains
    to get the s/dport info, the kernel already did this work for us.

    To avoid bloating nft_do_chain all data required for tracing is
    encapsulated in nft_traceinfo.

    The structure is initialized unconditionally(!) for each nft_do_chain
    invocation.

    This unconditional call will be moved under a static key in a
    followup patch.

    With lots of help from Patrick McHardy and Pablo Neira.

    Signed-off-by: Florian Westphal
    Acked-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal