13 Sep, 2016

2 commits

  • The overflow validation in the init() function merely establishes that
    the maximum value the hash could reach is less than U32_MAX, which is
    almost always true, so the check is ineffective.

    The fix instead detects the overflow by checking whether the computed
    maximum hash value wrapped around, i.e. whether it is less than the
    offset itself.
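
    A minimal sketch of the corrected check, assuming nft_hash's
    priv->offset and priv->modulus fields (not the verbatim patch):

        u32 maxval = priv->offset + priv->modulus - 1;

        /* the addition wrapped past U32_MAX: the real maximum cannot
         * be represented in 32 bits */
        if (maxval < priv->offset)
                return -EOVERFLOW;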

    Fixes: 70ca767ea1b2 ("netfilter: nft_hash: Add hash offset value")
    Reported-by: Liping Zhang
    Signed-off-by: Laura Garcia Liebana
    Signed-off-by: Pablo Neira Ayuso

    Laura Garcia Liebana
     
  • Add support for passing an offset through to the hash value. With this
    feature, the sysadmin is able to generate hash values starting at a
    given offset.

    Example:

    meta mark set jhash ip saddr mod 2 seed 0xabcd offset 100

    This rule generates marks in the range 100 to 101 according to the
    source address.
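
    In the evaluation path this amounts to adding the offset after scaling
    the hash into the modulus range; a sketch under that assumption
    (reciprocal_scale() and jhash() are existing kernel helpers, the field
    names are approximate):

        /* resulting values fall in [offset, offset + modulus) */
        u32 h = reciprocal_scale(jhash(data, len, priv->seed),
                                 priv->modulus);

        regs->data[priv->dreg] = h + priv->offset;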

    Signed-off-by: Laura Garcia Liebana

    Laura Garcia Liebana
     


07 Jul, 2016

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains Netfilter updates for net-next,
    they are:

    1) Don't use userspace datatypes in bridge netfilter code, from
    Tobin Harding.

    2) Iterate only once over the expectation table when removing the
    helper module, instead of once per-netns, from Florian Westphal.

    3) Extra sanitization in xt_hook_ops_alloc() to return an error in case
    we ever pass zero hooks.

    4) Handle NFPROTO_INET from the logging core infrastructure, from
    Liping Zhang.

    5) Autoload loggers when the TRACE target is used from rules; this
    doesn't change the behaviour if the user has already selected
    nfnetlink_log as the preferred way to print tracing logs. Also from
    Liping Zhang.

    6) Allocate conntrack slabs with SLAB_HWCACHE_ALIGN to allow
    rearranging fields by cache lines; this increases the size of each
    entry by 11%.
    From Florian Westphal.

    7) Skip zone comparison if CONFIG_NF_CONNTRACK_ZONES=n, from Florian.

    8) Remove useless defensive check in nf_logger_find_get() from Shivani
    Bhardwaj.

    9) Remove the zone extension and place the zone in the conntrack object
    instead; it is always included in the hashing, and we expect more
    intensive use of zones now that containers are in place. Also from
    Florian Westphal.

    10) Owner match now works from any namespace, from Eric Bierdeman.

    11) Make sure we only reply with TCP reset to TCP traffic from
    nf_reject_ipv4, patch from Liping Zhang.

    12) Introduce --nflog-size to indicate the amount of network packet
    bytes that are copied to userspace via the log message, from Vishwanath
    Pai. This obsoletes --nflog-range, which was designed to achieve this
    but never worked.

    13) Introduce generic macros for nf_tables object generation masks.

    14) Use the generation mask in table, chain and set objects in
    nf_tables. This fixes interferences between the ongoing preparation
    phase of the commit protocol and object listings happening at the same
    time. This update is introduced in three patches, one per object.

    15) Check if the object is active in the next generation for element
    deactivation in the rbtree implementation; given that deactivation
    happens from the commit phase path, we have to observe the future
    status of the object.

    16) Support for deletion of just added elements in the hash set type.

    17) Allow resizing the hashtable from a /proc entry, not only from the
    obscure /sys entry that maps to the module parameter, from Florian
    Westphal.

    18) Get rid of NFT_BASECHAIN_DISABLED; this code is not exercised
    anymore since we tear down the ruleset whenever the netdevice goes
    away.

    19) Support for matching inverted set lookups, from Arturo Borrero.

    20) Simplify the iptables_mangle_hook() by removing a superfluous
    extra branch.

    21) Introduce ether_addr_equal_masked() and use it from the netfilter
    codebase, from Joe Perches.

    22) Remove references to "Use netfilter MARK value as routing key"
    from the Netfilter Kconfig description, given that this toggle hasn't
    existed for 10 years, from Moritz Sichert.

    23) Introduce generic NF_INVF() and use it from the xtables codebase,
    from Joe Perches.

    24) Setting the logger to NONE via /proc was not working unless
    explicit nul-termination was included in the string. The fix preserves
    the former behaviour so we don't break backward compatibility.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

24 Jun, 2016

1 commit

  • New elements are inactive in the preparation phase, and their
    NFT_SET_ELEM_BUSY_MASK flag is set.

    This busy flag prevents us from deleting an element from the same
    transaction that added it, in a sequence like:

    begin transaction
    add element X
    delete element X
    end transaction

    This sequence is valid and may be triggered by robots. To resolve this
    problem, allow deactivating elements that are active in the current
    generation (ie. those that have just been added in this batch).
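
    A condensed sketch of the deactivation logic this enables (helper
    names follow the nf_tables extension API; treat the exact shape as an
    assumption, not the verbatim patch):

        /* a just-added element is busy but not yet active in the
         * current generation, so accept it for deactivation instead
         * of bailing out */
        he = rhashtable_lookup_fast(&priv->ht, &arg, nft_hash_params);
        if (he != NULL) {
                if (!nft_set_elem_mark_busy(&he->ext) ||
                    !nft_is_active(net, &he->ext))
                        nft_set_elem_change_active(net, set, &he->ext);
                else
                        he = NULL;  /* busy in another transaction */
        }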

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

15 Jun, 2016

1 commit

  • Liping Zhang says:

    "Users may add such a wrong nft rules successfully, which will cause an
    endless jump loop:

    # nft add rule filter test tcp dport vmap {1: jump test}

    This is because before we commit, the element in the current anonymous
    set is inactive, so ops->walk will skip this element and miss the
    validate check."

    To resolve this problem, this patch passes the generation mask to the
    walk function through the iter container structure depending on the code
    path:

    1) If we're dumping the elements, then we have to check if the element
    is active in the current generation. Thus, we check for the current
    bit in the genmask.

    2) If we're checking for loops, then we have to check if the element is
    active in the next generation, as we're in the middle of a
    transaction. Thus, we check for the next bit in the genmask.

    Based on original patch from Liping Zhang.
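
    A condensed sketch of the resulting checks (the iterator container and
    the genmask helpers follow the nf_tables API; exact code paraphrased):

        /* walker: honour the genmask the caller put into the iterator */
        if (!nft_set_elem_active(&he->ext, iter->genmask))
                goto cont;

        /* callers pick the mask depending on the code path */
        iter.genmask = nft_genmask_cur(net);    /* dumping elements */
        iter.genmask = nft_genmask_next(net);   /* loop detection   */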

    Reported-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso
    Tested-by: Liping Zhang

    Pablo Neira Ayuso
     

05 Apr, 2016

1 commit

  • In certain cases, the 802.11 mesh pathtable code wants to
    iterate over all of the entries in the forwarding table from
    the receive path, which is inside an RCU read-side critical
    section. Enable walks inside atomic sections by allowing
    GFP_ATOMIC allocations for the walker state.

    Change all existing callsites to pass in GFP_KERNEL.
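
    After this change the walker-state allocation takes a gfp_t, so a walk
    can be set up from atomic context; a sketch (the table and iterator
    names here are placeholders):

        struct rhashtable_iter iter;
        int err;

        /* safe inside an RCU read-side critical section */
        err = rhashtable_walk_init(ht, &iter, GFP_ATOMIC);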

    Acked-by: Thomas Graf
    Signed-off-by: Bob Copeland
    [also adjust gfs2/glock.c and rhashtable tests]
    Signed-off-by: Johannes Berg

    Bob Copeland
     

08 Apr, 2015

2 commits

  • Add a new "dynset" expression for dynamic set updates.

    A new set op ->update() is added which, for non-existent elements,
    invokes an initialization callback and inserts the new element.
    For both new and existing elements the extension pointer is returned
    to the caller to optionally perform timer updates or other actions.
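
    The rough shape of the new op, paraphrased from the description above
    (the exact signature is an assumption):

        /* look the key up, create the element via new() if missing,
         * and hand the extension area back through *ext */
        bool (*update)(struct nft_set *set, const u32 *key,
                       void *(*new)(struct nft_set *,
                                    const struct nft_expr *,
                                    struct nft_regs *),
                       const struct nft_expr *expr,
                       struct nft_regs *regs,
                       const struct nft_set_ext **ext);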

    Element removal is not supported so far; however, that seems to be a
    rather exotic need and can be added later on.

    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
     
  • Use atomic operations for the element count to avoid races with async
    updates.

    To properly handle the transactional semantics during netlink updates,
    deleted but not yet committed elements are accounted for separately and
    are treated as being already removed. This means that for the duration
    of a netlink transaction, the limit might be exceeded by the number of
    elements deleted. Set implementations must be prepared to handle this.
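
    A sketch of what the accounting looks like, assuming an atomic element
    count (nelems) plus a separate count of not-yet-committed deletions
    (here called ndeact):

        /* bump nelems atomically, but let a transaction exceed the
         * limit by the number of pending deletions */
        if (set->size &&
            !atomic_add_unless(&set->nelems, 1, set->size + set->ndeact))
                return -ENFILE;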

    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
     

26 Mar, 2015

7 commits

  • Set elements are the last object type without transaction support.
    Implement it similarly to the existing rule transactions:

    The global transaction counter keeps track of two generations, current
    and next. Each element contains a bitmask specifying in which generations
    it is inactive.

    New elements start out as inactive in the current generation and active
    in the next. On commit, the previous next generation becomes the current
    generation and the element becomes active. The bitmask is then cleared
    to indicate that the element is active in all future generations. If the
    transaction is aborted, the element is removed from the set before it
    becomes active.

    When removing an element, it is marked as inactive in the next
    generation. On commit, the next generation becomes the current one, and
    the element therefore becomes inactive. It is then taken out of the set
    and released. On abort, the element is marked as active for the next
    generation again.

    Lookups ignore elements not active in the current generation.

    The current set types (hash/rbtree) both use a field in the extension area
    to store the generation mask. This (currently) does not require any
    additional memory since we have some free space in there.
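
    A sketch of the two-bit mask this describes (a set bit means inactive
    in that generation; the kernel's nft_genmask_cur()/nft_genmask_next()
    helpers encode the same idea, details approximate):

        /* the generation cursor alternates between 0 and 1 */
        #define genmask_cur(net)        (1 << (net)->nft.gencursor)
        #define genmask_next(net)       (1 << !(net)->nft.gencursor)

        /* new element: inactive in the current generation only */
        ext->genmask = genmask_cur(net);

        /* lookup: skip elements inactive in the current generation */
        if (ext->genmask & genmask_cur(net))
                goto next;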

    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
     
  • Return the extension area from the ->lookup() function to allow
    consolidating common actions.
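
    Roughly, the op goes from returning a plain hit/miss to also handing
    back the element's extension area (signature paraphrased):

        /* report a hit and expose the extension area so callers can
         * apply common post-lookup actions in one place */
        bool (*lookup)(const struct nft_set *set,
                       const struct nft_data *key,
                       const struct nft_set_ext **ext);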

    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
     
  • With the conversion to set extensions, it is now possible to consolidate
    the different set element destruction functions.

    The set implementations' ->remove() functions are changed to only take
    the element out of their internal data structures. Elements will be freed
    in a batched fashion after the global transaction's completion RCU grace
    period.

    This reduces the number of additional grace periods required by
    nft_hash from N to zero; additionally, this guarantees that the set
    elements' extensions of all implementations can be used under RCU
    protection.

    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
     
  • The set implementations' private struct will only contain the members
    needed to maintain the search structure; everything else is moved to
    the set extensions.

    Element allocation and initialization is performed centrally by
    nf_tables_api instead of by the different set implementations'
    ->insert() functions. A new "elemsize" member in the set ops specifies
    the amount of memory to reserve for internal usage. Destruction
    will also be moved out of the set implementations by a following patch.

    Except for element allocation, the patch is a simple conversion to
    using data from the extension area.
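
    For example, an implementation reserves its internal per-element space
    by pointing elemsize at everything before the extension area (sketch,
    using nft_hash-like names):

        struct nft_hash_elem {
                struct rhash_head       node;   /* search structure    */
                struct nft_set_ext      ext;    /* shared element data */
        };

        /* in the set ops: memory reserved for internal usage */
        .elemsize = offsetof(struct nft_hash_elem, ext),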

    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
     
  • A following patch will convert sets to use so called set extensions,
    where the key is not located in a fixed position anymore. This will
    require rhashtable hashing and comparison callbacks to be used.

    As preparation, convert nft_hash to use these callbacks without any
    functional changes.
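
    Concretely, this means wiring explicit hashing and comparison callbacks
    into the rhashtable parameters instead of a fixed key offset (sketch;
    callback names approximate):

        static const struct rhashtable_params nft_hash_params = {
                .head_offset    = offsetof(struct nft_hash_elem, node),
                .hashfn         = nft_hash_key, /* hash a lookup key */
                .obj_hashfn     = nft_hash_obj, /* hash an element   */
                .obj_cmpfn      = nft_hash_cmp, /* compare key/elem  */
        };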

    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
     
  • Improve readability by indenting the parameter initialization.

    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
     
  • Following patches will add new private members; restore struct nft_hash
    as preparation.

    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
     

25 Mar, 2015

2 commits

  • Add a rhashtable_destroy() variant which stops rehashes, iterates over
    the table and calls a callback to release resources.

    This avoids the need for nft_hash to embed rhashtable internals and
    allows us to get rid of the being_destroyed flag. It also saves a
    second mutex lock upon destruction.

    Also fixes an RCU lockdep splat on nft set destruction due to calling
    rht_for_each_entry_safe() without holding the bucket locks. Open-code
    this loop, as we know that no mutations may occur in parallel.
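
    The helper's shape, as the description implies (signature believed
    accurate, quoted from memory):

        /* stop rehashes, walk every entry, let free_fn() release it,
         * then tear the table itself down */
        void rhashtable_free_and_destroy(struct rhashtable *ht,
                                         void (*free_fn)(void *ptr,
                                                         void *arg),
                                         void *arg);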

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Introduce a new bool automatic_shrinking to require the user to
    explicitly opt in to automatic shrinking of tables.
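
    Opting in is a one-line addition to the table's parameters (sketch;
    the object struct and its fields are placeholders):

        static const struct rhashtable_params my_params = {
                .head_offset            = offsetof(struct my_obj, node),
                .key_offset             = offsetof(struct my_obj, key),
                .key_len                = sizeof(u32),
                .automatic_shrinking    = true, /* off by default now */
        };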

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     

21 Mar, 2015

1 commit

  • This patch converts nft_hash to the inlined rhashtable interface.

    This patch also replaces the call to rhashtable_lookup_compare with
    a straight rhashtable_lookup_fast because it's simply doing a memcmp
    (in fact nft_hash_lookup already uses memcmp instead of nft_data_cmp).

    Furthermore, the compare function is only meant to compare; it is not
    supposed to have side-effects. The current side-effect code can simply
    be moved into nft_hash_get.
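
    With the inlined interface, the lookup collapses to a single call
    against a compile-time params struct (sketch; variable names are
    placeholders):

        /* memcmp-based lookup via the inlined fast path */
        he = rhashtable_lookup_fast(&priv->ht, key, nft_hash_params);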

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

13 Mar, 2015

1 commit

  • When we get back an EAGAIN from rhashtable_walk_next, we were
    treating it as a valid object, which obviously doesn't work too
    well.

    Luckily this is hard to trigger so it seems nobody has run into
    it yet.

    This patch fixes it by redoing the next call when we get an EAGAIN.
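
    The retry amounts to the following pattern in the walk loop (condensed
    sketch):

        /* -EAGAIN means the table is being resized: ask for the next
         * object again instead of treating the error as an entry */
        he = rhashtable_walk_next(&hti);
        if (IS_ERR(he)) {
                if (PTR_ERR(he) != -EAGAIN) {
                        iter->err = PTR_ERR(he);
                        break;
                }
                continue;       /* redo the next call */
        }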

    Signed-off-by: Herbert Xu
    Signed-off-by: Pablo Neira Ayuso

    Herbert Xu
     

28 Feb, 2015

1 commit

  • Currently, all real users of rhashtable default their grow and shrink
    decision functions to rht_grow_above_75() and rht_shrink_below_30(),
    so there is no need to have this explicitly selectable.

    It can and should be generic and private inside rhashtable until a real
    use case pops up. Since we can make this private, we save ourselves
    this additional indirection layer and can improve insertion/deletion
    time as well.
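
    For reference, the now-private 75% grow decision looks roughly like
    this (paraphrased):

        /* expand once the table exceeds 75% load */
        static bool rht_grow_above_75(const struct rhashtable *ht,
                                      size_t new_size)
        {
                return atomic_read(&ht->nelems) > (new_size / 4 * 3);
        }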

    Reference: http://patchwork.ozlabs.org/patch/443040/
    Suggested-by: David S. Miller
    Signed-off-by: Daniel Borkmann
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

05 Feb, 2015

1 commit

  • This patch gets rid of the manual rhashtable walk in nft_hash
    which touches rhashtable internals that should not be exposed.
    It does so by using the rhashtable iterator primitives.

    Note that I'm leaving nft_hash_destroy alone since it's only
    invoked on shutdown and it shouldn't be affected by changes
    to rhashtable internals (or at least not what I'm planning to
    change).
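
    The iterator primitives follow the usual init/start/next/stop/exit
    pattern; a condensed sketch (error handling elided; the walker-state
    GFP argument only arrived later, see the 05 Apr, 2016 entry above):

        struct rhashtable_iter hti;
        void *obj;

        rhashtable_walk_init(ht, &hti);         /* alloc walker state */
        rhashtable_walk_start(&hti);
        while ((obj = rhashtable_walk_next(&hti)) != NULL) {
                if (IS_ERR(obj))
                        continue;               /* resize in progress */
                /* process obj */
        }
        rhashtable_walk_stop(&hti);
        rhashtable_walk_exit(&hti);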

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

04 Jan, 2015

4 commits

  • Introduces an array of spinlocks to protect bucket mutations. The number
    of spinlocks per CPU is configurable and selected based on the hash of
    the bucket. This allows for parallel insertions and removals of entries
    which do not share a lock.

    The patch also defers expansion and shrinking to a worker queue which
    allows insertion and removal from atomic context. Insertions and
    deletions may occur in parallel to it and are only held up briefly
    while the particular bucket is linked or unzipped.

    Mutations of the bucket table pointer are protected by a new mutex;
    read access is RCU protected.

    In the event of an expansion or shrinking, the newly allocated bucket
    table is exposed as a so-called future table as soon as the resize
    process starts. Lookups, deletions, and insertions will briefly use
    both tables. The future table becomes the main table after an RCU grace
    period, once the initial linking of the old to the new table has been
    performed. Optimization of the chains to make use of the new number of
    buckets follows only once the new table is in use.

    The side effect of this is that during that RCU grace period, a bucket
    traversal using any rht_for_each() variant on the main table will not see
    any insertions performed during the RCU grace period which would at that
    point land in the future table. The lookup will see them as it searches
    both tables if needed.

    Having multiple insertions and removals occur in parallel requires nelems
    to become an atomic counter.
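
    The lock-selection idea reads roughly like this (sketch; field names
    approximate):

        /* the lock array is smaller than the table, so several buckets
         * may share a lock, chosen by masking the bucket hash */
        static spinlock_t *bucket_lock(const struct bucket_table *tbl,
                                       u32 hash)
        {
                return &tbl->locks[hash & tbl->locks_mask];
        }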

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • The removal function of nft_hash currently stores a reference to the
    previous element during lookup which is used to optimize removal later
    on. This was possible because a lock is held throughout calling
    rhashtable_lookup() and rhashtable_remove().

    With the introduction of deferred table resizing in parallel to lookups
    and insertions, the nftables lock will no longer synchronize all
    table mutations and the stored pprev may become invalid.

    Removing this optimization makes removal slightly more expensive on
    average but allows taking the resize cost out of the insert and
    remove path.

    Signed-off-by: Thomas Graf
    Cc: netfilter-devel@vger.kernel.org
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • This patch is in preparation for introducing per-bucket spinlocks. It
    extends all iterator macros to take the bucket table and bucket
    index. It also introduces a new rht_dereference_bucket() to
    handle protected accesses to buckets.
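
    The new accessor pairs the usual RCU dereference with a per-bucket
    lockdep check; a sketch of its shape:

        /* like rht_dereference(), but protection is verified per
         * bucket rather than per table */
        #define rht_dereference_bucket(p, tbl, hash)            \
                rcu_dereference_protected(p,                    \
                        lockdep_rht_bucket_is_held(tbl, hash))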

    It introduces a barrier() to the RCU iterators to prevent the
    compiler from caching the first element.

    The lockdep verifier is introduced as a stub which always succeeds;
    it is properly implemented in the next patch, when the locks are
    introduced.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Hash the key inside of rhashtable_lookup_compare() like
    rhashtable_lookup() does. This allows the hashing functions to be
    simplified and kept private.

    Signed-off-by: Thomas Graf
    Cc: netfilter-devel@vger.kernel.org
    Signed-off-by: David S. Miller

    Thomas Graf
     

14 Nov, 2014

2 commits

  • Reallocation is only required for shrinking and expanding, both of
    which rely on a mutex for synchronization, and callers of
    rhashtable_init() are in non-atomic context. Therefore, there is no
    reason to continue passing allocation hints through the API.

    Instead, use GFP_KERNEL and add __GFP_NOWARN | __GFP_NORETRY to allow
    for silent fall back to vzalloc() without the OOM killer jumping in as
    pointed out by Eric Dumazet and Eric W. Biederman.
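
    The allocation then follows the usual quiet-fallback pattern (sketch):

        /* try kmalloc-based allocation without warnings or the OOM
         * killer, and silently fall back to vzalloc for large tables */
        tbl = kzalloc(size, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);
        if (tbl == NULL)
                tbl = vzalloc(size);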

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Currently mutex_is_held can only test locks that are global, since it
    takes no arguments. This prevents rhashtable from being used in places
    where locks are local, e.g., per-namespace locks.

    This patch adds a parent field to mutex_is_held and rhashtable_params
    so that local locks can be used (and tested).
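
    The params then carry both the callback and the object it checks
    against (sketch; exact field layout assumed):

        struct rhashtable_params {
                /* ... existing members ... */
                bool    (*mutex_is_held)(void *parent);
                void    *parent;        /* e.g. a struct net pointer */
        };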

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu