24 Nov, 2016
1 commit
-
Use the function nft_parse_u32_check() to fetch the value and validate
the u32 attribute into the hash len u8 field.

This patch revisits 4da449ae1df9 ("netfilter: nft_exthdr: Add size check
on u8 nft_exthdr attributes").

Fixes: cb1b69b0b15b ("netfilter: nf_tables: add hash expression")
Signed-off-by: Laura Garcia Liebana
Signed-off-by: Pablo Neira Ayuso
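
The range check this commit relies on can be illustrated with a small userspace sketch. It is not the kernel code: parse_u32_check() and set_hash_len() are hypothetical stand-ins written for this note, and the kernel helper works on netlink attributes rather than on plain integers.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a bounds-checked fetch: accept a 32-bit
 * value only if it does not exceed max. */
int parse_u32_check(uint32_t val, uint32_t max, uint32_t *dest)
{
	if (val > max)
		return -ERANGE;
	*dest = val;
	return 0;
}

/* Store a u32 attribute value into a u8 "len" field. Values above 255
 * are rejected instead of being silently truncated. */
int set_hash_len(uint32_t attr_val, uint8_t *len)
{
	uint32_t tmp;
	int err = parse_u32_check(attr_val, UINT8_MAX, &tmp);

	if (err < 0)
		return err;
	*len = (uint8_t)tmp;
	return 0;
}
```

The point of the pattern is that the narrowing cast only happens after the value is known to fit, so a caller passing 256 gets an error rather than a length of 0.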
17 Oct, 2016
1 commit
-
Missing the nla_policy description will also miss the validation check
in the kernel.

Fixes: 70ca767ea1b2 ("netfilter: nft_hash: Add hash offset value")
Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso
13 Sep, 2016
2 commits
-
The overflow validation in the init() function establishes that the
maximum value that the hash could reach is less than U32_MAX, which is
likely to be true.

The fix detects the overflow when the maximum hash value is less than
the offset itself.

Fixes: 70ca767ea1b2 ("netfilter: nft_hash: Add hash offset value")
Reported-by: Liping Zhang
Signed-off-by: Laura Garcia Liebana
Signed-off-by: Pablo Neira Ayuso
-
Add support to pass through an offset to the hash value. With this
feature, the sysadmin is able to generate a hash with a given
offset value.

Example:

meta mark set jhash ip saddr mod 2 seed 0xabcd offset 100

This option generates marks according to the source address from 100 to
101.

Signed-off-by: Laura Garcia Liebana
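
The two commits above (the offset option and the later overflow fix) can be sketched together in userspace C. This is an illustration under assumptions: the function names are invented here, the zero-modulus guard is added only to keep the sketch safe, and a plain modulo stands in for whatever scaling the kernel actually applies.

```c
#include <errno.h>
#include <stdint.h>

/* Reject parameters whose largest result, offset + modulus - 1, would
 * wrap around 32 bits. Unsigned arithmetic wraps, so a wrapped maximum
 * ends up smaller than the offset itself. */
int hash_check_params(uint32_t modulus, uint32_t offset)
{
	if (modulus == 0)	/* assumption: guards the modulo below */
		return -EINVAL;
	if (offset + modulus - 1 < offset)
		return -EOVERFLOW;
	return 0;
}

/* Map a raw hash value into [offset, offset + modulus - 1]. */
uint32_t hash_to_mark(uint32_t hash, uint32_t modulus, uint32_t offset)
{
	return hash % modulus + offset;
}
```

With modulus 2 and offset 100, every hash value maps to 100 or 101, matching the range given in the example rule. An offset of U32_MAX with modulus 2 is rejected, because the largest result would wrap to 0.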
26 Aug, 2016
1 commit
-
nft_dump_register() should only be used with registers, not with
immediates.

Fixes: cb1b69b0b15b ("netfilter: nf_tables: add hash expression")
Fixes: 91dbc6be0a62 ("netfilter: nf_tables: add number generator expression")
Signed-off-by: Pablo Neira Ayuso
22 Aug, 2016
1 commit
-
Fixes the following sparse warning:

net/netfilter/nft_hash.c:40:25: warning:
symbol 'nft_hash_policy' was not declared. Should it be static?

Signed-off-by: Wei Yongjun
Signed-off-by: Pablo Neira Ayuso
12 Aug, 2016
2 commits
-
This patch adds a new hash expression. It provides jhash support, but
it can be extended to support other hash functions. The modulus
and seed already come embedded in this new expression.

Use case example:

... meta mark set hash ip saddr mod 10
Signed-off-by: Laura Garcia Liebana
Signed-off-by: Pablo Neira Ayuso
-
Use nft_set_* prefix for backend set implementations, thus we can use
nft_hash for the new hash expression.

Signed-off-by: Pablo Neira Ayuso
11 Jul, 2016
1 commit
-
We can pass the netns pointer as parameter to the functions that need to
gain access to it. From basechains, I didn't find any client for this
field anymore so let's remove this too.

Signed-off-by: Pablo Neira Ayuso
07 Jul, 2016
1 commit
-
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next,
they are:

1) Don't use userspace datatypes in bridge netfilter code, from
   Tobin Harding.

2) Iterate only once over the expectation table when removing the
   helper module, instead of once per-netns, from Florian Westphal.

3) Extra sanitization in xt_hook_ops_alloc() to return error in case
   we ever pass zero hooks.

4) Handle NFPROTO_INET from the logging core infrastructure, from
   Liping Zhang.

5) Autoload loggers when TRACE target is used from rules, this doesn't
   change the behaviour in case the user already selected nfnetlink_log
   as preferred way to print tracing logs, also from Liping Zhang.

6) Allocate conntrack slabs with SLAB_HWCACHE_ALIGN to allow rearranging
   fields by cache lines; this increases the size of entries by 11% per
   entry. From Florian Westphal.

7) Skip zone comparison if CONFIG_NF_CONNTRACK_ZONES=n, from Florian.

8) Remove useless defensive check in nf_logger_find_get() from Shivani
   Bhardwaj.

9) Remove zone extension and place it in the conntrack object, this is
   always included in the hashing and we expect more intensive use of
   zones since containers are in place. Also from Florian Westphal.

10) Owner match now works from any namespace, from Eric Biederman.

11) Make sure we only reply with TCP reset to TCP traffic from
    nf_reject_ipv4, patch from Liping Zhang.

12) Introduce --nflog-size to indicate amount of network packet bytes
    that are copied to userspace via log message, from Vishwanath Pai.
    This obsoletes --nflog-range, which was designed to achieve this
    but has never worked.

13) Introduce generic macros for nf_tables object generation masks.

14) Use generation mask in table, chain and set objects in nf_tables.
    This fixes interference between the ongoing preparation phase of
    the commit protocol and object listings going on at the same time.
    This update is introduced in three patches, one per object.

15) Check if the object is active in the next generation for element
    deactivation in the rbtree implementation, given that deactivation
    happens from the commit phase path we have to observe the future
    status of the object.

16) Support for deletion of just added elements in the hash set type.

17) Allow to resize hashtable from /proc entry, not only from the
    obscure /sys entry that maps to the module parameter, from Florian
    Westphal.

18) Get rid of NFT_BASECHAIN_DISABLED, this code is not exercised
    anymore since we tear down the ruleset whenever the netdevice
    goes away.

19) Support for matching inverted set lookups, from Arturo Borrero.

20) Simplify the iptables_mangle_hook() by removing a superfluous
    extra branch.

21) Introduce ether_addr_equal_masked() and use it from the netfilter
    codebase, from Joe Perches.

22) Remove references to "Use netfilter MARK value as routing key"
    from the Netfilter Kconfig description given that this toggle
    has not existed for 10 years, from Moritz Sichert.

23) Introduce generic NF_INVF() and use it from the xtables codebase,
    from Joe Perches.

24) Setting logger to NONE via /proc was not working unless explicit
    nul-termination was included in the string. This fix seems to
    leave the former behaviour there, so we don't break backward
    compatibility.
====================

Signed-off-by: David S. Miller
24 Jun, 2016
1 commit
-
New elements are inactive in the preparation phase, and their
NFT_SET_ELEM_BUSY_MASK flag is set.

This busy flag doesn't allow us to delete them from the same transaction,
following a sequence like:

begin transaction
add element X
delete element X
end transaction

This sequence is valid and may be triggered by robots. To resolve this
problem, allow deactivating elements that are active in the current
generation (i.e. those that have just been added in this batch).

Signed-off-by: Pablo Neira Ayuso
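
The add-then-delete sequence can be modelled with a per-element generation bitmask. The sketch below is a simplified userspace model, not the patch itself: it leaves out the busy flag entirely, all names are invented for illustration, and it assumes a bit set in the mask means "inactive in that generation".

```c
#include <stdbool.h>
#include <stdint.h>

/* Two generations are tracked with a one-bit cursor (0 or 1). */
struct txn { unsigned int cursor; };

/* Bit g set in inactive_mask: the element is inactive in generation g. */
struct elem { uint8_t inactive_mask; };

uint8_t gen_cur(const struct txn *t)
{
	return (uint8_t)(1u << t->cursor);
}

uint8_t gen_next(const struct txn *t)
{
	return (uint8_t)(1u << (t->cursor ^ 1u));
}

bool elem_active(const struct elem *e, uint8_t genmask)
{
	return (e->inactive_mask & genmask) == 0;
}

/* add: inactive in the current generation, active in the next one */
void elem_add(struct elem *e, const struct txn *t)
{
	e->inactive_mask = gen_cur(t);
}

/* delete: mark the element inactive in the next generation as well.
 * An element added earlier in the same batch qualifies, because it is
 * still active in the next generation at this point. */
bool elem_deactivate(struct elem *e, const struct txn *t)
{
	if (!elem_active(e, gen_next(t)))
		return false;
	e->inactive_mask |= gen_next(t);
	return true;
}
```

In this model, "add element X" followed by "delete element X" in one batch leaves X inactive in both generations, so it never becomes visible after commit.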
15 Jun, 2016
1 commit
-
Liping Zhang says:

"Users may add such wrong nft rules successfully, which will cause an
endless jump loop:

# nft add rule filter test tcp dport vmap {1: jump test}

This is because before we commit, the element in the current anonymous
set is inactive, so ops->walk will skip this element and miss the
validate check."

To resolve this problem, this patch passes the generation mask to the
walk function through the iter container structure depending on the code
path:

1) If we're dumping the elements, then we have to check if the element
   is active in the current generation. Thus, we check for the current
   bit in the genmask.

2) If we're checking for loops, then we have to check if the element is
   active in the next generation, as we're in the middle of a
   transaction. Thus, we check for the next bit in the genmask.

Based on original patch from Liping Zhang.
Reported-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso
Tested-by: Liping Zhang
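
The idea of carrying the generation mask in the iterator can be sketched as follows. This is a userspace illustration with invented names (struct set_iter, set_walk(), sum_keys()); it assumes bit 0 is the current generation and bit 1 the next, and that a set bit marks the element inactive in that generation.

```c
#include <stddef.h>
#include <stdint.h>

struct elem {
	int key;
	uint8_t inactive_mask;	/* bit set: inactive in that generation */
};

/* Iterator container: the caller decides which generation to observe. */
struct set_iter {
	uint8_t genmask;
	int (*fn)(const struct elem *e, void *ctx);
	void *ctx;
	int err;
};

/* Visit only the elements that are active in iter->genmask. */
void set_walk(const struct elem *elems, size_t n, struct set_iter *iter)
{
	for (size_t i = 0; i < n; i++) {
		if (elems[i].inactive_mask & iter->genmask)
			continue;
		iter->err = iter->fn(&elems[i], iter->ctx);
		if (iter->err < 0)
			return;
	}
}

/* Example callback: add up the keys of the visited elements. */
int sum_keys(const struct elem *e, void *ctx)
{
	*(int *)ctx += e->key;
	return 0;
}
```

A dump would set genmask to the current bit, so an element added in the open transaction is skipped. A loop check would set it to the next bit, so that same element is visited and validated.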
05 Apr, 2016
1 commit
-
In certain cases, the 802.11 mesh pathtable code wants to
iterate over all of the entries in the forwarding table from
the receive path, which is inside an RCU read-side critical
section. Enable walks inside atomic sections by allowing
GFP_ATOMIC allocations for the walker state.

Change all existing callsites to pass in GFP_KERNEL.

Acked-by: Thomas Graf
Signed-off-by: Bob Copeland
[also adjust gfs2/glock.c and rhashtable tests]
Signed-off-by: Johannes Berg
13 Apr, 2015
4 commits
-
This patch changes sets to support variable sized set element keys / data
up to 64 bytes each by using variable sized set extensions. This allows
to use concatenations with bigger data items such as IPv6 addresses.

As a side effect, small keys/data now don't require the full 16 bytes
of struct nft_data anymore but just the space they need.

Signed-off-by: Patrick McHardy
Signed-off-by: Pablo Neira Ayuso
-
Simple conversion to use u32 pointers to the beginning of the data
area to keep follow up patches smaller.

Signed-off-by: Patrick McHardy
Signed-off-by: Pablo Neira Ayuso
-
Only needlessly complicates things due to requiring specific argument
types. Use memcmp directly.

Signed-off-by: Patrick McHardy
Signed-off-by: Pablo Neira Ayuso
-
Replace the array of registers passed to expressions by a struct nft_regs,
containing the verdict as a separate member, which aliases to the
NFT_REG_VERDICT register.

This is needed to separate the verdict from the data registers completely,
so their size can be changed.

Signed-off-by: Patrick McHardy
Signed-off-by: Pablo Neira Ayuso
08 Apr, 2015
2 commits
-
Add a new "dynset" expression for dynamic set updates.

A new set op ->update() is added which, for non-existent elements,
invokes an initialization callback and inserts the new element.
For both new or existing elements the extension pointer is returned
to the caller to optionally perform timer updates or other actions.

Element removal is not supported so far, however that seems to be a
rather exotic need and can be added later on.

Signed-off-by: Patrick McHardy
Signed-off-by: Pablo Neira Ayuso
-
Use atomic operations for the element count to avoid races with async
updates.

To properly handle the transactional semantics during netlink updates,
deleted but not yet committed elements are accounted for separately and
are treated as being already removed. This means for the duration of
a netlink transaction, the limit might be exceeded by the amount of
elements deleted. Set implementations must be prepared to handle this.

Signed-off-by: Patrick McHardy
Signed-off-by: Pablo Neira Ayuso
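
The accounting described above can be sketched with C11 atomics. This is a userspace model under assumptions, not the kernel implementation: the struct layout and function names are invented here, and ndeact is assumed to be modified only under the transaction lock, so it is a plain integer.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct set_counters {
	atomic_uint nelems;	/* elements currently linked into the set */
	unsigned int ndeact;	/* deleted in this transaction, uncommitted */
	unsigned int size;	/* configured element limit */
};

void set_counters_init(struct set_counters *s, unsigned int size)
{
	atomic_init(&s->nelems, 0);
	s->ndeact = 0;
	s->size = size;
}

/* Reserve room for one element. Elements deleted in the open
 * transaction count as already removed, so the raw count may
 * temporarily exceed size by up to ndeact. */
bool set_reserve_slot(struct set_counters *s)
{
	unsigned int limit = s->size + s->ndeact;
	unsigned int cur = atomic_load(&s->nelems);

	while (cur < limit) {
		/* On failure, cur is reloaded with the latest value. */
		if (atomic_compare_exchange_weak(&s->nelems, &cur, cur + 1))
			return true;
	}
	return false;
}
```

The compare-and-swap loop keeps the check and the increment atomic with respect to asynchronous updaters, which a separate read followed by an increment would not.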
01 Apr, 2015
1 commit
-
Add support for element timeouts to nft_hash. The lookup and walking
functions are changed to ignore timed out elements, a periodic garbage
collection task cleans out expired entries.

Signed-off-by: Patrick McHardy
Signed-off-by: Pablo Neira Ayuso
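
The split between "lookups ignore expired elements" and "a periodic task removes them" can be sketched on a plain array. This is an illustration only: the real set is a hash table, and every name below, including the convention that an expiry of 0 means "never", is an assumption made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct elem {
	uint32_t key;
	uint64_t expires;	/* absolute expiry time, 0 means never */
};

bool elem_expired(const struct elem *e, uint64_t now)
{
	return e->expires != 0 && now >= e->expires;
}

/* Lookup treats expired elements as if they were already gone. */
const struct elem *set_lookup(const struct elem *elems, size_t n,
			      uint32_t key, uint64_t now)
{
	for (size_t i = 0; i < n; i++)
		if (elems[i].key == key && !elem_expired(&elems[i], now))
			return &elems[i];
	return NULL;
}

/* Periodic garbage collection: drop expired entries, keep the order of
 * the survivors, and return the new element count. */
size_t set_gc(struct elem *elems, size_t n, uint64_t now)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++)
		if (!elem_expired(&elems[i], now))
			elems[kept++] = elems[i];
	return kept;
}
```

Because lookups already filter on expiry, the collector does not need to run at the exact moment an element times out; it only reclaims memory.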
26 Mar, 2015
7 commits
-
Set elements are the last object type not supporting transactions.
Implement similar to the existing rule transactions:

The global transaction counter keeps track of two generations, current
and next. Each element contains a bitmask specifying in which generations
it is inactive.

New elements start out as inactive in the current generation and active
in the next. On commit, the previous next generation becomes the current
generation and the element becomes active. The bitmask is then cleared
to indicate that the element is active in all future generations. If the
transaction is aborted, the element is removed from the set before it
becomes active.

When removing an element, it gets marked as inactive in the next generation.
On commit the next generation becomes active and therefore the element
inactive. It is then taken out of the set and released. On abort, the
element is marked as active for the next generation again.

Lookups ignore elements not active in the current generation.

The current set types (hash/rbtree) both use a field in the extension area
to store the generation mask. This (currently) does not require any
additional memory since we have some free space in there.

Signed-off-by: Patrick McHardy
Signed-off-by: Pablo Neira Ayuso
-
Return the extension area from the ->lookup() function to allow to
consolidate common actions.

Signed-off-by: Patrick McHardy
Signed-off-by: Pablo Neira Ayuso
-
With the conversion to set extensions, it is now possible to consolidate
the different set element destruction functions.

The set implementations' ->remove() functions are changed to only take
the element out of their internal data structures. Elements will be freed
in a batched fashion after the global transaction's completion RCU grace
period.

This reduces the amount of grace periods required for nft_hash from N
to zero additional ones, additionally this guarantees that the set
elements' extensions of all implementations can be used under RCU
protection.

Signed-off-by: Patrick McHardy
Signed-off-by: Pablo Neira Ayuso
-
The set implementations' private struct will only contain the elements
needed to maintain the search structure, all other elements are moved
to the set extensions.

Element allocation and initialization is performed centrally by
nf_tables_api instead of by the different set implementations'
->insert() functions. A new "elemsize" member in the set ops specifies
the amount of memory to reserve for internal usage. Destruction
will also be moved out of the set implementations by a following patch.

Except for element allocation, the patch is a simple conversion to
using data from the extension area.

Signed-off-by: Patrick McHardy
Signed-off-by: Pablo Neira Ayuso
-
A following patch will convert sets to use so called set extensions,
where the key is not located in a fixed position anymore. This will
require rhashtable hashing and comparison callbacks to be used.

As preparation, convert nft_hash to use these callbacks without any
functional changes.

Signed-off-by: Patrick McHardy
Signed-off-by: Pablo Neira Ayuso
-
Improve readability by indenting the parameter initialization.
Signed-off-by: Patrick McHardy
Signed-off-by: Pablo Neira Ayuso
-
Following patches will add new private members, restore struct nft_hash
as preparation.

Signed-off-by: Patrick McHardy
Signed-off-by: Pablo Neira Ayuso
25 Mar, 2015
2 commits
-
rhashtable_destroy() variant which stops rehashes, iterates over
the table and calls a callback to release resources.

Avoids need for nft_hash to embed rhashtable internals and allows to
get rid of the being_destroyed flag. It also saves a 2nd mutex
lock upon destruction.

Also fixes an RCU lockdep splash on nft set destruction due to
calling rht_for_each_entry_safe() without holding bucket locks.
Open code this loop as we need to know that no mutations may occur in
parallel.

Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller
-
Introduce a new bool automatic_shrinking to require the
user to explicitly opt-in to automatic shrinking of tables.

Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller
24 Mar, 2015
1 commit
-
Conflicts:
	net/netfilter/nf_tables_core.c

The nf_tables_core.c conflict was resolved using a conflict resolution
from Stephen Rothwell as a guide.

Signed-off-by: David S. Miller
21 Mar, 2015
1 commit
-
This patch converts nft_hash to the inlined rhashtable interface.

This patch also replaces the call to rhashtable_lookup_compare with
a straight rhashtable_lookup_fast because it's simply doing a memcmp
(in fact nft_hash_lookup already uses memcmp instead of nft_data_cmp).

Furthermore, the compare function is only meant to compare, it is not
supposed to have side-effects. The current side-effect code can
simply be moved into the nft_hash_get.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller
13 Mar, 2015
1 commit
-
When we get back an EAGAIN from rhashtable_walk_next we were
treating it as a valid object which obviously doesn't work too
well.

Luckily this is hard to trigger so it seems nobody has run into
it yet.

This patch fixes it by redoing the next call when we get an EAGAIN.
Signed-off-by: Herbert Xu
Signed-off-by: Pablo Neira Ayuso
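
The retry pattern can be shown with a mock iterator in userspace C. Nothing below is the rhashtable API: struct walker, walker_next() and walk_sum() are hypothetical, the mock reports errors through an int return value rather than an error-encoded pointer, and it does not model the fact that a real resize may cause elements to be seen more than once.

```c
#include <errno.h>
#include <stddef.h>

/* Mock walker that reports -EAGAIN once, at position resize_at, to
 * simulate a table resize happening during the walk. */
struct walker {
	const int *items;
	size_t n;
	size_t pos;
	size_t resize_at;
	int resized;
};

/* Returns 1 and stores an item in *out, 0 at the end of the walk, or a
 * negative errno. On -EAGAIN no item is produced. */
int walker_next(struct walker *w, int *out)
{
	if (!w->resized && w->pos == w->resize_at) {
		w->resized = 1;
		return -EAGAIN;
	}
	if (w->pos >= w->n)
		return 0;
	*out = w->items[w->pos++];
	return 1;
}

/* Sum all items. -EAGAIN is not an object: call next again instead of
 * consuming it. Any other error aborts the walk. */
int walk_sum(struct walker *w, long *sum)
{
	int item, ret;

	*sum = 0;
	while ((ret = walker_next(w, &item)) != 0) {
		if (ret == -EAGAIN)
			continue;
		if (ret < 0)
			return ret;
		*sum += item;
	}
	return 0;
}
```

The bug described in the commit corresponds to dropping the -EAGAIN branch, in which case the error value would be handed on as if it were an element.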
28 Feb, 2015
1 commit
-
Currently, all real users of rhashtable default their grow and shrink
decision functions to rht_grow_above_75() and rht_shrink_below_30(),
so that there's currently no need to have this explicitly selectable.

It can/should be generic and private inside rhashtable until a real
use case pops up. Since we can make this private, we'll save us this
additional indirection layer and can improve insertion/deletion time
as well.

Reference: http://patchwork.ozlabs.org/patch/443040/
Suggested-by: David S. Miller
Signed-off-by: Daniel Borkmann
Acked-by: Thomas Graf
Signed-off-by: David S. Miller
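
The two default policies named above amount to simple load-factor thresholds. The following userspace sketch shows one way to express them; the function names and the exact integer arithmetic are assumptions made for illustration, not a copy of the rhashtable code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Grow when the table holds more elements than 75% of its buckets. */
bool grow_above_75(size_t nelems, size_t size)
{
	return nelems > size / 4 * 3;
}

/* Shrink when the table holds fewer elements than 30% of its buckets. */
bool shrink_below_30(size_t nelems, size_t size)
{
	return nelems < size * 3 / 10;
}
```

For a table with 64 buckets, this grows at 49 elements and shrinks below 19. When every user picks the same pair of policies, calling them directly instead of through function pointers in the parameters removes one indirect call per insertion and deletion.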
05 Feb, 2015
1 commit
-
This patch gets rid of the manual rhashtable walk in nft_hash
which touches rhashtable internals that should not be exposed.
It does so by using the rhashtable iterator primitives.

Note that I'm leaving nft_hash_destroy alone since it's only
invoked on shutdown and it shouldn't be affected by changes
to rhashtable internals (or at least not what I'm planning to
change).

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller
04 Jan, 2015
4 commits
-
Introduces an array of spinlocks to protect bucket mutations. The number
of spinlocks per CPU is configurable and selected based on the hash of
the bucket. This allows for parallel insertions and removals of entries
which do not share a lock.

The patch also defers expansion and shrinking to a worker queue which
allows insertion and removal from atomic context. Insertions and
deletions may occur in parallel to it and are only held up briefly
while the particular bucket is linked or unzipped.

Mutations of the bucket table pointer are protected by a new mutex, read
access is RCU protected.

In the event of an expansion or shrinking, the new bucket table allocated
is exposed as a so called future table as soon as the resize process
starts. Lookups, deletions, and insertions will briefly use both tables.
The future table becomes the main table after an RCU grace period and
initial linking of the old to the new table was performed. Optimization
of the chains to make use of the new number of buckets follows only once
the new table is in use.

The side effect of this is that during that RCU grace period, a bucket
traversal using any rht_for_each() variant on the main table will not see
any insertions performed during the RCU grace period which would at that
point land in the future table. The lookup will see them as it searches
both tables if needed.

Having multiple insertions and removals occur in parallel requires nelems
to become an atomic counter.

Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller
-
The removal function of nft_hash currently stores a reference to the
previous element during lookup which is used to optimize removal later
on. This was possible because a lock is held throughout calling
rhashtable_lookup() and rhashtable_remove().

With the introduction of deferred table resizing in parallel to lookups
and insertions, the nftables lock will no longer synchronize all
table mutations and the stored pprev may become invalid.

Removing this optimization makes removal slightly more expensive on
average but allows taking the resize cost out of the insert and
remove path.

Signed-off-by: Thomas Graf
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: David S. Miller
-
This patch is in preparation to introduce per bucket spinlocks. It
extends all iterator macros to take the bucket table and bucket
index. It also introduces a new rht_dereference_bucket() to
handle protected accesses to buckets.

It introduces a barrier() to the RCU iterators to prevent
the compiler from caching the first element.

The lockdep verifier is introduced as a stub which always succeeds
and is properly implemented in the next patch when the locks are
introduced.

Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller
-
Hash the key inside of rhashtable_lookup_compare() like
rhashtable_lookup() does. This allows to simplify the hashing
functions and keep them private.

Signed-off-by: Thomas Graf
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: David S. Miller
14 Nov, 2014
2 commits
-
Reallocation is only required for shrinking and expanding and both rely
on a mutex for synchronization and callers of rhashtable_init() are in
non atomic context. Therefore, no reason to continue passing allocation
hints through the API.

Instead, use GFP_KERNEL and add __GFP_NOWARN | __GFP_NORETRY to allow
for silent fall back to vzalloc() without the OOM killer jumping in as
pointed out by Eric Dumazet and Eric W. Biederman.

Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller
-
Currently mutex_is_held can only test locks that are global
since it takes no arguments. This prevents rhashtable from being
used in places where locks are local, e.g., per-namespace locks.

This patch adds a parent field to mutex_is_held and rhashtable_params
so that local locks can be used (and tested).

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller