07 Jun, 2020

1 commit

  • This patch replaces some unnecessary uses of rcu_dereference_raw
    in the rhashtable code with rcu_dereference_protected.

    The top-level nested table entry is only marked as RCU because it
    shares the same type as the tree entries underneath it. So it
    doesn't need any RCU protection.

    We also don't need RCU protection when we're freeing a nested RCU
    table, because by this stage we are long past the memory barrier
    after which no one can change the nested table.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
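
    As a rough illustration of the pattern this entry describes - the
    struct and function below are hypothetical, not the rhashtable code -
    rcu_dereference_protected() with an always-true condition documents
    that the caller already has exclusive access:

    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct nested_entry {                   /* hypothetical element type */
            struct nested_entry __rcu *child;
    };

    static void free_nested_slot(struct nested_entry __rcu **slot)
    {
            /*
             * Teardown path: no updater can touch *slot any more, so no
             * RCU read-side protection is needed; the constant "1"
             * documents exactly that and keeps sparse/lockdep quiet.
             */
            struct nested_entry *e = rcu_dereference_protected(*slot, 1);

            kfree(e);
    }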
     

19 Jun, 2019

1 commit

  • Based on 2 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation #

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 4122 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Enrico Weigelt
    Reviewed-by: Kate Stewart
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

17 May, 2019

2 commits

  • As cmpxchg is a non-RCU mechanism, it will cause sparse warnings
    when we use it on RCU-protected pointers. This patch adds explicit
    casts to silence those warnings. This should probably be moved into
    RCU itself in the future.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
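
    A minimal sketch of the kind of cast being described, using made-up
    types rather than the rhashtable internals: the explicit cast drops
    the __rcu address-space annotation so sparse does not warn when the
    pointer is handed to the non-RCU cmpxchg():

    #include <linux/atomic.h>
    #include <linux/rcupdate.h>

    struct node {                           /* hypothetical type */
            int v;
    };

    static struct node *publish_once(struct node __rcu **slot,
                                     struct node *new)
    {
            /* Cast away __rcu for cmpxchg(); the ordering guarantees of
             * cmpxchg() still make the publication safe. */
            return cmpxchg((struct node **)slot, NULL, new);
    }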
     
  • The opaque type rhash_lock_head should not be marked with __rcu
    because it can never be dereferenced. We should apply the RCU
    marking when we turn it into a pointer which can be dereferenced.

    This patch does exactly that. This fixes a number of sparse
    warnings as well as getting rid of some unnecessary RCU checking.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

13 Apr, 2019

5 commits

  • As reported by Guenter Roeck, the new bit-locking using
    BIT(1) doesn't work on the m68k architecture. m68k only requires
    2-byte alignment for words and longwords, so there is only one
    unused bit in pointers to structs. We currently use two: one for the
    NULLS marker at the end of the linked list, and one for the bit-lock
    in the head of the list.

    The two uses don't need to conflict as we never need the head of the
    list to be a NULLS marker - the marker is only needed to check if an
    object has moved to a different table, and the bucket head cannot
    move. The NULLS marker is only needed in a ->next pointer.

    As we already have different types for the bucket head pointer (struct
    rhash_lock_head) and the ->next pointers (struct rhash_head), it is
    fairly easy to treat the lsb differently in each.

    So: initialize bucket heads to NULL, and use the lsb for locking.
    When loading the pointer from the bucket head, if it is NULL (ignoring
    the lock bit), report it as being the expected NULLS marker.
    When storing a value into a bucket head, if it is a NULLS marker,
    store NULL instead.

    And convert all places that used bit 1 for locking to use bit 0.

    Fixes: 8f0db018006a ("rhashtable: use bit_spin_locks to protect hash bucket.")
    Reported-by: Guenter Roeck
    Tested-by: Guenter Roeck
    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
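
    A minimal sketch of the load/store translation described above, with
    simplified, hypothetical helpers rather than the real rht_ptr()
    family: bit 0 of the bucket word is the lock, an empty head reads
    back as the expected NULLS marker, and a NULLS marker is stored back
    as NULL:

    #include <linux/bits.h>
    #include <linux/types.h>

    #define BKT_LOCK        BIT(0)          /* the only spare bit on m68k */

    static inline unsigned long bkt_load(unsigned long bkt, unsigned long nulls)
    {
            unsigned long head = bkt & ~BKT_LOCK;   /* ignore the lock bit */

            /* A NULL bucket head is reported as the expected NULLS marker. */
            return head ? head : nulls;
    }

    static inline unsigned long bkt_store(unsigned long head)
    {
            /* NULLS markers (odd values) are never stored; store NULL instead. */
            return (head & 1UL) ? 0UL : head;
    }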
     
  • The only times rht_ptr_locked() is used, it is to store a new
    value in a bucket-head. This is also the only time it makes sense
    to use it. So replace it with a function which does the
    whole task: set the lock bit and assign to a bucket head.

    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
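
    A one-function sketch of the idea, with invented names and a plain
    unsigned long bucket word standing in for the real types: setting the
    lock bit and publishing the new head happen in a single store:

    #include <linux/atomic.h>
    #include <linux/bits.h>

    struct entry {                          /* hypothetical */
            struct entry *next;
    };

    /* Publish a new bucket head while keeping the bucket locked (bit 0 set). */
    static inline void bkt_assign_locked(unsigned long *bkt, struct entry *head)
    {
            smp_store_release(bkt, (unsigned long)head | BIT(0));
    }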
     
  • Rather than dereferencing a pointer to a bucket and then passing the
    result to rht_ptr(), we now pass in the pointer and do the dereference
    in rht_ptr().

    This requires that we pass in the tbl and hash as well to support RCU
    checks, and means that the various rht_for_each functions can expect a
    pointer that can be dereferenced without further care.

    There are two places where we dereference a bucket pointer
    where there is no testable protection - in each case we know
    that we must have exclusive access without having taken a lock.
    The previous code used rht_dereference() to pretend that holding
    the mutex provided protection, but holding the mutex never provides
    protection for accessing buckets.

    So instead introduce rht_ptr_exclusive() that can be used when
    there is known to be exclusive access without holding any locks.

    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
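
    A simplified sketch of the two flavours of dereference described
    above, using hypothetical types rather than the rhashtable ones: the
    helper itself does the dereference and strips the lock bit, and a
    separate variant serves callers with known exclusive access:

    #include <linux/bits.h>
    #include <linux/rcupdate.h>

    struct entry {                          /* hypothetical element type */
            struct entry __rcu *next;
    };

    /* Dereference a bucket under RCU and strip the lock bit in one place. */
    static inline struct entry *bkt_ptr_rcu(struct entry __rcu **bkt)
    {
            struct entry *p = rcu_dereference(*bkt);  /* needs rcu_read_lock() */

            return (struct entry *)((unsigned long)p & ~BIT(0));
    }

    /* For callers that know they have exclusive access and hold no lock. */
    static inline struct entry *bkt_ptr_exclusive(struct entry **bkt)
    {
            return (struct entry *)((unsigned long)READ_ONCE(*bkt) & ~BIT(0));
    }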
     
  • With these annotations, the rhashtable now gets no
    warnings when compiled with "C=1" for sparse checking.

    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
     
  • One of the more common cases of allocation size calculations is finding
    the size of a structure that has a zero-sized array at the end, along with
    memory for some number of elements for that array. For example:

    struct foo {
            int stuff;
            struct boo entry[];
    };

    size = sizeof(struct foo) + count * sizeof(struct boo);
    instance = kvzalloc(size, GFP_KERNEL);

    Instead of leaving these open-coded and prone to type mistakes, we can
    now use the new struct_size() helper:

    instance = kvzalloc(struct_size(instance, entry, count), GFP_KERNEL);

    This code was detected with the help of Coccinelle.

    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: David S. Miller

    Gustavo A. R. Silva
     

08 Apr, 2019

4 commits

  • Native bit_spin_locks are not tracked by lockdep.

    The bit_spin_locks used for rhashtable buckets are local
    to the rhashtable implementation, so there is little opportunity
    for the sort of misuse that lockdep might detect.
    However locks are held while a hash function or compare
    function is called, and if one of these took a lock,
    a misbehaviour is possible.

    As it is quite easy to add lockdep support, this unlikely
    possibility seems to be enough justification.

    So create a lockdep class for bucket bit_spin_lock and attach
    through a lockdep_map in each bucket_table.

    Without the 'nested' annotation in rhashtable_rehash_one(), lockdep
    correctly reports a possible problem as this lock is taken
    while another bucket lock (in another table) is held. This
    confirms that the added support works.
    With the correct nested annotation in place, lockdep reports
    no problems.

    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
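
    A minimal sketch of attaching a lockdep map to an otherwise invisible
    bit_spin_lock, as described above; the table layout and names are
    illustrative, not the rhashtable implementation:

    #include <linux/bit_spinlock.h>
    #include <linux/lockdep.h>

    struct toy_table {
            struct lockdep_map dep_map;     /* one lock class for all buckets */
            unsigned long buckets[64];      /* bit 0 of each word is the lock */
    };

    static void toy_table_init(struct toy_table *t)
    {
            static struct lock_class_key key;

            lockdep_init_map(&t->dep_map, "toy_bucket_lock", &key, 0);
    }

    static void toy_lock_bucket(struct toy_table *t, unsigned int i)
    {
            bit_spin_lock(0, &t->buckets[i]);
            lock_map_acquire(&t->dep_map);  /* make the lock visible to lockdep */
    }

    static void toy_unlock_bucket(struct toy_table *t, unsigned int i)
    {
            lock_map_release(&t->dep_map);
            bit_spin_unlock(0, &t->buckets[i]);
    }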
     
  • This patch changes rhashtables to use a bit_spin_lock on BIT(1) of the
    bucket pointer to lock the hash chain for that bucket.

    The benefits of a bit spin_lock are:
    - no need to allocate a separate array of locks.
    - no need to have a configuration option to guide the
    choice of the size of this array
    - locking cost is often a single test-and-set in a cache line
    that will have to be loaded anyway. When inserting at, or removing
    from, the head of the chain, the unlock is free - writing the new
    address in the bucket head implicitly clears the lock bit.
    For __rhashtable_insert_fast() we ensure this always happens
    when adding a new key.
    - even when locking costs 2 updates (lock and unlock), they are
    in a cacheline that needs to be read anyway.

    The cost of using a bit spin_lock is a little bit of code complexity,
    which I think is quite manageable.

    Bit spin_locks are sometimes inappropriate because they are not fair -
    if multiple CPUs repeatedly contend for the same lock, one CPU can
    easily be starved. This is not a credible situation with rhashtable.
    Multiple CPUs may want to repeatedly add or remove objects, but they
    will typically do so at different buckets, so they will attempt to
    acquire different locks.

    As we have more bit-locks than we previously had spinlocks (by at
    least a factor of two) we can expect slightly less contention to
    go with the slightly better cache behavior and reduced memory
    consumption.

    To enhance type checking, a new struct is introduced to represent the
    pointer plus lock-bit that is stored in the bucket-table. This is
    "struct rhash_lock_head" and is empty. A pointer to this needs to be
    cast to either an unsigned long or a "struct rhash_head *" to be
    useful. Variables of this type are most often called "bkt".

    Previously "pprev" would sometimes point to a bucket, and sometimes a
    ->next pointer in an rhash_head. As these are now different types,
    pprev is NULL when it would have pointed to the bucket. In that case,
    'blk' is used, together with correct locking protocol.

    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
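
    A minimal sketch of the "unlock is free" point made above, with
    simplified, hypothetical types: insertion at the head releases the
    bit lock simply by storing a new head whose lock bit is clear:

    #include <linux/atomic.h>
    #include <linux/bit_spinlock.h>
    #include <linux/preempt.h>

    struct entry {                          /* hypothetical */
            unsigned long next;             /* next pointer or NULLS marker */
    };

    static void insert_head(unsigned long *bkt, struct entry *obj)
    {
            bit_spin_lock(0, bkt);          /* sets bit 0, disables preemption */

            obj->next = *bkt & ~1UL;        /* old head with the lock bit stripped */

            /* Publishing a head whose bit 0 is clear also drops the bit lock... */
            smp_store_release(bkt, (unsigned long)obj);
            /* ...but the preempt count taken by bit_spin_lock() must still be put. */
            preempt_enable();
    }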
     
  • Rather than returning a pointer to a static nulls, rht_bucket_var()
    now returns NULL if the bucket doesn't exist.
    This will make the next patch, which stores a bitlock in the
    bucket pointer, somewhat cleaner.

    This change involves introducing __rht_bucket_nested(), which is
    like rht_bucket_nested() but doesn't provide the static nulls,
    and changing rht_bucket_nested() to call this and possibly
    provide a static nulls - as is still needed for the non-var case.

    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
     
  • nested_table_alloc() relies on the fact that there is
    at most one spinlock allocated for every slot in the top
    level nested table, so it is not possible for two threads
    to try to allocate the same table at the same time.

    This assumption is a little fragile (it is not explicit) and is
    unnecessary as cmpxchg() can be used instead.

    A future patch will replace the spinlocks by per-bucket bitlocks,
    and then we won't be able to protect the slot pointer with a spinlock.

    So replace rcu_assign_pointer() with cmpxchg() - which has equivalent
    barrier properties.
    If the cmpxchg() fails, free the table that was just allocated.

    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
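
    A minimal sketch of the allocate-then-cmpxchg pattern described
    above, with hypothetical types: whichever thread loses the race frees
    the table it just allocated and uses the winner's:

    #include <linux/atomic.h>
    #include <linux/slab.h>

    struct ntbl {
            void *slots[64];                /* hypothetical nested table */
    };

    static struct ntbl *get_or_alloc(struct ntbl **slot, gfp_t gfp)
    {
            struct ntbl *new, *old;

            old = READ_ONCE(*slot);
            if (old)
                    return old;

            new = kzalloc(sizeof(*new), gfp);
            if (!new)
                    return NULL;

            /* cmpxchg() gives the same ordering rcu_assign_pointer() did. */
            old = cmpxchg(slot, NULL, new);
            if (!old)
                    return new;             /* we installed it */

            kfree(new);                     /* somebody beat us to it */
            return old;
    }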
     

22 Mar, 2019

3 commits

  • The pattern set by list.h is that for_each..continue()
    iterators start at the next entry after the given one,
    while for_each..from() iterators start at the given
    entry.

    The rht_for_each*continue() iterators are documented as though they
    start at the 'next' entry, but they actually start at the given entry,
    and they are used expecting that behaviour.
    So fix the documentation and change the names to *from for consistency
    with list.h.

    Acked-by: Herbert Xu
    Acked-by: Miguel Ojeda
    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
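
    A small illustration of the list.h convention referred to above; the
    struct and helpers are made up for the example:

    #include <linux/list.h>

    struct item {
            struct list_head node;
            int val;
    };

    static int sum_after(struct item *cur, struct list_head *head)
    {
            int sum = 0;

            /* *_continue(): starts at the entry AFTER 'cur'. */
            list_for_each_entry_continue(cur, head, node)
                    sum += cur->val;
            return sum;
    }

    static int sum_from(struct item *cur, struct list_head *head)
    {
            int sum = 0;

            /* *_from(): starts AT 'cur' itself. */
            list_for_each_entry_from(cur, head, node)
                    sum += cur->val;
            return sum;
    }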
     
  • rhashtable_try_insert() currently holds a lock on the bucket in
    the first table, while also locking buckets in subsequent tables.
    This is unnecessary and looks like a hold-over from some earlier
    version of the implementation.

    As insert and remove always lock a bucket in each table in turn, and
    as insert only inserts in the final table, there cannot be any races
    that are not covered by simply locking a bucket in each table in turn.

    When an insert call reaches the last table it can be sure that there
    is no matching entry in any other table, as it has searched them all, and
    insertion never happens anywhere but in the last table. The fact that the
    code tests for the existence of future_tbl while holding a lock on
    the relevant bucket ensures that two threads inserting the same key
    will make compatible decisions about which is the "last" table.

    This simplifies the code and allows the ->rehash field to be
    discarded.

    We still need a way to ensure that a dead bucket_table is never
    re-linked by rhashtable_walk_stop(). This can be achieved by calling
    call_rcu() inside the locked region, and checking with
    rcu_head_after_call_rcu() in rhashtable_walk_stop() to see if the
    bucket table is empty and dead.

    Acked-by: Herbert Xu
    Reviewed-by: Paul E. McKenney
    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
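
    A minimal sketch of the rcu_head_after_call_rcu() check mentioned
    above, with hypothetical types: a walker can refuse to re-link a
    table whose rcu_head has already been handed to call_rcu():

    #include <linux/kernel.h>
    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct toy_tbl {
            struct rcu_head rcu;
            /* ... buckets ... */
    };

    static void toy_tbl_free_rcu(struct rcu_head *head)
    {
            kfree(container_of(head, struct toy_tbl, rcu));
    }

    static struct toy_tbl *toy_tbl_alloc(gfp_t gfp)
    {
            struct toy_tbl *tbl = kzalloc(sizeof(*tbl), gfp);

            if (tbl)
                    rcu_head_init(&tbl->rcu);   /* required for the check below */
            return tbl;
    }

    /* True once call_rcu(&tbl->rcu, toy_tbl_free_rcu) has been issued. */
    static bool toy_tbl_is_dying(struct toy_tbl *tbl)
    {
            return rcu_head_after_call_rcu(&tbl->rcu, toy_tbl_free_rcu);
    }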
     
  • As it stands if a shrink is delayed because of an outstanding
    rehash, we will go into a rescheduling loop without ever doing
    the rehash.

    This patch fixes this by still carrying out the rehash and then
    rescheduling so that we can shrink after the completion of the
    rehash should it still be necessary.

    The return value of EEXIST captures this case and other cases
    (e.g., another thread expanded/rehashed the table at the same
    time) where we should still proceed with the rehash.

    Fixes: da20420f83ea ("rhashtable: Add nested tables")
    Reported-by: Josh Elsasser
    Signed-off-by: Herbert Xu
    Tested-by: Josh Elsasser
    Signed-off-by: David S. Miller

    Herbert Xu
     

04 Dec, 2018

1 commit

  • Some users of rhashtables might need to move an object from one table
    to another - this appears to be the reason for the incomplete usage
    of NULLS markers.

    To support these, we store a unique NULLS_MARKER at the end of
    each chain, and when a search fails to find a match, we check
    if the NULLS marker found was the expected one. If not, the search
    may not have examined all objects in the target bucket, so it is
    repeated.

    The unique NULLS_MARKER is derived from the address of the
    head of the chain. As this cannot be derived at load-time the
    static rhnull in rht_bucket_nested() needs to be initialised
    at run time.

    Any caller of a lookup function must still be prepared for the
    possibility that the object returned is in a different table - it
    might have been there for some time.

    Note that this does NOT provide support for other uses of
    NULLS_MARKERs such as allocating with SLAB_TYPESAFE_BY_RCU or changing
    the key of an object and re-inserting it in the same table.
    These could only be done safely if new objects were inserted
    at the *start* of a hash chain, and that is not currently the case.

    Signed-off-by: NeilBrown
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    NeilBrown
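
    A minimal sketch of the lookup-retry idea described above, using the
    generic hlist_nulls helpers and made-up types: each chain ends in a
    NULLS value identifying its bucket, and a search that runs off onto a
    different chain is restarted:

    #include <linux/list_nulls.h>
    #include <linux/rculist_nulls.h>
    #include <linux/rcupdate.h>
    #include <linux/types.h>

    struct item {
            struct hlist_nulls_node node;
            u32 key;
    };

    /* Caller holds rcu_read_lock(); chain i was set up with
     * INIT_HLIST_NULLS_HEAD(&tbl[i], i), i.e. its slot number as NULLS value. */
    static struct item *lookup(struct hlist_nulls_head *tbl,
                               unsigned int nbuckets, u32 key)
    {
            unsigned int slot = key % nbuckets;     /* toy hash */
            struct hlist_nulls_node *pos;
            struct item *it;

    restart:
            hlist_nulls_for_each_entry_rcu(it, pos, &tbl[slot], node) {
                    if (it->key == key)
                            return it;
            }
            /* Ended on another chain's marker: the walk may have skipped
             * entries after a concurrent move, so repeat it. */
            if (get_nulls_value(pos) != slot)
                    goto restart;
            return NULL;
    }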
     

28 Aug, 2018

1 commit

  • Pull networking fixes from David Miller:

    1) ICE, E1000, IGB, IXGBE, and I40E bug fixes from the Intel folks.

    2) Better fix for AB-BA deadlock in packet scheduler code, from Cong
    Wang.

    3) bpf sockmap fixes (zero sized key handling, etc.) from Daniel
    Borkmann.

    4) Send zero IPID in TCP resets and SYN-RECV state ACKs, to prevent
    attackers using it as a side-channel. From Eric Dumazet.

    5) Memory leak in mediatek bluetooth driver, from Gustavo A. R. Silva.

    6) Hook up rt->dst.input of ipv6 anycast routes properly, from Hangbin
    Liu.

    7) hns and hns3 bug fixes from Huazhong Tan.

    8) Fix RIF leak in mlxsw driver, from Ido Schimmel.

    9) iova range check fix in vhost, from Jason Wang.

    10) Fix hang in do_tcp_sendpages() with tls, from John Fastabend.

    11) More r8152 chips need to disable RX aggregation, from Kai-Heng Feng.

    12) Memory exposure in TCA_U32_SEL handling, from Kees Cook.

    13) TCP BBR congestion control fixes from Kevin Yang.

    14) hv_netvsc, ignore non-PCI devices, from Stephen Hemminger.

    15) qed driver fixes from Tomer Tayar.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (77 commits)
    net: sched: Fix memory exposure from short TCA_U32_SEL
    qed: fix spelling mistake "comparsion" -> "comparison"
    vhost: correctly check the iova range when waking virtqueue
    qlge: Fix netdev features configuration.
    net: macb: do not disable MDIO bus at open/close time
    Revert "net: stmmac: fix build failure due to missing COMMON_CLK dependency"
    net: macb: Fix regression breaking non-MDIO fixed-link PHYs
    mlxsw: spectrum_switchdev: Do not leak RIFs when removing bridge
    i40e: fix condition of WARN_ONCE for stat strings
    i40e: Fix for Tx timeouts when interface is brought up if DCB is enabled
    ixgbe: fix driver behaviour after issuing VFLR
    ixgbe: Prevent unsupported configurations with XDP
    ixgbe: Replace GFP_ATOMIC with GFP_KERNEL
    igb: Replace mdelay() with msleep() in igb_integrated_phy_loopback()
    igb: Replace GFP_ATOMIC with GFP_KERNEL in igb_sw_init()
    igb: Use an advanced ctx descriptor for launchtime
    e1000: ensure to free old tx/rx rings in set_ringparam()
    e1000: check on netif_running() before calling e1000_up()
    ixgb: use dma_zalloc_coherent instead of allocator/memset
    ice: Trivial formatting fixes
    ...

    Linus Torvalds
     

23 Aug, 2018

2 commits

  • rhashtable_init() may fail due to -ENOMEM, thus making the entire api
    unusable. This patch removes this scenario, however unlikely. In order
    to guarantee memory allocation, this patch always ends up doing
    GFP_KERNEL|__GFP_NOFAIL for both the tbl as well as
    alloc_bucket_spinlocks().

    Upon the first table allocation failure, we shrink the size to the
    smallest value that makes sense and retry with __GFP_NOFAIL semantics.
    With the defaults, this means that from 64 buckets, we retry with only 4.
    Any later issues regarding performance due to collisions or larger table
    resizing (when more memory becomes available) are the least of our
    problems.

    Link: http://lkml.kernel.org/r/20180712185241.4017-9-manfred@colorfullife.com
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Manfred Spraul
    Acked-by: Herbert Xu
    Cc: Dmitry Vyukov
    Cc: Kees Cook
    Cc: Michael Kerrisk
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
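
    A minimal sketch of the fallback described above, with a hypothetical
    helper rather than the real bucket_table_alloc(): if the first
    allocation fails, retry at the smallest sensible size with
    __GFP_NOFAIL so initialisation cannot fail:

    #include <linux/mm.h>
    #include <linux/slab.h>

    #define MIN_BUCKETS     4U

    static void *alloc_buckets(unsigned int *nbuckets, gfp_t gfp)
    {
            void *tbl;

            tbl = kvzalloc(*nbuckets * sizeof(void *), gfp);
            if (tbl)
                    return tbl;

            /* Shrink to the minimum and insist: this cannot return NULL. */
            *nbuckets = MIN_BUCKETS;
            return kvzalloc(*nbuckets * sizeof(void *), gfp | __GFP_NOFAIL);
    }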
     
  • As of ce91f6ee5b3b ("mm: kvmalloc does not fallback to vmalloc for
    incompatible gfp flags") we can simplify the caller and trust kvzalloc()
    to just do the right thing. For the GFP_ATOMIC context, we can drop the
    __GFP_NORETRY flag for obvious reasons; for the __GFP_NOWARN case, the
    change is that the caller now passes the flag instead of
    bucket_table_alloc() adding it itself.

    This slightly changes the gfp flags passed on to nested_table_alloc() as
    it will now also use GFP_ATOMIC | __GFP_NOWARN. However, I consider this
    a positive consequence, as we want nowarn semantics in
    bucket_table_alloc() for the same reasons.

    [manfred@colorfullife.com: commit id extended to 12 digits, line wraps updated]
    Link: http://lkml.kernel.org/r/20180712185241.4017-8-manfred@colorfullife.com
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Manfred Spraul
    Acked-by: Michal Hocko
    Cc: Dmitry Vyukov
    Cc: Herbert Xu
    Cc: Kees Cook
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     


19 Jul, 2018

1 commit

  • rhashtable_init() currently does not take into account the user-passed
    min_size parameter unless param->nelem_hint is set as well. As such,
    the default size (number of buckets) will always be HASH_DEFAULT_SIZE
    even if the smallest allowed size is larger than that. Remediate this
    by unconditionally calling into rounded_hashtable_size() and handling
    things accordingly.

    Signed-off-by: Davidlohr Bueso
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Davidlohr Bueso
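
    A minimal sketch of the sizing rule described above; the names and
    constants are illustrative, not the exact rhashtable code:

    #include <linux/kernel.h>
    #include <linux/log2.h>

    #define HASH_DEFAULT_SIZE       64U
    #define HASH_MIN_SIZE           4U

    static unsigned int rounded_size(unsigned int nelem_hint,
                                     unsigned int min_size)
    {
            unsigned int size = HASH_DEFAULT_SIZE;

            if (nelem_hint)                 /* leave ~25% headroom */
                    size = roundup_pow_of_two(nelem_hint * 4 / 3);

            /* Respect min_size even when no element hint was supplied. */
            return max3(size, min_size, HASH_MIN_SIZE);
    }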
     

10 Jul, 2018

1 commit

  • rhashtable_free_and_destroy() cancels the deferred re-hash work and
    then walks and destroys the elements. At this moment, some elements
    can still be in future_tbl, and those elements are not destroyed.

    Test case:
    nft_rhash_destroy() calls rhashtable_free_and_destroy() to destroy
    all elements of sets before destroying sets and chains.
    But rhashtable_free_and_destroy() doesn't destroy elements of
    future_tbl, so the splat below occurred.

    test script:
    %cat test.nft
    table ip aa {
        map map1 {
            type ipv4_addr : verdict;
            elements = {
                0 : jump a0,
                1 : jump a0,
                2 : jump a0,
                3 : jump a0,
                4 : jump a0,
                5 : jump a0,
                6 : jump a0,
                7 : jump a0,
                8 : jump a0,
                9 : jump a0,
            }
        }
        chain a0 {
        }
    }
    flush ruleset
    table ip aa {
        map map1 {
            type ipv4_addr : verdict;
            elements = {
                0 : jump a0,
                1 : jump a0,
                2 : jump a0,
                3 : jump a0,
                4 : jump a0,
                5 : jump a0,
                6 : jump a0,
                7 : jump a0,
                8 : jump a0,
                9 : jump a0,
            }
        }
        chain a0 {
        }
    }
    flush ruleset

    %while :; do nft -f test.nft; done

    Splat looks like:
    [ 200.795603] kernel BUG at net/netfilter/nf_tables_api.c:1363!
    [ 200.806944] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
    [ 200.812253] CPU: 1 PID: 1582 Comm: nft Not tainted 4.17.0+ #24
    [ 200.820297] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
    [ 200.830309] RIP: 0010:nf_tables_chain_destroy.isra.34+0x62/0x240 [nf_tables]
    [ 200.838317] Code: 43 50 85 c0 74 26 48 8b 45 00 48 8b 4d 08 ba 54 05 00 00 48 c7 c6 60 6d 29 c0 48 c7 c7 c0 65 29 c0 4c 8b 40 08 e8 58 e5 fd f8 0b 48 89 da 48 b8 00 00 00 00 00 fc ff
    [ 200.860366] RSP: 0000:ffff880118dbf4d0 EFLAGS: 00010282
    [ 200.866354] RAX: 0000000000000061 RBX: ffff88010cdeaf08 RCX: 0000000000000000
    [ 200.874355] RDX: 0000000000000061 RSI: 0000000000000008 RDI: ffffed00231b7e90
    [ 200.882361] RBP: ffff880118dbf4e8 R08: ffffed002373bcfb R09: ffffed002373bcfa
    [ 200.890354] R10: 0000000000000000 R11: ffffed002373bcfb R12: dead000000000200
    [ 200.898356] R13: dead000000000100 R14: ffffffffbb62af38 R15: dffffc0000000000
    [ 200.906354] FS: 00007fefc31fd700(0000) GS:ffff88011b800000(0000) knlGS:0000000000000000
    [ 200.915533] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 200.922355] CR2: 0000557f1c8e9128 CR3: 0000000106880000 CR4: 00000000001006e0
    [ 200.930353] Call Trace:
    [ 200.932351] ? nf_tables_commit+0x26f6/0x2c60 [nf_tables]
    [ 200.939525] ? nf_tables_setelem_notify.constprop.49+0x1a0/0x1a0 [nf_tables]
    [ 200.947525] ? nf_tables_delchain+0x6e0/0x6e0 [nf_tables]
    [ 200.952383] ? nft_add_set_elem+0x1700/0x1700 [nf_tables]
    [ 200.959532] ? nla_parse+0xab/0x230
    [ 200.963529] ? nfnetlink_rcv_batch+0xd06/0x10d0 [nfnetlink]
    [ 200.968384] ? nfnetlink_net_init+0x130/0x130 [nfnetlink]
    [ 200.975525] ? debug_show_all_locks+0x290/0x290
    [ 200.980363] ? debug_show_all_locks+0x290/0x290
    [ 200.986356] ? sched_clock_cpu+0x132/0x170
    [ 200.990352] ? find_held_lock+0x39/0x1b0
    [ 200.994355] ? sched_clock_local+0x10d/0x130
    [ 200.999531] ? memset+0x1f/0x40

    V2:
    - free all tables, as requested by Herbert Xu

    Signed-off-by: Taehee Yoo
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Taehee Yoo
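
    A minimal sketch of the fix's idea, with simplified, hypothetical
    types: teardown must walk the whole chain of tables rather than just
    the first one, because entries may still live in future_tbl:

    #include <linux/rcupdate.h>

    struct toy_tbl {
            struct toy_tbl __rcu *future_tbl;
            /* ... buckets ... */
    };

    static void destroy_all_tables(struct toy_tbl *tbl,
                                   void (*free_tbl)(struct toy_tbl *))
    {
            struct toy_tbl *next;

            while (tbl) {
                    /* Mutex held during teardown, so no concurrent resizes. */
                    next = rcu_dereference_protected(tbl->future_tbl, 1);
                    free_tbl(tbl);          /* destroy elements, then the table */
                    tbl = next;
            }
    }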
     

03 Jul, 2018

1 commit

  • In lib/rhashtable.c, line 777, the 'skip' variable is assigned to
    itself. The following error was observed:

    lib/rhashtable.c:777:41: warning: explicitly assigning value of
    variable of type 'int' to itself [-Wself-assign] error, forbidden
    warning: rhashtable.c:777
    This error was found when compiling with Clang 6.0. Change it to iter->skip.

    Signed-off-by: Rishabh Bhatnagar
    Acked-by: Herbert Xu
    Reviewed-by: NeilBrown
    Signed-off-by: David S. Miller

    Rishabh Bhatnagar
     

22 Jun, 2018

6 commits

  • Using rht_dereference_bucket() to dereference
    ->future_tbl looks like a type error, and could be confusing.
    Using rht_dereference_rcu() to test a pointer for NULL
    adds an unnecessary barrier - rcu_access_pointer() is preferred
    for NULL tests when no lock is held.

    This patch uses three different ways to access ->future_tbl:
    - if we know the mutex is held, use rht_dereference()
    - if we don't hold the mutex, and are only testing for NULL,
    use rcu_access_pointer()
    - otherwise (using RCU protection for true dereference),
    use rht_dereference_rcu().

    Note that this includes a simplification of the call to
    rhashtable_last_table() - we don't do an extra dereference
    before the call any more.

    Acked-by: Herbert Xu
    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
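
    A sketch of the three access patterns listed above, with hypothetical
    types standing in for the rhashtable internals; which helper is used
    documents what actually protects the access:

    #include <linux/lockdep.h>
    #include <linux/mutex.h>
    #include <linux/rcupdate.h>

    struct tbl {
            struct tbl __rcu *future_tbl;
    };

    struct ht {
            struct mutex mutex;
    };

    /* 1. Update path: the hashtable mutex is held. */
    static struct tbl *next_tbl_locked(struct ht *ht, struct tbl *t)
    {
            return rcu_dereference_protected(t->future_tbl,
                                             lockdep_is_held(&ht->mutex));
    }

    /* 2. Only testing for NULL: no dereference, so no barrier is needed. */
    static bool has_future_tbl(struct tbl *t)
    {
            return rcu_access_pointer(t->future_tbl) != NULL;
    }

    /* 3. A real dereference under rcu_read_lock(). */
    static struct tbl *next_tbl_rcu(struct tbl *t)
    {
            return rcu_dereference(t->future_tbl);
    }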
     
  • Rather than borrowing one of the bucket locks to
    protect ->future_tbl updates, use cmpxchg().
    This gives more freedom to change how bucket locking
    is implemented.

    Acked-by: Herbert Xu
    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
     
  • Now that we don't use the hash value or shift in nested_table_alloc()
    there is room for simplification.
    We only need to pass a "is this a leaf" flag to nested_table_alloc(),
    and don't need to track as much information in
    rht_bucket_nested_insert().

    Note there is another minor cleanup in nested_table_alloc() here.
    The number of elements in a page of "union nested_tables" is most naturally

    PAGE_SIZE / sizeof(ntbl[0])

    The previous code had

    PAGE_SIZE / sizeof(ntbl[0].bucket)

    which happens to be the correct value only because the bucket uses all
    the space in the union.

    Acked-by: Herbert Xu
    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
     
  • The 'ht' and 'hash' arguments to INIT_RHT_NULLS_HEAD() are
    no longer used - so drop them. This allows us to also
    remove the nhash argument from nested_table_alloc().

    Acked-by: Herbert Xu
    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
     
  • This "feature" is unused, undocumented, and untested and so doesn't
    really belong. A patch is under development to properly implement
    support for detecting when a search gets diverted down a different
    chain, which the common purpose of nulls markers.

    This patch actually fixes a bug too. The table resizing allows a
    table to grow to 2^31 buckets, but the hash is truncated to 27 bits -
    any growth beyond 2^27 is wasteful an ineffective.

    This patch results in NULLS_MARKER(0) being used for all chains,
    and leaves the use of rht_is_a_null() to test for it.

    Acked-by: Herbert Xu
    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
     
  • Due to the use of rhashtables in net namespaces,
    rhashtable.h is included in a lot of the kernel,
    so a small change can require a large recompilation.
    This makes development painful.

    This patch splits out rhashtable-types.h which just includes
    the major type declarations, and does not include (non-trivial)
    inline code. rhashtable.h is no longer included by anything
    in the include/ directory.
    Common include files only include rhashtable-types.h so a large
    recompilation is only triggered when that changes.

    Acked-by: Herbert Xu
    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
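
    A small illustration of the split described above; the structure and
    file names are hypothetical:

    /* foo.h - embedded in many other headers, so keep the include light. */
    #include <linux/rhashtable-types.h>

    struct foo_net {
            struct rhashtable sessions;     /* only the type is needed here */
    };

    /* foo.c - the one place that really manipulates the table. */
    #include <linux/rhashtable.h>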
     

25 Apr, 2018

3 commits

  • When a walk of an rhashtable is interrupted with rhastable_walk_stop()
    and then rhashtable_walk_start(), the location to restart from is based
    on a 'skip' count in the current hash chain, and this can be incorrect
    if insertions or deletions have happened. This does not happen when
    the walk is not stopped and started as iter->p is a placeholder which
    is safe to use while holding the RCU read lock.

    In rhashtable_walk_start() we can revalidate that 'p' is still in the
    same hash chain. If it isn't then the current method is still used.

    With this patch, if a rhashtable walker ensures that the current
    object remains in the table over a stop/start period (possibly by
    elevating the reference count if that is sufficient), it can be sure
    that a walk will not miss objects that were in the hashtable for the
    whole time of the walk.

    rhashtable_walk_start() may not find the object even though it is
    still in the hashtable if a rehash has moved it to a new table. In
    this case it will (eventually) get -EAGAIN and will need to proceed
    through the whole table again to be sure to see everything at least
    once.

    Acked-by: Herbert Xu
    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
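
    A minimal usage sketch of the walk API this entry is about, with a
    hypothetical object type: the iteration tolerates -EAGAIN after a
    resize, and may be stopped and restarted around sleeping sections:

    #include <linux/err.h>
    #include <linux/rhashtable.h>

    struct obj {
            u32 key;
            struct rhash_head node;
    };

    static void walk_all(struct rhashtable *ht)
    {
            struct rhashtable_iter iter;
            struct obj *o;

            rhashtable_walk_enter(ht, &iter);
            rhashtable_walk_start(&iter);

            while ((o = rhashtable_walk_next(&iter)) != NULL) {
                    if (IS_ERR(o)) {
                            if (PTR_ERR(o) == -EAGAIN)
                                    continue;  /* resize: entries may repeat */
                            break;
                    }
                    /* process 'o'; call rhashtable_walk_stop() before sleeping */
            }

            rhashtable_walk_stop(&iter);
            rhashtable_walk_exit(&iter);
    }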
     
  • The documentation claims that when rhashtable_walk_start_check()
    detects a resize event, it will rewind back to the beginning
    of the table. This is not true. We need to set ->slot and
    ->skip to be zero for it to be true.

    Acked-by: Herbert Xu
    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
     
  • Neither rhashtable_walk_enter() nor rhltable_walk_enter() sleeps, though
    they do take a spinlock without irq protection.
    So revise the comments to accurately state the contexts in which
    these functions can be called.

    Acked-by: Herbert Xu
    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
     

01 Apr, 2018

1 commit

  • Rehashing and destroying a large hash table takes a lot of time,
    and happens in process context. It is safe to add cond_resched()
    in rhashtable_rehash_table() and rhashtable_free_and_destroy().

    Signed-off-by: Eric Dumazet
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Eric Dumazet
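
    A minimal sketch of the point above, with hypothetical helpers: a
    long teardown loop running in process context yields the CPU
    periodically:

    #include <linux/sched.h>

    static void destroy_buckets(void **buckets, unsigned int n,
                                void (*free_one)(void *))
    {
            unsigned int i;

            for (i = 0; i < n; i++) {
                    free_one(buckets[i]);
                    cond_resched();         /* process context, no locks held */
            }
    }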
     

07 Mar, 2018

1 commit

  • When inserting duplicate objects (those with the same key),
    the current rhlist implementation messes up the chain pointers by
    updating the bucket pointer instead of the previous node's next
    pointer to point to the newly inserted node. This causes missing
    elements on removal and traversal.

    Fix that by properly updating the pprev pointer to point to
    the correct rhash_head next pointer.

    Issue: 1241076
    Change-Id: I86b2c140bcb4aeb10b70a72a267ff590bb2b17e7
    Fixes: ca26893f05e8 ('rhashtable: Add rhlist interface')
    Signed-off-by: Paul Blakey
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Paul Blakey
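
    A small usage sketch of the rhlist interface this fix touches, with a
    hypothetical object type and parameters: several objects sharing one
    key hang off a single bucket entry:

    #include <linux/rhashtable.h>
    #include <linux/stddef.h>

    struct item {
            u32 key;
            struct rhlist_head list;
    };

    static const struct rhashtable_params item_params = {
            .key_len     = sizeof(u32),
            .key_offset  = offsetof(struct item, key),
            .head_offset = offsetof(struct item, list),
    };

    static int add_item(struct rhltable *hlt, struct item *it)
    {
            /* Duplicate keys are allowed; they are chained behind one entry. */
            return rhltable_insert(hlt, &it->list, item_params);
    }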
     

11 Dec, 2017

1 commit