Eric Lee / smarc-fsl-linux-kernel

20 Sep, 2016

1 commit

ca26893f0 rhashtable: Add rhlist interface ... Browse Code »

The insecure_elasticity setting is an ugly wart brought out by
users who need to insert duplicate objects (that is, distinct
objects with identical keys) into the same table.

In fact, those users have a much bigger problem. Once those
duplicate objects are inserted, they don't have an interface to
find them (unless you count the walker interface which walks
over the entire table).

Some users have resorted to doing a manual walk over the hash
table which is of course broken because they don't handle the
potential existence of multiple hash tables. The result is that
they will break sporadically when they encounter a hash table
resize/rehash.

This patch provides a way out for those users, at the expense
of an extra pointer per object. Essentially each object is now
a list of objects carrying the same key. The hash table will
only see the lists so nothing changes as far as rhashtable is
concerned.

To use this new interface, you need to insert a struct rhlist_head
into your objects instead of struct rhash_head. While the hash
table is unchanged, for type-safety you'll need to use struct
rhltable instead of struct rhashtable. All the existing interfaces
have been duplicated for rhlist, including the hash table walker.

One missing feature is nulls marking because AFAIK the only potential
user of it does not need duplicate objects. Should anyone need
this it shouldn't be too hard to add.

Signed-off-by: Herbert Xu
Acked-by: Thomas Graf
Signed-off-by: David S. Miller

Herbert Xu
2016-09-20 16:43:36 +0800

07 Sep, 2016

1 commit

60175ccdf Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next ... Browse Code »

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next
tree. Most relevant updates are the removal of per-conntrack timers to
use a workqueue/garbage collection approach instead from Florian
Westphal, the hash and numgen expression for nf_tables from Laura
Garcia, updates on nf_tables hash set to honor the NLM_F_EXCL flag,
removal of ip_conntrack sysctl and many other incremental updates on our
Netfilter codebase.

More specifically, they are:

1) Retrieve only 4 bytes to fetch ports in case of non-linear skb
transport area in dccp, sctp, tcp, udp and udplite protocol
conntrackers, from Gao Feng.

2) Missing whitespace on error message in physdev match, from Hangbin Liu.

3) Skip redundant IPv4 checksum calculation in nf_dup_ipv4, from Liping Zhang.

4) Add nf_ct_expires() helper function and use it, from Florian Westphal.

5) Replace opencoded nf_ct_kill() call in IPVS conntrack support, also
from Florian.

6) Rename nf_tables set implementation to nft_set_{name}.c

7) Introduce the hash expression to allow arbitrary hashing of selector
concatenations, from Laura Garcia Liebana.

8) Remove ip_conntrack sysctl backward compatibility code, this code has
been around for long time already, and we have two interfaces to do
this already: nf_conntrack sysctl and ctnetlink.

9) Use nf_conntrack_get_ht() helper function whenever possible, instead
of opencoding fetch of hashtable pointer and size, patch from Liping Zhang.

10) Add quota expression for nf_tables.

11) Add number generator expression for nf_tables, this supports
incremental and random generators that can be combined with maps,
very useful for load balancing purpose, again from Laura Garcia Liebana.

12) Fix a typo in a debug message in FTP conntrack helper, from Colin Ian King.

13) Introduce a nft_chain_parse_hook() helper function to parse chain hook
configuration, this is used by a follow up patch to perform better chain
update validation.

14) Add rhashtable_lookup_get_insert_key() to rhashtable and use it from the
nft_set_hash implementation to honor the NLM_F_EXCL flag.

15) Missing nulls check in nf_conntrack from nf_conntrack_tuple_taken(),
patch from Florian Westphal.

16) Don't use the DYING bit to know if the conntrack event has been already
delivered, instead a state variable to track event re-delivery
states, also from Florian.

17) Remove the per-conntrack timer, use the workqueue approach that was
discussed during the NFWS, from Florian Westphal.

18) Use the netlink conntrack table dump path to kill stale entries,
again from Florian.

19) Add a garbage collector to get rid of stale conntracks, from
Florian.

20) Reschedule garbage collector if eviction rate is high.

21) Get rid of the __nf_ct_kill_acct() helper.

22) Use ARPHRD_ETHER instead of hardcoded 1 from ARP logger.

23) Make nf_log_set() interface assertive on unsupported families.
====================

Signed-off-by: David S. Miller

David S. Miller
2016-09-07 03:45:26 +0800

30 Aug, 2016

1 commit

6abdd5f59 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

All three conflicts were cases of simple overlapping
changes.

Signed-off-by: David S. Miller

David S. Miller
2016-08-30 12:54:02 +0800

27 Aug, 2016

1 commit

9dbeea7f0 rhashtable: fix a memory leak in alloc_bucket_locks() ... Browse Code »

If vmalloc() was successful, do not attempt a kmalloc_array()

Fixes: 4cf0b354d92e ("rhashtable: avoid large lock-array allocations")
Reported-by: CAI Qian
Signed-off-by: Eric Dumazet
Cc: Florian Westphal
Acked-by: Herbert Xu
Tested-by: CAI Qian
Signed-off-by: David S. Miller

Eric Dumazet
2016-08-27 12:59:53 +0800

26 Aug, 2016

1 commit

5ca8cc5bf rhashtable: add rhashtable_lookup_get_insert_key() ... Browse Code »

This patch modifies __rhashtable_insert_fast() so it returns the
existing object that clashes with the one that you want to insert.
In case the object is successfully inserted, NULL is returned.
Otherwise, you get an error via ERR_PTR().

This patch adapts the existing callers of __rhashtable_insert_fast()
so they handle this new logic, and it adds a new
rhashtable_lookup_get_insert_key() interface to fetch this existing
object.

nf_tables needs this change to improve handling of EEXIST cases via
honoring the NLM_F_EXCL flag and by checking if the data part of the
mapping matches what we have.

Cc: Herbert Xu
Cc: Thomas Graf
Signed-off-by: Pablo Neira Ayuso
Acked-by: Herbert Xu

Pablo Neira Ayuso
2016-08-26 23:29:41 +0800

20 Aug, 2016

1 commit

246779dd0 rhashtable: Remove GFP flag from rhashtable_walk_init ... Browse Code »

The commit 8f6fd83c6c5ec66a4a70c728535ddcdfef4f3697 ("rhashtable:
accept GFP flags in rhashtable_walk_init") added a GFP flag argument
to rhashtable_walk_init because some users wish to use the walker
in an unsleepable context.

In fact we don't need to allocate memory in rhashtable_walk_init
at all. The walker is always paired with an iterator so we could
just stash ourselves there.

This patch does that by introducing a new enter function to replace
the existing init function. This way we don't have to churn all
the existing users again.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2016-08-20 05:40:24 +0800

16 Aug, 2016

1 commit

12311959e rhashtable: fix shift by 64 when shrinking ... Browse Code »

I got this:

================================================================================
UBSAN: Undefined behaviour in ./include/linux/log2.h:63:13
shift exponent 64 is too large for 64-bit type 'long unsigned int'
CPU: 1 PID: 721 Comm: kworker/1:1 Not tainted 4.8.0-rc1+ #87
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
Workqueue: events rht_deferred_worker
0000000000000000 ffff88011661f8d8 ffffffff82344f50 0000000041b58ab3
ffffffff84f98000 ffffffff82344ea4 ffff88011661f900 ffff88011661f8b0
0000000000000001 ffff88011661f6b8 dffffc0000000000 ffffffff867f7640
Call Trace:
[] dump_stack+0xac/0xfc
[] ? _atomic_dec_and_lock+0xc4/0xc4
[] ubsan_epilogue+0xd/0x8a
[] __ubsan_handle_shift_out_of_bounds+0x255/0x29a
[] ? __ubsan_handle_out_of_bounds+0x180/0x180
[] ? nl80211_req_set_reg+0x256/0x2f0
[] ? print_context_stack+0x8a/0x160
[] ? amd_pmu_reset+0x341/0x380
[] rht_deferred_worker+0x1618/0x1790
[] ? rht_deferred_worker+0x1618/0x1790
[] ? rhashtable_jhash2+0x370/0x370
[] ? process_one_work+0x6fd/0x1970
[] process_one_work+0x79f/0x1970
[] ? process_one_work+0x6fd/0x1970
[] ? try_to_grab_pending+0x4c0/0x4c0
[] ? worker_thread+0x1c4/0x1340
[] worker_thread+0x55f/0x1340
[] ? __schedule+0x4df/0x1d40
[] ? process_one_work+0x1970/0x1970
[] ? process_one_work+0x1970/0x1970
[] kthread+0x237/0x390
[] ? __kthread_parkme+0x280/0x280
[] ? _raw_spin_unlock_irq+0x33/0x50
[] ret_from_fork+0x1f/0x40
[] ? __kthread_parkme+0x280/0x280
================================================================================

roundup_pow_of_two() is undefined when called with an argument of 0, so
let's avoid the call and just fall back to ht->p.min_size (which should
never be smaller than HASH_MIN_SIZE).

Cc: Herbert Xu
Signed-off-by: Vegard Nossum
Acked-by: Herbert Xu
Signed-off-by: David S. Miller

Vegard Nossum
2016-08-16 02:10:09 +0800

15 Aug, 2016

1 commit

4cf0b354d rhashtable: avoid large lock-array allocations ... Browse Code »

Sander reports following splat after netfilter nat bysrc table got
converted to rhashtable:

swapper/0: page allocation failure: order:3, mode:0x2084020(GFP_ATOMIC|__GFP_COMP)
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc1 [..]
[] warn_alloc_failed+0xdd/0x140
[] __alloc_pages_nodemask+0x3e1/0xcf0
[] alloc_pages_current+0x8d/0x110
[] kmalloc_order+0x1f/0x70
[] __kmalloc+0x129/0x140
[] bucket_table_alloc+0xc1/0x1d0
[] rhashtable_insert_rehash+0x5d/0xe0
[] nf_nat_setup_info+0x2ef/0x400

The failure happens when allocating the spinlock array.
Even with GFP_KERNEL its unlikely for such a large allocation
to succeed.

Thomas Graf pointed me at inet_ehash_locks_alloc(), so in addition
to adding NOWARN for atomic allocations this also makes the bucket-array
sizing more conservative.

In commit 095dc8e0c3686 ("tcp: fix/cleanup inet_ehash_locks_alloc()"),
Eric Dumazet says: "Budget 2 cache lines per cpu worth of 'spinlocks'".
IOW, consider size needed by a single spinlock when determining
number of locks per cpu. So with 64 byte per cacheline and 4 byte per
spinlock this gives 32 locks per cpu.

Resulting size of the lock-array (sizeof(spinlock) == 4):

cpus: 1 2 4 8 16 32 64
old: 1k 1k 4k 8k 16k 16k 16k
new: 128 256 512 1k 2k 4k 8k

8k allocation should have decent chance of success even
with GFP_ATOMIC, and should not fail with GFP_KERNEL.

With 72-byte spinlock (LOCKDEP):
cpus : 1 2
old: 9k 18k
new: ~2k ~4k

Reported-by: Sander Eikelenboom
Suggested-by: Thomas Graf
Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller

Florian Westphal
2016-08-15 12:12:57 +0800

05 Apr, 2016

1 commit

8f6fd83c6 rhashtable: accept GFP flags in rhashtable_walk_init ... Browse Code »

In certain cases, the 802.11 mesh pathtable code wants to
iterate over all of the entries in the forwarding table from
the receive path, which is inside an RCU read-side critical
section. Enable walks inside atomic sections by allowing
GFP_ATOMIC allocations for the walker state.

Change all existing callsites to pass in GFP_KERNEL.

Acked-by: Thomas Graf
Signed-off-by: Bob Copeland
[also adjust gfs2/glock.c and rhashtable tests]
Signed-off-by: Johannes Berg

Bob Copeland
2016-04-05 16:56:32 +0800

01 Jan, 2016

1 commit

c07f30ad6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Browse Code »

David S. Miller
2016-01-01 07:20:10 +0800

19 Dec, 2015

1 commit

179ccc0a7 rhashtable: Kill harmless RCU warning in rhashtable_walk_init ... Browse Code »

The commit c6ff5268293ef98e48a99597e765ffc417e39fa5 ("rhashtable:
Fix walker list corruption") causes a suspicious RCU usage warning
because we no longer hold ht->mutex when we dereference ht->tbl.

However, this is a false positive because we now hold ht->lock
which also guarantees that ht->tbl won't disppear from under us.

This patch kills the warning by using rcu_dereference_protected.

Reported-by: kernel test robot
Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2015-12-19 12:44:18 +0800

18 Dec, 2015

1 commit

b3e0d3d7b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Conflicts:
drivers/net/geneve.c

Here we had an overlapping change, where in 'net' the extraneous stats
bump was being removed whilst in 'net-next' the final argument to
udp_tunnel6_xmit_skb() was being changed.

Signed-off-by: David S. Miller

David S. Miller
2015-12-18 11:08:28 +0800

17 Dec, 2015

1 commit

c6ff52682 rhashtable: Fix walker list corruption ... Browse Code »

The commit ba7c95ea3870fe7b847466d39a049ab6f156aa2c ("rhashtable:
Fix sleeping inside RCU critical section in walk_stop") introduced
a new spinlock for the walker list. However, it did not convert
all existing users of the list over to the new spin lock. Some
continued to use the old mutext for this purpose. This obviously
led to corruption of the list.

The fix is to use the spin lock everywhere where we touch the list.

This also allows us to do rcu_rad_lock before we take the lock in
rhashtable_walk_start. With the old mutex this would've deadlocked
but it's safe with the new spin lock.

Fixes: ba7c95ea3870 ("rhashtable: Fix sleeping inside RCU...")
Reported-by: Colin Ian King
Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2015-12-17 00:13:14 +0800

16 Dec, 2015

1 commit

3a324606b rhashtable: Enforce minimum size on initial hash table ... Browse Code »

William Hua wrote:
>
> I wasn't aware there was an enforced minimum size. I simply set the
> nelem_hint in the rhastable_params struct to 1, expecting it to grow as
> needed. This caused a segfault afterwards when trying to insert an
> element.

OK we're doing the size computation before we enforce the limit
on min_size.

---8
Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2015-12-16 23:44:08 +0800

09 Dec, 2015

1 commit

46c749eac rhashtable: Remove unnecessary wmb for future_tbl ... Browse Code »

The patch 9497df88ab5567daa001829051c5f87161a81ff0 ("rhashtable:
Fix reader/rehash race") added a pair of barriers. In fact the
wmb is superfluous because every subsequent write to the old or
new hash table uses rcu_assign_pointer, which itself carriers a
full barrier prior to the assignment.

Therefore we may remove the explicit wmb.

Signed-off-by: Herbert Xu
Acked-by: Thomas Graf
Signed-off-by: David S. Miller

Herbert Xu
2015-12-09 11:46:32 +0800

06 Dec, 2015

1 commit

a90099d9f Revert "rhashtable: Use __vmalloc with GFP_ATOMIC for table allocation" ... Browse Code »

This reverts commit d3716f18a7d841565c930efde30737a3557eee69.

vmalloc cannot be used in BH disabled contexts, even
with GFP_ATOMIC. And we certainly want to support
rhashtable users inserting entries with software
interrupts disabled.

Signed-off-by: David S. Miller

David S. Miller
2015-12-06 11:47:11 +0800

05 Dec, 2015

2 commits

d3716f18a rhashtable: Use __vmalloc with GFP_ATOMIC for table allocation ... Browse Code »

When an rhashtable user pounds rhashtable hard with back-to-back
insertions we may end up growing the table in GFP_ATOMIC context.
Unfortunately when the table reaches a certain size this often
fails because we don't have enough physically contiguous pages
to hold the new table.

Eric Dumazet suggested (and in fact wrote this patch) using
__vmalloc instead which can be used in GFP_ATOMIC context.

Reported-by: Phil Sutter
Suggested-by: Eric Dumazet
Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2015-12-05 05:53:05 +0800
3cf92222a rhashtable: Prevent spurious EBUSY errors on insertion ... Browse Code »

Thomas and Phil observed that under stress rhashtable insertion
sometimes failed with EBUSY, even though this error should only
ever been seen when we're under attack and our hash chain length
has grown to an unacceptable level, even after a rehash.

It turns out that the logic for detecting whether there is an
existing rehash is faulty. In particular, when two threads both
try to grow the same table at the same time, one of them may see
the newly grown table and thus erroneously conclude that it had
been rehashed. This is what leads to the EBUSY error.

This patch fixes this by remembering the current last table we
used during insertion so that rhashtable_insert_rehash can detect
when another thread has also done a resize/rehash. When this is
detected we will give up our resize/rehash and simply retry the
insertion with the new table.

Reported-by: Thomas Graf
Reported-by: Phil Sutter
Signed-off-by: Herbert Xu
Tested-by: Phil Sutter
Signed-off-by: David S. Miller

Herbert Xu
2015-12-05 03:38:26 +0800

23 Sep, 2015

1 commit

7def0f952 lib: fix data race in rhashtable_rehash_one ... Browse Code »

rhashtable_rehash_one() uses complex logic to update entry->next field,
after INIT_RHT_NULLS_HEAD and NULLS_MARKER expansion:

entry->next = 1 | ((base + off) << 1)

This can be compiled along the lines of:

entry->next = base + off
entry->next <next |= 1

Which will break concurrent readers.

NULLS value recomputation is not needed here, so just remove
the complex logic.

The data race was found with KernelThreadSanitizer (KTSAN).

Signed-off-by: Dmitry Vyukov
Acked-by: Eric Dumazet
Acked-by: Thomas Graf
Acked-by: Herbert Xu
Signed-off-by: David S. Miller

Dmitriy Vyukov
2015-09-23 08:36:07 +0800

09 Jul, 2015

1 commit

142b942a7 rhashtable: fix for resize events during table walk ... Browse Code »

If rhashtable_walk_next detects a resize operation in progress, it jumps
to the new table and continues walking that one. But it misses to drop
the reference to it's current item, leading it to continue traversing
the new table's bucket in which the current item is sorted into, and
after reaching that bucket's end continues traversing the new table's
second bucket instead of the first one, thereby potentially missing
items.

This fixes the rhashtable runtime test for me. Bug probably introduced
by Herbert Xu's patch eddee5ba ("rhashtable: Fix walker behaviour during
rehash") although not explicitly tested.

Fixes: eddee5ba ("rhashtable: Fix walker behaviour during rehash")
Signed-off-by: Phil Sutter
Acked-by: Herbert Xu
Signed-off-by: David S. Miller

Phil Sutter
2015-07-09 05:53:49 +0800

09 Jun, 2015

1 commit

941742f49 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Browse Code »

David S. Miller
2015-06-09 11:06:56 +0800

07 Jun, 2015

1 commit

6d7954130 rhashtable: add missing import <linux/export.h> ... Browse Code »

rhashtable uses EXPORT_SYMBOL_GPL() without importing linux/export.h
directly it is only imported indirectly through some other includes.

Signed-off-by: Hauke Mehrtens
Signed-off-by: David S. Miller

Hauke Mehrtens
2015-06-07 15:10:15 +0800

23 May, 2015

1 commit

36583eb54 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Conflicts:
drivers/net/ethernet/cadence/macb.c
drivers/net/phy/phy.c
include/linux/skbuff.h
net/ipv4/tcp.c
net/switchdev/switchdev.c

Switchdev was a case of RTNH_H_{EXTERNAL --> OFFLOAD}
renaming overlapping with net-next changes of various
sorts.

phy.c was a case of two changes, one adding a local
variable to a function whilst the second was removing
one.

tcp.c overlapped a deadlock fix with the addition of new tcp_info
statistic values.

macb.c involved the addition of two zyncq device entries.

skbuff.h involved adding back ipv4_daddr to nf_bridge_info
whilst net-next changes put two other existing members of
that struct into a union.

Signed-off-by: David S. Miller

David S. Miller
2015-05-23 13:22:35 +0800

17 May, 2015

1 commit

07ee0722b rhashtable: Add cap on number of elements in hash table ... Browse Code »

We currently have no limit on the number of elements in a hash table.
This is a problem because some users (tipc) set a ceiling on the
maximum table size and when that is reached the hash table may
degenerate. Others may encounter OOM when growing and if we allow
insertions when that happens the hash table perofrmance may also
suffer.

This patch adds a new paramater insecure_max_entries which becomes
the cap on the table. If unset it defaults to max_size * 2. If
it is also zero it means that there is no cap on the number of
elements in the table. However, the table will grow whenever the
utilisation hits 100% and if that growth fails, you will get ENOMEM
on insertion.

As allowing oversubscription is potentially dangerous, the name
contains the word insecure.

Note that the cap is not a hard limit. This is done for performance
reasons as enforcing a hard limit will result in use of atomic ops
that are heavier than the ones we currently use.

The reasoning is that we're only guarding against a gross over-
subscription of the table, rather than a small breach of the limit.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2015-05-17 06:08:26 +0800

06 May, 2015

1 commit

c936a79fc rhashtable: Simplify iterator code ... Browse Code »

Remove useless obj variable and goto logic.

Signed-off-by: Thomas Graf
Acked-by: Herbert Xu
Signed-off-by: David S. Miller

Thomas Graf
2015-05-06 07:30:47 +0800

23 Apr, 2015

2 commits

a87b9ebf1 rhashtable: Do not schedule more than one rehash if we can't grow further ... Browse Code »

The current code currently only stops inserting rehashes into the
chain when no resizes are currently scheduled. As long as resizes
are scheduled and while inserting above the utilization watermark,
more and more rehashes will be scheduled.

This lead to a perfect DoS storm with thousands of rehashes
scheduled which lead to thousands of spinlocks to be taken
sequentially.

Instead, only allow either a series of resizes or a single rehash.
Drop any further rehashes and return -EBUSY.

Fixes: ccd57b1bd324 ("rhashtable: Add immediate rehash during insertion")
Signed-off-by: Thomas Graf
Acked-by: Herbert Xu
Signed-off-by: David S. Miller

Thomas Graf
2015-04-23 02:17:22 +0800
e2307ed6c rhashtable: Schedule async resize when sync realloc fails ... Browse Code »

When rhashtable_insert_rehash() fails with ENOMEM, this indicates that
we can't allocate the necessary memory in the current context but the
limits as set by the user would still allow to grow.

Thus attempt an async resize in the background where we can allocate
using GFP_KERNEL which is more likely to succeed. The insertion itself
will still fail to indicate pressure.

This fixes a bug where the table would never continue growing once the
utilization is above 100%.

Fixes: ccd57b1bd324 ("rhashtable: Add immediate rehash during insertion")
Signed-off-by: Thomas Graf
Acked-by: Herbert Xu
Signed-off-by: David S. Miller

Thomas Graf
2015-04-23 02:17:22 +0800

26 Mar, 2015

1 commit

49f7b33e6 rhashtable: provide len to obj_hashfn ... Browse Code »

nftables sets will be converted to use so called setextensions, moving
the key to a non-fixed position. To hash it, the obj_hashfn must be used,
however it so far doesn't receive the length parameter.

Pass the key length to obj_hashfn() and convert existing users.

Signed-off-by: Patrick McHardy
Signed-off-by: Pablo Neira Ayuso

Patrick McHardy
2015-03-26 00:18:33 +0800

25 Mar, 2015

4 commits

6b6f302ce rhashtable: Add rhashtable_free_and_destroy() ... Browse Code »

rhashtable_destroy() variant which stops rehashes, iterates over
the table and calls a callback to release resources.

Avoids need for nft_hash to embed rhashtable internals and allows to
get rid of the being_destroyed flag. It also saves a 2nd mutex
lock upon destruction.

Also fixes an RCU lockdep splash on nft set destruction due to
calling rht_for_each_entry_safe() without holding bucket locks.
Open code this loop as we need know that no mutations may occur in
parallel.

Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller

Thomas Graf
2015-03-25 05:48:40 +0800
b5e2c150a rhashtable: Disable automatic shrinking by default ... Browse Code »

Introduce a new bool automatic_shrinking to require the
user to explicitly opt-in to automatic shrinking of tables.

Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller

Thomas Graf
2015-03-25 05:48:40 +0800
299e5c32a rhashtable: Use 'unsigned int' consistently ... Browse Code »

Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller

Thomas Graf
2015-03-25 05:48:39 +0800
27ed44a5d rhashtable: Add comment on choice of elasticity value ... Browse Code »

This patch adds a comment on the choice of the value 16 as the
maximum chain length before we force a rehash.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2015-03-25 02:57:04 +0800

24 Mar, 2015

7 commits

ba7c95ea3 rhashtable: Fix sleeping inside RCU critical section in walk_stop ... Browse Code »

The commit 963ecbd41a1026d99ec7537c050867428c397b89 ("rhashtable:
Fix use-after-free in rhashtable_walk_stop") fixed a real bug
but created another one because we may end up sleeping inside an
RCU critical section.

This patch fixes it properly by replacing the mutex with a spin
lock that specifically protects the walker lists.

Reported-by: Sasha Levin
Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2015-03-24 10:16:07 +0800
ccd57b1bd rhashtable: Add immediate rehash during insertion ... Browse Code »

This patch reintroduces immediate rehash during insertion. If
we find during insertion that the table is full or the chain
length exceeds a set limit (currently 16 but may be disabled
with insecure_elasticity) then we will force an immediate rehash.
The rehash will contain an expansion if the table utilisation
exceeds 75%.

If this rehash fails then the insertion will fail. Otherwise the
insertion will be reattempted in the new hash table.

Signed-off-by: Herbert Xu
Acked-by: Thomas Graf
Signed-off-by: David S. Miller

Herbert Xu
2015-03-24 10:07:52 +0800
b9ecfdaa1 rhashtable: Allow GFP_ATOMIC bucket table allocation ... Browse Code »

This patch adds the ability to allocate bucket table with GFP_ATOMIC
instead of GFP_KERNEL. This is needed when we perform an immediate
rehash during insertion.

Signed-off-by: Herbert Xu
Acked-by: Thomas Graf
Signed-off-by: David S. Miller

Herbert Xu
2015-03-24 10:07:52 +0800
b824478b2 rhashtable: Add multiple rehash support ... Browse Code »

This patch adds the missing bits to allow multiple rehashes. The
read-side as well as remove already handle this correctly. So it's
only the rehasher and insertion that need modification to handle
this.

Note that this patch doesn't actually enable it so for now rehashing
is still only performed by the worker thread.

This patch also disables the explicit expand/shrink interface because
the table is meant to expand and shrink automatically, and continuing
to export these interfaces unnecessarily complicates the life of the
rehasher since the rehash process is now composed of two parts.

Signed-off-by: Herbert Xu
Acked-by: Thomas Graf
Signed-off-by: David S. Miller

Herbert Xu
2015-03-24 10:07:52 +0800
18093d1c0 rhashtable: Shrink to fit ... Browse Code »

This patch changes rhashtable_shrink to shrink to the smallest
size possible rather than halving the table. This is needed
because with multiple rehashing we will defer shrinking until
all other rehashing is done, meaning that when we do shrink
we may be able to shrink a lot.

Signed-off-by: Herbert Xu
Acked-by: Thomas Graf
Signed-off-by: David S. Miller

Herbert Xu
2015-03-24 10:07:52 +0800
31ccde2da rhashtable: Allow hashfn to be unset ... Browse Code »

Since every current rhashtable user uses jhash as their hash
function, the fact that jhash is an inline function causes each
user to generate a copy of its code.

This function provides a solution to this problem by allowing
hashfn to be unset. In which case rhashtable will automatically
set it to jhash. Furthermore, if the key length is a multiple
of 4, we will switch over to jhash2.

Signed-off-by: Herbert Xu
Acked-by: Thomas Graf
Signed-off-by: David S. Miller

Herbert Xu
2015-03-24 10:07:51 +0800
d88252f9b rhashtable: Add barrier to ensure we see new tables in walker ... Browse Code »

The walker is a lockless reader so it too needs an smp_rmb before
reading the future_tbl field in order to see any new tables that
may contain elements that we should have walked over.

Signed-off-by: Herbert Xu
Acked-by: Thomas Graf
Signed-off-by: David S. Miller

Herbert Xu
2015-03-24 10:07:51 +0800

21 Mar, 2015

1 commit

dc0ee268d rhashtable: Rip out obsolete out-of-line interface ... Browse Code »

Now that all rhashtable users have been converted over to the
inline interface, this patch removes the unused out-of-line
interface.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2015-03-21 04:16:24 +0800