05 Apr, 2016

1 commit

  • In certain cases, the 802.11 mesh pathtable code wants to
    iterate over all of the entries in the forwarding table from
    the receive path, which is inside an RCU read-side critical
    section. Enable walks inside atomic sections by allowing
    GFP_ATOMIC allocations for the walker state.

    Change all existing callsites to pass in GFP_KERNEL.

    Acked-by: Thomas Graf
    Signed-off-by: Bob Copeland
    [also adjust gfs2/glock.c and rhashtable tests]
    Signed-off-by: Johannes Berg

    Bob Copeland
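
    A sketch of the walk API after this change, called from a context
    that already holds rcu_read_lock() (e.g. the rx path); my_ht and
    my_obj are illustrative names and error handling is abbreviated:

      struct rhashtable_iter iter;
      struct my_obj *obj;

      if (rhashtable_walk_init(&my_ht, &iter, GFP_ATOMIC))
              return;                 /* walker-state allocation failed */

      rhashtable_walk_start(&iter);   /* -EAGAIN here is benign */

      while ((obj = rhashtable_walk_next(&iter)) != NULL) {
              if (IS_ERR(obj))
                      continue;       /* ERR_PTR(-EAGAIN): table resized */
              /* ... inspect obj ... */
      }

      rhashtable_walk_stop(&iter);
      rhashtable_walk_exit(&iter);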
     

01 Jan, 2016

1 commit


19 Dec, 2015

1 commit

  • The commit c6ff5268293ef98e48a99597e765ffc417e39fa5 ("rhashtable:
    Fix walker list corruption") causes a suspicious RCU usage warning
    because we no longer hold ht->mutex when we dereference ht->tbl.

    However, this is a false positive because we now hold ht->lock
    which also guarantees that ht->tbl won't disappear from under us.

    This patch kills the warning by using rcu_dereference_protected.

    Reported-by: kernel test robot
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
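
    The pattern the fix applies, sketched (ht->lock is the spinlock
    referred to above):

      spin_lock(&ht->lock);

      /* Not an RCU read-side access: holding ht->lock pins ht->tbl,
       * so name that lock for the RCU checker. */
      tbl = rcu_dereference_protected(ht->tbl, lockdep_is_held(&ht->lock));

      spin_unlock(&ht->lock);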
     

18 Dec, 2015

1 commit


17 Dec, 2015

1 commit

  • The commit ba7c95ea3870fe7b847466d39a049ab6f156aa2c ("rhashtable:
    Fix sleeping inside RCU critical section in walk_stop") introduced
    a new spinlock for the walker list. However, it did not convert
    all existing users of the list over to the new spin lock. Some
    continued to use the old mutex for this purpose. This obviously
    led to corruption of the list.

    The fix is to use the spin lock everywhere where we touch the list.

    This also allows us to do rcu_read_lock before we take the lock in
    rhashtable_walk_start. With the old mutex this would've deadlocked
    but it's safe with the new spin lock.

    Fixes: ba7c95ea3870 ("rhashtable: Fix sleeping inside RCU...")
    Reported-by: Colin Ian King
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
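
    Roughly, the resulting ordering in rhashtable_walk_start (a
    simplified sketch):

      rcu_read_lock();                /* now safe to take first */

      spin_lock(&ht->lock);           /* the walker-list spin lock */
      list_del(&walker->list);        /* detach from the old table */
      spin_unlock(&ht->lock);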
     

16 Dec, 2015

1 commit

  • William Hua wrote:
    >
    > I wasn't aware there was an enforced minimum size. I simply set the
    > nelem_hint in the rhashtable_params struct to 1, expecting it to grow as
    > needed. This caused a segfault afterwards when trying to insert an
    > element.

    OK, we're doing the size computation before we enforce the limit
    on min_size.

    ---8<---
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
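
    A sketch of the corrected ordering: the initial size is computed
    only after min_size has been clamped, so a nelem_hint of 1 can no
    longer produce an undersized table (helper shape as in the
    rhashtable sources of this era):

      static size_t rounded_hashtable_size(const struct rhashtable_params *params)
      {
              return max(roundup_pow_of_two(params->nelem_hint * 4 / 3),
                         (unsigned long)params->min_size);
      }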
     

09 Dec, 2015

1 commit

  • The patch 9497df88ab5567daa001829051c5f87161a81ff0 ("rhashtable:
    Fix reader/rehash race") added a pair of barriers. In fact the
    wmb is superfluous because every subsequent write to the old or
    new hash table uses rcu_assign_pointer, which itself carries a
    full barrier prior to the assignment.

    Therefore we may remove the explicit wmb.

    Signed-off-by: Herbert Xu
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Herbert Xu
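
    For reference, rcu_assign_pointer() reduces to a release store,
    along these (simplified) lines, which already orders all prior
    writes before the pointer is published:

      #define rcu_assign_pointer(p, v)  smp_store_release(&(p), (v))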
     

06 Dec, 2015

1 commit


05 Dec, 2015

2 commits

  • When an rhashtable user pounds rhashtable hard with back-to-back
    insertions we may end up growing the table in GFP_ATOMIC context.
    Unfortunately when the table reaches a certain size this often
    fails because we don't have enough physically contiguous pages
    to hold the new table.

    Eric Dumazet suggested (and in fact wrote this patch) using
    __vmalloc instead which can be used in GFP_ATOMIC context.

    Reported-by: Phil Sutter
    Suggested-by: Eric Dumazet
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
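
    The fallback chain in bucket_table_alloc can be sketched as
    follows (simplified; the exact GFP flags may differ from the
    actual patch):

      size = sizeof(*tbl) + nbuckets * sizeof(tbl->buckets[0]);

      tbl = kzalloc(size, gfp | __GFP_NOWARN | __GFP_NORETRY);
      if (tbl == NULL && gfp == GFP_KERNEL)
              tbl = vzalloc(size);
      if (tbl == NULL && gfp != GFP_KERNEL)
              /* unlike vzalloc, __vmalloc accepts GFP_ATOMIC */
              tbl = __vmalloc(size, gfp | __GFP_ZERO, PAGE_KERNEL);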
     
  • Thomas and Phil observed that under stress rhashtable insertion
    sometimes failed with EBUSY, even though this error should only
    ever be seen when we're under attack and our hash chain length
    has grown to an unacceptable level, even after a rehash.

    It turns out that the logic for detecting whether there is an
    existing rehash is faulty. In particular, when two threads both
    try to grow the same table at the same time, one of them may see
    the newly grown table and thus erroneously conclude that it had
    been rehashed. This is what leads to the EBUSY error.

    This patch fixes this by remembering the current last table we
    used during insertion so that rhashtable_insert_rehash can detect
    when another thread has also done a resize/rehash. When this is
    detected we will give up our resize/rehash and simply retry the
    insertion with the new table.

    Reported-by: Thomas Graf
    Reported-by: Phil Sutter
    Signed-off-by: Herbert Xu
    Tested-by: Phil Sutter
    Signed-off-by: David S. Miller

    Herbert Xu
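
    The detection can be sketched as follows (illustrative, not the
    literal patch): a rehash by another thread shows up as a
    future_tbl already hanging off the table we last used, in which
    case we abandon our own resize/rehash and retry the insertion
    there:

      new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
      if (new_tbl) {
              /* someone else already resized/rehashed */
              return new_tbl;         /* caller retries on the new table */
      }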
     

23 Sep, 2015

1 commit

rhashtable_rehash_one() uses complex logic to update the entry->next
    field, which after INIT_RHT_NULLS_HEAD and NULLS_MARKER expansion
    amounts to:

    entry->next = 1 | ((base + off) << 1)

    This can be compiled along the lines of:

    entry->next = base + off
    entry->next |= 1

    Which will break concurrent readers.

    NULLS value recomputation is not needed here, so just remove
    the complex logic.

    The data race was found with KernelThreadSanitizer (KTSAN).

    Signed-off-by: Dmitry Vyukov
    Acked-by: Eric Dumazet
    Acked-by: Thomas Graf
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Dmitriy Vyukov
     

09 Jul, 2015

1 commit

  • If rhashtable_walk_next detects a resize operation in progress, it jumps
    to the new table and continues walking that one. But it fails to drop
    the reference to its current item, leading it to continue traversing
    the new table's bucket into which the current item was sorted and,
    after reaching that bucket's end, to continue with the new table's
    second bucket instead of the first, thereby potentially missing items.

    This fixes the rhashtable runtime test for me. Bug probably introduced
    by Herbert Xu's patch eddee5ba ("rhashtable: Fix walker behaviour during
    rehash") although not explicitly tested.

    Fixes: eddee5ba ("rhashtable: Fix walker behaviour during rehash")
    Signed-off-by: Phil Sutter
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Phil Sutter
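
    A sketch of the fix (field names from the walker iterator;
    simplified): when jumping to the new table, the current-item
    reference is dropped so the walk restarts from the first bucket:

      if (new_tbl) {
              iter->walker->tbl = new_tbl;
              iter->slot = 0;
              iter->skip = 0;
              iter->p = NULL;         /* drop the stale current item */
              goto restart;
      }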
     

09 Jun, 2015

1 commit


07 Jun, 2015

1 commit


23 May, 2015

1 commit

  • Conflicts:
    drivers/net/ethernet/cadence/macb.c
    drivers/net/phy/phy.c
    include/linux/skbuff.h
    net/ipv4/tcp.c
    net/switchdev/switchdev.c

    Switchdev was a case of RTNH_F_{EXTERNAL --> OFFLOAD}
    renaming overlapping with net-next changes of various
    sorts.

    phy.c was a case of two changes, one adding a local
    variable to a function whilst the second was removing
    one.

    tcp.c overlapped a deadlock fix with the addition of new tcp_info
    statistic values.

    macb.c involved the addition of two Zynq device entries.

    skbuff.h involved adding back ipv4_daddr to nf_bridge_info
    whilst net-next changes put two other existing members of
    that struct into a union.

    Signed-off-by: David S. Miller

    David S. Miller
     

17 May, 2015

1 commit

  • We currently have no limit on the number of elements in a hash table.
    This is a problem because some users (tipc) set a ceiling on the
    maximum table size and when that is reached the hash table may
    degenerate. Others may encounter OOM when growing and if we allow
    insertions when that happens the hash table performance may also
    suffer.

    This patch adds a new parameter insecure_max_entries which becomes
    the cap on the table. If unset it defaults to max_size * 2. If
    it is also zero it means that there is no cap on the number of
    elements in the table. However, the table will grow whenever the
    utilisation hits 100% and if that growth fails, you will get ENOMEM
    on insertion.

    As allowing oversubscription is potentially dangerous, the name
    contains the word insecure.

    Note that the cap is not a hard limit. This is done for performance
    reasons as enforcing a hard limit will result in use of atomic ops
    that are heavier than the ones we currently use.

    The reasoning is that we're only guarding against a gross
    oversubscription of the table, rather than a small breach of the limit.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
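
    A usage sketch with illustrative values (struct my_obj is
    assumed):

      static const struct rhashtable_params my_params = {
              .head_offset          = offsetof(struct my_obj, node),
              .key_offset           = offsetof(struct my_obj, key),
              .key_len              = sizeof(u32),
              .max_size             = 1 << 16,
              /* soft cap; if left unset it defaults to 2 * max_size */
              .insecure_max_entries = 1 << 17,
      };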
     

06 May, 2015

1 commit


23 Apr, 2015

2 commits

  • The current code only stops inserting rehashes into the chain when
    no resizes are currently scheduled. As long as resizes are
    scheduled and insertions occur above the utilization watermark,
    more and more rehashes will be scheduled.

    This led to a perfect DoS storm with thousands of rehashes
    scheduled, which led to thousands of spinlocks being taken
    sequentially.

    Instead, only allow either a series of resizes or a single rehash.
    Drop any further rehashes and return -EBUSY.

    Fixes: ccd57b1bd324 ("rhashtable: Add immediate rehash during insertion")
    Signed-off-by: Thomas Graf
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Thomas Graf
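
    The guard can be sketched as follows (simplified, not the literal
    patch): a rehash is only scheduled when none is already pending;
    otherwise the insertion backs off:

      /* a pending rehash shows up as an existing successor table */
      if (rht_dereference_rcu(old_tbl->future_tbl, ht))
              return -EBUSY;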
     
  • When rhashtable_insert_rehash() fails with ENOMEM, this indicates that
    we can't allocate the necessary memory in the current context but the
    limits as set by the user would still allow the table to grow.

    Thus attempt an async resize in the background where we can allocate
    using GFP_KERNEL which is more likely to succeed. The insertion itself
    will still fail to indicate pressure.

    This fixes a bug where the table would never continue growing once the
    utilization is above 100%.

    Fixes: ccd57b1bd324 ("rhashtable: Add immediate rehash during insertion")
    Signed-off-by: Thomas Graf
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Thomas Graf
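
    Sketch of the fallback (simplified): on ENOMEM from the atomic
    allocation, defer to the worker, which retries with GFP_KERNEL in
    process context, while the current insertion still reports the
    failure:

      if (err == -ENOMEM)
              schedule_work(&ht->run_work);   /* async grow */

      return err;                             /* insertion still fails */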
     

26 Mar, 2015

1 commit

  • nftables sets will be converted to use so-called set extensions,
    moving the key to a non-fixed position. To hash it, obj_hashfn must
    be used; however, it does not yet receive the length parameter.

    Pass the key length to obj_hashfn() and convert existing users.

    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
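
    The resulting hook signature in struct rhashtable_params, plus an
    illustrative user (my_elem is a made-up type):

      u32 (*obj_hashfn)(const void *data, u32 len, u32 seed);

      static u32 my_obj_hashfn(const void *data, u32 len, u32 seed)
      {
              const struct my_elem *e = data;

              return jhash(e->key, len, seed);  /* key at a non-fixed position */
      }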
     

25 Mar, 2015

4 commits


24 Mar, 2015

7 commits

  • The commit 963ecbd41a1026d99ec7537c050867428c397b89 ("rhashtable:
    Fix use-after-free in rhashtable_walk_stop") fixed a real bug
    but created another one because we may end up sleeping inside an
    RCU critical section.

    This patch fixes it properly by replacing the mutex with a spin
    lock that specifically protects the walker lists.

    Reported-by: Sasha Levin
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch reintroduces immediate rehash during insertion. If
    we find during insertion that the table is full or the chain
    length exceeds a set limit (currently 16 but may be disabled
    with insecure_elasticity) then we will force an immediate rehash.
    The rehash will contain an expansion if the table utilisation
    exceeds 75%.

    If this rehash fails then the insertion will fail. Otherwise the
    insertion will be reattempted in the new hash table.

    Signed-off-by: Herbert Xu
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Herbert Xu
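
    The insertion-time trigger, sketched with the constants from the
    message (simplified):

      /* chain longer than the elasticity limit (16 by default) or
       * table full: force an immediate rehash, which also expands
       * the table when utilisation exceeds 75% */
      if (unlikely(rht_grow_above_100(ht, tbl) ||
                   chain_len > ht->elasticity))
              return rhashtable_insert_rehash(ht, tbl);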
     
  • This patch adds the ability to allocate bucket table with GFP_ATOMIC
    instead of GFP_KERNEL. This is needed when we perform an immediate
    rehash during insertion.

    Signed-off-by: Herbert Xu
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch adds the missing bits to allow multiple rehashes. The
    read-side as well as remove already handle this correctly. So it's
    only the rehasher and insertion that need modification to handle
    this.

    Note that this patch doesn't actually enable it so for now rehashing
    is still only performed by the worker thread.

    This patch also disables the explicit expand/shrink interface because
    the table is meant to expand and shrink automatically, and continuing
    to export these interfaces unnecessarily complicates the life of the
    rehasher since the rehash process is now composed of two parts.

    Signed-off-by: Herbert Xu
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch changes rhashtable_shrink to shrink to the smallest
    size possible rather than halving the table. This is needed
    because with multiple rehashing we will defer shrinking until
    all other rehashing is done, meaning that when we do shrink
    we may be able to shrink a lot.

    Signed-off-by: Herbert Xu
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Herbert Xu
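
    The new target size, sketched (simplified): the smallest power of
    two that still leaves headroom, bounded below by min_size:

      unsigned int nelems = atomic_read(&ht->nelems);
      unsigned int size = 0;

      if (nelems)
              size = roundup_pow_of_two(nelems * 3 / 2);
      if (size < ht->p.min_size)
              size = ht->p.min_size;

      if (old_tbl->size <= size)
              return 0;               /* already small enough */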
     
  • Since every current rhashtable user uses jhash as their hash
    function, the fact that jhash is an inline function causes each
    user to generate a copy of its code.

    This patch provides a solution to this problem by allowing
    hashfn to be unset, in which case rhashtable will automatically
    set it to jhash. Furthermore, if the key length is a multiple
    of 4, we will switch over to jhash2.

    Signed-off-by: Herbert Xu
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Herbert Xu
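
    The selection logic, sketched (close to the inline hash helper;
    simplified):

      if (params.hashfn)
              hash = params.hashfn(key, key_len, seed);
      else if (key_len & (sizeof(u32) - 1))
              hash = jhash(key, key_len, seed);       /* arbitrary length */
      else
              hash = jhash2(key, key_len / sizeof(u32), seed); /* u32 words */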
     
  • The walker is a lockless reader so it too needs an smp_rmb before
    reading the future_tbl field in order to see any new tables that
    may contain elements that we should have walked over.

    Signed-off-by: Herbert Xu
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Herbert Xu
     

21 Mar, 2015

3 commits

  • Now that all rhashtable users have been converted over to the
    inline interface, this patch removes the unused out-of-line
    interface.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch deals with the complaint that we make indirect function
    calls on the fast paths unnecessarily in rhashtable. We resolve
    it by moving the fast paths into inline functions that take struct
    rhashtable_params (which obviously must be the same set of parameters
    supplied to rhashtable_init) as an argument.

    The only remaining indirect call is to obj_hashfn (or key_hashfn if
    obj_hashfn is unset) on the rehash as well as the insert-during-
    rehash slow path.

    This patch also extends the support of variable-length keys to
    include those where the key is fixed but scattered in the object.
    For example, in netlink we want to key off the namespace and the
    portid but they're not next to each other.

    This patch does this by directly using the object hash function
    as the indicator of whether the key is accessible or not. It
    also adds a new function obj_cmpfn to compare a key against an
    object. This means that the caller no longer needs to supply
    explicit compare functions.

    All this is done in a backwards compatible manner so no existing
    users are affected until they convert to the new interface.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
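
    A usage sketch of the inline interface (my_obj and the key are
    illustrative): the params struct is passed by value as a
    compile-time constant, letting the compiler inline the hash and
    comparison calls:

      static const struct rhashtable_params my_params = {
              .key_len     = sizeof(u32),
              .key_offset  = offsetof(struct my_obj, key),
              .head_offset = offsetof(struct my_obj, node),
      };

      err = rhashtable_insert_fast(&ht, &obj->node, my_params);
      obj = rhashtable_lookup_fast(&ht, &key, my_params);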
     
  • This patch marks the rhashtable_init params argument const as
    there is no reason to modify it since we will always make a copy
    of it in the rhashtable.

    This patch also fixes a bug where we don't actually round up the
    value of min_size unless it is less than HASH_MIN_SIZE.

    Signed-off-by: Herbert Xu
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Herbert Xu
     

20 Mar, 2015

1 commit

  • Round min_size up and max_size down to the next power
    of two to make sure we always respect the limit specified by the
    user. This is required because we compare the table size against the
    limit before we expand or shrink.

    Also fixes a minor bug where we modified min_size in the params
    provided instead of the copy stored in struct rhashtable.

    Signed-off-by: Thomas Graf
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Thomas Graf
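
    The corresponding adjustment in rhashtable_init, sketched on the
    copy stored in struct rhashtable:

      if (params->min_size)
              ht->p.min_size = roundup_pow_of_two(params->min_size);
      if (params->max_size)
              ht->p.max_size = rounddown_pow_of_two(params->max_size);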
     

19 Mar, 2015

3 commits


17 Mar, 2015

2 commits