05 Apr, 2016

1 commit

  • In certain cases, the 802.11 mesh pathtable code wants to
    iterate over all of the entries in the forwarding table from
    the receive path, which is inside an RCU read-side critical
    section. Enable walks inside atomic sections by allowing
    GFP_ATOMIC allocations for the walker state.

    Change all existing callsites to pass in GFP_KERNEL.

    Acked-by: Thomas Graf
    Signed-off-by: Bob Copeland
    [also adjust gfs2/glock.c and rhashtable tests]
    Signed-off-by: Johannes Berg

    Bob Copeland
     

18 Dec, 2015

1 commit


16 Dec, 2015

1 commit


05 Dec, 2015

1 commit

  • Thomas and Phil observed that under stress rhashtable insertion
    sometimes failed with EBUSY, even though this error should only
    ever been seen when we're under attack and our hash chain length
    has grown to an unacceptable level, even after a rehash.

    It turns out that the logic for detecting whether there is an
    existing rehash is faulty. In particular, when two threads both
    try to grow the same table at the same time, one of them may see
    the newly grown table and thus erroneously conclude that it had
    been rehashed. This is what leads to the EBUSY error.

    This patch fixes this by remembering the current last table we
    used during insertion so that rhashtable_insert_rehash can detect
    when another thread has also done a resize/rehash. When this is
    detected we will give up our resize/rehash and simply retry the
    insertion with the new table.

    Reported-by: Thomas Graf
    Reported-by: Phil Sutter
    Signed-off-by: Herbert Xu
    Tested-by: Phil Sutter
    Signed-off-by: David S. Miller

    Herbert Xu
     

17 May, 2015

1 commit

  • We currently have no limit on the number of elements in a hash table.
    This is a problem because some users (tipc) set a ceiling on the
    maximum table size and when that is reached the hash table may
    degenerate. Others may encounter OOM when growing and if we allow
    insertions when that happens the hash table perofrmance may also
    suffer.

    This patch adds a new paramater insecure_max_entries which becomes
    the cap on the table. If unset it defaults to max_size * 2. If
    it is also zero it means that there is no cap on the number of
    elements in the table. However, the table will grow whenever the
    utilisation hits 100% and if that growth fails, you will get ENOMEM
    on insertion.

    As allowing oversubscription is potentially dangerous, the name
    contains the word insecure.

    Note that the cap is not a hard limit. This is done for performance
    reasons as enforcing a hard limit will result in use of atomic ops
    that are heavier than the ones we currently use.

    The reasoning is that we're only guarding against a gross over-
    subscription of the table, rather than a small breach of the limit.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

24 Apr, 2015

1 commit

  • The conversion of mac80211's station table to rhashtable had a bug
    that I found by accident in code review, that hadn't been found as
    rhashtable apparently managed to have a maximum hash chain length
    of one (!) in all our testing.

    In order to test the bug and verify the fix I set my rhashtable's
    max_size very low (4) in order to force getting hash collisions.

    At that point, rhashtable WARNed in rhashtable_insert_rehash() but
    didn't actually reject the hash table insertion. This caused it to
    lose insertions - my master list of stations would have 9 entries,
    but the rhashtable only had 5. This may warrant a deeper look, but
    that WARN_ON() just shouldn't happen.

    Fix this by not returning true from rht_grow_above_100() when the
    rhashtable's max_size has been reached - in this case the user is
    explicitly configuring it to be at most that big, so even if it's
    now above 100% it shouldn't attempt to resize.

    This fixes the "lost insertion" issue and consequently allows my
    code to display its error (and verify my fix for it.)

    Signed-off-by: Johannes Berg
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Johannes Berg
     

26 Mar, 2015

1 commit

  • nftables sets will be converted to use so called setextensions, moving
    the key to a non-fixed position. To hash it, the obj_hashfn must be used,
    however it so far doesn't receive the length parameter.

    Pass the key length to obj_hashfn() and convert existing users.

    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
     

25 Mar, 2015

5 commits


24 Mar, 2015

5 commits

  • The commit 963ecbd41a1026d99ec7537c050867428c397b89 ("rhashtable:
    Fix use-after-free in rhashtable_walk_stop") fixed a real bug
    but created another one because we may end up sleeping inside an
    RCU critical section.

    This patch fixes it properly by replacing the mutex with a spin
    lock that specifically protects the walker lists.

    Reported-by: Sasha Levin
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch reintroduces immediate rehash during insertion. If
    we find during insertion that the table is full or the chain
    length exceeds a set limit (currently 16 but may be disabled
    with insecure_elasticity) then we will force an immediate rehash.
    The rehash will contain an expansion if the table utilisation
    exceeds 75%.

    If this rehash fails then the insertion will fail. Otherwise the
    insertion will be reattempted in the new hash table.

    Signed-off-by: Herbert Xu
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch adds the missing bits to allow multiple rehashes. The
    read-side as well as remove already handle this correctly. So it's
    only the rehasher and insertion that need modification to handle
    this.

    Note that this patch doesn't actually enable it so for now rehashing
    is still only performed by the worker thread.

    This patch also disables the explicit expand/shrink interface because
    the table is meant to expand and shrink automatically, and continuing
    to export these interfaces unnecessarily complicates the life of the
    rehasher since the rehash process is now composed of two parts.

    Signed-off-by: Herbert Xu
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Since every current rhashtable user uses jhash as their hash
    function, the fact that jhash is an inline function causes each
    user to generate a copy of its code.

    This function provides a solution to this problem by allowing
    hashfn to be unset. In which case rhashtable will automatically
    set it to jhash. Furthermore, if the key length is a multiple
    of 4, we will switch over to jhash2.

    Signed-off-by: Herbert Xu
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • When rht_key_hashfn is called from rhashtable itself and params
    is equal to ht->p, there is no point in checking params.key_len
    and falling back to ht->p.key_len.

    For some reason gcc couldn't figure out that params is the same
    as ht->p. So let's help it by only checking params.key_len when
    it's a constant.

    Signed-off-by: Herbert Xu
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Herbert Xu
     

21 Mar, 2015

4 commits

  • We need to include linux/errno.h in rhashtable.h since it doesn't
    always get included otherwise.

    Reported-by: kbuild test robot
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Now that all rhashtable users have been converted over to the
    inline interface, this patch removes the unused out-of-line
    interface.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch deals with the complaint that we make indirect function
    calls on the fast paths unnecessarily in rhashtable. We resolve
    it by moving the fast paths into inline functions that take struct
    rhashtable_param (which obviously must be the same set of parameters
    supplied to rhashtable_init) as an argument.

    The only remaining indirect call is to obj_hashfn (or key_hashfn it
    obj_hashfn is unset) on the rehash as well as the insert-during-
    rehash slow path.

    This patch also extends the support of vairable-length keys to
    include those where the key is fixed but scattered in the object.
    For example, in netlink we want to key off the namespace and the
    portid but they're not next to each other.

    This patch does this by directly using the object hash function
    as the indicator of whether the key is accessible or not. It
    also adds a new function obj_cmpfn to compare a key against an
    object. This means that the caller no longer needs to supply
    explicit compare functions.

    All this is done in a backwards compatible manner so no existing
    users are affected until they convert to the new interface.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch marks the rhashtable_init params argument const as
    there is no reason to modify it since we will always make a copy
    of it in the rhashtable.

    This patch also fixes a bug where we don't actually round up the
    value of min_size unless it is less than HASH_MIN_SIZE.

    Signed-off-by: Herbert Xu
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Herbert Xu
     

19 Mar, 2015

3 commits


15 Mar, 2015

4 commits

  • This patch moves future_tbl to open up the possibility of having
    multiple rehashes on the same table.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch adds a rehash counter to bucket_table to indicate
    the last bucket that has been rehashed. This serves two purposes:

    1. Any bucket that has been rehashed can never gain a new object.
    2. If the rehash counter reaches the size of the table, the table
    will forever remain empty.

    This patch also downsizes bucket_table->size to an unsigned int
    since we do not support sizes greater than 32 bits yet.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • There is in fact no need to wait for an RCU grace period in the
    rehash function, since all insertions are guaranteed to go into
    the new table through spin locks.

    This patch uses call_rcu to free the old/rehashed table at our
    leisure.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Previously whenever the walker encountered a resize it simply
    snaps back to the beginning and starts again. However, this only
    works if the rehash started and completed while the walker was
    idle.

    If the walker attempts to restart while the rehash is still ongoing,
    we may miss objects that we shouldn't have.

    This patch fixes this by making the walker walk the old table
    followed by the new table just like all other readers. If a
    rehash is detected we will still signal our caller of the fact
    so they can prepare for duplicates but we will simply continue
    the walk onto the new table after the old one is finished either
    by us or by the rehasher.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

13 Mar, 2015

1 commit

  • Commit c0c09bfdc415 ("rhashtable: avoid unnecessary wakeup for worker
    queue") changed ht->shift to be atomic, which is actually unnecessary.

    Instead of leaving the current shift in the core rhashtable structure,
    it can be cached inside the individual bucket tables.

    There, it will only be initialized once during a new table allocation
    in the shrink/expansion slow path, and from then onward it stays immutable
    for the rest of the bucket table liftime.

    That allows shift to be non-atomic. The patch also moves hash_rnd
    management into the table setup. The rhashtable structure now consumes
    3 instead of 4 cachelines.

    Signed-off-by: Daniel Borkmann
    Cc: Ying Xue
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

12 Mar, 2015

1 commit

  • Currently hash_rnd is a parameter that users can set. However,
    no existing users set this parameter. It is also something that
    people are unlikely to want to set directly since it's just a
    random number.

    In preparation for allowing the reseeding/rehashing of rhashtable,
    this patch moves hash_rnd into bucket_table so that it's now an
    internal state rather than a parameter.

    Signed-off-by: Herbert Xu
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Herbert Xu
     

28 Feb, 2015

1 commit

  • Currently, all real users of rhashtable default their grow and shrink
    decision functions to rht_grow_above_75() and rht_shrink_below_30(),
    so that there's currently no need to have this explicitly selectable.

    It can/should be generic and private inside rhashtable until a real
    use case pops up. Since we can make this private, we'll save us this
    additional indirection layer and can improve insertion/deletion time
    as well.

    Reference: http://patchwork.ozlabs.org/patch/443040/
    Suggested-by: David S. Miller
    Signed-off-by: Daniel Borkmann
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

22 Feb, 2015

1 commit


05 Feb, 2015

1 commit

  • Some existing rhashtable users get too intimate with it by walking
    the buckets directly. This prevents us from easily changing the
    internals of rhashtable.

    This patch adds the helpers rhashtable_walk_init/exit/start/next/stop
    which will replace these custom walkers.

    They are meant to be usable for both procfs seq_file walks as well
    as walking by a netlink dump. The iterator structure should fit
    inside a netlink dump cb structure, with at least one element to
    spare.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

26 Jan, 2015

1 commit


16 Jan, 2015

1 commit

  • When we put our declared work task in the global workqueue with
    schedule_delayed_work(), its delay parameter is always zero.
    Therefore, we should define a regular work in rhashtable structure
    instead of a delayed work.

    By the way, we add a condition to check whether resizing functions
    are NULL before cancelling the work, avoiding to cancel an
    uninitialized work.

    Lastly, while we wait for all work items we submitted before to run
    to completion with cancel_delayed_work(), ht->mutex has been taken in
    rhashtable_destroy(). Moreover, cancel_delayed_work() doesn't return
    until all work items are accomplished, and when work items are
    scheduled, the work's function - rht_deferred_worker() will be called.
    However, as rht_deferred_worker() also needs to acquire the lock,
    deadlock might happen at the moment as the lock is already held before.
    So if the cancel work function is moved out of the lock covered scope,
    this will avoid the deadlock.

    Fixes: 97defe1 ("rhashtable: Per bucket locks & deferred expansion/shrinking")
    Signed-off-by: Ying Xue
    Cc: Thomas Graf
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Ying Xue
     

14 Jan, 2015

2 commits

  • As commit c0c09bfdc415 ("rhashtable: avoid unnecessary wakeup for
    worker queue") moves condition statements of verifying whether hash
    table size exceeds its maximum threshold or reaches its minimum
    threshold from resizing functions to resizing decision functions,
    we should add a note in rhashtable.h to indicate the implementation
    of what the grow and shrink decision function must enforce min/max
    shift, otherwise, it's failed to take min/max shift's set watermarks
    into effect.

    Signed-off-by: Ying Xue
    Cc: Thomas Graf
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Ying Xue
     
  • Introduce a new function called rhashtable_lookup_compare_insert()
    which is very similar to rhashtable_lookup_insert(). But the former
    makes use of users' given compare function to look for an object,
    and then inserts it into hash table if found. As the entire process
    of search and insertion is under protection of per bucket lock, this
    can help users to avoid the involvement of extra lock.

    Signed-off-by: Ying Xue
    Cc: Thomas Graf
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Ying Xue
     

09 Jan, 2015

2 commits

  • Move condition statements of verifying whether hash table size exceeds
    its maximum threshold or reaches its minimum threshold from resizing
    functions to resizing decision functions, avoiding unnecessary wakeup
    for worker queue thread.

    Signed-off-by: Ying Xue
    Cc: Thomas Graf
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Ying Xue
     
  • Involve a new function called rhashtable_lookup_insert() which makes
    lookup and insertion atomic under bucket lock protection, helping us
    avoid to introduce an extra lock when we search and insert an object
    into hash table.

    Signed-off-by: Ying Xue
    Signed-off-by: Thomas Graf
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Ying Xue
     

05 Jan, 2015

1 commit

  • Fixup below build error:

    include/linux/rhashtable.h: At top level:
    include/linux/rhashtable.h:118:34: error: field ‘mutex’ has incomplete type

    Signed-off-by: Ying Xue
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Ying Xue