03 Apr, 2019

1 commit

  • [ Upstream commit 408f13ef358aa5ad56dc6230c2c7deb92cf462b1 ]

    As it stands if a shrink is delayed because of an outstanding
    rehash, we will go into a rescheduling loop without ever doing
    the rehash.

    This patch fixes this by still carrying out the rehash and then
    rescheduling so that we can shrink after the completion of the
    rehash should it still be necessary.

    The return value of EEXIST captures this case and other cases
    (e.g., another thread expanded/rehashed the table at the same
    time) where we should still proceed with the rehash.

    Fixes: da20420f83ea ("rhashtable: Add nested tables")
    Reported-by: Josh Elsasser
    Signed-off-by: Herbert Xu
    Tested-by: Josh Elsasser
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Herbert Xu
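
    A hedged sketch of the deferred-work flow this entry describes; try_shrink()
    and do_rehash() are hypothetical stand-ins for the internal helpers in
    lib/rhashtable.c, not their real names or signatures.

        static int try_shrink(struct rhashtable *ht);   /* hypothetical */
        static int do_rehash(struct rhashtable *ht);    /* hypothetical */

        static void deferred_worker_sketch(struct rhashtable *ht)
        {
                int err;

                mutex_lock(&ht->mutex);
                err = try_shrink(ht);   /* -EEXIST while a rehash is outstanding */

                /* Previously -EEXIST led straight to a reschedule, looping
                 * forever; now the pending rehash is carried out first. */
                if (!err || err == -EEXIST)
                        err = do_rehash(ht) ?: err;

                mutex_unlock(&ht->mutex);

                if (err)        /* -EEXIST survives, so the shrink is retried
                                 * after the rehash has completed */
                        schedule_work(&ht->run_work);
        }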
     

28 Aug, 2018

1 commit

  • Pull networking fixes from David Miller:

    1) ICE, E1000, IGB, IXGBE, and I40E bug fixes from the Intel folks.

    2) Better fix for AB-BA deadlock in packet scheduler code, from Cong
    Wang.

    3) bpf sockmap fixes (zero sized key handling, etc.) from Daniel
    Borkmann.

    4) Send zero IPID in TCP resets and SYN-RECV state ACKs, to prevent
    attackers using it as a side-channel. From Eric Dumazet.

    5) Memory leak in mediatek bluetooth driver, from Gustavo A. R. Silva.

    6) Hook up rt->dst.input of ipv6 anycast routes properly, from Hangbin
    Liu.

    7) hns and hns3 bug fixes from Huazhong Tan.

    8) Fix RIF leak in mlxsw driver, from Ido Schimmel.

    9) iova range check fix in vhost, from Jason Wang.

    10) Fix hang in do_tcp_sendpages() with tls, from John Fastabend.

    11) More r8152 chips need to disable RX aggregation, from Kai-Heng Feng.

    12) Memory exposure in TCA_U32_SEL handling, from Kees Cook.

    13) TCP BBR congestion control fixes from Kevin Yang.

    14) hv_netvsc, ignore non-PCI devices, from Stephen Hemminger.

    15) qed driver fixes from Tomer Tayar.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (77 commits)
    net: sched: Fix memory exposure from short TCA_U32_SEL
    qed: fix spelling mistake "comparsion" -> "comparison"
    vhost: correctly check the iova range when waking virtqueue
    qlge: Fix netdev features configuration.
    net: macb: do not disable MDIO bus at open/close time
    Revert "net: stmmac: fix build failure due to missing COMMON_CLK dependency"
    net: macb: Fix regression breaking non-MDIO fixed-link PHYs
    mlxsw: spectrum_switchdev: Do not leak RIFs when removing bridge
    i40e: fix condition of WARN_ONCE for stat strings
    i40e: Fix for Tx timeouts when interface is brought up if DCB is enabled
    ixgbe: fix driver behaviour after issuing VFLR
    ixgbe: Prevent unsupported configurations with XDP
    ixgbe: Replace GFP_ATOMIC with GFP_KERNEL
    igb: Replace mdelay() with msleep() in igb_integrated_phy_loopback()
    igb: Replace GFP_ATOMIC with GFP_KERNEL in igb_sw_init()
    igb: Use an advanced ctx descriptor for launchtime
    e1000: ensure to free old tx/rx rings in set_ringparam()
    e1000: check on netif_running() before calling e1000_up()
    ixgb: use dma_zalloc_coherent instead of allocator/memset
    ice: Trivial formatting fixes
    ...

    Linus Torvalds
     

23 Aug, 2018

2 commits

  • rhashtable_init() may fail due to -ENOMEM, thus making the entire API
    unusable. This patch removes this scenario, however unlikely. In order
    to guarantee the memory allocations, this patch always ends up doing
    GFP_KERNEL|__GFP_NOFAIL for both the table allocation and
    alloc_bucket_spinlocks().

    Upon the first table allocation failure, we shrink the size to the
    smallest value that makes sense and retry with __GFP_NOFAIL semantics.
    With the defaults, this means that from 64 buckets, we retry with only 4.
    Any later issues regarding performance due to collisions or larger table
    resizing (when more memory becomes available) are the least of our
    problems.

    Link: http://lkml.kernel.org/r/20180712185241.4017-9-manfred@colorfullife.com
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Manfred Spraul
    Acked-by: Herbert Xu
    Cc: Dmitry Vyukov
    Cc: Kees Cook
    Cc: Michael Kerrisk
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
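
    A hedged sketch of the fallback described above, in an rhashtable_init()-like
    context; bucket_table_alloc() and HASH_MIN_SIZE are the internal helper and
    constant from lib/rhashtable.c, and this is an illustration, not the verbatim
    patch.

        tbl = bucket_table_alloc(ht, size, GFP_KERNEL);
        if (unlikely(tbl == NULL)) {
                /* Retry at the smallest sensible size and refuse to fail,
                 * so rhashtable_init() can no longer return -ENOMEM. */
                size = max_t(u16, ht->p.min_size, HASH_MIN_SIZE);
                tbl = bucket_table_alloc(ht, size, GFP_KERNEL | __GFP_NOFAIL);
        }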
     
  • As of ce91f6ee5b3b ("mm: kvmalloc does not fallback to vmalloc for
    incompatible gfp flags") we can simplify the caller and trust kvzalloc()
    to just do the right thing. For the GFP_ATOMIC context we can drop the
    __GFP_NORETRY flag for obvious reasons; for the __GFP_NOWARN case, the
    caller now passes the flag instead of bucket_table_alloc() handling it.

    This slightly changes the gfp flags passed on to nested_table_alloc(),
    as it will now also use GFP_ATOMIC | __GFP_NOWARN. However, I consider
    this a positive consequence, since we want nowarn semantics in
    bucket_table_alloc() for the same reasons.

    [manfred@colorfullife.com: commit id extended to 12 digits, line wraps updated]
    Link: http://lkml.kernel.org/r/20180712185241.4017-8-manfred@colorfullife.com
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Manfred Spraul
    Acked-by: Michal Hocko
    Cc: Dmitry Vyukov
    Cc: Herbert Xu
    Cc: Kees Cook
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

21 Aug, 2018

1 commit


21 Jul, 2018

1 commit


19 Jul, 2018

1 commit

  • rhashtable_init() currently does not take into account the user-passed
    min_size parameter unless param->nelem_hint is set as well. As such,
    the default size (number of buckets) will always be HASH_DEFAULT_SIZE
    even if the smallest allowed size is larger than that. Remediate this
    by unconditionally calling into rounded_hashtable_size() and handling
    things accordingly.

    Signed-off-by: Davidlohr Bueso
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Davidlohr Bueso
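
    A hedged example of a table definition this change affects; struct test_obj
    and the parameter values are assumptions for illustration.

        #include <linux/rhashtable.h>

        struct test_obj {
                u32 key;
                struct rhash_head node;
        };

        /* With the fix, this table starts out with 1024 buckets even
         * though .nelem_hint is left unset; previously it silently
         * started at HASH_DEFAULT_SIZE (64). */
        static const struct rhashtable_params test_params = {
                .head_offset = offsetof(struct test_obj, node),
                .key_offset  = offsetof(struct test_obj, key),
                .key_len     = sizeof(u32),
                .min_size    = 1024,
        };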
     

10 Jul, 2018

1 commit

  • rhashtable_free_and_destroy() cancels the deferred re-hash work and
    then walks and destroys elements. At that moment, some elements can
    still be in future_tbl, and those elements are not destroyed.

    Test case:
    nft_rhash_destroy() calls rhashtable_free_and_destroy() to destroy
    all elements of sets before destroying sets and chains.
    But rhashtable_free_and_destroy() doesn't destroy elements of future_tbl,
    so the splat below occurred.

    test script:
    %cat test.nft
    table ip aa {
            map map1 {
                    type ipv4_addr : verdict;
                    elements = {
                            0 : jump a0,
                            1 : jump a0,
                            2 : jump a0,
                            3 : jump a0,
                            4 : jump a0,
                            5 : jump a0,
                            6 : jump a0,
                            7 : jump a0,
                            8 : jump a0,
                            9 : jump a0,
                    }
            }
            chain a0 {
            }
    }
    flush ruleset
    table ip aa {
            map map1 {
                    type ipv4_addr : verdict;
                    elements = {
                            0 : jump a0,
                            1 : jump a0,
                            2 : jump a0,
                            3 : jump a0,
                            4 : jump a0,
                            5 : jump a0,
                            6 : jump a0,
                            7 : jump a0,
                            8 : jump a0,
                            9 : jump a0,
                    }
            }
            chain a0 {
            }
    }
    flush ruleset

    %while :; do nft -f test.nft; done

    Splat looks like:
    [ 200.795603] kernel BUG at net/netfilter/nf_tables_api.c:1363!
    [ 200.806944] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
    [ 200.812253] CPU: 1 PID: 1582 Comm: nft Not tainted 4.17.0+ #24
    [ 200.820297] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
    [ 200.830309] RIP: 0010:nf_tables_chain_destroy.isra.34+0x62/0x240 [nf_tables]
    [ 200.838317] Code: 43 50 85 c0 74 26 48 8b 45 00 48 8b 4d 08 ba 54 05 00 00 48 c7 c6 60 6d 29 c0 48 c7 c7 c0 65 29 c0 4c 8b 40 08 e8 58 e5 fd f8 0b 48 89 da 48 b8 00 00 00 00 00 fc ff
    [ 200.860366] RSP: 0000:ffff880118dbf4d0 EFLAGS: 00010282
    [ 200.866354] RAX: 0000000000000061 RBX: ffff88010cdeaf08 RCX: 0000000000000000
    [ 200.874355] RDX: 0000000000000061 RSI: 0000000000000008 RDI: ffffed00231b7e90
    [ 200.882361] RBP: ffff880118dbf4e8 R08: ffffed002373bcfb R09: ffffed002373bcfa
    [ 200.890354] R10: 0000000000000000 R11: ffffed002373bcfb R12: dead000000000200
    [ 200.898356] R13: dead000000000100 R14: ffffffffbb62af38 R15: dffffc0000000000
    [ 200.906354] FS: 00007fefc31fd700(0000) GS:ffff88011b800000(0000) knlGS:0000000000000000
    [ 200.915533] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 200.922355] CR2: 0000557f1c8e9128 CR3: 0000000106880000 CR4: 00000000001006e0
    [ 200.930353] Call Trace:
    [ 200.932351] ? nf_tables_commit+0x26f6/0x2c60 [nf_tables]
    [ 200.939525] ? nf_tables_setelem_notify.constprop.49+0x1a0/0x1a0 [nf_tables]
    [ 200.947525] ? nf_tables_delchain+0x6e0/0x6e0 [nf_tables]
    [ 200.952383] ? nft_add_set_elem+0x1700/0x1700 [nf_tables]
    [ 200.959532] ? nla_parse+0xab/0x230
    [ 200.963529] ? nfnetlink_rcv_batch+0xd06/0x10d0 [nfnetlink]
    [ 200.968384] ? nfnetlink_net_init+0x130/0x130 [nfnetlink]
    [ 200.975525] ? debug_show_all_locks+0x290/0x290
    [ 200.980363] ? debug_show_all_locks+0x290/0x290
    [ 200.986356] ? sched_clock_cpu+0x132/0x170
    [ 200.990352] ? find_held_lock+0x39/0x1b0
    [ 200.994355] ? sched_clock_local+0x10d/0x130
    [ 200.999531] ? memset+0x1f/0x40

    V2:
    - free all tables, as requested by Herbert Xu

    Signed-off-by: Taehee Yoo
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Taehee Yoo
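
    A hedged usage sketch of the API in question; free_obj(), teardown() and
    test_ht are assumptions for the example. The point of the fix is that the
    callback now also reaches elements still sitting in an unfinished future_tbl.

        static struct rhashtable test_ht;       /* assumed initialised elsewhere */

        static void free_obj(void *ptr, void *arg)
        {
                kfree(ptr);                     /* invoked once per element */
        }

        static void teardown(void)
        {
                /* Tears down the table and every element it still holds. */
                rhashtable_free_and_destroy(&test_ht, free_obj, NULL);
        }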
     

03 Jul, 2018

1 commit

  • In lib/rhashtable.c, line 777, the skip variable is assigned to
    itself. The following error was observed:

    lib/rhashtable.c:777:41: warning: explicitly assigning value of
    variable of type 'int' to itself [-Wself-assign] error, forbidden
    warning: rhashtable.c:777

    This error was found when compiling with Clang 6.0. Change the
    assignment to iter->skip.

    Signed-off-by: Rishabh Bhatnagar
    Acked-by: Herbert Xu
    Reviewed-by: NeilBrown
    Signed-off-by: David S. Miller

    Rishabh Bhatnagar
     

22 Jun, 2018

6 commits

  • Using rht_dereference_bucket() to dereference
    ->future_tbl looks like a type error, and could be confusing.
    Using rht_dereference_rcu() to test a pointer for NULL
    adds an unnecessary barrier - rcu_access_pointer() is preferred
    for NULL tests when no lock is held.

    This patch uses three different ways to access ->future_tbl:
    - if we know the mutex is held, use rht_dereference()
    - if we don't hold the mutex, and are only testing for NULL,
    use rcu_access_pointer()
    - otherwise (using RCU protection for true dereference),
    use rht_dereference_rcu().

    Note that this includes a simplification of the call to
    rhashtable_last_table() - we don't do an extra dereference
    before the call any more.

    Acked-by: Herbert Xu
    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
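
    A hedged sketch of the three access patterns, wrapped in a hypothetical
    helper; rht_dereference(), rcu_access_pointer() and rht_dereference_rcu()
    are the real accessors, the surrounding function is not.

        static struct bucket_table *future_tbl_sketch(struct rhashtable *ht,
                                                      struct bucket_table *tbl,
                                                      bool have_mutex)
        {
                if (have_mutex)
                        /* ht->mutex held: lockdep-checked dereference */
                        return rht_dereference(tbl->future_tbl, ht);

                /* No lock held and only testing for NULL: no barrier needed */
                if (!rcu_access_pointer(tbl->future_tbl))
                        return NULL;

                /* True dereference under rcu_read_lock() */
                return rht_dereference_rcu(tbl->future_tbl, ht);
        }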
     
  • Rather than borrowing one of the bucket locks to
    protect ->future_tbl updates, use cmpxchg().
    This gives more freedom to change how bucket locking
    is implemented.

    Acked-by: Herbert Xu
    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
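
    A hedged sketch of the publish step (illustrative, not the verbatim hunk):
    whichever thread wins the cmpxchg() attaches its new table, the loser backs
    off with -EEXIST.

        static int publish_future_tbl(struct bucket_table *old_tbl,
                                      struct bucket_table *new_tbl)
        {
                /* No bucket lock needed: the first writer wins. */
                if (cmpxchg(&old_tbl->future_tbl, NULL, new_tbl) != NULL)
                        return -EEXIST;
                return 0;
        }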
     
  • Now that we don't use the hash value or shift in nested_table_alloc(),
    there is room for simplification.
    We only need to pass an "is this a leaf" flag to nested_table_alloc(),
    and don't need to track as much information in
    rht_bucket_nested_insert().

    Note there is another minor cleanup in nested_table_alloc() here.
    The number of elements in a page of "union nested_tables" is most naturally

    PAGE_SIZE / sizeof(ntbl[0])

    The previous code had

    PAGE_SIZE / sizeof(ntbl[0].bucket)

    which happens to be the correct value only because the bucket uses all
    the space in the union.

    Acked-by: Herbert Xu
    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
     
  • The 'ht' and 'hash' arguments to INIT_RHT_NULLS_HEAD() are
    no longer used - so drop them. This allows us to also
    remove the nhash argument from nested_table_alloc().

    Acked-by: Herbert Xu
    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
     
  • This "feature" is unused, undocumented, and untested and so doesn't
    really belong. A patch is under development to properly implement
    support for detecting when a search gets diverted down a different
    chain, which is the common purpose of nulls markers.

    This patch actually fixes a bug too. The table resizing allows a
    table to grow to 2^31 buckets, but the hash is truncated to 27 bits -
    any growth beyond 2^27 is wasteful and ineffective.

    This patch results in NULLS_MARKER(0) being used for all chains,
    and leaves the use of rht_is_a_null() to test for it.

    Acked-by: Herbert Xu
    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
     
  • Due to the use of rhashtables in net namespaces,
    rhashtable.h is included in lots of the kernel,
    so a small change can require a large recompilation.
    This makes development painful.

    This patch splits out rhashtable-types.h which just includes
    the major type declarations, and does not include (non-trivial)
    inline code. rhashtable.h is no longer included by anything
    in the include/ directory.
    Common include files only include rhashtable-types.h so a large
    recompilation is only triggered when that changes.

    Acked-by: Herbert Xu
    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
     

25 Apr, 2018

3 commits

  • When a walk of an rhashtable is interrupted with rhashtable_walk_stop()
    and then rhashtable_walk_start(), the location to restart from is based
    on a 'skip' count in the current hash chain, and this can be incorrect
    if insertions or deletions have happened. This does not happen when
    the walk is not stopped and started, as iter->p is a placeholder which
    is safe to use while holding the RCU read lock.

    In rhashtable_walk_start() we can revalidate that 'p' is still in the
    same hash chain. If it isn't then the current method is still used.

    With this patch, if a rhashtable walker ensures that the current
    object remains in the table over a stop/start period (possibly by
    elevating the reference count if that is sufficient), it can be sure
    that a walk will not miss objects that were in the hashtable for the
    whole time of the walk.

    rhashtable_walk_start() may not find the object even though it is
    still in the hashtable if a rehash has moved it to a new table. In
    this case it will (eventually) get -EAGAIN and will need to proceed
    through the whole table again to be sure to see everything at least
    once.

    Acked-by: Herbert Xu
    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
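
    A hedged usage sketch of a walk with a stop/start gap; walk_sketch() is an
    assumption for the example, the walker API calls are real.

        #include <linux/rhashtable.h>

        static void walk_sketch(struct rhashtable *ht)
        {
                struct rhashtable_iter iter;
                void *obj;

                rhashtable_walk_enter(ht, &iter);
                rhashtable_walk_start(&iter);

                while ((obj = rhashtable_walk_next(&iter)) != NULL) {
                        if (IS_ERR(obj)) {
                                if (PTR_ERR(obj) == -EAGAIN)
                                        continue;       /* rehash: rescan */
                                break;
                        }
                        /* To sleep here, keep obj alive (e.g. take a
                         * reference), call rhashtable_walk_stop(&iter),
                         * do the sleeping work, then
                         * rhashtable_walk_start(&iter). With this patch
                         * the walk resumes at obj rather than relying on
                         * a possibly stale skip count. */
                }

                rhashtable_walk_stop(&iter);
                rhashtable_walk_exit(&iter);
        }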
     
  • The documentation claims that when rhashtable_walk_start_check()
    detects a resize event, it will rewind back to the beginning
    of the table. This is not true. We need to set ->slot and
    ->skip to be zero for it to be true.

    Acked-by: Herbert Xu
    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
     
  • Neither rhashtable_walk_enter() nor rhltable_walk_enter() sleeps, though
    they do take a spinlock without irq protection.
    So revise the comments to accurately state the contexts in which
    these functions can be called.

    Acked-by: Herbert Xu
    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
     

01 Apr, 2018

1 commit

  • Rehashing and destroying a large hash table takes a lot of time
    and happens in process context. It is safe to add cond_resched()
    in rhashtable_rehash_table() and rhashtable_free_and_destroy().

    Signed-off-by: Eric Dumazet
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Eric Dumazet
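
    A hedged sketch of where the yields go (not the verbatim hunks): the long
    per-bucket loops in rehash and teardown now give up the CPU periodically.

        for (i = 0; i < tbl->size; i++) {
                cond_resched();
                /* ...free or relink every entry chained in bucket i... */
        }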
     

07 Mar, 2018

1 commit

  • When inserting duplicate objects (those with the same key), the
    current rhlist implementation messes up the chain pointers by
    updating the bucket pointer instead of the previous node's next
    pointer to the newly inserted node. This causes missing elements
    on removal and traversal.

    Fix that by properly updating the pprev pointer to point to
    the correct rhash_head next pointer.

    Issue: 1241076
    Change-Id: I86b2c140bcb4aeb10b70a72a267ff590bb2b17e7
    Fixes: ca26893f05e8 ('rhashtable: Add rhlist interface')
    Signed-off-by: Paul Blakey
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Paul Blakey
     

11 Dec, 2017

3 commits

  • To allocate the array of bucket locks for the hash table we now
    call library function alloc_bucket_spinlocks. This function is
    based on the old alloc_bucket_locks in rhashtable and should
    produce the same effect.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     
  • This function is like rhashtable_walk_next except that it only returns
    the current element in the iter and does not advance the iter.

    This patch also creates __rhashtable_walk_find_next. It finds the next
    element in the table when the entry cached in iter is NULL or at the end
    of a slot. __rhashtable_walk_find_next is called from
    rhashtable_walk_next and rhashtable_walk_peek.

    end_of_table is an added field to the iter structure. This indicates
    that the end of table was reached (walker.tbl being NULL is not a
    sufficient condition for end of table).

    Signed-off-by: Tom Herbert
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Tom Herbert
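
    A hedged usage sketch of the new helper; iter is assumed to be an active
    struct rhashtable_iter on which a walk has been started.

        void *obj = rhashtable_walk_peek(&iter);

        if (IS_ERR(obj)) {
                /* e.g. -EAGAIN after a resize, handled as for walk_next */
        } else if (obj) {
                /* current element; the iterator has not been advanced */
        }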
     
  • Most callers of rhashtable_walk_start don't care about a resize event
    which is indicated by a return value of -EAGAIN. So calls to
    rhashtable_walk_start are wrapped with code to ignore -EAGAIN. Something
    like this is common:

        ret = rhashtable_walk_start(rhiter);
        if (ret && ret != -EAGAIN)
                goto out;

    Since zero and -EAGAIN are the only possible return values from the
    function this check is pointless. The condition never evaluates to true.

    This patch changes rhashtable_walk_start to return void. This simplifies
    code for the callers that ignore -EAGAIN. For the few cases where the
    caller cares about the resize event, particularly where the table can be
    walked in multiple parts for netlink or seq file dump, the function
    rhashtable_walk_start_check has been added that returns -EAGAIN on a
    resize event.

    Signed-off-by: Tom Herbert
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Tom Herbert
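
    A hedged sketch of the two alternative call styles after this change (shown
    side by side, not meant to run one after the other); iter is assumed to be
    a prepared struct rhashtable_iter.

        /* Common case: resize events do not matter, no return value. */
        rhashtable_walk_start(&iter);

        /* Multi-part netlink/seq_file dumps instead use the checking
         * variant and react to the rewind: */
        if (rhashtable_walk_start_check(&iter) == -EAGAIN) {
                /* iterator was reset to the start of the resized table */
        }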
     

20 Sep, 2017

1 commit

  • Clarify that rhashtable_walk_{stop,start} will not reset the iterator to
    the beginning of the hash table. Confusion between rhashtable_walk_enter
    and rhashtable_walk_start has already led to a bug.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: David S. Miller

    Andreas Gruenbacher
     

16 Jul, 2017

1 commit

  • Pull random updates from Ted Ts'o:
    "Add wait_for_random_bytes() and get_random_*_wait() functions so that
    callers can more safely get random bytes if they can block until the
    CRNG is initialized.

    Also print a warning if get_random_*() is called before the CRNG is
    initialized. By default, only one single-line warning will be printed
    per boot. If CONFIG_WARN_ALL_UNSEEDED_RANDOM is defined, then a
    warning will be printed for each function which tries to get random
    bytes before the CRNG is initialized. This can get spammy for certain
    architecture types, so it is not enabled by default"

    * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
    random: reorder READ_ONCE() in get_random_uXX
    random: suppress spammy warnings about unseeded randomness
    random: warn when kernel uses unseeded randomness
    net/route: use get_random_int for random counter
    net/neighbor: use get_random_u32 for 32-bit hash random
    rhashtable: use get_random_u32 for hash_rnd
    ceph: ensure RNG is seeded before using
    iscsi: ensure RNG is seeded before use
    cifs: use get_random_u32 for 32-bit lock random
    random: add get_random_{bytes,u32,u64,int,long,once}_wait family
    random: add wait_for_random_bytes() API

    Linus Torvalds
     

11 Jul, 2017

1 commit

  • bucket_table_alloc() can currently be called with GFP_KERNEL or
    GFP_ATOMIC. For the former we basically have an open-coded kvzalloc(),
    while the latter only uses kzalloc(). Let's simplify the code a bit by
    dropping the open-coded path and replacing it with kvzalloc().

    Link: http://lkml.kernel.org/r/20170531155145.17111-3-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Cc: Thomas Graf
    Cc: Herbert Xu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
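
    A hedged sketch of the simplification described above: the open-coded
    "kzalloc, else vzalloc" pair collapses into a single call, with the
    caller's gfp mask deciding the behaviour.

        size = sizeof(*tbl) + nbuckets * sizeof(tbl->buckets[0]);
        tbl = kvzalloc(size, gfp);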
     

20 Jun, 2017

1 commit


09 May, 2017

1 commit

  • The alloc_bucket_locks allocation pattern is quite unusual. We are
    preferring vmalloc when CONFIG_NUMA is enabled. The rationale is that
    vmalloc will respect the memory policy of the current process and so the
    backing memory will get distributed over multiple nodes if the requester
    is configured properly. At least that is the intention; in reality the
    rhashtable is shrunk and expanded from a kernel worker, so no mempolicy
    can be assumed.

    Let's just simplify the code and use the kvmalloc helper, which is a
    transparent way to use kmalloc with a vmalloc fallback if the caller is
    allowed to block, and plain kmalloc with the given flags otherwise.

    Link: http://lkml.kernel.org/r/20170306103032.2540-4-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Tom Herbert
    Cc: Eric Dumazet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

02 May, 2017

1 commit

  • By using smaller datatypes this (rather large) struct shrinks considerably
    (80 -> 48 bytes on x86_64).

    As this is embedded in other structs, this also reduces the size of
    several others, e.g. cls_fl_head or nft_hash.

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     

28 Apr, 2017

1 commit

  • The commit 6d684e54690c ("rhashtable: Cap total number of entries
    to 2^31") breaks rhashtable users that do not set max_size. This
    is because when max_size is zero max_elems is also incorrectly set
    to zero instead of 2^31.

    This patch fixes it by only lowering max_elems when max_size is not
    zero.

    Fixes: 6d684e54690c ("rhashtable: Cap total number of entries to 2^31")
    Reported-by: Florian Fainelli
    Reported-by: kernel test robot
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
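
    A hedged sketch combining this fix with the 2^31 cap introduced by the
    commit under 27 Apr, 2017 (not the verbatim hunks): the limit defaults to
    2^31 and is only lowered when the user actually set max_size.

        ht->max_elems = 1u << 31;               /* nelems is not precise */
        if (params->max_size) {
                ht->p.max_size = rounddown_pow_of_two(params->max_size);
                ht->max_elems = min(ht->max_elems, ht->p.max_size * 2);
        }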
     

27 Apr, 2017

2 commits

  • When max_size is not set or if it set to a sufficiently large
    value, the nelems counter can overflow. This would cause havoc
    with the automatic shrinking as it would then attempt to fit a
    huge number of entries into a tiny hash table.

    This patch fixes this by adding max_elems to struct rhashtable
    to cap the number of elements. This is set to 2^31 as nelems is
    not a precise count. This is sufficiently smaller than UINT_MAX
    that it should be safe.

    When max_size is set max_elems will be lowered to at most twice
    max_size as is the status quo.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • No users in the tree; insecure_max_entries is always set to
    ht->p.max_size * 2 in rhashtable_init().

    Replace only spot that uses it with a ht->p.max_size check.

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     

19 Apr, 2017

1 commit

  • commit 83e7e4ce9e93c3 ("mac80211: Use rhltable instead of rhashtable")
    removed the last user that made use of 'insecure_elasticity' parameter,
    i.e. the default of 16 is used everywhere.

    Replace it with a constant.

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     

02 Mar, 2017

1 commit


27 Feb, 2017

2 commits


18 Feb, 2017

1 commit

  • This patch adds code that handles GFP_ATOMIC kmalloc failure on
    insertion. As we cannot use vmalloc, we solve it by making our
    hash table nested. That is, we allocate single pages at each level
    and reach our desired table size by nesting them.

    When a nested table is created, only a single page is allocated
    at the top-level. Lower levels are allocated on demand during
    insertion. Therefore for each insertion to succeed, only two
    (non-consecutive) pages are needed.

    After a nested table is created, a rehash will be scheduled in
    order to switch to a vmalloced table as soon as possible. Also,
    the rehash code will never rehash into a nested table. If we
    detect a nested table during a rehash, the rehash will be aborted
    and a new rehash will be scheduled.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
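
    A hedged conceptual sketch with hypothetical names (the real layout and the
    RCU handling in lib/rhashtable.c differ): the bucket array becomes a page of
    pointers to leaf pages, and a leaf is only allocated when an insertion first
    needs it, so each successful insertion needs at most two non-consecutive
    pages.

        union nested_slot {                     /* hypothetical */
                union nested_slot *child;       /* interior: next level */
                struct rhash_head *bucket;      /* leaf: chain head */
        };

        static struct rhash_head **nested_bucket(union nested_slot *top,
                                                 unsigned int hash, gfp_t gfp)
        {
                const unsigned int slots = PAGE_SIZE / sizeof(union nested_slot);
                union nested_slot *leaf = top[(hash / slots) % slots].child;

                if (!leaf) {
                        /* Allocated on demand; GFP_ATOMIC may still fail,
                         * but only one extra page is ever needed here. */
                        leaf = (union nested_slot *)get_zeroed_page(gfp);
                        if (!leaf)
                                return NULL;
                        top[(hash / slots) % slots].child = leaf;
                }
                return &leaf[hash % slots].bucket;
        }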
     

20 Sep, 2016

1 commit

  • The insecure_elasticity setting is an ugly wart brought out by
    users who need to insert duplicate objects (that is, distinct
    objects with identical keys) into the same table.

    In fact, those users have a much bigger problem. Once those
    duplicate objects are inserted, they don't have an interface to
    find them (unless you count the walker interface which walks
    over the entire table).

    Some users have resorted to doing a manual walk over the hash
    table which is of course broken because they don't handle the
    potential existence of multiple hash tables. The result is that
    they will break sporadically when they encounter a hash table
    resize/rehash.

    This patch provides a way out for those users, at the expense
    of an extra pointer per object. Essentially each object is now
    a list of objects carrying the same key. The hash table will
    only see the lists so nothing changes as far as rhashtable is
    concerned.

    To use this new interface, you need to insert a struct rhlist_head
    into your objects instead of struct rhash_head. While the hash
    table is unchanged, for type-safety you'll need to use struct
    rhltable instead of struct rhashtable. All the existing interfaces
    have been duplicated for rhlist, including the hash table walker.

    One missing feature is nulls marking because AFAIK the only potential
    user of it does not need duplicate objects. Should anyone need
    this it shouldn't be too hard to add.

    Signed-off-by: Herbert Xu
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Herbert Xu
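
    A hedged usage sketch of the rhlist interface; struct flow, flow_params and
    flow_example() are assumptions for the example, the rhltable calls are the
    real API.

        #include <linux/rhashtable.h>

        struct flow {
                u32 key;
                struct rhlist_head lnode;       /* not struct rhash_head */
        };

        static const struct rhashtable_params flow_params = {
                .head_offset = offsetof(struct flow, lnode),
                .key_offset  = offsetof(struct flow, key),
                .key_len     = sizeof(u32),
                .automatic_shrinking = true,
        };

        static struct rhltable flows;           /* not struct rhashtable */

        static void flow_example(struct flow *a, struct flow *b, u32 key)
        {
                struct rhlist_head *list, *pos;
                struct flow *f;

                rhltable_init(&flows, &flow_params);

                /* Duplicate keys are fine: both objects end up on the same
                 * per-key list behind a single hash table entry. */
                rhltable_insert(&flows, &a->lnode, flow_params);
                rhltable_insert(&flows, &b->lnode, flow_params);

                rcu_read_lock();
                list = rhltable_lookup(&flows, &key, flow_params);
                rhl_for_each_entry_rcu(f, pos, list, lnode)
                        pr_info("key %u\n", f->key);
                rcu_read_unlock();
        }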
     

07 Sep, 2016

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains Netfilter updates for your net-next
    tree. Most relevant updates are the removal of per-conntrack timers to
    use a workqueue/garbage collection approach instead from Florian
    Westphal, the hash and numgen expression for nf_tables from Laura
    Garcia, updates on nf_tables hash set to honor the NLM_F_EXCL flag,
    removal of ip_conntrack sysctl and many other incremental updates on our
    Netfilter codebase.

    More specifically, they are:

    1) Retrieve only 4 bytes to fetch ports in case of non-linear skb
    transport area in dccp, sctp, tcp, udp and udplite protocol
    conntrackers, from Gao Feng.

    2) Missing whitespace on error message in physdev match, from Hangbin Liu.

    3) Skip redundant IPv4 checksum calculation in nf_dup_ipv4, from Liping Zhang.

    4) Add nf_ct_expires() helper function and use it, from Florian Westphal.

    5) Replace opencoded nf_ct_kill() call in IPVS conntrack support, also
    from Florian.

    6) Rename nf_tables set implementation to nft_set_{name}.c

    7) Introduce the hash expression to allow arbitrary hashing of selector
    concatenations, from Laura Garcia Liebana.

    8) Remove ip_conntrack sysctl backward compatibility code; this code has
    been around for a long time already, and we have two interfaces to do
    this already: nf_conntrack sysctl and ctnetlink.

    9) Use nf_conntrack_get_ht() helper function whenever possible, instead
    of opencoding fetch of hashtable pointer and size, patch from Liping Zhang.

    10) Add quota expression for nf_tables.

    11) Add number generator expression for nf_tables, this supports
    incremental and random generators that can be combined with maps,
    very useful for load balancing purpose, again from Laura Garcia Liebana.

    12) Fix a typo in a debug message in FTP conntrack helper, from Colin Ian King.

    13) Introduce a nft_chain_parse_hook() helper function to parse chain hook
    configuration, this is used by a follow up patch to perform better chain
    update validation.

    14) Add rhashtable_lookup_get_insert_key() to rhashtable and use it from the
    nft_set_hash implementation to honor the NLM_F_EXCL flag.

    15) Missing nulls check in nf_conntrack from nf_conntrack_tuple_taken(),
    patch from Florian Westphal.

    16) Don't use the DYING bit to know if the conntrack event has been already
    delivered, instead a state variable to track event re-delivery
    states, also from Florian.

    17) Remove the per-conntrack timer, use the workqueue approach that was
    discussed during the NFWS, from Florian Westphal.

    18) Use the netlink conntrack table dump path to kill stale entries,
    again from Florian.

    19) Add a garbage collector to get rid of stale conntracks, from
    Florian.

    20) Reschedule garbage collector if eviction rate is high.

    21) Get rid of the __nf_ct_kill_acct() helper.

    22) Use ARPHRD_ETHER instead of hardcoded 1 from ARP logger.

    23) Make nf_log_set() interface assertive on unsupported families.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

30 Aug, 2016

1 commit