04 Jan, 2015

6 commits

  • Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • The removal function of nft_hash currently stores a reference to the
    previous element during lookup which is used to optimize removal later
    on. This was possible because a lock is held throughout calling
    rhashtable_lookup() and rhashtable_remove().

    With the introduction of deferred table resizing in parallel to lookups
    and insertions, the nftables lock will no longer synchronize all
    table mutations and the stored pprev may become invalid.

    Removing this optimization makes removal slightly more expensive on
    average but allows taking the resize cost out of the insert and
    remove path.

    Signed-off-by: Thomas Graf
    Cc: netfilter-devel@vger.kernel.org
    Signed-off-by: David S. Miller

    Thomas Graf
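
A minimal userspace sketch of the trade-off described in the commit above: instead of trusting a pprev cached during an earlier lookup, removal walks the chain again to find the link that points at the element. The struct entry type and chain_remove() are hypothetical names used only for illustration; this is not the nft_hash or rhashtable code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical singly linked bucket entry; not the kernel's struct rhash_head. */
struct entry {
	struct entry *next;
	int key;
};

/*
 * Unlink 'obj' from the chain rooted at *head by walking the chain to find
 * the link that currently points at it. Nothing cached by an earlier lookup
 * is used, so the walk stays correct even if the chain changed in between.
 * Returns true if the entry was found and unlinked.
 */
static bool chain_remove(struct entry **head, struct entry *obj)
{
	struct entry **pprev = head;

	while (*pprev != NULL) {
		if (*pprev == obj) {
			*pprev = obj->next;
			obj->next = NULL;
			return true;
		}
		pprev = &(*pprev)->next;
	}
	return false;
}
```

The extra walk is the "slightly more expensive on average" cost the commit message mentions; with short chains it is a handful of pointer dereferences.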
     
  • Subsequent patches will require access to the bucket tail. Access
    to the tail is relatively cheap as the automatic resizing of the
    table should keep the number of entries per bucket to no more
    than 0.75 on average.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
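
As a rough illustration of why tail access is cheap when chains are short, the sketch below finds the terminating link of a bucket chain with a plain walk. The names struct entry and bucket_tail() are assumptions for this example and do not come from the patch.

```c
#include <stddef.h>

/* Hypothetical bucket entry used only for this sketch. */
struct entry {
	struct entry *next;
	int key;
};

/*
 * Return the address of the terminating link of the chain, i.e. the place
 * where a new entry would be appended. With well under one entry per bucket
 * on average, the loop body rarely runs more than once or twice.
 */
static struct entry **bucket_tail(struct entry **head)
{
	struct entry **pprev = head;

	while (*pprev != NULL)
		pprev = &(*pprev)->next;
	return pprev;
}
```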
     
  • This patch is in preparation to introduce per bucket spinlocks. It
    extends all iterator macros to take the bucket table and bucket
    index. It also introduces a new rht_dereference_bucket() to
    handle protected accesses to buckets.

    It introduces a barrier() to the RCU iterators to prevent the
    compiler from caching the first element.

    The lockdep verifier is introduced as a stub which always succeeds
    and is properly implemented in the next patch when the locks are
    introduced.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
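
The idea of an iterator that takes both the bucket table and the bucket index, and that places a compiler barrier before reading the chain head, can be sketched in userspace as follows. The macro and type names are hypothetical, the barrier uses GCC/Clang inline-asm syntax, and the lockdep stub from the commit is left out.

```c
#include <stddef.h>

/* Compiler-only barrier (GCC/Clang syntax); stops the compiler from reusing
 * a previously loaded chain head across this point. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical types for this sketch. */
struct entry {
	struct entry *next;
	int key;
};

struct bucket_table {
	size_t size;
	struct entry *buckets[8];
};

/* Read the head of one bucket, forcing a fresh load each time. */
static inline struct entry *bucket_head(const struct bucket_table *tbl, size_t hash)
{
	barrier();
	return tbl->buckets[hash];
}

/* Iterator taking the table and the bucket index, as the commit describes. */
#define for_each_in_bucket(pos, tbl, hash) \
	for ((pos) = bucket_head((tbl), (hash)); (pos) != NULL; (pos) = (pos)->next)

/* Example user of the iterator: count the entries in one bucket. */
static size_t bucket_len(const struct bucket_table *tbl, size_t hash)
{
	const struct entry *pos;
	size_t n = 0;

	for_each_in_bucket(pos, tbl, hash)
		n++;
	return n;
}
```

Passing the table and index explicitly means a later per-bucket lock check has everything it needs at the point where the bucket is dereferenced.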
     
  • Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Hash the key inside of rhashtable_lookup_compare() like
    rhashtable_lookup() does. This allows simplifying the hashing
    functions and keeping them private.

    Signed-off-by: Thomas Graf
    Cc: netfilter-devel@vger.kernel.org
    Signed-off-by: David S. Miller

    Thomas Graf
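
A small sketch of the API shape this commit moves towards: the caller passes the key and a compare callback, and the table hashes the key internally so the hash function can stay private. All names here (table_lookup_compare(), key_to_bucket(), the toy multiplicative hash) are assumptions for illustration, not the rhashtable interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 8u

/* Hypothetical entry and table types. */
struct entry {
	struct entry *next;
	uint32_t key;
	int value;
};

struct table {
	struct entry *buckets[NBUCKETS];
};

typedef bool (*compare_fn)(const struct entry *e, const void *arg);

/* Private hash: callers never compute or see bucket indices.
 * Toy multiplicative hash; the top 3 bits select one of 8 buckets. */
static size_t key_to_bucket(uint32_t key)
{
	return (size_t)((uint32_t)(key * 2654435761u) >> 29);
}

static void table_insert(struct table *t, struct entry *e)
{
	size_t b = key_to_bucket(e->key);

	e->next = t->buckets[b];
	t->buckets[b] = e;
}

/* The key is hashed here, inside the lookup, and the callback decides
 * which entry in the selected bucket matches. */
static struct entry *table_lookup_compare(const struct table *t, uint32_t key,
					  compare_fn cmp, const void *arg)
{
	struct entry *e;

	for (e = t->buckets[key_to_bucket(key)]; e != NULL; e = e->next)
		if (cmp(e, arg))
			return e;
	return NULL;
}

/* Example callback: match on the stored value. */
static bool value_equals(const struct entry *e, const void *arg)
{
	return e->value == *(const int *)arg;
}
```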
     

03 Jan, 2015

21 commits


01 Jan, 2015

13 commits

  • Roger Chen says:

    ====================
    support GMAC driver for RK3288

    Roger Chen (6):
    patch1: add driver for Rockchip RK3288 SoCs integrated GMAC
    patch2: define clock ID used for GMAC
    patch3: modify CRU config for Rockchip RK3288 SoCs integrated GMAC
    patch4: dts: rockchip: add gmac info for rk3288
    patch5: dts: rockchip: enable gmac on RK3288 evb board
    patch6: add document for Rockchip RK3288 GMAC

    Tested on rk3288 evb board:
    Execute the following command to enable ethernet,
    set local IP and ping a remote host.

    busybox ifconfig eth0 up
    busybox ifconfig eth0 192.168.1.111
    ping 192.168.1.1
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • The document describes how to add properties for GMAC in device tree.

    change since v2:

    1. remove power-gpio, reset-gpio, phyirq-gpio, pmu_regulator setting
    2. add "snps,reset-gpio", "snps,reset-active-low;", "snps,reset-delays-us"

    Signed-off-by: Roger Chen
    Signed-off-by: David S. Miller

    Roger Chen
     
  • enable gmac in rk3288-evb-rk808.dts

    changes since v2:
    1. add fixed regulator for PHY
    2. remove power-gpio, reset-gpio, phyirq-gpio, pmu_regulator setting
    3. add "snps,reset-gpio", "snps,reset-active-low;", "snps,reset-delays-us"

    Signed-off-by: Roger Chen
    Signed-off-by: David S. Miller

    Roger Chen
     
  • add gmac info in rk3288.dtsi for GMAC driver

    changes since v2:
    1. add drive-strength in the pinctrl settings

    Signed-off-by: Roger Chen
    Signed-off-by: David S. Miller

    Roger Chen
     
  • modify CRU config for GMAC driver

    changes since v2:
    1. remove SCLK_MAC_PLL

    Signed-off-by: Roger Chen
    Signed-off-by: David S. Miller

    Roger Chen
     
  • changes since v2:
    1. remove SCLK_MAC_PLL

    Signed-off-by: Roger Chen
    Signed-off-by: David S. Miller

    Roger Chen
     
  • This driver is based on stmmac driver.

    changes since v2:
    - use tab instead of space for macros
    - use HIWORD_UPDATE macro for GMAC_CLK_RX_DL_CFG and GMAC_CLK_TX_DL_CFG
    - remove drive-strength setting in the driver and set it in the pinctrl settings
    - use dev_err instead of pr_err
    - remove the clock name macros, just use the real name of the clock
    - use devm_clk_get() instead of clk_get()
    - remove clk_set_parent(bsp_priv->clk_mac, bsp_priv->clk_mac_pll)
    - remove gpio setting for LDO, just use regulator API
    - remove phy reset using gpio in the glue layer, it has been handled in the stmmac driver
    - remove handling phy interrupt (mii interrupt)

    changes since v1:
    - use BIT() to set register
    - combine two regmap_write() operations into one for the same register
    - use macros for register value setting
    - remove grf fail check in rk_gmac_setup() and save all the check in set_rgmii_speed()
    - remove .tx_coe=1 in rk_gmac_data

    Signed-off-by: Roger Chen
    Signed-off-by: David S. Miller

    Roger Chen
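
Two items in the changelog above, using BIT() and HIWORD_UPDATE and combining two writes to the same register, can be illustrated with a small model. It assumes a register layout in which the upper 16 bits of each written word act as per-bit write enables for the lower 16 bits; the macro bodies and the hiword_reg_write() model are a sketch under that assumption, not the driver's code.

```c
#include <stdint.h>

#define BIT(n) (UINT32_C(1) << (n))

/* Field value in the low half, matching write-enable bits in the high half.
 * Assumed layout for this sketch. */
#define HIWORD_UPDATE(val, mask, shift) \
	(((uint32_t)(val) << (shift)) | ((uint32_t)(mask) << ((shift) + 16)))

/*
 * Model of a write to such a register: only low-half bits whose enable bit
 * is set in the high half of 'word' are changed. 'reg' holds the current
 * 16-bit register contents.
 */
static uint32_t hiword_reg_write(uint32_t reg, uint32_t word)
{
	uint32_t enable = word >> 16;
	uint32_t value = word & 0xffffu;

	return ((reg & ~enable) | (value & enable)) & 0xffffu;
}
```

Because each word carries its own enable mask, two updates to disjoint fields of the same register can be ORed together and issued as a single write, with no read-modify-write cycle.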
     
  • Signed-off-by: David S. Miller

    David S. Miller
     
  • Signed-off-by: David S. Miller

    David S. Miller
     
  • Alexander Duyck says:

    ====================
    fib_trie: Reduce time spent in fib_table_lookup by 35 to 75%

    These patches are meant to address several performance issues I have seen
    in the fib_trie implementation, and fib_table_lookup specifically. With
    these changes in place I have seen a reduction of 35 to 75% in the
    total time spent in fib_table_lookup depending on the type of search being
    performed.

    On a VM running on my Core i7-4930K system with a trie of maximum depth of 7
    this resulted in a reduction of over 370ns per packet in the total time to
    process packets received from an ixgbe interface and route them to a dummy
    interface. This represents a failed lookup in the local trie followed by
    a successful search in the main trie.

                                      Baseline        Refactor
    ixgbe->dummy routing              1.20Mpps        2.21Mpps
    ----------------------------------------------------------
    processing time per packet           835ns           453ns
    fib_table_lookup             50.1%   418ns   25.0%   113ns
    check_leaf.isra.9             7.9%    66ns      --      --
    ixgbe_clean_rx_irq            5.3%    44ns    9.8%    44ns
    ip_route_input_noref          2.9%    25ns    4.6%    21ns
    pvclock_clocksource_read      2.6%    21ns    4.6%    21ns
    ip_rcv                        2.6%    22ns    4.0%    18ns

    In the simple case of receiving a frame and dropping it before it can reach
    the socket layer I saw a reduction of 40ns per packet. This represents a
    trip through the local trie with the correct leaf found with no need for
    any backtracing.

                                      Baseline        Refactor
    ixgbe->local receive              2.65Mpps        2.96Mpps
    ----------------------------------------------------------
    processing time per packet           377ns           337ns
    fib_table_lookup             25.1%    95ns   25.8%    87ns
    ixgbe_clean_rx_irq            8.7%    33ns    9.0%    30ns
    check_leaf.isra.9             7.2%    27ns      --      --
    ip_rcv                        5.7%    21ns    6.5%    22ns

    These changes have resulted in several functions being inlined such as
    check_leaf and fib_find_node, but due to the code simplification the
    overall size of the code has been reduced.

       text    data     bss     dec     hex filename
      16932     376      16   17324    43ac net/ipv4/fib_trie.o - before
      15259     376       8   15643    3d1b net/ipv4/fib_trie.o - after

    Changes since RFC:
    Replaced this_cpu_ptr with correct call to this_cpu_inc in patch 1
    Changed test for leaf_info mismatch to (key ^ n->key) & li->mask_plen in patch 10
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
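
The mismatch test quoted in the changelog above, (key ^ n->key) & li->mask_plen, relies on XOR exposing the differing bits and the mask discarding everything outside the prefix. A standalone sketch follows, assuming 32-bit keys in host byte order and a mask with the top plen bits set; prefix_mask() and prefix_matches() are hypothetical helpers, not fib_trie functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask with the top 'plen' bits set, for plen in 0..32. The plen == 0 case
 * is handled separately because shifting a 32-bit value by 32 is undefined. */
static uint32_t prefix_mask(unsigned int plen)
{
	return plen == 0 ? 0 : ~UINT32_C(0) << (32 - plen);
}

/* XOR leaves a 1 wherever key and prefix differ; the mask keeps only the
 * bits that belong to the prefix. Zero means the key is inside the prefix. */
static bool prefix_matches(uint32_t key, uint32_t prefix, unsigned int plen)
{
	return ((key ^ prefix) & prefix_mask(plen)) == 0;
}
```

For example, 192.168.1.111 (0xC0A8016F) lies inside 192.168.1.0/24 (0xC0A80100), while 192.168.2.1 (0xC0A80201) only matches once the prefix length drops to 16.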
     
  • This change adds a tracking value for the maximum suffix length of all
    prefixes stored in any given tnode. With this value we can determine if we
    need to backtrace or not based on whether the suffix length is greater than
    the pos value.

    By doing this we can reduce the CPU overhead for lookups in the local table
    as many of the prefixes there are 32b long and have a suffix length of 0
    meaning we can immediately backtrace to the root node without needing to
    test any of the nodes between it and where we ended up.

    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
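
A toy model of the check described in the commit above, under the assumption that each tnode records its bit position (pos) and the maximum suffix length (slen) of any prefix stored beneath it. When slen is not greater than pos, no prefix under that node can match by ignoring bits at that level, so backtracing can skip straight past it. The struct layout and next_backtrace_candidate() are hypothetical and much simpler than the real fib_trie code.

```c
#include <stddef.h>

/* Hypothetical, simplified tnode for this sketch. */
struct tnode {
	struct tnode *parent;
	unsigned char pos;   /* bit position this node indexes from */
	unsigned char slen;  /* longest suffix length stored below this node */
};

/*
 * Walk towards the root and return the first node worth re-examining while
 * backtracing, i.e. one whose tracked suffix length exceeds its pos value.
 * Returns NULL if every ancestor can be skipped.
 */
static struct tnode *next_backtrace_candidate(struct tnode *n)
{
	while (n != NULL && n->slen <= n->pos)
		n = n->parent;
	return n;
}
```

In a table that holds only 32-bit prefixes every slen is 0, so the walk falls through to the root without testing any intermediate node, which is the local-table case the commit message highlights.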
     
  • For some reason the compiler doesn't seem to understand that when we are in
    a loop that runs from tnode_child_length - 1 to 0 we don't expect the value
    of tn->bits to change. As such every call to tnode_get_child was rerunning
    tnode_child_length which ended up consuming quite a bit of space in the
    resultant assembly code.

    I have gone through and verified that in all cases where tnode_get_child
    is used we are either winding through a fixed loop from tnode_child_length -
    1 to 0, or are in a fastpath case where we are verifying the value by
    either checking for any remaining bits after shifting index by bits and
    testing for leaf, or by using tnode_child_length.

    size net/ipv4/fib_trie.o
    Before:
       text    data     bss     dec     hex filename
      15506     376       8   15890    3e12 net/ipv4/fib_trie.o

    After:
       text    data     bss     dec     hex filename
      14827     376       8   15211    3b6b net/ipv4/fib_trie.o

    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
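
The pattern the commit above relies on, computing the child count once and then indexing children directly inside a loop that counts down to 0, looks roughly like this in isolation. The struct and function names are assumptions for the sketch; the real tnode has a different layout.

```c
#include <stddef.h>

/* Hypothetical, simplified tnode: 'bits' selects how many of the child
 * slots are in use (1 << bits); it must not exceed 4 for this array size. */
struct tnode {
	unsigned char bits;
	void *child[16];
};

static size_t child_length(const struct tnode *tn)
{
	return (size_t)1 << tn->bits;
}

/*
 * Count non-empty children. The bound is evaluated exactly once, in the
 * loop initializer, and the body indexes the array without re-deriving it
 * from tn->bits on every iteration.
 */
static size_t count_children(const struct tnode *tn)
{
	size_t i, n = 0;

	for (i = child_length(tn); i--;)
		if (tn->child[i] != NULL)
			n++;
	return n;
}
```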
     
  • This change pulls the node_set_parent functionality out of put_child_reorg
    and instead leaves that to the calling function to take care of. By doing
    this we can fully construct the new cluster of tnodes and all of the
    pointers out of it before we start routing pointers into it.

    I suspect this will likely fix some concurrency issues, though I don't
    have a good test to show it.

    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
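
The ordering described in the commit above, finishing every pointer that leaves the new nodes before any pointer is routed into them, can be sketched with C11 atomics standing in for the kernel's RCU primitives. The node layout and the publish_cluster() and read_slot() helpers are hypothetical; the kernel code uses its own tnode type and rcu_assign_pointer-style publication rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical two-child node for this sketch. */
struct node {
	struct node *parent;
	struct node *child[2];
	int value;
};

/*
 * Step 1: wire up every pointer leaving the new node while it is still
 *         private to the writer.
 * Step 2: only then route pointers into it. The release store pairs with
 *         the acquire load in read_slot(), so a reader that sees 'fresh'
 *         also sees the fields written in step 1.
 */
static void publish_cluster(struct node *_Atomic *slot, struct node *fresh,
			    struct node *parent, struct node *left,
			    struct node *right)
{
	fresh->parent = parent;
	fresh->child[0] = left;
	fresh->child[1] = right;

	atomic_store_explicit(slot, fresh, memory_order_release);

	if (left != NULL)
		left->parent = fresh;
	if (right != NULL)
		right->parent = fresh;
}

/* Reader side: acquire-load the published pointer. */
static struct node *read_slot(struct node *_Atomic *slot)
{
	return atomic_load_explicit(slot, memory_order_acquire);
}
```

A reader that still holds the old node keeps a consistent view of it, and a reader that picks up the new node never observes a half-built one, which is the kind of concurrency problem the author suspects the change avoids.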