06 Dec, 2016

2 commits

  • It has been reported that update_suffix can be expensive when it is called
    on a large node in which most of the suffix lengths are the same. The time
    required to add 200K entries had increased from around 3 seconds to almost
    49 seconds.

    In order to address this we need to move the code for updating the suffix
    out of resize and instead just have it handled in the cases where we are
    pushing a node that increases the suffix length, or will decrease the
    suffix length.
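
    A minimal user-space sketch of the "push" direction described above,
    assuming a simplified node with only a parent pointer and a suffix
    length. The struct and the function name push_suffix are hypothetical
    stand-ins, not the kernel's definitions.

```c
#include <stddef.h>

/* Hypothetical stand-in for a trie node; only the fields the sketch needs. */
struct node {
	struct node *parent;   /* NULL at the root */
	unsigned char slen;    /* longest suffix length tracked at this node */
};

/*
 * Propagate a larger suffix length towards the root. The walk stops at the
 * first ancestor that already tracks a suffix at least this long, so the
 * cost is bounded by the trie depth and not by the number of children.
 */
static void push_suffix(struct node *n, unsigned char slen)
{
	while (n && n->slen < slen) {
		n->slen = slen;
		n = n->parent;
	}
}
```

    Because the update only runs when an insert actually raises the suffix
    length, a large node whose children all share one suffix length is not
    rescanned on every change.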

    Fixes: 5405afd1a306 ("fib_trie: Add tracking value for suffix length")
    Reported-by: Robert Shearman
    Signed-off-by: Alexander Duyck
    Reviewed-by: Robert Shearman
    Tested-by: Robert Shearman
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • It wasn't necessary to pass a leaf in when doing the suffix updates so just
    drop it. Instead just pass the suffix and work with that.

    Since we dropped the leaf there is no need to include that in the name so
    the names are updated to node_push_suffix and node_pull_suffix.

    Finally I noticed that the logic for pulling the suffix length back
    actually had some issues. Specifically it would stop prematurely if there
    was a longer suffix, but it was not as long as the original suffix. I
    updated the code to address that in node_pull_suffix.

    Fixes: 5405afd1a306 ("fib_trie: Add tracking value for suffix length")
    Suggested-by: Robert Shearman
    Signed-off-by: Alexander Duyck
    Reviewed-by: Robert Shearman
    Tested-by: Robert Shearman
    Signed-off-by: David S. Miller

    Alexander Duyck
     

17 Nov, 2016

2 commits

  • Fix a small memory leak that can occur where we leak a fib_alias in the
    event of us not being able to insert it into the local table.

    Fixes: 0ddcf43d5d4a0 ("ipv4: FIB Local/MAIN table collapse")
    Reported-by: Eric Dumazet
    Signed-off-by: Alexander Duyck
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • The patch that removed the FIB offload infrastructure was a bit too
    aggressive and also removed code needed to clean up after splitting the
    table when additional rules were added. Specifically the function
    fib_trie_flush_external was called at the end of a new rule being added to
    flush the foreign trie entries from the main trie.

    I updated the code so that we only call fib_trie_flush_external on the main
    table so that we flush the entries for local from main. This way we don't
    call it for every rule change which is what was happening previously.

    Fixes: 347e3b28c1ba2 ("switchdev: remove FIB offload infrastructure")
    Reported-by: Eric Dumazet
    Cc: Jiri Pirko
    Signed-off-by: Alexander Duyck
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Alexander Duyck
     

08 Nov, 2016

1 commit

  • The display of /proc/net/route has had a couple issues due to the fact that
    when I originally rewrote most of fib_trie I made it so that the iterator
    was tracking the next value to use instead of the current.

    In addition it had an off by 1 error where I was tracking the first piece
    of data as position 0, even though in reality that belonged to the
    SEQ_START_TOKEN.

    This patch updates the code so the iterator tracks the last reported
    position and key instead of the next expected position and key. In
    addition it shifts things so that all of the leaves start at 1 instead of
    trying to report leaves starting with offset 0 as being valid. With these
    two issues addressed this should resolve any off by one errors that were
    present in the display of /proc/net/route.
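
    The position layout can be sketched as follows, assuming a flat array
    of leaves in place of the trie. The names header_token and entry_at
    are hypothetical; header_token only plays the role that
    SEQ_START_TOKEN has in the text above.

```c
#include <stddef.h>

/* Hypothetical marker for the header line. */
static const char header_token[] = "header";

/*
 * Map a read position to what should be emitted there. Position 0 is the
 * header, so leaf number i (0-based) lives at position i + 1.
 * Returns NULL past the end of the table.
 */
static const char *entry_at(const char *const *leaves, size_t nleaves,
			    size_t pos)
{
	if (pos == 0)
		return header_token;
	if (pos - 1 < nleaves)
		return leaves[pos - 1];
	return NULL;
}
```

    With this layout a continuation read that resumes at position 1 gets
    the first leaf, not the second.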

    Fixes: 25b97c016b26 ("ipv4: off-by-one in continuation handling in /proc/net/route")
    Cc: Andy Whitcroft
    Reported-by: Jason Baron
    Tested-by: Jason Baron
    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     

28 Sep, 2016

2 commits


10 Sep, 2016

1 commit

  • fib_table_insert() inconsistently fills the nlmsg_flags field in its
    notification messages.

    Since commit b8f558313506 ("[RTNETLINK]: Fix sending netlink message
    when replace route."), the netlink message has its nlmsg_flags set to
    NLM_F_REPLACE if the route replaced a preexisting one.

    Then commit a2bb6d7d6f42 ("ipv4: include NLM_F_APPEND flag in append
    route notifications") started setting nlmsg_flags to NLM_F_APPEND if
    the route matched a preexisting one but was appended.

    In other cases (exclusive creation or prepend), nlmsg_flags is 0.

    This patch sets ->nlmsg_flags in all situations, preserving the
    semantic of the NLM_F_* bits:

    * NLM_F_CREATE: a new fib entry has been created for this route.
    * NLM_F_EXCL: no other fib entry existed for this route.
    * NLM_F_REPLACE: this route has overwritten a preexisting fib entry.
    * NLM_F_APPEND: the new fib entry was added after other entries for
    the same route.

    As a result, the possible flag combinations can now be reported
    (iproute2's terminology in parentheses):

    * NLM_F_CREATE | NLM_F_EXCL: route didn't exist, exclusive creation
    ("add").
    * NLM_F_CREATE | NLM_F_APPEND: route did already exist, new route
    added after preexisting ones ("append").
    * NLM_F_CREATE: route did already exist, new route added before
    preexisting ones ("prepend").
    * NLM_F_REPLACE: route did already exist, new route replaced the
    first preexisting one ("change").
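
    The four combinations above can be written out as a small lookup. This
    is only an illustration: the enum route_op and the helper notify_flags
    are hypothetical names, while the NLM_F_* constants come from
    <linux/netlink.h>.

```c
#include <linux/netlink.h>   /* NLM_F_CREATE, NLM_F_EXCL, NLM_F_REPLACE, NLM_F_APPEND */

/* Hypothetical enum naming the four cases listed above. */
enum route_op { ROUTE_ADD, ROUTE_APPEND, ROUTE_PREPEND, ROUTE_CHANGE };

/* Flags a notification would carry for each case, per the list above. */
static unsigned int notify_flags(enum route_op op)
{
	switch (op) {
	case ROUTE_ADD:
		return NLM_F_CREATE | NLM_F_EXCL;    /* route didn't exist */
	case ROUTE_APPEND:
		return NLM_F_CREATE | NLM_F_APPEND;  /* added after existing ones */
	case ROUTE_PREPEND:
		return NLM_F_CREATE;                 /* added before existing ones */
	case ROUTE_CHANGE:
		return NLM_F_REPLACE;                /* replaced the first existing one */
	}
	return 0;
}
```

    A listener can then tell "prepend" from "add" by checking whether
    NLM_F_EXCL accompanies NLM_F_CREATE.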

    Signed-off-by: Guillaume Nault
    Signed-off-by: David S. Miller

    Guillaume Nault
     

19 Aug, 2016

1 commit


06 Aug, 2016

1 commit

  • Panic occurs when issuing "cat /proc/net/route" whilst
    populating FIB with > 1M routes.

    Use of cached node pointer in fib_route_get_idx is unsafe.

    BUG: unable to handle kernel paging request at ffffc90001630024
    IP: [] leaf_walk_rcu+0x10/0xe0
    PGD 11b08d067 PUD 11b08e067 PMD dac4b067 PTE 0
    Oops: 0000 [#1] SMP
    Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscac
    snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep virti
    acpi_cpufreq button parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache jbd
    tio_ring virtio floppy uhci_hcd ehci_hcd usbcore usb_common libata scsi_mod
    CPU: 1 PID: 785 Comm: cat Not tainted 4.2.0-rc8+ #4
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
    task: ffff8800da1c0bc0 ti: ffff88011a05c000 task.ti: ffff88011a05c000
    RIP: 0010:[] [] leaf_walk_rcu+0x10/0xe0
    RSP: 0018:ffff88011a05fda0 EFLAGS: 00010202
    RAX: ffff8800d8a40c00 RBX: ffff8800da4af940 RCX: ffff88011a05ff20
    RDX: ffffc90001630020 RSI: 0000000001013531 RDI: ffff8800da4af950
    RBP: 0000000000000000 R08: ffff8800da1f9a00 R09: 0000000000000000
    R10: ffff8800db45b7e4 R11: 0000000000000246 R12: ffff8800da4af950
    R13: ffff8800d97a74c0 R14: 0000000000000000 R15: ffff8800d97a7480
    FS: 00007fd3970e0700(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: ffffc90001630024 CR3: 000000011a7e4000 CR4: 00000000000006e0
    Stack:
    ffffffff814d00d3 0000000000000000 ffff88011a05ff20 ffff8800da1f9a00
    ffffffff811dd8b9 0000000000000800 0000000000020000 00007fd396f35000
    ffffffff811f8714 0000000000003431 ffffffff8138dce0 0000000000000f80
    Call Trace:
    [] ? fib_route_seq_start+0x93/0xc0
    [] ? seq_read+0x149/0x380
    [] ? fsnotify+0x3b4/0x500
    [] ? process_echoes+0x70/0x70
    [] ? proc_reg_read+0x47/0x70
    [] ? __vfs_read+0x23/0xd0
    [] ? rw_verify_area+0x52/0xf0
    [] ? vfs_read+0x81/0x120
    [] ? SyS_read+0x42/0xa0
    [] ? entry_SYSCALL_64_fastpath+0x16/0x75
    Code: 48 85 c0 75 d8 f3 c3 31 c0 c3 f3 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00
    a 04 89 f0 33 02 44 89 c9 48 d3 e8 0f b6 4a 05 49 89
    RIP [] leaf_walk_rcu+0x10/0xe0
    RSP
    CR2: ffffc90001630024

    Signed-off-by: Dave Forster
    Acked-by: Alexander Duyck
    Signed-off-by: David S. Miller

    David Forster
     

02 Feb, 2016

1 commit

  • Pull networking fixes from David Miller:
    "This looks like a lot but it's a mixture of regression fixes as well
    as fixes for longer standing issues.

    1) Fix on-channel cancellation in mac80211, from Johannes Berg.

    2) Handle CHECKSUM_COMPLETE properly in xt_TCPMSS netfilter xtables
    module, from Eric Dumazet.

    3) Avoid infinite loop in UDP SO_REUSEPORT logic, also from Eric
    Dumazet.

    4) Avoid a NULL deref if we try to set SO_REUSEPORT after a socket is
    bound, from Craig Gallek.

    5) GRO key comparisons don't take lightweight tunnels into account,
    from Jesse Gross.

    6) Fix struct pid leak via SCM credentials in AF_UNIX, from Eric
    Dumazet.

    7) We need to set the rtnl_link_ops of ipv6 SIT tunnels before we
    register them, otherwise the NEWLINK netlink message is missing
    the proper attributes. From Thadeu Lima de Souza Cascardo.

    8) Several Spectrum chip bug fixes for mlxsw switch driver, from Ido
    Schimmel

    9) Handle fragments properly in ipv4 early socket demux, from Eric
    Dumazet.

    10) Don't ignore the ifindex key specifier on ipv6 output route
    lookups, from Paolo Abeni"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (128 commits)
    tcp: avoid cwnd undo after receiving ECN
    irda: fix a potential use-after-free in ircomm_param_request
    net: tg3: avoid uninitialized variable warning
    net: nb8800: avoid uninitialized variable warning
    net: vxge: avoid unused function warnings
    net: bgmac: clarify CONFIG_BCMA dependency
    net: hp100: remove unnecessary #ifdefs
    net: davinci_cpdma: use dma_addr_t for DMA address
    ipv6/udp: use sticky pktinfo egress ifindex on connect()
    ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail()
    netlink: not trim skb for mmaped socket when dump
    vxlan: fix a out of bounds access in __vxlan_find_mac
    net: dsa: mv88e6xxx: fix port VLAN maps
    fib_trie: Fix shift by 32 in fib_table_lookup
    net: moxart: use correct accessors for DMA memory
    ipv4: ipconfig: avoid unused ic_proto_used symbol
    bnxt_en: Fix crash in bnxt_free_tx_skbs() during tx timeout.
    bnxt_en: Exclude rx_drop_pkts hw counter from the stack's rx_dropped counter.
    bnxt_en: Ring free response from close path should use completion ring
    net_sched: drr: check for NULL pointer in drr_dequeue
    ...

    Linus Torvalds
     

30 Jan, 2016

1 commit

  • The fib_table_lookup function had a shift by 32 that triggered a UBSAN
    warning. This was due to the fact that I had placed the shift first and
    then followed it with the check for the suffix length to ignore the
    undefined behavior. If we reorder this so that we verify the suffix is
    less than 32 before shifting the value we can avoid the issue.
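
    The reordering can be illustrated in user space as below. This is a
    sketch under assumptions: index_within_suffix is a hypothetical helper
    and a 32-bit index is used, so it is not the lookup code itself.

```c
#include <stdbool.h>
#include <stdint.h>

#define KEYLENGTH 32   /* bits in an IPv4 key */

/*
 * Return true when index fits in the low slen bits. Shifting a 32-bit
 * value by 32 is undefined in C, so the length is tested first and the
 * shift is only evaluated for slen < KEYLENGTH.
 */
static bool index_within_suffix(uint32_t index, unsigned int slen)
{
	if (slen >= KEYLENGTH)
		return true;   /* every 32-bit index fits */
	return index < ((uint32_t)1 << slen);
}
```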

    Reported-by: Toralf Förster
    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     

23 Jan, 2016

1 commit

  • There are many locations that do

        if (memory_was_allocated_by_vmalloc)
            vfree(ptr);
        else
            kfree(ptr);

    but kvfree() can handle both kmalloc()ed memory and vmalloc()ed memory
    using is_vmalloc_addr(). Unless callers have special reasons, we can
    replace this branch with kvfree(). Please check and reply if you found
    problems.

    Signed-off-by: Tetsuo Handa
    Acked-by: Michal Hocko
    Acked-by: Jan Kara
    Acked-by: Russell King
    Reviewed-by: Andreas Dilger
    Acked-by: "Rafael J. Wysocki"
    Acked-by: David Rientjes
    Cc: "Luck, Tony"
    Cc: Oleg Drokin
    Cc: Boris Petkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     

28 Oct, 2015

1 commit

  • We were computing the child index in cases where the key value we were
    looking for was actually less than the base key of the tnode. As a result
    we were getting incorrect index values that would cause us to skip over
    some children.

    To fix this I have added a test that will force us to use child index 0 if
    the key we are looking for is less than the key of the current tnode.
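
    A sketch of the guarded index computation, assuming a trimmed-down
    tnode that only carries its base key and the bit position of its child
    index. The struct tnode_sketch and the helper child_index are
    hypothetical names used for illustration.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down tnode: base key and index bit position. */
struct tnode_sketch {
	uint32_t key;
	unsigned int pos;   /* assumed to be less than 32 */
};

/*
 * Pick the child slot to start walking from. A key at or below the base
 * key of the tnode sorts before all of its children, so slot 0 is used
 * without computing an index from unrelated high bits.
 */
static uint32_t child_index(uint32_t key, const struct tnode_sketch *tn)
{
	if (key <= tn->key)
		return 0;
	return (key ^ tn->key) >> tn->pos;
}
```

    Without the guard, a key such as 10.0.0.0 walked against a tnode based
    at 192.168.0.0 would yield a large, meaningless index and children
    would be skipped.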

    Fixes: 8be33e955cb9 ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf")
    Reported-by: Brian Rak
    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     

18 Sep, 2015

1 commit

  • Steffen reported that the recent change to add oif to dst lookups breaks
    the VTI use case. The problem is that with the oif set in the flow struct
    the comparison to the nh_oif is triggered. Fix by splitting the
    FLOWI_FLAG_VRFSRC into 2 flags -- one that triggers the vrf device cache
    bypass (FLOWI_FLAG_VRFSRC) and another telling the lookup to not compare
    nh oif (FLOWI_FLAG_SKIP_NH_OIF).

    Fixes: 42a7b32b73d6 ("xfrm: Add oif to dst lookups")

    Signed-off-by: David Ahern
    Acked-by: Steffen Klassert
    Signed-off-by: David S. Miller

    David Ahern
     

30 Aug, 2015

1 commit


22 Aug, 2015

1 commit


14 Aug, 2015

2 commits

  • As with ingress use the index of VRF master device for route lookups on
    egress. However, the oif should only be used to direct the lookups to a
    specific table. Routes in the table are not based on the VRF device but
    rather interfaces that are part of the VRF so do not consider the oif for
    lookups within the table. The FLOWI_FLAG_VRFSRC is used to control this
    latter part.

    Signed-off-by: Shrijeet Mukherjee
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • When generating /proc/net/route we emit a header followed by a line for
    each route. When a short read is performed we will restart this process
    based on the open file descriptor. When calculating the start point we
    fail to take into account that the 0th entry is the header. This leads
    us to skip the first entry when doing a continuation read.

    This can be easily seen with the comparison below:

    while read l; do echo "$l"; done </proc/net/route >A
    cat /proc/net/route >B
    diff -bu A B | grep '^[+-]'

    On my example machine I have approximately 10KB of route output. There we
    see the very first non-title element is lost in the while read case,
    and an entry around the 8K mark in the cat case:

    +wlan0 00000000 02021EAC 0003 0 0 400 00000000 0 0 0
    -tun1 00C0AC0A 00000000 0001 0 0 950 00C0FFFF 0 0 0

    Fix up the off-by-one when reacquiring position on continuation.

    Fixes: 8be33e955cb9 ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf")
    BugLink: http://bugs.launchpad.net/bugs/1483440
    Acked-by: Alexander Duyck
    Signed-off-by: Andy Whitcroft
    Signed-off-by: David S. Miller

    Andy Whitcroft
     

28 Jul, 2015

1 commit

  • It was reported that update_suffix was taking a long time on systems where
    a large number of leaves were attached to a single node. As it turns out
    fib_table_flush was calling update_suffix for each leaf that didn't have all
    of the aliases stripped from it. As a result, on this large node removing
    one leaf would result in us calling update_suffix for every other leaf on
    the node.

    The fix is to just remove the calls to leaf_pull_suffix since they are
    redundant as we already have a call in resize that will go through and
    update the suffix length for the node before we exit out of
    fib_table_flush or fib_table_flush_external.

    Reported-by: David Ahern
    Signed-off-by: Alexander Duyck
    Tested-by: David Ahern
    Signed-off-by: David S. Miller

    Alexander Duyck
     

25 Jul, 2015

1 commit

  • fib_select_default considers alternative routes only when
    res->fi is for the first alias in res->fa_head. In the
    common case this can happen only when the initial lookup
    matches the first alias with highest TOS value. This
    prevents alternative routes from requiring a specific TOS.

    This patch solves the problem as follows:

    - routes that require specific TOS should be returned by
    fib_select_default only when TOS matches, as already done
    in fib_table_lookup. This rule implies that depending on the
    TOS we can have many different lists of alternative gateways
    and we have to keep the last used gateway (fa_default) in first
    alias for the TOS instead of using single tb_default value.

    - as the aliases are ordered by many keys (TOS desc,
    fib_priority asc), we restrict the possible results to
    routes with matching TOS and lowest metric (fib_priority)
    and routes that match any TOS, again with lowest metric.

    For example, packet with TOS 8 can not use gw3 (not lowest
    metric), gw4 (different TOS) and gw6 (not lowest metric),
    all other gateways can be used:

    tos 8 via gw1 metric 2 fa_head and res->fi
    tos 8 via gw2 metric 2
    tos 8 via gw3 metric 3
    tos 4 via gw4
    tos 0 via gw5
    tos 0 via gw6 metric 1
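
    The two rules can be checked against the table above with a simplified
    model. This sketch assumes a flat array of aliases and treats a missing
    metric as 0; the struct alias_sketch and the helper usable are
    hypothetical and much simpler than fib_select_default.

```c
/* Hypothetical flat view of one alias from the table above. */
struct alias_sketch {
	int tos;
	int metric;
	const char *gw;
};

/*
 * Return 1 if list[i] may be chosen for a packet with TOS pkt_tos:
 * its TOS must match the packet or be 0 (any TOS), and no alias with
 * the same TOS may have a lower metric.
 */
static int usable(const struct alias_sketch *list, int n, int i, int pkt_tos)
{
	const struct alias_sketch *fa = &list[i];
	int j;

	if (fa->tos != 0 && fa->tos != pkt_tos)
		return 0;
	for (j = 0; j < n; j++)
		if (list[j].tos == fa->tos && list[j].metric < fa->metric)
			return 0;
	return 1;
}
```

    For a packet with TOS 8 this model accepts gw1, gw2 and gw5 and rejects
    gw3, gw4 and gw6, matching the example.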

    Reported-by: Hagen Paul Pfeifer
    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
     

24 Jun, 2015

1 commit

  • This feature is only enabled with the new per-interface or ipv4 global
    sysctls called 'ignore_routes_with_linkdown'.

    net.ipv4.conf.all.ignore_routes_with_linkdown = 0
    net.ipv4.conf.default.ignore_routes_with_linkdown = 0
    net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
    ...

    When the above sysctls are set, the kernel will report to userspace that
    a route is dead and will no longer resolve to this nexthop when performing
    a fib lookup. This will signal to userspace that the route will not be
    selected. The signalling of a RTNH_F_DEAD is only passed to userspace
    if the sysctl is enabled and link is down. This was done as without it
    the netlink listeners would have no idea whether or not a nexthop would
    be selected. The kernel only sets RTNH_F_DEAD internally if the
    interface has IFF_UP cleared.

    With the new sysctl set, the following behavior can be observed
    (interface p8p1 is link-down):

    default via 10.0.5.2 dev p9p1
    10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15
    70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1
    80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 dead linkdown
    90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 dead linkdown
    90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2
    90.0.0.1 via 70.0.0.2 dev p7p1 src 70.0.0.1
    cache
    local 80.0.0.1 dev lo src 80.0.0.1
    cache
    80.0.0.2 via 10.0.5.2 dev p9p1 src 10.0.5.15
    cache

    While the route does remain in the table (so it can be modified if
    needed rather than being wiped away as it would be if IFF_UP was
    cleared), the proper next-hop is chosen automatically when the link is
    down. Now interface p8p1 is linked-up:

    default via 10.0.5.2 dev p9p1
    10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15
    70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1
    80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1
    90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1
    90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2
    192.168.56.0/24 dev p2p1 proto kernel scope link src 192.168.56.2
    90.0.0.1 via 80.0.0.2 dev p8p1 src 80.0.0.1
    cache
    local 80.0.0.1 dev lo src 80.0.0.1
    cache
    80.0.0.2 dev p8p1 src 80.0.0.1
    cache

    and the output changes to what one would expect.

    If the sysctl is not set, the following output would be expected when
    p8p1 is down:

    default via 10.0.5.2 dev p9p1
    10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15
    70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1
    80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 linkdown
    90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 linkdown
    90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2

    Since the dead flag does not appear, there should be no expectation that
    the kernel would skip using this route due to link being down.

    v2: Split kernel changes into 2 patches, this actually makes a
    behavioral change if the sysctl is set. Also took suggestion from Alex
    to simplify code by only checking sysctl during fib lookup and
    suggestion from Scott to add a per-interface sysctl.

    v3: Code clean-ups to make it more readable and efficient as well as a
    reverse path check fix.

    v4: Drop binary sysctl

    v5: Whitespace fixups from Dave

    v6: Style changes from Dave and checkpatch suggestions

    v7: One more checkpatch fixup

    Signed-off-by: Andy Gospodarek
    Signed-off-by: Dinesh Dutt
    Acked-by: Scott Feldman
    Signed-off-by: David S. Miller

    Andy Gospodarek
     

22 Jun, 2015

1 commit

  • This patch adds NLM_F_APPEND flag to struct nlmsg_hdr->nlmsg_flags
    in newroute notifications if the route add was an append.
    (This is similar to how NLM_F_REPLACE is already part of new
    route replace notifications today)

    This helps userspace determine if the route add operation was
    an append.

    Signed-off-by: Roopa Prabhu
    Acked-by: Scott Feldman
    Signed-off-by: David S. Miller

    Roopa Prabhu
     

08 Jun, 2015

1 commit

  • As Alexander Duyck pointed out:
        struct tnode {
            ...
            struct key_vector kv[1];
        }
    The kv[1] member of struct tnode is an array, so referencing it through
    a NULL pointer will not crash the system, like this:
        struct tnode *p = NULL;
        struct key_vector *kv = p->kv;
    As such p->kv doesn't actually dereference anything, it is simply a
    means for getting the offset to the array from the pointer p.
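
    The offset point can be illustrated with offsetof on a stand-in
    layout. The structs kv_sketch and tnode_layout and both helpers are
    hypothetical; the placeholder member only represents the elided fields.

```c
#include <stddef.h>

/* Hypothetical stand-ins; field contents do not matter for the layout. */
struct kv_sketch {
	unsigned long key;
};

struct tnode_layout {
	void *placeholder;           /* stands in for the elided members */
	struct kv_sketch kv[1];
};

/* Byte distance from the start of the tnode to its kv array. */
static size_t kv_offset(void)
{
	return offsetof(struct tnode_layout, kv);
}

/* Same distance measured on a real object; no field is read. */
static size_t kv_offset_of(const struct tnode_layout *t)
{
	return (size_t)((const char *)t->kv - (const char *)t);
}
```

    Both helpers return the same value, which is why naming the array
    member only computes an address and loads nothing.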

    This patch makes the code more regular to avoid making people feel
    odd when they look at the code.

    Signed-off-by: Firo Yang
    Signed-off-by: David S. Miller

    Firo Yang
     

27 May, 2015

1 commit

  • We used to get this indirectly I suppose, but no longer do.

    Either way, an explicit include should have been done in the
    first place.

    net/ipv4/fib_trie.c: In function '__node_free_rcu':
    >> net/ipv4/fib_trie.c:293:3: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]
    vfree(n);
    ^
    net/ipv4/fib_trie.c: In function 'tnode_alloc':
    >> net/ipv4/fib_trie.c:312:3: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration]
    return vzalloc(size);
    ^
    >> net/ipv4/fib_trie.c:312:3: warning: return makes pointer from integer without a cast
    cc1: some warnings being treated as errors

    Reported-by: kbuild test robot
    Signed-off-by: David S. Miller

    David S. Miller
     

23 May, 2015

2 commits

  • Conflicts:
    drivers/net/ethernet/cadence/macb.c
    drivers/net/phy/phy.c
    include/linux/skbuff.h
    net/ipv4/tcp.c
    net/switchdev/switchdev.c

    Switchdev was a case of RTNH_F_{EXTERNAL --> OFFLOAD}
    renaming overlapping with net-next changes of various
    sorts.

    phy.c was a case of two changes, one adding a local
    variable to a function whilst the second was removing
    one.

    tcp.c overlapped a deadlock fix with the addition of new tcp_info
    statistic values.

    macb.c involved the addition of two zynq device entries.

    skbuff.h involved adding back ipv4_daddr to nf_bridge_info
    whilst net-next changes put two other existing members of
    that struct into a union.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • When replacing an IPv4 route, tb_id member of the new fib_alias
    structure is not set in the replace code path so that the new route is
    ignored.

    Fixes: 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse")
    Signed-off-by: Michal Kubecek
    Acked-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Michal Kubeček
     

15 May, 2015

1 commit


13 May, 2015

1 commit


04 Apr, 2015

2 commits

  • The ipv4 code uses a mixture of coding styles. In some instances check
    for non-NULL pointer is done as x != NULL and sometimes as x. x is
    preferred according to checkpatch and this patch makes the code
    consistent by adopting the latter form.

    No changes detected by objdiff.

    Signed-off-by: Ian Morris
    Signed-off-by: David S. Miller

    Ian Morris
     
  • The ipv4 code uses a mixture of coding styles. In some instances check
    for NULL pointer is done as x == NULL and sometimes as !x. !x is
    preferred according to checkpatch and this patch makes the code
    consistent by adopting the latter form.

    No changes detected by objdiff.

    Signed-off-by: Ian Morris
    Signed-off-by: David S. Miller

    Ian Morris
     

24 Mar, 2015

1 commit

  • When I updated the code to address a possible null pointer dereference in
    resize I ended up reverting an exception handling fix for the suffix length
    in the event that inflate or halve failed. This change is meant to correct
    that by reverting the earlier fix and instead simply getting the parent
    again after inflate has been completed to avoid the possible null pointer
    issue.

    Fixes: ddb4b9a13 ("fib_trie: Address possible NULL pointer dereference in resize")
    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     

13 Mar, 2015

1 commit

  • This change makes it so that we should always have a deterministic ordering
    for the main and local aliases within the merged table when two leaves
    overlap.

    For example, suppose we have a leaf with a key of 192.168.254.0. If we
    previously added two aliases with a prefix length of 24 from both local and
    main, the first entry added would be first and the second would be second. When I
    was coding this I had added a WARN_ON should such a situation occur as I
    wasn't sure how likely it would be. However this WARN_ON has been
    triggered so this is something that should be addressed.

    With this patch the ordering of the aliases is as follows. First they are
    sorted on prefix length, then on their table ID, then tos, and finally
    priority. This way what we end up doing is essentially interleaving the
    two tables on what used to be leaf_info structure boundaries.
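
    The sort order can be sketched as a comparator. The key order (prefix
    length, table ID, tos, priority) is from the text above; the direction
    of each comparison is an assumption made for this sketch, as are the
    struct alias_key and the function alias_cmp.

```c
#include <stdlib.h>

/* Hypothetical sort key for one alias. */
struct alias_key {
	unsigned char slen;      /* suffix length: smaller means longer prefix */
	unsigned int tb_id;      /* table ID, e.g. 255 local, 254 main */
	unsigned char tos;
	unsigned int priority;
};

/*
 * qsort() comparator. Assumed directions: suffix length ascending,
 * table ID descending, tos descending, priority ascending.
 */
static int alias_cmp(const void *a, const void *b)
{
	const struct alias_key *x = a, *y = b;

	if (x->slen != y->slen)
		return x->slen < y->slen ? -1 : 1;
	if (x->tb_id != y->tb_id)
		return x->tb_id > y->tb_id ? -1 : 1;
	if (x->tos != y->tos)
		return x->tos > y->tos ? -1 : 1;
	if (x->priority != y->priority)
		return x->priority < y->priority ? -1 : 1;
	return 0;
}
```

    Since every field takes part in the comparison, two aliases that
    differ only by table always land in the same relative order regardless
    of which was inserted first.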

    Fixes: 0ddcf43d5 ("ipv4: FIB Local/MAIN table collapse")
    Reported-by: Eric Dumazet
    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     

12 Mar, 2015

2 commits

  • When we merged the tries for local and main I had overlooked the iterator
    for /proc/net/route. As a result it was outputting both local and main
    when the two tries were merged.

    This patch resolves that by only providing output for aliases that are
    actually in the main trie. As a result we should go back to the original
    behavior which I assume will be necessary to maintain legacy support.

    Fixes: 0ddcf43d5 ("ipv4: FIB Local/MAIN table collapse")
    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • This patch is meant to collapse local and main into one by converting
    tb_data from an array to a pointer. Doing this allows us to point the
    local table into the main while maintaining the same variables in the
    table.

    As such the tb_data was converted from an array to a pointer, and a new
    array called data is added in order to still provide an object for tb_data
    to point to.

    In order to track the origin of the fib aliases a tb_id value was added in
    a hole that existed on 64b systems. Using this we can also reverse the
    merge in the event that custom FIB rules are enabled.

    With this patch I am seeing an improvement of 20ns to 30ns for routing
    lookups as long as custom rules are not enabled, with custom rules enabled
    we fall back to split tables and the original behavior.

    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     

11 Mar, 2015

2 commits

  • If the inflate call failed it would return NULL. As a result tp would be
    set to NULL and cause us to trigger a NULL pointer dereference in
    should_halve if the inflate failed on the first attempt.

    In order to prevent this we should decrement max_work before we actually
    attempt to inflate as this will force us to exit before attempting to halve
    a node we should have inflated. In order to keep things symmetric between
    inflate and halve I went ahead and also moved the decrement of max_work for
    the halve case as well so we take care of that before we actually attempt
    to halve the tnode.

    Fixes: 88bae714 ("fib_trie: Add key vector to root, return parent key_vector in resize")
    Reported-by: Dan Carpenter
    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • In the case of a trie that had no tnodes with a key of 0 the initial
    look-up would fail resulting in an out-of-bounds cindex on the first tnode.
    This resulted in an entire trie being skipped.

    In order to resolve this I have updated the cindex logic in the initial
    look-up so that if the key is zero we will always traverse the child zero
    path.

    Fixes: 8be33e95 ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf")
    Reported-by: Sabrina Dubroca
    Signed-off-by: Alexander Duyck
    Tested-by: Sabrina Dubroca
    Signed-off-by: David S. Miller

    Alexander Duyck
     

10 Mar, 2015

1 commit

  • Pass in the netlink flags (NLM_F_*) into switchdev driver for IPv4 FIB add op
    to allow driver to 1) optimize hardware updates, 2) handle ip route prepend
    and append commands correctly.

    Suggested-by: Jamal Hadi Salim
    Suggested-by: Roopa Prabhu
    Signed-off-by: Scott Feldman
    Reviewed-by: Simon Horman
    Acked-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Scott Feldman
     

07 Mar, 2015

2 commits