30 Mar, 2018

2 commits

  • Check the return value of call_fib_entry_notifiers for IPv4 route
    replace. This allows a notifier handler to fail the replace.

    Signed-off-by: David Ahern
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    David Ahern
     
  • Move call to call_fib_entry_notifiers for new IPv4 routes to right
    before the call to fib_insert_alias. At this point the only remaining
    failure path is memory allocations in fib_insert_node. Handle that
    very unlikely failure with a call to call_fib_entry_notifiers to
    tell drivers about it.

    At this point, notifier handlers can decide the fate of the new route,
    with a clean path to delete the potential new entry if the notifier
    returns non-zero.

    Signed-off-by: David Ahern
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    David Ahern
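    The ordering described in the second commit above (notify right before
    the insert, then report a late insertion failure as a delete) can be
    modelled in userspace C. This is a simplified sketch, not the fib_trie
    code: struct route_table, route_insert() and the listener type are
    hypothetical, and a full fixed-size array stands in for
    fib_insert_node() failing to allocate memory.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model: a fixed-size array stands in for the trie, and a
 * full array stands in for fib_insert_node() failing to allocate. */
#define MAX_ROUTES 8

struct route_table {
	unsigned int dst[MAX_ROUTES];
	size_t count;
};

enum route_event { ROUTE_ADD, ROUTE_DEL };

/* A listener returns 0 to accept an event or a negative errno to veto it. */
typedef int (*route_notifier_fn)(enum route_event ev, unsigned int dst);

static int route_insert(struct route_table *tb, unsigned int dst,
			route_notifier_fn notify)
{
	/* Notify first: a veto leaves nothing to undo in the table. */
	int err = notify(ROUTE_ADD, dst);

	if (err)
		return err;

	/* The only failure left is the unlikely insertion failure; tell
	 * listeners that the route they just accepted is gone again. */
	if (tb->count == MAX_ROUTES) {
		notify(ROUTE_DEL, dst);
		return -ENOMEM;
	}

	tb->dst[tb->count++] = dst;
	return 0;
}

/* Example listener: refuses to offload 10.0.0.0 (0x0a000000). */
static int reject_ten(enum route_event ev, unsigned int dst)
{
	return (ev == ROUTE_ADD && dst == 0x0a000000u) ? -EOPNOTSUPP : 0;
}

/* Example listener that only counts the events it receives. */
static int adds, dels;
static int count_events(enum route_event ev, unsigned int dst)
{
	(void)dst;
	if (ev == ROUTE_ADD)
		adds++;
	else
		dels++;
	return 0;
}
```

    Because the veto happens before anything is linked, the error path
    for a rejecting listener has nothing to clean up in the table.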
     

27 Mar, 2018

1 commit

  • Prefer the direct use of octal for permissions.

    Done with checkpatch -f --types=SYMBOLIC_PERMS --fix-inplace
    and some typing.

    Miscellanea:

    o Whitespace neatening around these conversions.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
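    To see that the octal form is equivalent, the kernel's symbolic
    shorthands can be rebuilt from the POSIX permission bits in userspace.
    S_IRUGO and S_IWUGO are kernel-only names, so they are defined locally
    here in the same way include/linux/stat.h builds them.

```c
#include <assert.h>
#include <sys/stat.h>

/* Kernel shorthands, rebuilt from the POSIX bits for this sketch. */
#define S_IRUGO (S_IRUSR | S_IRGRP | S_IROTH)
#define S_IWUGO (S_IWUSR | S_IWGRP | S_IWOTH)

/* The octal literal says the same thing in fewer characters. */
_Static_assert(S_IRUGO == 0444, "read for user, group and other");
_Static_assert((S_IRUGO | S_IWUSR) == 0644, "rw for user, read for others");
```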
     

27 Feb, 2018

1 commit


17 Jan, 2018

1 commit

  • /proc has been ignoring the struct file_operations::owner field for 10 years.
    Specifically, it started with commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba
    ("Fix rmmod/read/write races in /proc entries"). Notice the chunk where
    inode->i_fop is initialized with proxy struct file_operations for
    regular files:

    -    if (de->proc_fops)
    -        inode->i_fop = de->proc_fops;
    +    if (de->proc_fops) {
    +        if (S_ISREG(inode->i_mode))
    +            inode->i_fop = &proc_reg_file_ops;
    +        else
    +            inode->i_fop = de->proc_fops;
    +    }

    At that point, the VFS stopped pinning the module.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

01 Nov, 2017

1 commit

  • Add extack to fib_notifier_info and plumb through stack to
    call_fib_rule_notifiers, call_fib_entry_notifiers and
    call_fib6_entry_notifiers. This allows notifier handlers to
    return messages to the user.

    Signed-off-by: David Ahern
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    David Ahern
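    The idea of plumbing an extended ACK through the notifier info can be
    sketched in userspace. struct ext_ack, struct notifier_info,
    SET_ERR_MSG() and driver_route_event() are hypothetical stand-ins for
    struct netlink_ext_ack, struct fib_notifier_info, NL_SET_ERR_MSG() and
    a driver's notifier handler; only the NULL-safe "set the message if
    there is somewhere to put it" behaviour is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for struct netlink_ext_ack. */
struct ext_ack {
	const char *msg;
};

/* Simplified stand-in for struct fib_notifier_info. */
struct notifier_info {
	struct ext_ack *extack;	/* NULL when no user request is waiting */
};

/* Set the message only if the caller supplied an ack to fill in. */
#define SET_ERR_MSG(ack, text) \
	do { if (ack) (ack)->msg = (text); } while (0)

/* Hypothetical driver handler that refuses routes and explains why. */
static int driver_route_event(struct notifier_info *info)
{
	SET_ERR_MSG(info->extack, "mydrv: route offload table is full");
	return -ENOSPC;
}
```

    The same handler works whether or not a message can be returned,
    which is why the pointer check lives in the macro.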
     

20 Oct, 2017

1 commit


24 Aug, 2017

1 commit

  • Currently, when an IPv4 route inserts a fib_info, it compares
    fib_metrics with memcmp(). This means IPv4 also identifies a route
    by its metrics.

    But when removing a route, it tries to find the route without
    caring about the metrics. As a result, the route with the
    specified metrics can't be removed.

    Thomas noticed this issue when doing the testing:

    1. add:
    # ip route append 192.168.7.0/24 dev v window 1000
    # ip route append 192.168.7.0/24 dev v window 1001
    # ip route append 192.168.7.0/24 dev v window 1002
    # ip route append 192.168.7.0/24 dev v window 1003
    2. delete:
    # ip route delete 192.168.7.0/24 dev v window 1002
    3. show:
    192.168.7.0/24 proto boot scope link window 1001
    192.168.7.0/24 proto boot scope link window 1002
    192.168.7.0/24 proto boot scope link window 1003

    The route with window 1002 wasn't deleted; the first one (window
    1000) was deleted instead.

    This patch matches the metrics when looking up and deleting a
    route.

    Reported-by: Thomas Haller
    Signed-off-by: Xin Long
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Xin Long
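    The reproducer above can be replayed against a toy route list. This is
    a hypothetical model, not the fib_trie lookup: each route carries a
    single metric (window), 0 means "no metric given", and the
    match_metrics switch contrasts the old first-prefix-match behaviour
    with the fixed one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a prefix plus one metric (the TCP window). */
struct route {
	unsigned int dst;
	unsigned char plen;
	unsigned int window;	/* 0 means "no metric given" */
};

/* Return the index of the route a delete request would remove, or -1.
 * match_metrics == 0 models the old behaviour (first prefix match). */
static int find_route(const struct route *rt, size_t n,
		      const struct route *req, int match_metrics)
{
	for (size_t i = 0; i < n; i++) {
		if (rt[i].dst != req->dst || rt[i].plen != req->plen)
			continue;
		/* Metrics were given and differ: not the route we want. */
		if (match_metrics && req->window && rt[i].window != req->window)
			continue;
		return (int)i;
	}
	return -1;
}
```

    With the four appended routes, the old lookup picks index 0
    (window 1000) while the metric-aware lookup picks index 2
    (window 1002), matching the output shown above.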
     

04 Aug, 2017

1 commit

  • The FIB notification chain is currently used solely by IPv4 code.
    However, we're going to introduce IPv6 FIB offload support, which
    requires these notifications as well.

    As explained in commit c3852ef7f2f8 ("ipv4: fib: Replay events when
    registering FIB notifier"), upon registration to the chain, the callee
    receives a full dump of the FIB tables and rules by traversing all the
    net namespaces. The integrity of the dump is ensured by a per-namespace
    sequence counter that is incremented whenever a change to the tables or
    rules occurs.

    In order to allow more address families to use the chain, each family is
    expected to register its fib_notifier_ops in its pernet init. These
    operations allow the common code to read the family's sequence counter
    as well as dump its tables and rules in the given net namespace.

    Additionally, a 'family' parameter is added to sent notifications, so
    that listeners could distinguish between the different families.

    Implement the common code that allows listeners to register to the chain
    and for address families to register their fib_notifier_ops. Subsequent
    patches will implement these operations in IPv6.

    In the future, ipmr and ip6mr will be extended to provide these
    notifications as well.

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
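    The per-family registration described above can be sketched as a small
    registry of ops, each exposing its own sequence counter. struct
    family_ops, register_family() and seq_sum() are hypothetical; the real
    struct fib_notifier_ops also carries a dump callback and is registered
    per network namespace. Summing the counters is one simple way to get a
    single value to compare before and after a dump.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified per-family notifier ops. */
struct family_ops {
	int family;
	unsigned int (*seq_read)(void);
};

#define MAX_FAMILIES 4
static const struct family_ops *registered[MAX_FAMILIES];
static size_t nr_registered;

static int register_family(const struct family_ops *ops)
{
	if (nr_registered == MAX_FAMILIES)
		return -1;
	registered[nr_registered++] = ops;
	return 0;
}

/* If any family changed its tables or rules, the sum moves. */
static unsigned int seq_sum(void)
{
	unsigned int sum = 0;

	for (size_t i = 0; i < nr_registered; i++)
		sum += registered[i]->seq_read();
	return sum;
}

/* Example families, each with its own change counter. */
static unsigned int ipv4_seq, ipv6_seq;
static unsigned int ipv4_seq_read(void) { return ipv4_seq; }
static unsigned int ipv6_seq_read(void) { return ipv6_seq; }
```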
     

04 Jul, 2017

1 commit

  • The refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This allows us to avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
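    The protection refcount_t adds can be illustrated with a saturating
    counter. This is a userspace sketch of the idea, not the kernel API:
    struct ref, ref_inc() and ref_dec_and_test() are hypothetical, and the
    saturation value UINT_MAX is an assumption for the example. A plain
    atomic_t would wrap to 0 on overflow and let a later put free an
    object that is still in use; a saturated counter stays pinned instead.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumed saturation value for this sketch. */
#define REF_SATURATED UINT_MAX

struct ref {
	unsigned int count;
};

/* Increment, but never wrap: once saturated, stay saturated. */
static void ref_inc(struct ref *r)
{
	if (r->count != REF_SATURATED)
		r->count++;
}

/* Returns true when the caller dropped the last reference. A saturated
 * counter is never released: leaking beats a use-after-free. */
static bool ref_dec_and_test(struct ref *r)
{
	if (r->count == REF_SATURATED)
		return false;
	return --r->count == 0;
}
```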
     

30 May, 2017

3 commits


27 May, 2017

1 commit


23 May, 2017

1 commit


17 May, 2017

1 commit

  • In general, rtnetlink dumps do not anticipate failure to dump a single
    object (e.g., link or route) on a single pass. As both route and link
    objects have grown via more attributes, that is no longer a given.

    netlink dumps can handle a failure if the dump function returns an
    error; specifically, netlink_dump adds the return code to the response
    if it is len != 0). IPv6 route dumps
    (rt6_dump_route) already return the error; this patch updates IPv4 and
    link dumps. Other dump functions may need to be adjusted as well.

    Reported-by: Jan Moskyto Matejka
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
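    The difference between "stop this pass" and "report an error" can be
    shown with a toy dump loop. This is a hypothetical model, not
    netlink_dump(): objects have byte sizes, a pass has room for cap bytes,
    and *pos records where the next pass resumes. An object that does not
    fit in a pass that already wrote something is retried in the next
    pass; an object that does not fit in an empty pass never will, so the
    error is returned instead of silently ending the dump.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* One dump pass over objects of the given sizes, resuming at *pos. */
static int dump_pass(const size_t *obj_size, size_t n, size_t *pos,
		     size_t cap)
{
	size_t len = 0;

	for (; *pos < n; (*pos)++) {
		if (len + obj_size[*pos] > cap) {
			/* Partial pass: end it and retry this object next
			 * time. Empty pass: the object can never fit. */
			return len ? (int)len : -EMSGSIZE;
		}
		len += obj_size[*pos];
	}
	return (int)len;	/* 0 means the dump is complete */
}
```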
     

11 Mar, 2017

2 commits

  • We always pass the same event type to fib_notify() and
    fib_rules_notify(), so we can safely drop this argument.

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • Most of the code concerned with the FIB notification chain currently
    resides in fib_trie.c, but this isn't really appropriate, as the FIB
    notification chain is also used for FIB rules.

    Therefore, it makes sense to move the common FIB notification code to a
    separate file and have it export the relevant functions, which can be
    invoked by its different users (e.g., fib_trie.c, fib_rules.c).

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Ido Schimmel
     

28 Feb, 2017

1 commit

  • Now that %z is standardised in C99, there is no reason to support %Z.
    Unlike %L, it doesn't even make format strings smaller.

    Use BUILD_BUG_ON in a couple of ATM drivers.

    In case anyone didn't notice, lib/vsprintf.o is about half of SLUB, which
    in my opinion is quite an achievement. Hopefully this patch inspires
    someone else to trim vsprintf.c more.

    Link: http://lkml.kernel.org/r/20170103230126.GA30170@avx2
    Signed-off-by: Alexey Dobriyan
    Cc: Andy Shevchenko
    Cc: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
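    In userspace C, the standard 'z' length modifier prints a size_t
    directly, and C11's _Static_assert plays a role similar to the
    kernel's BUILD_BUG_ON. format_size() is a hypothetical helper for
    illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a size_t with the C99 'z' length modifier. */
static int format_size(char *buf, size_t buflen, size_t n)
{
	return snprintf(buf, buflen, "len=%zu", n);
}

/* Compile-time check, the userspace analogue of BUILD_BUG_ON. */
_Static_assert((size_t)-1 > 0, "size_t must be unsigned");
```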
     

11 Feb, 2017

4 commits

  • The FIB notification chain currently uses the NLM_F_{REPLACE,APPEND}
    flags to signal routes being replaced or appended.

    Instead of using netlink flags for in-kernel notifications we can simply
    introduce two new events in the FIB notification chain. This has the
    added advantage of making the API cleaner, thereby making it clear that
    these events should be supported by listeners of the notification chain.

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    CC: Patrick McHardy
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • When a FIB alias is replaced following NLM_F_REPLACE, the ENTRY_ADD
    notification is sent after the reference on the previous FIB info was
    dropped. This is problematic as potential listeners might need to access
    it in their notification blocks.

    Solve this by sending the notification prior to the deletion of the
    replaced FIB alias. This is consistent with ENTRY_DEL notifications.

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    CC: Patrick McHardy
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • When a FIB alias is removed, a notification is sent using the type
    passed from user space (which can be RTN_UNSPEC) instead of the actual
    type of the removed alias. This is problematic for listeners of the FIB
    notification chain, as several FIB aliases can exist with matching
    parameters except for the type.

    Solve this by passing the actual type of the removed FIB alias.

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    CC: Patrick McHardy
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • In case the MAIN table is flushed and its trie is shared with the LOCAL
    table, then we might be flushing FIB aliases belonging to the latter.
    This can lead to FIB_ENTRY_DEL notifications sent with the wrong table
    ID.

    The above doesn't affect current listeners, as the table ID is ignored
    during entry deletion, but this will change later in the patchset.

    When flushing a particular table, skip any aliases belonging to a
    different one.

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    CC: Alexander Duyck
    CC: Patrick McHardy
    Reviewed-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Ido Schimmel
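    The three notification fixes in this group can be sketched together in
    userspace C. Everything below is a hypothetical model: the enum only
    resembles the kernel's FIB event names, and insert_event(),
    removed_alias_type() and flush_table() are illustrative helpers. The
    flag, table ID and route type values mirror <linux/netlink.h> and
    <linux/rtnetlink.h>. The flush model ignores the dead-nexthop check
    that decides which aliases are really flushed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NLM_F_REPLACE  0x100
#define NLM_F_APPEND   0x800
#define RT_TABLE_MAIN  254
#define RT_TABLE_LOCAL 255
#define RTN_UNSPEC     0
#define RTN_UNICAST    1
#define RTN_BLACKHOLE  6

/* Illustrative in-kernel events: listeners switch on these instead of
 * decoding netlink flags. */
enum fib_event { EV_ENTRY_ADD, EV_ENTRY_REPLACE, EV_ENTRY_APPEND, EV_ENTRY_DEL };

struct alias {
	unsigned int tb_id;
	unsigned char type;
};

/* Translate request flags once; REPLACE only if something was replaced. */
static enum fib_event insert_event(unsigned int nlflags, bool existing)
{
	if (existing && (nlflags & NLM_F_REPLACE))
		return EV_ENTRY_REPLACE;
	if (nlflags & NLM_F_APPEND)
		return EV_ENTRY_APPEND;
	return EV_ENTRY_ADD;
}

/* For a delete request (RTN_UNSPEC matches any type), return the type
 * the DEL notification should carry: the removed alias's own type. */
static int removed_alias_type(const struct alias *fa, size_t n,
			      unsigned char req_type)
{
	for (size_t i = 0; i < n; i++)
		if (req_type == RTN_UNSPEC || fa[i].type == req_type)
			return fa[i].type;
	return -1;
}

/* Flush one table from a trie shared by MAIN and LOCAL, skipping aliases
 * owned by the other table. Returns the number of DEL events sent. */
static size_t flush_table(const struct alias *fa, size_t n,
			  unsigned int tb_id)
{
	size_t sent = 0;

	for (size_t i = 0; i < n; i++) {
		if (fa[i].tb_id != tb_id)
			continue;	/* belongs to the other table */
		sent++;		/* EV_ENTRY_DEL for fa[i] would go out here */
	}
	return sent;
}
```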
     

25 Dec, 2016

1 commit


07 Dec, 2016

1 commit


06 Dec, 2016

2 commits

  • It has been reported that update_suffix can be expensive when it is called
    on a large node in which most of the suffix lengths are the same. The time
    required to add 200K entries had increased from around 3 seconds to almost
    49 seconds.

    In order to address this we need to move the code for updating the suffix
    out of resize and instead just have it handled in the cases where we are
    pushing a node that increases the suffix length, or will decrease the
    suffix length.

    Fixes: 5405afd1a306 ("fib_trie: Add tracking value for suffix length")
    Reported-by: Robert Shearman
    Signed-off-by: Alexander Duyck
    Reviewed-by: Robert Shearman
    Tested-by: Robert Shearman
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • It wasn't necessary to pass a leaf in when doing the suffix updates so just
    drop it. Instead just pass the suffix and work with that.

    Since we dropped the leaf there is no need to include that in the name so
    the names are updated to node_push_suffix and node_pull_suffix.

    Finally I noticed that the logic for pulling the suffix length back
    actually had some issues. Specifically it would stop prematurely if there
    was a longer suffix, but it was not as long as the original suffix. I
    updated the code to address that in node_pull_suffix.

    Fixes: 5405afd1a306 ("fib_trie: Add tracking value for suffix length")
    Suggested-by: Robert Shearman
    Signed-off-by: Alexander Duyck
    Reviewed-by: Robert Shearman
    Tested-by: Robert Shearman
    Signed-off-by: David S. Miller

    Alexander Duyck
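    The push/pull scheme described in these two commits can be modelled on
    a miniature tree. This is a hypothetical sketch: struct node and
    update_suffix() are simplified (slen is recomputed from the children,
    floored at the node's pos), while node_push_suffix() and
    node_pull_suffix() follow the shape described above. The pull loop
    keeps climbing while the recomputed length changes, so a remaining
    suffix that is longer than the new one but shorter than the original
    one no longer stops it early.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical trie node tracking the longest suffix length below it. */
struct node {
	struct node *parent;
	unsigned char pos;	/* slen never drops below pos */
	unsigned char slen;
	struct node *child[MAX_CHILDREN];
};

/* A suffix grew: raise slen upward until an ancestor already covers it. */
static void node_push_suffix(struct node *tn, unsigned char slen)
{
	while (tn && tn->slen < slen) {
		tn->slen = slen;
		tn = tn->parent;
	}
}

/* Simplified update_suffix(): recompute slen from the children. */
static unsigned char update_suffix(struct node *tn)
{
	unsigned char slen = tn->pos;

	for (size_t i = 0; i < MAX_CHILDREN; i++)
		if (tn->child[i] && tn->child[i]->slen > slen)
			slen = tn->child[i]->slen;
	tn->slen = slen;
	return slen;
}

/* A suffix shrank to slen: recompute upward while values keep changing. */
static void node_pull_suffix(struct node *tn, unsigned char slen)
{
	while (tn && tn->slen > tn->pos && tn->slen > slen) {
		unsigned char old = tn->slen;

		if (update_suffix(tn) == old)
			break;	/* something else still needs the old length */
		tn = tn->parent;
	}
}
```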
     

04 Dec, 2016

3 commits

  • Commit b90eb7549499 ("fib: introduce FIB notification infrastructure")
    introduced a new notification chain to notify listeners (e.g., switchdev
    drivers) about addition and deletion of routes.

    However, upon registration to the chain the FIB tables can already be
    populated, which means potential listeners will have an incomplete view
    of the tables.

    Solve that by dumping the FIB tables and replaying the events to the
    passed notification block. The dump itself is done using RCU in order
    not to starve consumers that need RTNL to make progress.

    The integrity of the dump is ensured by reading the FIB change sequence
    counter before and after the dump under RTNL. This allows us to avoid
    the problematic situation in which the dumping process sends an ENTRY_ADD
    notification following ENTRY_DEL generated by another process holding
    RTNL.

    Callers of the registration function may pass a callback that is
    executed in case the dump was inconsistent with current FIB tables.

    The number of retries until a consistent dump is achieved is set to a
    fixed number to prevent callers from looping for long periods of time.
    If the current limit proves to be problematic in the future, it can
    easily be made configurable using a sysctl.

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • The next patch will enable listeners of the FIB notification chain to
    request a dump of the FIB tables. However, since RTNL isn't taken during
    the dump, it's possible for the FIB tables to change mid-dump, which
    will result in inconsistency between the listener's table and the
    kernel's.

    Allow listeners to know about changes that occurred mid-dump, by adding
    a change sequence counter to each net namespace. The counter is
    incremented just before a notification is sent in the FIB chain.

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • In order not to hold RTNL for long periods of time we're going to dump
    the FIB tables using RCU.

    Convert the FIB notification chain to be atomic, as we can't block in
    RCU critical sections.

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
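    The registration scheme described in this group (read the change
    counter, replay a dump, read the counter again, retry a bounded number
    of times and call a callback on each inconsistent dump) can be modelled
    in userspace. struct fib_model, dump_tables(), register_listener() and
    DUMP_MAX_RETRIES are hypothetical; the retry limit is an illustrative
    value, and the changes_during_dump field only simulates concurrent
    writers.

```c
#include <assert.h>
#include <stdbool.h>

#define DUMP_MAX_RETRIES 5	/* illustrative value */

struct fib_model {
	unsigned int seq;			/* bumped before every notification */
	unsigned int changes_during_dump;	/* simulated concurrent writers */
};

static void dump_tables(struct fib_model *fib)
{
	/* ENTRY_ADD events would be replayed to the listener here. A
	 * concurrent writer may change the tables meanwhile. */
	if (fib->changes_during_dump) {
		fib->changes_during_dump--;
		fib->seq++;
	}
}

/* True once a consistent dump was delivered. On every inconsistent
 * dump, cb runs (e.g. to flush the listener's partial state). */
static bool register_listener(struct fib_model *fib, void (*cb)(void))
{
	for (int tries = 0; tries < DUMP_MAX_RETRIES; tries++) {
		unsigned int before = fib->seq;

		dump_tables(fib);
		if (fib->seq == before)
			return true;
		if (cb)
			cb();
	}
	return false;
}

static int flushes;
static void flush_cb(void) { flushes++; }
```

    Bounding the retries trades a possible registration failure for a
    guarantee that the caller never loops indefinitely under heavy churn.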
     

17 Nov, 2016

2 commits

  • Fix a small memory leak in which a fib_alias is leaked in the event
    that we are unable to insert it into the local table.

    Fixes: 0ddcf43d5d4a0 ("ipv4: FIB Local/MAIN table collapse")
    Reported-by: Eric Dumazet
    Signed-off-by: Alexander Duyck
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • The patch that removed the FIB offload infrastructure was a bit too
    aggressive and also removed code needed to clean up after we split the
    table when additional rules were added. Specifically, the function
    fib_trie_flush_external was called at the end of adding a new rule to
    flush the foreign trie entries from the main trie.

    I updated the code so that we only call fib_trie_flush_external on the main
    table so that we flush the entries for local from main. This way we don't
    call it for every rule change which is what was happening previously.

    Fixes: 347e3b28c1ba2 ("switchdev: remove FIB offload infrastructure")
    Reported-by: Eric Dumazet
    Cc: Jiri Pirko
    Signed-off-by: Alexander Duyck
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Alexander Duyck
     

08 Nov, 2016

1 commit

  • The display of /proc/net/route has had a couple of issues due to the fact
    that when I originally rewrote most of fib_trie I made it so that the
    iterator was tracking the next value to use instead of the current one.

    In addition it had an off by 1 error where I was tracking the first piece
    of data as position 0, even though in reality that belonged to the
    SEQ_START_TOKEN.

    This patch updates the code so the iterator tracks the last reported
    position and key instead of the next expected position and key. In
    addition it shifts things so that all of the leaves start at 1 instead of
    trying to report leaves starting with offset 0 as being valid. With these
    two issues addressed this should resolve any off by one errors that were
    present in the display of /proc/net/route.

    Fixes: 25b97c016b26 ("ipv4: off-by-one in continuation handling in /proc/net/route")
    Cc: Andy Whitcroft
    Reported-by: Jason Baron
    Tested-by: Jason Baron
    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
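    The position convention described above (position 0 is the header,
    leaves occupy positions 1..n, and the iterator remembers the last
    reported position and key) can be sketched as a toy seq_file-style
    start function. struct route_iter, iter_start() and START_TOKEN are
    hypothetical stand-ins for the real iterator and SEQ_START_TOKEN, and
    the sorted array stands in for the trie walk. Only the key is cached,
    never a pointer into the table.

```c
#include <assert.h>
#include <stddef.h>

/* Sentinel for the header line, playing the role of SEQ_START_TOKEN. */
static const unsigned int start_token = 0;
#define START_TOKEN (&start_token)

struct route_iter {
	size_t pos;		/* last reported position (0 = header) */
	unsigned int key;	/* key of the last reported leaf */
};

/* leaves[] is sorted by key, as a trie walk would return them. */
static const unsigned int *iter_start(struct route_iter *it,
				      const unsigned int *leaves, size_t n,
				      size_t pos)
{
	size_t i = 0;

	if (pos == 0) {
		it->pos = 0;
		return START_TOKEN;
	}

	if (it->pos && pos == it->pos) {
		/* Resume from the cached key of the last reported leaf. */
		while (i < n && leaves[i] < it->key)
			i++;
	} else {
		i = pos - 1;	/* leaves start at position 1 */
	}
	if (i >= n)
		return NULL;

	it->pos = pos;
	it->key = leaves[i];
	return &leaves[i];
}
```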
     

28 Sep, 2016

2 commits


10 Sep, 2016

1 commit

  • fib_table_insert() inconsistently fills the nlmsg_flags field in its
    notification messages.

    Since commit b8f558313506 ("[RTNETLINK]: Fix sending netlink message
    when replace route."), the netlink message has its nlmsg_flags set to
    NLM_F_REPLACE if the route replaced a preexisting one.

    Then commit a2bb6d7d6f42 ("ipv4: include NLM_F_APPEND flag in append
    route notifications") started setting nlmsg_flags to NLM_F_APPEND if
    the route matched a preexisting one but was appended.

    In other cases (exclusive creation or prepend), nlmsg_flags is 0.

    This patch sets ->nlmsg_flags in all situations, preserving the
    semantic of the NLM_F_* bits:

    * NLM_F_CREATE: a new fib entry has been created for this route.
    * NLM_F_EXCL: no other fib entry existed for this route.
    * NLM_F_REPLACE: this route has overwritten a preexisting fib entry.
    * NLM_F_APPEND: the new fib entry was added after other entries for
    the same route.

    As a result, the following flag combinations can now be reported
    (iproute2's terminology in parentheses):

    * NLM_F_CREATE | NLM_F_EXCL: route didn't exist, exclusive creation
    ("add").
    * NLM_F_CREATE | NLM_F_APPEND: route already existed, new route
    added after preexisting ones ("append").
    * NLM_F_CREATE: route already existed, new route added before
    preexisting ones ("prepend").
    * NLM_F_REPLACE: route already existed, new route replaced the
    first preexisting one ("change").

    Signed-off-by: Guillaume Nault
    Signed-off-by: David S. Miller

    Guillaume Nault
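    The four combinations listed above amount to a small lookup from "how
    the route was inserted" to nlmsg_flags. The flag values are those of
    <linux/netlink.h>; enum insert_outcome and notify_flags() are
    hypothetical names used only for this sketch.

```c
#include <assert.h>

/* Flag values as defined in <linux/netlink.h>. */
#define NLM_F_REPLACE 0x100
#define NLM_F_EXCL    0x200
#define NLM_F_CREATE  0x400
#define NLM_F_APPEND  0x800

/* How the new route ended up in the table (hypothetical enum). */
enum insert_outcome { OUT_EXCLUSIVE, OUT_APPENDED, OUT_PREPENDED, OUT_REPLACED };

/* nlmsg_flags for the notification, one case per listed combination. */
static unsigned int notify_flags(enum insert_outcome how)
{
	switch (how) {
	case OUT_EXCLUSIVE: return NLM_F_CREATE | NLM_F_EXCL;	/* "add" */
	case OUT_APPENDED:  return NLM_F_CREATE | NLM_F_APPEND;	/* "append" */
	case OUT_PREPENDED: return NLM_F_CREATE;		/* "prepend" */
	case OUT_REPLACED:  return NLM_F_REPLACE;		/* "change" */
	}
	return 0;
}
```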
     

19 Aug, 2016

1 commit


06 Aug, 2016

1 commit

  • Panic occurs when issuing "cat /proc/net/route" whilst
    populating FIB with > 1M routes.

    Use of cached node pointer in fib_route_get_idx is unsafe.

    BUG: unable to handle kernel paging request at ffffc90001630024
    IP: leaf_walk_rcu+0x10/0xe0
    PGD 11b08d067 PUD 11b08e067 PMD dac4b067 PTE 0
    Oops: 0000 [#1] SMP
    Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscac
    snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep virti
    acpi_cpufreq button parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache jbd
    tio_ring virtio floppy uhci_hcd ehci_hcd usbcore usb_common libata scsi_mod
    CPU: 1 PID: 785 Comm: cat Not tainted 4.2.0-rc8+ #4
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
    task: ffff8800da1c0bc0 ti: ffff88011a05c000 task.ti: ffff88011a05c000
    RIP: 0010: leaf_walk_rcu+0x10/0xe0
    RSP: 0018:ffff88011a05fda0 EFLAGS: 00010202
    RAX: ffff8800d8a40c00 RBX: ffff8800da4af940 RCX: ffff88011a05ff20
    RDX: ffffc90001630020 RSI: 0000000001013531 RDI: ffff8800da4af950
    RBP: 0000000000000000 R08: ffff8800da1f9a00 R09: 0000000000000000
    R10: ffff8800db45b7e4 R11: 0000000000000246 R12: ffff8800da4af950
    R13: ffff8800d97a74c0 R14: 0000000000000000 R15: ffff8800d97a7480
    FS: 00007fd3970e0700(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: ffffc90001630024 CR3: 000000011a7e4000 CR4: 00000000000006e0
    Stack:
    ffffffff814d00d3 0000000000000000 ffff88011a05ff20 ffff8800da1f9a00
    ffffffff811dd8b9 0000000000000800 0000000000020000 00007fd396f35000
    ffffffff811f8714 0000000000003431 ffffffff8138dce0 0000000000000f80
    Call Trace:
    ? fib_route_seq_start+0x93/0xc0
    ? seq_read+0x149/0x380
    ? fsnotify+0x3b4/0x500
    ? process_echoes+0x70/0x70
    ? proc_reg_read+0x47/0x70
    ? __vfs_read+0x23/0xd0
    ? rw_verify_area+0x52/0xf0
    ? vfs_read+0x81/0x120
    ? SyS_read+0x42/0xa0
    ? entry_SYSCALL_64_fastpath+0x16/0x75
    Code: 48 85 c0 75 d8 f3 c3 31 c0 c3 f3 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00
    a 04 89 f0 33 02 44 89 c9 48 d3 e8 0f b6 4a 05 49 89
    RIP leaf_walk_rcu+0x10/0xe0
    RSP
    CR2: ffffc90001630024

    Signed-off-by: Dave Forster
    Acked-by: Alexander Duyck
    Signed-off-by: David S. Miller

    David Forster
     

02 Feb, 2016

1 commit

  • Pull networking fixes from David Miller:
    "This looks like a lot but it's a mixture of regression fixes as well
    as fixes for longer standing issues.

    1) Fix on-channel cancellation in mac80211, from Johannes Berg.

    2) Handle CHECKSUM_COMPLETE properly in xt_TCPMSS netfilter xtables
    module, from Eric Dumazet.

    3) Avoid infinite loop in UDP SO_REUSEPORT logic, also from Eric
    Dumazet.

    4) Avoid a NULL deref if we try to set SO_REUSEPORT after a socket is
    bound, from Craig Gallek.

    5) GRO key comparisons don't take lightweight tunnels into account,
    from Jesse Gross.

    6) Fix struct pid leak via SCM credentials in AF_UNIX, from Eric
    Dumazet.

    7) We need to set the rtnl_link_ops of ipv6 SIT tunnels before we
    register them, otherwise the NEWLINK netlink message is missing
    the proper attributes. From Thadeu Lima de Souza Cascardo.

    8) Several Spectrum chip bug fixes for mlxsw switch driver, from Ido
    Schimmel

    9) Handle fragments properly in ipv4 early socket demux, from Eric
    Dumazet.

    10) Don't ignore the ifindex key specifier on ipv6 output route
    lookups, from Paolo Abeni"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (128 commits)
    tcp: avoid cwnd undo after receiving ECN
    irda: fix a potential use-after-free in ircomm_param_request
    net: tg3: avoid uninitialized variable warning
    net: nb8800: avoid uninitialized variable warning
    net: vxge: avoid unused function warnings
    net: bgmac: clarify CONFIG_BCMA dependency
    net: hp100: remove unnecessary #ifdefs
    net: davinci_cpdma: use dma_addr_t for DMA address
    ipv6/udp: use sticky pktinfo egress ifindex on connect()
    ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail()
    netlink: not trim skb for mmaped socket when dump
    vxlan: fix a out of bounds access in __vxlan_find_mac
    net: dsa: mv88e6xxx: fix port VLAN maps
    fib_trie: Fix shift by 32 in fib_table_lookup
    net: moxart: use correct accessors for DMA memory
    ipv4: ipconfig: avoid unused ic_proto_used symbol
    bnxt_en: Fix crash in bnxt_free_tx_skbs() during tx timeout.
    bnxt_en: Exclude rx_drop_pkts hw counter from the stack's rx_dropped counter.
    bnxt_en: Ring free response from close path should use completion ring
    net_sched: drr: check for NULL pointer in drr_dequeue
    ...

    Linus Torvalds
     

30 Jan, 2016

1 commit

  • The fib_table_lookup function had a shift by 32 that triggered a UBSAN
    warning. This was due to the fact that I had placed the shift first and
    then followed it with the check for the suffix length to ignore the
    undefined behavior. If we reorder this so that we verify the suffix is
    less than 32 before shifting the value we can avoid the issue.

    Reported-by: Toralf Förster
    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
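    The reordering described above (check the suffix length first, shift
    only when it is safe) can be shown with a 32-bit helper. In C, shifting
    a 32-bit value by 32 is undefined behaviour, which is what UBSAN
    reports. index_in_suffix() and KEYLENGTH are hypothetical names for
    this sketch; fib_table_lookup itself is structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KEYLENGTH 32

/* Does index fall inside a suffix of slen bits? */
static bool index_in_suffix(uint32_t index, unsigned int slen)
{
	/* Check first: a full-width suffix holds every 32-bit index, and
	 * shifting by 32 would be undefined. */
	if (slen >= KEYLENGTH)
		return true;
	return index < ((uint32_t)1 << slen);	/* shift count is 0..31 */
}
```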