30 Mar, 2018
2 commits
-
Add checking to call to call_fib_entry_notifiers for IPv4 route replace.
Allows a notifier handler to fail the replace.
Signed-off-by: David Ahern
Reviewed-by: Ido Schimmel
Signed-off-by: David S. Miller
-
Move call to call_fib_entry_notifiers for new IPv4 routes to right
before the call to fib_insert_alias. At this point the only remaining
failure path is memory allocations in fib_insert_node. Handle that
very unlikely failure with a call to call_fib_entry_notifiers to
tell drivers about it.

At this point notifier handlers can decide the fate of the new route
with a clean path to delete the potential new entry if the notifier
returns non-0.
Signed-off-by: David Ahern
Reviewed-by: Ido Schimmel
Signed-off-by: David S. Miller
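The ordering these two commits describe (notify, then insert, then undo the notification if the insert fails) can be sketched in user-space C. This is a minimal illustration under stated assumptions, not the kernel code: `route_insert`, `notifier_fn`, `insert_fn`, the `EV_*` events and all `demo_*` helpers are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stddef.h>

enum fib_event { EV_ENTRY_ADD, EV_ENTRY_DEL };   /* hypothetical event names */

typedef int (*notifier_fn)(enum fib_event ev, const void *route);
typedef int (*insert_fn)(const void *route);

/* Notify first, insert second, and undo the notification if the insert fails. */
static int route_insert(notifier_fn notify, insert_fn insert, const void *route)
{
    int err;

    assert(notify != NULL && insert != NULL);

    err = notify(EV_ENTRY_ADD, route);
    if (err)
        return err;                   /* a listener vetoed the route; nothing to undo */

    err = insert(route);
    if (err)
        notify(EV_ENTRY_DEL, route);  /* listeners saw ADD, so tell them to drop it */
    return err;
}

/* Demo listener and insert stand-ins, driven by the flags below. */
static int demo_veto, demo_insert_fails, demo_adds, demo_dels;

static int demo_notify(enum fib_event ev, const void *route)
{
    (void)route;
    if (ev == EV_ENTRY_ADD) {
        if (demo_veto)
            return -1;
        demo_adds++;
    } else {
        demo_dels++;
    }
    return 0;
}

static int demo_insert(const void *route)
{
    (void)route;
    return demo_insert_fails ? -12 /* -ENOMEM */ : 0;
}
```

Calling the notifier as late as possible keeps the undo path short: the only thing left to roll back is the notification itself.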
27 Mar, 2018
1 commit
-
Prefer the direct use of octal for permissions.
Done with checkpatch -f --types=SYMBOLIC_PERMS --fix-inplace
and some typing.

Miscellanea:
o Whitespace neatening around these conversions.
Signed-off-by: Joe Perches
Signed-off-by: David S. Miller
27 Feb, 2018
1 commit
-
All kmem caches aren't reallocated once set up.
Signed-off-by: Alexey Dobriyan
Signed-off-by: David S. Miller
17 Jan, 2018
1 commit
-
/proc has been ignoring struct file_operations::owner field for 10 years.
Specifically, it started with commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba
("Fix rmmod/read/write races in /proc entries"). Notice the chunk where
inode->i_fop is initialized with proxy struct file_operations for
regular files:

    -    if (de->proc_fops)
    -        inode->i_fop = de->proc_fops;
    +    if (de->proc_fops) {
    +        if (S_ISREG(inode->i_mode))
    +            inode->i_fop = &proc_reg_file_ops;
    +        else
    +            inode->i_fop = de->proc_fops;
    +    }

VFS stopped pinning module at this point.
Signed-off-by: Alexey Dobriyan
Signed-off-by: David S. Miller
01 Nov, 2017
1 commit
-
Add extack to fib_notifier_info and plumb through stack to
call_fib_rule_notifiers, call_fib_entry_notifiers and
call_fib6_entry_notifiers. This allows notifier handlers to
return messages to user.
Signed-off-by: David Ahern
Reviewed-by: Ido Schimmel
Signed-off-by: David S. Miller
20 Oct, 2017
1 commit
-
All of the notifier data (fib_info, tos, type and table id) are
contained in the fib_alias. Pass it to the notifier instead of
each item separately, shortening the argument list by 3.
Signed-off-by: David Ahern
Reviewed-by: Ido Schimmel
Signed-off-by: David S. Miller
24 Aug, 2017
1 commit
-
When the IPv4 routing code inserts a fib_info, it compares fib_metrics
with memcmp, so the metrics are part of what identifies a route.

But when removing a route, it looks the route up without considering
the metrics. As a result, the route with the matching metrics may not
be the one that is removed.

Thomas noticed this issue during testing:
1. add:
# ip route append 192.168.7.0/24 dev v window 1000
# ip route append 192.168.7.0/24 dev v window 1001
# ip route append 192.168.7.0/24 dev v window 1002
# ip route append 192.168.7.0/24 dev v window 1003
2. delete:
# ip route delete 192.168.7.0/24 dev v window 1002
3. show:
192.168.7.0/24 proto boot scope link window 1001
192.168.7.0/24 proto boot scope link window 1002
192.168.7.0/24 proto boot scope link window 1003

The one with window 1002 wasn't deleted but the first one was.
This patch matches on metrics when looking up and deleting
a route.
Reported-by: Thomas Haller
Signed-off-by: Xin Long
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
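The lookup this commit describes can be sketched in user-space C. This is a simplified illustration, not the kernel implementation: `struct route`, `N_METRICS` and `route_find_for_delete` are hypothetical names, and a flat array stands in for the trie.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N_METRICS 4              /* hypothetical size for this sketch */

struct route {                   /* hypothetical stand-in for one route entry */
    uint32_t prefix;
    uint8_t  plen;
    uint32_t metrics[N_METRICS]; /* e.g. metrics[0] could hold "window" */
};

/*
 * Return the index of the first route that matches prefix, prefix length
 * AND metrics, or -1 if there is none. Before the fix described above, the
 * equivalent of the memcmp() step was missing on the delete path.
 */
static int route_find_for_delete(const struct route *tbl, size_t n,
                                 const struct route *want)
{
    assert(want != NULL && (tbl != NULL || n == 0));

    for (size_t i = 0; i < n; i++) {
        if (tbl[i].prefix != want->prefix || tbl[i].plen != want->plen)
            continue;
        if (memcmp(tbl[i].metrics, want->metrics, sizeof(want->metrics)) != 0)
            continue;
        return (int)i;
    }
    return -1;
}
```

With four routes to 192.168.7.0/24 that differ only in window (1000 to 1003), a request for window 1002 now selects the third entry rather than the first.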
04 Aug, 2017
1 commit
-
The FIB notification chain is currently solely used by IPv4 code.
However, we're going to introduce IPv6 FIB offload support, which
requires these notifications as well.

As explained in commit c3852ef7f2f8 ("ipv4: fib: Replay events when
registering FIB notifier"), upon registration to the chain, the callee
receives a full dump of the FIB tables and rules by traversing all the
net namespaces. The integrity of the dump is ensured by a per-namespace
sequence counter that is incremented whenever a change to the tables or
rules occurs.

In order to allow more address families to use the chain, each family is
expected to register its fib_notifier_ops in its pernet init. These
operations allow the common code to read the family's sequence counter
as well as dump its tables and rules in the given net namespace.

Additionally, a 'family' parameter is added to sent notifications, so
that listeners could distinguish between the different families.

Implement the common code that allows listeners to register to the chain
and for address families to register their fib_notifier_ops. Subsequent
patches will implement these operations in IPv6.

In the future, ipmr and ip6mr will be extended to provide these
notifications as well.
Signed-off-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
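The per-family registration described in this commit can be sketched as a table of callbacks. This is a user-space illustration under assumptions, not the kernel's fib_notifier_ops: `struct family_ops`, `family_ops_register`, `seq_sum`, `MAX_FAMILIES` and the `v4_*`/`v6_*` demo objects are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FAMILIES 4              /* hypothetical limit for this sketch */

struct family_ops {                 /* hypothetical analogue of fib_notifier_ops */
    int family;                     /* address family number */
    unsigned int (*seq_read)(void); /* read the family's change sequence counter */
    int (*dump)(void *listener);    /* replay tables and rules to one listener */
};

static const struct family_ops *registered[MAX_FAMILIES];
static size_t n_registered;

/* Each address family registers its callbacks once, at init time. */
static int family_ops_register(const struct family_ops *ops)
{
    assert(ops != NULL && ops->seq_read != NULL && ops->dump != NULL);
    if (n_registered == MAX_FAMILIES)
        return -1;
    registered[n_registered++] = ops;
    return 0;
}

/* Common code: one value that moves whenever any family's counter is bumped. */
static unsigned int seq_sum(void)
{
    unsigned int sum = 0;

    for (size_t i = 0; i < n_registered; i++)
        sum += registered[i]->seq_read();
    return sum;
}

/* Two demo families with their own counters. */
static unsigned int v4_seq, v6_seq;
static unsigned int v4_seq_read(void) { return v4_seq; }
static unsigned int v6_seq_read(void) { return v6_seq; }
static int demo_dump(void *listener) { (void)listener; return 0; }

static const struct family_ops v4_ops = {
    .family = 2,  /* AF_INET on Linux */
    .seq_read = v4_seq_read, .dump = demo_dump,
};
static const struct family_ops v6_ops = {
    .family = 10, /* AF_INET6 on Linux */
    .seq_read = v6_seq_read, .dump = demo_dump,
};
```

The common code never needs to know how a family stores its routes; it only calls `seq_read` and `dump` through the registered table.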
04 Jul, 2017
1 commit
-
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This avoids accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: Elena Reshetova
Signed-off-by: Hans Liljestrand
Signed-off-by: Kees Cook
Signed-off-by: David Windsor
Signed-off-by: David S. Miller
30 May, 2017
3 commits
-
Pass extack arg down to lwtunnel_build_state and the build_state callbacks.
Add messages for failures in lwtunnel_build_state, and add the extarg to
nla_parse where possible in the build_state callbacks.
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
-
Add extack error message for invalid prefix length and invalid prefix.
Example of the latter is a route spec containing 172.16.100.1/24, where
the /24 mask means the lower 8-bits should be 0. Amazing how easy that
one is to overlook when an EINVAL is returned.
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
-
fib_table_insert and fib_table_delete have the same checks on the prefix
and length. Refactor into a helper. Avoids duplicate extack messages in
the next patch.
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
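The shared prefix check from these commits can be sketched as a small helper. This is a user-space sketch, not the kernel helper: `check_prefix` is a hypothetical name and the message strings are illustrative, not quoted from the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KEYLENGTH 32u   /* bits in an IPv4 address */

/*
 * Validate a route key against its prefix length.
 * Returns 0 if valid; returns -1 and sets *errmsg otherwise.
 * The key is taken in host byte order for simplicity.
 */
static int check_prefix(uint32_t key, unsigned int plen, const char **errmsg)
{
    uint32_t host_mask;

    assert(errmsg != NULL);

    if (plen > KEYLENGTH) {
        *errmsg = "invalid prefix length";            /* illustrative text */
        return -1;
    }

    /* Bits below the prefix must be zero. Avoid shifting by 32 for a /32. */
    host_mask = (plen == KEYLENGTH) ? 0 : (UINT32_MAX >> plen);
    if (key & host_mask) {
        *errmsg = "invalid prefix for given length";  /* illustrative text */
        return -1;
    }
    return 0;
}
```

For the example in the commit message, 172.16.100.1/24 fails because the low 8 bits are 0x01, while 172.16.100.0/24 and 172.16.100.1/32 both pass.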
27 May, 2017
1 commit
-
Prefix is needed for returning matching route spec on get route request.
Signed-off-by: David Ahern
Signed-off-by: Roopa Prabhu
Signed-off-by: David S. Miller
23 May, 2017
1 commit
-
Plumb extack argument down to route add functions.
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
17 May, 2017
1 commit
-
In general, rtnetlink dumps do not anticipate failure to dump a single
object (e.g., link or route) on a single pass. As both route and link
objects have grown via more attributes, that is no longer a given.

netlink dumps can handle a failure if the dump function returns an
error; specifically, netlink_dump adds the return code to the response
if it is len != 0). IPv6 route dumps
(rt6_dump_route) already return the error; this patch updates IPv4 and
link dumps. Other dump functions may need to be adjusted as well.
Reported-by: Jan Moskyto Matejka
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
11 Mar, 2017
2 commits
-
We always pass the same event type to fib_notify() and
fib_rules_notify(), so we can safely drop this argument.
Signed-off-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Acked-by: David Ahern
Signed-off-by: David S. Miller
-
Most of the code concerned with the FIB notification chain currently
resides in fib_trie.c, but this isn't really appropriate, as the FIB
notification chain is also used for FIB rules.

Therefore, it makes sense to move the common FIB notification code to a
separate file and have it export the relevant functions, which can be
invoked by its different users (e.g., fib_trie.c, fib_rules.c).
Signed-off-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Acked-by: David Ahern
Signed-off-by: David S. Miller
28 Feb, 2017
1 commit
-
Now that %z is standardised in C99 there is no reason to support %Z.
Unlike %L it doesn't even make format strings smaller.

Use BUILD_BUG_ON in a couple ATM drivers.

In case anyone didn't notice lib/vsprintf.o is about half of SLUB which
in my opinion is quite an achievement. Hopefully this patch inspires
someone else to trim vsprintf.c more.
Link: http://lkml.kernel.org/r/20170103230126.GA30170@avx2
Signed-off-by: Alexey Dobriyan
Cc: Andy Shevchenko
Cc: Rasmus Villemoes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
11 Feb, 2017
4 commits
-
The FIB notification chain currently uses the NLM_F_{REPLACE,APPEND}
flags to signal routes being replaced or appended.

Instead of using netlink flags for in-kernel notifications we can simply
introduce two new events in the FIB notification chain. This has the
added advantage of making the API cleaner, thereby making it clear that
these events should be supported by listeners of the notification chain.
Signed-off-by: Ido Schimmel
Signed-off-by: Jiri Pirko
CC: Patrick McHardy
Signed-off-by: David S. Miller
-
When a FIB alias is replaced following NLM_F_REPLACE, the ENTRY_ADD
notification is sent after the reference on the previous FIB info was
dropped. This is problematic as potential listeners might need to access
it in their notification blocks.

Solve this by sending the notification prior to the deletion of the
replaced FIB alias. This is consistent with ENTRY_DEL notifications.
Signed-off-by: Ido Schimmel
Signed-off-by: Jiri Pirko
CC: Patrick McHardy
Signed-off-by: David S. Miller
-
When a FIB alias is removed, a notification is sent using the type
passed from user space - can be RTN_UNSPEC - instead of the actual type
of the removed alias. This is problematic for listeners of the FIB
notification chain, as several FIB aliases can exist that match in
every parameter except the type.

Solve this by passing the actual type of the removed FIB alias.
Signed-off-by: Ido Schimmel
Signed-off-by: Jiri Pirko
CC: Patrick McHardy
Signed-off-by: David S. Miller
-
In case the MAIN table is flushed and its trie is shared with the LOCAL
table, then we might be flushing FIB aliases belonging to the latter.
This can lead to FIB_ENTRY_DEL notifications sent with the wrong table
ID.

The above doesn't affect current listeners, as the table ID is ignored
during entry deletion, but this will change later in the patchset.

When flushing a particular table, skip any aliases belonging to a
different one.
Signed-off-by: Ido Schimmel
Signed-off-by: Jiri Pirko
CC: Alexander Duyck
CC: Patrick McHardy
Reviewed-by: Alexander Duyck
Signed-off-by: David S. Miller
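The table-ID check added by the flush commit above can be sketched as follows. This is a simplified user-space model, not the kernel's flush routine: `struct alias`, its `dead` flag and `flush_table` are hypothetical, and a flat array stands in for a trie shared by two tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same numeric values as RT_TABLE_MAIN and RT_TABLE_LOCAL. */
enum { TABLE_MAIN = 254, TABLE_LOCAL = 255 };

struct alias {          /* hypothetical stand-in for a FIB alias */
    uint32_t tb_id;     /* table the alias belongs to */
    bool     dead;      /* true if the alias should be flushed */
};

/*
 * Flush dead aliases of one table from a store that may also hold
 * aliases of another table. Survivors are compacted in place.
 * Returns the number of aliases flushed and updates *n.
 */
static size_t flush_table(struct alias *aliases, size_t *n, uint32_t tb_id)
{
    size_t kept = 0, flushed = 0;

    assert(aliases != NULL && n != NULL);

    for (size_t i = 0; i < *n; i++) {
        /* Only touch aliases that belong to the table being flushed. */
        if (aliases[i].tb_id == tb_id && aliases[i].dead) {
            flushed++;
            continue;
        }
        aliases[kept++] = aliases[i];
    }
    *n = kept;
    return flushed;
}
```

Without the `tb_id` comparison, flushing MAIN would also drop the dead LOCAL alias and report it under the wrong table.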
25 Dec, 2016
1 commit
-
This was entirely automated, using the script by Al:
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*'
sed -i -e "s!$PATT!#include !" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.
Requested-by: Al Viro
Signed-off-by: Linus Torvalds
07 Dec, 2016
1 commit
06 Dec, 2016
2 commits
-
It has been reported that update_suffix can be expensive when it is called
on a large node in which most of the suffix lengths are the same. The time
required to add 200K entries had increased from around 3 seconds to almost
49 seconds.

In order to address this we need to move the code for updating the suffix
out of resize and instead just have it handled in the cases where we are
pushing a node that increases the suffix length, or will decrease the
suffix length.
Fixes: 5405afd1a306 ("fib_trie: Add tracking value for suffix length")
Reported-by: Robert Shearman
Signed-off-by: Alexander Duyck
Reviewed-by: Robert Shearman
Tested-by: Robert Shearman
Signed-off-by: David S. Miller
-
It wasn't necessary to pass a leaf in when doing the suffix updates so just
drop it. Instead just pass the suffix and work with that.

Since we dropped the leaf there is no need to include that in the name so
the names are updated to node_push_suffix and node_pull_suffix.

Finally I noticed that the logic for pulling the suffix length back
actually had some issues. Specifically it would stop prematurely if there
was a longer suffix, but it was not as long as the original suffix. I
updated the code to address that in node_pull_suffix.
Fixes: 5405afd1a306 ("fib_trie: Add tracking value for suffix length")
Suggested-by: Robert Shearman
Signed-off-by: Alexander Duyck
Reviewed-by: Robert Shearman
Tested-by: Robert Shearman
Signed-off-by: David S. Miller
04 Dec, 2016
3 commits
-
Commit b90eb7549499 ("fib: introduce FIB notification infrastructure")
introduced a new notification chain to notify listeners (f.e., switchdev
drivers) about addition and deletion of routes.

However, upon registration to the chain the FIB tables can already be
populated, which means potential listeners will have an incomplete view
of the tables.

Solve that by dumping the FIB tables and replaying the events to the
passed notification block. The dump itself is done using RCU in order
not to starve consumers that need RTNL to make progress.

The integrity of the dump is ensured by reading the FIB change sequence
counter before and after the dump under RTNL. This allows us to avoid
the problematic situation in which the dumping process sends an ENTRY_ADD
notification following ENTRY_DEL generated by another process holding
RTNL.

Callers of the registration function may pass a callback that is
executed in case the dump was inconsistent with current FIB tables.

The number of retries until a consistent dump is achieved is set to a
fixed number to prevent callers from looping for long periods of time.
In case current limit proves to be problematic in the future, it can be
easily converted to be configurable using a sysctl.
Signed-off-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
The next patch will enable listeners of the FIB notification chain to
request a dump of the FIB tables. However, since RTNL isn't taken during
the dump, it's possible for the FIB tables to change mid-dump, which
will result in inconsistency between the listener's table and the
kernel's.

Allow listeners to know about changes that occurred mid-dump, by adding
a change sequence counter to each net namespace. The counter is
incremented just before a notification is sent in the FIB chain.
Signed-off-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
In order not to hold RTNL for long periods of time we're going to dump
the FIB tables using RCU.

Convert the FIB notification chain to be atomic, as we can't block in
RCU critical sections.
Signed-off-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
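The registration-time dump described in this group of commits (read the sequence counter, dump, re-read, retry a bounded number of times) can be sketched in user-space C. This is an illustration under assumptions, not the kernel code: `register_with_dump`, the callback typedefs, `DUMP_MAX_RETRIES` and its value of 5, and the `demo_*` helpers are all hypothetical, and no locking or RCU is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DUMP_MAX_RETRIES 5   /* hypothetical; the commit only says "a fixed number" */

typedef unsigned int (*seq_read_fn)(void);
typedef void (*dump_fn)(void *listener);
typedef void (*inconsistent_fn)(void *listener);

/*
 * Dump the tables to a new listener and accept the dump only if the
 * change sequence counter did not move while dumping. On a mismatch,
 * let the listener discard what it received, then try again.
 */
static bool register_with_dump(seq_read_fn seq_read, dump_fn dump,
                               inconsistent_fn on_inconsistent, void *listener)
{
    assert(seq_read != NULL && dump != NULL);

    for (int attempt = 0; attempt < DUMP_MAX_RETRIES; attempt++) {
        unsigned int before = seq_read();

        dump(listener);
        if (seq_read() == before)
            return true;               /* nothing changed mid-dump */
        if (on_inconsistent)
            on_inconsistent(listener); /* e.g. flush the partial state */
    }
    return false;                      /* gave up after a bounded number of tries */
}

/* Demo: the first demo_races_left dumps are disturbed by a concurrent change. */
static unsigned int demo_seq;
static int demo_races_left, demo_flushes;

static unsigned int demo_seq_read(void) { return demo_seq; }

static void demo_dump(void *listener)
{
    (void)listener;
    if (demo_races_left > 0) {
        demo_races_left--;
        demo_seq++;                    /* simulate a route change during the dump */
    }
}

static void demo_flush(void *listener) { (void)listener; demo_flushes++; }
```

Bounding the retries trades completeness for progress: a caller on a very busy system gets a failure it can handle instead of looping indefinitely.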
17 Nov, 2016
2 commits
-
Fix a small memory leak that can occur where we leak a fib_alias in the
event of us not being able to insert it into the local table.
Fixes: 0ddcf43d5d4a0 ("ipv4: FIB Local/MAIN table collapse")
Reported-by: Eric Dumazet
Signed-off-by: Alexander Duyck
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
-
The patch that removed the FIB offload infrastructure was a bit too
aggressive and also removed code needed to clean up us splitting the table
if additional rules were added. Specifically the function
fib_trie_flush_external was called at the end of a new rule being added to
flush the foreign trie entries from the main trie.

I updated the code so that we only call fib_trie_flush_external on the main
table so that we flush the entries for local from main. This way we don't
call it for every rule change which is what was happening previously.
Fixes: 347e3b28c1ba2 ("switchdev: remove FIB offload infrastructure")
Reported-by: Eric Dumazet
Cc: Jiri Pirko
Signed-off-by: Alexander Duyck
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
08 Nov, 2016
1 commit
-
The display of /proc/net/route has had a couple issues due to the fact that
when I originally rewrote most of fib_trie I made it so that the iterator
was tracking the next value to use instead of the current.

In addition it had an off by 1 error where I was tracking the first piece
of data as position 0, even though in reality that belonged to the
SEQ_START_TOKEN.

This patch updates the code so the iterator tracks the last reported
position and key instead of the next expected position and key. In
addition it shifts things so that all of the leaves start at 1 instead of
trying to report leaves starting with offset 0 as being valid. With these
two issues addressed this should resolve any off by one errors that were
present in the display of /proc/net/route.
Fixes: 25b97c016b26 ("ipv4: off-by-one in continuation handling in /proc/net/route")
Cc: Andy Whitcroft
Reported-by: Jason Baron
Tested-by: Jason Baron
Signed-off-by: Alexander Duyck
Signed-off-by: David S. Miller
28 Sep, 2016
2 commits
-
Since this is now taken care of by FIB notifier, remove the code, with
all unused dependencies.
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
This allows passing information about added/deleted FIB entries/rules to
whoever is interested. This is done in a very similar way as devinet
notifies address additions/removals.
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
10 Sep, 2016
1 commit
-
fib_table_insert() inconsistently fills the nlmsg_flags field in its
notification messages.

Since commit b8f558313506 ("[RTNETLINK]: Fix sending netlink message
when replace route."), the netlink message has its nlmsg_flags set to
NLM_F_REPLACE if the route replaced a preexisting one.

Then commit a2bb6d7d6f42 ("ipv4: include NLM_F_APPEND flag in append
route notifications") started setting nlmsg_flags to NLM_F_APPEND if
the route matched a preexisting one but was appended.

In other cases (exclusive creation or prepend), nlmsg_flags is 0.

This patch sets ->nlmsg_flags in all situations, preserving the
semantic of the NLM_F_* bits:

* NLM_F_CREATE: a new fib entry has been created for this route.
* NLM_F_EXCL: no other fib entry existed for this route.
* NLM_F_REPLACE: this route has overwritten a preexisting fib entry.
* NLM_F_APPEND: the new fib entry was added after other entries for
the same route.

As a result, the possible flag combinations can now be reported
(iproute2's terminology in parentheses):

* NLM_F_CREATE | NLM_F_EXCL: route didn't exist, exclusive creation
("add").
* NLM_F_CREATE | NLM_F_APPEND: route did already exist, new route
added after preexisting ones ("append").
* NLM_F_CREATE: route did already exist, new route added before
preexisting ones ("prepend").
* NLM_F_REPLACE: route did already exist, new route replaced the
first preexisting one ("change").
Signed-off-by: Guillaume Nault
Signed-off-by: David S. Miller
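The four flag combinations listed in this commit can be written as a small mapping. This is a sketch, not the code in fib_table_insert(): `enum insert_kind` and `notify_flags` are hypothetical names, and the `F_*` constants are local copies of the NLM_F_* values from <linux/netlink.h> so the example builds without kernel headers.

```c
#include <assert.h>

/* Local copies of the NLM_F_* modifier bits from <linux/netlink.h>. */
enum {
    F_REPLACE = 0x100,
    F_EXCL    = 0x200,
    F_CREATE  = 0x400,
    F_APPEND  = 0x800,
};

/* Hypothetical classification of what the insert actually did. */
enum insert_kind {
    INSERT_ADD,      /* route did not exist ("add") */
    INSERT_APPEND,   /* added after preexisting routes ("append") */
    INSERT_PREPEND,  /* added before preexisting routes ("prepend") */
    INSERT_CHANGE,   /* replaced the first preexisting route ("change") */
};

/* Flags to report in the notification for each outcome. */
static unsigned int notify_flags(enum insert_kind kind)
{
    switch (kind) {
    case INSERT_ADD:
        return F_CREATE | F_EXCL;
    case INSERT_APPEND:
        return F_CREATE | F_APPEND;
    case INSERT_PREPEND:
        return F_CREATE;
    case INSERT_CHANGE:
        return F_REPLACE;
    }
    assert(!"unknown insert kind");
    return 0;
}
```

Because every outcome maps to a distinct non-zero value, a listener can tell the four cases apart from the flags alone.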
19 Aug, 2016
1 commit
-
1) Fix one typo: s/tn/tp/
2) Fix the description about the "u" bits.
Signed-off-by: Xunlei Pang
Acked-by: Alexander Duyck
Signed-off-by: David S. Miller
06 Aug, 2016
1 commit
-
Panic occurs when issuing "cat /proc/net/route" whilst
populating FIB with > 1M routes.

Use of cached node pointer in fib_route_get_idx is unsafe.
BUG: unable to handle kernel paging request at ffffc90001630024
IP: [] leaf_walk_rcu+0x10/0xe0
PGD 11b08d067 PUD 11b08e067 PMD dac4b067 PTE 0
Oops: 0000 [#1] SMP
Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscac
snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep virti
acpi_cpufreq button parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache jbd
tio_ring virtio floppy uhci_hcd ehci_hcd usbcore usb_common libata scsi_mod
CPU: 1 PID: 785 Comm: cat Not tainted 4.2.0-rc8+ #4
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
task: ffff8800da1c0bc0 ti: ffff88011a05c000 task.ti: ffff88011a05c000
RIP: 0010:[] [] leaf_walk_rcu+0x10/0xe0
RSP: 0018:ffff88011a05fda0 EFLAGS: 00010202
RAX: ffff8800d8a40c00 RBX: ffff8800da4af940 RCX: ffff88011a05ff20
RDX: ffffc90001630020 RSI: 0000000001013531 RDI: ffff8800da4af950
RBP: 0000000000000000 R08: ffff8800da1f9a00 R09: 0000000000000000
R10: ffff8800db45b7e4 R11: 0000000000000246 R12: ffff8800da4af950
R13: ffff8800d97a74c0 R14: 0000000000000000 R15: ffff8800d97a7480
FS: 00007fd3970e0700(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffc90001630024 CR3: 000000011a7e4000 CR4: 00000000000006e0
Stack:
ffffffff814d00d3 0000000000000000 ffff88011a05ff20 ffff8800da1f9a00
ffffffff811dd8b9 0000000000000800 0000000000020000 00007fd396f35000
ffffffff811f8714 0000000000003431 ffffffff8138dce0 0000000000000f80
Call Trace:
[] ? fib_route_seq_start+0x93/0xc0
[] ? seq_read+0x149/0x380
[] ? fsnotify+0x3b4/0x500
[] ? process_echoes+0x70/0x70
[] ? proc_reg_read+0x47/0x70
[] ? __vfs_read+0x23/0xd0
[] ? rw_verify_area+0x52/0xf0
[] ? vfs_read+0x81/0x120
[] ? SyS_read+0x42/0xa0
[] ? entry_SYSCALL_64_fastpath+0x16/0x75
Code: 48 85 c0 75 d8 f3 c3 31 c0 c3 f3 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00
a 04 89 f0 33 02 44 89 c9 48 d3 e8 0f b6 4a 05 49 89
RIP [] leaf_walk_rcu+0x10/0xe0
RSP
CR2: ffffc90001630024
Signed-off-by: Dave Forster
Acked-by: Alexander Duyck
Signed-off-by: David S. Miller
02 Feb, 2016
1 commit
-
Pull networking fixes from David Miller:
"This looks like a lot but it's a mixture of regression fixes as well
as fixes for longer standing issues.

1) Fix on-channel cancellation in mac80211, from Johannes Berg.

2) Handle CHECKSUM_COMPLETE properly in xt_TCPMSS netfilter xtables
module, from Eric Dumazet.

3) Avoid infinite loop in UDP SO_REUSEPORT logic, also from Eric
Dumazet.

4) Avoid a NULL deref if we try to set SO_REUSEPORT after a socket is
bound, from Craig Gallek.

5) GRO key comparisons don't take lightweight tunnels into account,
from Jesse Gross.

6) Fix struct pid leak via SCM credentials in AF_UNIX, from Eric
Dumazet.

7) We need to set the rtnl_link_ops of ipv6 SIT tunnels before we
register them, otherwise the NEWLINK netlink message is missing
the proper attributes. From Thadeu Lima de Souza Cascardo.

8) Several Spectrum chip bug fixes for mlxsw switch driver, from Ido
Schimmel

9) Handle fragments properly in ipv4 early socket demux, from Eric
Dumazet.

10) Don't ignore the ifindex key specifier on ipv6 output route
lookups, from Paolo Abeni"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (128 commits)
tcp: avoid cwnd undo after receiving ECN
irda: fix a potential use-after-free in ircomm_param_request
net: tg3: avoid uninitialized variable warning
net: nb8800: avoid uninitialized variable warning
net: vxge: avoid unused function warnings
net: bgmac: clarify CONFIG_BCMA dependency
net: hp100: remove unnecessary #ifdefs
net: davinci_cpdma: use dma_addr_t for DMA address
ipv6/udp: use sticky pktinfo egress ifindex on connect()
ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail()
netlink: not trim skb for mmaped socket when dump
vxlan: fix a out of bounds access in __vxlan_find_mac
net: dsa: mv88e6xxx: fix port VLAN maps
fib_trie: Fix shift by 32 in fib_table_lookup
net: moxart: use correct accessors for DMA memory
ipv4: ipconfig: avoid unused ic_proto_used symbol
bnxt_en: Fix crash in bnxt_free_tx_skbs() during tx timeout.
bnxt_en: Exclude rx_drop_pkts hw counter from the stack's rx_dropped counter.
bnxt_en: Ring free response from close path should use completion ring
net_sched: drr: check for NULL pointer in drr_dequeue
...
30 Jan, 2016
1 commit
-
The fib_table_lookup function had a shift by 32 that triggered a UBSAN
warning. This was due to the fact that I had placed the shift first and
then followed it with the check for the suffix length to ignore the
undefined behavior. If we reorder this so that we verify the suffix is
less than 32 before shifting the value we can avoid the issue.
Reported-by: Toralf Förster
Signed-off-by: Alexander Duyck
Signed-off-by: David S. Miller
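The reordering this commit describes, checking the suffix length before shifting, can be sketched as below. This is a generic user-space illustration of the pattern, not the code in fib_table_lookup: `index_fits_suffix` is a hypothetical helper, and the key is assumed to be 32 bits wide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KEYLENGTH 32u   /* width of the key in bits */

/*
 * Return true if index fits in slen suffix bits, i.e. index < 2^slen.
 * Shifting a 32-bit value by 32 is undefined behavior in C, so the
 * slen == KEYLENGTH case is decided before any shift happens.
 */
static bool index_fits_suffix(uint32_t index, unsigned int slen)
{
    assert(slen <= KEYLENGTH);

    if (slen >= KEYLENGTH)
        return true;                 /* every 32-bit index fits in 32 bits */
    return index < ((uint32_t)1 << slen);
}
```

Doing the shift first and discarding its result afterwards still executes the undefined shift, which is why the order of the two checks matters.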