15 Sep, 2018
1 commit
-
[ Upstream commit 98c8f125fd8a6240ea343c1aa50a1be9047791b8 ]
Via u32_change(), TCA_U32_SEL has an unspecified type in the netlink
policy, so max length isn't enforced, only minimum. This means nkeys
(from userspace) was being trusted without checking the actual size of
nla_len(), which could lead to a memory over-read, and ultimately an
exposure via a call to u32_dump(). Reachability is CAP_NET_ADMIN within
a namespace.Reported-by: Al Viro
Cc: Jamal Hadi Salim
Cc: Cong Wang
Cc: Jiri Pirko
Cc: "David S. Miller"
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
09 Mar, 2018
2 commits
-
[ Upstream commit d7cdee5ea8d28ae1b6922deb0c1badaa3aa0ef8c ]
Li Shuang reported an Oops with cls_u32 due to an use-after-free
in u32_destroy_key(). The use-after-free can be triggered with:dev=lo
tc qdisc add dev $dev root handle 1: htb default 10
tc filter add dev $dev parent 1: prio 5 handle 1: protocol ip u32 divisor 256
tc filter add dev $dev protocol ip parent 1: prio 5 u32 ht 800:: match ip dst\
10.0.0.0/8 hashkey mask 0x0000ff00 at 16 link 1:
tc qdisc del dev $dev rootWhich causes the following kasan splat:
==================================================================
BUG: KASAN: use-after-free in u32_destroy_key.constprop.21+0x117/0x140 [cls_u32]
Read of size 4 at addr ffff881b83dae618 by task kworker/u48:5/571CPU: 17 PID: 571 Comm: kworker/u48:5 Not tainted 4.15.0+ #87
Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
Workqueue: tc_filter_workqueue u32_delete_key_freepf_work [cls_u32]
Call Trace:
dump_stack+0xd6/0x182
? dma_virt_map_sg+0x22e/0x22e
print_address_description+0x73/0x290
kasan_report+0x277/0x360
? u32_destroy_key.constprop.21+0x117/0x140 [cls_u32]
u32_destroy_key.constprop.21+0x117/0x140 [cls_u32]
u32_delete_key_freepf_work+0x1c/0x30 [cls_u32]
process_one_work+0xae0/0x1c80
? sched_clock+0x5/0x10
? pwq_dec_nr_in_flight+0x3c0/0x3c0
? _raw_spin_unlock_irq+0x29/0x40
? trace_hardirqs_on_caller+0x381/0x570
? _raw_spin_unlock_irq+0x29/0x40
? finish_task_switch+0x1e5/0x760
? finish_task_switch+0x208/0x760
? preempt_notifier_dec+0x20/0x20
? __schedule+0x839/0x1ee0
? check_noncircular+0x20/0x20
? firmware_map_remove+0x73/0x73
? find_held_lock+0x39/0x1c0
? worker_thread+0x434/0x1820
? lock_contended+0xee0/0xee0
? lock_release+0x1100/0x1100
? init_rescuer.part.16+0x150/0x150
? retint_kernel+0x10/0x10
worker_thread+0x216/0x1820
? process_one_work+0x1c80/0x1c80
? lock_acquire+0x1a5/0x540
? lock_downgrade+0x6b0/0x6b0
? sched_clock+0x5/0x10
? lock_release+0x1100/0x1100
? compat_start_thread+0x80/0x80
? do_raw_spin_trylock+0x190/0x190
? _raw_spin_unlock_irq+0x29/0x40
? trace_hardirqs_on_caller+0x381/0x570
? _raw_spin_unlock_irq+0x29/0x40
? finish_task_switch+0x1e5/0x760
? finish_task_switch+0x208/0x760
? preempt_notifier_dec+0x20/0x20
? __schedule+0x839/0x1ee0
? kmem_cache_alloc_trace+0x143/0x320
? firmware_map_remove+0x73/0x73
? sched_clock+0x5/0x10
? sched_clock_cpu+0x18/0x170
? find_held_lock+0x39/0x1c0
? schedule+0xf3/0x3b0
? lock_downgrade+0x6b0/0x6b0
? __schedule+0x1ee0/0x1ee0
? do_wait_intr_irq+0x340/0x340
? do_raw_spin_trylock+0x190/0x190
? _raw_spin_unlock_irqrestore+0x32/0x60
? process_one_work+0x1c80/0x1c80
? process_one_work+0x1c80/0x1c80
kthread+0x312/0x3d0
? kthread_create_worker_on_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x50Allocated by task 1688:
kasan_kmalloc+0xa0/0xd0
__kmalloc+0x162/0x380
u32_change+0x1220/0x3c9e [cls_u32]
tc_ctl_tfilter+0x1ba6/0x2f80
rtnetlink_rcv_msg+0x4f0/0x9d0
netlink_rcv_skb+0x124/0x320
netlink_unicast+0x430/0x600
netlink_sendmsg+0x8fa/0xd60
sock_sendmsg+0xb1/0xe0
___sys_sendmsg+0x678/0x980
__sys_sendmsg+0xc4/0x210
do_syscall_64+0x232/0x7f0
return_from_SYSCALL_64+0x0/0x75Freed by task 112:
kasan_slab_free+0x71/0xc0
kfree+0x114/0x320
rcu_process_callbacks+0xc3f/0x1600
__do_softirq+0x2bf/0xc06The buggy address belongs to the object at ffff881b83dae600
which belongs to the cache kmalloc-4096 of size 4096
The buggy address is located 24 bytes inside of
4096-byte region [ffff881b83dae600, ffff881b83daf600)
The buggy address belongs to the page:
page:ffffea006e0f6a00 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0
flags: 0x17ffffc0008100(slab|head)
raw: 0017ffffc0008100 0000000000000000 0000000000000000 0000000100070007
raw: dead000000000100 dead000000000200 ffff880187c0e600 0000000000000000
page dumped because: kasan: bad access detectedMemory state around the buggy address:
ffff881b83dae500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff881b83dae580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff881b83dae600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff881b83dae680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff881b83dae700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================The problem is that the htnode is freed before the linked knodes and the
latter will try to access the first at u32_destroy_key() time.
This change addresses the issue using the htnode refcnt to guarantee
the correct free order. While at it also add a RCU annotation,
to keep sparse happy.v1 -> v2: use rtnl_derefence() instead of RCU read locks
v2 -> v3:
- don't check refcnt in u32_destroy_hnode()
- cleaned-up u32_destroy() implementation
- cleaned-up code comment
v3 -> v4:
- dropped unneeded commentReported-by: Li Shuang
Fixes: c0d378ef1266 ("net_sched: use tcf_queue_work() in u32 filter")
Signed-off-by: Paolo Abeni
Acked-by: Cong Wang
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit eb53f7af6f15285e2f6ada97285395343ce9f433 ]
The following sequence is currently broken:
# tc qdisc add dev foo ingress
# tc filter replace dev foo protocol all ingress \
u32 match u8 0 0 action mirred egress mirror dev bar1
# tc filter replace dev foo protocol all ingress \
handle 800::800 pref 49152 \
u32 match u8 0 0 action mirred egress mirror dev bar2
Error: cls_u32: Key node flags do not match passed flags.
We have an error talking to the kernel, -1The error comes from u32_change() when comparing new and
existing flags. The existing ones always contains one of
TCA_CLS_FLAGS_{,NOT}_IN_HW flag depending on offloading state.
These flags cannot be passed from userspace so the condition
(n->flags != flags) in u32_change() always fails.Fix the condition so the flags TCA_CLS_FLAGS_NOT_IN_HW and
TCA_CLS_FLAGS_IN_HW are not taken into account.Fixes: 24d3dc6d27ea ("net/sched: cls_u32: Reflect HW offload status")
Signed-off-by: Ivan Vecera
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
09 Nov, 2017
1 commit
-
Hold netns refcnt before call_rcu() and release it after
the tcf_exts_destroy() is done.Note, on ->destroy() path we have to respect the return value
of tcf_exts_get_net(), on other paths it should always return
true, so we don't need to care.Cc: Lucas Bates
Cc: Jamal Hadi Salim
Cc: Jiri Pirko
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller
29 Oct, 2017
1 commit
-
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.Reported-by: Chris Mi
Cc: Daniel Borkmann
Cc: Jiri Pirko
Cc: John Fastabend
Cc: Jamal Hadi Salim
Cc: "Paul E. McKenney"
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller
01 Sep, 2017
1 commit
-
TC filters when used as classifiers are bound to TC classes.
However, there is a hidden difference when adding them in different
orders:1. If we add tc classes before its filters, everything is fine.
Logically, the classes exist before we specify their ID's in
filters, it is easy to bind them together, just as in the current
code base.2. If we add tc filters before the tc classes they bind, we have to
do dynamic lookup in fast path. What's worse, this happens all
the time not just once, because on fast path tcf_result is passed
on stack, there is no way to propagate back to the one in tc filters.This hidden difference hurts performance silently if we have many tc
classes in hierarchy.This patch intends to close this gap by doing the reverse binding when
we create a new class, in this case we can actually search all the
filters in its parent, match and fixup by classid. And because
tcf_result is specific to each type of tc filter, we have to introduce
a new ops for each filter to tell how to bind the class.Note, we still can NOT totally get rid of those class lookup in
->enqueue() because cgroup and flow filters have no way to determine
the classid at setup time, they still have to go through dynamic lookup.Cc: Jamal Hadi Salim
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller
26 Aug, 2017
1 commit
-
It is ugly to hide a u32-filter-specific pointer inside Qdisc,
this breaks the TC layers:1. Qdisc is a generic representation, should not have any specific
data of any type2. Qdisc layer is above filter layer, should only save filters in
the list of struct tcf_proto.This pointer is used as the head of the chain of u32 hash tables,
that is struct tc_u_hnode, because u32 filter is very special,
it allows to create multiple hash tables within one qdisc and
across multiple u32 filters.Instead of using this ugly pointer, we can just save it in a global
hash table key'ed by (dev ifindex, qdisc handle), therefore we can
still treat it as a per qdisc basis data structure conceptually.Of course, because of network namespaces, this key is not unique
at all, but it is fine as we already have a pointer to Qdisc in
struct tc_u_common, we can just compare the pointers when collision.And this only affects slow paths, has no impact to fast path,
thanks to the pointer ->tp_c.Cc: Jamal Hadi Salim
Cc: Jiri Pirko
Signed-off-by: Cong Wang
Acked-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
12 Aug, 2017
1 commit
-
cops->tcf_cl_offload is no longer needed, as the drivers check what they
can and cannot offload using the classid identify helpers. So remove this.Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
08 Aug, 2017
4 commits
-
Now we use 'unsigned long fh' as a pointer in every place,
it is safe to convert it to a void pointer now. This gets
rid of many casts to pointer.Cc: Jamal Hadi Salim
Cc: Jiri Pirko
Signed-off-by: Cong Wang
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller -
Get rid of struct tc_to_netdev which is now just unnecessary container
and rather pass per-type structures down to drivers directly.
Along with that, consolidate the naming of per-type structure variables
in cls_*.Signed-off-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller -
As ndo_setup_tc is generic offload op for whole tc subsystem, does not
really make sense to have cls-specific args. So move them under
cls_common structurure which is embedded in all cls structs.Signed-off-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller -
Since the type is always present, push it to be a separate argument to
ndo_setup_tc. On the way, name the type enum and use it for arg type.Signed-off-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
05 Aug, 2017
1 commit
-
As the n struct was allocated right before u32_set_parms call,
no need to use tcf_exts_change to do atomic change, and we can just
fill-up the unused exts struct directly by tcf_exts_validate.Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
08 Jun, 2017
1 commit
-
We need to push the chain index down to the drivers, so they have the
information to which chain the rule belongs. For now, no driver supports
multichain offload, so only chain 0 is supported. This is needed to
prevent chain squashes during offload for now. Later this will be used
to implement multichain offload.Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
22 Apr, 2017
1 commit
-
We could have a race condition where in ->classify() path we
dereference tp->root and meanwhile a parallel ->destroy() makes it
a NULL. Daniel cured this bug in commit d936377414fa
("net, sched: respect rcu grace period on cls destruction").This happens when ->destroy() is called for deleting a filter to
check if we are the last one in tp, this tp is still linked and
visible at that time. The root cause of this problem is the semantic
of ->destroy(), it does two things (for non-force case):1) check if tp is empty
2) if tp is empty we could really destroy itand its caller, if cares, needs to check its return value to see if it
is really destroyed. Therefore we can't unlink tp unless we know it is
empty.As suggested by Daniel, we could actually move the test logic to ->delete()
so that we can safely unlink tp after ->delete() tells us the last one is
just deleted and before ->destroy().Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
Cc: Roi Dayan
Cc: Daniel Borkmann
Cc: John Fastabend
Cc: Jamal Hadi Salim
Signed-off-by: Cong Wang
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller
14 Apr, 2017
1 commit
-
Pass the new extended ACK reporting struct to all of the generic
netlink parsing functions. For now, pass NULL in almost all callers
(except for some in the core.)Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller
18 Feb, 2017
1 commit
-
U32 support for the "in hw" offloading flags.
Signed-off-by: Or Gerlitz
Reviewed-by: Amir Vadai
Signed-off-by: David S. Miller
10 Jan, 2017
1 commit
-
This struct member is already initialized to zero upon root_ht's
allocation via kzalloc().Signed-off-by: Alexandru Moise
Signed-off-by: David S. Miller
20 Sep, 2016
1 commit
-
Signed-off-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
23 Aug, 2016
1 commit
-
After commit 22dc13c837c3 ("net_sched: convert tcf_exts from list to pointer array")
we do dynamic allocation in tcf_exts_init(), therefore we need
to handle the ENOMEM case properly.Cc: Jamal Hadi Salim
Signed-off-by: Cong Wang
Acked-by: Jamal Hadi Salim
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
09 Jun, 2016
2 commits
-
Return an error if user requested skip-sw and the underlaying
hardware cannot handle tc offloads (or offloads are disabled).
This patch fixes the knode handling.Signed-off-by: Jakub Kicinski
Signed-off-by: David S. Miller -
Errors reported by u32_replace_hw_hnode() were not propagated.
Signed-off-by: Jakub Kicinski
Acked-by: Sridhar Samudrala
Signed-off-by: David S. Miller
08 Jun, 2016
3 commits
-
When offloading classifiers such as u32 or flower to hardware, and the
qdisc is clsact (TC_H_CLSACT), then we need to differentiate its classes,
since not all of them handle ingress, therefore we must leave those in
software path. Add a .tcf_cl_offload() callback, so we can generically
handle them, tested on ixgbe.Fixes: 10cbc6843446 ("net/sched: cls_flower: Hardware offloaded filters statistics support")
Fixes: 5b33f48842fa ("net/flower: Introduce hardware offload support")
Fixes: a1b7c5fd7fe9 ("net: sched: add cls_u32 offload hooks for netdevs")
Signed-off-by: Daniel Borkmann
Acked-by: John Fastabend
Signed-off-by: David S. Miller -
Return an error if user requested skip-sw and the underlaying
hardware cannot handle tc offloads (or offloads are disabled).Signed-off-by: Jakub Kicinski
Signed-off-by: David S. Miller -
'err' variable is not set in this test, we would return whatever
previous test set 'err' to.Signed-off-by: Jakub Kicinski
Acked-by: Sridhar Samudrala
Acked-by: John Fastabend
Signed-off-by: David S. Miller
17 May, 2016
1 commit
-
On devices that support TC U32 offloads, this flag enables a filter to be
added only to HW. skip-sw and skip-hw are mutually exclusive flags. By
default without any flags, the filter is added to both HW and SW, but no
error checks are done in case of failure to add to HW. With skip-sw,
failure to add to HW is treated as an error.Here is a sample script that adds 2 filters, one with skip-sw and the other
with skip-hw flag.# add ingress qdisc
tc qdisc add dev p4p1 ingress# enable hw tc offload.
ethtool -K p4p1 hw-tc-offload on# add u32 filter with skip-sw flag.
tc filter add dev p4p1 parent ffff: protocol ip prio 99 \
handle 800:0:1 u32 ht 800: flowid 800:1 \
skip-sw \
match ip src 192.168.1.0/24 \
action drop# add u32 filter with skip-hw flag.
tc filter add dev p4p1 parent ffff: protocol ip prio 99 \
handle 800:0:2 u32 ht 800: flowid 800:2 \
skip-hw \
match ip src 192.168.2.0/24 \
action dropSigned-off-by: Sridhar Samudrala
Acked-by: John Fastabend
Signed-off-by: David S. Miller
27 Apr, 2016
1 commit
-
Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller
02 Mar, 2016
2 commits
-
In the initial implementation the only way to stop a rule from being
inserted into the hardware table was via the device feature flag.
However this doesn't work well when working on an end host system
where packets are expect to hit both the hardware and software
datapaths.For example we can imagine a rule that will match an IP address and
increment a field. If we install this rule in both hardware and
software we may increment the field twice. To date we have only
added support for the drop action so we have been able to ignore
these cases. But as we extend the action support we will hit this
example plus more such cases. Arguably these are not even corner
cases in many working systems these cases will be common.To avoid forcing the driver to always abort (i.e. the above example)
this patch adds a flag to add a rule in software only. A careful
user can use this flag to build software and hardware datapaths
that work together. One example we have found particularly useful
is to use hardware resources to set the skb->mark on the skb when
the match may be expensive to run in software but a mark lookup
in a hash table is cheap. The idea here is hardware can do in one
lookup what the u32 classifier may need to traverse multiple lists
and hash tables to compute. The flag is only passed down on inserts.
On deletion to avoid stale references in hardware we always try
to remove a rule if it exists.The flags field is part of the classifier specific options. Although
it is tempting to lift this into the generic structure doing this
proves difficult do to how the tc netlink attributes are implemented
along with how the dump/change routines are called. There is also
precedence for putting seemingly generic pieces in the specific
classifier options such as TCA_U32_POLICE, TCA_U32_ACT, etc. So
although not ideal I've left FLAGS in the u32 options as well as it
simplifies the code greatly and user space has already learned how
to manage these bits ala 'tc' tool.Another thing if trying to update a rule we require the flags to
be unchanged. This is to force user space, software u32 and
the hardware u32 to keep in sync. Thanks to Simon Horman for
catching this case.Signed-off-by: John Fastabend
Acked-by: Jiri Pirko
Signed-off-by: David S. Miller -
The offload decision was originally very basic and tied to if the dev
implemented the appropriate ndo op hook. The next step is to allow
the user to more flexibly define if any paticular rule should be
offloaded or not. In order to have this logic in one function lift
the current check into a helper routine tc_should_offload().Signed-off-by: John Fastabend
Acked-by: Jiri Pirko
Signed-off-by: David S. Miller
17 Feb, 2016
1 commit
-
This patch allows netdev drivers to consume cls_u32 offloads via
the ndo_setup_tc ndo op.This works aligns with how network drivers have been doing qdisc
offloads for mqprio.Signed-off-by: John Fastabend
Acked-by: Jiri Pirko
Signed-off-by: David S. Miller
26 Aug, 2015
1 commit
-
In commit 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
I added a check in u32_destroy() to see if all real filters are gone
for each tp, however, that is only done for root_ht, same is needed
for others.This can be reproduced by the following tc commands:
tc filter add dev eth0 parent 1:0 prio 5 handle 15: protocol ip u32 divisor 256
tc filter add dev eth0 protocol ip parent 1: prio 5 handle 15:2:2 u32
ht 15:2: match ip src 10.0.0.2 flowid 1:10
tc filter add dev eth0 protocol ip parent 1: prio 5 handle 15:2:3 u32
ht 15:2: match ip src 10.0.0.3 flowid 1:10Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
Reported-by: Akshat Kakkar
Cc: Jamal Hadi Salim
Signed-off-by: Cong Wang
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller
21 Mar, 2015
1 commit
-
Conflicts:
drivers/net/ethernet/emulex/benet/be_main.c
net/core/sysctl_net_core.c
net/ipv4/inet_diag.cThe be_main.c conflict resolution was really tricky. The conflict
hunks generated by GIT were very unhelpful, to say the least. It
split functions in half and moved them around, when the real actual
conflict only existed solely inside of one function, that being
be_map_pci_bars().So instead, to resolve this, I checked out be_main.c from the top
of net-next, then I applied the be_main.c changes from 'net' since
the last time I merged. And this worked beautifully.The inet_diag.c and sysctl_net_core.c conflicts were simple
overlapping changes, and were easily to resolve.Signed-off-by: David S. Miller
10 Mar, 2015
2 commits
-
We dynamically allocate divisor+1 entries for ->ht[] in tc_u_hnode:
ht = kzalloc(sizeof(*ht) + divisor*sizeof(void *), GFP_KERNEL);
So ->ht is supposed to be the last field of this struct, however
this is broken, since an rcu head is appended after it.Fixes: 1ce87720d456 ("net: sched: make cls_u32 lockless")
Cc: Jamal Hadi Salim
Cc: John Fastabend
Signed-off-by: Cong Wang
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller -
Kernel automatically creates a tp for each
(kind, protocol, priority) tuple, which has handle 0,
when we add a new filter, but it still is left there
after we remove our own, unless we don't specify the
handle (literally means all the filters under
the tuple). For example this one is left:# tc filter show dev eth0
filter parent 8001: protocol arp pref 49152 basicThe user-space is hard to clean up these for kernel
because filters like u32 are organized in a complex way.
So kernel is responsible to remove it after all filters
are gone. Each type of filter has its own way to
store the filters, so each type has to provide its
way to check if all filters are gone.Cc: Jamal Hadi Salim
Signed-off-by: Cong Wang
Signed-off-by: Cong Wang
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
10 Dec, 2014
1 commit
-
It is never called and implementations are void. So just remove it.
Signed-off-by: Jiri Pirko
Signed-off-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
02 Oct, 2014
1 commit
-
This fixes the following crash:
[ 63.976822] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 63.980094] CPU: 1 PID: 15 Comm: ksoftirqd/1 Not tainted 3.17.0-rc6+ #648
[ 63.980094] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 63.980094] task: ffff880117dea690 ti: ffff880117dfc000 task.ti: ffff880117dfc000
[ 63.980094] RIP: 0010:[] [] u32_destroy_key+0x27/0x6d
[ 63.980094] RSP: 0018:ffff880117dffcc0 EFLAGS: 00010202
[ 63.980094] RAX: ffff880117dea690 RBX: ffff8800d02e0820 RCX: 0000000000000000
[ 63.980094] RDX: 0000000000000001 RSI: 0000000000000002 RDI: 6b6b6b6b6b6b6b6b
[ 63.980094] RBP: ffff880117dffcd0 R08: 0000000000000000 R09: 0000000000000000
[ 63.980094] R10: 00006c0900006ba8 R11: 00006ba100006b9d R12: 0000000000000001
[ 63.980094] R13: ffff8800d02e0898 R14: ffffffff817e6d4d R15: ffff880117387a30
[ 63.980094] FS: 0000000000000000(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000
[ 63.980094] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 63.980094] CR2: 00007f07e6732fed CR3: 000000011665b000 CR4: 00000000000006e0
[ 63.980094] Stack:
[ 63.980094] ffff88011a9cd300 ffffffff82051ac0 ffff880117dffce0 ffffffff817e6d68
[ 63.980094] ffff880117dffd70 ffffffff810cb4c7 ffffffff810cb3cd ffff880117dfffd8
[ 63.980094] ffff880117dea690 ffff880117dea690 ffff880117dfffd8 000000000000000a
[ 63.980094] Call Trace:
[ 63.980094] [] u32_delete_key_freepf_rcu+0x1b/0x1d
[ 63.980094] [] rcu_process_callbacks+0x3bb/0x691
[ 63.980094] [] ? rcu_process_callbacks+0x2c1/0x691
[ 63.980094] [] ? u32_destroy_key+0x6d/0x6d
[ 63.980094] [] __do_softirq+0x142/0x323
[ 63.980094] [] run_ksoftirqd+0x23/0x53
[ 63.980094] [] smpboot_thread_fn+0x203/0x221
[ 63.980094] [] ? smpboot_unpark_thread+0x33/0x33
[ 63.980094] [] kthread+0xc9/0xd1
[ 63.980094] [] ? do_wait_for_common+0xf8/0x125
[ 63.980094] [] ? __kthread_parkme+0x61/0x61
[ 63.980094] [] ret_from_fork+0x7c/0xb0
[ 63.980094] [] ? __kthread_parkme+0x61/0x61tp could be freed in call_rcu callback too, the order is not guaranteed.
John Fastabend says:
====================
Its worth noting why this is safe. Any running schedulers will either
read the valid class field or it will be zeroed.All schedulers today when the class is 0 do a lookup using the
same call used by the tcf_exts_bind(). So even if we have a running
classifier hit the null class pointer it will do a lookup and get
to the same result. This is particularly fragile at the moment because
the only way to verify this is to audit the schedulers call sites.
====================Cc: John Fastabend
Signed-off-by: Cong Wang
Acked-by: John Fastabend
Signed-off-by: David S. Miller
29 Sep, 2014
1 commit
-
Cc: Jamal Hadi Salim
Signed-off-by: Cong Wang
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
23 Sep, 2014
3 commits
-
$ grep CONFIG_CLS_U32_MARK .config
# CONFIG_CLS_U32_MARK is not setnet/sched/cls_u32.c: In function 'u32_change':
net/sched/cls_u32.c:852:1: warning: label 'errout' defined but not used
[-Wunused-label]Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
Changes to the cls_u32 classifier must appear atomic to the
readers. Before this patch if a change is requested for both
the exts and ifindex, first the ifindex is updated then the
exts with tcf_exts_change(). This opens a small window where
a reader can have a exts chain with an incorrect ifindex. This
violates the the RCU semantics.Here we resolve this by always passing u32_set_parms() a copy
of the tc_u_knode to work on and then inserting it into the hash
table after the updates have been successfully applied.Tested with the following short script:
#tc filter add dev p3p2 parent 8001:0 protocol ip prio 99 handle 1: \
u32 divisor 256#tc filter add dev p3p2 parent 8001:0 protocol ip prio 99 \
u32 link 1: hashkey mask ffffff00 at 12 \
match ip src 192.168.8.0/2#tc filter add dev p3p2 parent 8001:0 protocol ip prio 102 \
handle 1::10 u32 classid 1:2 ht 1: \
match ip src 192.168.8.0/8 match ip tos 0x0a 1e#tc filter change dev p3p2 parent 8001:0 protocol ip prio 102 \
handle 1::10 u32 classid 1:2 ht 1: \
match ip src 1.1.0.0/8 match ip tos 0x0b 1eCC: Eric Dumazet
CC: Jamal Hadi Salim
Signed-off-by: John Fastabend
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller -
This fixes a missed free_percpu in the unwind code path and when
keys are destroyed.Signed-off-by: John Fastabend
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller