21 Mar, 2015

2 commits

  • In order to prepare eBPF support for the tc action, we need to add
    sched_act_type, so that the eBPF verifier is aware of which helper
    functions act_bpf may use and that it can load skb data and read out
    the currently available skb fields.

    This is basically analogous to 96be4325f443 ("ebpf: add sched_cls_type
    and map it to sk_filter's verifier ops").

    BPF_PROG_TYPE_SCHED_CLS and BPF_PROG_TYPE_SCHED_ACT need to be
    separate since both will have a different set of functionality in
    future (classifier vs action), thus we won't run into ABI troubles
    when the point in time comes to diverge functionality from the
    classifier.

    The future plan for act_bpf would be that it will be able to write
    into skb->data and alter selected fields mirrored in struct __sk_buff.

    For an initial support, it's sufficient to map it to sk_filter_ops.
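
    A minimal sketch of what such a mapping can look like, modeled on the
    analogous sched_cls_type registration (the initcall wrapper and its
    placement are assumptions for illustration, not taken from this log):

    static struct bpf_prog_type_list sched_act_type __read_mostly = {
            .ops  = &sk_filter_ops,
            .type = BPF_PROG_TYPE_SCHED_ACT,
    };

    static int __init register_sched_act_type(void)
    {
            /* lets the verifier use sk_filter's helper set for act_bpf */
            bpf_register_prog_type(&sched_act_type);
            return 0;
    }
    late_initcall(register_sched_act_type);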

    Signed-off-by: Daniel Borkmann
    Cc: Jiri Pirko
    Reviewed-by: Jiri Pirko
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Conflicts:
    drivers/net/ethernet/emulex/benet/be_main.c
    net/core/sysctl_net_core.c
    net/ipv4/inet_diag.c

    The be_main.c conflict resolution was really tricky. The conflict
    hunks generated by GIT were very unhelpful, to say the least. It
    split functions in half and moved them around, when the actual
    conflict existed solely inside of one function, that being
    be_map_pci_bars().

    So instead, to resolve this, I checked out be_main.c from the top
    of net-next, then I applied the be_main.c changes from 'net' since
    the last time I merged. And this worked beautifully.

    The inet_diag.c and sysctl_net_core.c conflicts were simple
    overlapping changes, and were easy to resolve.

    Signed-off-by: David S. Miller

    David S. Miller
     

19 Mar, 2015

1 commit


18 Mar, 2015

1 commit

  • …t.kernel.org/pub/scm/linux/kernel/git/tip/tip

    Pull perf and timer fixes from Ingo Molnar:
    "Two small perf fixes:
    - kernel side context leak fix
    - tooling crash fix

    And two clocksource driver fixes"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf: Fix context leak in put_event()
    perf annotate: Fix fallback to unparsed disassembler line

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    clockevents: sun5i: Fix setup_irq init sequence
    clocksource: efm32: Fix a NULL pointer dereference

    Linus Torvalds
     

17 Mar, 2015

1 commit

  • There is a notifier that handles live patches for coming and going modules.
    It takes the klp_mutex lock to avoid races with coming and going patches, but
    it does not hold the lock the whole time. Therefore the following races are
    possible:

    1. The notifier is called sometime in MODULE_STATE_COMING. The module
    is visible via find_module() during this whole state. It means that
    a new patch can be registered and enabled even before the notifier is
    called. It might create a wrong order of stacked patches; see below
    for an example.

    2. A new patch could still see the module in the GOING state even after
    the notifier has been called. It will try to initialize the related
    object structures, but the module could disappear at any time, leaving
    the structures in a messy state. It might even cause an invalid
    memory access.

    This patch solves the problem by adding a boolean variable to struct module.
    The value is true after the coming handler and before the going handler is called.
    New patches need to be applied when the value is true, and they need to ignore
    the module when the value is false.
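
    A sketch of the idea (the field and helper names below are illustrative
    assumptions, not quoted from this log):

    struct module {
            /* ... existing fields ... */
            bool klp_alive;   /* true between the COMING and GOING notifications */
    };

    static bool klp_is_module_patchable(struct module *mod)
    {
            /* new patches only attach to modules whose code is still usable */
            return mod && mod->klp_alive;
    }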

    Note that we need to know the state of all modules on the system. The races
    are related to new patches, and we do not know in advance which modules will
    get patched.

    Also note that we could not simply ignore going modules. The code from the
    module can be called even in the GOING state until mod->exit() finishes.
    If we start supporting patches with semantic changes between function
    calls, we need to apply new patches to any still-usable code.
    See below for an example.

    Finally, note that the patch solves only the situation when a new patch is
    registered. There are no such problems when a patch is being removed.
    It does not matter who disables the patch first, whether the normal
    disable_patch() or the module notifier. There is nothing to do
    once the patch is disabled.

    Alternative solutions:
    ======================

    + reject new patches when a patched module is coming or going; this is ugly

    + wait with adding a new patch until the module leaves the COMING and GOING
    states; this might be dangerous and complicated; we would need to release
    kgr_lock in the middle of the patch registration to avoid a deadlock
    with the coming and going handlers; also we might need a waitqueue for
    each module, which seems to be an even bigger overhead than the boolean

    + stop modules from entering COMING and GOING states; wait until modules
    leave these states when they are already there; looks complicated; we would
    need to ignore the module that asked to stop the others to avoid a deadlock;
    also it is unclear what to do when two modules asked to stop others and
    both are in COMING state (situation when two new patches are applied)

    + always register/enable new patches and fix up the potential mess (registered
    patches order) in klp_module_init(); this is nasty and prone to regressions
    in the future development

    + add another MODULE_STATE where the kallsyms are visible but the module is not
    used yet; this looks too complex; the module states are checked on "many"
    locations

    Example of patch stacking breakage:
    ===================================

    The notifier could _not_ _simply_ ignore already initialized module objects.
    For example, let's have three patches (P1, P2, P3) for functions a() and b(),
    where a() is from vmcore and b() is from a module M. Something like:

            a()     b()
    P1      a1()    b1()
    P2      a2()    b2()
    P3      a3()    b3()

    If you load the module M after all patches are registered and enabled,
    the ftrace ops for functions a() and b() have the functions listed in this
    order:

    ops_a->func_stack -> list(a3,a2,a1)
    ops_b->func_stack -> list(b3,b2,b1)

    , so the pointer to b3() is the first and will be used.

    Then you might have the following scenario. Let's start from a state where
    patches P1 and P2 are registered and enabled but the module M is not loaded;
    the ftrace ops for b() therefore does not exist yet. We then get into the
    following race:

    CPU0                                    CPU1

    load_module(M)

      complete_formation()

      mod->state = MODULE_STATE_COMING;
      mutex_unlock(&module_mutex);

                                            klp_register_patch(P3);
                                            klp_enable_patch(P3);

                                            # STATE 1

      klp_module_notify(M)
        klp_module_notify_coming(P1);
        klp_module_notify_coming(P2);
        klp_module_notify_coming(P3);

      # STATE 2

    The ftrace ops for a() and b() then look like this:

    STATE1:

    ops_a->func_stack -> list(a3,a2,a1);
    ops_b->func_stack -> list(b3);

    STATE2:
    ops_a->func_stack -> list(a3,a2,a1);
    ops_b->func_stack -> list(b2,b1,b3);

    therefore, b2() is used for the module but a3() is used for vmcore
    because they were the last added.

    Example of the race with going modules:
    =======================================

    CPU0                                    CPU1

    delete_module()  #SYSCALL

      try_stop_module()
        mod->state = MODULE_STATE_GOING;

      mutex_unlock(&module_mutex);

                                            klp_register_patch()
                                            klp_enable_patch()

                                            #save place to switch universe

                                            b()   # from module that is going
                                            a()   # from core (patched)

      mod->exit();

    Note that the function b() can be called until we call mod->exit().

    If we do not apply a patch against b() because it is in MODULE_STATE_GOING,
    it will call the patched a() with modified semantics and things might go wrong.

    [jpoimboe@redhat.com: use one boolean instead of two]
    Signed-off-by: Petr Mladek
    Acked-by: Josh Poimboeuf
    Acked-by: Rusty Russell
    Signed-off-by: Jiri Kosina

    Petr Mladek
     

16 Mar, 2015

3 commits

  • introduce a user-accessible mirror of the in-kernel 'struct sk_buff':
    struct __sk_buff {
            __u32 len;
            __u32 pkt_type;
            __u32 mark;
            __u32 queue_mapping;
    };

    bpf programs can do:

    int bpf_prog(struct __sk_buff *skb)
    {
    __u32 var = skb->pkt_type;

    which will be compiled to bpf assembler as:

    dst_reg = *(u32 *)(src_reg + 4) // 4 == offsetof(struct __sk_buff, pkt_type)

    bpf verifier will check validity of access and will convert it to:

    dst_reg = *(u8 *)(src_reg + offsetof(struct sk_buff, __pkt_type_offset))
    dst_reg &= 7

    since skb->pkt_type is a bitfield.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • This patch adds the possibility to obtain raw_smp_processor_id() in
    eBPF. Currently, this is only possible in classic BPF where commit
    da2033c28226 ("filter: add SKF_AD_RXHASH and SKF_AD_CPU") has added
    facilities for this.

    Perhaps most importantly, this would also allow us to track per CPU
    statistics with eBPF maps, or to implement a poor-man's per CPU data
    structure through eBPF maps.

    An example function prototype looks like:

    u32 (*smp_processor_id)(void) = (void *)BPF_FUNC_get_smp_processor_id;
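
    A small illustrative use, assuming the helper wrapper declarations from the
    kernel's bpf samples (the map name here is made up for the sketch):

    u32 cpu = smp_processor_id();              /* BPF_FUNC_get_smp_processor_id */
    long *cnt = bpf_map_lookup_elem(&cpu_counters, &cpu);

    if (cnt)
            __sync_fetch_and_add(cnt, 1);      /* per-CPU packet counter */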

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • This work is similar to commit 4cd3675ebf74 ("filter: added BPF
    random opcode") and adds a possibility for packet sampling in eBPF.

    Currently, this is only possible in classic BPF and useful to
    combine sampling with f.e. packet sockets, possible also with tc.

    An example function prototype looks like:

    u32 (*prandom_u32)(void) = (void *)BPF_FUNC_get_prandom_u32;
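
    A toy sampling snippet along those lines (purely illustrative):

    /* let roughly 1 out of every 64 packets through to the slow path */
    if (prandom_u32() % 64 != 0)               /* BPF_FUNC_get_prandom_u32 */
            return 0;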

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

13 Mar, 2015

3 commits

  • Commit:

    a83fe28e2e45 ("perf: Fix put_event() ctx lock")

    changed the locking logic in put_event() by replacing mutex_lock_nested()
    with perf_event_ctx_lock_nested(), but didn't fix the subsequent
    mutex_unlock() with a correct counterpart, perf_event_ctx_unlock().

    Contexts are thus leaked as a result of the refcount incremented in
    perf_event_ctx_lock_nested().
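
    Roughly, the fix pairs the lock helper with its proper unlock counterpart
    (illustrative sketch, not the verbatim diff):

    ctx = perf_event_ctx_lock_nested(event, SINGLE_DEPTH_NESTING);
    /* ... teardown work ... */
    perf_event_ctx_unlock(event, ctx);   /* instead of mutex_unlock(&ctx->mutex),
                                          * which leaked the reference taken above */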

    Signed-off-by: Leon Yu
    Cc: Arnaldo Carvalho de Melo
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Fixes: a83fe28e2e45 ("perf: Fix put_event() ctx lock")
    Link: http://lkml.kernel.org/r/1424954613-5034-1-git-send-email-chianglungyu@gmail.com
    Signed-off-by: Ingo Molnar

    Leon Yu
     
  • The current approach to handling shadow memory for modules is broken.

    Shadow memory can be freed only after the memory it corresponds to is no
    longer used. vfree() called from interrupt context may use the memory it
    is freeing to store a 'struct llist_node' in it:

    void vfree(const void *addr)
    {
            ...
            if (unlikely(in_interrupt())) {
                    struct vfree_deferred *p = this_cpu_ptr(&vfree_deferred);
                    if (llist_add((struct llist_node *)addr, &p->list))
                            schedule_work(&p->wq);

    Later this list node is used in free_work(), which actually frees the memory.
    Currently, module_memfree() called in interrupt context will free the shadow
    before freeing the module's memory, which can provoke a kernel crash.

    So shadow memory should be freed after the module's memory. However, such a
    deallocation order could race with kasan_module_alloc() in module_alloc().

    Free the shadow right before releasing the vm area. At this point the
    vfree()'d memory is not used anymore and yet not available for other
    allocations. A new VM_KASAN flag is used to indicate that a vm area has
    dynamically allocated shadow memory, so kasan frees the shadow only if it
    was previously allocated.
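
    A sketch of the resulting shadow teardown (illustrative; based on the
    description above rather than quoted code):

    void kasan_free_shadow(const struct vm_struct *vm)
    {
            /* only areas that had shadow allocated for them carry VM_KASAN */
            if (vm->flags & VM_KASAN)
                    vfree(kasan_mem_to_shadow(vm->addr));
    }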

    Signed-off-by: Andrey Ryabinin
    Acked-by: Rusty Russell
    Cc: Dmitry Vyukov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     
  • I noticed that a helper function with argument type ARG_ANYTHING does
    not need to have an initialized value (register).

    In the worst case this can lead to unintended stack memory leakage in future
    helper functions if they are not carefully designed, or to unintended
    application behaviour in case the application developer was not careful
    enough to match a correct helper function signature in the API.

    The underlying issue is that ARG_ANYTHING should actually be split
    into two different semantics:

    1) ARG_DONTCARE for function arguments that the helper function
    does not care about (in other words: the default for unused
    function arguments), and

    2) ARG_ANYTHING that is an argument actually being used by a
    helper function and *guaranteed* to be an initialized register.

    The current risk is low: ARG_ANYTHING is only used for the 'flags'
    argument (r4) in bpf_map_update_elem(), which internally does strict
    checking.
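
    Conceptually, the two argument classes look something like this
    (illustrative sketch of the split, not the exact enum from the tree):

    enum bpf_arg_type {
            ARG_DONTCARE = 0,   /* unused argument; the verifier ignores it */
            ARG_ANYTHING,       /* any value is fine, but the register must
                                 * hold an initialized value */
            /* ... map/key/value/stack argument types ... */
    };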

    Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)")
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

10 Mar, 2015

4 commits

  • Conflicts:
    drivers/net/ethernet/cadence/macb.c

    Overlapping changes in macb driver, mostly fixes and cleanups
    in 'net' overlapping with the integration of at91_ether into
    macb in 'net-next'.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • …git/rostedt/linux-trace

    Pull seq-buf/ftrace fixes from Steven Rostedt:
    "This includes fixes for seq_buf_bprintf() truncation issue. It also
    contains fixes to ftrace when /proc/sys/kernel/ftrace_enabled and
    function tracing are started. Doing the following causes some issues:

    # echo 0 > /proc/sys/kernel/ftrace_enabled
    # echo function_graph > /sys/kernel/debug/tracing/current_tracer
    # echo 1 > /proc/sys/kernel/ftrace_enabled
    # echo nop > /sys/kernel/debug/tracing/current_tracer
    # echo function_graph > /sys/kernel/debug/tracing/current_tracer

    The same problem shows up with function tracing too. Pratyush Anand first reported
    this issue to me and supplied a patch. When I tested this on my x86
    test box, it caused thousands of backtraces and warnings to appear in
    dmesg, which also caused a denial of service (a warning for every
    function that was listed). I applied Pratyush's patch but it did not
    fix the issue for me. I looked into it and found a slight problem
    with trampoline accounting. I fixed it and sent Pratyush a patch, but
    he said that it did not fix the issue for him.

    I later learned that Pratyush was using an ARM64 server, and when I
    tested on my ARM board, I was able to reproduce the same issue as
    Pratyush. After applying his patch, it fixed the problem. The above
    test uncovered two different bugs, one in x86 and one in ARM and
    ARM64. As this looked like it would affect PowerPC, I tested it on my
    PPC64 box. It too broke, but neither the patch that fixed ARM nor the
    one that fixed x86 fixed this box (the changes were all in generic
    code!). The above test uncovered two more bugs that affected PowerPC.
    Again, the changes were only done to generic code. It's the way the
    arch code expected things to be done that was different between the
    archs. Some were more sensitive than others.

    The rest of this series fixes the PPC bugs as well"

    * tag 'trace-fixes-v4.0-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    ftrace: Fix ftrace enable ordering of sysctl ftrace_enabled
    ftrace: Fix en(dis)able graph caller when en(dis)abling record via sysctl
    ftrace: Clear REGS_EN and TRAMP_EN flags on disabling record via sysctl
    seq_buf: Fix seq_buf_bprintf() truncation
    seq_buf: Fix seq_buf_vprintf() truncation

    Linus Torvalds
     
  • Pull cgroup fixes from Tejun Heo:
    "The cgroup iteration update two years ago and the recent cpuset
    restructuring introduced regressions in subset of cpuset
    configurations. Three patches to fix them.

    All are marked for -stable"

    * 'for-4.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cpuset: Fix cpuset sched_relax_domain_level
    cpuset: fix a warning when clearing configured masks in old hierarchy
    cpuset: initialize effective masks when clone_children is enabled

    Linus Torvalds
     
  • Pull workqueue fix from Tejun Heo:
    "One fix patch for a subtle livelock condition which can happen on
    PREEMPT_NONE kernels involving two racing cancel_work calls. Whoever
    comes in the second has to wait for the previous one to finish. This
    was implemented by making the later one block for the same condition
    that the former would be (work item completion) and then loop and
    retest; unfortunately, depending on the wake up order, the later one
    could lock out the former one to finish by busy looping on the cpu.

    This is fixed by implementing an explicit wait mechanism. The work item
    might not belong anywhere at this point and there's a remote possibility
    of a thundering herd problem. I originally tried to use bit_waitqueue
    but it didn't work for static work items on modules. It's currently
    using single wait queue with filtering wake up function and exclusive
    wakeup. If this ever becomes a problem, which is not very likely, we
    can try to figure out a way to piggy back on bit_waitqueue"

    * 'for-4.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE

    Linus Torvalds
     

09 Mar, 2015

4 commits

  • Some archs (specifically PowerPC) are sensitive to the ordering between
    enabling the calls to function tracing and setting the function that
    should be traced.

    That is, update_ftrace_function() sets what function the ftrace_caller
    trampoline should call. Some archs require this to be set before
    calling ftrace_run_update_code().

    Another bug was discovered: ftrace_startup_sysctl() called
    ftrace_run_update_code() directly. If the function the ftrace_caller
    trampoline calls changes, then it will not be updated. Instead,
    ftrace_startup_enable() should be called, because it tests whether
    the callback changed since the code was disabled, and will
    tell the arch to update appropriately. Most archs do not need this
    notification, but PowerPC does.

    The problem could be seen by the following commands:

    # echo 0 > /proc/sys/kernel/ftrace_enabled
    # echo function > /sys/kernel/debug/tracing/current_tracer
    # echo 1 > /proc/sys/kernel/ftrace_enabled
    # cat /sys/kernel/debug/tracing/trace

    The trace will show that function tracing was not active.

    Cc: stable@vger.kernel.org # 2.6.27+
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • When ftrace is enabled globally through the proc interface, we must check if
    ftrace_graph_active is set. If it is set, then we should also pass the
    FTRACE_START_FUNC_RET command to ftrace_run_update_code(). Similarly, when
    ftrace is disabled globally through the proc interface, we must check if
    ftrace_graph_active is set. If it is set, then we should also pass the
    FTRACE_STOP_FUNC_RET command to ftrace_run_update_code().
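
    In code terms, the enable path of the fix is roughly of this shape
    (illustrative sketch, not a verbatim excerpt):

    int command = FTRACE_UPDATE_CALLS;

    if (ftrace_graph_active)
            command |= FTRACE_START_FUNC_RET;   /* FTRACE_STOP_FUNC_RET on disable */

    ftrace_run_update_code(command);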

    Consider the following situation.

    # echo 0 > /proc/sys/kernel/ftrace_enabled

    After this ftrace_enabled = 0.

    # echo function_graph > /sys/kernel/debug/tracing/current_tracer

    Since ftrace_enabled = 0, ftrace_enable_ftrace_graph_caller() is never
    called.

    # echo 1 > /proc/sys/kernel/ftrace_enabled

    Now ftrace_enabled will be set to true, but still
    ftrace_enable_ftrace_graph_caller() will not be called, which is not
    desired.

    Further if we execute the following after this:
    # echo nop > /sys/kernel/debug/tracing/current_tracer

    Now since ftrace_enabled is set it will call
    ftrace_disable_ftrace_graph_caller(), which causes a kernel warning on
    the ARM platform.

    On the ARM platform, when ftrace_enable_ftrace_graph_caller() is called,
    it checks whether the old instruction is a nop or not. If it's not a nop,
    then it returns an error. If it is a nop then it replaces instruction at
    that address with a branch to ftrace_graph_caller.
    ftrace_disable_ftrace_graph_caller() behaves just the opposite. Therefore,
    if generic ftrace code ever calls either ftrace_enable_ftrace_graph_caller()
    or ftrace_disable_ftrace_graph_caller() consecutively two times in a row,
    then it will return an error, which will cause the generic ftrace code to
    raise a warning.

    Note, x86 does not have an issue with this because the architecture
    specific code for ftrace_enable_ftrace_graph_caller() and
    ftrace_disable_ftrace_graph_caller() does not check the previous state,
    and calling either of these functions twice in a row has no ill effect.

    Link: http://lkml.kernel.org/r/e4fbe64cdac0dd0e86a3bf914b0f83c0b419f146.1425666454.git.panand@redhat.com

    Cc: stable@vger.kernel.org # 2.6.31+
    Signed-off-by: Pratyush Anand
    [
    removed extra if (ftrace_start_up) and defined ftrace_graph_active as 0
    if CONFIG_FUNCTION_GRAPH_TRACER is not set.
    ]
    Signed-off-by: Steven Rostedt

    Pratyush Anand
     
  • When /proc/sys/kernel/ftrace_enabled is set to zero, all function
    tracing is disabled. But the records that represent the functions
    still hold information about the ftrace_ops that are hooked to them.

    ftrace_ops may request "REGS" (have a full set of pt_regs passed to
    the callback), or "TRAMP" (the ops has its own trampoline to use).
    When the record is updated to represent the state of the ops hooked
    to it, it sets "REGS_EN" and/or "TRAMP_EN" to state that the callback
    points to the correct trampoline (REGS has its own trampoline).

    When ftrace_enabled is set to zero, all ftrace locations are a nop,
    so they do not point to any trampoline. But the _EN flags are still
    set. This can cause the accounting to go wrong when ftrace_enabled
    is cleared and an ops that has a trampoline is registered or unregistered.
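
    The repair amounts to dropping those flags whenever a record is taken back
    to a nop while tracing is off, roughly (illustrative sketch):

    /* record no longer points at any trampoline, so drop the _EN state too */
    rec->flags &= ~(FTRACE_FL_TRAMP_EN | FTRACE_FL_REGS_EN);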

    For example, the following will cause ftrace to crash:

    # echo function_graph > /sys/kernel/debug/tracing/current_tracer
    # echo 0 > /proc/sys/kernel/ftrace_enabled
    # echo nop > /sys/kernel/debug/tracing/current_tracer
    # echo 1 > /proc/sys/kernel/ftrace_enabled
    # echo function_graph > /sys/kernel/debug/tracing/current_tracer

    As function_graph uses a trampoline, when ftrace_enabled is set to zero
    the updates to the record are not done. When enabling function_graph
    again, the record will still have the TRAMP_EN flag set, and it will
    look for an op that has a trampoline other than the function_graph
    ops, and fail to find one.

    Cc: stable@vger.kernel.org # 3.17+
    Reported-by: Pratyush Anand
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Pull tty/serial fixes from Greg KH:
    "Here are some tty and serial driver fixes for 4.0-rc3.

    Along with the atime fix that you know about, here are some other
    serial driver bugfixes as well. Most notable is a wait_until_sent
    bugfix that was traced back to being around since before 2.6.12 that
    Johan has fixed up.

    All have been in linux-next successfully"

    * tag 'tty-4.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
    TTY: fix tty_wait_until_sent maximum timeout
    TTY: fix tty_wait_until_sent on 64-bit machines
    USB: serial: fix infinite wait_until_sent timeout
    TTY: bfin_jtag_comm: remove incorrect wait_until_sent operation
    net: irda: fix wait_until_sent poll timeout
    serial: uapi: Declare all userspace-visible io types
    serial: core: Fix iotype userspace breakage
    serial: sprd: Fix missing spin_unlock in sprd_handle_irq()
    console: Fix console name size mismatch
    tty: fix up atime/mtime mess, take four
    serial: 8250_dw: Fix get_mctrl behaviour
    serial:8250:8250_pci: delete unneeded quirk entries
    serial:8250:8250_pci: fix redundant entry report for WCH_CH352_2S
    Change email address for 8250_pci
    serial: 8250: Revert "tty: serial: 8250_core: read only RX if there is something in the FIFO"
    Revert "tty/serial: of_serial: add DT alias ID handling"

    Linus Torvalds
     

08 Mar, 2015

1 commit


07 Mar, 2015

3 commits

  • Fengguang reported that on the openrisc and avr32 architectures we get
    the following linker errors on *_defconfig builds that have no bpf
    syscall support:

    net/built-in.o:(.rodata+0x1cd0): undefined reference to `bpf_map_lookup_elem_proto'
    net/built-in.o:(.rodata+0x1cd4): undefined reference to `bpf_map_update_elem_proto'
    net/built-in.o:(.rodata+0x1cd8): undefined reference to `bpf_map_delete_elem_proto'

    Fix it up by providing built-in weak definitions of the symbols,
    so they can be overridden when the syscall is enabled. I think
    the issue might be that gcc is not able to optimize all that away.
    This patch fixes the linker errors for me, tested with Fengguang's
    make.cross [1] script.

    [1] https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
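
    The shape of the fix (illustrative sketch: weak, empty fallback definitions
    that the real ones override once the syscall code is built in):

    /* in the always-built BPF core code */
    const struct bpf_func_proto bpf_map_lookup_elem_proto __weak;
    const struct bpf_func_proto bpf_map_update_elem_proto __weak;
    const struct bpf_func_proto bpf_map_delete_elem_proto __weak;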

    Reported-by: Fengguang Wu
    Fixes: d4052c4aea0c ("ebpf: remove CONFIG_BPF_SYSCALL ifdefs in socket filter code")
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • commit 6ae9200f2cab7 ("enlarge console.name") increased the storage
    for the console name to 16 bytes, but not the corresponding
    struct console_cmdline::name storage. Console names longer than
    8 bytes cause a read beyond the end of the string and a failure to
    match the console; I'm not sure if there are other unexpected consequences.
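
    In other words, the two name buffers need to stay the same size; roughly
    (illustrative sketch, not the exact structure layout):

    struct console_cmdline {
            char name[16];   /* must match sizeof(((struct console *)0)->name) */
            int  index;
            char *options;
    };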

    Cc: # 2.6.22+
    Signed-off-by: Peter Hurley
    Signed-off-by: Greg Kroah-Hartman

    Peter Hurley
     
  • Pull livepatching fix from Jiri Kosina:
    "Fix an RCU unlock misplacement in live patching infrastructure, from
    Peter Zijlstra"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
    livepatch: fix RCU usage in klp_find_external_symbol()

    Linus Torvalds
     

06 Mar, 2015

4 commits

  • When CONFIG_DEBUG_SET_MODULE_RONX is enabled, the sizes of
    module sections are aligned up so appropriate permissions can
    be applied. Adjusting for the symbol table may cause them to
    become unaligned. Make sure to re-align the sizes afterward.
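
    A sketch of the intended fix (illustrative; debug_align() pads to page size
    when CONFIG_DEBUG_SET_MODULE_RONX is enabled):

    /* after tacking the symbol/string tables onto the sections, round the
     * sizes back up so RO/NX page permissions still apply cleanly */
    mod->core_size = debug_align(mod->core_size);
    mod->init_size = debug_align(mod->init_size);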

    Signed-off-by: Laura Abbott
    Acked-by: Rusty Russell
    Signed-off-by: Catalin Marinas

    Laura Abbott
     
  • * irq-pm:
    genirq / PM: describe IRQF_COND_SUSPEND
    tty: serial: atmel: rework interrupt and wakeup handling
    watchdog: at91sam9: request the irq with IRQF_NO_SUSPEND
    clk: at91: implement suspend/resume for the PMC irqchip
    rtc: at91rm9200: rework wakeup and interrupt handling
    rtc: at91sam9: rework wakeup and interrupt handling
    PM / wakeup: export pm_system_wakeup symbol
    genirq / PM: Add flag for shared NO_SUSPEND interrupt lines
    genirq / PM: better describe IRQF_NO_SUSPEND semantics

    Rafael J. Wysocki
     
  • * suspend-to-idle:
    cpuidle / sleep: Use broadcast timer for states that stop local timer
    cpuidle: Clean up fallback handling in cpuidle_idle_call()
    cpuidle / sleep: Do sanity checks in cpuidle_enter_freeze() too
    idle / sleep: Avoid excessive disabling and enabling interrupts

    Rafael J. Wysocki
     
  • Commit 381063133246 (PM / sleep: Re-implement suspend-to-idle handling)
    overlooked the fact that entering some sufficiently deep idle states
    by CPUs may cause their local timers to stop and in those cases it
    is necessary to switch over to a broadcast timer prior to entering
    the idle state. If the cpuidle driver in use does not provide
    the new ->enter_freeze callback for any of the idle states, that
    problem affects suspend-to-idle too, but it is not taken into account
    after the changes made by commit 381063133246.

    Fix that by changing the definition of cpuidle_enter_freeze() and
    re-arranging of the code in cpuidle_idle_call(), so the former does
    not call cpuidle_enter() any more and the fallback case is handled
    by cpuidle_idle_call() directly.

    Fixes: 381063133246 (PM / sleep: Re-implement suspend-to-idle handling)
    Reported-and-tested-by: Lorenzo Pieralisi
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Peter Zijlstra (Intel)

    Rafael J. Wysocki
     

05 Mar, 2015

2 commits

  • cancel[_delayed]_work_sync() are implemented using
    __cancel_work_timer() which grabs the PENDING bit using
    try_to_grab_pending() and then flushes the work item with PENDING set
    to prevent the on-going execution of the work item from requeueing
    itself.

    try_to_grab_pending() can always grab the PENDING bit without blocking
    except when someone else is doing the above flushing during
    cancelation. In that case, try_to_grab_pending() returns -ENOENT. In
    this case, __cancel_work_timer() currently invokes flush_work(). The
    assumption is that the completion of the work item is what the other
    canceling task would be waiting for too, and thus waiting for the same
    condition and retrying should allow forward progress without excessive
    busy looping.

    Unfortunately, this doesn't work if preemption is disabled or the
    latter task has real time priority. Let's say task A just got woken
    up from flush_work() by the completion of the target work item. If,
    before task A starts executing, task B gets scheduled and invokes
    __cancel_work_timer() on the same work item, its try_to_grab_pending()
    will return -ENOENT as the work item is still being canceled by task A
    and flush_work() will also immediately return false as the work item
    is no longer executing. This puts task B in a busy loop possibly
    preventing task A from executing and clearing the canceling state on
    the work item leading to a hang.

    task A                  task B                  worker

                                                    executing work
    __cancel_work_timer()
      try_to_grab_pending()
      set work CANCELING
      flush_work()
        block for work completion
                                                    completion, wakes up A
                            __cancel_work_timer()
                            while (forever) {
                              try_to_grab_pending()
                                -ENOENT as work is being canceled
                              flush_work()
                                false as work is no longer executing
                            }

    This patch removes the possible hang by updating __cancel_work_timer()
    to explicitly wait for clearing of CANCELING rather than invoking
    flush_work() after try_to_grab_pending() fails with -ENOENT.
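
    Schematically, the waiting side ends up looking something like this
    (illustrative sketch only; the waitqueue name is made up, and as the notes
    below explain, the real patch uses an exclusive wait with a filtering
    wake function):

    static DECLARE_WAIT_QUEUE_HEAD(cancel_waitq);
    DEFINE_WAIT(cwait);

    do {
            prepare_to_wait(&cancel_waitq, &cwait, TASK_UNINTERRUPTIBLE);
            if (work_is_canceling(work))
                    schedule();
            finish_wait(&cancel_waitq, &cwait);
    } while (work_is_canceling(work));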

    Link: http://lkml.kernel.org/g/20150206171156.GA8942@axis.com

    v3: bit_waitqueue() can't be used for work items defined in vmalloc
    area. Switched to custom wake function which matches the target
    work item and exclusive wait and wakeup.

    v2: v1 used wake_up() on bit_waitqueue() which leads to NULL deref if
    the target bit waitqueue has wait_bit_queue's on it. Use
    DEFINE_WAIT_BIT() and __wake_up_bit() instead. Reported by Tomeu
    Vizoso.

    Signed-off-by: Tejun Heo
    Reported-by: Rabin Vincent
    Cc: Tomeu Vizoso
    Cc: stable@vger.kernel.org
    Tested-by: Jesper Nilsson
    Tested-by: Rabin Vincent

    Tejun Heo
     
  • It currently is required that all users of NO_SUSPEND interrupt
    lines pass the IRQF_NO_SUSPEND flag when requesting the IRQ or the
    WARN_ON_ONCE() in irq_pm_install_action() will trigger. That is
    done to warn about situations in which unprepared interrupt handlers
    may be run unnecessarily for suspended devices and may attempt to
    access those devices by mistake. However, it may cause drivers
    that have no technical reasons for using IRQF_NO_SUSPEND to set
    that flag just because they happen to share the interrupt line
    with something like a timer.

    Moreover, the generic handling of wakeup interrupts introduced by
    commit 9ce7a25849e8 (genirq: Simplify wakeup mechanism) only works
    for IRQs without any NO_SUSPEND users, so the drivers of wakeup
    devices needing to use shared NO_SUSPEND interrupt lines for
    signaling system wakeup generally have to detect wakeup in their
    interrupt handlers. Thus if they happen to share an interrupt line
    with a NO_SUSPEND user, they also need to request that their
    interrupt handlers be run after suspend_device_irqs().

    In both cases the reason for using IRQF_NO_SUSPEND is not because
    the driver in question has a genuine need to run its interrupt
    handler after suspend_device_irqs(), but because it happens to
    share the line with some other NO_SUSPEND user. Otherwise, the
    driver would do without IRQF_NO_SUSPEND just fine.

    To make it possible to specify that condition explicitly, introduce
    a new IRQ action handler flag for shared IRQs, IRQF_COND_SUSPEND,
    that, when set, will indicate to the IRQ core that the interrupt
    user is generally fine with suspending the IRQ, but it also can
    tolerate handler invocations after suspend_device_irqs() and, in
    particular, it is capable of detecting system wakeup and triggering
    it as appropriate from its interrupt handler.

    That will allow us to work around a problem with a shared timer
    interrupt line on at91 platforms.
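
    A driver that fits this description would then request its line roughly
    like this (device and handler names here are placeholders):

    ret = request_irq(irq, my_wakeup_handler,
                      IRQF_SHARED | IRQF_COND_SUSPEND,
                      "my-wakeup-dev", dev);
    if (ret)
            return ret;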

    Link: http://marc.info/?l=linux-kernel&m=142252777602084&w=2
    Link: http://marc.info/?t=142252775300011&r=1&w=2
    Link: https://lkml.org/lkml/2014/12/15/552
    Reported-by: Boris Brezillon
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: Mark Rutland

    Rafael J. Wysocki
     

04 Mar, 2015

1 commit


03 Mar, 2015

5 commits

  • While one must hold RCU-sched (aka. preempt_disable) for find_symbol(),
    one must equally hold it over the use of the object returned.

    The moment you release the RCU-sched read lock, the object can be dead
    and gone.
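
    In other words, the lookup and every dereference of the result must sit
    inside the same RCU-sched section, roughly (illustrative sketch; the
    find_symbol() signature is the one from this era of the tree):

    const struct kernel_symbol *sym;
    unsigned long addr = 0;

    preempt_disable();
    sym = find_symbol(name, NULL, NULL, true, true);
    if (sym)
            addr = sym->value;   /* use the object while still protected */
    preempt_enable();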

    [jkosina@suse.cz: change subject line to be aligned with other patches]
    Cc: Seth Jennings
    Cc: Josh Poimboeuf
    Cc: Masami Hiramatsu
    Cc: Miroslav Benes
    Cc: Petr Mladek
    Cc: Jiri Kosina
    Cc: "Paul E. McKenney"
    Cc: Rusty Russell
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Masami Hiramatsu
    Acked-by: Paul E. McKenney
    Acked-by: Josh Poimboeuf
    Signed-off-by: Jiri Kosina

    Peter Zijlstra
     
  • Move the fallback code path in cpuidle_idle_call() to the end of the
    function to avoid jumping to a label in an if () branch.

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     
  • The cpuset.sched_relax_domain_level file can control how far we do
    immediate load balancing on a system. However, it was found on recent
    kernels that echo'ing a value into cpuset.sched_relax_domain_level
    did not reduce any immediate load balancing.

    The reason this occurred is that the update_domain_attr_tree() traversal
    did not update the "top_cpuset". This resulted in nothing being changed
    when modifying the sched_relax_domain_level parameter.

    This patch addresses that problem by having update_domain_attr_tree()
    allow updates for the root in the cpuset traversal.

    Fixes: fc560a26acce ("cpuset: replace cpuset->stack_list with cpuset_for_each_descendant_pre()")
    Cc: # 3.9+
    Signed-off-by: Jason Low
    Signed-off-by: Zefan Li
    Signed-off-by: Tejun Heo
    Tested-by: Serge Hallyn

    Jason Low
     
  • When we clear cpuset.cpus, cpuset.effective_cpus won't be cleared:

    # mount -t cgroup -o cpuset xxx /mnt
    # mkdir /mnt/tmp
    # echo 0 > /mnt/tmp/cpuset.cpus
    # echo > /mnt/tmp/cpuset.cpus
    # cat cpuset.cpus

    # cat cpuset.effective_cpus
    0-15

    And a kernel warning in update_cpumasks_hier() is triggered:

    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 4028 at kernel/cpuset.c:894 update_cpumasks_hier+0x471/0x650()

    Cc: # 3.17+
    Signed-off-by: Zefan Li
    Signed-off-by: Tejun Heo
    Tested-by: Serge Hallyn

    Zefan Li
     
  • If clone_children is enabled, effective masks won't be initialized
    due to the bug:

    # mount -t cgroup -o cpuset xxx /mnt
    # echo 1 > cgroup.clone_children
    # mkdir /mnt/tmp
    # cat /mnt/tmp/cpuset.effective_cpus

    # cat cpuset.cpus
    0-15

    And then this cpuset won't constrain the tasks in it.

    Neither the bug nor the fix has any effect on the unified hierarchy, as
    there's no clone_children flag there any more.

    Reported-by: Christian Brauner
    Reported-by: Serge Hallyn
    Cc: # 3.17+
    Signed-off-by: Zefan Li
    Signed-off-by: Tejun Heo
    Tested-by: Serge Hallyn

    Zefan Li
     

02 Mar, 2015

5 commits

  • Pull locking fix from Ingo Molnar:
    "An rtmutex deadlock path fixlet"

    * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    locking/rtmutex: Set state back to running on error

    Linus Torvalds
     
  • This work extends the "classic" BPF programmable tc classifier by
    extending its scope also to native eBPF code!

    This allows user space to implement its own custom, 'safe', C-like
    classifiers (or whatever other frontend language LLVM et al may
    provide in the future), which can then be compiled with the LLVM eBPF
    backend to an eBPF ELF file. The result of this can be loaded into
    the kernel via iproute2's tc. In the kernel, they can be JITed on
    major archs and thus run in native performance.

    Simple, minimal toy example to demonstrate the workflow:

    #include
    #include
    #include

    #include "tc_bpf_api.h"

    __section("classify")
    int cls_main(struct sk_buff *skb)
    {
    return (0x800 << 16) | load_byte(skb, ETH_HLEN + __builtin_offsetof(struct iphdr, tos));
    }

    char __license[] __section("license") = "GPL";

    The classifier can then be compiled into eBPF opcodes and loaded
    via tc, for example:

    clang -O2 -emit-llvm -c cls.c -o - | llc -march=bpf -filetype=obj -o cls.o
    tc filter add dev em1 parent 1: bpf cls.o [...]

    As it has been demonstrated, the scope can even reach up to a fully
    fledged flow dissector (similarly as in samples/bpf/sockex2_kern.c).

    For tc, maps are allowed to be used, but from kernel context only,
    in other words, eBPF code can keep state across filter invocations.
    In future, we perhaps may reattach from a different application to
    those maps e.g., to read out collected statistics/state.

    Similarly as in socket filters, we may extend functionality for eBPF
    classifiers over time depending on the use cases. For that purpose,
    cls_bpf programs are using BPF_PROG_TYPE_SCHED_CLS program type, so
    we can allow additional functions/accessors (e.g. an ABI compatible
    offset translation to skb fields/metadata). For an initial cls_bpf
    support, we allow the same set of helper functions as eBPF socket
    filters, but we could diverge at some point in time w/o problem.

    I was wondering whether cls_bpf and act_bpf could share C programs,
    I can imagine that at some point, we introduce i) further common
    handlers for both (or even beyond their scope), and/or if truly needed
    ii) some restricted function space for each of them. Both can be
    abstracted easily through struct bpf_verifier_ops in future.

    The context of cls_bpf versus act_bpf is slightly different though:
    a cls_bpf program will return a specific classid, whereas act_bpf returns
    a drop/non-drop return code; the latter may also mangle skbs in the future.
    That said, we can surely have a "classify" and an "action" section in
    a single object file, or, considering the mentioned constraint, add
    the possibility of a shared section.

    The workflow for getting native eBPF running from tc [1] is as
    follows: for f_bpf, I've added a slightly modified ELF parser code
    from Alexei's kernel sample, which reads out the LLVM compiled
    object, sets up maps (and dynamically fixes up map fds) if any, and
    loads the eBPF instructions all centrally through the bpf syscall.

    The resulting fd from the loaded program itself is being passed down
    to cls_bpf, which looks up struct bpf_prog from the fd store, and
    holds reference, so that it stays available also after tc program
    lifetime. On tc filter destruction, it will then drop its reference.

    Moreover, I've also added the optional possibility to annotate an
    eBPF filter with a name (e.g. path to object file, or something
    else if preferred) so that when tc dumps currently installed filters,
    some more context can be given to an admin for a given instance (as
    opposed to just the file descriptor number).

    Last but not least, bpf_prog_get() and bpf_prog_put() needed to be
    exported, so that eBPF can be used from cls_bpf built as a module.
    Thanks to 60a3b2253c41 ("net: bpf: make eBPF interpreter images
    read-only") I think this is of no concern since anything wanting to
    alter eBPF opcode after verification stage would crash the kernel.

    [1] http://git.breakpoint.cc/cgit/dborkman/iproute2.git/log/?h=ebpf

    Signed-off-by: Daniel Borkmann
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • is_gpl_compatible and prog_type should be moved directly into bpf_prog
    as they stay immutable during bpf_prog's lifetime, are core attributes
    and they can be locked as read-only later on via bpf_prog_select_runtime().

    With a bit of rearranging, this also allows us to shrink bpf_prog_aux
    to exactly 1 cacheline.
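
    Roughly, the layout ends up along these lines (illustrative sketch of the
    idea, not the exact field order):

    struct bpf_prog {
            u16                     pages;          /* number of allocated pages */
            bool                    jited;          /* is the image JITed? */
            bool                    gpl_compatible; /* moved out of bpf_prog_aux */
            enum bpf_prog_type      type;           /* moved out of bpf_prog_aux */
            struct bpf_prog_aux     *aux;           /* auxiliary data, now 1 cacheline */
            /* ... instructions / JITed image follow ... */
    };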

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • As discussed recently and at netconf/netdev01, we want to prevent making
    bpf_verifier_ops registration available for modules, but have them at a
    controlled place inside the kernel instead.

    The reason for this is that out-of-tree modules can go crazy and define
    and register any verifier ops they want, doing all sorts of crap, even
    bypassing available GPLed eBPF helper functions. We don't want to offer
    such a shiny playground, of course, but keep strict control to ourselves
    inside the core kernel.

    This also encourages us to design eBPF user helpers carefully and
    generically, so they can be shared among various subsystems using eBPF.

    For the eBPF traffic classifier (cls_bpf), it's a good start to share
    the same helper facilities as we currently do in eBPF for socket filters.

    That way, we have BPF_PROG_TYPE_SCHED_CLS look like its own type, so that
    one day, if there's a good reason to diverge the set of helper functions
    from the set available to socket filters, we keep ABI compatibility.

    In future, we could place all bpf_prog_type_list at a central place,
    perhaps.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • We can move bpf_map_ops and bpf_verifier_ops and other structs into the
    read-only section, and bpf_map_type_list and bpf_prog_type_list into
    the read-mostly section.
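
    Concretely, this is a constification exercise of roughly this shape
    (illustrative sketch; the exact instance and field names may differ):

    static const struct bpf_verifier_ops sk_filter_ops = {
            .get_func_proto  = sk_filter_func_proto,
            .is_valid_access = sk_filter_is_valid_access,
    };

    static struct bpf_prog_type_list sk_filter_type __read_mostly = {
            .ops  = &sk_filter_ops,
            .type = BPF_PROG_TYPE_SOCKET_FILTER,
    };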

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann