Eric Lee / smarc-fsl-linux-kernel

27 Nov, 2017

3 commits

dec0029a5 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull irq fixes from Thomas Glexiner:

- unbreak the irq trigger type check for legacy platforms

- a handful fixes for ARM GIC v3/4 interrupt controllers

- a few trivial fixes all over the place

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/matrix: Make - vs ?: Precedence explicit
irqchip/imgpdc: Use resource_size function on resource object
irqchip/qcom: Fix u32 comparison with value less than zero
irqchip/exiu: Fix return value check in exiu_init()
irqchip/gic-v3-its: Remove artificial dependency on PCI
irqchip/gic-v4: Add forward definition of struct irq_domain_ops
irqchip/gic-v3: pr_err() strings should end with newlines
irqchip/s3c24xx: pr_err() strings should end with newlines
irqchip/gic-v3: Fix ppi-partitions lookup
irqchip/gic-v4: Clear IRQ_DISABLE_UNLAZY again if mapping fails
genirq: Track whether the trigger type has been set

Linus Torvalds
2017-11-27 06:39:20 +0800
580e3d552 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull perf fixes from Ingo Molnar:
"Misc fixes: two PMU driver fixes and a memory leak fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix memory leak triggered by perf --namespace
perf/x86/intel/uncore: Add event constraint for BDX PCU
perf/x86/intel: Hide TSX events when RTM is not supported

Linus Torvalds
2017-11-27 05:41:48 +0800
cd4b5d5d2 Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull static key fix from Ingo Molnar:
"Fix a boot warning related to bad init ordering of the static keys
self-test"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
jump_label: Invoke jump_label_test() via early_initcall()

Linus Torvalds
2017-11-27 05:36:54 +0800

26 Nov, 2017

1 commit

844056fd7 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull timer updates from Thomas Gleixner:

- The final conversion of timer wheel timers to timer_setup().

A few manual conversions and a large coccinelle assisted sweep and
the removal of the old initialization mechanisms and the related
code.

- Remove the now unused VSYSCALL update code

- Fix permissions of /proc/timer_list. I still need to get rid of that
file completely

- Rename a misnomed clocksource function and remove a stale declaration

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
m68k/macboing: Fix missed timer callback assignment
treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE casts
timer: Remove redundant __setup_timer*() macros
timer: Pass function down to initialization routines
timer: Remove unused data arguments from macros
timer: Switch callback prototype to take struct timer_list * argument
timer: Pass timer_list pointer to callbacks unconditionally
Coccinelle: Remove setup_timer.cocci
timer: Remove setup_*timer() interface
timer: Remove init_timer() interface
treewide: setup_timer() -> timer_setup() (2 field)
treewide: setup_timer() -> timer_setup()
treewide: init_timer() -> setup_timer()
treewide: Switch DEFINE_TIMER callbacks to struct timer_list *
s390: cmm: Convert timers to use timer_setup()
lightnvm: Convert timers to use timer_setup()
drivers/net: cris: Convert timers to use timer_setup()
drm/vc4: Convert timers to use timer_setup()
block/laptop_mode: Convert timers to use timer_setup()
net/atm/mpc: Avoid open-coded assignment of timer callback function
...

Linus Torvalds
2017-11-26 02:37:16 +0800

24 Nov, 2017

2 commits

75f113387 genirq/matrix: Make - vs ?: Precedence explicit ... Browse Code »

Noticed with a Clang build. This improves the readability of the ?:
expression, as it has lower precedence than the - expression. Show
explicitly that - is evaluated first.

Signed-off-by: Kees Cook
Signed-off-by: Thomas Gleixner
Link: https://lkml.kernel.org/r/20171122205645.GA27125@beast

Kees Cook
2017-11-24 03:09:31 +0800
e4be7baba Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf ... Browse Code »

Daniel Borkmann says:

====================
pull-request: bpf 2017-11-23

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Several BPF offloading fixes, from Jakub. Among others:

- Limit offload to cls_bpf and XDP program types only.
- Move device validation into the driver and don't make
any assumptions about the device in the classifier due
to shared blocks semantics.
- Don't pass offloaded XDP program into the driver when
it should be run in native XDP instead. Offloaded ones
are not JITed for the host in such cases.
- Don't destroy device offload state when moved to
another namespace.
- Revert dumping offload info into user space for now,
since ifindex alone is not sufficient. This will be
redone properly for bpf-next tree.

2) Fix test_verifier to avoid using bpf_probe_write_user()
helper in test cases, since it's dumping a warning into
kernel log which may confuse users when only running tests.
Switch to use bpf_trace_printk() instead, from Yonghong.

3) Several fixes for correcting ARG_CONST_SIZE_OR_ZERO semantics
before it becomes uabi, from Gianluca. More specifically:

- Add a type ARG_PTR_TO_MEM_OR_NULL that is used only
by bpf_csum_diff(), where the argument is either a
valid pointer or NULL. The subsequent ARG_CONST_SIZE_OR_ZERO
then enforces a valid pointer in case of non-0 size
or a valid pointer or NULL in case of size 0. Given
that, the semantics for ARG_PTR_TO_MEM in combination
with ARG_CONST_SIZE_OR_ZERO are now such that in case
of size 0, the pointer must always be valid and cannot
be NULL. This fix in semantics allows for bpf_probe_read()
to drop the recently added size == 0 check in the helper
that would become part of uabi otherwise once released.
At the same time we can then fix bpf_probe_read_str() and
bpf_perf_event_output() to use ARG_CONST_SIZE_OR_ZERO
instead of ARG_CONST_SIZE in order to fix recently
reported issues by Arnaldo et al, where LLVM optimizes
two boundary checks into a single one for unknown
variables where the verifier looses track of the variable
bounds and thus rejects valid programs otherwise.

4) A fix for the verifier for the case when it detects
comparison of two constants where the branch is guaranteed
to not be taken at runtime. Verifier will rightfully prune
the exploration of such paths, but we still pass the program
to JITs, where they would complain about using reserved
fields, etc. Track such dead instructions and sanitize
them with mov r0,r0. Rejection is not possible since LLVM
may generate them for valid C code and doesn't do as much
data flow analysis as verifier. For bpf-next we might
implement removal of such dead code and adjust branches
instead. Fix from Alexei.
====================

Signed-off-by: David S. Miller

David S. Miller
2017-11-24 01:33:01 +0800

23 Nov, 2017

6 commits

866c9b94e Merge tag 'for-linus-timers-conversion-final-v4.15-rc1' of git://git.kernel.org/… ... Browse Code »

…pub/scm/linux/kernel/git/kees/linux into timers/urgent

Pull the last batch of manual timer conversions from Kees Cook:

- final batch of "non trivial" timer conversions (multi-tree dependencies,
things Coccinelle couldn't handle, etc).

- treewide conversions via Coccinelle, in 4 steps:
- DEFINE_TIMER() functions converted to struct timer_list * argument
- init_timer() -> setup_timer()
- setup_timer() -> timer_setup()
- setup_timer() -> timer_setup() (with a single embedded structure)

- deprecated timer API removals (init_timer(), setup_*timer())

- finalization of new API (remove global casts)

Thomas Gleixner
2017-11-23 23:29:05 +0800
c131187db bpf: fix branch pruning logic ... Browse Code »

when the verifier detects that register contains a runtime constant
and it's compared with another constant it will prune exploration
of the branch that is guaranteed not to be taken at runtime.
This is all correct, but malicious program may be constructed
in such a way that it always has a constant comparison and
the other branch is never taken under any conditions.
In this case such path through the program will not be explored
by the verifier. It won't be taken at run-time either, but since
all instructions are JITed the malicious program may cause JITs
to complain about using reserved fields, etc.
To fix the issue we have to track the instructions explored by
the verifier and sanitize instructions that are dead at run time
with NOPs. We cannot reject such dead code, since llvm generates
it for valid C code, since it doesn't do as much data flow
analysis as the verifier does.

Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)")
Signed-off-by: Alexei Starovoitov
Acked-by: Daniel Borkmann
Signed-off-by: Daniel Borkmann

Alexei Starovoitov
2017-11-23 17:56:35 +0800
a60dd35d2 bpf: change bpf_perf_event_output arg5 type to ARG_CONST_SIZE_OR_ZERO ... Browse Code »

Commit 9fd29c08e520 ("bpf: improve verifier ARG_CONST_SIZE_OR_ZERO
semantics") relaxed the treatment of ARG_CONST_SIZE_OR_ZERO due to the way
the compiler generates optimized BPF code when checking boundaries of an
argument from C code. A typical example of this optimized code can be
generated using the bpf_perf_event_output helper when operating on variable
memory:

/* len is a generic scalar */
if (len > 0 && len 0x7ffe goto pc+6
114: (bf) r1 = r6
115: (18) r2 = 0xffff94e5f166c200
117: (b7) r3 = 0
118: (bf) r4 = r7
119: (85) call bpf_perf_event_output#25
R5 min value is negative, either use unsigned or 'var &= const'

With this code, the verifier loses track of the variable.

Replacing arg5 with ARG_CONST_SIZE_OR_ZERO is thus desirable since it
avoids this quite common case which leads to usability issues, and the
compiler generates code that the verifier can more easily test:

if (len
Signed-off-by: Gianluca Borello
Acked-by: Alexei Starovoitov
Acked-by: Daniel Borkmann
Signed-off-by: Daniel Borkmann

Gianluca Borello
2017-11-23 04:40:54 +0800
5c4e12017 bpf: change bpf_probe_read_str arg2 type to ARG_CONST_SIZE_OR_ZERO ... Browse Code »

Commit 9fd29c08e520 ("bpf: improve verifier ARG_CONST_SIZE_OR_ZERO
semantics") relaxed the treatment of ARG_CONST_SIZE_OR_ZERO due to the way
the compiler generates optimized BPF code when checking boundaries of an
argument from C code. A typical example of this optimized code can be
generated using the bpf_probe_read_str helper when operating on variable
memory:

/* len is a generic scalar */
if (len > 0 && len 0x7ffe goto pc-42
254: (bf) r1 = r7
255: (79) r2 = *(u64 *)(r10 -88)
256: (bf) r8 = r4
257: (85) call bpf_probe_read_str#45
R2 min value is negative, either use unsigned or 'var &= const'

With this code, the verifier loses track of the variable.

Replacing arg2 with ARG_CONST_SIZE_OR_ZERO is thus desirable since it
avoids this quite common case which leads to usability issues, and the
compiler generates code that the verifier can more easily test:

if (len
Acked-by: Alexei Starovoitov
Acked-by: Daniel Borkmann
Signed-off-by: Daniel Borkmann

Gianluca Borello
2017-11-23 04:40:54 +0800
eb33f2cca bpf: remove explicit handling of 0 for arg2 in bpf_probe_read ... Browse Code »

Commit 9c019e2bc4b2 ("bpf: change helper bpf_probe_read arg2 type to
ARG_CONST_SIZE_OR_ZERO") changed arg2 type to ARG_CONST_SIZE_OR_ZERO to
simplify writing bpf programs by taking advantage of the new semantics
introduced for ARG_CONST_SIZE_OR_ZERO which allows <!NULL, 0> arguments.

In order to prevent the helper from actually passing a NULL pointer to
probe_kernel_read, which can happen when is passed to the helper,
the commit also introduced an explicit check against size == 0.

After the recent introduction of the ARG_PTR_TO_MEM_OR_NULL type,
bpf_probe_read can not receive a pair of arguments anymore, thus
the check is not needed anymore and can be removed, since probe_kernel_read
can correctly handle a <!NULL, 0> call. This also fixes the semantics of
the helper before it gets officially released and bpf programs start
relying on this check.

Fixes: 9c019e2bc4b2 ("bpf: change helper bpf_probe_read arg2 type to ARG_CONST_SIZE_OR_ZERO")
Signed-off-by: Gianluca Borello
Acked-by: Alexei Starovoitov
Acked-by: Daniel Borkmann
Acked-by: Yonghong Song
Signed-off-by: Daniel Borkmann

Gianluca Borello
2017-11-23 04:40:54 +0800
db1ac4964 bpf: introduce ARG_PTR_TO_MEM_OR_NULL ... Browse Code »

With the current ARG_PTR_TO_MEM/ARG_PTR_TO_UNINIT_MEM semantics, an helper
argument can be NULL when the next argument type is ARG_CONST_SIZE_OR_ZERO
and the verifier can prove the value of this next argument is 0. However,
most helpers are just interested in handling <!NULL, 0>, so forcing them to
deal with makes the implementation of those helpers more
complicated for no apparent benefits, requiring them to explicitly handle
those corner cases with checks that bpf programs could start relying upon,
preventing the possibility of removing them later.

Solve this by making ARG_PTR_TO_MEM/ARG_PTR_TO_UNINIT_MEM never accept NULL
even when ARG_CONST_SIZE_OR_ZERO is set, and introduce a new argument type
ARG_PTR_TO_MEM_OR_NULL to explicitly deal with the NULL case.

Currently, the only helper that needs this is bpf_csum_diff_proto(), so
change arg1 and arg3 to this new type as well.

Also add a new battery of tests that explicitly test the
!ARG_PTR_TO_MEM_OR_NULL combination: all the current ones testing the
various variations are focused on bpf_csum_diff, so cover also
other helpers.

Signed-off-by: Gianluca Borello
Acked-by: Alexei Starovoitov
Acked-by: Daniel Borkmann
Signed-off-by: Daniel Borkmann

Gianluca Borello
2017-11-23 04:40:54 +0800

22 Nov, 2017

7 commits

841b86f32 treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE casts ... Browse Code »

With all callbacks converted, and the timer callback prototype
switched over, the TIMER_FUNC_TYPE cast is no longer needed,
so remove it. Conversion was done with the following scripts:

perl -pi -e 's|$TIMER_FUNC_TYPE$||g' \
$(git grep TIMER_FUNC_TYPE | cut -d: -f1 | sort -u)

perl -pi -e 's|$TIMER_DATA_TYPE$||g' \
$(git grep TIMER_DATA_TYPE | cut -d: -f1 | sort -u)

The now unused macros are also dropped from include/linux/timer.h.

Signed-off-by: Kees Cook

Kees Cook
2017-11-22 08:35:54 +0800
188665b2d timer: Pass function down to initialization routines ... Browse Code »

In preparation for removing more macros, pass the function down to the
initialization routines instead of doing it in macros.

Cc: Thomas Gleixner
Cc: John Stultz
Cc: Stephen Boyd
Signed-off-by: Kees Cook

Kees Cook
2017-11-22 07:57:14 +0800
354b46b1a timer: Switch callback prototype to take struct timer_list * argument ... Browse Code »

Since all callbacks have been converted, we can switch the core
prototype to "struct timer_list *" now too.

Cc: Thomas Gleixner
Cc: John Stultz
Cc: Stephen Boyd
Signed-off-by: Kees Cook

Kees Cook
2017-11-22 07:57:13 +0800
c1eba5bcb timer: Pass timer_list pointer to callbacks unconditionally ... Browse Code »

Now that all timer callbacks are already taking their struct timer_list
pointer as the callback argument, just do this unconditionally and remove
the .data field.

Cc: Thomas Gleixner
Cc: John Stultz
Cc: Stephen Boyd
Signed-off-by: Kees Cook

Kees Cook
2017-11-22 07:57:12 +0800
e99e88a9d treewide: setup_timer() -> timer_setup() ... Browse Code »

This converts all remaining cases of the old setup_timer() API into using
timer_setup(), where the callback argument is the structure already
holding the struct timer_list. These should have no behavioral changes,
since they just change which pointer is passed into the callback with
the same available pointers after conversion. It handles the following
examples, in addition to some other variations.

Casting from unsigned long:

void my_callback(unsigned long data)
{
struct something *ptr = (struct something *)data;
...
}
...
setup_timer(&ptr->my_timer, my_callback, ptr);

and forced object casts:

void my_callback(struct something *ptr)
{
...
}
...
setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr);

become:

void my_callback(struct timer_list *t)
{
struct something *ptr = from_timer(ptr, t, my_timer);
...
}
...
timer_setup(&ptr->my_timer, my_callback, 0);

Direct function assignments:

void my_callback(unsigned long data)
{
struct something *ptr = (struct something *)data;
...
}
...
ptr->my_timer.function = my_callback;

have a temporary cast added, along with converting the args:

void my_callback(struct timer_list *t)
{
struct something *ptr = from_timer(ptr, t, my_timer);
...
}
...
ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback;

And finally, callbacks without a data assignment:

void my_callback(unsigned long data)
{
...
}
...
setup_timer(&ptr->my_timer, my_callback, 0);

have their argument renamed to verify they're unused during conversion:

void my_callback(struct timer_list *unused)
{
...
}
...
timer_setup(&ptr->my_timer, my_callback, 0);

The conversion is done with the following Coccinelle script:

spatch --very-quiet --all-includes --include-headers \
-I ./arch/x86/include -I ./arch/x86/include/generated \
-I ./include -I ./arch/x86/include/uapi \
-I ./arch/x86/include/generated/uapi -I ./include/uapi \
-I ./include/generated/uapi --include ./include/linux/kconfig.h \
--dir . \
--cocci-file ~/src/data/timer_setup.cocci

@fix_address_of@
expression e;
@@

setup_timer(
-&(e)
+&e
, ...)

// Update any raw setup_timer() usages that have a NULL callback, but
// would otherwise match change_timer_function_usage, since the latter
// will update all function assignments done in the face of a NULL
// function initialization in setup_timer().
@change_timer_function_usage_NULL@
expression _E;
identifier _timer;
type _cast_data;
@@

(
-setup_timer(&_E->_timer, NULL, _E);
+timer_setup(&_E->_timer, NULL, 0);
|
-setup_timer(&_E->_timer, NULL, (_cast_data)_E);
+timer_setup(&_E->_timer, NULL, 0);
|
-setup_timer(&_E._timer, NULL, &_E);
+timer_setup(&_E._timer, NULL, 0);
|
-setup_timer(&_E._timer, NULL, (_cast_data)&_E);
+timer_setup(&_E._timer, NULL, 0);
)

@change_timer_function_usage@
expression _E;
identifier _timer;
struct timer_list _stl;
identifier _callback;
type _cast_func, _cast_data;
@@

(
-setup_timer(&_E->_timer, _callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, &_callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, _callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, &_callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)_callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)&_callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, &_callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, &_callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
_E->_timer@_stl.function = _callback;
|
_E->_timer@_stl.function = &_callback;
|
_E->_timer@_stl.function = (_cast_func)_callback;
|
_E->_timer@_stl.function = (_cast_func)&_callback;
|
_E._timer@_stl.function = _callback;
|
_E._timer@_stl.function = &_callback;
|
_E._timer@_stl.function = (_cast_func)_callback;
|
_E._timer@_stl.function = (_cast_func)&_callback;
)

// callback(unsigned long arg)
@change_callback_handle_cast
depends on change_timer_function_usage@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _origtype;
identifier _origarg;
type _handletype;
identifier _handle;
@@

void _callback(
-_origtype _origarg
+struct timer_list *t
)
{
(
... when != _origarg
_handletype *_handle =
-(_handletype *)_origarg;
+from_timer(_handle, t, _timer);
... when != _origarg
|
... when != _origarg
_handletype *_handle =
-(void *)_origarg;
+from_timer(_handle, t, _timer);
... when != _origarg
|
... when != _origarg
_handletype *_handle;
... when != _handle
_handle =
-(_handletype *)_origarg;
+from_timer(_handle, t, _timer);
... when != _origarg
|
... when != _origarg
_handletype *_handle;
... when != _handle
_handle =
-(void *)_origarg;
+from_timer(_handle, t, _timer);
... when != _origarg
)
}

// callback(unsigned long arg) without existing variable
@change_callback_handle_cast_no_arg
depends on change_timer_function_usage &&
!change_callback_handle_cast@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _origtype;
identifier _origarg;
type _handletype;
@@

void _callback(
-_origtype _origarg
+struct timer_list *t
)
{
+ _handletype *_origarg = from_timer(_origarg, t, _timer);
+
... when != _origarg
- (_handletype *)_origarg
+ _origarg
... when != _origarg
}

// Avoid already converted callbacks.
@match_callback_converted
depends on change_timer_function_usage &&
!change_callback_handle_cast &&
!change_callback_handle_cast_no_arg@
identifier change_timer_function_usage._callback;
identifier t;
@@

void _callback(struct timer_list *t)
{ ... }

// callback(struct something *handle)
@change_callback_handle_arg
depends on change_timer_function_usage &&
!match_callback_converted &&
!change_callback_handle_cast &&
!change_callback_handle_cast_no_arg@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _handletype;
identifier _handle;
@@

void _callback(
-_handletype *_handle
+struct timer_list *t
)
{
+ _handletype *_handle = from_timer(_handle, t, _timer);
...
}

// If change_callback_handle_arg ran on an empty function, remove
// the added handler.
@unchange_callback_handle_arg
depends on change_timer_function_usage &&
change_callback_handle_arg@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _handletype;
identifier _handle;
identifier t;
@@

void _callback(struct timer_list *t)
{
- _handletype *_handle = from_timer(_handle, t, _timer);
}

// We only want to refactor the setup_timer() data argument if we've found
// the matching callback. This undoes changes in change_timer_function_usage.
@unchange_timer_function_usage
depends on change_timer_function_usage &&
!change_callback_handle_cast &&
!change_callback_handle_cast_no_arg &&
!change_callback_handle_arg@
expression change_timer_function_usage._E;
identifier change_timer_function_usage._timer;
identifier change_timer_function_usage._callback;
type change_timer_function_usage._cast_data;
@@

(
-timer_setup(&_E->_timer, _callback, 0);
+setup_timer(&_E->_timer, _callback, (_cast_data)_E);
|
-timer_setup(&_E._timer, _callback, 0);
+setup_timer(&_E._timer, _callback, (_cast_data)&_E);
)

// If we fixed a callback from a .function assignment, fix the
// assignment cast now.
@change_timer_function_assignment
depends on change_timer_function_usage &&
(change_callback_handle_cast ||
change_callback_handle_cast_no_arg ||
change_callback_handle_arg)@
expression change_timer_function_usage._E;
identifier change_timer_function_usage._timer;
identifier change_timer_function_usage._callback;
type _cast_func;
typedef TIMER_FUNC_TYPE;
@@

(
_E->_timer.function =
-_callback
+(TIMER_FUNC_TYPE)_callback
;
|
_E->_timer.function =
-&_callback
+(TIMER_FUNC_TYPE)_callback
;
|
_E->_timer.function =
-(_cast_func)_callback;
+(TIMER_FUNC_TYPE)_callback
;
|
_E->_timer.function =
-(_cast_func)&_callback
+(TIMER_FUNC_TYPE)_callback
;
|
_E._timer.function =
-_callback
+(TIMER_FUNC_TYPE)_callback
;
|
_E._timer.function =
-&_callback;
+(TIMER_FUNC_TYPE)_callback
;
|
_E._timer.function =
-(_cast_func)_callback
+(TIMER_FUNC_TYPE)_callback
;
|
_E._timer.function =
-(_cast_func)&_callback
+(TIMER_FUNC_TYPE)_callback
;
)

// Sometimes timer functions are called directly. Replace matched args.
@change_timer_function_calls
depends on change_timer_function_usage &&
(change_callback_handle_cast ||
change_callback_handle_cast_no_arg ||
change_callback_handle_arg)@
expression _E;
identifier change_timer_function_usage._timer;
identifier change_timer_function_usage._callback;
type _cast_data;
@@

_callback(
(
-(_cast_data)_E
+&_E->_timer
|
-(_cast_data)&_E
+&_E._timer
|
-_E
+&_E->_timer
)
)

// If a timer has been configured without a data argument, it can be
// converted without regard to the callback argument, since it is unused.
@match_timer_function_unused_data@
expression _E;
identifier _timer;
identifier _callback;
@@

(
-setup_timer(&_E->_timer, _callback, 0);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, _callback, 0L);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, _callback, 0UL);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, 0);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, 0L);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, 0UL);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_timer, _callback, 0);
+timer_setup(&_timer, _callback, 0);
|
-setup_timer(&_timer, _callback, 0L);
+timer_setup(&_timer, _callback, 0);
|
-setup_timer(&_timer, _callback, 0UL);
+timer_setup(&_timer, _callback, 0);
|
-setup_timer(_timer, _callback, 0);
+timer_setup(_timer, _callback, 0);
|
-setup_timer(_timer, _callback, 0L);
+timer_setup(_timer, _callback, 0);
|
-setup_timer(_timer, _callback, 0UL);
+timer_setup(_timer, _callback, 0);
)

@change_callback_unused_data
depends on match_timer_function_unused_data@
identifier match_timer_function_unused_data._callback;
type _origtype;
identifier _origarg;
@@

void _callback(
-_origtype _origarg
+struct timer_list *unused
)
{
... when != _origarg
}

Signed-off-by: Kees Cook

Kees Cook
2017-11-22 07:57:07 +0800
b9eaf1872 treewide: init_timer() -> setup_timer() ... Browse Code »

This mechanically converts all remaining cases of ancient open-coded timer
setup with the old setup_timer() API, which is the first step in timer
conversions. This has no behavioral changes, since it ultimately just
changes the order of assignment to fields of struct timer_list when
finding variations of:

init_timer(&t);
f.function = timer_callback;
t.data = timer_callback_arg;

to be converted into:

setup_timer(&t, timer_callback, timer_callback_arg);

The conversion is done with the following Coccinelle script, which
is an improved version of scripts/cocci/api/setup_timer.cocci, in the
following ways:
- assignments-before-init_timer() cases
- limit the .data case removal to the specific struct timer_list instance
- handling calls by dereference (timer->field vs timer.field)

spatch --very-quiet --all-includes --include-headers \
-I ./arch/x86/include -I ./arch/x86/include/generated \
-I ./include -I ./arch/x86/include/uapi \
-I ./arch/x86/include/generated/uapi -I ./include/uapi \
-I ./include/generated/uapi --include ./include/linux/kconfig.h \
--dir . \
--cocci-file ~/src/data/setup_timer.cocci

@fix_address_of@
expression e;
@@

init_timer(
-&(e)
+&e
, ...)

// Match the common cases first to avoid Coccinelle parsing loops with
// "... when" clauses.

@match_immediate_function_data_after_init_timer@
expression e, func, da;
@@

-init_timer
+setup_timer
( $&e\|e$
+, func, da
);
(
-$e.function\|e->function$ = func;
-$e.data\|e->data$ = da;
|
-$e.data\|e->data$ = da;
-$e.function\|e->function$ = func;
)

@match_immediate_function_data_before_init_timer@
expression e, func, da;
@@

(
-$e.function\|e->function$ = func;
-$e.data\|e->data$ = da;
|
-$e.data\|e->data$ = da;
-$e.function\|e->function$ = func;
)
-init_timer
+setup_timer
( $&e\|e$
+, func, da
);

@match_function_and_data_after_init_timer@
expression e, e2, e3, e4, e5, func, da;
@@

-init_timer
+setup_timer
( $&e\|e$
+, func, da
);
... when != func = e2
when != da = e3
(
-e.function = func;
... when != da = e4
-e.data = da;
|
-e->function = func;
... when != da = e4
-e->data = da;
|
-e.data = da;
... when != func = e5
-e.function = func;
|
-e->data = da;
... when != func = e5
-e->function = func;
)

@match_function_and_data_before_init_timer@
expression e, e2, e3, e4, e5, func, da;
@@
(
-e.function = func;
... when != da = e4
-e.data = da;
|
-e->function = func;
... when != da = e4
-e->data = da;
|
-e.data = da;
... when != func = e5
-e.function = func;
|
-e->data = da;
... when != func = e5
-e->function = func;
)
... when != func = e2
when != da = e3
-init_timer
+setup_timer
( $&e\|e$
+, func, da
);

@r1 exists@
expression t;
identifier f;
position p;
@@

f(...) { ... when any
init_timer@p($&t\|t$)
... when any
}

@r2 exists@
expression r1.t;
identifier g != r1.f;
expression e8;
@@

g(...) { ... when any
$t.data\|t->data$ = e8
... when any
}

// It is dangerous to use setup_timer if data field is initialized
// in another function.
@script:python depends on r2@
p << r1.p;
@@

cocci.include_match(False)

@r3@
expression r1.t, func, e7;
position r1.p;
@@

(
-init_timer@p(&t);
+setup_timer(&t, func, 0UL);
... when != func = e7
-t.function = func;
|
-t.function = func;
... when != func = e7
-init_timer@p(&t);
+setup_timer(&t, func, 0UL);
|
-init_timer@p(t);
+setup_timer(t, func, 0UL);
... when != func = e7
-t->function = func;
|
-t->function = func;
... when != func = e7
-init_timer@p(t);
+setup_timer(t, func, 0UL);
)

Signed-off-by: Kees Cook

Kees Cook
2017-11-22 07:57:06 +0800
24ed960ab treewide: Switch DEFINE_TIMER callbacks to struct timer_list * ... Browse Code »

This changes all DEFINE_TIMER() callbacks to use a struct timer_list
pointer instead of unsigned long. Since the data argument has already been
removed, none of these callbacks are using their argument currently, so
this renames the argument to "unused".

Done using the following semantic patch:

@match_define_timer@
declarer name DEFINE_TIMER;
identifier _timer, _callback;
@@

DEFINE_TIMER(_timer, _callback);

@change_callback depends on match_define_timer@
identifier match_define_timer._callback;
type _origtype;
identifier _origarg;
@@

void
-_callback(_origtype _origarg)
+_callback(struct timer_list *unused)
{ ... }

Signed-off-by: Kees Cook

Kees Cook
2017-11-22 07:57:05 +0800

21 Nov, 2017

8 commits

11ca75d2d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk ... Browse Code »

Pull printk updates from Petr Mladek:

- print the warning about dropped messages on consoles on a separate
line. It makes it more legible.

- one typo fix and small code clean up.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
added new line symbol after warning about dropped messages
printk: fix typo in printk_safe.c
printk: simplify no_printk()

Linus Torvalds
2017-11-21 23:28:13 +0800
1ee640095 bpf: revert report offload info to user space ... Browse Code »

This reverts commit bd601b6ada11 ("bpf: report offload info to user
space"). The ifindex by itself is not sufficient, we should provide
information on which network namespace this ifindex belongs to.
After considering some options we concluded that it's best to just
remove this API for now, and rework it in -next.

Signed-off-by: Jakub Kicinski
Signed-off-by: Daniel Borkmann

Jakub Kicinski
2017-11-21 07:37:35 +0800
62c71b45e bpf: offload: ignore namespace moves ... Browse Code »

We are currently destroying the device offload state when device
moves to another net namespace. This doesn't break with current
NFP code, because offload state is not used on program removal,
but it's not correct behaviour.

Ignore the device unregister notifications on namespace move.

Signed-off-by: Jakub Kicinski
Signed-off-by: Daniel Borkmann

Jakub Kicinski
2017-11-21 07:37:35 +0800
479321e9c bpf: turn bpf_prog_get_type() into a wrapper ... Browse Code »

bpf_prog_get_type() is identical to bpf_prog_get_type_dev(),
with false passed as attach_drv. Instead of keeping it as
an exported symbol turn it into static inline wrapper.

Signed-off-by: Jakub Kicinski
Reviewed-by: Quentin Monnet
Acked-by: Alexei Starovoitov
Acked-by: Daniel Borkmann
Signed-off-by: Daniel Borkmann

Jakub Kicinski
2017-11-21 07:37:35 +0800
288b3de55 bpf: offload: move offload device validation out to the drivers ... Browse Code »

With TC shared block changes we can't depend on correct netdev
pointer being available in cls_bpf. Move the device validation
to the driver. Core will only make sure that offloaded programs
are always attached in the driver (or in HW by the driver). We
trust that drivers which implement offload callbacks will perform
necessary checks.

Moving the checks to the driver is generally a useful thing,
in practice the check should be against a switchdev instance,
not a netdev, given that most ASICs will probably allow using
the same program on many ports.

Signed-off-by: Jakub Kicinski
Reviewed-by: Quentin Monnet
Acked-by: Alexei Starovoitov
Acked-by: Daniel Borkmann
Acked-by: Jiri Pirko
Signed-off-by: Daniel Borkmann

Jakub Kicinski
2017-11-21 07:37:35 +0800
1f6f4cb7b bpf: offload: rename the ifindex field ... Browse Code »

bpf_target_prog seems long and clunky, rename it to prog_ifindex.
We don't want to call this field just ifindex, because maps
may need a similar field in the future and bpf_attr members for
programs and maps are unnamed.

Signed-off-by: Jakub Kicinski
Reviewed-by: Quentin Monnet
Acked-by: Alexei Starovoitov
Acked-by: Daniel Borkmann
Signed-off-by: Daniel Borkmann

Jakub Kicinski
2017-11-21 07:37:35 +0800
649f11dcd bpf: offload: limit offload to cls_bpf and xdp programs only ... Browse Code »

We are currently only allowing attachment of device-bound
cls_bpf and XDP programs. Make this restriction explicit in
the BPF offload code. This way we can potentially reuse the
ifindex field in the future.

Since XDP and cls_bpf programs can only be loaded by admin,
we can drop the explicit capability check from offload code.

Signed-off-by: Jakub Kicinski
Reviewed-by: Quentin Monnet
Acked-by: Alexei Starovoitov
Acked-by: Daniel Borkmann
Signed-off-by: Daniel Borkmann

Jakub Kicinski
2017-11-21 07:37:35 +0800
13a9c48a8 bpf: offload: add comment warning developers about double destroy ... Browse Code »

Offload state may get destroyed either because the device for which
it was constructed is going away, or because the refcount of bpf
program itself has reached 0. In both of those cases we will call
__bpf_prog_offload_destroy() to unlink the offload from the device.
We may in fact call it twice, which works just fine, but we should
make clear this is intended and caution others trying to extend the
function.

Signed-off-by: Jakub Kicinski
Reviewed-by: Quentin Monnet
Acked-by: Alexei Starovoitov
Acked-by: Daniel Borkmann
Signed-off-by: Daniel Borkmann

Jakub Kicinski
2017-11-21 07:37:35 +0800

18 Nov, 2017

13 commits

fa7f57807 Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge more updates from Andrew Morton:

- a bit more MM

- procfs updates

- dynamic-debug fixes

- lib/ updates

- checkpatch

- epoll

- nilfs2

- signals

- rapidio

- PID management cleanup and optimization

- kcov updates

- sysvipc updates

- quite a few misc things all over the place

* emailed patches from Andrew Morton : (94 commits)
EXPERT Kconfig menu: fix broken EXPERT menu
include/asm-generic/topology.h: remove unused parent_node() macro
arch/tile/include/asm/topology.h: remove unused parent_node() macro
arch/sparc/include/asm/topology_64.h: remove unused parent_node() macro
arch/sh/include/asm/topology.h: remove unused parent_node() macro
arch/ia64/include/asm/topology.h: remove unused parent_node() macro
drivers/pcmcia/sa1111_badge4.c: avoid unused function warning
mm: add infrastructure for get_user_pages_fast() benchmarking
sysvipc: make get_maxid O(1) again
sysvipc: properly name ipc_addid() limit parameter
sysvipc: duplicate lock comments wrt ipc_addid()
sysvipc: unteach ids->next_id for !CHECKPOINT_RESTORE
initramfs: use time64_t timestamps
drivers/watchdog: make use of devm_register_reboot_notifier()
kernel/reboot.c: add devm_register_reboot_notifier()
kcov: update documentation
Makefile: support flag -fsanitizer-coverage=trace-cmp
kcov: support comparison operands collection
kcov: remove pointless current != NULL check
kernel/panic.c: add TAINT_AUX
...

Linus Torvalds
2017-11-18 08:56:17 +0800
2d8364bae kernel/reboot.c: add devm_register_reboot_notifier() ... Browse Code »

Add devm_* wrapper around register_reboot_notifier to simplify device
specific reboot notifier registration/unregistration.

[akpm@linux-foundation.org: move `struct device' forward decl to top-of-file]
Link: http://lkml.kernel.org/r/20170320171753.1705-1-andrew.smirnov@gmail.com
Signed-off-by: Andrey Smirnov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrey Smirnov
2017-11-18 08:10:04 +0800
ded97d2c2 kcov: support comparison operands collection ... Browse Code »

Enables kcov to collect comparison operands from instrumented code.
This is done by using Clang's -fsanitize=trace-cmp instrumentation
(currently not available for GCC).

The comparison operands help a lot in fuzz testing. E.g. they are used
in Syzkaller to cover the interiors of conditional statements with way
less attempts and thus make previously unreachable code reachable.

To allow separate collection of coverage and comparison operands two
different work modes are implemented. Mode selection is now done via a
KCOV_ENABLE ioctl call with corresponding argument value.

Link: http://lkml.kernel.org/r/20171011095459.70721-1-glider@google.com
Signed-off-by: Victor Chibotaru
Signed-off-by: Alexander Potapenko
Cc: Dmitry Vyukov
Cc: Andrey Konovalov
Cc: Mark Rutland
Cc: Alexander Popov
Cc: Andrey Ryabinin
Cc: Kees Cook
Cc: Vegard Nossum
Cc: Quentin Casasnovas
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Victor Chibotaru
2017-11-18 08:10:04 +0800
fcf4edac0 kcov: remove pointless current != NULL check ... Browse Code »

__sanitizer_cov_trace_pc() is a hot code, so it's worth to remove
pointless '!current' check. Current is never NULL.

Link: http://lkml.kernel.org/r/20170929162221.32500-1-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin
Acked-by: Dmitry Vyukov
Acked-by: Mark Rutland
Cc: Andrey Konovalov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrey Ryabinin
2017-11-18 08:10:04 +0800
4efb442cc kernel/panic.c: add TAINT_AUX ... Browse Code »

This is the gist of a patch which we've been forward-porting in our
kernels for a long time now and it probably would make a good sense to
have such TAINT_AUX flag upstream which can be used by each distro etc,
how they see fit. This way, we won't need to forward-port a distro-only
version indefinitely.

Add an auxiliary taint flag to be used by distros and others. This
obviates the need to forward-port whatever internal solutions people
have in favor of a single flag which they can map arbitrarily to a
definition of their pleasing.

The "X" mnemonic could also mean eXternal, which would be taint from a
distro or something else but not the upstream kernel. We will use it to
mark modules for which we don't provide support. I.e., a really
eXternal module.

Link: http://lkml.kernel.org/r/20170911134533.dp5mtyku5bongx4c@pd.tnic
Signed-off-by: Borislav Petkov
Cc: Kees Cook
Cc: Jessica Yu
Cc: Peter Zijlstra
Cc: Jiri Slaby
Cc: Jiri Olsa
Cc: Michal Marek
Cc: Jiri Kosina
Cc: Takashi Iwai
Cc: Petr Mladek
Cc: Jeff Mahoney
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Borislav Petkov
2017-11-18 08:10:04 +0800
e8cfbc245 pid: remove pidhash ... Browse Code »

pidhash is no longer required as all the information can be looked up
from idr tree. nr_hashed represented the number of pids that had been
hashed. Since, nr_hashed and PIDNS_HASH_ADDING are no longer relevant,
it has been renamed to pid_allocated and PIDNS_ADDING respectively.

[gs051095@gmail.com: v6]
Link: http://lkml.kernel.org/r/1507760379-21662-3-git-send-email-gs051095@gmail.com
Link: http://lkml.kernel.org/r/1507583624-22146-3-git-send-email-gs051095@gmail.com
Signed-off-by: Gargi Sharma
Reviewed-by: Rik van Riel
Tested-by: Tony Luck [ia64]
Cc: Julia Lawall
Cc: Ingo Molnar
Cc: Pavel Tatashin
Cc: Kirill Tkhai
Cc: Oleg Nesterov
Cc: Eric W. Biederman
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Gargi Sharma
2017-11-18 08:10:04 +0800
95846ecf9 pid: replace pid bitmap implementation with IDR API ... Browse Code »

Patch series "Replacing PID bitmap implementation with IDR API", v4.

This series replaces kernel bitmap implementation of PID allocation with
IDR API. These patches are written to simplify the kernel by replacing
custom code with calls to generic code.

The following are the stats for pid and pid_namespace object files
before and after the replacement. There is a noteworthy change between
the IDR and bitmap implementation.

Before
text data bss dec hex filename
8447 3894 64 12405 3075 kernel/pid.o
After
text data bss dec hex filename
3397 304 0 3701 e75 kernel/pid.o

Before
text data bss dec hex filename
5692 1842 192 7726 1e2e kernel/pid_namespace.o
After
text data bss dec hex filename
2854 216 16 3086 c0e kernel/pid_namespace.o

The following are the stats for ps, pstree and calling readdir on /proc
for 10,000 processes.

ps:
With IDR API With bitmap
real 0m1.479s 0m2.319s
user 0m0.070s 0m0.060s
sys 0m0.289s 0m0.516s

pstree:
With IDR API With bitmap
real 0m1.024s 0m1.794s
user 0m0.348s 0m0.612s
sys 0m0.184s 0m0.264s

proc:
With IDR API With bitmap
real 0m0.059s 0m0.074s
user 0m0.000s 0m0.004s
sys 0m0.016s 0m0.016s

This patch (of 2):

Replace the current bitmap implementation for Process ID allocation.
Functions that are no longer required, for example, free_pidmap(),
alloc_pidmap(), etc. are removed. The rest of the functions are
modified to use the IDR API. The change was made to make the PID
allocation less complex by replacing custom code with calls to generic
API.

[gs051095@gmail.com: v6]
Link: http://lkml.kernel.org/r/1507760379-21662-2-git-send-email-gs051095@gmail.com
[avagin@openvz.org: restore the old behaviour of the ns_last_pid sysctl]
Link: http://lkml.kernel.org/r/20171106183144.16368-1-avagin@openvz.org
Link: http://lkml.kernel.org/r/1507583624-22146-2-git-send-email-gs051095@gmail.com
Signed-off-by: Gargi Sharma
Reviewed-by: Rik van Riel
Acked-by: Oleg Nesterov
Cc: Julia Lawall
Cc: Ingo Molnar
Cc: Pavel Tatashin
Cc: Kirill Tkhai
Cc: Eric W. Biederman
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Gargi Sharma
2017-11-18 08:10:03 +0800
f9eb2fdd0 kernel/sysctl.c: code cleanups ... Browse Code »

Remove unnecessary else block, remove redundant return and call to kfree
in if block.

Link: http://lkml.kernel.org/r/1510238435-1655-1-git-send-email-mail@okal.no
Signed-off-by: Ola N. Kaldestad
Acked-by: Kees Cook
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ola N. Kaldestad
2017-11-18 08:10:03 +0800
de40ccefd kdump: print a message in case parse_crashkernel_mem resulted in zero bytes ... Browse Code »

parse_crashkernel_mem() silently returns if we get zero bytes in the
parsing function. It is useful for debugging to add a message,
especially if the kernel cannot boot correctly.

Add a pr_info instead of pr_warn because it is expected behavior for
size = 0, eg. crashkernel=2G-4G:128M, size will be 0 in case system
memory is less than 2G.

Link: http://lkml.kernel.org/r/20171114080129.GA6115@dhcp-128-65.nay.redhat.com
Signed-off-by: Dave Young
Cc: Baoquan He
Cc: Vivek Goyal
Cc: Bhupesh Sharma
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dave Young
2017-11-18 08:10:03 +0800
426915796 kernel/signal.c: remove the no longer needed SIGNAL_UNKILLABLE check in complete_signal() ... Browse Code »

complete_signal() checks SIGNAL_UNKILLABLE before it starts to destroy
the thread group, today this is wrong in many ways.

If nothing else, fatal_signal_pending() should always imply that the
whole thread group (except ->group_exit_task if it is not NULL) is
killed, this check breaks the rule.

After the previous changes we can rely on sig_task_ignored();
sig_fatal(sig) && SIGNAL_UNKILLABLE can only be true if we actually want
to kill this task and sig == SIGKILL OR it is traced and debugger can
intercept the signal.

This should hopefully fix the problem reported by Dmitry. This
test-case

static int init(void *arg)
{
for (;;)
pause();
}

int main(void)
{
char stack[16 * 1024];

for (;;) {
int pid = clone(init, stack + sizeof(stack)/2,
CLONE_NEWPID | SIGCHLD, NULL);
assert(pid > 0);

assert(ptrace(PTRACE_ATTACH, pid, 0, 0) == 0);
assert(waitpid(-1, NULL, WSTOPPED) == pid);

assert(ptrace(PTRACE_DETACH, pid, 0, SIGSTOP) == 0);
assert(syscall(__NR_tkill, pid, SIGKILL) == 0);
assert(pid == wait(NULL));
}
}

triggers the WARN_ON_ONCE(!(task->jobctl & JOBCTL_STOP_PENDING)) in
task_participate_group_stop(). do_signal_stop()->signal_group_exit()
checks SIGNAL_GROUP_EXIT and return false, but task_set_jobctl_pending()
checks fatal_signal_pending() and does not set JOBCTL_STOP_PENDING.

And his should fix the minor security problem reported by Kyle,
SECCOMP_RET_TRACE can miss fatal_signal_pending() the same way if the
task is the root of a pid namespace.

Link: http://lkml.kernel.org/r/20171103184246.GD21036@redhat.com
Signed-off-by: Oleg Nesterov
Reported-by: Dmitry Vyukov
Reported-by: Kyle Huey
Reviewed-by: Kees Cook
Tested-by: Kyle Huey
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2017-11-18 08:10:03 +0800
ac2538508 kernel/signal.c: protect the SIGNAL_UNKILLABLE tasks from !sig_kernel_only() signals ... Browse Code »

Change sig_task_ignored() to drop the SIG_DFL && !sig_kernel_only()
signals even if force == T. This simplifies the next change and this
matches the same check in get_signal() which will drop these signals
anyway.

Link: http://lkml.kernel.org/r/20171103184227.GC21036@redhat.com
Signed-off-by: Oleg Nesterov
Tested-by: Kyle Huey
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2017-11-18 08:10:03 +0800
628c1bcba kernel/signal.c: protect the traced SIGNAL_UNKILLABLE tasks from SIGKILL ... Browse Code »

The comment in sig_ignored() says "Tracers may want to know about even
ignored signals" but SIGKILL can not be reported to debugger and it is
just wrong to return 0 in this case: SIGKILL should only kill the
SIGNAL_UNKILLABLE task if it comes from the parent ns.

Change sig_ignored() to ignore ->ptrace if sig == SIGKILL and rely on
sig_task_ignored().

SISGTOP coming from within the namespace is not really right too but at
least debugger can intercept it, and we can't drop it here because this
will break "gdb -p 1": ptrace_attach() won't work. Perhaps we will add
another ->ptrace check later, we will see.

Link: http://lkml.kernel.org/r/20171103184206.GB21036@redhat.com
Signed-off-by: Oleg Nesterov
Tested-by: Kyle Huey
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2017-11-18 08:10:03 +0800
fb910c42c sysctl: check for UINT_MAX before unsigned int min/max ... Browse Code »

Mikulas noticed in the existing do_proc_douintvec_minmax_conv() and
do_proc_dopipe_max_size_conv() introduced in this patchset, that they
inconsistently handle overflow and min/max range inputs:

For example:

0 ... param->min - 1 ---> ERANGE
param->min ... param->max ---> the value is accepted
param->max + 1 ... 0x100000000L + param->min - 1 ---> ERANGE
0x100000000L + param->min ... 0x100000000L + param->max ---> EINVAL
0x100000000L + param->max + 1, 0x200000000L + param->min - 1 ---> ERANGE
0x200000000L + param->min ... 0x200000000L + param->max ---> EINVAL
0x200000000L + param->max + 1, 0x300000000L + param->min - 1 ---> ERANGE

In do_proc_do*() routines which store values into unsigned int variables
(4 bytes wide for 64-bit builds), first validate that the input unsigned
long value (8 bytes wide for 64-bit builds) will fit inside the smaller
unsigned int variable. Then check that the unsigned int value falls
inside the specified parameter min, max range. Otherwise the unsigned
long -> unsigned int conversion drops leading bits from the input value,
leading to the inconsistent pattern Mikulas documented above.

Link: http://lkml.kernel.org/r/1507658689-11669-5-git-send-email-joe.lawrence@redhat.com
Signed-off-by: Joe Lawrence
Reported-by: Mikulas Patocka
Reviewed-by: Mikulas Patocka
Cc: Al Viro
Cc: Jens Axboe
Cc: Michael Kerrisk
Cc: Randy Dunlap
Cc: Josh Poimboeuf
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joe Lawrence
2017-11-18 08:10:03 +0800