27 Nov, 2017

3 commits

  • Pull irq fixes from Thomas Gleixner:

    - unbreak the irq trigger type check for legacy platforms

    - a handful of fixes for ARM GIC v3/4 interrupt controllers

    - a few trivial fixes all over the place

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    genirq/matrix: Make - vs ?: Precedence explicit
    irqchip/imgpdc: Use resource_size function on resource object
    irqchip/qcom: Fix u32 comparison with value less than zero
    irqchip/exiu: Fix return value check in exiu_init()
    irqchip/gic-v3-its: Remove artificial dependency on PCI
    irqchip/gic-v4: Add forward definition of struct irq_domain_ops
    irqchip/gic-v3: pr_err() strings should end with newlines
    irqchip/s3c24xx: pr_err() strings should end with newlines
    irqchip/gic-v3: Fix ppi-partitions lookup
    irqchip/gic-v4: Clear IRQ_DISABLE_UNLAZY again if mapping fails
    genirq: Track whether the trigger type has been set

    Linus Torvalds
     
  • Pull perf fixes from Ingo Molnar:
    "Misc fixes: two PMU driver fixes and a memory leak fix"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf/core: Fix memory leak triggered by perf --namespace
    perf/x86/intel/uncore: Add event constraint for BDX PCU
    perf/x86/intel: Hide TSX events when RTM is not supported

    Linus Torvalds
     
  • Pull static key fix from Ingo Molnar:
    "Fix a boot warning related to bad init ordering of the static keys
    self-test"

    * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    jump_label: Invoke jump_label_test() via early_initcall()

    Linus Torvalds
     

26 Nov, 2017

1 commit

  • Pull timer updates from Thomas Gleixner:

    - The final conversion of timer wheel timers to timer_setup().

    A few manual conversions, a large Coccinelle-assisted sweep, and
    the removal of the old initialization mechanisms and the related
    code.

    - Remove the now unused VSYSCALL update code

    - Fix permissions of /proc/timer_list. I still need to get rid of that
    file completely

    - Rename a misnamed clocksource function and remove a stale declaration

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
    m68k/macboing: Fix missed timer callback assignment
    treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE casts
    timer: Remove redundant __setup_timer*() macros
    timer: Pass function down to initialization routines
    timer: Remove unused data arguments from macros
    timer: Switch callback prototype to take struct timer_list * argument
    timer: Pass timer_list pointer to callbacks unconditionally
    Coccinelle: Remove setup_timer.cocci
    timer: Remove setup_*timer() interface
    timer: Remove init_timer() interface
    treewide: setup_timer() -> timer_setup() (2 field)
    treewide: setup_timer() -> timer_setup()
    treewide: init_timer() -> setup_timer()
    treewide: Switch DEFINE_TIMER callbacks to struct timer_list *
    s390: cmm: Convert timers to use timer_setup()
    lightnvm: Convert timers to use timer_setup()
    drivers/net: cris: Convert timers to use timer_setup()
    drm/vc4: Convert timers to use timer_setup()
    block/laptop_mode: Convert timers to use timer_setup()
    net/atm/mpc: Avoid open-coded assignment of timer callback function
    ...

    Linus Torvalds
     

24 Nov, 2017

2 commits

  • Noticed with a Clang build. This improves the readability of the ?:
    expression, as it has lower precedence than the - expression. Show
    explicitly that - is evaluated first.
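
    As a minimal sketch of the precedence point (the function and
    parameter names are hypothetical and are not taken from the genirq
    matrix code): binary - binds tighter than ?:, so the first two
    functions behave identically and the parentheses only make the
    grouping visible. The third shows the grouping a reader might
    wrongly assume.

```c
/* Hypothetical helpers for illustration; not the kernel's matrix code. */

/* Without parentheses: C parses this as (end - start) ? managed : 0. */
static int pick_implicit(int end, int start, int managed)
{
	return end - start ? managed : 0;
}

/* With parentheses: same meaning, grouping shown explicitly. */
static int pick_explicit(int end, int start, int managed)
{
	return (end - start) ? managed : 0;
}

/* The misreading: end - (start ? managed : 0) is a different expression. */
static int pick_misread(int end, int start, int managed)
{
	return end - (start ? managed : 0);
}
```

    Some compilers may emit a precedence warning for the first form,
    which is the kind of diagnostic the message above refers to.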

    Signed-off-by: Kees Cook
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20171122205645.GA27125@beast

    Kees Cook
     
  • Daniel Borkmann says:

    ====================
    pull-request: bpf 2017-11-23

    The following pull-request contains BPF updates for your *net* tree.

    The main changes are:

    1) Several BPF offloading fixes, from Jakub. Among others:

    - Limit offload to cls_bpf and XDP program types only.
    - Move device validation into the driver and don't make
    any assumptions about the device in the classifier due
    to shared blocks semantics.
    - Don't pass offloaded XDP program into the driver when
    it should be run in native XDP instead. Offloaded ones
    are not JITed for the host in such cases.
    - Don't destroy device offload state when moved to
    another namespace.
    - Revert dumping offload info into user space for now,
    since ifindex alone is not sufficient. This will be
    redone properly for bpf-next tree.

    2) Fix test_verifier to avoid using bpf_probe_write_user()
    helper in test cases, since it's dumping a warning into
    kernel log which may confuse users when only running tests.
    Switch to use bpf_trace_printk() instead, from Yonghong.

    3) Several fixes for correcting ARG_CONST_SIZE_OR_ZERO semantics
    before it becomes uabi, from Gianluca. More specifically:

    - Add a type ARG_PTR_TO_MEM_OR_NULL that is used only
    by bpf_csum_diff(), where the argument is either a
    valid pointer or NULL. The subsequent ARG_CONST_SIZE_OR_ZERO
    then enforces a valid pointer in case of non-0 size
    or a valid pointer or NULL in case of size 0. Given
    that, the semantics for ARG_PTR_TO_MEM in combination
    with ARG_CONST_SIZE_OR_ZERO are now such that in case
    of size 0, the pointer must always be valid and cannot
    be NULL. This fix in semantics allows for bpf_probe_read()
    to drop the recently added size == 0 check in the helper
    that would become part of uabi otherwise once released.
    At the same time we can then fix bpf_probe_read_str() and
    bpf_perf_event_output() to use ARG_CONST_SIZE_OR_ZERO
    instead of ARG_CONST_SIZE in order to fix recently
    reported issues by Arnaldo et al, where LLVM optimizes
    two boundary checks into a single one for unknown
    variables where the verifier loses track of the variable
    bounds and thus rejects valid programs otherwise.

    4) A fix for the verifier for the case when it detects
    comparison of two constants where the branch is guaranteed
    to not be taken at runtime. Verifier will rightfully prune
    the exploration of such paths, but we still pass the program
    to JITs, where they would complain about using reserved
    fields, etc. Track such dead instructions and sanitize
    them with mov r0,r0. Rejection is not possible since LLVM
    may generate them for valid C code and doesn't do as much
    data flow analysis as verifier. For bpf-next we might
    implement removal of such dead code and adjust branches
    instead. Fix from Alexei.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

23 Nov, 2017

6 commits

  • …pub/scm/linux/kernel/git/kees/linux into timers/urgent

    Pull the last batch of manual timer conversions from Kees Cook:

    - final batch of "non trivial" timer conversions (multi-tree dependencies,
    things Coccinelle couldn't handle, etc).

    - treewide conversions via Coccinelle, in 4 steps:
      - DEFINE_TIMER() functions converted to struct timer_list * argument
      - init_timer() -> setup_timer()
      - setup_timer() -> timer_setup()
      - setup_timer() -> timer_setup() (with a single embedded structure)

    - deprecated timer API removals (init_timer(), setup_*timer())

    - finalization of new API (remove global casts)

    Thomas Gleixner
     
  • When the verifier detects that a register contains a runtime constant
    and it is compared with another constant, it will prune exploration
    of the branch that is guaranteed not to be taken at runtime.
    This is all correct, but a malicious program may be constructed
    in such a way that it always has a constant comparison and
    the other branch is never taken under any conditions.
    In this case such a path through the program will not be explored
    by the verifier. It won't be taken at run time either, but since
    all instructions are JITed, the malicious program may cause JITs
    to complain about using reserved fields, etc.
    To fix the issue we have to track the instructions explored by
    the verifier and sanitize instructions that are dead at run time
    with NOPs. We cannot reject such dead code, because LLVM generates
    it for valid C code: it doesn't do as much data flow
    analysis as the verifier does.
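
    The idea can be sketched in plain C as follows. This is a toy model
    under stated assumptions, not the verifier's implementation: the
    toy_insn layout, the seen[] array and toy_sanitize_dead_code() are
    all hypothetical names, and "mov r0, r0" is used as the NOP.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction encoding; the real struct bpf_insn has more fields. */
enum toy_op { TOY_MOV_REG, TOY_ADD_IMM, TOY_RESERVED };

struct toy_insn {
	enum toy_op op;
	int dst_reg;
	int src_reg;
	int imm;
};

/*
 * Overwrite every instruction that the (hypothetical) verifier walk
 * never visited with "mov r0, r0", which has no effect when executed.
 * Returns the number of instructions that were rewritten.
 */
static size_t toy_sanitize_dead_code(struct toy_insn *insns,
				     const bool *seen, size_t len)
{
	const struct toy_insn nop = { TOY_MOV_REG, 0, 0, 0 };
	size_t i, patched = 0;

	for (i = 0; i < len; i++) {
		if (seen[i])
			continue;
		insns[i] = nop;
		patched++;
	}
	return patched;
}
```

    Rewriting in place keeps the instruction count unchanged, so branch
    offsets elsewhere in the program stay valid.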

    Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)")
    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: Daniel Borkmann

    Alexei Starovoitov
     
  • Commit 9fd29c08e520 ("bpf: improve verifier ARG_CONST_SIZE_OR_ZERO
    semantics") relaxed the treatment of ARG_CONST_SIZE_OR_ZERO due to the way
    the compiler generates optimized BPF code when checking boundaries of an
    argument from C code. A typical example of this optimized code can be
    generated using the bpf_perf_event_output helper when operating on variable
    memory:

    /* len is a generic scalar */
    if (len > 0 && len 0x7ffe goto pc+6
    114: (bf) r1 = r6
    115: (18) r2 = 0xffff94e5f166c200
    117: (b7) r3 = 0
    118: (bf) r4 = r7
    119: (85) call bpf_perf_event_output#25
    R5 min value is negative, either use unsigned or 'var &= const'

    With this code, the verifier loses track of the variable.
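
    For context, a compiler can fold a two-sided range check into one
    unsigned comparison on a temporary holding len - 1, so the bound is
    no longer attached to len itself. The sketch below only demonstrates
    that the two forms are equivalent; the upper bound 0x7fff and all
    names are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed upper bound, for illustration only. */
#define TOY_MAX_LEN 0x7fffULL

/* Two explicit boundary checks, as one would write them in C. */
static bool len_ok_two_checks(uint64_t len)
{
	return len > 0 && len <= TOY_MAX_LEN;
}

/*
 * Single-comparison form: len == 0 wraps around to UINT64_MAX after
 * the subtraction, so one unsigned compare covers both bounds.
 */
static bool len_ok_one_check(uint64_t len)
{
	return len - 1 <= TOY_MAX_LEN - 1;
}
```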

    Replacing arg5 with ARG_CONST_SIZE_OR_ZERO is thus desirable since it
    avoids this quite common case which leads to usability issues, and the
    compiler generates code that the verifier can more easily test:

    if (len
    Signed-off-by: Gianluca Borello
    Acked-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: Daniel Borkmann

    Gianluca Borello
     
  • Commit 9fd29c08e520 ("bpf: improve verifier ARG_CONST_SIZE_OR_ZERO
    semantics") relaxed the treatment of ARG_CONST_SIZE_OR_ZERO due to the way
    the compiler generates optimized BPF code when checking boundaries of an
    argument from C code. A typical example of this optimized code can be
    generated using the bpf_probe_read_str helper when operating on variable
    memory:

    /* len is a generic scalar */
    if (len > 0 && len 0x7ffe goto pc-42
    254: (bf) r1 = r7
    255: (79) r2 = *(u64 *)(r10 -88)
    256: (bf) r8 = r4
    257: (85) call bpf_probe_read_str#45
    R2 min value is negative, either use unsigned or 'var &= const'

    With this code, the verifier loses track of the variable.

    Replacing arg2 with ARG_CONST_SIZE_OR_ZERO is thus desirable since it
    avoids this quite common case which leads to usability issues, and the
    compiler generates code that the verifier can more easily test:

    if (len
    Acked-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: Daniel Borkmann

    Gianluca Borello
     
  • Commit 9c019e2bc4b2 ("bpf: change helper bpf_probe_read arg2 type to
    ARG_CONST_SIZE_OR_ZERO") changed arg2 type to ARG_CONST_SIZE_OR_ZERO to
    simplify writing bpf programs by taking advantage of the new semantics
    introduced for ARG_CONST_SIZE_OR_ZERO which allows <!NULL, 0> arguments.

    In order to prevent the helper from actually passing a NULL pointer to
    probe_kernel_read, which can happen when <NULL, 0> is passed to the helper,
    the commit also introduced an explicit check against size == 0.

    After the recent introduction of the ARG_PTR_TO_MEM_OR_NULL type,
    bpf_probe_read cannot receive a <NULL, 0> pair of arguments anymore, thus
    the check is not needed anymore and can be removed, since probe_kernel_read
    can correctly handle a <!NULL, 0> call. This also fixes the semantics of
    the helper before it gets officially released and bpf programs start
    relying on this check.

    Fixes: 9c019e2bc4b2 ("bpf: change helper bpf_probe_read arg2 type to ARG_CONST_SIZE_OR_ZERO")
    Signed-off-by: Gianluca Borello
    Acked-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Acked-by: Yonghong Song
    Signed-off-by: Daniel Borkmann

    Gianluca Borello
     
  • With the current ARG_PTR_TO_MEM/ARG_PTR_TO_UNINIT_MEM semantics, a helper
    argument can be NULL when the next argument type is ARG_CONST_SIZE_OR_ZERO
    and the verifier can prove the value of this next argument is 0. However,
    most helpers are just interested in handling <!NULL, 0>, so forcing them to
    deal with <NULL, 0> makes the implementation of those helpers more
    complicated for no apparent benefit, requiring them to explicitly handle
    those corner cases with checks that bpf programs could start relying upon,
    preventing the possibility of removing them later.

    Solve this by making ARG_PTR_TO_MEM/ARG_PTR_TO_UNINIT_MEM never accept NULL
    even when ARG_CONST_SIZE_OR_ZERO is set, and introduce a new argument type
    ARG_PTR_TO_MEM_OR_NULL to explicitly deal with the NULL case.
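
    The resulting acceptance rules for a pointer/size pair can be
    modelled as a small predicate. This is a sketch of the rules as
    described here, not the verifier's code; the enum and function
    names are hypothetical, and <NULL, 0> denotes a NULL pointer
    paired with size 0.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the verifier's pointer argument types. */
enum toy_arg_type {
	TOY_ARG_PTR_TO_MEM,
	TOY_ARG_PTR_TO_MEM_OR_NULL,
};

/*
 * Would a <pointer, size> pair be accepted when the size argument is
 * ARG_CONST_SIZE_OR_ZERO?  Models the described rules only.
 */
static bool toy_pair_accepted(enum toy_arg_type type, bool ptr_is_null,
			      size_t size)
{
	if (!ptr_is_null)
		return true;		/* <!NULL, n> is fine, including n == 0 */
	if (type == TOY_ARG_PTR_TO_MEM_OR_NULL)
		return size == 0;	/* <NULL, 0> only for the _OR_NULL type */
	return false;			/* plain ARG_PTR_TO_MEM never takes NULL */
}
```

    With this split, a helper whose argument is plain ARG_PTR_TO_MEM
    never has to handle a NULL pointer itself.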

    Currently, the only helper that needs this is bpf_csum_diff_proto(), so
    change arg1 and arg3 to this new type as well.

    Also add a new battery of tests that explicitly test the
    !ARG_PTR_TO_MEM_OR_NULL combination: all the current ones testing the
    various <NULL, 0> variations are focused on bpf_csum_diff, so cover also
    other helpers.

    Signed-off-by: Gianluca Borello
    Acked-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: Daniel Borkmann

    Gianluca Borello
     

22 Nov, 2017

7 commits

  • With all callbacks converted, and the timer callback prototype
    switched over, the TIMER_FUNC_TYPE cast is no longer needed,
    so remove it. Conversion was done with the following scripts:

    perl -pi -e 's|\(TIMER_FUNC_TYPE\)||g' \
    $(git grep TIMER_FUNC_TYPE | cut -d: -f1 | sort -u)

    perl -pi -e 's|\(TIMER_DATA_TYPE\)||g' \
    $(git grep TIMER_DATA_TYPE | cut -d: -f1 | sort -u)

    The now unused macros are also dropped from include/linux/timer.h.

    Signed-off-by: Kees Cook

    Kees Cook
     
  • In preparation for removing more macros, pass the function down to the
    initialization routines instead of doing it in macros.

    Cc: Thomas Gleixner
    Cc: John Stultz
    Cc: Stephen Boyd
    Signed-off-by: Kees Cook

    Kees Cook
     
  • Since all callbacks have been converted, we can switch the core
    prototype to "struct timer_list *" now too.

    Cc: Thomas Gleixner
    Cc: John Stultz
    Cc: Stephen Boyd
    Signed-off-by: Kees Cook

    Kees Cook
     
  • Now that all timer callbacks are already taking their struct timer_list
    pointer as the callback argument, just do this unconditionally and remove
    the .data field.

    Cc: Thomas Gleixner
    Cc: John Stultz
    Cc: Stephen Boyd
    Signed-off-by: Kees Cook

    Kees Cook
     
  • This converts all remaining cases of the old setup_timer() API into using
    timer_setup(), where the callback argument is the structure already
    holding the struct timer_list. These should have no behavioral changes,
    since they just change which pointer is passed into the callback with
    the same available pointers after conversion. It handles the following
    examples, in addition to some other variations.

    Casting from unsigned long:

    void my_callback(unsigned long data)
    {
        struct something *ptr = (struct something *)data;
        ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, ptr);

    and forced object casts:

    void my_callback(struct something *ptr)
    {
        ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr);

    become:

    void my_callback(struct timer_list *t)
    {
        struct something *ptr = from_timer(ptr, t, my_timer);
        ...
    }
    ...
    timer_setup(&ptr->my_timer, my_callback, 0);

    Direct function assignments:

    void my_callback(unsigned long data)
    {
        struct something *ptr = (struct something *)data;
        ...
    }
    ...
    ptr->my_timer.function = my_callback;

    have a temporary cast added, along with converting the args:

    void my_callback(struct timer_list *t)
    {
        struct something *ptr = from_timer(ptr, t, my_timer);
        ...
    }
    ...
    ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback;

    And finally, callbacks without a data assignment:

    void my_callback(unsigned long data)
    {
        ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, 0);

    have their argument renamed to verify they're unused during conversion:

    void my_callback(struct timer_list *unused)
    {
        ...
    }
    ...
    timer_setup(&ptr->my_timer, my_callback, 0);
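
    To show why no data argument is needed after conversion, here is a
    userspace sketch of the pattern. The kernel's from_timer() is built
    on container_of(); the toy_ names below are hypothetical stand-ins
    that imitate it with offsetof and the GNU __typeof__ extension
    (accepted by GCC and Clang), not the kernel definitions.

```c
#include <stddef.h>

/* Userspace stand-in for struct timer_list; only what the sketch needs. */
struct toy_timer_list {
	void (*function)(struct toy_timer_list *t);
};

/* Recover the enclosing object from a pointer to its embedded timer. */
#define toy_from_timer(var, timer_ptr, field)				\
	((__typeof__(*(var)) *)((char *)(timer_ptr) -			\
				offsetof(__typeof__(*(var)), field)))

struct something {
	int fired;
	struct toy_timer_list my_timer;
};

/* Counterpart of timer_setup(): just records the callback. */
static void toy_timer_setup(struct toy_timer_list *t,
			    void (*fn)(struct toy_timer_list *))
{
	t->function = fn;
}

/* Stands in for timer expiry: the timer itself is the only argument. */
static void toy_timer_fire(struct toy_timer_list *t)
{
	t->function(t);
}

static void my_callback(struct toy_timer_list *t)
{
	struct something *ptr = toy_from_timer(ptr, t, my_timer);

	ptr->fired++;
}
```

    Because the timer is embedded in struct something, the callback can
    derive its context from the timer pointer alone, which is what lets
    the separate data field go away.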

    The conversion is done with the following Coccinelle script:

    spatch --very-quiet --all-includes --include-headers \
    -I ./arch/x86/include -I ./arch/x86/include/generated \
    -I ./include -I ./arch/x86/include/uapi \
    -I ./arch/x86/include/generated/uapi -I ./include/uapi \
    -I ./include/generated/uapi --include ./include/linux/kconfig.h \
    --dir . \
    --cocci-file ~/src/data/timer_setup.cocci

    @fix_address_of@
    expression e;
    @@

    setup_timer(
    -&(e)
    +&e
    , ...)

    // Update any raw setup_timer() usages that have a NULL callback, but
    // would otherwise match change_timer_function_usage, since the latter
    // will update all function assignments done in the face of a NULL
    // function initialization in setup_timer().
    @change_timer_function_usage_NULL@
    expression _E;
    identifier _timer;
    type _cast_data;
    @@

    (
    -setup_timer(&_E->_timer, NULL, _E);
    +timer_setup(&_E->_timer, NULL, 0);
    |
    -setup_timer(&_E->_timer, NULL, (_cast_data)_E);
    +timer_setup(&_E->_timer, NULL, 0);
    |
    -setup_timer(&_E._timer, NULL, &_E);
    +timer_setup(&_E._timer, NULL, 0);
    |
    -setup_timer(&_E._timer, NULL, (_cast_data)&_E);
    +timer_setup(&_E._timer, NULL, 0);
    )

    @change_timer_function_usage@
    expression _E;
    identifier _timer;
    struct timer_list _stl;
    identifier _callback;
    type _cast_func, _cast_data;
    @@

    (
    -setup_timer(&_E->_timer, _callback, _E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, &_callback, _E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, _callback, (_cast_data)_E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, &_callback, (_cast_data)_E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, (_cast_func)_callback, _E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, (_cast_func)&_callback, _E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E._timer, _callback, (_cast_data)_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, _callback, (_cast_data)&_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, &_callback, (_cast_data)_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, &_callback, (_cast_data)&_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    _E->_timer@_stl.function = _callback;
    |
    _E->_timer@_stl.function = &_callback;
    |
    _E->_timer@_stl.function = (_cast_func)_callback;
    |
    _E->_timer@_stl.function = (_cast_func)&_callback;
    |
    _E._timer@_stl.function = _callback;
    |
    _E._timer@_stl.function = &_callback;
    |
    _E._timer@_stl.function = (_cast_func)_callback;
    |
    _E._timer@_stl.function = (_cast_func)&_callback;
    )

    // callback(unsigned long arg)
    @change_callback_handle_cast
    depends on change_timer_function_usage@
    identifier change_timer_function_usage._callback;
    identifier change_timer_function_usage._timer;
    type _origtype;
    identifier _origarg;
    type _handletype;
    identifier _handle;
    @@

    void _callback(
    -_origtype _origarg
    +struct timer_list *t
    )
    {
    (
    ... when != _origarg
    _handletype *_handle =
    -(_handletype *)_origarg;
    +from_timer(_handle, t, _timer);
    ... when != _origarg
    |
    ... when != _origarg
    _handletype *_handle =
    -(void *)_origarg;
    +from_timer(_handle, t, _timer);
    ... when != _origarg
    |
    ... when != _origarg
    _handletype *_handle;
    ... when != _handle
    _handle =
    -(_handletype *)_origarg;
    +from_timer(_handle, t, _timer);
    ... when != _origarg
    |
    ... when != _origarg
    _handletype *_handle;
    ... when != _handle
    _handle =
    -(void *)_origarg;
    +from_timer(_handle, t, _timer);
    ... when != _origarg
    )
    }

    // callback(unsigned long arg) without existing variable
    @change_callback_handle_cast_no_arg
    depends on change_timer_function_usage &&
    !change_callback_handle_cast@
    identifier change_timer_function_usage._callback;
    identifier change_timer_function_usage._timer;
    type _origtype;
    identifier _origarg;
    type _handletype;
    @@

    void _callback(
    -_origtype _origarg
    +struct timer_list *t
    )
    {
    + _handletype *_origarg = from_timer(_origarg, t, _timer);
    +
    ... when != _origarg
    - (_handletype *)_origarg
    + _origarg
    ... when != _origarg
    }

    // Avoid already converted callbacks.
    @match_callback_converted
    depends on change_timer_function_usage &&
    !change_callback_handle_cast &&
    !change_callback_handle_cast_no_arg@
    identifier change_timer_function_usage._callback;
    identifier t;
    @@

    void _callback(struct timer_list *t)
    { ... }

    // callback(struct something *handle)
    @change_callback_handle_arg
    depends on change_timer_function_usage &&
    !match_callback_converted &&
    !change_callback_handle_cast &&
    !change_callback_handle_cast_no_arg@
    identifier change_timer_function_usage._callback;
    identifier change_timer_function_usage._timer;
    type _handletype;
    identifier _handle;
    @@

    void _callback(
    -_handletype *_handle
    +struct timer_list *t
    )
    {
    + _handletype *_handle = from_timer(_handle, t, _timer);
    ...
    }

    // If change_callback_handle_arg ran on an empty function, remove
    // the added handler.
    @unchange_callback_handle_arg
    depends on change_timer_function_usage &&
    change_callback_handle_arg@
    identifier change_timer_function_usage._callback;
    identifier change_timer_function_usage._timer;
    type _handletype;
    identifier _handle;
    identifier t;
    @@

    void _callback(struct timer_list *t)
    {
    - _handletype *_handle = from_timer(_handle, t, _timer);
    }

    // We only want to refactor the setup_timer() data argument if we've found
    // the matching callback. This undoes changes in change_timer_function_usage.
    @unchange_timer_function_usage
    depends on change_timer_function_usage &&
    !change_callback_handle_cast &&
    !change_callback_handle_cast_no_arg &&
    !change_callback_handle_arg@
    expression change_timer_function_usage._E;
    identifier change_timer_function_usage._timer;
    identifier change_timer_function_usage._callback;
    type change_timer_function_usage._cast_data;
    @@

    (
    -timer_setup(&_E->_timer, _callback, 0);
    +setup_timer(&_E->_timer, _callback, (_cast_data)_E);
    |
    -timer_setup(&_E._timer, _callback, 0);
    +setup_timer(&_E._timer, _callback, (_cast_data)&_E);
    )

    // If we fixed a callback from a .function assignment, fix the
    // assignment cast now.
    @change_timer_function_assignment
    depends on change_timer_function_usage &&
    (change_callback_handle_cast ||
    change_callback_handle_cast_no_arg ||
    change_callback_handle_arg)@
    expression change_timer_function_usage._E;
    identifier change_timer_function_usage._timer;
    identifier change_timer_function_usage._callback;
    type _cast_func;
    typedef TIMER_FUNC_TYPE;
    @@

    (
    _E->_timer.function =
    -_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E->_timer.function =
    -&_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E->_timer.function =
    -(_cast_func)_callback;
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E->_timer.function =
    -(_cast_func)&_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E._timer.function =
    -_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E._timer.function =
    -&_callback;
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E._timer.function =
    -(_cast_func)_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E._timer.function =
    -(_cast_func)&_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    )

    // Sometimes timer functions are called directly. Replace matched args.
    @change_timer_function_calls
    depends on change_timer_function_usage &&
    (change_callback_handle_cast ||
    change_callback_handle_cast_no_arg ||
    change_callback_handle_arg)@
    expression _E;
    identifier change_timer_function_usage._timer;
    identifier change_timer_function_usage._callback;
    type _cast_data;
    @@

    _callback(
    (
    -(_cast_data)_E
    +&_E->_timer
    |
    -(_cast_data)&_E
    +&_E._timer
    |
    -_E
    +&_E->_timer
    )
    )

    // If a timer has been configured without a data argument, it can be
    // converted without regard to the callback argument, since it is unused.
    @match_timer_function_unused_data@
    expression _E;
    identifier _timer;
    identifier _callback;
    @@

    (
    -setup_timer(&_E->_timer, _callback, 0);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, _callback, 0L);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, _callback, 0UL);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E._timer, _callback, 0);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, _callback, 0L);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, _callback, 0UL);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_timer, _callback, 0);
    +timer_setup(&_timer, _callback, 0);
    |
    -setup_timer(&_timer, _callback, 0L);
    +timer_setup(&_timer, _callback, 0);
    |
    -setup_timer(&_timer, _callback, 0UL);
    +timer_setup(&_timer, _callback, 0);
    |
    -setup_timer(_timer, _callback, 0);
    +timer_setup(_timer, _callback, 0);
    |
    -setup_timer(_timer, _callback, 0L);
    +timer_setup(_timer, _callback, 0);
    |
    -setup_timer(_timer, _callback, 0UL);
    +timer_setup(_timer, _callback, 0);
    )

    @change_callback_unused_data
    depends on match_timer_function_unused_data@
    identifier match_timer_function_unused_data._callback;
    type _origtype;
    identifier _origarg;
    @@

    void _callback(
    -_origtype _origarg
    +struct timer_list *unused
    )
    {
    ... when != _origarg
    }

    Signed-off-by: Kees Cook

    Kees Cook
     
  • This mechanically converts all remaining cases of ancient open-coded timer
    setup with the old setup_timer() API, which is the first step in timer
    conversions. This has no behavioral changes, since it ultimately just
    changes the order of assignment to fields of struct timer_list when
    finding variations of:

    init_timer(&t);
    t.function = timer_callback;
    t.data = timer_callback_arg;

    to be converted into:

    setup_timer(&t, timer_callback, timer_callback_arg);
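
    A userspace sketch of why the two forms end up in the same state.
    The toy_ types and functions are hypothetical stand-ins for the old
    API, kept only to the two fields involved.

```c
#include <string.h>

/* Toy timer with the two fields the old API exposed. */
struct toy_old_timer {
	void (*function)(unsigned long);
	unsigned long data;
};

/* Stand-in for init_timer(): leaves both fields cleared. */
static void toy_init_timer(struct toy_old_timer *t)
{
	memset(t, 0, sizeof(*t));
}

/* Stand-in for setup_timer(): initializes, then fills in both fields. */
static void toy_setup_timer(struct toy_old_timer *t,
			    void (*fn)(unsigned long), unsigned long data)
{
	toy_init_timer(t);
	t->function = fn;
	t->data = data;
}

static void toy_callback(unsigned long data)
{
	(void)data;	/* unused in this sketch */
}
```

    The open-coded sequence and the single call differ only in where
    and in which order the assignments happen.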

    The conversion is done with the following Coccinelle script, which
    is an improved version of scripts/coccinelle/api/setup_timer.cocci, in the
    following ways:
    - assignments-before-init_timer() cases
    - limit the .data case removal to the specific struct timer_list instance
    - handling calls by dereference (timer->field vs timer.field)

    spatch --very-quiet --all-includes --include-headers \
    -I ./arch/x86/include -I ./arch/x86/include/generated \
    -I ./include -I ./arch/x86/include/uapi \
    -I ./arch/x86/include/generated/uapi -I ./include/uapi \
    -I ./include/generated/uapi --include ./include/linux/kconfig.h \
    --dir . \
    --cocci-file ~/src/data/setup_timer.cocci

    @fix_address_of@
    expression e;
    @@

    init_timer(
    -&(e)
    +&e
    , ...)

    // Match the common cases first to avoid Coccinelle parsing loops with
    // "... when" clauses.

    @match_immediate_function_data_after_init_timer@
    expression e, func, da;
    @@

    -init_timer
    +setup_timer
    ( \(&e\|e\)
    +, func, da
    );
    (
    -\(e.function\|e->function\) = func;
    -\(e.data\|e->data\) = da;
    |
    -\(e.data\|e->data\) = da;
    -\(e.function\|e->function\) = func;
    )

    @match_immediate_function_data_before_init_timer@
    expression e, func, da;
    @@

    (
    -\(e.function\|e->function\) = func;
    -\(e.data\|e->data\) = da;
    |
    -\(e.data\|e->data\) = da;
    -\(e.function\|e->function\) = func;
    )
    -init_timer
    +setup_timer
    ( \(&e\|e\)
    +, func, da
    );

    @match_function_and_data_after_init_timer@
    expression e, e2, e3, e4, e5, func, da;
    @@

    -init_timer
    +setup_timer
    ( \(&e\|e\)
    +, func, da
    );
    ... when != func = e2
    when != da = e3
    (
    -e.function = func;
    ... when != da = e4
    -e.data = da;
    |
    -e->function = func;
    ... when != da = e4
    -e->data = da;
    |
    -e.data = da;
    ... when != func = e5
    -e.function = func;
    |
    -e->data = da;
    ... when != func = e5
    -e->function = func;
    )

    @match_function_and_data_before_init_timer@
    expression e, e2, e3, e4, e5, func, da;
    @@
    (
    -e.function = func;
    ... when != da = e4
    -e.data = da;
    |
    -e->function = func;
    ... when != da = e4
    -e->data = da;
    |
    -e.data = da;
    ... when != func = e5
    -e.function = func;
    |
    -e->data = da;
    ... when != func = e5
    -e->function = func;
    )
    ... when != func = e2
    when != da = e3
    -init_timer
    +setup_timer
    ( \(&e\|e\)
    +, func, da
    );

    @r1 exists@
    expression t;
    identifier f;
    position p;
    @@

    f(...) { ... when any
    init_timer@p(\(&t\|t\))
    ... when any
    }

    @r2 exists@
    expression r1.t;
    identifier g != r1.f;
    expression e8;
    @@

    g(...) { ... when any
    \(t.data\|t->data\) = e8
    ... when any
    }

    // It is dangerous to use setup_timer if data field is initialized
    // in another function.
    @script:python depends on r2@
    p << r1.p;
    @@

    cocci.include_match(False)

    @r3@
    expression r1.t, func, e7;
    position r1.p;
    @@

    (
    -init_timer@p(&t);
    +setup_timer(&t, func, 0UL);
    ... when != func = e7
    -t.function = func;
    |
    -t.function = func;
    ... when != func = e7
    -init_timer@p(&t);
    +setup_timer(&t, func, 0UL);
    |
    -init_timer@p(t);
    +setup_timer(t, func, 0UL);
    ... when != func = e7
    -t->function = func;
    |
    -t->function = func;
    ... when != func = e7
    -init_timer@p(t);
    +setup_timer(t, func, 0UL);
    )

    Signed-off-by: Kees Cook

    Kees Cook
     
  • This changes all DEFINE_TIMER() callbacks to use a struct timer_list
    pointer instead of unsigned long. Since the data argument has already been
    removed, none of these callbacks are using their argument currently, so
    this renames the argument to "unused".

    Done using the following semantic patch:

    @match_define_timer@
    declarer name DEFINE_TIMER;
    identifier _timer, _callback;
    @@

    DEFINE_TIMER(_timer, _callback);

    @change_callback depends on match_define_timer@
    identifier match_define_timer._callback;
    type _origtype;
    identifier _origarg;
    @@

    void
    -_callback(_origtype _origarg)
    +_callback(struct timer_list *unused)
    { ... }
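
    As a rough illustration of what this semantic patch does to a single
    callback, here is a user-space sketch. The struct timer_list stand-in,
    the simplified DEFINE_TIMER() macro, and the names demo_timer,
    demo_timeout and fire_timer are hypothetical and are not the kernel
    definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct timer_list. */
struct timer_list {
	void (*function)(struct timer_list *);
};

/* Simplified stand-in for DEFINE_TIMER(_name, _function). */
#define DEFINE_TIMER(_name, _function) \
	struct timer_list _name = { .function = (_function) }

static int demo_fired;

/*
 * Before the conversion this callback would have been declared as
 *     static void demo_timeout(unsigned long data)
 * After it, the callback takes the timer pointer. The argument is
 * named "unused" because the body never looks at it.
 */
static void demo_timeout(struct timer_list *unused)
{
	(void)unused;
	demo_fired++;
}

static DEFINE_TIMER(demo_timer, demo_timeout);

/* Mimics the timer core invoking the callback of an expired timer. */
static void fire_timer(struct timer_list *t)
{
	assert(t != NULL && t->function != NULL);
	t->function(t);
}
```

    Calling fire_timer(&demo_timer) runs demo_timeout() once with the
    timer pointer as its only argument.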

    Signed-off-by: Kees Cook

    Kees Cook
     

21 Nov, 2017

8 commits

  • Pull printk updates from Petr Mladek:

    - print the warning about dropped messages on consoles on a separate
    line. It makes it more legible.

    - one typo fix and small code clean up.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
    added new line symbol after warning about dropped messages
    printk: fix typo in printk_safe.c
    printk: simplify no_printk()

    Linus Torvalds
     
  • This reverts commit bd601b6ada11 ("bpf: report offload info to user
    space"). The ifindex by itself is not sufficient, we should provide
    information on which network namespace this ifindex belongs to.
    After considering some options we concluded that it's best to just
    remove this API for now, and rework it in -next.

    Signed-off-by: Jakub Kicinski
    Signed-off-by: Daniel Borkmann

    Jakub Kicinski
     
  • We are currently destroying the device offload state when device
    moves to another net namespace. This doesn't break with current
    NFP code, because offload state is not used on program removal,
    but it's not correct behaviour.

    Ignore the device unregister notifications on namespace move.

    Signed-off-by: Jakub Kicinski
    Signed-off-by: Daniel Borkmann

    Jakub Kicinski
     
  • bpf_prog_get_type() is identical to bpf_prog_get_type_dev(),
    with false passed as attach_drv. Instead of keeping it as
    an exported symbol turn it into static inline wrapper.
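
    The wrapper pattern described above can be sketched in user space as
    follows. The reduced types, the demo_progs table, and the use of NULL
    for failure are assumptions made for illustration; the kernel
    functions operate on real file descriptors and report errors
    differently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the kernel types. */
enum bpf_prog_type { BPF_PROG_TYPE_XDP, BPF_PROG_TYPE_SCHED_CLS };

struct bpf_prog {
	enum bpf_prog_type type;
	bool offloaded;		/* bound to a device */
};

static struct bpf_prog demo_progs[] = {
	{ BPF_PROG_TYPE_XDP, false },
	{ BPF_PROG_TYPE_SCHED_CLS, true },
};

/*
 * Toy lookup: ufd simply indexes demo_progs. A device-bound program
 * is only handed out when the caller passes attach_drv = true.
 */
struct bpf_prog *bpf_prog_get_type_dev(unsigned int ufd,
				       enum bpf_prog_type type,
				       bool attach_drv)
{
	struct bpf_prog *prog;

	if (ufd >= sizeof(demo_progs) / sizeof(demo_progs[0]))
		return NULL;
	prog = &demo_progs[ufd];
	if (prog->type != type)
		return NULL;
	if (prog->offloaded && !attach_drv)
		return NULL;
	return prog;
}

/* The former exported function becomes a thin inline wrapper. */
static inline struct bpf_prog *bpf_prog_get_type(unsigned int ufd,
						 enum bpf_prog_type type)
{
	return bpf_prog_get_type_dev(ufd, type, false);
}
```

    Because the wrapper lives in a header as a static inline, callers keep
    the same name while one fewer symbol needs to be exported.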

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Quentin Monnet
    Acked-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: Daniel Borkmann

    Jakub Kicinski
     
  • With TC shared block changes we can't depend on correct netdev
    pointer being available in cls_bpf. Move the device validation
    to the driver. Core will only make sure that offloaded programs
    are always attached in the driver (or in HW by the driver). We
    trust that drivers which implement offload callbacks will perform
    necessary checks.

    Moving the checks to the driver is generally a useful thing;
    in practice the check should be against a switchdev instance,
    not a netdev, given that most ASICs will probably allow using
    the same program on many ports.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Quentin Monnet
    Acked-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Acked-by: Jiri Pirko
    Signed-off-by: Daniel Borkmann

    Jakub Kicinski
     
  • bpf_target_prog seems long and clunky, rename it to prog_ifindex.
    We don't want to call this field just ifindex, because maps
    may need a similar field in the future and bpf_attr members for
    programs and maps are unnamed.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Quentin Monnet
    Acked-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: Daniel Borkmann

    Jakub Kicinski
     
  • We are currently only allowing attachment of device-bound
    cls_bpf and XDP programs. Make this restriction explicit in
    the BPF offload code. This way we can potentially reuse the
    ifindex field in the future.

    Since XDP and cls_bpf programs can only be loaded by admin,
    we can drop the explicit capability check from offload code.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Quentin Monnet
    Acked-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: Daniel Borkmann

    Jakub Kicinski
     
  • Offload state may get destroyed either because the device for which
    it was constructed is going away, or because the refcount of bpf
    program itself has reached 0. In both of those cases we will call
    __bpf_prog_offload_destroy() to unlink the offload from the device.
    We may in fact call it twice, which works just fine, but we should
    make clear this is intended and caution others trying to extend the
    function.
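
    One common way to make such a teardown safe to call twice is to let
    the first call clear the link it removes, so the second call finds
    nothing to do. The sketch below shows that idea only; toy_dev,
    toy_offload and toy_offload_destroy are hypothetical and do not
    reflect how __bpf_prog_offload_destroy() is actually written.

```c
#include <stddef.h>

/* Hypothetical device that counts the offloads linked to it. */
struct toy_dev {
	int nr_offloads;
};

/* Hypothetical offload state; dev is NULL once unlinked. */
struct toy_offload {
	struct toy_dev *dev;
};

/*
 * May be reached from both the "device going away" path and the
 * "program refcount hit zero" path, so it must tolerate a second call.
 */
static void toy_offload_destroy(struct toy_offload *offload)
{
	if (!offload->dev)
		return;		/* already unlinked by the other path */

	offload->dev->nr_offloads--;
	offload->dev = NULL;
}
```

    Anyone extending such a function has to keep every added step
    repeatable in the same way.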

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Quentin Monnet
    Acked-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: Daniel Borkmann

    Jakub Kicinski
     

18 Nov, 2017

13 commits

  • Merge more updates from Andrew Morton:

    - a bit more MM

    - procfs updates

    - dynamic-debug fixes

    - lib/ updates

    - checkpatch

    - epoll

    - nilfs2

    - signals

    - rapidio

    - PID management cleanup and optimization

    - kcov updates

    - sysvipc updates

    - quite a few misc things all over the place

    * emailed patches from Andrew Morton: (94 commits)
    EXPERT Kconfig menu: fix broken EXPERT menu
    include/asm-generic/topology.h: remove unused parent_node() macro
    arch/tile/include/asm/topology.h: remove unused parent_node() macro
    arch/sparc/include/asm/topology_64.h: remove unused parent_node() macro
    arch/sh/include/asm/topology.h: remove unused parent_node() macro
    arch/ia64/include/asm/topology.h: remove unused parent_node() macro
    drivers/pcmcia/sa1111_badge4.c: avoid unused function warning
    mm: add infrastructure for get_user_pages_fast() benchmarking
    sysvipc: make get_maxid O(1) again
    sysvipc: properly name ipc_addid() limit parameter
    sysvipc: duplicate lock comments wrt ipc_addid()
    sysvipc: unteach ids->next_id for !CHECKPOINT_RESTORE
    initramfs: use time64_t timestamps
    drivers/watchdog: make use of devm_register_reboot_notifier()
    kernel/reboot.c: add devm_register_reboot_notifier()
    kcov: update documentation
    Makefile: support flag -fsanitizer-coverage=trace-cmp
    kcov: support comparison operands collection
    kcov: remove pointless current != NULL check
    kernel/panic.c: add TAINT_AUX
    ...

    Linus Torvalds
     
  • Add devm_* wrapper around register_reboot_notifier to simplify device
    specific reboot notifier registration/unregistration.

    [akpm@linux-foundation.org: move `struct device' forward decl to top-of-file]
    Link: http://lkml.kernel.org/r/20170320171753.1705-1-andrew.smirnov@gmail.com
    Signed-off-by: Andrey Smirnov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Smirnov
     
    Enables kcov to collect comparison operands from instrumented code.
    This is done by using Clang's -fsanitize-coverage=trace-cmp
    instrumentation (currently not available for GCC).

    The comparison operands help a lot in fuzz testing. E.g. they are used
    in Syzkaller to cover the interiors of conditional statements with far
    fewer attempts and thus make previously unreachable code reachable.

    To allow separate collection of coverage and comparison operands two
    different work modes are implemented. Mode selection is now done via a
    KCOV_ENABLE ioctl call with corresponding argument value.
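
    A minimal user-space sketch of the mode selection follows. The ioctl
    numbers and mode values are reproduced from the kernel's kcov header
    as I understand it and should be treated as assumptions;
    kcov_enable_mode is a hypothetical helper name.

```c
#include <sys/ioctl.h>

/*
 * Assumed to match the kernel's kcov header; defined locally so the
 * sketch builds without it.
 */
#define KCOV_INIT_TRACE	_IOR('c', 1, unsigned long)
#define KCOV_ENABLE	_IO('c', 100)
#define KCOV_DISABLE	_IO('c', 101)

enum {
	KCOV_TRACE_PC  = 0,	/* collect covered program counters */
	KCOV_TRACE_CMP = 1,	/* collect comparison operands */
};

/*
 * Hypothetical helper: selects the work mode through the argument of
 * the KCOV_ENABLE ioctl. Returns 0 on success, -1 with errno set on
 * failure, exactly as ioctl() does.
 */
static int kcov_enable_mode(int fd, unsigned long mode)
{
	return ioctl(fd, KCOV_ENABLE, mode);
}
```

    Here fd is expected to come from opening the kcov debugfs file and to
    have been prepared with KCOV_INIT_TRACE and mmap() beforehand; with
    any other descriptor the call simply fails.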

    Link: http://lkml.kernel.org/r/20171011095459.70721-1-glider@google.com
    Signed-off-by: Victor Chibotaru
    Signed-off-by: Alexander Potapenko
    Cc: Dmitry Vyukov
    Cc: Andrey Konovalov
    Cc: Mark Rutland
    Cc: Alexander Popov
    Cc: Andrey Ryabinin
    Cc: Kees Cook
    Cc: Vegard Nossum
    Cc: Quentin Casasnovas
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Victor Chibotaru
     
    __sanitizer_cov_trace_pc() is hot code, so it is worth removing the
    pointless '!current' check. Current is never NULL.

    Link: http://lkml.kernel.org/r/20170929162221.32500-1-aryabinin@virtuozzo.com
    Signed-off-by: Andrey Ryabinin
    Acked-by: Dmitry Vyukov
    Acked-by: Mark Rutland
    Cc: Andrey Konovalov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     
    This is the gist of a patch which we've been forward-porting in our
    kernels for a long time now, and it would probably make good sense to
    have such a TAINT_AUX flag upstream, which each distro etc. can use
    how they see fit. This way, we won't need to forward-port a
    distro-only version indefinitely.

    Add an auxiliary taint flag to be used by distros and others. This
    obviates the need to forward-port whatever internal solutions people
    have in favor of a single flag which they can map arbitrarily to a
    definition of their pleasing.

    The "X" mnemonic could also mean eXternal, which would be taint from a
    distro or something else but not the upstream kernel. We will use it to
    mark modules for which we don't provide support. I.e., a really
    eXternal module.
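
    For readers who want to test for the new flag from user space, a
    small sketch is given below. The bit position (16) is an assumption
    based on my reading of the taint flag list at the time of this commit,
    and both helper names are hypothetical.

```c
#include <stdbool.h>

/* Assumption: TAINT_AUX occupies bit 16 of the taint mask. */
#define TAINT_AUX_BIT	16

/* True when the auxiliary taint bit is set in a taint mask. */
static bool taint_has_aux(unsigned long mask)
{
	return ((mask >> TAINT_AUX_BIT) & 1UL) != 0;
}

/* 'X' when the flag is set, ' ' otherwise, mirroring the mnemonic. */
static char taint_aux_char(unsigned long mask)
{
	return taint_has_aux(mask) ? 'X' : ' ';
}
```

    The mask would typically be the number read from
    /proc/sys/kernel/tainted.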

    Link: http://lkml.kernel.org/r/20170911134533.dp5mtyku5bongx4c@pd.tnic
    Signed-off-by: Borislav Petkov
    Cc: Kees Cook
    Cc: Jessica Yu
    Cc: Peter Zijlstra
    Cc: Jiri Slaby
    Cc: Jiri Olsa
    Cc: Michal Marek
    Cc: Jiri Kosina
    Cc: Takashi Iwai
    Cc: Petr Mladek
    Cc: Jeff Mahoney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Borislav Petkov
     
    pidhash is no longer required as all the information can be looked up
    from the idr tree. nr_hashed represented the number of pids that had
    been hashed. Since nr_hashed and PIDNS_HASH_ADDING are no longer
    relevant, they have been renamed to pid_allocated and PIDNS_ADDING
    respectively.

    [gs051095@gmail.com: v6]
    Link: http://lkml.kernel.org/r/1507760379-21662-3-git-send-email-gs051095@gmail.com
    Link: http://lkml.kernel.org/r/1507583624-22146-3-git-send-email-gs051095@gmail.com
    Signed-off-by: Gargi Sharma
    Reviewed-by: Rik van Riel
    Tested-by: Tony Luck [ia64]
    Cc: Julia Lawall
    Cc: Ingo Molnar
    Cc: Pavel Tatashin
    Cc: Kirill Tkhai
    Cc: Oleg Nesterov
    Cc: Eric W. Biederman
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gargi Sharma
     
  • Patch series "Replacing PID bitmap implementation with IDR API", v4.

    This series replaces kernel bitmap implementation of PID allocation with
    IDR API. These patches are written to simplify the kernel by replacing
    custom code with calls to generic code.

    The following are the stats for pid and pid_namespace object files
    before and after the replacement. There is a noteworthy change between
    the IDR and bitmap implementation.

    Before
       text    data     bss     dec     hex filename
       8447    3894      64   12405    3075 kernel/pid.o
    After
       text    data     bss     dec     hex filename
       3397     304       0    3701     e75 kernel/pid.o

    Before
       text    data     bss     dec     hex filename
       5692    1842     192    7726    1e2e kernel/pid_namespace.o
    After
       text    data     bss     dec     hex filename
       2854     216      16    3086     c0e kernel/pid_namespace.o

    The following are the stats for ps, pstree and calling readdir on /proc
    for 10,000 processes.

    ps:
            With IDR API    With bitmap
    real    0m1.479s        0m2.319s
    user    0m0.070s        0m0.060s
    sys     0m0.289s        0m0.516s

    pstree:
            With IDR API    With bitmap
    real    0m1.024s        0m1.794s
    user    0m0.348s        0m0.612s
    sys     0m0.184s        0m0.264s

    proc:
            With IDR API    With bitmap
    real    0m0.059s        0m0.074s
    user    0m0.000s        0m0.004s
    sys     0m0.016s        0m0.016s

    This patch (of 2):

    Replace the current bitmap implementation for Process ID allocation.
    Functions that are no longer required, for example, free_pidmap(),
    alloc_pidmap(), etc. are removed. The rest of the functions are
    modified to use the IDR API. The change was made to make the PID
    allocation less complex by replacing custom code with calls to generic
    API.
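
    To give a feel for the allocation behaviour a PID allocator needs
    (hand out increasing ids, wrap around at the limit, reuse freed ids
    only after wrapping), here is a toy user-space model. It is not the
    IDR and not the kernel code; toy_pid_ns, TOY_PID_MAX and both
    functions are hypothetical.

```c
#include <stdbool.h>

#define TOY_PID_MAX	8	/* tiny on purpose; hypothetical limit */

struct toy_pid_ns {
	bool used[TOY_PID_MAX];
	int next;		/* where the next search starts */
};

/*
 * Return the lowest free id at or after ns->next, wrapping around to
 * min_id once. Returns -1 when every id in [min_id, TOY_PID_MAX) is
 * in use.
 */
static int toy_alloc_pid(struct toy_pid_ns *ns, int min_id)
{
	int start = ns->next > min_id ? ns->next : min_id;

	for (int pass = 0; pass < 2; pass++) {
		for (int id = start; id < TOY_PID_MAX; id++) {
			if (!ns->used[id]) {
				ns->used[id] = true;
				ns->next = id + 1;
				return id;
			}
		}
		start = min_id;	/* wrap around for the second pass */
	}
	return -1;
}

static void toy_free_pid(struct toy_pid_ns *ns, int id)
{
	if (id >= 0 && id < TOY_PID_MAX)
		ns->used[id] = false;
}
```

    A generic id allocator provides this kind of bookkeeping once, which
    is the simplification the patch is after.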

    [gs051095@gmail.com: v6]
    Link: http://lkml.kernel.org/r/1507760379-21662-2-git-send-email-gs051095@gmail.com
    [avagin@openvz.org: restore the old behaviour of the ns_last_pid sysctl]
    Link: http://lkml.kernel.org/r/20171106183144.16368-1-avagin@openvz.org
    Link: http://lkml.kernel.org/r/1507583624-22146-2-git-send-email-gs051095@gmail.com
    Signed-off-by: Gargi Sharma
    Reviewed-by: Rik van Riel
    Acked-by: Oleg Nesterov
    Cc: Julia Lawall
    Cc: Ingo Molnar
    Cc: Pavel Tatashin
    Cc: Kirill Tkhai
    Cc: Eric W. Biederman
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gargi Sharma
     
  • Remove unnecessary else block, remove redundant return and call to kfree
    in if block.

    Link: http://lkml.kernel.org/r/1510238435-1655-1-git-send-email-mail@okal.no
    Signed-off-by: Ola N. Kaldestad
    Acked-by: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ola N. Kaldestad
     
  • parse_crashkernel_mem() silently returns if we get zero bytes in the
    parsing function. It is useful for debugging to add a message,
    especially if the kernel cannot boot correctly.

    Add a pr_info instead of pr_warn because size = 0 is expected
    behavior, e.g. with crashkernel=2G-4G:128M, size will be 0 in case
    system memory is less than 2G.
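
    The range syntax in that example can be sketched in user space as
    below. This is a simplified illustration, not the kernel parser:
    parse_size and crash_size_for are hypothetical names, offsets
    ("@...") are not handled, and malformed input just yields 0.

```c
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a number with an optional K/M/G suffix and advance *s past
 * what was consumed.
 */
static unsigned long long parse_size(const char **s)
{
	char *end;
	unsigned long long v = strtoull(*s, &end, 0);

	switch (*end) {
	case 'G': case 'g':
		v <<= 10;	/* fall through */
	case 'M': case 'm':
		v <<= 10;	/* fall through */
	case 'K': case 'k':
		v <<= 10;
		end++;
		break;
	default:
		break;
	}
	*s = end;
	return v;
}

/*
 * Walk "start-[end]:size[,start-[end]:size...]" and return the size of
 * the first range with start <= system_ram < end, or 0 if none match.
 */
static unsigned long long crash_size_for(const char *spec,
					 unsigned long long system_ram)
{
	const char *p = spec;

	while (*p) {
		unsigned long long start, end, size;

		start = parse_size(&p);
		if (*p != '-')
			return 0;
		p++;
		/* An omitted end means "no upper bound". */
		end = (*p == ':') ? ULLONG_MAX : parse_size(&p);
		if (*p != ':')
			return 0;
		p++;
		size = parse_size(&p);
		if (system_ram >= start && system_ram < end)
			return size;
		if (*p != ',')
			break;
		p++;
	}
	return 0;	/* the case this patch now reports via pr_info */
}
```

    With "2G-4G:128M" and 1G of memory the helper returns 0, which is the
    situation the new message is meant to make visible.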

    Link: http://lkml.kernel.org/r/20171114080129.GA6115@dhcp-128-65.nay.redhat.com
    Signed-off-by: Dave Young
    Cc: Baoquan He
    Cc: Vivek Goyal
    Cc: Bhupesh Sharma
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Young
     
    complete_signal() checks SIGNAL_UNKILLABLE before it starts to destroy
    the thread group; today this is wrong in many ways.

    If nothing else, fatal_signal_pending() should always imply that the
    whole thread group (except ->group_exit_task if it is not NULL) is
    killed, this check breaks the rule.

    After the previous changes we can rely on sig_task_ignored();
    sig_fatal(sig) && SIGNAL_UNKILLABLE can only be true if we actually want
    to kill this task and sig == SIGKILL OR it is traced and debugger can
    intercept the signal.

    This should hopefully fix the problem reported by Dmitry. This
    test-case

    static int init(void *arg)
    {
            for (;;)
                    pause();
    }

    int main(void)
    {
            char stack[16 * 1024];

            for (;;) {
                    int pid = clone(init, stack + sizeof(stack)/2,
                                    CLONE_NEWPID | SIGCHLD, NULL);
                    assert(pid > 0);

                    assert(ptrace(PTRACE_ATTACH, pid, 0, 0) == 0);
                    assert(waitpid(-1, NULL, WSTOPPED) == pid);

                    assert(ptrace(PTRACE_DETACH, pid, 0, SIGSTOP) == 0);
                    assert(syscall(__NR_tkill, pid, SIGKILL) == 0);
                    assert(pid == wait(NULL));
            }
    }

    triggers the WARN_ON_ONCE(!(task->jobctl & JOBCTL_STOP_PENDING)) in
    task_participate_group_stop(). do_signal_stop()->signal_group_exit()
    checks SIGNAL_GROUP_EXIT and returns false, but task_set_jobctl_pending()
    checks fatal_signal_pending() and does not set JOBCTL_STOP_PENDING.

    And this should fix the minor security problem reported by Kyle:
    SECCOMP_RET_TRACE can miss fatal_signal_pending() the same way if the
    task is the root of a pid namespace.

    Link: http://lkml.kernel.org/r/20171103184246.GD21036@redhat.com
    Signed-off-by: Oleg Nesterov
    Reported-by: Dmitry Vyukov
    Reported-by: Kyle Huey
    Reviewed-by: Kees Cook
    Tested-by: Kyle Huey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Change sig_task_ignored() to drop the SIG_DFL && !sig_kernel_only()
    signals even if force == T. This simplifies the next change and this
    matches the same check in get_signal() which will drop these signals
    anyway.

    Link: http://lkml.kernel.org/r/20171103184227.GC21036@redhat.com
    Signed-off-by: Oleg Nesterov
    Tested-by: Kyle Huey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • The comment in sig_ignored() says "Tracers may want to know about even
    ignored signals" but SIGKILL can not be reported to debugger and it is
    just wrong to return 0 in this case: SIGKILL should only kill the
    SIGNAL_UNKILLABLE task if it comes from the parent ns.

    Change sig_ignored() to ignore ->ptrace if sig == SIGKILL and rely on
    sig_task_ignored().

    SIGSTOP coming from within the namespace is not really right either,
    but at least the debugger can intercept it, and we can't drop it here
    because this will break "gdb -p 1": ptrace_attach() won't work.
    Perhaps we will add another ->ptrace check later, we will see.

    Link: http://lkml.kernel.org/r/20171103184206.GB21036@redhat.com
    Signed-off-by: Oleg Nesterov
    Tested-by: Kyle Huey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
    Mikulas noticed that the existing do_proc_douintvec_minmax_conv() and
    the do_proc_dopipe_max_size_conv() introduced in this patchset handle
    overflow and min/max range inputs inconsistently:

    For example:

    0 ... param->min - 1                                         ---> ERANGE
    param->min ... param->max                                    ---> the value is accepted
    param->max + 1 ... 0x100000000L + param->min - 1             ---> ERANGE
    0x100000000L + param->min ... 0x100000000L + param->max      ---> EINVAL
    0x100000000L + param->max + 1 ... 0x200000000L + param->min - 1 ---> ERANGE
    0x200000000L + param->min ... 0x200000000L + param->max      ---> EINVAL
    0x200000000L + param->max + 1 ... 0x300000000L + param->min - 1 ---> ERANGE

    In do_proc_do*() routines which store values into unsigned int variables
    (4 bytes wide for 64-bit builds), first validate that the input unsigned
    long value (8 bytes wide for 64-bit builds) will fit inside the smaller
    unsigned int variable. Then check that the unsigned int value falls
    inside the specified parameter min, max range. Otherwise the unsigned
    long -> unsigned int conversion drops leading bits from the input value,
    leading to the inconsistent pattern Mikulas documented above.
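
    The ordering described above can be sketched in user space as
    follows. uint64_t stands in for the 8-byte unsigned long, unsigned int
    is assumed to be 32 bits wide, and the function names and the choice
    of -EINVAL for overflow versus -ERANGE for range violations are
    assumptions of this sketch rather than a copy of the kernel routines.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/*
 * Problematic order: narrow first, range-check afterwards. The cast
 * drops the high 32 bits, so a huge input can alias a small value.
 */
static int conv_truncating(uint64_t lval, unsigned int min,
			   unsigned int max, unsigned int *out)
{
	unsigned int val = (unsigned int)lval;

	if (val < min || val > max)
		return -ERANGE;
	*out = val;
	return 0;
}

/*
 * Order described in the patch: first make sure the wide value fits
 * in an unsigned int, then apply the min/max range check.
 */
static int conv_checked(uint64_t lval, unsigned int min,
			unsigned int max, unsigned int *out)
{
	unsigned int val;

	if (lval > UINT_MAX)
		return -EINVAL;
	val = (unsigned int)lval;
	if (val < min || val > max)
		return -ERANGE;
	*out = val;
	return 0;
}
```

    With min = 1 and max = 10, the input 0x100000005 is silently treated
    as 5 by the first function but rejected by the second.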

    Link: http://lkml.kernel.org/r/1507658689-11669-5-git-send-email-joe.lawrence@redhat.com
    Signed-off-by: Joe Lawrence
    Reported-by: Mikulas Patocka
    Reviewed-by: Mikulas Patocka
    Cc: Al Viro
    Cc: Jens Axboe
    Cc: Michael Kerrisk
    Cc: Randy Dunlap
    Cc: Josh Poimboeuf
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Lawrence