20 Aug, 2008

1 commit

  • I outwitted myself again in commit 2b2a1ff64afbadac842bbc58c5166962cf4f7664,
    and broke the SA_NOCLDWAIT behavior so it leaks zombies. This fixes it.

    Reported-by: Andi Kleen
    Signed-off-by: Roland McGrath

    Roland McGrath
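The behaviour this fix restores is visible from userspace. With SA_NOCLDWAIT set on SIGCHLD, a terminated child is reaped by the kernel instead of lingering as a zombie. As a result, waitpid() on that child blocks until it exits and then fails with ECHILD. The sketch below assumes a Linux/POSIX system where fork() is permitted. The helper name nocldwait_autoreaps() is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a child that exits while SIGCHLD has
 * SA_NOCLDWAIT set is reaped automatically (no zombie left behind),
 * 0 if a zombie was still there to collect, -1 if setup failed. */
static int nocldwait_autoreaps(void)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = SIG_DFL;
    sa.sa_flags = SA_NOCLDWAIT;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, &old) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        sigaction(SIGCHLD, &old, NULL);
        return -1;
    }
    if (pid == 0)
        _exit(0);               /* child: exit immediately */

    /* Parent: with SA_NOCLDWAIT, waitpid() blocks until the child has
     * exited and then fails with ECHILD, because no zombie remains. */
    int status;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);
    int autoreaped = (r < 0 && errno == ECHILD);

    sigaction(SIGCHLD, &old, NULL);  /* restore the previous disposition */
    return autoreaped;
}
```

On a kernel with the regression, waitpid() would instead succeed and return the child's pid, and the helper would return 0.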
     

12 Aug, 2008

1 commit


28 Jul, 2008

1 commit


27 Jul, 2008

7 commits

  • This defines a new hook tracehook_force_sigpending() that lets tracing
    code decide to force TIF_SIGPENDING on in recalc_sigpending().

    This is not used yet, so it compiles away to nothing for now. It lays the
    groundwork for new tracing code that can interrupt a task synthetically
    without actually sending a signal.

    Signed-off-by: Roland McGrath
    Cc: Oleg Nesterov
    Reviewed-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
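The following is a simplified userspace model of where the new hook sits in recalc_sigpending(). If the hook asks for it, TIF_SIGPENDING is forced on even when no unblocked signal is queued. Everything here (struct model_task, the bitmask layout, model_force_sigpending()) is a hypothetical stand-in. The real code also considers group stops, the shared pending set and the freezer, which the model leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a task's signal state; bit (n - 1) stands for signal n. */
struct model_task {
    uint64_t pending;      /* signals queued for the task */
    uint64_t blocked;      /* signals the task currently blocks */
    bool want_interrupt;   /* what a tracing engine would request */
    bool tif_sigpending;   /* stands in for TIF_SIGPENDING */
};

/* Stand-in for tracehook_force_sigpending(). In the commit above the real
 * hook is not used yet and compiles away, i.e. it never forces anything. */
static bool model_force_sigpending(const struct model_task *t)
{
    return t->want_interrupt;
}

/* Shape of recalc_sigpending() with the hook: forced on if the hook asks,
 * otherwise set only when some pending signal is not blocked. */
static void model_recalc_sigpending(struct model_task *t)
{
    if (model_force_sigpending(t))
        t->tif_sigpending = true;
    else
        t->tif_sigpending = (t->pending & ~t->blocked) != 0;
}
```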
     
  • This moves the ptrace logic in task death (exit_notify) into tracehook.h
    inlines. Some code is rearranged slightly to make things nicer. There is
    no functional change, only cleanup.

    There is one hook called with the tasklist_lock write-locked, as ptrace
    needs. There is also a new hook called after exit_state changes and
    without locks. This is a better place for tracing work to be in the
    future, since it doesn't delay the whole system with locking.

    Signed-off-by: Roland McGrath
    Cc: Oleg Nesterov
    Reviewed-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • This defines the tracehook_notify_jctl() hook to formalize the ptrace
    effects on job control notifications. There is no functional change, only
    cleanup.

    Signed-off-by: Roland McGrath
    Cc: Oleg Nesterov
    Reviewed-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • This defines the tracehook_get_signal() hook to allow tracing code to slip
    in before normal signal dequeuing. This lays the groundwork for new
    tracing features that can inject synthetic signals outside the normal
    queue or control the disposition of delivered signals. The calling
    convention lets tracehook_get_signal() decide both exactly what will
    happen and what signal number to report in the handler/exit.

    Signed-off-by: Roland McGrath
    Cc: Oleg Nesterov
    Reviewed-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
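The calling convention can be sketched as a small model. The hook runs first and either returns 0 ("no opinion", so the normal dequeue runs) or returns a signal number together with the disposition to use for it. This is a hypothetical userspace model, not the kernel API. struct model_task, enum disposition and all model_* functions are invented for illustration, and the regs and siginfo arguments the real hook receives are omitted.

```c
#include <assert.h>
#include <stdint.h>

enum disposition { DISP_DEFAULT, DISP_IGNORE, DISP_HANDLER };

/* Hypothetical task model; bit (n - 1) of a mask stands for signal n. */
struct model_task {
    uint64_t pending, blocked;
    enum disposition action[64];   /* action[n - 1] is the disposition of signal n */
    int inject_sig;                /* signal a tracer wants to inject, 0 = none */
    enum disposition inject_disp;  /* disposition the tracer chose for it */
};

/* Stand-in for tracehook_get_signal(): 0 means "dequeue normally",
 * otherwise a signal number with *disp filled in by the tracer. */
static int model_tracehook_get_signal(struct model_task *t, enum disposition *disp)
{
    if (t->inject_sig == 0)
        return 0;
    int sig = t->inject_sig;
    *disp = t->inject_disp;
    t->inject_sig = 0;             /* one-shot injection */
    return sig;
}

/* Lowest-numbered pending, unblocked signal, removed from the set; 0 if none. */
static int model_dequeue_signal(struct model_task *t)
{
    uint64_t ready = t->pending & ~t->blocked;
    for (int n = 1; n <= 64; n++) {
        uint64_t bit = UINT64_C(1) << (n - 1);
        if (ready & bit) {
            t->pending &= ~bit;
            return n;
        }
    }
    return 0;
}

/* The shape of the delivery step: ask the hook first, then fall back. */
static int model_get_signal(struct model_task *t, enum disposition *disp)
{
    int sig = model_tracehook_get_signal(t, disp);
    if (sig != 0)
        return sig;                /* tracer chose both signal and disposition */
    sig = model_dequeue_signal(t);
    if (sig != 0)
        *disp = t->action[sig - 1];
    return sig;
}
```

An injected signal never touches the pending set, which is what lets a tracer deliver something "outside the normal queue".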
     
  • This defines tracehook_consider_fatal_signal() as a fine-grained hook for
    deciding to skip the special cases for a fatal signal, as ptrace does.
    There is no functional change, only cleanup.

    Signed-off-by: Roland McGrath
    Cc: Oleg Nesterov
    Reviewed-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • This defines tracehook_consider_ignored_signal() as a fine-grained hook
    for deciding to prevent the normal short-circuit of sending an ignored
    signal, as ptrace does. There is no functional change, only cleanup.

    Signed-off-by: Roland McGrath
    Cc: Oleg Nesterov
    Reviewed-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
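Here is a minimal model of how this hook fits into the "drop ignored signals early" decision, following the general shape of sig_ignored() as we understand it. A blocked signal is never dropped, because its handler may change before it is unblocked. A signal whose handler is SIG_IGN, or SIG_DFL with an "ignore" default such as SIGCHLD, normally is dropped unless the hook reports that the task is traced. The struct, the handler encoding and the ptraced flag are hypothetical stand-ins.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

enum handler_kind { H_DEFAULT, H_IGNORE, H_USER };

/* Hypothetical model: only the pieces the decision looks at. */
struct model_task {
    sigset_t blocked;
    bool ptraced;                   /* stands in for PT_PTRACED */
    enum handler_kind handler[65];  /* indexed by signal number (65 = Linux NSIG) */
};

/* Signals whose default action is "ignore". */
static bool default_is_ignore(int sig)
{
    return sig == SIGCONT || sig == SIGCHLD || sig == SIGWINCH || sig == SIGURG;
}

/* Stand-in for tracehook_consider_ignored_signal(): a traced task still
 * wants ignored signals queued so the tracer gets to see them. */
static bool model_consider_ignored_signal(const struct model_task *t, int sig)
{
    (void)sig;
    return t->ptraced;
}

/* True means the signal may be dropped at send time. */
static bool model_sig_ignored(const struct model_task *t, int sig)
{
    if (sigismember(&t->blocked, sig))
        return false;
    bool ignored = t->handler[sig] == H_IGNORE ||
                   (t->handler[sig] == H_DEFAULT && default_is_ignore(sig));
    if (!ignored)
        return false;
    return !model_consider_ignored_signal(t, sig);
}
```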
     
  • The ptrace_notify() function should not be called by any modules. It was
    only ever exported to be called by binfmt exec functions. But that is no
    longer necessary since fs/exec.c deals with that generically now. There
    should be no calls to ptrace_notify() from outside the core kernel.

    Signed-off-by: Roland McGrath
    Cc: Oleg Nesterov
    Reviewed-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     

26 Jul, 2008

10 commits

  • This function operated on a pid_t to kill a task, which is no longer valid
    in a containerized system.

    It has finally lost all its users and we can safely remove it from the
    tree.

    Signed-off-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Move mm->core_waiters into "struct core_state" allocated on the stack. This
    shrinks mm_struct a little and allows further changes.

    This patch mostly does s/core_waiters/core_state. The only essential
    change is that coredump_wait() must clear mm->core_state before return.

    The coredump_wait() path becomes uglier and .text grows by 30 bytes; this
    is fixed by the next patch.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
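The pattern this patch introduces, stack-allocated state published through an mm pointer, can be sketched as follows. The important step is the last one. Because struct core_state lives on coredump_wait()'s stack, mm->core_state must be reset before the function returns, or it would dangle. struct mm_model and the model_* functions are hypothetical. The real code sleeps until the other threads have checked in rather than counting them down in a loop.

```c
#include <assert.h>
#include <stddef.h>

/* Lives on the dumper's stack for the duration of the wait. */
struct core_state {
    int nr_threads;   /* threads that still have to check in */
};

/* Hypothetical stand-in for the relevant part of mm_struct. */
struct mm_model {
    struct core_state *core_state;  /* non-NULL only while a dump is in progress */
};

/* What another thread does when it notices the dump: find the state via mm. */
static void model_thread_checks_in(struct mm_model *mm)
{
    if (mm->core_state)
        mm->core_state->nr_threads--;
}

/* Models coredump_wait(): publish stack state, wait, then unpublish it. */
static int model_coredump_wait(struct mm_model *mm, int other_threads)
{
    struct core_state cs = { .nr_threads = other_threads };
    int arrived = 0;

    mm->core_state = &cs;
    /* Stand-in for sleeping until the last thread wakes us up. */
    while (cs.nr_threads > 0) {
        model_thread_checks_in(mm);
        arrived++;
    }
    mm->core_state = NULL;  /* cs is about to go out of scope */
    return arrived;
}
```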
     
  • 1. SIGKILL can't be blocked, so remove this check from sigkill_pending().

    2. When ptrace_stop() sees sigkill_pending() == T, it can just return.
    Kill "int killed" and simplify the code. This is also more correct:
    the tracer shouldn't see us in TASK_TRACED if we are not going to
    stop.

    I strongly believe this code needs further changes. We should do the "was
    this task killed" check unconditionally; currently it depends on
    arch_ptrace_stop_needed(). On the other hand, sigkill_pending() isn't
    very clever. If the task was killed by tkill(SIGKILL), the signal may
    already have been dequeued if the caller is do_exit().

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
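The following userspace model of the simplified check uses the POSIX sigset_t helpers. Since SIGKILL can never be blocked, only the per-thread and shared pending sets are consulted. The stop is abandoned before the task becomes visible as traced. struct model_task and its fields are hypothetical stand-ins for the task_struct and signal_struct fields involved.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

struct model_task {
    sigset_t pending;         /* per-thread pending set */
    sigset_t shared_pending;  /* thread-group pending set */
    bool traced;              /* set when the task enters TASK_TRACED */
};

/* SIGKILL cannot be blocked, so there is no check against a blocked mask. */
static bool model_sigkill_pending(const struct model_task *t)
{
    return sigismember(&t->pending, SIGKILL) ||
           sigismember(&t->shared_pending, SIGKILL);
}

/* Simplified ptrace_stop(): bail out before becoming visible as traced. */
static void model_ptrace_stop(struct model_task *t)
{
    if (model_sigkill_pending(t))
        return;              /* about to die: never show TASK_TRACED */
    t->traced = true;
}
```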
     
  • Change the type of pid and tgid variables from int to the POSIX type
    pid_t.

    Signed-off-by: Gustavo F. Padovan
    Cc: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gustavo Fernando Padovan
     
  • In the switch to configurable HZ in 2.6, the treatment of the si_utime and
    si_stime fields that are exposed to userland via the siginfo structure
    looks to have been botched. As things stand, these fields report times in
    units of HZ, so that userland gets information that varies depending on
    the HZ that the kernel was configured with. This patch changes the
    reported values to use USER_HZ units.

    Signed-off-by: Michael Kerrisk
    Acked-by: Oleg Nesterov
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Kerrisk
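The unit conversion can be illustrated with a small helper that rescales a tick count from the configured HZ to USER_HZ, the rate userland learns from sysconf(_SC_CLK_TCK). This is a sketch, not the kernel's own conversion routine. ticks_to_user_hz() is a hypothetical name. Multiplying before dividing avoids losing precision when HZ is not a multiple of USER_HZ (e.g. HZ=250).

```c
#include <assert.h>
#include <stdint.h>

/* Rescale a tick count measured at `hz` ticks per second into `user_hz` units. */
static uint64_t ticks_to_user_hz(uint64_t ticks, uint64_t hz, uint64_t user_hz)
{
    return ticks * user_hz / hz;   /* multiply first to keep precision */
}
```

One second of CPU time is 1000 ticks at HZ=1000 but 250 ticks at HZ=250. After conversion to USER_HZ=100 both report 100, whereas unconverted si_utime/si_stime values would differ by a factor of four between the two kernels.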
     
  • Commit fae5fa44f1fd079ffbed8e0add929dd7bbd1347f changed do_signal_stop() to
    check SIGNAL_UNKILLABLE, but this wasn't needed. If signal_group_exit() == F,
    a signal sent to a SIGNAL_UNKILLABLE task must already have been filtered
    out by the caller, get_signal_to_deliver(). And if signal_group_exit() == T,
    we are not going to stop.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • dequeue_signal() checks SIGNAL_GROUP_EXIT before setting
    SIGNAL_STOP_DEQUEUED. This check was added long ago by
    788e05a67c343fa22f2ae1d3ca264e7f15c25eaf to avoid the
    coredump/SIGSTOP race.

    Since then the related code has changed, and now this subtle check is both
    incomplete and unneeded. It is incomplete because nowadays exec() doesn't
    set SIGNAL_GROUP_EXIT, so in fact we should check signal_group_exit() to
    avoid a similar race. Fortunately, we don't need the check at all. The
    only function which relies on SIGNAL_STOP_DEQUEUED is do_signal_stop(),
    and it ignores this flag if signal_group_exit() == T, which covers the
    SIGNAL_GROUP_EXIT case.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • With the recent changes collect_signal() always returns true. Change it
    to return void and update the single caller.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Factor out sigdelset() calls and remove the "still_pending" variable.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • collect_signal() checks sigismember(&list->signal, sig), but this is not
    needed: this "sig" was just found by next_signal(), so it must be valid.

    We have a (completely broken) call to ->notifier in between, but it must
    not play with the sigpending->signal bits or unlock ->siglock.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
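The division of labour between next_signal() and collect_signal() can be modelled with a bitmask plus a small queue. next_signal() picks the lowest-numbered pending, unblocked signal. collect_signal() then trusts that choice without re-checking the bitmask, removes the first queued entry for that signal, and clears the pending bit only if no second entry for the same signal remains. struct model_pending and both functions are hypothetical. The real code works on struct sigpending and a linked list of sigqueue entries.

```c
#include <assert.h>
#include <stdint.h>

#define MAXQ 16

/* Hypothetical model of struct sigpending: a bitmask plus a FIFO of entries. */
struct model_pending {
    uint64_t signal;      /* bit (n - 1) set if signal n is pending */
    int n;                /* number of queued entries */
    int signo[MAXQ];      /* signal number of each entry */
    int value[MAXQ];      /* payload of each entry (stands in for siginfo) */
};

/* Model of next_signal(): lowest pending, unblocked signal, or 0. */
static int model_next_signal(const struct model_pending *p, uint64_t blocked)
{
    uint64_t ready = p->signal & ~blocked;
    for (int sig = 1; sig <= 64; sig++)
        if (ready & (UINT64_C(1) << (sig - 1)))
            return sig;
    return 0;
}

/* Model of collect_signal(): no sigismember() re-check, since the caller
 * just got sig from next_signal(). */
static void model_collect_signal(int sig, struct model_pending *p, int *value)
{
    int first = -1, more = 0;
    for (int i = 0; i < p->n; i++) {
        if (p->signo[i] != sig)
            continue;
        if (first < 0)
            first = i;
        else {
            more = 1;                 /* a second entry keeps the bit set */
            break;
        }
    }
    if (!more)
        p->signal &= ~(UINT64_C(1) << (sig - 1));
    if (first < 0) {                  /* bit was set but nothing was queued */
        *value = 0;
        return;
    }
    *value = p->value[first];
    for (int i = first; i + 1 < p->n; i++) {  /* unlink the entry */
        p->signo[i] = p->signo[i + 1];
        p->value[i] = p->value[i + 1];
    }
    p->n--;
}
```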
     

24 Jul, 2008

1 commit

  • The bug was reported and analysed by Mark McLoughlin;
    the patch is based on his and Roland's suggestions.

    posix_timer_event() always rewrites the pre-allocated siginfo before sending
    the signal. Most of the written info is the same every time, but memset(0)
    is very wrong. If ->sigq is queued, we can race with collect_signal(), which
    can fail to find this siginfo when looking at .si_signo, or copy_siginfo()
    can copy the wrong .si_code/si_tid/etc.

    In short, sys_timer_settime() can in fact stop the active timer, or the user
    can receive the siginfo with the wrong .si_xxx values.

    Move "memset(->info, 0)" from posix_timer_event() to alloc_posix_timer(),
    change send_sigqueue() to set .si_overrun = 0 when ->sigq is not queued.
    It would be nice to move the whole sigq->info initialization from send to
    create path, but this is not easy to do without uglifying timer_create()
    further.

    As Roland rightly pointed out, we need more cleanups/fixes here; see the
    "FIXME" comment in the patch. Hopefully this patch makes sense anyway, and
    it can mask the worst implications.

    Reported-by: Mark McLoughlin
    Signed-off-by: Oleg Nesterov
    Cc: Mark McLoughlin
    Cc: Oliver Pinter
    Cc: Roland McGrath
    Cc: stable@kernel.org
    Cc: Andrew Morton
    Signed-off-by: Thomas Gleixner

    kernel/posix-timers.c | 17 +++++++++++++----
    kernel/signal.c | 1 +
    2 files changed, 14 insertions(+), 4 deletions(-)

    Oleg Nesterov
     

27 May, 2008

2 commits

  • Based on Roland's patch. This approach was suggested by Austin Clements
    from the very beginning, and then by Linus.

    As Austin pointed out, the execing task can be killed by an SI_TIMER signal
    because exec flushes the signal handlers, but doesn't discard the pending
    signals generated by posix timers. Perhaps this is not a bug, but people find it
    surprising. See http://bugzilla.kernel.org/show_bug.cgi?id=10460

    Signed-off-by: Oleg Nesterov
    Cc: Austin Clements
    Cc: Roland McGrath
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Currently sigqueue_free() removes the sigqueue from the list, but doesn't
    cancel the pending signal. This is inconsistent: the task should either
    receive the "full" signal along with siginfo_t, or not receive the signal
    at all.

    Change sigqueue_free() to clear SIGQUEUE_PREALLOC but leave the sigqueue on
    the list if it is queued.

    This is a user-visible change. If the signal is blocked, it stays queued
    after sys_timer_delete() until unblocked, with the "stale" si_code/si_value,
    and of course it still counts against RLIMIT_SIGPENDING, which also limits
    the number of posix timers.

    Signed-off-by: Oleg Nesterov
    Cc: Austin Clements
    Cc: Roland McGrath
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
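The new ownership rule can be sketched like this. sigqueue_free() always drops the timer's ownership (SIGQUEUE_PREALLOC), but if the entry is still queued it is left for the receiving task, which frees it when the signal is dequeued. The struct and both functions are hypothetical stand-ins, and the ->siglock locking that the real code needs around these steps is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a preallocated sigqueue owned by a POSIX timer. */
struct model_sigqueue {
    bool prealloc;  /* models SIGQUEUE_PREALLOC: still owned by the timer */
    bool queued;    /* models !list_empty(&q->list): on a pending list */
};

/* Model of the new sigqueue_free(): returns true if freed immediately. */
static bool model_sigqueue_free(struct model_sigqueue *q)
{
    q->prealloc = false;       /* the timer gives up ownership */
    if (q->queued)
        return false;          /* receiver will free it on dequeue */
    free(q);
    return true;
}

/* Model of the dequeue side: once delivered, a non-preallocated entry is freed. */
static void model_dequeue(struct model_sigqueue *q)
{
    q->queued = false;
    if (!q->prealloc)
        free(q);
}
```

Leaving the entry on the list is what lets the task still receive the "full" signal with its siginfo_t after timer_delete().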
     

25 May, 2008

1 commit

  • __exit_signal() does flush_sigqueue(tsk->pending) outside of ->siglock.
    This can race with another thread doing sigqueue_free(), we can free the
    same SIGQUEUE_PREALLOC sigqueue twice or corrupt the pending->list.

    Note that even sys_exit_group() can trigger this race, not only
    sys_timer_delete().

    Move the callsite of flush_sigqueue(tsk->pending) under ->siglock.

    This patch doesn't touch flush_sigqueue(->shared_pending) below, it is
    called when there are no other threads which can play with signals, and
    sigqueue_free() can't be used outside of our thread group.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

30 Apr, 2008

16 commits

  • This adds the set_restore_sigmask() inline in and
    replaces every set_thread_flag(TIF_RESTORE_SIGMASK) with a call to it. No
    change, but abstracts the details of the flag protocol from all the calls.

    Signed-off-by: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: "Luck, Tony"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • Currently a buggy /sbin/init hangs if SIGSEGV/etc. happens. The kernel sends
    the signal, init dequeues and ignores it, returns from the exception, repeats
    the faulting instruction, and so on forever.

    IMHO, such behaviour is not good. I think that the explicit, loud death of
    a buggy /sbin/init is better than a silent hang.

    Change force_sig_info() to clear SIGNAL_UNKILLABLE when the task should be
    really killed.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
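The following simplified model shows the change to force_sig_info(). A forced signal that the task blocks or ignores is reset to the default action. Once the action is the default one, the "unkillable" protection is dropped, so a faulting init really dies. The structure, the handler encoding and the MODEL_SIGNAL_UNKILLABLE value are hypothetical. The real function also queues the signal and wakes the task, which is omitted here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

#define MODEL_SIGNAL_UNKILLABLE 0x1u   /* hypothetical flag value */

enum handler_kind { H_DEFAULT, H_IGNORE, H_USER };

struct model_task {
    sigset_t blocked;
    enum handler_kind handler[65];     /* indexed by signal number */
    unsigned flags;                    /* models signal->flags */
};

/* Simplified force_sig_info(): blocked or ignored forced signals fall back
 * to the default action; a default action must really kill the task. */
static void model_force_sig(struct model_task *t, int sig)
{
    bool ignored = t->handler[sig] == H_IGNORE;
    bool blocked = sigismember(&t->blocked, sig) == 1;

    if (blocked || ignored) {
        t->handler[sig] = H_DEFAULT;
        if (blocked)
            sigdelset(&t->blocked, sig);
    }
    if (t->handler[sig] == H_DEFAULT)
        t->flags &= ~MODEL_SIGNAL_UNKILLABLE;
}
```

If init has a real handler installed, the flag is left alone and the handler runs as usual.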
     
  • The global init has a lot of long-standing problems with unhandled fatal
    signals.

    - The "is_global_init(current)" check in get_signal_to_deliver()
    protects only the main thread. A sub-thread can dequeue the fatal
    signal and shut down the whole thread group except the main thread.
    If it dequeues SIGSTOP, /sbin/init will be stopped, which is also
    wrong. Note that we can't use is_global_init(->group_leader): this
    breaks exec and can't solve the other problems we have.

    - Even if it is ignored afterwards, a fatal signal sets SIGNAL_GROUP_EXIT
    on delivery. This breaks exec, has other bad implications, and is
    just wrong.

    Introduce the new SIGNAL_UNKILLABLE flag to fix these problems. It also helps
    to solve some other problems addressed by the subsequent patches.

    Currently we use this flag for the global init only, but it could also be used
    by kthreads and (perhaps) by the sub-namespace inits.

    Signed-off-by: Oleg Nesterov
    Acked-by: "Eric W. Biederman"
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
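The difference between the old per-thread check and the new per-process flag can be modelled as follows. Because SIGNAL_UNKILLABLE lives in the signal_struct shared by all threads, a sub-thread that dequeues a default-action signal is covered too. A fatal signal is also meant to stop starting a group exit for an unkillable process. All names here are hypothetical stand-ins, and the real checks involve more conditions than shown.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_SIGNAL_UNKILLABLE 0x1u   /* hypothetical flag value */

/* Hypothetical stand-in for the per-process signal_struct. */
struct model_signal {
    unsigned flags;
    bool group_exit;                   /* models SIGNAL_GROUP_EXIT */
};

/* Hypothetical stand-in for one thread of that process. */
struct model_thread {
    struct model_signal *signal;       /* shared by all threads */
    bool is_main_thread;
};

/* Old rule: only the main thread of the global init was protected. */
static bool old_skips_signal(const struct model_thread *t, bool is_init_process)
{
    return is_init_process && t->is_main_thread;
}

/* New rule: the flag is per-process, so every thread is protected. */
static bool new_skips_signal(const struct model_thread *t)
{
    return (t->signal->flags & MODEL_SIGNAL_UNKILLABLE) && !t->signal->group_exit;
}

/* A fatal signal only starts a group exit if the process is killable. */
static void model_fatal_signal_sent(struct model_signal *s)
{
    if (!(s->flags & MODEL_SIGNAL_UNKILLABLE))
        s->group_exit = true;
}
```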
     
  • Now that task_session() can't return a false NULL, check_kill_permission()
    doesn't need tasklist_lock.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • This wasn't documented, but as Atsushi Tsuji pointed out,
    check_kill_permission() needs tasklist_lock for task_session_nr(). I missed
    this fact when I removed tasklist from the callers.

    Change check_kill_permission() to take tasklist_lock for the SIGCONT case.
    Re-order security checks so that we take tasklist_lock only if/when it is
    actually needed. This is a minimal fix for now; tasklist will be removed
    later.

    Also change the code to use task_session() instead of task_session_nr().

    Also, remove the SIGCONT check from cap_task_kill(); it is bogus (and the
    whole function is bogus. Serge, Eric, why is it still alive?).

    Signed-off-by: Oleg Nesterov
    Acked-by: Atsushi Tsuji
    Cc: Roland McGrath
    Cc: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • send_signal() shouldn't call signalfd_notify() if it then fails with -EAGAIN.
    Harmless, just a paranoid cleanup.

    Also remove the comment. It is obsolete: signalfd_notify() was simplified and
    now does a simple wakeup.

    Signed-off-by: Oleg Nesterov
    Acked-by: Davide Libenzi
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • A couple of small comments about how CLD_CONTINUED notification works.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Rename handle_stop_signal() to prepare_signal(), make it return a boolean, and
    move the callsites of sig_ignored() into it.

    No functional changes for now. But it would be nice to factor out the "should
    we drop this signal" checks as much as possible, before we try to fix the bugs
    with the sub-namespace init's signals (actually the global /sbin/init has some
    problems with signals too).

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Move the callsite of print_fatal_signal() down, under "if
    (sig_kernel_coredump(signr))", so we don't need to check signr != SIGKILL.

    We are only interested in the sig_kernel_coredump() signals anyway, and due to
    the previous changes we can almost never see any fatal signal here other
    than SIGKILL.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • handle_stop_signal() clears SIGNAL_STOP_DEQUEUED when sig == SIGKILL. Remove
    this nasty special case. It was needed to prevent the race with group stop
    and exit caused by a thread-specific SIGKILL. Now that we use complete_signal()
    for private signals too, this is not needed: complete_signal() will notice
    SIGKILL and abort the soon-to-begin group stop.

    Except when the target thread is dead (has PF_EXITING). But in that case we
    should not just clear SIGNAL_STOP_DEQUEUED and do nothing more. We should
    either kill the whole thread group, or silently ignore the signal.

    I suspect we are not right wrt zombie leaders, but this is another issue
    which should be fixed separately. Note that this check can't abort the group
    stop if it was already started/finished; it only adds a subtle side effect
    if we race with a thread which has already dequeued a sig_kernel_stop()
    signal and temporarily released ->siglock.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • We export send_sigqueue() and send_group_sigqueue() for their only user,
    posix_timer_event(). This is a bit silly, because both are just trivial
    helpers on top of do_send_sigqueue() and because we pass the unused
    .si_signo parameter.

    Kill them both, rename do_send_sigqueue() to send_sigqueue(), and export it.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Suggested by Pavel Emelyanov.

    send_sigqueue/send_group_sigqueue differ only in how they lock ->siglock.
    Unify them. send_group_sigqueue() uses spin_lock() because it knows the task
    can't exit, but in that case lock_task_sighand() can't fail and doesn't hurt.

    Note that the "sig" argument is ignored, it is always equal to ->si_signo.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Factor out complete_signal() callsites. This change completely unifies the
    helpers sending the specific/group signals.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Based on Pavel Emelyanov's suggestion.

    Rename __group_complete_signal() to complete_signal() and use it to process
    the specific signals too. To do this we simply add the "int group" argument.

    This allows us to greatly simplify the signal-sending code and adds a useful
    behaviour change. We can avoid unneeded wakeups for private signals,
    because wants_signal() is more clever than sigismember(blocked), but more
    importantly we now take fatal specific signals into account too.

    The latter allows us to kill some subtle checks in handle_stop_signal() and
    makes the behaviour of specific and group signals more consistent. For
    example, currently sigtimedwait(FATAL_SIGNAL) behaves differently depending
    on whether the signal was sent by kill() or tkill(), if the signal was not
    blocked.

    Also, this allows us to tweak/fix the behaviour when a specific signal is
    sent to the dying/dead ->group_leader.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
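Here is a model of the wants_signal() test that complete_signal() now applies to private signals as well. A plain sigismember(blocked) check only rules out threads that block the signal. This test also skips exiting, stopped or traced threads, and avoids waking a thread that is not running and already has a signal pending. SIGKILL is wanted by any thread that is not already exiting. The struct and its fields are hypothetical stand-ins for the task_struct state involved.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

struct model_thread {
    sigset_t blocked;
    bool exiting;            /* models PF_EXITING */
    bool stopped_or_traced;  /* models task_is_stopped_or_traced() */
    bool running;            /* models task_curr(): currently on a CPU */
    bool sigpending;         /* models signal_pending(): already has work */
};

/* Model of wants_signal(): is it worth picking (and waking) this thread? */
static bool model_wants_signal(int sig, const struct model_thread *p)
{
    if (sigismember(&p->blocked, sig))
        return false;
    if (p->exiting)
        return false;
    if (sig == SIGKILL)
        return true;
    if (p->stopped_or_traced)
        return false;
    return p->running || !p->sigpending;
}
```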
     
  • send_signal() is used either with ->pending or with ->signal->shared_pending.
    Change it to take "int group" instead, this argument will be re-used later.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Move the unchanged definition of __group_complete_signal() so that
    send_signal() can see it, to make the next patches easier to read.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov