30 Apr, 2008

40 commits

  • This adds the set_restore_sigmask() inline in and
    replaces every set_thread_flag(TIF_RESTORE_SIGMASK) with a call to it. No
    change, but abstracts the details of the flag protocol from all the calls.

    Signed-off-by: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: "Luck, Tony"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • Currently the buggy /sbin/init hangs if SIGSEGV/etc happens. The kernel sends
    the signal, init dequeues it and ignores, returns from the exception, repeats
    the faulting instruction, and so on forever.

    Imho, such a behaviour is not good. I think that the explicit loud death of
    the buggy /sbin/init is better than the silent hang.

    Change force_sig_info() to clear SIGNAL_UNKILLABLE when the task should be
    really killed.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Now that we rely on SIGNAL_UNKILLABLE flag, de_thread() doesn't need the nasty
    hack to kill the old ->child_reaper during the mt-exec.

    This also means we can avoid taking tasklist_lock around zap_other_threads().

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • The global init has a lot of long standing problems with the unhandled fatal
    signals.

    - The "is_global_init(current)" check in get_signal_to_deliver()
    protects only the main thread. Sub-thread can dequee the fatal
    signal and shutdown the whole thread group except the main thread.
    If it dequeues SIGSTOP /sbin/init will be stopped, this is not
    right too. Note that we can't use is_global_init(->group_leader),
    this breaks exec and this can't solve other problems we have.

    - Even if afterwards ignored, the fatal signals sets SIGNAL_GROUP_EXIT
    on delivery. This breaks exec, has other bad implications, and this
    is just wrong.

    Introduce the new SIGNAL_UNKILLABLE flag to fix these problems. It also helps
    to solve some other problems addressed by the subsequent patches.

    Currently we use this flag for the global init only, but it could also be used
    by kthreads and (perhaps) by the sub-namespace inits.

    Signed-off-by: Oleg Nesterov
    Acked-by: "Eric W. Biederman"
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Now that task_session() can't return a false NULL, check_kill_permission()
    doesn't need tasklist_lock.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • This wasn't documented, but as Atsushi Tsuji pointed out
    check_kill_permission() needs tasklist_lock for task_session_nr(). I missed
    this fact when removed tasklist from the callers.

    Change check_kill_permission() to take tasklist_lock for the SIGCONT case.
    Re-order security checks so that we take tasklist_lock only if/when it is
    actually needed. This is a minimal fix for now, tasklist will be removed
    later.

    Also change the code to use task_session() instead of task_session_nr().

    Also, remove the SIGCONT check from cap_task_kill(), it is bogus (and the
    whole function is bogus. Serge, Eric, why it is still alive?).

    Signed-off-by: Oleg Nesterov
    Acked-by: Atsushi Tsuji
    Cc: Roland McGrath
    Cc: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • send_signal() shouldn't call signalfd_notify() if it then fails with -EAGAIN.
    Harmless, just a paranoid cleanup.

    Also remove the comment. It is obsolete, signalfd_notify() was simplified and
    does a simple wakeup.

    Signed-off-by: Oleg Nesterov
    Acked-by: Davide Libenzi
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • A couple of small comments about how CLD_CONTINUED notification works.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Rename handle_stop_signal() to prepare_signal(), make it return a boolean, and
    move the callsites of sig_ignored() into it.

    No functional changes for now. But it would be nice to factor out the "should
    we drop this signal" checks as much as possible, before we try to fix the bugs
    with the sub-namespace init's signals (actually the global /sbin/init has some
    problems with signals too).

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Move the callsite of print_fatal_signal() down, under "if
    (sig_kernel_coredump(signr))", so we don't need to check signr != SIGKILL.

    We are only interested in the sig_kernel_coredump() signals anyway, and due to
    the previous changes we almost never can see other fatal signals here except
    SIGKILL.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • handle_stop_signal() clears SIGNAL_STOP_DEQUEUED when sig == SIGKILL. Remove
    this nasty special case. It was needed to prevent the race with group stop
    and exit caused by thread-specific SIGKILL. Now that we use complete_signal()
    for private signals too this is not needed, complete_signal() will notice
    SIGKILL and abort the soon-to-begin group stop.

    Except: the target thread is dead (has PF_EXITING). But in that case we
    should not just clear SIGNAL_STOP_DEQUEUED and nothing more. We should either
    kill the whole thread group, or silently ignore the signal.

    I suspect we are not right wrt zombie leaders, but this is another issue which
    and should be fixed separately. Note that this check can't abort the group
    stop if it was already started/finished, this check only adds a subtle side
    effect if we race with the thread which has already dequeued sig_kernel_stop()
    signal and temporary released ->siglock.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • We export send_sigqueue() and send_group_sigqueue() for the only user,
    posix_timer_event(). This is a bit silly, because both are just trivial
    helpers on top of do_send_sigqueue() and because the we pass the unused
    .si_signo parameter.

    Kill them both, rename do_send_sigqueue() to send_sigqueue(), and export it.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Suggested by Pavel Emelyanov.

    send_sigqueue/send_group_sigqueue are only differ in how they lock ->siglock.
    Unify them. send_group_sigqueue() uses spin_lock() because it knows the task
    can't exit, but in that case lock_task_sighand() can't fail and doesn't hurt.

    Note that the "sig" argument is ignored, it is always equal to ->si_signo.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Factor out complete_signal() callsites. This change completely unifies the
    helpers sending the specific/group signals.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Based on Pavel Emelyanov's suggestion.

    Rename __group_complete_signal() to complete_signal() and use it to process
    the specific signals too. To do this we simply add the "int group" argument.

    This allows us to greatly simply the signal-sending code and adds a useful
    behaviour change. We can avoid the unneeded wakeups for the private signals
    because wants_signal() is more clever than sigismember(blocked), but more
    importantly we now take into account the fatal specific signals too.

    The latter allows us to kill some subtle checks in handle_stop_signal() and
    makes the specific/group signal's behaviour more consistent. For example,
    currently sigtimedwait(FATAL_SIGNAL) behaves differently depending on was the
    signal sent by kill() or tkill() if the signal was not blocked.

    And. This allows us to tweak/fix the behaviour when the specific signal is
    sent to the dying/dead ->group_leader.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • send_signal() is used either with ->pending or with ->signal->shared_pending.
    Change it to take "int group" instead, this argument will be re-used later.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Move the unchanged definition of __group_complete_signal() so that send_signal
    can see it. To simplify the reading of the next patches.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Suggested by Roland McGrath.

    Initialize signal->curr_target in copy_signal(). This way ->curr_target is
    never == NULL, we can kill the check in __group_complete_signal's hot path.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • The comment in send_sig_info() is wrong, tasklist_lock can't help.

    The caller must ensure the task can't go away, otherwise ->sighand can be NULL
    even before we take the lock.

    p->sighand could be changed by exec(), but I can't imagine how it is possible
    to prevent exit(), but not exec().

    Since the things seem to work, I assume all callers are correct. However,
    drm_vbl_send_signals() looks broken. block_all_signals() which is solely used
    by drm is definitely broken.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Convert do_tkill() to use rcu_read_lock() + lock_task_sighand() to avoid
    taking tasklist lock.

    Note that we don't return an error if lock_task_sighand() fails, we pretend
    the task dies after receiving the signal. Otherwise, we should fight with the
    nasty races with mt-exec without having any advantage.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Move handle_stop_signal() into send_signal(). This factors out a couple of
    callsites and allows us to do further unifications.

    Also, with this change specific_send_sig_info() does handle_stop_signal().
    Not that this is really important, we never send STOP/CONT via send_sig() and
    friends, but still this looks more consistent.

    The only (afaics) special case is get_signal_to_deliver(). If the traced task
    dequeues SIGCONT, it can re-send it to itself after ptrace_stop() if the
    signal was blocked by debugger. In that case handle_stop_signal() is
    unnecessary, but hopefully not a problem.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • handle_stop_signal() was changed, now send_group_sigqueue() doesn't need
    tasklist_lock.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Cosmetic, cache p->signal to make the code a bit more readable.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • send_group_sigqueue() calls handle_stop_signal(), send_sigqueue() doesn't.
    This is not consistent and in fact I'd say this is (minor) bug.

    Move handle_stop_signal() from send_group_sigqueue() to do_send_sigqueue(),
    the latter is called by send_sigqueue() too.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • lock_task_sighand() was changed, send_sigqueue() doesn't need rcu_read_lock()
    any longer.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Cache the values of current->signal/sighand. Shrinks .text a bit and makes
    the code more readable. Also, remove "sigset_t *mask", it is pointless
    because in fact we save the constant offset.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Cc: Jiri Kosina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Cache the value of p->signal, and change the code to use while_each_thread()
    helper.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Cc: Jiri Kosina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Now that handle_stop_signal() doesn't drop ->siglock, we can't see both
    ->group_stop_count && SIGNAL_STOP_STOPPED. Merge two "if" branches.

    As Roland pointed out, we never actually needed 2 do_notify_parent_cldstop()
    calls.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Cc: Jiri Kosina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Previously handle_stop_signal(SIGCONT) could drop ->siglock. That is why
    kill_pid_info(SIGCONT) takes tasklist_lock to make sure the target task can't
    go away after unlock. Not needed now.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Cc: Jiri Kosina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Based on discussion with Jiri and Roland.

    In short: currently handle_stop_signal(SIGCONT, p) sends the notification to
    p->parent, with this patch p itself notifies its parent when it becomes
    running.

    handle_stop_signal(SIGCONT) has to drop ->siglock temporary in order to notify
    the parent with do_notify_parent_cldstop(). This leads to multiple problems:

    - as Jiri Kosina pointed out, the stopped task can resume without
    actually seeing SIGCONT which may have a handler.

    - we race with another sig_kernel_stop() signal which may come in
    that window.

    - we race with sig_fatal() signals which may set SIGNAL_GROUP_EXIT
    in that window.

    - we can't avoid taking tasklist_lock() while sending SIGCONT.

    With this patch handle_stop_signal() just sets the new SIGNAL_CLD_CONTINUED
    flag in p->signal->flags and returns. The notification is sent by the first
    task which returns from finish_stop() (there should be at least one) or any
    other signalled thread from get_signal_to_deliver().

    This is a user-visible change. Say, currently kill(SIGCONT, stopped_child)
    can't return without seeing SIGCHLD, with this patch SIGCHLD can be delayed
    unpredictably. Another difference is that if the child is ptraced by another
    process, CLD_CONTINUED may be delivered to ->real_parent after ptrace_detach()
    while currently it always goes to the tracer which doesn't actually need this
    notification. Hopefully not a problem.

    The patch asks for the futher obvious cleanups, I'll send them separately.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Cc: Jiri Kosina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Every implementation of ->task_kill() does nothing when the signal comes from
    the kernel. This is correct, but means that check_kill_permission() should
    call security_task_kill() only for SI_FROMUSER() case, and we can remove the
    same check from ->task_kill() implementations.

    (sadly, check_kill_permission() is the last user of signal->session/__session
    but we can't s/task_session_nr/task_session/ here).

    NOTE: Eric W. Biederman pointed out cap_task_kill() should die, and I think
    he is very right.

    Signed-off-by: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Cc: Serge Hallyn
    Cc: Roland McGrath
    Cc: Casey Schaufler
    Cc: David Quigley
    Cc: Eric Paris
    Cc: Harald Welte
    Cc: Pavel Emelyanov
    Cc: Stephen Smalley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Both functions do the same thing after proper locking, but with
    different sigpending structs, so move the common code into a helper.

    After this we have 4 places that look very similar: send_sigqueue: calls
    do_send_sigqueue and signal_wakeup send_group_sigqueue: calls
    do_send_sigqueue and __group_complete_signal __group_send_sig_info:
    calls send_signal and __group_complete_signal specific_send_sig_info:
    calls send_signal and signal_wakeup

    Besides, send_signal performs actions similar to do_send_sigqueue's
    and __group_complete_signal - to signal_wakeup.

    It looks like they can be consolidated gracefully.

    Oleg said:

    Personally, I think this change is very good. But send_sigqueue() and
    send_group_sigqueue() have a very subtle difference which I was never able
    to understand.

    Let's suppose that sigqueue is already queued, and the signal is ignored
    (the latter means we should re-schedule cpu timer or handle overrruns). In
    that case send_sigqueue() returns 0, but send_group_sigqueue() returns 1.

    I think this is not the problem (in fact, I think this patch makes the
    behaviour more correct), but I hope Thomas can take a look and confirm.

    Signed-off-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Roland McGrath
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • The signr variable may be declared without initialization - it is set ro the
    return value from __dequeue_signal() right at the function beginning.

    Besides, after recalc_sigpending() two checks for signr to be not 0 may be
    merged into one. Both if-s become easier to read.

    Thanks to Oleg for pointing out mistakes in the first version of this patch.

    Signed-off-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Roland McGrath
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Both sig_ignored() and do_sigaction() check for signr to be explicitly or
    implicitly ignored. Introduce a helper for them.

    This patch is aimed to help handling signals by pid namespace's init, and was
    derived from one of Oleg's patches
    https://lists.linux-foundation.org/pipermail/containers/2007-December/009308.html
    so, if he doesn't mind, he should be considered as an author.

    Signed-off-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Roland McGrath
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • lock_task_sighand() was changed, and do_task_stat() doesn't need
    rcu_read_lock any longer. sighand->siglock protects all "interesting"
    fields.

    Except: it doesn't protect ->tty->pgrp, but neither does rcu_read_lock(), this
    should be fixed.

    Signed-off-by: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Cc: "Paul E. McKenney"
    Cc: Roland McGrath
    Cc: Alan Cox
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Just a trivial example, more to come.

    k_getrusage() holds rcu_read_lock() because it was previously required by
    lock_task_sighand(). Unneeded now.

    Signed-off-by: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Cc: "Paul E. McKenney"
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Most of the callers of lock_task_sighand() doesn't actually need rcu_lock().
    lock_task_sighand() needs it only to safely play with tsk->sighand, it can
    take the lock itself.

    Signed-off-by: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Cc: "Paul E. McKenney"
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • do_group_exit() checks SIGNAL_GROUP_EXIT to avoid taking sighand->siglock.
    Since ed5d2cac114202fe2978a9cbcab8f5032796d538 exec() doesn't set this
    flag, we should use signal_group_exit().

    This is not needed for correctness, but can speedup the multithreaded exec
    and makes the code more consistent.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • do_signal_stop() needs signal_group_exit() but checks sig->group_exit_task.
    This (optimization) is correct, SIGNAL_STOP_DEQUEUED and SIGNAL_GROUP_EXIT
    are mutually exclusive, but looks confusing. Use signal_group_exit(), this
    is not fastpath, the code clarity is more important.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Two callers for send_signal() - the specific_send_sig_info and the
    __group_send_sig_info - both check for sig to be ignored or already queued.

    Move these checks into send_signal() and make it return 1 to indicate that the
    signal is dropped, but there's no error in this.

    Besides, merge comments and spell-check them.

    [oleg@tv-sign.ru: simplifications]
    Signed-off-by: Pavel Emelyanov
    Cc: Roland McGrath
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov