09 Feb, 2008

40 commits

  • Probing non-ISA interrupts using the handle_percpu_irq as their handle_irq
    method may crash the system because handle_percpu_irq does not check
    IRQ_WAITING. This for example hits the MIPS Qemu configuration.

    This patch provides two helper functions, set_irq_noprobe and set_irq_probe,
    to set and clear the IRQ_NOPROBE flag, respectively. The only current caller
    is MIPS code, but this really belongs in generic code.

    As an aside, interrupt probing these days has become a mostly obsolete, if not
    dangerous, art. I think Linux interrupts should be changed to default to
    non-probing, but that is not the subject of this patch.

    Signed-off-by: Ralf Baechle
    Acked-and-tested-by: Rob Landley
    Cc: Alan Cox
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ralf Baechle
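
    The two helpers described above amount to setting and clearing one status
    bit. A minimal user-space sketch, assuming a hypothetical descriptor type and
    flag value (the kernel's real irq descriptor, bit value and locking are not
    reproduced here):

```c
#include <stdbool.h>

/* Hypothetical flag value; the kernel's real IRQ_NOPROBE bit may differ. */
#define SKETCH_IRQ_NOPROBE 0x1u

/* Hypothetical stand-in for the kernel's per-interrupt descriptor. */
struct irq_desc_sketch {
    unsigned int status;
};

/* Mark the interrupt as not eligible for probing. */
static void sketch_set_irq_noprobe(struct irq_desc_sketch *desc)
{
    desc->status |= SKETCH_IRQ_NOPROBE;
}

/* Make the interrupt eligible for probing again. */
static void sketch_set_irq_probe(struct irq_desc_sketch *desc)
{
    desc->status &= ~SKETCH_IRQ_NOPROBE;
}

/* Probing code would skip any descriptor for which this returns false. */
static bool sketch_can_probe(const struct irq_desc_sketch *desc)
{
    return !(desc->status & SKETCH_IRQ_NOPROBE);
}
```

    The sketch omits locking; real helpers would have to protect the descriptor
    against concurrent updates.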
     
  • Currently, the callers of every sysfs node are responsible for implementing
    the store operation, so many callers duplicate the same input validation and
    share the same mistakes because they call simple_strtol/ul/ll/ull. Module
    params in particular are just numeric, but you can echo values such as
    0x1234xxx, 07777888 and 1234aaa: in these cases the module params store
    operation ignores the trailing invalid characters and converts the valid
    prefix to a number, although the input is actually invalid.

    This patch tries to fix the aforementioned issues by implementing the
    strict_strtox family of functions. kernel/params.c uses them to strictly
    validate input, so module params will reject values such as 0x1234xxxx and
    return an error:

    write error: Invalid argument

    Any module which exports a numeric sysfs node can use strict_strtox instead of
    simple_strtox to reject invalid input.

    Here are some test results:

    Before applying this patch:

    [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
    4096
    [root@yangyi-dev /]# echo 0x1000 > /sys/module/e1000/parameters/copybreak
    [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
    4096
    [root@yangyi-dev /]# echo 0x1000g > /sys/module/e1000/parameters/copybreak
    [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
    4096
    [root@yangyi-dev /]# echo 0x1000gggggggg > /sys/module/e1000/parameters/copybreak
    [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
    4096
    [root@yangyi-dev /]# echo 010000 > /sys/module/e1000/parameters/copybreak
    [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
    4096
    [root@yangyi-dev /]# echo 0100008 > /sys/module/e1000/parameters/copybreak
    [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
    4096
    [root@yangyi-dev /]# echo 010000aaaaa > /sys/module/e1000/parameters/copybreak
    [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
    4096
    [root@yangyi-dev /]#

    After applying this patch:

    [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
    4096
    [root@yangyi-dev /]# echo 0x1000 > /sys/module/e1000/parameters/copybreak
    [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
    4096
    [root@yangyi-dev /]# echo 0x1000g > /sys/module/e1000/parameters/copybreak
    -bash: echo: write error: Invalid argument
    [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
    4096
    [root@yangyi-dev /]# echo 0x1000gggggggg > /sys/module/e1000/parameters/copybreak
    -bash: echo: write error: Invalid argument
    [root@yangyi-dev /]# echo 010000 > /sys/module/e1000/parameters/copybreak
    [root@yangyi-dev /]# echo 0100008 > /sys/module/e1000/parameters/copybreak
    -bash: echo: write error: Invalid argument
    [root@yangyi-dev /]# echo 010000aaaaa > /sys/module/e1000/parameters/copybreak
    -bash: echo: write error: Invalid argument
    [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
    4096
    [root@yangyi-dev /]# echo -n 4096 > /sys/module/e1000/parameters/copybreak
    [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
    4096
    [root@yangyi-dev /]#

    [akpm@linux-foundation.org: fix compiler warnings]
    [akpm@linux-foundation.org: fix off-by-one found by tiwai@suse.de]
    Signed-off-by: Yi Yang
    Cc: Greg KH
    Cc: "Randy.Dunlap"
    Cc: Takashi Iwai
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yi Yang
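
    The difference between lenient and strict parsing shown in the transcripts
    above can be sketched in user space with strtoul(). This is an illustration
    only: parse_ulong_strict() is a hypothetical name, not the kernel's
    strict_strtoul(), and the exact error codes are assumptions.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical user-space sketch of strict numeric parsing.
 * Returns 0 and stores the value on success, -EINVAL on malformed input,
 * -ERANGE on overflow. With base 0, a 0x prefix means hex and a leading 0
 * means octal, as with strtoul().
 */
static int parse_ulong_strict(const char *s, int base, unsigned long *out)
{
    char *end;
    unsigned long val;

    if (s == NULL || *s == '\0')
        return -EINVAL;
    /* strtoul() alone would silently accept leading whitespace and signs. */
    if (isspace((unsigned char)*s) || *s == '+' || *s == '-')
        return -EINVAL;

    errno = 0;
    val = strtoul(s, &end, base);
    if (end == s)
        return -EINVAL;         /* no digits were consumed */
    if (errno == ERANGE)
        return -ERANGE;         /* value does not fit */
    if (*end == '\n')
        end++;                  /* tolerate the newline that echo appends */
    if (*end != '\0')
        return -EINVAL;         /* trailing garbage such as "g" or "8" */

    *out = val;
    return 0;
}
```

    With this sketch, "0x1000" and "010000" both parse to 4096, while "0x1000g"
    and "0100008" are rejected instead of being truncated to their valid prefix.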
     
  • Fix the following warning:
    WARNING: o-x86_64/kernel/built-in.o(.text+0x36d8b): Section mismatch in reference from the function enable_nonboot_cpus() to the function .cpuinit.text:_cpu_up()

    enable_nonboot_cpus() is used solely with CONFIG_PM_SLEEP_SMP=y,
    and PM_SLEEP_SMP implies HOTPLUG_CPU, therefore the reference
    to _cpu_up() is valid.
    Annotate enable_nonboot_cpus() with __ref to silence modpost.

    Signed-off-by: Sam Ravnborg
    Cc: Gautham R Shenoy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sam Ravnborg
     
  • The proper way to store a task's pid and look the task up later is to keep the
    struct pid pointer and get the task with the pid_task() call.

    Do this for the rt_mutex_waiter->deadlock_task_pid field.

    Signed-off-by: Pavel Emelyanov
    Cc: "Eric W. Biederman"
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • All the functions in posix timers that need to look up a task by pid obtain
    this pid from user space, and thus the value refers to a task in the same
    namespace as the current task.

    So the proper behavior is to call find_task_by_vpid() here.

    Signed-off-by: Pavel Emelyanov
    Cc: "Eric W. Biederman"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • When the conversion factor between jiffies and milli- or microseconds is
    not a single multiply or divide, as in the case of HZ == 300, we currently
    do a multiply followed by a divide. The intermediate result, however, is
    subject to overflows, especially since the fraction is not simplified (for
    HZ == 300, we multiply by 300 and divide by 1000).

    This is exposed to the user when passing a large timeout to poll(), for
    example.

    This patch replaces the multiply-divide with a reciprocal multiplication on
    32-bit platforms. When the input is an unsigned long, there is no portable
    way to do this on 64-bit platforms, since it requires a 128-bit intermediate
    result (which gcc does support on 64-bit platforms but may generate libgcc
    calls, e.g. on 64-bit s390). But since the output is a 32-bit integer in the
    cases affected, just simplify the multiply-divide (*3/10 instead of
    *300/1000).

    The reciprocal multiply used can have off-by-one errors in the upper half
    of the valid output range. This could be avoided at the expense of having
    to deal with a potential 65-bit intermediate result. Since the intent is
    to avoid overflow problems and most of the other time conversions are only
    semiexact, the off-by-one errors were considered an acceptable tradeoff.

    At Ralf Baechle's suggestion, this version uses a Perl script to compute
    the necessary constants. We already have dependencies on Perl for kernel
    compiles. This does, however, require the Perl module Math::BigInt, which
    is included in the standard Perl distribution starting with version 5.8.0.
    In order to support older versions of Perl, include a table of canned
    constants in the script itself, and structure the script so that
    Math::BigInt isn't required if pulling values from said table.

    Running the script requires that the HZ value is available from the
    Makefile. Thus, this patch also adds the Kconfig variable CONFIG_HZ to the
    architectures which didn't already have it (alpha, cris, frv, h8300, m32r,
    m68k, m68knommu, sparc, v850, and xtensa.) It does *not* touch the sh or
    sh64 architectures, since Paul Mundt has dealt with those separately in the
    sh tree.

    Signed-off-by: H. Peter Anvin
    Cc: Ralf Baechle
    Cc: Sam Ravnborg
    Cc: Paul Mundt
    Cc: Richard Henderson
    Cc: Michael Starvik
    Cc: David Howells
    Cc: Yoshinori Sato
    Cc: Hirokazu Takata
    Cc: Geert Uytterhoeven
    Cc: Roman Zippel
    Cc: William L. Irwin
    Cc: Chris Zankel
    Cc: H. Peter Anvin
    Cc: Jan Engelhardt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    H. Peter Anvin
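
    The overflow and the two remedies described above can be illustrated for the
    HZ == 300 case with 32-bit arithmetic. This is a sketch under stated
    assumptions: the function names and the constant SKETCH_MUL32 are
    hypothetical, the conversions truncate, and the constants actually generated
    by the Perl script (and the kernel's rounding) are not reproduced here.

```c
#include <stdint.h>

#define SKETCH_HZ 300u  /* the HZ == 300 case from the text */

/* Multiply by 300, then divide by 1000: the product wraps in 32 bits. */
static uint32_t msecs_to_ticks_naive(uint32_t m)
{
    return (uint32_t)(m * SKETCH_HZ) / 1000u;
}

/* Simplified fraction 3/10: the product overflows only for far larger inputs. */
static uint32_t msecs_to_ticks_simplified(uint32_t m)
{
    return (m * 3u) / 10u;
}

/* ceil(2^32 * 3 / 10): a fixed-point approximation of 300/1000. */
#define SKETCH_MUL32 1288490189ull

/* Reciprocal multiplication: one 32x32->64 multiply and a shift, no divide. */
static uint32_t msecs_to_ticks_recip(uint32_t m)
{
    return (uint32_t)(((uint64_t)m * SKETCH_MUL32) >> 32);
}
```

    For an input of 20000000 ms, the naive product 6000000000 exceeds 2^32 and
    wraps, while the simplified and reciprocal forms both give 6000000. Because
    the multiplier is rounded up, the reciprocal form can be off by one for
    inputs of 2^31 and above, consistent with the tradeoff described above.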
     
  • Makes an embedded image a bit smaller.

    Signed-off-by: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • delayed_work_timer_fn() is a timer function, make it static.

    Signed-off-by: Li Zefan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • The scheduled removal of the 'time' option.

    Signed-off-by: Adrian Bunk
    Acked-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • Don't include linux/security.h twice in kernel/sysctl.c

    Signed-off-by: Jesper Juhl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesper Juhl
     
  • Remove duplicate inclusion of linux/profile.h from kernel/profile.c

    Signed-off-by: Jesper Juhl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesper Juhl
     
  • Remove the duplicate inclusion of linux/jiffies.h from kernel/printk.c

    Signed-off-by: Jesper Juhl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesper Juhl
     
  • The question remains whether it is intended that many, perhaps even large,
    tables are compiled in without ever having a chance to get used, i.e.
    whether #ifdef CONFIG_xxx guards should be added.

    [akpm@linux-foundation.org: fix cut-n-paste error]
    Signed-off-by: Jan Beulich
    Acked-by: "Eric W. Biederman"
    Cc: Dave Jones
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     
  • [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Harvey Harrison
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Harvey Harrison
     
  • Fix typo in comments.

    BTW: I also have to fix the coding style in arch/ia64/kernel/time.c,
    otherwise checkpatch.pl will complain.

    Signed-off-by: Li Zefan
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • The function timekeeping_is_continuous() no longer checks the
    CLOCK_IS_CONTINUOUS flag; it now checks CLOCK_SOURCE_VALID_FOR_HRES. So rename
    the function accordingly.

    Signed-off-by: Li Zefan
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • list_for_each_safe() suffices here.

    Signed-off-by: Li Zefan
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • The CLOCK_SOURCE_WATCHDOG flag is cleared twice. Note that
    clocksource_change_rating() won't do anything with the cs flag.

    Signed-off-by: Li Zefan
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • There's only one caller left - the kill_pgrp one - so merge these two
    functions and forget the kill_pgrp_info one.

    Signed-off-by: Pavel Emelyanov
    Reviewed-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • This is the first step (of two) in removing the kill_pgrp_info.

    All the users of this function are in kernel/signal.c, but all they need is to
    call __kill_pgrp_info() with the tasklist_lock read-locked.

    Fortunately, one of its users is the kill_something_info(), which already
    needs this lock in one of its branches, so clean these branches up and call
    the __kill_pgrp_info() directly.

    Based on Oleg's view of how this function should look.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Some time ago the xxx_vnr() calls (e.g. pid_vnr or find_task_by_vpid) were
    _all_ converted to operate on the current pid namespace. After this each call
    like xxx_nr_ns(foo, current->nsproxy->pid_ns) is nothing but a xxx_vnr(foo)
    one.

    Switch all the xxx_nr_ns() callers to use the xxx_vnr() calls where
    appropriate.

    Signed-off-by: Pavel Emelyanov
    Reviewed-by: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • signal_struct->tsk points to the ->group_leader and thus we have the nasty
    code in de_thread() which has to change it and restart ->real_timer if the
    leader is changed.

    Use "struct pid *leader_pid" instead. This also allows us to kill the now
    unneeded send_group_sig_info().

    Signed-off-by: Oleg Nesterov
    Acked-by: "Eric W. Biederman"
    Cc: Davide Libenzi
    Cc: Pavel Emelyanov
    Acked-by: Roland McGrath
    Acked-by: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • kill_pid_info()->pid_task() could be the old leader of the execing process.
    In that case it is possible that the leader will be released before we take
    siglock. This means that kill_pid_info() (and thus sys_kill()) can return a
    false -ESRCH.

    Change the code to retry when lock_task_sighand() fails. An endless loop is
    not possible: __exit_signal() both clears ->sighand and does detach_pid().

    Signed-off-by: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Cc: Davide Libenzi
    Cc: Pavel Emelyanov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • With the new semantics of find_vpid() we don't need to play with ->nsproxy
    explicitly, the _vxx() helpers do the right thing.

    Also s/tasklist/rcu/.

    Signed-off-by: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • pid_vnr returns the user space pid with respect to the pid namespace the
    struct pid was allocated in. What we want before we return a pid to user
    space is the user space pid with respect to the pid namespace of current.

    pid_vnr is a very nice optimization but because it isn't quite what we want
    it is easy to use pid_vnr at times when we aren't certain the struct pid
    was allocated in our pid namespace.

    Currently this describes at least tiocgpgrp and tiocgsid in ttyio.c, the
    parent process reported in the core dumps, and the parent process in
    get_signal_to_deliver.

    So unless the performance impact is huge, having an interface that always
    does what we want, instead of only usually, should be much more reliable
    and much less error prone.

    Signed-off-by: Eric W. Biederman
    Cc: Oleg Nesterov
    Acked-by: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • This modifies do_wait and eligible_child to take a pair of enum pid_type
    and struct pid *pid to precisely specify what set of processes are eligible
    to be waited for, instead of the raw pid_t value from sys_wait4.

    This fixes a bug in sys_waitid where you could not wait for children in
    just process group 1.

    This fixes a pid namespace crossing case in eligible_child, allowing us to
    wait for processes in our current process group even if our current
    process group == 0.

    This allows the no child with this pid case to be optimized. It also allows
    the pid membership test in eligible_child to be optimized.

    This even closes a theoretical pid wraparound race: in a threaded parent,
    if two threads are waiting for the same child, one thread picks up the
    child, and the pid numbers wrap around and generate another child with
    that same pid before the other thread is scheduled (terribly, insanely
    unlikely), we could end up waiting on the second child with the same pid#
    and not discover that the specific child we were waiting for has exited.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Eric W. Biederman
    Cc: Oleg Nesterov
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
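
    The idea of turning the raw pid_t argument into a (type, id) pair once, up
    front, can be sketched as follows. The mapping follows the documented
    wait4()/waitpid() argument convention; the enum, struct and function names
    are hypothetical stand-ins, and the sketch keeps plain numbers where the
    patch resolves a struct pid.

```c
#include <sys/types.h>

/* Hypothetical stand-ins; not the kernel's enum pid_type or struct pid. */
enum sketch_pid_type { SKETCH_PID, SKETCH_PGID, SKETCH_ANY };

struct sketch_wait_target {
    enum sketch_pid_type type;
    pid_t nr;   /* id to resolve; unused for SKETCH_ANY */
};

/* Classify the wait4()-style pid argument exactly once. */
static struct sketch_wait_target
sketch_classify_wait(pid_t upid, pid_t caller_pgrp)
{
    struct sketch_wait_target t;

    if (upid == -1) {           /* any child */
        t.type = SKETCH_ANY;
        t.nr = 0;
    } else if (upid < 0) {      /* any child in process group -upid */
        t.type = SKETCH_PGID;
        t.nr = -upid;
    } else if (upid == 0) {     /* any child in the caller's process group */
        t.type = SKETCH_PGID;
        t.nr = caller_pgrp;
    } else {                    /* one specific child */
        t.type = SKETCH_PID;
        t.nr = upid;
    }
    return t;
}
```

    Once the request is classified this way, the eligibility test no longer has
    to reinterpret a signed number for every candidate child.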
     
  • The previous bugfix was not optimal, we shouldn't care about group stop
    when we are the only thread or the group stop is in progress. In that case
    nothing special is needed, just set PF_EXITING and return.

    Also, take the related "TIF_SIGPENDING re-targeting" code from exit_notify().

    So, from the performance POV the only difference is that we don't trust
    !signal_pending() until we take ->siglock. But this in fact fixes another
    ___pure___ theoretical minor race. __group_complete_signal() finds the
    task without PF_EXITING and chooses it as the target for signal_wake_up().
    But nothing prevents this task from exiting in between without noticing the
    pending signal and thus unpredictably delaying the actual delivery.

    Signed-off-by: Oleg Nesterov
    Cc: Davide Libenzi
    Cc: Ingo Molnar
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Eric's "fix clone(CLONE_NEWPID)" eliminated the last reason for this hack.

    Signed-off-by: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • do_signal_stop() counts all sub-threads and sets ->group_stop_count
    accordingly. Every thread should decrement ->group_stop_count and stop,
    and the last one should notify the parent.

    However a sub-thread can exit before it notices the signal_pending(), or it
    may be somewhere in do_exit() already. In that case the group stop never
    finishes properly.

    Note: this is a minimal fix, we can add some optimizations later. Say we
    can return quickly if thread_group_empty(). Also, we can move some signal
    related code from exit_notify() to exit_signals().

    Signed-off-by: Oleg Nesterov
    Acked-by: Davide Libenzi
    Cc: Ingo Molnar
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • As Eric pointed out, there is no problem with init starting with sid == pgid
    == 0, and this was the historical Linux behavior, changed in 2.6.18.

    Remove kernel_init()->__set_special_pids(), this is unneeded and complicates
    the rules for sys_setsid().

    This change and the previous change in daemonize() mean that /sbin/init does
    not need the special "session != 1" hack in sys_setsid() any longer. We can't
    remove this check yet, we should clean up copy_process(CLONE_NEWPID) first, so
    update the comment only.

    Signed-off-by: Oleg Nesterov
    Acked-by: "Eric W. Biederman"
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Daemonized kernel threads run in the init's session. This doesn't match the
    behaviour of kthread_create()'ed threads, and this is one of the 2 reasons
    why we need a special hack in sys_setsid().

    Now that set_special_pids() was changed to use struct pid, not pid_t, we can
    use init_struct_pid and set 0,0 special pids.

    Signed-off-by: Oleg Nesterov
    Acked-by: "Eric W. Biederman"
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Change set_special_pids() to work with struct pid, not pid_t from the global
    namespace. This again speeds up and, imho, cleans up the code, and is also
    preparation for the next patch.

    Signed-off-by: Oleg Nesterov
    Acked-by: "Eric W. Biederman"
    Acked-by: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • sys_setsid() still deals with pid_t's from the global namespace. This means
    that the "session > 1" check can't help for sub-namespace init, setsid() can't
    succeed because copy_process(CLONE_NEWPID) populates PIDTYPE_PGID/SID links.

    Remove the usage of task_struct->pid and convert the code to use "struct pid".
    This also simplifies and speeds up the code, and saves one find_pid().

    Signed-off-by: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Acked-by: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • sys_setpgid() does unneeded conversions from pid_t to "struct pid" and vice
    versa. Use "struct pid" more consistently. This saves one find_vpid() and
    eliminates the explicit usage of ->nsproxy->pid_ns. Imho, it cleans up the
    code.

    Also use the same_thread_group() helper.

    Signed-off-by: Oleg Nesterov
    Acked-by: Pavel Emelyanov
    Acked-by: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • The first "p->exit_state != EXIT_ZOMBIE" check doesn't make too much sense.
    The exit_state was EXIT_ZOMBIE when the function was called, and another
    thread can change it to EXIT_DEAD right after the check.

    The second condition is not possible, detached non-traced threads were already
    filtered out by eligible_child(), we didn't drop tasklist since then.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Surprise, the other two wait_task_*() functions also abuse the
    task_pid_nr_ns() function, and may cause read-after-free or report nr == 0
    in wait_task_continued(). wait_task_zombie() doesn't have this problem,
    but it is still better to cache pid_t rather than call task_pid_nr_ns()
    three times on the saved pid_namespace.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Imho, the current usage of security_task_wait() is not logical.

    Suppose we have a single child p, and security_task_wait(p) returns
    -EANY. In that case waitpid(-1) returns this error. Why? Isn't it
    better to return ECHILD? We don't really have reapable children.

    Now suppose that child was stolen by gdb. In that case we find this
    child on ->ptrace_children and set flag = 1, but we don't check that the
    child was denied. So do_wait(..., WNOHANG) returns 0, which doesn't
    match the behaviour above. Without WNOHANG, do_wait() blocks only to
    return the error later, when the child is untraced. Imho, really
    strange.

    I think eligible_child() should return the error only if the child's pid
    was requested explicitly, otherwise we should silently ignore the tasks
    which were nacked by security_task_wait().

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Cc: Chris Wright
    Cc: Eric Paris
    Cc: James Morris
    Cc: Stephen Smalley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • eligible_child() == 2 means delay_group_leader(). With the previous patch
    this only matters for EXIT_ZOMBIE task, we can move that special check to
    the only place it is really needed.

    Also, with this patch we don't skip security_task_wait() for the group
    leaders in a non-empty thread group. I don't really understand the exact
    semantics of security_task_wait(), but imho this change is a bugfix.

    Also rearrange the code a bit to kill an ugly "check_continued" backdoor.

    Signed-off-by: Oleg Nesterov
    Cc: Eric Paris
    Cc: James Morris
    Cc: Roland McGrath
    Cc: Stephen Smalley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • wait_task_stopped() doesn't need the "delay_group_leader" parameter. If
    the child is not traced it must be a group leader. With or without
    subthreads ->group_stop_count == 0 when the whole task is stopped.

    Signed-off-by: Oleg Nesterov
    Cc: Mika Penttila
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • If the tracer is gone and we are not going to stop, ptrace_stop() sets
    ->exit_code = nostop_code. However, the tracer could actually clear the
    exit code before detaching. In that case get_signal_to_deliver() "resends"
    the signal which was cancelled by the debugger. For example, it is
    possible that a quick PTRACE_ATTACH + PTRACE_DETACH can leave the tracee in
    STOPPED state.

    Change the behaviour of ptrace_stop(). If the caller is ptrace_notify(),
    we should always clear ->exit_code. If the caller is
    get_signal_to_deliver(), we should not touch it at all. To do so, change
    the nostop_code parameter to "bool clear_code" and change the callers
    accordingly.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov