09 Feb, 2008

40 commits

  • do_signal_stop() counts all sub-thread and sets ->group_stop_count
    accordingly. Every thread should decrement ->group_stop_count and stop,
    the last one should notify the parent.

    However a sub-thread can exit before it notices the signal_pending(), or it
    may be somewhere in do_exit() already. In that case the group stop never
    finishes properly.

    Note: this is a minimal fix, we can add some optimizations later. Say we
    can return quickly if thread_group_empty(). Also, we can move some signal
    related code from exit_notify() to exit_signals().

    Signed-off-by: Oleg Nesterov
    Acked-by: Davide Libenzi
    Cc: Ingo Molnar
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • As Eric pointed out, there is no problem with init starting with sid == pgid
    == 0, and this was historical linux behavior changed in 2.6.18.

    Remove kernel_init()->__set_special_pids(), this is unneeded and complicates
    the rules for sys_setsid().

    This change and the previous change in daemonize() mean that /sbin/init does
    not need the special "session != 1" hack in sys_setsid() any longer. We can't
    remove this check yet, we should cleanup copy_process(CLONE_NEWPID) first, so
    update the comment only.

    Signed-off-by: Oleg Nesterov
    Acked-by: "Eric W. Biederman"
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Daemonized kernel threads run in the init's session. This doesn't match the
    behaviour of kthread_create()'ed threads, and this is one of the 2 reasons
    why we need a special hack in sys_setsid().

    Now that set_special_pids() was changed to use struct pid, not pid_t, we can
    use init_struct_pid and set 0,0 special pids.

    Signed-off-by: Oleg Nesterov
    Acked-by: "Eric W. Biederman"
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Change set_special_pids() to work with struct pid, not pid_t from global name
    space. This again speedups and imho cleanups the code, also a preparation for
    the next patch.

    Signed-off-by: Oleg Nesterov
    Acked-by: "Eric W. Biederman"
    Acked-by: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • sys_setsid() still deals with pid_t's from the global namespace. This means
    that the "session > 1" check can't help for sub-namespace init, setsid() can't
    succeed because copy_process(CLONE_NEWPID) populates PIDTYPE_PGID/SID links.

    Remove the usage of task_struct->pid and convert the code to use "struct pid".
    This also simplifies and speedups the code, saves one find_pid().

    Signed-off-by: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Acked-by: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • sys_setpgid() does unneeded conversions from pid_t to "struct pid" and vice
    versa. Use "struct pid" more consistently. Saves one find_vpid() and
    eliminates the explicit usage of ->nsproxy->pid_ns. Imho, cleanups the
    code.

    Also use the same_thread_group() helper.

    Signed-off-by: Oleg Nesterov
    Acked-by: Pavel Emelyanov
    Acked-by: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • The first "p->exit_state != EXIT_ZOMBIE" check doesn't make too much sense.
    The exit_state was EXIT_ZOMBIE when the function was called, and another
    thread can change it to EXIT_DEAD right after the check.

    The second condition is not possible, detached non-traced threads were already
    filtered out by eligible_child(), we didn't drop tasklist since then.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Surprise, the other two wait_task_*() functions also abuse the
    task_pid_nr_ns() function, and may cause read-after-free or report nr == 0
    in wait_task_continued(). wait_task_zombie() doesn't have this problem,
    but it is still better to cache pid_t rather than call task_pid_nr_ns()
    three times on the saved pid_namespace.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Imho, the current usage of security_task_wait() is not logical.

    Suppose we have the single child p, and security_task_wait(p) return
    -EANY. In that case waitpid(-1) returns this error. Why? Isn't it
    better to return ECHLD? We don't really have reapable children.

    Now suppose that child was stolen by gdb. In that case we find this
    child on ->ptrace_children and set flag = 1, but we don't check that the
    child was denied. So, do_wait(..., WNOHANG) returns 0, this doesn't
    match the behaviour above. Without WNOHANG do_wait() blocks only to
    return the error later, when the child will be untraced. Inho, really
    strange.

    I think eligible_child() should return the error only if the child's pid
    was requested explicitly, otherwise we should silently ignore the tasks
    which were nacked by security_task_wait().

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Cc: Chris Wright
    Cc: Eric Paris
    Cc: James Morris
    Cc: Stephen Smalley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • eligible_child() == 2 means delay_group_leader(). With the previous patch
    this only matters for EXIT_ZOMBIE task, we can move that special check to
    the only place it is really needed.

    Also, with this patch we don't skip security_task_wait() for the group
    leaders in a non-empty thread group. I don't really understand the exact
    semantics of security_task_wait(), but imho this change is a bugfix.

    Also rearrange the code a bit to kill an ugly "check_continued" backdoor.

    Signed-off-by: Oleg Nesterov
    Cc: Eric Paris
    Cc: James Morris
    Cc: Roland McGrath
    Cc: Stephen Smalley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • wait_task_stopped() doesn't need the "delay_group_leader" parameter. If
    the child is not traced it must be a group leader. With or without
    subthreads ->group_stop_count == 0 when the whole task is stopped.

    Signed-off-by: Oleg Nesterov
    Cc: Mika Penttila
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • If the tracer is gone and we are not going to stop, ptrace_stop() sets
    ->exit_code = nostop_code. However, the tracer could actually clear the
    exit code before detaching. In that case get_signal_to_deliver() "resends"
    the signal which was cancelled by the debugger. For example, it is
    possible that a quick PTRACE_ATTACH + PTRACE_DETACH can leave the tracee in
    STOPPED state.

    Change the behaviour of ptrace_stop(). If the caller is ptrace notify(),
    we should always clear ->exit_code. If the caller is
    get_signal_to_deliver(), we should not touch it at all. To do so, change
    the nonstop_code parameter to "bool clear_code" and change the callers
    accordingly.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Every branch if the main "if" statement does the same code at the end. Move
    it down. Also, fix the indentation.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • wait_task_stopped() has multiple races with SIGCONT/SIGKILL. tasklist_lock
    does not pin the child in TASK_TRACED/TASK_STOPPED stated, almost all info
    reported (including exit_code) may be wrong.

    In fact, the code under write_lock_irq(tasklist_lock) is not safe. The child
    may be PTRACE_DETACH'ed at this time by another subthread, in that case it is
    possible we are no longer its ->parent.

    Change wait_task_stopped() to take ->siglock before inspecting the task. This
    guarantees that the child can't resume and (for example) clear its
    ->exit_code, so we don't need to use xchg(&p->exit_code) and re-check. The
    only exception is ptrace_stop() which changes ->state and ->exit_code without
    ->siglock held during abort. But this can only happen if both the tracer and
    the tracee are dying (coredump is in progress), we don't care.

    With this patch wait_task_stopped() doesn't move the child to the end of
    the ->parent list on success. This optimization could be restored, but
    in that case we have to take write_lock(tasklist) and do some nasty
    checks.

    Also change the do_wait() since we don't return EAGAIN any longer.

    [akpm@linux-foundation.org: fix up after Willy renamed everything]
    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • If the tracer went away (may_ptrace_stop() failed), ptrace_stop() drops
    tasklist and then changes the ->state from TASK_TRACED to TASK_RUNNING.

    This can fool another tracer which attaches to us in between. Change the
    ->state under tasklist_lock to ensure that ptrace_check_attach() can't wrongly
    succeed. Also, remove the unnecessary mb().

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • It is not possible to see the PT_PTRACED task without ->signal/sighand under
    tasklist_lock, release_task() does ptrace_unlink() first. If the task was
    already released before, ptrace_attach() can't succeed and set PT_PTRACED.
    Remove this check.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Now that my_ptrace_child() is trivial we can use the "p->ptrace & PT_PTRACED"
    inline and simplify the corresponding logic in do_wait: we can't find the
    child in TASK_TRACED state without PT_PTRACED flag set, ptrace_untrace()
    either sets TASK_STOPPED or wakes up the tracee.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Since the patch

    "Fix ptrace_attach()/ptrace_traceme()/de_thread() race"
    commit f5b40e363ad6041a96e3da32281d8faa191597b9

    we set PT_ATTACHED and change child->parent "atomically" wrt task_list lock.

    This means we can remove the checks like "PT_ATTACHED && ->parent != ptracer"
    which were needed to catch the "ptrace attach is in progress" case. We can
    also remove the flag itself since nobody else uses it.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • sem_exit_ns(), msg_exit_ns() and shm_exit_ns() are all called when an
    ipc_namespace is released to free all ipcs of each type. But in fact, they
    do the same thing: they loop around all ipcs to free them individually by
    calling a specific routine.

    This patch proposes to consolidate this by introducing a common function,
    free_ipcs(), that do the job. The specific routine to call on each
    individual ipcs is passed as parameter. For this, these ipc-specific
    'free' routines are reworked to take a generic 'struct ipc_perm' as
    parameter.

    Signed-off-by: Pierre Peiffer
    Cc: Cedric Le Goater
    Cc: Pavel Emelyanov
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • Each ipc_namespace contains a table of 3 pointers to struct ipc_ids (3 for
    msg, sem and shm, structure used to store all ipcs) These 'struct ipc_ids'
    are dynamically allocated for each icp_namespace as the ipc_namespace
    itself (for the init namespace, they are initialized with pointers to
    static variables instead)

    It is so for historical reason: in fact, before the use of idr to store the
    ipcs, the ipcs were stored in tables of variable length, depending of the
    maximum number of ipc allowed. Now, these 'struct ipc_ids' have a fixed
    size. As they are allocated in any cases for each new ipc_namespace, there
    is no gain of memory in having them allocated separately of the struct
    ipc_namespace.

    This patch proposes to make this table static in the struct ipc_namespace.
    Thus, we can allocate all in once and get rid of all the code needed to
    allocate and free these ipc_ids separately.

    Signed-off-by: Pierre Peiffer
    Acked-by: Cedric Le Goater
    Cc: Pavel Emelyanov
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • These commands (SEM_STAT and IPC_STAT) are rather doing the same things
    (only the meaning of the id given as input and the return value differ).
    However, for the semaphores, they are handled in two different places (two
    different functions).

    This patch consolidates this for clarification by handling these both
    commands in the same place in semctl_nolock(). It also removes one unused
    parameter for this function.

    Signed-off-by: Pierre Peiffer
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • ipc_lock_check_down(), ipc_lock_check() and ipcget() seem too large to be
    inline. Besides, they give no optimization being inline as they perform
    calls inside in any case.

    Moving them into ipc/util.c saves 500 bytes of vmlinux and shortens IPC
    internal API.

    $ ./scripts/bloat-o-meter vmlinux-orig vmlinux
    add/remove: 3/2 grow/shrink: 0/10 up/down: 490/-989 (-499)
    function old new delta
    ipcget - 392 +392
    ipc_lock_check_down - 49 +49
    ipc_lock_check - 49 +49
    sys_semget 119 105 -14
    sys_shmget 108 86 -22
    sys_msgget 100 78 -22
    do_msgsnd 665 631 -34
    do_msgrcv 680 644 -36
    do_shmat 771 733 -38
    sys_msgctl 1302 1229 -73
    ipcget_new 80 - -80
    sys_semtimedop 1534 1452 -82
    sys_semctl 2034 1922 -112
    sys_shmctl 1919 1765 -154
    ipcget_public 322 - -322

    The ipcget() growth is the result of gcc inlining of currently static
    ipcget_new/_public.

    Signed-off-by: Pavel Emelyanov
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Also removes a cflag comparison that caused some mode changes to get wrongly
    ignored

    Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Signed-off-by: Alan Cox
    Cc: Jiri Slaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Get the constness right, avoid nasty cast.

    Cc: Ingo Molnar
    Cc: Kyle McMartin
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • module.c should not define linker variables on its own. We have an include
    file for that.

    Signed-off-by: Christoph Lameter
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The module subsystem cannot handle symbols that are zero. If symbols are
    present that have a zero value then the module resolver prints out a
    message that these symbols are unresolved.

    [akinobu.mita@gmail.com: fix __find_symbl() error checks]
    Cc: Mathieu Desnoyers
    Cc: Kay Sievers
    Cc: Rusty Russell
    Cc: Andi Kleen
    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter