11 May, 2007

3 commits

  • This patch series implements the new signalfd() system call.

    I took part of the original Linus code (and you know how badly it can be
    broken :), and I added even more breakage ;) Signals are fetched from the same
    signal queue used by the process, so signalfd will compete with standard
    kernel delivery in dequeue_signal(). If you want to reliably fetch signals on
    the signalfd file, you need to block them with sigprocmask(SIG_BLOCK). This
    seems to be working fine on my Dual Opteron machine. I made a quick test
    program for it:

    http://www.xmailserver.org/signafd-test.c

    The signalfd() system call implements signal delivery into a file descriptor
    receiver. The signalfd file descriptor if created with the following API:

    int signalfd(int ufd, const sigset_t *mask, size_t masksize);

    The "ufd" parameter allows to change an existing signalfd sigmask, w/out going
    to close/create cycle (Linus idea). Use "ufd" == -1 if you want a brand new
    signalfd file.

    The "mask" allows to specify the signal mask of signals that we are interested
    in. The "masksize" parameter is the size of "mask".

    The signalfd fd supports the poll(2) and read(2) system calls. The poll(2)
    will return POLLIN when signals are available to be dequeued. As a direct
    consequence of supporting the Linux poll subsystem, the signalfd fd can use
    used together with epoll(2) too.

    The read(2) system call will return a "struct signalfd_siginfo" structure in
    the userspace supplied buffer. The return value is the number of bytes copied
    in the supplied buffer, or -1 in case of error. The read(2) call can also
    return 0, in case the sighand structure to which the signalfd was attached,
    has been orphaned. The O_NONBLOCK flag is also supported, and read(2) will
    return -EAGAIN in case no signal is available.

    If the size of the buffer passed to read(2) is lower than sizeof(struct
    signalfd_siginfo), -EINVAL is returned. A read from the signalfd can also
    return -ERESTARTSYS in case a signal hits the process. The format of the
    struct signalfd_siginfo is, and the valid fields depends of the (->code &
    __SI_MASK) value, in the same way a struct siginfo would:

    struct signalfd_siginfo {
    __u32 signo; /* si_signo */
    __s32 err; /* si_errno */
    __s32 code; /* si_code */
    __u32 pid; /* si_pid */
    __u32 uid; /* si_uid */
    __s32 fd; /* si_fd */
    __u32 tid; /* si_fd */
    __u32 band; /* si_band */
    __u32 overrun; /* si_overrun */
    __u32 trapno; /* si_trapno */
    __s32 status; /* si_status */
    __s32 svint; /* si_int */
    __u64 svptr; /* si_ptr */
    __u64 utime; /* si_utime */
    __u64 stime; /* si_stime */
    __u64 addr; /* si_addr */
    };

    [akpm@linux-foundation.org: fix signalfd_copyinfo() on i386]
    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • attach_pid() currently takes a pid_t and then uses find_pid() to find the
    corresponding struct pid. Sometimes we already have the struct pid. We can
    then skip find_pid() if attach_pid() were to take a struct pid parameter.

    Signed-off-by: Sukadev Bhattiprolu
    Cc: Cedric Le Goater
    Cc: Dave Hansen
    Cc: Serge Hallyn
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • If CONFIG_TASK_IO_ACCOUNTING is defined, we update io accounting counters for
    each task.

    This patch permits reporting of values using the well known getrusage()
    syscall, filling ru_inblock and ru_oublock instead of null values.

    As TASK_IO_ACCOUNTING currently counts bytes counts, we approximate blocks
    count doing : nr_blocks = nr_bytes / 512

    Example of use :
    ----------------------
    After patch is applied, /usr/bin/time command can now give a good
    approximation of IO that the process had to do.

    $ /usr/bin/time grep tototo /usr/include/*
    Command exited with non-zero status 1
    0.00user 0.02system 0:02.11elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
    24288inputs+0outputs (0major+259minor)pagefaults 0swaps

    $ /usr/bin/time dd if=/dev/zero of=/tmp/testfile count=1000
    1000+0 enregistrements lus
    1000+0 enregistrements écrits
    512000 octets (512 kB) copiés, 0,00326601 seconde, 157 MB/s
    0.00user 0.00system 0:00.00elapsed 80%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+3000outputs (0major+299minor)pagefaults 0swaps

    Signed-off-by: Eric Dumazet
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     

10 May, 2007

2 commits

  • Currently kernel threads use sigprocmask(SIG_BLOCK) to protect against
    signals. This doesn't prevent the signal delivery, this only blocks
    signal_wake_up(). Every "killall -33 kthreadd" means a "struct siginfo"
    leak.

    Change kthreadd_setup() to set all handlers to SIG_IGN instead of blocking
    them (make a new helper ignore_signals() for that). If the kernel thread
    needs some signal, it should use allow_signal() anyway, and in that case it
    should not use CLONE_SIGHAND.

    Note that we can't change daemonize() (should die!) in the same way,
    because it can be used along with CLONE_SIGHAND. This means that
    allow_signal() still should unblock the signal to work correctly with
    daemonize()ed threads.

    However, disallow_signal() doesn't block the signal any longer but ignores
    it.

    NOTE: with or without this patch the kernel threads are not protected from
    handle_stop_signal(), this seems harmless, but not good.

    Signed-off-by: Oleg Nesterov
    Acked-by: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • When a kernel thread calls daemonize, instead of reparenting the thread to
    init reparent the thread to kthreadd next to the threads created by
    kthread_create.

    This is really just a stop gap until daemonize goes away, but it does
    ensure no kernel threads are under init and they are all in one place that
    is easy to find.

    Signed-off-by: Eric W. Biederman
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

09 May, 2007

1 commit


08 May, 2007

1 commit

  • wait* syscalls return -ECHILD even when an individual PID of a live child
    was requested explicitly, when security_task_wait denies the operation.
    This means that something like a broken SELinux policy can produce an
    unexpected failure that looks just like a bug with wait or ptrace or
    something.

    This patch makes do_wait return -EACCES (or other appropriate error returned
    from security_task_wait() instead of -ECHILD if some children were ruled out
    solely because security_task_wait failed.

    [jmorris@namei.org: switch error code to EACCES]
    Signed-off-by: Roland McGrath
    Acked-by: Stephen Smalley
    Cc: Chris Wright
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     

29 Mar, 2007

1 commit

  • In commit 0475ac0845f9295bc5f69af45f58dff2c104c8d1 when converting the
    orphaned process group handling to use struct pid I made a small
    mistake. I accidentally replaced an == with a !=.

    Besides just being a dumb thing to do apparently this has a bad side
    effect. The improper orphaned process group detection causes kwin to
    die after a suspend/resume cycle.

    I'm amazed this patch has been around as long as it has without anyone
    else noticing something funny going on.

    And the following people deserve credit for spotting and helping
    to reproduce this.

    Thanks to: Sid Boyce
    Thanks to: "Michael Wu"

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

13 Feb, 2007

4 commits

  • Every call to is_orphaned_pgrp passed in process_group(current) which is racy
    with respect to another thread changing our process group. It didn't bite us
    because we were dealing with integers and the worse we would get would be a
    stale answer.

    In switching the checks to use struct pid to be a little more efficient and
    prepare the way for pid namespaces this race became apparent.

    So I simplified the calls to the more specialized is_current_pgrp_orphaned so
    I didn't have to worry about making logic changes to avoid the race.

    Signed-off-by: Eric W. Biederman
    Cc: Alan Cox
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • Modify has_stopped_jobs and will_become_orphan_pgrp to use struct pid based
    process groups. This reduces the number of hash tables looks ups and paves
    the way for multiple pid spaces.

    Signed-off-by: Eric W. Biederman
    Cc: Alan Cox
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • To properly implement a pid namespace I need to deal exclusively in terms of
    struct pid, because pid_t values become ambiguous.

    To this end session_of_pgrp is transformed to take and return a struct pid
    pointer. To avoid the need to worry about reference counting I now require my
    caller to hold the appropriate locks. Leaving callers repsonsible for
    increasing the reference count if they need access to the result outside of
    the locks.

    Since session_of_pgrp currently only has one caller and that caller simply
    uses only test the result for equality with another process group, the locking
    change means I don't actually have to acquire the tasklist_lock at all.

    tiocspgrp is also modified to take and release the lock. The logic there is a
    little more complicated but nothing I won't need when I convert pgrp of a tty
    to a struct pid pointer.

    Signed-off-by: Eric W. Biederman
    Cc: Alan Cox
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • close_files() can sometimes take long enough to trigger the soft lockup
    detector.

    Cc: Eric Dumazet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     

12 Feb, 2007

1 commit

  • A variety of (mostly) innocuous fixes to the embedded kernel-doc content in
    source files, including:

    * make multi-line initial descriptions single line
    * denote some function names, constants and structs as such
    * change erroneous opening '/*' to '/**' in a few places
    * reword some text for clarity

    Signed-off-by: Robert P. J. Day
    Cc: "Randy.Dunlap"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     

31 Jan, 2007

3 commits

  • This is based on a patch by Eric W. Biederman, who pointed out that pid
    namespaces are still fake, and we only have one ever active.

    So for the time being, we can modify any code which could access
    tsk->nsproxy->pid_ns during task exit to just use &init_pid_ns instead,
    and move the exit_task_namespaces call in do_exit() back above
    exit_notify(), so that an exiting nfs server has a valid tsk->sighand to
    work with.

    Long term, pulling pid_ns out of nsproxy might be the cleanest solution.

    Signed-off-by: Eric W. Biederman

    [ Eric's patch fixed to take care of free_pid() too ]

    Signed-off-by: Serge E. Hallyn
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     
  • This reverts commit 7a238fcba0629b6f2edbcd37458bae56fcf36be5 in
    preparation for a better and simpler fix proposed by Eric Biederman
    (and fixed up by Serge Hallyn)

    Acked-by: Serge E. Hallyn
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Fix exit race by splitting the nsproxy putting into two pieces. First
    piece reduces the nsproxy refcount. If we dropped the last reference, then
    it puts the mnt_ns, and returns the nsproxy as a hint to the caller. Else
    it returns NULL. The second piece of exiting task namespaces sets
    tsk->nsproxy to NULL, and drops the references to other namespaces and
    frees the nsproxy only if an nsproxy was passed in.

    A little awkward and should probably be reworked, but hopefully it fixes
    the NFS oops.

    Signed-off-by: Serge E. Hallyn
    Cc: Herbert Poetzl
    Cc: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Cc: Cedric Le Goater
    Cc: Daniel Hokka Zakrisson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     

01 Jan, 2007

1 commit

  • Commit b2b2cbc4b2a2f389442549399a993a8306420baf introduced a user-
    visible change: ->pdeath_signal is sent only when the entire thread
    group exits.

    While this change is imho good, it may break things. So restore the
    old behaviour for now.

    Signed-off-by: Oleg Nesterov
    To: Albert Cahalan
    Cc: Eric W. Biederman
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Ingo Molnar
    Cc: Qi Yong
    Cc: Roland McGrath
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

23 Dec, 2006

2 commits

  • This patch fixes the case when we reparent to a different thread in the
    same thread group. This modifies the code so that we do not send
    signals and do not change the signal to send to SIGCHLD unless we have
    change the thread group of our parents. It also suppresses sending
    pdeath_sig in this cas as well since the result of geppid doesn't
    change.

    Thanks to Oleg for spotting my bug of only fixing this for non-ptraced
    tasks.

    Signed-off-by: Eric W. Biederman
    Cc: Mike Galbraith
    Cc: Albert Cahalan
    Cc: Andrew Morton
    Cc: Roland McGrath
    Cc: Ingo Molnar
    Cc: Coywolf Qi Hunt
    Acked-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • Christoph Hellwig has expressed concerns that the recent fdtable changes
    expose the details of the RCU methodology used to release no-longer-used
    fdtable structures to the rest of the kernel. The trivial patch below
    addresses these concerns by introducing the appropriate free_fdtable()
    calls, which simply wrap the release RCU usage. Since free_fdtable() is a
    one-liner, it makes sense to promote it to an inline helper.

    Signed-off-by: Vadim Lobanov
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vadim Lobanov
     

11 Dec, 2006

2 commits

  • An fdtable can either be embedded inside a files_struct or standalone (after
    being expanded). When an fdtable is being discarded after all RCU references
    to it have expired, we must either free it directly, in the standalone case,
    or free the files_struct it is contained within, in the embedded case.

    Currently the free_files field controls this behavior, but we can get rid of
    it entirely, as all the necessary information is already recorded. We can
    distinguish embedded and standalone fdtables using max_fds, and if it is
    embedded we can divine the relevant files_struct using container_of().

    Signed-off-by: Vadim Lobanov
    Cc: Christoph Hellwig
    Cc: Al Viro
    Cc: Dipankar Sarma
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vadim Lobanov
     
  • Currently, each fdtable supports three dynamically-sized arrays of data: the
    fdarray and two fdsets. The code allows the number of fds supported by the
    fdarray (fdtable->max_fds) to differ from the number of fds supported by each
    of the fdsets (fdtable->max_fdset).

    In practice, it is wasteful for these two sizes to differ: whenever we hit a
    limit on the smaller-capacity structure, we will reallocate the entire fdtable
    and all the dynamic arrays within it, so any delta in the memory used by the
    larger-capacity structure will never be touched at all.

    Rather than hogging this excess, we shouldn't even allocate it in the first
    place, and keep the capacities of the fdarray and the fdsets equal. This
    patch removes fdtable->max_fdset. As an added bonus, most of the supporting
    code becomes simpler.

    Signed-off-by: Vadim Lobanov
    Cc: Christoph Hellwig
    Cc: Al Viro
    Cc: Dipankar Sarma
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vadim Lobanov
     

09 Dec, 2006

7 commits

  • All members of the process group have the same sid and it can't be == 0.

    NOTE: this code (and a similar one in sys_setpgid) was needed because it
    was possibe to have ->session == 0. It's not possible any longer since

    [PATCH] pidhash: don't use zero pids
    Commit: c7c6464117a02b0d54feb4ebeca4db70fa493678

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Add a per pid_namespace child-reaper. This is needed so processes are reaped
    within the same pid space and do not spill over to the parent pid space. Its
    also needed so containers preserve existing semantic that pid == 1 would reap
    orphaned children.

    This is based on Eric Biederman's patch: http://lkml.org/lkml/2006/2/6/285

    Signed-off-by: Sukadev Bhattiprolu
    Signed-off-by: Cedric Le Goater
    Cc: Kirill Korotaev
    Cc: Eric W. Biederman
    Cc: Herbert Poetzl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • Rename 'struct namespace' to 'struct mnt_namespace' to avoid confusion with
    other namespaces being developped for the containers : pid, uts, ipc, etc.
    'namespace' variables and attributes are also renamed to 'mnt_ns'

    Signed-off-by: Kirill Korotaev
    Signed-off-by: Cedric Le Goater
    Cc: Eric W. Biederman
    Cc: Herbert Poetzl
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill Korotaev
     
  • Add an anonymous union and ((deprecated)) to catch direct usage of the
    session field.

    [akpm@osdl.org: fix various missed conversions]
    [jdike@addtoit.com: fix UML bug]
    Signed-off-by: Jeff Dike
    Cc: Cedric Le Goater
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cedric Le Goater
     
  • Replace occurences of task->signal->session by a new process_session() helper
    routine.

    It will be useful for pid namespaces to abstract the session pid number.

    Signed-off-by: Cedric Le Goater
    Cc: Kirill Korotaev
    Cc: Eric W. Biederman
    Cc: Herbert Poetzl
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cedric Le Goater
     
  • Make set_special_pids() static, the only caller is daemonize().

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Fix the locking of signal->tty.

    Use ->sighand->siglock to protect ->signal->tty; this lock is already used
    by most other members of ->signal/->sighand. And unless we are 'current'
    or the tasklist_lock is held we need ->siglock to access ->signal anyway.

    (NOTE: sys_unshare() is broken wrt ->sighand locking rules)

    Note that tty_mutex is held over tty destruction, so while holding
    tty_mutex any tty pointer remains valid. Otherwise the lifetime of ttys
    are governed by their open file handles. This leaves some holes for tty
    access from signal->tty (or any other non file related tty access).

    It solves the tty SLAB scribbles we were seeing.

    (NOTE: the change from group_send_sig_info to __group_send_sig_info needs to
    be examined by someone familiar with the security framework, I think
    it is safe given the SEND_SIG_PRIV from other __group_send_sig_info
    invocations)

    [schwidefsky@de.ibm.com: 3270 fix]
    [akpm@osdl.org: various post-viro fixes]
    Signed-off-by: Peter Zijlstra
    Acked-by: Alan Cox
    Cc: Oleg Nesterov
    Cc: Prarit Bhargava
    Cc: Chris Wright
    Cc: Roland McGrath
    Cc: Stephen Smalley
    Cc: James Morris
    Cc: "David S. Miller"
    Cc: Jeff Dike
    Cc: Martin Schwidefsky
    Cc: Jan Kara
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

08 Dec, 2006

1 commit

  • do_exit:
    taskstats_exit_alloc()
    ...
    taskstats_exit_send()
    taskstats_exit_free()

    I think this is not good, let it be a single function exported to the core
    kernel, taskstats_exit(), which does alloc + send + free itself.

    Signed-off-by: Oleg Nesterov
    Cc: Balbir Singh
    Cc: Shailabh Nagar
    Cc: Jay Lan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

29 Oct, 2006

1 commit

  • taskstats_tgid_free() is called on copy_process's error path. This is wrong.

    IF (clone_flags & CLONE_THREAD)
    We should not clear ->signal->taskstats, current uses it,
    it probably has a valid accumulated info.
    ELSE
    taskstats_tgid_init() set ->signal->taskstats = NULL,
    there is nothing to free.

    Move the callsite to __exit_signal(). We don't need any locking, entire
    thread group is exiting, nobody should have a reference to soon to be
    released ->signal.

    Signed-off-by: Oleg Nesterov
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Jay Lan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

02 Oct, 2006

3 commits

  • exit_task_namespaces() has replaced the former exit_namespace(). It
    invalidates task->nsproxy and associated namespaces. This is an issue for
    the (futur) pid namespace which is required to be valid in exit_notify().

    This patch moves exit_task_namespaces() after exit_notify() to keep nsproxy
    valid.

    Signed-off-by: Cedric Le Goater
    Cc: Serge E. Hallyn
    Cc: Kirill Korotaev
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Andrey Savochkin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cedric Le Goater
     
  • This moves the mount namespace into the nsproxy. The mount namespace count
    now refers to the number of nsproxies point to it, rather than the number of
    tasks. As a result, the unshare_namespace() function in kernel/fork.c no
    longer checks whether it is being shared.

    Signed-off-by: Serge Hallyn
    Cc: Kirill Korotaev
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Andrey Savochkin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     
  • This patch adds a nsproxy structure to the task struct. Later patches will
    move the fs namespace pointer into this structure, and introduce a new utsname
    namespace into the nsproxy.

    The vserver and openvz functionality, then, would be implemented in large part
    by virtualizing/isolating more and more resources into namespaces, each
    contained in the nsproxy.

    [akpm@osdl.org: build fix]
    Signed-off-by: Serge Hallyn
    Cc: Kirill Korotaev
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Andrey Savochkin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     

01 Oct, 2006

2 commits


30 Sep, 2006

5 commits

  • I am not sure about this patch, I am asking Ingo to take a decision.

    task_struct->state == EXIT_DEAD is a very special case, to avoid a confusion
    it makes sense to introduce a new state, TASK_DEAD, while EXIT_DEAD should
    live only in ->exit_state as documented in sched.h.

    Note that this state is not visible to user-space, get_task_state() masks off
    unsuitable states.

    Signed-off-by: Oleg Nesterov
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • After the previous change (->flags & PF_DEAD) (->state == EXIT_DEAD), we
    don't need PF_DEAD any longer.

    Signed-off-by: Oleg Nesterov
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • schedule() checks PF_DEAD on every context switch and sets ->state = EXIT_DEAD
    to ensure that the exiting task will be deactivated. Note that this EXIT_DEAD
    is in fact a "random" value, we can use any bit except normal TASK_XXX values.

    It is better to set this state in do_exit() along with PF_DEAD flag and remove
    that check in schedule().

    We are safe wrt concurrent try_to_wake_up() (for example ptrace, tkill), it
    can not change task's ->state: the 'state' argument of try_to_wake_up() can't
    have EXIT_DEAD bit. And in case when try_to_wake_up() sees a stale value of
    ->state == TASK_RUNNING it will do nothing.

    Signed-off-by: Oleg Nesterov
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Remove open-coded has_rt_policy(), no changes in kernel/exit.o

    Signed-off-by: Oleg Nesterov
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • If we are going to BUG() not panic() here then we should cover the case of
    the BUG being compiled out

    Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox