11 May, 2007

2 commits

  • Statically initialize a struct pid for the swapper process (pid_t == 0) and
    attach it to init_task. This is needed so that the task_pid(), task_pgrp()
    and task_session() interfaces also work on the swapper process.

    Signed-off-by: Sukadev Bhattiprolu
    Cc: Cedric Le Goater
    Cc: Dave Hansen
    Cc: Serge Hallyn
    Cc: Eric Biederman
    Cc: Herbert Poetzl
    Acked-by: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • attach_pid() currently takes a pid_t and then uses find_pid() to find the
    corresponding struct pid. Sometimes we already have the struct pid. If
    attach_pid() took a struct pid parameter instead, we could skip find_pid()
    in those cases.

    Signed-off-by: Sukadev Bhattiprolu
    Cc: Cedric Le Goater
    Cc: Dave Hansen
    Cc: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
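
The attach_pid() change above can be pictured with a small user-space model. This is only a sketch under stated assumptions: the toy_* types, the fixed-size table, and the lookup counter are hypothetical stand-ins invented for illustration and are not the kernel's definitions or signatures. It shows that a caller that already holds the structure can attach without paying for a lookup by number.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration; these are not the kernel's definitions. */
struct toy_pid {
    int nr;      /* numeric id, the pid_t value */
    int in_use;  /* nonzero while the slot is allocated */
};

struct toy_task {
    struct toy_pid *pid;
};

#define TOY_TABLE_SIZE 8

static struct toy_pid toy_table[TOY_TABLE_SIZE];
static int toy_lookups; /* counts how often the lookup path runs */

/* Claim a free slot for the given number; returns NULL if the table is full. */
static struct toy_pid *toy_alloc_pid(int nr)
{
    for (size_t i = 0; i < TOY_TABLE_SIZE; i++) {
        if (!toy_table[i].in_use) {
            toy_table[i].nr = nr;
            toy_table[i].in_use = 1;
            return &toy_table[i];
        }
    }
    return NULL;
}

/* Models the find_pid() step: map a number back to its structure. */
static struct toy_pid *toy_find_pid(int nr)
{
    toy_lookups++;
    for (size_t i = 0; i < TOY_TABLE_SIZE; i++) {
        if (toy_table[i].in_use && toy_table[i].nr == nr)
            return &toy_table[i];
    }
    return NULL;
}

/* Old shape: the caller passes a number, so a lookup is always needed. */
static int toy_attach_by_nr(struct toy_task *task, int nr)
{
    struct toy_pid *pid = toy_find_pid(nr);

    if (pid == NULL)
        return -1;
    task->pid = pid;
    return 0;
}

/* New shape: the caller passes the structure it already holds. */
static void toy_attach(struct toy_task *task, struct toy_pid *pid)
{
    assert(task != NULL && pid != NULL);
    task->pid = pid;
}
```

In this model, a caller that has just allocated a toy_pid calls toy_attach() directly, while toy_attach_by_nr() remains the path for callers that only know the number.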
     

13 Feb, 2007

1 commit


09 Dec, 2006

1 commit

  • Add a per pid_namespace child-reaper. This is needed so processes are reaped
    within the same pid space and do not spill over to the parent pid space. It's
    also needed so containers preserve the existing semantic that pid == 1 reaps
    orphaned children.

    This is based on Eric Biederman's patch: http://lkml.org/lkml/2006/2/6/285

    Signed-off-by: Sukadev Bhattiprolu
    Signed-off-by: Cedric Le Goater
    Cc: Kirill Korotaev
    Cc: Eric W. Biederman
    Cc: Herbert Poetzl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     

03 Oct, 2006

1 commit


02 Oct, 2006

5 commits

  • proc_pid_make_inode:

    ei->pid = get_pid(task_pid(task));

    I think this is not safe. get_pid() can be preempted after checking "pid
    != NULL". Then the task exits, does detach_pid(), and RCU frees the pid.

    Signed-off-by: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • I think it is hardly possible to read the current do_each_task_pid(). The
    new version is much simpler and makes the code smaller.

    Only the do_each_task_pid change is tested; the do_each_pid_task change
    isn't.

    Signed-off-by: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • As we stop storing pid_t's and move to storing struct pid *, we need a way to
    get the pid_t from the struct pid to report to user space what we have stored.

    Having a clean, well defined way to do this is especially important as we move
    to multiple pid spaces, as we may need to report a different value to the
    caller depending on which pid space the caller is in.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • To avoid pid rollover confusion the kernel needs to work with struct pid *
    instead of pid_t. Currently there is no iterator that walks through all
    of the tasks of a given pid type starting with a struct pid. This prevents us
    from replacing some pid_t instances with struct pid. So this patch adds
    do_each_pid_task, which walks through the set of tasks for a given pid type
    starting with a struct pid.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • The problem: An opendir, readdir, closedir sequence can fail to report
    process ids that are continually in use throughout the sequence of system
    calls. For this race to trigger the process that proc_pid_readdir stops at
    must exit before readdir is called again.

    This can cause ps to fail to report processes, and it is in violation of
    POSIX guarantees and normal application expectations with respect to
    readdir.

    Currently there is no way to work around this problem in user space short
    of providing a gargantuan buffer so the directory read all happens in one
    system call.

    This patch implements the normal directory semantics for proc, which
    guarantee that a directory entry that is neither created nor destroyed
    while reading the directory will be returned. Directory entries that are
    either created or destroyed during the readdir may or may not be seen.
    Furthermore you may seek to a directory offset you have previously seen.

    These are the guarantees that ext[23] provide and that POSIX requires, and
    more importantly that user space expects. Plus it is a simple semantic on
    which to implement a reliable service. It is just a matter of calling
    readdir a second time if you are wondering if something new has shown up.

    These better semantics are implemented by scanning through the pids in
    numerical order and by making the file offset a pid plus a fixed offset.

    The pid scan happens on the pid bitmap, which when you look at it is
    remarkably efficient for a brute force algorithm. A typical cache line is
    64 bytes and thus covers space for 64*8 == 512 (0x200) pids, so there are
    only 64 (0x40) cache lines for the entire 32K pid space. A typical system
    will have 100 pids or more, so this is actually fewer cache lines than we
    would have to look at to scan a linked list, and the worst case of having
    to scan the entire pid bitmap is pretty reasonable.

    If we need something more efficient we can go to a more efficient data
    structure for indexing the pids, but for now what we have should be
    sufficient.

    In addition this takes no additional locks and is actually less code than
    what we are doing now.

    Also another very subtle bug in this area has been fixed. It is possible
    to catch a task in the middle of de_thread where a thread is assuming the
    thread id of its thread group leader. This patch carefully handles that
    case so if we hit it we don't fail to return the pid that is undergoing
    the de_thread dance.

    Thanks to KAMEZAWA Hiroyuki for providing the first fix, pointing this
    out and working on it.

    [oleg@tv-sign.ru: fix it]
    Signed-off-by: Eric W. Biederman
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Oleg Nesterov
    Cc: Jean Delvare
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
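
The bitmap scan and the "file offset is a pid plus a fixed offset" idea from the readdir fix above can be sketched in user space. This is a minimal model under stated assumptions: the toy_* names, the bit-by-bit scan, and the value of TOY_FIRST_PROCESS are hypothetical choices for illustration, not the kernel's code. It shows why a scan in numerical order can resume correctly even when the pid it stopped at has exited in the meantime, and it reproduces the cache line arithmetic.

```c
#include <assert.h>
#include <limits.h>

#define TOY_PID_MAX       32768 /* the 32K pid space from the text */
#define TOY_CACHE_LINE    64    /* bytes; the "typical" size from the text */
#define TOY_FIRST_PROCESS 2     /* hypothetical fixed offset before pid entries */

/* One bit per pid; a set bit means the pid is in use. */
static unsigned char toy_pidmap[TOY_PID_MAX / CHAR_BIT];

static void toy_set_pid(int nr)
{
    assert(nr >= 0 && nr < TOY_PID_MAX);
    toy_pidmap[nr / CHAR_BIT] |= (unsigned char)(1u << (nr % CHAR_BIT));
}

static void toy_clear_pid(int nr)
{
    assert(nr >= 0 && nr < TOY_PID_MAX);
    toy_pidmap[nr / CHAR_BIT] &= (unsigned char)~(1u << (nr % CHAR_BIT));
}

/* Brute-force scan: smallest in-use pid >= start, or -1 if there is none. */
static int toy_next_pid(int start)
{
    if (start < 0)
        start = 0;
    for (int nr = start; nr < TOY_PID_MAX; nr++) {
        if (toy_pidmap[nr / CHAR_BIT] & (1u << (nr % CHAR_BIT)))
            return nr;
    }
    return -1;
}

/* The directory offset is just the pid shifted by a fixed amount. */
static long toy_pid_to_offset(int nr)
{
    return (long)nr + TOY_FIRST_PROCESS;
}

static int toy_offset_to_pid(long offset)
{
    return (int)(offset - TOY_FIRST_PROCESS);
}

/* Arithmetic from the text: pids covered by one cache line of bitmap. */
static int toy_pids_per_cache_line(void)
{
    return TOY_CACHE_LINE * CHAR_BIT;
}

/* Number of cache lines needed to hold the whole bitmap. */
static int toy_cache_lines_for_map(void)
{
    return TOY_PID_MAX / toy_pids_per_cache_line();
}
```

Because the offset encodes a position in pid order rather than a pointer to a particular task, resuming from a saved offset finds the next live pid even if the one recorded there is gone, which is the property the old list-based walk lacked.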
     

27 Sep, 2006

1 commit

  • In de_thread we move pids from one process to another, a rather ugly case.
    The function transfer_pid makes it clear what we are doing, and makes the
    action atomic. This is useful if we ever want to atomically traverse the
    process group and session lists in an RCU-safe manner.

    Even without the atomic properties this change should be a win, as
    transfer_pid should be less code to execute than executing both attach_pid
    and detach_pid, and this should make de_thread slightly smaller as only a
    single function call needs to be emitted. The only downside is that the
    code might be slower to execute as the odds are against transfer_pid being
    in cache.

    Signed-off-by: Eric W. Biederman
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

01 Apr, 2006

1 commit

  • Simplifies the code, reduces the need for 4 pid hash tables, and makes the
    code more capable.

    In the discussions I had with Oleg it was felt that to a large extent the
    cleanup itself justified the work. Having struct pid dynamically allocated
    means we can create the hash table entry when the pid is allocated and free
    the hash table entry when the pid is freed, instead of playing with the
    hash lists whenever a process would attach or detach to a process.

    For myself, the fact that it gave what my previous task_ref patch gave, for
    free and with simpler code, was a big win. The problem is that if you hold a
    reference to struct task_struct you lock in 10K of low memory. If you do
    that in a user controllable way like /proc does, then an unprivileged but
    hostile user space application with typical resource limits of 1000 fds
    and 100 processes can trigger the OOM killer by consuming all of low
    memory with task structs, on a machine with 1GB of low memory.

    If I instead hold a reference to struct pid which holds a pointer to my
    task_struct, I don't suffer from that problem because struct pid is 2 orders
    of magnitude smaller. In fact struct pid is small enough that most other
    kernel data structures dwarf it, so simply limiting the number of referring
    data structures is enough to prevent exhaustion of low memory.

    This splits the current struct pid into two structures, struct pid and struct
    pid_link, and reduces our number of hash tables from PIDTYPE_MAX to just one.
    struct pid_link is the per process linkage into the hash tables and lives in
    struct task_struct. struct pid is given an independent lifetime, and holds
    pointers to each of the pid types.

    The independent life of struct pid simplifies attach_pid and detach_pid,
    because we are always manipulating the list of pids and not the hash table.
    In addition, giving struct pid an independent life makes the concept much
    more powerful.

    Kernel data structures can now embed a struct pid * instead of a pid_t and
    not suffer from pid wrap around problems or from keeping unnecessarily
    large amounts of memory allocated.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
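
The memory argument above, holding a reference to a small struct pid instead of pinning a whole task structure, can be sketched as a user-space model. Everything here is an assumption made for illustration: the toy_* names, the plain integer reference count, and the 10K payload array standing in for the size the text mentions. It is not the kernel's layout or locking.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_task;

/* Small, independently refcounted identity object. */
struct toy_pid {
    int nr;                /* numeric id */
    int count;             /* number of references held */
    struct toy_task *task; /* NULL once the task has exited */
};

/* Large object; the array stands in for the ~10K cited in the text. */
struct toy_task {
    struct toy_pid *pid;
    char payload[10 * 1024];
};

static struct toy_pid *toy_get_pid(struct toy_pid *pid)
{
    if (pid != NULL)
        pid->count++;
    return pid;
}

static void toy_put_pid(struct toy_pid *pid)
{
    if (pid != NULL && --pid->count == 0)
        free(pid);
}

/* Create a task together with its pid; the task owns one pid reference. */
static struct toy_task *toy_task_create(int nr)
{
    struct toy_task *task = calloc(1, sizeof(*task));
    struct toy_pid *pid = calloc(1, sizeof(*pid));

    if (task == NULL || pid == NULL) {
        free(task);
        free(pid);
        return NULL;
    }
    pid->nr = nr;
    pid->count = 1;
    pid->task = task;
    task->pid = pid;
    return task;
}

/* On exit the big structure is freed at once; the pid may outlive it. */
static void toy_task_exit(struct toy_task *task)
{
    struct toy_pid *pid = task->pid;

    pid->task = NULL;
    free(task);
    toy_put_pid(pid);
}

/* Holders of a pid reference ask whether the task still exists. */
static struct toy_task *toy_pid_task(const struct toy_pid *pid)
{
    return pid != NULL ? pid->task : NULL;
}
```

A long-lived holder in this model keeps only the small toy_pid alive. After the task exits, the lookup returns NULL rather than matching an unrelated task that later reuses the same number, which is the wrap-around problem the text describes.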
     

29 Mar, 2006

2 commits

  • This patch kills PIDTYPE_TGID pid_type thus saving one hash table in
    kernel/pid.c and speeding up subthreads create/destroy a bit. It is also a
    preparation for the further tref/pids rework.

    This patch adds 'struct list_head thread_group' to 'struct task_struct'
    instead.

    We don't detach the group leader from the PIDTYPE_PID namespace until
    another thread inherits its ->pid == ->tgid, so we are safe wrt a premature
    free_pidmap(->tgid) call.

    Currently there are no users of find_task_by_pid_type(PIDTYPE_TGID).
    Should the need arise, we can use find_task_by_pid()->group_leader.

    Signed-off-by: Oleg Nesterov
    Acked-By: Eric Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • switch_exec_pids is only called from de_thread by way of exec, and it is
    only called when we are exec'ing from a non thread group leader.

    Currently switch_exec_pids gives the leader the pid of the thread and
    unhashes and rehashes all of the process groups. The leader is already in
    the EXIT_DEAD state so no one cares about its pids. The only concern for
    the leader is that __unhash_process called from release_task will function
    correctly. If we don't touch the leader at all we know that
    __unhash_process will work fine so there is no need to touch the leader.

    For the task becoming the thread group leader, we just need to give it the
    pid of the old thread group leader, add it to the task list, and attach it
    to the session and the process group of the thread group.

    Currently de_thread is also adding the task to the task list which is just
    silly.

    Currently the only caller of __detach_pid besides detach_pid is
    switch_exec_pids because of the ugly extra work that was being
    performed.

    So this patch removes switch_exec_pids because it is doing too much, it is
    creating an unnecessary special case in pid.c, doing work duplicated in
    de_thread, and generally obscuring what is going on.

    The necessary work is added to de_thread, and it seems to be a little
    clearer there what is going on.

    Signed-off-by: Eric W. Biederman
    Cc: Oleg Nesterov
    Cc: Kirill Korotaev
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds