19 Jun, 2009

40 commits

  • create_uts_ns() will be used by C/R to create fresh uts_ns.

    Signed-off-by: Alexey Dobriyan
    Acked-by: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • copy_pid_ns() is a perfect example of a case where unwinding leads to more
    code and makes it less clear. Watch the diffstat.

    Signed-off-by: Alexey Dobriyan
    Cc: Pavel Emelyanov
    Cc: "Eric W. Biederman"
    Reviewed-by: Serge Hallyn
    Acked-by: Sukadev Bhattiprolu
    Reviewed-by: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • create_pid_namespace() creates everything, but caller has to assign parent
    pidns by hand, which is unnatural. At the moment of call new ->level has
    to be taken from somewhere and parent pidns is already available.

    Signed-off-by: Alexey Dobriyan
    Cc: Pavel Emelyanov
    Cc: "Eric W. Biederman"
    Acked-by: Serge Hallyn
    Acked-by: Sukadev Bhattiprolu
    Reviewed-by: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
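
The create_pid_namespace() change above can be pictured with a small user-space sketch. The names pid_ns_sketch and create_pid_ns_sketch are hypothetical stand-ins, not the kernel's definitions, and treating a NULL parent as "create a root namespace" is an assumption of this sketch. The point is only that the constructor receives the parent and derives ->level from it, so the caller no longer assigns the parent by hand.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a pid namespace. */
struct pid_ns_sketch {
	unsigned int level;              /* nesting depth, 0 for the root */
	struct pid_ns_sketch *parent;    /* NULL for the root */
};

/*
 * Create a namespace below 'parent'. Passing NULL creates a root
 * namespace (an assumption of this sketch, not kernel behaviour).
 */
struct pid_ns_sketch *create_pid_ns_sketch(struct pid_ns_sketch *parent)
{
	struct pid_ns_sketch *ns = calloc(1, sizeof(*ns));

	if (!ns)
		return NULL;

	/* Both fields come from the one argument the caller already has. */
	ns->parent = parent;
	ns->level = parent ? parent->level + 1 : 0;

	/* Invariant of the sketch: only the root has no parent. */
	assert((ns->level == 0) == (ns->parent == NULL));
	return ns;
}
```

A caller would write something like create_pid_ns_sketch(old_ns) and get back a namespace whose level and parent are already consistent.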
     
  • find_task_by_pid_type_ns is only used to implement find_task_by_vpid and
    find_task_by_pid_ns, but both of them pass PIDTYPE_PID as first argument.
    So just fold find_task_by_pid_type_ns into find_task_by_pid_ns and use
    find_task_by_pid_ns to implement find_task_by_vpid.

    While we're at it also remove the exports for find_task_by_pid_ns and
    find_task_by_vpid - we don't have any modular callers left as the only
    modular caller of the old pre pid namespace find_task_by_pid (gfs2) was
    switched to pid_task which operates on a struct pid pointer instead of a
    pid_t. Given the confusion about pid_t values vs namespace that's
    generally the better option anyway and I think we're better off restricting
    modules to do it that way.

    Signed-off-by: Christoph Hellwig
    Cc: Pavel Emelyanov
    Cc: "Eric W. Biederman"
    Cc: Ingo Molnar
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Remove the unused variable 'val' from __do_proc_dointvec()

    The integer has been declared and used as 'val = -val' and there is no
    other reference to it anywhere.

    Signed-off-by: Sukanto Ghosh
    Cc: Jaswinder Singh Rajput
    Cc: Sukanto Ghosh
    Cc: Jiri Kosina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukanto Ghosh
     
  • One of my programs frequently grabs the parport, does something with it
    and then drops it again. This results in spamming of the kernel log with

    "... registered pardevice"
    "... unregistered pardevice"

    These messages are completely useless, except for debugging ppdev,
    probably. So put them under DEBUG (or dynamic debug).

    Signed-off-by: Michael Buesch
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Buesch
     
  • Fix this:
    isicom.c: In function `isicom_probe':
    isicom.c:1587: warning: `signature' may be used uninitialized in this function
    by uninitialized_var(), because if the signature is not initialized in
    reset_card(), we won't use it.

    Signed-off-by: Jiri Slaby
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • memory_open() ignores devlist and does a switch for each item, duplicating
    code and conditional definitions.

    Clean it up by adding backing_dev_info to devlist and use it to lookup for
    the minor device.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Adriano dos Santos Fernandes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adriano dos Santos Fernandes
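
The memory_open() cleanup above replaces a per-minor switch with a table lookup. A minimal user-space sketch of that idea follows. The struct memdev_sketch type, the helper names and the reduced table are hypothetical; the real devlist carries file operations and backing_dev_info rather than just a name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One table entry per minor; 'name' stands in for the per-device data. */
struct memdev_sketch {
	unsigned int minor;
	const char *name;
};

/* Subset of the memory character devices, for illustration only. */
static const struct memdev_sketch devlist_sketch[] = {
	{ 1, "mem" },
	{ 3, "null" },
	{ 5, "zero" },
	{ 7, "full" },
	{ 8, "random" },
	{ 9, "urandom" },
};

/* Return the entry for 'minor', or NULL if the minor is not listed. */
const struct memdev_sketch *memdev_lookup(unsigned int minor)
{
	size_t i;

	for (i = 0; i < sizeof(devlist_sketch) / sizeof(devlist_sketch[0]); i++)
		if (devlist_sketch[i].minor == minor)
			return &devlist_sketch[i];
	return NULL;
}

/* Convenience check: does 'minor' map to the device called 'name'? */
int memdev_has_name(unsigned int minor, const char *name)
{
	const struct memdev_sketch *dev = memdev_lookup(minor);

	assert(name != NULL);
	return dev != NULL && strcmp(dev->name, name) == 0;
}
```

With a table like this, adding a device means adding one row instead of another switch case plus its conditional definitions.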
     
  • The definition of ipc_parse_version depends on
    __ARCH_WANT_IPC_PARSE_VERSION, but the header file declares it
    conditionally based on the architecture.

    Use the macro consistently to make it easier to add new architectures.

    Signed-off-by: Arnd Bergmann
    Acked-by: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
     
  • Now that kthread_stop() can be used even if the task has already exited,
    we can kill the "wait_to_die:" loop in migration_thread(). But we must
    pin rq->migration_thread after creation.

    Actually, I don't think CPU_UP_CANCELED or CPU_DEAD should wait for
    ->migration_thread exit. Perhaps we can simplify this code a bit more.
    migration_call() can set ->should_stop and forget about this thread. But
    we need a new helper in kthread.c for that.

    Signed-off-by: Oleg Nesterov
    Cc: Christoph Hellwig
    Cc: "Eric W. Biederman"
    Cc: Ingo Molnar
    Cc: Pavel Emelyanov
    Cc: Rusty Russell
    Cc: Vitaliy Gusev
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Based on Eric's patch which in turn was based on my patch.

    kthread_stop() has some nasty problems:

    - it runs unpredictably long with the global semaphore held.

    - it deadlocks if kthread itself does kthread_stop() before it obeys
    the kthread_should_stop() request.

    - it is not useable if kthread exits on its own, see for example the
    ugly "wait_to_die:" hack in migration_thread()

    - it is not possible to just tell kthread it should stop, we must always
    wait for its exit.

    With this patch kthread() allocates all necessary data (struct kthread) on
    its own stack, globals kthread_stop_xxx are deleted. ->vfork_done is used
    as a pointer into "struct kthread", this means kthread_stop() can easily
    wait for kthread's exit.

    Signed-off-by: Oleg Nesterov
    Cc: Christoph Hellwig
    Cc: "Eric W. Biederman"
    Cc: Ingo Molnar
    Cc: Pavel Emelyanov
    Cc: Rusty Russell
    Cc: Vitaliy Gusev
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • We use two completions to create the kernel thread, this is a bit ugly.
    kthread() wakes up create_kthread() via ->started, then create_kthread()
    wakes up the caller kthread_create() via ->done. But create_kthread() does
    not need to wait for kthread(), it can just return. Instead kthread() itself
    can wake up the caller of kthread_create().

    Kill kthread_create_info->started, ->done is enough. This improves the
    scalability a bit and simplifies the code.

    The only problem is if kernel_thread() fails, in that case create_kthread()
    must do complete(&create->done).

    Signed-off-by: Oleg Nesterov
    Cc: Christoph Hellwig
    Cc: "Eric W. Biederman"
    Cc: Ingo Molnar
    Cc: Pavel Emelyanov
    Cc: Rusty Russell
    Cc: Vitaliy Gusev
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Reorder struct wait_opts to remove 8 bytes of alignment padding on 64 bit
    builds.

    Signed-off-by: Richard Kennedy
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Richard Kennedy
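
The padding saved by reordering can be seen with two toy structs. This is a sketch only: opts_before and opts_after use illustrative fields, not the actual struct wait_opts layout, and the 8-byte figure assumes a typical 64-bit ABI where pointers are 8 bytes and 8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>

/* A 4-byte member ahead of each pointer forces 4 bytes of padding twice. */
struct opts_before {
	int   type;
	void *pid;
	int   flags;
	void *info;
};

/* Grouping the two ints lets them share one 8-byte slot. */
struct opts_after {
	int   type;
	int   flags;
	void *pid;
	void *info;
};

/* Bytes of padding removed by the reordering on the current ABI. */
size_t padding_saved(void)
{
	/* For this particular layout the reordered struct is never larger. */
	assert(sizeof(struct opts_after) <= sizeof(struct opts_before));
	return sizeof(struct opts_before) - sizeof(struct opts_after);
}
```

On a typical 64-bit build this should report 8 bytes saved (32 versus 24); on a 32-bit build both layouts are already tightly packed and nothing changes.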
     
  • do_wait:

    current->state = TASK_INTERRUPTIBLE;

    read_lock(&tasklist_lock);
    ... search for the task to reap ...

    In theory, the ->state changing can leak into the critical section. Since
    the child can change its status under read_lock(tasklist) in parallel
    (finish_stop/ptrace_stop), we can miss the wakeup if __wake_up_parent()
    sees us in TASK_RUNNING state. Add the barrier.

    Also, use __set_current_state() to set TASK_RUNNING.

    Signed-off-by: Oleg Nesterov
    Cc: Ingo Molnar
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
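
The missed-wakeup reasoning above is an instance of the store-then-load pattern on both sides. The following user-space sketch uses C11 atomics rather than kernel primitives; all names are hypothetical, and the fence on the waker side is an assumption standing in for whatever ordering the real wakeup path provides. With a full barrier between each side's store and its load, at least one side is guaranteed to observe the other's store, so the waiter either sees the child's change or the waker sees a sleeping waiter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { SKETCH_RUNNING, SKETCH_INTERRUPTIBLE };

static atomic_int  waiter_state  = SKETCH_RUNNING;
static atomic_bool child_changed = false;

/* Waiter: announce the intent to sleep, then look for a changed child. */
bool waiter_prepare_and_check(void)
{
	atomic_store_explicit(&waiter_state, SKETCH_INTERRUPTIBLE,
			      memory_order_relaxed);
	/* Full barrier: the store above is ordered before the load below. */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&child_changed, memory_order_relaxed);
}

/* Waker: publish the status change, then decide whether a wakeup is needed. */
bool child_publish_and_check(void)
{
	atomic_store_explicit(&child_changed, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&waiter_state, memory_order_relaxed)
		== SKETCH_INTERRUPTIBLE;
}

/* Waiter: go back to running; no barrier is needed for this transition. */
void waiter_finish(void)
{
	assert(atomic_load(&waiter_state) == SKETCH_INTERRUPTIBLE);
	atomic_store_explicit(&waiter_state, SKETCH_RUNNING,
			      memory_order_relaxed);
}
```

Without the fence in waiter_prepare_and_check(), the state store could become visible after the load, which is the "leak into the critical section" the commit message describes.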
     
  • do_wait() does BUG_ON(tsk->signal != current->signal), this looks like a
    rather obsolete check. At least, I don't think do_wait() is the best place
    to verify that all threads have the same ->signal. Remove it.

    Also, change the code to use while_each_thread().

    Signed-off-by: Oleg Nesterov
    Cc: Ingo Molnar
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Now that we don't pass &retval down to other helpers we can simplify
    the code more.

    - kill tsk_result, just use retval

    - add the "notask" label right after the main loop, and
    s/goto end/goto notask/ after the fastpath pid check.

    This way we don't need to initialize retval before this
    check and the code becomes a bit more clean, if this pid
    has no attached tasks we should just skip the list search.

    Signed-off-by: Oleg Nesterov
    Cc: Ingo Molnar
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Introduce "struct wait_opts" which holds the parameters for misc helpers
    in do_wait() paths.

    This adds 13 lines to kernel/exit.c, but saves 256 bytes from .o and imho
    makes the code much more readable.

    This patch temporarily uglifies rusage/siginfo code a little bit, which
    will be addressed by further cleanups.

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Ingo Molnar
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • No functional changes, preparation for the next patch.

    ptrace_do_wait() adds WUNTRACED to options for wait_task_stopped() which
    should always accept the stopped tracee, even if do_wait() was called
    without WUNTRACED.

    Change wait_task_stopped() to check "ptrace || WUNTRACED" instead. This
    makes the code more explicit, and "int options" argument becomes const in
    do_wait() paths.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
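
The "ptrace || WUNTRACED" check described above can be written as a one-line predicate. This is a user-space sketch with a hypothetical helper name; it only uses the WUNTRACED constant from <sys/wait.h> and does not reproduce the kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/wait.h>	/* WUNTRACED */

/* The flag test below only makes sense if WUNTRACED is a non-zero bit. */
static_assert(WUNTRACED != 0, "WUNTRACED must be a non-zero flag bit");

/*
 * Should a stopped child be reported? A tracer always accepts its
 * stopped tracee; everyone else must have asked for WUNTRACED.
 * 'options' is only read, so callers can keep it const.
 */
bool accepts_stopped_child(bool ptrace, const int options)
{
	return ptrace || (options & WUNTRACED) != 0;
}
```

Because the ptrace case is tested explicitly, the caller no longer has to OR WUNTRACED into its copy of the options before calling the helper.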
     
  • In theory it is not safe to dereference ->parent/real_parent without
    tasklist or rcu lock, we can race with re-parenting.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • The forked child can have TIF_SIGPENDING if it was copied from parent's
    ti->flags. But this is harmless and actually almost never happens,
    because copy_process() can't succeed if signal_pending() == T.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • There is no reason for thread_group_cputime() in wait_task_zombie(), there
    must be no other threads.

    This call was previously needed to collect the per-cpu data which we do
    not have any longer.

    Signed-off-by: Oleg Nesterov
    Cc: Peter Zijlstra
    Acked-by: Roland McGrath
    Cc: Stanislaw Gruszka
    Cc: Vitaly Mayatskikh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Change ptrace_getsiginfo/ptrace_setsiginfo to use lock_task_sighand()
    without tasklist_lock. Perhaps it makes sense to make a single helper
    with "bool rw" argument.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • If the non-traced sub-thread calls do_notify_parent_cldstop(), we send the
    notification to group_leader->real_parent and we report group_leader's
    pid.

    But, if group_leader is traced we use the wrong ->parent->nsproxy->pid_ns,
    the tracer and parent can live in different namespaces. Change the code
    to use "parent" instead of tsk->parent.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Acked-by: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Change wait_task_zombie() to use ->real_parent instead of ->parent. We
    could even use current afaics, but ->real_parent is more clean.

    We know that the child is not ptrace_reparented() and thus they are equal.
    But we should avoid using task_struct->parent, we are going to remove it.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • - Use rcu_read_lock() instead of tasklist_lock to find/get the task
    in ptrace_get_task_struct().

    - Make it static, it has no callers outside of ptrace.c.

    - The comment doesn't match the reality, this helper does not do
    any checks. Because it is really trivial and static I removed the
    whole comment.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Remove the "Nasty, nasty" lock dance in ptrace_attach()/ptrace_traceme() -
    from now task_lock() has nothing to do with ptrace at all.

    With the recent changes nobody uses task_lock() to serialize with ptrace,
    but in fact it was never needed and it was never used consistently.

    However ptrace_attach() calls __ptrace_may_access() and needs task_lock()
    to pin task->mm for get_dumpable(). But we can call __ptrace_may_access()
    before we take tasklist_lock, ->cred_exec_mutex protects us against
    do_execve() path which can change creds and MMF_DUMP* flags.

    (ugly, but we can't use ptrace_may_access() because it hides the error
    code, so we have to take task_lock() and use __ptrace_may_access()).

    NOTE: this change assumes that LSM hooks, security_ptrace_may_access() and
    security_ptrace_traceme(), can be called without task_lock() held.

    Signed-off-by: Oleg Nesterov
    Cc: Chris Wright
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • ptrace_attach() and ptrace_traceme() are the last functions which look as
    if the untraced task can have task->ptrace != 0, this must not be
    possible. Change the code to just check ->ptrace != 0 and s/|=/=/ to set
    PT_PTRACED.

    Also, a couple of trivial whitespace cleanups in ptrace_attach().

    And move ptrace_traceme() up near ptrace_attach() to keep them close to
    each other.

    Signed-off-by: Oleg Nesterov
    Cc: Chris Wright
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • - Add PF_KTHREAD check to prevent attaching to the kernel thread
    with a borrowed ->mm.

    With or without this change we can race with daemonize() which
    can set PF_KTHREAD or clear ->mm after ptrace_attach() does the
    check, but this doesn't matter because reparent_to_kthreadd()
    does ptrace_unlink().

    - Kill "!task->mm" check. We don't really care about ->mm != NULL,
    and the task can call exit_mm() right after we drop task_lock().
    What we need is to make sure we can't attach after exit_notify(),
    check task->exit_state != 0 instead.

    Also, move the "already traced" check down for cosmetic reasons.

    Signed-off-by: Oleg Nesterov
    Cc: Chris Wright
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • No functional changes.

    - Nobody except ptrace.c & co should use ptrace flags directly, we have
    task_ptrace() for that.

    - No need to specially check PT_PTRACED, we must not have other PT_ bits
    set without PT_PTRACED. And no need to know this flag exists.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • tracehook_unsafe_exec() doesn't need task_lock(), remove the old comment.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • "Search in the siblings" should use ->real_parent, not ->parent. If the
    task is traced then ->parent == tracer, while the task's parent is always
    ->real_parent.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • m32r: PTRACE_SINGLESTEP sets PT_DTRACE, it is never used except cleared
    after do_execve().

    Signed-off-by: Oleg Nesterov
    Acked-by: Hirokazu Takata
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • m68k sets PT_DTRACE in trap_c() but never uses it.

    Signed-off-by: Oleg Nesterov
    Acked-by: Geert Uytterhoeven
    Acked-by: Greg Ungerer
    Cc: Roman Zippel
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • avr32, mn10300, parisc, s390, sh, xtensa:

    They never set PT_DTRACE, but clear it after do_execve().

    Signed-off-by: Oleg Nesterov
    Cc: David Howells
    Acked-by: Kyle McMartin
    Cc: Grant Grundler
    Cc: Matthew Wilcox
    Acked-by: Martin Schwidefsky
    Cc: Heiko Carstens
    Acked-by: Paul Mundt
    Acked-by: Chris Zankel
    Acked-by: Roland McGrath
    Acked-by: Haavard Skinnemoen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • h8300 defines PT_DTRACE for asm but never uses it.

    DEFINE(PT_PTRACED, PT_PTRACED) seems to be unused too.

    Signed-off-by: Oleg Nesterov
    Acked-by: Yoshinori Sato
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • allow_signal() checks ->mm == NULL. Not sure why. Perhaps to make sure
    current is the kernel thread. But this helper must not be used unless we
    are the kernel thread, kill this check.

    Also, document the fact that the CLONE_SIGHAND kthread must not use
    allow_signal(), unless the caller really wants to change the parent's
    ->sighand->action as well.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Try to fix memcg's lru rotation sanity: make memcg use the same logic as
    the global LRU does.

    Now, when __isolate_lru_page() returns -EBUSY, the page is rotated to the
    tail of LRU in global LRU's isolate LRU pages. But in memcg, it's not
    handled. This makes memcg behave the same as the global LRU and rotate
    the LRU if the page is busy.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Acked-by: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Mel Gorman
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • We don't have an interface to reset mem.limit or memsw.limit now.

    This patch allows mem.limit or memsw.limit to be reset by setting them
    to -1.

    Signed-off-by: Daisuke Nishimura
    Cc: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Li Zefan
    Cc: Dhaval Giani
    Cc: YAMAMOTO Takashi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
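
Treating -1 as "reset to unlimited" amounts to special-casing one input before normal number parsing. The sketch below shows one way to do that in user-space C. The function name, the LIMIT_UNLIMITED_SKETCH sentinel and its value are assumptions for illustration; the sketch accepts only plain decimal byte counts and is not the kernel's parser.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sentinel meaning "no limit". */
#define LIMIT_UNLIMITED_SKETCH ULLONG_MAX

/*
 * Parse a limit written by the user. "-1" resets the limit to
 * unlimited; any other input must be a plain decimal number.
 * Returns 0 on success and -EINVAL on malformed input, in which
 * case *out is left untouched.
 */
int parse_limit_sketch(const char *buf, unsigned long long *out)
{
	unsigned long long val;
	char *end;

	assert(buf != NULL && out != NULL);

	if (strcmp(buf, "-1") == 0) {
		*out = LIMIT_UNLIMITED_SKETCH;
		return 0;
	}

	/* Reject signs, whitespace and empty input up front. */
	if (buf[0] < '0' || buf[0] > '9')
		return -EINVAL;

	errno = 0;
	val = strtoull(buf, &end, 10);
	if (errno != 0 || *end != '\0')
		return -EINVAL;

	*out = val;
	return 0;
}
```

Only the exact string "-1" takes the reset path, so other negative values are still rejected as invalid.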
     
  • A user can set memcg.limit_in_bytes == memcg.memsw.limit_in_bytes when the
    user just wants to limit the total size of applications, in other words,
    is not very interested in memory usage itself. In this case, swap-out will
    be done only by global-LRU.

    But, under the current implementation, memory.limit_in_bytes is checked
    first and try_to_free_page() may do swap-out. But that swap-out is
    useless for memsw.limit_in_bytes and the thread may hit the limit again.

    This patch tries to fix the current behavior at memory.limit ==
    memsw.limit case. And documentation is updated to explain the behavior of
    this special case.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Li Zefan
    Cc: Dhaval Giani
    Cc: YAMAMOTO Takashi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • This patch fixes mis-accounting of swap usage in memcg.

    In the current implementation, memcg's swap account is uncharged only when
    swap is completely freed. But there are several cases where swap cannot
    be freed cleanly. To handle that, this patch makes memcg uncharge the
    swap account when swap has no references other than cache.

    By this, memcg's swap entry accounting can be fully synchronous with the
    application's behavior.

    This patch also changes memcg's hooks for swap-out.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Acked-by: Balbir Singh
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Li Zefan
    Cc: Dhaval Giani
    Cc: YAMAMOTO Takashi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki