30 Sep, 2006

40 commits

  • A previous patch to allow an exiting task to OOM kill itself (and thereby
    avoid a little deadlock) introduced a problem. We don't want the
    PF_EXITING task, even if it is 'current', to access mem reserves if there
    is already a TIF_MEMDIE process in the system sucking up reserves.

    Also make the commenting a little bit clearer, and note that our current
    scheme of effectively single threading the OOM killer is not itself
    perfect.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • - It is not possible to have task->mm == &init_mm.

    - task_lock() buys nothing for 'if (!p->mm)' check.

    Signed-off-by: Oleg Nesterov
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • No logic changes, but imho easier to read.

    Signed-off-by: Oleg Nesterov
    Acked-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • The only use of TASK_DEAD outside of the last schedule path is in

    select_bad_process:

    for_each_task(p) {

            if (!p->mm)
                    continue;
            ...
            if (p->state == TASK_DEAD)
                    continue;
            ...

    TASK_DEAD state is set at the end of do_exit(), this means that p->mm
    was already set == NULL by exit_mm(), so this task was already rejected
    by 'if (!p->mm)' above.

    Note also that the caller holds tasklist_lock, this means that p can't
    pass exit_notify() and then set TASK_DEAD when p->mm != NULL.

    Also, remove open-coded is_init().

    Signed-off-by: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
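
The ordering argument in the entry above can be checked with a small userspace model. This is only a sketch: struct model_task, the MODEL_* values and all function names are hypothetical stand-ins, not kernel code. It assumes, as the message states, that exit_mm() clears ->mm before TASK_DEAD is set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; the values are illustrative only. */
enum { MODEL_TASK_RUNNING = 0, MODEL_TASK_DEAD = 128 };

struct model_mm { int unused; };

struct model_task {
    struct model_mm *mm;   /* cleared by the model's exit path */
    int state;
};

/* Model of the ordering in do_exit(): mm is dropped first, TASK_DEAD comes last. */
void model_do_exit(struct model_task *p)
{
    assert(p != NULL);
    p->mm = NULL;                 /* what exit_mm() does */
    p->state = MODEL_TASK_DEAD;   /* final step before the last schedule */
}

/* Old filter: rejects tasks without an mm and, separately, dead tasks. */
int model_old_is_candidate(const struct model_task *p)
{
    if (!p->mm)
        return 0;
    if (p->state == MODEL_TASK_DEAD)
        return 0;
    return 1;
}

/* New filter: the mm check alone. */
int model_new_is_candidate(const struct model_task *p)
{
    return p->mm != NULL;
}
```

For any task that went through model_do_exit(), both filters reject it at the mm check, so the TASK_DEAD test never decides anything.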
     
  • I am not sure about this patch, I am asking Ingo to take a decision.

    task_struct->state == EXIT_DEAD is a very special case. To avoid confusion,
    it makes sense to introduce a new state, TASK_DEAD, while EXIT_DEAD should
    live only in ->exit_state as documented in sched.h.

    Note that this state is not visible to user-space, get_task_state() masks off
    unsuitable states.

    Signed-off-by: Oleg Nesterov
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • After the previous change (->flags & PF_DEAD) <=> (->state == EXIT_DEAD), we
    don't need PF_DEAD any longer.

    Signed-off-by: Oleg Nesterov
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • schedule() checks PF_DEAD on every context switch and sets ->state = EXIT_DEAD
    to ensure that the exiting task will be deactivated. Note that this EXIT_DEAD
    is in fact a "random" value, we can use any bit except normal TASK_XXX values.

    It is better to set this state in do_exit() along with PF_DEAD flag and remove
    that check in schedule().

    This is safe with respect to a concurrent try_to_wake_up() (for example
    from ptrace or tkill): it cannot change the task's ->state, because the
    'state' argument of try_to_wake_up() can't have the EXIT_DEAD bit set. And
    if try_to_wake_up() sees a stale value of ->state == TASK_RUNNING, it will
    do nothing.

    Signed-off-by: Oleg Nesterov
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
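
A minimal sketch of why a concurrent wakeup cannot disturb the dead state, assuming the wakeup path only acts when the task's state intersects the caller's mask. The M_* bit values and model_try_to_wake_up() are hypothetical; the real definitions live in the kernel's sched.h and sched.c.

```c
#include <assert.h>

/* Illustrative bit values, not copied from any particular kernel version. */
#define M_TASK_RUNNING          0
#define M_TASK_INTERRUPTIBLE    1
#define M_TASK_UNINTERRUPTIBLE  2
#define M_EXIT_DEAD             32

struct model_task { long state; };

/*
 * Model of the early check in a wakeup path: the task is woken only if
 * its current state shares a bit with the caller's mask.
 * Returns 1 if the task was woken, 0 otherwise.
 */
int model_try_to_wake_up(struct model_task *p, long mask)
{
    if (!(p->state & mask))
        return 0;              /* covers EXIT_DEAD and a stale TASK_RUNNING */
    p->state = M_TASK_RUNNING;
    return 1;
}
```

Because callers never pass the dead bit in the mask, a task parked in that state is left alone, and a task whose state still reads as running (value 0) matches no mask at all.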
     
  • Introduce the disable_irq_nosync_lockdep_irqsave() and
    enable_irq_lockdep_irqrestore() APIs. These are needed for NE2000: the
    NE2000 driver calls disable_irq and enable_irq as locking against the IRQ
    handler, both in cases where interrupts are on and in cases where they are
    off. This means that lockdep needs to track the old state of the virtual
    irq flags on disable_irq, and restore these at enable_irq time.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Ingo Molnar
    Cc: Jeff Garzik
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
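
The save/restore idea described above can be sketched in userspace as follows. The model_* names are hypothetical and the IRQ number argument of the real APIs is left out; the sketch only shows that the old virtual irq state is remembered at disable time and put back at enable time.

```c
#include <assert.h>

/* Model of the "virtual" hardirq state that a lock checker tracks. */
int model_irqs_enabled = 1;

/* Model of the disable side: remember the old state, then switch it off. */
void model_disable_irqsave(unsigned long *flags)
{
    assert(flags != 0);
    *flags = (unsigned long)model_irqs_enabled;
    model_irqs_enabled = 0;
}

/* Model of the enable side: restore whatever state the caller started with. */
void model_enable_irqrestore(const unsigned long *flags)
{
    assert(flags != 0);
    model_irqs_enabled = (int)*flags;
}
```

A caller that entered with interrupts off leaves with them off, and one that entered with interrupts on leaves with them on, which an unconditional enable could not guarantee.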
     
  • Everyone passes a valid pointer there.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • If register_filesystem() fails, the mux workqueue must be killed.

    Signed-off-by: Alexey Dobriyan
    Cc: Eric Van Hensbergen
    Cc: Ron Minnich
    Cc: Latchesar Ionkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
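
The cleanup rule in the entry above (undo earlier init steps when a later one fails) can be sketched like this. All model_* names are hypothetical stand-ins; the fake registration step simply returns whatever model_register_result holds.

```c
#include <assert.h>
#include <errno.h>

int model_workqueue_alive;    /* 1 while the model workqueue exists */
int model_register_result;    /* value the fake registration step returns */

int model_create_workqueue(void)
{
    model_workqueue_alive = 1;
    return 0;
}

void model_destroy_workqueue(void)
{
    model_workqueue_alive = 0;
}

int model_register_filesystem(void)
{
    return model_register_result;
}

/* Init routine: the workqueue must not outlive a failed registration. */
int model_init(void)
{
    int ret;

    ret = model_create_workqueue();
    if (ret)
        return ret;

    ret = model_register_filesystem();
    if (ret)
        model_destroy_workqueue();   /* undo the earlier step on failure */

    return ret;
}
```

The same shape applies to any resource set up before registration, such as a slab cache.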
     
  • It always returns 0, so relying on it is useless. The only caller isn't
    checking return value. In general, un-, de-, -free functions should return
    void.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • If register_filesystem() fails, vxfs_inode cache must be destroyed.

    Signed-off-by: Alexey Dobriyan
    Acked-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Two lines -- two bugs. :-(

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Currently, __acquire and __release take a lock expression, but __cond_lock
    takes only a condition, not the lock acquired if the expression evaluates
    to true. Change __cond_lock to accept a lock expression, and change all
    the callers to pass in a lock expression.

    Signed-off-by: Josh Triplett
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josh Triplett
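
A rough userspace illustration of a conditional-lock annotation that names the lock. The model_* macros are hypothetical stand-ins that count on the lock itself; under a real static checker the annotations carry no runtime effect, so this is a sketch of the calling convention only.

```c
#include <assert.h>

struct model_lock { int held; int context; };

/* Stand-ins for the annotations: here they count on the named lock. */
#define model_acquire(x)  ((x)->context++)
#define model_release(x)  ((x)->context--)

/* Lock expression first, condition second; acquire only if the condition holds. */
#define model_cond_lock(x, c)  ((c) ? (model_acquire(x), 1) : 0)

/* Unannotated primitive: returns 1 if the lock was taken. */
int model_raw_trylock(struct model_lock *l)
{
    if (l->held)
        return 0;
    l->held = 1;
    return 1;
}

/* Annotated trylock: tells the checker which lock is held on success. */
#define model_trylock(l)  model_cond_lock(l, model_raw_trylock(l))

void model_unlock(struct model_lock *l)
{
    assert(l->held);
    l->held = 0;
    model_release(l);
}
```

With the lock passed in, the annotation can attribute the acquisition to a specific lock instead of recording only that some condition was true.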
     
  • Fix "quiet" parameter doc. No trailing '=' sign, no value after it. And
    it disables "most" kernel messages, not all of them.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • At the beginning of the routine, "copied" is set to 0, but it is no good
    because in lines 805 and 812 it is set to other values. Finally, the
    routine returns as if it copied 12 (=ENOMEM) bytes less than it actually
    did.

    Signed-off-by: Frederik Deweerdt
    Acked-by: Eric Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Frederik Deweerdt
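
The off-by-ENOMEM symptom described above can be reproduced with a hypothetical model. The routine itself is not quoted in the message, so the shape below is an assumption: a byte counter that is preset to an error code before an allocation and then accumulated into without being reset.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed buggy shape: the counter doubles as the error code and is never reset. */
long model_read_buggy(size_t chunks, size_t chunk_len, int alloc_ok)
{
    long copied = 0;
    size_t i;

    copied = -ENOMEM;                 /* preset for the allocation-failure path */
    if (!alloc_ok)
        return copied;

    for (i = 0; i < chunks; i++)
        copied += (long)chunk_len;    /* accumulates on top of -ENOMEM */
    return copied;
}

/* Assumed fixed shape: reset the counter once the allocation has succeeded. */
long model_read_fixed(size_t chunks, size_t chunk_len, int alloc_ok)
{
    long copied = -ENOMEM;
    size_t i;

    if (!alloc_ok)
        return copied;

    copied = 0;                       /* start counting bytes from zero */
    for (i = 0; i < chunks; i++)
        copied += (long)chunk_len;
    return copied;
}
```

The buggy variant reports ENOMEM bytes fewer than were copied, while the failure path returns the same error in both variants.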
     
  • In the case below we are locking the whole disk, not a partition. This
    change simply brings the code in line with the piece above, which handles
    the case where we are the 'first' opener and we are a partition.

    Signed-off-by: Jason Baron
    Acked-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Baron
     
  • spin_trylock_irq and spin_trylock_irqsave use _spin_trylock, which does not
    use the __cond_lock wrapper annotation and thus does not affect the lock
    context; change them to use spin_trylock instead, which does use
    __cond_lock.

    Signed-off-by: Josh Triplett
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josh Triplett
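
A small model of the layering change described above, under the assumption that only the annotated wrapper updates the tracked lock context. The model_* names and the global irq flag are hypothetical; the point is that the irq variant inherits the annotation once it is built on the annotated trylock.

```c
#include <assert.h>

int model_irqs_on = 1;
int model_lock_context;       /* what a static checker would track */

struct model_spinlock { int held; };

/* Unannotated primitive: takes the lock but leaves the tracked context alone. */
int model_raw_trylock(struct model_spinlock *l)
{
    if (l->held)
        return 0;
    l->held = 1;
    return 1;
}

/* Annotated wrapper: bumps the tracked context only on success. */
int model_spin_trylock(struct model_spinlock *l)
{
    if (!model_raw_trylock(l))
        return 0;
    model_lock_context++;
    return 1;
}

/* IRQ variant built on the annotated wrapper rather than the raw primitive. */
int model_spin_trylock_irq(struct model_spinlock *l)
{
    model_irqs_on = 0;
    if (model_spin_trylock(l))
        return 1;
    model_irqs_on = 1;        /* failed: undo the irq disable */
    return 0;
}
```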
     
  • The lock annotations used on spinlocks and rwlocks currently use
    __{acquires,releases}(spinlock_t) and __{acquires,releases}(rwlock_t),
    respectively. This loses the information of which lock actually got
    acquired or released, and assumes a different type for the parameter of
    __acquires and __releases than the rest of the kernel. While the current
    implementations of __acquires and __releases throw away their argument,
    this will not always remain the case. Change this to use the lock
    parameter instead, to preserve this information and increase consistency in
    usage of __acquires and __releases.

    Signed-off-by: Josh Triplett
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josh Triplett
     
  • Signed-off-by: Alexey Dobriyan
    Acked-by: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • If your driver implements "break on" and "break off" this ensures you won't
    get multiple overlapping requests or requests in parallel. If your driver
    has its own break handling then it's still your problem as the driver
    author.

    Break is also now properly serialized against writes from user space, but
    no new guarantees are made at driver level about writes from the line
    discipline itself (e.g. flow control or echo).

    Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • [akpm@osdl.org: build fix]
    [akpm@osdl.org: warning fix]
    Signed-off-by: Alan Cox
    Acked-by: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Add a missing exit for the case where the file to be parsed cannot be
    opened. Without it the program crashes with a segfault, because the file
    descriptor is accessed even though the open failed.

    Signed-off-by: Henrik Kretzschmar
    Acked-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Henrik Kretzschmar
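
The class of bug fixed above is a missing check on the result of opening a file. A minimal sketch, with a hypothetical helper model_count_lines() that returns an error instead of exiting so it can be exercised directly:

```c
#include <assert.h>
#include <stdio.h>

/* Returns the number of newlines in the file, or -1 if it cannot be opened. */
long model_count_lines(const char *path)
{
    FILE *fp;
    long lines = 0;
    int c;

    assert(path != NULL);
    fp = fopen(path, "r");
    if (!fp) {
        perror(path);
        return -1;            /* bail out before touching the stream */
    }

    while ((c = fgetc(fp)) != EOF)
        if (c == '\n')
            lines++;

    fclose(fp);
    return lines;
}
```

A command-line tool would typically print the error and exit at the point where this helper returns -1.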
     
  • use rcu locks for find_task_by_pid().

    Signed-off-by: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • It is ok to do find_task_by_pid() + get_task_struct() under
    rcu_read_lock(), so we can drop tasklist_lock.

    Note that testing of ->exit_state is racy with or without tasklist anyway.

    Signed-off-by: Oleg Nesterov
    Acked-by: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
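
The lookup-then-pin pattern mentioned above, sketched in userspace. The RCU calls are no-op stand-ins and every model_* name is hypothetical; the sketch only shows that the reference is taken before the read-side section ends, so the caller may keep using the task afterwards.

```c
#include <assert.h>
#include <stddef.h>

struct model_task { int pid; int usage; };

struct model_task model_tasks[] = { { 1, 1 }, { 42, 1 } };

/* No-op stand-ins; in the kernel these delimit an RCU read-side section. */
void model_rcu_read_lock(void) { }
void model_rcu_read_unlock(void) { }

/* Linear search standing in for the pid hash lookup. */
struct model_task *model_find_task_by_pid(int pid)
{
    size_t i;

    for (i = 0; i < sizeof(model_tasks) / sizeof(model_tasks[0]); i++)
        if (model_tasks[i].pid == pid)
            return &model_tasks[i];
    return NULL;
}

/* Look the task up and pin it before leaving the read-side section. */
struct model_task *model_get_task(int pid)
{
    struct model_task *p;

    model_rcu_read_lock();
    p = model_find_task_by_pid(pid);
    if (p)
        p->usage++;            /* stands in for get_task_struct() */
    model_rcu_read_unlock();

    return p;                  /* caller drops the reference when done */
}
```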
     
  • During testing I've found that the mount pending flag can be left set at
    exit from autofs4_lookup after a failed mount request. This shouldn't be
    allowed to happen and causes incorrect error returns.

    Signed-off-by: Ian Kent
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ian Kent
     
  • The check for an empty directory in the autofs4_follow_link method fails
    occasionally due to old dentries. We had the same problem in
    autofs4_revalidate ages ago. I thought we wouldn't need this in
    autofs4_follow_link, silly me.

    Signed-off-by: Ian Kent
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ian Kent
     
  • copy_process:
            // holds tasklist_lock + ->siglock
            /*
             * inherit ioprio
             */
            p->ioprio = current->ioprio;

    Why? ->ioprio was already copied in dup_task_struct(). I guess this is
    needed to ensure that the child can't escape
    sys_ioprio_set(IOPRIO_WHO_{PGRP,USER}), yes?

    In that case we don't need ->siglock held, and the comment should be
    updated.

    Signed-off-by: Oleg Nesterov
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Remove open-coded has_rt_policy(), no changes in kernel/exit.o

    Signed-off-by: Oleg Nesterov
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • I am not sure this patch is correct: I can't understand what the current
    code does, and I don't know what it was supposed to do.

    The comment says:

    * can't change policy, except between SCHED_NORMAL
    * and SCHED_BATCH:

    The code:

    if (((policy != SCHED_NORMAL && p->policy != SCHED_BATCH) &&
         (policy != SCHED_BATCH && p->policy != SCHED_NORMAL)) &&

    But this is equivalent to:

    if ( (is_rt_policy(policy) && has_rt_policy(p)) &&

    which means something different. We can't _decrease_ the current
    ->rt_priority with such a check (if rlim[RLIMIT_RTPRIO] == 0).

    Probably, it was supposed to be:

    if ( !(policy == SCHED_NORMAL && p->policy == SCHED_BATCH) &&
         !(policy == SCHED_BATCH && p->policy == SCHED_NORMAL)

    This matches the comment, but it is strange: it doesn't allow a task to
    _drop_ the realtime priority when rlim[RLIMIT_RTPRIO] == 0.

    I think the right check would be:

    /* can't set/change rt policy */
    if (is_rt_policy(policy) &&
        policy != p->policy &&
        !rlim_rtprio)
            return -EPERM;

    Signed-off-by: Oleg Nesterov
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
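
The equivalence claimed above can be checked exhaustively over the four policies. This sketch models only the quoted parts of each condition (the original lines end in a trailing && whose continuation is not shown), and the model_* names are hypothetical. Policy numbers are those of SCHED_NORMAL, SCHED_FIFO, SCHED_RR and SCHED_BATCH.

```c
#include <assert.h>

enum { M_SCHED_NORMAL = 0, M_SCHED_FIFO = 1, M_SCHED_RR = 2, M_SCHED_BATCH = 3 };

int model_is_rt(int policy)
{
    return policy != M_SCHED_NORMAL && policy != M_SCHED_BATCH;
}

/* The quoted part of the condition as it stood in the code. */
int model_check_quoted(int policy, int cur)
{
    return (policy != M_SCHED_NORMAL && cur != M_SCHED_BATCH) &&
           (policy != M_SCHED_BATCH && cur != M_SCHED_NORMAL);
}

/* The simplified form: both the new and the current policy are realtime. */
int model_check_simplified(int policy, int cur)
{
    return model_is_rt(policy) && model_is_rt(cur);
}

/* What the comment describes: anything except NORMAL <-> BATCH. */
int model_check_comment(int policy, int cur)
{
    return !(policy == M_SCHED_NORMAL && cur == M_SCHED_BATCH) &&
           !(policy == M_SCHED_BATCH && cur == M_SCHED_NORMAL);
}

/* The proposed check: deny setting or changing an rt policy without rlimit. */
int model_check_proposed(int policy, int cur, unsigned long rlim_rtprio)
{
    return model_is_rt(policy) && policy != cur && !rlim_rtprio;
}

/* Number of (policy, cur) pairs on which two predicates disagree. */
int model_count_mismatches(int (*a)(int, int), int (*b)(int, int))
{
    int policy, cur, n = 0;

    for (policy = M_SCHED_NORMAL; policy <= M_SCHED_BATCH; policy++)
        for (cur = M_SCHED_NORMAL; cur <= M_SCHED_BATCH; cur++)
            if (!a(policy, cur) != !b(policy, cur))
                n++;
    return n;
}
```

The quoted and simplified forms agree on all sixteen pairs, while the form matching the comment does not. The proposed check lets a task move from a realtime policy back to SCHED_NORMAL with a zero rlimit.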
     
  • Imho, makes the code a bit easier to read.

    Signed-off-by: Oleg Nesterov
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Use rcu locks instead. sched_setscheduler() now takes ->siglock
    before reading ->signal->rlim[].

    Signed-off-by: Oleg Nesterov
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Get rid of an extraneous printk in kernel_restart().

    Signed-off-by: Cal Peake
    Acked-by: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cal Peake
     
  • If ____call_usermodehelper fails, we're not interested in the child
    process' exit value, but the real error, so let's stop wait_for_helper from
    overwriting it in that case.

    Issue discovered by Benedikt Böhm while working on a Linux-VServer usermode
    helper.

    Signed-off-by: Björn Steinbrink
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Björn Steinbrink
     
  • Currently this module just returns 1 if anything on module init fails. Store
    the error code of the different function calls and return their error on
    problems.

    Signed-off-by: Rolf Eike Beer
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    [ Fixed to not unregister twice on error ]
    Signed-off-by: Linus Torvalds

    Rolf Eike Beer
     
  • It is sure confusing that linux/ptrace.h has:
    #define PTRACE_SINGLESTEP 9
    #define PTRACE_ATTACH 0x10
    #define PTRACE_DETACH 0x11
    #define PTRACE_SYSCALL 24
    All the low-numbered constants are in decimal, but the last two in hex.
    It sure makes it likely that someone will look at this and think that
    9, 10, 11 are used, and that 16 and 17 are not used.

    How about we use the same notation for all the numbers [0,24] in the
    same short list?

    Signed-off-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
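
The confusion described above is easy to demonstrate. The four values below are copied from the list quoted in the message; the M_ prefix and the helper model_request_in_use() are hypothetical and cover only these four constants.

```c
#include <assert.h>

/* Mixed notation, as quoted in the message. */
#define M_PTRACE_SINGLESTEP  9
#define M_PTRACE_ATTACH      0x10
#define M_PTRACE_DETACH      0x11
#define M_PTRACE_SYSCALL     24

/* Returns 1 if request number n is one of the four constants above. */
int model_request_in_use(int n)
{
    return n == M_PTRACE_SINGLESTEP ||
           n == M_PTRACE_ATTACH ||
           n == M_PTRACE_DETACH ||
           n == M_PTRACE_SYSCALL;
}
```

Read as decimal, 0x10 and 0x11 look like 10 and 11, but they are 16 and 17.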
     
  • Add some documentation comments for the cdev interface.

    Signed-off-by: Jonathan Corbet
    Cc: Rolf Eike Beer
    Acked-by: "Randy.Dunlap"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jonathan Corbet
     
  • [akpm@osdl.org: fix]
    Cc: Alan Cox
    Cc: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     
  • If we are going to BUG() rather than panic() here, then we should cover
    the case of the BUG being compiled out.

    Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
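
A sketch of what covering the compiled-out case can look like, assuming a configuration in which the BUG macro expands to nothing. MODEL_BUG() and model_lookup() are hypothetical; the message does not show the actual call site.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for BUG() in a build where it is compiled out: it does nothing. */
#define MODEL_BUG() do { } while (0)

/* Returns table[idx], or -EINVAL for an index that should never occur. */
int model_lookup(const int *table, int len, int idx)
{
    assert(table != 0);
    if (idx < 0 || idx >= len) {
        MODEL_BUG();
        return -EINVAL;    /* still reached when MODEL_BUG() is a no-op */
    }
    return table[idx];
}
```

Without the explicit return, execution would fall through to the out-of-range access whenever the macro is empty.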
     
  • Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox