18 Oct, 2007

2 commits


17 Oct, 2007

38 commits

  • This patch contains the following cleanups that are now possible:
    - remove the unused security_operations->inode_xattr_getsuffix
    - remove the no longer used security_operations->unregister_security
    - remove some no longer required exit code
    - remove a bunch of no longer used exports

    Signed-off-by: Adrian Bunk
    Acked-by: James Morris
    Cc: Chris Wright
    Cc: Stephen Smalley
    Cc: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • For those who don't care about CONFIG_SECURITY.

    Signed-off-by: Alexey Dobriyan
    Cc: "Serge E. Hallyn"
    Cc: Casey Schaufler
    Cc: James Morris
    Cc: Stephen Smalley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Change migration_call(CPU_DEAD) to use direct spin_lock_irq() instead of
    task_rq_lock(rq->idle), because rq->idle can't change its task_rq().

    This makes the code a bit more symmetrical with migrate_dead_tasks()'s path
    which uses spin_lock_irq/spin_unlock_irq.

    Signed-off-by: Oleg Nesterov
    Cc: Cliff Wickman
    Cc: Gautham R Shenoy
    Cc: Ingo Molnar
    Cc: Srivatsa Vaddagiri
    Cc: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Currently move_task_off_dead_cpu() is called under
    write_lock_irq(tasklist). This means it can't use task_lock() which is
    needed to improve migrating to take task's ->cpuset into account.

    Change the code to call move_task_off_dead_cpu() with irqs enabled, and
    change migrate_live_tasks() to use read_lock(tasklist).

    This all is a preparation for the further changes proposed by Cliff Wickman, see
    http://marc.info/?t=117327786100003

    Signed-off-by: Oleg Nesterov
    Cc: Cliff Wickman
    Cc: Gautham R Shenoy
    Cc: Ingo Molnar
    Cc: Srivatsa Vaddagiri
    Cc: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • load_module() returns zero when mod_sysfs_init() fails, so the module load
    succeeds by accident.

    This patch makes load_module() return error correctly in that case.

    Acked-by: Greg Kroah-Hartman
    Acked-by: Rusty Russell
    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
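
A minimal userspace sketch of the bug shape described above, assuming the failure came from a cleanup path that never recorded the error code. All names here (fake_sysfs_init, load_buggy, load_fixed) are hypothetical stand-ins, not the kernel's functions.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for mod_sysfs_init(): fails on request. */
static int fake_sysfs_init(int should_fail)
{
    assert(should_fail == 0 || should_fail == 1);
    return should_fail ? -ENOMEM : 0;
}

/* Buggy shape: the failure branch jumps to cleanup without setting err. */
static int load_buggy(int should_fail)
{
    int err = 0;

    if (fake_sysfs_init(should_fail))
        goto cleanup;           /* err is still 0 here */
    return 0;
cleanup:
    return err;                 /* reports success by accident */
}

/* Fixed shape: capture the return value first, then test it. */
static int load_fixed(int should_fail)
{
    int err;

    err = fake_sysfs_init(should_fail);
    if (err)
        goto cleanup;
    return 0;
cleanup:
    return err;                 /* propagates the real error */
}
```

With should_fail set, load_buggy() still returns 0 while load_fixed() returns the negative errno value.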
     
  • Compiling handle_percpu_irq only on SMP generates an artificial
    special case so a typical use like:

    set_irq_chip_and_handler(irq, &some_irq_type, handle_percpu_irq);

    needs to be conditionally compiled only on SMP systems as well and an
    alternative UP construct is usually needed - for no good reason.

    This fixes uniprocessor configurations for some MIPS SMP systems.

    Signed-off-by: Ralf Baechle
    Acked-by: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ralf Baechle
     
  • Right now futexfs and inotifyfs share one magic number, 0xBAD1DEA, which is
    a little confusing. Use 0xBAD1DEA as the magic for futexfs and 0x2BAD1DEA as
    the magic for inotifyfs.

    Signed-off-by: Andrey Mirkin
    Acked-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Mirkin
     
  • The blessed way for standard caches is to use it. Besides, this may give
    this cache a better alignment.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Cedric Le Goater
    Acked-by: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • For those who deselect POSIX message queues.

    Reduces SLAB size of user_struct from 64 to 32 bytes here, SLUB size -- from
    40 bytes to 32 bytes.

    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Save some space because uid_hash_find() has 3 callsites.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • .. in an effort to make read-only whatever can be made, so that
    CONFIG_DEBUG_RODATA can catch as many issues as possible.

    Signed-off-by: Jan Beulich
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     
  • {,un}register_timer_hook() is the API that should be used.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • kernel/sys_ni.c can't #include due to cond_syscall(),
    but let's tell gcc to not warn with -Wmissing-prototypes.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • Kconfig.preempt is not included on some archs (for example, m68k). On those
    archs, the Kconfig machinery complains that KVM selects an undefined symbol
    PREEMPT_NOTIFIERS (which lives in Kconfig.preempt).

    So move the offending symbol into a Kconfig file which is included by
    everyone.

    Cc: Roman Zippel
    Cc: Geert Uytterhoeven
    Signed-off-by: Avi Kivity
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Avi Kivity
     
  • robust_list, compat_robust_list, pi_state_list, pi_state_cache are
    only really used if futexes are on.

    Signed-off-by: Alexey Dobriyan
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Add a prefix "VMCOREINFO_" to the vmcoreinfo macros. The old vmcoreinfo macros
    had the generic names SYMBOL/SIZE/OFFSET/LENGTH/CONFIG, which are
    impossible to grep for, so these names should be changed. The discussion
    is here:
    http://www.ussg.iu.edu/hypermail/linux/kernel/0709.1/0415.html

    Signed-off-by: Ken'ichi Ohmichi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ken'ichi Ohmichi
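
A small sketch of why prefixed macro names help, assuming the macros stringify their arguments into "KEY(name)=value" lines. The DEMO_VMCOREINFO_* names, the output format, and struct demo_page are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical structure standing in for a kernel type. */
struct demo_page {
    unsigned long flags;
    void *mapping;
};

/* Prefixed, greppable macro names; #name turns the identifier into text. */
#define DEMO_VMCOREINFO_SIZE(buf, len, name) \
    snprintf(buf, len, "SIZE(%s)=%lu\n", #name, \
             (unsigned long)sizeof(struct name))
#define DEMO_VMCOREINFO_OFFSET(buf, len, name, field) \
    snprintf(buf, len, "OFFSET(%s.%s)=%lu\n", #name, #field, \
             (unsigned long)offsetof(struct name, field))

/* Emit one OFFSET line for demo_page.flags; returns snprintf's result. */
static int demo_emit_flags_offset(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return DEMO_VMCOREINFO_OFFSET(buf, len, demo_page, flags);
}
```

A search for "DEMO_VMCOREINFO_" finds every use, whereas a bare SIZE or OFFSET would match unrelated code throughout a large tree.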
     
  • [2/3] Add nodemask_t's size and NR_FREE_PAGES's value to vmcoreinfo_data.
    The dump filtering command 'makedumpfile' (v1.1.6 or earlier) assumed
    these values, which was bad for reliability. makedumpfile v1.2.0 needs
    the real values, so this patch makes the kernel output them.
    makedumpfile site:
    https://sourceforge.net/projects/makedumpfile/

    Signed-off-by: Ken'ichi Ohmichi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ken'ichi Ohmichi
     
  • [1/3] Clean up the coding style according to Andrew's comments:
    http://lists.infradead.org/pipermail/kexec/2007-August/000522.html
    - vmcoreinfo_append_str() should have suitable __attribute__s so that
      the compiler can check its use.
    - vmcoreinfo_max_size should be a size_t.
    - Use get_seconds() instead of xtime.tv_sec.
    - Use init_uts_ns.name.release instead of UTS_RELEASE.

    Signed-off-by: Ken'ichi Ohmichi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ken'ichi Ohmichi
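
The first two points can be sketched in userspace as follows. This is an illustration under stated assumptions, not the kernel's code: note_append_str(), note_buf, NOTE_MAX and the key names used in the usage note are hypothetical. The format attribute is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define NOTE_MAX 128            /* hypothetical capacity, not the kernel's */

static char note_buf[NOTE_MAX];
static size_t note_len;         /* a size_t, as the cleanup asks for sizes */

/* format(printf, 1, 2): argument 1 is the format string and the variadic
 * arguments start at 2, so the compiler can type-check every caller. */
void note_append_str(const char *fmt, ...)
    __attribute__((format(printf, 1, 2)));

void note_append_str(const char *fmt, ...)
{
    char tmp[64];
    va_list args;
    int n;

    assert(fmt != NULL);
    va_start(args, fmt);
    n = vsnprintf(tmp, sizeof(tmp), fmt, args);
    va_end(args);
    if (n < 0)
        return;
    if ((size_t)n >= sizeof(tmp))
        n = (int)sizeof(tmp) - 1;           /* output was truncated */
    if ((size_t)n > NOTE_MAX - note_len)
        n = (int)(NOTE_MAX - note_len);     /* clamp to remaining space */
    memcpy(note_buf + note_len, tmp, (size_t)n);
    note_len += (size_t)n;
}
```

With the attribute in place, a call such as note_append_str("PAGESIZE=%d\n", "text") draws a -Wformat warning instead of misbehaving at run time.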
     
  • This patch set lifts the restriction that makedumpfile users must install a
    vmlinux file (including the debugging information) on each system.

    The makedumpfile command is the dump filtering feature for kdump. It creates a
    small dumpfile by filtering out pages that are unnecessary for the analysis.
    To identify unnecessary pages, it needs a vmlinux file including the debugging
    information. These days the debugging package is a huge file, and it is
    hard to install it on each system.

    To solve the problem, kdump developers discussed it on lkml and kexec-ml. As
    a result, we concluded that the information necessary for dump
    filtering (called "vmcoreinfo") should be embedded into the first kernel file
    and accessed through /proc/vmcore from the second kernel.
    (http://www.uwsg.iu.edu/hypermail/linux/kernel/0707.0/1806.html)

    Dan Aloni created the patch set for the above implementation.
    (http://www.uwsg.iu.edu/hypermail/linux/kernel/0707.1/1053.html)

    And I updated it for multi architectures and memory models.
    (http://lists.infradead.org/pipermail/kexec/2007-August/000479.html)

    Signed-off-by: Dan Aloni
    Signed-off-by: Ken'ichi Ohmichi
    Signed-off-by: Bernhard Walle
    Signed-off-by: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ken'ichi Ohmichi
     
  • do_sigaction() returns -ERESTARTNOINTR if signal_pending(). The comment says:

    * If there might be a fatal signal pending on multiple
    * threads, make sure we take it before changing the action.

    I think this is not needed. We should only worry about SIGNAL_GROUP_EXIT case,
    but it implies a pending SIGKILL which can't be cleared by do_sigaction.

    Kill this special case.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • de_thread() yields waiting for ->group_leader to be a zombie. This deadlocks
    if an rt-prio execer shares the same cpu with ->group_leader. Change the code
    to use ->group_exit_task/notify_count mechanics.

    This patch certainly uglifies the code, perhaps someone can suggest something
    better.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Repost of http://lkml.org/lkml/2007/8/10/472 made available by request.

    The locking used by get_random_bytes() can conflict with the
    preempt_disable() and synchronize_sched() form of RCU. This patch changes
    rcutorture's RNG to gather entropy from the new cpu_clock() interface
    (relying on interrupts, preemption, daemons, and rcutorture's reader
    thread's rock-bottom scheduling priority to provide useful entropy), and
    also adds an EXPORT_SYMBOL_GPL() to make that interface available to GPLed
    kernel modules such as rcutorture.

    Passes several hours of rcutorture.

    [ego@in.ibm.com: Use raw_smp_processor_id() in rcu_random()]
    Signed-off-by: Paul E. McKenney
    Cc: Ingo Molnar
    Signed-off-by: Gautham R Shenoy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul E. McKenney
     
  • To avoid lock contention, we distribute the sched_timer calls across the
    cpus so they do not trigger at the same instant. However, I used NR_CPUS,
    which can cause needless grouping on small smp systems depending on your
    kernel config. This patch converts to using num_possible_cpus() so we
    spread it as evenly as possible on every machine.

    Briefly tested w/ NR_CPUS=255 and verified reduced contention.

    Signed-off-by: John Stultz
    Acked-by: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    john stultz
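
The grouping effect can be shown with a little arithmetic. The sketch below assumes the per-cpu offset is one slot of period/divisor per cpu; the function name and this exact formula are illustrative, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of this cpu's timer within one tick period, in nanoseconds.
 * divisor is NR_CPUS before the change and num_possible_cpus() after. */
uint64_t stagger_offset_ns(uint64_t period_ns, unsigned int divisor,
                           unsigned int cpu)
{
    assert(divisor > 0);
    assert(cpu < divisor);
    return (period_ns / divisor) * cpu;
}
```

For a 1,000,000 ns period on a 4-cpu machine, a divisor of 255 puts cpu 3 at 11,763 ns, so all four timers fire within about 1.2% of the period. A divisor of 4 puts cpu 3 at 750,000 ns, spreading the timers evenly.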
     
  • - remove the no longer required __attribute__((weak)) of xtime_lock
    - remove the following no longer used EXPORT_SYMBOL's:
      - xtime
      - xtime_lock

    Signed-off-by: Adrian Bunk
    Cc: Thomas Gleixner
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • The child was found on ->children list under tasklist_lock, it must have a
    valid ->signal. __exit_signal() both removes the task from parent->children
    and clears ->signal "atomically" under write_lock(tasklist).

    Remove unneeded checks.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Cleanup. __group_complete_signal() wakes up ->group_exit_task twice. The
    second wakeup's state includes TASK_UNINTERRUPTIBLE, which is not very
    appropriate.

    Change the code to pass the "correct" argument to signal_wake_up() and kill
    now unneeded wake_up_process().

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • The "p->exit_signal == -1 && p->ptrace == 0" check and the comment are
    bogus. We already did exactly the same check in eligible_child(), we did
    not drop tasklist_lock since then, and both variables need
    write_lock(tasklist) to be changed.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Nowadays thread_group_empty() and next_thread() are simple list operations,
    this optimization doesn't make sense: we are doing exactly same check one
    line below.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • ->siglock provides enough protection to iterate over the thread group.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Two threads, T1 and T2. T2 ptraces P, and P is not a child of ptracer's
    thread group. P exits and goes to TASK_ZOMBIE.

    T1 does wait_task_zombie(P):

    P->exit_state = TASK_DEAD;
    ...
    read_unlock(&tasklist_lock);

    T2 does exit(), takes tasklist,
    forget_original_parent() does
    __ptrace_unlink(P) but doesn't
    call do_notify_parent(P) because
    p->exit_state == EXIT_DEAD.

    Now, P is not visible to our process: __ptrace_unlink() removed it from
    ->children. We should send notification to P->parent and release P if and
    only if SIGCHLD is ignored.

    And we have 3 bugs:

    1. P->parent does do_wait() and gets -ECHILD (P is on ->parent->children,
    but its state is TASK_DEAD).

    2. // wait_task_zombie() continues

    if (put_user(...)) {
    // TODO: is this safe?
    p->exit_state = EXIT_ZOMBIE;
    return;
    }

    we return without notification/release, task_struct leaked.

    Solution: ignore -EFAULT and proceed. It is an application's bug if
    we can't fill infop/stat_addr (in case of VM_FAULT_OOM we have much
    more problems).

    3. // wait_task_zombie() continues

    if (p->real_parent != p->parent) {
    // Not taken, it was untraced'ed
    ...
    }

    release_task(p);

    we released the task which we shouldn't.

    Solution: check ->real_parent != ->parent before, under tasklist_lock,
    but use ptrace_unlink() instead of __ptrace_unlink() to check ->ptrace.

    This patch hopefully solves 2 and 3, the 1st bug will be fixed later, we need
    some cleanups in forget_original_parent/reparent_thread.

    However, the first race is very unlikely and not critical, so I hope it makes
    sense to fix 2 and 3 for now.

    4. Small cleanup: don't "restore" EXIT_ZOMBIE unless we know we are not going
    to release the child.

    Signed-off-by: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • A zombie must have a valid ->signal, we are going to release it and
    __exit_signal() starts with BUG_ON(!sig).

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • With or without this patch, multi-threaded init's are not fully supported,
    but do_exit() is completely wrong. This becomes a real problem when we
    support pid namespaces.

    1. do_exit() panics when the main thread of /sbin/init exits. It should not
    until the whole thread group exits. Move the code below, under the
    "if (group_dead)" check.

    Note: this means that forget_original_parent() can use an already dead
    child_reaper()'s task_struct. This is OK for /sbin/init because

    - do_wait() from alive sub-thread still can reap a zombie, we iterate
    over all sub-thread's ->children lists

    - do_notify_parent() will wakeup some alive sub-thread because it sends
    the group-wide signal

    However, we should remove choose_new_parent()->BUG_ON(reaper->exit_state)
    for this.

    2. We are playing games with ->nsproxy->pid_ns. This code is bogus today, and
    it has to be changed anyway when we really support pid namespaces, just
    remove it.

    Signed-off-by: Oleg Nesterov
    Roland McGrath
    Cc: "Eric W. Biederman"
    Cc: Sukadev Bhattiprolu
    Cc: Serge Hallyn
    Cc: Cedric Le Goater
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • With the recent changes, do_sigaction()->recalc_sigpending_and_wake() can
    never clear TIF_SIGPENDING. Instead, it can set this flag and wake up the
    thread without any reason. Harmless, but unneeded and wastes CPU.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • It is a bit annoying that do_exit() takes ->pi_lock to set PF_EXITING. All
    we need is to synchronize with lookup_pi_state() which saw this task
    without PF_EXITING under ->pi_lock.

    Change do_exit() to use spin_unlock_wait().

    Signed-off-by: Oleg Nesterov
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Add two new functions for reading the kernel log buffer. The intention is for
    them to be used by recovery/dump/debug code so the kernel log can be easily
    retrieved/parsed in a crash scenario, but they are generic enough for other
    people to dream up other fun uses.

    [akpm@linux-foundation.org: buncha fixes]
    Signed-off-by: Mike Frysinger
    Cc: Robin Getz
    Cc: Greg Ungerer
    Cc: Russell King
    Cc: Paul Mundt
    Acked-by: Tim Bird
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Frysinger
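
A userspace sketch of what such read helpers might look like over a ring buffer. The names ring_emit(), ring_len() and ring_copy(), their signatures, and the 8-byte buffer are assumptions for illustration; the commit message does not give the kernel's actual interface.

```c
#include <assert.h>
#include <stddef.h>

#define LOG_LEN 8               /* must be a power of two; tiny for the demo */

static char log_ring[LOG_LEN];
static unsigned long log_end;   /* total characters ever written */

/* Append one character, overwriting the oldest once the ring is full. */
void ring_emit(char c)
{
    log_ring[log_end & (LOG_LEN - 1)] = c;
    log_end++;
}

/* Number of characters currently retained. */
size_t ring_len(void)
{
    return log_end < LOG_LEN ? (size_t)log_end : LOG_LEN;
}

/* Copy up to len characters, starting idx characters after the oldest
 * retained one. Returns how many characters were copied. */
size_t ring_copy(char *dest, size_t idx, size_t len)
{
    size_t avail = ring_len();
    unsigned long start = log_end - avail;
    size_t i;

    assert(dest != NULL);
    if (idx >= avail)
        return 0;
    if (len > avail - idx)
        len = avail - idx;
    for (i = 0; i < len; i++)
        dest[i] = log_ring[(start + idx + i) & (LOG_LEN - 1)];
    return len;
}
```

Writing "abcdefghij" into this 8-byte ring leaves "cdefghij" readable, and a crash handler could walk it in chunks with successive ring_copy() calls.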
     
  • This patch contains the following cleanups:
    - make the needlessly global variable rt_trace_on static
    - remove the unused global function deadlock_trace_off()

    Signed-off-by: Adrian Bunk
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • This patch adds the /sys/module//notes/ magic directory, which has a
    file for each allocated SHT_NOTE section that appears in .ko. This
    is the counterpart for each module of /sys/kernel/notes for vmlinux.
    Reading this delivers the contents of the module's SHT_NOTE sections. This
    lets userland easily glean any detailed information about that module's
    build that was stored there at compile time (e.g. by ld --build-id).

    Signed-off-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • Andy Gospodarek pointed out that because we return in the middle of the
    free_irq() function, we never actually do call the IRQ handler that just
    got deregistered. This should fix it, although I expect Andrew will want
    to convert those 'return's to 'break'. That's a separate change though.

    Signed-off-by: David Woodhouse
    Cc: Andy Gospodarek
    Cc: Fernando Luis Vázquez Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Woodhouse