20 Oct, 2007

40 commits

  • This patch introduces ipcs storage into IDRs. The main changes are:
    . The ipc_ids structure is changed: the entries array is changed into a
    root idr structure.
    . The grow_ary() routine is removed: it is not needed anymore when adding
    an ipc structure, since we are now using the IDR facility.
    . The ipc_rmid() routine interface is changed:
      . there is no need for this routine to return the pointer passed in as
      argument: it is now declared as void
      . since the id is now part of the kern_ipc_perm structure, there is no
      need to have it as an argument to the routine

    Signed-off-by: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Some of the per-cpu counters and thus their locks are accessed from IRQ
    contexts. This can cause a deadlock if it interrupts a cpu-offline thread
    which is transferring a dead-cpu's counts to the global counter.

    Add appropriate IRQ protection in the cpu-hotplug callback path.

    Signed-off-by: Gautham R Shenoy
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gautham R Shenoy
     
  • cpu-hot-add should fail if the cpu is not set in cpu_possible_map. If it
    goes ahead, the system will panic soon.

    In particular, an arch which requires the additional_cpus= parameter should
    handle this. Tested on ia64.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • When a cpu is disabled, move_task_off_dead_cpu() is called for tasks that have
    been running on that cpu.

    Currently, such a task is migrated:
    1) to any cpu on the same node as the disabled cpu, which is both online
    and among that task's cpus_allowed
    2) to any cpu which is both online and among that task's cpus_allowed

    It is typical of a multithreaded application running on a large NUMA system to
    have its tasks confined to a cpuset so as to cluster them near the memory that
    they share. Furthermore, it is typical to explicitly place such a task on a
    specific cpu in that cpuset. And in that case the task's cpus_allowed
    includes only a single cpu.

    This patch would insert a preference to migrate such a task to some cpu within
    its cpuset (and set its cpus_allowed to its entire cpuset).

    With this patch, the task is migrated:
    1) to any cpu on the same node as the disabled cpu, which is both online
    and among that task's cpus_allowed
    2) to any online cpu within the task's cpuset
    3) to any cpu which is both online and among that task's cpus_allowed

    In order to do this, move_task_off_dead_cpu() must make a call to
    cpuset_cpus_allowed_locked(), a new subset of cpuset_cpus_allowed(), that will
    not block. (name change - per Oleg's suggestion)

    Calls are made to cpuset_lock() and cpuset_unlock() in migration_call() to set
    the cpuset mutex during the whole migrate_live_tasks() and
    migrate_dead_tasks() procedure.

    [akpm@linux-foundation.org: build fix]
    [pj@sgi.com: Fix indentation and spacing]
    Signed-off-by: Cliff Wickman
    Cc: Oleg Nesterov
    Cc: Christoph Lameter
    Cc: Paul Jackson
    Cc: Ingo Molnar
    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cliff Wickman
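
The three-step preference order described in this commit message can be modelled in user space with plain bitmasks. This is only a sketch under stated assumptions: `first_cpu()` and `pick_dest_cpu()` are hypothetical names, CPUs are bits of a 64-bit word, and none of the locking done through cpuset_lock() is represented. It is not the kernel's move_task_off_dead_cpu().

```c
#include <stdint.h>

/* Return the index of the lowest set bit in mask, or -1 if mask is empty. */
static int first_cpu(uint64_t mask)
{
    for (int cpu = 0; cpu < 64; cpu++)
        if (mask & (UINT64_C(1) << cpu))
            return cpu;
    return -1;
}

/*
 * Hypothetical model of the migration preference listed in the message.
 * online         - CPUs still online (the dead CPU's bit is already cleared)
 * dead_node_cpus - CPUs on the same node as the disabled CPU
 * cpuset_cpus    - CPUs of the task's cpuset
 * cpus_allowed   - the task's affinity mask; widened to the cpuset in step 2
 */
static int pick_dest_cpu(uint64_t online, uint64_t dead_node_cpus,
                         uint64_t cpuset_cpus, uint64_t *cpus_allowed)
{
    uint64_t candidates;

    /* 1) same node as the disabled cpu, online, and allowed */
    candidates = dead_node_cpus & online & *cpus_allowed;
    if (candidates)
        return first_cpu(candidates);

    /* 2) any online cpu within the task's cpuset; widen cpus_allowed */
    candidates = cpuset_cpus & online;
    if (candidates) {
        *cpus_allowed = cpuset_cpus;
        return first_cpu(candidates);
    }

    /* 3) any cpu that is both online and allowed (-1 if there is none) */
    candidates = online & *cpus_allowed;
    return first_cpu(candidates);
}
```

A task pinned to the dead CPU falls through step 1 and lands on another CPU of its cpuset in step 2, which is the case the patch is aimed at.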
     
  • Replace "cont" with "cgrp" and other misc renaming

    This patch finishes some of the names that got missed in the great
    "task containers" -> "control groups" rename. Primarily it renames
    the local variable "cont" to "cgrp" in a number of places, and renames
    the CONT_* enum members to CGRP_*.

    This patch is not intended to have any effect on the generated code;
    the output of "objdump -d kernel/cgroup.o" is unchanged.

    Signed-off-by: Paul Menage
    Acked-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • There are two places that do so - the cgroups subsystem and the autofs
    code.

    Signed-off-by: Pavel Emelyanov
    Cc: Ian Kent
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • The sync_master_pid and sync_backup_pid are set in set_sync_pid() and are
    used later for set/not-set checks and in printk. So it is safe to use the
    global pid value in this case.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • When removing the explicit task_struct->pid usage I found that
    proc_readfd_common() and proc_pident_readdir() get this field, but do not
    use it at all. So this cleanup is a cheap help with the task_struct->pid
    isolation.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • One of the easiest things to isolate is the pid printed in kernel log.
    There was a patch that did this for arch-independent code; this one does
    the same for arch/xxx files.

    It took some time to cross-compile it, but hopefully these are all the
    printks in arch code.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • The task_struct->pid member is going to be deprecated, so start
    using the helpers (task_pid_nr/task_pid_vnr/task_pid_nr_ns) in
    the kernel.

    The first thing to start with is the pid, printed to dmesg - in
    this case we may safely use task_pid_nr(). Besides, printks produce
    more (much more) than a half of all the explicit pid usage.

    [akpm@linux-foundation.org: git-drm went and changed lots of stuff]
    Signed-off-by: Pavel Emelyanov
    Cc: Dave Airlie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • The pgrp field is not used widely around the kernel so it is now marked as
    deprecated with appropriate comment.

    The initialization of INIT_SIGNALS is trimmed because
    a) they are set to 0 automatically;
    b) gcc cannot properly initialize two anonymous (the second one
    is the one with the session) unions. In this particular case
    to make it compile we'd have to add some field initialized
    right before the .pgrp.

    This is the same patch as the 1ec320afdc9552c92191d5f89fcd1ebe588334ca one
    (from Cedric), but for the pgrp field.

    Some progress report:

    We have to deprecate the pid, tgid, session and pgrp fields on struct
    task_struct and struct signal_struct. The session and pgrp are already
    deprecated. The tgid value is close to being such - the worst known usage
    is in fs/locks.c and audit code. The pid field deprecation is mainly
    blocked by numerous printk-s around the kernel that print the tsk->pid to
    log.

    Signed-off-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: Cedric Le Goater
    Cc: Serge Hallyn
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • tsk->exit_state can only be 0, EXIT_ZOMBIE, or EXIT_DEAD. A non-zero test
    is the same as tsk->exit_state & (EXIT_ZOMBIE | EXIT_DEAD), so just testing
    tsk->exit_state is sufficient.

    Signed-off-by: Eugene Teo
    Cc: Roland McGrath
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eugene Teo
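
The equivalence claimed in this message can be checked with a small user-space sketch. The numeric values below are assumptions chosen for illustration (two distinct single-bit flags); the argument only depends on exit_state being restricted to 0, EXIT_ZOMBIE, or EXIT_DEAD.

```c
#include <stdbool.h>

/* Illustrative values (assumed): two distinct single-bit flags. */
#define EXIT_ZOMBIE 16
#define EXIT_DEAD   32

/* The longer form: mask against both exit flags. */
static bool exiting_masked(int exit_state)
{
    return (exit_state & (EXIT_ZOMBIE | EXIT_DEAD)) != 0;
}

/* The shorter form: a plain non-zero test. */
static bool exiting_plain(int exit_state)
{
    return exit_state != 0;
}
```

For each of the three permitted values, both helpers return the same result, so the mask adds nothing.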
     
  • Currently, there exists no method for a process to query the resource
    limits of another process. They can be inferred via some mechanisms but
    they cannot be explicitly determined. Given that this information can be
    useful to know during the debugging of an application, I've written this
    patch which exports all of a process's limits via /proc/<pid>/limits.

    Signed-off-by: Neil Horman
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Neil Horman
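
As a rough illustration of how a debugging tool might consume the exported file, the sketch below scans an already opened stream for the row of one limit. The helper name `find_limit()` is hypothetical, and the assumption that each row begins with the limit's name (for example "Max open files") is about the file's layout, not something stated in the commit message.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan stream f line by line and copy the first line that starts with
 * `name` into out. Returns 0 if such a line was found, -1 otherwise.
 * Lines longer than outlen - 1 bytes are read in pieces, so outlen
 * should comfortably exceed the expected line length.
 */
static int find_limit(FILE *f, const char *name, char *out, size_t outlen)
{
    size_t namelen = strlen(name);

    while (fgets(out, (int)outlen, f)) {
        if (strncmp(out, name, namelen) == 0)
            return 0;
    }
    return -1;
}
```

A caller would typically open "/proc/self/limits" (or the file of another pid it is permitted to read) with fopen(), pass the stream to find_limit(), and parse the soft and hard values out of the returned row.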
     
  • remove BITS_TO_TYPE macro

    I realized that it is actually the same as DIV_ROUND_UP; use it instead.

    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Jiri Slaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
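
To see why a "bits to units of a type" helper reduces to a rounding-up division, here is a minimal user-space sketch. The macro bodies are written from the usual definition of rounding-up integer division and are assumptions for illustration, not copies of the kernel headers; `longs_for_bits()` is a hypothetical helper.

```c
#include <limits.h>
#include <stddef.h>

/* Integer division of n by d, rounded up (assumes n >= 0 and d > 0). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Number of longs needed to hold nr bits, expressed via DIV_ROUND_UP. */
#define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, CHAR_BIT * sizeof(long))

/* Hypothetical wrapper so the macro can be exercised as a function. */
static size_t longs_for_bits(size_t nbits)
{
    return BITS_TO_LONGS(nbits);
}
```

Any BITS_TO_something helper has the same shape, with only the divisor changing, which is why a separate macro is redundant.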
     
  • FlashPoint, use BIT instead of BITW

    BITW was a ushort variant of BIT; use BIT instead

    Signed-off-by: Jiri Slaby
    Cc: James Bottomley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • define global BIT macro

    Move all local BIT defines to the new globally defined macro.

    Signed-off-by: Jiri Slaby
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Kumar Gala
    Cc: Dmitry Torokhov
    Cc: Jeff Garzik
    Cc: James Bottomley
    Cc: "Antonino A. Daplas"
    Cc: Russell King
    Acked-by: Ralf Baechle
    Cc: "John W. Linville"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • get rid of input BIT* duplicate defines

    Use the newly defined global macros for the input layer. Also remove
    includes of input.h from non-input sources that pulled it in only for the
    BIT macro definition. Define the macro temporarily in a local manner; all
    those local definitions will be removed further in this patchset (to not
    break bisecting).
    BIT macro will be globally defined as (1 << x).

    Acked-by: Jiri Kosina
    Acked-by: Marcel Holtmann
    Acked-by: Mauro Carvalho Chehab
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • define first set of BIT* macros

    - move BITOP_MASK and BITOP_WORD from asm-generic/bitops/atomic.h to
    include/linux/bitops.h and rename them to BIT_MASK and BIT_WORD
    - move BITS_TO_LONGS and BITS_PER_BYTE to bitops.h too and make it easy to
    define another BITS_TO_something (e.g. in event.c) with the BITS_TO_TYPE
    macro
    The remaining (and common) BIT macro will be defined after all occurrences
    and conflicts have been sorted out in the patches.

    Signed-off-by: Jiri Slaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
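
For readers unfamiliar with the two renamed helpers, the following user-space sketch shows how a word index and an in-word mask are typically combined to address one bit in an array of longs. The macro bodies are written here for illustration and should be read as assumptions rather than as the contents of include/linux/bitops.h; `bitmap_set_bit()` and `bitmap_test_bit()` are hypothetical names.

```c
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Mask selecting bit nr within its word. */
#define BIT_MASK(nr)  (1UL << ((nr) % BITS_PER_LONG))
/* Index of the word that holds bit nr. */
#define BIT_WORD(nr)  ((nr) / BITS_PER_LONG)

/* Set bit nr in a bitmap stored as an array of unsigned long. */
static void bitmap_set_bit(unsigned long *map, unsigned int nr)
{
    map[BIT_WORD(nr)] |= BIT_MASK(nr);
}

/* Return non-zero if bit nr is set in the bitmap. */
static int bitmap_test_bit(const unsigned long *map, unsigned int nr)
{
    return (map[BIT_WORD(nr)] & BIT_MASK(nr)) != 0;
}
```

Bit number BITS_PER_LONG + 3, for instance, lands in word 1 at bit position 3, independent of whether long is 32 or 64 bits wide.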
     
  • amba-pl011, rename BIT macro

    Signed-off-by: Jiri Slaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • s2io, rename BIT macro

    BIT macro will be a global definition of (1 << x).
    Cc: Ramkrishna Vepa
    Cc: Rastapur Santosh
    Cc: Sivakumar Subramani
    Cc: Sreenivasa Honnur
    Cc: Jeff Garzik
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • i2c-pxa, rename BIT macro to PXA_BIT

    BIT macro will be a global definition of (1 << x)

    Signed-off-by: Jiri Slaby
    Cc: Nicolas Pitre
    Cc: Jean Delvare
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • This patch fixes errors and warnings pointed out by the checkpatch.pl
    script.

    Antonino Daplas replaced BIT with ENCODE_BIT.

    Signed-off-by: Krzysztof Helt
    Signed-off-by: Antonino Daplas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Krzysztof Helt
     
  • cyber2000fb, rename BIT macro

    BIT will be global macro for (1 << x)

    Signed-off-by: Jiri Slaby
    Cc: Russell King
    Cc: "Antonino A. Daplas"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • forbid asm/bitops.h direct inclusion

    Because of compile errors that may occur after bit changes if asm/bitops.h is
    included directly without e.g. linux/kernel.h which includes linux/bitops.h,
    forbid direct inclusion of asm/bitops.h. Thanks to Adrian Bunk.

    Signed-off-by: Jiri Slaby
    Cc: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • remove asm/bitops.h includes

    including asm/bitops directly may cause compile errors. don't include it
    and include linux/bitops instead. next patch will deny including asm header
    directly.

    Cc: Adrian Bunk
    Signed-off-by: Jiri Slaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • fs/select, remove unused macros

    this is due to preparation for global BIT macro

    Signed-off-by: Jiri Slaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • This new version guarantees amb_bit switch in small enough intervals, so that
    the device won't stop working in the middle of a movement anymore. However it
    preserves old (openhaptics) functionality.

    Signed-off-by: Jiri Slaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • Signed-off-by: Jiri Slaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • Wait after disabling device's interrupt until the handler finishes its work if
    still in progress.

    Signed-off-by: Jiri Slaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • Cause writes to cpuset "cpus" file to update cpus_allowed for member tasks:

    - collect batches of tasks under tasklist_lock and then call
    set_cpus_allowed() on them outside the lock (since this can sleep).

    - add a simple generic priority heap type to allow efficient collection
    of batches of tasks to be processed without duplicating or missing any
    tasks in subsequent batches.

    - make "cpus" file update a no-op if the mask hasn't changed

    - fix race between update_cpumask() and sched_setaffinity() by making
    sched_setaffinity() post-check that it's not running on any cpus outside
    cpuset_cpus_allowed().

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Paul Menage
    Cc: Paul Jackson
    Cc: David Rientjes
    Cc: Nick Piggin
    Cc: Peter Zijlstra
    Cc: Balbir Singh
    Cc: Cedric Le Goater
    Cc: "Eric W. Biederman"
    Cc: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • Decrustify the kernel/cpuset.c 'cpus' and 'mems' updating code.

    Other than subtle improvements in the consistency of identifying
    white space at the beginning and end of passed in masks, this
    doesn't make any visible difference in behaviour. But it's
    one or two hundred kernel text bytes smaller, and easier to
    understand.

    [akpm@linux-foundation.org: coding-style fix]
    Signed-off-by: Paul Jackson
    Reviewed-by: Paul Menage
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
  • Add a new per-cpuset flag called 'sched_load_balance'.

    When enabled in a cpuset (the default value) it tells the kernel scheduler
    that the scheduler should provide the normal load balancing on the CPUs in
    that cpuset, sometimes moving tasks from one CPU to a second CPU if the
    second CPU is less loaded and if that task is allowed to run there.

    When disabled (write "0" to the file) then it tells the kernel scheduler
    that load balancing is not required for the CPUs in that cpuset.

    Now even if this flag is disabled for some cpuset, the kernel may still
    have to load balance some or all the CPUs in that cpuset, if some
    overlapping cpuset has its sched_load_balance flag enabled.

    If there are some CPUs that are not in any cpuset whose sched_load_balance
    flag is enabled, the kernel scheduler will not load balance tasks to those
    CPUs.

    Moreover the kernel will partition the 'sched domains' (non-overlapping
    sets of CPUs over which load balancing is attempted) into the finest
    granularity partition that it can find, while still keeping any two CPUs
    that are in the same sched_load_balance enabled cpuset in the same element
    of the partition.

    This serves two purposes:
    1) It provides a mechanism for real time isolation of some CPUs, and
    2) it can be used to improve performance on systems with many CPUs
    by supporting configurations in which load balancing is not done
    across all CPUs at once, but rather only done in several smaller
    disjoint sets of CPUs.

    This mechanism replaces the earlier overloading of the per-cpuset
    flag 'cpu_exclusive', which overloading was removed in an earlier
    patch: cpuset-remove-sched-domain-hooks-from-cpusets

    See further the Documentation and comments in the code itself.

    [akpm@linux-foundation.org: don't be weird]
    Signed-off-by: Paul Jackson
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
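
The "finest granularity partition" rule can be illustrated with a small user-space model: start from the CPU masks of all cpusets that have sched_load_balance enabled, and merge any masks that overlap until the remaining masks are pairwise disjoint. This is a sketch under assumptions, not the kernel's algorithm: `partition_domains()` is a hypothetical name, CPUs are bits of a 64-bit word, and the cpuset hierarchy is ignored.

```c
#include <stdint.h>

/*
 * cpusets[0..n-1] - CPU masks of cpusets with sched_load_balance enabled
 * domains         - output array with room for at least n masks
 * Returns the number of disjoint domains written to `domains`.
 * CPUs that appear in no input mask end up in no domain at all.
 */
static int partition_domains(const uint64_t *cpusets, int n, uint64_t *domains)
{
    int ndoms = 0;

    for (int i = 0; i < n; i++) {
        uint64_t merged = cpusets[i];
        int j = 0;

        if (!merged)
            continue;

        /* Absorb every existing domain that shares a CPU with this mask. */
        while (j < ndoms) {
            if (domains[j] & merged) {
                merged |= domains[j];
                /* Remove slot j by moving the last domain into it. */
                domains[j] = domains[--ndoms];
            } else {
                j++;
            }
        }
        domains[ndoms++] = merged;
    }
    return ndoms;
}
```

With input masks 0x3, 0x6 and 0x30, the first two overlap on CPU 1 and collapse into 0x7, while 0x30 stays separate, giving two domains. Any two CPUs that share a load-balanced cpuset always end up in the same output mask, which is the invariant the message describes.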
     
  • Since these are expanded into call to pid_nr_ns() anyway, it's OK to move
    the whole routine out-of-line. This is a cheap way to save ~100 bytes from
    vmlinux. Together with the previous two patches, it saves half-a-kilo from
    the vmlinux.

    Un-inlining other (currently inlined) functions must be done with additional
    performance testing.

    Signed-off-by: Pavel Emelyanov
    Cc: Sukadev Bhattiprolu
    Cc: Oleg Nesterov
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • The find_pid/_vpid/_pid_ns functions are used to find the struct pid by its
    id, depending on which id - global or virtual - is used.

    The find_vpid() is a macro that pushes the current->nsproxy->pid_ns on the
    stack to call another function - find_pid_ns(). It turned out, that this
    dereference together with the push itself cause the kernel text size to
    grow too much.

    Move all these out-of-line. Together with the previous patch this saves a
    bit less than 400 bytes from the .text section.

    Signed-off-by: Pavel Emelyanov
    Cc: Sukadev Bhattiprolu
    Cc: Oleg Nesterov
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • With pid namespaces this field is now dangerous to use explicitly, so hide
    it behind the helpers.

    Also the pid and pgrp fields of task_struct and signal_struct are to be
    deprecated. Unfortunately this patch cannot be sent right now as this
    leads to tons of warnings, so start isolating them, and deprecate later.

    Actually the p->tgid == pid has to be changed to has_group_leader_pid(),
    but Oleg pointed out that in case of posix cpu timers this is the same, and
    thread_group_leader() is preferable.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Since we've switched from using pid->nr to pid->upids->nr some
    fields on struct pid are no longer needed

    Signed-off-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • The find_task_by_something is a set of macros used to find a task by pid
    depending on what kind of pid is proposed - global or virtual one. All of
    them are wrappers above the most generic one - find_task_by_pid_type_ns() -
    and just substitute some args for it.

    It turned out, that dereferencing the current->nsproxy->pid_ns construction
    and pushing one more argument on the stack inline cause kernel text size to
    grow.

    This patch moves all this stuff out-of-line into kernel/pid.c. Together
    with the next patch it saves a bit less than 400 bytes from the .text
    section.

    Signed-off-by: Pavel Emelyanov
    Cc: Sukadev Bhattiprolu
    Cc: Oleg Nesterov
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • This is the largest patch in the set. Make all (I hope) the places where
    the pid is shown to or obtained from the user operate on the virtual pids.

    The idea is:
    - all in-kernel data structures must store either struct pid itself
    or the pid's global nr, obtained with pid_nr() call;
    - when seeking the task from kernel code with the stored id one
    should use find_task_by_pid() call that works with global pids;
    - when showing pid's numerical value to the user the virtual one
    should be used; however, when one shows task's pid outside this
    task's namespace the global one is to be used;
    - when getting the pid from userspace one needs to consider this as
    the virtual one and use appropriate task/pid-searching functions.

    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: nuther build fix]
    [akpm@linux-foundation.org: yet nuther build fix]
    [akpm@linux-foundation.org: remove unneeded casts]
    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Alexey Dobriyan
    Cc: Sukadev Bhattiprolu
    Cc: Oleg Nesterov
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
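
The distinction between global and virtual pids in the rules above can be modelled with a toy data structure: a pid carries one number per namespace level, and a lookup returns the number visible from a given namespace, or 0 when the pid is not visible there. Everything prefixed `toy_` is hypothetical and exists only for this sketch; it mirrors the idea of a per-namespace number, not the kernel's actual structures.

```c
#define TOY_MAX_LEVEL 2

/* A namespace is identified by its address; level 0 is the initial one. */
struct toy_pid_ns {
    int level;
};

/* One numeric id of a pid, valid in exactly one namespace. */
struct toy_upid {
    int nr;
    const struct toy_pid_ns *ns;
};

/* A pid created at `level` has one number for each level 0..level. */
struct toy_pid {
    int level;
    struct toy_upid numbers[TOY_MAX_LEVEL];
};

/* Number of `pid` as seen from `ns`, or 0 if it is not visible there. */
static int toy_pid_nr_ns(const struct toy_pid *pid, const struct toy_pid_ns *ns)
{
    if (pid && ns->level <= pid->level && pid->numbers[ns->level].ns == ns)
        return pid->numbers[ns->level].nr;
    return 0;
}
```

In this model the global number is the one stored at level 0, and the virtual number is whatever the caller's own namespace sees. A task that is pid 1 inside a child namespace can therefore be 1234 when looked up from the initial namespace, and invisible (0) from an unrelated namespace at the same level.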
     
  • Terminate all processes in a namespace when the reaper of the namespace is
    exiting. We do this by walking the pidmap of the namespace and sending
    SIGKILL to all processes.

    Signed-off-by: Sukadev Bhattiprolu
    Acked-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • Only the global-init process must be special - any other cgroup-init
    process must be killable to prevent run-away processes in the system.

    TODO: Ideally we should allow killing the cgroup-init only from parent
    cgroup and prevent it being killed from within the cgroup.
    But that is a more complex change and will be addressed by a follow-on
    patch. For now allow the cgroup-init to be terminated by any process
    with sufficient privileges.

    Signed-off-by: Sukadev Bhattiprolu
    Acked-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu