20 Oct, 2007

40 commits

  • Begin infrastructure for kernel code samples in the samples/ directory.
    Add its Kconfig and Kbuild files.
    Source its Kconfig file in all arch/ Kconfigs.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Mathieu Desnoyers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathieu Desnoyers
     
  • The marker activation functions sits in kernel/marker.c. A hash table is used
    to keep track of the registered probes and armed markers, so the markers
    within a newly loaded module that should be active can be activated at module
    load time.

    marker_query has been removed. marker_get_first, marker_get_next and
    marker_release should be used as iterators on the markers.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Mathieu Desnoyers
    Acked-by: "Frank Ch. Eigler"
    Cc: Christoph Hellwig
    Cc: Rusty Russell
    Cc: Mike Mason
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathieu Desnoyers
     
  • Quoting Randy:

    "It seems sad that this patch sources Kconfig.marker, a 7-line file,
    20-something times. Yes, you (we) don't want to put those 7 lines into
    20-something different files, so sourcing is the right thing.

    However, what you did for avr32 seems more on the right track to me: make
    _one_ Instrumentation support menu that includes PROFILING, OPROFILE, KPROBES,
    and MARKERS and then use (source) that in all of the arches."

    Signed-off-by: Mathieu Desnoyers
    Acked-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathieu Desnoyers
     
  • Prior to use struct marker in the linux kernel markers, we need to clean
    two drivers which use this structure name.

    Change bonding driver types :
    - struct marker to struct bond_marker.
    - marker_t to bond_marker_t.
    - marker_header to bond_marker_header.
    - marker_header_t to bond_marker_header_t.

    Change qla4xxx struct marker_entry usage :
    - Change struct marker_entry for struct qla4_marker_entry.

    Signed-off-by: Mathieu Desnoyers
    Cc: Chad Tindel
    Cc: Jay Vosburgh
    Cc: David Somayajulu
    Cc: James Bottomley
    Cc: Ravi Anand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathieu Desnoyers
     
  • Enable "cgroup" (formerly containers) based fair group scheduling. This
    will let administrator create arbitrary groups of tasks (using "cgroup"
    pseudo filesystem) and control their cpu bandwidth usage.

    [akpm@linux-foundation.org: fix cpp condition]
    Signed-off-by: Srivatsa Vaddagiri
    Signed-off-by: Dhaval Giani
    Cc: Randy Dunlap
    Cc: Balbir Singh
    Cc: Paul Menage
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Srivatsa Vaddagiri
     
  • This adds the documentation for the extended crashkernel syntax into
    Documentation/kdump/kdump.txt.

    Signed-off-by: Bernhard Walle
    Cc: Andi Kleen
    Cc: "Luck, Tony"
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mundt
    Cc: Vivek Goyal
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bernhard Walle
     
  • This patch removes the crashkernel parsing from arch/sh/kernel/machine_kexec.c
    and calls the generic function, introduced in the generic patch, in
    setup_bootmem_allocator().

    This is necessary because the amount of System RAM must be known in this
    function now because of the new syntax.

    Signed-off-by: Bernhard Walle
    Acked-by: Paul Mundt
    Cc: Vivek Goyal
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bernhard Walle
     
  • This patch adapts the ppc64 code to use the generic parse_crashkernel()
    function introduced in the generic patch of that series.

    Signed-off-by: Bernhard Walle
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Vivek Goyal
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bernhard Walle
     
  • This patch adapts IA64 to use the generic parse_crashkernel() function instead
    of its own parsing for the crashkernel command line.

    Because the total amount of System RAM must be known when calling this
    function, efi_memmap_init() is modified to return its accumulated total_memory
    variable.

    Also, the crashkernel handling is moved in an own function in
    arch/ia64/kernel/setup.c to make the code more readable.

    [kamalesh@linux.vnet.ibm.com: build fix]
    Signed-off-by: Bernhard Walle
    Cc: "Luck, Tony"
    Cc: Vivek Goyal
    Cc: "Eric W. Biederman"
    Signed-off-by: Kamalesh Babulal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bernhard Walle
     
  • This patch removes the crashkernel parsing from
    arch/x86_64/kernel/machine_kexec.c and calls the generic function, introduced
    in the last patch, in setup_bootmem_allocator().

    This is necessary because the amount of System RAM must be known in this
    function now because of the new syntax.

    Signed-off-by: Bernhard Walle
    Cc: Andi Kleen
    Cc: Vivek Goyal
    Cc: "Eric W. Biederman"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bernhard Walle
     
  • This patch removes the crashkernel parsing from
    arch/i386/kernel/machine_kexec.c and calls the generic function, introduced in
    the last patch, in setup_bootmem_allocator().

    This is necessary because the amount of System RAM must be known in this
    function now because of the new syntax.

    Signed-off-by: Bernhard Walle
    Cc: Andi Kleen
    Cc: Vivek Goyal
    Cc: "Eric W. Biederman"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bernhard Walle
     
  • This patch adds a extended crashkernel syntax that makes the value of reserved
    system RAM dependent on the system RAM itself:

    crashkernel=:[,:,...][@offset]
    range=start-[end]

    For example:

    crashkernel=512M-2G:64M,2G-:128M

    The motivation comes from distributors that configure their crashkernel
    command line automatically with some configuration tool (YaST, you know ;)).
    Of course that tool knows the value of System RAM, but if the user removes
    RAM, then the system becomes unbootable or at least unusable and error
    handling is very difficult.

    This series implements this change for i386, x86_64, ia64, ppc64 and sh. That
    should be all platforms that support kdump in current mainline. I tested all
    platforms except sh due to the lack of a sh processor.

    This patch:

    This is the generic part of the patch. It adds a parse_crashkernel() function
    in kernel/kexec.c that is called by the architecture specific code that
    actually reserves the memory. That function takes the whole command line and
    looks itself for "crashkernel=" in it.

    If there are multiple occurrences, then the last one is taken. The advantage
    is that if you have a bootloader like lilo or elilo which allows you to append
    a command line parameter but not to remove one (like in GRUB), then you can
    add another crashkernel value for testing at the boot command line and this
    one overwrites the command line in the configuration then.

    Signed-off-by: Bernhard Walle
    Cc: Andi Kleen
    Cc: "Luck, Tony"
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mundt
    Cc: Vivek Goyal
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bernhard Walle
     
  • With the use of idr to store the ipc, the case where the idr cache is
    empty, when idr_get_new is called (this may happen even if we call
    idr_pre_get() before), is not well handled: it lets
    semget()/shmget()/msgget() return ENOSPC when this cache is empty, what 1.
    does not reflect the facts and 2. does not conform to the man(s).

    This patch fixes this by retrying the whole process of allocation in this case.

    Signed-off-by: Pierre Peiffer
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • Virtualization of sysv msg queues is incomplete: msg_hdrs and msg_bytes
    variables visible from userspace are global. Let's make them
    per-namespace.

    Signed-off-by: Alexey Kuznetsov
    Signed-off-by: Kirill Korotaev
    Cc: Pierre Peiffer
    Cc: Nadia Derbey
    Cc: Serge Hallyn
    Acked-by: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill Korotaev
     
  • Some comments about sem_undo_list seem wrong.
    About the comment above unlock_semundo:
    "... If task2 now exits before task1 releases the lock (by calling
    unlock_semundo()), then task1 will never call spin_unlock(). ..."

    This is just wrong, I see no reason for which task1 will not call
    spin_unlock... The rest of this comment is also wrong... Unless I
    miss something (of course).

    Finally, (un)lock_semundo functions are useless, so remove them
    for simplification. (this avoids an useless if statement)

    Signed-off-by: Pierre Peiffer
    Cc: Nadia Derbey
    Acked-by: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • Remvoe the unneeded parameters from ipc_checkid() and ipc_buildid()
    interfaces.

    Signed-off-by: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • This is a patch that fixes the way idr_find() used to be called in ipc_lock():
    in all the paths that don't imply an update of the ipcs idr, it was called
    without the idr tree being locked.

    The changes are:
    . in ipc_ids, the mutex has been changed into a reader/writer semaphore.
    . ipc_lock() now takes the mutex as a reader during the idr_find().
    . a new routine ipc_lock_down() has been defined: it doesn't take the
    mutex, assuming that it is being held by the caller. This is the routine
    that is now called in all the update paths.

    Signed-off-by: Nadia Derbey
    Acked-by: Jarek Poplawski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • This patch fixes the wrong / obsolete comments in the ipc code. Also adds
    a missing lock around ipc_get_maxid() in shm_get_stat().

    Signed-off-by: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • This is a trivial patch that changes the ipc_buildid() routine into a static
    inline.

    Signed-off-by: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • This is a trivial patch that changes all the (id % SEQ_MULTIPLIER) into a call
    to the ipcid_to_idx(id) macro.

    Signed-off-by: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • This patch converts casts of struct kern_ipc_perm to
    . struct msg_queue
    . struct sem_array
    . struct shmid_kernel
    into the equivalent container_of() macro. It improves code maintenance
    because the code need not change if kern_ipc_perm is no longer at the
    beginning of the containing struct.

    Signed-off-by: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • This patch introduces a new ipc_lock_check() routine interface:
    . each time ipc_checkid() is called, this is done after calling ipc_lock().
    ipc_checkid() is now called from inside ipc_lock_check().

    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: fix RCU locking]
    Signed-off-by: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • This is a trivial patch that removes the ipc_get() routine: it is replaced
    by a call to idr_find().

    Signed-off-by: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • This patch introduces a change into the sys_msgget(), sys_semget() and
    sys_shmget() routines: they now share a common code, which is better for
    maintainability.

    Signed-off-by: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • This patch introduces ipcs storage into IDRs. The main changes are:
    . This ipc_ids structure is changed: the entries array is changed into a
    root idr structure.
    . The grow_ary() routine is removed: it is not needed anymore when adding
    an ipc structure, since we are now using the IDR facility.
    . The ipc_rmid() routine interface is changed:
    . there is no need for this routine to return the pointer passed in as
    argument: it is now declared as a void
    . since the id is now part of the kern_ipc_perm structure, no need to
    have it as an argument to the routine

    Signed-off-by: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Some of the per-cpu counters and thus their locks are accessed from IRQ
    contexts. This can cause a deadlock if it interrupts a cpu-offline thread
    which is transferring a dead-cpu's counts to the global counter.

    Add appropriate IRQ protection in the cpu-hotplug callback path.

    Signed-off-by: Gautham R Shenoy
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gautham R Shenoy
     
  • cpu-hot-add should be fail if cpu is not set in cpu_possible_map. If go
    ahead, the system will panic soon.

    Especially, arch which requires additional_cpus= parameter should handle
    this. Tested on ia64.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • When a cpu is disabled, move_task_off_dead_cpu() is called for tasks that have
    been running on that cpu.

    Currently, such a task is migrated:
    1) to any cpu on the same node as the disabled cpu, which is both online
    and among that task's cpus_allowed
    2) to any cpu which is both online and among that task's cpus_allowed

    It is typical of a multithreaded application running on a large NUMA system to
    have its tasks confined to a cpuset so as to cluster them near the memory that
    they share. Furthermore, it is typical to explicitly place such a task on a
    specific cpu in that cpuset. And in that case the task's cpus_allowed
    includes only a single cpu.

    This patch would insert a preference to migrate such a task to some cpu within
    its cpuset (and set its cpus_allowed to its entire cpuset).

    With this patch, migrate the task to:
    1) to any cpu on the same node as the disabled cpu, which is both online
    and among that task's cpus_allowed
    2) to any online cpu within the task's cpuset
    3) to any cpu which is both online and among that task's cpus_allowed

    In order to do this, move_task_off_dead_cpu() must make a call to
    cpuset_cpus_allowed_locked(), a new subset of cpuset_cpus_allowed(), that will
    not block. (name change - per Oleg's suggestion)

    Calls are made to cpuset_lock() and cpuset_unlock() in migration_call() to set
    the cpuset mutex during the whole migrate_live_tasks() and
    migrate_dead_tasks() procedure.

    [akpm@linux-foundation.org: build fix]
    [pj@sgi.com: Fix indentation and spacing]
    Signed-off-by: Cliff Wickman
    Cc: Oleg Nesterov
    Cc: Christoph Lameter
    Cc: Paul Jackson
    Cc: Ingo Molnar
    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cliff Wickman
     
  • Replace "cont" with "cgrp" and other misc renaming

    This patch finishes some of the names that got missed in the great
    "task containers" -> "control groups" rename. Primarily it renames
    the local variable "cont" to "cgrp" in a number of places, and renames
    the CONT_* enum members to CGRP_*.

    This patch is not intended to have any effect on the generated code;
    the output of "objdump -d kernel/cgroup.o" is unchanged.

    Signed-off-by: Paul Menage
    Acked-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • There are two places that do so - the cgroups subsystem and the autofs
    code.

    Signed-off-by: Pavel Emelyanov
    Cc: Ian Kent
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • The sync_master_pid and sync_backup_pid are set in set_sync_pid() and are
    used later for set/not-set checks and in printk. So it is safe to use the
    global pid value in this case.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • When removing the explicit task_struct->pid usage I found that
    proc_readfd_common() and proc_pident_readdir() get this field, but do not
    use it at all. So this cleanup is a cheap help with the task_struct->pid
    isolation.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • One of the easiest things to isolate is the pid printed in kernel log.
    There was a patch, that made this for arch-independent code, this one makes
    so for arch/xxx files.

    It took some time to cross-compile it, but hopefully these are all the
    printks in arch code.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Pavel Emelyanov
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • The task_struct->pid member is going to be deprecated, so start
    using the helpers (task_pid_nr/task_pid_vnr/task_pid_nr_ns) in
    the kernel.

    The first thing to start with is the pid, printed to dmesg - in
    this case we may safely use task_pid_nr(). Besides, printks produce
    more (much more) than a half of all the explicit pid usage.

    [akpm@linux-foundation.org: git-drm went and changed lots of stuff]
    Signed-off-by: Pavel Emelyanov
    Cc: Dave Airlie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • The pgrp field is not used widely around the kernel so it is now marked as
    deprecated with appropriate comment.

    The initialization of INIT_SIGNALS is trimmed because
    a) they are set to 0 automatically;
    b) gcc cannot properly initialize two anonymous (the second one
    is the one with the session) unions. In this particular case
    to make it compile we'd have to add some field initialized
    right before the .pgrp.

    This is the same patch as the 1ec320afdc9552c92191d5f89fcd1ebe588334ca one
    (from Cedric), but for the pgrp field.

    Some progress report:

    We have to deprecate the pid, tgid, session and pgrp fields on struct
    task_struct and struct signal_struct. The session and pgrp are already
    deprecated. The tgid value is close to being such - the worst known usage
    in in fs/locks.c and audit code. The pid field deprecation is mainly
    blocked by numerous printk-s around the kernel that print the tsk->pid to
    log.

    Signed-off-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: Cedric Le Goater
    Cc: Serge Hallyn
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • tsk->exit_state can only be 0, EXIT_ZOMBIE, or EXIT_DEAD. A non-zero test
    is the same as tsk->exit_state & (EXIT_ZOMBIE | EXIT_DEAD), so just testing
    tsk->exit_state is sufficient.

    Signed-off-by: Eugene Teo
    Cc: Roland McGrath
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eugene Teo
     
  • Currently, there exists no method for a process to query the resource
    limits of another process. They can be inferred via some mechanisms but
    they cannot be explicitly determined. Given that this information can be
    usefull to know during the debugging of an application, I've written this
    patch which exports all of a processes limits via /proc//limits.

    Signed-off-by: Neil Horman
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Neil Horman
     
  • remove BITS_TO_TYPE macro

    I realized, that it is actually the same as DIV_ROUND_UP, use it instead.

    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Jiri Slaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • FlashPoint, use BIT instead of BITW

    BITW was an ushort variant of BIT, use BIT instead

    Signed-off-by: Jiri Slaby
    Cc: James Bottomley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • define global BIT macro

    move all local BIT defines to the new globally define macro.

    Signed-off-by: Jiri Slaby
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Kumar Gala
    Cc: Dmitry Torokhov
    Cc: Jeff Garzik
    Cc: James Bottomley
    Cc: "Antonino A. Daplas"
    Cc: Russell King
    Acked-by: Ralf Baechle
    Cc: "John W. Linville"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby