Doug / smarc-fsl-linux-kernel | Embedian Git Server

24 Oct, 2007

2 commits

3bdf590ea cgroup: kill unused variable ... Browse Code »

Signed-off-by: Jeff Garzik

Jeff Garzik
2007-10-24 09:28:39 +0800
a98ce5c6f Fix synchronize_irq races with IRQ handler ... Browse Code »

As it is some callers of synchronize_irq rely on memory barriers
to provide synchronisation against the IRQ handlers. For example,
the tg3 driver does

tp->irq_sync = 1;
smp_mb();
synchronize_irq();

and then in the IRQ handler:

if (!tp->irq_sync)
netif_rx_schedule(dev, &tp->napi);

Unfortunately memory barriers only work well when they come in
pairs. Because we don't actually have memory barriers on the
IRQ path, the memory barrier before the synchronize_irq() doesn't
actually protect us.

In particular, synchronize_irq() may return followed by the
result of netif_rx_schedule being made visible.

This patch (mostly written by Linus) fixes this by using spin
locks instead of memory barries on the synchronize_irq() path.

Signed-off-by: Herbert Xu
Acked-by: Benjamin Herrenschmidt
Signed-off-by: Linus Torvalds

Herbert Xu
2007-10-24 00:01:31 +0800

23 Oct, 2007

3 commits

481968f44 auditsc: fix kernel-doc param warnings ... Browse Code »

Fix kernel-doc for auditsc parameter changes.

Warning(linux-2.6.23-git17//kernel/auditsc.c:1623): No description found for parameter 'dentry'
Warning(linux-2.6.23-git17//kernel/auditsc.c:1666): No description found for parameter 'dentry'

Signed-off-by: Randy Dunlap
Signed-off-by: Linus Torvalds

Randy Dunlap
2007-10-23 10:40:02 +0800
0fd56c703 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
KVM: Use new smp_call_function_mask() in kvm_flush_remote_tlbs()
sched: don't clear PF_VCPU in scheduler
KVM: Improve local apic timer wraparound handling
KVM: Fix local apic timer divide by zero
KVM: Move kvm_guest_exit() after local_irq_enable()
KVM: x86 emulator: fix access registers for instructions with ModR/M byte and Mod = 3
KVM: VMX: Force vm86 mode if setting flags during real mode
KVM: x86 emulator: implement 'movnti mem, reg'
KVM: VMX: Reset mmu context when entering real mode
KVM: VMX: Handle NMIs before enabling interrupts and preemption
KVM: MMU: Set shadow pte atomically in mmu_pte_write_zap_pte()
KVM: x86 emulator: fix repne/repnz decoding
KVM: x86 emulator: fix merge screwup due to emulator split

Linus Torvalds
2007-10-23 10:24:17 +0800
5081dba65 Fix appletalk sysctl entry name ... Browse Code »

Gabriel C reported that modprobing appletalk on current git gives a
warning in dmesg :

"sysctl table check failed: /net/appletalk .3.7 procname does not match binary path procname"

Oops. My apologies it appears I made a mistake when creating my table
to check up on sysctl values.

Signed-off-by: "Eric W. Biederman"
Tested-by: Gabriel C
Signed-off-by: Linus Torvalds

Eric W. Biederman
2007-10-23 10:15:59 +0800

22 Oct, 2007

1 commit

83d87d167 sched: don't clear PF_VCPU in scheduler ... Browse Code »

KVM clears it by itself now, and for s390 this is plain wrong.

Signed-off-by: Laurent Vivier
Acked-by: Ingo Molnar
Signed-off-by: Avi Kivity

Laurent Vivier
2007-10-22 18:03:29 +0800

21 Oct, 2007

2 commits

74c3cbe33 [PATCH] audit: watching subtrees ... Browse Code »

New kind of audit rule predicates: "object is visible in given subtree".
The part that can be sanely implemented, that is. Limitations:
* if you have hardlink from outside of tree, you'd better watch
it too (or just watch the object itself, obviously)
* if you mount something under a watched tree, tell audit
that new chunk should be added to watched subtrees
* if you umount something in a watched tree and it's still mounted
elsewhere, you will get matches on events happening there. New command
tells audit to recalculate the trees, trimming such sources of false
positives.

Note that it's _not_ about path - if something mounted in several places
(multiple mount, bindings, different namespaces, etc.), the match does
_not_ depend on which one we are using for access.

Signed-off-by: Al Viro

Al Viro
2007-10-21 14:37:45 +0800
5a190ae69 [PATCH] pass dentry to audit_inode()/audit_inode_child() ... Browse Code »

makes caller simpler *and* allows to scan ancestors

Signed-off-by: Al Viro

Al Viro
2007-10-21 14:37:18 +0800

20 Oct, 2007

32 commits

c00046c27 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial ... Browse Code »

* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (74 commits)
fix do_sys_open() prototype
sysfs: trivial: fix sysfs_create_file kerneldoc spelling mistake
Documentation: Fix typo in SubmitChecklist.
Typo: depricated -> deprecated
Add missing profile=kvm option to Documentation/kernel-parameters.txt
fix typo about TBI in e1000 comment
proc.txt: Add /proc/stat field
small documentation fixes
Fix compiler warning in smount example program from sharedsubtree.txt
docs/sysfs: add missing word to sysfs attribute explanation
documentation/ext3: grammar fixes
Documentation/java.txt: typo and grammar fixes
Documentation/filesystems/vfs.txt: typo fix
include/asm-*/system.h: remove unused set_rmb(), set_wmb() macros
trivial copy_data_pages() tidy up
Fix typo in arch/x86/kernel/tsc_32.c
file link fix for Pegasus USB net driver help
remove unused return within void return function
Typo fixes retrun -> return
x86 hpet.h: remove broken links
...

Linus Torvalds
2007-10-20 11:36:17 +0800
c1cb8e48b sysctl: Don't compile sysctl_check when !CONFIG_SYSCTL ... Browse Code »

Weird I thought I had written the makefile so this would be handled. Oh
well this should fix it.

Sorry about that.

Signed-off-by: Eric W. Biederman
Acked-and-tested-by: Randy Dunlap
Signed-off-by: Linus Torvalds

Eric W. Biederman
2007-10-20 09:04:22 +0800
df7c48725 trivial copy_data_pages() tidy up ... Browse Code »

Change the loop style of copy_data_pages() to remove a duplicate condition.

Signed-off-by: Fengguang Wu
Acked-by: Rafael J. Wysocki
Signed-off-by: Adrian Bunk

Fengguang Wu
2007-10-20 08:26:04 +0800
6506f2aa6 fix comment: unlock_hrtimer_base is the counterpart of lock_hrtimer_base ... Browse Code »

Signed-off-by: Uwe Kleine-König
Signed-off-by: Adrian Bunk

Uwe Kleine-König
2007-10-20 07:56:53 +0800
6888c1ecd kernel/sched.c: remove bogus comment from account_user_time ... Browse Code »

hardirq_offset is no longer needed.

Signed-off-by: Michael Neuling
Signed-off-by: Adrian Bunk

Michael Neuling
2007-10-20 07:41:05 +0800
9aa5e993f trivial comment wording/typo fix regarding taint flags ... Browse Code »

Signed-off-by: Daniel Roesen
Signed-off-by: Adrian Bunk

Daniel Roesen
2007-10-20 06:30:06 +0800
3a4fa0a25 Fix misspellings of "system", "controller", "interrupt" and "necessary". ... Browse Code »

Fix the various misspellings of "system", controller", "interrupt" and
"[un]necessary".

Signed-off-by: Robert P. J. Day
Signed-off-by: Adrian Bunk

Robert P. J. Day
2007-10-20 05:10:43 +0800
c4ec20717 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 ... Browse Code »

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (41 commits)
ACPICA: hw: Don't carry spinlock over suspend
ACPICA: hw: remove use_lock flag from acpi_hw_register_{read, write}
ACPI: cpuidle: port idle timer suspend/resume workaround to cpuidle
ACPI: clean up acpi_enter_sleep_state_prep
Hibernation: Make sure that ACPI is enabled in acpi_hibernation_finish
ACPI: suppress uninitialized var warning
cpuidle: consolidate 2.6.22 cpuidle branch into one patch
ACPI: thinkpad-acpi: skip blanks before the data when parsing sysfs
ACPI: AC: Add sysfs interface
ACPI: SBS: Add sysfs alarm
ACPI: SBS: Add ACPI_PROCFS around procfs handling code.
ACPI: SBS: Add support for power_supply class (and sysfs)
ACPI: SBS: Make SBS reads table-driven.
ACPI: SBS: Simplify data structures in SBS
ACPI: SBS: Split host controller (ACPI0001) from SBS driver (ACPI0002)
ACPI: EC: Add new query handler to list head.
ACPI: Add acpi_bus_generate_event4() function
ACPI: Battery: add sysfs alarm
ACPI: Battery: Add sysfs support
ACPI: Battery: Misc clean-ups, no functional changes
...

Fix up conflicts in drivers/misc/thinkpad_acpi.[ch] manually

Linus Torvalds
2007-10-20 04:12:46 +0800
a39bc5169 Uninline fork.c/exit.c ... Browse Code »

Save ~650 bytes here.

add/remove: 4/0 grow/shrink: 0/7 up/down: 430/-1088 (-658)
function old new delta
__copy_fs_struct - 202 +202
__put_fs_struct - 112 +112
__exit_fs - 58 +58
__exit_files - 58 +58
exit_files 58 2 -56
put_fs_struct 112 5 -107
exit_fs 161 2 -159
sys_unshare 774 590 -184
copy_process 4031 3840 -191
do_exit 1791 1597 -194
copy_fs_struct 202 5 -197

No difference in lmbench lat_proc tests on 2-way Opteron 246.
Smaaaal degradation on UP P4 (within errors).

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alexey Dobriyan
Cc: Arjan van de Ven
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2007-10-20 02:53:56 +0800
a24efe62d kernel/fork.c: remove unneeded variable initialization in copy_process() ... Browse Code »

This initialization of is not needed so just remove it.

Signed-off-by: Mariusz Kozlowski
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mariusz Kozlowski
2007-10-20 02:53:56 +0800
8256e47cd Linux Kernel Markers ... Browse Code »

The marker activation functions sits in kernel/marker.c. A hash table is used
to keep track of the registered probes and armed markers, so the markers
within a newly loaded module that should be active can be activated at module
load time.

marker_query has been removed. marker_get_first, marker_get_next and
marker_release should be used as iterators on the markers.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Mathieu Desnoyers
Acked-by: "Frank Ch. Eigler"
Cc: Christoph Hellwig
Cc: Rusty Russell
Cc: Mike Mason
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mathieu Desnoyers
2007-10-20 02:53:54 +0800
09cadedbd Combine instrumentation menus in kernel/Kconfig.instrumentation ... Browse Code »

Quoting Randy:

"It seems sad that this patch sources Kconfig.marker, a 7-line file,
20-something times. Yes, you (we) don't want to put those 7 lines into
20-something different files, so sourcing is the right thing.

However, what you did for avr32 seems more on the right track to me: make
_one_ Instrumentation support menu that includes PROFILING, OPROFILE, KPROBES,
and MARKERS and then use (source) that in all of the arches."

Signed-off-by: Mathieu Desnoyers
Acked-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mathieu Desnoyers
2007-10-20 02:53:54 +0800
68318b8e0 Hook up group scheduler with control groups ... Browse Code »

Enable "cgroup" (formerly containers) based fair group scheduling. This
will let administrator create arbitrary groups of tasks (using "cgroup"
pseudo filesystem) and control their cpu bandwidth usage.

[akpm@linux-foundation.org: fix cpp condition]
Signed-off-by: Srivatsa Vaddagiri
Signed-off-by: Dhaval Giani
Cc: Randy Dunlap
Cc: Balbir Singh
Cc: Paul Menage
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Srivatsa Vaddagiri
2007-10-20 02:53:51 +0800
cba63c308 Extended crashkernel command line ... Browse Code »

This patch adds a extended crashkernel syntax that makes the value of reserved
system RAM dependent on the system RAM itself:

crashkernel=:[,:,...][@offset]
range=start-[end]

For example:

crashkernel=512M-2G:64M,2G-:128M

The motivation comes from distributors that configure their crashkernel
command line automatically with some configuration tool (YaST, you know ;)).
Of course that tool knows the value of System RAM, but if the user removes
RAM, then the system becomes unbootable or at least unusable and error
handling is very difficult.

This series implements this change for i386, x86_64, ia64, ppc64 and sh. That
should be all platforms that support kdump in current mainline. I tested all
platforms except sh due to the lack of a sh processor.

This patch:

This is the generic part of the patch. It adds a parse_crashkernel() function
in kernel/kexec.c that is called by the architecture specific code that
actually reserves the memory. That function takes the whole command line and
looks itself for "crashkernel=" in it.

If there are multiple occurrences, then the last one is taken. The advantage
is that if you have a bootloader like lilo or elilo which allows you to append
a command line parameter but not to remove one (like in GRUB), then you can
add another crashkernel value for testing at the boot command line and this
one overwrites the command line in the configuration then.

Signed-off-by: Bernhard Walle
Cc: Andi Kleen
Cc: "Luck, Tony"
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: Paul Mundt
Cc: Vivek Goyal
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Bernhard Walle
2007-10-20 02:53:49 +0800
73e753a50 CPU HOTPLUG: avoid hotadd when proper possible_map isn't specified ... Browse Code »

cpu-hot-add should be fail if cpu is not set in cpu_possible_map. If go
ahead, the system will panic soon.

Especially, arch which requires additional_cpus= parameter should handle
this. Tested on ia64.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2007-10-20 02:53:44 +0800
470fd6464 hotplug cpu: migrate a task within its cpuset ... Browse Code »

When a cpu is disabled, move_task_off_dead_cpu() is called for tasks that have
been running on that cpu.

Currently, such a task is migrated:
1) to any cpu on the same node as the disabled cpu, which is both online
and among that task's cpus_allowed
2) to any cpu which is both online and among that task's cpus_allowed

It is typical of a multithreaded application running on a large NUMA system to
have its tasks confined to a cpuset so as to cluster them near the memory that
they share. Furthermore, it is typical to explicitly place such a task on a
specific cpu in that cpuset. And in that case the task's cpus_allowed
includes only a single cpu.

This patch would insert a preference to migrate such a task to some cpu within
its cpuset (and set its cpus_allowed to its entire cpuset).

With this patch, migrate the task to:
1) to any cpu on the same node as the disabled cpu, which is both online
and among that task's cpus_allowed
2) to any online cpu within the task's cpuset
3) to any cpu which is both online and among that task's cpus_allowed

In order to do this, move_task_off_dead_cpu() must make a call to
cpuset_cpus_allowed_locked(), a new subset of cpuset_cpus_allowed(), that will
not block. (name change - per Oleg's suggestion)

Calls are made to cpuset_lock() and cpuset_unlock() in migration_call() to set
the cpuset mutex during the whole migrate_live_tasks() and
migrate_dead_tasks() procedure.

[akpm@linux-foundation.org: build fix]
[pj@sgi.com: Fix indentation and spacing]
Signed-off-by: Cliff Wickman
Cc: Oleg Nesterov
Cc: Christoph Lameter
Cc: Paul Jackson
Cc: Ingo Molnar
Signed-off-by: Paul Jackson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Cliff Wickman
2007-10-20 02:53:44 +0800
bd89aabc6 Control groups: Replace "cont" with "cgrp" and other misc renaming ... Browse Code »

Replace "cont" with "cgrp" and other misc renaming

This patch finishes some of the names that got missed in the great
"task containers" -> "control groups" rename. Primarily it renames
the local variable "cont" to "cgrp" in a number of places, and renames
the CONT_* enum members to CGRP_*.

This patch is not intended to have any effect on the generated code;
the output of "objdump -d kernel/cgroup.o" is unchanged.

Signed-off-by: Paul Menage
Acked-by: Paul Jackson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2007-10-20 02:53:43 +0800
69cccb887 Use task_pid_nr() instead of pid_nr(task_pid()) ... Browse Code »

There are two places that do so - the cgroups subsystem and the autofs
code.

Signed-off-by: Pavel Emelyanov
Cc: Ian Kent
Cc: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2007-10-20 02:53:43 +0800
ba25f9dcc Use helpers to obtain task pid in printks ... Browse Code »

The task_struct->pid member is going to be deprecated, so start
using the helpers (task_pid_nr/task_pid_vnr/task_pid_nr_ns) in
the kernel.

The first thing to start with is the pid, printed to dmesg - in
this case we may safely use task_pid_nr(). Besides, printks produce
more (much more) than a half of all the explicit pid usage.

[akpm@linux-foundation.org: git-drm went and changed lots of stuff]
Signed-off-by: Pavel Emelyanov
Cc: Dave Airlie
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2007-10-20 02:53:43 +0800
9a2e70572 Isolate the explicit usage of signal->pgrp ... Browse Code »

The pgrp field is not used widely around the kernel so it is now marked as
deprecated with appropriate comment.

The initialization of INIT_SIGNALS is trimmed because
a) they are set to 0 automatically;
b) gcc cannot properly initialize two anonymous (the second one
is the one with the session) unions. In this particular case
to make it compile we'd have to add some field initialized
right before the .pgrp.

This is the same patch as the 1ec320afdc9552c92191d5f89fcd1ebe588334ca one
(from Cedric), but for the pgrp field.

Some progress report:

We have to deprecate the pid, tgid, session and pgrp fields on struct
task_struct and struct signal_struct. The session and pgrp are already
deprecated. The tgid value is close to being such - the worst known usage
in in fs/locks.c and audit code. The pid field deprecation is mainly
blocked by numerous printk-s around the kernel that print the tsk->pid to
log.

Signed-off-by: Pavel Emelyanov
Cc: Oleg Nesterov
Cc: Sukadev Bhattiprolu
Cc: Cedric Le Goater
Cc: Serge Hallyn
Cc: "Eric W. Biederman"
Cc: Herbert Poetzl
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2007-10-20 02:53:43 +0800
270f722d4 Fix tsk->exit_state usage ... Browse Code »

tsk->exit_state can only be 0, EXIT_ZOMBIE, or EXIT_DEAD. A non-zero test
is the same as tsk->exit_state & (EXIT_ZOMBIE | EXIT_DEAD), so just testing
tsk->exit_state is sufficient.

Signed-off-by: Eugene Teo
Cc: Roland McGrath
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eugene Teo
2007-10-20 02:53:42 +0800
8707d8b8c Fix cpusets update_cpumask ... Browse Code »

Cause writes to cpuset "cpus" file to update cpus_allowed for member tasks:

- collect batches of tasks under tasklist_lock and then call
set_cpus_allowed() on them outside the lock (since this can sleep).

- add a simple generic priority heap type to allow efficient collection
of batches of tasks to be processed without duplicating or missing any
tasks in subsequent batches.

- make "cpus" file update a no-op if the mask hasn't changed

- fix race between update_cpumask() and sched_setaffinity() by making
sched_setaffinity() post-check that it's not running on any cpus outside
cpuset_cpus_allowed().

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Paul Menage
Cc: Paul Jackson
Cc: David Rientjes
Cc: Nick Piggin
Cc: Peter Zijlstra
Cc: Balbir Singh
Cc: Cedric Le Goater
Cc: "Eric W. Biederman"
Cc: Serge Hallyn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2007-10-20 02:53:41 +0800
020958b62 cpusets: decrustify cpuset mask update code ... Browse Code »

Decrustify the kernel/cpuset.c 'cpus' and 'mems' updating code.

Other than subtle improvements in the consistency of identifying
white space at the beginning and end of passed in masks, this
doesn't make any visible difference in behaviour. But it's
one or two hundred kernel text bytes smaller, and easier to
understand.

[akpm@linux-foundation.org: coding-style fix]
Signed-off-by: Paul Jackson
Reviewed-by: Paul Menage
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Jackson
2007-10-20 02:53:41 +0800
029190c51 cpuset sched_load_balance flag ... Browse Code »

Add a new per-cpuset flag called 'sched_load_balance'.

When enabled in a cpuset (the default value) it tells the kernel scheduler
that the scheduler should provide the normal load balancing on the CPUs in
that cpuset, sometimes moving tasks from one CPU to a second CPU if the
second CPU is less loaded and if that task is allowed to run there.

When disabled (write "0" to the file) then it tells the kernel scheduler
that load balancing is not required for the CPUs in that cpuset.

Now even if this flag is disabled for some cpuset, the kernel may still
have to load balance some or all the CPUs in that cpuset, if some
overlapping cpuset has its sched_load_balance flag enabled.

If there are some CPUs that are not in any cpuset whose sched_load_balance
flag is enabled, the kernel scheduler will not load balance tasks to those
CPUs.

Moreover the kernel will partition the 'sched domains' (non-overlapping
sets of CPUs over which load balancing is attempted) into the finest
granularity partition that it can find, while still keeping any two CPUs
that are in the same shed_load_balance enabled cpuset in the same element
of the partition.

This serves two purposes:
1) It provides a mechanism for real time isolation of some CPUs, and
2) it can be used to improve performance on systems with many CPUs
by supporting configurations in which load balancing is not done
across all CPUs at once, but rather only done in several smaller
disjoint sets of CPUs.

This mechanism replaces the earlier overloading of the per-cpuset
flag 'cpu_exclusive', which overloading was removed in an earlier
patch: cpuset-remove-sched-domain-hooks-from-cpusets

See further the Documentation and comments in the code itself.

[akpm@linux-foundation.org: don't be weird]
Signed-off-by: Paul Jackson
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Jackson
2007-10-20 02:53:41 +0800
2f2a3a46f Uninline the task_xid_nr_ns() calls ... Browse Code »

Since these are expanded into call to pid_nr_ns() anyway, it's OK to move
the whole routine out-of-line. This is a cheap way to save ~100 bytes from
vmlinux. Together with the previous two patches, it saves half-a-kilo from
the vmlinux.

Un-inline other (currently inlined) functions must be done with additional
performance testing.

Signed-off-by: Pavel Emelyanov
Cc: Sukadev Bhattiprolu
Cc: Oleg Nesterov
Cc: Paul Menage
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2007-10-20 02:53:41 +0800
8990571eb Uninline find_pid etc set of functions ... Browse Code »

The find_pid/_vpid/_pid_ns functions are used to find the struct pid by its
id, depending on whic id - global or virtual - is used.

The find_vpid() is a macro that pushes the current->nsproxy->pid_ns on the
stack to call another function - find_pid_ns(). It turned out, that this
dereference together with the push itself cause the kernel text size to
grow too much.

Move all these out-of-line. Together with the previous patch this saves a
bit less that 400 bytes from .text section.

Signed-off-by: Pavel Emelyanov
Cc: Sukadev Bhattiprolu
Cc: Oleg Nesterov
Cc: Paul Menage
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2007-10-20 02:53:41 +0800
bac0abd61 Isolate some explicit usage of task->tgid ... Browse Code »

With pid namespaces this field is now dangerous to use explicitly, so hide
it behind the helpers.

Also the pid and pgrp fields o task_struct and signal_struct are to be
deprecated. Unfortunately this patch cannot be sent right now as this
leads to tons of warnings, so start isolating them, and deprecate later.

Actually the p->tgid == pid has to be changed to has_group_leader_pid(),
but Oleg pointed out that in case of posix cpu timers this is the same, and
thread_group_leader() is more preferable.

Signed-off-by: Pavel Emelyanov
Acked-by: Oleg Nesterov
Cc: Sukadev Bhattiprolu
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2007-10-20 02:53:40 +0800
19b9b9b54 pid namespaces: remove the struct pid unneeded fields ... Browse Code »

Since we've switched from using pid->nr to pid->upids->nr some
fields on struct pid are no longer needed

Signed-off-by: Pavel Emelyanov
Cc: Oleg Nesterov
Cc: Sukadev Bhattiprolu
Cc: Paul Menage
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2007-10-20 02:53:40 +0800
228ebcbe6 Uninline find_task_by_xxx set of functions ... Browse Code »

The find_task_by_something is a set of macros are used to find task by pid
depending on what kind of pid is proposed - global or virtual one. All of
them are wrappers above the most generic one - find_task_by_pid_type_ns() -
and just substitute some args for it.

It turned out, that dereferencing the current->nsproxy->pid_ns construction
and pushing one more argument on the stack inline cause kernel text size to
grow.

This patch moves all this stuff out-of-line into kernel/pid.c. Together
with the next patch it saves a bit less than 400 bytes from the .text
section.

Signed-off-by: Pavel Emelyanov
Cc: Sukadev Bhattiprolu
Cc: Oleg Nesterov
Cc: Paul Menage
Cc: "Eric W. Biederman"
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2007-10-20 02:53:40 +0800
b488893a3 pid namespaces: changes to show virtual ids to user ... Browse Code »

This is the largest patch in the set. Make all (I hope) the places where
the pid is shown to or get from user operate on the virtual pids.

The idea is:
- all in-kernel data structures must store either struct pid itself
or the pid's global nr, obtained with pid_nr() call;
- when seeking the task from kernel code with the stored id one
should use find_task_by_pid() call that works with global pids;
- when showing pid's numerical value to the user the virtual one
should be used, but however when one shows task's pid outside this
task's namespace the global one is to be used;
- when getting the pid from userspace one need to consider this as
the virtual one and use appropriate task/pid-searching functions.

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: nuther build fix]
[akpm@linux-foundation.org: yet nuther build fix]
[akpm@linux-foundation.org: remove unneeded casts]
Signed-off-by: Pavel Emelyanov
Signed-off-by: Alexey Dobriyan
Cc: Sukadev Bhattiprolu
Cc: Oleg Nesterov
Cc: Paul Menage
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2007-10-20 02:53:40 +0800
3eb07c8c8 pid namespaces: destroy pid namespace on init's death ... Browse Code »

Terminate all processes in a namespace when the reaper of the namespace is
exiting. We do this by walking the pidmap of the namespace and sending
SIGKILL to all processes.

Signed-off-by: Sukadev Bhattiprolu
Acked-by: Pavel Emelyanov
Cc: Oleg Nesterov
Cc: Sukadev Bhattiprolu
Cc: Paul Menage
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sukadev Bhattiprolu
2007-10-20 02:53:40 +0800
0fbc26a6c pid namespaces: allow signalling cgroup-init ... Browse Code »

Only the global-init process must be special - any other cgroup-init
process must be killable to prevent run-away processes in the system.

TODO: Ideally we should allow killing the cgroup-init only from parent
cgroup and prevent it being killed from within the cgroup.
But that is a more complex change and will be addressed by a follow-on
patch. For now allow the cgroup-init to be terminated by any process
with sufficient privileges.

Signed-off-by: Sukadev Bhattiprolu
Acked-by: Pavel Emelyanov
Cc: Oleg Nesterov
Cc: Sukadev Bhattiprolu
Cc: Paul Menage
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sukadev Bhattiprolu
2007-10-20 02:53:40 +0800