Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

18 Jul, 2007

14 commits

86313c488 usermodehelper: Tidy up waiting ... Browse Code »

Rather than using a tri-state integer for the wait flag in
call_usermodehelper_exec, define a proper enum, and use that. I've
preserved the integer values so that any callers I've missed should
still work OK.

Signed-off-by: Jeremy Fitzhardinge
Cc: James Bottomley
Cc: Randy Dunlap
Cc: Christoph Hellwig
Cc: Andi Kleen
Cc: Paul Mackerras
Cc: Johannes Berg
Cc: Ralf Baechle
Cc: Bjorn Helgaas
Cc: Joel Becker
Cc: Tony Luck
Cc: Kay Sievers
Cc: Srivatsa Vaddagiri
Cc: Oleg Nesterov
Cc: David Howells

Jeremy Fitzhardinge
2007-07-18 23:47:40 +0800
10a0a8d4e Add common orderly_poweroff() ... Browse Code »

Various pieces of code around the kernel want to be able to trigger an
orderly poweroff. This pulls them together into a single
implementation.

By default the poweroff command is /sbin/poweroff, but it can be set
via sysctl: kernel/poweroff_cmd. This is split at whitespace, so it
can include command-line arguments.

This patch replaces four other instances of invoking either "poweroff"
or "shutdown -h now": two sbus drivers, and acpi thermal
management.

sparc64 has its own "powerd"; still need to determine whether it should
be replaced by orderly_poweroff().

Signed-off-by: Jeremy Fitzhardinge
Acked-by: Len Brown
Signed-off-by: Chris Wright
Cc: Andrew Morton
Cc: Randy Dunlap
Cc: Andi Kleen
Cc: Al Viro
Cc: Arnd Bergmann
Cc: David S. Miller

Jeremy Fitzhardinge
2007-07-18 23:47:40 +0800
0ab4dc922 usermodehelper: split setup from execution ... Browse Code »

Rather than having hundreds of variations of call_usermodehelper for
various pieces of usermode state which could be set up, split the
info allocation and initialization from the actual process execution.

This means the general pattern becomes:
info = call_usermodehelper_setup(path, argv, envp); /* basic state */
call_usermodehelper_(info, stuff...); /* extra state */
call_usermodehelper_exec(info, wait); /* run process and free info */

This patch introduces wrappers for all the existing calling styles for
call_usermodehelper_*, but folds their implementations into one.

Signed-off-by: Jeremy Fitzhardinge
Cc: Andi Kleen
Cc: Rusty Russell
Cc: David Howells
Cc: Bj?rn Steinbrink
Cc: Randy Dunlap

Jeremy Fitzhardinge
2007-07-18 23:47:40 +0800
6f686d3d1 kernel/auditfilter: kill bogus uninit'd-var compiler warning ... Browse Code »

Kill this warning...

kernel/auditfilter.c: In function ‘audit_receive_filter’:
kernel/auditfilter.c:1213: warning: ‘ndw’ may be used uninitialized in this function
kernel/auditfilter.c:1213: warning: ‘ndp’ may be used uninitialized in this function

...with a simplification of the code. audit_put_nd() can accept NULL
arguments, just like kfree(). It is cleaner to init two existing vars
to NULL, remove the redundant test variable 'putnd_needed' branches, and call
audit_put_nd() directly.

As a desired side effect, the warning goes away.

Signed-off-by: Jeff Garzik

Jeff Garzik
2007-07-18 04:17:59 +0800
49c13b51a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (80 commits)
KVM: Use CPU_DYING for disabling virtualization
KVM: Tune hotplug/suspend IPIs
KVM: Keep track of which cpus have virtualization enabled
SMP: Allow smp_call_function_single() to current cpu
i386: Allow smp_call_function_single() to current cpu
x86_64: Allow smp_call_function_single() to current cpu
HOTPLUG: Adapt thermal throttle to CPU_DYING
HOTPLUG: Adapt cpuset hotplug callback to CPU_DYING
HOTPLUG: Add CPU_DYING notifier
KVM: Clean up #includes
KVM: Remove kvmfs in favor of the anonymous inodes source
KVM: SVM: Reliably detect if SVM was disabled by BIOS
KVM: VMX: Remove unnecessary code in vmx_tlb_flush()
KVM: MMU: Fix Wrong tlb flush order
KVM: VMX: Reinitialize the real-mode tss when entering real mode
KVM: Avoid useless memory write when possible
KVM: Fix x86 emulator writeback
KVM: Add support for in-kernel pio handlers
KVM: VMX: Fix interrupt checking on lightweight exit
KVM: Adds support for in-kernel mmio handlers
...

Linus Torvalds
2007-07-18 02:50:26 +0800
13c22168b destroy_workqueue() can livelock ... Browse Code »

Pointed out by Michal Schmidt .

The bug was introduced in 2.6.22 by me.

cleanup_workqueue_thread() does flush_cpu_workqueue(cwq) in a loop until
->worklist becomes empty. This is live-lockable, a re-niced caller can get
CPU after wake_up() and insert a new barrier before the lower-priority
cwq->thread has a chance to clear ->current_work.

Change cleanup_workqueue_thread() to do flush_cpu_workqueue(cwq) only once.
We can rely on the fact that run_workqueue() won't return until it flushes
all works. So it is safe to call kthread_stop() after that, the "should
stop" request won't be noticed until run_workqueue() returns.

Signed-off-by: Oleg Nesterov
Cc: Michal Schmidt
Cc: Srivatsa Vaddagiri
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2007-07-18 01:23:03 +0800
9281acea6 kallsyms: make KSYM_NAME_LEN include space for trailing '\0' ... Browse Code »

KSYM_NAME_LEN is peculiar in that it does not include the space for the
trailing '\0', forcing all users to use KSYM_NAME_LEN + 1 when allocating
buffer. This is nonsense and error-prone. Moreover, when the caller
forgets that it's very likely to subtly bite back by corrupting the stack
because the last position of the buffer is always cleared to zero.

This patch increments KSYM_NAME_LEN by one and updates code accordingly.

* off-by-one bug in asm-powerpc/kprobes.h::kprobe_lookup_name() macro
is fixed.

* Where MODULE_NAME_LEN and KSYM_NAME_LEN were used together,
MODULE_NAME_LEN was treated as if it didn't include space for the
trailing '\0'. Fix it.

Signed-off-by: Tejun Heo
Acked-by: Paulo Marques
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tejun Heo
2007-07-18 01:23:03 +0800
62239ac2b proper prototype for proc_nr_files() ... Browse Code »

Add a proper prototype for proc_nr_files() in include/linux/fs.h

Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Bunk
2007-07-18 01:23:03 +0800
f284ce726 PTRACE_POKEDATA consolidation ... Browse Code »

Identical implementations of PTRACE_POKEDATA go into generic_ptrace_pokedata()
function.

AFAICS, fix bug on xtensa where successful PTRACE_POKEDATA will nevertheless
return EPERM.

Signed-off-by: Alexey Dobriyan
Cc: Christoph Hellwig
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2007-07-18 01:23:03 +0800
766473231 PTRACE_PEEKDATA consolidation ... Browse Code »

Identical implementations of PTRACE_PEEKDATA go into generic_ptrace_peekdata()
function.

Signed-off-by: Alexey Dobriyan
Cc: Christoph Hellwig
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2007-07-18 01:23:03 +0800
bcdcd8e72 Report that kernel is tainted if there was an OOPS ... Browse Code »

If the kernel OOPSed or BUGed then it probably should be considered as
tainted. Thus, all subsequent OOPSes and SysRq dumps will report the
tainted kernel. This saves a lot of time explaining oddities in the
calltraces.

Signed-off-by: Pavel Emelianov
Acked-by: Randy Dunlap
Cc:
Signed-off-by: Andrew Morton
[ Added parisc patch from Matthew Wilson -Linus ]
Signed-off-by: Linus Torvalds

Pavel Emelianov
2007-07-18 01:23:02 +0800
831441862 Freezer: make kernel threads nonfreezable by default ... Browse Code »
13

Currently, the freezer treats all tasks as freezable, except for the kernel
threads that explicitly set the PF_NOFREEZE flag for themselves. This
approach is problematic, since it requires every kernel thread to either
set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
care for the freezing of tasks at all.

It seems better to only require the kernel threads that want to or need to
be frozen to use some freezer-related code and to remove any
freezer-related code from the other (nonfreezable) kernel threads, which is
done in this patch.

The patch causes all kernel threads to be nonfreezable by default (ie. to
have PF_NOFREEZE set by default) and introduces the set_freezable()
function that should be called by the freezable kernel threads in order to
unset PF_NOFREEZE. It also makes all of the currently freezable kernel
threads call set_freezable(), so it shouldn't cause any (intentional)
change of behaviour to appear. Additionally, it updates documentation to
describe the freezing of tasks more accurately.

[akpm@linux-foundation.org: build fixes]
Signed-off-by: Rafael J. Wysocki
Acked-by: Nigel Cunningham
Cc: Pavel Machek
Cc: Oleg Nesterov
Cc: Gautham R Shenoy
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rafael J. Wysocki
2007-07-18 01:23:02 +0800
94f6030ca Slab allocators: Replace explicit zeroing with __GFP_ZERO ... Browse Code »

kmalloc_node() and kmem_cache_alloc_node() were not available in a zeroing
variant in the past. But with __GFP_ZERO it is possible now to do zeroing
while allocating.

Use __GFP_ZERO to remove the explicit clearing of memory via memset whereever
we can.

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2007-07-18 01:23:02 +0800
396faf030 Allow huge page allocations to use GFP_HIGH_MOVABLE ... Browse Code »

Huge pages are not movable so are not allocated from ZONE_MOVABLE. However,
as ZONE_MOVABLE will always have pages that can be migrated or reclaimed, it
can be used to satisfy hugepage allocations even when the system has been
running a long time. This allows an administrator to resize the hugepage pool
at runtime depending on the size of ZONE_MOVABLE.

This patch adds a new sysctl called hugepages_treat_as_movable. When a
non-zero value is written to it, future allocations for the huge page pool
will use ZONE_MOVABLE. Despite huge pages being non-movable, we do not
introduce additional external fragmentation of note as huge pages are always
the largest contiguous block we care about.

[akpm@linux-foundation.org: various fixes]
Signed-off-by: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2007-07-18 01:22:59 +0800

17 Jul, 2007

26 commits

7713a7d19 [HRTIMER] Fix cpu pointer arg to clockevents_notify() ... Browse Code »

All of the clockevent notifiers expect a pointer to
an "unsigned int" cpu argument, but hrtimer_cpu_notify()
passes in a pointer to a long.

[ Discussed with and ok by Thomas Gleixner ]

Signed-off-by: David S. Miller
Signed-off-by: Linus Torvalds

David Miller
2007-07-17 08:29:56 +0800
7144521f5 Remove duplicate comments from sysctl.c ... Browse Code »

Randy Dunlap noticed that the recent comment clarifications from Andrew
had somehow gotten duplicated. Quoth Andrew: "hm, that could have been
some late-night reject-fixing."

Fix it up.

Cc: From: Andrew Morton
Cc: Randy Dunlap
Signed-off-by: Linus Torvalds

Linus Torvalds
2007-07-17 02:50:38 +0800
10b275ddf Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched ... Browse Code »

* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched:
[PATCH] sched: fix up fs/proc/array.c whitespace problems
[PATCH] sched: prettify prio_to_wmult[]
[PATCH] sched: document prio_to_wmult[]
[PATCH] sched: improve weight-array comments
[PATCH] sched: remove dead code from task_stime()

Fixed up trivial conflict in fs/proc/array.c

Linus Torvalds
2007-07-17 02:02:49 +0800
1492192b4 kernel/printk.c: document possible deadlock against scheduler ... Browse Code »

kernel/printk.c: document possible deadlock against scheduler

The printk's comment states that it can be called from every context,
which might lead to false illusion that it could be called from everywhere
without any restrictions.

This is however not true - a call to printk() could deadlock if called from
scheduler code (namely from schedule(), wake_up(), etc) on runqueue lock
when it tries to wake up klogd. Document this.

Signed-off-by: Jiri Kosina
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jiri Kosina
2007-07-17 00:05:52 +0800
24da1cbff modules: remove modlist_lock ... Browse Code »

Now we always use stop_machine for module insertion or deletion, we no
longer need the modlist_lock: merely disabling preemption is sufficient to
block against list manipulation. This avoids deadlock on OOPSen where we
can potentially grab the lock twice.

Bug: 8695
Signed-off-by: Rusty Russell
Cc: Ingo Molnar
Cc: Tobias Oed
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rusty Russell
2007-07-17 00:05:51 +0800
1f1f642e2 make cancel_xxx_work_sync() return a boolean ... Browse Code »

Change cancel_work_sync() and cancel_delayed_work_sync() to return a boolean
indicating whether the work was actually cancelled. A zero return value means
that the work was not pending/queued.

Without that kind of change it is not possible to avoid flush_workqueue()
sometimes, see the next patch as an example.

Also, this patch unifies both functions and kills the (unlikely) busy-wait
loop.

Signed-off-by: Oleg Nesterov
Acked-by: Jarek Poplawski
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2007-07-17 00:05:51 +0800
f5a421a45 rename cancel_rearming_delayed_work() to cancel_delayed_work_sync() ... Browse Code »

Imho, the current naming of cancel_xxx workqueue functions is very confusing.

cancel_delayed_work()
cancel_rearming_delayed_work()
cancel_rearming_delayed_workqueue() // obsolete

cancel_work_sync()

This looks as if the first 2 functions differ in "type" of their argument
which is not true any longer, nowadays the difference is the behaviour.

The semantics of cancel_rearming_delayed_work(dwork) was changed
significantly, it doesn't require that dwork rearms itself, and cancels dwork
synchronously.

Rename it to cancel_delayed_work_sync(). This matches cancel_delayed_work()
and cancel_work_sync(). Re-create cancel_rearming_delayed_work() as a simple
inline obsolete wrapper, like cancel_rearming_delayed_workqueue().

Signed-off-by: Oleg Nesterov
Acked-by: Jarek Poplawski
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2007-07-17 00:05:51 +0800
f84d5a76c is_power_of_2: kernel/kfifo.c ... Browse Code »

Replace (n & (n-1)) with is_power_of_2()

Signed-off-by: vignesh babu
Acked-by: Stelian Pop
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

vignesh babu
2007-07-17 00:05:50 +0800
cf99abace make seccomp zerocost in schedule ... Browse Code »

This follows a suggestion from Chuck Ebbert on how to make seccomp
absolutely zerocost in schedule too. The only remaining footprint of
seccomp is in terms of the bzImage size that becomes a few bytes (perhaps
even a few kbytes) larger, measure it if you care in the embedded.

Signed-off-by: Andrea Arcangeli
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrea Arcangeli
2007-07-17 00:05:50 +0800
1d9d02fee move seccomp from /proc to a prctl ... Browse Code »

This reduces the memory footprint and it enforces that only the current
task can enable seccomp on itself (this is a requirement for a
strightforward [modulo preempt ;) ] TIF_NOTSC implementation).

Signed-off-by: Andrea Arcangeli
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrea Arcangeli
2007-07-17 00:05:50 +0800
19769b762 sprint_symbol() cleanup ... Browse Code »

Remove pointless `else'.

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2007-07-17 00:05:50 +0800
2be7fe075 sysctl.c: add text telling people to use CTL_UNNUMBERED ... Browse Code »

Hopefully this will help people to understand the new regime.

Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2007-07-17 00:05:49 +0800
36cf3b5c3 FUTEX: Tidy up the code ... Browse Code »

The recent PRIVATE and REQUEUE_PI changes to the futex code made it hard to
read. Tidy it up.

Signed-off-by: Thomas Gleixner
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Thomas Gleixner
2007-07-17 00:05:49 +0800
4e44f3497 sys_time() speedup ... Browse Code »

Improve performance of sys_time(). sys_time() returns time in seconds, but
it does so by calling do_gettimeofday() and then returning the tv_sec
portion of the GTOD time. But the data structure "xtime", which is updated
by every timer/scheduler tick, already offers HZ granularity time.

The patch improves the sysbench OLTP macrobenchmark significantly:

2.6.22-rc6:

#threads
1: transactions: 3733 (373.21 per sec.)
2: transactions: 6676 (667.46 per sec.)
3: transactions: 6957 (695.50 per sec.)
4: transactions: 7055 (705.48 per sec.)
5: transactions: 6596 (659.33 per sec.)

2.6.22-rc6 + sys_time.patch:

1: transactions: 4005 (400.47 per sec.)
2: transactions: 7379 (737.77 per sec.)
3: transactions: 7347 (734.49 per sec.)
4: transactions: 7468 (746.65 per sec.)
5: transactions: 7428 (742.47 per sec.)

Mixed API uses of gettimeofday() and time() are guaranteed to be coherent
via the use of a at-most-once-per-second slowpath that updates xtime.

[akpm@linux-foundation.org: build fixes]
Signed-off-by: Ingo Molnar
Cc: John Stultz
Cc: Thomas Gleixner
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ingo Molnar
2007-07-17 00:05:48 +0800
213dd266d namespace: ensure clone_flags are always stored in an unsigned long ... Browse Code »

While working on unshare support for the network namespace I noticed we
were putting clone flags in an int. Which is weird because the syscall
uses unsigned long and we at least need an unsigned to properly hold all of
the unshare flags.

So to make the code consistent, this patch updates the code to use
unsigned long instead of int for the clone flags in those places
where we get it wrong today.

Signed-off-by: Eric W. Biederman
Acked-by: Cedric Le Goater
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2007-07-17 00:05:48 +0800
b716395e2 diskquota: 32bit quota tools on 64bit architectures ... Browse Code »

OpenVZ Linux kernel team has discovered the problem with 32bit quota tools
working on 64bit architectures. In 2.6.10 kernel sys32_quotactl() function
was replaced by sys_quotactl() with the comment "sys_quotactl seems to be
32/64bit clean, enable it for 32bit" However this isn't right. Look at
if_dqblk structure:

struct if_dqblk {
__u64 dqb_bhardlimit;
__u64 dqb_bsoftlimit;
__u64 dqb_curspace;
__u64 dqb_ihardlimit;
__u64 dqb_isoftlimit;
__u64 dqb_curinodes;
__u64 dqb_btime;
__u64 dqb_itime;
__u32 dqb_valid;
};

For 32 bit quota tools sizeof(if_dqblk) == 0x44.
But for 64 bit kernel its size is 0x48, 'cause of alignment!
Thus we got a problem. Attached patch reintroduce sys32_quotactl() function,
that handles this and related situations.

[michal.k.k.piotrowski@gmail.com: build fix]
[akpm@linux-foundation.org: Make it link with CONFIG_QUOTA=n]
Signed-off-by: Vasily Tarasov
Cc: Andi Kleen
Cc: "Luck, Tony"
Cc: Jan Kara
Cc:
Signed-off-by: Michal Piotrowski
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vasily Tarasov
2007-07-17 00:05:48 +0800
6d9525b52 kerneldoc fix in audit_core_dumps ... Browse Code »

Fix parameter name in audit_core_dumps for kerneldoc.

Signed-off-by: Henrik Kretzschmar
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Henrik Kretzschmar
2007-07-17 00:05:48 +0800
98c0d07cb add a kmem_cache for nsproxy objects ... Browse Code »

It should improve performance in some scenarii where a lot of
these nsproxy objects are created by unsharing namespaces. This is
a typical use of virtual servers that are being created or entered.

This is also a good tool to find leaks and gather statistics on
namespace usage.

Signed-off-by: Cedric Le Goater
Cc: Herbert Poetzl
Cc: Pavel Emelianov
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Cedric Le Goater
2007-07-17 00:05:47 +0800
467e9f4b5 fix create_new_namespaces() return value ... Browse Code »

dup_mnt_ns() and clone_uts_ns() return NULL on failure. This is wrong,
create_new_namespaces() uses ERR_PTR() to catch an error. This means that the
subsequent create_new_namespaces() will hit BUG_ON() in copy_mnt_ns() or
copy_utsname().

Modify create_new_namespaces() to also use the errors returned by the
copy_*_ns routines and not to systematically return ENOMEM.

[oleg@tv-sign.ru: better changelog]
Signed-off-by: Cedric Le Goater
Cc: Serge E. Hallyn
Cc: Badari Pulavarty
Cc: Pavel Emelianov
Cc: Herbert Poetzl
Cc: Eric W. Biederman
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Cedric Le Goater
2007-07-17 00:05:47 +0800
77ec739d8 user namespace: add unshare ... Browse Code »

This patch enables the unshare of user namespaces.

It adds a new clone flag CLONE_NEWUSER and implements copy_user_ns() which
resets the current user_struct and adds a new root user (uid == 0)

For now, unsharing the user namespace allows a process to reset its
user_struct accounting and uid 0 in the new user namespace should be contained
using appropriate means, for instance selinux

The plan, when the full support is complete (all uid checks covered), is to
keep the original user's rights in the original namespace, and let a process
become uid 0 in the new namespace, with full capabilities to the new
namespace.

Signed-off-by: Serge E. Hallyn
Signed-off-by: Cedric Le Goater
Acked-by: Pavel Emelianov
Cc: Herbert Poetzl
Cc: Kirill Korotaev
Cc: Eric W. Biederman
Cc: Chris Wright
Cc: Stephen Smalley
Cc: James Morris
Cc: Andrew Morgan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2007-07-17 00:05:47 +0800
acce292c8 user namespace: add the framework ... Browse Code »

Basically, it will allow a process to unshare its user_struct table,
resetting at the same time its own user_struct and all the associated
accounting.

A new root user (uid == 0) is added to the user namespace upon creation.
Such root users have full privileges and it seems that theses privileges
should be controlled through some means (process capabilities ?)

The unshare is not included in this patch.

Changes since [try #4]:
- Updated get_user_ns and put_user_ns to accept NULL, and
get_user_ns to return the namespace.

Changes since [try #3]:
- moved struct user_namespace to files user_namespace.{c,h}

Changes since [try #2]:
- removed struct user_namespace* argument from find_user()

Changes since [try #1]:
- removed struct user_namespace* argument from find_user()
- added a root_user per user namespace

Signed-off-by: Cedric Le Goater
Signed-off-by: Serge E. Hallyn
Acked-by: Pavel Emelianov
Cc: Herbert Poetzl
Cc: Kirill Korotaev
Cc: Eric W. Biederman
Cc: Chris Wright
Cc: Stephen Smalley
Cc: James Morris
Cc: Andrew Morgan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Cedric Le Goater
2007-07-17 00:05:47 +0800
7d69a1f4a remove CONFIG_UTS_NS and CONFIG_IPC_NS ... Browse Code »

CONFIG_UTS_NS and CONFIG_IPC_NS have very little value as they only
deactivate the unshare of the uts and ipc namespaces and do not improve
performance.

Signed-off-by: Cedric Le Goater
Acked-by: "Serge E. Hallyn"
Cc: Eric W. Biederman
Cc: Herbert Poetzl
Cc: Pavel Emelianov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Cedric Le Goater
2007-07-17 00:05:47 +0800
522ed7767 Audit: add TTY input auditing ... Browse Code »

Add TTY input auditing, used to audit system administrator's actions. This is
required by various security standards such as DCID 6/3 and PCI to provide
non-repudiation of administrator's actions and to allow a review of past
actions if the administrator seems to overstep their duties or if the system
becomes misconfigured for unknown reasons. These requirements do not make it
necessary to audit TTY output as well.

Compared to an user-space keylogger, this approach records TTY input using the
audit subsystem, correlated with other audit events, and it is completely
transparent to the user-space application (e.g. the console ioctls still
work).

TTY input auditing works on a higher level than auditing all system calls
within the session, which would produce an overwhelming amount of mostly
useless audit events.

Add an "audit_tty" attribute, inherited across fork (). Data read from TTYs
by process with the attribute is sent to the audit subsystem by the kernel.
The audit netlink interface is extended to allow modifying the audit_tty
attribute, and to allow sending explanatory audit events from user-space (for
example, a shell might send an event containing the final command, after the
interactive command-line editing and history expansion is performed, which
might be difficult to decipher from the TTY input alone).

Because the "audit_tty" attribute is inherited across fork (), it would be set
e.g. for sshd restarted within an audited session. To prevent this, the
audit_tty attribute is cleared when a process with no open TTY file
descriptors (e.g. after daemon startup) opens a TTY.

See https://www.redhat.com/archives/linux-audit/2007-June/msg00000.html for a
more detailed rationale document for an older version of this patch.

[akpm@linux-foundation.org: build fix]
Signed-off-by: Miloslav Trmac
Cc: Al Viro
Cc: Alan Cox
Cc: Paul Fulghum
Cc: Casey Schaufler
Cc: Steve Grubb
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miloslav Trmac
2007-07-17 00:05:47 +0800
4f27c00bf Improve behaviour of spurious IRQ detect ... Browse Code »

Currently we handle spurious IRQ activity based upon seeing a lot of
invalid interrupts, and we clear things back on the base of lots of valid
interrupts.

Unfortunately in some cases you get legitimate invalid interrupts caused by
timing asynchronicity between the PCI bus and the APIC bus when disabling
interrupts and pulling other tricks. In this case although the spurious
IRQs are not a problem our unhandled counters didn't clear and they act as
a slow running timebomb. (This is effectively what the serial port/tty
problem that was fixed by clearing counters when registering a handler
showed up)

It's easy enough to add a second parameter - time. This means that if we
see a regular stream of harmless spurious interrupts which are not harming
processing we don't go off and do something stupid like disable the IRQ
after a month of running. OTOH lockups and performance killers show up a
lot more than 10/second

[akpm@linux-foundation.org: cleanup]
Signed-off-by: Alan Cox
Cc: Ingo Molnar
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alan Cox
2007-07-17 00:05:46 +0800
b663a79c1 taskstats: add context-switch counters ... Browse Code »

Make available to the user the following task and process performance
statistics:

* Involuntary Context Switches (task_struct->nivcsw)
* Voluntary Context Switches (task_struct->nvcsw)

Statistics information is available from:
1. taskstats interface (Documentation/accounting/)
2. /proc/PID/status (task only).

This data is useful for detecting hyperactivity patterns between processes.

[akpm@linux-foundation.org: cleanup]
Signed-off-by: Maxim Uvarov
Cc: Shailabh Nagar
Cc: Balbir Singh
Cc: Jay Lan
Cc: Jonathan Lim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Maxim Uvarov
2007-07-17 00:05:46 +0800
aa0ac3651 Remove capability.h from mm.h ... Browse Code »

I forgot to remove capability.h from mm.h while removing sched.h! This
patch remedies that, because the only inline function which was using
CAP_something was made out of line.

Cross-compile tested without regressions on:

all powerpc defconfigs
all mips defconfigs
all m68k defconfigs
all arm defconfigs
all ia64 defconfigs

alpha alpha-allnoconfig alpha-defconfig alpha-up
arm
i386 i386-allnoconfig i386-defconfig i386-up
ia64 ia64-allnoconfig ia64-defconfig ia64-up
m68k
mips
parisc parisc-allnoconfig parisc-defconfig parisc-up
powerpc powerpc-up
s390 s390-allnoconfig s390-defconfig s390-up
sparc sparc-allnoconfig sparc-defconfig sparc-up
sparc64 sparc64-allnoconfig sparc64-defconfig sparc64-up
um-x86_64
x86_64 x86_64-allnoconfig x86_64-defconfig x86_64-up

as well as my two usual configs.

Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2007-07-17 00:05:45 +0800