Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

25 Jun, 2009

3 commits

c62230482 Merge branches 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/{vfs-2.6,audit-current}

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
another race fix in jfs_check_acl()
Get "no acls for this inode" right, fix shmem breakage
inline functions left without protection of ifdef (acl)

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
audit: inode watches depend on CONFIG_AUDIT not CONFIG_AUDIT_SYSCALL

Linus Torvalds
2009-06-25 05:17:14 +0800
3a6a6c16b audit: inode watches depend on CONFIG_AUDIT not CONFIG_AUDIT_SYSCALL

Even though one cannot make use of the audit watch code without
CONFIG_AUDIT_SYSCALL the spaghetti nature of the audit code means that
the audit rule filtering requires that it at least be compiled.

Thus build the audit_watch code when we build auditfilter like it was
before cfcad62c74abfef83762dc05a556d21bdf3980a2

Clearly this is a point of potential future cleanup..

Reported-by: Frans Pop
Signed-off-by: Eric Paris
Signed-off-by: Al Viro

Eric Paris
2009-06-25 04:42:05 +0800
d0725992c futex: Fix the write access fault problem for real

commit 64d1304a64 (futex: setup writeable mapping for futex ops which
modify user space data) did address only half of the problem of write
access faults.

The patch was made on two wrong assumptions:

1) access_ok(VERIFY_WRITE,...) would actually check write access.

On x86 it does _NOT_. It's a pure address range check.

2) a RW mapped region can not go away under us.

That's wrong as well. Nobody can prevent another thread from calling
mprotect(PROT_READ) on the region where the futex resides. If that
call hits between the get_user_pages_fast() verification and the
actual write access in the atomic region we are toast again.

The solution is to not rely on access_ok and get_user() for any write
access related fault on private and shared futexes. Instead we need to
fault it in with verification of write access.

There is no generic non-destructive write mechanism which would fault
the user page in through a #PF, but as we already know that we will
fault we can as well call get_user_pages() directly and avoid the #PF
overhead.

If get_user_pages() returns -EFAULT we know that we can not fix it
anymore and need to bail out to user space.
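To illustrate the first wrong assumption, here is a toy Python model (not kernel code). The helper names, the page table dictionary and the TASK_SIZE value are assumptions made for the sketch; it only shows that a pure address range check passes for a read-only page, while a check that verifies write permission reports -EFAULT.

```python
EFAULT = 14
PAGE_MASK = ~0xFFF
TASK_SIZE = 0xC0000000  # assumed user/kernel split, for illustration only


def access_ok(addr, size):
    """Toy stand-in: a pure address range check, no permission lookup."""
    return addr >= 0 and addr + size <= TASK_SIZE


def fault_in_writable(page_prot, addr):
    """Toy stand-in for faulting a page in with write access verified.

    page_prot maps a page base address to a protection string such as
    "r" or "rw". Returns 0 on success and -EFAULT otherwise.
    """
    prot = page_prot.get(addr & PAGE_MASK)
    if prot is None or "w" not in prot:
        return -EFAULT
    return 0


if __name__ == "__main__":
    pages = {0x1000: "r"}  # futex word lives in a read-only page
    print(access_ok(0x1008, 4))              # range check passes
    print(fault_in_writable(pages, 0x1008))  # write check fails
```

The second assumption (the mapping cannot change under us) is not modelled here; in the toy model it would correspond to `pages` being modified between the check and the write.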

Remove a bunch of confusing comments on this issue as well.

Signed-off-by: Thomas Gleixner
Cc: stable@kernel.org

Thomas Gleixner
2009-06-25 03:27:35 +0800

24 Jun, 2009

10 commits

916d75761 Fix rule eviction order for AUDIT_DIR

If a syscall removes the root of a subtree being watched, we
definitely do not want the rules referring to that subtree
to be destroyed without the syscall in question having
a chance to match them.

Signed-off-by: Al Viro

Al Viro
2009-06-24 12:02:38 +0800
9d9609851 Audit: clean up all op= output to include string quoting

In a number of places in the audit system we send an op= followed by a string
that includes spaces. Somehow this works but it's just wrong. This patch
moves all of those that I could find to be quoted.

Example:

Change From: type=CONFIG_CHANGE msg=audit(1244666690.117:31): auid=0 ses=1
subj=unconfined_u:unconfined_r:auditctl_t:s0-s0:c0.c1023 op=remove rule
key="number2" list=4 res=0

Change To: type=CONFIG_CHANGE msg=audit(1244666690.117:31): auid=0 ses=1
subj=unconfined_u:unconfined_r:auditctl_t:s0-s0:c0.c1023 op="remove rule"
key="number2" list=4 res=0
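Why the quoting matters to a consumer of these records can be sketched with a small key=value parser. This is a hypothetical Python helper, not part of any audit tool; it splits a record on whitespace while honouring double quotes and collects tokens that carry no key.

```python
import shlex


def parse_fields(record):
    """Split an audit-style record into a dict of key=value fields.

    Tokens without an '=' are returned separately as stray tokens.
    """
    fields, stray = {}, []
    for token in shlex.split(record):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
        else:
            stray.append(token)
    return fields, stray


PREFIX = ("type=CONFIG_CHANGE msg=audit(1244666690.117:31): auid=0 ses=1 "
          "subj=unconfined_u:unconfined_r:auditctl_t:s0-s0:c0.c1023 ")
OLD = PREFIX + 'op=remove rule key="number2" list=4 res=0'
NEW = PREFIX + 'op="remove rule" key="number2" list=4 res=0'

if __name__ == "__main__":
    print(parse_fields(OLD)[0]["op"], parse_fields(OLD)[1])
    print(parse_fields(NEW)[0]["op"], parse_fields(NEW)[1])
```

With the old format the parser sees op as "remove" and is left with a stray "rule" token; with the quoted format the whole value "remove rule" stays attached to op.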

Signed-off-by: Eric Paris

Eric Paris
2009-06-24 12:00:52 +0800
35fe4d0b1 Audit: move audit_get_nd completely into audit_watch

audit_get_nd() is only used by audit_watch and could be more cleanly
implemented by having the audit watch functions call it when needed rather
than making the generic audit rule parsing code deal with those objects.

Signed-off-by: Eric Paris

Eric Paris
2009-06-24 11:51:05 +0800
cfcad62c7 audit: seperate audit inode watches into a subfile

In preparation for converting audit to use fsnotify instead of inotify we
separate the inode watching code into its own file. This is similar to
how the audit tree watching code is already separated into audit_tree.c.

Signed-off-by: Eric Paris

Eric Paris
2009-06-24 11:50:59 +0800
ea7ae60bf Audit: clean up audit_receive_skb

It is hard to tell what audit_receive_skb is doing to the netlink
message. Clean the function up so it is easy and clear to see what is going
on.

Signed-off-by: Eric Paris

Eric Paris
2009-06-24 11:50:40 +0800
ee080e6ce Audit: cleanup netlink mesg handling

The audit handling of netlink messages is all over the place. Clean things
up, use predetermined macros, generally make it more readable.

Signed-off-by: Eric Paris

Eric Paris
2009-06-24 11:50:39 +0800
038cbcf65 Audit: unify the printk of an skb when auditd not around

Remove code duplication of skb printk when auditd is not around in userspace
to deal with this message.

Signed-off-by: Eric Paris

Eric Paris
2009-06-24 11:50:37 +0800
e85188f42 Audit: dereferencing krule as if it were an audit_watch

audit_update_watch() runs all of the rules for a given watch and duplicates
them, attaches a new watch to them, and then when it finishes that process
and has called free on all of the old rules (ok maybe still inside the rcu
grace period) it proceeds to use the last element from list_for_each_entry_safe()
as if it were a krule rather than being the audit_watch which was anchoring
the list to output a message about audit rules changing.

This patch unifies the audit message from two different places into a helper
function and calls it from the correct location in audit_update_rules(). We
will now get an audit message about the config changing for each rule (with
each rule's filterkey) rather than the previous garbage.

Signed-off-by: Eric Paris

Eric Paris
2009-06-24 11:50:36 +0800
b87ce6e41 Audit: better estimation of execve record length

The audit execve record splitting code estimates the length of the message
generated. But it forgot to include the "" that wrap each string in its
estimation. This means that execve messages with lots of tiny (1-2 byte)
arguments could still cause records greater than 8k to be emitted. Simply
fix the estimate.
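The effect of the missing quotes can be shown with a little arithmetic. The Python sketch below is a simplified model, not the kernel's splitting code: the per-argument format ` aN="value"` and all function names are assumptions for illustration. Records are split whenever the running estimate would exceed the limit, and the actual emitted size is tracked alongside.

```python
def emitted_len(index, arg):
    """Bytes actually written for one argument, wrapping quotes included."""
    return len(f' a{index}="{arg}"')


def naive_estimate(index, arg):
    """Estimate that forgets the two quote characters around the value."""
    return len(f" a{index}=") + len(arg)


def fixed_estimate(index, arg):
    """Estimate that accounts for the two quote characters."""
    return naive_estimate(index, arg) + 2


def record_sizes(args, estimate, limit=8192):
    """Split args into records using `estimate`; return the real sizes."""
    sizes = []
    est_total = actual_total = 0
    for index, arg in enumerate(args):
        est = estimate(index, arg)
        if est_total and est_total + est > limit:
            sizes.append(actual_total)
            est_total = actual_total = 0
        est_total += est
        actual_total += emitted_len(index, arg)
    if actual_total:
        sizes.append(actual_total)
    return sizes


if __name__ == "__main__":
    tiny_args = ["x"] * 3000  # lots of 1-byte arguments
    print(max(record_sizes(tiny_args, naive_estimate)) > 8192)
    print(max(record_sizes(tiny_args, fixed_estimate)) <= 8192)
```

With tiny arguments the two forgotten bytes per argument are a large fraction of each entry, which is why the naive estimate lets records grow well past the limit in this model.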

Signed-off-by: Eric Paris

Eric Paris
2009-06-24 11:50:34 +0800
35aa901c0 Audit: fix audit watch use after free

When an audit watch is added to a parent the temporary watch inside the
original krule from userspace is freed. Yet the original watch is used after
the real watch was created in audit_add_rules()

Signed-off-by: Eric Paris

Eric Paris
2009-06-24 11:50:33 +0800

23 Jun, 2009

1 commit

31950eb66 mm/init: cpu_hotplug_init() must be initialized before SLAB

SLAB uses get/put_online_cpus() which use a mutex which is itself only
initialized when cpu_hotplug_init() is called. Currently we hang during
boot in SLAB due to doing that too late.

Reported by James Bottomley and Sachin Sant (and possibly others).
Debugged by Benjamin Herrenschmidt.

This just removes the dynamic initialization of the data structures, and
replaces it with a static one, avoiding this dependency entirely, and
removing one unnecessary special initcall.

Tested-by: Sachin Sant
Tested-by: James Bottomley
Tested-by: Benjamin Herrenschmidt
Signed-off-by: Linus Torvalds

Linus Torvalds
2009-06-23 12:18:12 +0800

21 Jun, 2009

6 commits

2453d6ff6 Merge branch 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
genirq, irq.h: Fix kernel-doc warnings
genirq: fix comment to say IRQ_WAKE_THREAD

Linus Torvalds
2009-06-21 02:30:01 +0800
12e24f34c Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits)
perfcounter: Handle some IO return values
perf_counter: Push perf_sample_data through the swcounter code
perf_counter tools: Define and use our own u64, s64 etc. definitions
perf_counter: Close race in perf_lock_task_context()
perf_counter, x86: Improve interactions with fast-gup
perf_counter: Simplify and fix task migration counting
perf_counter tools: Add a data file header
perf_counter: Update userspace callchain sampling uses
perf_counter: Make callchain samples extensible
perf report: Filter to parent set by default
perf_counter tools: Handle lost events
perf_counter: Add event overlow handling
fs: Provide empty .set_page_dirty() aop for anon inodes
perf_counter: tools: Makefile tweaks for 64-bit powerpc
perf_counter: powerpc: Add processor back-end for MPC7450 family
perf_counter: powerpc: Make powerpc perf_counter code safe for 32-bit kernels
perf_counter: powerpc: Change how processor-specific back-ends get selected
perf_counter: powerpc: Use unsigned long for register and constraint values
perf_counter: powerpc: Enable use of software counters on 32-bit powerpc
perf_counter tools: Add and use isprint()
...

Linus Torvalds
2009-06-21 02:29:32 +0800
1eb51c33b Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Fix out of scope variable access in sched_slice()
sched: Hide runqueues from direct refer at source code level
sched: Remove unneeded __ref tag
sched, x86: Fix cpufreq + sched_clock() TSC scaling

Linus Torvalds
2009-06-21 01:57:40 +0800
b0b7065b6 Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (24 commits)
tracing/urgent: warn in case of ftrace_start_up inbalance
tracing/urgent: fix unbalanced ftrace_start_up
function-graph: add stack frame test
function-graph: disable when both x86_32 and optimize for size are configured
ring-buffer: have benchmark test print to trace buffer
ring-buffer: do not grab locks in nmi
ring-buffer: add locks around rb_per_cpu_empty
ring-buffer: check for less than two in size allocation
ring-buffer: remove useless compile check for buffer_page size
ring-buffer: remove useless warn on check
ring-buffer: use BUF_PAGE_HDR_SIZE in calculating index
tracing: update sample event documentation
tracing/filters: fix race between filter setting and module unload
tracing/filters: free filter_string in destroy_preds()
ring-buffer: use commit counters for commit pointer accounting
ring-buffer: remove unused variable
ring-buffer: have benchmark test handle discarded events
ring-buffer: prevent adding write in discarded area
tracing/filters: strloc should be unsigned short
tracing/filters: operand can be negative
...

Fix up kmemcheck-induced conflict in kernel/trace/ring_buffer.c manually

Linus Torvalds
2009-06-21 01:56:46 +0800
38df92b8c Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
NOHZ: Properly feed cpufreq ondemand governor

Linus Torvalds
2009-06-21 01:51:44 +0800
d4c403834 Merge branch 'tip/tracing/urgent-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/urgent

Ingo Molnar
2009-06-21 00:26:48 +0800

20 Jun, 2009

5 commits

3daeb4da9 Merge branch 'tip/tracing/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/urgent

Ingo Molnar
2009-06-20 23:25:49 +0800
92bf309a9 perf_counter: Push perf_sample_data through the swcounter code

Push the perf_sample_data further outwards to the swcounter interface,
to abstract it away some more.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-06-20 18:30:30 +0800
9ea1a153a tracing/urgent: warn in case of ftrace_start_up inbalance

Guard against further ftrace_start_up imbalances so that we avoid
future nop patching omissions with dynamic ftrace.

Signed-off-by: Frederic Weisbecker
Cc: Steven Rostedt

Frederic Weisbecker
2009-06-20 12:52:21 +0800
c85a17e22 tracing/urgent: fix unbalanced ftrace_start_up

Perfcounter reports the following stats for a wide system
profiling:

#
# (2364 samples)
#
# Overhead Symbol
# ........ ......
#
15.40% [k] mwait_idle_with_hints
8.29% [k] read_hpet
5.75% [k] ftrace_caller
3.60% [k] ftrace_call
[...]

This snapshot has been taken while neither the function tracer nor
the function graph tracer was running.
With dynamic ftrace, such results show a wrong ftrace behaviour
because all calls to ftrace_caller or ftrace_graph_caller (the patched
calls to mcount) are supposed to be patched into nop if none of those
tracers are running.

The problem occurs after the first run of the function tracer. Once we
launch it a second time, the callsites will never be nopped back,
unless you set custom filters.
For example it happens during the self tests at boot time.
The function tracer selftest runs, and then the dynamic tracing is
tested too. After that, the callsites are left un-nopped.

This is because the reset callback of the function tracer tries to
unregister two ftrace callbacks at once: the common function tracer
and the function tracer with stack backtrace, regardless of which
one is currently in use.
It then creates an imbalance in the ftrace_start_up value, which is expected
to be zero when the last ftrace callback is unregistered. When it
reaches zero, FTRACE_DISABLE_CALLS is set on the next ftrace
command, triggering the patching into nop. But since it becomes
unbalanced, i.e. becomes lower than zero, if the kernel functions
are patched again (as in every further function tracer run), they
won't ever be nopped back.

Note that ftrace_call and ftrace_graph_call are still patched back
to ftrace_stub in the off case, but not the callers of ftrace_call
and ftrace_graph_caller. It means that tracing is properly deactivated
but we waste a useless call in every kernel function.

This patch just unregisters the right ftrace_ops for the function
tracer on its reset callback and ignores the other one which is
not registered, fixing the imbalance. The problem also happens
in .30.
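The counter behaviour described above can be reproduced with a toy Python model (not kernel code; the class, its attributes and run_tracer() are hypothetical). It assumes, as the message describes, that every unregister decrements the counter and that call sites are nopped only when the counter reaches exactly zero.

```python
class ToyFtrace:
    """Toy model of a start-up counter guarding call site patching."""

    def __init__(self):
        self.start_up = 0
        self.callsites_patched = False

    def register(self):
        self.start_up += 1
        self.callsites_patched = True

    def unregister(self):
        # Decrements even for an ops that was never registered.
        self.start_up -= 1
        if self.start_up == 0:
            self.callsites_patched = False  # patched back into nops


def run_tracer(ftrace, buggy_reset):
    """One start/reset cycle of the function tracer."""
    ftrace.register()
    ftrace.unregister()
    if buggy_reset:
        ftrace.unregister()  # also unregisters the ops not in use


if __name__ == "__main__":
    buggy = ToyFtrace()
    run_tracer(buggy, buggy_reset=True)
    print(buggy.start_up, buggy.callsites_patched)  # after the first run
    run_tracer(buggy, buggy_reset=True)
    print(buggy.start_up, buggy.callsites_patched)  # after the second run
```

In this model the first run still ends with the call sites nopped, but the counter is left at -1, so the second run never crosses zero on the way down and the call sites stay patched.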

Signed-off-by: Frederic Weisbecker
Cc: Steven Rostedt
Cc: stable@kernel.org

Frederic Weisbecker
2009-06-20 12:28:46 +0800
befca9677 ptrace: wait_task_zombie: do not account traced sub-threads

The bug is ancient.

If we trace the sub-thread of our natural child and this sub-thread exits,
we update parent->signal->cxxx fields. But we should not do this until
the whole thread-group exits, otherwise we account this thread (and all
other live threads) twice.

Add the task_detached() check. No need to check thread_group_empty(),
wait_consider_task()->delay_group_leader() already did this.

Signed-off-by: Oleg Nesterov
Cc: Peter Zijlstra
Acked-by: Roland McGrath
Cc: Stanislaw Gruszka
Cc: Vitaly Mayatskikh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2009-06-20 07:46:06 +0800

19 Jun, 2009

15 commits

b49a9e7e7 perf_counter: Close race in perf_lock_task_context()

perf_lock_task_context() is buggy because it can return a dead
context.

The RCU read lock in perf_lock_task_context() only guarantees
the memory won't get freed, it doesn't guarantee the object is
valid (in our case refcount > 0).

Therefore we can return a locked object that can get freed the
moment we release the rcu read lock.

perf_pin_task_context() then increases the refcount and does an
unlock on freed memory.

That increased refcount will cause a double free, in case it
started out with 0.

Amend this by including the get_ctx() functionality in
perf_lock_task_context() (all users already did this later
anyway), and return a NULL context when the found one is
already dead.
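The idea of taking the reference inside the lookup, and refusing contexts whose refcount has already dropped to zero, can be sketched as follows. This is a single-threaded Python toy with hypothetical names; it does not model RCU or locking, only the "reference unless dead" rule.

```python
class Context:
    """Toy object with a reference count; 0 means it is already dead."""

    def __init__(self, refcount):
        self.refcount = refcount


def lock_task_context(ctx):
    """Return ctx with one extra reference, or None if it is dead.

    Taking the reference here means a caller never holds a context
    whose count started at zero and is about to be freed.
    """
    if ctx is None or ctx.refcount == 0:
        return None
    ctx.refcount += 1
    return ctx


if __name__ == "__main__":
    live, dead = Context(1), Context(0)
    print(lock_task_context(live) is live, live.refcount)
    print(lock_task_context(dead), dead.refcount)
```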

Signed-off-by: Peter Zijlstra
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-06-19 23:57:36 +0800
e5289d4a1 perf_counter: Simplify and fix task migration counting

The task migrations counter was causing rare and hard to decipher
memory corruptions under load. After a day of debugging and bisection
we found that the problem was introduced with:

3f731ca: perf_counter: Fix cpu migration counter

Turning them off fixes the crashes. Incidentally, the whole
perf_counter_task_migration() logic can be done simpler as well,
by injecting a proper sw-counter event.

This cleanup also fixed the crashes. The precise failure mode is
not completely clear yet, but we are clearly not unhappy about
having a fix ;-)

Signed-off-by: Peter Zijlstra
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Corey Ashford
Cc: Marcelo Tosatti
Cc: Arnaldo Carvalho de Melo
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-06-19 19:43:12 +0800
71e308a23 function-graph: add stack frame test

In case gcc does something funny with the stack frames, or the return
from function code, we would like to detect that.

An arch may implement passing of a variable that is unique to the
function and can be saved on entering a function and can be tested
when exiting the function. Usually the frame pointer can be used for
this purpose.

This patch also implements this for x86, where it passes in the stack
frame of the parent function and tests that frame on exit.

There was a case in x86_32 with optimize for size (-Os) where, for a
few functions, gcc would align the stack frame and place a copy of the
return address into it. The function graph tracer modified the copy and
not the actual return address. On return from the function, it did not go
to the tracer hook, but returned to the parent. This broke the function
graph tracer, because the return of the parent (where gcc did not do
this funky manipulation) returned to the location that the child function
was supposed to. This caused strange kernel crashes.

This test detected the problem and pointed out where the issue was.
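The save-on-entry, test-on-exit idea can be sketched as a shadow return stack that stores a frame value next to each return address. This is a toy Python model with hypothetical names, not the tracer's implementation; the addresses in the usage lines are made up.

```python
class FrameMismatch(Exception):
    """Raised when the frame seen on exit differs from the one on entry."""


class ReturnStack:
    """Toy shadow stack of (return address, frame pointer) pairs."""

    def __init__(self):
        self._entries = []

    def push(self, ret_addr, frame_pointer):
        # On function entry: remember where to return and the frame value.
        self._entries.append((ret_addr, frame_pointer))

    def pop(self, frame_pointer):
        # On function exit: the frame value must match the saved one.
        ret_addr, saved_fp = self._entries.pop()
        if saved_fp != frame_pointer:
            raise FrameMismatch(
                f"expected frame {saved_fp:#x}, got {frame_pointer:#x}")
        return ret_addr


if __name__ == "__main__":
    stack = ReturnStack()
    stack.push(0xC1000100, 0xBFFF0000)
    print(hex(stack.pop(0xBFFF0000)))  # frames match, address returned
```

A mismatch on exit signals that the entry and exit hooks were not paired with the same function invocation, which is the kind of situation the -Os case above produced.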

This modifies the parameters of one of the functions that the arch
specific code calls, so it includes changes to arch code to accommodate
the new prototype.

Note, I notice that the parisc arch implements its own push_return_trace.
This is now a generic function and the ftrace_push_return_trace should be
used instead. This patch does not touch that code.

Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Cc: Frederic Weisbecker
Cc: Helge Deller
Cc: Kyle McMartin
Signed-off-by: Steven Rostedt

Steven Rostedt
2009-06-19 06:40:18 +0800
eb4a03780 function-graph: disable when both x86_32 and optimize for size are configured

On x86_32, when optimize for size is set, gcc may align the frame pointer
and make a copy of the return address inside the stack frame.
The return address that is located in the stack frame may not be
the one used to return to the calling function. This will break the
function graph tracer.

The function graph tracer replaces the return address with a jump to a hook
function that can trace the exit of the function. If it only replaces
a copy, then the hook will not be called when the function returns.
Worse yet, when the parent function returns, the function graph tracer
will return back to the location of the child function which will
easily crash the kernel with weird results.

To see the problem, when i386 is compiled with -Os we get:

c106be03:  57                 push   %edi
c106be04:  8d 7c 24 08        lea    0x8(%esp),%edi
c106be08:  83 e4 e0           and    $0xffffffe0,%esp
c106be0b:  ff 77 fc           pushl  0xfffffffc(%edi)
c106be0e:  55                 push   %ebp
c106be0f:  89 e5              mov    %esp,%ebp
c106be11:  57                 push   %edi
c106be12:  56                 push   %esi
c106be13:  53                 push   %ebx
c106be14:  81 ec 8c 00 00 00  sub    $0x8c,%esp
c106be1a:  e8 f5 57 fb ff     call   c1021614

When it is compiled with -O2 instead we get:

c10896f0:  55                 push   %ebp
c10896f1:  89 e5              mov    %esp,%ebp
c10896f3:  83 ec 28           sub    $0x28,%esp
c10896f6:  89 5d f4           mov    %ebx,0xfffffff4(%ebp)
c10896f9:  89 75 f8           mov    %esi,0xfffffff8(%ebp)
c10896fc:  89 7d fc           mov    %edi,0xfffffffc(%ebp)
c10896ff:  e8 d0 08 fa ff     call   c1029fd4

The compile with -Os will align the stack pointer then set up the
frame pointer (%ebp), and it copies the return address back into
the stack frame. The change to the return address in mcount is done
to the copy and not the real place holder of the return address.

The compile with -O2 sets up the frame pointer first, which makes
the change to the return address by mcount affect where the function
will jump on exit.

Reported-by: Jake Edge
Signed-off-by: Steven Rostedt

Steven Rostedt
2009-06-19 06:39:30 +0800
7bf99fb67 gcov: enable GCOV_PROFILE_ALL for x86_64

Enable gcov profiling of the entire kernel on x86_64. Required changes
include disabling profiling for:

* arch/kernel/acpi/realmode and arch/kernel/boot/compressed:
not linked to main kernel
* arch/vdso, arch/kernel/vsyscall_64 and arch/kernel/hpet:
profiling causes segfaults during boot (incompatible context)

Signed-off-by: Peter Oberparleiter
Cc: Andi Kleen
Cc: Huang Ying
Cc: Li Wei
Cc: Michael Ellerman
Cc: Ingo Molnar
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Cc: Rusty Russell
Cc: WANG Cong
Cc: Sam Ravnborg
Cc: Jeff Dike
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Peter Oberparleiter
2009-06-19 04:03:58 +0800
2521f2c22 gcov: add gcov profiling infrastructure

Enable the use of GCC's coverage testing tool gcov [1] with the Linux
kernel. gcov may be useful for:

* debugging (has this code been reached at all?)
* test improvement (how do I change my test to cover these lines?)
* minimizing kernel configurations (do I need this option if the
associated code is never run?)

The profiling patch incorporates the following changes:

* change kbuild to include profiling flags
* provide functions needed by profiling code
* present profiling data as files in debugfs

Note that on some architectures, enabling gcc's profiling option
"-fprofile-arcs" for the entire kernel may trigger compile/link/
run-time problems, some of which are caused by toolchain bugs and
others which require adjustment of architecture code.

For this reason profiling the entire kernel is initially restricted
to those architectures for which it is known to work without changes.
This restriction can be lifted once an architecture has been tested
and found compatible with gcc's profiling. Profiling of single files
or directories is still available on all platforms (see config help
text).

[1] http://gcc.gnu.org/onlinedocs/gcc/Gcov.html

Signed-off-by: Peter Oberparleiter
Cc: Andi Kleen
Cc: Huang Ying
Cc: Li Wei
Cc: Michael Ellerman
Cc: Ingo Molnar
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Cc: Rusty Russell
Cc: WANG Cong
Cc: Sam Ravnborg
Cc: Jeff Dike
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Peter Oberparleiter
2009-06-19 04:03:57 +0800
b99b87f70 kernel: constructor support

Call constructors (gcc-generated initcall-like functions) during kernel
start and module load. Constructors are e.g. used for gcov data
initialization.

Disable constructor support for usermode Linux to prevent conflicts with
host glibc.

Signed-off-by: Peter Oberparleiter
Acked-by: Rusty Russell
Acked-by: WANG Cong
Cc: Sam Ravnborg
Cc: Jeff Dike
Cc: Andi Kleen
Cc: Huang Ying
Cc: Li Wei
Cc: Michael Ellerman
Cc: Ingo Molnar
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Peter Oberparleiter
2009-06-19 04:03:57 +0800
90af90d7d nsproxy: extract create_nsproxy()

clone_nsproxy() does useless copying of the old nsproxy: every pointer will
be rewritten to the new ns or to the old ns. Remove the copying and rename
clone_nsproxy(); create_nsproxy() will be used by C/R code to create a fresh
nsproxy on restart.

Signed-off-by: Alexey Dobriyan
Acked-by: Serge Hallyn
Cc: Pavel Emelyanov
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2009-06-19 04:03:56 +0800
4c2a7e72d utsns: extract creeate_uts_ns()

create_uts_ns() will be used by C/R to create fresh uts_ns.

Signed-off-by: Alexey Dobriyan
Acked-by: Serge Hallyn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2009-06-19 04:03:55 +0800
dca4a9796 pidns: rewrite copy_pid_ns()

copy_pid_ns() is a perfect example of a case where unwinding leads to more
code and makes it less clear. Watch the diffstat.

Signed-off-by: Alexey Dobriyan
Cc: Pavel Emelyanov
Cc: "Eric W. Biederman"
Reviewed-by: Serge Hallyn
Acked-by: Sukadev Bhattiprolu
Reviewed-by: WANG Cong
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2009-06-19 04:03:55 +0800
ed469a63c pidns: make create_pid_namespace() accept parent pidns

create_pid_namespace() creates everything, but caller has to assign parent
pidns by hand, which is unnatural. At the moment of call new ->level has
to be taken from somewhere and parent pidns is already available.

Signed-off-by: Alexey Dobriyan
Cc: Pavel Emelyanov
Cc: "Eric W. Biederman"
Acked-by: Serge Hallyn
Acked-by: Sukadev Bhattiprolu
Reviewed-by: WANG Cong
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2009-06-19 04:03:55 +0800
17f98dcf6 pids: clean up find_task_by_pid variants

find_task_by_pid_type_ns is only used to implement find_task_by_vpid and
find_task_by_pid_ns, but both of them pass PIDTYPE_PID as first argument.
So just fold find_task_by_pid_type_ns into find_task_by_pid_ns and use
find_task_by_pid_ns to implement find_task_by_vpid.

While we're at it also remove the exports for find_task_by_pid_ns and
find_task_by_vpid - we don't have any modular callers left as the only
modular caller of the old pre pid namespace find_task_by_pid (gfs2) was
switched to pid_task which operates on a struct pid pointer instead of a
pid_t. Given the confusion about pid_t values vs namespace that's
generally the better option anyway and I think we're better off restricting
modules to do it that way.

Signed-off-by: Christoph Hellwig
Cc: Pavel Emelyanov
Cc: "Eric W. Biederman"
Cc: Ingo Molnar
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2009-06-19 04:03:55 +0800
7338f2998 sysctl.c: remove unused variable

Remove the unused variable 'val' from __do_proc_dointvec().

The integer has been declared and used as 'val = -val' and there is no
reference to it anywhere.

Signed-off-by: Sukanto Ghosh
Cc: Jaswinder Singh Rajput
Cc: Sukanto Ghosh
Cc: Jiri Kosina
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sukanto Ghosh
2009-06-19 04:03:54 +0800
371cbb387 kthreads: simplify migration_thread() exit path

Now that kthread_stop() can be used even if the task has already exited,
we can kill the "wait_to_die:" loop in migration_thread(). But we must
pin rq->migration_thread after creation.

Actually, I don't think CPU_UP_CANCELED or CPU_DEAD should wait for
->migration_thread exit. Perhaps we can simplify this code a bit more.
migration_call() can set ->should_stop and forget about this thread. But
we need a new helper in kthread.c for that.

Signed-off-by: Oleg Nesterov
Cc: Christoph Hellwig
Cc: "Eric W. Biederman"
Cc: Ingo Molnar
Cc: Pavel Emelyanov
Cc: Rusty Russell
Cc: Vitaliy Gusev
Signed-off-by: Linus Torvalds

Oleg Nesterov
2009-06-19 04:03:54 +0800
63706172f kthreads: rework kthread_stop()

Based on Eric's patch which in turn was based on my patch.

kthread_stop() has the following nasty problems:

- it runs unpredictably long with the global semaphore held.

- it deadlocks if kthread itself does kthread_stop() before it obeys
the kthread_should_stop() request.

- it is not usable if kthread exits on its own, see for example the
ugly "wait_to_die:" hack in migration_thread()

- it is not possible to just tell kthread it should stop, we must always
wait for its exit.

With this patch kthread() allocates all necessary data (struct kthread) on
its own stack, and the kthread_stop_xxx globals are deleted. ->vfork_done is
used as a pointer into "struct kthread", which means kthread_stop() can easily
wait for kthread's exit.

Signed-off-by: Oleg Nesterov
Cc: Christoph Hellwig
Cc: "Eric W. Biederman"
Cc: Ingo Molnar
Cc: Pavel Emelyanov
Cc: Rusty Russell
Cc: Vitaliy Gusev
Signed-off-by: Linus Torvalds

Oleg Nesterov
2009-06-19 04:03:54 +0800