Eric Lee / smarc-fsl-linux-kernel

11 Jan, 2012

1 commit

43d2b1132 tracepoint: add tracepoints for debugging oom_score_adj ... Browse Code »

oom_score_adj is used for guarding processes from OOM-Killer. One of
problem is that it's inherited at fork(). When a daemon set oom_score_adj
and make children, it's hard to know where the value is set.

This patch adds some tracepoints useful for debugging. This patch adds
3 trace points.
- creating new task
- renaming a task (exec)
- set oom_score_adj

To debug, users need to enable some trace pointer. Maybe filtering is useful as

# EVENT=/sys/kernel/debug/tracing/events/task/
# echo "oom_score_adj != 0" > $EVENT/task_newtask/filter
# echo "oom_score_adj != 0" > $EVENT/task_rename/filter
# echo 1 > $EVENT/enable
# EVENT=/sys/kernel/debug/tracing/events/oom/
# echo 1 > $EVENT/enable

output will be like this.
# grep oom /sys/kernel/debug/tracing/trace
bash-7699 [007] d..3 5140.744510: oom_score_adj_update: pid=7699 comm=bash oom_score_adj=-1000
bash-7699 [007] ...1 5151.818022: task_newtask: pid=7729 comm=bash clone_flags=1200011 oom_score_adj=-1000
ls-7729 [003] ...2 5151.818504: task_rename: pid=7729 oldcomm=bash newcomm=ls oom_score_adj=-1000
bash-7699 [002] ...1 5175.701468: task_newtask: pid=7730 comm=bash clone_flags=1200011 oom_score_adj=-1000
grep-7730 [007] ...2 5175.701993: task_rename: pid=7730 oldcomm=bash newcomm=grep oom_score_adj=-1000

Signed-off-by: KAMEZAWA Hiroyuki
Cc: KOSAKI Motohiro
Acked-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2012-01-11 08:30:44 +0800

04 Jan, 2012

1 commit

f47ec3f28 trim fs/internal.h ... Browse Code »

some stuff in there can actually become static; some belongs to pnode.h
as it's a private interface between namespace.c and pnode.c...

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:52:35 +0800

01 Nov, 2011

1 commit

c9f01245b oom: remove oom_disable_count ... Browse Code »

This removes mm->oom_disable_count entirely since it's unnecessary and
currently buggy. The counter was intended to be per-process but it's
currently decremented in the exit path for each thread that exits, causing
it to underflow.

The count was originally intended to prevent oom killing threads that
share memory with threads that cannot be killed since it doesn't lead to
future memory freeing. The counter could be fixed to represent all
threads sharing the same mm, but it's better to remove the count since:

- it is possible that the OOM_DISABLE thread sharing memory with the
victim is waiting on that thread to exit and will actually cause
future memory freeing, and

- there is no guarantee that a thread is disabled from oom killing just
because another thread sharing its mm is oom disabled.

Signed-off-by: David Rientjes
Reported-by: Oleg Nesterov
Reviewed-by: Oleg Nesterov
Cc: Ying Han
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2011-11-01 08:30:45 +0800

12 Aug, 2011

1 commit

72fa59970 move RLIMIT_NPROC check from set_user() to do_execve_common() ... Browse Code »

The patch http://lkml.org/lkml/2003/7/13/226 introduced an RLIMIT_NPROC
check in set_user() to check for NPROC exceeding via setuid() and
similar functions.

Before the check there was a possibility to greatly exceed the allowed
number of processes by an unprivileged user if the program relied on
rlimit only. But the check created new security threat: many poorly
written programs simply don't check setuid() return code and believe it
cannot fail if executed with root privileges. So, the check is removed
in this patch because of too often privilege escalations related to
buggy programs.

The NPROC can still be enforced in the common code flow of daemons
spawning user processes. Most of daemons do fork()+setuid()+execve().
The check introduced in execve() (1) enforces the same limit as in
setuid() and (2) doesn't create similar security issues.

Neil Brown suggested to track what specific process has exceeded the
limit by setting PF_NPROC_EXCEEDED process flag. With the change only
this process would fail on execve(), and other processes' execve()
behaviour is not changed.

Solar Designer suggested to re-check whether NPROC limit is still
exceeded at the moment of execve(). If the process was sleeping for
days between set*uid() and execve(), and the NPROC counter step down
under the limit, the defered execve() failure because NPROC limit was
exceeded days ago would be unexpected. If the limit is not exceeded
anymore, we clear the flag on successful calls to execve() and fork().

The flag is also cleared on successful calls to set_user() as the limit
was exceeded for the previous user, not the current one.

Similar check was introduced in -ow patches (without the process flag).

v3 - clear PF_NPROC_EXCEEDED on successful calls to set_user().

Reviewed-by: James Morris
Signed-off-by: Vasiliy Kulikov
Acked-by: NeilBrown
Signed-off-by: Linus Torvalds

Vasiliy Kulikov
2011-08-12 02:24:42 +0800

27 Jul, 2011

7 commits

32e107f71 fs/exec.c:acct_arg_size(): ptl is no longer needed for add_mm_counter() ... Browse Code »

acct_arg_size() takes ->page_table_lock around add_mm_counter() if
!SPLIT_RSS_COUNTING. This is not needed after commit 172703b08cd0 ("mm:
delete non-atomic mm counter implementation").

Signed-off-by: Oleg Nesterov
Reviewed-by: Matt Fleming
Cc: Dave Hansen
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2011-07-27 07:49:44 +0800
b4edf8bd0 exec: do not retry load_binary method if CONFIG_MODULES=n ... Browse Code »

If CONFIG_MODULES=n, it makes no sense to retry the list of binary formats
handler because the list will not be modified by request_module().

Signed-off-by: Tetsuo Handa
Cc: Richard Weinberger
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tetsuo Handa
2011-07-27 07:49:44 +0800
912193521 exec: do not call request_module() twice from search_binary_handler() ... Browse Code »
1

Currently, search_binary_handler() tries to load binary loader module
using request_module() if a loader for the requested program is not yet
loaded. But second attempt of request_module() does not affect the result
of search_binary_handler().

If request_module() triggered recursion, calling request_module() twice
causes 2 to the power of MAX_KMOD_CONCURRENT (= 50) repetitions. It is
not an infinite loop but is sufficient for users to consider as a hang up.

Therefore, this patch changes not to call request_module() twice, making 1
to the power of MAX_KMOD_CONCURRENT repetitions in case of recursion.

Signed-off-by: Tetsuo Handa
Reported-by: Richard Weinberger
Tested-by: Richard Weinberger
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tetsuo Handa
2011-07-27 07:49:44 +0800
aacb3d17a fs/exec.c: use BUILD_BUG_ON for VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP ... Browse Code »

Commit a8bef8ff6ea1 ("mm: migration: avoid race between
shift_arg_pages() and rmap_walk() during migration by not migrating
temporary stacks") introduced a BUG_ON() to ensure that VM_STACK_FLAGS
and VM_STACK_INCOMPLETE_SETUP do not overlap. The check is a compile
time one, so BUILD_BUG_ON is more appropriate.

Signed-off-by: Michal Hocko
Cc: Mel Gorman
Cc: Richard Weinberger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Hocko
2011-07-27 07:49:44 +0800
99b645674 do_coredump: fix the "ispipe" error check ... Browse Code »

do_coredump() assumes that if format_corename() fails it should return
-ENOMEM. This is not true, for example cn_print_exe_file() can propagate
the error from d_path. Even if it was true, this is too fragile. Change
the code to check "ispipe < 0".

Signed-off-by: Oleg Nesterov
Signed-off-by: Jiri Slaby
Reviewed-by: Neil Horman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2011-07-27 07:49:43 +0800
2c563731f coredump: escape / in hostname and comm ... Browse Code »

Change every occurence of / in comm and hostname to !. If the process
changes its name to contain /, the core is not dumped (if the directory
tree doesn't exist like that). The same with hostname being something
like myhost/3. Fix this behaviour by using the escape loop used in %E.
(We extract it to a separate function.)

Now both with comm == myprocess/1 and hostname == myhost/1, the core is
dumped like (kernel.core_pattern='core.%p.%e.%h):
core.2349.myprocess!1.myhost!1

Signed-off-by: Jiri Slaby
Cc: Alan Cox
Cc: Al Viro
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jiri Slaby
2011-07-27 07:49:43 +0800
3141c8b16 coredump: use task comm instead of (unknown) ... Browse Code »

If we don't know the file corresponding to the binary (i.e. exe_file is
unknown), use "task->comm (path unknown)" instead of simple "(unknown)"
as suggested by ak.

The fallback is the same as %e except it will append "(path unknown)".

Signed-off-by: Jiri Slaby
Cc: Alan Cox
Cc: Al Viro
Cc: Andi Kleen
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jiri Slaby
2011-07-27 07:49:43 +0800

23 Jul, 2011

2 commits

bbd9d6f7f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (107 commits)
vfs: use ERR_CAST for err-ptr tossing in lookup_instantiate_filp
isofs: Remove global fs lock
jffs2: fix IN_DELETE_SELF on overwriting rename() killing a directory
fix IN_DELETE_SELF on overwriting rename() on ramfs et.al.
mm/truncate.c: fix build for CONFIG_BLOCK not enabled
fs:update the NOTE of the file_operations structure
Remove dead code in dget_parent()
AFS: Fix silly characters in a comment
switch d_add_ci() to d_splice_alias() in "found negative" case as well
simplify gfs2_lookup()
jfs_lookup(): don't bother with . or ..
get rid of useless dget_parent() in btrfs rename() and link()
get rid of useless dget_parent() in fs/btrfs/ioctl.c
fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers
drivers: fix up various ->llseek() implementations
fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek
Ext4: handle SEEK_HOLE/SEEK_DATA generically
Btrfs: implement our own ->llseek
fs: add SEEK_HOLE and SEEK_DATA flags
reiserfs: make reiserfs default to barrier=flush
...

Fix up trivial conflicts in fs/xfs/linux-2.6/xfs_super.c due to the new
shrinker callout for the inode cache, that clashed with the xfs code to
start the periodic workers later.

Linus Torvalds
2011-07-23 10:02:39 +0800
8209f53d7 Merge branch 'ptrace' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc ... Browse Code »

* 'ptrace' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc: (39 commits)
ptrace: do_wait(traced_leader_killed_by_mt_exec) can block forever
ptrace: fix ptrace_signal() && STOP_DEQUEUED interaction
connector: add an event for monitoring process tracers
ptrace: dont send SIGSTOP on auto-attach if PT_SEIZED
ptrace: mv send-SIGSTOP from do_fork() to ptrace_init_task()
ptrace_init_task: initialize child->jobctl explicitly
has_stopped_jobs: s/task_is_stopped/SIGNAL_STOP_STOPPED/
ptrace: make former thread ID available via PTRACE_GETEVENTMSG after PTRACE_EVENT_EXEC stop
ptrace: wait_consider_task: s/same_thread_group/ptrace_reparented/
ptrace: kill real_parent_is_ptracer() in in favor of ptrace_reparented()
ptrace: ptrace_reparented() should check same_thread_group()
redefine thread_group_leader() as exit_signal >= 0
do not change dead_task->exit_signal
kill task_detached()
reparent_leader: check EXIT_DEAD instead of task_detached()
make do_notify_parent() __must_check, update the callers
__ptrace_detach: avoid task_detached(), check do_notify_parent()
kill tracehook_notify_death()
make do_notify_parent() return bool
ptrace: s/tracehook_tracer_task()/ptrace_parent()/
...

Linus Torvalds
2011-07-23 06:06:50 +0800

22 Jul, 2011

1 commit

eac1b5e57 ptrace: do_wait(traced_leader_killed_by_mt_exec) can block forever ... Browse Code »

Test-case:

void *tfunc(void *arg)
{
execvp("true", NULL);
return NULL;
}

int main(void)
{
int pid;

if (fork()) {
pthread_t t;

kill(getpid(), SIGSTOP);

pthread_create(&t, NULL, tfunc, NULL);

for (;;)
pause();
}

pid = getppid();
assert(ptrace(PTRACE_ATTACH, pid, 0,0) == 0);

while (wait(NULL) > 0)
ptrace(PTRACE_CONT, pid, 0,0);

return 0;
}

It is racy, exit_notify() does __wake_up_parent() too. But in the
likely case it triggers the problem: de_thread() does release_task()
and the old leader goes away without the notification, the tracer
sleeps in do_wait() without children/tracees.

Change de_thread() to do __wake_up_parent(traced_leader->parent).
Since it is already EXIT_DEAD we can do this without ptrace_unlink(),
EXIT_DEAD threads do not exist from do_wait's pov.

Signed-off-by: Oleg Nesterov
Acked-by: Tejun Heo

Oleg Nesterov
2011-07-22 21:10:49 +0800

20 Jul, 2011

1 commit

1b5d783c9 consolidate BINPRM_FLAGS_ENFORCE_NONDUMP handling ... Browse Code »

new helper: would_dump(bprm, file). Checks if we are allowed to
read the file and if we are not - sets ENFORCE_NODUMP. Exported,
used in places that previously open-coded the same logics.

Signed-off-by: Al Viro

Al Viro
2011-07-20 13:43:10 +0800

02 Jul, 2011

1 commit

bb188d7e6 ptrace: make former thread ID available via PTRACE_GETEVENTMSG after PTRACE_EVENT_EXEC stop ... Browse Code »

When multithreaded program execs under ptrace,
all traced threads report WIFEXITED status, except for
thread group leader and the thread which execs.

Unless tracer tracks thread group relationship between tracees,
which is a nontrivial task, it will not detect that
execed thread no longer exists.

This patch allows tracer to figure out which thread
performed this exec, by requesting PTRACE_GETEVENTMSG
in PTRACE_EVENT_EXEC stop.

Another, samller problem which is solved by this patch
is that tracer now can figure out which of the several
concurrent execs in multithreaded program succeeded.

Signed-off-by: Denys Vlasenko
Signed-off-by: Oleg Nesterov

Denys Vlasenko
2011-07-02 00:51:49 +0800

28 Jun, 2011

1 commit

087806b12 redefine thread_group_leader() as exit_signal >= 0 ... Browse Code »

Change de_thread() to set old_leader->exit_signal = -1. This is
good for the consistency, it is no longer the leader and all
sub-threads have exit_signal = -1 set by copy_process(CLONE_THREAD).

And this allows us to micro-optimize thread_group_leader(), it can
simply check exit_signal >= 0. This also makes sense because we
should move ->group_leader from task_struct to signal_struct.

Signed-off-by: Oleg Nesterov
Reviewed-by: Tejun Heo

Oleg Nesterov
2011-06-28 02:30:10 +0800

23 Jun, 2011

2 commits

4b9d33e6d ptrace: kill clone/exec tracehooks ... Browse Code »

At this point, tracehooks aren't useful to mainline kernel and mostly
just add an extra layer of obfuscation. Although they have comments,
without actual in-kernel users, it is difficult to tell what are their
assumptions and they're actually trying to achieve. To mainline
kernel, they just aren't worth keeping around.

This patch kills the following clone and exec related tracehooks.

tracehook_prepare_clone()
tracehook_finish_clone()
tracehook_report_clone()
tracehook_report_clone_complete()
tracehook_unsafe_exec()

The changes are mostly trivial - logic is moved to the caller and
comments are merged and adjusted appropriately.

The only exception is in check_unsafe_exec() where LSM_UNSAFE_PTRACE*
are OR'd to bprm->unsafe instead of setting it, which produces the
same result as the field is always zero on entry. It also tests
p->ptrace instead of (p->ptrace & PT_PTRACED) for consistency, which
also gives the same result.

This doesn't introduce any behavior change.

Signed-off-by: Tejun Heo
Cc: Christoph Hellwig
Signed-off-by: Oleg Nesterov

Tejun Heo
2011-06-23 01:26:29 +0800
a288eecce ptrace: kill trivial tracehooks ... Browse Code »

At this point, tracehooks aren't useful to mainline kernel and mostly
just add an extra layer of obfuscation. Although they have comments,
without actual in-kernel users, it is difficult to tell what are their
assumptions and they're actually trying to achieve. To mainline
kernel, they just aren't worth keeping around.

This patch kills the following trivial tracehooks.

* Ones testing whether task is ptraced. Replace with ->ptrace test.

tracehook_expect_breakpoints()
tracehook_consider_ignored_signal()
tracehook_consider_fatal_signal()

* ptrace_event() wrappers. Call directly.

tracehook_report_exec()
tracehook_report_exit()
tracehook_report_vfork_done()

* ptrace_release_task() wrapper. Call directly.

tracehook_finish_release_task()

* noop

tracehook_prepare_release_task()
tracehook_report_death()

This doesn't introduce any behavior change.

Signed-off-by: Tejun Heo
Cc: Christoph Hellwig
Cc: Martin Schwidefsky
Signed-off-by: Oleg Nesterov

Tejun Heo
2011-06-23 01:26:28 +0800

18 Jun, 2011

1 commit

879669961 KEYS/DNS: Fix ____call_usermodehelper() to not lose the session keyring ... Browse Code »

____call_usermodehelper() now erases any credentials set by the
subprocess_inf::init() function. The problem is that commit
17f60a7da150 ("capabilites: allow the application of capability limits
to usermode helpers") creates and commits new credentials with
prepare_kernel_cred() after the call to the init() function. This wipes
all keyrings after umh_keys_init() is called.

The best way to deal with this is to put the init() call just prior to
the commit_creds() call, and pass the cred pointer to init(). That
means that umh_keys_init() and suchlike can modify the credentials
_before_ they are published and potentially in use by the rest of the
system.

This prevents request_key() from working as it is prevented from passing
the session keyring it set up with the authorisation token to
/sbin/request-key, and so the latter can't assume the authority to
instantiate the key. This causes the in-kernel DNS resolver to fail
with ENOKEY unconditionally.

Signed-off-by: David Howells
Acked-by: Eric Paris
Tested-by: Jeff Layton
Signed-off-by: Linus Torvalds

David Howells
2011-06-18 00:40:48 +0800

16 Jun, 2011

2 commits

13fca640b Revert "fs/exec.c: use BUILD_BUG_ON for VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP" ... Browse Code »

This reverts commit 7f81c8890c15a10f5220bebae3b6dfae4961962a.

It turns out that it's not actually a build-time check on x86-64 UML,
which does some seriously crazy stuff with VM_STACK_FLAGS.

The VM_STACK_FLAGS define depends on the arch-supplied
VM_STACK_DEFAULT_FLAGS value, and on x86-64 UML we have

arch/um/sys-x86_64/shared/sysdep/vm-flags.h:

#define VM_STACK_DEFAULT_FLAGS \
(test_thread_flag(TIF_IA32) ? vm_stack_flags32 : vm_stack_flags)

#define VM_STACK_DEFAULT_FLAGS vm_stack_flags

(yes, seriously: two different #define's for that thing, with the first
one being inside an "#ifdef TIF_IA32")

It's possible that it is UML that should just be fixed in this area, but
for now let's just undo the (very small) optimization.

Reported-by: Randy Dunlap
Acked-by: Andrew Morton
Cc: Michal Hocko
Cc: Richard Weinberger
Signed-off-by: Linus Torvalds

Linus Torvalds
2011-06-16 12:53:52 +0800
7f81c8890 fs/exec.c: use BUILD_BUG_ON for VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP ... Browse Code »

Commit a8bef8ff6ea1 ("mm: migration: avoid race between shift_arg_pages()
and rmap_walk() during migration by not migrating temporary stacks")
introduced a BUG_ON() to ensure that VM_STACK_FLAGS and
VM_STACK_INCOMPLETE_SETUP do not overlap. The check is a compile time
one, so BUILD_BUG_ON is more appropriate.

Signed-off-by: Michal Hocko
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Hocko
2011-06-16 11:03:59 +0800

10 Jun, 2011

1 commit

dac853ae8 exec: delay address limit change until point of no return ... Browse Code »

Unconditionally changing the address limit to USER_DS and not restoring
it to its old value in the error path is wrong because it prevents us
using kernel memory on repeated calls to this function. This, in fact,
breaks the fallback of hard coded paths to the init program from being
ever successful if the first candidate fails to load.

With this patch applied switching to USER_DS is delayed until the point
of no return is reached which makes it possible to have a multi-arch
rootfs with one arch specific init binary for each of the (hard coded)
probed paths.

Since the address limit is already set to USER_DS when start_thread()
will be invoked, this redundancy can be safely removed.

Signed-off-by: Mathias Krause
Cc: Al Viro
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds

Mathias Krause
2011-06-10 03:50:05 +0800

05 Jun, 2011

3 commits

6dfca3298 job control: make task_clear_jobctl_pending() clear TRAPPING automatically ... Browse Code »

JOBCTL_TRAPPING indicates that ptracer is waiting for tracee to
(re)transit into TRACED. task_clear_jobctl_pending() must be called
when either tracee enters TRACED or the transition is cancelled for
some reason. The former is achieved by explicitly calling
task_clear_jobctl_pending() in ptrace_stop() and the latter by calling
it at the end of do_signal_stop().

Calling task_clear_jobctl_trapping() at the end of do_signal_stop()
limits the scope TRAPPING can be used and is fragile in that seemingly
unrelated changes to tracee's control flow can lead to stuck TRAPPING.

We already have task_clear_jobctl_pending() calls on those cancelling
events to clear JOBCTL_STOP_PENDING. Cancellations can be handled by
making those call sites use JOBCTL_PENDING_MASK instead and updating
task_clear_jobctl_pending() such that task_clear_jobctl_trapping() is
called automatically if no stop/trap is pending.

This patch makes the above changes and removes the fallback
task_clear_jobctl_trapping() call from do_signal_stop().

Signed-off-by: Tejun Heo
Signed-off-by: Oleg Nesterov

Tejun Heo
2011-06-05 00:17:11 +0800
3759a0d94 job control: introduce JOBCTL_PENDING_MASK and task_clear_jobctl_pending() ... Browse Code »

This patch introduces JOBCTL_PENDING_MASK and replaces
task_clear_jobctl_stop_pending() with task_clear_jobctl_pending()
which takes an extra @mask argument.

JOBCTL_PENDING_MASK is currently equal to JOBCTL_STOP_PENDING but
future patches will add more bits. recalc_sigpending_tsk() is updated
to use JOBCTL_PENDING_MASK instead.

task_clear_jobctl_pending() takes @mask which in subset of
JOBCTL_PENDING_MASK and clears the relevant jobctl bits. If
JOBCTL_STOP_PENDING is set, other STOP bits are cleared together. All
task_clear_jobctl_stop_pending() users are updated to call
task_clear_jobctl_pending() with JOBCTL_STOP_PENDING which is
functionally identical to task_clear_jobctl_stop_pending().

This patch doesn't cause any functional change.

Signed-off-by: Tejun Heo
Signed-off-by: Oleg Nesterov

Tejun Heo
2011-06-05 00:17:10 +0800
a8f072c1d job control: rename signal->group_stop and flags to jobctl and update them ... Browse Code »

signal->group_stop currently hosts mostly group stop related flags;
however, it's gonna be used for wider purposes and the GROUP_STOP_
flag prefix becomes confusing. Rename signal->group_stop to
signal->jobctl and rename all GROUP_STOP_* flags to JOBCTL_*.

Bit position macros JOBCTL_*_BIT are defined and JOBCTL_* flags are
defined in terms of them to allow using bitops later.

While at it, reassign JOBCTL_TRAPPING to bit 22 to better accomodate
future additions.

This doesn't cause any functional change.

-v2: JOBCTL_*_BIT macros added as suggested by Linus.

Signed-off-by: Tejun Heo
Cc: Linus Torvalds
Signed-off-by: Oleg Nesterov

Tejun Heo
2011-06-05 00:17:09 +0800

27 May, 2011

2 commits

57cc083ad coredump: add support for exe_file in core name ... Browse Code »

Now, exe_file is not proc FS dependent, so we can use it to name core
file. So we add %E pattern for core file name cration which extract path
from mm_struct->exe_file. Then it converts slashes to exclamation marks
and pastes the result to the core file name itself.

This is useful for environments where binary names are longer than 16
character (the current->comm limitation). Also where there are binaries
with same name but in a different path. Further in case the binery itself
changes its current->comm after exec.

So by doing (s/$/#/ -- # is treated as git comment):

$ sysctl kernel.core_pattern='core.%p.%e.%E'
$ ln /bin/cat cat45678901234567890
$ ./cat45678901234567890
^Z
$ rm cat45678901234567890
$ fg
^\Quit (core dumped)
$ ls core*

we now get:

core.2434.cat456789012345.!root!cat45678901234567890 (deleted)

Signed-off-by: Jiri Slaby
Cc: Al Viro
Cc: Alan Cox
Reviewed-by: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jiri Slaby
2011-05-27 08:12:36 +0800
386460138 mm: extract exe_file handling from procfs ... Browse Code »

Setup and cleanup of mm_struct->exe_file is currently done in fs/proc/.
This was because exe_file was needed only for /proc//exe. Since we
will need the exe_file functionality also for core dumps (so core name can
contain full binary path), built this functionality always into the
kernel.

To achieve that move that out of proc FS to the kernel/ where in fact it
should belong. By doing that we can make dup_mm_exe_file static. Also we
can drop linux/proc_fs.h inclusion in fs/exec.c and kernel/fork.c.

Signed-off-by: Jiri Slaby
Cc: Alexander Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jiri Slaby
2011-05-27 08:12:36 +0800

25 May, 2011

2 commits

d16dfc550 mm: mmu_gather rework ... Browse Code »

Rework the existing mmu_gather infrastructure.

The direct purpose of these patches was to allow preemptible mmu_gather,
but even without that I think these patches provide an improvement to the
status quo.

The first 9 patches rework the mmu_gather infrastructure. For review
purpose I've split them into generic and per-arch patches with the last of
those a generic cleanup.

The next patch provides generic RCU page-table freeing, and the followup
is a patch converting s390 to use this. I've also got 4 patches from
DaveM lined up (not included in this series) that uses this to implement
gup_fast() for sparc64.

Then there is one patch that extends the generic mmu_gather batching.

After that follow the mm preemptibility patches, these make part of the mm
a lot more preemptible. It converts i_mmap_lock and anon_vma->lock to
mutexes which together with the mmu_gather rework makes mmu_gather
preemptible as well.

Making i_mmap_lock a mutex also enables a clean-up of the truncate code.

This also allows for preemptible mmu_notifiers, something that XPMEM I
think wants.

Furthermore, it removes the new and universially detested unmap_mutex.

This patch:

Remove the first obstacle towards a fully preemptible mmu_gather.

The current scheme assumes mmu_gather is always done with preemption
disabled and uses per-cpu storage for the page batches. Change this to
try and allocate a page for batching and in case of failure, use a small
on-stack array to make some progress.

Preemptible mmu_gather is desired in general and usable once i_mmap_lock
becomes a mutex. Doing it before the mutex conversion saves us from
having to rework the code by moving the mmu_gather bits inside the
pte_lock.

Also avoid flushing the tlb batches from under the pte lock, this is
useful even without the i_mmap_lock conversion as it significantly reduces
pte lock hold times.

[akpm@linux-foundation.org: fix comment tpyo]
Signed-off-by: Peter Zijlstra
Cc: Benjamin Herrenschmidt
Cc: David Miller
Cc: Martin Schwidefsky
Cc: Russell King
Cc: Paul Mundt
Cc: Jeff Dike
Cc: Richard Weinberger
Cc: Tony Luck
Reviewed-by: KAMEZAWA Hiroyuki
Acked-by: Hugh Dickins
Acked-by: Mel Gorman
Cc: KOSAKI Motohiro
Cc: Nick Piggin
Cc: Namhyung Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Peter Zijlstra
2011-05-25 23:39:12 +0800
d05f3169c mm: make expand_downwards() symmetrical with expand_upwards() ... Browse Code »

Currently we have expand_upwards exported while expand_downwards is
accessible only via expand_stack or expand_stack_downwards.

check_stack_guard_page is a nice example of the asymmetry. It uses
expand_stack for VM_GROWSDOWN while expand_upwards is called for
VM_GROWSUP case.

Let's clean this up by exporting both functions and make those names
consistent. Let's use expand_{upwards,downwards} because expanding
doesn't always involve stack manipulation (an example is
ia64_do_page_fault which uses expand_upwards for registers backing store
expansion). expand_downwards has to be defined for both
CONFIG_STACK_GROWS{UP,DOWN} because get_arg_page calls the downwards
version in the early process initialization phase for growsup
configuration.

Signed-off-by: Michal Hocko
Acked-by: Hugh Dickins
Cc: James Bottomley
Cc: "Luck, Tony"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Hocko
2011-05-25 23:39:12 +0800

24 May, 2011

1 commit

99dff5856 Merge branch 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6 ... Browse Code »

* 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (48 commits)
serial: 8250_pci: add support for Cronyx Omega PCI multiserial board.
tty/serial: Fix break handling for PORT_TEGRA
tty/serial: Add explicit PORT_TEGRA type
n_tracerouter and n_tracesink ldisc additions.
Intel PTI implementaiton of MIPI 1149.7.
Kernel documentation for the PTI feature.
export kernel call get_task_comm().
tty: Remove to support serial for S5P6442
pch_phub: Support new device ML7223
8250_pci: Add support for the Digi/IBM PCIe 2-port Adapter
ASoC: Update cx20442 for TTY API change
pch_uart: Support new device ML7223 IOH
parport: Use request_muxed_region for IT87 probe and lock
tty/serial: add support for Xilinx PS UART
n_gsm: Use print_hex_dump_bytes
drivers/tty/moxa.c: Put correct tty value
TTY: tty_io, annotate locking functions
TTY: serial_core, remove superfluous set_task_state
TTY: serial_core, remove invalid test
Char: moxa, fix locking in moxa_write
...

Fix up trivial conflicts in drivers/bluetooth/hci_ldisc.c and
drivers/tty/serial/Makefile.

I did the hci_ldisc thing as an evil merge, cleaning things up.

Linus Torvalds
2011-05-24 03:23:20 +0800

23 May, 2011

1 commit

4d9dec4db Merge branch 'exec_rm_compat' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc ... Browse Code »

* 'exec_rm_compat' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc:
exec: document acct_arg_size()
exec: unify do_execve/compat_do_execve code
exec: introduce struct user_arg_ptr
exec: introduce get_user_arg_ptr() helper

Linus Torvalds
2011-05-23 23:28:34 +0800

14 May, 2011

1 commit

7d74f492e export kernel call get_task_comm(). ... Browse Code »

This allows drivers who call this function to be compiled modularly.
Otherwise, a driver who is interested in this type of functionality
has to implement their own get_task_comm() call, causing code
duplication in the Linux source tree.

Signed-off-by: J Freyensee
Acked-by: David Rientjes
Signed-off-by: Greg Kroah-Hartman

J Freyensee
2011-05-14 07:30:59 +0800

09 Apr, 2011

4 commits

ae6b585ee exec: document acct_arg_size() ... Browse Code »

Add the comment to explain acct_arg_size().

Signed-off-by: Oleg Nesterov
Reviewed-by: KOSAKI Motohiro

Oleg Nesterov
2011-04-09 21:53:57 +0800
0e028465d exec: unify do_execve/compat_do_execve code ... Browse Code »

Add the appropriate members into struct user_arg_ptr and teach
get_user_arg_ptr() to handle is_compat = T case correctly.

This allows us to remove the compat_do_execve() code from fs/compat.c
and reimplement compat_do_execve() as the trivial wrapper on top of
do_execve_common(is_compat => true).

In fact, this fixes another (minor) bug. "compat_uptr_t str" can
overflow after "str += len" in compat_copy_strings() if a 64bit
application execs via sys32_execve().

Unexport acct_arg_size() and get_arg_page(), fs/compat.c doesn't
need them any longer.

Signed-off-by: Oleg Nesterov
Reviewed-by: KOSAKI Motohiro
Tested-by: KOSAKI Motohiro

Oleg Nesterov
2011-04-09 21:53:56 +0800
ba2d01629 exec: introduce struct user_arg_ptr ... Browse Code »

No functional changes, preparation.

Introduce struct user_arg_ptr, change do_execve() paths to use it
instead of "char __user * const __user *argv".

This makes the argv/envp arguments opaque, we are ready to handle the
compat case which needs argv pointing to compat_uptr_t.

Suggested-by: Linus Torvalds
Signed-off-by: Oleg Nesterov
Reviewed-by: KOSAKI Motohiro
Tested-by: KOSAKI Motohiro

Oleg Nesterov
2011-04-09 21:53:53 +0800
1d1dbf813 exec: introduce get_user_arg_ptr() helper ... Browse Code »

Introduce get_user_arg_ptr() helper, convert count() and copy_strings()
to use it.

No functional changes, preparation. This helper is trivial, it just
reads the pointer from argv/envp user-space array.

Signed-off-by: Oleg Nesterov
Reviewed-by: KOSAKI Motohiro
Tested-by: KOSAKI Motohiro

Oleg Nesterov
2011-04-09 21:53:45 +0800

23 Mar, 2011

1 commit

39efa3ef3 signal: Use GROUP_STOP_PENDING to stop once for a single group stop ... Browse Code »

Currently task->signal->group_stop_count is used to decide whether to
stop for group stop. However, if there is a task in the group which
is taking a long time to stop, other tasks which are continued by
ptrace would repeatedly stop for the same group stop until the group
stop is complete.

Conversely, if a ptraced task is in TASK_TRACED state, the debugger
won't get notified of group stops which is inconsistent compared to
the ptraced task in any other state.

This patch introduces GROUP_STOP_PENDING which tracks whether a task
is yet to stop for the group stop in progress. The flag is set when a
group stop starts and cleared when the task stops the first time for
the group stop, and consulted whenever whether the task should
participate in a group stop needs to be determined. Note that now
tasks in TASK_TRACED also participate in group stop.

This results in the following behavior changes.

* For a single group stop, a ptracer would see at most one stop
reported.

* A ptracee in TASK_TRACED now also participates in group stop and the
tracer would get the notification. However, as a ptraced task could
be in TASK_STOPPED state or any ptrace trap could consume group
stop, the notification may still be missing. These will be
addressed with further patches.

* A ptracee may start a group stop while one is still in progress if
the tracer let it continue with stop signal delivery. Group stop
code handles this correctly.

Oleg:

* Spotted that a task might skip signal check even when its
GROUP_STOP_PENDING is set. Fixed by updating
recalc_sigpending_tsk() to check GROUP_STOP_PENDING instead of
group_stop_count.

* Pointed out that task->group_stop should be cleared whenever
task->signal->group_stop_count is cleared. Fixed accordingly.

* Pointed out the behavior inconsistency between TASK_TRACED and
RUNNING and the last behavior change.

Signed-off-by: Tejun Heo
Acked-by: Oleg Nesterov
Cc: Roland McGrath

Tejun Heo
2011-03-23 17:37:00 +0800

21 Mar, 2011

1 commit

1bef82917 Small typo fix... ... Browse Code »

Hi,

I was backporting the coredump over pipe feature and noticed this small typo,
I wish I would have something bigger to contribute...

>From 15d6080e0ed4267da103c706917a33b1015e8804 Mon Sep 17 00:00:00 2001
From: Holger Hans Peter Freyther
Date: Thu, 24 Feb 2011 17:42:50 +0100
Subject: [PATCH] fs: Fix a small typo in the comment

The function is called umh_pipe_setup not uhm_pipe_setup.

Signed-off-by: Holger Hans Peter Freyther
Signed-off-by: Al Viro

Holger Hans Peter Freyther
2011-03-21 12:16:09 +0800

14 Mar, 2011

1 commit

47c805dc2 switch do_filp_open() to struct open_flags ... Browse Code »

take calculation of open_flags by open(2) arguments into new helper
in fs/open.c, move filp_open() over there, have it and do_sys_open()
use that helper, switch exec.c callers of do_filp_open() to explicit
(and constant) struct open_flags.

Signed-off-by: Al Viro

Al Viro
2011-03-14 21:15:25 +0800