Eric Lee / smarc-fsl-linux-kernel

23 Mar, 2016

1 commit

378c6520e fs/coredump: prevent fsuid=0 dumps into user-controlled directories ... Browse Code »

This commit fixes the following security hole affecting systems where
all of the following conditions are fulfilled:

- The fs.suid_dumpable sysctl is set to 2.
- The kernel.core_pattern sysctl's value starts with "/". (Systems
where kernel.core_pattern starts with "|/" are not affected.)
- Unprivileged user namespace creation is permitted. (This is
true on Linux >=3.8, but some distributions disallow it by
default using a distro patch.)

Under these conditions, if a program executes under secure exec rules,
causing it to run with the SUID_DUMP_ROOT flag, then unshares its user
namespace, changes its root directory and crashes, the coredump will be
written using fsuid=0 and a path derived from kernel.core_pattern - but
this path is interpreted relative to the root directory of the process,
allowing the attacker to control where a coredump will be written with
root privileges.

To fix the security issue, always interpret core_pattern for dumps that
are written under SUID_DUMP_ROOT relative to the root directory of init.

Signed-off-by: Jann Horn
Acked-by: Kees Cook
Cc: Al Viro
Cc: "Eric W. Biederman"
Cc: Andy Lutomirski
Cc: Oleg Nesterov
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jann Horn
2016-03-23 06:36:02 +0800

21 Jan, 2016

1 commit

ac94b6e3b fs/coredump: prevent "" / "." / ".." core path components ... Browse Code »

Let %h and %e print empty values as "!", "." as "!" and
".." as "!.".

This prevents hostnames and comm values that are empty or consist of one
or two dots from changing the directory level at which the corefile will
be stored.

Consider the case where someone decides to sort coredumps by hostname
with a core pattern like "/cores/%h/core.%e.%p.%t" or so. In this
case, hostnames "" and "." would cause the coredump to land directly in
/cores, which is not what the intent behind the core pattern is, and
".." would cause the coredump to land in /.

Yeah, there probably aren't many people who do that, but I still don't
want this edgecase to be kind of broken.

It seems very unlikely that this caused security issues anywhere, so I'm
not requesting a stable backport.

[akpm@linux-foundation.org: tweak code comment]
Signed-off-by: Jann Horn
Acked-by: Kees Cook
Cc: Alexander Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jann Horn
2016-01-21 09:09:18 +0800

07 Dec, 2015

1 commit

03927c8ac coredump: Use 64bit time for unix time of coredump ... Browse Code »

struct timeval on 32-bit systems will have its tv_sec
value overflow in year 2038 and beyond.
Use a 64 bit value to print time of the coredump in seconds.
ktime_get_real_seconds is chosen here for efficiency reasons.

Suggested by: Arnd Bergmann
Signed-off-by: Tina Ruchandani
Signed-off-by: Arnd Bergmann
Signed-off-by: Al Viro

Arnd Bergmann
2015-12-07 10:17:17 +0800

07 Nov, 2015

2 commits

d61ba5895 coredump: change zap_threads() and zap_process() to use for_each_thread() ... Browse Code »

Change zap_threads() paths to use for_each_thread() rather than
while_each_thread().

While at it, change zap_threads() to avoid the nested if's to make the
code more readable and lessen the indentation.

Signed-off-by: Oleg Nesterov
Cc: David Rientjes
Cc: Kyle Walker
Cc: Michal Hocko
Cc: Stanislav Kozina
Cc: Tetsuo Handa
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2015-11-07 09:50:42 +0800
5fa534c98 coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP ... Browse Code »

task_will_free_mem() is wrong in many ways, and in particular the
SIGNAL_GROUP_COREDUMP check is not reliable: a task can participate in the
coredumping without SIGNAL_GROUP_COREDUMP bit set.

change zap_threads() paths to always set SIGNAL_GROUP_COREDUMP even if
other CLONE_VM processes can't react to SIGKILL. Fortunately, at least
oom-kill case if fine; it kills all tasks sharing the same mm, so it
should also kill the process which actually dumps the core.

The change in prepare_signal() is not strictly necessary, it just ensures
that the patch does not bring another subtle behavioural change. But it
reminds us that this SIGNAL_GROUP_EXIT/COREDUMP case needs more changes.

Signed-off-by: Oleg Nesterov
Cc: David Rientjes
Cc: Kyle Walker
Acked-by: Michal Hocko
Cc: Stanislav Kozina
Cc: Tetsuo Handa
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2015-11-07 09:50:42 +0800

11 Sep, 2015

2 commits

40f705a73 fs: Don't dump core if the corefile would become world-readable. ... Browse Code »

On a filesystem like vfat, all files are created with the same owner
and mode independent of who created the file. When a vfat filesystem
is mounted with root as owner of all files and read access for everyone,
root's processes left world-readable coredumps on it (but other
users' processes only left empty corefiles when given write access
because of the uid mismatch).

Given that the old behavior was inconsistent and insecure, I don't see
a problem with changing it. Now, all processes refuse to dump core unless
the resulting corefile will only be readable by their owner.

Signed-off-by: Jann Horn
Acked-by: Kees Cook
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jann Horn
2015-09-11 04:29:01 +0800
fbb181694 fs: if a coredump already exists, unlink and recreate with O_EXCL ... Browse Code »

It was possible for an attacking user to trick root (or another user) into
writing his coredumps into an attacker-readable, pre-existing file using
rename() or link(), causing the disclosure of secret data from the victim
process' virtual memory. Depending on the configuration, it was also
possible to trick root into overwriting system files with coredumps. Fix
that issue by never writing coredumps into existing files.

Requirements for the attack:
- The attack only applies if the victim's process has a nonzero
RLIMIT_CORE and is dumpable.
- The attacker can trick the victim into coredumping into an
attacker-writable directory D, either because the core_pattern is
relative and the victim's cwd is attacker-writable or because an
absolute core_pattern pointing to a world-writable directory is used.
- The attacker has one of these:
A: on a system with protected_hardlinks=0:
execute access to a folder containing a victim-owned,
attacker-readable file on the same partition as D, and the
victim-owned file will be deleted before the main part of the attack
takes place. (In practice, there are lots of files that fulfill
this condition, e.g. entries in Debian's /var/lib/dpkg/info/.)
This does not apply to most Linux systems because most distros set
protected_hardlinks=1.
B: on a system with protected_hardlinks=1:
execute access to a folder containing a victim-owned,
attacker-readable and attacker-writable file on the same partition
as D, and the victim-owned file will be deleted before the main part
of the attack takes place.
(This seems to be uncommon.)
C: on any system, independent of protected_hardlinks:
write access to a non-sticky folder containing a victim-owned,
attacker-readable file on the same partition as D
(This seems to be uncommon.)

The basic idea is that the attacker moves the victim-owned file to where
he expects the victim process to dump its core. The victim process dumps
its core into the existing file, and the attacker reads the coredump from
it.

If the attacker can't move the file because he does not have write access
to the containing directory, he can instead link the file to a directory
he controls, then wait for the original link to the file to be deleted
(because the kernel checks that the link count of the corefile is 1).

A less reliable variant that requires D to be non-sticky works with link()
and does not require deletion of the original link: link() the file into
D, but then unlink() it directly before the kernel performs the link count
check.

On systems with protected_hardlinks=0, this variant allows an attacker to
not only gain information from coredumps, but also clobber existing,
victim-writable files with coredumps. (This could theoretically lead to a
privilege escalation.)

Signed-off-by: Jann Horn
Cc: Kees Cook
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jann Horn
2015-09-11 04:29:01 +0800

05 Jul, 2015

1 commit

1dc51b828 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull more vfs updates from Al Viro:
"Assorted VFS fixes and related cleanups (IMO the most interesting in
that part are f_path-related things and Eric's descriptor-related
stuff). UFS regression fixes (it got broken last cycle). 9P fixes.
fs-cache series, DAX patches, Jan's file_remove_suid() work"

[ I'd say this is much more than "fixes and related cleanups". The
file_table locking rule change by Eric Dumazet is a rather big and
fundamental update even if the patch isn't huge. - Linus ]

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
9p: cope with bogus responses from server in p9_client_{read,write}
p9_client_write(): avoid double p9_free_req()
9p: forgetting to cancel request on interrupted zero-copy RPC
dax: bdev_direct_access() may sleep
block: Add support for DAX reads/writes to block devices
dax: Use copy_from_iter_nocache
dax: Add block size note to documentation
fs/file.c: __fget() and dup2() atomicity rules
fs/file.c: don't acquire files->file_lock in fd_install()
fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
vfs: avoid creation of inode number 0 in get_next_ino
namei: make set_root_rcu() return void
make simple_positive() public
ufs: use dir_pages instead of ufs_dir_pages()
pagemap.h: move dir_pages() over there
remove the pointless include of lglock.h
fs: cleanup slight list_entry abuse
xfs: Correctly lock inode when removing suid and file capabilities
fs: Call security_ops->inode_killpriv on truncate
fs: Provide function telling whether file_remove_privs() will do anything
...

Linus Torvalds
2015-07-05 10:36:06 +0800

26 Jun, 2015

2 commits

b4176b7c1 coredump: add __printf attribute to cn_*printf functions ... Browse Code »

This allows detecting improper format string at build time, like:

fs/coredump.c:225:5: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'int' [-Wformat=]
err = cn_printf(cn, "%ld", cprm->siginfo->si_signo);
^

As si_signo is always an int, the format should be %d here.

Signed-off-by: Nicolas Iooss
Acked-by: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nicolas Iooss
2015-06-26 08:00:43 +0800
5202efe54 coredump: use from_kuid/kgid when formatting corename ... Browse Code »

When adding __printf attribute to cn_printf, gcc reports some issues:

fs/coredump.c:213:5: warning: format '%d' expects argument of type
'int', but argument 3 has type 'kuid_t' [-Wformat=]
err = cn_printf(cn, "%d", cred->uid);
^
fs/coredump.c:217:5: warning: format '%d' expects argument of type
'int', but argument 3 has type 'kgid_t' [-Wformat=]
err = cn_printf(cn, "%d", cred->gid);
^

These warnings come from the fact that the value of uid/gid needs to be
extracted from the kuid_t/kgid_t structure before being used as an
integer. More precisely, cred->uid and cred->gid need to be converted to
either user-namespace uid/gid or to init_user_ns uid/gid.

Use init_user_ns in order not to break existing ABI, and document this in
Documentation/sysctl/kernel.txt.

While at it, format uid and gid values with %u instead of %d because
uid_t/__kernel_uid32_t and gid_t/__kernel_gid32_t are unsigned int.

Signed-off-by: Nicolas Iooss
Acked-by: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nicolas Iooss
2015-06-26 08:00:43 +0800

24 Jun, 2015

1 commit

9bf39ab2a vfs: add file_path() helper ... Browse Code »

Turn
d_path(&file->f_path, ...);
into
file_path(file, ...);

Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro

Miklos Szeredi
2015-06-24 06:00:05 +0800

12 Apr, 2015

1 commit

86cc05840 coredump: accept any write method ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2015-04-12 10:29:39 +0800

20 Feb, 2015

1 commit

fcbc32bc6 coredump: Fix typo in comment ... Browse Code »

Signed-off-by: Bastien Nocera
Signed-off-by: Al Viro

Bastien Nocera
2015-02-20 17:56:44 +0800

14 Oct, 2014

1 commit

b03023ecb coredump: add %i/%I in core_pattern to report the tid of the crashed thread ... Browse Code »

format_corename() can only pass the leader's pid to the core handler,
but there is no simple way to figure out which thread originated the
coredump.

As Jan explains, this also means that there is no simple way to create
the backtrace of the crashed process:

As programs are mostly compiled with implicit gcc -fomit-frame-pointer
one needs program's .eh_frame section (equivalently PT_GNU_EH_FRAME
segment) or .debug_frame section. .debug_frame usually is present only
in separate debug info files usually not even installed on the system.
While .eh_frame is a part of the executable/library (and it is even
always mapped for C++ exceptions unwinding) it no longer has to be
present anywhere on the disk as the program could be upgraded in the
meantime and the running instance has its executable file already
unlinked from disk.

One possibility is to echo 0x3f >/proc/*/coredump_filter and dump all
the file-backed memory including the executable's .eh_frame section.
But that can create huge core files, for example even due to mmapped
data files.

Other possibility would be to read .eh_frame from /proc/PID/mem at the
core_pattern handler time of the core dump. For the backtrace one needs
to read the register state first which can be done from core_pattern
handler:

ptrace(PTRACE_SEIZE, tid, 0, PTRACE_O_TRACEEXIT)
close(0); // close pipe fd to resume the sleeping dumper
waitpid(); // should report EXIT
PTRACE_GETREGS or other requests

The remaining problem is how to get the 'tid' value of the crashed
thread. It could be read from the first NT_PRSTATUS note of the core
file but that makes the core_pattern handler complicated.

Unfortunately %t is already used so this patch uses %i/%I.

Automatic Bug Reporting Tool (https://github.com/abrt/abrt/wiki/overview)
is experimenting with this. It is using the elfutils
(https://fedorahosted.org/elfutils/) unwinder for generating the
backtraces. Apart from not needing matching executables as mentioned
above, another advantage is that we can get the backtrace without saving
the core (which might be quite large) to disk.

[mmilata@redhat.com: final paragraph of changelog]
Signed-off-by: Jan Kratochvil
Signed-off-by: Oleg Nesterov
Cc: Alexander Viro
Cc: Denys Vlasenko
Cc: Jan Kratochvil
Cc: Mark Wielaard
Cc: Martin Milata
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2014-10-14 08:18:21 +0800

24 Jul, 2014

1 commit

aed8adb76 coredump: fix the setting of PF_DUMPCORE ... Browse Code »

Commit 079148b919d0 ("coredump: factor out the setting of PF_DUMPCORE")
cleaned up the setting of PF_DUMPCORE by removing it from all the
linux_binfmt->core_dump() and moving it to zap_threads().But this ended
up clearing all the previously set flags. This causes issues during
core generation when tsk->flags is checked again (eg. for PF_USED_MATH
to dump floating point registers). Fix this.

Signed-off-by: Silesh C V
Acked-by: Oleg Nesterov
Cc: Mandeep Singh Baines
Cc: [3.10+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Silesh C V
2014-07-24 06:10:54 +0800

20 Apr, 2014

1 commit

404ca80eb coredump: fix va_list corruption ... Browse Code »

A va_list needs to be copied in case it needs to be used twice.

Thanks to Hugh for debugging this issue, leading to various panics.

Tested:

lpq84:~# echo "|/foobar12345 %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h" >/proc/sys/kernel/core_pattern

'produce_core' is simply : main() { *(int *)0 = 1;}

lpq84:~# ./produce_core
Segmentation fault (core dumped)
lpq84:~# dmesg | tail -1
[ 614.352947] Core dump to |/foobar12345 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 (null) pipe failed

Notice the last argument was replaced by a NULL (we were lucky enough to
not crash, but do not try this on your production machine !)

After fix :

lpq83:~# echo "|/foobar12345 %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h" >/proc/sys/kernel/core_pattern
lpq83:~# ./produce_core
Segmentation fault
lpq83:~# dmesg | tail -1
[ 740.800441] Core dump to |/foobar12345 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 pipe failed

Fixes: 5fe9d8ca21cc ("coredump: cn_vprintf() has no reason to call vsnprintf() twice")
Signed-off-by: Eric Dumazet
Diagnosed-by: Hugh Dickins
Acked-by: Oleg Nesterov
Cc: Neil Horman
Cc: Andrew Morton
Cc: stable@vger.kernel.org # 3.11+
Signed-off-by: Linus Torvalds

Eric Dumazet
2014-04-20 04:23:31 +0800

24 Jan, 2014

1 commit

942be3875 coredump: make __get_dumpable/get_dumpable inline, kill fs/coredump.h ... Browse Code »

1. Remove fs/coredump.h. It is not clear why do we need it,
it only declares __get_dumpable(), signal.c includes it
for no reason.

2. Now that get_dumpable() and __get_dumpable() are really
trivial make them inline in linux/sched.h.

Signed-off-by: Oleg Nesterov
Acked-by: Kees Cook
Cc: Alex Kelly
Cc: "Eric W. Biederman"
Cc: Josh Triplett
Cc: Petr Matousek
Cc: Vasily Kulikov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2014-01-24 08:37:01 +0800

16 Nov, 2013

2 commits

52da40ae6 dump_emit(): use __kernel_write(), not vfs_write() ... Browse Code »

the caller has already done file_start_write()...

Signed-off-by: Al Viro

Al Viro
2013-11-16 11:04:09 +0800
db51242d8 dump_align(): fix the dumb braino ... Browse Code »

Mea culpa - original variant used 64-by-32-bit division,
which got caught very late. Getting rid of that wasn't
hard, but I'd managed to botch the calling conventions
in process ;-/

Signed-off-by: Al Viro

Al Viro
2013-11-16 11:04:09 +0800

09 Nov, 2013

5 commits

ec57941e0 constify do_coredump() argument ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2013-11-09 13:16:29 +0800
22a8cb824 new helper: dump_align() ... Browse Code »

dump_skip to given alignment...

Signed-off-by: Al Viro

Al Viro
2013-11-09 13:16:27 +0800
9b56d5438 dump_skip(): dump_seek() replacement taking coredump_params ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2013-11-09 13:16:26 +0800
2507a4fbd make dump_emit() use vfs_write() instead of banging at ->f_op->write directly ... Browse Code »

... and deal with short writes properly - the output might be to pipe, after
all; as it is, e.g. no-MMU case of elf_fdpic coredump can write a whole lot
more than a page worth of data at one call.

Signed-off-by: Al Viro

Al Viro
2013-11-09 13:16:26 +0800
ecc8c7725 new helper: dump_emit() ... Browse Code »

dump_write() analog, takes core_dump_params instead of file,
keeps track of the amount written in cprm->written and checks for
cprm->limit. Start using it in binfmt_elf.c...

Signed-off-by: Al Viro

Al Viro
2013-11-09 13:16:22 +0800

25 Oct, 2013

1 commit

72c2d5319 file->f_op is never NULL... ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2013-10-25 11:34:54 +0800

12 Sep, 2013

1 commit

65aafb1e7 coredump: add new %P variable in core_pattern ... Browse Code »

Add a new %P variable to be used in core_pattern. This variable contains
the global PID (PID in the init namespace) as %p contains the PID in the
current namespace which isn't always what we want.

The main use for this is to make it easier to handle crashes that happened
within a container. With that new variables it's possible to have the
crashes dumped into the container or forwarded to the host with the right
PID (from the host's point of view).

Signed-off-by: Stéphane Graber
Reported-by: Hans Feldt
Cc: Alexander Viro
Cc: Eric W. Biederman
Cc: Andy Whitcroft
Acked-by: Serge E. Hallyn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Stéphane Graber
2013-09-12 06:59:01 +0800

04 Jul, 2013

6 commits

888ffc592 coredump: '% at the end' shouldn't bypass core_uses_pid logic ... Browse Code »

"goto end" should not bypass the "Backward compatibility with
core_uses_pid" code, move this label up.

While at it,

- It is ugly to copy '|' into cn->corename and then inc
the pointer for argv_split().

Change format_corename() to increment pat_ptr instead.

- Remove the dead "if (*pat_ptr == 0)" in format_corename(),
we already checked it is not zero.

Signed-off-by: Oleg Nesterov
Cc: Andi Kleen
Cc: Colin Walters
Cc: Denys Vlasenko
Cc: Jiri Slaby
Cc: Lennart Poettering
Cc: Lucas De Marchi
Acked-by: Neil Horman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-07-04 07:08:02 +0800
3ceadcf6d coredump: kill call_count, add core_name_size ... Browse Code »

Imho, "atomic_t call_count" is ugly and should die. It buys nothing and
in fact it can grow more than necessary, expand doesn't check if it was
already incremented by another task.

Kill it, and introduce "static int core_name_size" updated by
expand_corename(). This is obviously racy too but harmless, and
core_name_size never grows for no reason.

We do not bother to to calculate the "right" new size, we simply do
kmalloc(size_we_need) and use ksize() to rely on kmalloc_index's decision.

Finally change format_corename() to use expand_corename(), krealloc(NULL)
is fine.

Signed-off-by: Oleg Nesterov
Cc: Andi Kleen
Cc: Colin Walters
Cc: Denys Vlasenko
Cc: Jiri Slaby
Cc: Lennart Poettering
Cc: Lucas De Marchi
Acked-by: Neil Horman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-07-04 07:08:02 +0800
923bed030 coredump: kill cn_escape(), introduce cn_esc_printf() ... Browse Code »

The usage of cn_escape() looks really annoying, imho this sequence needs a
wrapper. And it is buggy. If cn_printf() does expand_corename()
cn_escape() writes to the freed memory.

Introduce cn_esc_printf() which hopefully does this all right. It records
the index before cn_vprintf(), not "char *" which is no longer valid (in
general) after krealloc().

Signed-off-by: Oleg Nesterov
Cc: Andi Kleen
Cc: Colin Walters
Cc: Denys Vlasenko
Cc: Jiri Slaby
Cc: Lennart Poettering
Cc: Lucas De Marchi
Acked-by: Neil Horman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-07-04 07:08:02 +0800
5fe9d8ca2 coredump: cn_vprintf() has no reason to call vsnprintf() twice ... Browse Code »

cn_vprintf() looks really overcomplicated and sub-optimal. We do not need
vsnprintf(NULL) to calculate the size we need, we can simply try to print
into the current buffer and expand/retry only if necessary.

Signed-off-by: Oleg Nesterov
Cc: Andi Kleen
Cc: Colin Walters
Cc: Denys Vlasenko
Cc: Jiri Slaby
Cc: Lennart Poettering
Cc: Lucas De Marchi
Acked-by: Neil Horman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-07-04 07:08:02 +0800
bc03c691a coredump: introduce cn_vprintf() ... Browse Code »

Turn cn_printf(...) into cn_vprintf(va_list args), reintroduce
cn_printf() as a trivial wrapper.

This simplifies the next change and cn_vprintf() will have more
callers.

Signed-off-by: Oleg Nesterov
Cc: Andi Kleen
Cc: Colin Walters
Cc: Denys Vlasenko
Cc: Jiri Slaby
Cc: Lennart Poettering
Cc: Lucas De Marchi
Acked-by: Neil Horman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-07-04 07:08:02 +0800
e7fd1549a coredump: format_corename() can leak cn->corename ... Browse Code »

do_coredump() assumes that format_corename() can only fail if
expand_corename() fails and frees cn->corename. This is not true, for
example cn_print_exe_file() can fail and in this case nobody frees
cn->corename.

Change do_coredump() to always do kfree(cn->corename) after it calls
format_corename() (NULL is fine), change expand_corename() to do nothing
if kmalloc() fails.

Signed-off-by: Oleg Nesterov
Cc: Andi Kleen
Cc: Colin Walters
Cc: Denys Vlasenko
Cc: Jiri Slaby
Cc: Lennart Poettering
Cc: Lucas De Marchi
Acked-by: Neil Horman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-07-04 07:08:02 +0800

05 May, 2013

1 commit

e86d35c38 do_coredump(): don't wait for thaw if coredump has already been interrupted ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2013-05-05 02:45:54 +0800

02 May, 2013

1 commit

20b4fb485 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull VFS updates from Al Viro,

Misc cleanups all over the place, mainly wrt /proc interfaces (switch
create_proc_entry to proc_create(), get rid of the deprecated
create_proc_read_entry() in favor of using proc_create_data() and
seq_file etc).

7kloc removed.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
don't bother with deferred freeing of fdtables
proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
proc: Make the PROC_I() and PDE() macros internal to procfs
proc: Supply a function to remove a proc entry by PDE
take cgroup_open() and cpuset_open() to fs/proc/base.c
ppc: Clean up scanlog
ppc: Clean up rtas_flash driver somewhat
hostap: proc: Use remove_proc_subtree()
drm: proc: Use remove_proc_subtree()
drm: proc: Use minor->index to label things, not PDE->name
drm: Constify drm_proc_list[]
zoran: Don't print proc_dir_entry data in debug
reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
proc: Supply an accessor for getting the data from a PDE's parent
airo: Use remove_proc_subtree()
rtl8192u: Don't need to save device proc dir PDE
rtl8187se: Use a dir under /proc/net/r8180/
proc: Add proc_mkdir_data()
proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
proc: Move PDE_NET() to fs/proc/proc_net.c
...

Linus Torvalds
2013-05-02 08:51:54 +0800

01 May, 2013

6 commits

dc7ee2aac coredump: change wait_for_dump_helpers() to use wait_event_interruptible() ... Browse Code »

wait_for_dump_helpers() calls wake_up/kill_fasync from inside the
wait_event-like loop. This is not needed and in fact this is not
strictly correct, we can/should do this only once after we change
pipe->writers. We could even check if it becomes zero.

Change this code to use use wait_event_interruptible(), this can also
help to make this wait freezable.

With this patch we check pipe->readers without pipe_lock(), this is
fine. Once we see pipe->readers == 1 we know that the handler
decremented the counter, this is all we need.

Signed-off-by: Oleg Nesterov
Acked-by: Mandeep Singh Baines
Cc: Neil Horman
Cc: "Rafael J. Wysocki"
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-05-01 08:04:06 +0800
079148b91 coredump: factor out the setting of PF_DUMPCORE ... Browse Code »

Cleanup. Every linux_binfmt->core_dump() sets PF_DUMPCORE, move this into
zap_threads() called by do_coredump().

Signed-off-by: Oleg Nesterov
Acked-by: Mandeep Singh Baines
Cc: Neil Horman
Cc: "Rafael J. Wysocki"
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-05-01 08:04:06 +0800
528f827ee coredump: introduce dump_interrupted() ... Browse Code »

By discussion with Mandeep.

Change dump_write(), dump_seek() and do_coredump() to check
signal_pending() and abort if it is true. dump_seek() does this only
before f_op->llseek(), otherwise it relies on dump_write().

We need this change to ensure that the coredump won't delay suspend, and
to ensure it reacts to SIGKILL "quickly enough", a core dump can take a
lot of time. In particular this can help oom-killer.

We add the new trivial helper, dump_interrupted() to add the comments and
to simplify the potential freezer changes. Perhaps it will have more
callers.

Ideally it should do try_to_freeze() but then we need the unpleasant
changes in dump_write() and wait_for_dump_helpers(). It is not trivial to
change dump_write() to restart if f_op->write() fails because of
freezing(). We need to handle the short writes, we need to clear
TIF_SIGPENDING (and we can't rely on recalc_sigpending() unless we change
it to check PF_DUMPCORE). And if the buggy f_op->write() sets
TIF_SIGPENDING we can not distinguish this case from the race with
freeze_task() + __thaw_task().

So we simply accept the fact that the freezer can truncate a core-dump but
at least you can reliably suspend. Hopefully we can tolerate this
unlikely case and the necessary complications doesn't worth a trouble.
But if we decide to make the coredumping freezable later we can do this on
top of this change.

Signed-off-by: Oleg Nesterov
Acked-by: Mandeep Singh Baines
Cc: Neil Horman
Cc: "Rafael J. Wysocki"
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-05-01 08:04:06 +0800
acdedd99b coredump: sanitize the setting of signal->group_exit_code ... Browse Code »

Now that the coredumping process can be SIGKILL'ed, the setting of
->group_exit_code in do_coredump() can race with complete_signal() and
SIGKILL or 0x80 can be "lost", or wait(status) can report status ==
SIGKILL | 0x80.

But the main problem is that it is not clear to me what should we do if
binfmt->core_dump() succeeds but SIGKILL was sent, that is why this patch
comes as a separate change.

This patch adds 0x80 if ->core_dump() succeeds and the process was not
killed. But perhaps we can (should?) re-set ->group_exit_code changed by
SIGKILL back to "siginfo->si_signo |= 0x80" in case when core_dumped == T.

Signed-off-by: Oleg Nesterov
Tested-by: Mandeep Singh Baines
Cc: Ingo Molnar
Cc: Neil Horman
Cc: "Rafael J. Wysocki"
Cc: Roland McGrath
Cc: Tejun Heo
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-05-01 08:04:06 +0800
6cd8f0aca coredump: ensure that SIGKILL always kills the dumping thread ... Browse Code »

prepare_signal() blesses SIGKILL sent to the dumping process but this
signal can be "lost" anyway. The problems is, complete_signal() sees
SIGNAL_GROUP_EXIT and skips the "kill them all" logic. And even if the
dumping process is single-threaded (so the target is always "correct"),
the group-wide SIGKILL is not recorded in task->pending and thus
__fatal_signal_pending() won't be true. A multi-threaded case has even
more problems.

And even ignoring all technical details, SIGNAL_GROUP_EXIT doesn't look
right to me. This coredumping process is not exiting yet, it can do a lot
of work dumping the core.

With this patch the dumping process doesn't have SIGNAL_GROUP_EXIT, we set
signal->group_exit_task instead. This makes signal_group_exit() true and
thus this should equally close the races with exit/exec/stop but allows to
kill the dumping thread reliably.

Notes:
- It is not clear what should we do with ->group_exit_code
if the dumper was killed, see the next change.

- we need more (hopefully straightforward) changes to ensure
that SIGKILL actually interrupts the coredump. Basically we
need to check __fatal_signal_pending() in dump_write() and
dump_seek().

Signed-off-by: Oleg Nesterov
Tested-by: Mandeep Singh Baines
Cc: Ingo Molnar
Cc: Neil Horman
Cc: "Rafael J. Wysocki"
Cc: Roland McGrath
Cc: Tejun Heo
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-05-01 08:04:06 +0800
403bad72b coredump: only SIGKILL should interrupt the coredumping task ... Browse Code »

There are 2 well known and ancient problems with coredump/signals, and a
lot of related bug reports:

- do_coredump() clears TIF_SIGPENDING but of course this can't help
if, say, SIGCHLD comes after that.

In this case the coredump can fail unexpectedly. See for example
wait_for_dump_helper()->signal_pending() check but there are other
reasons.

- At the same time, dumping a huge core on the slow media can take a
lot of time/resources and there is no way to kill the coredumping
task reliably. In particular this is not oom_kill-friendly.

This patch tries to fix the 1st problem, and makes the preparation for the
next changes.

We add the new SIGNAL_GROUP_COREDUMP flag set by zap_threads() to indicate
that this process dumps the core. prepare_signal() checks this flag and
nacks any signal except SIGKILL.

Note that this check tries to be conservative, in the long term we should
probably treat the SIGNAL_GROUP_EXIT case equally but this needs more
discussion. See marc.info/?l=linux-kernel&m=120508897917439

Notes:
- recalc_sigpending() doesn't check SIGNAL_GROUP_COREDUMP.
The patch assumes that dump_write/etc paths should never
call it, but we can change it as well.

- There is another source of TIF_SIGPENDING, freezer. This
will be addressed separately.

Signed-off-by: Oleg Nesterov
Tested-by: Mandeep Singh Baines
Cc: Ingo Molnar
Cc: Neil Horman
Cc: "Rafael J. Wysocki"
Cc: Roland McGrath
Cc: Tejun Heo
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-05-01 08:04:06 +0800