Eric Lee / smarc-fsl-linux-kernel

21 Dec, 2012

3 commits

4c9a44aeb Merge branch 'akpm' (Andrew's patch-bomb) ... Browse Code »

Merge the rest of Andrew's patches for -rc1:
"A bunch of fixes and misc missed-out-on things.

That'll do for -rc1. I still have a batch of IPC patches which still
have a possible bug report which I'm chasing down."

* emailed patches from Andrew Morton : (25 commits)
keys: use keyring_alloc() to create module signing keyring
keys: fix unreachable code
sendfile: allows bypassing of notifier events
SGI-XP: handle non-fatal traps
fat: fix incorrect function comment
Documentation: ABI: remove testing/sysfs-devices-node
proc: fix inconsistent lock state
linux/kernel.h: fix DIV_ROUND_CLOSEST with unsigned divisors
memcg: don't register hotcpu notifier from ->css_alloc()
checkpatch: warn on uapi #includes that #include
mm: cma: WARN if freed memory is still in use
exec: do not leave bprm->interp on stack
...

Linus Torvalds
2012-12-21 12:00:43 +0800
54d46ea99 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal ... Browse Code »

Pull signal handling cleanups from Al Viro:
"sigaltstack infrastructure + conversion for x86, alpha and um,
COMPAT_SYSCALL_DEFINE infrastructure.

Note that there are several conflicts between "unify
SS_ONSTACK/SS_DISABLE definitions" and UAPI patches in mainline;
resolution is trivial - just remove definitions of SS_ONSTACK and
SS_DISABLED from arch/*/uapi/asm/signal.h; they are all identical and
include/uapi/linux/signal.h contains the unified variant."

Fixed up conflicts as per Al.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
alpha: switch to generic sigaltstack
new helpers: __save_altstack/__compat_save_altstack, switch x86 and um to those
generic compat_sys_sigaltstack()
introduce generic sys_sigaltstack(), switch x86 and um to it
new helper: compat_user_stack_pointer()
new helper: restore_altstack()
unify SS_ONSTACK/SS_DISABLE definitions
new helper: current_user_stack_pointer()
missing user_stack_pointer() instances
Bury the conditionals from kernel_thread/kernel_execve series
COMPAT_SYSCALL_DEFINE: infrastructure

Linus Torvalds
2012-12-21 10:05:28 +0800
b66c59840 exec: do not leave bprm->interp on stack ... Browse Code »

If a series of scripts are executed, each triggering module loading via
unprintable bytes in the script header, kernel stack contents can leak
into the command line.

Normally execution of binfmt_script and binfmt_misc happens recursively.
However, when modules are enabled, and unprintable bytes exist in the
bprm->buf, execution will restart after attempting to load matching
binfmt modules. Unfortunately, the logic in binfmt_script and
binfmt_misc does not expect to get restarted. They leave bprm->interp
pointing to their local stack. This means on restart bprm->interp is
left pointing into unused stack memory which can then be copied into the
userspace argv areas.

After additional study, it seems that both recursion and restart remains
the desirable way to handle exec with scripts, misc, and modules. As
such, we need to protect the changes to interp.

This changes the logic to require allocation for any changes to the
bprm->interp. To avoid adding a new kmalloc to every exec, the default
value is left as-is. Only when passing through binfmt_script or
binfmt_misc does an allocation take place.

For a proof of concept, see DoTest.sh from:

http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/

Signed-off-by: Kees Cook
Cc: halfdog
Cc: P J P
Cc: Alexander Viro
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2012-12-21 09:40:19 +0800

20 Dec, 2012

1 commit

ae903caae Bury the conditionals from kernel_thread/kernel_execve series ... Browse Code »

All architectures have
CONFIG_GENERIC_KERNEL_THREAD
CONFIG_GENERIC_KERNEL_EXECVE
__ARCH_WANT_SYS_EXECVE
None of them have __ARCH_WANT_KERNEL_EXECVE and there are only two callers
of kernel_execve() (which is a trivial wrapper for do_execve() now) left.
Kill the conditionals and make both callers use do_execve().

Signed-off-by: Al Viro

Al Viro
2012-12-20 07:07:38 +0800

18 Dec, 2012

3 commits

848b81415 Merge branch 'akpm' (Andrew's patch-bomb) ... Browse Code »

Merge misc patches from Andrew Morton:
"Incoming:

- lots of misc stuff

- backlight tree updates

- lib/ updates

- Oleg's percpu-rwsem changes

- checkpatch

- rtc

- aoe

- more checkpoint/restart support

I still have a pile of MM stuff pending - Pekka should be merging
later today after which that is good to go. A number of other things
are twiddling thumbs awaiting maintainer merges."

* emailed patches from Andrew Morton : (180 commits)
scatterlist: don't BUG when we can trivially return a proper error.
docs: update documentation about /proc//fdinfo/ fanotify output
fs, fanotify: add @mflags field to fanotify output
docs: add documentation about /proc//fdinfo/ output
fs, notify: add procfs fdinfo helper
fs, exportfs: add exportfs_encode_inode_fh() helper
fs, exportfs: escape nil dereference if no s_export_op present
fs, epoll: add procfs fdinfo helper
fs, eventfd: add procfs fdinfo helper
procfs: add ability to plug in auxiliary fdinfo providers
tools/testing/selftests/kcmp/kcmp_test.c: print reason for failure in kcmp_test
breakpoint selftests: print failure status instead of cause make error
kcmp selftests: print fail status instead of cause make error
kcmp selftests: make run_tests fix
mem-hotplug selftests: print failure status instead of cause make error
cpu-hotplug selftests: print failure status instead of cause make error
mqueue selftests: print failure status instead of cause make error
vm selftests: print failure status instead of cause make error
ubifs: use prandom_bytes
mtd: nandsim: use prandom_bytes
...

Linus Torvalds
2012-12-18 12:58:12 +0800
d74026986 exec: use -ELOOP for max recursion depth ... Browse Code »

To avoid an explosion of request_module calls on a chain of abusive
scripts, fail maximum recursion with -ELOOP instead of -ENOEXEC. As soon
as maximum recursion depth is hit, the error will fail all the way back
up the chain, aborting immediately.

This also has the side-effect of stopping the user's shell from attempting
to reexecute the top-level file as a shell script. As seen in the
dash source:

if (cmd != path_bshell && errno == ENOEXEC) {
*argv-- = cmd;
*argv = cmd = path_bshell;
goto repeat;
}

The above logic was designed for running scripts automatically that lacked
the "#!" header, not to re-try failed recursion. On a legitimate -ENOEXEC,
things continue to behave as the shell expects.

Additionally, when tracking recursion, the binfmt handlers should not be
involved. The recursion being tracked is the depth of calls through
search_binary_handler(), so that function should be exclusively responsible
for tracking the depth.

Signed-off-by: Kees Cook
Cc: halfdog
Cc: P J P
Cc: Alexander Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2012-12-18 09:15:23 +0800
6a2b60b17 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace ... Browse Code »

Pull user namespace changes from Eric Biederman:
"While small this set of changes is very significant with respect to
containers in general and user namespaces in particular. The user
space interface is now complete.

This set of changes adds support for unprivileged users to create user
namespaces and as a user namespace root to create other namespaces.
The tyranny of supporting suid root preventing unprivileged users from
using cool new kernel features is broken.

This set of changes completes the work on setns, adding support for
the pid, user, mount namespaces.

This set of changes includes a bunch of basic pid namespace
cleanups/simplifications. Of particular significance is the rework of
the pid namespace cleanup so it no longer requires sending out
tendrils into all kinds of unexpected cleanup paths for operation. At
least one case of broken error handling is fixed by this cleanup.

The files under /proc//ns/ have been converted from regular files
to magic symlinks which prevents incorrect caching by the VFS,
ensuring the files always refer to the namespace the process is
currently using and ensuring that the ptrace_mayaccess permission
checks are always applied.

The files under /proc//ns/ have been given stable inode numbers
so it is now possible to see if different processes share the same
namespaces.

Through the David Miller's net tree are changes to relax many of the
permission checks in the networking stack to allowing the user
namespace root to usefully use the networking stack. Similar changes
for the mount namespace and the pid namespace are coming through my
tree.

Two small changes to add user namespace support were commited here adn
in David Miller's -net tree so that I could complete the work on the
/proc//ns/ files in this tree.

Work remains to make it safe to build user namespaces and 9p, afs,
ceph, cifs, coda, gfs2, ncpfs, nfs, nfsd, ocfs2, and xfs so the
Kconfig guard remains in place preventing that user namespaces from
being built when any of those filesystems are enabled.

Future design work remains to allow root users outside of the initial
user namespace to mount more than just /proc and /sys."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (38 commits)
proc: Usable inode numbers for the namespace file descriptors.
proc: Fix the namespace inode permission checks.
proc: Generalize proc inode allocation
userns: Allow unprivilged mounts of proc and sysfs
userns: For /proc/self/{uid,gid}_map derive the lower userns from the struct file
procfs: Print task uids and gids in the userns that opened the proc file
userns: Implement unshare of the user namespace
userns: Implent proc namespace operations
userns: Kill task_user_ns
userns: Make create_new_namespaces take a user_ns parameter
userns: Allow unprivileged use of setns.
userns: Allow unprivileged users to create new namespaces
userns: Allow setting a userns mapping to your current uid.
userns: Allow chown and setgid preservation
userns: Allow unprivileged users to create user namespaces.
userns: Ignore suid and sgid on binaries if the uid or gid can not be mapped
userns: fix return value on mntns_install() failure
vfs: Allow unprivileged manipulation of the mount namespace.
vfs: Only support slave subtrees across different user namespaces
vfs: Add a user namespace reference from struct mnt_namespace
...

Linus Torvalds
2012-12-18 07:44:47 +0800

29 Nov, 2012

5 commits

71613c3b8 get rid of pt_regs argument of ->load_binary() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-11-29 10:53:38 +0800
3c456bfc4 get rid of pt_regs argument of search_binary_handler() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-11-29 10:53:38 +0800
835ab32df get rid of pt_regs argument of do_execve_common() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-11-29 10:53:37 +0800
da3d4c5fa get rid of pt_regs argument of do_execve() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-11-29 10:53:37 +0800
d03d26e58 make compat_do_execve() static, lose pt_regs argument ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-11-29 10:53:37 +0800

19 Nov, 2012

1 commit

3cdf5b45f userns: Ignore suid and sgid on binaries if the uid or gid can not be mapped ... Browse Code »

When performing an exec where the binary lives in one user namespace and
the execing process lives in another usre namespace there is the possibility
that the target uids can not be represented.

Instead of failing the exec simply ignore the suid/sgid bits and run
the binary with lower privileges. We already do this in the case
of MNT_NOSUID so this should be a well tested code path.

As the user and group are not changed this should not introduce any
security issues.

Acked-by: Serge Hallyn
Signed-off-by: Eric W. Biederman

Eric W. Biederman
2012-11-19 21:59:23 +0800

26 Oct, 2012

1 commit

b40a79591 freezer: exec should clear PF_NOFREEZE along with PF_KTHREAD ... Browse Code »

flush_old_exec() clears PF_KTHREAD but forgets about PF_NOFREEZE.

Signed-off-by: Oleg Nesterov
Acked-by: Tejun Heo
Cc: stable@vger.kernel.org
Signed-off-by: Rafael J. Wysocki

Oleg Nesterov
2012-10-26 04:28:12 +0800

13 Oct, 2012

2 commits

669abf4e5 vfs: make path_openat take a struct filename pointer ... Browse Code »

...and fix up the callers. For do_file_open_root, just declare a
struct filename on the stack and fill out the .name field. For
do_filp_open, make it also take a struct filename pointer, and fix up its
callers to call it appropriately.

For filp_open, add a variant that takes a struct filename pointer and turn
filp_open into a wrapper around it.

Signed-off-by: Jeff Layton
Signed-off-by: Al Viro

Jeff Layton
2012-10-13 08:15:09 +0800
91a27b2a7 vfs: define struct filename and have getname() return it ... Browse Code »

getname() is intended to copy pathname strings from userspace into a
kernel buffer. The result is just a string in kernel space. It would
however be quite helpful to be able to attach some ancillary info to
the string.

For instance, we could attach some audit-related info to reduce the
amount of audit-related processing needed. When auditing is enabled,
we could also call getname() on the string more than once and not
need to recopy it from userspace.

This patchset converts the getname()/putname() interfaces to return
a struct instead of a string. For now, the struct just tracks the
string in kernel space and the original userland pointer for it.

Later, we'll add other information to the struct as it becomes
convenient.

Signed-off-by: Jeff Layton
Signed-off-by: Al Viro

Jeff Layton
2012-10-13 08:14:55 +0800

10 Oct, 2012

1 commit

42859eea9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal ... Browse Code »

Pull generic execve() changes from Al Viro:
"This introduces the generic kernel_thread() and kernel_execve()
functions, and switches x86, arm, alpha, um and s390 over to them."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (26 commits)
s390: convert to generic kernel_execve()
s390: switch to generic kernel_thread()
s390: fold kernel_thread_helper() into ret_from_fork()
s390: fold execve_tail() into start_thread(), convert to generic sys_execve()
um: switch to generic kernel_thread()
x86, um/x86: switch to generic sys_execve and kernel_execve
x86: split ret_from_fork
alpha: introduce ret_from_kernel_execve(), switch to generic kernel_execve()
alpha: switch to generic kernel_thread()
alpha: switch to generic sys_execve()
arm: get rid of execve wrapper, switch to generic execve() implementation
arm: optimized current_pt_regs()
arm: introduce ret_from_kernel_execve(), switch to generic kernel_execve()
arm: split ret_from_fork, simplify kernel_thread() [based on patch by rmk]
generic sys_execve()
generic kernel_execve()
new helper: current_pt_regs()
preparation for generic kernel_thread()
um: kill thread->forking
um: let signal_delivered() do SIGTRAP on singlestepping into handler
...

Linus Torvalds
2012-10-10 11:02:25 +0800

09 Oct, 2012

2 commits

38a76013a mm: avoid taking rmap locks in move_ptes() ... Browse Code »

During mremap(), the destination VMA is generally placed after the
original vma in rmap traversal order: in move_vma(), we always have
new_pgoff >= vma->vm_pgoff, and as a result new_vma->vm_pgoff >=
vma->vm_pgoff unless vma_merge() merged the new vma with an adjacent one.

When the destination VMA is placed after the original in rmap traversal
order, we can avoid taking the rmap locks in move_ptes().

Essentially, this reintroduces the optimization that had been disabled in
"mm anon rmap: remove anon_vma_moveto_tail". The difference is that we
don't try to impose the rmap traversal order; instead we just rely on
things being in the desired order in the common case and fall back to
taking locks in the uncommon case. Also we skip the i_mmap_mutex in
addition to the anon_vma lock: in both cases, the vmas are traversed in
increasing vm_pgoff order with ties resolved in tree insertion order.

Signed-off-by: Michel Lespinasse
Cc: Andrea Arcangeli
Cc: Rik van Riel
Cc: Peter Zijlstra
Cc: Daniel Santos
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michel Lespinasse
2012-10-09 15:22:42 +0800
d5bbd43d5 exec: make de_thread() killable ... Browse Code »

Change de_thread() to use KILLABLE rather than UNINTERRUPTIBLE while
waiting for other threads. The only complication is that we should
clear ->group_exit_task and ->notify_count before we return, and we
should do this under tasklist_lock. -EAGAIN is used to match the
initial signal_group_exit() check/return, it doesn't really matter.

This fixes the (unlikely) race with coredump. de_thread() checks
signal_group_exit() before it starts to kill the subthreads, but this
can't help if another CLONE_VM (but non CLONE_THREAD) task starts the
coredumping after de_thread() unlocks ->siglock. In this case the
killed sub-thread can block in exit_mm() waiting for coredump_finish(),
execing thread waits for that sub-thead, and the coredumping thread
waits for execing thread. Deadlock.

Signed-off-by: Oleg Nesterov
Signed-off-by: Linus Torvalds

Oleg Nesterov
2012-10-09 05:53:20 +0800

06 Oct, 2012

2 commits

0f4cfb2e4 coredump: use SUID_DUMPABLE_ENABLED rather than hardcoded 1 ... Browse Code »

Cosmetic. Change setup_new_exec() and task_dumpable() to use
SUID_DUMPABLE_ENABLED for /bin/grep.

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2012-10-06 02:05:16 +0800
179899fd5 coredump: update coredump-related headers ... Browse Code »

Create a new header file, fs/coredump.h, which contains functions only
used by the new coredump.c. It also moves do_coredump to the
include/linux/coredump.h header file, for consistency.

Signed-off-by: Alex Kelly
Reviewed-by: Josh Triplett
Acked-by: Serge Hallyn
Acked-by: Kees Cook
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alex Kelly
2012-10-06 02:05:15 +0800

03 Oct, 2012

1 commit

10c28d937 coredump: move core dump functionality into its own file ... Browse Code »

This prepares for making core dump functionality optional.

The variable "suid_dumpable" and associated functions are left in fs/exec.c
because they're used elsewhere, such as in ptrace.

Signed-off-by: Alex Kelly
Reviewed-by: Josh Triplett
Acked-by: Serge Hallyn
Acked-by: Kees Cook
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Alex Kelly
2012-10-03 09:35:55 +0800

01 Oct, 2012

2 commits

38b983b34 generic sys_execve() ... Browse Code »

Selected by __ARCH_WANT_SYS_EXECVE in unistd.h. Requires
* working current_pt_regs()
* *NOT* doing a syscall-in-kernel kind of kernel_execve()
implementation. Using generic kernel_execve() is fine.

Signed-off-by: Al Viro

Al Viro
2012-10-01 10:20:51 +0800
282124d18 generic kernel_execve() ... Browse Code »

based mostly on arm and alpha versions. Architectures can define
__ARCH_WANT_KERNEL_EXECVE and use it, provided that
* they have working current_pt_regs(), even for kernel threads.
* kernel_thread-spawned threads do have space for pt_regs
in the normal location. Normally that's as simple as switching to
generic kernel_thread() and making sure that kernel threads do *not*
go through return from syscall path; call the payload from equivalent
of ret_from_fork if we are in a kernel thread (or just have separate
ret_from_kernel_thread and make copy_thread() use it instead of
ret_from_fork in kernel thread case).
* they have ret_from_kernel_execve(); it is called after
successful do_execve() done by kernel_execve() and gets normal
pt_regs location passed to it as argument. It's essentially
a longjmp() analog - it should set sp, etc. to the situation
expected at the return for syscall and go there. Eventually
the need for that sucker will disappear, but that'll take some
surgery on kernel_thread() payloads.

Signed-off-by: Al Viro

Al Viro
2012-10-01 01:36:39 +0800

27 Sep, 2012

3 commits

179e037fc do_coredump(): make sure that descriptor table isn't shared ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-09-27 09:09:59 +0800
8280d1617 new helper: replace_fd() ... Browse Code »

analog of dup2(), except that it takes struct file * as source.

Signed-off-by: Al Viro

Al Viro
2012-09-27 09:09:57 +0800
6a6d27de3 take close-on-exec logics to fs/file.c, clean it up a bit ... Browse Code »

... and add cond_resched() there, while we are at it. We can
get large latencies as is...

Signed-off-by: Al Viro

Al Viro
2012-09-27 09:09:56 +0800

20 Sep, 2012

1 commit

826eba4db the only place that needs to include asm/exec.h is linux/binfmts.h ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-09-20 21:51:13 +0800

02 Aug, 2012

1 commit

a0e881b7c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull second vfs pile from Al Viro:
"The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
deadlock reproduced by xfstests 068), symlink and hardlink restriction
patches, plus assorted cleanups and fixes.

Note that another fsfreeze deadlock (emergency thaw one) is *not*
dealt with - the series by Fernando conflicts a lot with Jan's, breaks
userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
for massive vfsmount leak; this is going to be handled next cycle.
There probably will be another pull request, but that stuff won't be
in it."

Fix up trivial conflicts due to unrelated changes next to each other in
drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
delousing target_core_file a bit
Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
fs: Remove old freezing mechanism
ext2: Implement freezing
btrfs: Convert to new freezing mechanism
nilfs2: Convert to new freezing mechanism
ntfs: Convert to new freezing mechanism
fuse: Convert to new freezing mechanism
gfs2: Convert to new freezing mechanism
ocfs2: Convert to new freezing mechanism
xfs: Convert to new freezing code
ext4: Convert to new freezing mechanism
fs: Protect write paths by sb_start_write - sb_end_write
fs: Skip atime update on frozen filesystem
fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
fs: Improve filesystem freezing handling
switch the protection of percpu_counter list to spinlock
nfsd: Push mnt_want_write() outside of i_mutex
btrfs: Push mnt_want_write() outside of i_mutex
fat: Push mnt_want_write() outside of i_mutex
...

Linus Torvalds
2012-08-02 01:26:23 +0800

31 Jul, 2012

3 commits

108ceeb02 coredump: fix wrong comments on core limits of pipe coredump case ... Browse Code »

In commit 898b374af6f7 ("exec: replace call_usermodehelper_pipe with use
of umh init function and resolve limit"), the core limits recursive
check value was changed from 0 to 1, but the corresponding comments were
not updated.

Signed-off-by: Jovi Zhang
Cc: Oleg Nesterov
Cc: Neil Horman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jovi Zhang
2012-07-31 08:25:20 +0800
54b501992 coredump: warn about unsafe suid_dumpable / core_pattern combo ... Browse Code »

When suid_dumpable=2, detect unsafe core_pattern settings and warn when
they are seen.

Signed-off-by: Kees Cook
Suggested-by: Andrew Morton
Cc: Alexander Viro
Cc: Alan Cox
Cc: "Eric W. Biederman"
Cc: Doug Ledford
Cc: Serge Hallyn
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2012-07-31 08:25:11 +0800
9520628e8 fs: make dumpable=2 require fully qualified path ... Browse Code »

When the suid_dumpable sysctl is set to "2", and there is no core dump
pipe defined in the core_pattern sysctl, a local user can cause core files
to be written to root-writable directories, potentially with
user-controlled content.

This means an admin can unknowningly reintroduce a variation of
CVE-2006-2451, allowing local users to gain root privileges.

$ cat /proc/sys/fs/suid_dumpable
2
$ cat /proc/sys/kernel/core_pattern
core
$ ulimit -c unlimited
$ cd /
$ ls -l core
ls: cannot access core: No such file or directory
$ touch core
touch: cannot touch `core': Permission denied
$ OHAI="evil-string-here" ping localhost >/dev/null 2>&1 &
$ pid=$!
$ sleep 1
$ kill -SEGV $pid
$ ls -l core
-rw------- 1 root kees 458752 Jun 21 11:35 core
$ sudo strings core | grep evil
OHAI=evil-string-here

While cron has been fixed to abort reading a file when there is any
parse error, there are still other sensitive directories that will read
any file present and skip unparsable lines.

Instead of introducing a suid_dumpable=3 mode and breaking all users of
mode 2, this only disables the unsafe portion of mode 2 (writing to disk
via relative path). Most users of mode 2 (e.g. Chrome OS) already use
a core dump pipe handler, so this change will not break them. For the
situations where a pipe handler is not defined but mode 2 is still
active, crash dumps will only be written to fully qualified paths. If a
relative path is defined (e.g. the default "core" pattern), dump
attempts will trigger a printk yelling about the lack of a fully
qualified path.

Signed-off-by: Kees Cook
Cc: Alexander Viro
Cc: Alan Cox
Cc: "Eric W. Biederman"
Cc: Doug Ledford
Cc: Serge Hallyn
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2012-07-31 08:25:11 +0800

30 Jul, 2012

1 commit

e4fad8e5d consolidate pipe file creation ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-07-30 01:24:19 +0800

27 Jul, 2012

1 commit

8ded2bbc1 posix_types.h: Cleanup stale __NFDBITS and related definitions ... Browse Code »

Recently, glibc made a change to suppress sign-conversion warnings in
FD_SET (glibc commit ceb9e56b3d1). This uncovered an issue with the
kernel's definition of __NFDBITS if applications #include
after including . A build failure would
be seen when passing the -Werror=sign-compare and -D_FORTIFY_SOURCE=2
flags to gcc.

It was suggested that the kernel should either match the glibc
definition of __NFDBITS or remove that entirely. The current in-kernel
uses of __NFDBITS can be replaced with BITS_PER_LONG, and there are no
uses of the related __FDELT and __FDMASK defines. Given that, we'll
continue the cleanup that was started with commit 8b3d1cda4f5f
("posix_types: Remove fd_set macros") and drop the remaining unused
macros.

Additionally, linux/time.h has similar macros defined that expand to
nothing so we'll remove those at the same time.

Reported-by: Jeff Law
Suggested-by: Linus Torvalds
CC:
Signed-off-by: Josh Boyer
[ .. and fix up whitespace as per akpm ]
Signed-off-by: Linus Torvalds

Josh Boyer
2012-07-27 04:36:43 +0800

21 Jun, 2012

1 commit

4fe7efdbd mm: correctly synchronize rss-counters at exit/exec ... Browse Code »

do_exit() and exec_mmap() call sync_mm_rss() before mm_release() does
put_user(clear_child_tid) which can update task->rss_stat and thus make
mm->rss_stat inconsistent. This triggers the "BUG:" printk in check_mm().

Let's fix this bug in the safest way, and optimize/cleanup this later.

Reported-by: Markus Trippelsdorf
Signed-off-by: Konstantin Khlebnikov
Cc: Oleg Nesterov
Cc: KAMEZAWA Hiroyuki
Cc: Hugh Dickins
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Konstantin Khlebnikov
2012-06-21 05:39:36 +0800

08 Jun, 2012

2 commits

48d212a2e Revert "mm: correctly synchronize rss-counters at exit/exec" ... Browse Code »

This reverts commit 40af1bbdca47e5c8a2044039bb78ca8fd8b20f94.

It's horribly and utterly broken for at least the following reasons:

- calling sync_mm_rss() from mmput() is fundamentally wrong, because
there's absolutely no reason to believe that the task that does the
mmput() always does it on its own VM. Example: fork, ptrace, /proc -
you name it.

- calling it *after* having done mmdrop() on it is doubly insane, since
the mm struct may well be gone now.

- testing mm against NULL before you call it is insane too, since a
NULL mm there would have caused oopses long before.

.. and those are just the three bugs I found before I decided to give up
looking for me and revert it asap. I should have caught it before I
even took it, but I trusted Andrew too much.

Cc: Konstantin Khlebnikov
Cc: Markus Trippelsdorf
Cc: Hugh Dickins
Cc: KAMEZAWA Hiroyuki
Cc: Oleg Nesterov
Cc: Andrew Morton
Signed-off-by: Linus Torvalds

Linus Torvalds
2012-06-08 08:54:07 +0800
40af1bbdc mm: correctly synchronize rss-counters at exit/exec ... Browse Code »
43

mm->rss_stat counters have per-task delta: task->rss_stat. Before
changing task->mm pointer the kernel must flush this delta with
sync_mm_rss().

do_exit() already calls sync_mm_rss() to flush the rss-counters before
committing the rss statistics into task->signal->maxrss, taskstats,
audit and other stuff. Unfortunately the kernel does this before
calling mm_release(), which can call put_user() for processing
task->clear_child_tid. So at this point we can trigger page-faults and
task->rss_stat becomes non-zero again. As a result mm->rss_stat becomes
inconsistent and check_mm() will print something like this:

| BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
| BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1

This patch moves sync_mm_rss() into mm_release(), and moves mm_release()
out of do_exit() and calls it earlier. After mm_release() there should
be no pagefaults.

[akpm@linux-foundation.org: tweak comment]
Signed-off-by: Konstantin Khlebnikov
Reported-by: Markus Trippelsdorf
Cc: Hugh Dickins
Cc: KAMEZAWA Hiroyuki
Cc: Oleg Nesterov
Cc: [3.4.x]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Konstantin Khlebnikov
2012-06-08 05:43:55 +0800

01 Jun, 2012

1 commit

e5467859f split ->file_mmap() into ->mmap_addr()/->mmap_file() ... Browse Code »

... i.e. file-dependent and address-dependent checks.

Signed-off-by: Al Viro

Al Viro
2012-06-01 01:11:54 +0800

24 May, 2012

2 commits

644473e9c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace ... Browse Code »

Pull user namespace enhancements from Eric Biederman:
"This is a course correction for the user namespace, so that we can
reach an inexpensive, maintainable, and reasonably complete
implementation.

Highlights:
- Config guards make it impossible to enable the user namespace and
code that has not been converted to be user namespace safe.

- Use of the new kuid_t type ensures the if you somehow get past the
config guards the kernel will encounter type errors if you enable
user namespaces and attempt to compile in code whose permission
checks have not been updated to be user namespace safe.

- All uids from child user namespaces are mapped into the initial
user namespace before they are processed. Removing the need to add
an additional check to see if the user namespace of the compared
uids remains the same.

- With the user namespaces compiled out the performance is as good or
better than it is today.

- For most operations absolutely nothing changes performance or
operationally with the user namespace enabled.

- The worst case performance I could come up with was timing 1
billion cache cold stat operations with the user namespace code
enabled. This went from 156s to 164s on my laptop (or 156ns to
164ns per stat operation).

- (uid_t)-1 and (gid_t)-1 are reserved as an internal error value.
Most uid/gid setting system calls treat these value specially
anyway so attempting to use -1 as a uid would likely cause
entertaining failures in userspace.

- If setuid is called with a uid that can not be mapped setuid fails.
I have looked at sendmail, login, ssh and every other program I
could think of that would call setuid and they all check for and
handle the case where setuid fails.

- If stat or a similar system call is called from a context in which
we can not map a uid we lie and return overflowuid. The LFS
experience suggests not lying and returning an error code might be
better, but the historical precedent with uids is different and I
can not think of anything that would break by lying about a uid we
can't map.

- Capabilities are localized to the current user namespace making it
safe to give the initial user in a user namespace all capabilities.

My git tree covers all of the modifications needed to convert the core
kernel and enough changes to make a system bootable to runlevel 1."

Fix up trivial conflicts due to nearby independent changes in fs/stat.c

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
userns: Silence silly gcc warning.
cred: use correct cred accessor with regards to rcu read lock
userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq
userns: Convert cgroup permission checks to use uid_eq
userns: Convert tmpfs to use kuid and kgid where appropriate
userns: Convert sysfs to use kgid/kuid where appropriate
userns: Convert sysctl permission checks to use kuid and kgids.
userns: Convert proc to use kuid/kgid where appropriate
userns: Convert ext4 to user kuid/kgid where appropriate
userns: Convert ext3 to use kuid/kgid where appropriate
userns: Convert ext2 to use kuid/kgid where appropriate.
userns: Convert devpts to use kuid/kgid where appropriate
userns: Convert binary formats to use kuid/kgid where appropriate
userns: Add negative depends on entries to avoid building code that is userns unsafe
userns: signal remove unnecessary map_cred_ns
userns: Teach inode_capable to understand inodes whose uids map to other namespaces.
userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
userns: Convert stat to return values mapped from kuids and kgids
userns: Convert user specfied uids and gids in chown into kuids and kgid
userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
...

Linus Torvalds
2012-05-24 08:42:39 +0800
ec0d7f18a Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull fpu state cleanups from Ingo Molnar:
"This tree streamlines further aspects of FPU handling by eliminating
the prepare_to_copy() complication and moving that logic to
arch_dup_task_struct().

It also fixes the FPU dumps in threaded core dumps, removes and old
(and now invalid) assumption plus micro-optimizes the exit path by
avoiding an FPU save for dead tasks."

Fixed up trivial add-add conflict in arch/sh/kernel/process.c that came
in because we now do the FPU handling in arch_dup_task_struct() rather
than the legacy (and now gone) prepare_to_copy().

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, fpu: drop the fpu state during thread exit
x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state()
coredump: ensure the fpu state is flushed for proper multi-threaded core dump
fork: move the real prepare_to_copy() users to arch_dup_task_struct()

Linus Torvalds
2012-05-24 01:59:07 +0800