Eric Lee / smarc-fsl-linux-kernel

22 Nov, 2013

1 commit

3eaded86a Merge git://git.infradead.org/users/eparis/audit

Pull audit updates from Eric Paris:
"Nothing amazing. Formatting, small bug fixes, couple of fixes where
we didn't get records due to some old VFS changes, and a change to how
we collect execve info..."

Fixed conflict in fs/exec.c as per Eric and linux-next.

* git://git.infradead.org/users/eparis/audit: (28 commits)
audit: fix type of sessionid in audit_set_loginuid()
audit: call audit_bprm() only once to add AUDIT_EXECVE information
audit: move audit_aux_data_execve contents into audit_context union
audit: remove unused envc member of audit_aux_data_execve
audit: Kill the unused struct audit_aux_data_capset
audit: do not reject all AUDIT_INODE filter types
audit: suppress stock memalloc failure warnings since already managed
audit: log the audit_names record type
audit: add child record before the create to handle case where create fails
audit: use given values in tty_audit enable api
audit: use nlmsg_len() to get message payload length
audit: use memset instead of trying to initialize field by field
audit: fix info leak in AUDIT_GET requests
audit: update AUDIT_INODE filter rule to comparator function
audit: audit feature to set loginuid immutable
audit: audit feature to only allow unsetting the loginuid
audit: allow unsetting the loginuid (with priv)
audit: remove CONFIG_AUDIT_LOGINUID_IMMUTABLE
audit: loginuid functions coding style
selinux: apply selinux checks on new audit message types
...

Linus Torvalds
2013-11-22 11:18:14 +0800

13 Nov, 2013

3 commits

5cbb3d216 Merge branch 'akpm' (patches from Andrew Morton)

Merge first patch-bomb from Andrew Morton:
"Quite a lot of other stuff is banked up awaiting further
next->mainline merging, but this batch contains:

- Lots of random misc patches
- OCFS2
- Most of MM
- backlight updates
- lib/ updates
- printk updates
- checkpatch updates
- epoll tweaking
- rtc updates
- hfs
- hfsplus
- documentation
- procfs
- update gcov to gcc-4.7 format
- IPC"

* emailed patches from Andrew Morton: (269 commits)
ipc, msg: fix message length check for negative values
ipc/util.c: remove unnecessary work pending test
devpts: plug the memory leak in kill_sb
./Makefile: export initial ramdisk compression config option
init/Kconfig: add option to disable kernel compression
drivers: w1: make w1_slave::flags long to avoid memory corruption
drivers/w1/masters/ds1wm.c: use dev_get_platdata()
drivers/memstick/core/ms_block.c: fix unreachable state in h_msb_read_page()
drivers/memstick/core/mspro_block.c: fix attributes array allocation
drivers/pps/clients/pps-gpio.c: remove redundant of_match_ptr
kernel/panic.c: reduce 1 byte usage for print tainted buffer
gcov: reuse kbasename helper
kernel/gcov/fs.c: use pr_warn()
kernel/module.c: use pr_foo()
gcov: compile specific gcov implementation based on gcc version
gcov: add support for gcc 4.7 gcov format
gcov: move gcov structs definitions to a gcc version specific file
kernel/taskstats.c: return -ENOMEM when alloc memory fails in add_del_listener()
kernel/taskstats.c: add nla_nest_cancel() for failure processing between nla_nest_start() and nla_nest_end()
kernel/sysctl_binary.c: use scnprintf() instead of snprintf()
...

Linus Torvalds
2013-11-13 14:45:43 +0800
9bc9ccd7d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs updates from Al Viro:
"All kinds of stuff this time around; some more notable parts:

- RCU'd vfsmounts handling
- new primitives for coredump handling
- files_lock is gone
- Bruce's delegations handling series
- exportfs fixes

plus misc stuff all over the place"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits)
ecryptfs: ->f_op is never NULL
locks: break delegations on any attribute modification
locks: break delegations on link
locks: break delegations on rename
locks: helper functions for delegation breaking
locks: break delegations on unlink
namei: minor vfs_unlink cleanup
locks: implement delegations
locks: introduce new FL_DELEG lock flag
vfs: take i_mutex on renamed file
vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
vfs: don't use PARENT/CHILD lock classes for non-directories
vfs: pull ext4's double-i_mutex-locking into common code
exportfs: fix quadratic behavior in filehandle lookup
exportfs: better variable name
exportfs: move most of reconnect_path to helper function
exportfs: eliminate unused "noprogress" counter
exportfs: stop retrying once we race with rename/remove
exportfs: clear DISCONNECTED on all parents sooner
exportfs: more detailed comment for path_reconnect
...

Linus Torvalds
2013-11-13 14:34:18 +0800
d049f74f2 exec/ptrace: fix get_dumpable() incorrect tests

The get_dumpable() return value is not boolean. Most users of the
function actually want to be testing for non-SUID_DUMP_USER(1) rather than
SUID_DUMP_DISABLE(0). The SUID_DUMP_ROOT(2) is also considered a
protected state. Almost all places did this correctly, excepting the two
places fixed in this patch.

Wrong logic:
if (dumpable == SUID_DUMP_DISABLE) { /* be protective */ }
or
if (dumpable == 0) { /* be protective */ }
or
if (!dumpable) { /* be protective */ }

Correct logic:
if (dumpable != SUID_DUMP_USER) { /* be protective */ }
or
if (dumpable != 1) { /* be protective */ }

Without this patch, if the system had set the sysctl fs/suid_dumpable=2, a
user was able to ptrace attach to processes that had dropped privileges to
that user. (This may have been partially mitigated if Yama was enabled.)

The macros have been moved into the file that declares get/set_dumpable(),
which means things like the ia64 code can see them too.
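Because the return value is tri-state, the two tests are easy to confuse. Below is a minimal userspace sketch of both. The SUID_DUMP_* values are the ones quoted above. is_protected_wrong() and is_protected_fixed() are hypothetical helpers written only for this illustration; they are not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

#define SUID_DUMP_DISABLE 0 /* no setuid dumping */
#define SUID_DUMP_USER    1 /* dump as the user of the process */
#define SUID_DUMP_ROOT    2 /* dump as root: still a protected state */

/* Hypothetical helper: the buggy test, which treats only 0 as protected. */
static bool is_protected_wrong(int dumpable)
{
	assert(dumpable >= SUID_DUMP_DISABLE && dumpable <= SUID_DUMP_ROOT);
	return dumpable == SUID_DUMP_DISABLE;
}

/* Hypothetical helper: the fixed test. Anything other than SUID_DUMP_USER,
 * including SUID_DUMP_ROOT, must be treated as protected. */
static bool is_protected_fixed(int dumpable)
{
	assert(dumpable >= SUID_DUMP_DISABLE && dumpable <= SUID_DUMP_ROOT);
	return dumpable != SUID_DUMP_USER;
}
```

In this model, SUID_DUMP_ROOT is the state a privilege-dropping process can end up in when fs/suid_dumpable=2. The old test reports it as unprotected; the fixed test reports it as protected.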

CVE-2013-2929

Reported-by: Vasily Kulikov
Signed-off-by: Kees Cook
Cc: "Luck, Tony"
Cc: Oleg Nesterov
Cc: "Eric W. Biederman"
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2013-11-13 11:09:33 +0800

06 Nov, 2013

1 commit

9410d228a audit: call audit_bprm() only once to add AUDIT_EXECVE information

Move the audit_bprm() call from search_binary_handler() to exec_binprm(). This
allows us to get rid of the mm member of struct audit_aux_data_execve since
bprm->mm will equal current->mm.

This also mitigates the issue that ->argc could be modified by the
load_binary() call in search_binary_handler().

audit_bprm() was being called to add an AUDIT_EXECVE record to the audit
context every time search_binary_handler() was recursively called. Only one
reference is necessary.

Reported-by: Oleg Nesterov
Cc: Eric Paris
Signed-off-by: Richard Guy Briggs
Signed-off-by: Eric Paris
---
This patch is against 3.11, but was developed on Oleg's post-3.11 patches that
introduce exec_binprm().

Richard Guy Briggs
2013-11-06 00:15:03 +0800

25 Oct, 2013

1 commit

72c2d5319 file->f_op is never NULL...

Signed-off-by: Al Viro

Al Viro
2013-10-25 11:34:54 +0800

09 Oct, 2013

1 commit

82727018b sched/numa: Call task_numa_free() from do_execve()

It is possible for a task in a numa group to call exec, and
have the new (unrelated) executable inherit the numa group
association from its former self.

This has the potential to break numa grouping, and is trivial
to fix.

Signed-off-by: Rik van Riel
Signed-off-by: Mel Gorman
Cc: Andrea Arcangeli
Cc: Johannes Weiner
Cc: Srikar Dronamraju
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1381141781-10992-51-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar

Rik van Riel
2013-10-09 20:48:00 +0800

12 Sep, 2013

9 commits

6b3c538f5 exec: cleanup the error handling in search_binary_handler()

The error handling and ret-from-loop look confusing and inconsistent.

- "retval >= 0" simply returns

- "!bprm->file" returns too but with read_unlock() because
binfmt_lock was already re-acquired

- "retval != -ENOEXEC || bprm->mm == NULL" does "break" and
relies on the same check after the main loop

Consolidate these checks into a single if/return statement.

need_retry still checks "retval == -ENOEXEC", but this and -ENOENT before
the main loop are not needed. They only matter for the pathological and
impossible list_empty(&formats) case.

It is not clear why we check "bprm->mm == NULL"; probably this
should be removed.
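Here is a rough sketch of a loop with a single if/return exit test. A hypothetical struct bprm_model and made-up handler functions stand in for struct linux_binprm and the registered binfmts. The sketch models only the three exit conditions listed above. It leaves out binfmt_lock and module reference counting, so it is not the actual search_binary_handler() code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct linux_binprm: only the two fields
 * the exit condition looks at. */
struct bprm_model {
	bool has_mm;   /* models "bprm->mm != NULL" */
	bool has_file; /* models "bprm->file != NULL" */
};

typedef int (*load_fn)(struct bprm_model *);

/* Example "handlers" used to exercise the loop. */
static int load_enoexec(struct bprm_model *b) { (void)b; return -ENOEXEC; }
static int load_ok(struct bprm_model *b)      { (void)b; return 0; }
static int load_eacces(struct bprm_model *b)  { (void)b; return -EACCES; }

/* Try each handler in turn with one exit test. Success (retval >= 0) is
 * covered because it never equals -ENOEXEC; only "-ENOEXEC with mm and
 * file still present" falls through to the next handler. */
static int try_handlers(struct bprm_model *bprm, load_fn *fmts, size_t n)
{
	assert(fmts != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		int retval = fmts[i](bprm);

		if (retval != -ENOEXEC || !bprm->has_mm || !bprm->has_file)
			return retval;
	}
	return -ENOEXEC; /* no handler recognised the binary */
}
```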

Signed-off-by: Oleg Nesterov
Acked-by: Kees Cook
Cc: Al Viro
Cc: Evgeniy Polyakov
Cc: Zach Levis
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-09-12 06:59:09 +0800
4e0621a07 exec: don't retry if request_module() fails

A separate one-liner for better documentation.

It doesn't make sense to retry if request_module() fails to exec
/sbin/modprobe, add the additional "request_module() < 0" check.

However, this logic still doesn't look exactly right:

1. It would be better to check "request_module() != 0", the user
space modprobe process should report the correct exit code.
But I didn't dare to add the user-visible change.

2. The whole ENOEXEC logic looks suboptimal. Suppose that we try
to exec a "#!path-to-unsupported-binary" script. In this case
request_module() + "retry" will be done twice: first by the
"depth == 1" code, and then again by the "depth == 0" caller
which doesn't make sense.

3. And note that in the case above bprm->buf was already changed
by load_script()->prepare_binprm(), so this looks even more
ugly.

Signed-off-by: Oleg Nesterov
Acked-by: Kees Cook
Cc: Al Viro
Cc: Evgeniy Polyakov
Cc: Zach Levis
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-09-12 06:59:07 +0800
cb7b6b1cb exec: cleanup the CONFIG_MODULES logic

search_binary_handler() uses "for (try=0; try
Acked-by: Kees Cook
Cc: Al Viro
Cc: Evgeniy Polyakov
Cc: Zach Levis
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-09-12 06:59:05 +0800
92eaa565a exec: kill ->load_binary != NULL check in search_binary_handler()

search_binary_handler() checks ->load_binary != NULL for no reason, this
method should be always defined. Turn this check into WARN_ON() and move
it into __register_binfmt().

Also, kill the function pointer. The current code looks confusing, as if
->load_binary can go away after read_unlock(&binfmt_lock). But we rely on
module_get(fmt->module), this fmt can't be changed or unregistered,
otherwise this code is buggy anyway.

Signed-off-by: Oleg Nesterov
Acked-by: Kees Cook
Cc: Al Viro
Cc: Evgeniy Polyakov
Cc: Zach Levis
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-09-12 06:59:05 +0800
52f14282b exec: move allow_write_access/fput to exec_binprm()

When search_binary_handler() succeeds it does allow_write_access() and
fput(), then it clears bprm->file to ensure the caller will not do the
same.

We can simply move this code to exec_binprm() which is called only once.
In fact we could move this to free_bprm() and remove the same code in
do_execve_common's error path.

Signed-off-by: Oleg Nesterov
Acked-by: Kees Cook
Cc: Al Viro
Cc: Evgeniy Polyakov
Cc: Zach Levis
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-09-12 06:59:05 +0800
9beb266f2 exec: proc_exec_connector() should be called only once

A separate one-liner with the minor fix.

PROC_EVENT_EXEC reports the "exec" event, but this message is sent at
least twice if search_binary_handler() is called by ->load_binary()
recursively, say, load_script().

Move it to exec_binprm(), this is "depth == 0" code too.

Signed-off-by: Oleg Nesterov
Acked-by: Kees Cook
Cc: Al Viro
Cc: Evgeniy Polyakov
Cc: Zach Levis
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-09-12 06:59:05 +0800
131b2f9f1 exec: kill "int depth" in search_binary_handler()

Nobody except search_binary_handler() should touch ->recursion_depth, "int
depth" buys nothing but complicates the code, kill it.

Probably we should also kill "fn" and the !NULL check, ->load_binary
should be always defined. And it can not go away after read_unlock() or
this code is buggy anyway.

Signed-off-by: Oleg Nesterov
Acked-by: Kees Cook
Cc: Al Viro
Cc: Evgeniy Polyakov
Cc: Zach Levis
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-09-12 06:59:04 +0800
5d1baf3b6 exec: introduce exec_binprm() for "depth == 0" code

task_pid_nr_ns() and trace/ptrace code in the middle of the recursive
search_binary_handler() looks confusing and imho annoying. We only need
this code if "depth == 0", so let's add a simple helper which calls
search_binary_handler() and does trace_sched_process_exec() +
ptrace_event().

The patch also moves the setting of task->did_exec, we need to do this
only once.

Note: we can kill either task->did_exec or PF_FORKNOEXEC.
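A toy recursion shows why "depth == 0" work belongs in a wrapper. An exec of a chain of #! scripts passes through the search function once per level. A hook placed inside the search therefore fires once per level, while a hook in the wrapper fires once per exec. The model and its names are hypothetical. events_inner mirrors the repeated reporting that the later proc_exec_connector() and audit_bprm() commits fix.

```c
#include <assert.h>

/* Hypothetical model of exec'ing a chain of #! scripts. */
struct exec_model {
	int nesting;      /* script levels left before a real binary */
	int depth;        /* current recursion depth */
	int events_inner; /* hook placed inside the recursive search */
	int events_outer; /* hook placed in the depth-0 wrapper */
};

static int search_model(struct exec_model *m)
{
	int retval;

	m->depth++;
	if (m->nesting > 0) {
		m->nesting--;             /* like load_script(): strip one #! level */
		retval = search_model(m); /* and search again for the interpreter */
	} else {
		retval = 0;               /* reached a real binary format */
	}
	m->depth--;
	if (retval >= 0)
		m->events_inner++;        /* fires once per level */
	return retval;
}

/* In the spirit of exec_binprm(): work that must happen once per exec. */
static int exec_binprm_model(struct exec_model *m)
{
	int retval = search_model(m);

	assert(m->depth == 0);
	if (retval >= 0)
		m->events_outer++;        /* fires once per exec */
	return retval;
}
```

With two script levels, events_inner ends up at 3 while events_outer is 1.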

Signed-off-by: Oleg Nesterov
Acked-by: Kees Cook
Cc: Al Viro
Cc: Evgeniy Polyakov
Cc: Zach Levis
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-09-12 06:59:03 +0800
d9104d1ca mm: track vma changes with VM_SOFTDIRTY bit

Pavel reported that if a vma area gets unmapped and then mapped (or
expanded) in place, the soft dirty tracker won't be able to recognize this
situation, since it works at the pte level and ptes get zapped on unmap,
losing the soft dirty bit of course.

To resolve this we need to track actions at the vma level, which is
where the VM_SOFTDIRTY flag comes in. When a new vma area is created (or
an old one is expanded) we set this bit, and keep it there until the
application asks to clear the soft dirty bit.

Thus, when a user space application tracks memory changes, it can now
detect whether a vma area has been renewed.
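A small model shows why the vma-level flag is needed. The types and helpers are hypothetical simplifications: one vma, a fixed number of pages, and booleans for the pte bits. clear_soft_dirty() stands in for the user-space request to clear soft-dirty state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 4

/* Hypothetical model of a single vma; not the kernel's structures. */
struct vma_model {
	bool vm_softdirty;          /* vma-level flag, like VM_SOFTDIRTY */
	bool pte_softdirty[NPAGES]; /* pte-level soft-dirty bits */
};

/* Creating (or re-creating / expanding) the area sets the vma flag. */
static void vma_map(struct vma_model *v)
{
	for (size_t i = 0; i < NPAGES; i++)
		v->pte_softdirty[i] = false;
	v->vm_softdirty = true;
}

/* Unmapping zaps the ptes, and the pte-level bits go with them. */
static void vma_unmap(struct vma_model *v)
{
	for (size_t i = 0; i < NPAGES; i++)
		v->pte_softdirty[i] = false;
}

static void vma_write(struct vma_model *v, size_t page)
{
	assert(page < NPAGES);
	v->pte_softdirty[page] = true;
}

/* User space asks for the soft-dirty state to be cleared. */
static void clear_soft_dirty(struct vma_model *v)
{
	v->vm_softdirty = false;
	for (size_t i = 0; i < NPAGES; i++)
		v->pte_softdirty[i] = false;
}

/* A page reads as soft-dirty if its pte bit or the vma flag is set. */
static bool page_soft_dirty(const struct vma_model *v, size_t page)
{
	assert(page < NPAGES);
	return v->vm_softdirty || v->pte_softdirty[page];
}
```

Without the vm_softdirty term in page_soft_dirty(), a page re-mapped after vma_unmap() would read as clean even though its old contents are gone.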

Reported-by: Pavel Emelyanov
Signed-off-by: Cyrill Gorcunov
Cc: Andy Lutomirski
Cc: Matt Mackall
Cc: Xiao Guangrong
Cc: Marcelo Tosatti
Cc: KOSAKI Motohiro
Cc: Stephen Rothwell
Cc: Peter Zijlstra
Cc: "Aneesh Kumar K.V"
Cc: Rob Landley
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Cyrill Gorcunov
2013-09-12 06:57:56 +0800

16 Aug, 2013

1 commit

2b047252d Fix TLB gather virtual address range invalidation corner cases

Ben Tebulin reported:

"Since v3.7.2 on two independent machines a very specific Git
repository fails in 9/10 cases on git-fsck due to an SHA1/memory
failures. This only occurs on a very specific repository and can be
reproduced stably on two independent laptops. Git mailing list ran
out of ideas and for me this looks like some very exotic kernel issue"

and bisected the failure to the backport of commit 53a59fc67f97 ("mm:
limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT").

That commit itself is not actually buggy, but what it does is to make it
much more likely to hit the partial TLB invalidation case, since it
introduces a new case in tlb_next_batch() that previously only ever
happened when running out of memory.

The real bug is that the TLB gather virtual memory range setup is subtly
buggered. It was introduced in commit 597e1c3580b7 ("mm/mmu_gather:
enable tlb flush range in generic mmu_gather"), and the range handling
was already fixed at least once in commit e6c495a96ce0 ("mm: fix the TLB
range flushed when __tlb_remove_page() runs out of slots"), but that fix
was not complete.

The problem with the TLB gather virtual address range is that it isn't
set up by the initial tlb_gather_mmu() initialization (which didn't get
the TLB range information), but it is set up ad-hoc later by the
functions that actually flush the TLB. And so any such case that forgot
to update the TLB range entries would potentially miss TLB invalidates.

Rather than try to figure out exactly which particular ad-hoc range
setup was missing (I personally suspect it's the hugetlb case in
zap_huge_pmd(), which didn't have the same logic as zap_pte_range()
did), this patch just gets rid of the problem at the source: make the
TLB range information available to tlb_gather_mmu(), and initialize it
when initializing all the other tlb gather fields.

This makes the patch larger, but conceptually much simpler. And the end
result is much more understandable; even if you want to play games with
partial ranges when invalidating the TLB contents in chunks, now the
range information is always there, and anybody who doesn't want to
bother with it won't introduce subtle bugs.
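Below is a simplified model of the design change. It assumes a gather needs only a start/end pair and a flush counter; the real struct mmu_gather has much more. gather_init_adhoc() mimics the old scheme, where the range starts empty and each unmapping path has to widen it. gather_init() records the range up front, as tlb_gather_mmu() now does. All names are hypothetical.

```c
#include <assert.h>

/* Hypothetical, much-reduced stand-in for struct mmu_gather. */
struct gather_model {
	unsigned long start, end; /* range to invalidate; empty if start == end */
	int flushes;
};

/* New scheme: the range is set with every other field at init time. */
static void gather_init(struct gather_model *tlb, unsigned long start,
			unsigned long end)
{
	assert(start <= end);
	tlb->start = start;
	tlb->end = end;
	tlb->flushes = 0;
}

/* Old scheme: start with an empty range ... */
static void gather_init_adhoc(struct gather_model *tlb)
{
	gather_init(tlb, 0, 0);
}

/* ... and rely on every unmapping path to widen it as it goes. */
static void gather_track(struct gather_model *tlb, unsigned long addr,
			 unsigned long size)
{
	if (tlb->start == tlb->end) {
		tlb->start = addr;
		tlb->end = addr + size;
		return;
	}
	if (addr < tlb->start)
		tlb->start = addr;
	if (addr + size > tlb->end)
		tlb->end = addr + size;
}

/* Invalidate whatever range was recorded; an empty range flushes nothing. */
static void gather_flush(struct gather_model *tlb)
{
	if (tlb->end > tlb->start)
		tlb->flushes++;
}
```

If a code path forgets to call gather_track(), the ad-hoc gather flushes nothing. The up-front gather still flushes its whole range.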

Ben verified that this fixes his problem.

Reported-bisected-and-tested-by: Ben Tebulin
Build-testing-by: Stephen Rothwell
Build-testing-by: Richard Weinberger
Reviewed-by: Michal Hocko
Acked-by: Peter Zijlstra
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds

Linus Torvalds
2013-08-16 23:52:46 +0800

04 Jul, 2013

4 commits

266b7a021 fs/exec.c:de_thread: mt-exec should update ->real_start_time

924b42d5 ("Use boot based time for process start time and boot time in
/proc") updated copy_process/do_task_stat but forgot about de_thread().
This breaks "ps axOT" if a sub-thread execs.

Note: I think that task->start_time should die.

Signed-off-by: Oleg Nesterov
Cc: "Eric W. Biederman"
Acked-by: John Stultz
Cc: Tomas Janousek
Cc: Tomas Smetana
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-07-04 07:08:03 +0800
bd9d43f47 fs/exec.c: do_execve_common(): use current_user()

Trivial cleanup. do_execve_common() can use current_user() and avoid the
unnecessary "struct cred *cred" var.

Signed-off-by: Oleg Nesterov
Cc: Vasiliy Kulikov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-07-04 07:08:03 +0800
3f4185483 fs/exec.c:de_thread(): use change_pid() rather than detach_pid/attach_pid

de_thread() can use change_pid() instead of detach + attach. This looks
better and this ensures that, say, next_thread() can never see a task with
->pid == NULL.

Signed-off-by: Oleg Nesterov
Acked-by: "Eric W. Biederman"
Cc: Michal Hocko
Cc: Pavel Emelyanov
Cc: Sergey Dyasly
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-07-04 07:08:02 +0800
790eac564 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull second set of VFS changes from Al Viro:
"Assorted f_pos race fixes, making do_splice_direct() safe to call with
i_mutex on parent, O_TMPFILE support, Jeff's locks.c series,
->d_hash/->d_compare calling conventions changes from Linus, misc
stuff all over the place."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
Document ->tmpfile()
ext4: ->tmpfile() support
vfs: export lseek_execute() to modules
lseek_execute() doesn't need an inode passed to it
block_dev: switch to fixed_size_llseek()
cpqphp_sysfs: switch to fixed_size_llseek()
tile-srom: switch to fixed_size_llseek()
proc_powerpc: switch to fixed_size_llseek()
ubi/cdev: switch to fixed_size_llseek()
pci/proc: switch to fixed_size_llseek()
isapnp: switch to fixed_size_llseek()
lpfc: switch to fixed_size_llseek()
locks: give the blocked_hash its own spinlock
locks: add a new "lm_owner_key" lock operation
locks: turn the blocked_list into a hashtable
locks: convert fl_link to a hlist_node
locks: avoid taking global lock if possible when waking up blocked waiters
locks: protect most of the file_lock handling with i_lock
locks: encapsulate the fl_link list handling
locks: make "added" in __posix_lock_file a bool
...

Linus Torvalds
2013-07-04 00:10:19 +0800

29 Jun, 2013

1 commit

f9652e10c allow build_open_flags() to return an error

Signed-off-by: Al Viro

Al Viro
2013-06-29 16:57:09 +0800

26 Jun, 2013

1 commit

2976b10f0 perf: Disable monitoring on setuid processes for regular users

There was a bug in setup_new_exec(), whereby
the test to disable perf monitoring was not
correct because the new credentials for the
process were not yet committed and therefore
the get_dumpable() test was never firing.

The patch fixes the problem by moving the
perf_event test to after the credentials
are committed.
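The ordering bug reduces to a few lines of model code. In this hypothetical model, dumpable changes only when the prepared credentials are committed, as with commit_creds() in the kernel, and the perf check reads it. The model assumes fs/suid_dumpable=0, so a setuid exec becomes non-dumpable. A check that runs before the commit sees the stale value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; dumpable uses the SUID_DUMP_* numbering
 * (1 = SUID_DUMP_USER, 0 = SUID_DUMP_DISABLE). */
struct task_model {
	int dumpable;
	bool setuid_exec;   /* the prepared credentials are for a setuid binary */
	bool perf_disabled;
};

/* Committing the credentials is where dumpable changes
 * (assuming suid_dumpable == 0). */
static void commit_creds_model(struct task_model *t)
{
	if (t->setuid_exec)
		t->dumpable = 0;
}

static void perf_check(struct task_model *t)
{
	assert(t->dumpable >= 0);
	if (t->dumpable != 1)
		t->perf_disabled = true;
}

/* Buggy order: test first, commit later, so the test sees stale state. */
static void exec_check_then_commit(struct task_model *t)
{
	perf_check(t);
	commit_creds_model(t);
}

/* Fixed order: commit the credentials, then test. */
static void exec_commit_then_check(struct task_model *t)
{
	commit_creds_model(t);
	perf_check(t);
}
```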

Signed-off-by: Stephane Eranian
Tested-by: Jiri Olsa
Acked-by: Peter Zijlstra
Cc:
Signed-off-by: Ingo Molnar

Stephane Eranian
2013-06-26 17:40:18 +0800

02 May, 2013

1 commit

20b4fb485 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull VFS updates from Al Viro:

Misc cleanups all over the place, mainly wrt /proc interfaces (switch
create_proc_entry to proc_create(), get rid of the deprecated
create_proc_read_entry() in favor of using proc_create_data() and
seq_file etc).

7kloc removed.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
don't bother with deferred freeing of fdtables
proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
proc: Make the PROC_I() and PDE() macros internal to procfs
proc: Supply a function to remove a proc entry by PDE
take cgroup_open() and cpuset_open() to fs/proc/base.c
ppc: Clean up scanlog
ppc: Clean up rtas_flash driver somewhat
hostap: proc: Use remove_proc_subtree()
drm: proc: Use remove_proc_subtree()
drm: proc: Use minor->index to label things, not PDE->name
drm: Constify drm_proc_list[]
zoran: Don't print proc_dir_entry data in debug
reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
proc: Supply an accessor for getting the data from a PDE's parent
airo: Use remove_proc_subtree()
rtl8192u: Don't need to save device proc dir PDE
rtl8187se: Use a dir under /proc/net/r8180/
proc: Add proc_mkdir_data()
proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
proc: Move PDE_NET() to fs/proc/proc_net.c
...

Linus Torvalds
2013-05-02 08:51:54 +0800

01 May, 2013

2 commits

e56fb2874 exec: do not abuse ->cred_guard_mutex in threadgroup_lock()

threadgroup_lock() takes signal->cred_guard_mutex to ensure that
thread_group_leader() is stable. This doesn't look nice, the scope of
this lock in do_execve() is huge.

And as Dave pointed out this can lead to deadlock, we have the
following dependencies:

do_execve: cred_guard_mutex -> i_mutex
cgroup_mount: i_mutex -> cgroup_mutex
attach_task_by_pid: cgroup_mutex -> cred_guard_mutex

Change de_thread() to take threadgroup_change_begin() around the
switch-the-leader code and change threadgroup_lock() to avoid
->cred_guard_mutex.

Note that de_thread() can't sleep with ->group_rwsem held, this can
obviously deadlock with the exiting leader if the writer is active, so it
does threadgroup_change_end() before schedule().
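The deadlock is a cycle in the lock-order graph. The toy checker below records "a then b" edges and looks for a cycle with a depth-first search. It is in the spirit of lockdep but far simpler, and entirely hypothetical. The three dependencies listed above close such a cycle.

```c
#include <assert.h>
#include <stdbool.h>

enum { CRED_GUARD, I_MUTEX, CGROUP_MUTEX, NLOCKS };

/* order[a][b] is true when some path takes lock a and then lock b. */
static bool order[NLOCKS][NLOCKS];

static void add_order(int first, int then)
{
	assert(first >= 0 && first < NLOCKS && then >= 0 && then < NLOCKS);
	order[first][then] = true;
}

/* state: 0 = unvisited, 1 = on the current DFS path, 2 = finished.
 * Reaching a node that is still on the path means a cycle. */
static bool dfs_cycle(int n, int *state)
{
	state[n] = 1;
	for (int m = 0; m < NLOCKS; m++) {
		if (!order[n][m])
			continue;
		if (state[m] == 1)
			return true;
		if (state[m] == 0 && dfs_cycle(m, state))
			return true;
	}
	state[n] = 2;
	return false;
}

static bool has_lock_cycle(void)
{
	int state[NLOCKS] = { 0 };

	for (int n = 0; n < NLOCKS; n++)
		if (state[n] == 0 && dfs_cycle(n, state))
			return true;
	return false;
}
```

Removing any one of the three edges breaks the cycle.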

Reported-by: Dave Jones
Acked-by: Tejun Heo
Acked-by: Li Zefan
Signed-off-by: Oleg Nesterov
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-05-01 08:04:07 +0800
12eaaf309 set_task_comm: kill the pointless memset() + wmb()

set_task_comm() does memset() + wmb() before strlcpy(). This buys
nothing and to add to the confusion, the comment is wrong.

- We do not need memset() to be "safe from non-terminating string
reads", the final char is always zero and we never change it.

- wmb() is paired with nothing; it cannot prevent printing a mixture
of the old/new data unless the reader takes the lock.
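This sketch shows why the memset() is unnecessary. It assumes a 16-byte comm buffer (TASK_COMM_LEN in the kernel) and a hypothetical comm_set() helper standing in for the strlcpy() call. The buffer is zeroed once, and later copies only ever store '\0' in its last byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define COMM_LEN 16 /* assumed to match TASK_COMM_LEN */

/* Bounded copy: at most COMM_LEN - 1 characters plus a terminator, so
 * buf[COMM_LEN - 1] is only ever written with '\0' and a reader can never
 * run off the end. A concurrent reader may still see a mix of old and new
 * characters unless both sides take a lock. */
static void comm_set(char buf[COMM_LEN], const char *name)
{
	size_t len;

	assert(name != NULL);
	len = strlen(name);
	if (len > COMM_LEN - 1)
		len = COMM_LEN - 1;
	memcpy(buf, name, len);
	buf[len] = '\0';
}
```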

Signed-off-by: Oleg Nesterov
Cc: Andi Kleen
Cc: John Stultz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-05-01 08:04:07 +0800

30 Apr, 2013

2 commits

6ee8630e0 mm: allow arch code to control the user page table ceiling

On architectures where a pgd entry may be shared between user and kernel
(e.g. ARM+LPAE), freeing page tables needs a ceiling other than 0.
This patch introduces a generic USER_PGTABLES_CEILING that arch code can
override. It is the responsibility of the arch code setting the ceiling
to ensure the complete freeing of the page tables (usually in
pgd_free()).
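The override follows the usual #ifndef default pattern. The sketch assumes a generic default of 0 and shows the unsigned wrap-around comparison that makes 0 mean "no ceiling". table_below_ceiling() is a hypothetical helper, not a kernel function. An architecture such as ARM+LPAE would be expected to define its own USER_PGTABLES_CEILING before this default is reached.

```c
#include <assert.h>
#include <stdbool.h>

/* An architecture header included earlier may provide its own value. */
#ifndef USER_PGTABLES_CEILING
#define USER_PGTABLES_CEILING 0UL /* generic default: no ceiling */
#endif

/* Hypothetical helper: may a page table ending at 'end' be freed under
 * 'ceiling'? With unsigned arithmetic, ceiling - 1 wraps to ULONG_MAX
 * when ceiling is 0, so 0 means "up to the top of the address space". */
static bool table_below_ceiling(unsigned long end, unsigned long ceiling)
{
	assert(end != 0);
	return end - 1 <= ceiling - 1;
}
```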

[catalin.marinas@arm.com: commit log; shift_arg_pages(), asm-generic/pgtables.h changes]
Signed-off-by: Hugh Dickins
Signed-off-by: Catalin Marinas
Cc: Russell King
Cc: [3.3+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2013-04-30 06:54:34 +0800
3dc20cb28 new helper: read_code()

switch binfmts that use ->read() to that (and to kernel_read()
in several cases in binfmt_flat - sure, it's nommu, but still,
doing ->read() into kmalloc'ed buffer...)

Signed-off-by: Al Viro

Al Viro
2013-04-30 03:40:23 +0800

28 Feb, 2013

1 commit

e579d2c25 coredump: remove redundant defines for dumpable states

The existing SUID_DUMP_* defines duplicate the newer SUID_DUMPABLE_*
defines introduced in 54b501992dd2 ("coredump: warn about unsafe
suid_dumpable / core_pattern combo"). Remove the new ones, and use the
prior values instead.

Signed-off-by: Kees Cook
Reported-by: Chen Gang
Cc: Alexander Viro
Cc: Alan Cox
Cc: "Eric W. Biederman"
Cc: Doug Ledford
Cc: Serge Hallyn
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2013-02-28 11:10:11 +0800

26 Feb, 2013

1 commit

9cc64ceaa fs/exec.c: make bprm_mm_init() static

There is only one user of bprm_mm_init, and it's inside the same file.

Signed-off-by: Yuanhan Liu
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Yuanhan Liu
2013-02-26 15:46:13 +0800

23 Feb, 2013

1 commit

496ad9aa8 new helper: file_inode(file)

Signed-off-by: Al Viro

Al Viro
2013-02-23 12:31:31 +0800

12 Jan, 2013

1 commit

6d92d4f6a fs/exec.c: work around icc miscompilation

The tricky problem is this check:

if (i++ >= max)

icc (mis)optimizes this check as:

if (++i > max)

The check now becomes a no-op since max is MAX_ARG_STRINGS (0x7FFFFFFF).

This is "allowed" by the C standard, assuming i++ never overflows,
because signed integer overflow is undefined behavior. This
optimization effectively reverts the previous commit 362e6663ef23
("exec.c, compat.c: fix count(), compat_count() bounds checking") that
tries to fix the check.

This patch simply moves ++ after the check.
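In practice the fix orders the comparison before the increment. Below is a minimal sketch that assumes a plain NULL-terminated array instead of the kernel's user-space argument pointers. count_strings() is a hypothetical stand-in for count(), and -E2BIG is assumed here as the "too many arguments" error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Count entries of a NULL-terminated array, failing once more than 'max'
 * are present. The test "i >= max" runs before "++i", so i never needs
 * to exceed max and no signed overflow (undefined behaviour) can occur,
 * which leaves the compiler nothing to "optimize" away. */
static int count_strings(const char *const *argv, int max)
{
	int i = 0;

	assert(max >= 0);
	if (argv != NULL) {
		for (;;) {
			if (argv[i] == NULL)
				break;
			if (i >= max)   /* check first ... */
				return -E2BIG;
			++i;            /* ... then increment */
		}
	}
	return i;
}
```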

Signed-off-by: Xi Wang
Cc: Jason Baron
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Xi Wang
2013-01-12 06:54:55 +0800

21 Dec, 2012

3 commits

4c9a44aeb Merge branch 'akpm' (Andrew's patch-bomb)

Merge the rest of Andrew's patches for -rc1:
"A bunch of fixes and misc missed-out-on things.

That'll do for -rc1. I still have a batch of IPC patches which still
have a possible bug report which I'm chasing down."

* emailed patches from Andrew Morton: (25 commits)
keys: use keyring_alloc() to create module signing keyring
keys: fix unreachable code
sendfile: allows bypassing of notifier events
SGI-XP: handle non-fatal traps
fat: fix incorrect function comment
Documentation: ABI: remove testing/sysfs-devices-node
proc: fix inconsistent lock state
linux/kernel.h: fix DIV_ROUND_CLOSEST with unsigned divisors
memcg: don't register hotcpu notifier from ->css_alloc()
checkpatch: warn on uapi #includes that #include
mm: cma: WARN if freed memory is still in use
exec: do not leave bprm->interp on stack
...

Linus Torvalds
2012-12-21 12:00:43 +0800
54d46ea99 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal

Pull signal handling cleanups from Al Viro:
"sigaltstack infrastructure + conversion for x86, alpha and um,
COMPAT_SYSCALL_DEFINE infrastructure.

Note that there are several conflicts between "unify
SS_ONSTACK/SS_DISABLE definitions" and UAPI patches in mainline;
resolution is trivial - just remove definitions of SS_ONSTACK and
SS_DISABLED from arch/*/uapi/asm/signal.h; they are all identical and
include/uapi/linux/signal.h contains the unified variant."

Fixed up conflicts as per Al.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
alpha: switch to generic sigaltstack
new helpers: __save_altstack/__compat_save_altstack, switch x86 and um to those
generic compat_sys_sigaltstack()
introduce generic sys_sigaltstack(), switch x86 and um to it
new helper: compat_user_stack_pointer()
new helper: restore_altstack()
unify SS_ONSTACK/SS_DISABLE definitions
new helper: current_user_stack_pointer()
missing user_stack_pointer() instances
Bury the conditionals from kernel_thread/kernel_execve series
COMPAT_SYSCALL_DEFINE: infrastructure

Linus Torvalds
2012-12-21 10:05:28 +0800
b66c59840 exec: do not leave bprm->interp on stack

If a series of scripts are executed, each triggering module loading via
unprintable bytes in the script header, kernel stack contents can leak
into the command line.

Normally execution of binfmt_script and binfmt_misc happens recursively.
However, when modules are enabled, and unprintable bytes exist in the
bprm->buf, execution will restart after attempting to load matching
binfmt modules. Unfortunately, the logic in binfmt_script and
binfmt_misc does not expect to get restarted. They leave bprm->interp
pointing to their local stack. This means on restart bprm->interp is
left pointing into unused stack memory which can then be copied into the
userspace argv areas.

After additional study, it seems that both recursion and restart remain
the desirable way to handle exec with scripts, misc, and modules. As
such, we need to protect the changes to interp.

This changes the logic to require allocation for any changes to the
bprm->interp. To avoid adding a new kmalloc to every exec, the default
value is left as-is. Only when passing through binfmt_script or
binfmt_misc does an allocation take place.
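The following userspace sketch illustrates the allocation rule. The default interp points at the filename and is never freed. Any change made by a script-like handler stores a heap copy, so nothing keeps pointing into a stack frame that a restart throws away. struct interp_model and its helpers are hypothetical. The kernel helper added for this purpose is, as far as I know, bprm_change_interp(); its code is not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the two relevant struct linux_binprm fields. */
struct interp_model {
	const char *filename; /* default interp points here; not owned */
	const char *interp;   /* == filename, or a heap copy owned by us */
};

/* Point interp at a private heap copy of 'name', which may live on the
 * caller's stack. Returns 0, or -1 if the allocation fails (interp is
 * then left unchanged). */
static int change_interp_model(struct interp_model *b, const char *name)
{
	size_t len;
	char *copy;

	assert(name != NULL);
	len = strlen(name) + 1;
	copy = malloc(len);
	if (copy == NULL)
		return -1;
	memcpy(copy, name, len);
	if (b->interp != b->filename)
		free((void *)b->interp); /* drop an earlier handler's copy */
	b->interp = copy;
	return 0;
}

static void free_interp_model(struct interp_model *b)
{
	if (b->interp != b->filename)
		free((void *)b->interp);
	b->interp = b->filename;
}
```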

For a proof of concept, see DoTest.sh from:

http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/

Signed-off-by: Kees Cook
Cc: halfdog
Cc: P J P
Cc: Alexander Viro
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2012-12-21 09:40:19 +0800

20 Dec, 2012

1 commit

ae903caae Bury the conditionals from kernel_thread/kernel_execve series

All architectures have
CONFIG_GENERIC_KERNEL_THREAD
CONFIG_GENERIC_KERNEL_EXECVE
__ARCH_WANT_SYS_EXECVE
None of them have __ARCH_WANT_KERNEL_EXECVE and there are only two callers
of kernel_execve() (which is a trivial wrapper for do_execve() now) left.
Kill the conditionals and make both callers use do_execve().

Signed-off-by: Al Viro

Al Viro
2012-12-20 07:07:38 +0800

18 Dec, 2012

3 commits

848b81415 Merge branch 'akpm' (Andrew's patch-bomb)

Merge misc patches from Andrew Morton:
"Incoming:

- lots of misc stuff

- backlight tree updates

- lib/ updates

- Oleg's percpu-rwsem changes

- checkpatch

- rtc

- aoe

- more checkpoint/restart support

I still have a pile of MM stuff pending - Pekka should be merging
later today after which that is good to go. A number of other things
are twiddling thumbs awaiting maintainer merges."

* emailed patches from Andrew Morton: (180 commits)
scatterlist: don't BUG when we can trivially return a proper error.
docs: update documentation about /proc//fdinfo/ fanotify output
fs, fanotify: add @mflags field to fanotify output
docs: add documentation about /proc//fdinfo/ output
fs, notify: add procfs fdinfo helper
fs, exportfs: add exportfs_encode_inode_fh() helper
fs, exportfs: escape nil dereference if no s_export_op present
fs, epoll: add procfs fdinfo helper
fs, eventfd: add procfs fdinfo helper
procfs: add ability to plug in auxiliary fdinfo providers
tools/testing/selftests/kcmp/kcmp_test.c: print reason for failure in kcmp_test
breakpoint selftests: print failure status instead of cause make error
kcmp selftests: print fail status instead of cause make error
kcmp selftests: make run_tests fix
mem-hotplug selftests: print failure status instead of cause make error
cpu-hotplug selftests: print failure status instead of cause make error
mqueue selftests: print failure status instead of cause make error
vm selftests: print failure status instead of cause make error
ubifs: use prandom_bytes
mtd: nandsim: use prandom_bytes
...

Linus Torvalds
2012-12-18 12:58:12 +0800
d74026986 exec: use -ELOOP for max recursion depth

To avoid an explosion of request_module calls on a chain of abusive
scripts, fail maximum recursion with -ELOOP instead of -ENOEXEC. As soon
as maximum recursion depth is hit, the error will fail all the way back
up the chain, aborting immediately.

This also has the side-effect of stopping the user's shell from attempting
to re-execute the top-level file as a shell script, as seen in the
dash source:

    if (cmd != path_bshell && errno == ENOEXEC) {
        *argv-- = cmd;
        *argv = cmd = path_bshell;
        goto repeat;
    }

The above logic was designed for automatically running scripts that lack
a "#!" header, not for retrying failed recursion. On a legitimate -ENOEXEC,
things continue to behave as the shell expects.
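A minimal C sketch of the shell-side decision described above, distilled from the dash snippet. The helper name `should_retry_as_script()` is hypothetical. Only `ENOEXEC` sends the file back through the shell, so once the kernel reports `ELOOP` for an over-deep interpreter chain, the shell reports the error instead of retrying.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper modelled on the dash snippet: after execve() fails
 * with `err`, decide whether to re-run the file through /bin/sh.
 * `cmd_is_shell` mirrors the `cmd != path_bshell` guard, which keeps the
 * shell from retrying its own failed exec. */
static bool should_retry_as_script(int err, bool cmd_is_shell)
{
    if (cmd_is_shell)
        return false;
    /* ENOEXEC: no binfmt handler recognised the file, e.g. a script
     * without a "#!" line. Any other error (including ELOOP) is final. */
    return err == ENOEXEC;
}
```

With the change above, a runaway script chain now yields `ELOOP`, so `should_retry_as_script(ELOOP, false)` is false and the shell gives up immediately.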

Additionally, when tracking recursion, the binfmt handlers should not be
involved. The recursion being tracked is the depth of calls through
search_binary_handler(), so that function should be exclusively responsible
for tracking the depth.
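The sketch below is a user-space toy model of centralised depth tracking, not the kernel code. The types and the names `toy_file`, `toy_lookup()`, `toy_search_handler()` and `MAX_RECURSION` are hypothetical, and the limit of 4 is an assumption standing in for the kernel's own `BINPRM_MAX_RECURSION`. Each "handler" simply names the next file to load, only the search function touches the depth counter, and `-ELOOP` propagates straight back up the chain.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limit standing in for BINPRM_MAX_RECURSION. */
#define MAX_RECURSION 4

/* A toy "filesystem": each entry either runs natively (interp == NULL)
 * or, like a "#!" script, names the interpreter that must be loaded. */
struct toy_file {
    const char *name;
    const char *interp;
};

static const struct toy_file *toy_lookup(const struct toy_file *fs, size_t n,
                                         const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(fs[i].name, name) == 0)
            return &fs[i];
    return NULL;
}

/* The only place that reads or increments `depth`; handlers never see it.
 * Any error, including -ELOOP, is returned unchanged to every caller. */
static int toy_search_handler(const struct toy_file *fs, size_t n,
                              const char *name, unsigned depth)
{
    const struct toy_file *f;

    if (depth > MAX_RECURSION)
        return -ELOOP;
    f = toy_lookup(fs, n, name);
    if (f == NULL)
        return -ENOENT;
    if (f->interp == NULL)
        return 0;                       /* native binary: done */
    return toy_search_handler(fs, n, f->interp, depth + 1);
}
```

A file whose interpreter is itself (`{"loop", "loop"}`) fails with `-ELOOP` after a bounded number of steps, while `{"script", "sh"}` followed by a native `{"sh", NULL}` succeeds.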

Signed-off-by: Kees Cook
Cc: halfdog
Cc: P J P
Cc: Alexander Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2012-12-18 09:15:23 +0800
6a2b60b17 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull user namespace changes from Eric Biederman:
"While small, this set of changes is very significant with respect to
containers in general and user namespaces in particular. The user
space interface is now complete.

This set of changes adds support for unprivileged users to create user
namespaces and as a user namespace root to create other namespaces.
The tyranny of supporting suid root preventing unprivileged users from
using cool new kernel features is broken.
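The following is a hedged illustration of what "unprivileged users can create user namespaces" looks like from user space. It is a minimal C sketch using the real `unshare(2)` call with `CLONE_NEWUSER`, followed by a one-line write to `/proc/self/uid_map` that maps root inside the namespace to the caller's own uid. The helper names `format_uid_map()` and `enter_userns_as_root()` are hypothetical. The sketch assumes a kernel built with user namespaces and a system where unprivileged creation has not been disabled.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: build the single mapping line
 * "<inside-uid> <outside-uid> <count>\n", here mapping uid 0 inside the
 * namespace to `outer_uid`. Returns the length, or -1 if it doesn't fit. */
static int format_uid_map(char *buf, size_t len, uid_t outer_uid)
{
    int n = snprintf(buf, len, "0 %u 1\n", (unsigned)outer_uid);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Hypothetical helper: enter a new user namespace as an ordinary user and
 * become root inside it. Returns 0 on success, -1 on any failure. */
int enter_userns_as_root(void)
{
    char map[64];
    uid_t uid = getuid();   /* read before unshare(), while still mapped */
    int fd, n;

    if (unshare(CLONE_NEWUSER) != 0)
        return -1;
    n = format_uid_map(map, sizeof map, uid);
    if (n < 0)
        return -1;
    fd = open("/proc/self/uid_map", O_WRONLY);
    if (fd < 0)
        return -1;
    if (write(fd, map, (size_t)n) != n) {
        close(fd);
        return -1;
    }
    close(fd);
    return 0;
}
```

`unshare(CLONE_NEWUSER)` fails with `EINVAL` in a multithreaded process, so a real caller would do this early, before starting threads.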

This set of changes completes the work on setns, adding support for
the pid, user, mount namespaces.

This set of changes includes a bunch of basic pid namespace
cleanups/simplifications. Of particular significance is the rework of
the pid namespace cleanup so it no longer requires sending out
tendrils into all kinds of unexpected cleanup paths for operation. At
least one case of broken error handling is fixed by this cleanup.

The files under /proc//ns/ have been converted from regular files
to magic symlinks which prevents incorrect caching by the VFS,
ensuring the files always refer to the namespace the process is
currently using and ensuring that the ptrace_may_access permission
checks are always applied.

The files under /proc//ns/ have been given stable inode numbers
so it is now possible to see if different processes share the same
namespaces.
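Because each namespace file now has a stable inode number, two processes can be checked for sharing a namespace by comparing what `stat(2)` reports for their namespace files. The minimal C sketch below assumes a hypothetical helper, `same_namespace()`. It uses `stat()` rather than `lstat()` so the magic symlink is followed to the namespace inode itself. Comparing `st_dev` alongside `st_ino` is a conservative choice.

```c
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if both paths resolve to the same inode
 * (for namespace files, the same namespace), 0 if they differ, and -1 if
 * either stat() fails, e.g. because the target process has exited. */
static int same_namespace(const char *a, const char *b)
{
    struct stat sa, sb;

    /* stat() follows the magic symlink; lstat() would describe the link. */
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

For example, `same_namespace("/proc/self/ns/net", "/proc/1/ns/net")` asks whether the caller shares init's network namespace. Reading another process's namespace links is still subject to the ptrace access checks mentioned above.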

Through David Miller's net tree come changes that relax many of the
permission checks in the networking stack, allowing the user namespace
root to make real use of the networking stack. Similar changes for the
mount namespace and the pid namespace are coming through my
tree.

Two small changes to add user namespace support were committed here and
in David Miller's -net tree so that I could complete the work on the
/proc//ns/ files in this tree.

Work remains to make it safe to build user namespaces together with 9p,
afs, ceph, cifs, coda, gfs2, ncpfs, nfs, nfsd, ocfs2, and xfs, so the
Kconfig guard remains in place, preventing user namespaces from being
built when any of those filesystems are enabled.

Future design work remains to allow root users outside of the initial
user namespace to mount more than just /proc and /sys."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (38 commits)
proc: Usable inode numbers for the namespace file descriptors.
proc: Fix the namespace inode permission checks.
proc: Generalize proc inode allocation
userns: Allow unprivilged mounts of proc and sysfs
userns: For /proc/self/{uid,gid}_map derive the lower userns from the struct file
procfs: Print task uids and gids in the userns that opened the proc file
userns: Implement unshare of the user namespace
userns: Implent proc namespace operations
userns: Kill task_user_ns
userns: Make create_new_namespaces take a user_ns parameter
userns: Allow unprivileged use of setns.
userns: Allow unprivileged users to create new namespaces
userns: Allow setting a userns mapping to your current uid.
userns: Allow chown and setgid preservation
userns: Allow unprivileged users to create user namespaces.
userns: Ignore suid and sgid on binaries if the uid or gid can not be mapped
userns: fix return value on mntns_install() failure
vfs: Allow unprivileged manipulation of the mount namespace.
vfs: Only support slave subtrees across different user namespaces
vfs: Add a user namespace reference from struct mnt_namespace
...

Linus Torvalds
2012-12-18 07:44:47 +0800

29 Nov, 2012

1 commit

71613c3b8 get rid of pt_regs argument of ->load_binary()

Signed-off-by: Al Viro

Al Viro
2012-11-29 10:53:38 +0800