Eric Lee / smarc-fsl-linux-kernel

20 Jul, 2011

1 commit

1b5d783c9 consolidate BINPRM_FLAGS_ENFORCE_NONDUMP handling ... Browse Code »

new helper: would_dump(bprm, file). Checks if we are allowed to
read the file and if we are not - sets ENFORCE_NODUMP. Exported,
used in places that previously open-coded the same logics.

Signed-off-by: Al Viro

Al Viro
2011-07-20 13:43:10 +0800

07 Jul, 2011

1 commit

bcb65a797 FDPIC: Fix memory leak ... Browse Code »

The shdr4extnum variable isn't being freed in the cleanup process of
elf_fdpic_core_dump().

Signed-off-by: Davidlohr Bueso
Signed-off-by: David Howells
Signed-off-by: Linus Torvalds

Davidlohr Bueso
2011-07-07 03:15:16 +0800

01 Jun, 2010

1 commit

e30c7c3b3 binfmt_elf_fdpic: Fix clear_user() error handling ... Browse Code »

clear_user() returns the number of bytes that could not be copied rather than
an error code. So we should return -EFAULT rather than directly returning the
results.

Without this patch, positive values may be returned to elf_fdpic_map_file()
and the following error handlings do not function as expected.

1.
ret = elf_fdpic_map_file_constdisp_on_uclinux(params, file, mm);
if (ret < 0)
return ret;
2.
ret = elf_fdpic_map_file_by_direct_mmap(params, file, mm);
if (ret < 0)
return ret;

Signed-off-by: Takuya Yoshikawa
Signed-off-by: David Howells
Acked-by: Mike Frysinger
CC: Alexander Viro
CC: Andrew Morton
CC: Daisuke HATAYAMA
CC: Paul Mundt
Signed-off-by: Linus Torvalds

Takuya Yoshikawa
2010-06-01 23:11:06 +0800

28 Apr, 2010

1 commit

16a5b3c41 Remove redundant check for CONFIG_MMU ... Browse Code »

The checks for CONFIG_MMU at this location are duplicated as all the code is
located inside a #ifndef CONFIG_MMU block. So the first conditional block will
always be included while the second never will.

Signed-off-by: Christoph Egger
Signed-off-by: David Howells
Signed-off-by: Linus Torvalds

Christoph Egger
2010-04-28 00:01:26 +0800

25 Mar, 2010

1 commit

47568d4c5 FDPIC: For-loop in elf_core_vma_data_size() is incorrect ... Browse Code »

Fix an incorrect for-loop in elf_core_vma_data_size(). The advance-pointer
statement lacks an assignment:

CC fs/binfmt_elf_fdpic.o
fs/binfmt_elf_fdpic.c: In function 'elf_core_vma_data_size':
fs/binfmt_elf_fdpic.c:1593: warning: statement with no effect

Signed-off-by: David Howells
Signed-off-by: Linus Torvalds

David Howells
2010-03-25 07:43:29 +0800

08 Mar, 2010

1 commit

318ae2edc Merge branch 'for-next' into for-linus ... Browse Code »

Conflicts:
Documentation/filesystems/proc.txt
arch/arm/mach-u300/include/mach/debug-macro.S
drivers/net/qlge/qlge_ethtool.c
drivers/net/qlge/qlge_main.c
drivers/net/typhoon.c

Jiri Kosina
2010-03-08 23:55:37 +0800

07 Mar, 2010

6 commits

30736a4d4 coredump: pass mm->flags as a coredump parameter for consistency ... Browse Code »

Pass mm->flags as a coredump parameter for consistency.

---
1787 if (mm->core_state || !get_dumpable(mm)) { mmap_sem);
1789 put_cred(cred);
1790 goto fail;
1791 }
1792
[...]
1798 if (get_dumpable(mm) == 2) { /* Setuid core dump mode */ fsuid = 0; /* Dump root private */
1801 }
---

Since dumpable bits are not protected by lock, there is a chance to change
these bits between (1) and (2).

To solve this issue, this patch copies mm->flags to
coredump_params.mm_flags at the beginning of do_coredump() and uses it
instead of get_dumpable() while dumping core.

This copy is also passed to binfmt->core_dump, since elf*_core_dump() uses
dump_filter bits in mm->flags.

[akpm@linux-foundation.org: fix merge]
Signed-off-by: Masami Hiramatsu
Acked-by: Roland McGrath
Cc: Hidehiro Kawai
Cc: Oleg Nesterov
Cc: Ingo Molnar
Reviewed-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Masami Hiramatsu
2010-03-07 03:26:46 +0800
8d9032bbe elf coredump: add extended numbering support ... Browse Code »

The current ELF dumper implementation can produce broken corefiles if
program headers exceed 65535. This number is determined by the number of
vmas which the process have. In particular, some extreme programs may use
more than 65535 vmas. (If you google max_map_count, you can find some
users facing this problem.) This kind of program never be able to generate
correct coredumps.

This patch implements ``extended numbering'' that uses sh_info field of
the first section header instead of e_phnum field in order to represent
upto 4294967295 vmas.

This is supported by
AMD64-ABI(http://www.x86-64.org/documentation.html) and
Solaris(http://docs.sun.com/app/docs/doc/817-1984/).
Of course, we are preparing patches for gdb and binutils.

Signed-off-by: Daisuke HATAYAMA
Cc: "Luck, Tony"
Cc: Jeff Dike
Cc: David Howells
Cc: Greg Ungerer
Cc: Roland McGrath
Cc: Oleg Nesterov
Cc: Ingo Molnar
Cc: Alexander Viro
Cc: Andi Kleen
Cc: Alan Cox
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Daisuke HATAYAMA
2010-03-07 03:26:46 +0800
93eb211e6 elf coredump: make offset calculation process and writing process explicit ... Browse Code »

By the next patch, elf_core_dump() and elf_fdpic_core_dump() will support
extended numbering and so will produce the corefiles with section header
table in a special case.

The problem is the process of writing a file header offset of the section
header table into e_shoff field of the ELF header. ELF header is
positioned at the beginning of the corefile, while section header at the
end. So, we need to take which of the following ways:

1. Seek backward to retry writing operation for ELF header
after writing process for a whole part

2. Make offset calculation process and writing process
totally sequential

The clause 1. is not always possible: one cannot assume that file system
supports seek function. Consider the no_llseek case.

Therefore, this patch adopts the clause 2.

Signed-off-by: Daisuke HATAYAMA
Cc: "Luck, Tony"
Cc: Jeff Dike
Cc: David Howells
Cc: Greg Ungerer
Cc: Roland McGrath
Cc: Oleg Nesterov
Cc: Ingo Molnar
Cc: Alexander Viro
Cc: Andi Kleen
Cc: Alan Cox
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Daisuke HATAYAMA
2010-03-07 03:26:46 +0800
1fcccbac8 elf coredump: replace ELF_CORE_EXTRA_* macros by functions ... Browse Code »

elf_core_dump() and elf_fdpic_core_dump() use #ifdef and the corresponding
macro for hiding _multiline_ logics in functions. This patch removes
#ifdef and replaces ELF_CORE_EXTRA_* by corresponding functions. For
architectures not implemeonting ELF_CORE_EXTRA_*, we use weak functions in
order to reduce a range of modification.

This cleanup is for my next patches, but I think this cleanup itself is
worth doing regardless of my firnal purpose.

Signed-off-by: Daisuke HATAYAMA
Cc: "Luck, Tony"
Cc: Jeff Dike
Cc: David Howells
Cc: Greg Ungerer
Cc: Roland McGrath
Cc: Oleg Nesterov
Cc: Ingo Molnar
Cc: Alexander Viro
Cc: Andi Kleen
Cc: Alan Cox
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Daisuke HATAYAMA
2010-03-07 03:26:45 +0800
088e7af73 coredump: move dump_write() and dump_seek() into a header file ... Browse Code »

My next patch will replace ELF_CORE_EXTRA_* macros by functions, putting
them into other newly created *.c files. Then, each files will contain
dump_write(), where each pair of binfmt_*.c and elfcore.c should be the
same. So, this patch moves them into a header file with dump_seek().
Also, the patch deletes confusing DUMP_WRITE macros in each files.

Signed-off-by: Daisuke HATAYAMA
Cc: "Luck, Tony"
Cc: Jeff Dike
Cc: David Howells
Cc: Greg Ungerer
Cc: Roland McGrath
Cc: Oleg Nesterov
Cc: Ingo Molnar
Cc: Alexander Viro
Cc: Andi Kleen
Cc: Alan Cox
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Daisuke HATAYAMA
2010-03-07 03:26:45 +0800
05f47fda9 coredump: unify dump_seek() implementations for each binfmt_*.c ... Browse Code »

The current ELF dumper can produce broken corefiles if program headers
exceed 65535. In particular, the program in 64-bit environment often
demands more than 65535 mmaps. If you google max_map_count, then you can
find many users facing this problem.

Solaris has already dealt with this issue, and other OSes have also
adopted the same method as in Solaris. Currently, Sun's document and AMD
64 ABI include the description for the extension, where they call the
extension Extended Numbering. See Reference for further information.

I believe that linux kernel should adopt the same way as they did, so I've
written this patch.

I am also preparing for patches of GDB and binutils.

How to fix
==========

In new dumping process, there are two cases according to weather or
not the number of program headers is equal to or more than 65535.

- if less than 65535, the produced corefile format is exactly the same
as the ordinary one.

- if equal to or more than 65535, then e_phnum field is set to newly
introduced constant PN_XNUM(0xffff) and the actual number of program
headers is set to sh_info field of the section header at index 0.

Compatibility Concern
=====================

* As already mentioned in Summary, Sun and AMD64 has already adopted
this. See Reference.

* There are four combinations according to whether kernel and userland
tools are respectively modified or not. The next table summarizes
shortly for each combination.

---------------------------------------------
Original Kernel | Modified Kernel
---------------------------------------------
< 65535 | >= 65535 | < 65535 | >= 65535
-------------------------------------------------------------
Original Tools | OK | broken | OK | broken (#)
-------------------------------------------------------------
Modified Tools | OK | broken | OK | OK
-------------------------------------------------------------

Note that there is no case that `OK' changes to `broken'.

(#) Although this case remains broken, O-M behaves better than
O-O. That is, while in O-O case e_phnum field would be extremely
small due to integer overflow, in O-M case it is guaranteed to be at
least 65535 by being set to PN_XNUM(0xFFFF), much closer to the
actual correct value than the O-O case.

Test Program
============

Here is a test program mkmmaps.c that is useful to produce the
corefile with many mmaps. To use this, please take the following
steps:

$ ulimit -c unlimited
$ sysctl vm.max_map_count=70000 # default 65530 is too small
$ sysctl fs.file-max=70000
$ mkmmaps 65535

Then, the program will abort and a corefile will be generated.

If failed, there are two cases according to the error message
displayed.

* ``out of memory'' means vm.max_map_count is still smaller

* ``too many open files'' means fs.file-max is still smaller

So, please change it to a larger value, and then retry it.

mkmmaps.c
==
#include
#include
#include
#include
#include
int main(int argc, char **argv)
{
int maps_num;
if (argc < 2) {
fprintf(stderr, "mkmmaps [number of maps to be created]\n");
exit(1);
}
if (sscanf(argv[1], "%d", &maps_num) == EOF) {
perror("sscanf");
exit(2);
}
if (maps_num < 0) {
fprintf(stderr, "%d is invalid\n", maps_num);
exit(3);
}
for (; maps_num > 0; --maps_num) {
if (MAP_FAILED == mmap((void *)NULL, (size_t) 1, PROT_READ,
MAP_SHARED | MAP_ANONYMOUS, (int) -1,
(off_t) NULL)) {
perror("mmap");
exit(4);
}
}
abort();
{
char buffer[128];
sprintf(buffer, "wc -l /proc/%u/maps", getpid());
system(buffer);
}
return 0;
}

Tested on i386, ia64 and um/sys-i386.
Built on sh4 (which covers fs/binfmt_elf_fdpic.c)

References
==========

- Sun microsystems: Linker and Libraries.
Part No: 817-1984-17, September 2008.
URL: http://docs.sun.com/app/docs/doc/817-1984

- System V ABI AMD64 Architecture Processor Supplement
Draft Version 0.99., May 11, 2009.
URL: http://www.x86-64.org/

This patch:

There are three different definitions for dump_seek() functions in
binfmt_aout.c, binfmt_elf.c and binfmt_elf_fdpic.c, respectively. The
only for binfmt_elf.c.

My next patch will move dump_seek() into a header file in order to share
the same implementations for dump_write() and dump_seek(). As the first
step, this patch unify these three definitions for dump_seek() by applying
the past commits that have been applied only for binfmt_elf.c.

Specifically, the modification made here is part of the following commits:

* d025c9db7f31fc0554ce7fb2dfc78d35a77f3487
* 7f14daa19ea36b200d237ad3ac5826ae25360461

This patch does not change a shape of corefiles.

Signed-off-by: Daisuke HATAYAMA
Cc: "Luck, Tony"
Cc: Jeff Dike
Cc: David Howells
Cc: Greg Ungerer
Cc: Roland McGrath
Cc: Oleg Nesterov
Cc: Ingo Molnar
Cc: Alexander Viro
Cc: Andi Kleen
Cc: Alan Cox
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Daisuke HATAYAMA
2010-03-07 03:26:45 +0800

09 Feb, 2010

1 commit

3ad2f3fbb tree-wide: Assorted spelling fixes ... Browse Code »

In particular, several occurances of funny versions of 'success',
'unknown', 'therefore', 'acknowledge', 'argument', 'achieve', 'address',
'beginning', 'desirable', 'separate' and 'necessary' are fixed.

Signed-off-by: Daniel Mack
Cc: Joe Perches
Cc: Junio C Hamano
Signed-off-by: Jiri Kosina

Daniel Mack
2010-02-09 18:13:56 +0800

30 Jan, 2010

1 commit

221af7f87 Split 'flush_old_exec' into two functions ... Browse Code »

'flush_old_exec()' is the point of no return when doing an execve(), and
it is pretty badly misnamed. It doesn't just flush the old executable
environment, it also starts up the new one.

Which is very inconvenient for things like setting up the new
personality, because we want the new personality to affect the starting
of the new environment, but at the same time we do _not_ want the new
personality to take effect if flushing the old one fails.

As a result, the x86-64 '32-bit' personality is actually done using this
insane "I'm going to change the ABI, but I haven't done it yet" bit
(TIF_ABI_PENDING), with SET_PERSONALITY() not actually setting the
personality, but just the "pending" bit, so that "flush_thread()" can do
the actual personality magic.

This patch in no way changes any of that insanity, but it does split the
'flush_old_exec()' function up into a preparatory part that can fail
(still called flush_old_exec()), and a new part that will actually set
up the new exec environment (setup_new_exec()). All callers are changed
to trivially comply with the new world order.

Signed-off-by: H. Peter Anvin
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds

Linus Torvalds
2010-01-30 00:22:01 +0800

07 Jan, 2010

1 commit

04e4f2b18 FDPIC: Respect PT_GNU_STACK exec protection markings when creating NOMMU stack ... Browse Code »

The current code will load the stack size and protection markings, but
then only use the markings in the MMU code path. The NOMMU code path
always passes PROT_EXEC to the mmap() call. While this doesn't matter
to most people whilst the code is running, it will cause a pointless
icache flush when starting every FDPIC application. Typically this
icache flush will be of a region on the order of 128KB in size, or may
be the entire icache, depending on the facilities available on the CPU.

In the case where the arch default behaviour seems to be desired
(EXSTACK_DEFAULT), we probe VM_STACK_FLAGS for VM_EXEC to determine
whether we should be setting PROT_EXEC or not.

For arches that support an MPU (Memory Protection Unit - an MMU without
the virtual mapping capability), setting PROT_EXEC or not will make an
important difference.

It should be noted that this change also affects the executability of
the brk region, since ELF-FDPIC has that share with the stack. However,
this is probably irrelevant as NOMMU programs aren't likely to use the
brk region, preferring instead allocation via mmap().

Signed-off-by: Mike Frysinger
Signed-off-by: David Howells
Signed-off-by: Linus Torvalds

Mike Frysinger
2010-01-07 10:16:02 +0800

04 Jan, 2010

1 commit

2f48912d1 binfmt_elf_fdpic: Fix build breakage introduced by coredump changes. ... Browse Code »

Commit f6151dfea21496d43dbaba32cfcd9c9f404769bc introduces build
breakage, so this patch fixes it together with some printk formatting
cleanup.

Signed-off-by: Daisuke HATAYAMA
Signed-off-by: Paul Mundt

Daisuke HATAYAMA
2010-01-04 14:42:14 +0800

18 Dec, 2009

1 commit

f6151dfea mm: introduce coredump parameter structure ... Browse Code »

Introduce coredump parameter data structure (struct coredump_params) to
simplify binfmt->core_dump() arguments.

Signed-off-by: Masami Hiramatsu
Suggested-by: Ingo Molnar
Cc: Hidehiro Kawai
Cc: Oleg Nesterov
Cc: Roland McGrath
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Masami Hiramatsu
2009-12-18 07:45:31 +0800

16 Dec, 2009

2 commits

698ba7b5a elf: kill USE_ELF_CORE_DUMP ... Browse Code »

Currently all architectures but microblaze unconditionally define
USE_ELF_CORE_DUMP. The microblaze omission seems like an error to me, so
let's kill this ifdef and make sure we are the same everywhere.

Signed-off-by: Christoph Hellwig
Acked-by: Hugh Dickins
Cc:
Cc: Michal Simek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2009-12-16 23:20:12 +0800
ea6376395 nommu: fix malloc performance by adding uninitialized flag ... Browse Code »

The NOMMU code currently clears all anonymous mmapped memory. While this
is what we want in the default case, all memory allocation from userspace
under NOMMU has to go through this interface, including malloc() which is
allowed to return uninitialized memory. This can easily be a significant
performance penalty. So for constrained embedded systems were security is
irrelevant, allow people to avoid clearing memory unnecessarily.

This also alters the ELF-FDPIC binfmt such that it obtains uninitialised
memory for the brk and stack region.

Signed-off-by: Jie Zhang
Signed-off-by: Robin Getz
Signed-off-by: Mike Frysinger
Signed-off-by: David Howells
Acked-by: Paul Mundt
Acked-by: Greg Ungerer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jie Zhang
2009-12-16 00:53:24 +0800

24 Sep, 2009

1 commit

8e8b63a68 fdpic: ignore the loader's PT_GNU_STACK when calculating the stack size ... Browse Code »

Ignore the loader's PT_GNU_STACK when calculating the stack size, and only
consider the executable's PT_GNU_STACK, assuming the executable has one.

Currently the behaviour is to take the largest stack size and use that,
but that means you can't reduce the stack size in the executable. The
loader's stack size should probably only be used when executing the loader
directly.

WARNING: This patch is slightly dangerous - it may render a system
inoperable if the loader's stack size is larger than that of important
executables, and the system relies unknowingly on this increasing the size
of the stack.

Signed-off-by: David Howells
Signed-off-by: Mike Frysinger
Acked-by: Paul Mundt
Cc: Pavel Machek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2009-09-24 22:21:02 +0800

22 Sep, 2009

1 commit

f3e8fccd0 mm: add get_dump_page ... Browse Code »

In preparation for the next patch, add a simple get_dump_page(addr)
interface for the CONFIG_ELF_CORE dumpers to use, instead of calling
get_user_pages() directly. They're not interested in errors: they
just want to use holes as much as possible, to save space and make
sure that the data is aligned where the headers said it would be.

Oh, and don't use that horrid DUMP_SEEK(off) macro!

Signed-off-by: Hugh Dickins
Acked-by: Rik van Riel
Cc: KAMEZAWA Hiroyuki
Cc: KOSAKI Motohiro
Cc: Nick Piggin
Cc: Mel Gorman
Cc: Minchan Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2009-09-22 22:17:40 +0800

19 Jun, 2009

1 commit

3b34fc588 elf_core_dump: use rcu_read_lock() to access ->real_parent ... Browse Code »

In theory it is not safe to dereference ->parent/real_parent without
tasklist or rcu lock, we can race with re-parenting.

Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Cc: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2009-06-19 04:03:52 +0800

03 May, 2009

1 commit

0ae05fb25 ptrace: s/parent/real_parent/ in binfmt_elf_fdpic.c ... Browse Code »

->real_parent is the parent. ->parent may be the tracer.

Signed-off-by: Oleg Nesterov
Acked-by: David Howells
Acked-by: Roland McGrath
Cc: Greg Ungerer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2009-05-03 06:36:10 +0800

03 Apr, 2009

1 commit

ab4ad5551 bin_elf_fdpic: check the return value of clear_user ... Browse Code »

Signed-off-by: Mike Frysinger
Signed-off-by: Bryan Wu
Cc: David Howells
Cc: Greg Ungerer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mike Frysinger
2009-04-03 10:05:01 +0800

08 Jan, 2009

2 commits

f4bbf5105 FDPIC: Don't attempt to expand the userspace stack to fill the space allocated ... Browse Code »

Stop the ELF-FDPIC binfmt from attempting to expand the userspace stack and brk
segments to fill the space actually allocated for it. The space allocated may
be rounded up by mmap(), and may be wasted.

However, finding out how much space we actually obtained uses the contentious
kobjsize() function which we'd like to get rid of as it doesn't necessarily
work for all slab allocators.

Signed-off-by: David Howells
Tested-by: Mike Frysinger
Acked-by: Paul Mundt

David Howells
2009-01-08 20:04:47 +0800
8feae1311 NOMMU: Make VMAs per MM as for MMU-mode linux ... Browse Code »

Make VMAs per mm_struct as for MMU-mode linux. This solves two problems:

(1) In SYSV SHM where nattch for a segment does not reflect the number of
shmat's (and forks) done.

(2) In mmap() where the VMA's vm_mm is set to point to the parent mm by an
exec'ing process when VM_EXECUTABLE is specified, regardless of the fact
that a VMA might be shared and already have its vm_mm assigned to another
process or a dead process.

A new struct (vm_region) is introduced to track a mapped region and to remember
the circumstances under which it may be shared and the vm_list_struct structure
is discarded as it's no longer required.

This patch makes the following additional changes:

(1) Regions are now allocated with alloc_pages() rather than kmalloc() and
with no recourse to __GFP_COMP, so the pages are not composite. Instead,
each page has a reference on it held by the region. Anything else that is
interested in such a page will have to get a reference on it to retain it.
When the pages are released due to unmapping, each page is passed to
put_page() and will be freed when the page usage count reaches zero.

(2) Excess pages are trimmed after an allocation as the allocation must be
made as a power-of-2 quantity of pages.

(3) VMAs are added to the parent MM's R/B tree and mmap lists. As an MM may
end up with overlapping VMAs within the tree, the VMA struct address is
appended to the sort key.

(4) Non-anonymous VMAs are now added to the backing inode's prio list.

(5) Holes may be punched in anonymous VMAs with munmap(), releasing parts of
the backing region. The VMA and region structs will be split if
necessary.

(6) sys_shmdt() only releases one attachment to a SYSV IPC shared memory
segment instead of all the attachments at that addresss. Multiple
shmat()'s return the same address under NOMMU-mode instead of different
virtual addresses as under MMU-mode.

(7) Core dumping for ELF-FDPIC requires fewer exceptions for NOMMU-mode.

(8) /proc/maps is now the global list of mapped regions, and may list bits
that aren't actually mapped anywhere.

(9) /proc/meminfo gains a line (tagged "MmapCopy") that indicates the amount
of RAM currently allocated by mmap to hold mappable regions that can't be
mapped directly. These are copies of the backing device or file if not
anonymous.

These changes make NOMMU mode more similar to MMU mode. The downside is that
NOMMU mode requires some extra memory to track things over NOMMU without this
patch (VMAs are no longer shared, and there are now region structs).

Signed-off-by: David Howells
Tested-by: Mike Frysinger
Acked-by: Paul Mundt

David Howells
2009-01-08 20:04:47 +0800

14 Nov, 2008

5 commits

a6f76f23d CRED: Make execve() take advantage of copy-on-write credentials ... Browse Code »

Make execve() take advantage of copy-on-write credentials, allowing it to set
up the credentials in advance, and then commit the whole lot after the point
of no return.

This patch and the preceding patches have been tested with the LTP SELinux
testsuite.

This patch makes several logical sets of alteration:

(1) execve().

The credential bits from struct linux_binprm are, for the most part,
replaced with a single credentials pointer (bprm->cred). This means that
all the creds can be calculated in advance and then applied at the point
of no return with no possibility of failure.

I would like to replace bprm->cap_effective with:

cap_isclear(bprm->cap_effective)

but this seems impossible due to special behaviour for processes of pid 1
(they always retain their parent's capability masks where normally they'd
be changed - see cap_bprm_set_creds()).

The following sequence of events now happens:

(a) At the start of do_execve, the current task's cred_exec_mutex is
locked to prevent PTRACE_ATTACH from obsoleting the calculation of
creds that we make.

(a) prepare_exec_creds() is then called to make a copy of the current
task's credentials and prepare it. This copy is then assigned to
bprm->cred.

This renders security_bprm_alloc() and security_bprm_free()
unnecessary, and so they've been removed.

(b) The determination of unsafe execution is now performed immediately
after (a) rather than later on in the code. The result is stored in
bprm->unsafe for future reference.

(c) prepare_binprm() is called, possibly multiple times.

(i) This applies the result of set[ug]id binaries to the new creds
attached to bprm->cred. Personality bit clearance is recorded,
but now deferred on the basis that the exec procedure may yet
fail.

(ii) This then calls the new security_bprm_set_creds(). This should
calculate the new LSM and capability credentials into *bprm->cred.

This folds together security_bprm_set() and parts of
security_bprm_apply_creds() (these two have been removed).
Anything that might fail must be done at this point.

(iii) bprm->cred_prepared is set to 1.

bprm->cred_prepared is 0 on the first pass of the security
calculations, and 1 on all subsequent passes. This allows SELinux
in (ii) to base its calculations only on the initial script and
not on the interpreter.

(d) flush_old_exec() is called to commit the task to execution. This
performs the following steps with regard to credentials:

(i) Clear pdeath_signal and set dumpable on certain circumstances that
may not be covered by commit_creds().

(ii) Clear any bits in current->personality that were deferred from
(c.i).

(e) install_exec_creds() [compute_creds() as was] is called to install the
new credentials. This performs the following steps with regard to
credentials:

(i) Calls security_bprm_committing_creds() to apply any security
requirements, such as flushing unauthorised files in SELinux, that
must be done before the credentials are changed.

This is made up of bits of security_bprm_apply_creds() and
security_bprm_post_apply_creds(), both of which have been removed.
This function is not allowed to fail; anything that might fail
must have been done in (c.ii).

(ii) Calls commit_creds() to apply the new credentials in a single
assignment (more or less). Possibly pdeath_signal and dumpable
should be part of struct creds.

(iii) Unlocks the task's cred_replace_mutex, thus allowing
PTRACE_ATTACH to take place.

(iv) Clears The bprm->cred pointer as the credentials it was holding
are now immutable.

(v) Calls security_bprm_committed_creds() to apply any security
alterations that must be done after the creds have been changed.
SELinux uses this to flush signals and signal handlers.

(f) If an error occurs before (d.i), bprm_free() will call abort_creds()
to destroy the proposed new credentials and will then unlock
cred_replace_mutex. No changes to the credentials will have been
made.

(2) LSM interface.

A number of functions have been changed, added or removed:

(*) security_bprm_alloc(), ->bprm_alloc_security()
(*) security_bprm_free(), ->bprm_free_security()

Removed in favour of preparing new credentials and modifying those.

(*) security_bprm_apply_creds(), ->bprm_apply_creds()
(*) security_bprm_post_apply_creds(), ->bprm_post_apply_creds()

Removed; split between security_bprm_set_creds(),
security_bprm_committing_creds() and security_bprm_committed_creds().

(*) security_bprm_set(), ->bprm_set_security()

Removed; folded into security_bprm_set_creds().

(*) security_bprm_set_creds(), ->bprm_set_creds()

New. The new credentials in bprm->creds should be checked and set up
as appropriate. bprm->cred_prepared is 0 on the first call, 1 on the
second and subsequent calls.

(*) security_bprm_committing_creds(), ->bprm_committing_creds()
(*) security_bprm_committed_creds(), ->bprm_committed_creds()

New. Apply the security effects of the new credentials. This
includes closing unauthorised files in SELinux. This function may not
fail. When the former is called, the creds haven't yet been applied
to the process; when the latter is called, they have.

The former may access bprm->cred, the latter may not.

(3) SELinux.

SELinux has a number of changes, in addition to those to support the LSM
interface changes mentioned above:

(a) The bprm_security_struct struct has been removed in favour of using
the credentials-under-construction approach.

(c) flush_unauthorized_files() now takes a cred pointer and passes it on
to inode_has_perm(), file_has_perm() and dentry_open().

Signed-off-by: David Howells
Acked-by: James Morris
Acked-by: Serge Hallyn
Signed-off-by: James Morris

David Howells
2008-11-14 07:39:24 +0800
c69e8d9c0 CRED: Use RCU to access another task's creds and to release a task's own creds ... Browse Code »

Use RCU to access another task's creds and to release a task's own creds.
This means that it will be possible for the credentials of a task to be
replaced without another task (a) requiring a full lock to read them, and (b)
seeing deallocated memory.

Signed-off-by: David Howells
Acked-by: James Morris
Acked-by: Serge Hallyn
Signed-off-by: James Morris

David Howells
2008-11-14 07:39:19 +0800
86a264abe CRED: Wrap current->cred and a few other accessors ... Browse Code »

Wrap current->cred and a few other accessors to hide their actual
implementation.

Signed-off-by: David Howells
Acked-by: James Morris
Acked-by: Serge Hallyn
Signed-off-by: James Morris

David Howells
2008-11-14 07:39:18 +0800
b6dff3ec5 CRED: Separate task security context from task_struct ... Browse Code »

Separate the task security context from task_struct. At this point, the
security data is temporarily embedded in the task_struct with two pointers
pointing to it.

Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in
entry.S via asm-offsets.

With comment fixes Signed-off-by: Marc Dionne

Signed-off-by: David Howells
Acked-by: James Morris
Acked-by: Serge Hallyn
Signed-off-by: James Morris

David Howells
2008-11-14 07:39:16 +0800
da9592ede CRED: Wrap task credential accesses in the filesystem subsystem ... Browse Code »

Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells
Reviewed-by: James Morris
Acked-by: Serge Hallyn
Cc: Al Viro
Signed-off-by: James Morris

David Howells
2008-11-14 07:39:05 +0800

21 Oct, 2008

1 commit

2515ddc6d binfmt_elf_fdpic: Update for cputime changes. ... Browse Code »

Commit f06febc96ba8e0af80bcc3eaec0a109e88275fac ("timers: fix itimer/
many thread hang") introduced a new task_cputime interface and
subsequently only converted binfmt_elf over to it. This results in the
build for binfmt_elf_fdpic blowing up given that p->signal->{u,s}time
have disappeared from underneath us.

Apply the same trivial fix from binfmt_elf to binfmt_elf_fdpic.

Signed-off-by: Paul Mundt
Signed-off-by: Linus Torvalds

Paul Mundt
2008-10-21 11:17:18 +0800

17 Oct, 2008

3 commits

5edc2a512 binfmt_elf_fdpic: wire up AT_EXECFD, AT_EXECFN, AT_SECURE ... Browse Code »

These auxvec entries are the only ones left unhandled out of the current
base implementation. This syncs up binfmt_elf_fdpic with linux/auxvec.h
and current binfmt_elf.

Signed-off-by: Paul Mundt
Acked-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Mundt
2008-10-17 02:21:46 +0800
c7637941d binfmt_elf_fdpic: convert initial stack alignment to arch_align_stack() ... Browse Code »

binfmt_elf_fdpic seems to have grabbed a hard-coded hack from an ancient
version of binfmt_elf in order to try and fix up initial stack alignment
on multi-threaded x86, which while in addition to being unused, was also
pushed down beyond the first set of operations on the stack pointer,
negating the entire purpose.

These days, we have an architecture independent arch_align_stack(), so we
switch to using that instead. Move the initial alignment up before the
initial stores while we're at it.

Signed-off-by: Paul Mundt
Signed-off-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Mundt
2008-10-17 02:21:46 +0800
ec23847d6 binfmt_elf_fdpic: support auxvec base platform string ... Browse Code »

Commit 483fad1c3fa1060d7e6710e84a065ad514571739 ("ELF loader support for
auxvec base platform string") introduced AT_BASE_PLATFORM, but only
implemented it for binfmt_elf.

Given that AT_VECTOR_SIZE_BASE is unconditionally enlarged for us, and
it's only optionally added in for the platforms that set
ELF_BASE_PLATFORM, wire it up for binfmt_elf_fdpic, too.

Signed-off-by: Paul Mundt
Acked-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Mundt
2008-10-17 02:21:46 +0800

28 Jul, 2008

1 commit

9b14ec35f binfmt_elf_fdpic: Magical stack pointer index, for NEW_AUX_ENT compat. ... Browse Code »

While implementing binfmt_elf_fdpic on SH it quickly became apparent
that SH was the first platform to support both binfmt_elf_fdpic and
binfmt_elf, as well as the only of the FDPIC platforms to make use of the
auxvt.

Currently binfmt_elf_fdpic uses a special version of NEW_AUX_ENT() where
the first argument is the entry displacement after csp has been adjusted,
being reset after each adjustment. As we have no ability to sort this out
through the platform's ARCH_DLINFO, this index needs to be managed
entirely in create_elf_fdpic_tables(). Presently none of the platforms
that set their own auxvt entries are able to do so through their
respective ARCH_DLINFOs when using binfmt_elf_fdpic.

In addition to this, binfmt_elf_fdpic has been looking at
DLINFO_ARCH_ITEMS for the number of architecture-specific entries in the
auxvt. This is legacy cruft, and is not defined by any platforms in-tree,
even those that make heavy use of the auxvt. AT_VECTOR_SIZE_ARCH is
always available, and contains the number that is of interest here, so we
switch to using that unconditionally as well.

As this has direct bearing on how much stack is used, platforms that have
configurable (or dynamically adjustable) NEW_AUX_ENT calls need to either
make AT_VECTOR_SIZE_ARCH more fine-grained, or leave it as a worst-case
and live with some lost stack space if those entries aren't pushed (some
platforms may also need to purposely sacrifice some space here for
alignment considerations, as noted in the code -- although not an issue
for any FDPIC-capable platform today).

Signed-off-by: Paul Mundt
Acked-by: David Howells

Paul Mundt
2008-07-28 17:10:28 +0800

27 Jul, 2008

1 commit

6341c393f tracehook: exec ... Browse Code »

This moves all the ptrace hooks related to exec into tracehook.h inlines.

This also lifts the calls for tracing out of the binfmt load_binary hooks
into search_binary_handler() after it calls into the binfmt module. This
change has no effect, since all the binfmt modules' load_binary functions
did the call at the end on success, and now search_binary_handler() does
it immediately after return if successful. We consolidate the repeated
code, and binfmt modules no longer need to import ptrace_notify().

Signed-off-by: Roland McGrath
Cc: Oleg Nesterov
Reviewed-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Roland McGrath
2008-07-27 03:00:08 +0800

26 Jul, 2008

2 commits

182c515fd coredump: elf_fdpic_core_dump: use core_state->dumper list ... Browse Code »

Kill the nasty rcu_read_lock() + do_each_thread() loop, use the list
encoded in mm->core_state instead, s/GFP_ATOMIC/GFP_KERNEL/.

This patch allows futher cleanups in binfmt_elf_fdpic.c.

Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Cc: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-07-26 01:53:40 +0800
24d5288f0 coredump: elf_core_dump: skip kernel threads ... Browse Code »

linux_binfmt->core_dump() runs before the process does exit_aio(), this
means that we can hit the kernel thread which shares the same ->mm.
Afaics, nothing really bad can happen, but perhaps it makes sense to fix
this minor bug.

It is sad we have to iterate over all threads in system and use
GFP_ATOMIC. Hopefully we can kill theses ugly do_each_thread()s, but this
needs some nontrivial changes in mm_struct and do_coredump.

Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-07-26 01:53:39 +0800

07 Jun, 2008

1 commit

d100d148a nommu: fix ksize() abuse ... Browse Code »

The nommu binfmt code uses ksize() for pointers returned from do_mmap()
which is wrong. This converts the call-sites to use the nommu specific
kobjsize() function which works as expected.

Cc: Christoph Lameter
Cc: Matt Mackall
Acked-by: Paul Mundt
Acked-by: David Howells
Signed-off-by: Pekka Enberg
Acked-by: Greg Ungerer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pekka Enberg
2008-06-07 02:29:13 +0800