Eric Lee / smarc-fsl-linux-kernel

17 Oct, 2007

1 commit

557ed1fa2 remove ZERO_PAGE

The commit b5810039a54e5babf428e9a1e89fc1940fabff11 contains the note

A last caveat: the ZERO_PAGE is now refcounted and managed with rmap
(and thus mapcounted and count towards shared rss). These writes to
the struct page could cause excessive cacheline bouncing on big
systems. There are a number of ways this could be addressed if it is
an issue.

And indeed this cacheline bouncing has shown up on large SGI systems.
There was a situation where an Altix system was essentially livelocked
tearing down ZERO_PAGE pagetables when an HPC app aborted during startup.
This situation can be avoided in userspace, but it does highlight the
potential scalability problem with refcounting ZERO_PAGE, and corner
cases where it can really hurt (we don't want the system to livelock!).

There are several broad ways to fix this problem:
1. add back some special casing to avoid refcounting ZERO_PAGE
2. per-node or per-cpu ZERO_PAGES
3. remove the ZERO_PAGE completely

I will argue for 3. The others should also fix the problem, but they
result in more complex code than does 3, with little or no real benefit
that I can see.

Why? Inserting a ZERO_PAGE for anonymous read faults appears to be a
false optimisation: if an application is performance critical, it would
not be doing many read faults of new memory, or at least it could be
expected to write to that memory soon afterwards. If cache or memory use
is critical, it should not be working with a significant number of
ZERO_PAGEs anyway (a more compact representation of zeroes should be
used).

As a sanity check -- measuring on my desktop system, there are never many
mappings to the ZERO_PAGE (eg. 2 or 3), thus memory usage here should not
increase much without it.

When running a make -j4 kernel compile on my dual core system, there are
about 1,000 mappings to the ZERO_PAGE created per second, but about 1,000
ZERO_PAGE COW faults per second (less than 1 ZERO_PAGE mapping per second
is torn down without being COWed). So removing ZERO_PAGE will save 1,000
page faults per second when running kbuild, while keeping it only saves
less than 1 page clearing operation per second. 1 page clear is cheaper
than a thousand faults, presumably, so there isn't an obvious loss.
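
For readers who want to see the access pattern being discussed, here is a
minimal user-space sketch (not part of the patch). It maps one private
anonymous page, reads it before writing, then writes to it soon afterwards.
The function name touch_anon_page() is made up for this illustration, and
the sketch only shows the read-then-write sequence; it does not measure
faults.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one private anonymous page, read it before writing (the anonymous
 * read-fault case), then write to it.
 * Returns 0 if the page first read as zero and then held the written
 * value, -1 if the mapping could not be created or the check failed.
 */
int touch_anon_page(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0);
    size_t len = (size_t)page_size;

    volatile unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    unsigned char first = p[0];   /* first touch is a read: must see zero */
    p[0] = 0x5a;                  /* the write that follows soon afterwards */
    int ok = (first == 0 && p[0] == 0x5a);

    munmap((void *)p, len);
    return ok ? 0 : -1;
}
```

With the ZERO_PAGE in place the read and the write each take a fault on this
page; without it, the read fault already installs a private zeroed page.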

Neither the logical argument nor these basic tests give a guarantee of no
regressions. However, this is a reasonable opportunity to try to remove
the ZERO_PAGE from the pagefault path. If it is found to cause regressions,
we can reintroduce it and just avoid refcounting it.

The /dev/zero ZERO_PAGE usage and TLB tricks also get nuked. I don't see
much use to them except on benchmarks. All other users of ZERO_PAGE are
converted just to use ZERO_PAGE(0) for simplicity. We can look at
replacing them all and maybe ripping out ZERO_PAGE completely when we are
more satisfied with this solution.

Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus "snif" Torvalds

Nick Piggin
2007-10-17 00:42:53 +0800

19 Sep, 2007

1 commit

e55014923 [POWERPC] spufs: Cleanup ELF coredump extra notes logic

To start with, arch_notes_size() etc. is a little too ambiguous a name for
my liking, so change the function names to be more explicit.

Calling through macros is ugly, especially with hidden parameters, so don't
do that, call the routines directly.

Use ARCH_HAVE_EXTRA_ELF_NOTES as the only flag, and based on it decide
whether we want the extern declarations or the empty versions.

Since we have empty routines, actually use them in the coredump code to
save a few #ifdefs.

We want to change the handling of foffset so that the write routine updates
foffset as it goes, instead of using file->f_pos (so that writing to a pipe
works). So pass foffset to the write routine, and for now just set it to
file->f_pos at the end of writing.

It should also be possible for the write routine to fail, so change it to
return int and treat a non-zero return as failure.
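
The calling convention described above can be sketched in user space as
follows. This is only an illustration under assumptions: struct sink and
sink_write() are hypothetical stand-ins, and size_t is used here where the
kernel code would use loff_t.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the core file; not a kernel type. */
struct sink {
    unsigned char buf[64];
    size_t cap;               /* usable bytes of buf */
};

/*
 * Write len bytes at *foffset and advance *foffset by the amount written,
 * so the caller never has to consult a separate file position.
 * Returns 0 on success and non-zero on failure.
 */
int sink_write(struct sink *s, const void *data, size_t len, size_t *foffset)
{
    assert(s->cap <= sizeof(s->buf));
    if (*foffset > s->cap || len > s->cap - *foffset)
        return -1;            /* would overflow: fail, offset untouched */
    memcpy(s->buf + *foffset, data, len);
    *foffset += len;
    return 0;
}
```

Because the offset travels with the call, the same routine works whether or
not the destination has a meaningful position of its own.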

Signed-off-by: Michael Ellerman
Signed-off-by: Jeremy Kerr
Signed-off-by: Paul Mackerras

Michael Ellerman
2007-09-19 13:12:19 +0800

22 Jul, 2007

1 commit

d4e3cc387 revert "PIE randomization"

There are reports of this causing userspace failures
(http://lkml.org/lkml/2007/7/20/421).

Revert.

Cc: Jan Kratochvil
Cc: Jiri Kosina
Cc: Ingo Molnar
Cc: Roland McGrath
Cc: Jakub Jelinek
Cc: Ulrich Kunitz
Cc: "H. Peter Anvin"
Cc: "Bret Towe"
Cc: "Luck, Tony"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2007-07-22 08:49:14 +0800

20 Jul, 2007

2 commits

a1b59e802 coredump masking: ELF: enable core dump filtering

This patch enables core dump filtering for ELF-formatted core files.

Signed-off-by: Hidehiro Kawai
Cc: Alan Cox
Cc: David Howells
Cc: Hugh Dickins
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kawai, Hidehiro
2007-07-20 01:04:47 +0800

b6a2fea39 mm: variable length argument support

Remove the arg+env limit of MAX_ARG_PAGES by copying the strings directly from
the old mm into the new mm.

We create the new mm before the binfmt code runs, and place the new stack at
the very top of the address space. Once the binfmt code runs and figures out
where the stack should be, we move it downwards.

It is a bit peculiar in that we have one task with two mm's, one of which is
inactive.

[a.p.zijlstra@chello.nl: limit stack size]
Signed-off-by: Ollie Wild
Signed-off-by: Peter Zijlstra
Cc:
Cc: Hugh Dickins
[bunk@stusta.de: unexport bprm_mm_init]
Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ollie Wild
2007-07-20 01:04:45 +0800

17 Jul, 2007

2 commits

4d3b573ad binfmt_elf warning fix

fs/binfmt_elf.c: In function 'load_elf_binary':
fs/binfmt_elf.c:1002: warning: 'interp_map_addr' may be used uninitialized in this function

The compiler (gcc-4.1.0) is correct, but it failed to notice that we didn't
use the resulting value.

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2007-07-17 00:05:47 +0800

60bfba7e8 PIE randomization

This patch is using mmap()'s randomization functionality in such a way that
it maps the main executable of (specially compiled/linked -pie/-fpie)
ET_DYN binaries onto a random address (in cases in which mmap() is allowed
to perform a randomization).

Origin of this patch is in exec-shield
(http://people.redhat.com/mingo/exec-shield/)

[jkosina@suse.cz: pie randomization: fix BAD_ADDR macro]
Signed-off-by: Jan Kratochvil
Signed-off-by: Jiri Kosina
Cc: Ingo Molnar
Cc: Roland McGrath
Cc: Jakub Jelinek
Signed-off-by: Jiri Kosina
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Kratochvil
2007-07-17 00:05:42 +0800

07 Jul, 2007

1 commit

ef7320edb Fix elf_core_dump() when writing arch specific notes (spu coredumps)

elf_core_dump() supports dumping arch specific ELF notes, via the #define
ELF_CORE_WRITE_EXTRA_NOTES. Currently the only user of this is the powerpc
spu coredump code.

There is a bug in the handling of foffset WRT the arch notes, which causes
us to erroneously increment foffset by the size of the arch notes, leaving
a block of zeroes in the file, and causing all subsequent data in the file
to be at its expected offset plus the size of the arch notes. eg:

LOAD 0x050000 0x00100000 0x00000000 0x20000 0x20000 R E 0x10000

Tells us we should have a chunk of data at 0x50000. The truth is the data
is at 0x90dbc = 0x50000 + 0x40dbc (the size of the arch notes).
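
The arithmetic in that example can be checked with a few lines of C. The
helper name misplaced_offset() is invented for this note; it simply adds the
size of the arch notes to the offset the program header advertises.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Where a segment really lands in the file when foffset has been advanced
 * once too often by the size of the arch notes.
 */
uint64_t misplaced_offset(uint64_t p_offset, uint64_t arch_notes_size)
{
    assert(p_offset <= UINT64_MAX - arch_notes_size);  /* no wrap-around */
    return p_offset + arch_notes_size;
}
```

Feeding it the p_offset from the LOAD line (0x50000) and the notes size
(0x40dbc) gives the 0x90dbc quoted above.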

This bug prevents gdb from reading the core file correctly.

The simplest fix is to simply remember the size of the arch notes, and add
it to foffset after we've written the arch notes. The only drawback is
that if the arch code doesn't write as many bytes as it said it would, we
end up with a broken core dump again. For now I think that's a reasonable
requirement.

Tested on a Cell blade, gdb no longer complains about the core file being
bogus.

While I'm here I should point out that the spu coredump code does not work
if we're dumping to a pipe - we'll have to wait for 2.6.23 to fix that.

Signed-off-by: Michael Ellerman
Acked-by: Arnd Bergmann
Acked-by: Benjamin Herrenschmidt
Acked-by: Paul Mackerras
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michael Ellerman
2007-07-07 01:23:43 +0800

09 May, 2007

3 commits

b140f2510 Invalid return value of execve() resulting in oopses

When the ELF loader fails to map the executable (due to memory shortage or
because the binary is malformed), it can return 0. Normally, this is invisible
because the process is killed with SIGKILL and never returns to user space.

But if exec() is called from a kernel thread (hotplug, whatever), the
consequences are more interesting and vary depending on the architecture.

i386. Nothing especially interesting, execve() just returns
with "success" :-)

x86_64. A fake zero frame is used on the way back to the caller, RSP/RIP
are loaded with zeros, ergo... double fault.

ia64. Similar to i386, but r32...r95 are corrupted. Sometimes it
oopses due to return to zero PC, sometimes it sees NaT in
rXX and oopses due to NaT consumption.

Signed-off-by: Alexey Kuznetsov
Signed-off-by: Kirill Korotaev
Signed-off-by: Pavel Emelianov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Kuznetsov
2007-05-09 02:15:15 +0800

7e80d0d0b i386: sched.h inclusion from module.h is baack

linux/module.h
-> linux/elf.h
-> asm-i386/elf.h
-> linux/utsname.h
-> linux/sched.h

Noticeably cut the number of files which are rebuilt upon touching sched.h
and cut down the junk pulled in by every module.h inclusion.

Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2007-05-09 02:15:08 +0800

e63340ae6 header cleaning: don't include smp_lock.h when not used

Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.

Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).

Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Randy Dunlap
2007-05-09 02:15:07 +0800

03 Apr, 2007

1 commit

032217026 [PATCH] fix page leak during core dump

When the dump cannot occur, most likely because of a full file system, and
the page to be written is the zero page, the call to page_cache_release()
is missed.

Signed-off-by: Brian Pomerantz
Cc: Hugh Dickins
Cc: Nick Piggin
Cc: David Howells
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Brian Pomerantz
2007-04-03 01:06:08 +0800

17 Mar, 2007

1 commit

d1cabd632 [PATCH] fix process crash caused by randomisation and 64k pages

This bug was seen on ppc64, but it could have occurred on any
architecture with a page size of 64k or above. The problem is that
fs/binfmt_elf.c:randomize_stack_top() randomizes the stack to within
0x7ff pages. On 4k page machines, this is 8MB; on 64k page boxes, this
is 128MB.

The problem is that the new binary layout (selected in
arch_pick_mmap_layout) places the mapping segment 128MB or the stack
rlimit away from the top of the process memory, whichever is larger. If
you chose an rlimit of less than 128MB (most defaults are in the 8MB
range) then you can end up having your entire stack randomized away.

The fix is to make randomize_stack_top() only steal at most 8MB, which this
patch does. However, I have to point out that even with this, your stack
rlimit might not be exactly what you get if it's > 128MB, because you're
still losing the random offset of up to 8MB.
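
The numbers above follow directly from the page size, as this small sketch
shows. Both helper names are invented for illustration, and the capped
variant is only one plausible way to hold the span under 8MB (shrinking the
page mask as pages grow); it is an assumption, not a copy of what the patch
does.

```c
#include <assert.h>

/* Largest offset, in bytes, when up to 0x7ff whole pages are randomized. */
unsigned long rnd_span_uncapped(unsigned int page_shift)
{
    return 0x7ffUL << page_shift;
}

/*
 * Assumed capping scheme: drop low bits of the page mask as the page size
 * grows beyond 4k, so the span in bytes stays below 8MB.
 */
unsigned long rnd_span_capped(unsigned int page_shift)
{
    assert(page_shift >= 12);   /* 4k pages or larger */
    return (0x7ffUL >> (page_shift - 12)) << page_shift;
}
```

With page_shift 12 (4k pages) both give just under 8MB; with page_shift 16
(64k pages) the uncapped span is just under 128MB while the capped one stays
below 8MB.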

The true fix should be to leave an explicit gap for the randomization plus
a buffer when determining mmap_base, but that would involve fixing all the
architectures.

Cc: Arjan van de Ven
Cc: Ingo Molnar
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

James Bottomley
2007-03-17 10:25:06 +0800

13 Feb, 2007

1 commit

9fbbd4dd1 [PATCH] x86: Don't require the vDSO for handling a.out signals

and in other strange binfmts. vDSO is not necessarily mapped there.

Signed-off-by: Andi Kleen

Andi Kleen
2007-02-13 20:26:26 +0800

27 Jan, 2007

3 commits

1fb844961 [PATCH] core-dumping unreadable binaries via PT_INTERP

Proposed patch to fix #5 in
http://www.isec.pl/vulnerabilities/isec-0017-binfmt_elf.txt
aka
http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2004-1073

To reproduce, do
* grab poc at the end of advisory.
* add line "eph.p_memsz = 4096;" after "eph.p_filesz = 4096;"
where first "4096" is something equal to or greater than 4096.
* ./poc /usr/bin/sudo && ls -l

Here I get with 2.6.20-rc5:

-rw------- 1 ad ad 102400 2007-01-15 19:17 core
---s--x--x 2 root root 101820 2007-01-15 19:15 /usr/bin/sudo

Check for MAY_READ like binfmt_misc.c does.

Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2007-01-27 05:51:00 +0800

f47aef55d [PATCH] i386 vDSO: use VM_ALWAYSDUMP

This patch fixes core dumps to include the vDSO vma, which is left out now.
It removes the special-case core writing macros, which were not doing the
right thing for the vDSO vma anyway. Instead, it uses VM_ALWAYSDUMP in the
vma; there is no need for the fixmap page to be installed. It handles the
CONFIG_COMPAT_VDSO case by making elf_core_dump use the fake vma from
get_gate_vma after real vmas in the same way the /proc/PID/maps code does.

This changes core dumps so they no longer include the non-PT_LOAD phdrs from
the vDSO. I made the change to add them in the first place, but it turned out
that nothing ever wanted them there since the advent of NT_AUXV. It's cleaner
to leave them out, and just let the phdrs inside the vDSO image speak for
themselves.

Signed-off-by: Roland McGrath
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Roland McGrath
2007-01-27 05:50:58 +0800

e5b97dde5 [PATCH] Add VM_ALWAYSDUMP

This patch adds the VM_ALWAYSDUMP flag for vm_flags in vm_area_struct. This
provides a clean explicit way to have a vma always included in core dumps, as
is needed for vDSOs.

Signed-off-by: Roland McGrath
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Roland McGrath
2007-01-27 05:50:58 +0800

07 Jan, 2007

1 commit

90cb28e8f Revert "[PATCH] binfmt_elf: randomize PIE binaries (2nd try)"

This reverts commit 59287c0913cc9a6c75712a775f6c1c1ef418ef3b.

Hugh Dickins reports that it causes random failures on x86 with SuSE
10.2, and points out

"Isn't that randomization, anywhere from 0x10000 to ELF_ET_DYN_BASE,
sure to place the ET_DYN from time to time just where the comment
says it's trying to avoid? I assume that somehow results in the error
reported."

(where the comment in question is the existing comment in the source
code about mmap/brk clashes).

Suggested-by: Hugh Dickins
Acked-by: Marcus Meissner
Cc: Andrew Morton
Cc: Andi Kleen
Cc: Ingo Molnar
Cc: Dave Jones
Cc: Arjan van de Ven
Signed-off-by: Linus Torvalds

Linus Torvalds
2007-01-07 05:28:21 +0800

09 Dec, 2006

2 commits

937949d9e [PATCH] add process_session() helper routine

Replace occurrences of task->signal->session by a new process_session() helper
routine.

It will be useful for pid namespaces to abstract the session pid number.

Signed-off-by: Cedric Le Goater
Cc: Kirill Korotaev
Cc: Eric W. Biederman
Cc: Herbert Poetzl
Cc: Sukadev Bhattiprolu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Cedric Le Goater
2006-12-09 00:28:51 +0800

0f7fc9e4d [PATCH] VFS: change struct file to use struct path

This patch changes struct file to use struct path instead of having
independent pointers to struct dentry and struct vfsmount, and converts all
users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}.

Additionally, it adds two #define's to make the transition easier for users of
the f_dentry and f_vfsmnt.
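
A stripped-down sketch of the layout change and the transitional defines
might look like this. The struct bodies are reduced to the two fields under
discussion, the alias spellings are my reading of the description above, and
file_dentry_old_style() is a made-up helper showing that code written
against the old field name still compiles.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque stand-ins; the real kernel types are far larger. */
struct dentry;
struct vfsmount;

/* The two pointers that used to sit directly in struct file. */
struct path {
    struct vfsmount *mnt;
    struct dentry *dentry;
};

struct file {
    struct path f_path;
};

/* Transitional aliases for users of the old field names. */
#define f_dentry f_path.dentry
#define f_vfsmnt f_path.mnt

/* Old-style access: unchanged source, resolved through the alias. */
struct dentry *file_dentry_old_style(struct file *f)
{
    assert(f != NULL);
    return f->f_dentry;
}
```

The aliases let the conversion proceed directory by directory instead of in
one tree-wide change.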

Signed-off-by: Josef "Jeff" Sipek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Josef "Jeff" Sipek
2006-12-09 00:28:41 +0800

08 Dec, 2006

4 commits

8de61e69c [PATCH] fs: remove unused variable

Removed unused 'have_pt_gnu_stack' variable.

Reported by David Binderman

Signed-off-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2006-12-08 00:39:44 +0800

386d9a7ed [PATCH] elf: Always define elf_addr_t in linux/elf.h

Define elf_addr_t in linux/elf.h. The size of the type is determined using
ELF_CLASS. This allows us to remove the defines that today are spread all
over .c and .h files.

Signed-off-by: Magnus Damm
Cc: Daniel Jacobowitz
Cc: Roland McGrath
Cc: Jakub Jelinek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Magnus Damm
2006-12-08 00:39:38 +0800

841d5fb7c [PATCH] binfmt: fix uaccess handling

Signed-off-by: Heiko Carstens
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Heiko Carstens
2006-12-08 00:39:33 +0800

59287c091 [PATCH] binfmt_elf: randomize PIE binaries (2nd try)

Randomizes -pie compiled binaries from 64k (0x10000) up to ELF_ET_DYN_BASE.

0 -> 64k is excluded to allow NULL ptr accesses to fail.

Signed-off-by: Marcus Meissner
Cc: Ingo Molnar
Cc: Dave Jones
Cc: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Marcus Meissner
2006-12-08 00:39:33 +0800

04 Dec, 2006

1 commit

bf1ab978b [POWERPC] coredump: Add SPU elf notes to coredump.

This patch adds SPU elf notes to the coredump. It creates a separate note
for each of /regs, /fpcr, /lslr, /decr, /decr_status, /mem, /signal1,
/signal1_type, /signal2, /signal2_type, /event_mask, /event_status,
/mbox_info, /ibox_info, /wbox_info, /dma_info, /proxydma_info, /object-id.

A new macro, ARCH_HAVE_EXTRA_NOTES, was created for architectures to
specify they have extra elf core notes.

A new macro, ELF_CORE_EXTRA_NOTES_SIZE, was created so the size of the
additional notes could be calculated and added to the notes phdr entry.

A new macro, ELF_CORE_WRITE_EXTRA_NOTES, was created so the new notes
would be written after the existing notes.

The SPU coredump code resides in spufs. Stub functions are provided in the
kernel which are hooked into the spufs code which does the actual work via
register_arch_coredump_calls().

A new set of __spufs__read/get() functions was provided to allow the
coredump code to read from the spufs files without having to lock the
SPU context for each file read from.

Cc:
Signed-off-by: Dwayne Grant McConnell
Signed-off-by: Arnd Bergmann

Dwayne Grant McConnell
2006-12-04 17:40:19 +0800

16 Oct, 2006

1 commit

a7a0d86f5 [PATCH] Fix core files so they make sense to gdb...

It is silly to use a non-static variable for writing zeroes to the file.

And more seriously, foffset in the core dump function was incremented too
much, so some parts of the core dump were shifted down by the size of a few
phdrs and notes. Although gdb was able to load the file, it did not make a
lot of sense: in my test case data pages were shifted down by about 900 bytes.

Signed-off-by: Petr Vandrovec
Signed-off-by: Linus Torvalds

Petr Vandrovec
2006-10-16 02:24:49 +0800

13 Oct, 2006

1 commit

7f14daa19 [PATCH] Get core dump code to work...

The file based core dump code was broken by pipe changes - a relative
llseek returns the absolute file position on success, not the relative
one, so dump_seek() always failed when invoked with non-zero current
position.

Only success/failure can be tested with a relative lseek; we have to trust
the kernel that on success we've got the right file offset. With this fix in
place I finally have real core files instead of 1KB fragments...
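
The same return-value behaviour is visible from user space with lseek(2),
which makes for an easy demonstration. This is not the kernel's dump_seek();
relative_seek_result() is a hypothetical helper that seeks to a starting
position in a temporary file and then reports what a SEEK_CUR seek returns.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Seek to `start`, then move forward by `delta` with SEEK_CUR and return
 * the value the relative seek reported. Returns (off_t)-1 on failure.
 */
off_t relative_seek_result(off_t start, off_t delta)
{
    assert(start >= 0 && delta >= 0);

    FILE *tmp = tmpfile();
    if (!tmp)
        return (off_t)-1;

    int fd = fileno(tmp);
    off_t result = (off_t)-1;
    if (lseek(fd, start, SEEK_SET) == start)
        result = lseek(fd, delta, SEEK_CUR);   /* absolute position comes back */

    fclose(tmp);
    return result;
}
```

Starting from 100 and seeking forward 50 reports 150, not 50, so a check of
the form "result == delta" only passes when the starting position is zero.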

Signed-off-by: Petr Vandrovec
[ Cleaned it up a bit while here - use SEEK_CUR instead of hardcoding 1 ]
Signed-off-by: Linus Torvalds

Petr Vandrovec
2006-10-13 23:13:34 +0800

01 Oct, 2006

2 commits

d025c9db7 [PATCH] Support piping into commands in /proc/sys/kernel/core_pattern

Using the infrastructure created in previous patches implement support to
pipe core dumps into programs.

This is done by overloading the existing core_pattern sysctl
with a new syntax:

|program

When the first character of the pattern is a '|' the kernel will instead
treat the rest of the pattern as a command to run. The core dump will be
written to the standard input of that program instead of to a file.
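
The dispatch on the first character can be sketched as below. This is an
illustration rather than the kernel code: core_pattern_command() is a
made-up helper, and the handler path in the usage note is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * If `pattern` starts with '|', return a pointer to the command that
 * follows it; otherwise return NULL, meaning the pattern should be
 * treated as a file name as before.
 */
const char *core_pattern_command(const char *pattern)
{
    assert(pattern != NULL);
    return pattern[0] == '|' ? pattern + 1 : NULL;
}
```

For a pattern such as "|/usr/local/bin/core-handler" the helper returns the
text after the bar, while "core.%p" yields NULL and is handled as a plain
file name.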

This is useful for having automatic core dump analysis without filling up
disks. The program can do some simple analysis and save only a summary of
the core dump.

The core dump process will run with the privileges and in the name space of
the process that caused the core dump.

I also increased the core pattern size to 128 bytes so that longer command
lines fit.

Most of the changes come from allowing core dumps without seeks. They are
fairly straightforward though.

One small incompatibility is that if someone had a core pattern previously
that started with '|' they will suddenly get new behaviour. I think that's
unlikely to be a real problem though.

Additional background:

> Very nice, do you happen to have a program that can accept this kind of
> input for crash dumps? I'm guessing that the embedded people will
> really want this functionality.

I had a cheesy demo/prototype. Basically it wrote the dump to a file again,
ran gdb on it to get a backtrace and wrote the summary to a shared directory.
Then there was a simple CGI script to generate a "top 10" crashes HTML
listing.

Unfortunately this still had the disadvantage of needing full disk space for a
dump except for deleting it afterwards (in fact it was worse because over the
pipe holes didn't work so if you have a holey address map it would require
more space).

Fortunately gdb seems to be happy to handle /proc/pid/fd/xxx input pipes as
cores (at least it worked with zsh's =(cat core) syntax), so it would be
likely possible to do it without temporary space with a simple wrapper that
calls it in the right way. I ran out of time before doing that though.

The demo prototype scripts weren't very good. If there is really interest I
can dig them out (they are currently on a laptop disk on the desk with the
laptop itself being in service), but I would recommend to rewrite them for any
serious application of this and fix the disk space problem.

Also to be really useful it should probably find a way to automatically fetch
the debuginfos (I cheated and just installed them in advance). If nobody else
does it I can probably do the rewrite myself again at some point.

My hope at some point was that desktops would support it in their builtin
crash reporters, but at least the KDE people I talked to seemed to be happy
with their user space only solution.

Alan sayeth:

I don't believe that piping as such is necessarily the right model, but
the ability to intercept and process core dumps from user space is asked
for by many enterprise users as well. They want to know about, capture,
analyse and process core dumps, often centrally and in automated form.

[akpm@osdl.org: loff_t != unsigned long]
Signed-off-by: Andi Kleen
Cc: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andi Kleen
2006-10-01 15:39:33 +0800

07f3f05c1 [PATCH] BLOCK: Move extern declarations out of fs/*.c into header files [try #6]

Create a new header file, fs/internal.h, for common definitions local to the
sources in the fs/ directory.

Move extern definitions that should be in header files from fs/*.c to
fs/internal.h or other main header files where they span directories.

Signed-Off-By: David Howells
Signed-off-by: Jens Axboe

David Howells
2006-10-01 02:52:18 +0800

30 Sep, 2006

2 commits

486ccb05f [PATCH] elf_core_dump: don't take tasklist_lock

do_each_thread() is rcu-safe, and all tasks which use this ->mm must sleep
in wait_for_completion(&mm->core_done) at this point, so we can use RCU
locks.

Also, remove unneeded INIT_LIST_HEAD(new) before list_add(new, head).
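
Why the initialisation is unneeded is easy to see from a minimal list_add().
The version below is a simplified user-space sketch modelled on the familiar
circular doubly linked list, not the kernel header itself: inserting a node
overwrites both of its pointers, so their previous contents are irrelevant.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/*
 * Insert `new` right after `head`. Both of new's pointers are assigned
 * here, so the node does not need to be initialised beforehand.
 */
void list_add(struct list_head *new, struct list_head *head)
{
    struct list_head *next = head->next;

    assert(next != NULL && next->prev == head);   /* head must be a valid list */
    next->prev = new;
    new->next = next;
    new->prev = head;
    head->next = new;
}
```

Calling it with a node whose pointers hold arbitrary values still leaves a
well-formed two-element ring.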

Signed-off-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2006-09-30 00:18:14 +0800

3b9b8ab65 [PATCH] Fix unserialized task->files changing

Fixed race on put_files_struct on exec with proc. Restoring files on
current on error path may lead to proc having a pointer to already kfree-d
files_struct.

->files changing at exit.c and kthread.c are safe as exit_files() makes all
things under lock.

Found during OpenVZ stress testing.

[akpm@osdl.org: add export]
Signed-off-by: Pavel Emelianov
Signed-off-by: Kirill Korotaev
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill Korotaev
2006-09-30 00:18:12 +0800

27 Sep, 2006

1 commit

b27824083 Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6

* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (225 commits)
[PATCH] Don't set calgary iommu as default y
[PATCH] i386/x86-64: New Intel feature flags
[PATCH] x86: Add a cumulative thermal throttle event counter.
[PATCH] i386: Make the jiffies compares use the 64bit safe macros.
[PATCH] x86: Refactor thermal throttle processing
[PATCH] Add 64bit jiffies compares (for use with get_jiffies_64)
[PATCH] Fix unwinder warning in traps.c
[PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1
[PATCH] x86: Move direct PCI scanning functions out of line
[PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCI
[PATCH] Don't leak NT bit into next task
[PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder
[PATCH] Fix some broken white space in ia32_signal.c
[PATCH] Initialize argument registers for 32bit signal handlers.
[PATCH] Remove all traces of signal number conversion
[PATCH] Don't synchronize time reading on single core AMD systems
[PATCH] Remove outdated comment in x86-64 mmconfig code
[PATCH] Use string instructions for Core2 copy/clear
[PATCH] x86: - restore i8259A eoi status on resume
[PATCH] i386: Split multi-line printk in oops output.
...

Linus Torvalds
2006-09-27 04:07:55 +0800

26 Sep, 2006

2 commits

8d6b5eeea [PATCH] binfmt_elf: consistently use loff_t

As David Howells points out, binfmt_elf sometimes uses
off_t, sometimes uses loff_t. Use loff_t throughout.

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2006-09-26 23:48:53 +0800

c16b63e09 [PATCH] i386/x86-64: Don't randomize stack top when no randomization personality is set

Based on a patch from Frank van Maarseveen, but extended.

Signed-off-by: Andi Kleen

Andi Kleen
2006-09-26 16:52:28 +0800

11 Jul, 2006

1 commit

b4cac1a02 [PATCH] FDPIC: Move roundup() into linux/kernel.h

Move the roundup() macro from binfmt_elf.c into linux/kernel.h as it's
generally useful.
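
For reference, a roundup() of the usual shape looks like the sketch below.
The macro body is the common idiom and is given here as an assumption about
form rather than a quotation of the header; pad_to() is a made-up wrapper
for the usage example.

```c
#include <assert.h>

/* Round x up to the next multiple of y; y must be positive. */
#define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))

/* Example use: pad a length to a given alignment. */
unsigned long pad_to(unsigned long len, unsigned long align)
{
    assert(align > 0);
    return roundup(len, align);
}
```

A value that is already a multiple of y comes back unchanged, which is what
makes the macro safe to apply unconditionally when padding notes or headers.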

[akpm@osdl.org: nuke all the other implementations]
Signed-off-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2006-07-11 04:24:22 +0800

04 Jul, 2006

1 commit

ce51059be [PATCH] binfmt_elf: fix checks for bad address

Fix check for bad address; use macro instead of open-coding two checks.

Taken from RHEL4 kernel update.

From: Ernie Petrides

For background, the BAD_ADDR() macro should return TRUE if the address is
TASK_SIZE, because that's the lowest address that is *not* valid for
user-space mappings. The macro was correct in binfmt_aout.c but was wrong
for the "equal to" case in binfmt_elf.c. There were two in-line validations
of user-space addresses in binfmt_elf.c, which have been appropriately
converted to use the corrected BAD_ADDR() macro in the patch you posted
yesterday. Note that the size checks against TASK_SIZE are okay as coded.
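
The boundary case can be spelled out in a few lines. The TASK_SIZE value
below is a placeholder chosen only for this example (the real one is
architecture-specific), BAD_ADDR_OFF_BY_ONE is my reconstruction of the
incorrect comparison, and user_addr_ok() is a hypothetical helper.

```c
#include <assert.h>

/* Placeholder limit for the example; not any particular architecture's. */
#define TASK_SIZE 0xC0000000UL

/* TASK_SIZE is the lowest invalid address, so the comparison is ">=". */
#define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE)

/* Assumed form of the wrong check: it lets TASK_SIZE itself through. */
#define BAD_ADDR_OFF_BY_ONE(x) ((unsigned long)(x) > TASK_SIZE)

/* Returns 1 if addr may be used for a user-space mapping, 0 otherwise. */
int user_addr_ok(unsigned long addr)
{
    assert(BAD_ADDR(TASK_SIZE));   /* the boundary itself must be rejected */
    return !BAD_ADDR(addr);
}
```

The two macros differ only at addr == TASK_SIZE, which is exactly the
"equal to" case mentioned above.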

The additional changes that I propose are below. These are in the error
paths for bad ELF entry addresses once load_elf_binary() has already
committed to exec'ing the new image (following the tearing down of the
task's original address space).

The 1st hunk deals with the interp-side of the outer "if". There were two
problems here. The printk() should be removed because this path can be
triggered at will by a bogus interpreter image created and used by a
malicious user. Further, the error code should not be ENOEXEC, because that
causes the loop in search_binary_handler() to continue trying other exec
handlers (twice, in fact). But it's too late for this to work correctly,
because the user address space has already been torn down, and an exec()
failure cannot be returned to the user code because the code no longer
exists. The only recovery is to force a SIGSEGV, but it's best to terminate
the search loop immediately. I somewhat arbitrarily chose EINVAL as a
fallback error code, but any error returned by load_elf_interp() will
override that (but this value will never be seen by user-space).

The 2nd hunk deals with the non-interp-side of the outer "if". There were
two problems here as well. The SIGSEGV needs to be forced, because a prior
sigaction() syscall might have set the associated disposition to SIG_IGN.
And the ENOEXEC should be changed to EINVAL as described above.

Signed-off-by: Chuck Ebbert
Signed-off-by: Ernie Petrides
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Chuck Ebbert
2006-07-04 06:26:59 +0800

23 Jun, 2006

3 commits

785d55708 [PATCH] binflt_elf: remove more casts

Remove redundant casts from NEW_AUX_ENT() arguments in fs/binfmt_elf.c

Signed-off-by: Jesper Juhl
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jesper Juhl
2006-06-23 22:43:05 +0800

f4e5cc2c4 [PATCH] binfmt_elf: CodingStyle cleanup and remove some pointless casts

Do a CodingStyle cleanup of fs/binfmt_elf.c and also remove some pointless
casts of kmalloc() return values in the same file.

Signed-off-by: Jesper Juhl
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jesper Juhl
2006-06-23 22:43:05 +0800

c89681ed7 [PATCH] remove steal_locks()

This patch removes the steal_locks() function.

steal_locks() doesn't work correctly with any filesystem that does its own
lock management, including NFS, CIFS, etc.

In addition it has weird semantics on local filesystems in case tasks
sharing file-descriptor tables are doing POSIX locking operations in
parallel to execve().

The steal_locks() function has an effect on applications doing:

clone(CLONE_FILES)
/* in child */
lock
execve
lock

POSIX locks acquired before execve (by "child", "parent" or any further
task sharing files_struct) will after the execve be owned exclusively by
"child".

According to Chris Wright some LSB/LTP kind of suite triggers without the
stealing behavior, but there's no known real-world application that would
also fail.

Apps using NPTL are not affected, since all other threads are killed before
execve.

Apps using LinuxThreads are only affected if they

- have multiple threads during exec (LinuxThreads doesn't kill other
threads, the app may do it with pthread_kill_other_threads_np())
- rely on POSIX locks being inherited across exec

Both conditions are documented, but not their interaction.

Apps using clone() natively are affected if they

- use clone(CLONE_FILES)
- rely on POSIX locks being inherited across exec

The above scenarios are unlikely, but possible.

If the patch is vetoed, there's a plan B, that involves mostly keeping the
weird stealing semantics, but changing the way lock ownership is handled so
that network and local filesystems work consistently.

That would add more complexity though, so this solution seems to be
preferred by most people.

Signed-off-by: Miklos Szeredi
Cc: Trond Myklebust
Cc: Matthew Wilcox
Cc: Chris Wright
Cc: Christoph Hellwig
Cc: Steven French
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2006-06-23 06:05:57 +0800

26 Mar, 2006

1 commit

913bd9060 [PATCH] x86_64: Increase the variability of the process stack on 64bit architectures

8MB is not really very random, use 1GB (or more with larger page sizes)
instead.

Also use the low bits of the random generator output now instead of
throwing them away.

Only enabled on x86-64 right now. Other architectures need to add
a suitable STACK_RND_MASK

Cc: mingo@elte.hu
Signed-off-by: Andi Kleen
Signed-off-by: Linus Torvalds

Andi Kleen
2006-03-26 01:10:52 +0800