Doug / smarc-fsl-linux-kernel | Embedian Git Server

01 Oct, 2006

2 commits

d025c9db7 [PATCH] Support piping into commands in /proc/sys/kernel/core_pattern ... Browse Code »

Using the infrastructure created in previous patches implement support to
pipe core dumps into programs.

This is done by overloading the existing core_pattern sysctl
with a new syntax:

|program

When the first character of the pattern is a '|' the kernel will instead
threat the rest of the pattern as a command to run. The core dump will be
written to the standard input of that program instead of to a file.

This is useful for having automatic core dump analysis without filling up
disks. The program can do some simple analysis and save only a summary of
the core dump.

The core dump proces will run with the privileges and in the name space of
the process that caused the core dump.

I also increased the core pattern size to 128 bytes so that longer command
lines fit.

Most of the changes comes from allowing core dumps without seeks. They are
fairly straight forward though.

One small incompatibility is that if someone had a core pattern previously
that started with '|' they will get suddenly new behaviour. I think that's
unlikely to be a real problem though.

Additional background:

> Very nice, do you happen to have a program that can accept this kind of
> input for crash dumps? I'm guessing that the embedded people will
> really want this functionality.

I had a cheesy demo/prototype. Basically it wrote the dump to a file again,
ran gdb on it to get a backtrace and wrote the summary to a shared directory.
Then there was a simple CGI script to generate a "top 10" crashes HTML
listing.

Unfortunately this still had the disadvantage to needing full disk space for a
dump except for deleting it afterwards (in fact it was worse because over the
pipe holes didn't work so if you have a holey address map it would require
more space).

Fortunately gdb seems to be happy to handle /proc/pid/fd/xxx input pipes as
cores (at least it worked with zsh's =(cat core) syntax), so it would be
likely possible to do it without temporary space with a simple wrapper that
calls it in the right way. I ran out of time before doing that though.

The demo prototype scripts weren't very good. If there is really interest I
can dig them out (they are currently on a laptop disk on the desk with the
laptop itself being in service), but I would recommend to rewrite them for any
serious application of this and fix the disk space problem.

Also to be really useful it should probably find a way to automatically fetch
the debuginfos (I cheated and just installed them in advance). If nobody else
does it I can probably do the rewrite myself again at some point.

My hope at some point was that desktops would support it in their builtin
crash reporters, but at least the KDE people I talked too seemed to be happy
with their user space only solution.

Alan sayeth:

I don't believe that piping as such as neccessarily the right model, but
the ability to intercept and processes core dumps from user space is asked
for by many enterprise users as well. They want to know about, capture,
analyse and process core dumps, often centrally and in automated form.

[akpm@osdl.org: loff_t != unsigned long]
Signed-off-by: Andi Kleen
Cc: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andi Kleen
2006-10-01 15:39:33 +0800
07f3f05c1 [PATCH] BLOCK: Move extern declarations out of fs/*.c into header files [try #6] ... Browse Code »

Create a new header file, fs/internal.h, for common definitions local to the
sources in the fs/ directory.

Move extern definitions that should be in header files from fs/*.c to
fs/internal.h or other main header files where they span directories.

Signed-Off-By: David Howells
Signed-off-by: Jens Axboe

David Howells
2006-10-01 02:52:18 +0800

30 Sep, 2006

2 commits

486ccb05f [PATCH] elf_core_dump: don't take tasklist_lock ... Browse Code »

do_each_thread() is rcu-safe, and all tasks which use this ->mm must sleep
in wait_for_completion(&mm->core_done) at this point, so we can use RCU
locks.

Also, remove unneeded INIT_LIST_HEAD(new) before list_add(new, head).

Signed-off-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2006-09-30 00:18:14 +0800
3b9b8ab65 [PATCH] Fix unserialized task->files changing ... Browse Code »

Fixed race on put_files_struct on exec with proc. Restoring files on
current on error path may lead to proc having a pointer to already kfree-d
files_struct.

->files changing at exit.c and khtread.c are safe as exit_files() makes all
things under lock.

Found during OpenVZ stress testing.

[akpm@osdl.org: add export]
Signed-off-by: Pavel Emelianov
Signed-off-by: Kirill Korotaev
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill Korotaev
2006-09-30 00:18:12 +0800

27 Sep, 2006

1 commit

b27824083 Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6 ... Browse Code »

* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (225 commits)
[PATCH] Don't set calgary iommu as default y
[PATCH] i386/x86-64: New Intel feature flags
[PATCH] x86: Add a cumulative thermal throttle event counter.
[PATCH] i386: Make the jiffies compares use the 64bit safe macros.
[PATCH] x86: Refactor thermal throttle processing
[PATCH] Add 64bit jiffies compares (for use with get_jiffies_64)
[PATCH] Fix unwinder warning in traps.c
[PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1
[PATCH] x86: Move direct PCI scanning functions out of line
[PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCI
[PATCH] Don't leak NT bit into next task
[PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder
[PATCH] Fix some broken white space in ia32_signal.c
[PATCH] Initialize argument registers for 32bit signal handlers.
[PATCH] Remove all traces of signal number conversion
[PATCH] Don't synchronize time reading on single core AMD systems
[PATCH] Remove outdated comment in x86-64 mmconfig code
[PATCH] Use string instructions for Core2 copy/clear
[PATCH] x86: - restore i8259A eoi status on resume
[PATCH] i386: Split multi-line printk in oops output.
...

Linus Torvalds
2006-09-27 04:07:55 +0800

26 Sep, 2006

2 commits

8d6b5eeea [PATCH] binfmt_elf: consistently use loff_t ... Browse Code »

As David Howells points out, binfmt_elf sometimes uses
off_t, sometimes uses loff_t. Use loff_t throughout.

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2006-09-26 23:48:53 +0800
c16b63e09 [PATCH] i386/x86-64: Don't randomize stack top when no randomization personality is set ... Browse Code »

Based on patch from Frank van Maarseveen , but
extended.

Signed-off-by: Andi Kleen

Andi Kleen
2006-09-26 16:52:28 +0800

11 Jul, 2006

1 commit

b4cac1a02 [PATCH] FDPIC: Move roundup() into linux/kernel.h ... Browse Code »

Move the roundup() macro from binfmt_elf.c into linux/kernel.h as it's
generally useful.

[akpm@osdl.org: nuke all the other implementations]
Signed-off-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2006-07-11 04:24:22 +0800

04 Jul, 2006

1 commit

ce51059be [PATCH] binfmt_elf: fix checks for bad address ... Browse Code »

Fix check for bad address; use macro instead of open-coding two checks.

Taken from RHEL4 kernel update.

From: Ernie Petrides

For background, the BAD_ADDR() macro should return TRUE if the address is
TASK_SIZE, because that's the lowest address that is *not* valid for
user-space mappings. The macro was correct in binfmt_aout.c but was wrong
for the "equal to" case in binfmt_elf.c. There were two in-line validations
of user-space addresses in binfmt_elf.c, which have been appropriately
converted to use the corrected BAD_ADDR() macro in the patch you posted
yesterday. Note that the size checks against TASK_SIZE are okay as coded.

The additional changes that I propose are below. These are in the error
paths for bad ELF entry addresses once load_elf_binary() has already
committed to exec'ing the new image (following the tearing down of the
task's original address space).

The 1st hunk deals with the interp-side of the outer "if". There were two
problems here. The printk() should be removed because this path can be
triggered at will by a bogus interpreter image created and used by a
malicious user. Further, the error code should not be ENOEXEC, because that
causes the loop in search_binary_handler() to continue trying other exec
handlers (twice, in fact). But it's too late for this to work correctly,
because the user address space has already been torn down, and an exec()
failure cannot be returned to the user code because the code no longer
exists. The only recovery is to force a SIGSEGV, but it's best to terminate
the search loop immediately. I somewhat arbitrarily chose EINVAL as a
fallback error code, but any error returned by load_elf_interp() will
override that (but this value will never be seen by user-space).

The 2nd hunk deals with the non-interp-side of the outer "if". There were
two problems here as well. The SIGSEGV needs to be forced, because a prior
sigaction() syscall might have set the associated disposition to SIG_IGN.
And the ENOEXEC should be changed to EINVAL as described above.

Signed-off-by: Chuck Ebbert
Signed-off-by: Ernie Petrides
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Chuck Ebbert
2006-07-04 06:26:59 +0800

23 Jun, 2006

3 commits

785d55708 [PATCH] binflt_elf: remove more casts ... Browse Code »

Remove redundant casts from NEW_AUX_ENT() arguments in fs/binfmt_elf.c

Signed-off-by: Jesper Juhl
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jesper Juhl
2006-06-23 22:43:05 +0800
f4e5cc2c4 [PATCH] binfmt_elf: CodingStyle cleanup and remove some pointless casts ... Browse Code »

Do a CodingStyle cleanup of fs/binfmt_elf.c and also remove some pointless
casts of kmalloc() return values in the same file.

Signed-off-by: Jesper Juhl
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jesper Juhl
2006-06-23 22:43:05 +0800
c89681ed7 [PATCH] remove steal_locks() ... Browse Code »

This patch removes the steal_locks() function.

steal_locks() doesn't work correctly with any filesystem that does it's own
lock management, including NFS, CIFS, etc.

In addition it has weird semantics on local filesystems in case tasks
sharing file-descriptor tables are doing POSIX locking operations in
parallel to execve().

The steal_locks() function has an effect on applications doing:

clone(CLONE_FILES)
/* in child */
lock
execve
lock

POSIX locks acquired before execve (by "child", "parent" or any further
task sharing files_struct) will after the execve be owned exclusively by
"child".

According to Chris Wright some LSB/LTP kind of suite triggers without the
stealing behavior, but there's no known real-world application that would
also fail.

Apps using NPTL are not affected, since all other threads are killed before
execve.

Apps using LinuxThreads are only affected if they

- have multiple threads during exec (LinuxThreads doesn't kill other
threads, the app may do it with pthread_kill_other_threads_np())
- rely on POSIX locks being inherited across exec

Both conditions are documented, but not their interaction.

Apps using clone() natively are affected if they

- use clone(CLONE_FILES)
- rely on POSIX locks being inherited across exec

The above scenarios are unlikely, but possible.

If the patch is vetoed, there's a plan B, that involves mostly keeping the
weird stealing semantics, but changing the way lock ownership is handled so
that network and local filesystems work consistently.

That would add more complexity though, so this solution seems to be
preferred by most people.

Signed-off-by: Miklos Szeredi
Cc: Trond Myklebust
Cc: Matthew Wilcox
Cc: Chris Wright
Cc: Christoph Hellwig
Cc: Steven French
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2006-06-23 06:05:57 +0800

26 Mar, 2006

3 commits

913bd9060 [PATCH] x86_64: Increase the variability of the process stack on 64bit architectures ... Browse Code »

8MB is not really very random, use 1GB (or more with larger page sizes)
instead.

Also use the low bits of the random generator output now instead of
throwing them away.

Only enabled on x86-64 right now. Other architectures need to add
a suitable STACK_RND_MASK

Cc: mingo@elte.hu
Signed-off-by: Andi Kleen
Signed-off-by: Linus Torvalds

Andi Kleen
2006-03-26 01:10:52 +0800
551485481 [PATCH] remove needless check in binfmt_elf.c ... Browse Code »

Local variable i is unsigned int and thus cannot be negative.

(akpm: unsigneds shouldn't be called `i'. This value cannot possibly be
negative anyway).

Signed-off-by: Carsten Otte
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Carsten Otte
2006-03-26 00:23:01 +0800
11b0b5abb [PATCH] use kzalloc and kcalloc in core fs code ... Browse Code »

Signed-off-by: Oliver Neukum
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oliver Neukum
2006-03-26 00:23:00 +0800

27 Feb, 2006

1 commit

5342fba54 [PATCH] x86_64: Check for bad elf entry address. ... Browse Code »

Fixes a local DOS on Intel systems that lead to an endless
recursive fault. AMD machines don't seem to be affected.

Signed-off-by: Suresh Siddha
Signed-off-by: Andi Kleen
Signed-off-by: Linus Torvalds

Suresh Siddha
2006-02-27 01:53:30 +0800

15 Jan, 2006

1 commit

858119e15 [PATCH] Unlinline a bunch of other functions ... Browse Code »

Remove the "inline" keyword from a bunch of big functions in the kernel with
the goal of shrinking it by 30kb to 40kb

Signed-off-by: Arjan van de Ven
Signed-off-by: Ingo Molnar
Acked-by: Jeff Garzik
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arjan van de Ven
2006-01-15 10:27:06 +0800

11 Jan, 2006

2 commits

74da6cd06 missing printk loglevel and tiny tiny whitespace change in binfmt_elf() ... Browse Code »

Patch adds a mising printk loglevel (I think KERN_WARNING is appropriate
here) in fs/binfmt_elf.c, and while I was there I made some tiny tiny tiny
adjustments to whitespacing in the neighborhood.

Signed-off-by: Jesper Juhl
Signed-off-by: Adrian Bunk

Jesper Juhl
2006-01-11 08:51:26 +0800
792db3af3 [PATCH] fs/binfmt_elf: Remove unneeded kmalloc() return value casts ... Browse Code »

Remove unneeded casts of kmalloc() return value in binfmt_elf.

Signed-off-by: Jesper Juhl
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jesper Juhl
2006-01-11 00:02:01 +0800

09 Jan, 2006

2 commits

708e9a794 [PATCH] tiny: Configure ELF core dump support ... Browse Code »

configurable support for ELF core dumps

text data bss dec hex filename
3330172 529036 190556 4049764 3dcb64 vmlinux-baseline
3325552 528912 190556 4045020 3db8dc vmlinux-no-elf

add/remove: 0/8 grow/shrink: 0/0 up/down: 0/-4424 (-4424)
function old new delta
fill_note 32 - -32
maydump 58 - -58
dump_seek 67 - -67
writenote 180 - -180
elf_dump_thread_status 274 - -274
fill_psinfo 308 - -308
fill_prstatus 466 - -466
elf_core_dump 3039 - -3039

Signed-off-by: Matt Mackall
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matt Mackall
2006-01-09 12:14:11 +0800
dda6ebde9 [PATCH] Fix handling of ELF segments with zero filesize ... Browse Code »

mmap() returns -EINVAL if given a zero length, and thus elf_map() in
binfmt_elf.c does likewise if it attempts to map a (page-aligned) ELF
segment with zero filesize. Such a situation never arises with the default
linker scripts, but there's nothing inherently wrong with zero-filesize
(but non-zero memsize) ELF segments. Custom linker scripts can generate
them, and the kernel should be able to map them; this patch makes it so.

Signed-off-by: David Gibson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Gibson
2006-01-09 12:13:58 +0800

07 Nov, 2005

1 commit

f99d49adf [PATCH] kfree cleanup: fs ... Browse Code »

This is the fs/ part of the big kfree cleanup patch.

Remove pointless checks for NULL prior to calling kfree() in fs/.

Signed-off-by: Jesper Juhl
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jesper Juhl
2005-11-07 23:54:06 +0800

31 Oct, 2005

1 commit

a92897286 [PATCH] Don't uselessly export task_struct to userspace in core dumps ... Browse Code »

task_struct is an internal structure to the kernel with a lot of good
information, that is probably interesting in core dumps. However there is
no way for user space to know what format that information is in making it
useless.

I grepped the GDB 6.3 source code and NT_TASKSTRUCT while defined is not
used anywhere else. So I would be surprised if anyone notices it is
missing.

In addition exporting kernel pointers to all the interesting kernel data
structures sounds like the very definition of an information leak. I
haven't a clue what someone with evil intentions could do with that
information, but in any attack against the kernel it looks like this is the
perfect tool for aiming that attack.

So since NT_TASKSTRUCT is useless as currently defined and is potentially
dangerous, let's just not export it.

(akpm: Daniel Jacobowitz "would be amazed" if anything was
using NT_TASKSTRUCT).

Signed-off-by: Eric W. Biederman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2005-10-31 09:37:18 +0800

30 Oct, 2005

1 commit

404351e67 [PATCH] mm: mm_init set_mm_counters ... Browse Code »

How is anon_rss initialized? In dup_mmap, and by mm_alloc's memset; but
that's not so good if an mm_counter_t is a special type. And how is rss
initialized? By set_mm_counter, all over the place. Come on, we just need to
initialize them both at once by set_mm_counter in mm_init (which follows the
memcpy when forking).

Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2005-10-30 12:40:38 +0800

12 Oct, 2005

1 commit

6de505173 [PATCH] binfmt_elf bss padding fix ... Browse Code »

Nir Tzachar points out that if an ELF file specifies a
zero-length bss at a whacky address, we cannot load that binary because
padzero() tries to zero out the end of the page at the whacky address, and
that may not be writeable.

See also http://bugzilla.kernel.org/show_bug.cgi?id=5411

So teach load_elf_binary() to skip the bss settng altogether if the elf file
has a zero-length bss segment.

Cc: Roland McGrath
Cc: Daniel Jacobowitz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

akpm@osdl.org
2005-10-12 00:46:54 +0800

22 Jun, 2005

1 commit

1363c3cd8 [PATCH] Avoiding mmap fragmentation ... Browse Code »

Ingo recently introduced a great speedup for allocating new mmaps using the
free_area_cache pointer which boosts the specweb SSL benchmark by 4-5% and
causes huge performance increases in thread creation.

The downside of this patch is that it does lead to fragmentation in the
mmap-ed areas (visible via /proc/self/maps), such that some applications
that work fine under 2.4 kernels quickly run out of memory on any 2.6
kernel.

The problem is twofold:

1) the free_area_cache is used to continue a search for memory where
the last search ended. Before the change new areas were always
searched from the base address on.

So now new small areas are cluttering holes of all sizes
throughout the whole mmap-able region whereas before small holes
tended to close holes near the base leaving holes far from the base
large and available for larger requests.

2) the free_area_cache also is set to the location of the last
munmap-ed area so in scenarios where we allocate e.g. five regions of
1K each, then free regions 4 2 3 in this order the next request for 1K
will be placed in the position of the old region 3, whereas before we
appended it to the still active region 1, placing it at the location
of the old region 2. Before we had 1 free region of 2K, now we only
get two free regions of 1K -> fragmentation.

The patch addresses thes issues by introducing yet another cache descriptor
cached_hole_size that contains the largest known hole size below the
current free_area_cache. If a new request comes in the size is compared
against the cached_hole_size and if the request can be filled with a hole
below free_area_cache the search is started from the base instead.

The results look promising: Whereas 2.6.12-rc4 fragments quickly and my
(earlier posted) leakme.c test program terminates after 50000+ iterations
with 96 distinct and fragmented maps in /proc/self/maps it performs nicely
(as expected) with thread creation, Ingo's test_str02 with 20000 threads
requires 0.7s system time.

Taking out Ingo's patch (un-patch available per request) by basically
deleting all mentions of free_area_cache from the kernel and starting the
search for new memory always at the respective bases we observe: leakme
terminates successfully with 11 distinctive hardly fragmented areas in
/proc/self/maps but thread creating is gringdingly slow: 30+s(!) system
time for Ingo's test_str02 with 20000 threads.

Now - drumroll ;-) the appended patch works fine with leakme: it ends with
only 7 distinct areas in /proc/self/maps and also thread creation seems
sufficiently fast with 0.71s for 20000 threads.

Signed-off-by: Wolfgang Wander
Credit-to: "Richard Purdie"
Signed-off-by: Ken Chen
Acked-by: Ingo Molnar (partly)
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wolfgang Wander
2005-06-22 09:46:16 +0800

17 Jun, 2005

1 commit

5db92850d [PATCH] Fix large core dumps with a 32-bit off_t ... Browse Code »

The ELF core dump code has one use of off_t when writing out segments.
Some of the segments may be passed the 2GB limit of an off_t, even on a
32-bit system, so it's important to use loff_t instead. This fixes a
corrupted core dump in the bigcore test in GDB's testsuite.

Signed-off-by: Daniel Jacobowitz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Daniel Jacobowitz
2005-06-17 00:02:59 +0800

17 May, 2005

1 commit

a84a50595 [PATCH] fix Linux kernel ELF core dump privilege elevation ... Browse Code »

As reported by Paul Starzetz

Reference: CAN-2005-1263

Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2005-05-17 12:07:05 +0800

29 Apr, 2005

1 commit

18c8baff8 [PATCH] Fix error recovery path for arch_setup_additional_pages ... Browse Code »

If arch_setup_additional_pages fails, the error path will do some double-frees.
This fixes it.

Signed-off-by: Roland McGrath
Signed-off-by: Linus Torvalds

Roland McGrath
2005-04-29 06:17:19 +0800

17 Apr, 2005

2 commits

547ee84ce [PATCH] ppc64: Improve mapping of vDSO ... Browse Code »

This patch reworks the way the ppc64 is mapped in user memory by the kernel
to make it more robust against possible collisions with executable
segments. Instead of just whacking a VMA at 1Mb, I now use
get_unmapped_area() with a hint, and I moved the mapping of the vDSO to
after the mapping of the various ELF segments and of the interpreter, so
that conflicts get caught properly (it still has to be before
create_elf_tables since the later will fill the AT_SYSINFO_EHDR with the
proper address).

While I was at it, I also changed the 32 and 64 bits vDSO's to link at
their "natural" address of 1Mb instead of 0. This is the address where
they are normally mapped in absence of conflict. By doing so, it should be
possible to properly prelink one it's been verified to work on glibc.

Signed-off-by: Benjamin Herrenschmidt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Benjamin Herrenschmidt
2005-04-17 06:24:35 +0800
1da177e4c Linux-2.6.12-rc2 ... Browse Code »

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!

Linus Torvalds
2005-04-17 06:20:36 +0800