20 Jul, 2007

2 commits

  • This patch enables core dump filtering for ELF-formatted core files.

    Signed-off-by: Hidehiro Kawai
    Cc: Alan Cox
    Cc: David Howells
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kawai, Hidehiro
     
  • Remove the arg+env limit of MAX_ARG_PAGES by copying the strings directly from
    the old mm into the new mm.

    We create the new mm before the binfmt code runs, and place the new stack at
    the very top of the address space. Once the binfmt code runs and figures out
    where the stack should be, we move it downwards.

    It is a bit peculiar in that we have one task with two mm's, one of which is
    inactive.

    [a.p.zijlstra@chello.nl: limit stack size]
    Signed-off-by: Ollie Wild
    Signed-off-by: Peter Zijlstra
    Cc: Hugh Dickins
    [bunk@stusta.de: unexport bprm_mm_init]
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ollie Wild
     

17 Jul, 2007

2 commits

  • fs/binfmt_elf.c: In function 'load_elf_binary':
    fs/binfmt_elf.c:1002: warning: 'interp_map_addr' may be used uninitialized in this function

    The compiler (gcc-4.1.0) is correct, but it failed to notice that we didn't
    use the resulting value.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • This patch is using mmap()'s randomization functionality in such a way that
    it maps the main executable of (specially compiled/linked -pie/-fpie)
    ET_DYN binaries onto a random address (in cases in which mmap() is allowed
    to perform a randomization).

    Origin of this patch is in exec-shield
    (http://people.redhat.com/mingo/exec-shield/)

    [jkosina@suse.cz: pie randomization: fix BAD_ADDR macro]
    Signed-off-by: Jan Kratochvil
    Signed-off-by: Jiri Kosina
    Cc: Ingo Molnar
    Cc: Roland McGrath
    Cc: Jakub Jelinek
    Signed-off-by: Jiri Kosina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kratochvil
     

07 Jul, 2007

1 commit

  • elf_core_dump() supports dumping arch specific ELF notes, via the #define
    ELF_CORE_WRITE_EXTRA_NOTES. Currently the only user of this is the powerpc
    spu coredump code.

    There is a bug in the handling of foffset WRT the arch notes, which causes
    us to erroneously increment foffset by the size of the arch notes, leaving
    a block of zeroes in the file, and causing all subsequent data in the file
    to be at the expected offset plus the size of the arch notes, e.g.:

    LOAD 0x050000 0x00100000 0x00000000 0x20000 0x20000 R E 0x10000

    Tells us we should have a chunk of data at 0x50000. The truth is the data
    is at 0x90dbc = 0x50000 + 0x40dbc (the size of the arch notes).

    This bug prevents gdb from reading the core file correctly.

    The simplest fix is to simply remember the size of the arch notes, and add
    it to foffset after we've written the arch notes. The only drawback is
    that if the arch code doesn't write as many bytes as it said it would, we
    end up with a broken core dump again. For now I think that's a reasonable
    requirement.
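
The bookkeeping described above can be sketched roughly as follows. This is a minimal user-space illustration, not the kernel code: struct dump_state and account_arch_notes() are hypothetical names, and the only idea taken from the message is that foffset advances by the declared size once, with a mismatch in bytes written leaving the dump broken.

```c
#include <assert.h>

/* Hypothetical stand-in for the dumper's running file offset. */
struct dump_state {
    unsigned long foffset;
};

/* Advance foffset by the size the arch code declared up front.
 * Returns 1 if the arch code wrote exactly that many bytes, else 0,
 * in which case later offsets in the core file would be wrong. */
static int account_arch_notes(struct dump_state *st,
                              unsigned long declared,
                              unsigned long written)
{
    assert(st != 0);
    st->foffset += declared;
    return written == declared;
}
```

A caller would record the declared size before writing the notes and treat a zero return as a sign that the resulting core file cannot be trusted.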

    Tested on a Cell blade, gdb no longer complains about the core file being
    bogus.

    While I'm here I should point out that the spu coredump code does not work
    if we're dumping to a pipe - we'll have to wait for 23 to fix that.

    Signed-off-by: Michael Ellerman
    Acked-by: Arnd Bergmann
    Acked-by: Benjamin Herrenschmidt
    Acked-by: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Ellerman
     

09 May, 2007

3 commits

  • When the ELF loader fails to map the executable (due to memory shortage or
    because the binary is malformed), it can return 0. Normally this is
    invisible because the process is killed with SIGKILL and never returns to
    user space.

    But if exec() is called from a kernel thread (hotplug, whatever), the
    consequences are more interesting and vary depending on the architecture.

    i386. Nothing especially interesting, execve() just returns
    with "success" :-)

    x86_64. Fake zero frame is used on way to caller, RSP/RIP are loaded
    with zeros, ergo... double fault.

    ia64. Similar to i386, but r32...r95 are corrupted. Sometimes it
    oopses due to return to zero PC, sometimes it sees NaT in
    rXX and oopses due to NaT consumption.

    Signed-off-by: Alexey Kuznetsov
    Signed-off-by: Kirill Korotaev
    Signed-off-by: Pavel Emelianov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Kuznetsov
     
  • linux/module.h
    -> linux/elf.h
    -> asm-i386/elf.h
    -> linux/utsname.h
    -> linux/sched.h

    Noticeably cut the number of files which are rebuilt upon touching sched.h
    and cut down the junk pulled in by every module.h inclusion.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Remove includes of where it is not used/needed.
    Suggested by Al Viro.

    Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
    sparc64, and arm (all 59 defconfigs).

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

03 Apr, 2007

1 commit

  • When the dump cannot occur, most likely because of a full file system, and
    the page to be written is the zero page, the call to page_cache_release()
    is missed.

    Signed-off-by: Brian Pomerantz
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Brian Pomerantz
     

17 Mar, 2007

1 commit

  • This bug was seen on ppc64, but it could have occurred on any
    architecture with a page size of 64k or above. The problem is that
    fs/binfmt_elf.c:randomize_stack_top() randomizes the stack to within
    0x7ff pages. On 4k page machines, this is 8MB; on 64k page boxes, this
    is 128MB.

    The problem is that the new binary layout (selected in
    arch_pick_mmap_layout) places the mapping segment 128MB or the stack
    rlimit away from the top of the process memory, whichever is larger. If
    you choose an rlimit of less than 128MB (most defaults are in the 8MB
    range) then you can end up having your entire stack randomized away.

    The fix is to make randomize_stack_top() only steal at most 8MB, which this
    patch does. However, I have to point out that even with this, your stack
    rlimit might not be exactly what you get if it's > 128MB, because you're
    still losing the random offset of up to 8MB.

    The true fix should be to leave an explicit gap for the randomization plus
    a buffer when determining mmap_base, but that would involve fixing all the
    architectures.
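
The arithmetic behind the 8MB and 128MB figures can be checked with a short sketch. The 0x7ff page count comes from the message above; the function names and the capped variant are hypothetical illustrations of "steal at most 8MB" and are not taken from the patch itself.

```c
#include <assert.h>

/* From the commit message: the random offset spans up to 0x7ff pages. */
#define STACK_RND_PAGES 0x7ffUL

/* Upper bound, in bytes, of an offset of up to 0x7ff pages. */
static unsigned long max_stack_offset(unsigned long page_size)
{
    assert(page_size != 0);
    return STACK_RND_PAGES * page_size;
}

/* Hypothetical capped variant: never take more than 8MB,
 * whatever the page size is. */
static unsigned long capped_stack_offset(unsigned long page_size)
{
    const unsigned long cap = 8UL << 20;
    unsigned long span = max_stack_offset(page_size);

    return span < cap ? span : cap;
}
```

With 4k pages the uncapped span is 0x7ff * 4096 bytes, just under 8MB; with 64k pages it is 0x7ff * 65536 bytes, just under 128MB, which is what lets it consume a default-sized stack rlimit.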

    Cc: Arjan van de Ven
    Cc: Ingo Molnar
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    James Bottomley
     

13 Feb, 2007

1 commit


27 Jan, 2007

3 commits

  • Proposed patch to fix #5 in
    http://www.isec.pl/vulnerabilities/isec-0017-binfmt_elf.txt
    aka
    http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2004-1073

    To reproduce, do
    * grab poc at the end of advisory.
    * add line "eph.p_memsz = 4096;" after "eph.p_filesz = 4096;"
    where first "4096" is something equal to or greater than 4096.
    * ./poc /usr/bin/sudo && ls -l

    Here I get with 2.6.20-rc5:

    -rw------- 1 ad ad 102400 2007-01-15 19:17 core
    ---s--x--x 2 root root 101820 2007-01-15 19:15 /usr/bin/sudo

    Check for MAY_READ like binfmt_misc.c does.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • This patch fixes core dumps to include the vDSO vma, which is left out now.
    It removes the special-case core writing macros, which were not doing the
    right thing for the vDSO vma anyway. Instead, it uses VM_ALWAYSDUMP in the
    vma; there is no need for the fixmap page to be installed. It handles the
    CONFIG_COMPAT_VDSO case by making elf_core_dump use the fake vma from
    get_gate_vma after real vmas in the same way the /proc/PID/maps code does.

    This changes core dumps so they no longer include the non-PT_LOAD phdrs from
    the vDSO. I made the change to add them in the first place, but it turned out
    that nothing ever wanted them there since the advent of NT_AUXV. It's cleaner
    to leave them out, and just let the phdrs inside the vDSO image speak for
    themselves.

    Signed-off-by: Roland McGrath
    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • This patch adds the VM_ALWAYSDUMP flag for vm_flags in vm_area_struct. This
    provides a clean explicit way to have a vma always included in core dumps, as
    is needed for vDSO's.
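
As a rough illustration of how such a flag might be consulted, here is a user-space sketch. The flag values, the vma_stub type, and the fallback rules are assumptions made for the example; only the idea that VM_ALWAYSDUMP forces inclusion comes from the message above.

```c
#include <assert.h>

/* Hypothetical flag values, chosen only for this sketch. */
#define VM_IO         0x1UL
#define VM_ALWAYSDUMP 0x2UL

/* Hypothetical stand-in for vm_area_struct. */
struct vma_stub {
    unsigned long vm_flags;
};

/* Decide whether a mapping goes into the core dump.
 * The explicit flag wins over every other rule. */
static int maydump(const struct vma_stub *vma)
{
    assert(vma != 0);
    if (vma->vm_flags & VM_ALWAYSDUMP)
        return 1;
    if (vma->vm_flags & VM_IO)   /* assumed rule: skip I/O mappings */
        return 0;
    return 1;
}
```

Checking the explicit flag first means a vDSO-like mapping is dumped even if another rule would otherwise exclude it.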

    Signed-off-by: Roland McGrath
    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     

07 Jan, 2007

1 commit

  • This reverts commit 59287c0913cc9a6c75712a775f6c1c1ef418ef3b.

    Hugh Dickins reports that it causes random failures on x86 with SuSE
    10.2, and points out

    "Isn't that randomization, anywhere from 0x10000 to ELF_ET_DYN_BASE,
    sure to place the ET_DYN from time to time just where the comment
    says it's trying to avoid? I assume that somehow results in the error
    reported."

    (where the comment in question is the existing comment in the source
    code about mmap/brk clashes).

    Suggested-by: Hugh Dickins
    Acked-by: Marcus Meissner
    Cc: Andrew Morton
    Cc: Andi Kleen
    Cc: Ingo Molnar
    Cc: Dave Jones
    Cc: Arjan van de Ven
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

09 Dec, 2006

2 commits

  • Replace occurrences of task->signal->session by a new process_session() helper
    routine.

    It will be useful for pid namespaces to abstract the session pid number.
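
A helper of this kind could look roughly like the sketch below. The struct definitions are minimal stand-ins so the example compiles in user space; the real kernel structures carry many more fields, and the exact signature here is an assumption.

```c
#include <assert.h>
#include <sys/types.h>

/* Minimal stand-ins for the kernel structures. */
struct signal_struct {
    pid_t session;
};

struct task_struct {
    struct signal_struct *signal;
};

/* Single access point for the session id, so the representation
 * can change later without touching every caller. */
static inline pid_t process_session(struct task_struct *tsk)
{
    assert(tsk != 0 && tsk->signal != 0);
    return tsk->signal->session;
}
```

Callers that previously read task->signal->session directly would call process_session(task) instead.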

    Signed-off-by: Cedric Le Goater
    Cc: Kirill Korotaev
    Cc: Eric W. Biederman
    Cc: Herbert Poetzl
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cedric Le Goater
     
  • This patch changes struct file to use struct path instead of having
    independent pointers to struct dentry and struct vfsmount, and converts all
    users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}.

    Additionally, it adds two #define's to make the transition easier for users of
    the f_dentry and f_vfsmnt.
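
The shape of the change can be sketched in user space as follows. The stub types and the legacy_dentry() function are hypothetical, and the two macros are written the way the message describes them, as an assumption about their form rather than a copy of the kernel header.

```c
#include <assert.h>

/* Empty-ish stand-ins for the kernel types. */
struct dentry   { int unused; };
struct vfsmount { int unused; };

struct path {
    struct vfsmount *mnt;
    struct dentry *dentry;
};

struct file {
    struct path f_path;   /* replaces two independent pointers */
};

/* Transitional aliases so old-style member names still compile. */
#define f_dentry f_path.dentry
#define f_vfsmnt f_path.mnt

/* Old-style access, resolved through the alias. */
static struct dentry *legacy_dentry(struct file *f)
{
    assert(f != 0);
    return f->f_dentry;
}
```

Code written against the old names keeps working, while converted code uses f_path.dentry and f_path.mnt directly.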

    Signed-off-by: Josef "Jeff" Sipek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef "Jeff" Sipek
     

08 Dec, 2006

4 commits


04 Dec, 2006

1 commit

  • This patch adds SPU elf notes to the coredump. It creates a separate note
    for each of /regs, /fpcr, /lslr, /decr, /decr_status, /mem, /signal1,
    /signal1_type, /signal2, /signal2_type, /event_mask, /event_status,
    /mbox_info, /ibox_info, /wbox_info, /dma_info, /proxydma_info, /object-id.

    A new macro, ARCH_HAVE_EXTRA_NOTES, was created for architectures to
    specify they have extra elf core notes.

    A new macro, ELF_CORE_EXTRA_NOTES_SIZE, was created so the size of the
    additional notes could be calculated and added to the notes phdr entry.

    A new macro, ELF_CORE_WRITE_EXTRA_NOTES, was created so the new notes
    would be written after the existing notes.

    The SPU coredump code resides in spufs. Stub functions are provided in the
    kernel which are hooked into the spufs code which does the actual work via
    register_arch_coredump_calls().

    A new set of __spufs__read/get() functions was provided to allow the
    coredump code to read from the spufs files without having to lock the
    SPU context for each file read from.

    Signed-off-by: Dwayne Grant McConnell
    Signed-off-by: Arnd Bergmann

    Dwayne Grant McConnell
     

16 Oct, 2006

1 commit

  • It is silly to use a non-static variable for writing zeroes to the file.

    And more seriously, foffset in core dump file dump function was incremented
    too much, so some parts of core dump were shifted by size of few phdrs and
    notes down, so although gdb was able to load that file, it did not make a lot
    of sense - in my test case data pages were shifted down by about 900 bytes.

    Signed-off-by: Petr Vandrovec
    Signed-off-by: Linus Torvalds

    Petr Vandrovec
     

13 Oct, 2006

1 commit

  • The file based core dump code was broken by pipe changes - a relative
    llseek returns the absolute file position on success, not the relative
    one, so dump_seek() always failed when invoked with non-zero current
    position.

    Only success/failure can be tested with a relative lseek; we have to trust
    the kernel that on success we've got the right file offset. With this fix
    in place I finally have real core files instead of 1KB fragments...
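
The return-value behaviour is easy to reproduce from user space with POSIX lseek(). The sketch below is an illustration only; seek_forward() and second_relative_seek() are hypothetical helpers and are not the kernel's dump_seek().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Seek forward by `off` bytes from the current position.
 * Only success or failure is reported: lseek() returns the resulting
 * absolute offset, so comparing it with `off` would be wrong. */
static int seek_forward(int fd, off_t off)
{
    return lseek(fd, off, SEEK_CUR) != (off_t)-1;
}

/* Performs two relative seeks of 100 bytes on a fresh temporary file
 * and returns what the second lseek() call reports, or -1 if setup
 * fails. */
static off_t second_relative_seek(void)
{
    FILE *tmp = tmpfile();
    off_t pos;

    if (tmp == NULL)
        return (off_t)-1;
    if (!seek_forward(fileno(tmp), 100)) {
        fclose(tmp);
        return (off_t)-1;
    }
    pos = lseek(fileno(tmp), 100, SEEK_CUR);
    fclose(tmp);
    return pos;
}
```

The second call reports 200, not 100, so a check of the form "result equals the requested step" only passes when the starting position is zero.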

    Signed-off-by: Petr Vandrovec
    [ Cleaned it up a bit while here - use SEEK_CUR instead of hardcoding 1 ]
    Signed-off-by: Linus Torvalds

    Petr Vandrovec
     

01 Oct, 2006

2 commits

  • Using the infrastructure created in previous patches implement support to
    pipe core dumps into programs.

    This is done by overloading the existing core_pattern sysctl
    with a new syntax:

    |program

    When the first character of the pattern is a '|' the kernel will instead
    treat the rest of the pattern as a command to run. The core dump will be
    written to the standard input of that program instead of to a file.

    This is useful for having automatic core dump analysis without filling up
    disks. The program can do some simple analysis and save only a summary of
    the core dump.
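
A handler on the receiving end of such a pipe might start out like the sketch below. It is a hypothetical minimal example: summarize_core() and struct core_summary are made-up names, and the only analysis done is counting bytes and checking for the ELF magic while reading strictly sequentially, since a pipe cannot seek.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct core_summary {
    unsigned long bytes;  /* total bytes read from the stream */
    int is_elf;           /* 1 if the stream starts with the ELF magic */
};

/* Read a core dump front to back without seeking and summarize it. */
static struct core_summary summarize_core(FILE *in)
{
    struct core_summary s = { 0, 0 };
    unsigned char buf[4096];
    size_t n;
    int first = 1;

    assert(in != NULL);
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (first) {
            s.is_elf = n >= 4 && memcmp(buf, "\177ELF", 4) == 0;
            first = 0;
        }
        s.bytes += n;
    }
    return s;
}
```

A real handler would call summarize_core(stdin) from main() and store only the summary, which is the disk-saving use case described above.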

    The core dump process will run with the privileges and in the name space of
    the process that caused the core dump.

    I also increased the core pattern size to 128 bytes so that longer command
    lines fit.

    Most of the changes come from allowing core dumps without seeks. They are
    fairly straightforward though.

    One small incompatibility is that if someone had a core pattern previously
    that started with '|' they will get suddenly new behaviour. I think that's
    unlikely to be a real problem though.

    Additional background:

    > Very nice, do you happen to have a program that can accept this kind of
    > input for crash dumps? I'm guessing that the embedded people will
    > really want this functionality.

    I had a cheesy demo/prototype. Basically it wrote the dump to a file again,
    ran gdb on it to get a backtrace and wrote the summary to a shared directory.
    Then there was a simple CGI script to generate a "top 10" crashes HTML
    listing.

    Unfortunately this still had the disadvantage of needing full disk space for a
    dump except for deleting it afterwards (in fact it was worse because over the
    pipe holes didn't work so if you have a holey address map it would require
    more space).

    Fortunately gdb seems to be happy to handle /proc/pid/fd/xxx input pipes as
    cores (at least it worked with zsh's =(cat core) syntax), so it would be
    likely possible to do it without temporary space with a simple wrapper that
    calls it in the right way. I ran out of time before doing that though.

    The demo prototype scripts weren't very good. If there is really interest I
    can dig them out (they are currently on a laptop disk on the desk with the
    laptop itself being in service), but I would recommend to rewrite them for any
    serious application of this and fix the disk space problem.

    Also to be really useful it should probably find a way to automatically fetch
    the debuginfos (I cheated and just installed them in advance). If nobody else
    does it I can probably do the rewrite myself again at some point.

    My hope at some point was that desktops would support it in their builtin
    crash reporters, but at least the KDE people I talked to seemed to be happy
    with their user space only solution.

    Alan sayeth:

    I don't believe that piping as such is necessarily the right model, but
    the ability to intercept and process core dumps from user space is asked
    for by many enterprise users as well. They want to know about, capture,
    analyse and process core dumps, often centrally and in automated form.

    [akpm@osdl.org: loff_t != unsigned long]
    Signed-off-by: Andi Kleen
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • Create a new header file, fs/internal.h, for common definitions local to the
    sources in the fs/ directory.

    Move extern definitions that should be in header files from fs/*.c to
    fs/internal.h or other main header files where they span directories.

    Signed-Off-By: David Howells
    Signed-off-by: Jens Axboe

    David Howells
     

30 Sep, 2006

2 commits

  • do_each_thread() is rcu-safe, and all tasks which use this ->mm must sleep
    in wait_for_completion(&mm->core_done) at this point, so we can use RCU
    locks.

    Also, remove unneeded INIT_LIST_HEAD(new) before list_add(new, head).
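
Why the extra initialization is unnecessary can be seen from a minimal doubly linked list in the style of the kernel's list helpers. This is a user-space re-implementation written for illustration, not the kernel header itself.

```c
#include <assert.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Make a list head point at itself (empty list). */
static void INIT_LIST_HEAD(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert `entry` right after `head`. Both of entry's pointers are
 * overwritten here, so initializing entry beforehand has no effect. */
static void list_add(struct list_head *entry, struct list_head *head)
{
    struct list_head *next = head->next;

    assert(next != 0);
    next->prev = entry;
    entry->next = next;
    entry->prev = head;
    head->next = entry;
}
```

Only the list head needs INIT_LIST_HEAD(); a node passed to list_add() gets both pointers assigned during insertion.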

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Fixed race on put_files_struct on exec with proc. Restoring files on
    current on error path may lead to proc having a pointer to already kfree-d
    files_struct.

    Changes to ->files in exit.c and kthread.c are safe because exit_files()
    does everything under the lock.

    Found during OpenVZ stress testing.

    [akpm@osdl.org: add export]
    Signed-off-by: Pavel Emelianov
    Signed-off-by: Kirill Korotaev
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill Korotaev
     

27 Sep, 2006

1 commit

  • * 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (225 commits)
    [PATCH] Don't set calgary iommu as default y
    [PATCH] i386/x86-64: New Intel feature flags
    [PATCH] x86: Add a cumulative thermal throttle event counter.
    [PATCH] i386: Make the jiffies compares use the 64bit safe macros.
    [PATCH] x86: Refactor thermal throttle processing
    [PATCH] Add 64bit jiffies compares (for use with get_jiffies_64)
    [PATCH] Fix unwinder warning in traps.c
    [PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1
    [PATCH] x86: Move direct PCI scanning functions out of line
    [PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCI
    [PATCH] Don't leak NT bit into next task
    [PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder
    [PATCH] Fix some broken white space in ia32_signal.c
    [PATCH] Initialize argument registers for 32bit signal handlers.
    [PATCH] Remove all traces of signal number conversion
    [PATCH] Don't synchronize time reading on single core AMD systems
    [PATCH] Remove outdated comment in x86-64 mmconfig code
    [PATCH] Use string instructions for Core2 copy/clear
    [PATCH] x86: - restore i8259A eoi status on resume
    [PATCH] i386: Split multi-line printk in oops output.
    ...

    Linus Torvalds
     

26 Sep, 2006

2 commits


11 Jul, 2006

1 commit


04 Jul, 2006

1 commit

  • Fix check for bad address; use macro instead of open-coding two checks.

    Taken from RHEL4 kernel update.

    From: Ernie Petrides

    For background, the BAD_ADDR() macro should return TRUE if the address is
    TASK_SIZE, because that's the lowest address that is *not* valid for
    user-space mappings. The macro was correct in binfmt_aout.c but was wrong
    for the "equal to" case in binfmt_elf.c. There were two in-line validations
    of user-space addresses in binfmt_elf.c, which have been appropriately
    converted to use the corrected BAD_ADDR() macro in the patch you posted
    yesterday. Note that the size checks against TASK_SIZE are okay as coded.
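
The "equal to" case can be made concrete with a small sketch. The TASK_SIZE value below is a made-up constant for illustration (the real one is architecture-specific), and is_valid_user_addr() is a hypothetical wrapper; the comparison follows the rule stated above that TASK_SIZE itself is already a bad address.

```c
#include <assert.h>

/* Hypothetical user-space limit, for illustration only. */
#define TASK_SIZE 0xC0000000UL

/* TASK_SIZE is the lowest address that is NOT valid for user-space
 * mappings, so the comparison must include equality. */
#define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE)

/* Convenience wrapper around the macro. */
static int is_valid_user_addr(unsigned long addr)
{
    return !BAD_ADDR(addr);
}
```

A version of the macro using a strict "greater than" would wrongly accept an address equal to TASK_SIZE, which is the case the message says was mishandled in binfmt_elf.c.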

    The additional changes that I propose are below. These are in the error
    paths for bad ELF entry addresses once load_elf_binary() has already
    committed to exec'ing the new image (following the tearing down of the
    task's original address space).

    The 1st hunk deals with the interp-side of the outer "if". There were two
    problems here. The printk() should be removed because this path can be
    triggered at will by a bogus interpreter image created and used by a
    malicious user. Further, the error code should not be ENOEXEC, because that
    causes the loop in search_binary_handler() to continue trying other exec
    handlers (twice, in fact). But it's too late for this to work correctly,
    because the user address space has already been torn down, and an exec()
    failure cannot be returned to the user code because the code no longer
    exists. The only recovery is to force a SIGSEGV, but it's best to terminate
    the search loop immediately. I somewhat arbitrarily chose EINVAL as a
    fallback error code, but any error returned by load_elf_interp() will
    override that (but this value will never be seen by user-space).

    The 2nd hunk deals with the non-interp-side of the outer "if". There were
    two problems here as well. The SIGSEGV needs to be forced, because a prior
    sigaction() syscall might have set the associated disposition to SIG_IGN.
    And the ENOEXEC should be changed to EINVAL as described above.

    Signed-off-by: Chuck Ebbert
    Signed-off-by: Ernie Petrides
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chuck Ebbert
     

23 Jun, 2006

3 commits

  • Remove redundant casts from NEW_AUX_ENT() arguments in fs/binfmt_elf.c

    Signed-off-by: Jesper Juhl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesper Juhl
     
  • Do a CodingStyle cleanup of fs/binfmt_elf.c and also remove some pointless
    casts of kmalloc() return values in the same file.

    Signed-off-by: Jesper Juhl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesper Juhl
     
  • This patch removes the steal_locks() function.

    steal_locks() doesn't work correctly with any filesystem that does its own
    lock management, including NFS, CIFS, etc.

    In addition it has weird semantics on local filesystems in case tasks
    sharing file-descriptor tables are doing POSIX locking operations in
    parallel to execve().

    The steal_locks() function has an effect on applications doing:

    clone(CLONE_FILES)
    /* in child */
    lock
    execve
    lock

    POSIX locks acquired before execve (by "child", "parent" or any further
    task sharing files_struct) will after the execve be owned exclusively by
    "child".

    According to Chris Wright some LSB/LTP kind of suite triggers without the
    stealing behavior, but there's no known real-world application that would
    also fail.

    Apps using NPTL are not affected, since all other threads are killed before
    execve.

    Apps using LinuxThreads are only affected if they

    - have multiple threads during exec (LinuxThreads doesn't kill other
    threads, the app may do it with pthread_kill_other_threads_np())
    - rely on POSIX locks being inherited across exec

    Both conditions are documented, but not their interaction.

    Apps using clone() natively are affected if they

    - use clone(CLONE_FILES)
    - rely on POSIX locks being inherited across exec

    The above scenarios are unlikely, but possible.

    If the patch is vetoed, there's a plan B, that involves mostly keeping the
    weird stealing semantics, but changing the way lock ownership is handled so
    that network and local filesystems work consistently.

    That would add more complexity though, so this solution seems to be
    preferred by most people.

    Signed-off-by: Miklos Szeredi
    Cc: Trond Myklebust
    Cc: Matthew Wilcox
    Cc: Chris Wright
    Cc: Christoph Hellwig
    Cc: Steven French
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     

26 Mar, 2006

3 commits


27 Feb, 2006

1 commit