15 Oct, 2010

1 commit

  • akiphie points out that a.out core-dumps have that odd task struct
    dumping that was never used and was never really a good idea (it goes
    back into the mists of history, probably the original core-dumping
    code). Just remove it.

    Also do the access_ok() check on dump_write(). It probably doesn't
    matter (since normal filesystems all seem to do it anyway), but he
    points out that it's normally done by the VFS layer, so ...

    [ I suspect that we should possibly do "vfs_write()" instead of
    calling ->write directly. That also does the whole fsnotify and write
    statistics thing, which may or may not be a good idea. ]

    And just to be anal, do this all for the x86-64 32-bit a.out emulation
    code too, even though it's not enabled (and won't currently even
    compile)

    Reported-by: akiphie
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

25 Mar, 2010

1 commit

  • fs/binfmt_aout.c: In function `aout_core_dump':
    fs/binfmt_aout.c:125: warning: passing argument 2 of `dump_write' makes pointer from integer without a cast
    include/linux/coredump.h:12: note: expected `const void *' but argument is of type `long unsigned int'
    fs/binfmt_aout.c:132: warning: passing argument 2 of `dump_write' makes pointer from integer without a cast
    include/linux/coredump.h:12: note: expected `const void *' but argument is of type `long unsigned int'

    due to dump_write() expecting a user void *. Fold casts into the
    START_DATA/START_STACK macros and shut up the warnings.

    Signed-off-by: Borislav Petkov
    Cc: Daisuke HATAYAMA
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Borislav Petkov
     

07 Mar, 2010

3 commits

  • My next patch will replace ELF_CORE_EXTRA_* macros by functions, putting
    them into other newly created *.c files. Then, each files will contain
    dump_write(), where each pair of binfmt_*.c and elfcore.c should be the
    same. So, this patch moves them into a header file with dump_seek().
    Also, the patch deletes confusing DUMP_WRITE macros in each files.

    Signed-off-by: Daisuke HATAYAMA
    Cc: "Luck, Tony"
    Cc: Jeff Dike
    Cc: David Howells
    Cc: Greg Ungerer
    Cc: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Alexander Viro
    Cc: Andi Kleen
    Cc: Alan Cox
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke HATAYAMA
     
  • The current ELF dumper can produce broken corefiles if program headers
    exceed 65535. In particular, the program in 64-bit environment often
    demands more than 65535 mmaps. If you google max_map_count, then you can
    find many users facing this problem.

    Solaris has already dealt with this issue, and other OSes have also
    adopted the same method as in Solaris. Currently, Sun's document and AMD
    64 ABI include the description for the extension, where they call the
    extension Extended Numbering. See Reference for further information.

    I believe that linux kernel should adopt the same way as they did, so I've
    written this patch.

    I am also preparing for patches of GDB and binutils.

    How to fix
    ==========

    In new dumping process, there are two cases according to weather or
    not the number of program headers is equal to or more than 65535.

    - if less than 65535, the produced corefile format is exactly the same
    as the ordinary one.

    - if equal to or more than 65535, then e_phnum field is set to newly
    introduced constant PN_XNUM(0xffff) and the actual number of program
    headers is set to sh_info field of the section header at index 0.

    Compatibility Concern
    =====================

    * As already mentioned in Summary, Sun and AMD64 has already adopted
    this. See Reference.

    * There are four combinations according to whether kernel and userland
    tools are respectively modified or not. The next table summarizes
    shortly for each combination.

    ---------------------------------------------
    Original Kernel | Modified Kernel
    ---------------------------------------------
    < 65535 | >= 65535 | < 65535 | >= 65535
    -------------------------------------------------------------
    Original Tools | OK | broken | OK | broken (#)
    -------------------------------------------------------------
    Modified Tools | OK | broken | OK | OK
    -------------------------------------------------------------

    Note that there is no case that `OK' changes to `broken'.

    (#) Although this case remains broken, O-M behaves better than
    O-O. That is, while in O-O case e_phnum field would be extremely
    small due to integer overflow, in O-M case it is guaranteed to be at
    least 65535 by being set to PN_XNUM(0xFFFF), much closer to the
    actual correct value than the O-O case.

    Test Program
    ============

    Here is a test program mkmmaps.c that is useful to produce the
    corefile with many mmaps. To use this, please take the following
    steps:

    $ ulimit -c unlimited
    $ sysctl vm.max_map_count=70000 # default 65530 is too small
    $ sysctl fs.file-max=70000
    $ mkmmaps 65535

    Then, the program will abort and a corefile will be generated.

    If failed, there are two cases according to the error message
    displayed.

    * ``out of memory'' means vm.max_map_count is still smaller

    * ``too many open files'' means fs.file-max is still smaller

    So, please change it to a larger value, and then retry it.

    mkmmaps.c
    ==
    #include
    #include
    #include
    #include
    #include
    int main(int argc, char **argv)
    {
    int maps_num;
    if (argc < 2) {
    fprintf(stderr, "mkmmaps [number of maps to be created]\n");
    exit(1);
    }
    if (sscanf(argv[1], "%d", &maps_num) == EOF) {
    perror("sscanf");
    exit(2);
    }
    if (maps_num < 0) {
    fprintf(stderr, "%d is invalid\n", maps_num);
    exit(3);
    }
    for (; maps_num > 0; --maps_num) {
    if (MAP_FAILED == mmap((void *)NULL, (size_t) 1, PROT_READ,
    MAP_SHARED | MAP_ANONYMOUS, (int) -1,
    (off_t) NULL)) {
    perror("mmap");
    exit(4);
    }
    }
    abort();
    {
    char buffer[128];
    sprintf(buffer, "wc -l /proc/%u/maps", getpid());
    system(buffer);
    }
    return 0;
    }

    Tested on i386, ia64 and um/sys-i386.
    Built on sh4 (which covers fs/binfmt_elf_fdpic.c)

    References
    ==========

    - Sun microsystems: Linker and Libraries.
    Part No: 817-1984-17, September 2008.
    URL: http://docs.sun.com/app/docs/doc/817-1984

    - System V ABI AMD64 Architecture Processor Supplement
    Draft Version 0.99., May 11, 2009.
    URL: http://www.x86-64.org/

    This patch:

    There are three different definitions for dump_seek() functions in
    binfmt_aout.c, binfmt_elf.c and binfmt_elf_fdpic.c, respectively. The
    only for binfmt_elf.c.

    My next patch will move dump_seek() into a header file in order to share
    the same implementations for dump_write() and dump_seek(). As the first
    step, this patch unify these three definitions for dump_seek() by applying
    the past commits that have been applied only for binfmt_elf.c.

    Specifically, the modification made here is part of the following commits:

    * d025c9db7f31fc0554ce7fb2dfc78d35a77f3487
    * 7f14daa19ea36b200d237ad3ac5826ae25360461

    This patch does not change a shape of corefiles.

    Signed-off-by: Daisuke HATAYAMA
    Cc: "Luck, Tony"
    Cc: Jeff Dike
    Cc: David Howells
    Cc: Greg Ungerer
    Cc: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Alexander Viro
    Cc: Andi Kleen
    Cc: Alan Cox
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke HATAYAMA
     
  • Make sure compiler won't do weird things with limits. E.g. fetching them
    twice may return 2 different values after writable limits are implemented.

    I.e. either use rlimit helpers added in commit 3e10e716abf3 ("resource:
    add helpers for fetching rlimits") or ACCESS_ONCE if not applicable.

    Signed-off-by: Jiri Slaby
    Cc: Alexander Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     

30 Jan, 2010

1 commit

  • 'flush_old_exec()' is the point of no return when doing an execve(), and
    it is pretty badly misnamed. It doesn't just flush the old executable
    environment, it also starts up the new one.

    Which is very inconvenient for things like setting up the new
    personality, because we want the new personality to affect the starting
    of the new environment, but at the same time we do _not_ want the new
    personality to take effect if flushing the old one fails.

    As a result, the x86-64 '32-bit' personality is actually done using this
    insane "I'm going to change the ABI, but I haven't done it yet" bit
    (TIF_ABI_PENDING), with SET_PERSONALITY() not actually setting the
    personality, but just the "pending" bit, so that "flush_thread()" can do
    the actual personality magic.

    This patch in no way changes any of that insanity, but it does split the
    'flush_old_exec()' function up into a preparatory part that can fail
    (still called flush_old_exec()), and a new part that will actually set
    up the new exec environment (setup_new_exec()). All callers are changed
    to trivially comply with the new world order.

    Signed-off-by: H. Peter Anvin
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

18 Dec, 2009

1 commit

  • Introduce coredump parameter data structure (struct coredump_params) to
    simplify binfmt->core_dump() arguments.

    Signed-off-by: Masami Hiramatsu
    Suggested-by: Ingo Molnar
    Cc: Hidehiro Kawai
    Cc: Oleg Nesterov
    Cc: Roland McGrath
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masami Hiramatsu
     

04 Jan, 2009

2 commits

  • They are actually alpha vs. i386/arm/m68k i.e. ecoff vs. aout.

    In the only place where we actually tried to handle arm and i386/m68k in
    different ways (START_DATA() in coredump handling), the arm variant
    works for all of them (i386 and m68k have u.start_code set to 0).

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • it's been used only in sunos compat

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

14 Nov, 2008

1 commit

  • Make execve() take advantage of copy-on-write credentials, allowing it to set
    up the credentials in advance, and then commit the whole lot after the point
    of no return.

    This patch and the preceding patches have been tested with the LTP SELinux
    testsuite.

    This patch makes several logical sets of alteration:

    (1) execve().

    The credential bits from struct linux_binprm are, for the most part,
    replaced with a single credentials pointer (bprm->cred). This means that
    all the creds can be calculated in advance and then applied at the point
    of no return with no possibility of failure.

    I would like to replace bprm->cap_effective with:

    cap_isclear(bprm->cap_effective)

    but this seems impossible due to special behaviour for processes of pid 1
    (they always retain their parent's capability masks where normally they'd
    be changed - see cap_bprm_set_creds()).

    The following sequence of events now happens:

    (a) At the start of do_execve, the current task's cred_exec_mutex is
    locked to prevent PTRACE_ATTACH from obsoleting the calculation of
    creds that we make.

    (a) prepare_exec_creds() is then called to make a copy of the current
    task's credentials and prepare it. This copy is then assigned to
    bprm->cred.

    This renders security_bprm_alloc() and security_bprm_free()
    unnecessary, and so they've been removed.

    (b) The determination of unsafe execution is now performed immediately
    after (a) rather than later on in the code. The result is stored in
    bprm->unsafe for future reference.

    (c) prepare_binprm() is called, possibly multiple times.

    (i) This applies the result of set[ug]id binaries to the new creds
    attached to bprm->cred. Personality bit clearance is recorded,
    but now deferred on the basis that the exec procedure may yet
    fail.

    (ii) This then calls the new security_bprm_set_creds(). This should
    calculate the new LSM and capability credentials into *bprm->cred.

    This folds together security_bprm_set() and parts of
    security_bprm_apply_creds() (these two have been removed).
    Anything that might fail must be done at this point.

    (iii) bprm->cred_prepared is set to 1.

    bprm->cred_prepared is 0 on the first pass of the security
    calculations, and 1 on all subsequent passes. This allows SELinux
    in (ii) to base its calculations only on the initial script and
    not on the interpreter.

    (d) flush_old_exec() is called to commit the task to execution. This
    performs the following steps with regard to credentials:

    (i) Clear pdeath_signal and set dumpable on certain circumstances that
    may not be covered by commit_creds().

    (ii) Clear any bits in current->personality that were deferred from
    (c.i).

    (e) install_exec_creds() [compute_creds() as was] is called to install the
    new credentials. This performs the following steps with regard to
    credentials:

    (i) Calls security_bprm_committing_creds() to apply any security
    requirements, such as flushing unauthorised files in SELinux, that
    must be done before the credentials are changed.

    This is made up of bits of security_bprm_apply_creds() and
    security_bprm_post_apply_creds(), both of which have been removed.
    This function is not allowed to fail; anything that might fail
    must have been done in (c.ii).

    (ii) Calls commit_creds() to apply the new credentials in a single
    assignment (more or less). Possibly pdeath_signal and dumpable
    should be part of struct creds.

    (iii) Unlocks the task's cred_replace_mutex, thus allowing
    PTRACE_ATTACH to take place.

    (iv) Clears The bprm->cred pointer as the credentials it was holding
    are now immutable.

    (v) Calls security_bprm_committed_creds() to apply any security
    alterations that must be done after the creds have been changed.
    SELinux uses this to flush signals and signal handlers.

    (f) If an error occurs before (d.i), bprm_free() will call abort_creds()
    to destroy the proposed new credentials and will then unlock
    cred_replace_mutex. No changes to the credentials will have been
    made.

    (2) LSM interface.

    A number of functions have been changed, added or removed:

    (*) security_bprm_alloc(), ->bprm_alloc_security()
    (*) security_bprm_free(), ->bprm_free_security()

    Removed in favour of preparing new credentials and modifying those.

    (*) security_bprm_apply_creds(), ->bprm_apply_creds()
    (*) security_bprm_post_apply_creds(), ->bprm_post_apply_creds()

    Removed; split between security_bprm_set_creds(),
    security_bprm_committing_creds() and security_bprm_committed_creds().

    (*) security_bprm_set(), ->bprm_set_security()

    Removed; folded into security_bprm_set_creds().

    (*) security_bprm_set_creds(), ->bprm_set_creds()

    New. The new credentials in bprm->creds should be checked and set up
    as appropriate. bprm->cred_prepared is 0 on the first call, 1 on the
    second and subsequent calls.

    (*) security_bprm_committing_creds(), ->bprm_committing_creds()
    (*) security_bprm_committed_creds(), ->bprm_committed_creds()

    New. Apply the security effects of the new credentials. This
    includes closing unauthorised files in SELinux. This function may not
    fail. When the former is called, the creds haven't yet been applied
    to the process; when the latter is called, they have.

    The former may access bprm->cred, the latter may not.

    (3) SELinux.

    SELinux has a number of changes, in addition to those to support the LSM
    interface changes mentioned above:

    (a) The bprm_security_struct struct has been removed in favour of using
    the credentials-under-construction approach.

    (c) flush_unauthorized_files() now takes a cred pointer and passes it on
    to inode_has_perm(), file_has_perm() and dentry_open().

    Signed-off-by: David Howells
    Acked-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: James Morris

    David Howells
     

27 Jul, 2008

1 commit

  • This moves all the ptrace hooks related to exec into tracehook.h inlines.

    This also lifts the calls for tracing out of the binfmt load_binary hooks
    into search_binary_handler() after it calls into the binfmt module. This
    change has no effect, since all the binfmt modules' load_binary functions
    did the call at the end on success, and now search_binary_handler() does
    it immediately after return if successful. We consolidate the repeated
    code, and binfmt modules no longer need to import ptrace_notify().

    Signed-off-by: Roland McGrath
    Cc: Oleg Nesterov
    Reviewed-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     

29 Apr, 2008

1 commit


09 Feb, 2008

1 commit

  • Suppress A.OUT library support if CONFIG_ARCH_SUPPORTS_AOUT is not set.

    Not all architectures support the A.OUT binfmt, so the ELF binfmt should not
    be permitted to go looking for A.OUT libraries to load in such a case. Not
    only that, but under such conditions A.OUT core dumps are not produced either.

    To make this work, this patch also does the following:

    (1) Makes the existence of the contents of linux/a.out.h contingent on
    CONFIG_ARCH_SUPPORTS_AOUT.

    (2) Renames dump_thread() to aout_dump_thread() as it's only called by A.OUT
    core dumping code.

    (3) Moves aout_dump_thread() into asm/a.out-core.h and makes it inline. This
    is then included only where needed. This means that this bit of arch
    code will be stored in the appropriate A.OUT binfmt module rather than
    the core kernel.

    (4) Drops A.OUT support for Blackfin (according to Mike Frysinger it's not
    needed) and FRV.

    This patch depends on the previous patch to move STACK_TOP[_MAX] out of
    asm/a.out.h and into asm/processor.h as they're required whether or not A.OUT
    format is available.

    [jdike@addtoit.com: uml: re-remove accidentally restored code]
    Signed-off-by: David Howells
    Cc:
    Signed-off-by: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

08 Feb, 2008

1 commit

  • struct user.u_ar0 is defined to contain a pointer offset on all
    architectures in which it is defined (all architectures which define an
    a.out format except SPARC.) However, it has a pointer type in the headers,
    which is pointless -- is not exported to userspace, and it
    just makes the code messy.

    Redefine the field as "unsigned long" (which is the same size as a pointer
    on all Linux architectures) and change the setting code to user offsetof()
    instead of hand-coded arithmetic.

    Cc: Linux Arch Mailing List
    Cc: Bryan Wu
    Cc: Roman Zippel
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Russell King
    Cc: Lennert Buytenhek
    Cc: Håvard Skinnemoen
    Cc: Mikael Starvik
    Cc: Yoshinori Sato
    Cc: Tony Luck
    Cc: Hirokazu Takata
    Cc: Ralf Baechle
    Cc: Paul Mackerras
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Paul Mundt
    Signed-off-by: H. Peter Anvin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    H. Peter Anvin
     

20 Dec, 2007

1 commit

  • The problem was introduced by commit "mm: variable length argument
    support" (b6a2fea39318e43fee84fa7b0b90d68bed92d2ba)
    as it didn't update fs/binfmt_aout.c like other binfmt's.

    I noticed that on alpha when accidentally launched old OSF/1
    Acrobat Reader binary. Obviously, other architectures are affected
    as well.

    Signed-off-by: Ivan Kokshaysky
    Cc: Ollie Wild
    Acked-by: Peter Zijlstra
    Cc: Andrew Morton
    Cc: Hugh Dickins
    Cc: Adrian Bunk
    Signed-off-by: Linus Torvalds

    Ivan Kokshaysky
     

17 Oct, 2007

1 commit

  • For some time /proc/sys/kernel/core_pattern has been able to set its output
    destination as a pipe, allowing a user space helper to receive and
    intellegently process a core. This infrastructure however has some
    shortcommings which can be enhanced. Specifically:

    1) The coredump code in the kernel should ignore RLIMIT_CORE limitation
    when core_pattern is a pipe, since file system resources are not being
    consumed in this case, unless the user application wishes to save the core,
    at which point the app is restricted by usual file system limits and
    restrictions.

    2) The core_pattern code should be able to parse and pass options to the
    user space helper as an argv array. The real core limit of the uid of the
    crashing proces should also be passable to the user space helper (since it
    is overridden to zero when called).

    3) Some miscellaneous bugs need to be cleaned up (specifically the
    recognition of a recursive core dump, should the user mode helper itself
    crash. Also, the core dump code in the kernel should not wait for the user
    mode helper to exit, since the same context is responsible for writing to
    the pipe, and a read of the pipe by the user mode helper will result in a
    deadlock.

    This patch:

    Remove the check of RLIMIT_CORE if core_pattern is a pipe. In the event that
    core_pattern is a pipe, the entire core will be fed to the user mode helper.

    Signed-off-by: Neil Horman
    Cc:
    Cc:
    Cc: Jeremy Fitzhardinge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Neil Horman
     

09 Dec, 2006

1 commit

  • This patch changes struct file to use struct path instead of having
    independent pointers to struct dentry and struct vfsmount, and converts all
    users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}.

    Additionally, it adds two #define's to make the transition easier for users of
    the f_dentry and f_vfsmnt.

    Signed-off-by: Josef "Jeff" Sipek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef "Jeff" Sipek
     

30 Sep, 2006

1 commit

  • Files supported by fs/proc/base.c, i.e. /proc//*, are not capable of
    meeting the validity checks in ELF load_elf_*() handling because they have
    no mmap handler which is required by ELF. In order to stop a.out
    executables being used as part of an exploit attack against /proc-related
    vulnerabilities, we make a.out executables depend on ->mmap() existing.

    Signed-off-by: Eugene Teo
    Signed-off-by: Marcel Holtmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eugene Teo
     

11 Jan, 2006

1 commit

  • )

    From: Adrian Bunk

    - create one common dump_thread() prototype in kernel.h

    - dump_thread() is only used in fs/binfmt_aout.c and can therefore be
    removed on all architectures where CONFIG_BINFMT_AOUT is not
    available

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    akpm@osdl.org
     

30 Oct, 2005

1 commit

  • How is anon_rss initialized? In dup_mmap, and by mm_alloc's memset; but
    that's not so good if an mm_counter_t is a special type. And how is rss
    initialized? By set_mm_counter, all over the place. Come on, we just need to
    initialize them both at once by set_mm_counter in mm_init (which follows the
    memcpy when forking).

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

22 Jun, 2005

1 commit

  • Ingo recently introduced a great speedup for allocating new mmaps using the
    free_area_cache pointer which boosts the specweb SSL benchmark by 4-5% and
    causes huge performance increases in thread creation.

    The downside of this patch is that it does lead to fragmentation in the
    mmap-ed areas (visible via /proc/self/maps), such that some applications
    that work fine under 2.4 kernels quickly run out of memory on any 2.6
    kernel.

    The problem is twofold:

    1) the free_area_cache is used to continue a search for memory where
    the last search ended. Before the change new areas were always
    searched from the base address on.

    So now new small areas are cluttering holes of all sizes
    throughout the whole mmap-able region whereas before small holes
    tended to close holes near the base leaving holes far from the base
    large and available for larger requests.

    2) the free_area_cache also is set to the location of the last
    munmap-ed area so in scenarios where we allocate e.g. five regions of
    1K each, then free regions 4 2 3 in this order the next request for 1K
    will be placed in the position of the old region 3, whereas before we
    appended it to the still active region 1, placing it at the location
    of the old region 2. Before we had 1 free region of 2K, now we only
    get two free regions of 1K -> fragmentation.

    The patch addresses thes issues by introducing yet another cache descriptor
    cached_hole_size that contains the largest known hole size below the
    current free_area_cache. If a new request comes in the size is compared
    against the cached_hole_size and if the request can be filled with a hole
    below free_area_cache the search is started from the base instead.

    The results look promising: Whereas 2.6.12-rc4 fragments quickly and my
    (earlier posted) leakme.c test program terminates after 50000+ iterations
    with 96 distinct and fragmented maps in /proc/self/maps it performs nicely
    (as expected) with thread creation, Ingo's test_str02 with 20000 threads
    requires 0.7s system time.

    Taking out Ingo's patch (un-patch available per request) by basically
    deleting all mentions of free_area_cache from the kernel and starting the
    search for new memory always at the respective bases we observe: leakme
    terminates successfully with 11 distinctive hardly fragmented areas in
    /proc/self/maps but thread creating is gringdingly slow: 30+s(!) system
    time for Ingo's test_str02 with 20000 threads.

    Now - drumroll ;-) the appended patch works fine with leakme: it ends with
    only 7 distinct areas in /proc/self/maps and also thread creation seems
    sufficiently fast with 0.71s for 20000 threads.

    Signed-off-by: Wolfgang Wander
    Credit-to: "Richard Purdie"
    Signed-off-by: Ken Chen
    Acked-by: Ingo Molnar (partly)
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wolfgang Wander
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds