20 Jul, 2011

1 commit


07 Jul, 2011

1 commit


01 Jun, 2010

1 commit

  • clear_user() returns the number of bytes that could not be copied rather than
    an error code. So we should return -EFAULT rather than directly returning the
    results.

    Without this patch, positive values may be returned to elf_fdpic_map_file()
    and the following error handlings do not function as expected.

    1.
    ret = elf_fdpic_map_file_constdisp_on_uclinux(params, file, mm);
    if (ret < 0)
    return ret;
    2.
    ret = elf_fdpic_map_file_by_direct_mmap(params, file, mm);
    if (ret < 0)
    return ret;

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: David Howells
    Acked-by: Mike Frysinger
    CC: Alexander Viro
    CC: Andrew Morton
    CC: Daisuke HATAYAMA
    CC: Paul Mundt
    Signed-off-by: Linus Torvalds

    Takuya Yoshikawa
     

28 Apr, 2010

1 commit

  • The checks for CONFIG_MMU at this location are duplicated as all the code is
    located inside a #ifndef CONFIG_MMU block. So the first conditional block will
    always be included while the second never will.

    Signed-off-by: Christoph Egger
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    Christoph Egger
     

25 Mar, 2010

1 commit

  • Fix an incorrect for-loop in elf_core_vma_data_size(). The advance-pointer
    statement lacks an assignment:

    CC fs/binfmt_elf_fdpic.o
    fs/binfmt_elf_fdpic.c: In function 'elf_core_vma_data_size':
    fs/binfmt_elf_fdpic.c:1593: warning: statement with no effect

    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     

08 Mar, 2010

1 commit


07 Mar, 2010

6 commits

  • Pass mm->flags as a coredump parameter for consistency.

    ---
    1787 if (mm->core_state || !get_dumpable(mm)) { mmap_sem);
    1789 put_cred(cred);
    1790 goto fail;
    1791 }
    1792
    [...]
    1798 if (get_dumpable(mm) == 2) { /* Setuid core dump mode */ fsuid = 0; /* Dump root private */
    1801 }
    ---

    Since dumpable bits are not protected by lock, there is a chance to change
    these bits between (1) and (2).

    To solve this issue, this patch copies mm->flags to
    coredump_params.mm_flags at the beginning of do_coredump() and uses it
    instead of get_dumpable() while dumping core.

    This copy is also passed to binfmt->core_dump, since elf*_core_dump() uses
    dump_filter bits in mm->flags.

    [akpm@linux-foundation.org: fix merge]
    Signed-off-by: Masami Hiramatsu
    Acked-by: Roland McGrath
    Cc: Hidehiro Kawai
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masami Hiramatsu
     
  • The current ELF dumper implementation can produce broken corefiles if
    program headers exceed 65535. This number is determined by the number of
    vmas which the process have. In particular, some extreme programs may use
    more than 65535 vmas. (If you google max_map_count, you can find some
    users facing this problem.) This kind of program never be able to generate
    correct coredumps.

    This patch implements ``extended numbering'' that uses sh_info field of
    the first section header instead of e_phnum field in order to represent
    upto 4294967295 vmas.

    This is supported by
    AMD64-ABI(http://www.x86-64.org/documentation.html) and
    Solaris(http://docs.sun.com/app/docs/doc/817-1984/).
    Of course, we are preparing patches for gdb and binutils.

    Signed-off-by: Daisuke HATAYAMA
    Cc: "Luck, Tony"
    Cc: Jeff Dike
    Cc: David Howells
    Cc: Greg Ungerer
    Cc: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Alexander Viro
    Cc: Andi Kleen
    Cc: Alan Cox
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke HATAYAMA
     
  • By the next patch, elf_core_dump() and elf_fdpic_core_dump() will support
    extended numbering and so will produce the corefiles with section header
    table in a special case.

    The problem is the process of writing a file header offset of the section
    header table into e_shoff field of the ELF header. ELF header is
    positioned at the beginning of the corefile, while section header at the
    end. So, we need to take which of the following ways:

    1. Seek backward to retry writing operation for ELF header
    after writing process for a whole part

    2. Make offset calculation process and writing process
    totally sequential

    The clause 1. is not always possible: one cannot assume that file system
    supports seek function. Consider the no_llseek case.

    Therefore, this patch adopts the clause 2.

    Signed-off-by: Daisuke HATAYAMA
    Cc: "Luck, Tony"
    Cc: Jeff Dike
    Cc: David Howells
    Cc: Greg Ungerer
    Cc: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Alexander Viro
    Cc: Andi Kleen
    Cc: Alan Cox
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke HATAYAMA
     
  • elf_core_dump() and elf_fdpic_core_dump() use #ifdef and the corresponding
    macro for hiding _multiline_ logics in functions. This patch removes
    #ifdef and replaces ELF_CORE_EXTRA_* by corresponding functions. For
    architectures not implemeonting ELF_CORE_EXTRA_*, we use weak functions in
    order to reduce a range of modification.

    This cleanup is for my next patches, but I think this cleanup itself is
    worth doing regardless of my firnal purpose.

    Signed-off-by: Daisuke HATAYAMA
    Cc: "Luck, Tony"
    Cc: Jeff Dike
    Cc: David Howells
    Cc: Greg Ungerer
    Cc: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Alexander Viro
    Cc: Andi Kleen
    Cc: Alan Cox
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke HATAYAMA
     
  • My next patch will replace ELF_CORE_EXTRA_* macros by functions, putting
    them into other newly created *.c files. Then, each files will contain
    dump_write(), where each pair of binfmt_*.c and elfcore.c should be the
    same. So, this patch moves them into a header file with dump_seek().
    Also, the patch deletes confusing DUMP_WRITE macros in each files.

    Signed-off-by: Daisuke HATAYAMA
    Cc: "Luck, Tony"
    Cc: Jeff Dike
    Cc: David Howells
    Cc: Greg Ungerer
    Cc: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Alexander Viro
    Cc: Andi Kleen
    Cc: Alan Cox
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke HATAYAMA
     
  • The current ELF dumper can produce broken corefiles if program headers
    exceed 65535. In particular, the program in 64-bit environment often
    demands more than 65535 mmaps. If you google max_map_count, then you can
    find many users facing this problem.

    Solaris has already dealt with this issue, and other OSes have also
    adopted the same method as in Solaris. Currently, Sun's document and AMD
    64 ABI include the description for the extension, where they call the
    extension Extended Numbering. See Reference for further information.

    I believe that linux kernel should adopt the same way as they did, so I've
    written this patch.

    I am also preparing for patches of GDB and binutils.

    How to fix
    ==========

    In new dumping process, there are two cases according to weather or
    not the number of program headers is equal to or more than 65535.

    - if less than 65535, the produced corefile format is exactly the same
    as the ordinary one.

    - if equal to or more than 65535, then e_phnum field is set to newly
    introduced constant PN_XNUM(0xffff) and the actual number of program
    headers is set to sh_info field of the section header at index 0.

    Compatibility Concern
    =====================

    * As already mentioned in Summary, Sun and AMD64 has already adopted
    this. See Reference.

    * There are four combinations according to whether kernel and userland
    tools are respectively modified or not. The next table summarizes
    shortly for each combination.

    ---------------------------------------------
    Original Kernel | Modified Kernel
    ---------------------------------------------
    < 65535 | >= 65535 | < 65535 | >= 65535
    -------------------------------------------------------------
    Original Tools | OK | broken | OK | broken (#)
    -------------------------------------------------------------
    Modified Tools | OK | broken | OK | OK
    -------------------------------------------------------------

    Note that there is no case that `OK' changes to `broken'.

    (#) Although this case remains broken, O-M behaves better than
    O-O. That is, while in O-O case e_phnum field would be extremely
    small due to integer overflow, in O-M case it is guaranteed to be at
    least 65535 by being set to PN_XNUM(0xFFFF), much closer to the
    actual correct value than the O-O case.

    Test Program
    ============

    Here is a test program mkmmaps.c that is useful to produce the
    corefile with many mmaps. To use this, please take the following
    steps:

    $ ulimit -c unlimited
    $ sysctl vm.max_map_count=70000 # default 65530 is too small
    $ sysctl fs.file-max=70000
    $ mkmmaps 65535

    Then, the program will abort and a corefile will be generated.

    If failed, there are two cases according to the error message
    displayed.

    * ``out of memory'' means vm.max_map_count is still smaller

    * ``too many open files'' means fs.file-max is still smaller

    So, please change it to a larger value, and then retry it.

    mkmmaps.c
    ==
    #include
    #include
    #include
    #include
    #include
    int main(int argc, char **argv)
    {
    int maps_num;
    if (argc < 2) {
    fprintf(stderr, "mkmmaps [number of maps to be created]\n");
    exit(1);
    }
    if (sscanf(argv[1], "%d", &maps_num) == EOF) {
    perror("sscanf");
    exit(2);
    }
    if (maps_num < 0) {
    fprintf(stderr, "%d is invalid\n", maps_num);
    exit(3);
    }
    for (; maps_num > 0; --maps_num) {
    if (MAP_FAILED == mmap((void *)NULL, (size_t) 1, PROT_READ,
    MAP_SHARED | MAP_ANONYMOUS, (int) -1,
    (off_t) NULL)) {
    perror("mmap");
    exit(4);
    }
    }
    abort();
    {
    char buffer[128];
    sprintf(buffer, "wc -l /proc/%u/maps", getpid());
    system(buffer);
    }
    return 0;
    }

    Tested on i386, ia64 and um/sys-i386.
    Built on sh4 (which covers fs/binfmt_elf_fdpic.c)

    References
    ==========

    - Sun microsystems: Linker and Libraries.
    Part No: 817-1984-17, September 2008.
    URL: http://docs.sun.com/app/docs/doc/817-1984

    - System V ABI AMD64 Architecture Processor Supplement
    Draft Version 0.99., May 11, 2009.
    URL: http://www.x86-64.org/

    This patch:

    There are three different definitions for dump_seek() functions in
    binfmt_aout.c, binfmt_elf.c and binfmt_elf_fdpic.c, respectively. The
    only for binfmt_elf.c.

    My next patch will move dump_seek() into a header file in order to share
    the same implementations for dump_write() and dump_seek(). As the first
    step, this patch unify these three definitions for dump_seek() by applying
    the past commits that have been applied only for binfmt_elf.c.

    Specifically, the modification made here is part of the following commits:

    * d025c9db7f31fc0554ce7fb2dfc78d35a77f3487
    * 7f14daa19ea36b200d237ad3ac5826ae25360461

    This patch does not change a shape of corefiles.

    Signed-off-by: Daisuke HATAYAMA
    Cc: "Luck, Tony"
    Cc: Jeff Dike
    Cc: David Howells
    Cc: Greg Ungerer
    Cc: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Alexander Viro
    Cc: Andi Kleen
    Cc: Alan Cox
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke HATAYAMA
     

09 Feb, 2010

1 commit

  • In particular, several occurances of funny versions of 'success',
    'unknown', 'therefore', 'acknowledge', 'argument', 'achieve', 'address',
    'beginning', 'desirable', 'separate' and 'necessary' are fixed.

    Signed-off-by: Daniel Mack
    Cc: Joe Perches
    Cc: Junio C Hamano
    Signed-off-by: Jiri Kosina

    Daniel Mack
     

30 Jan, 2010

1 commit

  • 'flush_old_exec()' is the point of no return when doing an execve(), and
    it is pretty badly misnamed. It doesn't just flush the old executable
    environment, it also starts up the new one.

    Which is very inconvenient for things like setting up the new
    personality, because we want the new personality to affect the starting
    of the new environment, but at the same time we do _not_ want the new
    personality to take effect if flushing the old one fails.

    As a result, the x86-64 '32-bit' personality is actually done using this
    insane "I'm going to change the ABI, but I haven't done it yet" bit
    (TIF_ABI_PENDING), with SET_PERSONALITY() not actually setting the
    personality, but just the "pending" bit, so that "flush_thread()" can do
    the actual personality magic.

    This patch in no way changes any of that insanity, but it does split the
    'flush_old_exec()' function up into a preparatory part that can fail
    (still called flush_old_exec()), and a new part that will actually set
    up the new exec environment (setup_new_exec()). All callers are changed
    to trivially comply with the new world order.

    Signed-off-by: H. Peter Anvin
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

07 Jan, 2010

1 commit

  • The current code will load the stack size and protection markings, but
    then only use the markings in the MMU code path. The NOMMU code path
    always passes PROT_EXEC to the mmap() call. While this doesn't matter
    to most people whilst the code is running, it will cause a pointless
    icache flush when starting every FDPIC application. Typically this
    icache flush will be of a region on the order of 128KB in size, or may
    be the entire icache, depending on the facilities available on the CPU.

    In the case where the arch default behaviour seems to be desired
    (EXSTACK_DEFAULT), we probe VM_STACK_FLAGS for VM_EXEC to determine
    whether we should be setting PROT_EXEC or not.

    For arches that support an MPU (Memory Protection Unit - an MMU without
    the virtual mapping capability), setting PROT_EXEC or not will make an
    important difference.

    It should be noted that this change also affects the executability of
    the brk region, since ELF-FDPIC has that share with the stack. However,
    this is probably irrelevant as NOMMU programs aren't likely to use the
    brk region, preferring instead allocation via mmap().

    Signed-off-by: Mike Frysinger
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    Mike Frysinger
     

04 Jan, 2010

1 commit


18 Dec, 2009

1 commit

  • Introduce coredump parameter data structure (struct coredump_params) to
    simplify binfmt->core_dump() arguments.

    Signed-off-by: Masami Hiramatsu
    Suggested-by: Ingo Molnar
    Cc: Hidehiro Kawai
    Cc: Oleg Nesterov
    Cc: Roland McGrath
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masami Hiramatsu
     

16 Dec, 2009

2 commits

  • Currently all architectures but microblaze unconditionally define
    USE_ELF_CORE_DUMP. The microblaze omission seems like an error to me, so
    let's kill this ifdef and make sure we are the same everywhere.

    Signed-off-by: Christoph Hellwig
    Acked-by: Hugh Dickins
    Cc:
    Cc: Michal Simek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • The NOMMU code currently clears all anonymous mmapped memory. While this
    is what we want in the default case, all memory allocation from userspace
    under NOMMU has to go through this interface, including malloc() which is
    allowed to return uninitialized memory. This can easily be a significant
    performance penalty. So for constrained embedded systems were security is
    irrelevant, allow people to avoid clearing memory unnecessarily.

    This also alters the ELF-FDPIC binfmt such that it obtains uninitialised
    memory for the brk and stack region.

    Signed-off-by: Jie Zhang
    Signed-off-by: Robin Getz
    Signed-off-by: Mike Frysinger
    Signed-off-by: David Howells
    Acked-by: Paul Mundt
    Acked-by: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jie Zhang
     

24 Sep, 2009

1 commit

  • Ignore the loader's PT_GNU_STACK when calculating the stack size, and only
    consider the executable's PT_GNU_STACK, assuming the executable has one.

    Currently the behaviour is to take the largest stack size and use that,
    but that means you can't reduce the stack size in the executable. The
    loader's stack size should probably only be used when executing the loader
    directly.

    WARNING: This patch is slightly dangerous - it may render a system
    inoperable if the loader's stack size is larger than that of important
    executables, and the system relies unknowingly on this increasing the size
    of the stack.

    Signed-off-by: David Howells
    Signed-off-by: Mike Frysinger
    Acked-by: Paul Mundt
    Cc: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

22 Sep, 2009

1 commit

  • In preparation for the next patch, add a simple get_dump_page(addr)
    interface for the CONFIG_ELF_CORE dumpers to use, instead of calling
    get_user_pages() directly. They're not interested in errors: they
    just want to use holes as much as possible, to save space and make
    sure that the data is aligned where the headers said it would be.

    Oh, and don't use that horrid DUMP_SEEK(off) macro!

    Signed-off-by: Hugh Dickins
    Acked-by: Rik van Riel
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Nick Piggin
    Cc: Mel Gorman
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

19 Jun, 2009

1 commit


03 May, 2009

1 commit


03 Apr, 2009

1 commit


08 Jan, 2009

2 commits

  • Stop the ELF-FDPIC binfmt from attempting to expand the userspace stack and brk
    segments to fill the space actually allocated for it. The space allocated may
    be rounded up by mmap(), and may be wasted.

    However, finding out how much space we actually obtained uses the contentious
    kobjsize() function which we'd like to get rid of as it doesn't necessarily
    work for all slab allocators.

    Signed-off-by: David Howells
    Tested-by: Mike Frysinger
    Acked-by: Paul Mundt

    David Howells
     
  • Make VMAs per mm_struct as for MMU-mode linux. This solves two problems:

    (1) In SYSV SHM where nattch for a segment does not reflect the number of
    shmat's (and forks) done.

    (2) In mmap() where the VMA's vm_mm is set to point to the parent mm by an
    exec'ing process when VM_EXECUTABLE is specified, regardless of the fact
    that a VMA might be shared and already have its vm_mm assigned to another
    process or a dead process.

    A new struct (vm_region) is introduced to track a mapped region and to remember
    the circumstances under which it may be shared and the vm_list_struct structure
    is discarded as it's no longer required.

    This patch makes the following additional changes:

    (1) Regions are now allocated with alloc_pages() rather than kmalloc() and
    with no recourse to __GFP_COMP, so the pages are not composite. Instead,
    each page has a reference on it held by the region. Anything else that is
    interested in such a page will have to get a reference on it to retain it.
    When the pages are released due to unmapping, each page is passed to
    put_page() and will be freed when the page usage count reaches zero.

    (2) Excess pages are trimmed after an allocation as the allocation must be
    made as a power-of-2 quantity of pages.

    (3) VMAs are added to the parent MM's R/B tree and mmap lists. As an MM may
    end up with overlapping VMAs within the tree, the VMA struct address is
    appended to the sort key.

    (4) Non-anonymous VMAs are now added to the backing inode's prio list.

    (5) Holes may be punched in anonymous VMAs with munmap(), releasing parts of
    the backing region. The VMA and region structs will be split if
    necessary.

    (6) sys_shmdt() only releases one attachment to a SYSV IPC shared memory
    segment instead of all the attachments at that addresss. Multiple
    shmat()'s return the same address under NOMMU-mode instead of different
    virtual addresses as under MMU-mode.

    (7) Core dumping for ELF-FDPIC requires fewer exceptions for NOMMU-mode.

    (8) /proc/maps is now the global list of mapped regions, and may list bits
    that aren't actually mapped anywhere.

    (9) /proc/meminfo gains a line (tagged "MmapCopy") that indicates the amount
    of RAM currently allocated by mmap to hold mappable regions that can't be
    mapped directly. These are copies of the backing device or file if not
    anonymous.

    These changes make NOMMU mode more similar to MMU mode. The downside is that
    NOMMU mode requires some extra memory to track things over NOMMU without this
    patch (VMAs are no longer shared, and there are now region structs).

    Signed-off-by: David Howells
    Tested-by: Mike Frysinger
    Acked-by: Paul Mundt

    David Howells
     

14 Nov, 2008

5 commits

  • Make execve() take advantage of copy-on-write credentials, allowing it to set
    up the credentials in advance, and then commit the whole lot after the point
    of no return.

    This patch and the preceding patches have been tested with the LTP SELinux
    testsuite.

    This patch makes several logical sets of alteration:

    (1) execve().

    The credential bits from struct linux_binprm are, for the most part,
    replaced with a single credentials pointer (bprm->cred). This means that
    all the creds can be calculated in advance and then applied at the point
    of no return with no possibility of failure.

    I would like to replace bprm->cap_effective with:

    cap_isclear(bprm->cap_effective)

    but this seems impossible due to special behaviour for processes of pid 1
    (they always retain their parent's capability masks where normally they'd
    be changed - see cap_bprm_set_creds()).

    The following sequence of events now happens:

    (a) At the start of do_execve, the current task's cred_exec_mutex is
    locked to prevent PTRACE_ATTACH from obsoleting the calculation of
    creds that we make.

    (a) prepare_exec_creds() is then called to make a copy of the current
    task's credentials and prepare it. This copy is then assigned to
    bprm->cred.

    This renders security_bprm_alloc() and security_bprm_free()
    unnecessary, and so they've been removed.

    (b) The determination of unsafe execution is now performed immediately
    after (a) rather than later on in the code. The result is stored in
    bprm->unsafe for future reference.

    (c) prepare_binprm() is called, possibly multiple times.

    (i) This applies the result of set[ug]id binaries to the new creds
    attached to bprm->cred. Personality bit clearance is recorded,
    but now deferred on the basis that the exec procedure may yet
    fail.

    (ii) This then calls the new security_bprm_set_creds(). This should
    calculate the new LSM and capability credentials into *bprm->cred.

    This folds together security_bprm_set() and parts of
    security_bprm_apply_creds() (these two have been removed).
    Anything that might fail must be done at this point.

    (iii) bprm->cred_prepared is set to 1.

    bprm->cred_prepared is 0 on the first pass of the security
    calculations, and 1 on all subsequent passes. This allows SELinux
    in (ii) to base its calculations only on the initial script and
    not on the interpreter.

    (d) flush_old_exec() is called to commit the task to execution. This
    performs the following steps with regard to credentials:

    (i) Clear pdeath_signal and set dumpable on certain circumstances that
    may not be covered by commit_creds().

    (ii) Clear any bits in current->personality that were deferred from
    (c.i).

    (e) install_exec_creds() [compute_creds() as was] is called to install the
    new credentials. This performs the following steps with regard to
    credentials:

    (i) Calls security_bprm_committing_creds() to apply any security
    requirements, such as flushing unauthorised files in SELinux, that
    must be done before the credentials are changed.

    This is made up of bits of security_bprm_apply_creds() and
    security_bprm_post_apply_creds(), both of which have been removed.
    This function is not allowed to fail; anything that might fail
    must have been done in (c.ii).

    (ii) Calls commit_creds() to apply the new credentials in a single
    assignment (more or less). Possibly pdeath_signal and dumpable
    should be part of struct creds.

    (iii) Unlocks the task's cred_replace_mutex, thus allowing
    PTRACE_ATTACH to take place.

    (iv) Clears The bprm->cred pointer as the credentials it was holding
    are now immutable.

    (v) Calls security_bprm_committed_creds() to apply any security
    alterations that must be done after the creds have been changed.
    SELinux uses this to flush signals and signal handlers.

    (f) If an error occurs before (d.i), bprm_free() will call abort_creds()
    to destroy the proposed new credentials and will then unlock
    cred_replace_mutex. No changes to the credentials will have been
    made.

    (2) LSM interface.

    A number of functions have been changed, added or removed:

    (*) security_bprm_alloc(), ->bprm_alloc_security()
    (*) security_bprm_free(), ->bprm_free_security()

    Removed in favour of preparing new credentials and modifying those.

    (*) security_bprm_apply_creds(), ->bprm_apply_creds()
    (*) security_bprm_post_apply_creds(), ->bprm_post_apply_creds()

    Removed; split between security_bprm_set_creds(),
    security_bprm_committing_creds() and security_bprm_committed_creds().

    (*) security_bprm_set(), ->bprm_set_security()

    Removed; folded into security_bprm_set_creds().

    (*) security_bprm_set_creds(), ->bprm_set_creds()

    New. The new credentials in bprm->creds should be checked and set up
    as appropriate. bprm->cred_prepared is 0 on the first call, 1 on the
    second and subsequent calls.

    (*) security_bprm_committing_creds(), ->bprm_committing_creds()
    (*) security_bprm_committed_creds(), ->bprm_committed_creds()

    New. Apply the security effects of the new credentials. This
    includes closing unauthorised files in SELinux. This function may not
    fail. When the former is called, the creds haven't yet been applied
    to the process; when the latter is called, they have.

    The former may access bprm->cred, the latter may not.

    (3) SELinux.

    SELinux has a number of changes, in addition to those to support the LSM
    interface changes mentioned above:

    (a) The bprm_security_struct struct has been removed in favour of using
    the credentials-under-construction approach.

    (c) flush_unauthorized_files() now takes a cred pointer and passes it on
    to inode_has_perm(), file_has_perm() and dentry_open().

    Signed-off-by: David Howells
    Acked-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: James Morris

    David Howells
     
  • Use RCU to access another task's creds and to release a task's own creds.
    This means that it will be possible for the credentials of a task to be
    replaced without another task (a) requiring a full lock to read them, and (b)
    seeing deallocated memory.

    Signed-off-by: David Howells
    Acked-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: James Morris

    David Howells
     
  • Wrap current->cred and a few other accessors to hide their actual
    implementation.

    Signed-off-by: David Howells
    Acked-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: James Morris

    David Howells
     
  • Separate the task security context from task_struct. At this point, the
    security data is temporarily embedded in the task_struct with two pointers
    pointing to it.

    Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in
    entry.S via asm-offsets.

    With comment fixes Signed-off-by: Marc Dionne

    Signed-off-by: David Howells
    Acked-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: James Morris

    David Howells
     
  • Wrap access to task credentials so that they can be separated more easily from
    the task_struct during the introduction of COW creds.

    Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

    Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
    sense to use RCU directly rather than a convenient wrapper; these will be
    addressed by later patches.

    Signed-off-by: David Howells
    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Cc: Al Viro
    Signed-off-by: James Morris

    David Howells
     

21 Oct, 2008

1 commit

  • Commit f06febc96ba8e0af80bcc3eaec0a109e88275fac ("timers: fix itimer/
    many thread hang") introduced a new task_cputime interface and
    subsequently only converted binfmt_elf over to it. This results in the
    build for binfmt_elf_fdpic blowing up given that p->signal->{u,s}time
    have disappeared from underneath us.

    Apply the same trivial fix from binfmt_elf to binfmt_elf_fdpic.

    Signed-off-by: Paul Mundt
    Signed-off-by: Linus Torvalds

    Paul Mundt
     

17 Oct, 2008

3 commits

  • These auxvec entries are the only ones left unhandled out of the current
    base implementation. This syncs up binfmt_elf_fdpic with linux/auxvec.h
    and current binfmt_elf.

    Signed-off-by: Paul Mundt
    Acked-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Mundt
     
  • binfmt_elf_fdpic seems to have grabbed a hard-coded hack from an ancient
    version of binfmt_elf in order to try and fix up initial stack alignment
    on multi-threaded x86, which while in addition to being unused, was also
    pushed down beyond the first set of operations on the stack pointer,
    negating the entire purpose.

    These days, we have an architecture independent arch_align_stack(), so we
    switch to using that instead. Move the initial alignment up before the
    initial stores while we're at it.

    Signed-off-by: Paul Mundt
    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Mundt
     
  • Commit 483fad1c3fa1060d7e6710e84a065ad514571739 ("ELF loader support for
    auxvec base platform string") introduced AT_BASE_PLATFORM, but only
    implemented it for binfmt_elf.

    Given that AT_VECTOR_SIZE_BASE is unconditionally enlarged for us, and
    it's only optionally added in for the platforms that set
    ELF_BASE_PLATFORM, wire it up for binfmt_elf_fdpic, too.

    Signed-off-by: Paul Mundt
    Acked-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Mundt
     

28 Jul, 2008

1 commit

  • While implementing binfmt_elf_fdpic on SH it quickly became apparent
    that SH was the first platform to support both binfmt_elf_fdpic and
    binfmt_elf, as well as the only of the FDPIC platforms to make use of the
    auxvt.

    Currently binfmt_elf_fdpic uses a special version of NEW_AUX_ENT() where
    the first argument is the entry displacement after csp has been adjusted,
    being reset after each adjustment. As we have no ability to sort this out
    through the platform's ARCH_DLINFO, this index needs to be managed
    entirely in create_elf_fdpic_tables(). Presently none of the platforms
    that set their own auxvt entries are able to do so through their
    respective ARCH_DLINFOs when using binfmt_elf_fdpic.

    In addition to this, binfmt_elf_fdpic has been looking at
    DLINFO_ARCH_ITEMS for the number of architecture-specific entries in the
    auxvt. This is legacy cruft, and is not defined by any platforms in-tree,
    even those that make heavy use of the auxvt. AT_VECTOR_SIZE_ARCH is
    always available, and contains the number that is of interest here, so we
    switch to using that unconditionally as well.

    As this has direct bearing on how much stack is used, platforms that have
    configurable (or dynamically adjustable) NEW_AUX_ENT calls need to either
    make AT_VECTOR_SIZE_ARCH more fine-grained, or leave it as a worst-case
    and live with some lost stack space if those entries aren't pushed (some
    platforms may also need to purposely sacrifice some space here for
    alignment considerations, as noted in the code -- although not an issue
    for any FDPIC-capable platform today).

    Signed-off-by: Paul Mundt
    Acked-by: David Howells

    Paul Mundt
     

27 Jul, 2008

1 commit

  • This moves all the ptrace hooks related to exec into tracehook.h inlines.

    This also lifts the calls for tracing out of the binfmt load_binary hooks
    into search_binary_handler() after it calls into the binfmt module. This
    change has no effect, since all the binfmt modules' load_binary functions
    did the call at the end on success, and now search_binary_handler() does
    it immediately after return if successful. We consolidate the repeated
    code, and binfmt modules no longer need to import ptrace_notify().

    Signed-off-by: Roland McGrath
    Cc: Oleg Nesterov
    Reviewed-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     

26 Jul, 2008

2 commits

  • Kill the nasty rcu_read_lock() + do_each_thread() loop, use the list
    encoded in mm->core_state instead, s/GFP_ATOMIC/GFP_KERNEL/.

    This patch allows futher cleanups in binfmt_elf_fdpic.c.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • linux_binfmt->core_dump() runs before the process does exit_aio(), this
    means that we can hit the kernel thread which shares the same ->mm.
    Afaics, nothing really bad can happen, but perhaps it makes sense to fix
    this minor bug.

    It is sad we have to iterate over all threads in system and use
    GFP_ATOMIC. Hopefully we can kill theses ugly do_each_thread()s, but this
    needs some nontrivial changes in mm_struct and do_coredump.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

07 Jun, 2008

1 commit

  • The nommu binfmt code uses ksize() for pointers returned from do_mmap()
    which is wrong. This converts the call-sites to use the nommu specific
    kobjsize() function which works as expected.

    Cc: Christoph Lameter
    Cc: Matt Mackall
    Acked-by: Paul Mundt
    Acked-by: David Howells
    Signed-off-by: Pekka Enberg
    Acked-by: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka Enberg