11 Jul, 2013

1 commit

  • Since all architectures have been converted to use vm_unmapped_area(),
    there is no remaining use for the free_area_cache.

    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Cc: "James E.J. Bottomley"
    Cc: "Luck, Tony"
    Cc: Benjamin Herrenschmidt
    Cc: David Howells
    Cc: Helge Deller
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: Paul Mackerras
    Cc: Richard Henderson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

03 May, 2013

1 commit

  • Pull powerpc update from Benjamin Herrenschmidt:
    "The main highlights this time around are:

    - A pile of additional POWER8 bits and nits, such as updated
    performance counter support (Michael Ellerman), new branch history
    buffer support (Anshuman Khandual), base support for the new PCI
    host bridge when not using the hypervisor (Gavin Shan) and other
    random related bits and fixes from various contributors.

    - Some rework of our page table format by Aneesh Kumar which fixes a
    thing or two and paves the way for THP support. THP itself will
    not make it this time around however.

    - More Freescale updates, including Altivec support on the new e6500
    cores, new PCI controller support, and a pile of new boards support
    and updates.

    - The usual batch of trivial cleanups & fixes"

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits)
    powerpc: Fix build error for book3e
    powerpc: Context switch the new EBB SPRs
    powerpc: Turn on the EBB H/FSCR bits
    powerpc: Replace CPU_FTR_BCTAR with CPU_FTR_ARCH_207S
    powerpc: Setup BHRB instructions facility in HFSCR for POWER8
    powerpc: Fix interrupt range check on debug exception
    powerpc: Update tlbie/tlbiel as per ISA doc
    powerpc: Print page size info during boot
    powerpc: print both base and actual page size on hash failure
    powerpc: Fix hpte_decode to use the correct decoding for page sizes
    powerpc: Decode the pte-lp-encoding bits correctly.
    powerpc: Use encode avpn where we need only avpn values
    powerpc: Reduce PTE table memory wastage
    powerpc: Move the pte free routines from common header
    powerpc: Reduce the PTE_INDEX_SIZE
    powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format
    powerpc: New hugepage directory format
    powerpc: Don't truncate pgd_index wrongly
    powerpc: Don't hard code the size of pte page
    powerpc: Save DAR and DSISR in pt_regs on MCE
    ...

    Linus Torvalds
     

01 May, 2013

2 commits

  • Cleanup. Every linux_binfmt->core_dump() sets PF_DUMPCORE, move this into
    zap_threads() called by do_coredump().

    Signed-off-by: Oleg Nesterov
    Acked-by: Mandeep Singh Baines
    Cc: Neil Horman
    Cc: "Rafael J. Wysocki"
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • The comment I originally added in commit a3defbe5c337 ("binfmt_elf: fix
    PIE execution with randomization disabled") is not really 100% accurate
    -- sysctl is not the only way PF_RANDOMIZE can be forcibly unset
    at runtime.

    Another option of course is direct modification of personality flags
    (i.e. running through setarch wrapper).

    Make the comment more explicit and accurate.

    Signed-off-by: Jiri Kosina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Kosina
     

26 Apr, 2013

1 commit

  • We are currently out of free bits in AT_HWCAP. With POWER8, we have
    several hardware features that we need to advertise.

    Tested on POWER and x86.

    Signed-off-by: Michael Neuling
    Signed-off-by: Nishanth Aravamudan
    Signed-off-by: Benjamin Herrenschmidt

    Michael Neuling
     

18 Apr, 2013

1 commit

  • Documentation/filesystems/proc.txt says about coredump_filter bitmask,

    Note bit 0-4 doesn't effect any hugetlb memory. hugetlb memory are only
    effected by bit 5-6.

    However current code can go into the subsequent flag checks of bit 0-4
    for vma(VM_HUGETLB). So this patch inserts 'return' and makes it work
    as written in the document.

    Signed-off-by: Naoya Horiguchi
    Reviewed-by: Rik van Riel
    Acked-by: Michal Hocko
    Reviewed-by: HATAYAMA Daisuke
    Acked-by: KOSAKI Motohiro
    Acked-by: David Rientjes
    Cc: [3.7+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
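
The early return described in this entry can be illustrated with a small userspace sketch. It is not the kernel function: the flag values, the `sk_` names, and the reduction of bits 0-3 to an "anonymous vs. file-backed" choice are assumptions made for illustration, and bit 4 (ELF headers) is not modelled. Only the bit numbering follows the documented coredump_filter layout.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values for this sketch; the kernel's VM_* constants differ. */
enum { SK_VM_SHARED = 1u << 0, SK_VM_HUGETLB = 1u << 1 };

/* coredump_filter bit numbers as documented in proc.txt. */
enum {
    SK_ANON_PRIVATE    = 0,
    SK_ANON_SHARED     = 1,
    SK_MAPPED_PRIVATE  = 2,
    SK_MAPPED_SHARED   = 3,
    SK_ELF_HEADERS     = 4,  /* not modelled below */
    SK_HUGETLB_PRIVATE = 5,
    SK_HUGETLB_SHARED  = 6,
};

/* Test a single bit of the filter mask. */
static bool sk_filter(unsigned long filter, int bit)
{
    assert(bit >= 0 && bit <= SK_HUGETLB_SHARED);
    return (filter >> bit) & 1UL;
}

/* Decide whether a mapping is dumped. Hugetlb mappings are settled by
 * bits 5-6 alone and return before bits 0-4 are ever consulted. */
bool sk_should_dump(unsigned int vm_flags, bool is_anon, unsigned long filter)
{
    if (vm_flags & SK_VM_HUGETLB) {
        if (vm_flags & SK_VM_SHARED)
            return sk_filter(filter, SK_HUGETLB_SHARED);
        return sk_filter(filter, SK_HUGETLB_PRIVATE);
    }

    if (vm_flags & SK_VM_SHARED)
        return sk_filter(filter, is_anon ? SK_ANON_SHARED : SK_MAPPED_SHARED);
    return sk_filter(filter, is_anon ? SK_ANON_PRIVATE : SK_MAPPED_PRIVATE);
}
```

With a filter of 0x0f (bits 0-3 set) a private hugetlb mapping is skipped, while 0x20 (bit 5) selects it.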
     

04 Mar, 2013

1 commit

  • Pull new ImgTec Meta architecture from James Hogan:
    "This adds core architecture support for Imagination's Meta processor
    cores, followed by some later miscellaneous arch/metag cleanups and
    fixes which I kept separate to ease review:

    - Support for basic Meta 1 (ATP) and Meta 2 (HTP) core architecture
    - A few fixes all over, particularly for symbol prefixes
    - A few privilege protection fixes
    - Several cleanups (setup.c includes, split out a lot of
    metag_ksyms.c)
    - Fix some missing exports
    - Convert hugetlb to use vm_unmapped_area()
    - Copy device tree to non-init memory
    - Provide dma_get_sgtable()"

    * tag 'metag-v3.9-rc1-v4' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag: (61 commits)
    metag: Provide dma_get_sgtable()
    metag: prom.h: remove declaration of metag_dt_memblock_reserve()
    metag: copy devicetree to non-init memory
    metag: cleanup metag_ksyms.c includes
    metag: move mm/init.c exports out of metag_ksyms.c
    metag: move usercopy.c exports out of metag_ksyms.c
    metag: move setup.c exports out of metag_ksyms.c
    metag: move kick.c exports out of metag_ksyms.c
    metag: move traps.c exports out of metag_ksyms.c
    metag: move irq enable out of irqflags.h on SMP
    genksyms: fix metag symbol prefix on crc symbols
    metag: hugetlb: convert to vm_unmapped_area()
    metag: export clear_page and copy_page
    metag: export metag_code_cache_flush_all
    metag: protect more non-MMU memory regions
    metag: make TXPRIVEXT bits explicit
    metag: kernel/setup.c: sort includes
    perf: Enable building perf tools for Meta
    metag: add boot time LNKGET/LNKSET check
    metag: add __init to metag_cache_probe()
    ...

    Linus Torvalds
     

03 Mar, 2013

1 commit

  • The commit "binfmt_elf: cleanups"
    (f670d0ecda73b7438eec9ed108680bc5f5362ad8) removed an ifndef elf_map but
    this breaks compilation for metag which does define elf_map.

    This adds the ifndef back in as it was before, but does not affect the
    other cleanups made by that patch.

    Signed-off-by: James Hogan
    Cc: Alexander Viro
    Cc: linux-fsdevel@vger.kernel.org
    Acked-by: Mikael Pettersson

    James Hogan
     

27 Feb, 2013

1 commit

  • Pull vfs pile (part one) from Al Viro:
    "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
    locking violations, etc.

    The most visible changes here are death of FS_REVAL_DOT (replaced with
    "has ->d_weak_revalidate()") and a new helper getting from struct file
    to inode. Some bits of preparation to xattr method interface changes.

    Misc patches by various people sent this cycle *and* ocfs2 fixes from
    several cycles ago that should've been upstream right then.

    PS: the next vfs pile will be xattr stuff."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
    saner proc_get_inode() calling conventions
    proc: avoid extra pde_put() in proc_fill_super()
    fs: change return values from -EACCES to -EPERM
    fs/exec.c: make bprm_mm_init() static
    ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
    ocfs2: fix possible use-after-free with AIO
    ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
    get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
    target: writev() on single-element vector is pointless
    export kernel_write(), convert open-coded instances
    fs: encode_fh: return FILEID_INVALID if invalid fid_type
    kill f_vfsmnt
    vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
    nfsd: handle vfs_getattr errors in acl protocol
    switch vfs_getattr() to struct path
    default SET_PERSONALITY() in linux/elf.h
    ceph: prepopulate inodes only when request is aborted
    d_hash_and_lookup(): export, switch open-coded instances
    9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
    9p: split dropping the acls from v9fs_set_create_acl()
    ...

    Linus Torvalds
     

23 Feb, 2013

1 commit


22 Feb, 2013

1 commit


28 Jan, 2013

1 commit

  • This is in preparation for the full dynticks feature. While
    remotely reading the cputime of a task running in a full
    dynticks CPU, we'll need to do some extra-computation. This
    way we can account the time it spent tickless in userspace
    since its last cputime snapshot.

    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Cc: Li Zhong
    Cc: Namhyung Kim
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

18 Dec, 2012

1 commit

  • If elf_core_dump() is called and fill_note_info() fails in the kmalloc()
    then it returns 0 but has not yet initialised all the needed fields. As a
    result we do a kfree(randomness) after correctly skipping the thread data.

    [akpm@linux-foundation.org: checkpatch fixes]
    Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     

29 Nov, 2012

1 commit


10 Oct, 2012

1 commit

  • Pull generic execve() changes from Al Viro:
    "This introduces the generic kernel_thread() and kernel_execve()
    functions, and switches x86, arm, alpha, um and s390 over to them."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (26 commits)
    s390: convert to generic kernel_execve()
    s390: switch to generic kernel_thread()
    s390: fold kernel_thread_helper() into ret_from_fork()
    s390: fold execve_tail() into start_thread(), convert to generic sys_execve()
    um: switch to generic kernel_thread()
    x86, um/x86: switch to generic sys_execve and kernel_execve
    x86: split ret_from_fork
    alpha: introduce ret_from_kernel_execve(), switch to generic kernel_execve()
    alpha: switch to generic kernel_thread()
    alpha: switch to generic sys_execve()
    arm: get rid of execve wrapper, switch to generic execve() implementation
    arm: optimized current_pt_regs()
    arm: introduce ret_from_kernel_execve(), switch to generic kernel_execve()
    arm: split ret_from_fork, simplify kernel_thread() [based on patch by rmk]
    generic sys_execve()
    generic kernel_execve()
    new helper: current_pt_regs()
    preparation for generic kernel_thread()
    um: kill thread->forking
    um: let signal_delivered() do SIGTRAP on singlestepping into handler
    ...

    Linus Torvalds
     

09 Oct, 2012

2 commits

  • A long time ago, in v2.4, VM_RESERVED kept the swapout process off a VMA.
    It has since lost its original meaning but still has some effects:

      | effect                 | alternative flags
    --+------------------------+---------------------------------------------
    1 | account as reserved_vm | VM_IO
    2 | skip in core dump      | VM_IO, VM_DONTDUMP
    3 | do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
    4 | do not mlock           | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP

    This patch removes the reserved_vm counter from mm_struct. Nobody seems
    to care about it: it is not exported to userspace directly, and it only
    reduces the total_vm shown in proc.

    Thus VM_RESERVED can be replaced with VM_IO or the pair VM_DONTEXPAND | VM_DONTDUMP.

    remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
    remap_vmalloc_range() sets VM_DONTEXPAND | VM_DONTDUMP.

    [akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
    Signed-off-by: Konstantin Khlebnikov
    Cc: Alexander Viro
    Cc: Carsten Otte
    Cc: Chris Metcalf
    Cc: Cyrill Gorcunov
    Cc: Eric Paris
    Cc: H. Peter Anvin
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Cc: James Morris
    Cc: Jason Baron
    Cc: Kentaro Takeda
    Cc: Matt Helsley
    Cc: Nick Piggin
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Suresh Siddha
    Cc: Tetsuo Handa
    Cc: Venkatesh Pallipadi
    Acked-by: Linus Torvalds
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • Rename VM_NODUMP to VM_DONTDUMP: this name matches the other negative
    flags, VM_DONTEXPAND and VM_DONTCOPY. Currently this flag is used only by
    sys_madvise. The next patch will use it to replace the outdated flag
    VM_RESERVED.

    Also forbid madvise(MADV_DODUMP) for special kernel mappings VM_SPECIAL
    (VM_IO | VM_DONTEXPAND | VM_RESERVED | VM_PFNMAP)

    Signed-off-by: Konstantin Khlebnikov
    Cc: Alexander Viro
    Cc: Carsten Otte
    Cc: Chris Metcalf
    Cc: Cyrill Gorcunov
    Cc: Eric Paris
    Cc: H. Peter Anvin
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Cc: James Morris
    Cc: Jason Baron
    Cc: Kentaro Takeda
    Cc: Matt Helsley
    Cc: Nick Piggin
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Suresh Siddha
    Cc: Tetsuo Handa
    Cc: Venkatesh Pallipadi
    Acked-by: Linus Torvalds
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     

06 Oct, 2012

4 commits

  • This note has the following format:

    long count     -- how many files are mapped
    long page_size -- units for file_ofs
    array of [COUNT] elements of
      long start
      long end
      long file_ofs
    followed by COUNT filenames in ASCII: "FILE1" NUL "FILE2" NUL...

    Signed-off-by: Denys Vlasenko
    Cc: Oleg Nesterov
    Cc: Amerigo Wang
    Cc: "Jonathan M. Foote"
    Cc: Roland McGrath
    Cc: Pedro Alves
    Cc: Fengguang Wu
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Denys Vlasenko
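
A reader for the layout above might look like the following sketch. It assumes the payload has already been extracted from the core file and that its word size matches the host `long`. In a real core file the word size follows the ELF class of the dump. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One decoded mapping; name points into the caller's note buffer. */
struct nt_file_entry {
    long start;
    long end;
    long file_ofs;      /* in units of page_size */
    const char *name;
};

/* Decode a file-mapping note payload: count, page_size, count triples of
 * (start, end, file_ofs), then count NUL-terminated names.
 * Returns the number of entries decoded, or -1 if the buffer is malformed
 * or holds more entries than max_out. */
long parse_nt_file(const void *buf, size_t len, long *page_size,
                   struct nt_file_entry *out, size_t max_out)
{
    const char *p = buf;
    long hdr[2];

    assert(buf != NULL && page_size != NULL && out != NULL);
    if (len < sizeof hdr)
        return -1;
    memcpy(hdr, p, sizeof hdr);

    long count = hdr[0];
    if (count < 0 || (size_t)count > max_out)
        return -1;

    size_t table = (size_t)count * 3 * sizeof(long);
    if (len - sizeof hdr < table)
        return -1;

    const char *names = p + sizeof hdr + table;
    const char *end = p + len;

    *page_size = hdr[1];
    for (long i = 0; i < count; i++) {
        long triple[3];
        memcpy(triple, p + sizeof hdr + (size_t)i * sizeof triple, sizeof triple);

        /* Each name must be NUL-terminated inside the buffer. */
        const char *nul = memchr(names, '\0', (size_t)(end - names));
        if (nul == NULL)
            return -1;

        out[i] = (struct nt_file_entry){ triple[0], triple[1], triple[2], names };
        names = nul + 1;
    }
    return count;
}
```

Because file_ofs is expressed in page_size units, the byte offset into the mapped file is file_ofs * page_size.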
     
  • Existing PRSTATUS note contains only si_signo, si_code, si_errno fields
    from the siginfo of the signal which caused core to be dumped.

    There are tools which try to analyze crashes for possible security
    implications, and they want to use, among other data, si_addr field from
    the SIGSEGV.

    This patch adds a new elf note, NT_SIGINFO, which contains the complete
    siginfo_t of the signal which killed the process.

    Signed-off-by: Denys Vlasenko
    Reviewed-by: Oleg Nesterov
    Cc: Amerigo Wang
    Cc: "Jonathan M. Foote"
    Cc: Roland McGrath
    Cc: Pedro Alves
    Cc: Fengguang Wu
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Denys Vlasenko
     
  • This is a preparatory patch for the introduction of NT_SIGINFO elf note.

    With this patch we pass "siginfo_t *siginfo" instead of "int signr" to
    do_coredump() and put it into coredump_params. It will be used by the
    next patch. Most changes are simple s/signr/siginfo->si_signo/.

    Signed-off-by: Denys Vlasenko
    Reviewed-by: Oleg Nesterov
    Cc: Amerigo Wang
    Cc: "Jonathan M. Foote"
    Cc: Roland McGrath
    Cc: Pedro Alves
    Cc: Fengguang Wu
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Denys Vlasenko
     
  • load_elf_interp() has interp_map_addr carefully described as
    "uninitialized_var" and marked so as to avoid a warning. However if you
    trace the code it is passed into load_elf_interp and then this value is
    checked against NULL.

    As this return value isn't used this is actually safe but it freaks
    various analysis tools that see un-initialized memory addresses being read
    before their value is ever defined.

    Set it to NULL as a matter of good programming taste, if nothing else.

    Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     

27 Sep, 2012

1 commit

  • In !CORE_DUMP_USE_REGSET case, if elf_note_info_init fails to allocate
    memory for info->fields, it frees already allocated stuff and returns
    error to its caller, fill_note_info. Which in turn returns error to its
    caller, elf_core_dump. Which jumps to cleanup label and calls
    free_note_info, which will happily try to free all info->fields again.
    BOOM.

    This is the fix.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Denys Vlasenko
    Cc: Venu Byravarasu
    Signed-off-by: Andrew Morton

    Denys Vlasenko
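
One common way to avoid this kind of double free is to give a single function ownership of the cleanup: the init routine zeroes the structure first and never frees on failure, so the caller's cleanup path runs exactly once and only sees valid or NULL pointers. The sketch below illustrates that pattern in userspace C. It is not presented as the actual kernel change, and the structure, the `sk_` allocation hook, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for the note bookkeeping structure. */
struct note_info {
    void *notes;
    void *psinfo;
    void *prstatus;
};

/* Hypothetical hook to simulate allocation failure:
 * -1 never fails, N >= 0 fails after N successful allocations. */
int sk_fail_after = -1;

static void *sk_alloc(size_t n)
{
    if (sk_fail_after == 0)
        return NULL;
    if (sk_fail_after > 0)
        sk_fail_after--;
    return malloc(n);
}

/* Returns 1 on success, 0 on failure. On failure nothing is freed here;
 * every field is either a live allocation or NULL. */
int note_info_init(struct note_info *info)
{
    assert(info != NULL);
    memset(info, 0, sizeof *info);

    if ((info->notes = sk_alloc(64)) == NULL)
        return 0;
    if ((info->psinfo = sk_alloc(64)) == NULL)
        return 0;
    if ((info->prstatus = sk_alloc(64)) == NULL)
        return 0;
    return 1;
}

/* Sole owner of the cleanup. Clearing the fields makes a second call harmless. */
void note_info_free(struct note_info *info)
{
    assert(info != NULL);
    free(info->notes);
    free(info->psinfo);
    free(info->prstatus);
    memset(info, 0, sizeof *info);
}
```

The caller invokes note_info_free() on both the success and the error path, so the init routine does not need an unwind sequence of its own.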
     

20 Sep, 2012

1 commit


31 May, 2012

1 commit


24 May, 2012

1 commit

  • Pull user namespace enhancements from Eric Biederman:
    "This is a course correction for the user namespace, so that we can
    reach an inexpensive, maintainable, and reasonably complete
    implementation.

    Highlights:
    - Config guards make it impossible to enable the user namespace and
    code that has not been converted to be user namespace safe.

    - Use of the new kuid_t type ensures that if you somehow get past the
    config guards the kernel will encounter type errors if you enable
    user namespaces and attempt to compile in code whose permission
    checks have not been updated to be user namespace safe.

    - All uids from child user namespaces are mapped into the initial
    user namespace before they are processed. Removing the need to add
    an additional check to see if the user namespace of the compared
    uids remains the same.

    - With the user namespaces compiled out the performance is as good or
    better than it is today.

    - For most operations absolutely nothing changes performance or
    operationally with the user namespace enabled.

    - The worst case performance I could come up with was timing 1
    billion cache cold stat operations with the user namespace code
    enabled. This went from 156s to 164s on my laptop (or 156ns to
    164ns per stat operation).

    - (uid_t)-1 and (gid_t)-1 are reserved as an internal error value.
    Most uid/gid setting system calls treat these value specially
    anyway so attempting to use -1 as a uid would likely cause
    entertaining failures in userspace.

    - If setuid is called with a uid that can not be mapped setuid fails.
    I have looked at sendmail, login, ssh and every other program I
    could think of that would call setuid and they all check for and
    handle the case where setuid fails.

    - If stat or a similar system call is called from a context in which
    we can not map a uid we lie and return overflowuid. The LFS
    experience suggests not lying and returning an error code might be
    better, but the historical precedent with uids is different and I
    can not think of anything that would break by lying about a uid we
    can't map.

    - Capabilities are localized to the current user namespace making it
    safe to give the initial user in a user namespace all capabilities.

    My git tree covers all of the modifications needed to convert the core
    kernel and enough changes to make a system bootable to runlevel 1."

    Fix up trivial conflicts due to nearby independent changes in fs/stat.c

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
    userns: Silence silly gcc warning.
    cred: use correct cred accessor with regards to rcu read lock
    userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq
    userns: Convert cgroup permission checks to use uid_eq
    userns: Convert tmpfs to use kuid and kgid where appropriate
    userns: Convert sysfs to use kgid/kuid where appropriate
    userns: Convert sysctl permission checks to use kuid and kgids.
    userns: Convert proc to use kuid/kgid where appropriate
    userns: Convert ext4 to user kuid/kgid where appropriate
    userns: Convert ext3 to use kuid/kgid where appropriate
    userns: Convert ext2 to use kuid/kgid where appropriate.
    userns: Convert devpts to use kuid/kgid where appropriate
    userns: Convert binary formats to use kuid/kgid where appropriate
    userns: Add negative depends on entries to avoid building code that is userns unsafe
    userns: signal remove unnecessary map_cred_ns
    userns: Teach inode_capable to understand inodes whose uids map to other namespaces.
    userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
    userns: Convert stat to return values mapped from kuids and kgids
    userns: Convert user specfied uids and gids in chown into kuids and kgid
    userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
    ...

    Linus Torvalds
     

16 May, 2012

1 commit


21 Apr, 2012

2 commits

  • This continues the theme started with vm_brk() and vm_munmap():
    vm_mmap() does the same thing as do_mmap(), but additionally does the
    required VM locking.

    This uninlines (and rewrites it to be clearer) do_mmap(), which sadly
    duplicates it in mm/mmap.c and mm/nommu.c. But that way we don't have
    to export our internal do_mmap_pgoff() function.

    Some day we hopefully don't have to export do_mmap() either, if all
    modular users can become the simpler vm_mmap() instead. We're actually
    very close to that already, with the notable exception of the (broken)
    use in i810, and a couple of stragglers in binfmt_elf.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • It does the same thing as "do_brk()", except it handles the VM locking
    too.

    It turns out that all external callers want that anyway, so we can make
    do_brk() static to just mm/mmap.c while at it.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

30 Mar, 2012

1 commit

  • Pull x32 support for x86-64 from Ingo Molnar:
    "This tree introduces the X32 binary format and execution mode for x86:
    32-bit data space binaries using 64-bit instructions and 64-bit kernel
    syscalls.

    This allows applications whose working set fits into a 32 bits address
    space to make use of 64-bit instructions while using a 32-bit address
    space with shorter pointers, more compressed data structures, etc."

    Fix up trivial context conflicts in arch/x86/{Kconfig,vdso/vma.c}

    * 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
    x32: Fix alignment fail in struct compat_siginfo
    x32: Fix stupid ia32/x32 inversion in the siginfo format
    x32: Add ptrace for x32
    x32: Switch to a 64-bit clock_t
    x32: Provide separate is_ia32_task() and is_x32_task() predicates
    x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls
    x86/x32: Fix the binutils auto-detect
    x32: Warn and disable rather than error if binutils too old
    x32: Only clear TIF_X32 flag once
    x32: Make sure TS_COMPAT is cleared for x32 tasks
    fs: Remove missed ->fds_bits from cessation use of fd_set structs internally
    fs: Fix close_on_exec pointer in alloc_fdtable
    x32: Drop non-__vdso weak symbols from the x32 VDSO
    x32: Fix coding style violations in the x32 VDSO code
    x32: Add x32 VDSO support
    x32: Allow x32 to be configured
    x32: If configured, add x32 system calls to system call tables
    x32: Handle process creation
    x32: Signal-related system calls
    x86: Add #ifdef CONFIG_COMPAT to
    ...

    Linus Torvalds
     

29 Mar, 2012

2 commits

  • …m/linux/kernel/git/dhowells/linux-asm_system

    Pull "Disintegrate and delete asm/system.h" from David Howells:
    "Here are a bunch of patches to disintegrate asm/system.h into a set of
    separate bits to relieve the problem of circular inclusion
    dependencies.

    I've built all the working defconfigs from all the arches that I can
    and made sure that they don't break.

    The reason for these patches is that I recently encountered a circular
    dependency problem that came about when I produced some patches to
    optimise get_order() by rewriting it to use ilog2().

    This uses bitops - and on the SH arch asm/bitops.h drags in
    asm-generic/get_order.h by a circuitous route involving asm/system.h.

    The main difficulty seems to be asm/system.h. It holds a number of
    low level bits with no/few dependencies that are commonly used (eg.
    memory barriers) and a number of bits with more dependencies that
    aren't used in many places (eg. switch_to()).

    These patches break asm/system.h up into the following core pieces:

    (1) asm/barrier.h

    Move memory barriers here. This is already done for MIPS and Alpha.

    (2) asm/switch_to.h

    Move switch_to() and related stuff here.

    (3) asm/exec.h

    Move arch_align_stack() here. Other process execution related bits
    could perhaps go here from asm/processor.h.

    (4) asm/cmpxchg.h

    Move xchg() and cmpxchg() here as they're full word atomic ops and
    frequently used by atomic_xchg() and atomic_cmpxchg().

    (5) asm/bug.h

    Move die() and related bits.

    (6) asm/auxvec.h

    Move AT_VECTOR_SIZE_ARCH here.

    Other arch headers are created as needed on a per-arch basis."

    Fixed up some conflicts from other header file cleanups and moving code
    around that has happened in the meantime, so David's testing is somewhat
    weakened by that. We'll find out anything that got broken and fix it.

    * tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits)
    Delete all instances of asm/system.h
    Remove all #inclusions of asm/system.h
    Add #includes needed to permit the removal of asm/system.h
    Move all declarations of free_initmem() to linux/mm.h
    Disintegrate asm/system.h for OpenRISC
    Split arch_align_stack() out from asm-generic/system.h
    Split the switch_to() wrapper out of asm-generic/system.h
    Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h
    Create asm-generic/barrier.h
    Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h
    Disintegrate asm/system.h for Xtensa
    Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt]
    Disintegrate asm/system.h for Tile
    Disintegrate asm/system.h for Sparc
    Disintegrate asm/system.h for SH
    Disintegrate asm/system.h for Score
    Disintegrate asm/system.h for S390
    Disintegrate asm/system.h for PowerPC
    Disintegrate asm/system.h for PA-RISC
    Disintegrate asm/system.h for MN10300
    ...

    Linus Torvalds
     
  • asm/system.h is a cause of circular dependency problems because it contains
    commonly used primitive stuff like barrier definitions and uncommonly used
    stuff like switch_to() that might require MMU definitions.

    asm/system.h has been disintegrated by this point on all arches into the
    following common segments:

    (1) asm/barrier.h

    Moved memory barrier definitions here.

    (2) asm/cmpxchg.h

    Moved xchg() and cmpxchg() here. #included in asm/atomic.h.

    (3) asm/bug.h

    Moved die() and similar here.

    (4) asm/exec.h

    Moved arch_align_stack() here.

    (5) asm/elf.h

    Moved AT_VECTOR_SIZE_ARCH here.

    (6) asm/switch_to.h

    Moved switch_to() here.

    Signed-off-by: David Howells

    David Howells
     

24 Mar, 2012

2 commits

  • Since we no longer need the VM_ALWAYSDUMP flag, let's use the freed bit
    for 'VM_NODUMP' flag. The idea is to add a new madvise() flag:
    MADV_DONTDUMP, which can be set by applications to specifically request
    memory regions which should not dump core.

    The specific application I have in mind is qemu: we can add a flag there
    that wouldn't dump all of guest memory when qemu dumps core. This flag
    might also be useful for security sensitive apps that want to absolutely
    make sure that parts of memory are not dumped. To clear the flag use:
    MADV_DODUMP.

    [akpm@linux-foundation.org: s/MADV_NODUMP/MADV_DONTDUMP/, s/MADV_CLEAR_NODUMP/MADV_DODUMP/, per Roland]
    [akpm@linux-foundation.org: fix up the architectures which broke]
    Signed-off-by: Jason Baron
    Acked-by: Roland McGrath
    Cc: Chris Metcalf
    Cc: Avi Kivity
    Cc: Ralf Baechle
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: "James E.J. Bottomley"
    Cc: Helge Deller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Baron
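
From userspace, the two madvise() flags named in this entry could be used roughly as follows. This is a minimal sketch for Linux with glibc. The helper names are hypothetical, and the anonymous mapping stands in for whatever region an application wants to keep out of its core dump.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* expose madvise() and MAP_ANONYMOUS in glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map len bytes of anonymous memory and ask the kernel to leave the region
 * out of any core dump. Returns the mapping, or NULL on failure. */
void *alloc_undumped(size_t len)
{
    assert(len > 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    if (madvise(p, len, MADV_DONTDUMP) != 0) {
        munmap(p, len);
        return NULL;
    }
    return p;
}

/* Clear the request again so the region is subject to the normal
 * coredump_filter rules. Returns 0 on success, -1 on error. */
int allow_dump_again(void *p, size_t len)
{
    return madvise(p, len, MADV_DODUMP);
}
```

A process holding key material or a large guest-memory region would call alloc_undumped() once at setup and release the memory with munmap() as usual.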
     
  • The motivation for this patchset was that I was looking at a way for a
    qemu-kvm process, to exclude the guest memory from its core dump, which
    can be quite large. There are already a number of filter flags in
    /proc/<pid>/coredump_filter, however, these allow one to specify 'types'
    of kernel memory, not specific address ranges (which is needed in this
    case).

    Since there are no more vma flags available, the first patch eliminates
    the need for the 'VM_ALWAYSDUMP' flag. The flag is used internally by
    the kernel to mark vdso and vsyscall pages. However, it is simple
    enough to check if a vma covers a vdso or vsyscall page without the need
    for this flag.

    The second patch then replaces the 'VM_ALWAYSDUMP' flag with a new
    'VM_NODUMP' flag, which can be set by userspace using new madvise flags:
    'MADV_DONTDUMP', and unset via 'MADV_DODUMP'. The core dump filters
    continue to work the same as before unless 'MADV_DONTDUMP' is set on the
    region.

    The qemu code which implements this feature is at:

    http://people.redhat.com/~jbaron/qemu-dump/qemu-dump.patch

    In my testing the qemu core dump shrunk from 383MB -> 13MB with this
    patch.

    I also believe that the 'MADV_DONTDUMP' flag might be useful for
    security sensitive apps, which might want to select which areas are
    dumped.

    This patch:

    The VM_ALWAYSDUMP flag is currently used by the coredump code to
    indicate that a vma is part of a vsyscall or vdso section. However, we
    can determine if a vma is in one of these sections by checking it against
    the gate_vma and checking for a non-NULL return value from
    arch_vma_name(). Thus, freeing a valuable vma bit.

    Signed-off-by: Jason Baron
    Acked-by: Roland McGrath
    Cc: Chris Metcalf
    Cc: Avi Kivity
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Baron
     

21 Mar, 2012

2 commits


03 Mar, 2012

1 commit

  • The regset common infrastructure assumed that regsets would always
    have .get and .set methods, but not necessarily .active methods.
    Unfortunately people have since written regsets without .set methods.

    Rather than putting in stub functions everywhere, handle regsets with
    null .get or .set methods explicitly.

    Signed-off-by: H. Peter Anvin
    Reviewed-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Signed-off-by: Linus Torvalds

    H. Peter Anvin
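
The idea of checking for a missing method in the common layer, rather than adding stub functions to every regset, can be sketched in userspace C as below. The structure, the `sk_` names, and the choice of -EOPNOTSUPP as the error value are assumptions of this sketch, not a copy of the kernel interface.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Simplified register set: either method may be NULL. */
struct sk_regset {
    int (*get)(void *buf, size_t len);
    int (*set)(const void *buf, size_t len);
};

/* Common read path: report "not supported" instead of calling through NULL. */
int sk_regset_read(const struct sk_regset *rs, void *buf, size_t len)
{
    assert(rs != NULL);
    if (rs->get == NULL)
        return -EOPNOTSUPP;
    return rs->get(buf, len);
}

/* Common write path: a read-only regset simply has no .set method. */
int sk_regset_write(const struct sk_regset *rs, const void *buf, size_t len)
{
    assert(rs != NULL);
    if (rs->set == NULL)
        return -EOPNOTSUPP;
    return rs->set(buf, len);
}

/* Example getter for a read-only regset: reports all-zero registers. */
int sk_get_zeros(void *buf, size_t len)
{
    memset(buf, 0, len);
    return 0;
}
```

A read-only regset is then declared as `{ .get = sk_get_zeros, .set = NULL }`, and writes to it fail cleanly in one place.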
     

21 Feb, 2012

1 commit


11 Jan, 2012

1 commit

  • Randomization of PIE load address is hard coded in binfmt_elf.c for X86
    and ARM. Create a new Kconfig variable
    (CONFIG_ARCH_BINFMT_ELF_RANDOMIZE_PIE) for this and use it instead. Thus
    architecture specific policy is pushed out of the generic binfmt_elf.c and
    into the architecture Kconfig files.

    X86 and ARM Kconfigs are modified to select the new variable so there is
    no change in behavior. A follow on patch will select it for MIPS too.

    Signed-off-by: David Daney
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Alexander Viro
    Acked-by: H. Peter Anvin
    Cc: Ralf Baechle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Daney
     

03 Nov, 2011

1 commit

  • The case of address space randomization being disabled in runtime through
    randomize_va_space sysctl is not treated properly in load_elf_binary(),
    resulting in SIGKILL coming at exec() time for certain PIE-linked binaries
    in case the randomization has been disabled at runtime prior to calling
    exec().

    Handle the randomize_va_space == 0 case the same way as if we were not
    supporting .text randomization at all.

    Based on original patch by H.J. Lu and Josh Boyer.

    Signed-off-by: Jiri Kosina
    Cc: Ingo Molnar
    Cc: Russell King
    Cc: H.J. Lu
    Tested-by: Josh Boyer
    Acked-by: Nicolas Pitre
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Kosina
     

20 Jul, 2011

1 commit