01 Nov, 2009

1 commit

  • Don't pass NULL pointers to fput() in the error handling paths of the NOMMU
    do_mmap_pgoff() as it can't handle it.

    The following can be used as a test program:

    int main() { static long long a[1024 * 1024 * 20] = { 0 }; return a;}

    Without the patch, the code oopses in atomic_long_dec_and_test() as called by
    fput() after the kernel complains that it can't allocate that big a chunk of
    memory. With the patch, the kernel just complains about the allocation size
    and then the program segfaults during execve() as execve() can't complete the
    allocation of all the new ELF program segments.
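
    The fix is the usual guarded-release pattern. A minimal sketch of the
    repaired error path (simplified; slab names as used in mm/nommu.c of
    this era):

    error:
            if (region->vm_file)
                    fput(region->vm_file);  /* only drop a ref we actually hold */
            kmem_cache_free(vm_region_jar, region);
            if (vma->vm_file)
                    fput(vma->vm_file);
            kmem_cache_free(vm_area_cachep, vma);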

    Reported-by: Robin Getz
    Signed-off-by: David Howells
    Acked-by: Robin Getz
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    David Howells
     

28 Sep, 2009

1 commit


25 Sep, 2009

2 commits

  • Ignore the address parameter given to NOMMU mmap() as it is a hint, rather
    than giving an error if it's non-zero. MAP_FIXED still gets an error.
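
    A minimal sketch of the resulting policy (simplified; the test sits in
    the NOMMU mmap validation path):

    if (flags & MAP_FIXED)
            return -EINVAL;  /* can't honour a fixed placement without an MMU */

    /* any non-zero addr is now treated as an ignored hint, not an error */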

    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Fix MAP_PRIVATE mmap() of files and devices where the data in the backing store
    might be mapped directly. Use the BDI_CAP_MAP_DIRECT capability flag to govern
    whether or not we should be trying to map a file directly. This can be used to
    determine whether or not a region has been filled in at the point where we call
    do_mmap_shared() or do_mmap_private().

    The BDI_CAP_MAP_DIRECT capability flag is cleared by validate_mmap_request() if
    there's any reason we can't use it. It's also cleared in do_mmap_pgoff() if
    f_op->get_unmapped_area() fails.
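
    A rough sketch of the flag-as-state idea (pseudo-code assembled from the
    description above, not the literal patch; can_map_directly is a
    placeholder condition):

    /* validate_mmap_request(): start from the backing device's capabilities
     * and clear the bit for any reason direct mapping can't work */
    if (!can_map_directly)
            capabilities &= ~BDI_CAP_MAP_DIRECT;

    /* do_mmap_pgoff(): clear it too if the driver refuses a placement */
    addr = file->f_op->get_unmapped_area(file, addr, len, pgoff, flags);
    if (IS_ERR_VALUE(addr))
            capabilities &= ~BDI_CAP_MAP_DIRECT;

    /* by the time do_mmap_shared()/do_mmap_private() run, the flag tells
     * them whether the region still has to be filled in by copying */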

    Without this fix, attempting to run a program from a RomFS image on a
    non-mappable MTD partition results in a BUG as the kernel attempts XIP, and
    this can be caught in gdb:

    Program received signal SIGABRT, Aborted.
    0xc005dce8 in add_nommu_region (region=) at mm/nommu.c:547
    (gdb) bt
    #0 0xc005dce8 in add_nommu_region (region=) at mm/nommu.c:547
    #1 0xc005f168 in do_mmap_pgoff (file=0xc31a6620, addr=, len=3808, prot=3, flags=6146, pgoff=0) at mm/nommu.c:1373
    #2 0xc00a96b8 in elf_fdpic_map_file (params=0xc33fbbec, file=0xc31a6620, mm=0xc31bef60, what=0xc0213144 "executable") at mm.h:1145
    #3 0xc00aa8b4 in load_elf_fdpic_binary (bprm=0xc316cb00, regs=) at fs/binfmt_elf_fdpic.c:343
    #4 0xc006b588 in search_binary_handler (bprm=0x6, regs=0xc33fbce0) at fs/exec.c:1234
    #5 0xc006c648 in do_execve (filename=, argv=0xc3ad14cc, envp=0xc3ad1460, regs=0xc33fbce0) at fs/exec.c:1356
    #6 0xc0008cf0 in sys_execve (name=, argv=0xc3ad14cc, envp=0xc3ad1460) at arch/frv/kernel/process.c:263
    #7 0xc00075dc in __syscall_call () at arch/frv/kernel/entry.S:897

    Note that this fix does the following commit differently:

    commit a190887b58c32d19c2eee007c5eb8faa970a69ba
    Author: David Howells
    Date: Sat Sep 5 11:17:07 2009 -0700
    nommu: fix error handling in do_mmap_pgoff()

    Reported-by: Graff Yang
    Signed-off-by: David Howells
    Acked-by: Pekka Enberg
    Cc: Paul Mundt
    Cc: Mel Gorman
    Cc: Greg Ungerer
    Signed-off-by: Linus Torvalds

    David Howells
     

24 Sep, 2009

2 commits

  • Introduce new truncate helpers truncate_pagecache and inode_newsize_ok.
    vmtruncate is also consolidated from mm/memory.c and mm/nommu.c into
    mm/truncate.c.
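
    A sketch of how a filesystem's setattr path can use the new helpers
    (myfs_setattr is hypothetical; note that the old-size argument of
    truncate_pagecache() was dropped again in later kernels):

    static int myfs_setattr(struct dentry *dentry, struct iattr *attr)
    {
            struct inode *inode = dentry->d_inode;
            int error;

            if (attr->ia_valid & ATTR_SIZE) {
                    /* vet the new size against fs limits and rlimits */
                    error = inode_newsize_ok(inode, attr->ia_size);
                    if (error)
                            return error;

                    /* unmap and drop pagecache beyond the new size */
                    truncate_pagecache(inode, inode->i_size, attr->ia_size);
                    i_size_write(inode, attr->ia_size);
            }
            return 0;
    }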

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    npiggin@suse.de
     
  • My 58fa879e1e640a1856f736b418984ebeccee1c95 "mm: FOLL flags for GUP flags"
    broke CONFIG_NOMMU build by forgetting to update nommu.c foll_flags type:

    mm/nommu.c:171: error: conflicting types for `__get_user_pages'
    mm/internal.h:254: error: previous declaration of `__get_user_pages' was here
    make[1]: *** [mm/nommu.o] Error 1

    My 03f6462a3ae78f36eb1f0ee8b4d5ae2f7859c1d5 "mm: move highest_memmap_pfn"
    broke CONFIG_NOMMU build by forgetting to add a nommu.c highest_memmap_pfn:

    mm/built-in.o: In function `memmap_init_zone':
    (.meminit.text+0x326): undefined reference to `highest_memmap_pfn'
    mm/built-in.o: In function `memmap_init_zone':
    (.meminit.text+0x32d): undefined reference to `highest_memmap_pfn'

    Fix both breakages, and give myself 30 lashes (ouch!)
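
    The shape of the two fixes, inferred from the errors quoted above (a
    sketch, not the literal patch):

    /* mm/nommu.c: supply the definition memmap_init_zone() links against */
    unsigned long highest_memmap_pfn;

    /* and change __get_user_pages()'s flags parameter in mm/nommu.c to the
     * foll_flags type that mm/internal.h now declares */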

    Reported-by: Michal Simek
    Signed-off-by: Hugh Dickins
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

22 Sep, 2009

4 commits

  • Some architectures (like the Blackfin arch) implement some of the
    "simpler" features that one would expect out of a MMU such as memory
    protection.

    In our case, we actually get read/write/exec protection down to the page
    boundary so processes can't stomp on each other let alone the kernel.

    There is a performance decrease, however (depending greatly on the
    workload), as the hardware/software interaction was not optimized at
    design time.

    Signed-off-by: Bernd Schmidt
    Signed-off-by: Bryan Wu
    Signed-off-by: Mike Frysinger
    Acked-by: David Howells
    Acked-by: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bernd Schmidt
     
  • __get_user_pages() has been taking its own GUP flags, then processing
    them into FOLL flags for follow_page(). Though oddly named, the FOLL
    flags are more widely used, so pass them to __get_user_pages() now.
    Sorry, VM flags, VM_FAULT flags and FAULT_FLAGs are still distinct.

    (The patch to __get_user_pages() looks peculiar, with both gup_flags
    and foll_flags: the gup_flags remain constant; but as before there's
    an exceptional case, out of scope of the patch, in which foll_flags
    per page have FOLL_WRITE masked off.)
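
    For illustration, a caller-side sketch (follow_page() is the existing
    internal helper; the flag combination here is arbitrary):

    /* take a reference and mark the page accessed while looking it up */
    struct page *page = follow_page(vma, address, FOLL_GET | FOLL_TOUCH);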

    Signed-off-by: Hugh Dickins
    Cc: Rik van Riel
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Nick Piggin
    Cc: Mel Gorman
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • GUP_FLAGS_IGNORE_VMA_PERMISSIONS and GUP_FLAGS_IGNORE_SIGKILL were
    flags added solely to prevent __get_user_pages() from doing some of
    what it usually does, in the munlock case: we can now remove them.

    Signed-off-by: Hugh Dickins
    Acked-by: Rik van Riel
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Nick Piggin
    Cc: Mel Gorman
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Fix the following 'make includecheck' warning:

    mm/nommu.c: internal.h is included more than once.

    Signed-off-by: Jaswinder Singh Rajput
    Cc: David Howells
    Acked-by: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jaswinder Singh Rajput
     

06 Sep, 2009

1 commit

  • Fix the error handling in do_mmap_pgoff(). If do_mmap_shared_file() or
    do_mmap_private() fail, we jump to the error_put_region label at which
    point we call __put_nommu_region() on the region - but we haven't yet
    added the region to the tree, and so __put_nommu_region() may BUG
    because the region tree is empty or it may corrupt the region tree.

    To get around this, we can afford to add the region to the region tree
    before calling do_mmap_shared_file() or do_mmap_private() as we keep
    nommu_region_sem write-locked, so no-one can race with us by seeing a
    transient region.
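
    A sketch of the reordering (simplified; the real patch's error labels
    and cleanup details differ):

    down_write(&nommu_region_sem);
    add_nommu_region(region);               /* insert before filling in */

    if (file && (vma->vm_flags & VM_SHARED))
            ret = do_mmap_shared_file(vma);
    else
            ret = do_mmap_private(vma, region, len);
    if (ret < 0) {
            /* safe now: the region really is in the tree */
            delete_nommu_region(region);
            goto error;
    }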

    Signed-off-by: David Howells
    Acked-by: Pekka Enberg
    Acked-by: Paul Mundt
    Cc: Mel Gorman
    Acked-by: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

19 Aug, 2009

1 commit

  • According to the POSIX (1003.1-2008), the file descriptor shall have been
    opened with read permission, regardless of the protection options specified to
    mmap(). The LTP test cases mmap06/07 need this.
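
    A plausible sketch of the check (assuming it sits with the rest of the
    NOMMU request validation):

    /* POSIX: the descriptor must have been opened for reading, whatever
     * PROT_* flags the caller asked for */
    if (!(file->f_mode & FMODE_READ))
            return -EACCES;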

    Signed-off-by: Graff Yang
    Acked-by: Paul Mundt
    Signed-off-by: David Howells
    Acked-by: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Graff Yang
     

17 Aug, 2009

1 commit

  • Currently SELinux enforcement of controls on the ability to map low memory
    is determined by the mmap_min_addr tunable. This patch causes SELinux to
    ignore the tunable and instead use a separate Kconfig option specific to how
    much space the LSM should protect.

    The tunable will now only control the need for CAP_SYS_RAWIO and SELinux
    permissions will always protect the amount of low memory designated by
    CONFIG_LSM_MMAP_MIN_ADDR.

    This allows users who need to disable the mmap_min_addr controls (usual reason
    being they run WINE as a non-root user) to do so and still have SELinux
    controls preventing confined domains (like a web server) from being able to
    map some area of low memory.

    Signed-off-by: Eric Paris
    Signed-off-by: James Morris

    Eric Paris
     

02 Jul, 2009

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
    sh: LCDC dcache flush for deferred io
    sh: Fix compiler error and include the definition of IS_ERR_VALUE
    sh: re-add LCDC fbdev support to the Migo-R defconfig
    sh: fix se7724 ceu names
    sh: ms7724se: Enable sh_eth in defconfig.
    arch/sh/boards/mach-se/7206/io.c: Remove unnecessary semicolons
    sh: ms7724se: Add sh_eth support
    nommu: provide follow_pfn().
    sh: Kill off unused DEBUG_BOOTMEM symbol.
    perf_counter tools: add cpu_relax()/rmb() definitions for sh.
    sh64: Hook up page fault events for software perf counters.
    sh: Hook up page fault events for software perf counters.
    sh: make set_perf_counter_pending() static inline.
    clocksource: sh_tmu: Make undefined TCOR behaviour less undefined.

    Linus Torvalds
     

26 Jun, 2009

2 commits

  • With the introduction of follow_pfn() as an exported symbol, modules have
    begun making use of it. Unfortunately this was not reflected on nommu at
    the time, so the in-tree users have subsequently all blown up with link
    errors there.

    This provides a simple follow_pfn() that just returns addr >> PAGE_SHIFT,
    which will do the right thing on nommu. There is no need to do range
    checking within the vma, as the find_vma() case will already take care of
    this.
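
    The NOMMU version can therefore be nearly a one-liner; a sketch
    (assuming it keeps the same VM_IO/VM_PFNMAP contract as the MMU side):

    int follow_pfn(struct vm_area_struct *vma, unsigned long address,
                   unsigned long *pfn)
    {
            if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
                    return -EINVAL;

            *pfn = address >> PAGE_SHIFT;   /* physical == virtual on nommu */
            return 0;
    }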

    Signed-off-by: Paul Mundt

    Paul Mundt
     
  • Currently the 4th parameter of get_user_pages() is called len, but it's
    in pages, not bytes. Rename the thing to nr_pages to avoid future
    confusion.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

10 Jun, 2009

1 commit

  • With the "security: use mmap_min_addr indepedently of security models"
    change, mmap_min_addr is used in common areas, which subsequently blows
    up the nommu build. This stubs in the definition in the nommu case as
    well.
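
    Given the three-line diffstat below, the stub is presumably little more
    than the variable definition that common code now references; a sketch
    (an assumption, mirroring the MMU-side definition):

    /* amount of low vm to protect from userspace mappings */
    unsigned long mmap_min_addr;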

    Signed-off-by: Paul Mundt

    --

    mm/nommu.c | 3 +++
    1 file changed, 3 insertions(+)
    Signed-off-by: James Morris

    Paul Mundt
     

08 May, 2009

1 commit


07 May, 2009

1 commit

  • NOMMU mmap() has an option controlled by a sysctl variable that determines
    whether the allocations made by do_mmap_private() should have the excess
    space trimmed off and returned to the allocator. Make the initial setting
    of this variable a Kconfig configuration option.

    The reason there can be excess space is that the allocator only allocates
    in power-of-2 size chunks, but mmap() requests can be made in sizes that
    aren't a power of 2.
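
    A worked example of where the excess comes from (assuming 4KB pages):

    unsigned long len   = 9 * PAGE_SIZE;            /* 36KB request    */
    int order           = get_order(len);           /* -> 4            */
    unsigned long alloc = PAGE_SIZE << order;       /* 16 pages, 64KB  */
    unsigned long spare = alloc - PAGE_ALIGN(len);  /* 7 pages excess  */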

    There are two alternatives:

    (1) Keep the excess as dead space. The dead space then remains unused for the
    lifetime of the mapping. Mappings of shared objects such as libc, ld.so
    or busybox's text segment may retain their dead space forever.

    (2) Return the excess to the allocator. This means that the dead space is
    limited to less than a page per mapping, but it means that for a transient
    process, there's more chance of fragmentation as the excess space may be
    reused fairly quickly.

    During the boot process, a lot of transient processes are created, and
    this can cause a lot of fragmentation as the pagecache and various slabs
    grow greatly during this time.

    By turning off the trimming of excess space during boot and disabling
    batching of frees, Coldfire can manage to boot.

    A better way of doing things might be to have /sbin/init turn this option
    off. By that point libc, ld.so and init - which are all long-duration
    processes - have all been loaded and trimmed.

    Reported-by: Lanttor Guo
    Signed-off-by: David Howells
    Tested-by: Lanttor Guo
    Cc: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

03 May, 2009

1 commit

  • The Committed_AS field can underflow in certain situations:

    > # while true; do cat /proc/meminfo | grep _AS; sleep 1; done | uniq -c
    > 1 Committed_AS: 18446744073709323392 kB
    > 11 Committed_AS: 18446744073709455488 kB
    > 6 Committed_AS: 35136 kB
    > 5 Committed_AS: 18446744073709454400 kB
    > 7 Committed_AS: 35904 kB
    > 3 Committed_AS: 18446744073709453248 kB
    > 2 Committed_AS: 34752 kB
    > 9 Committed_AS: 18446744073709453248 kB
    > 8 Committed_AS: 34752 kB
    > 3 Committed_AS: 18446744073709320960 kB
    > 7 Committed_AS: 18446744073709454080 kB
    > 3 Committed_AS: 18446744073709320960 kB
    > 5 Committed_AS: 18446744073709454080 kB
    > 6 Committed_AS: 18446744073709320960 kB

    This happens because NR_CPUS can be greater than 1000 and
    meminfo_proc_show() does not check for underflow.

    But scaling by NR_CPUS isn't a good calculation anyway. In general, the
    possibility of lock contention is proportional to the number of online
    CPUs, not the theoretical maximum (NR_CPUS).

    The kernel already has generic percpu-counter infrastructure; using it
    is the right approach. It simplifies the code, and
    percpu_counter_read_positive() doesn't suffer from the underflow issue.
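
    A sketch of the percpu-counter approach (the vm_committed_as name
    follows the patch theme; surrounding code simplified):

    #include <linux/percpu_counter.h>

    static struct percpu_counter vm_committed_as;

    void vm_acct_memory(long pages)
    {
            percpu_counter_add(&vm_committed_as, pages);
    }

    static unsigned long committed_as_pages(void)
    {
            /* clamps transient negatives to zero, so the Committed_AS
             * line in /proc/meminfo can no longer underflow */
            return percpu_counter_read_positive(&vm_committed_as);
    }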

    Reported-by: Dave Hansen
    Signed-off-by: KOSAKI Motohiro
    Cc: Eric B Munson
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Cc: [All kernel versions]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     

03 Apr, 2009

1 commit

  • Fix a number of issues with the per-MM VMA patch:

    (1) Make mmap_pages_allocated an atomic_long_t, just in case this is used on
    a NOMMU system with more than 2G pages. Makes no difference on a 32-bit
    system.

    (2) Report vma->vm_pgoff * PAGE_SIZE as a 64-bit value, not a 32-bit value,
    lest it overflow.

    (3) Move the allocation of the vm_area_struct slab back to fork.c.

    (4) Use KMEM_CACHE() for both vm_area_struct and vm_region slabs.

    (5) Use BUG_ON() rather than if () BUG().

    (6) Make the default validate_nommu_regions() a static inline rather than a
    #define.

    (7) Make free_page_series()'s objection to pages with a refcount != 1 more
    informative.

    (8) Adjust the __put_nommu_region() banner comment to indicate that the
    semaphore must be held for writing.

    (9) Limit the number of warnings about munmaps of non-mmapped regions.

    Reported-by: Andrew Morton
    Signed-off-by: David Howells
    Cc: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

27 Jan, 2009

1 commit


21 Jan, 2009

1 commit


14 Jan, 2009

2 commits


08 Jan, 2009

4 commits

  • Now that we no longer use compound pages for all large allocations,
    kobjsize() actively breaks things like binfmt_flat by always handing
    back PAGE_SIZE for mmap'ed regions. Fix this up by looking up the
    VMA region for non-compounds.

    Ideally binfmt_flat wants to get rid of kobjsize() completely, but
    this is an incremental step.

    Signed-off-by: Paul Mundt
    Signed-off-by: David Howells
    Tested-by: Mike Frysinger

    Paul Mundt
     
  • NOMMU mmap allocates a piece of memory for an mmap that's rounded up in size to
    the nearest power-of-2 number of pages. Currently it then discards the excess
    pages back to the page allocator, making that memory available for use by
    other things. This can, however, cause a greater amount of fragmentation.

    To counter this, a sysctl is added in order to fine-tune the trimming
    behaviour. The default behaviour remains to trim pages aggressively, while
    this can either be disabled completely or set to a higher page-granular
    watermark in order to have finer-grained control.

    vm region vm_top bits taken from an earlier patch by David Howells.

    Signed-off-by: Paul Mundt
    Signed-off-by: David Howells
    Tested-by: Mike Frysinger

    Paul Mundt
     
  • Make VMAs per mm_struct as for MMU-mode linux. This solves two problems:

    (1) In SYSV SHM where nattch for a segment does not reflect the number of
    shmat's (and forks) done.

    (2) In mmap() where the VMA's vm_mm is set to point to the parent mm by an
    exec'ing process when VM_EXECUTABLE is specified, regardless of the fact
    that a VMA might be shared and already have its vm_mm assigned to another
    process or a dead process.

    A new struct (vm_region) is introduced to track a mapped region and to remember
    the circumstances under which it may be shared and the vm_list_struct structure
    is discarded as it's no longer required.

    This patch makes the following additional changes:

    (1) Regions are now allocated with alloc_pages() rather than kmalloc() and
    with no recourse to __GFP_COMP, so the pages are not compound. Instead,
    each page has a reference on it held by the region. Anything else that is
    interested in such a page will have to get a reference on it to retain it.
    When the pages are released due to unmapping, each page is passed to
    put_page() and will be freed when the page usage count reaches zero.

    (2) Excess pages are trimmed after an allocation as the allocation must be
    made as a power-of-2 quantity of pages.

    (3) VMAs are added to the parent MM's R/B tree and mmap lists. As an MM may
    end up with overlapping VMAs within the tree, the VMA struct address is
    appended to the sort key.

    (4) Non-anonymous VMAs are now added to the backing inode's prio list.

    (5) Holes may be punched in anonymous VMAs with munmap(), releasing parts of
    the backing region. The VMA and region structs will be split if
    necessary.

    (6) sys_shmdt() only releases one attachment to a SYSV IPC shared memory
    segment instead of all the attachments at that address. Multiple
    shmat()'s return the same address under NOMMU-mode instead of different
    virtual addresses as under MMU-mode.

    (7) Core dumping for ELF-FDPIC requires fewer exceptions for NOMMU-mode.

    (8) /proc/maps is now the global list of mapped regions, and may list bits
    that aren't actually mapped anywhere.

    (9) /proc/meminfo gains a line (tagged "MmapCopy") that indicates the amount
    of RAM currently allocated by mmap to hold mappable regions that can't be
    mapped directly. These are copies of the backing device or file if not
    anonymous.

    These changes make NOMMU mode more similar to MMU mode. The downside is
    that NOMMU mode now requires some extra memory for this tracking compared
    to NOMMU without this patch (VMAs are no longer shared, and there are now
    region structs).

    Signed-off-by: David Howells
    Tested-by: Mike Frysinger
    Acked-by: Paul Mundt

    David Howells
     
  • Delete the askedalloc and realalloc variables as nothing actually uses the
    value calculated.

    Signed-off-by: David Howells
    Tested-by: Mike Frysinger
    Acked-by: Paul Mundt

    David Howells
     

06 Jan, 2009

1 commit

  • We used to have rather schizophrenic set of checks for NULL ->i_op even
    though it had been eliminated years ago. You'd need to go out of your
    way to set it to NULL explicitly _and_ a bunch of code would die on
    such inodes anyway. After killing two remaining places that still
    did that bogosity, all that crap can go away.

    Signed-off-by: Al Viro

    Al Viro
     

31 Oct, 2008

1 commit

  • Junjiro R. Okajima reported a problem where knfsd crashes if you are
    using it to export shmemfs objects and run strict overcommit. In this
    situation the current->mm based modifier to the overcommit goes through a
    NULL pointer.

    We could simply check for NULL and skip the modifier but we've caught
    other real bugs in the past from mm being NULL here - cases where we did
    need a valid mm set up (eg the exec bug about a year ago).

    To preserve the checks and get the logic we want, shuffle the checking
    around and add a new helper to the vm_ security wrappers.

    Also fix a current->mm reference in nommu that should use the passed mm.

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: fix build]
    Reported-by: Junjiro R. Okajima
    Acked-by: James Morris
    Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     

20 Oct, 2008

1 commit

  • Make sure that mlocked pages also live on the unevictable LRU, so kswapd
    will not scan them over and over again.

    This is achieved through various strategies:

    1) add yet another page flag--PG_mlocked--to indicate that
    the page is locked for efficient testing in vmscan and,
    optionally, fault path. This allows early culling of
    unevictable pages, preventing them from getting to
    page_referenced()/try_to_unmap(). Also allows separate
    accounting of mlock'd pages, as Nick's original patch
    did.

    Note: Nick's original mlock patch used a PG_mlocked
    flag. I had removed this in favor of the PG_unevictable
    flag + an mlock_count [new page struct member]. I
    restored the PG_mlocked flag to eliminate the new
    count field.

    2) add the mlock/unevictable infrastructure to mm/mlock.c,
    with internal APIs in mm/internal.h. This is a rework
    of Nick's original patch to these files, taking into
    account that mlocked pages are now kept on unevictable
    LRU list.

    3) update vmscan.c:page_evictable() to check PageMlocked()
    and, if a vma is passed in, the vm_flags (see the sketch
    after this list). Note that the vma will only be passed in
    for new pages in the fault path; and then only if the
    "cull unevictable pages in fault path" patch is included.

    4) add try_to_unlock() to rmap.c to walk a page's rmap and
    ClearPageMlocked() if no other vmas have it mlocked.
    Reuses as much of try_to_unmap() as possible. This
    effectively replaces the use of one of the lru list links
    as an mlock count. If this mechanism lets pages in mlocked
    vmas leak through w/o PG_mlocked set [I don't know that it
    does], we should catch them later in try_to_unmap(). One
    hopes this will be rare, as it will be relatively expensive.
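
    As promised in (3), a sketch of the evictability test (simplified; the
    mainline version also consults the page's address space):

    int page_evictable(struct page *page, struct vm_area_struct *vma)
    {
            if (PageMlocked(page))
                    return 0;       /* culled to the unevictable list */
            if (vma && (vma->vm_flags & VM_LOCKED))
                    return 0;       /* new page in an mlocked vma */
            return 1;
    }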

    Original mm/internal.h, mm/rmap.c and mm/mlock.c changes:
    Signed-off-by: Nick Piggin

    splitlru: introduce __get_user_pages():

    The new munlock processing needs GUP_FLAGS_IGNORE_VMA_PERMISSIONS,
    because the current get_user_pages() can't grab PROT_NONE pages and
    therefore PROT_NONE pages can't be munlocked.

    [akpm@linux-foundation.org: fix this for pagemap-pass-mm-into-pagewalkers.patch]
    [akpm@linux-foundation.org: untangle patch interdependencies]
    [akpm@linux-foundation.org: fix things after out-of-order merging]
    [hugh@veritas.com: fix page-flags mess]
    [lee.schermerhorn@hp.com: fix munlock page table walk - now requires 'mm']
    [kosaki.motohiro@jp.fujitsu.com: build fix]
    [kosaki.motohiro@jp.fujitsu.com: fix truncate race and several comments]
    [kosaki.motohiro@jp.fujitsu.com: splitlru: introduce __get_user_pages()]
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Rik van Riel
    Signed-off-by: Lee Schermerhorn
    Cc: Nick Piggin
    Cc: Dave Hansen
    Cc: Matt Mackall
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

04 Aug, 2008

1 commit

  • Now that SH has switched to vmalloc_exec() for PAGE_KERNEL_EXEC usage,
    it's apparent that nommu has no vmalloc_exec() definition of its own.
    Stub in the one from mm/vmalloc.c.
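
    The mm/vmalloc.c definition being stubbed in looks roughly like this
    (on nommu the prot argument simply has no effect):

    void *vmalloc_exec(unsigned long size)
    {
            return __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL_EXEC);
    }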

    Signed-off-by: Paul Mundt

    Paul Mundt
     

27 Jul, 2008

1 commit

  • This adds tracehook_expect_breakpoints() as a formal hook for the nommu
    code to use for its "Is text-poking likely?" check at mmap time. This
    names the actual semantics the code means to test, and documents it.

    Signed-off-by: Roland McGrath
    Cc: Oleg Nesterov
    Reviewed-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     

12 Jun, 2008

1 commit

  • This implements a few changes on top of the recent kobjsize() refactoring
    introduced by commit 6cfd53fc03670c7a544a56d441eb1a6cc800d72b.

    As Christoph points out:

    virt_to_head_page cannot return NULL. virt_to_page also
    does not return NULL. pfn_valid() needs to be used to
    figure out if a page is valid. Otherwise the page struct
    reference that was returned may have PageReserved() set
    to indicate that it is not a valid page.

    As discussed further in the thread, virt_addr_valid() is the preferable
    way to validate the object pointer in this case. In addition to fixing
    up the reserved page case, it also has the benefit of encapsulating the
    hack introduced by commit 4016a1390d07f15b267eecb20e76a48fd5c524ef on
    the impacted platforms, allowing us to get rid of the extra checking in
    kobjsize() for the platforms that don't perform this type of bizarre
    memory_end abuse (every nommu platform that isn't blackfin). If blackfin
    decides to get in line with every other platform and use PageReserved
    for the DMA pages in question, kobjsize() will also continue to work
    fine.

    It also turns out that compound_order() will give us back 0-order for
    non-head pages, so we can get rid of the PageCompound check and just
    use compound_order() directly. Clean that up while we're at it.
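
    Putting the pieces together, the revised kobjsize() comes out roughly
    as follows (a sketch, not the literal patch):

    size_t kobjsize(const void *objp)
    {
            struct page *page;

            /* reject NULL and anything outside the kernel's managed
             * memory, including PageReserved() regions */
            if (!objp || !virt_addr_valid(objp))
                    return 0;

            page = virt_to_head_page(objp);

            /* PageSlab means the pointer came from kmalloc() */
            if (PageSlab(page))
                    return ksize(objp);

            /* compound_order() is 0 for non-compound pages, so this
             * covers the default case too */
            return PAGE_SIZE << compound_order(page);
    }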

    Signed-off-by: Paul Mundt
    Reviewed-by: Christoph Lameter
    Acked-by: David Howells
    Signed-off-by: Linus Torvalds

    Paul Mundt
     

07 Jun, 2008

1 commit

  • kobjsize() has been abusing page->index as a method for sorting out
    compound order, which blows up both for page cache pages, and SLOB's
    reuse of the index in struct slob_page.

    Presently we are not able to accurately size arbitrary pointers that
    don't come from kmalloc(), so the best we can do is sort out the
    compound order from the head page if it's a compound page, or default
    to 0-order if it's impossible to ksize() the object.

    Obviously this leaves quite a bit to be desired in terms of object
    sizing accuracy, but the behaviour is unchanged over the existing
    implementation, while fixing the page->index oopses originally reported
    here:

    http://marc.info/?l=linux-mm&m=121127773325245&w=2

    Accuracy could also be improved by having SLUB and SLOB both set PG_slab
    on ksizeable pages, rather than just handling the __GFP_COMP cases
    regardless of the PG_slab setting, as made possible by Pekka's
    patches:

    http://marc.info/?l=linux-kernel&m=121139439900534&w=2
    http://marc.info/?l=linux-kernel&m=121139440000537&w=2
    http://marc.info/?l=linux-kernel&m=121139440000540&w=2

    This is primarily a bugfix for nommu systems for 2.6.26, with the aim
    being to gradually kill off kobjsize() and its particular brand of
    object abuse entirely.

    Reviewed-by: Pekka Enberg
    Signed-off-by: Paul Mundt
    Acked-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Mundt
     

25 May, 2008

1 commit

  • The atomic_t type is 32-bit, but a 64-bit system can have more than 2^32
    pages of virtual address space available. Without this we overflow on
    ludicrously large mappings.
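
    The arithmetic: atomic_t is 32 bits, so the counter wraps after 2^31 - 1
    pages - with 4KB pages that is only ~8TB of committed space. A sketch of
    the type change (variable name per the kernels of this era):

    /* before */  atomic_t      vm_committed_space;
    /* after  */  atomic_long_t vm_committed_space;  /* 64-bit on 64-bit hosts */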

    Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     

29 Apr, 2008

1 commit

  • The kernel implements readlink of /proc/pid/exe by getting the file from
    the first executable VMA. Then the path to the file is reconstructed and
    reported as the result.

    Because of the VMA walk the code is slightly different on nommu systems.
    This patch avoids separate /proc/pid/exe code on nommu systems. Instead of
    walking the VMAs to find the first executable file-backed VMA we store a
    reference to the exec'd file in the mm_struct.

    That reference would prevent the filesystem holding the executable file
    from being unmounted even after unmapping the VMAs. So we track the number
    of VM_EXECUTABLE VMAs and drop the new reference when the last one is
    unmapped. This avoids pinning the mounted filesystem.
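
    A sketch of the new bookkeeping (helper names per the mainline patch;
    call sites simplified):

    /* at exec time: stash a reference to the executable in the mm */
    set_mm_exe_file(bprm->mm, bprm->file);

    /* readlink(/proc/pid/exe): no VMA walk needed any more */
    exe_file = get_mm_exe_file(task->mm);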

    [akpm@linux-foundation.org: improve comments]
    [yamamoto@valinux.co.jp: fix dup_mmap]
    Signed-off-by: Matt Helsley
    Cc: Oleg Nesterov
    Cc: David Howells
    Cc:"Eric W. Biederman"
    Cc: Christoph Hellwig
    Cc: Al Viro
    Cc: Hugh Dickins
    Signed-off-by: YAMAMOTO Takashi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Helsley
     

28 Apr, 2008

1 commit

  • Don't perform kobjsize operations on objects the kernel doesn't manage.

    On Blackfin, drivers can get dma coherent memory by calling a function
    dma_alloc_coherent(). We do this in nommu by configuring a chunk of uncached
    memory at the top of memory.

    Since we don't want the kernel to use the uncached memory, we lie to the
    kernel and tell it that its maximum memory is between 0 and the start of
    the uncached dma coherent section.

    This all works well until this memory gets exposed to userspace via a
    frame buffer; when you look at the process's maps, it shows the
    framebuffer:

    root:/proc> cat maps
    [snip]
    03f0ef00-03f34700 rw-p 00000000 1f:00 192 /dev/fb0
    root:/proc>

    This is outside the "normal" range for the kernel. When the kernel tries to
    find the size of this object (when you run ps), it dies in kobjsize() in
    mm/nommu.c:

    BUG_ON(page->index >= MAX_ORDER);

    since the page we are referring to is outside what the kernel thinks is
    its maximum valid memory.

    root:~> while [ 1 ]; ps > /dev/null; done
    kernel BUG at mm/nommu.c:119!
    Kernel panic - not syncing: BUG!

    We fixed this by adding a check to reject out-of-range object pointers,
    as kobjsize() already does for NULL pointers.
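
    A rough sketch of the guard (heavily simplified; the real bound check
    and the symbols it uses are arch-specific):

    /* treat pointers the kernel doesn't manage like NULL: report size 0 */
    if (!objp || (unsigned long)objp >= memory_end)
            return 0;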

    Signed-off-by: Michael Hennerich
    Signed-off-by: Robin Getz
    Acked-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Hennerich
     

06 Feb, 2008

1 commit

  • This builds on top of the earlier vmalloc_32_user() work introduced by
    b50731732f926d6c49fd0724616a7344c31cd5cf, as we now have places in the nommu
    allmodconfig that hit up against these missing APIs.

    As vmalloc_32_user() is already implemented, this is moved over to
    vmalloc_user() and simply made a wrapper. As all current nommu platforms are
    32-bit addressable, there's no special casing we have to do for ZONE_DMA and
    things of that nature as per GFP_VMALLOC32.

    remap_vmalloc_range() needs to check VM_USERMAP in order to figure out whether
    we permit the remap or not, which means that we also have to rework the
    vmalloc_user() code to grovel for the VMA and set the flag.
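
    A sketch of the resulting arrangement (simplified from the description
    above):

    void *vmalloc_user(unsigned long size)
    {
            void *ret;

            ret = __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO,
                            PAGE_KERNEL);
            if (ret) {
                    /* grovel for the covering VMA and mark it so that
                     * remap_vmalloc_range() will permit the remap */
                    struct vm_area_struct *vma;

                    down_write(&current->mm->mmap_sem);
                    vma = find_vma(current->mm, (unsigned long)ret);
                    if (vma)
                            vma->vm_flags |= VM_USERMAP;
                    up_write(&current->mm->mmap_sem);
            }
            return ret;
    }

    /* all current nommu platforms are 32-bit addressable, so: */
    void *vmalloc_32_user(unsigned long size)
    {
            return vmalloc_user(size);
    }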

    Signed-off-by: Paul Mundt
    Acked-by: David McCullough
    Acked-by: David Howells
    Acked-by: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Mundt