30 Sep, 2006

2 commits

    This is an updated version of Eric Biederman's is_init() patch
    (http://lkml.org/lkml/2006/2/6/280). It applies cleanly to 2.6.18-rc3
    and replaces a few more instances of ->pid == 1 with is_init().

    Further, is_init() checks the pid directly and thus removes the
    dependency on Eric's other patches for now.

    Eric's original description:

    There are a lot of places in the kernel where we test for init
    because we give it special properties. Most significantly init
    must not die. This results in code all over the kernel testing
    ->pid == 1.

    Introduce is_init() to capture this case (a minimal sketch follows
    this entry).

    With multiple pid spaces, for all of the cases affected we are
    looking for only the first process on the system, not some other
    process that has pid == 1.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Sukadev Bhattiprolu
    Cc: Dave Hansen
    Cc: Serge Hallyn
    Cc: Cedric Le Goater
    Acked-by: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
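
    A minimal sketch of the helper this patch introduces (illustrative;
    the real patch defines it in a shared header):

        static inline int is_init(struct task_struct *tsk)
        {
                /* For now this still tests the raw pid, so callers no
                 * longer open-code ->pid == 1; later pid-namespace
                 * work can change the test in exactly one place. */
                return tsk->pid == 1;
        }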
     
  • Make PROT_WRITE imply PROT_READ for a number of architectures which
    don't support write-only mappings in hardware.

    While looking at this, I noticed that some architectures which do not
    support write only mappings already take the exact same approach. For
    example, in arch/alpha/mm/fault.c:

    "
    if (cause < 0) {
    if (!(vma->vm_flags & VM_EXEC))
    goto bad_area;
    } else if (!cause) {
    /* Allow reads even for write-only mappings */
    if (!(vma->vm_flags & (VM_READ | VM_WRITE)))
    goto bad_area;
    } else {
    if (!(vma->vm_flags & VM_WRITE))
    goto bad_area;
    }
    "

    Thus, this patch brings the other architectures which do not support
    write-only mappings in line and consistent with the rest (a sketch
    of the effect follows this entry). I've verified the patch on ia64,
    x86_64 and x86.

    Additional discussion:

    Several architectures, including x86, cannot support write-only
    mappings. The x86 pte reserves a single bit for protection, whose
    two states are read-only and read/write; thus write-only is not
    supported in hardware.

    Currently, if I mmap a page write-only, the first read attempt on
    that page triggers a page fault and will SEGV. That check is
    enforced in arch/blah/mm/fault.c. However, if I first write to that
    page, it will fault in and the pte will be set to read/write, so any
    subsequent reads of the page will succeed. It is this inconsistency
    in behavior that this patch is attempting to address. Furthermore,
    if the page is swapped out and then brought back, the first read
    will also cause a SEGV. Thus, any arbitrary read of such a page can
    potentially result in a SEGV.

    According to the SUSv3 spec, "if the application requests only
    PROT_WRITE, the implementation may also allow read access." Also, as
    mentioned, some architectures, such as alpha as shown above, already
    take the approach that I am suggesting.

    The counter-argument raised by Arjan is that the kernel is enforcing
    the write-only mapping as best it can given the hardware
    limitations. This is true; however, Alan Cox and I would argue that
    the inconsistency in behavior, namely that applications can
    sometimes work and sometimes fail, is highly undesirable. If you
    read through the thread, I think people came to an agreement on the
    last patch I posted, as nobody has objected to it...

    Signed-off-by: Jason Baron
    Cc: Russell King
    Cc: "Luck, Tony"
    Cc: Hugh Dickins
    Cc: Roman Zippel
    Cc: Geert Uytterhoeven
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Acked-by: Andi Kleen
    Acked-by: Alan Cox
    Cc: Arjan van de Ven
    Acked-by: Paul Mundt
    Cc: Kazumoto Kojima
    Cc: Ian Molton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Baron
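
    A hedged sketch of the idea; the placement and the guard shown here
    are illustrative stand-ins, not the patch's actual hunk:

        /* At mmap time, widen a write-only request on architectures
         * whose page tables cannot express write-only protection.
         * ARCH_HAS_WRITE_ONLY is a hypothetical guard. */
        if ((prot & PROT_WRITE) && !ARCH_HAS_WRITE_ONLY)
                prot |= PROT_READ;

    This makes the behavior consistent: reads of a PROT_WRITE-only
    mapping then succeed whether or not the page was written (or swapped
    out) first.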
     

27 Sep, 2006

24 commits


26 Sep, 2006

1 commit

  • One of the changes necessary for shared page tables is to standardize the
    pxx_page macros. pte_page and pmd_page have always returned the struct
    page associated with their entry, while pte_page_kernel and pmd_page_kernel
    have returned the kernel virtual address. pud_page and pgd_page, on the
    other hand, return the kernel virtual address.

    Shared page tables need pud_page and pgd_page to return the actual page
    structures. There are very few actual users of these functions, so it is
    simple to standardize their usage.

    Since this is basic cleanup, I am submitting these changes as a
    standalone patch. Per Hugh Dickins' comments about it, I am also
    changing the pxx_page_kernel macros to pxx_page_vaddr to clarify
    their meaning (illustrated in the sketch after this entry).

    Signed-off-by: Dave McCracken
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave McCracken
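
    An i386-flavored sketch of the resulting convention (the exact
    definitions vary per architecture):

        /* pxx_page() uniformly returns the struct page ... */
        #define pmd_page(pmd)        pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT)
        /* ... while pxx_page_vaddr() returns the kernel virtual address. */
        #define pmd_page_vaddr(pmd)  ((unsigned long)__va(pmd_val(pmd) & PAGE_MASK))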
     

01 Jul, 2006

1 commit


22 Mar, 2006

3 commits

  • Quite a long time back, prepare_hugepage_range() replaced
    is_aligned_hugepage_range() as the callback from mm/mmap.c to arch code to
    verify if an address range is suitable for a hugepage mapping.
    is_aligned_hugepage_range() stuck around, but only to implement
    prepare_hugepage_range() on archs which didn't implement their own.

    Most archs (everything except ia64 and powerpc) used the same
    implementation of is_aligned_hugepage_range(). On powerpc, which
    implements its own prepare_hugepage_range(), the custom version was never
    used.

    In addition, "is_aligned_hugepage_range()" was a bad name, because it
    suggests it returns true iff the given range is a good hugepage range,
    whereas in fact it returns 0-or-error (so the sense is reversed).

    This patch cleans up by abolishing is_aligned_hugepage_range().
    Instead prepare_hugepage_range() is defined directly. Most archs use
    the default version, which simply checks that the given region is
    aligned to the size of a hugepage (sketched after this entry). ia64
    and powerpc define custom versions. The ia64 one simply
    checks that the range is in the correct address space region in addition to
    being suitably aligned. The powerpc version (just as previously) checks
    for suitable addresses, and if necessary performs low-level MMU frobbing to
    set up new areas for use by hugepages.

    No libhugetlbfs testsuite regressions on ppc64 (POWER5 LPAR).

    Signed-off-by: David Gibson
    Signed-off-by: Zhang Yanmin
    Cc: "David S. Miller"
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: William Lee Irwin III
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Gibson
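
    The default version amounts to an alignment check along these lines
    (a sketch, following the 0-or-error convention noted above):

        static inline int prepare_hugepage_range(unsigned long addr,
                                                 unsigned long len)
        {
                if (len & ~HPAGE_MASK)
                        return -EINVAL;
                if (addr & ~HPAGE_MASK)
                        return -EINVAL;
                return 0;       /* 0 means the range is suitable */
        }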
     
  • set_page_count usage outside mm/ is limited to setting the refcount to 1.
    Remove set_page_count from outside mm/, and replace those users with
    init_page_count() and set_page_refcounted().

    This allows more debug checking, and tighter control over how code
    is allowed to play around with page->_count (see the sketch after
    this entry).

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
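
    For illustration, the replacement helper boils down to this
    (sketch):

        /* Set a freshly allocated page's refcount to one; naming the
         * intent lets mm/ attach debug checks in one place. */
        static inline void init_page_count(struct page *page)
        {
                atomic_set(&page->_count, 1);
        }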
     
  • Have an explicit mm call to split higher-order pages into individual
    pages. This should help to avoid bugs and be more explicit about the
    code's intention (usage sketched after this entry).

    Signed-off-by: Nick Piggin
    Cc: Russell King
    Cc: David Howells
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mundt
    Cc: "David S. Miller"
    Cc: Chris Zankel
    Signed-off-by: Yoichi Yuasa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
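
    A usage sketch, assuming a caller that wants to free a higher-order
    buffer page by page:

        /* Allocate four contiguous pages as one order-2 block, then
         * give each constituent page its own reference count. */
        struct page *page = alloc_pages(GFP_KERNEL, 2);
        if (page) {
                split_page(page, 2);
                /* ... later, __free_page() each of the four pages ... */
        }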
     

17 Jan, 2006

2 commits

  • Currently the CPU subtype options are cluttering up arch/sh/Kconfig somewhat.

    Given that, this moves all of that in to its own arch/sh/mm/Kconfig. Things
    like cache configuration are also moved to this new location.

    This also adds support for strict CPU tuning on newer cores, which
    requires the addition of the as-option kbuild helper.

    Signed-off-by: Paul Mundt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Mundt
     
  • This introduces a few changes in the way that the I/O routines are defined on
    SH, specifically so that things like the iomap API properly wrap through the
    machvec for board-specific quirks.

    In addition to this, the old p3_ioremap() work is converted to a
    more generic __ioremap() that will map through the PMB if it's
    available, or fall back on page tables for everything else (a rough
    sketch follows this entry).

    An alpha-like IO_CONCAT is also added so we can start to clean up
    the board-specific io.h mess, which will be handled in board update
    patches.

    Signed-off-by: Paul Mundt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Mundt
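
    A rough sketch of the described fallback; the helper names here are
    hypothetical stand-ins, not the actual SH API:

        void __iomem *__ioremap(unsigned long phys, unsigned long size,
                                unsigned long flags)
        {
                void __iomem *va;

                /* Map through the PMB when it is available ... */
                va = pmb_try_remap(phys, size, flags);  /* hypothetical */
                if (va)
                        return va;

                /* ... otherwise fall back on page tables. */
                return ioremap_page_tables(phys, size, flags);  /* hypothetical */
        }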
     

07 Nov, 2005

2 commits

  • SH7705 in extended cache mode has some left-over VALID_PAGE() cruft
    that it checks when doing lazy dcache write-back. VALID_PAGE() has
    been gone for some time (the last bits were in the discontig code,
    which should now also be gone -- this also fixes up a build error in
    the non-discontig case).

    pfn_valid() gives the desired behaviour, so we switch to that.

    Signed-off-by: Paul Mundt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Mundt
     
  • There was only one board using this (hp690 specifically), and it just so
    happens that it's only physically discontiguous at the "normal" P1 offset. If
    we bump up the P1 offset, it's possible to hit a shadowed region of memory
    where we suddenly become magically contiguous.

    As people have been using this shadowed region workaround for quite some time
    (and without any adverse effects), it's time to drop the left over discontig
    bits that no longer have any practical use (it was always very much
    hp690-centric to begin with).

    Signed-off-by: Paul Mundt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Mundt
     

30 Oct, 2005

3 commits

  • Use pte_offset_map_lock, instead of pte_offset_map (or inappropriate
    pte_offset_kernel) and the mm-wide page_table_lock, in sundry arch
    places (the idiom is sketched after this entry).

    The i386 vm86 mark_screen_rdonly: yes, there was and is an assumption that the
    screen fits inside the one page table, as indeed it does.

    The sh __do_page_fault: which handles both kernel faults (without lock) and
    user mm faults (locked - though it set_pte without locking before).

    The sh64 flush_cache_range and helpers: which wrongly thought callers held
    page_table_lock before (only its tlb_start_vma did, and no longer does so);
    moved the flush loop down, and adjusted the large versus small range decision
    to consider a range which spans page tables as large.

    Signed-off-by: Hugh Dickins
    Acked-by: Paul Mundt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
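
    The per-page-table locking idiom these sites convert to, in typical
    use (sketch):

        pte_t *pte;
        spinlock_t *ptl;

        /* Map the pte and take the lock guarding its page table. */
        pte = pte_offset_map_lock(mm, pmd, address, &ptl);
        /* ... examine or modify *pte while holding ptl ... */
        pte_unmap_unlock(pte, ptl);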
     
  • First step in pushing down the page_table_lock.
    init_mm.page_table_lock has been used throughout the architectures
    (usually for ioremap): not to serialize kernel address space
    allocation (that's usually vmlist_lock), but because pud_alloc,
    pmd_alloc and pte_alloc_kernel expect the caller to hold it.

    Reverse that: don't lock or unlock init_mm.page_table_lock in any of
    the architectures; instead rely on pud_alloc, pmd_alloc and
    pte_alloc_kernel to take and drop it when allocating a new one,
    checking whether a racing task already did. Similarly, no
    page_table_lock in vmalloc's map_vm_area.

    Some temporary ugliness in __pud_alloc and __pmd_alloc: since they also handle
    user mms, which are converted only by a later patch, for now they have to lock
    differently according to whether or not it's init_mm.

    If sources get muddled, there's a danger that an arch source taking
    init_mm.page_table_lock will be mixed with common source also taking it (or
    neither take it). So break the rules and make another change, which should
    break the build for such a mismatch: remove the redundant mm arg
    from pte_alloc_kernel (signature sketched after this entry; ppc64
    scrapped its distinct ioremap_mm in 2.6.13).

    Exceptions: arm26 used pte_alloc_kernel on user mm, now pte_alloc_map; ia64
    used pte_alloc_map on init_mm, now pte_alloc_kernel; parisc had bad args to
    pmd_alloc and pte_alloc_kernel in unused USE_HPPA_IOREMAP code; ppc64
    map_io_page forgot to unlock on failure; ppc mmu_mapin_ram and ppc64 im_free
    took page_table_lock for no good reason.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
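
    The signature change at the heart of the build-breaking rule
    (sketch):

        /* Before: callers passed the mm and held init_mm.page_table_lock. */
        pte_t *pte_alloc_kernel(struct mm_struct *mm, pmd_t *pmd,
                                unsigned long address);

        /* After: the mm arg is gone (it is always init_mm), and the
         * function takes and drops the lock itself when allocating. */
        pte_t *pte_alloc_kernel(pmd_t *pmd, unsigned long address);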
     
  • The sh64 hugetlbpage.c seems to be erroneous, left over from a
    bygone age and clashing with the common hugetlb.c. Replace it with a
    copy of the sh hugetlbpage.c, except delete the mk_pte_huge macro,
    which neither file uses.

    Signed-off-by: Hugh Dickins
    Acked-by: Paul Mundt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

28 Oct, 2005

1 commit


22 Jun, 2005

1 commit

  • A lot of the code in arch/*/mm/hugetlbpage.c is quite similar. This
    patch attempts to consolidate a lot of the code across the archs,
    putting the combined version in mm/hugetlb.c. There are a couple of
    uglyish hacks in order to convert all the hugepage archs, but the
    result is a very large reduction in the total amount of code. It
    also means things like hugepage lazy allocation could be implemented
    in one place, instead of six (the arch hooks are sketched after this
    entry).

    Tested, at least a little, on ppc64, i386 and x86_64.

    Notes:
    - this patch changes the meaning of set_huge_pte() to be more
      analogous to set_pte()
    - does SH4 need a special huge_ptep_get_and_clear()??

    Acked-by: William Lee Irwin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Gibson
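
    The division of labor the description implies, as a sketch: each
    architecture keeps only its page-table walkers, and the common
    mm/hugetlb.c drives the fault and copy paths through them:

        /* Arch-provided hooks consumed by the consolidated mm/hugetlb.c
         * (exact prototypes may differ by release): */
        pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr);
        pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr);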