20 May, 2008

1 commit

  • The x86_64 pgd_bad(), pud_bad(), pmd_bad() inlines have differed from
    their x86_32 counterparts in a couple of ways: they've been unnecessarily
    weak (e.g. letting 0 or 1 count as good), and were typed as unsigned long.
    Strengthen them and return int.

    The PAE pmd_bad was too weak before, allowing any junk in the upper half;
    but got strengthened by the patch correcting its ~PAGE_MASK to ~PTE_MASK.
    The PAE pud_bad already said ~PTE_MASK; and since it folds into pgd_bad,
    and we don't set the protection bits at that level, it'll do as is.
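
    As a rough illustration of the stricter form, here is a minimal,
    self-contained model (not the patch itself; the bit values are defined
    locally and only mirror the usual x86 header definitions):

    #include <stdio.h>
    #include <stdint.h>

    /* Illustrative values only, modelled on the x86 headers. */
    #define PTE_MASK        0x000ffffffffff000ULL  /* physical address bits */
    #define _PAGE_PRESENT   0x001ULL
    #define _PAGE_RW        0x002ULL
    #define _PAGE_USER      0x004ULL
    #define _PAGE_ACCESSED  0x020ULL
    #define _PAGE_DIRTY     0x040ULL
    #define _KERNPG_TABLE   (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY)

    /* Strict check: outside the address bits (and the harmless _PAGE_USER),
     * the entry must carry exactly _KERNPG_TABLE; the result is an int. */
    static inline int pmd_bad(uint64_t pmd)
    {
            return (pmd & ~(PTE_MASK | _PAGE_USER)) != _KERNPG_TABLE;
    }

    int main(void)
    {
            uint64_t good = 0x12345000ULL | _KERNPG_TABLE;
            uint64_t weak = _PAGE_PRESENT;          /* only a subset of the bits */
            uint64_t junk = good | (1ULL << 52);    /* rubbish above PTE_MASK */

            /* prints: good=0 weak=1 junk=1 */
            printf("good=%d weak=%d junk=%d\n",
                   pmd_bad(good), pmd_bad(weak), pmd_bad(junk));
            return 0;
    }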

    Signed-off-by: Hugh Dickins
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

07 May, 2008

1 commit

  • Fix warning from pmd_bad() at bootup on a HIGHMEM64G HIGHPTE x86_32.

    That came from commit 9fc34113f6880b215cbea4e7017fc818700384c2 ("x86:
    debug pmd_bad()"); but we understand now that the typecasting was wrong
    for PAE in the previous version: pagetable pages above 4GB looked bad
    and stopped Arjan from booting.

    And revert commit cded932b75ab0a5f9181ee3da34a0a488d1a14fd ("x86: fix
    pmd_bad and pud_bad to support huge pages"). It was the wrong way round:
    we shouldn't weaken every pmd_bad and pud_bad check to let huge pages
    slip through - in part they check that we _don't_ have a huge page where
    it's not expected.

    Put the x86 pmd_bad() and pud_bad() definitions back to what they have long
    been: they can be improved (x86_32 should use PTE_MASK, to stop PAE thinking
    junk in the upper word is good; and x86_64 should follow x86_32's stricter
    comparison, to stop thinking any subset of required bits is good); but that
    should be a later patch.

    Fix Hans' good observation that follow_page() will never find pmd_huge(),
    because such an entry would already have failed the pmd_bad test: test
    pmd_huge in between the pmd_none and pmd_bad tests, as sketched below.
    Tighten x86's pmd_huge() check? No, once it's a hugepage entry, it can
    get quite far from a good pmd: for example, PROT_NONE leaves it with only
    ACCESSED of the KERN_PGTABLE bits.
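
    In fragment form (mirroring the hunk quoted in the 01 Mar entry below),
    the reordered checks look roughly like this:

    if (pmd_none(*pmd))
            goto no_page_table;

    if (pmd_huge(*pmd)) {
            BUG_ON(flags & FOLL_GET);
            page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE);
            goto out;
    }

    if (unlikely(pmd_bad(*pmd)))
            goto no_page_table;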

    However... follow_page() contains this and another test for huge pages,
    so it's nice to keep it working on them; but where does it actually get
    called on a huge page? get_user_pages() checks is_vm_hugetlb_page(vma)
    to call alternative hugetlb processing, as do unmap_vmas() and others.

    Signed-off-by: Hugh Dickins
    Earlier-version-tested-by: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Jeff Chua
    Cc: Hans Rosenfeld
    Cc: Arjan van de Ven
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

25 Apr, 2008

1 commit

  • All pagetables need fundamentally the same setup and destruction, so
    just use the same code for everything.

    Signed-off-by: Jeremy Fitzhardinge
    Cc: Andi Kleen
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Jeremy Fitzhardinge
     

17 Apr, 2008

4 commits


04 Mar, 2008

1 commit

  • This reverts commit cded932b75ab0a5f9181ee3da34a0a488d1a14fd.

    Arjan bisected down a boot-time hang to this, saying:
    ".. it prevents the kernel to finish booting on my (Penryn based)
    laptop. The boot stops right after freeing the init memory."

    and while it's not clear exactly what triggers it, at this stage we're
    better off just reverting it while Ingo tries to figure out what went
    wrong.

    Requested-by: Arjan van de Ven
    Cc: Hans Rosenfeld
    Cc: Nish Aravamudan
    Acked-by: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

01 Mar, 2008

1 commit

  • I recently stumbled upon a problem in the support for huge pages. If a
    program using huge pages does not explicitly unmap them, they remain
    mapped (and therefore, are lost) after the program exits.

    I observed that the free huge page count in /proc/meminfo decreased when
    running my program, and it did not increase after the program exited.
    After running the program a few times, no more huge pages could be
    allocated.

    The reason for this seems to be that the x86 pmd_bad and pud_bad
    consider pmd/pud entries that have the PSE bit set to be invalid. I
    think there is nothing wrong with this bit being set; it just indicates
    that the lowest level of translation has been reached. This bit has to
    be (and is) checked after the basic validity of the entry has been
    checked, as in this fragment from follow_page() in mm/memory.c:

    if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
            goto no_page_table;

    if (pmd_huge(*pmd)) {
            BUG_ON(flags & FOLL_GET);
            page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE);
            goto out;
    }

    Note that this code currently doesn't work as intended if the pmd refers
    to a huge page: the pmd_huge() check can never be reached, because the
    pmd_bad() check already fails for a huge page.

    Extending pmd_bad() (and, for future 1GB page support, pud_bad()) to
    allow the PSE bit to be set fixes this. For similar reasons, allowing
    the NX bit to be set is necessary, too. I have seen huge pages with the
    NX bit set in their pmd entry, which would cause the same problem.
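
    A minimal sketch of that idea (not necessarily the exact hunk; macro
    names as in the usual x86 headers): mask PSE and NX out before comparing
    against the expected kernel pagetable bits.

    #define pmd_bad(x)      ((pmd_val(x) &                                     \
                              ~(PAGE_MASK | _PAGE_USER | _PAGE_PSE | _PAGE_NX)) \
                             != _KERNPG_TABLE)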

    Signed-off-by: Hans Rosenfeld
    Signed-off-by: Ingo Molnar

    Hans Rosenfeld
     

19 Feb, 2008

2 commits

  • In order to have a _large() test at all pagetable levels, add
    pgd_large(), which only returns 0.
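
    That is, something along these lines (the exact signature is assumed to
    match the other *_large() helpers):

    static inline int pgd_large(pgd_t pgd) { return 0; }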

    Signed-off-by: H. Peter Anvin
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    H. Peter Anvin
     
  • The early boot code maps KERNEL_TEXT_SIZE (currently 40MB) starting
    from __START_KERNEL_map. The kernel itself only needs _text to _end
    mapped in the high alias. On relocatable kernels the ASM setup code
    adjusts the compile-time-created high mappings to the relocation. This
    creates invalid pmd entries for negative offsets:

    0xffffffff80000000 -> pmd entry: ffffffffff2001e3
    It points outside of the physical address space and is marked present.

    This starts at the virtual address __START_KERNEL_map and goes up to
    the point where the first valid physical address (0x0) is mapped.

    Zap the mappings before _text and after _end right away in early
    boot. This also removes the invalid entries.

    Furthermore, it simplifies the range check for high aliases.
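
    A sketch of what such an early zap can look like (symbol names as in the
    x86_64 head pagetable code; details may differ from the actual patch):

    void __init cleanup_highmap(void)
    {
            unsigned long vaddr = __START_KERNEL_map;
            unsigned long end = round_up((unsigned long)_end, PMD_SIZE) - 1;
            pmd_t *pmd = level2_kernel_pgt;
            pmd_t *last_pmd = pmd + PTRS_PER_PMD;

            /* Clear every high-alias pmd entry outside [_text, _end]. */
            for (; pmd < last_pmd; pmd++, vaddr += PMD_SIZE) {
                    if (!pmd_present(*pmd))
                            continue;
                    if (vaddr < (unsigned long)_text || vaddr > end)
                            set_pmd(pmd, __pmd(0));
            }
    }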

    Signed-off-by: Thomas Gleixner
    Acked-by: H. Peter Anvin
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

04 Feb, 2008

2 commits


30 Jan, 2008

20 commits


20 Oct, 2007

1 commit

  • Remove asm/bitops.h includes.

    Including asm/bitops.h directly may cause compile errors. Don't include
    it; include linux/bitops.h instead. The next patch will deny including
    the asm header directly.
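
    In other words, the wrapper header is the one to use:

    #include <linux/bitops.h>       /* not <asm/bitops.h> */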

    Cc: Adrian Bunk
    Signed-off-by: Jiri Slaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     

17 Oct, 2007

1 commit

  • x86_64 uses 2M page table entries to map its 1-1 kernel space. We also
    implement the virtual memmap using 2M page table entries, so there is no
    additional runtime overhead over FLATMEM; initialisation is only slightly
    more complex. As FLATMEM still references memory to obtain the mem_map
    pointer and SPARSEMEM_VMEMMAP uses a compile-time constant,
    SPARSEMEM_VMEMMAP should be superior.

    With this SPARSEMEM becomes the most efficient way of handling virt_to_page,
    pfn_to_page and friends for UP, SMP and NUMA on x86_64.
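
    The payoff is that the conversions become constant-base pointer
    arithmetic; a minimal sketch (the vmemmap base address here is
    illustrative):

    #define VMEMMAP_START   0xffffe20000000000UL    /* illustrative base */
    #define vmemmap         ((struct page *)VMEMMAP_START)

    /* No memory access, no per-node lookup: plain pointer arithmetic. */
    #define __pfn_to_page(pfn)      (vmemmap + (pfn))
    #define __page_to_pfn(page)     ((unsigned long)((page) - vmemmap))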

    [apw@shadowen.org: code resplit, style fixups]
    [apw@shadowen.org: vmemmap x86_64: ensure end of section memmap is initialised]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andy Whitcroft
    Acked-by: Mel Gorman
    Cc: Andi Kleen
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

11 Oct, 2007

1 commit