09 Feb, 2008

1 commit

  • Background: I've implemented 1K/2K page tables for s390. These sub-page
    page tables are required to properly support the s390 virtualization
    instruction with KVM. The SIE instruction requires that the page tables
    have 256 page table entries (pte) followed by 256 page status table entries
    (pgste). The pgstes are only required if the process is using the SIE
    instruction. The pgstes are updated by the hardware and by the hypervisor
    for a number of reasons, one of them being dirty and reference bit tracking.
    To avoid wasting memory, the standard pte table allocation should return
    1K/2K (31/64 bit) tables, and 2K/4K tables if the process is using SIE.
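
    A rough sketch of that layout follows (the struct and typedef names are
    illustrative only, not the kernel's; with 8-byte entries on 64 bit the two
    arrays fill a 4K page, with 4-byte entries on 31 bit a 2K half page):

        typedef unsigned long pte_t;    /* stand-in for the real pte type */
        typedef unsigned long pgste_t;  /* stand-in for the real pgste type */

        struct sie_page_table {
                pte_t   pte[256];       /* page table entries, used by DAT */
                pgste_t pgste[256];     /* page status table entries, updated
                                           by hardware and hypervisor, e.g. for
                                           dirty and reference bit tracking */
        };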

    Problem: The page size on s390 is 4K, the page table size is 1K or 2K.
    That means the s390 version of pte_alloc_one cannot return a pointer to
    a struct page. The trouble is that with the CONFIG_HIGHPTE feature on x86,
    pte_alloc_one cannot return a pointer to a pte either, since that would
    require more than 32 bits for the return value of pte_alloc_one (and the
    pte * would not be accessible anyway, since it is not kmapped).

    Solution: The only solution I found to this dilemma is a new typedef: a
    pgtable_t. For s390 pgtable_t will be a (pte *) - to be introduced with a
    later patch. For everybody else it will be a (struct page *). The
    additional problem with the initialization of the ptl lock and the
    NR_PAGETABLE accounting is solved with a constructor pgtable_page_ctor and
    a destructor pgtable_page_dtor. The page table allocation and free
    functions need to call these two whenever a page table page is allocated or
    freed. pmd_populate will get a pgtable_t instead of a struct page pointer.
    To get the pgtable_t back from a pmd entry that has been installed with
    pmd_populate, a new function pmd_pgtable is added. It replaces the
    pmd_page call in free_pte_range and apply_to_pte_range.
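
    In outline, and assuming the default (struct page *) case, the new pieces
    fit together roughly like this (the bodies are placeholders, not the
    actual implementation):

        /* On most architectures the page table handle stays a struct page;
         * s390 will later switch this typedef to (pte_t *). */
        typedef struct page *pgtable_t;

        /* Every page table page passes through the constructor when it is
         * allocated and the destructor when it is freed. */
        static inline void pgtable_page_ctor(struct page *page)
        {
                /* initialize the per-page ptl lock, bump NR_PAGETABLE */
        }

        static inline void pgtable_page_dtor(struct page *page)
        {
                /* tear down the ptl lock, drop NR_PAGETABLE */
        }

        /* pmd_populate takes the opaque handle, pmd_pgtable recovers it
         * from an installed pmd entry. */
        void pmd_populate(struct mm_struct *mm, pmd_t *pmd, pgtable_t pte);
        pgtable_t pmd_pgtable(pmd_t pmd);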

    Signed-off-by: Martin Schwidefsky
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Martin Schwidefsky
     

06 Feb, 2008

1 commit

  • (with Martin Schwidefsky)

    The pgd/pud/pmd/pte page table allocation functions get a mm_struct
    pointer as their first argument. The free functions do not get the
    mm_struct argument. This is 1) asymmetrical, and 2) to do mm-related
    page table allocations the mm argument is needed in the free functions
    as well.
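
    A sketch of the resulting prototypes (exact signatures vary slightly per
    architecture; the "before" forms reflect the pre-patch tree):

        /* before: allocation takes the mm, free does not */
        struct page *pte_alloc_one(struct mm_struct *mm, unsigned long addr);
        void pte_free(struct page *pte);

        /* after: the free functions get the mm as well */
        void pte_free(struct mm_struct *mm, struct page *pte);
        void pmd_free(struct mm_struct *mm, pmd_t *pmd);
        void pud_free(struct mm_struct *mm, pud_t *pud);
        void pgd_free(struct mm_struct *mm, pgd_t *pgd);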

    [kamalesh@linux.vnet.ibm.com: i386 fix]
    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Benjamin Herrenschmidt
    Signed-off-by: Martin Schwidefsky
    Cc:
    Signed-off-by: Kamalesh Babulal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
     

22 Oct, 2007

2 commits

  • Get independent of asm-generic/4level-fixup.h

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • The current tlb flushing code for page table entries violates the
    s390 architecture in a small detail. The relevant section from the
    principles of operation (SA22-7832-02 page 3-47):

    "A valid table entry must not be changed while it is attached
    to any CPU and may be used for translation by that CPU except to
    (1) invalidate the entry by using INVALIDATE PAGE TABLE ENTRY or
    INVALIDATE DAT TABLE ENTRY, (2) alter bits 56-63 of a page-table
    entry, or (3) make a change by means of a COMPARE AND SWAP AND
    PURGE instruction that purges the TLB."

    That means if one thread of a multithreaded application uses a vma
    while another thread does an unmap on it, the page table entries of
    that vma need to be removed with IPTE, IDTE or CSP. In some strange
    and rare situations a cpu could check-stop (die) because an entry has
    been pushed out of the TLB that is still needed to complete a
    (milli-coded) instruction. I've never seen it happen with the current
    code on any of the supported machines, so right now this is a
    theoretical problem. But I want to fix it nevertheless, to avoid
    headaches in the future.

    To get this implemented correctly without changing common code, the
    primitives ptep_get_and_clear, ptep_get_and_clear_full and
    ptep_set_wrprotect need to use the IPTE instruction to invalidate the
    pte before the new pte value gets stored. If IPTE is always used for
    the three primitives, three important operations will take a performance
    hit: fork, mprotect and exit_mmap. Time for some workarounds:

    * 1: ptep_get_and_clear_full is used in unmap_vmas to remove page
    table entries in a batched tlb gather operation. If the mmu_gather
    context passed to unmap_vmas has been started with full_mm_flush==1,
    or if only one cpu is online, or if the only user of the mm_struct is
    the current process, then the fullmm indication in the mmu_gather
    context is set to one. All TLBs for the mm_struct are flushed by the
    tlb_gather_mmu call. No new TLBs can be created while the unmap is in
    progress. In this case ptep_get_and_clear_full clears the ptes with a
    simple store.

    * 2: ptep_get_and_clear is used in change_protection to clear the
    ptes from the page tables before they are re-entered with the new
    access flags. At the end of the update, flush_tlb_range clears the
    remaining TLBs. In general ptep_get_and_clear has to issue IPTE
    for each pte and flush_tlb_range is a nop. But if there is only one
    user of the mm_struct, then ptep_get_and_clear uses simple stores
    to do the update and flush_tlb_range will flush the TLBs.

    * 3: Similar to 2, ptep_set_wrprotect is used in copy_page_range
    for a fork to make all ptes of a cow mapping read-only. At the end
    of copy_page_range, dup_mmap will flush the TLBs with a call to
    flush_tlb_mm. Check mm->mm_users, and if there is only one user,
    avoid using IPTE in ptep_set_wrprotect and let flush_tlb_mm clear
    the TLBs; a sketch of this common single-user check follows below.
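
    All three workarounds hinge on the same test. A condensed sketch of the
    idea, with illustrative helper names (the real s390 primitives and the
    IPTE wrapper differ in detail):

        /* the mm has a single user, so no other CPU can be using its
         * page tables for translation */
        static inline int mm_has_single_user(struct mm_struct *mm)
        {
                return atomic_read(&mm->mm_users) == 1;
        }

        static inline pte_t sketch_ptep_get_and_clear(struct mm_struct *mm,
                                                      unsigned long addr,
                                                      pte_t *ptep)
        {
                pte_t pte = *ptep;

                if (mm_has_single_user(mm))
                        pte_clear(mm, addr, ptep);   /* simple store; a later
                                                        flush_tlb_range or
                                                        flush_tlb_mm cleans up */
                else
                        ipte_invalidate(addr, ptep); /* stand-in for IPTE:
                                                        invalidate the pte and
                                                        purge the TLBs */
                return pte;
        }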

    Overall, for single-threaded programs the tlb flush code now performs
    better; for multi-threaded programs it is slightly worse. In particular
    exit_mmap() now does a single IDTE for the mm and then just frees every
    page cache reference and every page table page directly, without a delay
    over the mmu_gather structure.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds