16 Jun, 2011

1 commit


23 May, 2011

1 commit

  • The page_clear_dirty primitive always sets the default storage key
    which resets the access control bits and the fetch protection bit.
    That will surprise a KVM guest that sets non-zero access control
    bits or the fetch protection bit. Merge page_test_dirty and
    page_clear_dirty back to a single function and only clear the
    dirty bit from the storage key.

    In addition, move the functions page_test_and_clear_dirty and
    page_test_and_clear_young to page.h, where they belong. This
    requires changing the parameter from a struct page * to a page
    frame number.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
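To make the difference concrete, here is a small user-space model of a one-byte s390 storage key. It is a sketch under assumptions. The bit values are assumed to mirror the s390 `_PAGE_*` key definitions: access-control 0xf0, fetch protection 0x08, reference 0x04, change 0x02. `KEY_DEFAULT` and both function names are hypothetical. The real code reads and writes keys with instructions such as iske, not through a pointer.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed storage-key layout (modelled on the s390 _PAGE_* key bits). */
#define KEY_ACC_BITS   0xf0u  /* access-control bits */
#define KEY_FP_BIT     0x08u  /* fetch-protection bit */
#define KEY_REFERENCED 0x04u  /* reference bit */
#define KEY_CHANGED    0x02u  /* change ("dirty") bit */
#define KEY_DEFAULT    0x00u  /* hypothetical default key */

/* Old behaviour (hypothetical model): overwrite the whole key with the
 * default, silently wiping a guest's access-control and fetch-protection
 * settings. */
static void old_clear_dirty(uint8_t *key)
{
	*key = KEY_DEFAULT;
}

/* Merged behaviour (hypothetical model): report whether the page was
 * dirty and clear only the change bit, leaving all other bits intact. */
static bool new_test_and_clear_dirty(uint8_t *key)
{
	bool dirty = (*key & KEY_CHANGED) != 0;

	if (dirty)
		*key &= (uint8_t)~KEY_CHANGED;
	return dirty;
}
```

Take a guest-chosen key of 0xfe. The old helper resets it to 0. The merged helper leaves 0xfc behind.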
     

01 Mar, 2011

1 commit

  • Commit e2cda3226481 ("thp: add pmd mangling generic functions") replaced
    some macros in with inline functions.

    If the functions are to be defined (not all architectures need them)
    then struct vm_area_struct must be defined first. So include
    .

    Fixes a build failure seen in Debian:

    CC [M] drivers/media/dvb/mantis/mantis_pci.o
    In file included from arch/arm/include/asm/pgtable.h:460,
    from drivers/media/dvb/mantis/mantis_pci.c:25:
    include/asm-generic/pgtable.h: In function 'ptep_test_and_clear_young':
    include/asm-generic/pgtable.h:29: error: dereferencing pointer to incomplete type

    Signed-off-by: Ben Hutchings
    Signed-off-by: Linus Torvalds

    Ben Hutchings
     

17 Jan, 2011

1 commit

  • pmdp_get_and_clear/pmdp_clear_flush/pmdp_splitting_flush were trapped as
    BUG() and they were defined only to diminish the risk of build issues on
    non-x86 archs and to be consistent with the generic pte methods previously
    defined in include/asm-generic/pgtable.h.

    But they are causing more trouble than they were supposed to solve, so
    it's simpler not to define them when THP is off.

    This is also correcting the export of pmdp_splitting_flush which is
    currently unused (x86 isn't using the generic implementation in
    mm/pgtable-generic.c and no other arch needs that [yet]).

    Signed-off-by: Andrea Arcangeli
    Sam Ravnborg
    Cc: Stephen Rothwell
    Cc: "David S. Miller"
    Cc: Benjamin Herrenschmidt
    Cc: "Luck, Tony"
    Cc: James Bottomley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

14 Jan, 2011

2 commits

  • Some are needed to build but not actually used on archs not supporting
    transparent hugepages. Others like pmdp_clear_flush are used by x86 too.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Rik van Riel
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • These return 0 at compile time when the config option is disabled, to
    allow gcc to eliminate the transparent hugepage function calls at compile
    time without additional #ifdefs (only the export of those functions has
    to be visible to gcc, but they won't be required at link time and
    huge_memory.o need not be built at all).

    _PAGE_BIT_UNUSED1 is never used for pmd, only on pte.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Rik van Riel
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
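The compile-time stub described in the second commit above can be sketched in user space. In this sketch:

- `CONFIG_TRANSPARENT_HUGEPAGE` is modelled as an ordinary macro.
- The huge-pmd bit is an assumed value.
- `split_huge_pmd_demo()` and `walk_pmd()` are hypothetical names.

With the option off, `pmd_trans_huge()` returns a constant 0, so an optimizing compiler can drop the call. In the kernel, the out-of-line helper then need not be linked at all. Here it is still defined so the sketch also links at -O0.

```c
#include <stdio.h>

/* #define CONFIG_TRANSPARENT_HUGEPAGE 1 */  /* option disabled here */

typedef struct { unsigned long val; } pmd_t;

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
#define PMD_HUGE_BIT 0x80UL  /* assumed bit, illustration only */
static inline int pmd_trans_huge(pmd_t pmd)
{
	return (pmd.val & PMD_HUGE_BIT) != 0;
}
#else
/* Constant 0: the compiler can prove every caller's branch is dead. */
static inline int pmd_trans_huge(pmd_t pmd)
{
	(void)pmd;
	return 0;
}
#endif

/* Hypothetical out-of-line THP helper (lives in huge_memory.o in spirit). */
int split_huge_pmd_demo(pmd_t *pmd)
{
	printf("splitting huge pmd %#lx\n", pmd->val);
	return 1;
}

/* The caller needs no #ifdef; with the option off this is "return 0". */
int walk_pmd(pmd_t *pmd)
{
	if (pmd_trans_huge(*pmd))
		return split_huge_pmd_demo(pmd);
	return 0;
}
```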
     

25 Oct, 2010

1 commit


24 Aug, 2010

1 commit

  • On x86, the accessed and dirty bits are set automatically by the CPU when it
    accesses memory. By the time we reach flush_tlb_fix_spurious_fault() in this
    code path, we have already set the dirty bit in the pte and don't need to
    flush the TLB. This might mean the TLB entry on some CPUs doesn't have the
    dirty bit set, but this doesn't matter: when those CPUs write to the page,
    they will automatically check the bit, with no software involved.

    On the other hand, flushing the TLB at this point is harmful. The test
    creates one thread per CPU; each thread writes to the same randomly chosen
    address in the same vma range, and we measure the total time. On a 4-socket
    system, the original time is 1.96s, while with the patch it is 0.8s. On a
    2-socket system, there is a 20% time cut too. perf shows that a lot of time
    is spent sending and handling IPIs for the TLB flush.

    Signed-off-by: Shaohua Li
    LKML-Reference:
    Acked-by: Suresh Siddha
    Cc: Andrea Archangeli
    Signed-off-by: H. Peter Anvin

    Shaohua Li
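A minimal sketch of the override pattern this commit relies on, with simplified signatures. The generic header supplies a default `flush_tlb_fix_spurious_fault()` only if the architecture has not defined one. An x86-style architecture defines it as a no-op, because the CPU updates the accessed/dirty bits itself. The real macro takes a vma and an address. `flush_tlb_page_demo()` and the flush counter are hypothetical stand-ins for the actual flush and its IPIs.

```c
/* Counts simulated TLB flushes (on SMP each one may cost IPIs). */
int tlb_flushes;

/* Hypothetical stand-in for flush_tlb_page(). */
void flush_tlb_page_demo(unsigned long address)
{
	(void)address;
	tlb_flushes++;
}

/* "Arch" header: x86-style override, the CPU sets A/D bits itself.
 * Delete this line to fall back to the generic default below. */
#define flush_tlb_fix_spurious_fault(address) do { (void)(address); } while (0)

/* "Generic" header: supply a default only if the arch did not. */
#ifndef flush_tlb_fix_spurious_fault
#define flush_tlb_fix_spurious_fault(address) flush_tlb_page_demo(address)
#endif

/* Simplified fault path: the pte already carries the bits the faulting
 * access needs, so the fault was spurious. */
void handle_spurious_fault(unsigned long address)
{
	flush_tlb_fix_spurious_fault(address);
}
```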
     

23 Jun, 2009

1 commit

  • Most architectures now provide a pgprot_noncached(); the
    remaining ones can simply use a dummy default implementation,
    except for cris and xtensa, which should override the
    default appropriately.

    Signed-off-by: Arnd Bergmann
    Cc: Jesper Nilsson
    Cc: Chris Zankel
    Cc: Magnus Damm

    Paul Mundt
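The dummy default mentioned above follows the usual `#ifndef` override idiom. This sketch assumes a simplified one-field `pgprot_t`. It also uses a hypothetical `DEMO_PAGE_NOCACHE` bit for the architecture-specific case.

```c
typedef struct { unsigned long pgprot; } pgprot_t;

#define DEMO_PAGE_NOCACHE 0x10UL  /* hypothetical "uncached" attribute bit */

/* An architecture with a real uncached attribute would define, before the
 * generic header is included, something like:
 *   #define pgprot_noncached(prot) \
 *           ((pgprot_t){ (prot).pgprot | DEMO_PAGE_NOCACHE })
 */

/* Generic fallback: a dummy that returns the protection unchanged. */
#ifndef pgprot_noncached
#define pgprot_noncached(prot) (prot)
#endif
```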
     

30 Mar, 2009

2 commits


14 Jan, 2009

1 commit


20 Dec, 2008

1 commit


19 Dec, 2008

1 commit


16 Jul, 2008

1 commit

  • Commit 1ea0704e0d aka "mm: add a ptep_modify_prot transaction abstraction"

    caused:

    | CC init/main.o
    |In file included from include2/asm/pgtable.h:68,
    | from /home/bigeasy/git/linux-2.6-m68k/include/linux/mm.h:39,
    | from include2/asm/uaccess.h:8,
    | from /home/bigeasy/git/linux-2.6-m68k/include/linux/poll.h:13,
    | from /home/bigeasy/git/linux-2.6-m68k/include/linux/rtc.h:113,
    | from /home/bigeasy/git/linux-2.6-m68k/include/linux/efi.h:19,
    | from /home/bigeasy/git/linux-2.6-m68k/init/main.c:43:
    |/linux-2.6/include/asm-generic/pgtable.h: In function '__ptep_modify_prot_start':
    |/linux-2.6/include/asm-generic/pgtable.h:209: error: implicit declaration of function 'ptep_get_and_clear'
    |/linux-2.6/include/asm-generic/pgtable.h:209: error: incompatible types in return
    |/linux-2.6/include/asm-generic/pgtable.h: In function '__ptep_modify_prot_commit':
    |/linux-2.6/include/asm-generic/pgtable.h:220: error: implicit declaration of function 'set_pte_at'
    |make[2]: *** [init/main.o] Error 1
    |make[1]: *** [init] Error 2
    |make: *** [sub-make] Error 2

    on my m68knommu box.

    Acked-by: Jeremy Fitzhardinge
    Cc: Linus Torvalds
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Signed-off-by: Sebastian Siewior
    Signed-off-by: Linus Torvalds

    Sebastian Siewior
     

25 Jun, 2008

1 commit

  • This patch adds an API for doing read-modify-write updates to a pte's
    protection bits which may race against hardware updates to the pte.
    After reading the pte, the hardware may asynchronously set the accessed
    or dirty bits on a pte, which would be lost when writing back the
    modified pte value.

    The existing technique to handle this race is to use
    ptep_get_and_clear() to atomically fetch the old pte value and clear it
    in memory. This has the effect of marking the pte as non-present,
    which will prevent the hardware from updating its state. When the new
    value is written back, the pte will be present again, and the hardware
    can resume updating the access/dirty flags.

    When running in a virtualized environment, pagetable updates are
    relatively expensive, since they generally involve some trap into the
    hypervisor. To mitigate the cost of these updates, we tend to batch
    them.

    However, because of the atomic nature of ptep_get_and_clear(), it is
    inherently non-batchable. This new interface allows batching by
    giving the underlying implementation enough information to open a
    transaction between the read and write phases:

    ptep_modify_prot_start() returns the current pte value, and puts the
    pte entry into a state where either the hardware will not update the
    pte, or if it does, the updates will be preserved on commit.

    ptep_modify_prot_commit() writes back the updated pte, makes sure that
    any hardware updates made since ptep_modify_prot_start() are
    preserved.

    ptep_modify_prot_start() and _commit() must be exactly paired, and
    used while holding the appropriate pte lock. They do not protect
    against other software updates of the pte in any way.

    The current implementations of ptep_modify_prot_start and _commit are
    functionally unchanged from before: _start() uses ptep_get_and_clear() to
    fetch the pte and zero the entry, preventing any hardware updates.
    _commit() simply writes the new pte value back knowing that the
    hardware has not updated the pte in the meantime.

    The only current user of this interface is mprotect.

    Signed-off-by: Jeremy Fitzhardinge
    Acked-by: Linus Torvalds
    Acked-by: Hugh Dickins
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
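Under the default implementation described above, the start/commit pair can be modelled in user space with C11 atomics. `_start()` atomically swaps the pte with 0, making it non-present so no hardware update can be lost. `_commit()` then stores the new value. This is a sketch only: the bit layout and the `demo_` names are assumptions. The kernel functions also take the mm and address, and must run under the pte lock.

```c
#include <stdatomic.h>
#include <stdint.h>

typedef uint64_t pte_t;

#define PTE_PRESENT  0x001u  /* assumed bit layout */
#define PTE_RW       0x002u
#define PTE_ACCESSED 0x020u  /* set asynchronously by "hardware" */
#define PTE_DIRTY    0x040u  /* set asynchronously by "hardware" */

/* Read the pte and clear it in one atomic step; a non-present entry
 * stops the hardware from setting accessed/dirty in the meantime. */
static pte_t demo_ptep_modify_prot_start(_Atomic pte_t *ptep)
{
	return atomic_exchange(ptep, (pte_t)0);
}

/* Write the modified pte back; nothing was lost, because the entry has
 * been non-present since _start(). */
static void demo_ptep_modify_prot_commit(_Atomic pte_t *ptep, pte_t pte)
{
	atomic_store(ptep, pte);
}

/* mprotect-style caller: drop write permission, keep everything else.
 * The caller is assumed to hold the pte lock around the pair. */
static void demo_change_pte_to_readonly(_Atomic pte_t *ptep)
{
	pte_t pte = demo_ptep_modify_prot_start(ptep);

	pte &= ~(pte_t)PTE_RW;
	demo_ptep_modify_prot_commit(ptep, pte);
}
```

A paravirtualized backend could instead queue both halves in one batch. The exact pairing and the pte lock are what make that safe.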
     

17 Oct, 2007

1 commit

  • The current ia64 kernel flushes the icache via lazy_mmu_prot_update()
    *after* set_pte(), which is too late. This patch removes
    lazy_mmu_prot_update and adds a modified set_pte() that flushes if necessary.

    This patch flushes the icache of a page when
        new pte has exec bit
     && new pte has present bit
     && new pte is a user page
     && (old *ptep is not present
         || new pte's pfn is not the same as old *ptep's pfn)
     && new pte's page has no PG_arch_1 bit.
    PG_arch_1 is set when a page is cache consistent.

    I think these condition checks are much easier to understand than
    considering "Where should sync_icache_dcache() be inserted?"

    pte_user() for ia64 was removed as a clean-up by
    http://lkml.org/lkml/2007/6/12/67, so I added it again.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: "Luck, Tony"
    Cc: Christoph Lameter
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Acked-by: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
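The flush condition from this commit can be written as a single predicate. This is a user-space model, not the ia64 code. `struct demo_pte` is a hypothetical simplification of what `pte_exec()`, `pte_present()`, `pte_user()` and `pte_pfn()` would report. The `page_cache_consistent` flag stands in for PG_arch_1.

```c
#include <stdbool.h>

/* Hypothetical, simplified pte model (not the ia64 bit layout). */
struct demo_pte {
	bool exec;
	bool present;
	bool user;
	unsigned long pfn;
};

/* True when the icache must be flushed before installing new_pte over
 * old_pte.  page_cache_consistent models PG_arch_1 on new_pte's page. */
static bool need_icache_flush(struct demo_pte old_pte,
			      struct demo_pte new_pte,
			      bool page_cache_consistent)
{
	return new_pte.exec &&
	       new_pte.present &&
	       new_pte.user &&
	       (!old_pte.present || new_pte.pfn != old_pte.pfn) &&
	       !page_cache_consistent;
}
```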
     

12 Aug, 2007

1 commit

  • There are some parts of include/asm-generic/pgtable.h that are relevant to
    the non-mmu architectures. To make it easier to include this from them I
    would like to ifdef the relevant parts.

    Without this there is a handful of functions that are referenced in here
    that are not defined on many non-mmu architectures. They could be defined
    out of course, as an alternative approach.

    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Ungerer
     

18 Jul, 2007

2 commits


17 Jun, 2007

1 commit

  • Some changes done a while ago to avoid pounding on ptep_set_access_flags and
    update_mmu_cache in some race situations break sun4c which requires
    update_mmu_cache() to always be called on minor faults.

    This patch reworks ptep_set_access_flags() semantics, implementations and
    callers so that it's now responsible for returning whether an update is
    necessary or not (basically whether the PTE actually changed). This allows
    fixing the sparc implementation to always return 1 on sun4c.

    [akpm@linux-foundation.org: fixes, cleanups]
    Signed-off-by: Benjamin Herrenschmidt
    Cc: Hugh Dickins
    Cc: David Miller
    Cc: Mark Fortescue
    Acked-by: William Lee Irwin III
    Cc: "Luck, Tony"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
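A sketch of the reworked contract, with hypothetical `demo_` names and a simplified signature. The real function also takes the vma, the address and a dirty flag, and flushes the TLB when it writes. The helper reports whether the PTE changed, and the minor-fault path calls `update_mmu_cache()` only in that case. A sun4c-like architecture can force the answer to 1.

```c
#include <stdbool.h>

typedef unsigned long pte_t;

static int mmu_cache_updates;  /* counts simulated update_mmu_cache() calls */

static void demo_update_mmu_cache(pte_t *ptep)
{
	(void)ptep;
	mmu_cache_updates++;
}

/* Write the entry only if it differs, and tell the caller whether an
 * update is needed.  force_update models an arch such as sun4c that
 * needs update_mmu_cache() on every minor fault. */
static int demo_ptep_set_access_flags(pte_t *ptep, pte_t entry,
				      bool force_update)
{
	int changed = (*ptep != entry);

	if (changed)
		*ptep = entry;  /* a real kernel would also flush the TLB */
	return changed || force_update;
}

/* Minor-fault path: only pay for update_mmu_cache() when needed. */
static void demo_handle_minor_fault(pte_t *ptep, pte_t entry, bool force)
{
	if (demo_ptep_set_access_flags(ptep, entry, force))
		demo_update_mmu_cache(ptep);
}
```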
     

27 Apr, 2007

1 commit

  • The page_test_and_clear_dirty primitive really consists of two
    operations, page_test_dirty and page_clear_dirty. The combination
    of the two is not an atomic operation, so it makes more sense to have
    two separate operations instead of one.
    In addition to improving the readability of the s390 version of
    SetPageUptodate, this now avoids the page_test_dirty operation, an
    insert-storage-key-extended (iske) instruction, which is expensive.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     

09 Apr, 2007

1 commit

  • Since lazy MMU batching mode still allows interrupts to enter, it is
    possible for interrupt handlers to try to use kmap_atomic, which fails when
    lazy mode is active, since the PTE update to highmem will be delayed. The
    best workaround is to issue an explicit flush in the kmap_atomic functions;
    this is the only way nested PTE updates can happen in the interrupt
    handler.

    Thanks to Jeremy Fitzhardinge for noting the bug and suggestions on a fix.

    This patch will be reverted when we start 2.6.22, and the bug will be
    fixed differently.

    Signed-off-by: Zachary Amsden
    Cc: Andi Kleen
    Cc: Jeremy Fitzhardinge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
     

13 Feb, 2007

1 commit

  • The VMI ROM has a mode where hypercalls can be queued and batched. This turns
    out to be a significant win during context switch, but must be done at a
    specific point before side effects to CPU state are visible to subsequent
    instructions. This is similar to the MMU batching hooks already provided.
    The same hooks could be used by the Xen backend to implement a context switch
    multicall.

    To explain a bit more about lazy modes in the paravirt patches: the
    idea is that only one of lazy CPU or MMU mode can be active at any given time.
    Lazy MMU mode is similar to this lazy CPU mode, and allows for batching of
    multiple PTE updates (say, inside a remap loop), but to avoid keeping some
    kind of state machine about when to flush cpu or mmu updates, we just allow
    one or the other to be active. Although there is no real reason a more
    comprehensive scheme could not be implemented, there is also no demonstrated
    need for this extra complexity.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Andi Kleen
    Cc: Andi Kleen
    Cc: Jeremy Fitzhardinge
    Cc: Rusty Russell
    Cc: Chris Wright
    Signed-off-by: Andrew Morton

    Zachary Amsden
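The "one lazy mode at a time" rule can be modelled as a single state variable. This is a hypothetical sketch, not the paravirt code. Entering a mode fails while another is active. Operations of the active kind are queued, and leaving the mode flushes the queue as one batch.

```c
#include <stdbool.h>

enum lazy_mode { LAZY_NONE, LAZY_CPU, LAZY_MMU };

static enum lazy_mode cur_mode = LAZY_NONE;
static int queued_ops;     /* operations deferred while lazy */
static int batch_flushes;  /* simulated hypercalls issued */

/* Enter a lazy mode; only allowed when no lazy mode is active. */
static bool enter_lazy(enum lazy_mode mode)
{
	if (cur_mode != LAZY_NONE)
		return false;
	cur_mode = mode;
	return true;
}

/* Queue the operation if its lazy mode is active, otherwise issue it
 * immediately (modelled as a one-op hypercall). */
static void do_op(enum lazy_mode kind)
{
	if (cur_mode == kind)
		queued_ops++;
	else
		batch_flushes++;
}

/* Leaving the mode flushes everything queued as a single batch. */
static void leave_lazy(void)
{
	if (queued_ops) {
		batch_flushes++;
		queued_ops = 0;
	}
	cur_mode = LAZY_NONE;
}
```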
     

01 Oct, 2006

3 commits

  • Now that ptep_establish has a definition in PAE i386 3-level paging code, the
    only paging model which is insane enough to have multi-word hardware PTEs
    which are not efficient to set atomically, we can remove the ghost of
    set_pte_atomic from other architectures which falsely duplicated it, and
    remove all knowledge of it from the generic pgtable code.

    set_pte_atomic is now a private pte operator which is specific to i386.

    Signed-off-by: Zachary Amsden
    Cc: Rusty Russell
    Cc: Jeremy Fitzhardinge
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
     
  • Implement lazy MMU update hooks which are SMP safe for both direct and shadow
    page tables. The idea is that PTE updates and page invalidations while in
    lazy mode can be batched into a single hypercall. We use this in VMI for
    shadow page table synchronization, and it is a win. It also can be used by
    PPC and for direct page tables on Xen.

    For SMP, the enter / leave must happen under protection of the page table
    locks for page tables which are being modified. This is because otherwise,
    you end up with stale state in the batched hypercall, which other CPUs can
    race ahead of. Doing this under the protection of the locks guarantees the
    synchronization is correct, and also means that spurious faults which are
    generated during this window by remote CPUs are properly handled, as the page
    fault handler must re-check the PTE under protection of the same lock.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Jeremy Fitzhardinge
    Cc: Rusty Russell
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
     
  • Change pte_clear_full to a more appropriately named pte_clear_not_present,
    allowing optimizations when not-present mapping changes need not be reflected
    in the hardware TLB for protected page table modes. There is also another
    case that can use it in the fremap code.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Jeremy Fitzhardinge
    Cc: Rusty Russell
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
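The locking rule in the second commit above (the lazy MMU hooks) can be sketched with a C11 `atomic_flag` spinlock standing in for the page-table lock. The generic header's hooks are `arch_enter_lazy_mmu_mode()` and `arch_leave_lazy_mmu_mode()`. The `demo_` functions, the fixed-size queue and the hypercall counter below are hypothetical. Updates are only queued between enter and leave, and leave runs before the lock is dropped. The whole batch is therefore applied before another CPU can take the lock and re-check the ptes.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_BATCH 16

typedef unsigned long pte_t;

struct pending_update {
	pte_t *ptep;
	pte_t val;
};

static atomic_flag pte_lock = ATOMIC_FLAG_INIT; /* stands in for the pte lock */
static bool lazy_mmu;                           /* between enter and leave? */
static struct pending_update batch[MAX_BATCH];
static int nr_pending;
static int hypercalls;                          /* simulated traps */

/* Apply every queued update with a single simulated hypercall. */
static void demo_flush_lazy_mmu(void)
{
	for (int i = 0; i < nr_pending; i++)
		*batch[i].ptep = batch[i].val;
	if (nr_pending)
		hypercalls++;
	nr_pending = 0;
}

static void demo_set_pte(pte_t *ptep, pte_t val)
{
	if (lazy_mmu) {
		if (nr_pending == MAX_BATCH)
			demo_flush_lazy_mmu();  /* full: flush, keeping order */
		batch[nr_pending++] = (struct pending_update){ ptep, val };
		return;
	}
	*ptep = val;
	hypercalls++;  /* unbatched: one trap per update */
}

static void demo_enter_lazy_mmu(void) { lazy_mmu = true; }

static void demo_leave_lazy_mmu(void)
{
	demo_flush_lazy_mmu();
	lazy_mmu = false;
}

/* enter/leave are nested inside the locked region. */
static void demo_update_range(pte_t *ptes, int n, pte_t val)
{
	while (atomic_flag_test_and_set(&pte_lock))
		;  /* spin */
	demo_enter_lazy_mmu();
	for (int i = 0; i < n; i++)
		demo_set_pte(&ptes[i], val);
	demo_leave_lazy_mmu();
	atomic_flag_clear(&pte_lock);
}
```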
     

26 Sep, 2006

1 commit

  • Parsing generic pgtable.h in assembler is simply crazy. None of this file is
    needed in assembler code, and C inline functions and structures routinely
    break one or more different compiles.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     

02 Jun, 2006

1 commit

  • If we move a mapping from one virtual address to another,
    and this changes the virtual color of the mapping to those
    pages, we can see corrupt data due to D-cache aliasing.

    Check for and deal with this by overriding the move_pte()
    macro. Set things up so that other platforms can cleanly
    override the move_pte() macro too.

    Signed-off-by: David S. Miller

    David S. Miller
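A sketch of such an override for a virtually indexed D-cache. The architecture defines `__HAVE_ARCH_MOVE_PTE` and its own `move_pte()`, which flushes when the old and new addresses differ in the color bit. The generic fallback just returns the pte. The color bit (bit 13, as for an 8 KB page size with a 16 KB direct-mapped D-cache) is an assumption. `demo_flush_dcache_page()` and the counter are also assumptions.

```c
typedef unsigned long pte_t;

#define DEMO_PAGE_SHIFT   13                        /* assumed 8 KB pages */
#define DEMO_DCACHE_COLOR (1UL << DEMO_PAGE_SHIFT)  /* assumed color bit */

static int dcache_flushes;

/* Hypothetical stand-in for flushing the page's D-cache lines. */
static void demo_flush_dcache_page(pte_t pte)
{
	(void)pte;
	dcache_flushes++;
}

/* Arch override: flush if the mapping changes D-cache color. */
#define __HAVE_ARCH_MOVE_PTE
static inline pte_t demo_arch_move_pte(pte_t pte, unsigned long old_addr,
				       unsigned long new_addr)
{
	if ((old_addr ^ new_addr) & DEMO_DCACHE_COLOR)
		demo_flush_dcache_page(pte);
	return pte;
}
#define move_pte(pte, prot, old_addr, new_addr) \
	demo_arch_move_pte((pte), (old_addr), (new_addr))

/* Generic fallback, used only when no arch override exists. */
#ifndef __HAVE_ARCH_MOVE_PTE
#define move_pte(pte, prot, old_addr, new_addr) (pte)
#endif
```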
     

07 Nov, 2005

1 commit

  • Fix more include file problems that surfaced since I submitted the previous
    fix-missing-includes.patch. This should now make it possible to stop
    including sched.h from module.h, which is done by a follow-up patch.

    Signed-off-by: Tim Schmielau
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tim Schmielau
     

30 Oct, 2005

1 commit


28 Sep, 2005

1 commit

  • Move the ZERO_PAGE remapping complexity to the move_pte macro in
    asm-generic, have it conditionally depend on
    __HAVE_ARCH_MULTIPLE_ZERO_PAGE, which gets defined for MIPS.

    For architectures without __HAVE_ARCH_MULTIPLE_ZERO_PAGE, move_pte becomes
    a noop.

    From: Hugh Dickins

    Fix nasty little bug we've missed in Nick's mremap move ZERO_PAGE patch.
    The "pte" at that point may be a swap entry or a pte_file entry: we must
    check pte_present before perhaps corrupting such an entry.

    Patch below against 2.6.14-rc2-mm1, but the same bug is in 2.6.14-rc2's
    mm/mremap.c, and more dangerous there since it's affecting all arches: I
    think the safest course is to send Nick's patch and Yoichi's build fix and
    this fix (build tested) on to Linus - so only MIPS can be affected.

    Signed-off-by: Nick Piggin
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
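A sketch of the MIPS-style case, including Hugh Dickins' fix. When the architecture has several zero pages (one per cache color), `move_pte()` repoints a zero-page mapping at the zero page matching the new address. It does so only after checking that the entry is present, since a swap or file entry must not be reinterpreted as a pfn. The pte encoding, the number of zero pages and all helper names are hypothetical.

```c
#include <stdbool.h>

typedef unsigned long pte_t;

#define DEMO_PAGE_SHIFT  12
#define DEMO_PTE_PRESENT 0x1UL    /* assumed present bit */
#define NR_ZERO_PAGES    4        /* assumed number of cache colors */
#define ZERO_PFN_BASE    0x100UL  /* assumed pfn of zero page 0 */

static unsigned long demo_pte_pfn(pte_t pte) { return pte >> DEMO_PAGE_SHIFT; }
static bool demo_pte_present(pte_t pte) { return pte & DEMO_PTE_PRESENT; }

/* Zero page whose color matches the given virtual address. */
static unsigned long zero_pfn_for(unsigned long addr)
{
	return ZERO_PFN_BASE + ((addr >> DEMO_PAGE_SHIFT) % NR_ZERO_PAGES);
}

static bool is_zero_pfn(unsigned long pfn)
{
	return pfn >= ZERO_PFN_BASE && pfn < ZERO_PFN_BASE + NR_ZERO_PAGES;
}

static pte_t demo_move_pte(pte_t pte, unsigned long new_addr)
{
	/* Swap/file entries are not present: never reinterpret them. */
	if (demo_pte_present(pte) && is_zero_pfn(demo_pte_pfn(pte))) {
		pte_t flags = pte & ((1UL << DEMO_PAGE_SHIFT) - 1);

		return (zero_pfn_for(new_addr) << DEMO_PAGE_SHIFT) | flags;
	}
	return pte;
}
```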
     

05 Sep, 2005

1 commit

  • Add a new accessor for PTEs, which passes the full hint from the mmu_gather
    struct; this allows architectures with hardware pagetables to optimize away
    atomic PTE operations when destroying an address space. Removing the
    locked operation should allow better pipelining of memory access in this
    loop. I measured an average savings of 30-35 cycles per zap_pte_range on
    the first 500 destructions on Pentium-M, but I believe the optimization
    would win more on older processors which still assert the bus lock on xchg
    for an exclusive cacheline.

    Update: I made some new measurements, and this saves exactly 26 cycles over
    ptep_get_and_clear on Pentium M. On P4, with a PAE kernel, this saves 180
    cycles per ptep_get_and_clear, for a whopping 92160 cycles savings for a
    full address space destruction.

    pte_clear_full is not yet used, but is provided for future optimizations
    (in particular, when running inside of a hypervisor that queues page table
    updates, the full hint allows us to avoid queueing unnecessary page table
    updates for an address space in the process of being destroyed).

    This is not a huge win, but it does help a bit, and sets the stage for
    further hypervisor optimization of the mm layer on all architectures.

    Signed-off-by: Zachary Amsden
    Cc: Christoph Lameter
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
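The point of the `full` hint can be sketched with C11 atomics. When the whole address space is being torn down, no other CPU can be walking these page tables. A plain load and store can then replace the locked exchange. The function name, the `pte_t` type and the missing mm/address arguments are simplifications. An architecture opts in with its own version, while a generic fallback can simply ignore the hint.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdbool.h>

typedef uint64_t pte_t;

static pte_t demo_ptep_get_and_clear_full(_Atomic pte_t *ptep, bool full)
{
	if (full) {
		/* Address space is being destroyed: nobody else can race,
		 * so skip the bus-locked exchange. */
		pte_t old = atomic_load_explicit(ptep, memory_order_relaxed);

		atomic_store_explicit(ptep, 0, memory_order_relaxed);
		return old;
	}
	/* Normal case: fetch and clear atomically (xchg on x86). */
	return atomic_exchange(ptep, (pte_t)0);
}
```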
     

22 Jun, 2005

1 commit

  • It's common practice to msync a large address range regularly, in which
    often only a few ptes have actually been dirtied since the previous pass.

    sync_pte_range then goes much faster if it tests whether pte is dirty
    before locating and accessing each struct page cacheline; and it is hardly
    slowed by ptep_clear_flush_dirty repeating that test in the opposite case,
    when every pte actually is dirty.

    But beware, s390's pte_dirty always says false, since its dirty bit is kept
    in the storage key, located via the struct page address. So skip this
    optimization in its case: use a pte_maybe_dirty macro which just says true
    if page_test_and_clear_dirty is implemented.

    Signed-off-by: Abhijit Karmarkar
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Abhijit Karmarkar
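The macro selection can be sketched as follows. The `__HAVE_ARCH_PAGE_TEST_AND_CLEAR_DIRTY` guard name is modelled on the generic header; treat the exact spelling as an assumption. The dirty bit and the `count_pages_to_sync()` loop are hypothetical stand-ins for the sync_pte_range() scan.

```c
typedef unsigned long pte_t;

#define DEMO_PTE_DIRTY 0x40UL  /* assumed dirty bit */
#define pte_dirty(pte) (((pte) & DEMO_PTE_DIRTY) != 0)

/* An s390-like arch keeps dirty state in the storage key, so it would
 * define this flag (and its pte_dirty() would always be false). */
/* #define __HAVE_ARCH_PAGE_TEST_AND_CLEAR_DIRTY */

#ifndef __HAVE_ARCH_PAGE_TEST_AND_CLEAR_DIRTY
#define pte_maybe_dirty(pte) pte_dirty(pte)
#else
#define pte_maybe_dirty(pte) (1)  /* can't tell from the pte: always look */
#endif

/* msync-style pass: only touch struct page for ptes that may be dirty. */
static int count_pages_to_sync(const pte_t *ptes, int n)
{
	int visited = 0;

	for (int i = 0; i < n; i++) {
		if (!pte_maybe_dirty(ptes[i]))
			continue;  /* skip the struct page cacheline entirely */
		visited++;         /* real code would test and clear dirty here */
	}
	return visited;
}
```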
     

20 Apr, 2005

1 commit

  • ia64 and sparc64 hurriedly had to introduce their own variants of
    pgd_addr_end, to leapfrog over the holes in their virtual address spaces which
    the final clear_page_range suddenly presented when converted from pgd_index to
    pgd_addr_end. But now that free_pgtables respects the vma list, those holes
    are never presented, and the arch variants can go.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
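For reference when reading the commit above: a page-table walk is clamped to the next PGD boundary or to `end`, whichever comes first. Comparing `boundary - 1` with `end - 1` keeps the result correct even when the boundary wraps to 0 at the top of the address space. The sketch below restates this as an inline function in the style of the generic `pgd_addr_end()` macro. It assumes a 4 MB `PGDIR_SIZE` and uses a hypothetical walk loop.

```c
#define DEMO_PGDIR_SHIFT 22                          /* assumed: 4 MB per pgd */
#define DEMO_PGDIR_SIZE  (1UL << DEMO_PGDIR_SHIFT)
#define DEMO_PGDIR_MASK  (~(DEMO_PGDIR_SIZE - 1))

/* Next pgd boundary after addr, clamped to end.  Comparing "x - 1"
 * values keeps this correct if the boundary wraps to 0. */
static inline unsigned long demo_pgd_addr_end(unsigned long addr,
					      unsigned long end)
{
	unsigned long boundary = (addr + DEMO_PGDIR_SIZE) & DEMO_PGDIR_MASK;

	return (boundary - 1 < end - 1) ? boundary : end;
}

/* Typical walk: visit [addr, end) one pgd-sized chunk at a time
 * (assumes addr < end). */
static int count_pgd_chunks(unsigned long addr, unsigned long end)
{
	int chunks = 0;
	unsigned long next;

	do {
		next = demo_pgd_addr_end(addr, end);
		chunks++;  /* real code would descend to the next level here */
		addr = next;
	} while (addr != end);
	return chunks;
}
```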
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds