12 Oct, 2007

1 commit


18 Jul, 2007

2 commits


17 Jul, 2007

1 commit

  • Kill pte_rdprotect(), pte_exprotect(), pte_mkread(), pte_mkexec(), pte_read(),
    pte_exec(), and pte_user() except where arch-specific code is making use of
    them.

    Signed-off-by: Jan Beulich
    Cc: Andi Kleen
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     

17 Jun, 2007

1 commit

  • Some changes done a while ago to avoid pounding on ptep_set_access_flags and
    update_mmu_cache in some race situations break sun4c which requires
    update_mmu_cache() to always be called on minor faults.

    This patch reworks ptep_set_access_flags() semantics, implementations and
    callers so that it's now responsible for returning whether an update is
    necessary or not (basically whether the PTE actually changed). This allows
    fixing the sparc implementation to always return 1 on sun4c.

    [akpm@linux-foundation.org: fixes, cleanups]
    Signed-off-by: Benjamin Herrenschmidt
    Cc: Hugh Dickins
    Cc: David Miller
    Cc: Mark Fortescue
    Acked-by: William Lee Irwin III
    Cc: "Luck, Tony"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
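
The reworked contract described in the commit above can be illustrated with a small user-space model. This is only a sketch: the `toy_` names, the flag bits, and the `force_update` parameter (standing in for an architecture such as sun4c that always wants the follow-up call) are assumptions made for illustration, not the kernel's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_pte_t;        /* toy PTE: a plain bit field */

#define TOY_PTE_YOUNG 0x1u         /* hypothetical "accessed" bit */
#define TOY_PTE_DIRTY 0x2u         /* hypothetical "dirty" bit */

/* Counts follow-up calls so the effect on callers is observable. */
static int toy_mmu_cache_updates;

static void toy_update_mmu_cache(void)
{
    toy_mmu_cache_updates++;
}

/*
 * Store the new entry only if it differs and report whether an update
 * is needed. force_update models an architecture that must always run
 * the follow-up call, even when the entry did not change.
 */
static bool toy_set_access_flags(toy_pte_t *ptep, toy_pte_t entry,
                                 bool force_update)
{
    assert(ptep != NULL);

    bool changed = (*ptep != entry);
    if (changed)
        *ptep = entry;
    return changed || force_update;
}

/* Caller side: a minor fault marks the entry young (and dirty on write). */
static void toy_minor_fault(toy_pte_t *ptep, bool write, bool force_update)
{
    toy_pte_t entry = *ptep | TOY_PTE_YOUNG | (write ? TOY_PTE_DIRTY : 0u);

    if (toy_set_access_flags(ptep, entry, force_update))
        toy_update_mmu_cache();
}
```

In this model a repeated fault on an unchanged entry skips the follow-up call, unless `force_update` is set, in which case the call happens on every fault.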
     

27 Apr, 2007

1 commit

  • The page_test_and_clear_dirty primitive really consists of two
    operations, page_test_dirty and page_clear_dirty. The combination
    of the two is not an atomic operation, so it makes more sense to have
    two separate operations instead of one.
    Besides improving the readability of the s390 version of
    SetPageUptodate, this avoids the page_test_dirty operation there,
    which uses the expensive insert-storage-key-extended (iske)
    instruction.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
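
The split described in the commit above can be sketched in user space as follows. Everything here is a toy model: the `toy_` helpers, the boolean array standing in for the per-page change bit, and the read counter standing in for the cost of iske are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PAGES 4

static bool toy_key_changed[TOY_NR_PAGES]; /* stand-in for the change bit */
static int toy_key_reads;                  /* counts simulated key reads */

/* Test only: models the expensive read of the storage key. */
static bool toy_page_test_dirty(size_t pfn)
{
    assert(pfn < TOY_NR_PAGES);
    toy_key_reads++;
    return toy_key_changed[pfn];
}

/* Clear only: no read of the key is needed. */
static void toy_page_clear_dirty(size_t pfn)
{
    assert(pfn < TOY_NR_PAGES);
    toy_key_changed[pfn] = false;
}

/* The old combined form: two steps, not atomic as a pair. */
static bool toy_page_test_and_clear_dirty(size_t pfn)
{
    bool dirty = toy_page_test_dirty(pfn);

    if (dirty)
        toy_page_clear_dirty(pfn);
    return dirty;
}

/* A caller that only needs the bit cleared can skip the test entirely. */
static void toy_set_page_uptodate(size_t pfn)
{
    toy_page_clear_dirty(pfn);
}
```

With separate operations, `toy_set_page_uptodate()` never pays for the simulated key read, which is the saving the commit message describes for SetPageUptodate.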
     

06 Feb, 2007

2 commits

  • This provides a noexec protection on s390 hardware. Our hardware does
    not have any bits left in the pte for a hw noexec bit, so this is a
    different approach using shadow page tables and a special addressing
    mode that allows separate address spaces for code and data.

    As a special feature of our "secondary-space" addressing mode, separate
    page tables can be specified for the translation of data addresses
    (storage operands) and instruction addresses. The shadow page table is
    used for the instruction addresses and the standard page table for the
    data addresses.
    The shadow page table is linked to the standard page table by a pointer
    in page->lru.next of the struct page corresponding to the page that
    contains the standard page table (since page->private is not really
    private with the pte_lock and the page table pages are not in the LRU
    list).
    Depending on the software bits of a pte, it is either inserted into
    both page tables or just into the standard (data) page table. Pages of
    a vma that does not have the VM_EXEC bit set get mapped only in the
    data address space. Any attempt to execute code on such a page will cause a
    page translation exception. The standard reaction to this is a SIGSEGV
    with two exceptions: the two system call opcodes 0x0a77 (sys_sigreturn)
    and 0x0aad (sys_rt_sigreturn) are allowed. They are stored by the
    kernel to the signal stack frame. Unfortunately, the signal return
    mechanism cannot be modified to use an SA_RESTORER because the
    exception unwinding code depends on the system call opcode stored
    behind the signal stack frame.

    This feature requires that user space is executed in secondary-space
    mode and the kernel in home-space mode, which means that the addressing
    modes need to be switched and that the noexec protection only works
    for user space.
    After switching the addressing modes, we cannot use the mvcp/mvcs
    instructions anymore to copy between kernel and user space. A new
    mvcos instruction has been added to the z9 EC/BC hardware which allows
    copying between arbitrary address spaces, but on older hardware the
    page tables need to be walked manually.

    Signed-off-by: Gerald Schaefer
    Signed-off-by: Martin Schwidefsky

    Gerald Schaefer
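
The two-table scheme in the commit above can be modelled roughly as shown below. This is a simplified user-space sketch under stated assumptions: the `toy_` names, the flat arrays standing in for page tables, and the use of 0 as the invalid entry are all hypothetical. Only the two opcodes 0x0a77 and 0x0aad come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PTRS    8
#define TOY_INVALID 0u   /* toy marker for "no translation" */

struct toy_mm {
    uint32_t data_table[TOY_PTRS]; /* standard table: data accesses */
    uint32_t exec_table[TOY_PTRS]; /* shadow table: instruction fetches */
};

/* Insert into both tables for executable mappings, else data only. */
static void toy_set_pte(struct toy_mm *mm, size_t idx, uint32_t pte,
                        bool vm_exec)
{
    assert(mm != NULL && idx < TOY_PTRS);
    mm->data_table[idx] = pte;
    mm->exec_table[idx] = vm_exec ? pte : TOY_INVALID;
}

static bool toy_data_access_ok(const struct toy_mm *mm, size_t idx)
{
    assert(mm != NULL && idx < TOY_PTRS);
    return mm->data_table[idx] != TOY_INVALID;
}

static bool toy_insn_fetch_ok(const struct toy_mm *mm, size_t idx)
{
    assert(mm != NULL && idx < TOY_PTRS);
    return mm->exec_table[idx] != TOY_INVALID;
}

/*
 * Reaction to a failed instruction fetch: fatal (SIGSEGV) unless the
 * opcode at the faulting address is one of the two sigreturn system
 * calls named in the commit message.
 */
static bool toy_exec_fault_is_fatal(uint16_t opcode)
{
    return opcode != 0x0a77 && opcode != 0x0aad;
}
```

A page mapped without `vm_exec` stays readable through the data table while any instruction fetch from it fails, which is the behaviour the commit describes for vmas without VM_EXEC.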
     
  • Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     

08 Dec, 2006

1 commit

  • Virtual memmap support for s390. Inspired by the ia64 implementation.

    Unlike ia64 we need a mechanism which allows us to dynamically attach
    shared memory regions.
    These memory regions are accessed via the dcss device driver. dcss
    implements the 'direct_access' operation, which requires struct pages
    for every single shared page.
    Therefore this implementation provides an interface to attach/detach
    shared memory:

    int add_shared_memory(unsigned long start, unsigned long size);
    int remove_shared_memory(unsigned long start, unsigned long size);

    The purpose of the add_shared_memory function is to add the given
    memory range to the 1:1 mapping and to make sure that the
    corresponding range in the vmemmap is backed with physical pages.
    It also initialises the new struct pages.

    remove_shared_memory in turn only invalidates the page table
    entries in the 1:1 mapping. The page tables and the memory used for
    struct pages in the vmemmap are currently not freed. They will be
    reused when the next segment is attached.
    Given that the maximum size of a shared memory region is 2GB and
    that all regions must reside below 2GB, this is not too much of
    a restriction, but there is room for improvement.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
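
The attach/detach asymmetry described in the commit above (attach maps and backs the range, detach only unmaps it) can be sketched with a toy model. The `toy_` functions, the two boolean arrays, the page count, and the plain -1 error return are assumptions for illustration. They mirror the shape of the add_shared_memory/remove_shared_memory prototypes but are not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SIZE 4096UL
#define TOY_NR_PAGES  16UL

static bool toy_mapped[TOY_NR_PAGES];         /* toy 1:1 mapping */
static bool toy_vmemmap_backed[TOY_NR_PAGES]; /* toy struct page backing */

/* Accept only page-aligned ranges that fit in the toy address space. */
static bool toy_range_ok(unsigned long start, unsigned long size)
{
    unsigned long first = start / TOY_PAGE_SIZE;
    unsigned long count = size / TOY_PAGE_SIZE;

    if (start % TOY_PAGE_SIZE || size % TOY_PAGE_SIZE)
        return false;
    return first <= TOY_NR_PAGES && count <= TOY_NR_PAGES - first;
}

/* Attach: map the range and make sure its vmemmap part is backed. */
static int toy_add_shared_memory(unsigned long start, unsigned long size)
{
    if (!toy_range_ok(start, size))
        return -1;
    for (unsigned long i = start / TOY_PAGE_SIZE;
         i < (start + size) / TOY_PAGE_SIZE; i++) {
        assert(i < TOY_NR_PAGES);
        toy_mapped[i] = true;
        toy_vmemmap_backed[i] = true;
    }
    return 0;
}

/* Detach: only invalidate the mapping; the backing is kept for reuse. */
static int toy_remove_shared_memory(unsigned long start, unsigned long size)
{
    if (!toy_range_ok(start, size))
        return -1;
    for (unsigned long i = start / TOY_PAGE_SIZE;
         i < (start + size) / TOY_PAGE_SIZE; i++) {
        assert(i < TOY_NR_PAGES);
        toy_mapped[i] = false;
    }
    return 0;
}
```

After a detach the range is no longer mapped, but its backing stays in place and is simply reused by the next attach of an overlapping range.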
     

04 Dec, 2006

1 commit

  • VMALLOC_END on 31bit should be 0x80000000UL instead of 0x7fffffffL.
    The page mask which is used to make sure memory_end is on a 4MB/2MB
    boundary is wrong and not needed. Therefore remove it.
    Make sure a vmalloc area also exists and works on (future)
    machines with 4TB and more memory.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
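
The arithmetic behind the VMALLOC_END change above can be checked with a short sketch. A 31-bit address space spans 2^31 bytes, so its exclusive end is 0x80000000, while 0x7fffffff is the last valid byte and is not page aligned. The helper names and the 4 KB page size are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SIZE 4096UL

/* True if addr sits exactly on a page boundary. */
static bool toy_is_page_aligned(unsigned long addr)
{
    return (addr & (TOY_PAGE_SIZE - 1)) == 0;
}

/* Number of whole pages in the half-open range [start, end). */
static unsigned long toy_area_pages(unsigned long start, unsigned long end)
{
    assert(start <= end);
    return (end - start) / TOY_PAGE_SIZE;
}
```

Using the inclusive value as an exclusive end loses the last page of the area: `toy_area_pages(0x7ff00000UL, 0x7fffffffUL)` yields one page fewer than the same call with `0x80000000UL`.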
     

19 Oct, 2006

1 commit

  • handle_pte_fault uses pte_present, pte_none and pte_file to find out
    the type of a pte. That is done without holding the page table lock.
    This clashes with the way ptep_clear_flush removes active page
    table entries from the system. First the ipte instruction is used
    to invalidate the pte and remove all TLB entries for the page. The
    ipte sets the hardware invalid bit without changing any other bit.
    After the ipte has finished, the pte is cleared. A concurrent fault can
    observe the previously valid pte with the invalid bit set. With
    the current encoding of the different pte types, an invalidated
    read-only pte can be misinterpreted as a swap pte.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
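
The misinterpretation described in the commit above can be reproduced with a toy encoding. The bit values and `toy_` helpers below are hypothetical and chosen only so that "invalid plus read-only" coincides with the swap marker. They are not claimed to match the real s390 bit layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_INVALID 0x400u  /* hypothetical hardware invalid bit */
#define TOY_PAGE_RO      0x200u  /* hypothetical read-only bit */
#define TOY_TYPE_MASK    (TOY_PAGE_INVALID | TOY_PAGE_RO)
#define TOY_TYPE_SWAP    (TOY_PAGE_INVALID | TOY_PAGE_RO) /* toy swap marker */

static bool toy_pte_present(uint32_t pte)
{
    return !(pte & TOY_PAGE_INVALID);
}

/* Type check as a lockless fault handler might perform it. */
static bool toy_pte_swap(uint32_t pte)
{
    return (pte & TOY_TYPE_MASK) == TOY_TYPE_SWAP;
}

/* Models ipte: set the invalid bit and leave every other bit alone. */
static uint32_t toy_ipte(uint32_t pte)
{
    assert(toy_pte_present(pte));
    return pte | TOY_PAGE_INVALID;
}
```

In this model a valid read-only entry that has been through `toy_ipte()` but not yet cleared satisfies `toy_pte_swap()`, which is the window a concurrent fault can observe.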
     

05 Oct, 2006

1 commit


30 Sep, 2006

1 commit

  • Convert s390 page handling macros to functions. In particular this fixes a
    problem with s390's SetPageUptodate macro, which uses its input parameter
    twice, which in turn can cause subtle bugs.

    [akpm@osdl.org: build fix]
    Cc: Martin Schwidefsky
    Signed-off-by: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
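
The class of bug mentioned in the commit above, a macro that evaluates its argument twice, can be shown with a small example. The `toy_page` structure, the flag value, and both helpers are made up for illustration and do not reproduce the real SetPageUptodate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PG_UPTODATE 0x1u

struct toy_page {
    unsigned int flags;
    bool hw_dirty;      /* stand-in for a hardware dirty indication */
};

/* Macro form: "page" appears twice, so side effects run twice. */
#define TOY_SET_UPTODATE_MACRO(page)              \
    do {                                          \
        (page)->hw_dirty = false;                 \
        (page)->flags |= TOY_PG_UPTODATE;         \
    } while (0)

/* Function form: the argument is evaluated exactly once. */
static inline void toy_set_uptodate(struct toy_page *page)
{
    assert(page != NULL);
    page->hw_dirty = false;
    page->flags |= TOY_PG_UPTODATE;
}
```

Calling `TOY_SET_UPTODATE_MACRO(&pages[i++])` advances `i` twice and applies the two steps to different array elements, whereas `toy_set_uptodate(&pages[i++])` advances it once and updates a single page.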
     

28 Sep, 2006

1 commit

  • Major cleanup of all s390 inline assemblies. They now have a common
    coding style. Quite a few have been shortened, mainly by using register
    asm variables. Use of the EX_TABLE macro helps as well. The atomic ops,
    bit ops and locking inlines now use the Q-constraint if a newer gcc
    is used. That results in slightly better code.

    Thanks to Christian Borntraeger for proofreading the changes.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     

26 Sep, 2006

1 commit

  • One of the changes necessary for shared page tables is to standardize the
    pxx_page macros. pte_page and pmd_page have always returned the struct
    page associated with their entry, while pte_page_kernel and pmd_page_kernel
    have returned the kernel virtual address. pud_page and pgd_page, on the
    other hand, return the kernel virtual address.

    Shared page tables need pud_page and pgd_page to return the actual page
    structures. There are very few actual users of these functions, so it is
    simple to standardize their usage.

    Since this is basic cleanup, I am submitting these changes as a standalone
    patch. Per Hugh Dickins' comments about it, I am also changing the
    pxx_page_kernel macros to pxx_page_vaddr to clarify their meaning.

    Signed-off-by: Dave McCracken
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave McCracken
     

20 Sep, 2006

1 commit


01 Jul, 2006

1 commit

  • NOTE: ZVC are *not* the lightweight event counters. ZVCs are reliable whereas
    event counters do not need to be.

    Zone based VM statistics are necessary to be able to determine what the state
    of memory in one zone is. In a NUMA system this can be helpful for local
    reclaim and other memory optimizations that may be able to shift VM load in
    order to get more balanced memory use.

    It is also useful to know how the computing load affects the memory
    allocations on various zones. This patchset allows the retrieval of that data
    from userspace.

    The patchset introduces a framework for counters that is a cross between the
    existing page_stats --which are simply global counters split per cpu-- and the
    approach of deferred incremental updates implemented for nr_pagecache.

    Small per cpu 8 bit counters are added to struct zone. If the counter exceeds
    certain thresholds then the counters are accumulated in an array of
    atomic_long in the zone and in a global array that sums up all zone values.
    The small 8 bit counters are next to the per cpu page pointers and so they
    will be hot in the cpu cache when pages are allocated and freed.

    Access to VM counter information for a zone and for the whole machine is then
    possible by simply indexing an array (Thanks to Nick Piggin for pointing out
    that approach). Access to the total number of pages of various types no
    longer requires summing up all per cpu counters.

    Benefits of this patchset right now:

    - Ability for UP and SMP configuration to determine how memory
    is balanced between the DMA, NORMAL and HIGHMEM zones.

    - loops over all processors are avoided in writeback and
    reclaim paths. We can avoid caching the writeback information
    because the needed information is directly accessible.

    - Special handling for nr_pagecache removed.

    - zone_reclaim_interval vanishes since VM stats can now determine
    when local reclaim is worthwhile.

    - Fast inline per node page state determination.

    - Accurate counters in /sys/devices/system/node/node*/meminfo. The current
    counters simply count which processor allocated a page somewhere
    and guesstimate based on that. So the counters were not useful to show
    the actual distribution of page use on a specific zone.

    - The swap_prefetch patch requires per node statistics in order to
    figure out when processors of a node can prefetch. This patch provides
    some of the needed numbers.

    - Detailed VM counters available in more /proc and /sys status files.

    References to earlier discussions:
    V1 http://marc.theaimsgroup.com/?l=linux-kernel&m=113511649910826&w=2
    V2 http://marc.theaimsgroup.com/?l=linux-kernel&m=114980851924230&w=2
    V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115014697910351&w=2
    V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767318740&w=2

    Performance tests with AIM7 did not show any regressions. Seems to be a tad
    faster even. Tested on ia64/NUMA. Builds fine on i386, SMP / UP. Includes
    fixes for s390/arm/uml arch code.

    This patch:

    Move counter code from page_alloc.c/page-flags.h to vmstat.c/h.

    Create vmstat.c/vmstat.h by separating the counter code and the proc
    functions.

    Move the vm_stat_text array before zoneinfo_show.

    [akpm@osdl.org: s390 build fix]
    [akpm@osdl.org: HOTPLUG_CPU build fix]
    Signed-off-by: Christoph Lameter
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
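
The counter scheme described in the commit above (small per-cpu deltas folded into shared atomic totals once a threshold is crossed) can be sketched in user space with C11 atomics. The `toy_` names, the CPU count, and the threshold of 32 are assumptions for illustration. The real code tracks many counter types and must deal with preemption, which this model ignores.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_CPUS   2
#define TOY_THRESHOLD 32   /* hypothetical fold threshold */

struct toy_zone {
    atomic_long vm_stat;          /* zone-wide total */
    int8_t diff[TOY_NR_CPUS];     /* small per-cpu pending deltas */
};

static atomic_long toy_global_stat; /* sum over all zones */

static void toy_zone_init(struct toy_zone *z)
{
    assert(z != NULL);
    atomic_init(&z->vm_stat, 0);
    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        z->diff[cpu] = 0;
}

/* Accumulate locally; fold into the shared totals past the threshold. */
static void toy_mod_zone_state(struct toy_zone *z, int cpu, int delta)
{
    assert(z != NULL && cpu >= 0 && cpu < TOY_NR_CPUS);
    assert(delta >= -TOY_THRESHOLD && delta <= TOY_THRESHOLD);

    int pending = z->diff[cpu] + delta;

    if (pending > TOY_THRESHOLD || pending < -TOY_THRESHOLD) {
        atomic_fetch_add(&z->vm_stat, pending);
        atomic_fetch_add(&toy_global_stat, pending);
        pending = 0;
    }
    z->diff[cpu] = (int8_t)pending;
}

/* Reading a total is a single load; no loop over all cpus is needed. */
static long toy_zone_page_state(struct toy_zone *z)
{
    assert(z != NULL);
    return atomic_load(&z->vm_stat);
}
```

The trade-off in this model is that a read can lag behind by at most the threshold per cpu, in exchange for touching the shared totals only occasionally.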
     

09 Nov, 2005

1 commit


07 Nov, 2005

1 commit

  • Fix more include file problems that surfaced since I submitted the previous
    fix-missing-includes.patch. This should now make it possible to stop
    including sched.h from module.h, which is done by a followup patch.

    Signed-off-by: Tim Schmielau
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tim Schmielau
     

20 Apr, 2005

1 commit


17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds