15 Jan, 2011

6 commits

  • * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
    PCI/PM: Report wakeup events before resuming devices
    PCI/PM: Use pm_wakeup_event() directly for reporting wakeup events
    PCI: sysfs: Update ROM to include default owner write access
    x86/PCI: make Broadcom CNB20LE driver EMBEDDED and EXPERIMENTAL
    x86/PCI: don't use native Broadcom CNB20LE driver when ACPI is available
    PCI/ACPI: Request _OSC control once for each root bridge (v3)
    PCI: enable pci=bfsort by default on future Dell systems
    PCI/PCIe: Clear Root PME Status bits early during system resume
    PCI: pci-stub: ignore zero-length id parameters
    x86/PCI: irq and pci_ids patch for Intel Patsburg
    PCI: Skip id checking if no id is passed
    PCI: fix __pci_device_probe kernel-doc warning
    PCI: make pci_restore_state return void
    PCI: Disable ASPM if BIOS asks us to
    PCI: Add mask bit definition for MSI-X table
    PCI: MSI: Move MSI-X entry definition to pci_regs.h

    Fix up trivial conflicts in drivers/net/{skge.c,sky2.c} that had in the
    meantime been converted to not use legacy PCI power management, and thus
    no longer use pci_restore_state() at all (and that caused trivial
    conflicts with the "make pci_restore_state return void" patch)

    Linus Torvalds
     
  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6: (59 commits)
    mfd: ab8500-core chip version cut 2.0 support
    mfd: Flag WM831x /IRQ as a wake source
    mfd: Convert WM831x away from legacy I2C PM operations
    regulator: Support MAX8998/LP3974 DVS-GPIO
    mfd: Support LP3974 RTC
    i2c: Convert SCx200 driver from using raw PCI to platform device
    x86: OLPC: convert olpc-xo1 driver from pci device to platform device
    mfd: MAX8998/LP3974 hibernation support
    mfd/ab8500: remove spi support
    mfd: Remove ARCH_U8500 dependency from AB8500
    misc: Make AB8500_PWM driver depend on U8500 due to PWM breakage
    mfd: Add __devexit annotation for vx855_remove
    mfd: twl6030 irq_data conversion.
    gpio: Fix cs5535 printk warnings
    misc: Fix cs5535 printk warnings
    mfd: Convert Wolfson MFD drivers to use irq_data accessor function
    mfd: Convert TWL4030 to new irq_ APIs
    mfd: Convert tps6586x driver to new irq_ API
    mfd: Convert tc6393xb driver to new irq_ APIs
    mfd: Convert t7166xb driver to new irq_ API
    ...

    Linus Torvalds
     
  • This functionality is known to be incomplete, so discourage its use in
    general-purpose kernels.

    The only reason to use this driver is to support PCI hotplug on CNB20LE-
    based machines that don't have ACPI, and there are very few such
    systems.

    Reference: https://bugzilla.redhat.com/show_bug.cgi?id=665109
    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Jesse Barnes

    Bjorn Helgaas
     
  • The broadcom_bus.c quirk was written (without benefit of documentation)
    to support PCI hotplug on an old system that doesn't have ACPI. As
    such, we should only use it when the system doesn't have ACPI.

    If the system does have ACPI and we need the host bridge description, we
    should get it from the ACPI _CRS method. On machines older than 2008,
    we currently ignore _CRS, but that doesn't mean we should use
    broadcom_bus.c. It means we should either (a) do what we've done in the
    past and assume everything in the PCI gap is routed to bus 0 (so hotplug
    may not work), or (b) arrange to use _CRS. This patch does (a).

    Reference: https://bugzilla.redhat.com/show_bug.cgi?id=665109
    Acked-by: Ira W. Snyder
    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Jesse Barnes

    Bjorn Helgaas
     
  • This patch enables pci=bfsort by default on future Dell systems.
    It reads SMBIOS type 0xB1 vendor specific record and sets pci=bfsort
    accordingly.

    Offset  Name    Length  Value   Description

    04      Flags0  Word    Varies  Bits 9-10
                                    - 10:9 = 00 Unknown
                                    - 10:9 = 01 Breadth First
                                    - 10:9 = 10 Depth First
                                    - 10:9 = 11 Reserved

    1. Any time pci=bfsort has to be enabled on a system, we need to add the
    model number of the system to the white list. With this patch, that
    is not required.

    2. Typically, model number has to be added to the white list when the
    system is under development. With this change, that is not required.

    Signed-off-by: Jordan Hargrave
    Signed-off-by: Narendra K
    Signed-off-by: Jesse Barnes

    Narendra_K@Dell.com
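
The Flags0 decoding described in the pci=bfsort entry above can be sketched in a few lines of C. This is only an illustration of the bit layout in the table: the enum and both helper names are hypothetical and are not taken from the patch, and how the kernel locates the SMBIOS type 0xB1 record is not shown.

```c
#include <stdint.h>

/* Sort-order hint carried in bits 10:9 of the Flags0 word (offset 0x04). */
enum pci_sort_hint {
    PCI_SORT_UNKNOWN       = 0, /* 10:9 = 00 */
    PCI_SORT_BREADTH_FIRST = 1, /* 10:9 = 01 */
    PCI_SORT_DEPTH_FIRST   = 2, /* 10:9 = 10 */
    PCI_SORT_RESERVED      = 3  /* 10:9 = 11 */
};

/* Hypothetical helper: extract bits 10:9 from the 16-bit Flags0 value. */
static enum pci_sort_hint decode_flags0(uint16_t flags0)
{
    return (enum pci_sort_hint)((flags0 >> 9) & 0x3);
}

/* Hypothetical helper: nonzero when the record asks for breadth-first order. */
static int wants_bfsort(uint16_t flags0)
{
    return decode_flags0(flags0) == PCI_SORT_BREADTH_FIRST;
}
```

With this layout a Flags0 value of 0x0200 selects breadth-first and 0x0400 selects depth-first; all other bits of the word are ignored by the helpers.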
     
  • * 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
    [S390] MAINTAINERS: Update zcrypt driver entry
    [S390] Randomize PIEs
    [S390] Randomise the brk region
    [S390] Add is_32bit_task() helper function
    [S390] Randomize lower bits of stack address
    [S390] Randomize mmap start address
    [S390] Rearrange mmap.c
    [S390] Enable flexible mmap layout for 64 bit processes
    [S390] vdso: dont map at mmap_base
    [S390] reduce miminum gap between stack and mmap_base
    [S390] mmap: consider stack address randomization
    [S390] Update default configuration
    [S390] cio: path_event overindication after resume

    Linus Torvalds
     

14 Jan, 2011

34 commits

  • The cs5535-mfd driver now takes care of the PCI BAR handling; this
    means the olpc-xo1 driver shouldn't be touching the PCI device at all.

    This patch uses both cs5535-acpi and cs5535-pms platform devices rather
    than a single platform device because the cs5535-mfd driver may be used
    by other CS5535 platform-specific drivers; OLPC doesn't get to dictate
    that ACPI and PMS will always be used together.

    Signed-off-by: Andres Salomon
    Acked-by: H. Peter Anvin
    Signed-off-by: Samuel Ortiz

    Andres Salomon
     
  • * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (59 commits)
    ACPI / PM: Fix build problems for !CONFIG_ACPI related to NVS rework
    ACPI: fix resource check message
    ACPI / Battery: Update information on info notification and resume
    ACPI: Drop device flag wake_capable
    ACPI: Always check if _PRW is present before trying to evaluate it
    ACPI / PM: Check status of power resources under mutexes
    ACPI / PM: Rename acpi_power_off_device()
    ACPI / PM: Drop acpi_power_nocheck
    ACPI / PM: Drop acpi_bus_get_power()
    Platform / x86: Make fujitsu_laptop use acpi_bus_update_power()
    ACPI / Fan: Rework the handling of power resources
    ACPI / PM: Register power resource devices as soon as they are needed
    ACPI / PM: Register acpi_power_driver early
    ACPI / PM: Add function for updating device power state consistently
    ACPI / PM: Add function for device power state initialization
    ACPI / PM: Introduce __acpi_bus_get_power()
    ACPI / PM: Introduce function for refcounting device power resources
    ACPI / PM: Add functions for manipulating lists of power resources
    ACPI / PM: Prevent acpi_power_get_inferred_state() from making changes
    ACPICA: Update version to 20101209
    ...

    Linus Torvalds
     
  • * 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
    cpuidle/x86/perf: fix power:cpu_idle double end events and throw cpu_idle events from the cpuidle layer
    intel_idle: open broadcast clock event
    cpuidle: CPUIDLE_FLAG_CHECK_BM is omap3_idle specific
    cpuidle: CPUIDLE_FLAG_TLB_FLUSHED is specific to intel_idle
    cpuidle: delete unused CPUIDLE_FLAG_SHALLOW, BALANCED, DEEP definitions
    SH, cpuidle: delete use of NOP CPUIDLE_FLAGS_SHALLOW
    cpuidle: delete NOP CPUIDLE_FLAG_POLL
    ACPI: processor_idle: delete use of NOP CPUIDLE_FLAGs
    cpuidle: Rename X86 specific idle poll state[0] from C0 to POLL
    ACPI, intel_idle: Cleanup idle= internal variables
    cpuidle: Make cpuidle_enable_device() call poll_idle_init()
    intel_idle: update Sandy Bridge core C-state residency targets

    Linus Torvalds
     
  • * 'stable/gntdev' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
    xen/p2m: Fix module linking error.
    xen p2m: clear the old pte when adding a page to m2p_override
    xen gntdev: use gnttab_map_refs and gnttab_unmap_refs
    xen: introduce gnttab_map_refs and gnttab_unmap_refs
    xen p2m: transparently change the p2m mappings in the m2p override
    xen/gntdev: Fix circular locking dependency
    xen/gntdev: stop using "token" argument
    xen: gntdev: move use of GNTMAP_contains_pte next to the map_op
    xen: add m2p override mechanism
    xen: move p2m handling to separate file
    xen/gntdev: add VM_PFNMAP to vma
    xen/gntdev: allow usermode to map granted pages
    xen: define gnttab_set_map_op/unmap_op

    Fix up trivial conflict in drivers/xen/Kconfig

    Linus Torvalds
     
  • Define MADV_NOHUGEPAGE.

    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • For GRU and EPT, we need gup-fast to set referenced bit too (this is why
    it's correct to return 0 when shadow_access_mask is zero, it requires
    gup-fast to set the referenced bit). qemu-kvm access already sets the
    young bit in the pte if it isn't zero-copy, if it's zero copy or a shadow
    paging EPT minor fault we rely on gup-fast to signal the page is in
    use...

    We also need to check the young bits on the secondary pagetables for NPT
    and not nested shadow mmu as the data may never get accessed again by the
    primary pte.

    Without this closer accuracy, we'd have to remove the heuristic that
    avoids collapsing hugepages in hugepage virtual regions that have not even
    a single subpage in use.

    ->test_young is fully backwards compatible with GRU and other usages that
    don't have young bits in pagetables set by the hardware and that should
    nuke the secondary mmu mappings when ->clear_flush_young runs just like
    EPT does.

    Removing the heuristic that checks the young bit in
    khugepaged/collapse_huge_page completely isn't so bad either probably but
    I thought it was worth it and this makes it reliable.

    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • Archs implementing Transparent Hugepage Support must implement a function
    called has_transparent_hugepage to be sure the virtual or physical CPU
    supports Transparent Hugepages.

    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • Add pmd_modify() for use with mprotect() on huge pmds.

    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrea Arcangeli
    Reviewed-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Add support for transparent hugepages to x86 32bit.

    Share the same VM_ bitflag for VM_MAPPED_COPY. mm/nommu.c will never
    support transparent hugepages.

    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrea Arcangeli
    Reviewed-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Lately I've been working to make KVM use hugepages transparently without
    the usual restrictions of hugetlbfs. Some of the restrictions I'd like to
    see removed:

    1) hugepages have to be swappable or the guest physical memory remains
    locked in RAM and can't be paged out to swap

    2) if a hugepage allocation fails, regular pages should be allocated
    instead and mixed in the same vma without any failure and without
    userland noticing

    3) if some task quits and more hugepages become available in the
    buddy, guest physical memory backed by regular pages should be
    relocated on hugepages automatically in regions under
    madvise(MADV_HUGEPAGE) (ideally event driven by waking up the
    kernel daemon if the order=HPAGE_PMD_SHIFT-PAGE_SHIFT list becomes
    not null)

    4) avoidance of reservation and maximization of use of hugepages whenever
    possible. Reservation (needed to avoid runtime fatal failures) may be ok for
    1 machine with 1 database with 1 database cache with 1 database cache size
    known at boot time. It's definitely not feasible with a virtualization
    hypervisor usage like RHEV-H that runs an unknown number of virtual machines
    with an unknown size of each virtual machine with an unknown amount of
    pagecache that could be potentially useful in the host for guest not using
    O_DIRECT (aka cache=off).

    hugepages in the virtualization hypervisor (and also in the guest!) are
    much more important than in a regular host not using virtualization,
    because with NPT/EPT they decrease the tlb-miss cacheline accesses from 24
    to 19 in case only the hypervisor uses transparent hugepages, and they
    decrease the tlb-miss cacheline accesses from 19 to 15 in case both the
    Linux hypervisor and the Linux guest use this patch (though the
    guest will limit the additional speedup to anonymous regions only for
    now...). Even more important is that the tlb miss handler is much slower
    on a NPT/EPT guest than for a regular shadow paging or no-virtualization
    scenario. So maximizing the amount of virtual memory cached by the TLB
    pays off significantly more with NPT/EPT than without (even if there would
    be no significant speedup in the tlb-miss runtime).
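
The 24, 19 and 15 figures are consistent with the usual accounting for a two-dimensional (nested) page walk, in which every guest page-table level has to be translated through the host tables before it can be read. The sketch below assumes 4-level tables on both sides for 4k pages and one level fewer on a side that maps with PMD-sized hugepages; that assumption and the function name are not stated in the message itself.

```c
/*
 * Memory references for one nested TLB miss with g guest levels and
 * h host levels: each guest level costs an h-level host walk plus the
 * read of the guest entry itself, and the final guest-physical address
 * needs one more h-level host walk:
 *
 *     g * (h + 1) + h  =  g*h + g + h
 */
static unsigned int nested_walk_refs(unsigned int guest_levels,
                                     unsigned int host_levels)
{
    return guest_levels * host_levels + guest_levels + host_levels;
}
```

Under these assumptions nested_walk_refs(4, 4) gives 24, nested_walk_refs(4, 3) gives 19 (hugepages in the host only) and nested_walk_refs(3, 3) gives 15 (hugepages in both host and guest).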

    The first (and more tedious) part of this work requires allowing the VM to
    handle anonymous hugepages mixed with regular pages transparently on
    regular anonymous vmas. This is what this patch tries to achieve in the
    least intrusive possible way. We want hugepages and hugetlb to be used in
    a way so that all applications can benefit without changes (as usual we
    leverage the KVM virtualization design: by improving the Linux VM at
    large, KVM gets the performance boost too).

    The most important design choice is: always fall back to 4k allocation if
    the hugepage allocation fails! This is the _very_ opposite of some large
    pagecache patches that failed with -EIO back then if a 64k (or similar)
    allocation failed...

    Second important decision (to reduce the impact of the feature on the
    existing pagetable handling code) is that at any time we can split an
    a hugepage into 512 regular pages and it has to be done with an operation
    that can't fail. This way the reliability of the swapping isn't decreased
    (no need to allocate memory when we are short on memory to swap) and it's
    trivial to plug a split_huge_page* one-liner where needed without
    polluting the VM. Over time we can teach mprotect, mremap and friends to
    handle pmd_trans_huge natively without calling split_huge_page*. The fact
    it can't fail isn't just for swap: if split_huge_page would return -ENOMEM
    (instead of the current void) we'd need to rollback the mprotect from the
    middle of it (ideally including undoing the split_vma) which would be a
    big change and in the very wrong direction (it'd likely be simpler not to
    call split_huge_page at all and to teach mprotect and friends to handle
    hugepages instead of rolling them back from the middle). In short the
    very value of split_huge_page is that it can't fail.
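
The 512 figure follows from the page sizes. A minimal userspace sketch of the arithmetic, assuming the x86-64 values PAGE_SHIFT = 12 and HPAGE_PMD_SHIFT = 21 (the message only gives the order as HPAGE_PMD_SHIFT-PAGE_SHIFT); split_addresses() is a hypothetical stand-in that only computes addresses and, like the operation it mimics, returns void because it has nothing that can fail:

```c
#define PAGE_SHIFT       12                       /* assumed: 4 KiB base pages */
#define HPAGE_PMD_SHIFT  21                       /* assumed: 2 MiB PMD hugepage */
#define HPAGE_PMD_ORDER  (HPAGE_PMD_SHIFT - PAGE_SHIFT)
#define HPAGE_PMD_NR     (1UL << HPAGE_PMD_ORDER) /* subpages per hugepage */

/*
 * Hypothetical illustration: fill out[] with the start address of every
 * regular page contained in the hugepage that begins at 'base'.
 * No allocation is needed, so there is no error path.
 */
static void split_addresses(unsigned long base,
                            unsigned long out[HPAGE_PMD_NR])
{
    unsigned long i;

    for (i = 0; i < HPAGE_PMD_NR; i++)
        out[i] = base + (i << PAGE_SHIFT);
}
```

With these values HPAGE_PMD_ORDER is 9 and HPAGE_PMD_NR is 512, which is the number of regular pages one hugepage is split into.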

    The collapsing and madvise(MADV_HUGEPAGE) part will remain separated and
    incremental and it'll just be a "harmless" addition later if this initial
    part is agreed upon. It also should be noted that locking-wise replacing
    regular pages with hugepages is going to be very easy if compared to what
    I'm doing below in split_huge_page, as it will only happen when
    page_count(page) matches page_mapcount(page) if we can take the PG_lock
    and mmap_sem in write mode. collapse_huge_page will be a "best effort"
    that (unlike split_huge_page) can fail at the minimal sign of trouble and
    we can try again later. collapse_huge_page will be similar to how KSM
    works and the madvise(MADV_HUGEPAGE) will work similar to
    madvise(MADV_MERGEABLE).

    The default I like is that transparent hugepages are used at page fault
    time. This can be changed with
    /sys/kernel/mm/transparent_hugepage/enabled. The control knob can be set
    to three values "always", "madvise", "never" which mean respectively that
    hugepages are always used, or only inside madvise(MADV_HUGEPAGE) regions,
    or never used. /sys/kernel/mm/transparent_hugepage/defrag instead
    controls if the hugepage allocation should defrag memory aggressively
    "always", only inside "madvise" regions, or "never".
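
A small sketch of how a program could read back the current setting of the knob. It assumes the file lists all three choices and marks the active one with square brackets (for example "always [madvise] never"), which is how mainline kernels report it but is not described in this message; both function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copy the bracketed token of 's' into 'out'.
 * Returns 0 on success, -1 if no "[...]" is found or 'out' is too small.
 */
static int selected_mode(const char *s, char *out, size_t outlen)
{
    const char *lb = strchr(s, '[');
    const char *rb = lb ? strchr(lb, ']') : NULL;
    size_t len;

    if (!lb || !rb)
        return -1;
    len = (size_t)(rb - lb - 1);
    if (len + 1 > outlen)
        return -1;
    memcpy(out, lb + 1, len);
    out[len] = '\0';
    return 0;
}

/* Read the first line of the knob at 'path' and extract the active mode. */
static int read_thp_mode(const char *path, char *out, size_t outlen)
{
    char buf[128];
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return selected_mode(buf, out, outlen);
}
```

A caller would pass "/sys/kernel/mm/transparent_hugepage/enabled" or the corresponding defrag file as the path; on kernels without the feature the open simply fails and -1 is returned.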

    The pmd_trans_splitting/pmd_trans_huge locking is very solid. The
    put_page (from get_user_page users that can't use mmu notifier like
    O_DIRECT) that runs against a __split_huge_page_refcount instead was a
    pain to serialize in a way that would result always in a coherent page
    count for both tail and head. I think my locking solution with a
    compound_lock taken only after the page_first is valid and is still a
    PageHead should be safe but it surely needs review from SMP race point of
    view. In short there is no current existing way to serialize the O_DIRECT
    final put_page against split_huge_page_refcount so I had to invent a new
    one (O_DIRECT loses knowledge on the mapping status by the time gup_fast
    returns so...). And I didn't want to impact all gup/gup_fast users for
    now, maybe if we change the gup interface substantially we can avoid this
    locking, I admit I didn't think too much about it because changing the gup
    unpinning interface would be invasive.

    If we ignored O_DIRECT we could stick to the existing compound refcounting
    code, by simply adding a get_user_pages_fast_flags(foll_flags) where KVM
    (and any other mmu notifier user) would call it without FOLL_GET (and if
    FOLL_GET isn't set we'd just BUG_ON if nobody registered itself in the
    current task mmu notifier list yet). But O_DIRECT is fundamental for
    decent performance of virtualized I/O on fast storage so we can't avoid it
    to solve the race of put_page against split_huge_page_refcount to achieve
    a complete hugepage feature for KVM.

    Swap and oom work fine (well just like with regular pages ;). MMU
    notifier is handled transparently too, with the exception of the young bit
    on the pmd, that didn't have a range check but I think KVM will be fine
    because the whole point of hugepages is that EPT/NPT will also use a huge
    pmd when they notice gup returns pages with PageCompound set, so they
    won't care of a range and there's just the pmd young bit to check in that
    case.

    NOTE: in some cases if the L2 cache is small, this may slow down and waste
    memory during COWs because 4M of memory are accessed in a single fault
    instead of 8k (the payoff is that after COW the program can run faster).
    So we might want to switch the copy_huge_page (and clear_huge_page too) to
    non-temporal stores. I also extensively researched ways to avoid this
    cache thrashing with a full prefault logic that would cow in 8k/16k/32k/64k
    up to 1M (I can send those patches that fully implemented prefault) but I
    concluded they're not worth it and they add a huge additional complexity
    and they remove all tlb benefits until the full hugepage has been faulted
    in, to save a little bit of memory and some cache during app startup, but
    they still don't improve substantially the cache thrashing during startup
    if the prefault happens in >4k chunks. One reason is that those 4k pte
    entries copied are still mapped on a perfectly cache-colored hugepage, so
    the thrashing is the worst one can generate in those copies (cow of 4k page
    copies aren't so well colored so they thrash less, but again this results
    in software running faster after the page fault). Those prefault patches
    allowed things like a pte where post-cow pages were local 4k regular anon
    pages and the not-yet-cowed pte entries were pointing in the middle of
    some hugepage mapped read-only. If it doesn't pay off substantially with
    today's hardware it will pay off even less in the future with larger L2
    caches, and the prefault logic would bloat the VM a lot. If one is
    embedded, transparent_hugepage can be disabled during boot with sysfs or
    with the boot commandline parameter transparent_hugepage=0 (or
    transparent_hugepage=2 to restrict hugepages inside madvise regions) that
    will ensure not a single hugepage is allocated at boot time. It is simple
    enough to just disable transparent hugepage globally and let transparent
    hugepages be allocated selectively by applications in the MADV_HUGEPAGE
    region (both at page fault time, and if enabled with the
    collapse_huge_page too through the kernel daemon).

    This patch supports only hugepages mapped in the pmd, archs that have
    smaller hugepages will not fit in this patch alone. Also some archs like
    power have certain tlb limits that prevent mixing different page sizes in
    the same regions so they will not fit in this framework that requires
    "graceful fallback" to basic PAGE_SIZE in case of physical memory
    fragmentation. hugetlbfs remains a perfect fit for those because its
    software limits happen to match the hardware limits. hugetlbfs also
    remains a perfect fit for hugepage sizes like 1GByte that cannot be hoped
    to be found not fragmented after a certain system uptime and that would be
    very expensive to defragment with relocation, so requiring reservation.
    hugetlbfs is the "reservation way", the point of transparent hugepages is
    not to have any reservation at all and maximizing the use of cache and
    hugepages at all times automatically.

    Some performance results:

    vmx andrea # LD_PRELOAD=/usr/lib64/libhugetlbfs.so HUGETLB_MORECORE=yes HUGETLB_PATH=/mnt/huge/ ./largepages3
    memset page fault 1566023
    memset tlb miss 453854
    memset second tlb miss 453321
    random access tlb miss 41635
    random access second tlb miss 41658
    vmx andrea # LD_PRELOAD=/usr/lib64/libhugetlbfs.so HUGETLB_MORECORE=yes HUGETLB_PATH=/mnt/huge/ ./largepages3
    memset page fault 1566471
    memset tlb miss 453375
    memset second tlb miss 453320
    random access tlb miss 41636
    random access second tlb miss 41637
    vmx andrea # ./largepages3
    memset page fault 1566642
    memset tlb miss 453417
    memset second tlb miss 453313
    random access tlb miss 41630
    random access second tlb miss 41647
    vmx andrea # ./largepages3
    memset page fault 1566872
    memset tlb miss 453418
    memset second tlb miss 453315
    random access tlb miss 41618
    random access second tlb miss 41659
    vmx andrea # echo 0 > /proc/sys/vm/transparent_hugepage
    vmx andrea # ./largepages3
    memset page fault 2182476
    memset tlb miss 460305
    memset second tlb miss 460179
    random access tlb miss 44483
    random access second tlb miss 44186
    vmx andrea # ./largepages3
    memset page fault 2182791
    memset tlb miss 460742
    memset second tlb miss 459962
    random access tlb miss 43981
    random access second tlb miss 43988

    ============
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/time.h>

    #define SIZE (3UL*1024*1024*1024)

    int main()
    {
        char *p = malloc(SIZE), *p2;
        struct timeval before, after;

        if (!p) {
            perror("malloc");
            return 1;
        }

        /* first pass: every page is faulted in */
        gettimeofday(&before, NULL);
        memset(p, 0, SIZE);
        gettimeofday(&after, NULL);
        printf("memset page fault %lu\n",
               (after.tv_sec-before.tv_sec)*1000000UL +
               after.tv_usec-before.tv_usec);

        /* second and third pass: memory is mapped, only tlb misses remain */
        gettimeofday(&before, NULL);
        memset(p, 0, SIZE);
        gettimeofday(&after, NULL);
        printf("memset tlb miss %lu\n",
               (after.tv_sec-before.tv_sec)*1000000UL +
               after.tv_usec-before.tv_usec);

        gettimeofday(&before, NULL);
        memset(p, 0, SIZE);
        gettimeofday(&after, NULL);
        printf("memset second tlb miss %lu\n",
               (after.tv_sec-before.tv_sec)*1000000UL +
               after.tv_usec-before.tv_usec);

        /* touch one byte per 4k page */
        gettimeofday(&before, NULL);
        for (p2 = p; p2 < p+SIZE; p2 += 4096)
            *p2 = 0;
        gettimeofday(&after, NULL);
        printf("random access tlb miss %lu\n",
               (after.tv_sec-before.tv_sec)*1000000UL +
               after.tv_usec-before.tv_usec);

        gettimeofday(&before, NULL);
        for (p2 = p; p2 < p+SIZE; p2 += 4096)
            *p2 = 0;
        gettimeofday(&after, NULL);
        printf("random access second tlb miss %lu\n",
               (after.tv_sec-before.tv_sec)*1000000UL +
               after.tv_usec-before.tv_usec);

        return 0;
    }
    ============

    Signed-off-by: Andrea Arcangeli
    Acked-by: Rik van Riel
    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • This should work for both hugetlbfs and transparent hugepages.

    [akpm@linux-foundation.org: bring forward PageTransCompound() addition for bisectability]
    Signed-off-by: Andrea Arcangeli
    Cc: Avi Kivity
    Cc: Marcelo Tosatti
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • split_huge_page_pmd compat code. Each one of those would need to be
    expanded to hundreds of lines of complex code without a fully reliable
    split_huge_page_pmd design.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Rik van Riel
    Acked-by: Mel Gorman
    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • pte alloc routines must wait for split_huge_page if the pmd is not present
    and not null (i.e. pmd_trans_splitting). The additional branches are
    optimized away at compile time by pmd_trans_splitting if the config option
    is off. However we must pass the vma down in order to know the anon_vma
    lock to wait for.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Andrea Arcangeli
    Acked-by: Rik van Riel
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • Force gup_fast to take the slow path and block if the pmd is splitting,
    not only if it's none.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Rik van Riel
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • Add needed pmd mangling functions with symmetry with their pte
    counterparts. pmdp_splitting_flush() is the only new addition on the pmd_
    methods and it's needed to serialize the VM against split_huge_page. It
    simply atomically sets the splitting bit in a similar way
    pmdp_clear_flush_young atomically clears the accessed bit.
    pmdp_splitting_flush() also has to flush the tlb to make it effective
    against gup_fast, but it wouldn't really require to flush the tlb too.
    Just the tlb flush is the simplest operation we can invoke to serialize
    pmdp_splitting_flush() against gup_fast.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
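
The symmetry described in the pmd mangling entry above (atomically setting one bit in the same way another is atomically cleared) can be sketched in userspace with C11 atomics. This is not kernel code: the bit positions and function names are illustrative only, and the TLB flush that pmdp_splitting_flush() also performs is not modeled.

```c
#include <stdatomic.h>

/* Illustrative bit positions only, not taken from the patch. */
#define ENTRY_ACCESSED   (1UL << 5)
#define ENTRY_SPLITTING  (1UL << 9)

/* Atomically set the splitting bit; returns nonzero if it was already set. */
static int entry_set_splitting(_Atomic unsigned long *entry)
{
    return (atomic_fetch_or(entry, ENTRY_SPLITTING) & ENTRY_SPLITTING) != 0;
}

/* Atomically clear the accessed bit; returns nonzero if it was set before. */
static int entry_clear_young(_Atomic unsigned long *entry)
{
    return (atomic_fetch_and(entry, ~ENTRY_ACCESSED) & ENTRY_ACCESSED) != 0;
}
```

Both helpers are single read-modify-write operations, so a concurrent reader sees the entry either before or after the change, never a partial update.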
     
    These return 0 at compile time when the config option is disabled, to
    allow gcc to eliminate the transparent hugepage function calls at compile
    time without additional #ifdefs (only the export of those functions has
    to be visible to gcc but they won't be required at link time and
    huge_memory.o need not be built at all).

    _PAGE_BIT_UNUSED1 is never used for pmd, only on pte.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Rik van Riel
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • No paravirt version of set_pmd_at/pmd_update/pmd_update_defer.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Rik van Riel
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • Paravirt ops pmd_update/pmd_update_defer/pmd_set_at. Not all might be
    necessary (vmware needs pmd_update, Xen needs set_pmd_at, nobody needs
    pmd_update_defer), but this is to keep full symmetry with pte paravirt
    ops, which looks cleaner and simpler from a common code POV.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Rik van Riel
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • Used by paravirt and not paravirt set_pmd_at.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Rik van Riel
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • Alter compound get_page/put_page to keep references on subpages too, in
    order to allow __split_huge_page_refcount to split a hugepage even while
    subpages have been pinned by one of the get_user_pages() variants.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Rik van Riel
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • Define MADV_HUGEPAGE.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Rik van Riel
    Acked-by: Arnd Bergmann
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
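
A minimal userspace sketch of how an application might opt a region in with MADV_HUGEPAGE. It assumes a 2 MiB PMD hugepage size (x86-64) and a Linux libc that exposes the constant; alloc_thp_hinted() is a hypothetical helper, and the hint is treated as purely advisory, so a kernel without the feature still returns usable memory.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* expose madvise() and MADV_* in glibc */
#endif
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

#define HUGE_ALIGN (2UL * 1024 * 1024)  /* assumed PMD hugepage size */

/*
 * Hypothetical helper: allocate 'len' bytes rounded up to a multiple of
 * HUGE_ALIGN, aligned to HUGE_ALIGN, and ask the kernel to back the
 * region with hugepages. Returns NULL on allocation failure or len == 0.
 * The caller releases the memory with free().
 */
static void *alloc_thp_hinted(size_t len)
{
    void *p = NULL;
    size_t rounded = (len + HUGE_ALIGN - 1) & ~(HUGE_ALIGN - 1);

    if (rounded == 0 || posix_memalign(&p, HUGE_ALIGN, rounded) != 0)
        return NULL;
#ifdef MADV_HUGEPAGE
    /* Advisory only: ignore the result so older kernels keep working. */
    (void)madvise(p, rounded, MADV_HUGEPAGE);
#endif
    return p;
}
```

Aligning both the start and the length to the hugepage size gives the kernel whole hugepage-sized ranges to work with; whether hugepages are actually used remains up to the kernel.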
     
    Four architectures (arm, mips, sparc, x86) use __vmalloc_area() for
    module_alloc(). Much of the code is duplicated and can be generalized in a
    globally accessible function, __vmalloc_node_range().

    __vmalloc_node() now calls into __vmalloc_node_range() with a range of
    [VMALLOC_START, VMALLOC_END) for functionally equivalent behavior.

    Each architecture may then use __vmalloc_node_range() directly to remove
    the duplication of code.

    Signed-off-by: David Rientjes
    Cc: Christoph Lameter
    Cc: Russell King
    Cc: Ralf Baechle
    Cc: "David S. Miller"
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
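
The shape of the refactoring in the vmalloc entry above (one range-taking function, with the old entry point reduced to a wrapper that passes the default range) can be modeled in plain C. This is not the kernel function: place_in_range(), place_default() and the DEMO_* bounds are hypothetical stand-ins that only pick an address and do no real allocation.

```c
/* Hypothetical stand-ins for VMALLOC_START / VMALLOC_END. */
#define DEMO_START 0x100000UL
#define DEMO_END   0x200000UL

/*
 * Return the lowest address a, aligned to 'align' (a power of two), such
 * that [a, a + size) lies inside [start, end). Returns 0 if it cannot fit.
 */
static unsigned long place_in_range(unsigned long size, unsigned long align,
                                    unsigned long start, unsigned long end)
{
    unsigned long addr = (start + align - 1) & ~(align - 1);

    if (size == 0 || addr < start || addr >= end || size > end - addr)
        return 0;
    return addr;
}

/* The generic entry point becomes a thin wrapper over the range version. */
static unsigned long place_default(unsigned long size, unsigned long align)
{
    return place_in_range(size, align, DEMO_START, DEMO_END);
}
```

An architecture that needs a different window, such as a module area, would call the range version directly with its own bounds instead of duplicating the placement logic.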
     
  • * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
    [IA64] fix build error - arch/ia64/kernel/perfmon.c

    Linus Torvalds
     
  • arch/ia64/kernel/perfmon.c:621: error: duplicate 'static'

    Introduced by commit c74a1cbb3cac348f276fabc381758f5b0b4713b2

    pass default dentry_operations to mount_pseudo()

    Signed-off-by: Tony Luck

    Tony Luck
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/avr32-2.6:
    avr32: update default configuration files for Atmel boards
    avr32: Convert to clocksource_register_hz
    avr32: make architecture sys_clone prototype match asm-generic prototype
    avr32: use syscall prototypes from asm-generic instead of arch
    avr32: disable kprobes for all default configurations
    avr32: boards: setup: use IS_ERR() instead of NULL check

    Linus Torvalds
     
  • This patch adjusts some values to make the default configuration for Atmel
    boards more similar, and adds missing values to enable required functions. Also
    remove defined symbols for functions not in use.

    Signed-off-by: Hans-Christian Egtvedt

    Hans-Christian Egtvedt
     
  • This converts the avr32 clocksource to use clocksource_register_hz.

    This is untested, so any assistance in testing would be appreciated!

    CC: Hans-Christian Egtvedt
    CC: Thomas Gleixner
    Signed-off-by: John Stultz

    John Stultz
     
  • This patch will fix the arguments to the architecture sys_clone() function to
    match the asm-generic/syscalls.h prototype. In the same go remove the
    architecture specific prototype for the same function.

    The sys_clone() function is only called from assembly, hence the argument types
    did not have any effect.

    Signed-off-by: Hans-Christian Egtvedt

    Hans-Christian Egtvedt
     
  • This patch removes the redundant syscalls prototypes in the architecture
    specific syscalls.h header file. These were identical with the ones in
    asm-generic/syscalls.h.

    Signed-off-by: Hans-Christian Egtvedt
    Reported-by: Peter Huewe
    Reported-by: Sven Schnelle
    Cc: stable

    Hans-Christian Egtvedt
     
  • This patch will disable kprobes for all the default AVR32 board configurations.
    This works around a regression in kprobes which seems to be related to AVR32
    now lacking the struct kprobe_ctlblk.

    Signed-off-by: Hans-Christian Egtvedt

    Hans-Christian Egtvedt
     
  • clk_get() returns ERR_PTR() on error, not NULL.

    Signed-off-by: Vasiliy Kulikov
    Acked-by: Hans-Christian Egtvedt

    Vasiliy Kulikov
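
Why a NULL check misses this kind of failure can be shown with a userspace re-creation of the kernel's error-pointer convention, in which a small negative errno value is stored in the pointer itself. The helpers below mirror the idea of ERR_PTR(), PTR_ERR() and IS_ERR() but are written from scratch for illustration; struct clk and demo_clk_get() are hypothetical and are not the real clock API.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095  /* error pointers occupy the top 4095 addresses */

static void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

static long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical clock object and lookup, for illustration only. */
struct clk { int id; };
static struct clk demo_clk = { 1 };

static struct clk *demo_clk_get(const char *name)
{
    if (!name)
        return err_ptr(-ENOENT);  /* failure is NOT reported as NULL */
    return &demo_clk;
}
```

A caller that tests the result of demo_clk_get(NULL) against NULL would treat the error pointer as a valid clock; testing it with is_err() and reading the code back with ptr_err() catches the failure.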
     
  • * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
    [IA64] Fix format warning in arch/ia64/kernel/acpi.c

    Linus Torvalds
     
  • * 'rmobile-latest' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
    ARM: mach-shmobile: Kill off unused !gpio_is_valid() case
    ARM: mach-shmobile: sh7372 Enable SDIO IRQs for Mackerel
    ARM: mach-shmobile: sh7377 Enable SDIO IRQs
    ARM: mach-shmobile: sh7367 Enable SDIO IRQs
    ARM: mach-shmobile: sh7372 Enable SDIO IRQs
    ARM: mach-shmobile: mackerel: Add touchscreen ST1232 support
    ARM: mach-shmobile: ap4eb: SCIF port for earlyprintk when using zboot
    ARM: mach-shmobile: mackerel: SCIF port for earlyprintk when using zboot
    ARM: mach-shmobile: mackerel: Add support get_cd in CN23

    Linus Torvalds
     
  • * 'sh-latest' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (31 commits)
    sh: Add support for AP-SH4AD-0A board.
    sh: Add support for AP-SH4A-3A board.
    sh: Add a new mach type for alpha project boards.
    serial: sh-sci: build fixes.
    sh: sh7372 SH4AL-DSP probe support
    sh: sh7366 Enable SDIO IRQs
    sh: sh7343 Enable SDIO IRQs
    sh: mach-ecovec24: enable runtime PM for SDHI
    sh: sh7723 / ap325rxa enable SDIO IRQs
    sh: sh7722 Enable SDIO IRQs
    sh: sh7724 Enable SDIO IRQs
    sh: Fix up legacy PTEA space attribute mapping.
    sh: Stub out legacy PCC pgprot encoding for X2 TLBs.
    sh: constify prefetch pointers.
    sh: Add a machvec callback for early memblock reservations.
    sh: update sh7757lcr_defconfig
    sh: add PVR probing for SH7757 3rd cut
    sh: Use device_initcall() instead of __initcall()
    sh: intc - convert board specific landisk code
    sh: Move init_landisk_IRQ to header file
    ...

    Linus Torvalds