13 Feb, 2015

11 commits

  • If an attacker can cause a controlled kernel stack overflow, overwriting
    the restart block is a very juicy exploit target. This is because the
    restart_block is held in the same memory allocation as the kernel stack.

    Moving the restart block to struct task_struct prevents this exploit by
    making the restart_block harder to locate.

    Note that there are other fields in thread_info that are also easy
    targets, at least on some architectures.

    It's also a decent simplification, since the restart code is more or less
    identical on all architectures.
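
    As a rough illustration of the old layout, the sketch below puts a
    restart block with a function pointer into the same union as the stack,
    so the pointer sits at a small fixed offset from the base of the stack
    allocation. This is a toy model, not the kernel's real definitions:
    every toy_* name and TOY_THREAD_SIZE is made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_THREAD_SIZE 8192 /* hypothetical size of the stack allocation */

/* Toy restart block: the function pointer is what an attacker would target. */
struct toy_restart_block {
    long (*fn)(struct toy_restart_block *);
};

/* Toy thread_info in the old layout, with the restart block embedded. */
struct toy_thread_info {
    unsigned long flags;
    struct toy_restart_block restart_block;
};

/* thread_info shares one allocation with the kernel stack. */
union toy_thread_union {
    struct toy_thread_info thread_info;
    unsigned long stack[TOY_THREAD_SIZE / sizeof(unsigned long)];
};

static_assert(sizeof(union toy_thread_union) == TOY_THREAD_SIZE,
              "thread_info must not enlarge the stack allocation");

/* Byte offset of restart_block.fn from the base of the stack allocation. */
static size_t toy_restart_fn_offset(void)
{
    return offsetof(union toy_thread_union, thread_info) +
           offsetof(struct toy_thread_info, restart_block) +
           offsetof(struct toy_restart_block, fn);
}

/* Returns 1 when the function pointer lies inside the stack allocation. */
static int toy_restart_fn_in_stack_allocation(void)
{
    return toy_restart_fn_offset() + sizeof(void *) <= TOY_THREAD_SIZE;
}
```

    In this model an overflow that runs off the end of the stack reaches
    toy_thread_info, and with it the function pointer. Once the restart
    block lives in a separately allocated task structure, that fixed
    relationship no longer exists.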

    [james.hogan@imgtec.com: metag: align thread_info::supervisor_stack]
    Signed-off-by: Andy Lutomirski
    Cc: Thomas Gleixner
    Cc: Al Viro
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Kees Cook
    Cc: David Miller
    Acked-by: Richard Weinberger
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: Vineet Gupta
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Steven Miao
    Cc: Mark Salter
    Cc: Aurelien Jacquiot
    Cc: Mikael Starvik
    Cc: Jesper Nilsson
    Cc: David Howells
    Cc: Richard Kuo
    Cc: "Luck, Tony"
    Cc: Geert Uytterhoeven
    Cc: Michal Simek
    Cc: Ralf Baechle
    Cc: Jonas Bonn
    Cc: "James E.J. Bottomley"
    Cc: Helge Deller
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Acked-by: Michael Ellerman (powerpc)
    Tested-by: Michael Ellerman (powerpc)
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Chen Liqin
    Cc: Lennox Wu
    Cc: Chris Metcalf
    Cc: Guan Xuetao
    Cc: Chris Zankel
    Cc: Max Filippov
    Cc: Oleg Nesterov
    Cc: Guenter Roeck
    Signed-off-by: James Hogan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
     
  • Remove the function search_one_table() that is not used anywhere.

    This was partially found by using a static code analysis program called
    cppcheck.

    Signed-off-by: Rickard Strandqvist
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rickard Strandqvist
     
  • Commit b38af4721f59 ("x86,mm: fix pte_special versus pte_numa") adjusted
    the pte_special check to take into account that a special pte had
    SPECIAL and neither PRESENT nor PROTNONE. Now that NUMA hinting PTEs
    are no longer modifying _PAGE_PRESENT it should be safe to restore the
    original pte_special behaviour.

    Signed-off-by: Mel Gorman
    Cc: Aneesh Kumar K.V
    Cc: Benjamin Herrenschmidt
    Cc: Dave Jones
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Cc: Kirill Shutemov
    Cc: Linus Torvalds
    Cc: Paul Mackerras
    Cc: Rik van Riel
    Cc: Sasha Levin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • This patch removes the NUMA PTE bits and associated helpers. As a
    side-effect it increases the maximum possible swap space on x86-64.

    One potential source of problems is races between the marking of PTEs
    PROT_NONE, NUMA hinting faults and migration. It must be guaranteed that
    a PTE being protected is not faulted in parallel, seen as a pte_none and
    corrupting memory. The base case is safe, but transhuge has had problems
    in the past due to a different migration mechanism and a dependence on
    the page lock to serialise migrations, so it warrants a closer look.

    task_work hinting update            parallel fault
    ------------------------            --------------
    change_pmd_range
      change_huge_pmd
        __pmd_trans_huge_lock
          pmdp_get_and_clear
                                        __handle_mm_fault
                                        pmd_none
                                          do_huge_pmd_anonymous_page
                                          read? pmd_lock blocks until hinting complete, fail !pmd_none test
                                          write? __do_huge_pmd_anonymous_page acquires pmd_lock, checks pmd_none
          pmd_modify
          set_pmd_at

    task_work hinting update            parallel migration
    ------------------------            ------------------
    change_pmd_range
      change_huge_pmd
        __pmd_trans_huge_lock
          pmdp_get_and_clear
                                        __handle_mm_fault
                                          do_huge_pmd_numa_page
                                            migrate_misplaced_transhuge_page
                                            pmd_lock waits for updates to complete, recheck pmd_same
          pmd_modify
          set_pmd_at

    Both of those are safe and the case where a transhuge page is inserted
    during a protection update is unchanged. The case where two processes try
    migrating at the same time is unchanged by this series so should still be
    ok. I could not find a case where we are accidentally depending on the
    PTE not being cleared and flushed. If one is missed, it'll manifest as
    corruption problems that start triggering shortly after this series is
    merged and only happen when NUMA balancing is enabled.

    Signed-off-by: Mel Gorman
    Tested-by: Sasha Levin
    Cc: Aneesh Kumar K.V
    Cc: Benjamin Herrenschmidt
    Cc: Dave Jones
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Cc: Kirill Shutemov
    Cc: Linus Torvalds
    Cc: Paul Mackerras
    Cc: Rik van Riel
    Cc: Mark Brown
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • With PROT_NONE, the traditional page table manipulation functions are
    sufficient.

    [andre.przywara@arm.com: fix compiler warning in pmdp_invalidate()]
    [akpm@linux-foundation.org: fix build with STRICT_MM_TYPECHECKS]
    Signed-off-by: Mel Gorman
    Acked-by: Linus Torvalds
    Acked-by: Aneesh Kumar
    Tested-by: Sasha Levin
    Cc: Benjamin Herrenschmidt
    Cc: Dave Jones
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Cc: Kirill Shutemov
    Cc: Paul Mackerras
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • ppc64 should not be depending on DSISR_PROTFAULT and it's unexpected if
    they are triggered. This patch adds warnings just in case they are being
    accidentally depended upon.

    Signed-off-by: Mel Gorman
    Acked-by: Aneesh Kumar K.V
    Tested-by: Sasha Levin
    Cc: Benjamin Herrenschmidt
    Cc: Dave Jones
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Cc: Kirill Shutemov
    Cc: Linus Torvalds
    Cc: Paul Mackerras
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Convert existing users of pte_numa and friends to the new helper. Note
    that the kernel is broken after this patch is applied until the other page
    table modifiers are also altered. This patch layout is to make review
    easier.

    Signed-off-by: Mel Gorman
    Acked-by: Linus Torvalds
    Acked-by: Aneesh Kumar
    Acked-by: Benjamin Herrenschmidt
    Tested-by: Sasha Levin
    Cc: Dave Jones
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Cc: Kirill Shutemov
    Cc: Paul Mackerras
    Cc: Rik van Riel
    Cc: Sasha Levin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • This is a preparatory patch that introduces protnone helpers for automatic
    NUMA balancing.
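
    A minimal sketch of what such a helper can look like, assuming a PTE
    format with separate "present" and "protnone" bits. The toy_* names and
    the bit positions are illustrative assumptions, not the kernel's
    definitions for any particular architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only; real values are architecture-specific. */
#define TOY_PAGE_PRESENT  (UINT64_C(1) << 0)
#define TOY_PAGE_PROTNONE (UINT64_C(1) << 8)

typedef struct { uint64_t val; } toy_pte_t;

/* True when the entry is marked PROT_NONE and is not hardware-present. */
static bool toy_pte_protnone(toy_pte_t pte)
{
    return (pte.val & (TOY_PAGE_PROTNONE | TOY_PAGE_PRESENT)) ==
           TOY_PAGE_PROTNONE;
}

/* Small usage example covering the three interesting states. */
static void toy_protnone_selftest(void)
{
    toy_pte_t empty = { 0 };
    toy_pte_t mapped = { TOY_PAGE_PRESENT };
    toy_pte_t hinted = { TOY_PAGE_PROTNONE };

    assert(!toy_pte_protnone(empty));
    assert(!toy_pte_protnone(mapped));
    assert(toy_pte_protnone(hinted));
}
```

    The point of the helper is that a NUMA hinting entry is recognised from
    the ordinary PROT_NONE encoding, so no dedicated NUMA bit is needed.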

    Signed-off-by: Mel Gorman
    Acked-by: Linus Torvalds
    Acked-by: Aneesh Kumar K.V
    Tested-by: Sasha Levin
    Cc: Benjamin Herrenschmidt
    Cc: Dave Jones
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Cc: Kirill Shutemov
    Cc: Paul Mackerras
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Pull IOMMU updates from Joerg Roedel:
    "This time with:

    - Generic page-table framework for ARM IOMMUs using the LPAE
    page-table format, ARM-SMMU and Renesas IPMMU make use of it
    already.

    - Break out the IO virtual address allocator from the Intel IOMMU so
    that it can be used by other DMA-API implementations too. The
    first user will be the ARM64 common DMA-API implementation for
    IOMMUs

    - Device tree support for Renesas IPMMU

    - Various fixes and cleanups all over the place"

    * tag 'iommu-updates-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (36 commits)
    iommu/amd: Convert non-returned local variable to boolean when relevant
    iommu: Update my email address
    iommu/amd: Use wait_event in put_pasid_state_wait
    iommu/amd: Fix amd_iommu_free_device()
    iommu/arm-smmu: Avoid build warning
    iommu/fsl: Various cleanups
    iommu/fsl: Use %pa to print phys_addr_t
    iommu/omap: Print phys_addr_t using %pa
    iommu: Make more drivers depend on COMPILE_TEST
    iommu/ipmmu-vmsa: Fix IOMMU lookup when multiple IOMMUs are registered
    iommu: Disable on !MMU builds
    iommu/fsl: Remove unused fsl_of_pamu_ids[]
    iommu/fsl: Fix section mismatch
    iommu/ipmmu-vmsa: Use the ARM LPAE page table allocator
    iommu: Fix trace_map() to report original iova and original size
    iommu/arm-smmu: add support for iova_to_phys through ATS1PR
    iopoll: Introduce memory-mapped IO polling macros
    iommu/arm-smmu: don't touch the secure STLBIALL register
    iommu/arm-smmu: make use of generic LPAE allocator
    iommu: io-pgtable-arm: add non-secure quirk
    ...

    Linus Torvalds
     
  • Pull ARM updates from Russell King:

    - clang assembly fixes from Ard

    - optimisations and cleanups for Aurora L2 cache support

    - efficient L2 cache support for secure monitor API on Exynos SoCs

    - debug menu cleanup from Daniel Thompson to allow better behaviour for
    multiplatform kernels

    - StrongARM SA11x0 conversion to irq domains, and pxa_timer

    - kprobes updates for older ARM CPUs

    - move probes support out of arch/arm/kernel to arch/arm/probes

    - add inline asm support for the rbit (reverse bits) instruction

    - provide an ARM mode secondary CPU entry point (for Qualcomm CPUs)

    - remove the unused ARMv3 user access code

    - add driver_override support to AMBA Primecell bus

    * 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (55 commits)
    ARM: 8256/1: driver coamba: add device binding path 'driver_override'
    ARM: 8301/1: qcom: Use secondary_startup_arm()
    ARM: 8302/1: Add a secondary_startup that assumes ARM mode
    ARM: 8300/1: teach __asmeq that r11 == fp and r12 == ip
    ARM: kprobes: Fix compilation error caused by superfluous '*'
    ARM: 8297/1: cache-l2x0: optimize aurora range operations
    ARM: 8296/1: cache-l2x0: clean up aurora cache handling
    ARM: 8284/1: sa1100: clear RCSR_SMR on resume
    ARM: 8283/1: sa1100: collie: clear PWER register on machine init
    ARM: 8282/1: sa1100: use handle_domain_irq
    ARM: 8281/1: sa1100: move GPIO-related IRQ code to gpio driver
    ARM: 8280/1: sa1100: switch to irq_domain_add_simple()
    ARM: 8279/1: sa1100: merge both GPIO irqdomains
    ARM: 8278/1: sa1100: split irq handling for low GPIOs
    ARM: 8291/1: replace magic number with PAGE_SHIFT macro in fixup_pv code
    ARM: 8290/1: decompressor: fix a wrong comment
    ARM: 8286/1: mm: Fix dma_contiguous_reserve comment
    ARM: 8248/1: pm: remove outdated comment
    ARM: 8274/1: Fix DEBUG_LL for multi-platform kernels (without PL01X)
    ARM: 8273/1: Seperate DEBUG_UART_PHYS from DEBUG_LL on EP93XX
    ...

    Linus Torvalds
     
  • Pull AVR32 update from Hans-Christian Egtvedt.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32:
    avr32: update all default configurations
    avr32: remove fake at91 cpu identification
    avr32: wire up missing syscalls

    Linus Torvalds
     

12 Feb, 2015

17 commits

  • Merge second set of updates from Andrew Morton:
    "More of MM"

    * emailed patches from Andrew Morton: (83 commits)
    mm/nommu.c: fix arithmetic overflow in __vm_enough_memory()
    mm/mmap.c: fix arithmetic overflow in __vm_enough_memory()
    vmstat: Reduce time interval to stat update on idle cpu
    mm/page_owner.c: remove unnecessary stack_trace field
    Documentation/filesystems/proc.txt: describe /proc//map_files
    mm: incorporate read-only pages into transparent huge pages
    vmstat: do not use deferrable delayed work for vmstat_update
    mm: more aggressive page stealing for UNMOVABLE allocations
    mm: always steal split buddies in fallback allocations
    mm: when stealing freepages, also take pages created by splitting buddy page
    mincore: apply page table walker on do_mincore()
    mm: /proc/pid/clear_refs: avoid split_huge_page()
    mm: pagewalk: fix misbehavior of walk_page_range for vma(VM_PFNMAP)
    mempolicy: apply page table walker on queue_pages_range()
    arch/powerpc/mm/subpage-prot.c: use walk->vma and walk_page_vma()
    memcg: cleanup preparation for page table walk
    numa_maps: remove numa_maps->vma
    numa_maps: fix typo in gather_hugetbl_stats
    pagemap: use walk->vma instead of calling find_vma()
    clear_refs: remove clear_refs_private->vma and introduce clear_refs_test_walk()
    ...

    Linus Torvalds
     
  • Pull powerpc updates from Michael Ellerman:

    - Update of all defconfigs

    - Addition of a bunch of config options to modernise our defconfigs

    - Some PS3 updates from Geoff

    - Optimised memcmp for 64 bit from Anton

    - Fix for kprobes that allows 'perf probe' to work from Naveen

    - Several cxl updates from Ian & Ryan

    - Expanded support for the '24x7' PMU from Cody & Sukadev

    - Freescale updates from Scott:
    "Highlights include 8xx optimizations, some more work on datapath
    device tree content, e300 machine check support, t1040 corenet
    error reporting, and various cleanups and fixes"

    * tag 'powerpc-3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (102 commits)
    cxl: Add missing return statement after handling AFU errror
    cxl: Fail AFU initialisation if an invalid configuration record is found
    cxl: Export optional AFU configuration record in sysfs
    powerpc/mm: Warn on flushing tlb page in kernel context
    powerpc/powernv: Add OPAL soft-poweroff routine
    powerpc/perf/hv-24x7: Document sysfs event description entries
    powerpc/perf/hv-gpci: add the remaining gpci requests
    powerpc/perf/{hv-gpci, hv-common}: generate requests with counters annotated
    powerpc/perf/hv-24x7: parse catalog and populate sysfs with events
    perf: define EVENT_DEFINE_RANGE_FORMAT_LITE helper
    perf: add PMU_EVENT_ATTR_STRING() helper
    perf: provide sysfs_show for struct perf_pmu_events_attr
    powerpc/kernel: Avoid initializing device-tree pointer twice
    powerpc: Remove old compile time disabled syscall tracing code
    powerpc/kernel: Make syscall_exit a local label
    cxl: Fix device_node reference counting
    powerpc/mm: bail out early when flushing TLB page
    powerpc: defconfigs: add MTD_SPI_NOR (new dependency for M25P80)
    perf/powerpc: reset event hw state when adding it to the PMU
    powerpc/qe: Use strlcpy()
    ...

    Linus Torvalds
     
  • Pull arm64 updates from Catalin Marinas:
    "arm64 updates for 3.20:

    - reimplementation of the virtual remapping of UEFI Runtime Services
    in a way that is stable across kexec
    - emulation of the "setend" instruction for 32-bit tasks (user
    endianness switching trapped in the kernel, SCTLR_EL1.E0E bit set
    accordingly)
    - compat_sys_call_table implemented in C (from asm) and made it a
    constant array together with sys_call_table
    - export CPU cache information via /sys (like other architectures)
    - DMA API implementation clean-up in preparation for IOMMU support
    - macros clean-up for KVM
    - dropped some unnecessary cache+tlb maintenance
    - CONFIG_ARM64_CPU_SUSPEND clean-up
    - defconfig update (CPU_IDLE)

    The EFI changes going via the arm64 tree have been acked by Matt
    Fleming. There is also a patch adding sys_*stat64 prototypes to
    include/linux/syscalls.h, acked by Andrew Morton"

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (47 commits)
    arm64: compat: Remove incorrect comment in compat_siginfo
    arm64: Fix section mismatch on alloc_init_p[mu]d()
    arm64: Avoid breakage caused by .altmacro in fpsimd save/restore macros
    arm64: mm: use *_sect to check for section maps
    arm64: drop unnecessary cache+tlb maintenance
    arm64:mm: free the useless initial page table
    arm64: Enable CPU_IDLE in defconfig
    arm64: kernel: remove ARM64_CPU_SUSPEND config option
    arm64: make sys_call_table const
    arm64: Remove asm/syscalls.h
    arm64: Implement the compat_sys_call_table in C
    syscalls: Declare sys_*stat64 prototypes if __ARCH_WANT_(COMPAT_)STAT64
    compat: Declare compat_sys_sigpending and compat_sys_sigprocmask prototypes
    arm64: uapi: expose our struct ucontext to the uapi headers
    smp, ARM64: Kill SMP single function call interrupt
    arm64: Emulate SETEND for AArch32 tasks
    arm64: Consolidate hotplug notifier for instruction emulation
    arm64: Track system support for mixed endian EL0
    arm64: implement generic IOMMU configuration
    arm64: Combine coherent and non-coherent swiotlb dma_ops
    ...

    Linus Torvalds
     
  • Pull s390 updates from Martin Schwidefsky:

    - The remaining patches for the z13 machine support: kernel build
    option for z13, the cache synonym avoidance, SMT support,
    compare-and-delay for spinloops and the CEX5S crypto adapter.

    - The ftrace support for function tracing with the gcc hotpatch option.
    This touches common code Makefiles, Steven is ok with the changes.

    - The hypfs file system gets an extension to access diagnose 0x0c data
    in user space for performance analysis for Linux running under z/VM.

    - The iucv hvc console gets wildcard support for the user id filtering.

    - The cacheinfo code is converted to use the generic infrastructure.

    - Cleanup and bug fixes.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (42 commits)
    s390/process: free vx save area when releasing tasks
    s390/hypfs: Eliminate hypfs interval
    s390/hypfs: Add diagnose 0c support
    s390/cacheinfo: don't use smp_processor_id() in preemptible context
    s390/zcrypt: fixed domain scanning problem (again)
    s390/smp: increase maximum value of NR_CPUS to 512
    s390/jump label: use different nop instruction
    s390/jump label: add sanity checks
    s390/mm: correct missing space when reporting user process faults
    s390/dasd: cleanup profiling
    s390/dasd: add locking for global_profile access
    s390/ftrace: hotpatch support for function tracing
    ftrace: let notrace function attribute disable hotpatching if necessary
    ftrace: allow architectures to specify ftrace compile options
    s390: reintroduce diag 44 calls for cpu_relax()
    s390/zcrypt: Add support for new crypto express (CEX5S) adapter.
    s390/zcrypt: Number of supported ap domains is not retrievable.
    s390/spinlock: add compare-and-delay to lock wait loops
    s390/tape: remove redundant if statement
    s390/hvc_iucv: add simple wildcard matches to the iucv allow filter
    ...

    Linus Torvalds
     
  • We don't have to use mm_walk->private to pass vma to the callback function
    because of mm_walk->vma. And walk_page_vma() is useful if we walk over a
    single vma.

    Signed-off-by: Naoya Horiguchi
    Acked-by: Kirill A. Shutemov
    Cc: "Kirill A. Shutemov"
    Cc: Andrea Arcangeli
    Cc: Cyrill Gorcunov
    Cc: Dave Hansen
    Cc: Pavel Emelyanov
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • This allows the get_user_pages_fast slow path to release the mmap_sem
    before blocking.

    Signed-off-by: Andrea Arcangeli
    Reviewed-by: Kirill A. Shutemov
    Cc: Andres Lagar-Cavilla
    Cc: Peter Feiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • The problem is that we check nr_ptes/nr_pmds in exit_mmap(), which happens
    *before* pgd_free(). If an arch allocates pte/pmd tables in pgd_alloc()
    and frees them in pgd_free(), the counters are still offset at the time of
    the checks.

    We tried to work around this by offsetting the expected counter value
    according to FIRST_USER_ADDRESS for both nr_ptes and nr_pmds in
    exit_mmap(). But it doesn't work in some cases:

    1. ARM with LPAE enabled also has non-zero USER_PGTABLES_CEILING, but the
    upper addresses are occupied by huge pmd entries, so the trick of
    offsetting the expected counter value gets really ugly: we would have
    to apply it to nr_pmds, but not to nr_ptes.

    2. Metag has non-zero FIRST_USER_ADDRESS, but doesn't allocate pte/pmd
    page tables in pgd_alloc(); it just sets up a pgd entry which is
    allocated at boot and shared across all processes.

    The proposal is to move the check to check_mm() which happens *after*
    pgd_free() and do proper accounting during pgd_alloc() and pgd_free()
    which would bring counters to zero if nothing leaked.
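
    The ordering problem can be sketched with a toy counter. Everything
    here (struct toy_mm and the toy_* functions) is hypothetical and only
    models the bookkeeping: a check made before the pgd_free() step still
    sees the tables that the pgd_alloc() step accounted, while a check made
    after it sees zero.

```c
#include <assert.h>

/* Toy stand-in for the per-mm page table counter. */
struct toy_mm {
    long nr_pmds;
};

/* Model of an arch that pre-allocates pmd tables in pgd_alloc(). */
static void toy_pgd_alloc(struct toy_mm *mm, long prealloc)
{
    mm->nr_pmds += prealloc;
}

/* The matching release, which runs after exit_mmap() in this model. */
static void toy_pgd_free(struct toy_mm *mm, long prealloc)
{
    mm->nr_pmds -= prealloc;
}

/* Counter value at the old check point, before pgd_free(). */
static long toy_count_before_pgd_free(long prealloc)
{
    struct toy_mm mm = { 0 };

    toy_pgd_alloc(&mm, prealloc);
    mm.nr_pmds += 3; /* tables for user mappings are accounted ... */
    mm.nr_pmds -= 3; /* ... and unaccounted when the mappings go away */
    return mm.nr_pmds;
}

/* Counter value at the new check point, after pgd_free(). */
static long toy_count_after_pgd_free(long prealloc)
{
    struct toy_mm mm = { 0 };

    toy_pgd_alloc(&mm, prealloc);
    toy_pgd_free(&mm, prealloc);
    return mm.nr_pmds;
}
```

    With four pre-allocated tables, the early check would see 4 and the
    late check 0, which is why the late check needs no per-arch offset.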

    Signed-off-by: Kirill A. Shutemov
    Reported-by: Tyler Baker
    Tested-by: Tyler Baker
    Tested-by: Nishanth Menon
    Cc: Russell King
    Cc: James Hogan
    Cc: Guan Xuetao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Dave noticed that an unprivileged process can allocate a significant
    amount of memory -- >500 MiB on x86_64 -- and stay unnoticed by the
    oom-killer and the memory cgroup. The trick is to allocate a lot of PMD
    page tables. The Linux kernel doesn't account PMD tables to the process,
    only PTE tables.

    The use-case below uses a few tricks to allocate a lot of PMD page tables
    while keeping VmRSS and VmPTE low. oom_score for the process will be 0.

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/prctl.h>

    #define PUD_SIZE (1UL << 30)
    #define PMD_SIZE (1UL << 21)

    #define NR_PUD 130000

    int main(void)
    {
        char *addr = NULL;
        unsigned long i;

        /* Disable transparent huge pages for this process. */
        prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0);
        for (i = 0; i < NR_PUD ; i++) {
            /* Map one PUD-sized region, hinted one PUD above the last. */
            addr = mmap(addr + PUD_SIZE, PUD_SIZE, PROT_WRITE|PROT_READ,
                        MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
            if (addr == MAP_FAILED) {
                perror("mmap");
                break;
            }
            /* Touch it so the page tables covering it get allocated. */
            *addr = 'x';
            /* Unmap the first PMD_SIZE bytes, then map them again untouched. */
            munmap(addr, PMD_SIZE);
            addr = mmap(addr, PMD_SIZE, PROT_WRITE|PROT_READ,
                        MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED, -1, 0);
            if (addr == MAP_FAILED) {
                perror("re-mmap");
                exit(1);
            }
        }
        printf("PID %d consumed %lu KiB in PMD page tables\n",
               getpid(), i * 4096 >> 10);
        return pause();
    }
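
    The ">500 MiB" figure follows from the constants in the program: it
    reports 4096 bytes of PMD table per loop iteration, over 130000
    iterations. A small sketch of that arithmetic (the helper name is made
    up for illustration):

```c
#include <assert.h>

/* KiB consumed by nr_tables page tables of table_bytes bytes each. */
static unsigned long toy_pmd_tables_kib(unsigned long nr_tables,
                                        unsigned long table_bytes)
{
    return nr_tables * table_bytes >> 10;
}

/* Usage example with the constants from the reproducer. */
static void toy_pmd_tables_selftest(void)
{
    unsigned long kib = toy_pmd_tables_kib(130000, 4096);

    assert(kib == 520000);     /* 130000 * 4 KiB */
    assert((kib >> 10) == 507); /* whole MiB, rounded down */
}
```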

    The patch addresses the issue by accounting PMD tables to the process the
    same way we account PTE tables.

    The main places where PMD tables are accounted are __pmd_alloc() and
    free_pmd_range(). But there are a few corner cases:

    - HugeTLB can share PMD page tables. The patch handles this by accounting
    the table to all processes that share it.

    - x86 PAE pre-allocates a few PMD tables on fork.

    - Architectures with FIRST_USER_ADDRESS > 0. We need to adjust the sanity
    check on exit(2).

    Accounting only happens on configurations where the PMD page table level
    is present (PMD is not folded). As with nr_ptes, we use a per-mm counter.
    The counter value is used to calculate the baseline for the badness score
    by the oom-killer.

    Signed-off-by: Kirill A. Shutemov
    Reported-by: Dave Hansen
    Cc: Hugh Dickins
    Reviewed-by: Cyrill Gorcunov
    Cc: Pavel Emelyanov
    Cc: David Rientjes
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • ARM uses a custom implementation of PMD folding in the 2-level page table
    case. Generic code expects to see __PAGETABLE_PMD_FOLDED defined if PMD
    is folded, but ARM doesn't do this. Let's fix it.

    Defining __PAGETABLE_PMD_FOLDED will drop out unused __pmd_alloc(). It
    also fixes problems with recently-introduced pmd accounting on ARM without
    LPAE.

    Signed-off-by: Kirill A. Shutemov
    Reported-by: Nishanth Menon
    Reported-by: Simon Horman
    Tested-by: Simon Horman
    Tested-by: Fabio Estevam
    Tested-by: Felipe Balbi
    Tested-by: Nishanth Menon
    Tested-by: Peter Ujfalusi
    Tested-by: Krzysztof Kozlowski
    Tested-by: Geert Uytterhoeven
    Cc: Dave Hansen
    Cc: Hugh Dickins
    Cc: Cyrill Gorcunov
    Cc: Pavel Emelyanov
    Cc: David Rientjes
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • LKP has triggered a compiler warning after my recent patch "mm: account
    pmd page tables to the process":

    mm/mmap.c: In function 'exit_mmap':
    >> mm/mmap.c:2857:2: warning: right shift count >= width of type [enabled by default]

    The code:

    > 2857 WARN_ON(mm_nr_pmds(mm) >
      2858         round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT);

    In this, on tile, we have FIRST_USER_ADDRESS defined as 0. round_up() has
    the same type -- int. PUD_SHIFT.

    I think the best way to fix it is to define FIRST_USER_ADDRESS as unsigned
    long. On every arch for consistency.
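
    To see why the type of FIRST_USER_ADDRESS matters, here is a small
    sketch with rounding macros modelled on the kernel's round_up(). The
    TOY_* macros, the toy_* functions and the TOY_PUD_SIZE value are
    assumptions for illustration, and __typeof__ is a GCC/Clang extension.
    The result takes the type of the first argument, so a plain 0 gives an
    int-wide value, and the compiler warns when it is shifted by a count
    that is at least that width.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Toy rounding macros: the result has the type of x. */
#define TOY_ROUND_MASK(x, y) ((__typeof__(x))((y) - 1))
#define TOY_ROUND_UP(x, y)   ((((x) - 1) | TOY_ROUND_MASK(x, y)) + 1)

#define TOY_PUD_SIZE (1UL << 30) /* illustrative value */

/* Width in bits of the rounded value when the address is a plain 0. */
static size_t toy_width_with_int_zero(void)
{
    return sizeof(TOY_ROUND_UP(0, TOY_PUD_SIZE)) * CHAR_BIT;
}

/* Width in bits of the rounded value when the address is 0UL. */
static size_t toy_width_with_ulong_zero(void)
{
    return sizeof(TOY_ROUND_UP(0UL, TOY_PUD_SIZE)) * CHAR_BIT;
}

/* Usage example: the width follows the type of the first argument. */
static void toy_width_selftest(void)
{
    assert(toy_width_with_int_zero() == sizeof(int) * CHAR_BIT);
    assert(toy_width_with_ulong_zero() == sizeof(unsigned long) * CHAR_BIT);
}
```

    Defining the constant as 0UL makes the whole expression unsigned long,
    which is the effect the patch is after.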

    Signed-off-by: Kirill A. Shutemov
    Reported-by: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Microblaze uses a custom implementation of PMD folding, but doesn't define
    __PAGETABLE_PMD_FOLDED, which generic code expects to see. Let's fix it.

    Defining __PAGETABLE_PMD_FOLDED will drop out unused __pmd_alloc(). It
    also fixes problems with recently-introduced pmd accounting.

    Signed-off-by: Kirill A. Shutemov
    Reported-by: Guenter Roeck
    Tested-by: Guenter Roeck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • 32-bit sparc uses the swap instruction to implement set_pte(). It is
    called using GCC inline assembler, but it misses the "memory" clobber to
    indicate that the pte value will be updated in memory.

    As a result, GCC doesn't know that it cannot postpone a pte pointer
    dereference which occurs before set_pte() to post-set_pte() time.

    It leads to real-world bugs -- [1]. In this situation we have code:

    ptent = ptep_modify_prot_start(mm, addr, pte);
    ptent = pte_modify(ptent, newprot);
    ...
    ptep_modify_prot_commit(mm, addr, pte, ptent);

    ptep_modify_prot_start() in the sparc case is just a 'pte' dereference
    plus pte_clear(). pte_clear() calls the broken set_pte(). GCC thinks it's
    valid to dereference 'pte' again in pte_modify() and gets the cleared pte.
    ptep_modify_prot_commit() puts 'ptent' with pfn==0 back into the page
    table, which eventually leads to the crash.

    [1] http://lkml.kernel.org/r/54C06B19.8060305@roeck-us.net

    Signed-off-by: Kirill A. Shutemov
    Reported-by: Guenter Roeck
    Tested-by: Guenter Roeck
    Cc: Paul Moore
    Cc: Joonsoo Kim
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Migrating hugepages and hwpoisoned hugepages are considered as non-present
    hugepages, and they are referenced via migration entries and hwpoison
    entries in their page table slots.

    This behavior causes a race condition because pmd_huge() doesn't tell
    non-huge pages from migrating/hwpoisoned hugepages. follow_page_mask() is
    one example where the kernel would call follow_page_pte() for such a
    hugepage while this function is supposed to handle only normal pages.

    To avoid this, this patch makes pmd_huge() return true when pmd_none() is
    false *and* pmd_present() is false. We don't have to worry about mixing up
    a non-present pmd entry with a normal pmd (pointing to a leaf-level pte
    entry) because pmd_present() is true for a normal pmd.
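
    A toy version of such a predicate, assuming the intent is "the entry is
    not empty and is not a normal present table pointer". The toy_* names
    and bit positions are illustrative assumptions and do not come from any
    architecture's headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only. */
#define TOY_PAGE_PRESENT (UINT64_C(1) << 0)
#define TOY_PAGE_PSE     (UINT64_C(1) << 7) /* "this entry maps a huge page" */

typedef struct { uint64_t val; } toy_pmd_t;

static bool toy_pmd_none(toy_pmd_t pmd)
{
    return pmd.val == 0;
}

/*
 * Huge if the entry is non-empty and is anything other than a plain
 * present table pointer: either present with PSE set, or non-present
 * (for example a migration or hwpoison entry).
 */
static bool toy_pmd_huge(toy_pmd_t pmd)
{
    return !toy_pmd_none(pmd) &&
           (pmd.val & (TOY_PAGE_PRESENT | TOY_PAGE_PSE)) != TOY_PAGE_PRESENT;
}

/* Usage example covering the four kinds of entry. */
static void toy_pmd_huge_selftest(void)
{
    toy_pmd_t empty = { 0 };
    toy_pmd_t table = { UINT64_C(0x5000) | TOY_PAGE_PRESENT };
    toy_pmd_t huge = { UINT64_C(0x200000) | TOY_PAGE_PRESENT | TOY_PAGE_PSE };
    toy_pmd_t migrating = { UINT64_C(0x1234) << 12 }; /* non-present, non-empty */

    assert(!toy_pmd_huge(empty));
    assert(!toy_pmd_huge(table));
    assert(toy_pmd_huge(huge));
    assert(toy_pmd_huge(migrating));
}
```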

    The same race condition could happen in (x86-specific) gup_pmd_range(),
    where this patch simply adds a pmd_present() check instead of pmd_huge().
    This is because gup_pmd_range() is a fast path. If we have a non-present
    hugepage in this function, we will go into gup_huge_pmd(), then return 0
    at the flag mask check, and finally fall back to the slow path.

    Fixes: 290408d4a2 ("hugetlb: hugepage migration core")
    Signed-off-by: Naoya Horiguchi
    Cc: Hugh Dickins
    Cc: James Hogan
    Cc: David Rientjes
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Rik van Riel
    Cc: Andrea Arcangeli
    Cc: Luiz Capitulino
    Cc: Nishanth Aravamudan
    Cc: Lee Schermerhorn
    Cc: Steve Capper
    Cc: [2.6.36+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • Currently we have many duplicates in definitions around
    follow_huge_addr(), follow_huge_pmd(), and follow_huge_pud(), so this
    patch tries to remove them. The basic idea is to put the default
    implementation for these functions in mm/hugetlb.c as weak symbols
    (regardless of CONFIG_ARCH_WANT_GENERAL_HUGETLB), and to implement
    arch-specific code only when the arch needs it.

    For follow_huge_addr(), only powerpc and ia64 have their own
    implementation, and in all other architectures this function just returns
    ERR_PTR(-EINVAL). So this patch sets returning ERR_PTR(-EINVAL) as
    default.
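
    The weak-symbol pattern can be sketched in plain C with the GCC/Clang
    weak attribute. The toy_* names below are hypothetical stand-ins, not
    the kernel's types or helpers: the default returns an error encoded in
    the pointer, and an architecture that needs something else would
    provide a normal (strong) definition of the same function in another
    file, which the linker then prefers.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct toy_page; /* opaque stand-in for a page descriptor */

/* Encode a negative errno value in a pointer (toy ERR_PTR-style helper). */
static inline struct toy_page *toy_err_ptr(long error)
{
    return (struct toy_page *)(intptr_t)error;
}

/* Decode the errno value from such a pointer. */
static inline long toy_ptr_err(const struct toy_page *ptr)
{
    return (long)(intptr_t)ptr;
}

/*
 * Default implementation, marked weak: used unless another object file
 * supplies a strong definition with the same name.
 */
__attribute__((weak))
struct toy_page *toy_follow_huge_addr(unsigned long address, int write)
{
    (void)address;
    (void)write;
    return toy_err_ptr(-EINVAL);
}

/* Usage example: with no override linked in, the default is called. */
static void toy_weak_selftest(void)
{
    assert(toy_ptr_err(toy_follow_huge_addr(0x1000UL, 0)) == -EINVAL);
}
```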

    As for follow_huge_(pmd|pud)(), if (pmd|pud)_huge() is implemented to
    always return 0 in your architecture (as in ia64 or sparc), it's never
    called (the callsite is optimized away) no matter how it is implemented.
    So in such architectures, we don't need an arch-specific implementation.

    In some architectures (like mips, s390 and tile), their current
    arch-specific follow_huge_(pmd|pud)() are effectively identical to the
    common code, so this patch lets these architectures use the common code.

    One exception is metag, where pmd_huge() could return non-zero but it
    expects follow_huge_pmd() to always return NULL. This means that we need
    an arch-specific implementation which returns NULL. This behavior looks
    strange to me (because non-zero pmd_huge() implies that the architecture
    supports PMD-based hugepages, so follow_huge_pmd() can/should return some
    relevant value), but that's beyond this cleanup patch, so let's keep it.

    Justification of non-trivial changes:
    - in s390, follow_huge_pmd() checks !MACHINE_HAS_HPAGE at first, and this
    patch removes the check. This is OK because we can assume MACHINE_HAS_HPAGE
    is true when follow_huge_pmd() can be called (note that pmd_huge() has
    the same check and always returns 0 for !MACHINE_HAS_HPAGE.)
    - in s390 and mips, we use HPAGE_MASK instead of PMD_MASK as done in common
    code. This patch forces these archs to use PMD_MASK, but it's OK because
    they are identical in both archs.
    In s390, both HPAGE_SHIFT and PMD_SHIFT are 20.
    In mips, HPAGE_SHIFT is defined as (PAGE_SHIFT + PAGE_SHIFT - 3) and
    PMD_SHIFT is defined as (PAGE_SHIFT + PAGE_SHIFT + PTE_ORDER - 3), but
    PTE_ORDER is always 0, so these are identical.

    Signed-off-by: Naoya Horiguchi
    Acked-by: Hugh Dickins
    Cc: James Hogan
    Cc: David Rientjes
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Rik van Riel
    Cc: Andrea Arcangeli
    Cc: Luiz Capitulino
    Cc: Nishanth Aravamudan
    Cc: Lee Schermerhorn
    Cc: Steve Capper
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • Pull MMC updates from Ulf Hansson:
    "MMC core:
    - Support for MMC power sequences.
    - SDIO function devicetree subnode parsing.
    - Refactor the hardware reset routines and enable it for SD cards.
    - Various code quality improvements, especially for slot-gpio.

    MMC host:
    - dw_mmc: Various fixes and cleanups.
    - dw_mmc: Convert to mmc_send_tuning().
    - moxart: Fix probe logic.
    - sdhci: Various fixes and cleanups
    - sdhci: Asynchronous request handling support.
    - sdhci-pxav3: Various fixes and cleanups.
    - sdhci-tegra: Fixes for T114, T124 and T132.
    - rtsx: Various fixes and cleanups.
    - rtsx: Support for SDIO.
    - sdhi/tmio: Refactor and cleanup of header files.
    - omap_hsmmc: Use slot-gpio and common MMC DT parser.
    - Make all hosts deal with errors from mmc_of_parse().
    - sunxi: Various fixes and cleanups.
    - sdhci: Support for Fujitsu SDHCI controller f_sdh30"

    * tag 'mmc-v3.20-1' of git://git.linaro.org/people/ulf.hansson/mmc: (117 commits)
    mmc: sdhci-s3c: solve problem with sleeping in atomic context
    mmc: pwrseq: add driver for emmc hardware reset
    mmc: moxart: fix probe logic
    mmc: core: Invoke mmc_pwrseq_post_power_on() prior MMC_POWER_ON state
    mmc: pwrseq_simple: Add optional reference clock support
    mmc: pwrseq: Document optional clock for the simple power sequence
    mmc: pwrseq_simple: Extend to support more pins
    mmc: pwrseq: Document that simple sequence support more than one GPIO
    mmc: Add hardware dependencies for sdhci-pxav3 and sdhci-pxav2
    mmc: sdhci-pxav3: Modify clock settings for the SDR50 and DDR50 modes
    mmc: sdhci-pxav3: Extend binding with SDIO3 conf reg for the Armada 38x
    mmc: sdhci-pxav3: Fix Armada 38x controller's caps according to erratum ERR-7878951
    mmc: sdhci-pxav3: Fix SDR50 and DDR50 capabilities for the Armada 38x flavor
    mmc: sdhci: switch voltage before sdhci_set_ios in runtime resume
    mmc: tegra: Write xfer_mode, CMD regs in together
    mmc: Resolve BKOPS compatability issue
    mmc: sdhci-pxav3: fix setting of pdata->clk_delay_cycles
    mmc: dw_mmc: rockchip: remove incorrect __exit_p()
    mmc: dw_mmc: exynos: remove incorrect __exit_p()
    mmc: Fix menuconfig alignment of MMC_SDHCI_* options
    ...

    Linus Torvalds
     
  • Pull input updates from Dmitry Torokhov:
    "The first round of updates for the input subsystem.

    A few new drivers (power button handler for AXP20x PMIC, tps65218
    power button driver, sun4i keys driver, regulator haptic driver, NI
    Ettus Research USRP E3x0 button, Allwinner A10/A20 PS/2 controller).

    Updates to Synaptics and ALPS touchpad drivers (with more to come
    later), brand new Focaltech PS/2 support, update to Cypress driver to
    handle Gen5 (in addition to Gen3) devices, and a number of other fixups
    to various drivers as well as input core"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (54 commits)
    Input: elan_i2c - fix wrong %p extension
    Input: evdev - do not queue SYN_DROPPED if queue is empty
    Input: gscps2 - fix MODULE_DEVICE_TABLE invocation
    Input: synaptics - use dmax in input_mt_assign_slots
    Input: pxa27x_keypad - remove unnecessary ARM includes
    Input: ti_am335x_tsc - replace delta filtering with median filtering
    ARM: dts: AM335x: Make charge delay a DT parameter for TSC
    Input: ti_am335x_tsc - read charge delay from DT
    Input: ti_am335x_tsc - remove udelay in interrupt handler
    Input: ti_am335x_tsc - interchange touchscreen and ADC steps
    Input: MT - add support for balanced slot assignment
    Input: drv2667 - remove wrong and unneeded drv2667-haptics modalias
    Input: drv260x - remove wrong and unneeded drv260x-haptics modalias
    Input: cap11xx - remove wrong and unneeded cap11xx modalias
    Input: sun4i-ts - add support for touchpanel controller on A31
    Input: serio - add support for Alwinner A10/A20 PS/2 controller
    Input: gtco - use sign_extend32() for sign extension
    Input: elan_i2c - verify firmware signature applying it
    Input: elantech - remove stale comment from Kconfig
    Input: cyapa - off by one in cyapa_update_fw_store()
    ...

    Linus Torvalds
     
  • Pull sound updates from Takashi Iwai:
    "In this batch, you can find lots of cleanups through the whole
    subsystem, as our good New Year's resolution. Lots of LOCs and
    commits are about LINE6 driver that was promoted finally from staging
    tree, and as usual, there've been widely spread ASoC changes.

    Here are some highlights:

    ALSA core changes
    - Embedding struct device into ALSA core structures
    - sequencer core cleanups / fixes
    - PCM msbits constraints cleanups / fixes
    - New SNDRV_PCM_TRIGGER_DRAIN command
    - PCM kerneldoc fixes, header cleanups
    - PCM code cleanups using more standard codes
    - Control notification ID fixes

    Driver cleanups
    - Cleanups of PCI PM callbacks
    - Timer helper usages cleanups
    - Simplification (e.g. argument reduction) of many driver codes

    HD-audio
    - Hotkey and LED support on HP laptops with Realtek codecs
    - Dock station support on HP laptops
    - Toshiba Satellite S50D fixup
    - Enhanced wallclock timestamp handling for HD-audio
    - Componentization to simplify the linkage between i915 and hd-audio
    drivers for Intel HDMI/DP

    USB-audio
    - Akai MPC Element support
    - Enhanced timestamp handling

    ASoC
    - Lots of refactoring in ASoC core, moving drivers to more data driven
    initialization and rationalizing a lot of DAPM usage
    - Much improved handling of CDCLK clocks on Samsung I2S controllers
    - Lots of driver specific cleanups and feature improvements
    - CODEC support for TI PCM514x and TLV320AIC3104 devices
    - Board support for Tegra systems with Realtek RT5677
    - New driver for Maxim max98357a
    - More enhancements / fixes for Intel SST driver

    Others
    - Promotion of LINE6 driver from staging along with lots of rewrites
    and cleanups
    - DT support for old non-ASoC atmel driver
    - oxygen cleanups, XIO2001 init, Studio Evolution SE6x support
    - Emu8000 DRAM size detection fix on ISA(!!) AWE64 boards
    - A few more ak411x fixes for ice1724 boards"

    * tag 'sound-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (542 commits)
    ALSA: line6: toneport: Use explicit type for firmware version
    ALSA: line6: Use explicit type for serial number
    ALSA: line6: Return EIO if read/write not successful
    ALSA: line6: Return error if device not responding
    ALSA: line6: Add delay before reading status
    ASoC: Intel: Clean data after SST fw fetch
    ALSA: hda - Add docking station support for another HP machine
    ALSA: control: fix failure to return new numerical ID in 'replace' event data
    ALSA: usb: update trigger timestamp on first non-zero URB submitted
    ALSA: hda: read trigger_timestamp immediately after starting DMA
    ALSA: pcm: allow for trigger_tstamp snapshot in .trigger
    ALSA: pcm: don't override timestamp unconditionally
    ALSA: off by one bug in snd_riptide_joystick_probe()
    ASoC: rt5670: Set use_single_rw flag for regmap
    ASoC: rt286: Add rt288 codec support
    ASoC: max98357a: Fix build in !CONFIG_OF case
    ASoC: Intel: fix platform_no_drv_owner.cocci warnings
    ARM: dts: Switch Odroid X2/U2 to simple-audio-card
    ARM: dts: Exynos4 and Odroid X2/U3 sound device nodes update
    ALSA: control: fix failure to return numerical ID in 'add' event
    ...

    Linus Torvalds
     

11 Feb, 2015

12 commits

  • Pull networking updates from David Miller:

    1) More iov_iter conversion work from Al Viro.

    [ The "crypto: switch af_alg_make_sg() to iov_iter" commit was
    wrong, and this pull actually adds an extra commit on top of the
    branch I'm pulling to fix that up, so that the pre-merge state is
    ok. - Linus ]

    2) Various optimizations to the ipv4 forwarding information base trie
    lookup implementation. From Alexander Duyck.

    3) Remove sock_iocb altogether, from Christoph Hellwig.

    4) Allow congestion control algorithm selection via routing metrics.
    From Daniel Borkmann.

    5) Make ipv4 uncached route list per-cpu, from Eric Dumazet.

    6) Handle rfs hash collisions more gracefully, also from Eric Dumazet.

    7) Add xmit_more support to r8169, e1000, and e1000e drivers. From
    Florian Westphal.

    8) Transparent Ethernet Bridging support for GRO, from Jesse Gross.

    9) Add BPF packet actions to packet scheduler, from Jiri Pirko.

    10) Add support for unique flow IDs to openvswitch, from Joe Stringer.

    11) New NetCP ethernet driver, from Muralidharan Karicheri and Wingman
    Kwok.

    12) More sanely handle out-of-window dupacks, which can result in
    serious ACK storms. From Neal Cardwell.

    13) Various rhashtable bug fixes and enhancements, from Herbert Xu,
    Patrick McHardy, and Thomas Graf.

    14) Support xmit_more in be2net, from Sathya Perla.

    15) Group Policy extensions for vxlan, from Thomas Graf.

    16) Remote Checksum Offload support for vxlan, from Tom Herbert.

    17) Like ipv4, support lockless transmit over ipv6 UDP sockets. From
    Vlad Yasevich.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1494+1 commits)
    crypto: fix af_alg_make_sg() conversion to iov_iter
    ipv4: Namespecify TCP PMTU mechanism
    i40e: Fix for stats init function call in Rx setup
    tcp: don't include Fast Open option in SYN-ACK on pure SYN-data
    openvswitch: Only set TUNNEL_VXLAN_OPT if VXLAN-GBP metadata is set
    ipv6: Make __ipv6_select_ident static
    ipv6: Fix fragment id assignment on LE arches.
    bridge: Fix inability to add non-vlan fdb entry
    net: Mellanox: Delete unnecessary checks before the function call "vunmap"
    cxgb4: Add support in cxgb4 to get expansion rom version via ethtool
    ethtool: rename reserved1 memeber in ethtool_drvinfo for expansion ROM version
    net: dsa: Remove redundant phy_attach()
    IB/mlx4: Reset flow support for IB kernel ULPs
    IB/mlx4: Always use the correct port for mirrored multicast attachments
    net/bonding: Fix potential bad memory access during bonding events
    tipc: remove tipc_snprintf
    tipc: nl compat add noop and remove legacy nl framework
    tipc: convert legacy nl stats show to nl compat
    tipc: convert legacy nl net id get to nl compat
    tipc: convert legacy nl net id set to nl compat
    ...

    Linus Torvalds
     
  • Pull trivial tree changes from Jiri Kosina:
    "Patches from trivial.git that keep the world turning around.

    Mostly documentation and comment fixes, and two corner-case code
    fixes from Alan Cox"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
    kexec, Kconfig: spell "architecture" properly
    mm: fix cleancache debugfs directory path
    blackfin: mach-common: ints-priority: remove unused function
    doubletalk: probe failure causes OOPS
    ARM: cache-l2x0.c: Make it clear that cache-l2x0 handles L310 cache controller
    msdos_fs.h: fix 'fields' in comment
    scsi: aic7xxx: fix comment
    ARM: l2c: fix comment
    ibmraid: fix writeable attribute with no store method
    dynamic_debug: fix comment
    doc: usbmon: fix spelling s/unpriviledged/unprivileged/
    x86: init_mem_mapping(): use capital BIOS in comment

    Linus Torvalds
     
  • Pull live patching infrastructure from Jiri Kosina:
    "Let me provide a bit of history first, before describing what is in
    this pile.

    Originally, there was kSplice as a standalone project that implemented
    stop_machine()-based patching for the Linux kernel. This project was
    later acquired, and the current owner is providing live patching as a
    proprietary service, without any intentions to have their
    implementation merged.

    Then, due to rising user/customer demand, both Red Hat and SUSE
    started working on their own implementation (not knowing about each
    other), and announced first versions roughly at the same time [1] [2].

    The principal difference between the two solutions is how they are
    making sure that the patching is performed in a consistent way when it
    comes to different execution threads with respect to the semantic
    nature of the change that is being introduced.

    In a nutshell, kPatch is issuing stop_machine(), then looking at
    stacks of all existing processes, and if it decides that the system
    is in a state that can be patched safely, it proceeds to insert code
    redirection machinery to the patched functions.

    On the other hand, kGraft provides a per-thread consistency during one
    single pass of a process through the kernel and performs a lazy
    continuous migration of threads from "unpatched" universe to the
    "patched" one at safe checkpoints.

    If interested in a more detailed discussion about the consistency
    models and their possible combinations, please see the thread that
    evolved around [3].

    It pretty quickly became obvious to the interested parties that it's
    absolutely impractical in this case to have several isolated solutions
    for one task to co-exist in the kernel. During a dedicated Live
    Kernel Patching track at LPC in Dusseldorf, all the interested parties
    sat together and came up with a joint approach that would work for both
    distro vendors. Steven Rostedt took notes [4] from this meeting.

    And the foundation for that approach is what's present in this pull
    request.

    It provides a basic infrastructure for function "live patching" (i.e.
    code redirection), including API for kernel modules containing the
    actual patches, and API/ABI for userspace to be able to operate on the
    patches (look up what patches are applied, enable/disable them, etc).

    It's relatively simple and minimalistic, as it's making use of
    existing kernel infrastructure (namely ftrace) as much as possible.
    It's also self-contained, in a sense that it doesn't hook itself in
    any other kernel subsystem (it doesn't even touch any other code).
    It's now implemented for x86 only as a reference architecture, but
    support for powerpc, s390 and arm is already in the works (adding
    arch-specific support basically boils down to teaching ftrace about
    regs-saving).

    Once this common infrastructure gets merged, both Red Hat and SUSE
    have agreed to immediately start porting their current solutions on
    top of this, abandoning their out-of-tree code. The plan basically is
    that each patch will be marked by flag(s) that would indicate which
    consistency model it is willing to use (again, the details have been
    sketched out already in the thread at [3]).

    Before this happens, the current codebase can be used to patch a large
    group of security/stability problems the patches for which are not too
    complex (in a sense that they don't introduce non-trivial change of
    function's return value semantics, they don't change layout of data
    structures, etc) -- this corresponds to LEAVE_FUNCTION &&
    SWITCH_FUNCTION semantics described at [3].

    This tree has been in linux-next since December.

    [1] https://lkml.org/lkml/2014/4/30/477
    [2] https://lkml.org/lkml/2014/7/14/857
    [3] https://lkml.org/lkml/2014/11/7/354
    [4] http://linuxplumbersconf.org/2014/wp-content/uploads/2014/10/LPC2014_LivePatching.txt

    [ The core code is introduced by the three commits authored by Seth
    Jennings, which got a lot of changes incorporated during numerous
    respins and reviews of the initial implementation. All the followup
    commits have materialized only after public tree has been created,
    so they were not folded into initial three commits so that the
    public tree doesn't get rebased ]"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
    livepatch: add missing newline to error message
    livepatch: rename config to CONFIG_LIVEPATCH
    livepatch: fix uninitialized return value
    livepatch: support for repatching a function
    livepatch: enforce patch stacking semantics
    livepatch: change ARCH_HAVE_LIVE_PATCHING to HAVE_LIVE_PATCHING
    livepatch: fix deferred module patching order
    livepatch: handle ancient compilers with more grace
    livepatch: kconfig: use bool instead of boolean
    livepatch: samples: fix usage example comments
    livepatch: MAINTAINERS: add git tree location
    livepatch: use FTRACE_OPS_FL_IPMODIFY
    livepatch: move x86 specific ftrace handler code to arch/x86
    livepatch: samples: add sample live patching module
    livepatch: kernel: add support for live patching
    livepatch: kernel: add TAINT_LIVEPATCH

    Linus Torvalds
     
  • Merge misc updates from Andrew Morton:
    "Bite-sized chunks this time, to avoid the MTA ratelimiting woes.

    - fs/notify updates

    - ocfs2

    - some of MM"

    That laconic "some MM" is mainly the removal of remap_file_pages(),
    which is a big simplification of the VM, and which gets rid of a *lot*
    of random cruft and special cases because we no longer support the
    non-linear mappings that it used.

    From a user interface perspective, nothing has changed, because the
    remap_file_pages() syscall still exists, it's just done by emulating the
    old behavior by creating a lot of individual small mappings instead of
    one non-linear one.

    The emulation is slower than the old "native" non-linear mappings, but
    nobody really uses or cares about remap_file_pages(), and simplifying
    the VM is a big advantage.
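
    The emulation idea described above (many small linear mappings standing in
    for one non-linear mapping) can be sketched from userspace with plain
    mmap(2) and MAP_FIXED. This is only an illustration under assumptions, not
    the kernel's implementation: make_test_file(), map_reversed() and
    reversed_mapping_ok() are hypothetical helpers, and the temporary file
    path is arbitrary.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Create an unlinked temp file of npages pages; page i is filled with 'A' + i. */
static int make_test_file(size_t npages, size_t page)
{
    char path[] = "/tmp/remap-sketch-XXXXXX";
    int fd = mkstemp(path);
    char *buf;

    if (fd < 0)
        return -1;
    unlink(path);
    buf = malloc(page);
    if (!buf) {
        close(fd);
        return -1;
    }
    for (size_t i = 0; i < npages; i++) {
        memset(buf, 'A' + (int)i, page);
        if (write(fd, buf, page) != (ssize_t)page) {
            free(buf);
            close(fd);
            return -1;
        }
    }
    free(buf);
    return fd;
}

/*
 * Map the file's pages in reverse order into one contiguous window,
 * using one small MAP_FIXED mapping per page.
 */
static void *map_reversed(int fd, size_t npages, size_t page)
{
    /* Reserve the address range first so MAP_FIXED cannot clobber anything. */
    char *base = mmap(NULL, npages * page, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (base == MAP_FAILED)
        return MAP_FAILED;
    for (size_t i = 0; i < npages; i++) {
        off_t off = (off_t)((npages - 1 - i) * page);
        void *p = mmap(base + i * page, page, PROT_READ,
                       MAP_SHARED | MAP_FIXED, fd, off);

        if (p == MAP_FAILED) {
            munmap(base, npages * page);
            return MAP_FAILED;
        }
    }
    return base;
}

/* Returns 1 if window page i shows file page (npages - 1 - i), else 0. */
int reversed_mapping_ok(size_t npages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *base;
    int ok = 1;
    int fd;

    assert(npages > 0 && npages <= 26); /* keep fill bytes within 'A'..'Z' */
    fd = make_test_file(npages, page);
    if (fd < 0)
        return 0;
    base = map_reversed(fd, npages, page);
    close(fd); /* the mappings keep the file contents reachable */
    if (base == MAP_FAILED)
        return 0;
    for (size_t i = 0; i < npages; i++)
        if (base[i * page] != (char)('A' + (int)(npages - 1 - i)))
            ok = 0;
    munmap(base, npages * page);
    return ok;
}
```

    Each page in the window is its own mapping, which is the trade-off the
    message describes: more individual mappings to track in exchange for
    dropping the non-linear special cases.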

    * emailed patches from Andrew Morton: (78 commits)
    memcg: zap memcg_slab_caches and memcg_slab_mutex
    memcg: zap memcg_name argument of memcg_create_kmem_cache
    memcg: zap __memcg_{charge,uncharge}_slab
    mm/page_alloc.c: place zone_id check before VM_BUG_ON_PAGE check
    mm: hugetlb: fix type of hugetlb_treat_as_movable variable
    mm, hugetlb: remove unnecessary lower bound on sysctl handlers"?
    mm: memory: merge shared-writable dirtying branches in do_wp_page()
    mm: memory: remove ->vm_file check on shared writable vmas
    xtensa: drop _PAGE_FILE and pte_file()-related helpers
    x86: drop _PAGE_FILE and pte_file()-related helpers
    unicore32: drop pte_file()-related helpers
    um: drop _PAGE_FILE and pte_file()-related helpers
    tile: drop pte_file()-related helpers
    sparc: drop pte_file()-related helpers
    sh: drop _PAGE_FILE and pte_file()-related helpers
    score: drop _PAGE_FILE and pte_file()-related helpers
    s390: drop pte_file()-related helpers
    parisc: drop _PAGE_FILE and pte_file()-related helpers
    openrisc: drop _PAGE_FILE and pte_file()-related helpers
    nios2: drop _PAGE_FILE and pte_file()-related helpers
    ...

    Linus Torvalds
     
  • Pull ACPI and power management updates from Rafael Wysocki:
    "We have a few new features this time, including a new SFI-based
    cpufreq driver, a new devfreq driver for Tegra Activity Monitor, a new
    devfreq class for providing its governors with raw utilization data
    and a new ACPI driver for AMD SoCs.

    Still, the majority of changes here are reworks of existing code to
    make it more straightforward or to prepare it for implementing new
    features on top of it. The primary example is the rework of ACPI
    resources handling from Jiang Liu, Thomas Gleixner and Lv Zheng with
    support for IOAPIC hotplug implemented on top of it, but there is
    quite a number of changes of this kind in the cpufreq core, ACPICA,
    ACPI EC driver, ACPI processor driver and the generic power domains
    core code too.

    The most active developer is Viresh Kumar with his cpufreq changes.

    Specifics:

    - Rework of the core ACPI resources parsing code to fix issues in it
    and make using resource offsets more convenient and consolidation
    of some resource-handling code in a couple of places that have grown
    analogous data structures and code to cover the same gap in the
    core (Jiang Liu, Thomas Gleixner, Lv Zheng).

    - ACPI-based IOAPIC hotplug support on top of the resources handling
    rework (Jiang Liu, Yinghai Lu).

    - ACPICA update to upstream release 20150204 including an interrupt
    handling rework that allows drivers to install raw handlers for
    ACPI GPEs which then become entirely responsible for the given GPE
    and the ACPICA core code won't touch it (Lv Zheng, David E Box,
    Octavian Purdila).

    - ACPI EC driver rework to fix several concurrency issues and other
    problems related to events handling on top of the ACPICA's new
    support for raw GPE handlers (Lv Zheng).

    - New ACPI driver for AMD SoCs analogous to the LPSS (Low-Power
    Subsystem) driver for Intel chips (Ken Xue).

    - Two minor fixes of the ACPI LPSS driver (Heikki Krogerus, Jarkko
    Nikula).

    - Two new blacklist entries for machines (Samsung 730U3E/740U3E and
    510R) where the native backlight interface doesn't work correctly
    while the ACPI one does (Hans de Goede).

    - Rework of the ACPI processor driver's handling of idle states to
    make the code more straightforward and less bloated overall (Rafael
    J Wysocki).

    - Assorted minor fixes related to ACPI and SFI (Andreas Ruprecht,
    Andy Shevchenko, Hanjun Guo, Jan Beulich, Rafael J Wysocki, Yaowei
    Bai).

    - PCI core power management modification to avoid resuming (some)
    runtime-suspended devices during system suspend if they are in the
    right states already (Rafael J Wysocki).

    - New SFI-based cpufreq driver for Intel platforms using SFI
    (Srinidhi Kasagar).

    - cpufreq core fixes, cleanups and simplifications (Viresh Kumar,
    Doug Anderson, Wolfram Sang).

    - SkyLake CPU support and other updates for the intel_pstate driver
    (Kristen Carlson Accardi, Srinivas Pandruvada).

    - cpufreq-dt driver cleanup (Markus Elfring).

    - Init fix for the ARM big.LITTLE cpuidle driver (Sudeep Holla).

    - Generic power domains core code fixes and cleanups (Ulf Hansson).

    - Operating Performance Points (OPP) core code cleanups and kernel
    documentation update (Nishanth Menon).

    - New debugfs interface to make the list of PM QoS constraints
    available to user space (Nishanth Menon).

    - New devfreq driver for Tegra Activity Monitor (Tomeu Vizoso).

    - New devfreq class (devfreq_event) to provide raw utilization data
    to devfreq governors (Chanwoo Choi).

    - Assorted minor fixes and cleanups related to power management
    (Andreas Ruprecht, Krzysztof Kozlowski, Rickard Strandqvist, Pavel
    Machek, Todd E Brandt, Wonhong Kwon).

    - turbostat updates (Len Brown) and cpupower Makefile improvement
    (Sriram Raghunathan)"

    * tag 'pm+acpi-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (151 commits)
    tools/power turbostat: relax dependency on APERF_MSR
    tools/power turbostat: relax dependency on invariant TSC
    Merge branch 'pci/host-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci into acpi-resources
    tools/power turbostat: decode MSR_*_PERF_LIMIT_REASONS
    tools/power turbostat: relax dependency on root permission
    ACPI / video: Add disable_native_backlight quirk for Samsung 510R
    ACPI / PM: Remove unneeded nested #ifdef
    USB / PM: Remove unneeded #ifdef and associated dead code
    intel_pstate: provide option to only use intel_pstate with HWP
    ACPI / EC: Add GPE reference counting debugging messages
    ACPI / EC: Add query flushing support
    ACPI / EC: Refine command storm prevention support
    ACPI / EC: Add command flushing support.
    ACPI / EC: Introduce STARTED/STOPPED flags to replace BLOCKED flag
    ACPI: add AMD ACPI2Platform device support for x86 system
    ACPI / table: remove duplicate NULL check for the handler of acpi_table_parse()
    ACPI / EC: Update revision due to raw handler mode.
    ACPI / EC: Reduce ec_poll() by referencing the last register access timestamp.
    ACPI / EC: Fix several GPE handling issues by deploying ACPI_GPE_DISPATCH_RAW_HANDLER mode.
    ACPICA: Events: Enable APIs to allow interrupt/polling adaptive request based GPE handling model
    ...

    Linus Torvalds
     
  • Pull PCI changes from Bjorn Helgaas:
    "Enumeration
    - Move domain assignment from arm64 to generic code (Lorenzo Pieralisi)
    - ARM: Remove artificial dependency on pci_sys_data domain (Lorenzo Pieralisi)
    - ARM: Move to generic PCI domains (Lorenzo Pieralisi)
    - Generate uppercase hex for modalias var in uevent (Ricardo Ribalda Delgado)
    - Add and use generic config accessors on ARM, PowerPC (Rob Herring)

    Resource management
    - Free resources on failure in of_pci_get_host_bridge_resources() (Lorenzo Pieralisi)
    - Fix infinite loop with ROM image of size 0 (Michel Dänzer)

    PCI device hotplug
    - Handle surprise add even if surprise removal isn't supported (Bjorn Helgaas)

    Virtualization
    - Mark AMD/ATI VGA devices that don't reset on D3hot->D0 transition (Alex Williamson)
    - Add DMA alias quirk for Adaptec 3405 (Alex Williamson)
    - Add Wellsburg (X99) to Intel PCH root port ACS quirk (Alex Williamson)
    - Add ACS quirk for Emulex NICs (Vasundhara Volam)

    MSI
    - Fail MSI-X mappings if there's no space assigned to MSI-X BAR (Yijing Wang)

    Freescale Layerscape host bridge driver
    - Fix platform_no_drv_owner.cocci warnings (Julia Lawall)

    NVIDIA Tegra host bridge driver
    - Remove unnecessary tegra_pcie_fixup_bridge() (Lucas Stach)

    Renesas R-Car host bridge driver
    - Fix error handling of irq_of_parse_and_map() (Dmitry Torokhov)

    TI Keystone host bridge driver
    - Fix error handling of irq_of_parse_and_map() (Dmitry Torokhov)
    - Fix misspelling of current function in debug output (Julia Lawall)

    Xilinx AXI host bridge driver
    - Fix harmless format string warning (Arnd Bergmann)

    Miscellaneous
    - Use standard parsing functions for ASPM sysfs setters (Chris J Arges)
    - Add pci_device_to_OF_node() stub for !CONFIG_OF (Kevin Hao)
    - Delete unnecessary NULL pointer checks (Markus Elfring)
    - Add and use defines for PCIe Max_Read_Request_Size (Rafał Miłecki)
    - Include clk.h instead of clk-private.h (Stephen Boyd)"

    * tag 'pci-v3.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (48 commits)
    PCI: Add pci_device_to_OF_node() stub for !CONFIG_OF
    PCI: xilinx: Convert to use generic config accessors
    PCI: xgene: Convert to use generic config accessors
    PCI: tegra: Convert to use generic config accessors
    PCI: rcar: Convert to use generic config accessors
    PCI: generic: Convert to use generic config accessors
    powerpc/powermac: Convert PCI to use generic config accessors
    powerpc/fsl_pci: Convert PCI to use generic config accessors
    ARM: ks8695: Convert PCI to use generic config accessors
    ARM: sa1100: Convert PCI to use generic config accessors
    ARM: integrator: Convert PCI to use generic config accessors
    PCI: versatile: Add DT-based ARM Versatile PB PCIe host driver
    ARM: dts: versatile: add PCI controller binding
    of/pci: Free resources on failure in of_pci_get_host_bridge_resources()
    PCI: versatile: Add DT docs for ARM Versatile PB PCIe driver
    PCI: Fail MSI-X mappings if there's no space assigned to MSI-X BAR
    r8169: use PCI define for Max_Read_Request_Size
    [SCSI] esas2r: use PCI define for Max_Read_Request_Size
    tile: use PCI define for Max_Read_Request_Size
    rapidio/tsi721: use PCI define for Max_Read_Request_Size
    ...

    Linus Torvalds
     
  • We've replaced remap_file_pages(2) implementation with emulation. Nobody
    creates non-linear mapping anymore.

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Max Filippov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • We've replaced remap_file_pages(2) implementation with emulation. Nobody
    creates non-linear mapping anymore.

    Signed-off-by: Kirill A. Shutemov
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • We've replaced remap_file_pages(2) implementation with emulation. Nobody
    creates non-linear mapping anymore.

    Signed-off-by: Kirill A. Shutemov
    Cc: Guan Xuetao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • We've replaced remap_file_pages(2) implementation with emulation. Nobody
    creates non-linear mapping anymore.

    Signed-off-by: Kirill A. Shutemov
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • We've replaced remap_file_pages(2) implementation with emulation. Nobody
    creates non-linear mapping anymore.

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Chris Metcalf
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • We've replaced remap_file_pages(2) implementation with emulation. Nobody
    creates non-linear mapping anymore.

    This patch also increases the number of bits available for swap offset.

    Signed-off-by: Kirill A. Shutemov
    Acked-by: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov