27 Jul, 2016

8 commits

  • We always have vma->vm_mm around.

    Link: http://lkml.kernel.org/r/1466021202-61880-8-git-send-email-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Page tables can bite a relatively big chunk off system memory and their
    allocations are easy to trigger from userspace, so they should be
    accounted to kmemcg.

    This patch marks page table allocations as __GFP_ACCOUNT for x86. Note
    we must not charge allocations of kernel page tables, because they can
    be shared among processes from different cgroups so accounting them to a
    particular one can pin other cgroups indefinitely. So we clear the
    __GFP_ACCOUNT flag if a page table is allocated for the kernel.

    Link: http://lkml.kernel.org/r/7d5c54f6a2bcbe76f03171689440003d87e6c742.1464079538.git.vdavydov@virtuozzo.com
    Signed-off-by: Vladimir Davydov
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Eric Dumazet
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
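
A minimal userspace sketch of the rule described in the commit above: account page tables by default, but clear the accounting flag for kernel page tables. The flag values, `struct sketch_mm`, `sketch_init_mm` and `sketch_pgtable_gfp()` are all hypothetical stand-ins for illustration, not the kernel's definitions.

```c
/* Hypothetical flag bits for illustration; the real kernel values differ. */
#define SKETCH_GFP_KERNEL  0x01u
#define SKETCH_GFP_ZERO    0x02u
#define SKETCH_GFP_ACCOUNT 0x04u

/* Stand-in for the kernel's struct mm_struct. */
struct sketch_mm {
	int id;
};

/* Stand-in for the address space that owns kernel page tables. */
static struct sketch_mm sketch_init_mm = { 0 };

/* Pick allocation flags for a page table that belongs to mm. */
static unsigned int sketch_pgtable_gfp(const struct sketch_mm *mm)
{
	unsigned int gfp = SKETCH_GFP_KERNEL | SKETCH_GFP_ZERO |
			   SKETCH_GFP_ACCOUNT;

	/*
	 * Kernel page tables may be shared by processes from different
	 * cgroups, so they must not be charged to any single cgroup.
	 */
	if (mm == &sketch_init_mm)
		gfp &= ~SKETCH_GFP_ACCOUNT;

	return gfp;
}
```

With these definitions, a user address space keeps the accounting bit while `sketch_init_mm` loses it.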
     
  • This allows an arch which needs to do special handling with respect to
    different page sizes when flushing the TLB to implement the same in mmu
    gather.

    Link: http://lkml.kernel.org/r/1465049193-22197-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com
    Signed-off-by: Aneesh Kumar K.V
    Cc: Benjamin Herrenschmidt
    Cc: Michael Ellerman
    Cc: Hugh Dickins
    Cc: "Kirill A. Shutemov"
    Cc: Andrea Arcangeli
    Cc: Joonsoo Kim
    Cc: Mel Gorman
    Cc: David Rientjes
    Cc: Vlastimil Babka
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
     
  • This updates the generic and arch specific implementations to return true
    if we need to do a tlb flush. That means that if a __tlb_remove_page
    indicates a flush is needed, the page we tried to remove needs to be
    tracked and added again after the flush. We need to track it because we
    have already updated the pte to none and we can't just loop back.

    This change is done to enable us to do a tlb_flush when we try to flush
    a range that consists of different page sizes. For architectures like
    ppc64, we can do a range based tlb flush and we need to track page size
    for that. When we try to remove a huge page, we will force a tlb flush
    and start a new mmu gather.

    [aneesh.kumar@linux.vnet.ibm.com: mm-change-the-interface-for-__tlb_remove_page-v3]
    Link: http://lkml.kernel.org/r/1465049193-22197-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com
    Link: http://lkml.kernel.org/r/1464860389-29019-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com
    Signed-off-by: Aneesh Kumar K.V
    Cc: Benjamin Herrenschmidt
    Cc: Michael Ellerman
    Cc: Hugh Dickins
    Cc: "Kirill A. Shutemov"
    Cc: Andrea Arcangeli
    Cc: Joonsoo Kim
    Cc: Mel Gorman
    Cc: David Rientjes
    Cc: Vlastimil Babka
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
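
The calling convention described in the commit above can be sketched in userspace C as follows. This is a simplified model under stated assumptions, not the kernel's mmu gather: `struct sketch_gather`, the batch capacity and every `sketch_*` function are hypothetical, and a page-size change is modelled as the only trigger besides a full batch.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BATCH_MAX 4	/* hypothetical batch capacity */

struct sketch_gather {
	int pages[SKETCH_BATCH_MAX];	/* page ids queued for freeing */
	size_t nr;			/* number of queued pages */
	size_t page_size;		/* page size of the current batch */
	unsigned int flushes;		/* how many flushes were forced */
};

/* Returns true when the page was NOT queued and a flush is needed first. */
static bool sketch_remove_page(struct sketch_gather *g, int page,
			       size_t page_size)
{
	if (g->nr == SKETCH_BATCH_MAX)
		return true;
	if (g->nr && g->page_size != page_size)
		return true;

	g->page_size = page_size;
	g->pages[g->nr++] = page;
	return false;
}

/* Model of a flush: empty the batch and start a new gather. */
static void sketch_flush(struct sketch_gather *g)
{
	g->nr = 0;
	g->page_size = 0;
	g->flushes++;
}

/*
 * The caller has already cleared the pte, so it cannot simply retry the
 * loop: it remembers the pending page and queues it again after flushing.
 */
static void sketch_remove_page_tracked(struct sketch_gather *g, int page,
				       size_t page_size)
{
	if (sketch_remove_page(g, page, page_size)) {
		sketch_flush(g);
		/* Cannot fail: the batch is empty after the flush. */
		sketch_remove_page(g, page, page_size);
	}
}
```

Queuing two 4K pages and then one 2M page through `sketch_remove_page_tracked()` forces exactly one flush in this model, and the 2M page becomes the first entry of the new batch.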
     
  • We don't need to check this always. The idea here is to capture the
    wrong usage of find_linux_pte_or_hugepte and we can do that by
    occasionally running with DEBUG_VM enabled.

    Link: http://lkml.kernel.org/r/1464692688-6612-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com
    Signed-off-by: Aneesh Kumar K.V
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Reviewed-by: Anshuman Khandual
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
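
As a rough illustration of the idea in the commit above, a check can be compiled in only for debug builds. The macro and function names below are hypothetical, with `SKETCH_DEBUG_VM` standing in for the kernel's DEBUG_VM option; this is a sketch of the pattern, not the code the commit touches.

```c
#include <stdio.h>

/* Counts how many debug checks actually executed. */
static unsigned int sketch_checks_run;

#ifdef SKETCH_DEBUG_VM
/* Debug builds: evaluate the condition and complain if it holds. */
#define SKETCH_VM_WARN_ON(cond)						\
	do {								\
		sketch_checks_run++;					\
		if (cond)						\
			fprintf(stderr, "check failed: %s\n", #cond);	\
	} while (0)
#else
/* Normal builds: the condition is type-checked but never evaluated. */
#define SKETCH_VM_WARN_ON(cond) ((void)sizeof(!(cond)))
#endif

/* Hypothetical lookup helper whose misuse we only want to catch in debug runs. */
static int sketch_lookup(int called_incorrectly, int value)
{
	SKETCH_VM_WARN_ON(called_incorrectly);
	return value;
}
```

Built without `-DSKETCH_DEBUG_VM`, the check costs nothing at run time; occasional debug builds still catch wrong usage.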
     
  • We are seeing a build failure on m32r with the following errors:

    ERROR: "__ucmpdi2" [lib/842/842_decompress.ko] undefined!
    ERROR: "__ucmpdi2" [fs/btrfs/btrfs.ko] undefined!
    ERROR: "__ucmpdi2" [drivers/scsi/sd_mod.ko] undefined!
    ERROR: "__ucmpdi2" [drivers/media/i2c/adv7842.ko] undefined!
    ERROR: "__ucmpdi2" [drivers/md/bcache/bcache.ko] undefined!
    ERROR: "__ucmpdi2" [drivers/iio/imu/inv_mpu6050/inv-mpu6050.ko] undefined!

    __ucmpdi2 is introduced to the m32r architecture, following the example
    of other architectures like h8300, microblaze and mips.

    Link: http://lkml.kernel.org/r/1465509213-4280-1-git-send-email-sudipm.mukherjee@gmail.com
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sudip Mukherjee
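
For context, __ucmpdi2 is the compiler support routine for a three-way unsigned 64-bit comparison, which the compiler may call on 32-bit targets. A portable sketch of its documented return convention follows (0 for less than, 1 for equal, 2 for greater than); `sketch_ucmpdi2()` is a hypothetical name and this is not the code added by the commit.

```c
#include <stdint.h>

/* Three-way unsigned compare: 0 if a < b, 1 if a == b, 2 if a > b. */
static int sketch_ucmpdi2(uint64_t a, uint64_t b)
{
	uint32_t a_hi = (uint32_t)(a >> 32), b_hi = (uint32_t)(b >> 32);
	uint32_t a_lo = (uint32_t)a, b_lo = (uint32_t)b;

	/* The high words decide unless they are equal. */
	if (a_hi != b_hi)
		return a_hi < b_hi ? 0 : 2;
	if (a_lo != b_lo)
		return a_lo < b_lo ? 0 : 2;
	return 1;
}
```

Comparing the two 32-bit halves separately mirrors how a 32-bit architecture has to perform the operation.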
     
  • Before, the stack protector flag was sanity checked before .config had
    been reprocessed. This meant the build couldn't be aborted early, and
    only a warning could be emitted followed later by the compiler blowing
    up with an unknown flag. This has caused a lot of confusion over time,
    so this splits the flag selection from sanity checking and performs the
    sanity checking after the make has been restarted from a reprocessed
    .config, so builds can be aborted as early as possible now.

    Additionally moves the x86-specific sanity check to the same location,
    since it suffered from the same warn-then-wait-for-compiler-failure
    problem.

    Link: http://lkml.kernel.org/r/20160712223043.GA11664@www.outflux.net
    Signed-off-by: Kees Cook
    Cc: Michal Marek
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • __GFP_REPEAT has a rather weak semantic but since it has been introduced
    around 2.6.12 it has been ignored for low order allocations.

    PGALLOC_GFP uses __GFP_REPEAT but none of the allocations which use this
    flag are for more than order-2. This means that this flag has never been
    actually useful here because it has always been honored only for
    requests above PAGE_ALLOC_COSTLY_ORDER.

    Link: http://lkml.kernel.org/r/1464599699-30131-5-git-send-email-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
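
The reasoning in the commit above reduces to a single order comparison. In the sketch below, the threshold 3 matches the kernel's PAGE_ALLOC_COSTLY_ORDER, while the `SKETCH_` macro and the helper name are hypothetical.

```c
#include <stdbool.h>

/* Orders above this threshold count as "costly" allocations. */
#define SKETCH_PAGE_ALLOC_COSTLY_ORDER 3

/* True only when a repeat hint could change how the request is handled. */
static bool sketch_repeat_hint_matters(unsigned int order)
{
	return order > SKETCH_PAGE_ALLOC_COSTLY_ORDER;
}
```

Page table allocations of order 0 to order 2 never cross the threshold, which is why the flag can be dropped without changing behavior.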
     

26 Jul, 2016

17 commits

  • Pull irq updates from Thomas Gleixner:
    "The irq department delivers:

    - new core infrastructure to allow better management of multi-queue
    devices (interrupt spreading, node aware descriptor allocation ...)

    - a new interrupt flow handler to support the new fangled Intel VMD
    devices.

    - yet another new interrupt controller driver.

    - a series of fixes which addresses sparse warnings, missing
    includes, missing static declarations etc from Ben Dooks.

    - a fix for the error handling in the hierarchical domain allocation
    code.

    - the usual pile of small updates to core and driver code"

    * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
    genirq: Fix missing irq allocation affinity hint
    irqdomain: Fix irq_domain_alloc_irqs_recursive() error handling
    irq/Documentation: Correct result of echnoing 5 to smp_affinity
    MAINTAINERS: Remove Jiang Liu from irq domains
    genirq/msi: Fix broken debug output
    genirq: Add a helper to spread an affinity mask for MSI/MSI-X vectors
    genirq/msi: Make use of affinity aware allocations
    genirq: Use affinity hint in irqdesc allocation
    genirq: Add affinity hint to irq allocation
    genirq: Introduce IRQD_AFFINITY_MANAGED flag
    genirq/msi: Remove unused MSI_FLAG_IDENTITY_MAP
    irqchip/s3c24xx: Fixup IO accessors for big endian
    irqchip/exynos-combiner: Fix usage of __raw IO
    irqdomain: Fix disposal of mappings for interrupt hierarchies
    irqchip/aspeed-vic: Add irq controller for Aspeed
    doc/devicetree: Add Aspeed VIC bindings
    x86/PCI/VMD: Use untracked irq handler
    genirq: Add untracked irq handler
    irqchip/mips-gic: Populate irq_domain names
    irqchip/gicv3-its: Implement two-level(indirect) device table support
    ...

    Linus Torvalds
     
  • Pull timer updates from Thomas Gleixner:
    "This update provides the following changes:

    - The rework of the timer wheel which addresses the shortcomings of
    the current wheel (cascading, slow search for next expiring timer,
    etc). That's the first major change of the wheel in almost 20
    years since Finn implemented it.

    - A large overhaul of the clocksource drivers init functions to
    consolidate the Device Tree initialization

    - Some more Y2038 updates

    - A capability fix for timerfd

    - Yet another clock chip driver

    - The usual pile of updates, comment improvements all over the place"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (130 commits)
    tick/nohz: Optimize nohz idle enter
    clockevents: Make clockevents_subsys static
    clocksource/drivers/time-armada-370-xp: Fix return value check
    timers: Implement optimization for same expiry time in mod_timer()
    timers: Split out index calculation
    timers: Only wake softirq if necessary
    timers: Forward the wheel clock whenever possible
    timers/nohz: Remove pointless tick_nohz_kick_tick() function
    timers: Optimize collect_expired_timers() for NOHZ
    timers: Move __run_timers() function
    timers: Remove set_timer_slack() leftovers
    timers: Switch to a non-cascading wheel
    timers: Reduce the CPU index space to 256k
    timers: Give a few structs and members proper names
    hlist: Add hlist_is_singular_node() helper
    signals: Use hrtimer for sigtimedwait()
    timers: Remove the deprecated mod_timer_pinned() API
    timers, net/ipv4/inet: Initialize connection request timers as pinned
    timers, drivers/tty/mips_ejtag: Initialize the poll timer as pinned
    timers, drivers/tty/metag_da: Initialize the poll timer as pinned
    ...

    Linus Torvalds
     
  • Pull x86 fix from Ingo Molnar:
    "Leftover fix from the v4.7 cycle: adds a reboot quirk"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/reboot: Add Dell Optiplex 7450 AIO reboot quirk

    Linus Torvalds
     
  • Pull x86 timer updates from Ingo Molnar:
    "The main change in this tree is the reworking, fixing and extension of
    the TSC frequency enumeration code (by Len Brown)"

    * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/tsc: Remove the unused check_tsc_disabled()
    x86/tsc: Enumerate BXT tsc_khz via CPUID
    x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUID
    x86/tsc_msr: Remove irqoff around MSR-based TSC enumeration
    x86/tsc_msr: Add Airmont reference clock values
    x86/tsc_msr: Correct Silvermont reference clock values
    x86/tsc_msr: Update comments, expand definitions
    x86/tsc_msr: Remove debugging messages
    x86/tsc_msr: Identify Intel-specific code
    Revert "x86/tsc: Add missing Cherrytrail frequency to the table"

    Linus Torvalds
     
  • Pull x86 platform updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Intel-SoC enhancements (Andy Shevchenko)

    - Intel CPU symbolic model definition rework (Dave Hansen)

    - ... other misc changes"

    * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
    x86/sfi: Enable enumeration of SD devices
    x86/pci: Use MRFLD abbreviation for Merrifield
    x86/platform/intel-mid: Make vertical indentation consistent
    x86/platform/intel-mid: Mark regulators explicitly defined
    x86/platform/intel-mid: Rename mrfl.c to mrfld.c
    x86/platform/intel-mid: Enable spidev on Intel Edison boards
    x86/platform/intel-mid: Extend PWRMU to support Penwell
    x86/pci, x86/platform/intel_mid_pci: Remove duplicate power off code
    x86/platform/intel-mid: Add pinctrl for Intel Merrifield
    x86/platform/intel-mid: Enable GPIO expanders on Edison
    x86/platform/intel-mid: Add Power Management Unit driver
    x86/platform/atom/punit: Enable support for Merrifield
    x86/platform/intel_mid_pci: Rework IRQ0 workaround
    x86, thermal: Clean up and fix CPU model detection for intel_soc_dts_thermal
    x86, mmc: Use Intel family name macros for mmc driver
    x86/intel_telemetry: Use Intel family name macros for telemetry driver
    x86/acpi/lss: Use Intel family name macros for the acpi_lpss driver
    x86/cpufreq: Use Intel family name macros for the intel_pstate cpufreq driver
    x86/platform: Use new Intel model number macros
    x86/intel_idle: Use Intel family macros for intel_idle
    ...

    Linus Torvalds
     
  • Pull x86 fpu updates from Ingo Molnar:
    "The main x86 FPU changes in this cycle were:

    - a large series of cleanups, fixes and enhancements to re-enable the
    XSAVES instruction on Intel CPUs - which is the most advanced
    instruction to do FPU context switches (Yu-cheng Yu, Fenghua Yu)

    - Add FPU tracepoints for the FPU state machine (Dave Hansen)"

    * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/fpu: Do not BUG_ON() in early FPU code
    x86/fpu/xstate: Re-enable XSAVES
    x86/fpu/xstate: Fix fpstate_init() for XRSTORS
    x86/fpu/xstate: Return NULL for disabled xstate component address
    x86/fpu/xstate: Fix __fpu_restore_sig() for XSAVES
    x86/fpu/xstate: Fix xstate_offsets, xstate_sizes for non-extended xstates
    x86/fpu/xstate: Fix XSTATE component offset print out
    x86/fpu/xstate: Fix PTRACE frames for XSAVES
    x86/fpu/xstate: Fix supervisor xstate component offset
    x86/fpu/xstate: Align xstate components according to CPUID
    x86/fpu/xstate: Copy xstate registers directly to the signal frame when compacted format is in use
    x86/fpu/xstate: Keep init_fpstate.xsave.header.xfeatures as zero for init optimization
    x86/fpu/xstate: Rename 'xstate_size' to 'fpu_kernel_xstate_size', to distinguish it from 'fpu_user_xstate_size'
    x86/fpu/xstate: Define and use 'fpu_user_xstate_size'
    x86/fpu: Add tracepoints to dump FPU state at key points

    Linus Torvalds
     
  • Pull x86 stackdump update from Ingo Molnar:
    "A number of stackdump enhancements"

    * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/dumpstack: Add show_stack_regs() and use it
    printk: Make the printk*once() variants return a value
    x86/dumpstack: Honor supplied @regs arg

    Linus Torvalds
     
  • Pull x86 cleanups from Ingo Molnar:
    "Three small cleanups"

    * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    lguest: Read offset of device_cap later
    lguest: Read length of device_cap later
    x86: Do away with ARCH_[WANT_OPTIONAL|REQUIRE]_GPIOLIB

    Linus Torvalds
     
  • Pull x86 build updates from Ingo Molnar:
    "A build system fix and a cleanup"

    * 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    kbuild: Remove stale asm-generic wrappers
    kbuild, x86: Track generated headers with generated-y

    Linus Torvalds
     
  • Pull x86 boot updates from Ingo Molnar:
    "The main changes:

    - add initial commits to randomize kernel memory section virtual
    addresses, enabled via a new kernel option: RANDOMIZE_MEMORY
    (Thomas Garnier, Kees Cook, Baoquan He, Yinghai Lu)

    - enhance KASLR (RANDOMIZE_BASE) physical memory randomization (Kees
    Cook)

    - EBDA/BIOS region boot quirk cleanups (Andy Lutomirski, Ingo Molnar)

    - misc cleanups/fixes"

    * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/boot: Simplify EBDA-vs-BIOS reservation logic
    x86/boot: Clarify what x86_legacy_features.reserve_bios_regions does
    x86/boot: Reorganize and clean up the BIOS area reservation code
    x86/mm: Do not reference phys addr beyond kernel
    x86/mm: Add memory hotplug support for KASLR memory randomization
    x86/mm: Enable KASLR for vmalloc memory regions
    x86/mm: Enable KASLR for physical mapping memory regions
    x86/mm: Implement ASLR for kernel memory regions
    x86/mm: Separate variable for trampoline PGD
    x86/mm: Add PUD VA support for physical mapping
    x86/mm: Update physical mapping variable names
    x86/mm: Refactor KASLR entropy functions
    x86/KASLR: Fix boot crash with certain memory configurations
    x86/boot/64: Add forgotten end of function marker
    x86/KASLR: Allow randomization below the load address
    x86/KASLR: Extend kernel image physical address randomization to addresses larger than 4G
    x86/KASLR: Randomize virtual address separately
    x86/KASLR: Clarify identity map interface
    x86/boot: Refuse to build with data relocations
    x86/KASLR, x86/power: Remove x86 hibernation restrictions

    Linus Torvalds
     
  • Pull x86 mm updates from Ingo Molnar:
    "Various x86 low level modifications:

    - preparatory work to support virtually mapped kernel stacks (Andy
    Lutomirski)

    - support for 64-bit __get_user() on 32-bit kernels (Benjamin
    LaHaise)

    - (involved) workaround for Knights Landing CPU erratum (Dave Hansen)

    - MPX enhancements (Dave Hansen)

    - mremap() extension to allow remapping of the special VDSO vma, for
    purposes of user level context save/restore (Dmitry Safonov)

    - hweight and entry code cleanups (Borislav Petkov)

    - bitops code generation optimizations and cleanups with modern GCC
    (H. Peter Anvin)

    - syscall entry code optimizations (Paolo Bonzini)"

    * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
    x86/mm/cpa: Add missing comment in populate_pdg()
    x86/mm/cpa: Fix populate_pgd(): Stop trying to deallocate failed PUDs
    x86/syscalls: Add compat_sys_preadv64v2/compat_sys_pwritev64v2
    x86/smp: Remove unnecessary initialization of thread_info::cpu
    x86/smp: Remove stack_smp_processor_id()
    x86/uaccess: Move thread_info::addr_limit to thread_struct
    x86/dumpstack: Rename thread_struct::sig_on_uaccess_error to sig_on_uaccess_err
    x86/uaccess: Move thread_info::uaccess_err and thread_info::sig_on_uaccess_err to thread_struct
    x86/dumpstack: When OOPSing, rewind the stack before do_exit()
    x86/mm/64: In vmalloc_fault(), use CR3 instead of current->active_mm
    x86/dumpstack/64: Handle faults when printing the "Stack: " part of an OOPS
    x86/dumpstack: Try harder to get a call trace on stack overflow
    x86/mm: Remove kernel_unmap_pages_in_pgd() and efi_cleanup_page_tables()
    x86/mm/cpa: In populate_pgd(), don't set the PGD entry until it's populated
    x86/mm/hotplug: Don't remove PGD entries in remove_pagetable()
    x86/mm: Use pte_none() to test for empty PTE
    x86/mm: Disallow running with 32-bit PTEs to work around erratum
    x86/mm: Ignore A/D bits in pte/pmd/pud_none()
    x86/mm: Move swap offset/type up in PTE to work around erratum
    x86/entry: Inline enter_from_user_mode()
    ...

    Linus Torvalds
     
  • Pull x86/apic updates from Ingo Molnar:
    "Misc cleanups and a small fix"

    * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/apic: Remove the unused struct apic::apic_id_mask field
    x86/apic: Fix misspelled APIC
    x86/ioapic: Simplify ioapic_setup_resources()

    Linus Torvalds
     
  • Pull scheduler updates from Ingo Molnar:

    - introduce and use task_rcu_dereference()/try_get_task_struct() to fix
    and generalize task_struct handling (Oleg Nesterov)

    - do various per entity load tracking (PELT) fixes and optimizations
    (Peter Zijlstra)

    - cputime virt-steal time accounting enhancements/fixes (Wanpeng Li)

    - introduce consolidated cputime output file cpuacct.usage_all and
    related refactorings (Zhao Lei)

    - ... plus misc fixes and enhancements

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/core: Panic on scheduling while atomic bugs if kernel.panic_on_warn is set
    sched/cpuacct: Introduce cpuacct.usage_all to show all CPU stats together
    sched/cpuacct: Use loop to consolidate code in cpuacct_stats_show()
    sched/cpuacct: Merge cpuacct_usage_index and cpuacct_stat_index enums
    sched/fair: Rework throttle_count sync
    sched/core: Fix sched_getaffinity() return value kerneldoc comment
    sched/fair: Reorder cgroup creation code
    sched/fair: Apply more PELT fixes
    sched/fair: Fix PELT integrity for new tasks
    sched/cgroup: Fix cpu_cgroup_fork() handling
    sched/fair: Fix PELT integrity for new groups
    sched/fair: Fix and optimize the fork() path
    sched/cputime: Add steal time support to full dynticks CPU time accounting
    sched/cputime: Fix prev steal time accouting during CPU hotplug
    KVM: Fix steal clock warp during guest CPU hotplug
    sched/debug: Always show 'nr_migrations'
    sched/fair: Use task_rcu_dereference()
    sched/api: Introduce task_rcu_dereference() and try_get_task_struct()
    sched/idle: Optimize the generic idle loop
    sched/fair: Fix the wrong throttled clock time for cfs_rq_clock_task()

    Linus Torvalds
     
  • Pull perf updates from Ingo Molnar:
    "With over 300 commits it's been a busy cycle - with most of the work
    concentrated on the tooling side (as it should).

    The main kernel side enhancements were:

    - Add per event callchain limit: Recently we introduced a sysctl to
    tune the max-stack for all events for which callchains were
    requested:

    $ sysctl kernel.perf_event_max_stack
    kernel.perf_event_max_stack = 127

    Now this patch introduces a way to configure this per event, i.e.
    this becomes possible:

    $ perf record -e sched:*/max-stack=2/ -e block:*/max-stack=10/ -a

    allowing finer tuning of how much buffer space callchains use.

    This uses a u16 from the reserved space at the end, leaving
    another u16 for future use.

    There has been interest in even finer tuning, namely to control the
    max stack for kernel and userspace callchains separately. Further
    discussion is needed, we may for instance use the remaining u16 for
    that and when it is present, assume that the sample_max_stack
    introduced in this patch applies for the kernel, and the u16 left
    is used for limiting the userspace callchain (Arnaldo Carvalho de
    Melo)

    - Optimize AUX event (hardware assisted side-band event) delivery
    (Kan Liang)

    - Rework Intel family name macro usage (this is partially x86 arch
    work) (Dave Hansen)

    - Refine and fix Intel LBR support (David Carrillo-Cisneros)

    - Add support for Intel 'TopDown' events (Andi Kleen)

    - Intel uncore PMU driver fixes and enhancements (Kan Liang)

    - ... other misc changes.

    Here's an incomplete list of the tooling enhancements (but there's
    much more, see the shortlog and the git log for details):

    - Support cross unwinding, i.e. collecting '--call-graph dwarf'
    perf.data files in one machine and then doing analysis in another
    machine of a different hardware architecture. This makes it possible,
    for instance, to run:

    $ perf record -a --call-graph dwarf

    on an x86-32 or aarch64 system and then do 'perf report' on it on an
    x86_64 workstation (He Kuang)

    - Allow reading from a backward ring buffer (one setup via
    sys_perf_event_open() with perf_event_attr.write_backward = 1)
    (Wang Nan)

    - Finish merging initial SDT (Statically Defined Traces) support, see
    cset comments for details about how it all works (Masami Hiramatsu)

    - Support attaching eBPF programs to tracepoints (Wang Nan)

    - Add demangling of symbols in programs written in the Rust language
    (David Tolnay)

    - Add support for tracepoints in the python binding, including an
    example, that sets up and parses sched:sched_switch events,
    tools/perf/python/tracepoint.py (Jiri Olsa)

    - Introduce --stdio-color to set up the color output mode selection
    in 'annotate' and 'report', allowing these tools to emit color
    escape sequences when their output is redirected (Arnaldo Carvalho
    de Melo)

    - Add 'callindent' option to 'perf script -F', to indent the Intel PT
    call stack, making this output more ftrace-like (Adrian Hunter,
    Andi Kleen)

    - Allow dumping the object files generated by llvm when processing
    eBPF scriptlet events (Wang Nan)

    - Add stackcollapse.py script to help generate flame graphs (Paolo
    Bonzini)

    - Add --ldlat option to 'perf mem' to specify load latency for loads
    event (e.g. cpu/mem-loads/ ) (Jiri Olsa)

    - Tooling support for Intel TopDown counters, recently added to the
    kernel (Andi Kleen)"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (303 commits)
    perf tests: Add is_printable_array test
    perf tools: Make is_printable_array global
    perf script python: Fix string vs byte array resolving
    perf probe: Warn unmatched function filter correctly
    perf cpu_map: Add more helpers
    perf stat: Balance opening and reading events
    tools: Copy linux/{hash,poison}.h and check for drift
    perf tools: Remove include/linux/list.h from perf's MANIFEST
    tools: Copy the bitops files accessed from the kernel and check for drift
    Remove: kernel unistd*h files from perf's MANIFEST, not used
    perf tools: Remove tools/perf/util/include/linux/const.h
    perf tools: Remove tools/perf/util/include/asm/byteorder.h
    perf tools: Add missing linux/compiler.h include to perf-sys.h
    perf jit: Remove some no-op error handling
    perf jit: Add missing curly braces
    objtool: Initialize variable to silence old compiler
    objtool: Add -I$(srctree)/tools/arch/$(ARCH)/include/uapi
    perf record: Add --tail-synthesize option
    perf session: Don't warn about out of order event if write_backward is used
    perf tools: Enable overwrite settings
    ...

    Linus Torvalds
     
  • Pull RAS updates from Ingo Molnar:
    "The biggest change in this cycle was an enhancement by Yazen Ghannam
    to reduce the number of MCE error injection related IPIs.

    The rest are smaller fixes"

    * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/mce: Fix mce_rdmsrl() warning message
    x86/RAS/AMD: Reduce the number of IPIs when prepping error injection
    x86/mce/AMD: Increase size of the bank_map type
    x86/mce: Do not use bank 1 for APEI generated error logs

    Linus Torvalds
     
  • Pull locking updates from Ingo Molnar:
    "The locking tree was busier in this cycle than the usual pattern - a
    couple of major projects happened to coincide.

    The main changes are:

    - implement the atomic_fetch_{add,sub,and,or,xor}() API natively
    across all SMP architectures (Peter Zijlstra)

    - add atomic_fetch_{inc/dec}() as well, using the generic primitives
    (Davidlohr Bueso)

    - optimize various aspects of rwsems (Jason Low, Davidlohr Bueso,
    Waiman Long)

    - optimize smp_cond_load_acquire() on arm64 and implement LSE based
    atomic{,64}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
    on arm64 (Will Deacon)

    - introduce smp_acquire__after_ctrl_dep() and fix various barrier
    mis-uses and bugs (Peter Zijlstra)

    - after discovering ancient spin_unlock_wait() barrier bugs in its
    implementation and usage, strengthen its semantics and update/fix
    usage sites (Peter Zijlstra)

    - optimize mutex_trylock() fastpath (Peter Zijlstra)

    - ... misc fixes and cleanups"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
    locking/atomic: Introduce inc/dec variants for the atomic_fetch_$op() API
    locking/barriers, arch/arm64: Implement LDXR+WFE based smp_cond_load_acquire()
    locking/static_keys: Fix non static symbol Sparse warning
    locking/qspinlock: Use __this_cpu_dec() instead of full-blown this_cpu_dec()
    locking/atomic, arch/tile: Fix tilepro build
    locking/atomic, arch/m68k: Remove comment
    locking/atomic, arch/arc: Fix build
    locking/Documentation: Clarify limited control-dependency scope
    locking/atomic, arch/rwsem: Employ atomic_long_fetch_add()
    locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
    locking/atomic, arch/mips: Convert to _relaxed atomics
    locking/atomic, arch/alpha: Convert to _relaxed atomics
    locking/atomic: Remove the deprecated atomic_{set,clear}_mask() functions
    locking/atomic: Remove linux/atomic.h:atomic_fetch_or()
    locking/atomic: Implement atomic{,64,_long}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
    locking/atomic: Fix atomic64_relaxed() bits
    locking/atomic, arch/xtensa: Implement atomic_fetch_{add,sub,and,or,xor}()
    locking/atomic, arch/x86: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
    locking/atomic, arch/tile: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
    locking/atomic, arch/sparc: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
    ...

    Linus Torvalds
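
The atomic_fetch_*() family mentioned in the locking pull request above returns the value the variable held before the operation. The sketch below uses C11 `<stdatomic.h>` as a userspace stand-in for the kernel primitives; the `sketch_*` wrappers are hypothetical names, and the kernel's own types and argument order are not reproduced here.

```c
#include <stdatomic.h>

/* Fetch semantics: returns the value held BEFORE the addition. */
static int sketch_fetch_add(atomic_int *v, int i)
{
	return atomic_fetch_add(v, i);
}

/* Return semantics (value AFTER the addition), built on the fetch variant. */
static int sketch_add_return(atomic_int *v, int i)
{
	return atomic_fetch_add(v, i) + i;
}
```

For add and sub the old value can be recomputed from the new one, but for and/or the old value is generally not recoverable from the result, which is one reason a native fetch form is useful.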
     
  • Pull EFI updates from Ingo Molnar:
    "The biggest changes in this cycle were SGI/UV related changes that
    clean up and fix UV boot quirks and problems.

    There are also various smaller cleanups and refinements"

    * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    efi: Reorganize the GUID table to make it easier to read
    x86/efi: Remove the unused efi_get_time() function
    x86/efi: Update efi_thunk() to use the the arch_efi_call_virt*() macros
    x86/uv: Update uv_bios_call() to use efi_call_virt_pointer()
    efi: Convert efi_call_virt() to efi_call_virt_pointer()
    x86/efi: Remove unused variable 'efi'
    efi: Document #define FOO_PROTOCOL_GUID layout
    efibc: Report more information in the error messages

    Linus Torvalds
     

25 Jul, 2016

2 commits

  • Pull USB updates from Greg KH:
    "Here's the big USB driver update for 4.8-rc1. Lots of the normal
    stuff in here, musb, gadget, xhci, and other updates and fixes. All
    of the details are in the shortlog.

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'usb-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (169 commits)
    cdc-acm: beautify probe()
    cdc-wdm: use the common CDC parser
    cdc-acm: cleanup error handling
    cdc-acm: use the common parser
    usbnet: move the CDC parser into USB core
    usb: musb: sunxi: Simplify dr_mode handling
    usb: musb: sunxi: make unexported symbols static
    usb: musb: cppi41: add dma channel tracepoints
    usb: musb: cppi41: move struct cppi41_dma_channel to header
    usb: musb: cleanup cppi_dma header
    usb: musb: gadget: add usb-request tracepoints
    usb: musb: host: add urb tracepoints
    usb: musb: add tracepoints to dump interrupt events
    usb: musb: add tracepoints for register access
    usb: musb: dsps: use musb register read/write wrappers instead
    usb: musb: switch dev_dbg to tracepoints
    usb: musb: add tracepoints support for debugging
    usb: quirks: Add no-lpm quirk for Elan
    phy: rcar-gen3-usb2: fix mutex_lock calling in interrupt
    phy: rockhip-usb: use devm_add_action_or_reset()
    ...

    Linus Torvalds
     
  • Pull tty/serial driver updates from Greg KH:
    "Here is the big tty and serial driver update for 4.8-rc1.

    Lots of good cleanups from Jiri on a number of vt and other tty
    related things, and the normal driver updates. Full details are in
    the shortlog.

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'tty-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (90 commits)
    tty/serial: atmel: enforce tasklet init and termination sequences
    serial: sh-sci: Stop transfers in sci_shutdown()
    serial: 8250_ingenic: drop #if conditional surrounding earlycon code
    serial: 8250_mtk: drop !defined(MODULE) conditional
    serial: 8250_uniphier: drop !defined(MODULE) conditional
    earlycon: mark earlycon code as __used iif the caller is built-in
    tty/serial/8250: use mctrl_gpio helpers
    serial: mctrl_gpio: enable API usage only for initialized mctrl_gpios struct
    serial: mctrl_gpio: add modem control read routine
    tty/serial/8250: make UART_MCR register access consistent
    serial: 8250_mid: Read RX buffer on RX DMA timeout for DNV
    serial: 8250_dma: Export serial8250_rx_dma_flush()
    dmaengine: hsu: Export hsu_dma_get_status()
    tty: serial: 8250: add CON_CONSDEV to flags
    tty: serial: samsung: add byte-order aware bit functions
    tty: serial: samsung: fixup accessors for endian
    serial: sirf: make fifo functions static
    serial: mps2-uart: make driver explicitly non-modular
    serial: mvebu-uart: free the IRQ in ->shutdown()
    serial/bcm63xx_uart: use correct alias naming
    ...

    Linus Torvalds
     

24 Jul, 2016

2 commits

  • In commit:

    21cbc2822aa1 ("x86/mm/cpa: Unbreak populate_pgd(): stop trying to deallocate failed PUDs")

    I intended to add this comment, but I failed at using git.

    Signed-off-by: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/242baf8612394f4e31216f96d13c4d2e9b90d1b7.1469293159.git.luto@kernel.org
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     
  • Valdis Kletnieks bisected a boot failure back to this recent commit:

    360cb4d15567 ("x86/mm/cpa: In populate_pgd(), don't set the PGD entry until it's populated")

    I broke the case where a PUD table got allocated -- populate_pud()
    would wander off a pgd_none entry and get lost. I'm not sure how
    this survived my testing.

    Fix the original issue in a much simpler way. The problem
    was that, if we allocated a PUD table, failed to populate it, and
    freed it, another CPU could potentially keep using the PGD entry we
    installed (either by copying it via vmalloc_fault or by speculatively
    caching it). There's a straightforward fix: simply leave the
    top-level entry in place if this happens. This can't waste any
    significant amount of memory -- there are at most 256 entries like
    this systemwide and, as a practical matter, if we hit this failure
    path repeatedly, we're likely to reuse the same page anyway.

    For context, this is a reversion with this hunk added in:

     	if (ret < 0) {
    +		/*
    +		 * Leave the PUD page in place in case some other CPU or thread
    +		 * already found it, but remove any useless entries we just
    +		 * added to it.
    +		 */
    -		unmap_pgd_range(cpa->pgd, addr,
    +		unmap_pud_range(pgd_entry, addr,
     				addr + (cpa->numpages << PAGE_SHIFT));
     		return ret;
     	}

    This effectively open-codes what the now-deleted unmap_pgd_range()
    function used to do except that unmap_pgd_range() used to try to
    free the page as well.

    Reported-by: Valdis Kletnieks
    Signed-off-by: Andy Lutomirski
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Luis R. Rodriguez
    Cc: Mike Krinkin
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Toshi Kani
    Link: http://lkml.kernel.org/r/21cbc2822aa18aa812c0215f4231dbf5f65afa7f.1469249789.git.luto@kernel.org
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     

23 Jul, 2016

2 commits

  • Pull m68k updates from Geert Uytterhoeven:
    - assorted spelling fixes
    - defconfig updates

    * tag 'm68k-for-v4.8-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
    m68k/defconfig: Update defconfigs for v4.7-rc2
    m68k: Assorted spelling fixes

    Linus Torvalds
     
  • Pull ARM SoC fixes from Olof Johansson:
    "A handful of fixes before final release:

    Marvell Armada:
    - One to fix a typo in the devicetree specifying memory ranges for
    the crypto engine
    - Two to deal with marking PCI and device-memory as strongly ordered
    to avoid hardware deadlocks, in particular when enabling above
    crypto driver.
    - Compile fix for PM

    Allwinner:
    - DT clock fixes to deal with u-boot-enabled framebuffer (simplefb).
    - Make R8 (C.H.I.P. SoC) inherit system compatibility from A13 to
    make clocks register properly.

    Tegra:
    - Fix SD card voltage setting on the Tegra3 Beaver dev board

    Misc:
    - Two maintainers updates for STM32 and STi platforms"

    * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
    ARM: tegra: beaver: Allow SD card voltage to be changed
    MAINTAINERS: update STi maintainer list
    MAINTAINERS: update STM32 maintainers list
    ARM: mvebu: compile pm code conditionally
    ARM: dts: sun7i: Fix pll3x2 and pll7x2 not having a parent clock
    ARM: dts: sunxi: Add pll3 to simplefb nodes clocks lists
    ARM: dts: armada-38x: fix MBUS_ID for crypto SRAM on Armada 385 Linksys
    ARM: mvebu: map PCI I/O regions strongly ordered
    ARM: mvebu: fix HW I/O coherency related deadlocks
    ARM: sunxi/dt: make the CHIP inherit from allwinner,sun5i-a13

    Linus Torvalds
     

22 Jul, 2016

3 commits

  • Both the intent and the effect of reserve_bios_regions() are simple:
    reserve the range from the apparent BIOS start (suitably filtered)
    through 1MB and, if the EBDA start address is sensible, extend that
    reservation downward to cover the EBDA as well.

    The code is overcomplicated, though, and contains head-scratchers
    like:

    	if (ebda_start < BIOS_START_MIN)
    		ebda_start = BIOS_START_MAX;

    That snippet is trying to say "if ebda_start < BIOS_START_MIN,
    ignore it".

    Simplify it: reorder the code so that it makes sense. This should
    have no functional effect under any circumstances.
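
    The reordered logic described above can be sketched roughly as
    follows. This is a standalone illustration, not the kernel code:
    bios_reserved_range() and struct reserved_range are hypothetical
    names, and the numeric values given to BIOS_START_MIN and
    BIOS_START_MAX are assumptions that should be checked against the
    tree.

```c
#include <stdint.h>

/* Assumed values for illustration only; check the tree for the real ones. */
#define BIOS_START_MIN 0x20000U   /* below this, a start address is treated as bogus */
#define BIOS_START_MAX 0x9f000U   /* highest start address that is accepted */
#define ONE_MB         0x100000U

/* Hypothetical result type: the range that would be reserved. */
struct reserved_range {
	uint32_t start;
	uint32_t size;
};

/* Hypothetical helper: derive the reserved range from the two firmware hints. */
static struct reserved_range bios_reserved_range(uint32_t bios_start,
						 uint32_t ebda_start)
{
	struct reserved_range r;

	/* Filter the apparent BIOS start: out-of-range values fall back to the cap. */
	if (bios_start < BIOS_START_MIN || bios_start > BIOS_START_MAX)
		bios_start = BIOS_START_MAX;

	/* Only a sensible EBDA start may extend the reservation downward. */
	if (ebda_start >= BIOS_START_MIN && ebda_start < bios_start)
		bios_start = ebda_start;

	/* Reserve everything from the chosen start up to 1MB. */
	r.start = bios_start;
	r.size = ONE_MB - bios_start;
	return r;
}
```

    Written this way, an implausible ebda_start is simply skipped
    instead of being overwritten with BIOS_START_MAX, which is the
    "ignore it" reading of the head-scratcher quoted above.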

    Signed-off-by: Andy Lutomirski
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Luis R. Rodriguez
    Cc: Mario Limonciello
    Cc: Matthew Garrett
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Toshi Kani
    Link: http://lkml.kernel.org/r/ef89c0c761be20ead8bd9a3275743e6259b6092a.1469135598.git.luto@kernel.org
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     
  • It doesn't just control probing for the EBDA -- it controls whether we
    detect and reserve the
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Luis R. Rodriguez
    Cc: Mario Limonciello
    Cc: Matthew Garrett
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Toshi Kani
    Link: http://lkml.kernel.org/r/55bd591115498440d461857a7b64f349a5d911f3.1469135598.git.luto@kernel.org
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     
  • I don't think it is really possible to have a system where CPUID
    enumerates support for XSAVE but that it does not have FP/SSE
    (they are "legacy" features and always present).

    But, I did manage to hit this case in qemu when I enabled its
    somewhat shaky XSAVE support. The bummer is that the FPU is set
    up before we parse the command-line or have *any* console support
    including earlyprintk. That turned what should have been an easy
    thing to debug into a bit more of an odyssey.

    So a BUG() here is worthless. All it does is guarantee that
    if/when we hit this case we have an empty console. So, remove
    the BUG() and try to limp along by disabling XSAVE and trying to
    continue. Add a comment on why we are doing this, and also add
    a common "out_disable" path for leaving fpu__init_system_xstate().
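
    The shape of that fallback might look something like the sketch
    below. It is a userspace illustration under stated assumptions,
    not the actual patch: init_xstate() and struct xstate_setup are
    hypothetical, and the mask names and bit positions (FP in bit 0,
    SSE in bit 1) are assumed here for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed feature bits for illustration: FP is bit 0, SSE is bit 1. */
#define XFEATURE_MASK_FP    (1ULL << 0)
#define XFEATURE_MASK_SSE   (1ULL << 1)
#define XFEATURE_MASK_FPSSE (XFEATURE_MASK_FP | XFEATURE_MASK_SSE)

/* Hypothetical summary of what the init routine decided. */
struct xstate_setup {
	bool     xsave_enabled;
	uint64_t features_mask;
};

/* Hypothetical stand-in for the init routine; returns the resulting state. */
static struct xstate_setup init_xstate(uint64_t enumerated_features)
{
	struct xstate_setup s = {
		.xsave_enabled = true,
		.features_mask = enumerated_features,
	};

	if ((s.features_mask & XFEATURE_MASK_FPSSE) != XFEATURE_MASK_FPSSE) {
		/* No console exists this early, so crashing would leave nothing to read. */
		goto out_disable;
	}
	return s;

out_disable:
	/* Common exit: drop the enumerated features and carry on without XSAVE. */
	s.xsave_enabled = false;
	s.features_mask = 0;
	return s;
}
```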

    Signed-off-by: Dave Hansen
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: Denys Vlasenko
    Cc: Fenghua Yu
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Quentin Casasnovas
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20160720194551.63BB2B58@viggo.jf.intel.com
    Signed-off-by: Ingo Molnar

    Dave Hansen
     

21 Jul, 2016

1 commit

  • So the reserve_ebda_region() code has accumulated a number of
    problems over the years that make it really difficult to read
    and understand:

    - The calculation of 'lowmem' and 'ebda_addr' is an unnecessarily
    interleaved mess of first lowmem, then ebda_addr, then lowmem tweaks...

    - 'lowmem' here means 'super low mem' - i.e. 16-bit addressable memory. In other
    parts of the x86 code 'lowmem' means 32-bit addressable memory... This makes it
    super confusing to read.

    - It does not help at all that we have various memory range markers, half of which
    are 'start of range', half of which are 'end of range' - but this crucial
    property is not obvious in the naming at all ... gave me a headache trying to
    understand all this.

    - Also, the 'ebda_addr' name sucks: it highlights that it's an address (which is
    obvious, all values here are addresses!), while it does not highlight that it's
    the _start_ of the EBDA region ...

    - 'BIOS_LOWMEM_KILOBYTES' says a lot of things, except that this is the only value
    that is a pointer to a value, not a memory range address!

    - The function name itself is a misnomer: it says 'reserve_ebda_region()' while
    its main purpose is to reserve all the firmware ROM typically between 640K and
    1MB, while the 'EBDA' part is only a small part of that ...

    - Likewise, the paravirt quirk flag name 'ebda_search' is misleading as well: this
    too should be about whether to reserve firmware areas in the paravirt case.

    - In fact thinking about this as 'end of RAM' is confusing: what this function
    *really* wants to reserve is firmware data and code areas! Once the thinking is
    inverted from a mixed 'ram' and 'reserved firmware area' notion to a pure
    'reserved area' notion everything becomes a lot clearer.

    To improve all this, rewrite the whole code (without changing the logic):

    - Firstly invert the naming from 'lowmem end' to 'BIOS reserved area start'
    and propagate this concept through all the variable names and constants.

    BIOS_RAM_SIZE_KB_PTR // was: BIOS_LOWMEM_KILOBYTES

    BIOS_START_MIN // was: INSANE_CUTOFF

    ebda_start // was: ebda_addr
    bios_start // was: lowmem

    BIOS_START_MAX // was: LOWMEM_CAP

    - Then clean up the name of the function itself by renaming it
    to reserve_bios_regions() and renaming the ::ebda_search paravirt
    flag to ::reserve_bios_regions.

    - Fix up all the comments (fix typos), harmonize and simplify their
    formulation and remove comments that become unnecessary due to
    the much better naming all around.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

19 Jul, 2016

1 commit


15 Jul, 2016

4 commits

  • The only user, verify_local_APIC(), had been removed by commit:

    4399c03c6780 ("x86/apic: Remove verify_local_APIC()")

    ... so there is no need to keep it.

    Signed-off-by: Wei Jiangang
    Cc: Borislav Petkov
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: boris.ostrovsky@oracle.com
    Cc: bsd@redhat.com
    Cc: david.vrabel@citrix.com
    Cc: jgross@suse.com
    Cc: konrad.wilk@oracle.com
    Cc: xen-devel@lists.xenproject.org
    Link: http://lkml.kernel.org/r/1468463046-20849-1-git-send-email-weijg.fnst@cn.fujitsu.com
    Signed-off-by: Ingo Molnar

    Wei Jiangang
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • check_tsc_disabled() was introduced by commit:

    c73deb6aecda ("perf/x86: Add ability to calculate TSC from perf sample timestamps")

    The only caller was arch_perf_update_userpage(), which had been refactored
    by commit:

    d8b11a0cbd1c ("perf/x86: Clean up cap_user_time* setting")

    ... so there is no need to keep and export it any more.

    Signed-off-by: Wei Jiangang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: a.p.zijlstra@chello.nl
    Cc: adrian.hunter@intel.com
    Cc: bp@suse.de
    Link: http://lkml.kernel.org/r/1468570330-25810-1-git-send-email-weijg.fnst@cn.fujitsu.com
    Signed-off-by: Ingo Molnar

    Wei Jiangang
     
  • Don't use the same syscall numbers for 2 different syscalls:

    534 x32 preadv compat_sys_preadv64
    535 x32 pwritev compat_sys_pwritev64
    534 x32 preadv2 compat_sys_preadv2
    535 x32 pwritev2 compat_sys_pwritev2

    Add compat_sys_preadv64v2() and compat_sys_pwritev64v2() so that 64-bit offset
    is passed in one 64-bit register on x32, similar to compat_sys_preadv64()
    and compat_sys_pwritev64().
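
    To illustrate the difference in how the offset travels, here is a
    minimal sketch assuming the non-64 compat entry points receive the
    offset as two 32-bit halves while the 64-bit variants receive it
    whole. Both helper names are hypothetical and exist only for this
    example.

```c
#include <stdint.h>

/* Split form: the offset arrives as two 32-bit halves and must be joined. */
static int64_t offset_from_halves(uint32_t pos_low, uint32_t pos_high)
{
	return (int64_t)(((uint64_t)pos_high << 32) | pos_low);
}

/* Single-register form: the 64-bit offset arrives whole and is used as-is. */
static int64_t offset_from_register(int64_t pos)
{
	return pos;
}
```

    Both forms yield the same offset; the single-register form just
    uses one argument slot instead of two, which is possible on x32
    because its registers are 64 bits wide.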

    Signed-off-by: H.J. Lu
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Christoph Hellwig
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/CAMe9rOovCMf-RQfx_n1U_Tu_DX1BYkjtFr%3DQ4-_PFVSj9BCzUA@mail.gmail.com
    Signed-off-by: Ingo Molnar

    H.J. Lu