27 Nov, 2018

1 commit

  • commit d52888aa2753e3063a9d3a0c9f72f94aa9809c15 upstream

    On 5-level paging the LDT remap area is placed in the middle of the KASLR
    randomization region and it can overlap with the direct mapping, the
    vmalloc or the vmap area.

    The LDT mapping is per mm, so it cannot be moved into the P4D page table
    next to the CPU_ENTRY_AREA without complicating PGD table allocation for
    5-level paging.

    The 4 PGD slot gap just before the direct mapping is reserved for
    hypervisors, so it cannot be used.

    Move the direct mapping one slot deeper and use the resulting gap for the
    LDT remap area. The resulting layout is the same for 4 and 5 level paging.
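
    To make the "one slot" wording concrete, the sketch below computes how
    much virtual address space a single PGD entry covers. It assumes the
    usual x86-64 parameters (4 KiB pages, 9 index bits per table level); the
    function name is illustrative and nothing here is taken from the kernel
    source.

```python
PAGE_SHIFT = 12        # assumption: 4 KiB base pages
BITS_PER_LEVEL = 9     # assumption: 512 entries per page table

def pgd_slot_size(levels: int) -> int:
    """Bytes of virtual address space covered by one top-level (PGD) entry."""
    # Every level below the PGD contributes 9 bits of address.
    return 1 << (PAGE_SHIFT + BITS_PER_LEVEL * (levels - 1))

for levels in (4, 5):
    size = pgd_slot_size(levels)
    print(f"{levels}-level paging: one PGD slot = 2^{size.bit_length() - 1} bytes")
```

    Under these assumptions a slot is 512 GiB with 4-level paging and
    256 TiB with 5-level paging, so the absolute shift differs between the
    two modes even though the slot ordering is the same.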

    [ tglx: Massaged changelog ]

    Fixes: f55f0501cbf6 ("x86/pti: Put the LDT in its own PGD if PTI is on")
    Signed-off-by: Kirill A. Shutemov
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Andy Lutomirski
    Cc: bp@alien8.de
    Cc: hpa@zytor.com
    Cc: dave.hansen@linux.intel.com
    Cc: peterz@infradead.org
    Cc: boris.ostrovsky@oracle.com
    Cc: jgross@suse.com
    Cc: bhe@redhat.com
    Cc: willy@infradead.org
    Cc: linux-mm@kvack.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20181026122856.66224-2-kirill.shutemov@linux.intel.com
    Signed-off-by: Sasha Levin

    Kirill A. Shutemov
     

10 Sep, 2018

1 commit

  • Fix a few issues in Documentation/x86/earlyprintk.txt:

    - correct typos, punctuation, missing word, wrong word
    - change product name from Netchip to NetChip
    - expand where to add "earlyprintk=dbg"

    Signed-off-by: Randy Dunlap
    Cc: Eric W. Biederman
    Cc: Jason Wessel
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Yinghai Lu
    Cc: linux-doc@vger.kernel.org
    Cc: linux-usb@vger.kernel.org
    Link: http://lkml.kernel.org/r/d0c40ac3-7659-6374-dbda-23d3d2577f30@infradead.org
    Signed-off-by: Ingo Molnar

    Randy Dunlap
     

14 Aug, 2018

2 commits

  • Pull x86 timer updates from Thomas Gleixner:
    "Early TSC based time stamping to allow better boot time analysis.

    This comes with a general cleanup of the TSC calibration code which
    grew warts and duct taping over the years and removes 250 lines of
    code. Initiated and mostly implemented by Pavel with help from various
    folks"

    * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
    x86/kvmclock: Mark kvm_get_preset_lpj() as __init
    x86/tsc: Consolidate init code
    sched/clock: Disable interrupts when calling generic_sched_clock_init()
    timekeeping: Prevent false warning when persistent clock is not available
    sched/clock: Close a hole in sched_clock_init()
    x86/tsc: Make use of tsc_calibrate_cpu_early()
    x86/tsc: Split native_calibrate_cpu() into early and late parts
    sched/clock: Use static key for sched_clock_running
    sched/clock: Enable sched clock early
    sched/clock: Move sched clock initialization and merge with generic clock
    x86/tsc: Use TSC as sched clock early
    x86/tsc: Initialize cyc2ns when tsc frequency is determined
    x86/tsc: Calibrate tsc only once
    ARM/time: Remove read_boot_clock64()
    s390/time: Remove read_boot_clock64()
    timekeeping: Default boot time offset to local_clock()
    timekeeping: Replace read_boot_clock64() with read_persistent_wall_and_boot_offset()
    s390/time: Add read_persistent_wall_and_boot_offset()
    x86/xen/time: Output xen sched_clock time from 0
    x86/xen/time: Initialize pv xen time in init_hypervisor_platform()
    ...

    Linus Torvalds
     
  • Pull x86 cache QoS (RDT/CAR) updates from Thomas Gleixner:
    "Add support for pseudo-locked cache regions.

    Cache Allocation Technology (CAT) allows on certain CPUs to isolate a
    region of cache and 'lock' it. Cache pseudo-locking builds on the fact
    that a CPU can still read and write data pre-allocated outside its
    current allocated area on cache hit. With cache pseudo-locking data
    can be preloaded into a reserved portion of cache that no application
    can fill, and from that point on will only serve cache hits. The cache
    pseudo-locked memory is made accessible to user space where an
    application can map it into its virtual address space and thus have a
    region of memory with reduced average read latency.

    The locking is not perfect and gets totally screwed by WBINVD and
    similar mechanisms, but it provides a reasonable enhancement for
    certain types of latency sensitive applications.

    The implementation extends the current CAT mechanism and provides a
    generally useful exclusive CAT mode on which it builds the extra
    pseudo-locked regions"

    * 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
    x86/intel_rdt: Disable PMU access
    x86/intel_rdt: Fix possible circular lock dependency
    x86/intel_rdt: Make CPU information accessible for pseudo-locked regions
    x86/intel_rdt: Support restoration of subset of permissions
    x86/intel_rdt: Fix cleanup of plr structure on error
    x86/intel_rdt: Move pseudo_lock_region_clear()
    x86/intel_rdt: Limit C-states dynamically when pseudo-locking active
    x86/intel_rdt: Support L3 cache performance event of Broadwell
    x86/intel_rdt: More precise L2 hit/miss measurements
    x86/intel_rdt: Create character device exposing pseudo-locked region
    x86/intel_rdt: Create debugfs files for pseudo-locking testing
    x86/intel_rdt: Create resctrl debug area
    x86/intel_rdt: Ensure RDT cleanup on exit
    x86/intel_rdt: Resctrl files reflect pseudo-locked information
    x86/intel_rdt: Support creation/removal of pseudo-locked region
    x86/intel_rdt: Pseudo-lock region creation/removal core
    x86/intel_rdt: Discover supported platforms via prefetch disable bits
    x86/intel_rdt: Add utilities to test pseudo-locked region possibility
    x86/intel_rdt: Split resource group removal in two
    x86/intel_rdt: Enable entering of pseudo-locksetup mode
    ...

    Linus Torvalds
     

20 Jul, 2018

1 commit

  • Currently, the notsc kernel parameter disables the use of the TSC by
    sched_clock(). However, this parameter does not prevent the kernel from
    accessing tsc in other places.

    The only rationale to boot with notsc is to avoid timing discrepancies on
    multi-socket systems where TSC are not properly synchronized, and thus
    exclude TSC from being used for time keeping. But that prevents using TSC
    as sched_clock() as well, which is not necessary as the core sched_clock()
    implementation can handle non synchronized TSC based sched clocks just
    fine.

    However, there is another method to solve the above problem: booting with
    tsc=unstable parameter. This parameter allows sched_clock() to use TSC and
    just excludes it from timekeeping.

    So there is no real reason to keep notsc, but for compatibility reasons the
    parameter has to stay. Make it behave like 'tsc=unstable' instead.
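
    The intended behaviour can be modelled in a few lines. This is a toy
    sketch of the policy described above, not kernel code: the function
    name and the returned keys are hypothetical, and it assumes a CPU that
    has a TSC at all.

```python
def tsc_policy(cmdline: str) -> dict:
    """Model how the described change treats TSC-related boot parameters."""
    params = cmdline.split()
    # After the change, "notsc" is handled like "tsc=unstable".
    unstable = "notsc" in params or "tsc=unstable" in params
    return {
        "sched_clock_uses_tsc": True,          # no longer disabled by notsc
        "timekeeping_uses_tsc": not unstable,  # excluded when marked unstable
    }

print(tsc_policy("root=/dev/sda1 notsc"))
print(tsc_policy("root=/dev/sda1 tsc=unstable"))
```

    Both calls yield the same result, which is the point of the change:
    the old parameter keeps working but no longer takes the TSC away from
    sched_clock().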

    [ tglx: Massaged changelog ]

    Signed-off-by: Pavel Tatashin
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Dou Liyang
    Reviewed-by: Thomas Gleixner
    Cc: steven.sistare@oracle.com
    Cc: daniel.m.jordan@oracle.com
    Cc: linux@armlinux.org.uk
    Cc: schwidefsky@de.ibm.com
    Cc: heiko.carstens@de.ibm.com
    Cc: john.stultz@linaro.org
    Cc: sboyd@codeaurora.org
    Cc: hpa@zytor.com
    Cc: peterz@infradead.org
    Cc: prarit@redhat.com
    Cc: feng.tang@intel.com
    Cc: pmladek@suse.com
    Cc: gnomes@lxorguk.ukuu.org.uk
    Cc: linux-s390@vger.kernel.org
    Cc: boris.ostrovsky@oracle.com
    Cc: jgross@suse.com
    Cc: pbonzini@redhat.com
    Link: https://lkml.kernel.org/r/20180719205545.16512-12-pasha.tatashin@oracle.com

    Pavel Tatashin
     

07 Jul, 2018

1 commit

  • The current NUMA emulation capabilities for splitting System RAM by a
    fixed size or by a set number of nodes may result in some nodes being
    larger than others. The implementation prioritizes establishing a
    minimum usable memory size over satisfying the requested number of NUMA
    nodes.

    Introduce a uniform split capability that evenly partitions each
    physical NUMA node into N emulated nodes. For example numa=fake=3U
    creates 6 emulated nodes total on a system that has 2 physical nodes.

    This capability is useful for debugging and evaluating platform
    memory-side-cache capabilities as described by the ACPI HMAT (see
    5.2.27.5 Memory Side Cache Information Structure in ACPI 6.2a)

    Compare numa=fake=6 that results in only 5 nodes being created against
    numa=fake=3U which takes the 2 physical nodes and evenly divides them.

    numa=fake=6
    available: 5 nodes (0-4)
    node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
    node 0 size: 2648 MB
    node 0 free: 2443 MB
    node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39
    node 1 size: 2672 MB
    node 1 free: 2442 MB
    node 2 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
    node 2 size: 5291 MB
    node 2 free: 5278 MB
    node 3 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39
    node 3 size: 2677 MB
    node 3 free: 2665 MB
    node 4 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39
    node 4 size: 2676 MB
    node 4 free: 2663 MB
    node distances:
    node 0 1 2 3 4
    0: 10 20 10 20 20
    1: 20 10 20 10 10
    2: 10 20 10 20 20
    3: 20 10 20 10 10
    4: 20 10 20 10 10

    numa=fake=3U
    available: 6 nodes (0-5)
    node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
    node 0 size: 2900 MB
    node 0 free: 2637 MB
    node 1 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
    node 1 size: 3023 MB
    node 1 free: 3012 MB
    node 2 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
    node 2 size: 2015 MB
    node 2 free: 2004 MB
    node 3 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39
    node 3 size: 2704 MB
    node 3 free: 2522 MB
    node 4 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39
    node 4 size: 2709 MB
    node 4 free: 2698 MB
    node 5 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39
    node 5 size: 2612 MB
    node 5 free: 2601 MB
    node distances:
    node 0 1 2 3 4 5
    0: 10 10 10 20 20 20
    1: 10 10 10 20 20 20
    2: 10 10 10 20 20 20
    3: 20 20 20 10 10 10
    4: 20 20 20 10 10 10
    5: 20 20 20 10 10 10
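
    The distance table of the uniform split follows a simple rule, which
    the sketch below reproduces. It assumes that emulated nodes are
    numbered consecutively within each physical node, as the numa=fake=3U
    listing suggests, and reuses the distance values 10 and 20 from that
    listing; the function name is illustrative only.

```python
LOCAL, REMOTE = 10, 20  # distance values as they appear in the listing

def uniform_fake_distances(physical_nodes: int, per_node: int) -> list:
    """Distance matrix when each physical node is split into per_node emulated nodes."""
    total = physical_nodes * per_node
    # Emulated node i belongs to physical node i // per_node.
    return [
        [LOCAL if i // per_node == j // per_node else REMOTE for j in range(total)]
        for i in range(total)
    ]

for idx, row in enumerate(uniform_fake_distances(2, 3)):
    print(f"{idx}: " + " ".join(str(d) for d in row))
```

    Called with 2 physical nodes and 3 emulated nodes each, it yields the
    same 6x6 block pattern as the numa=fake=3U output.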

    Signed-off-by: Dan Williams
    Cc: David Rientjes
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Wei Yang
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/153089328617.27680.14930758266174305832.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Ingo Molnar

    Dan Williams
     

03 Jul, 2018

1 commit

  • When a resource group enters pseudo-locksetup mode it reflects that the
    platform supports cache pseudo-locking and the resource group is unused,
    ready to be used for a pseudo-locked region. Until it is set up as a
    pseudo-locked region the resource group is "locked down" such that no new
    tasks or cpus can be assigned to it. This is accomplished in a user visible
    way by making the cpus, cpus_list, and tasks resctrl files inaccessible
    (user cannot read from or write to these files).

    When the resource group changes to pseudo-locked mode it represents a cache
    pseudo-locked region. While not appropriate to make any changes to the cpus
    assigned to this region it is useful to make it easy for the user to see
    which cpus are associated with the pseudo-locked region.

    Modify the permissions of the cpus/cpus_list file when the resource group
    changes to pseudo-locked mode to support reading (not writing). The
    information presented to the user when reading the file is the set of
    cpus associated with the pseudo-locked region.

    Signed-off-by: Reinette Chatre
    Signed-off-by: Thomas Gleixner
    Cc: fenghua.yu@intel.com
    Cc: tony.luck@intel.com
    Cc: vikas.shivappa@linux.intel.com
    Cc: gavin.hindman@intel.com
    Cc: jithu.joseph@intel.com
    Cc: dave.hansen@intel.com
    Cc: hpa@zytor.com
    Link: https://lkml.kernel.org/r/12756b7963b6abc1bffe8fb560b87b75da827bd1.1530421961.git.reinette.chatre@intel.com

    Reinette Chatre
     

24 Jun, 2018

1 commit

  • Deeper C-states impact cache content through shrinking of the cache or
    flushing entire cache to memory before reducing power to the cache.
    Deeper C-states will thus negatively impact the pseudo-locked regions.

    To avoid impacting pseudo-locked regions C-states are limited on
    pseudo-locked region creation so that cores associated with the
    pseudo-locked region are prevented from entering deeper C-states.
    This is accomplished by requesting a CPU latency target which will
    prevent the core from entering C6 across all supported platforms.

    Signed-off-by: Reinette Chatre
    Signed-off-by: Thomas Gleixner
    Cc: fenghua.yu@intel.com
    Cc: tony.luck@intel.com
    Cc: vikas.shivappa@linux.intel.com
    Cc: gavin.hindman@intel.com
    Cc: jithu.joseph@intel.com
    Cc: dave.hansen@intel.com
    Cc: hpa@zytor.com
    Link: https://lkml.kernel.org/r/1ef4f99dd6ba12fa6fb44c5a1141e75f952b9cd9.1529706536.git.reinette.chatre@intel.com

    Reinette Chatre
     

23 Jun, 2018

2 commits

  • Add description of Cache Pseudo-Locking feature, its interface, as well as
    an example of its usage.

    Signed-off-by: Reinette Chatre
    Signed-off-by: Thomas Gleixner
    Cc: fenghua.yu@intel.com
    Cc: tony.luck@intel.com
    Cc: vikas.shivappa@linux.intel.com
    Cc: gavin.hindman@intel.com
    Cc: jithu.joseph@intel.com
    Cc: dave.hansen@intel.com
    Cc: hpa@zytor.com
    Link: https://lkml.kernel.org/r/6e118c15d2c254a27b8891783505cd1bb94a2b10.1529706536.git.reinette.chatre@intel.com

    Reinette Chatre
     
  • By default resource groups allow sharing of their cache allocations. There
    is nothing that prevents a resource group from configuring a cache
    allocation that overlaps with that of an existing resource group.

    To enable resource groups to specify that their cache allocations cannot be
    shared a resource group "mode" is introduced to support two possible modes:
    "shareable" and "exclusive". A "shareable" resource group allows sharing of
    its cache allocations, an "exclusive" resource group does not. A new
    resctrl file "mode" associated with each resource group is used to
    communicate its (the associated resource group's) mode setting and allow
    the mode to be changed. The new "mode" file as well as two other resctrl
    files, "bit_usage" and "size", are introduced in this series.

    Add documentation for the three new resctrl files as well as one example
    demonstrating their use.
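
    The difference between the two modes comes down to an overlap test.
    The sketch below models cache allocations as integer bitmasks, one bit
    per cache portion; this representation and both function names are
    assumptions made for illustration and do not describe the kernel's
    actual checks.

```python
def cbm_overlaps(a: int, b: int) -> bool:
    """True when two allocation bitmasks share at least one cache portion."""
    return (a & b) != 0

def can_be_exclusive(candidate: int, others: list) -> bool:
    """An exclusive allocation must not overlap any other group's bitmask."""
    return not any(cbm_overlaps(candidate, other) for other in others)

print(can_be_exclusive(0x00F, [0x0F0, 0xF00]))  # disjoint masks
print(can_be_exclusive(0x01F, [0x0F0, 0xF00]))  # bit 4 is shared with 0x0F0
```

    A "shareable" group would simply skip this check, which is why nothing
    prevents overlapping allocations by default.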

    Signed-off-by: Reinette Chatre
    Signed-off-by: Thomas Gleixner
    Cc: fenghua.yu@intel.com
    Cc: tony.luck@intel.com
    Cc: vikas.shivappa@linux.intel.com
    Cc: gavin.hindman@intel.com
    Cc: jithu.joseph@intel.com
    Cc: dave.hansen@intel.com
    Cc: hpa@zytor.com
    Link: https://lkml.kernel.org/r/f03a3059ec40ae719be6f3fba9f446bb055e0064.1529706536.git.reinette.chatre@intel.com

    Reinette Chatre
     

05 Jun, 2018

1 commit

  • Pull x86 cache resource controller updates from Thomas Gleixner:
    "An update for the Intel Resource Director Technology (RDT) which adds a
    feedback driven software controller to runtime adjust the bandwidth
    allocation MSRs.

    This makes the allocations more accurate and allows to use bandwidth
    values in understandable units (MB/s) instead of using percentage
    based allocations as the original, still available, interface.

    The software controller can be enabled with a new mount option for the
    resctrl filesystem"

    * 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/intel_rdt/mba_sc: Feedback loop to dynamically update mem bandwidth
    x86/intel_rdt/mba_sc: Prepare for feedback loop
    x86/intel_rdt/mba_sc: Add schemata support
    x86/intel_rdt/mba_sc: Add initialization support
    x86/intel_rdt/mba_sc: Enable/disable MBA software controller
    x86/intel_rdt/mba_sc: Documentation for MBA software controller(mba_sc)

    Linus Torvalds
     

28 May, 2018

3 commits


19 May, 2018

1 commit

  • Add documentation about the feedback loop mechanism (MBA software
    controller) which lets the user specify the memory bandwidth allocation
    in MBps. This includes some changes to the "schemata" format, with
    examples.
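
    As a rough illustration of the kind of line involved, the sketch below
    parses a memory bandwidth entry. The "MB:<id>=<value>;..." layout is
    assumed from the resctrl schemata conventions, the numbers are made
    up, and the function name is hypothetical.

```python
def parse_mb_schemata(line: str) -> dict:
    """Parse a line such as 'MB:0=2000;1=1000' into {domain_id: value}."""
    resource, _, body = line.strip().partition(":")
    if resource != "MB":
        raise ValueError(f"not a memory bandwidth line: {line!r}")
    limits = {}
    for entry in body.split(";"):
        domain, _, value = entry.partition("=")
        limits[int(domain)] = int(value)
    return limits

print(parse_mb_schemata("MB:0=2000;1=1000"))
```

    The parser attaches no unit on purpose: whether a value means a
    percentage or MBps depends on how the filesystem was mounted.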

    Signed-off-by: Vikas Shivappa
    Signed-off-by: Thomas Gleixner
    Cc: ravi.v.shankar@intel.com
    Cc: tony.luck@intel.com
    Cc: fenghua.yu@intel.com
    Cc: vikas.shivappa@intel.com
    Cc: ak@linux.intel.com
    Cc: hpa@zytor.com
    Link: https://lkml.kernel.org/r/1524263781-14267-2-git-send-email-vikas.shivappa@linux.intel.com

    Vikas Shivappa
     

16 Apr, 2018

1 commit

  • Pull x86 fixes from Thomas Gleixner:
    "A set of fixes and updates for x86:

    - Address a swiotlb regression which was caused by the recent DMA
    rework and made driver fail because dma_direct_supported() returned
    false

    - Fix a signedness bug in the APIC ID validation which caused invalid
    APIC IDs to be detected as valid thereby bloating the CPU possible
    space.

    - Fix inconsistent config dependency/select magic for the MFD_CS5535
    driver.

    - Fix a corruption of the physical address space bits when encryption
    has reduced the address space and late cpuinfo updates overwrite
    the reduced bit information with the original value.

    - Dominik's syscall rework which consolidates the architecture
    specific syscall functions so all syscalls can be wrapped with the
    same macros. This allows switching x86/64 to struct pt_regs based
    syscalls. Extend the clearing of user space controlled registers in
    the entry path to the lower registers"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/apic: Fix signedness bug in APIC ID validity checks
    x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption
    x86/olpc: Fix inconsistent MFD_CS5535 configuration
    swiotlb: Use dma_direct_supported() for swiotlb_ops
    syscalls/x86: Adapt syscall_wrapper.h to the new syscall stub naming convention
    syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*()
    syscalls/core, syscalls/x86: Clean up compat syscall stub naming convention
    syscalls/core, syscalls/x86: Clean up syscall stub naming convention
    syscalls/x86: Extend register clearing on syscall entry to lower registers
    syscalls/x86: Unconditionally enable 'struct pt_regs' based syscalls on x86_64
    syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32
    syscalls/core: Prepare CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y for compat syscalls
    syscalls/x86: Use 'struct pt_regs' based syscall calling convention for 64-bit syscalls
    syscalls/core: Introduce CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y
    x86/syscalls: Don't pointlessly reload the system call number
    x86/mm: Fix documentation of module mapping range with 4-level paging
    x86/cpuid: Switch to 'static const' specifier

    Linus Torvalds
     

12 Apr, 2018

1 commit


06 Apr, 2018

1 commit

  • Pull trivial tree updates from Jiri Kosina.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
    kfifo: fix inaccurate comment
    tools/thermal: tmon: fix for segfault
    net: Spelling s/stucture/structure/
    edd: don't spam log if no EDD information is present
    Documentation: Fix early-microcode.txt references after file rename
    tracing: Block comments should align the * on each line
    treewide: Fix typos in printk
    GenWQE: Fix a typo in two comments
    treewide: Align function definition open/close braces

    Linus Torvalds
     

03 Apr, 2018

1 commit

  • Commit:

    f5a40711fa58 ("x86/mm: Set MODULES_END to 0xffffffffff000000")

    changed MODULES_END back to a fixed value, but didn't update the documentation
    of memory layout for 4-level paging.

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Andrey Ryabinin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: f5a40711fa58 ("x86/mm: Set MODULES_END to 0xffffffffff000000")
    Link: http://lkml.kernel.org/r/20180402121025.10244-1-kirill.shutemov@linux.intel.com
    Signed-off-by: Ingo Molnar

    Kirill A. Shutemov
     

27 Mar, 2018

1 commit

  • The file Documentation/x86/early-microcode.txt was renamed to
    Documentation/x86/microcode.txt in 0e3258753f81, but it was still
    referenced by its old name in three places:

    * Documentation/x86/00-INDEX
    * arch/x86/Kconfig
    * arch/x86/kernel/cpu/microcode/amd.c

    This commit updates these references accordingly.

    Fixes: 0e3258753f81 ("x86/microcode: Document the three loading methods")
    Signed-off-by: Jaak Ristioja
    Signed-off-by: Jiri Kosina

    Jaak Ristioja
     

15 Mar, 2018

1 commit


01 Mar, 2018

1 commit


26 Feb, 2018

1 commit


23 Feb, 2018

1 commit

  • topology_sibling_cpumask() is the correct thread-related topology
    function in the kernel:

    s/topology_sibling_mask/topology_sibling_cpumask

    Signed-off-by: Dou Liyang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: corbet@lwn.net
    Cc: linux-doc@vger.kernel.org
    Link: http://lkml.kernel.org/r/20180222084812.14497-1-douly.fnst@cn.fujitsu.com
    Signed-off-by: Ingo Molnar

    Dou Liyang
     

16 Feb, 2018

1 commit

  • All pieces of the puzzle are in place and we can now allow booting with
    CONFIG_X86_5LEVEL=y on a machine without LA57 support.

    Kernel will detect that LA57 is missing and fold p4d at runtime.

    Update the documentation and the Kconfig option description to reflect the
    change.
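
    From user space, whether a machine offers LA57 can be checked by
    looking at the CPU flags. The sketch below assumes the feature shows
    up as "la57" on the flags line of /proc/cpuinfo; the function name is
    illustrative, and it works on a text sample so that it does not depend
    on the machine it runs on.

```python
def has_la57(cpuinfo_text: str) -> bool:
    """Report whether any 'flags' line in /proc/cpuinfo-style text lists la57."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "la57" in value.split():
            return True
    return False

sample = "processor\t: 0\nflags\t\t: fpu vme pae la57 nx\n"
print(has_la57(sample))
```

    On a real system the text would come from reading /proc/cpuinfo.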

    Signed-off-by: Kirill A. Shutemov
    Cc: Andy Lutomirski
    Cc: Arjan van de Ven
    Cc: Borislav Petkov
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: David Woodhouse
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/20180214182542.69302-10-kirill.shutemov@linux.intel.com
    Signed-off-by: Ingo Molnar

    Kirill A. Shutemov
     

02 Feb, 2018

1 commit

  • Pull driver core updates from Greg KH:
    "Here is the set of "big" driver core patches for 4.16-rc1.

    The majority of the work here is in the firmware subsystem, with
    reworks to try to attempt to make the code easier to handle in the
    long run, but no functional change. There's also some tree-wide sysfs
    attribute fixups with lots of acks from the various subsystem
    maintainers, as well as a handful of other normal fixes and changes.

    And finally, some license cleanups for the driver core and sysfs code.

    All have been in linux-next for a while with no reported issues"

    * tag 'driver-core-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (48 commits)
    device property: Define type of PROPERTY_ENRTY_*() macros
    device property: Reuse property_entry_free_data()
    device property: Move property_entry_free_data() upper
    firmware: Fix up docs referring to FIRMWARE_IN_KERNEL
    firmware: Drop FIRMWARE_IN_KERNEL Kconfig option
    USB: serial: keyspan: Drop firmware Kconfig options
    sysfs: remove DEBUG defines
    sysfs: use SPDX identifiers
    drivers: base: add coredump driver ops
    sysfs: add attribute specification for /sysfs/devices/.../coredump
    test_firmware: fix missing unlock on error in config_num_requests_store()
    test_firmware: make local symbol test_fw_config static
    sysfs: turn WARN() into pr_warn()
    firmware: Fix a typo in fallback-mechanisms.rst
    treewide: Use DEVICE_ATTR_WO
    treewide: Use DEVICE_ATTR_RO
    treewide: Use DEVICE_ATTR_RW
    sysfs.h: Use octal permissions
    component: add debugfs support
    bus: simple-pm-bus: convert bool SIMPLE_PM_BUS to tristate
    ...

    Linus Torvalds
     

30 Jan, 2018

1 commit

  • Pull x86/cache updates from Thomas Gleixner:
    "A set of patches which add support for L2 cache partitioning to the
    Intel RDT facility"

    * 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/intel_rdt: Add command line parameter to control L2_CDP
    x86/intel_rdt: Enable L2 CDP in MSR IA32_L2_QOS_CFG
    x86/intel_rdt: Add two new resources for L2 Code and Data Prioritization (CDP)
    x86/intel_rdt: Enumerate L2 Code and Data Prioritization (CDP) feature
    x86/intel_rdt: Add L2CDP support in documentation
    x86/intel_rdt: Update documentation

    Linus Torvalds
     

25 Jan, 2018

1 commit


22 Jan, 2018

1 commit

  • Pull x86 pti fixes from Thomas Gleixner:
    "A small set of fixes for the meltdown/spectre mitigations:

    - Make kprobes aware of retpolines to prevent probes in the retpoline
    thunks.

    - Make the machine check exception speculation protected. MCE used to
    issue an indirect call directly from the ASM entry code. Convert
    that to a direct call into a C-function and issue the indirect call
    from there so the compiler can add the retpoline protection,

    - Make the vmexit_fill_RSB() assembly less stupid

    - Fix a typo in the PTI documentation"

    * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/retpoline: Optimize inline assembler for vmexit_fill_RSB
    x86/pti: Document fix wrong index
    kprobes/x86: Disable optimizing on the function jumps to indirect thunk
    kprobes/x86: Blacklist indirect thunk functions for kprobes
    retpoline: Introduce start/end markers of indirect thunk
    x86/mce: Make machine check speculation protected

    Linus Torvalds
     

19 Jan, 2018

1 commit

  • In section , fix wrong index.

    Signed-off-by: zhenwei.pi
    Signed-off-by: Thomas Gleixner
    Cc: dave.hansen@linux.intel.com
    Link: https://lkml.kernel.org/r/1516237492-27739-1-git-send-email-zhenwei.pi@youruncloud.com

    zhenwei.pi
     

18 Jan, 2018

2 commits

  • L2 and L3 Code and Data Prioritization (CDP) can be enabled separately.
    The existing mount parameter "cdp" is only for enabling L3 CDP and will be
    kept for backwards compatibility.

    Add a new mount parameter 'cdpl2' for L2 CDP.
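
    A small sketch of how the two parameters combine on a mount command
    line. The helper name is hypothetical, and the mount point
    /sys/fs/resctrl in the printed command is an assumption rather than
    something stated in this changelog.

```python
def resctrl_mount_options(l3_cdp: bool = False, l2_cdp: bool = False) -> str:
    """Build the -o option string for mounting resctrl with CDP settings."""
    options = []
    if l3_cdp:
        options.append("cdp")     # existing parameter, L3 only
    if l2_cdp:
        options.append("cdpl2")   # new parameter for L2
    return ",".join(options)

opts = resctrl_mount_options(l3_cdp=True, l2_cdp=True)
print(f"mount -t resctrl resctrl -o {opts} /sys/fs/resctrl")
```

    Because the two options are independent, either one can be passed
    alone.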

    [ tglx: Made changelog readable ]

    Signed-off-by: Fenghua Yu
    Signed-off-by: Thomas Gleixner
    Cc: Ravi V Shankar
    Cc: Tony Luck
    Cc: Vikas
    Cc: Sai Praneeth
    Cc: Reinette
    Link: https://lkml.kernel.org/r/1513810644-78015-3-git-send-email-fenghua.yu@intel.com

    Fenghua Yu
     
  • With more flag bits in /proc/cpuinfo for RDT, it's better to classify the
    bits for readability.

    Some previously missing bits are added as well.

    Signed-off-by: Fenghua Yu
    Signed-off-by: Thomas Gleixner
    Cc: Ravi V Shankar
    Cc: Tony Luck
    Cc: Vikas
    Cc: Sai Praneeth
    Cc: Reinette
    Link: https://lkml.kernel.org/r/1513810644-78015-2-git-send-email-fenghua.yu@intel.com

    Fenghua Yu
     

15 Jan, 2018

1 commit

  • Pull x86 pti updates from Thomas Gleixner:
    "This contains:

    - a PTI bugfix to avoid setting reserved CR3 bits when PCID is
    disabled. This seems to cause issues on a virtual machine at least
    and is incorrect according to the AMD manual.

    - a PTI bugfix which disables the perf BTS facility if PTI is
    enabled. The BTS AUX buffer is not globally visible and causes the
    CPU to fault when the mapping disappears on switching CR3 to user
    space. A full fix which restores BTS on PTI is non trivial and will
    be worked on.

    - PTI bugfixes for EFI and trusted boot which make sure that the user
    space visible page table entries have the NX bit cleared

    - removal of dead code in the PTI pagetable setup functions

    - add PTI documentation

    - add a selftest for vsyscall to verify that the kernel actually
    implements what it advertises.

    - a sysfs interface to expose vulnerability and mitigation
    information so there is a coherent way for users to retrieve the
    status.

    - the initial spectre_v2 mitigations, aka retpoline:

    + The necessary ASM thunk and compiler support

    + The ASM variants of retpoline and the conversion of affected ASM
    code

    + Make LFENCE serializing on AMD so it can be used as speculation
    trap

    + The RSB fill after vmexit

    - initial objtool support for retpoline

    As I said in the status mail this is the most of the set of patches
    which should go into 4.15 except two straight forward patches still on
    hold:

    - the retpoline add on of LFENCE which waits for ACKs

    - the RSB fill after context switch

    Both should be ready to go early next week and with that we'll have
    covered the major holes of spectre_v2 and go back to normality"

    * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
    x86,perf: Disable intel_bts when PTI
    security/Kconfig: Correct the Documentation reference for PTI
    x86/pti: Fix !PCID and sanitize defines
    selftests/x86: Add test_vsyscall
    x86/retpoline: Fill return stack buffer on vmexit
    x86/retpoline/irq32: Convert assembler indirect jumps
    x86/retpoline/checksum32: Convert assembler indirect jumps
    x86/retpoline/xen: Convert Xen hypercall indirect jumps
    x86/retpoline/hyperv: Convert assembler indirect jumps
    x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
    x86/retpoline/entry: Convert entry assembler indirect jumps
    x86/retpoline/crypto: Convert crypto assembler indirect jumps
    x86/spectre: Add boot time option to select Spectre v2 mitigation
    x86/retpoline: Add initial retpoline support
    objtool: Allow alternatives to be ignored
    objtool: Detect jumps to retpoline thunks
    x86/pti: Make unpoison of pgd for trusted boot work for real
    x86/alternatives: Fix optimize_nops() checking
    sysfs/cpu: Fix typos in vulnerability documentation
    x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
    ...

    Linus Torvalds
     

07 Jan, 2018

1 commit

  • Add some details about how PTI works, what some of the downsides
    are, and how to debug it when things go wrong.

    Also document the kernel parameter: 'pti/nopti'.
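
    For illustration, the sketch below reads such a request out of a
    kernel command line string. It assumes the accepted spellings are
    "nopti" and "pti=<value>" and that the last occurrence wins; both the
    function name and the "auto" default are assumptions, not a
    description of the kernel's parser.

```python
def pti_request(cmdline: str) -> str:
    """Return the PTI setting requested on a kernel command line string."""
    request = "auto"  # assumption: no parameter leaves the choice to the kernel
    for param in cmdline.split():
        if param == "nopti":
            request = "off"
        elif param.startswith("pti="):
            request = param[len("pti="):]
    return request

print(pti_request("root=/dev/sda1 quiet nopti"))
print(pti_request("root=/dev/sda1 pti=on"))
```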

    Signed-off-by: Dave Hansen
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Randy Dunlap
    Reviewed-by: Kees Cook
    Cc: Moritz Lipp
    Cc: Daniel Gruss
    Cc: Michael Schwarz
    Cc: Richard Fellner
    Cc: Andy Lutomirski
    Cc: Linus Torvalds
    Cc: Hugh Dickins
    Cc: Andi Lutomirsky
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20180105174436.1BC6FA2B@viggo.jf.intel.com

    Dave Hansen
     

06 Jan, 2018

1 commit

  • Pull more x86 pti fixes from Thomas Gleixner:
    "Another small stash of fixes for fallout from the PTI work:

    - Fix the modules vs. KASAN breakage which was caused by making
    MODULES_END depend of the fixmap size. That was done when the cpu
    entry area moved into the fixmap, but now that we have a separate
    map space for that this is causing more issues than it solves.

    - Use the proper cache flush methods for the debugstore buffers as
    they are mapped/unmapped during runtime and not statically mapped
    at boot time like the rest of the cpu entry area.

    - Make the map layout of the cpu_entry_area consistent for 4 and 5
    level paging and fix the KASLR vaddr_end wreckage.

    - Use PER_CPU_EXPORT for per cpu variable and while at it unbreak
    nvidia gfx drivers by dropping the GPL export. The subject line of
    the commit tells it the other way around, but I noticed that too
    late.

    - Fix the ASM alternative macros so they can be used in the middle of
    an inline asm block.

    - Rename the BUG_CPU_INSECURE flag to BUG_CPU_MELTDOWN so the attack
    vector is properly identified. The Spectre mitigations will come
    with their own bug bits later"

    * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN
    x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asm
    x86/tlb: Drop the _GPL from the cpu_tlbstate export
    x86/events/intel/ds: Use the proper cache flush method for mapping ds buffers
    x86/kaslr: Fix the vaddr_end mess
    x86/mm: Map cpu_entry_area at the same place on 4/5 level
    x86/mm: Set MODULES_END to 0xffffffffff000000

    Linus Torvalds
     

05 Jan, 2018

3 commits

  • vaddr_end for KASLR is only documented in the KASLR code itself and is
    adjusted depending on config options. So it's not surprising that a change
    of the memory layout causes KASLR to have the wrong vaddr_end. This can map
    arbitrary stuff into other areas causing hard to understand problems.

    Remove the whole ifdef magic and define the start of the cpu_entry_area to
    be the end of the KASLR vaddr range.

    Add documentation to that effect.

    Fixes: 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap")
    Reported-by: Benjamin Gilbert
    Signed-off-by: Thomas Gleixner
    Tested-by: Benjamin Gilbert
    Cc: Andy Lutomirski
    Cc: Greg Kroah-Hartman
    Cc: stable
    Cc: Dave Hansen
    Cc: Peter Zijlstra
    Cc: Thomas Garnier
    Cc: Alexander Kuleshov
    Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801041320360.1771@nanos

    Thomas Gleixner
     
  • There is no reason for 4 and 5 level pagetables to have a different
    layout. It just makes determining vaddr_end for KASLR harder than
    necessary.

    Fixes: 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap")
    Signed-off-by: Thomas Gleixner
    Cc: Andy Lutomirski
    Cc: Benjamin Gilbert
    Cc: Greg Kroah-Hartman
    Cc: stable
    Cc: Dave Hansen
    Cc: Peter Zijlstra
    Cc: Thomas Garnier
    Cc: Alexander Kuleshov
    Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801041320360.1771@nanos

    Thomas Gleixner
     
  • Since f06bdd4001c2 ("x86/mm: Adapt MODULES_END based on fixmap section size")
    kasan_mem_to_shadow(MODULES_END) could be not aligned to a page boundary.

    Passing a page-unaligned address to kasan_populate_zero_shadow() has two
    possible effects:

    1) It may leave a one-page hole in an area that is supposed to be populated.
    After commit 21506525fb8d ("x86/kasan/64: Teach KASAN about the cpu_entry_area"),
    that hole happens to be in the shadow covering the fixmap area and leads to a crash:

    BUG: unable to handle kernel paging request at fffffbffffe8ee04
    RIP: 0010:check_memory_region+0x5c/0x190

    Call Trace:

    memcpy+0x1f/0x50
    ghes_copy_tofrom_phys+0xab/0x180
    ghes_read_estatus+0xfb/0x280
    ghes_notify_nmi+0x2b2/0x410
    nmi_handle+0x115/0x2c0
    default_do_nmi+0x57/0x110
    do_nmi+0xf8/0x150
    end_repeat_nmi+0x1a/0x1e

    Note, the crash likely disappeared after commit 92a0f81d8957, which
    changed the kasan_populate_zero_shadow() call back to the way it was
    before commit 21506525fb8d.

    2) An attempt to load a module near MODULES_END will fail, because
    __vmalloc_node_range() called from kasan_module_alloc() will hit the
    WARN_ON(!pte_none(*pte)) in the vmap_pte_range() and bail out with error.

    To fix this we need to make kasan_mem_to_shadow(MODULES_END) page aligned
    which means that MODULES_END should be 8*PAGE_SIZE aligned.

    The whole point of commit f06bdd4001c2 was to move MODULES_END down if
    NR_CPUS is big, so the cpu_entry_area takes a lot of space.
    But since 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap")
    the cpu_entry_area is no longer in fixmap, so we could just set
    MODULES_END to a fixed 8*PAGE_SIZE aligned address.

    Fixes: f06bdd4001c2 ("x86/mm: Adapt MODULES_END based on fixmap section size")
    Reported-by: Jakub Kicinski
    Signed-off-by: Andrey Ryabinin
    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org
    Cc: Andy Lutomirski
    Cc: Thomas Garnier
    Link: https://lkml.kernel.org/r/20171228160620.23818-1-aryabinin@virtuozzo.com

    Andrey Ryabinin
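
The alignment argument in the MODULES_END commit above can be checked numerically. What follows is a minimal sketch in Python, not kernel code. It assumes the usual x86-64 KASAN parameters (a scale shift of 3 and a shadow offset of 0xdffffc0000000000 for 4-level paging) and 4 KiB pages; neither value is stated in the commit message. The MODULES_END value is taken from the subject line "x86/mm: Set MODULES_END to 0xffffffffff000000". Because one shadow byte covers 8 bytes of memory, the shadow address is page aligned only when the original address is aligned to 8*PAGE_SIZE.

```python
# Sketch of the KASAN shadow address computation (assumed x86-64 values).
PAGE_SIZE = 4096                           # assumption: 4 KiB pages
KASAN_SHADOW_SCALE_SHIFT = 3               # one shadow byte covers 8 bytes
KASAN_SHADOW_OFFSET = 0xdffffc0000000000   # assumption: 4-level paging default
MASK64 = (1 << 64) - 1                     # emulate 64-bit wraparound

MODULES_END = 0xffffffffff000000           # value named in the commit subject


def kasan_mem_to_shadow(addr):
    """Map a kernel virtual address to its shadow byte address."""
    return ((addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET) & MASK64


def is_page_aligned(addr):
    """True if addr sits on a page boundary."""
    return addr % PAGE_SIZE == 0


if __name__ == "__main__":
    # Fixed MODULES_END: aligned to 8*PAGE_SIZE, so its shadow is page aligned.
    print(hex(kasan_mem_to_shadow(MODULES_END)),
          is_page_aligned(kasan_mem_to_shadow(MODULES_END)))

    # Hypothetical MODULES_END moved down by a single page: the address itself
    # is page aligned, but its shadow address is not.
    moved = MODULES_END - PAGE_SIZE
    print(hex(kasan_mem_to_shadow(moved)),
          is_page_aligned(kasan_mem_to_shadow(moved)))
```

The second case mirrors the situation the commit describes: an end address that depends on a variable size in pages is page aligned, yet its shadow can land in the middle of a page.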
     

30 Dec, 2017

1 commit

  • Pull x86 page table isolation updates from Thomas Gleixner:
    "This is the final set of enabling page table isolation on x86:

    - Infrastructure patches for handling the extra page tables.

    - Patches which map the various bits and pieces which are required to
    get in and out of user space into the user space visible page
    tables.

    - The required changes to have CR3 switching in the entry/exit code.

    - Optimizations for the CR3 switching along with documentation of how
    the ASID/PCID mechanism works.

    - Updates to dump pagetables to cover the user space page tables for
    W+X scans and extra debugfs files to analyze both the kernel and
    the user space visible page tables

    The whole functionality is compile time controlled via a config switch
    and can be turned on/off on the command line as well"

    * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
    x86/ldt: Make the LDT mapping RO
    x86/mm/dump_pagetables: Allow dumping current pagetables
    x86/mm/dump_pagetables: Check user space page table for WX pages
    x86/mm/dump_pagetables: Add page table directory to the debugfs VFS hierarchy
    x86/mm/pti: Add Kconfig
    x86/dumpstack: Indicate in Oops whether PTI is configured and enabled
    x86/mm: Clarify the whole ASID/kernel PCID/user PCID naming
    x86/mm: Use INVPCID for __native_flush_tlb_single()
    x86/mm: Optimize RESTORE_CR3
    x86/mm: Use/Fix PCID to optimize user/kernel switches
    x86/mm: Abstract switching CR3
    x86/mm: Allow flushing for future ASID switches
    x86/pti: Map the vsyscall page if needed
    x86/pti: Put the LDT in its own PGD if PTI is on
    x86/mm/64: Make a full PGD-entry size hole in the memory map
    x86/events/intel/ds: Map debug buffers in cpu_entry_area
    x86/cpu_entry_area: Add debugstore entries to cpu_entry_area
    x86/mm/pti: Map ESPFIX into user space
    x86/mm/pti: Share entry text PMD
    x86/entry: Align entry text section to PMD boundary
    ...

    Linus Torvalds
     

24 Dec, 2017

1 commit

  • With PTI enabled, the LDT must be mapped in the usermode tables somewhere.
    The LDT is per process, i.e. per mm.

    An earlier approach mapped the LDT on context switch into a fixmap area,
    but that's a big overhead and exhausted the fixmap space when NR_CPUS got
    big.

    Take advantage of the fact that there is an address space hole which
    provides a completely unused pgd. Use this pgd to manage per-mm LDT
    mappings.

    This has a down side: the LDT isn't (currently) randomized, and an attack
    that can write the LDT is instant root due to call gates (thanks, AMD, for
    leaving call gates in AMD64 but designing them wrong so they're only useful
    for exploits). This can be mitigated by making the LDT read-only or
    randomizing the mapping, either of which is straightforward on top of this
    patch.

    This will significantly slow down LDT users, but that shouldn't matter for
    important workloads -- the LDT is only used by DOSEMU(2), Wine, and very
    old libc implementations.

    [ tglx: Cleaned it up. ]

    Signed-off-by: Andy Lutomirski
    Signed-off-by: Thomas Gleixner
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: David Laight
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Kees Cook
    Cc: Kirill A. Shutemov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
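
Several of the commits above refer to a "PGD slot" or a "full PGD-entry size hole". As a rough illustration of what that means, the sketch below computes which top-level page table entry a virtual address falls into and how much address space one entry covers. It assumes the standard x86-64 constants (512 entries per PGD, a PGD shift of 39 bits with 4-level paging and 48 bits with 5-level paging). The helper names mirror kernel naming but are written here only for illustration, and the sample addresses are arbitrary; they are not the LDT remap address.

```python
# Sketch of x86-64 top-level page table indexing (assumed standard constants).
PTRS_PER_PGD = 512      # entries in one PGD page
PGDIR_SHIFT_L4 = 39     # 4-level paging: one PGD entry maps 2**39 bytes
PGDIR_SHIFT_L5 = 48     # 5-level paging: one PGD entry maps 2**48 bytes


def pgd_index(addr, pgdir_shift):
    """Return the PGD slot (0..511) that a virtual address falls into."""
    return (addr >> pgdir_shift) & (PTRS_PER_PGD - 1)


def pgd_slot_size(pgdir_shift):
    """Return the number of bytes of address space covered by one PGD slot."""
    return 1 << pgdir_shift


if __name__ == "__main__":
    GiB = 1 << 30
    TiB = 1 << 40
    print(pgd_slot_size(PGDIR_SHIFT_L4) // GiB, "GiB per slot with 4-level paging")
    print(pgd_slot_size(PGDIR_SHIFT_L5) // TiB, "TiB per slot with 5-level paging")

    # Arbitrary sample addresses from the kernel half of the address space.
    for addr in (0xffff800000000000, 0xffffffffff000000):
        print(hex(addr), "-> PGD slot", pgd_index(addr, PGDIR_SHIFT_L4))
```

Since each mm has its own PGD page, dedicating one whole slot to the LDT gives every process a private mapping at the same virtual address without touching the entries shared with other processes.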