02 Oct, 2020

2 commits

  • Change additional instances that could use sysfs_emit and sysfs_emit_at
    that the coccinelle script could not convert.

    o macros creating show functions with ## concatenation
    o unbound sprintf uses with buf+len for start of output to sysfs_emit_at
    o returns with ?: tests and sprintf to sysfs_emit
    o sysfs output with struct class * not struct device * arguments

    Miscellanea:

    o remove unnecessary initializations around these changes
    o consistently use int len for return length of show functions
    o use octal permissions and not S_
    o rename a few show function names so DEVICE_ATTR_ can be used
    o use DEVICE_ATTR_ADMIN_RO where appropriate
    o consistently use const char *output for strings
    o checkpatch/style neatening

    Signed-off-by: Joe Perches
    Link: https://lore.kernel.org/r/8bc24444fe2049a9b2de6127389b57edfdfe324d.1600285923.git.joe@perches.com
    Signed-off-by: Greg Kroah-Hartman

    Joe Perches
     
  • Convert the various sprintf family calls in sysfs device show functions
    to sysfs_emit and sysfs_emit_at for PAGE_SIZE buffer safety.

    Done with:

    $ spatch -sp-file sysfs_emit_dev.cocci --in-place --max-width=80 .

    And cocci script:

    $ cat sysfs_emit_dev.cocci
    @@
    identifier d_show;
    identifier dev, attr, buf;
    @@

    ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
    {

    }

    @@
    identifier d_show;
    identifier dev, attr, buf;
    @@

    ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
    {

    }

    @@
    identifier d_show;
    identifier dev, attr, buf;
    @@

    ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
    {

    }

    @@
    identifier d_show;
    identifier dev, attr, buf;
    expression chr;
    @@

    ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
    {

    }

    @@
    identifier d_show;
    identifier dev, attr, buf;
    identifier len;
    @@

    ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
    {

    return len;
    }

    @@
    identifier d_show;
    identifier dev, attr, buf;
    identifier len;
    @@

    ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
    {

    return len;
    }

    @@
    identifier d_show;
    identifier dev, attr, buf;
    identifier len;
    @@

    ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
    {

    return len;
    }

    @@
    identifier d_show;
    identifier dev, attr, buf;
    identifier len;
    @@

    ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
    {

    return len;
    }

    @@
    identifier d_show;
    identifier dev, attr, buf;
    expression chr;
    @@

    ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
    {
    ...
    - strcpy(buf, chr);
    - return strlen(buf);
    + return sysfs_emit(buf, chr);
    }

    Signed-off-by: Joe Perches
    Link: https://lore.kernel.org/r/3d033c33056d88bbe34d4ddb62afd05ee166ab9a.1600285923.git.joe@perches.com
    Signed-off-by: Greg Kroah-Hartman

    Joe Perches
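
The conversion above replaces sprintf-style calls in show functions with sysfs_emit() and sysfs_emit_at() so that output cannot run past the PAGE_SIZE buffer. The following is only a user-space sketch of that idea, not the kernel implementation: model_emit(), model_emit_at() and MODEL_PAGE_SIZE are hypothetical names, a 4 KiB page is assumed, and the page-alignment check and warning that the kernel helpers perform are left out.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define MODEL_PAGE_SIZE 4096	/* assumption: 4 KiB pages */

/*
 * Model of sysfs_emit_at(): format into buf starting at offset `at`,
 * never writing past MODEL_PAGE_SIZE. Returns the number of bytes
 * written, excluding the terminating NUL.
 */
int model_emit_at(char *buf, int at, const char *fmt, ...)
{
	va_list args;
	int room, len;

	assert(buf != NULL);
	if (at < 0 || at >= MODEL_PAGE_SIZE)
		return 0;

	room = MODEL_PAGE_SIZE - at;
	va_start(args, fmt);
	len = vsnprintf(buf + at, (size_t)room, fmt, args);
	va_end(args);

	if (len < 0)
		return 0;
	/* vsnprintf reports the untruncated length; clamp it to what fit. */
	return len >= room ? room - 1 : len;
}

/* Model of sysfs_emit(): the same, always starting at offset 0. */
#define model_emit(buf, ...) model_emit_at((buf), 0, __VA_ARGS__)
```

A show function that builds its output in several steps would accumulate the length with `len += model_emit_at(buf, len, ...)` and return `len`, which mirrors the buf+len pattern the commits convert.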
     

20 Apr, 2020

1 commit

  • SRBDS is an MDS-like speculative side channel that can leak bits from the
    random number generator (RNG) across cores and threads. New microcode
    serializes the processor access during the execution of RDRAND and
    RDSEED. This ensures that the shared buffer is overwritten before it is
    released for reuse.

    While it is present on all affected CPU models, the microcode mitigation
    is not needed on models that enumerate ARCH_CAPABILITIES[MDS_NO] in the
    cases where TSX is not supported or has been disabled with TSX_CTRL.

    The mitigation is activated by default on affected processors and it
    increases latency for RDRAND and RDSEED instructions. Among other
    effects this will reduce throughput from /dev/urandom.

    * Enable administrator to configure the mitigation off when desired using
    either mitigations=off or srbds=off.

    * Export vulnerability status via sysfs

    * Rename file-scoped macros to apply for non-whitelist table initializations.

    [ bp: Massage,
    - s/VULNBL_INTEL_STEPPING/VULNBL_INTEL_STEPPINGS/g,
    - do not read arch cap MSR a second time in tsx_fused_off() - just pass it in,
    - flip check in cpu_set_bug_bits() to save an indentation level,
    - reflow comments.
    jpoimboe: s/Mitigated/Mitigation/ in user-visible strings
    tglx: Dropped the fused off magic for now
    ]

    Signed-off-by: Mark Gross
    Signed-off-by: Borislav Petkov
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Tony Luck
    Reviewed-by: Pawan Gupta
    Reviewed-by: Josh Poimboeuf
    Tested-by: Neelima Krishnan

    Mark Gross
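
The commit above exports the vulnerability status via sysfs and notes the s/Mitigated/Mitigation/ change in user-visible strings. As a rough user-space sketch, a reader of such a status line might classify it by its leading word. The prefixes "Not affected", "Vulnerable" and "Mitigation" are assumed here from the kernel's ABI documentation; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum vuln_state {
	VULN_UNKNOWN,
	VULN_NOT_AFFECTED,
	VULN_VULNERABLE,
	VULN_MITIGATED,
};

/* Return nonzero if s begins with prefix. */
static int has_prefix(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Classify one status line by its assumed leading keyword. */
enum vuln_state classify_vuln_status(const char *line)
{
	assert(line != NULL);
	if (has_prefix(line, "Not affected"))
		return VULN_NOT_AFFECTED;
	if (has_prefix(line, "Vulnerable"))
		return VULN_VULNERABLE;
	if (has_prefix(line, "Mitigation"))
		return VULN_MITIGATED;
	return VULN_UNKNOWN;
}
```

Anything that does not match one of the three prefixes is reported as VULN_UNKNOWN rather than guessed at.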
     

31 Mar, 2020

1 commit

  • Pull core SMP updates from Thomas Gleixner:
    "CPU (hotplug) updates:

    - Support for locked CSD objects in smp_call_function_single_async()
    which allows to simplify callsites in the scheduler core and MIPS

    - Treewide consolidation of CPU hotplug functions which ensures the
    consistency between the sysfs interface and kernel state. The low
    level functions cpu_up/down() are now confined to the core code and
    no longer accessible from random code"

    * tag 'smp-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
    cpu/hotplug: Ignore pm_wakeup_pending() for disable_nonboot_cpus()
    cpu/hotplug: Hide cpu_up/down()
    cpu/hotplug: Move bringup of secondary CPUs out of smp_init()
    torture: Replace cpu_up/down() with add/remove_cpu()
    firmware: psci: Replace cpu_up/down() with add/remove_cpu()
    xen/cpuhotplug: Replace cpu_up/down() with device_online/offline()
    parisc: Replace cpu_up/down() with add/remove_cpu()
    sparc: Replace cpu_up/down() with add/remove_cpu()
    powerpc: Replace cpu_up/down() with add/remove_cpu()
    x86/smp: Replace cpu_up/down() with add/remove_cpu()
    arm64: hibernate: Use bringup_hibernate_cpu()
    cpu/hotplug: Provide bringup_hibernate_cpu()
    arm64: Use reboot_cpu instead of hardconding it to 0
    arm64: Don't use disable_nonboot_cpus()
    ARM: Use reboot_cpu instead of hardcoding it to 0
    ARM: Don't use disable_nonboot_cpus()
    ia64: Replace cpu_down() with smp_shutdown_nonboot_cpus()
    cpu/hotplug: Create a new function to shutdown nonboot cpus
    cpu/hotplug: Add new {add,remove}_cpu() functions
    sched/core: Remove rq.hrtick_csd_pending
    ...

    Linus Torvalds
     

25 Mar, 2020

1 commit

  • Use separate functions for the device core to bring a CPU up and down.

    Users outside the device core must use add/remove_cpu() which will take
    care of extra housekeeping work like keeping sysfs in sync.

    Make cpu_up/down() static and replace the extra layer of indirection.

    [ tglx: Removed the extra wrapper functions and adjusted function names ]

    Signed-off-by: Qais Yousef
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20200323135110.30522-18-qais.yousef@arm.com

    Qais Yousef
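
The split described above (static cpu_up/down() plus public add/remove_cpu() that also keep sysfs in sync) can be pictured with a toy user-space model. This is not the kernel code: every name below is hypothetical, the "sysfs" state is just a flag, and treating CPU 0 as not removable is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

struct model_cpu {
	bool running;		/* kernel-side state */
	bool sysfs_online;	/* what the "online" attribute would report */
};

static struct model_cpu model_cpus[MODEL_NR_CPUS] = {
	[0] = { .running = true, .sysfs_online = true },	/* boot CPU */
};

/* Low-level steps: change kernel state only, so they stay file-local. */
static int model_cpu_up(unsigned int cpu)
{
	if (cpu >= MODEL_NR_CPUS)
		return -1;
	model_cpus[cpu].running = true;
	return 0;
}

static int model_cpu_down(unsigned int cpu)
{
	/* Model assumption: CPU 0 cannot be taken down. */
	if (cpu == 0 || cpu >= MODEL_NR_CPUS)
		return -1;
	model_cpus[cpu].running = false;
	return 0;
}

/* Public entry points: do the low-level step plus the housekeeping. */
int model_add_cpu(unsigned int cpu)
{
	int ret = model_cpu_up(cpu);

	if (ret == 0)
		model_cpus[cpu].sysfs_online = true;
	return ret;
}

int model_remove_cpu(unsigned int cpu)
{
	int ret = model_cpu_down(cpu);

	if (ret == 0)
		model_cpus[cpu].sysfs_online = false;
	return ret;
}

/* True when the sysfs view matches the kernel-side state. */
bool model_cpu_in_sync(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	return model_cpus[cpu].running == model_cpus[cpu].sysfs_online;
}
```

Because the low-level functions are static, code outside this file can only reach them through the wrappers, so the two flags cannot drift apart.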
     

11 Mar, 2020

2 commits


04 Nov, 2019

1 commit

  • Some processors may incur a machine check error possibly resulting in an
    unrecoverable CPU lockup when an instruction fetch encounters a TLB
    multi-hit in the instruction TLB. This can occur when the page size is
    changed along with either the physical address or cache type. The relevant
    erratum can be found here:

    https://bugzilla.kernel.org/show_bug.cgi?id=205195

    There are other processors affected for which the erratum does not fully
    disclose the impact.

    This issue affects both bare-metal x86 page tables and EPT.

    It can be mitigated by either eliminating the use of large pages or by
    using careful TLB invalidations when changing the page size in the page
    tables.

    Just like Spectre, Meltdown, L1TF and MDS, a new bit has been allocated in
    MSR_IA32_ARCH_CAPABILITIES (PSCHANGE_MC_NO) and will be set on CPUs which
    are mitigated against this issue.

    Signed-off-by: Vineela Tummalapalli
    Co-developed-by: Pawan Gupta
    Signed-off-by: Pawan Gupta
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Thomas Gleixner

    Vineela Tummalapalli
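
The PSCHANGE_MC_NO check described above amounts to testing one bit of the MSR_IA32_ARCH_CAPABILITIES value. A minimal sketch follows, assuming the MSR value has already been read elsewhere. The bit position (bit 6) is an assumption to verify against the Intel documentation or the kernel's msr-index.h, and the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: PSCHANGE_MC_NO is bit 6 of IA32_ARCH_CAPABILITIES. */
#define MODEL_ARCH_CAP_PSCHANGE_MC_NO (UINT64_C(1) << 6)

/*
 * A CPU needs the mitigation when its model is on the affected list
 * and the hardware does not set the "not affected" capability bit.
 */
bool needs_itlb_multihit_mitigation(bool cpu_in_affected_list,
				    uint64_t arch_caps)
{
	if (!cpu_in_affected_list)
		return false;
	return (arch_caps & MODEL_ARCH_CAP_PSCHANGE_MC_NO) == 0;
}
```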
     

28 Oct, 2019

1 commit

  • Add the sysfs reporting file for TSX Async Abort. It exposes the
    vulnerability and the mitigation state similar to the existing files for
    the other hardware vulnerabilities.

    Sysfs file path is:
    /sys/devices/system/cpu/vulnerabilities/tsx_async_abort

    Signed-off-by: Pawan Gupta
    Signed-off-by: Borislav Petkov
    Signed-off-by: Thomas Gleixner
    Tested-by: Neelima Krishnan
    Reviewed-by: Mark Gross
    Reviewed-by: Tony Luck
    Reviewed-by: Greg Kroah-Hartman
    Reviewed-by: Josh Poimboeuf

    Pawan Gupta
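
From user space, the file named above is read like any other small text file. The helper below is a sketch only; read_status_line() is a hypothetical name, and it simply returns the first line of whatever path it is given.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of a sysfs-style text file into out (size outsz),
 * stripping the trailing newline. Returns 0 on success, -1 if the file
 * cannot be opened or read.
 */
int read_status_line(const char *path, char *out, size_t outsz)
{
	FILE *f;

	assert(path != NULL && out != NULL && outsz > 0);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(out, (int)outsz, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	out[strcspn(out, "\n")] = '\0';
	return 0;
}
```

Calling it with "/sys/devices/system/cpu/vulnerabilities/tsx_async_abort" returns -1 on kernels or architectures that do not provide the file, so callers should treat a failure as "status unavailable" rather than "not affected".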
     

14 May, 2019

1 commit

  • Pull x86 MDS mitigations from Thomas Gleixner:
    "Microarchitectural Data Sampling (MDS) is a hardware vulnerability
    which allows unprivileged speculative access to data which is
    available in various CPU internal buffers. This new set of misfeatures
    has the following CVEs assigned:

    CVE-2018-12126 MSBDS Microarchitectural Store Buffer Data Sampling
    CVE-2018-12130 MFBDS Microarchitectural Fill Buffer Data Sampling
    CVE-2018-12127 MLPDS Microarchitectural Load Port Data Sampling
    CVE-2019-11091 MDSUM Microarchitectural Data Sampling Uncacheable Memory

    MDS attacks target microarchitectural buffers which speculatively
    forward data under certain conditions. Disclosure gadgets can expose
    this data via cache side channels.

    Contrary to other speculation based vulnerabilities the MDS
    vulnerability does not allow the attacker to control the memory target
    address. As a consequence the attacks are purely sampling based, but
    as demonstrated with the TLBleed attack samples can be postprocessed
    successfully.

    The mitigation is to flush the microarchitectural buffers on return to
    user space and before entering a VM. It's bolted on the VERW
    instruction and requires a microcode update. As some of the attacks
    exploit data structures shared between hyperthreads, full protection
    requires to disable hyperthreading. The kernel does not do that by
    default to avoid breaking unattended updates.

    The mitigation set comes with documentation for administrators and a
    deeper technical view"

    * 'x86-mds-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
    x86/speculation/mds: Fix documentation typo
    Documentation: Correct the possible MDS sysfs values
    x86/mds: Add MDSUM variant to the MDS documentation
    x86/speculation/mds: Add 'mitigations=' support for MDS
    x86/speculation/mds: Print SMT vulnerable on MSBDS with mitigations off
    x86/speculation/mds: Fix comment
    x86/speculation/mds: Add SMT warning message
    x86/speculation: Move arch_smt_update() call to after mitigation decisions
    x86/speculation/mds: Add mds=full,nosmt cmdline option
    Documentation: Add MDS vulnerability documentation
    Documentation: Move L1TF to separate directory
    x86/speculation/mds: Add mitigation mode VMWERV
    x86/speculation/mds: Add sysfs reporting for MDS
    x86/speculation/mds: Add mitigation control for MDS
    x86/speculation/mds: Conditionally clear CPU buffers on idle entry
    x86/kvm/vmx: Add MDS protection when L1D Flush is not active
    x86/speculation/mds: Clear CPU buffers on exit to user
    x86/speculation/mds: Add mds_clear_cpu_buffers()
    x86/kvm: Expose X86_FEATURE_MD_CLEAR to guests
    x86/speculation/mds: Add BUG_MSBDS_ONLY
    ...

    Linus Torvalds
     

07 Mar, 2019

2 commits

  • Pull driver core updates from Greg KH:
    "Here is the big driver core patchset for 5.1-rc1

    More patches than "normal" here this merge window, due to some work in
    the driver core by Alexander Duyck to rework the async probe
    functionality to work better for a number of devices, and independent
    work from Rafael for the device link functionality to make it work
    "correctly".

    Also in here is:

    - lots of BUS_ATTR() removals, the macro is about to go away

    - firmware test fixups

    - ihex fixups and simplification

    - component additions (also includes i915 patches)

    - lots of minor coding style fixups and cleanups.

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'driver-core-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (65 commits)
    driver core: platform: remove misleading err_alloc label
    platform: set of_node in platform_device_register_full()
    firmware: hardcode the debug message for -ENOENT
    driver core: Add missing description of new struct device_link field
    driver core: Fix PM-runtime for links added during consumer probe
    drivers/component: kerneldoc polish
    async: Add cmdline option to specify drivers to be async probed
    driver core: Fix possible supplier PM-usage counter imbalance
    PM-runtime: Fix __pm_runtime_set_status() race with runtime resume
    driver: platform: Support parsing GpioInt 0 in platform_get_irq()
    selftests: firmware: fix verify_reqs() return value
    Revert "selftests: firmware: remove use of non-standard diff -Z option"
    Revert "selftests: firmware: add CONFIG_FW_LOADER_USER_HELPER_FALLBACK to config"
    device: Fix comment for driver_data in struct device
    kernfs: Allocating memory for kernfs_iattrs with kmem_cache.
    sysfs: remove unused include of kernfs-internal.h
    driver core: Postpone DMA tear-down until after devres release
    driver core: Document limitation related to DL_FLAG_RPM_ACTIVE
    PM-runtime: Take suppliers into account in __pm_runtime_set_status()
    device.h: Add __cold to dev_ logging functions
    ...

    Linus Torvalds
     
  • Add the sysfs reporting file for MDS. It exposes the vulnerability and
    mitigation state similar to the existing files for the other speculative
    hardware vulnerabilities.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Greg Kroah-Hartman
    Reviewed-by: Borislav Petkov
    Reviewed-by: Jon Masters
    Tested-by: Jon Masters

    Thomas Gleixner
     

19 Feb, 2019

1 commit

  • All device objects in the driver model contain fields that control the
    handling of various power management activities. However, it's not
    always useful. There are a few instances where pseudo devices are added
    to the model just to take advantage of many other features like
    kobjects, udev events, and so on. One such example is cpu devices and
    their caches.

    The sysfs for the cpu caches are managed by adding devices with cpu
    as the parent in cpu_device_create() when secondary cpu is brought
    online. Generally when the secondary CPUs are hotplugged back in as part
    of resume from suspend-to-ram, we call cpu_device_create() from the cpu
    hotplug state machine while the cpu device associated with that CPU is
    not yet ready to be resumed as the device_resume() call happens a bit
    later. It's not really necessary to set the is_prepared flag for cpu
    devices, as they are mostly pseudo devices and the hotplug framework
    deals with the state machine, which is not managed through the cpu
    device.

    This often results in an annoying warning when resuming:
    Enabling non-boot CPUs ...
    CPU1: Booted secondary processor
    cache: parent cpu1 should not be sleeping
    CPU1 is up
    CPU2: Booted secondary processor
    cache: parent cpu2 should not be sleeping
    CPU2 is up
    .... and so on.

    So in order to fix this kind of error, we could just completely avoid
    doing any power management related initialisations and operations if
    they are not used by these devices.

    Add no_pm flags to indicate that the device doesn't require any sort of
    PM activities and all of them can be completely skipped. We can use the
    same flag to also avoid adding unused *power* sysfs entries for these
    devices. For now, let's use this for cpu cache devices.

    Reviewed-by: Ulf Hansson
    Signed-off-by: Sudeep Holla
    Tested-by: Eugeniu Rosca
    Signed-off-by: Rafael J. Wysocki

    Sudeep Holla
     

01 Feb, 2019

1 commit


21 Jun, 2018

1 commit

  • L1TF core kernel workarounds are cheap and normally always enabled. However
    they still should be reported in sysfs if the system is vulnerable or
    mitigated. Add the necessary CPU feature/bug bits.

    - Extend the existing checks for Meltdown to determine if the system is
    vulnerable. All CPUs which are not vulnerable to Meltdown are also not
    vulnerable to L1TF

    - Check for 32bit non PAE and emit a warning as there is no practical way
    for mitigation due to the limited physical address bits

    - If the system has more than MAX_PA/2 physical memory the invert page
    workarounds don't protect the system against the L1TF attack anymore,
    because an inverted physical address will also point to valid
    memory. Print a warning in this case and report that the system is
    vulnerable.

    Add a function which returns the PFN limit for the L1TF mitigation, which
    will be used in follow up patches for sanity and range checks.

    [ tglx: Renamed the CPU feature bit to L1TF_PTEINV ]

    Signed-off-by: Andi Kleen
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Josh Poimboeuf
    Acked-by: Dave Hansen

    Andi Kleen
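
The MAX_PA/2 condition above is simple arithmetic on the number of physical address bits. The sketch below illustrates it in user space; the function names are hypothetical, and the physical address width and highest RAM address are taken as plain parameters rather than read from the CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MAX_PA/2 for a CPU with phys_bits physical address bits. */
uint64_t l1tf_model_half_pa(unsigned int phys_bits)
{
	assert(phys_bits >= 1 && phys_bits <= 63);
	return UINT64_C(1) << (phys_bits - 1);
}

/*
 * Inverting the address bits of a PTE that points below MAX_PA/2
 * yields an address at or above MAX_PA/2. The workaround therefore
 * only protects while no RAM is populated in that upper half.
 */
bool l1tf_model_inversion_protects(unsigned int phys_bits,
				   uint64_t highest_ram_addr)
{
	return highest_ram_addr < l1tf_model_half_pa(phys_bits);
}
```

With 36 physical address bits, for example, MAX_PA/2 is 32 GiB, so a machine whose RAM ends below 32 GiB is covered and one with RAM at 48 GiB is not.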
     

03 May, 2018

1 commit

  • Add the sysfs file for the new vulnerability. It does not do much except
    show the word 'Vulnerable' for recent x86 cores.

    Intel cores prior to family 6 are known not to be vulnerable, and so are
    some Atoms and some Xeon Phi.

    It assumes that older Cyrix, Centaur, etc. cores are immune.

    Signed-off-by: Konrad Rzeszutek Wilk
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Borislav Petkov
    Reviewed-by: Ingo Molnar

    Konrad Rzeszutek Wilk
     

15 Mar, 2018

1 commit


02 Feb, 2018

1 commit

  • Pull driver core updates from Greg KH:
    "Here is the set of "big" driver core patches for 4.16-rc1.

    The majority of the work here is in the firmware subsystem, with
    reworks to try to attempt to make the code easier to handle in the
    long run, but no functional change. There's also some tree-wide sysfs
    attribute fixups with lots of acks from the various subsystem
    maintainers, as well as a handful of other normal fixes and changes.

    And finally, some license cleanups for the driver core and sysfs code.

    All have been in linux-next for a while with no reported issues"

    * tag 'driver-core-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (48 commits)
    device property: Define type of PROPERTY_ENRTY_*() macros
    device property: Reuse property_entry_free_data()
    device property: Move property_entry_free_data() upper
    firmware: Fix up docs referring to FIRMWARE_IN_KERNEL
    firmware: Drop FIRMWARE_IN_KERNEL Kconfig option
    USB: serial: keyspan: Drop firmware Kconfig options
    sysfs: remove DEBUG defines
    sysfs: use SPDX identifiers
    drivers: base: add coredump driver ops
    sysfs: add attribute specification for /sysfs/devices/.../coredump
    test_firmware: fix missing unlock on error in config_num_requests_store()
    test_firmware: make local symbol test_fw_config static
    sysfs: turn WARN() into pr_warn()
    firmware: Fix a typo in fallback-mechanisms.rst
    treewide: Use DEVICE_ATTR_WO
    treewide: Use DEVICE_ATTR_RO
    treewide: Use DEVICE_ATTR_RW
    sysfs.h: Use octal permissions
    component: add debugfs support
    bus: simple-pm-bus: convert bool SIMPLE_PM_BUS to tristate
    ...

    Linus Torvalds
     

15 Jan, 2018

1 commit

  • Pull x86 pti updates from Thomas Gleixner:
    "This contains:

    - a PTI bugfix to avoid setting reserved CR3 bits when PCID is
    disabled. This seems to cause issues on a virtual machine at least
    and is incorrect according to the AMD manual.

    - a PTI bugfix which disables the perf BTS facility if PTI is
    enabled. The BTS AUX buffer is not globally visible and causes the
    CPU to fault when the mapping disappears on switching CR3 to user
    space. A full fix which restores BTS on PTI is non trivial and will
    be worked on.

    - PTI bugfixes for EFI and trusted boot which make sure that the user
    space visible page table entries have the NX bit cleared

    - removal of dead code in the PTI pagetable setup functions

    - add PTI documentation

    - add a selftest for vsyscall to verify that the kernel actually
    implements what it advertises.

    - a sysfs interface to expose vulnerability and mitigation
    information so there is a coherent way for users to retrieve the
    status.

    - the initial spectre_v2 mitigations, aka retpoline:

    + The necessary ASM thunk and compiler support

    + The ASM variants of retpoline and the conversion of affected ASM
    code

    + Make LFENCE serializing on AMD so it can be used as speculation
    trap

    + The RSB fill after vmexit

    - initial objtool support for retpoline

    As I said in the status mail this is most of the set of patches
    which should go into 4.15 except two straight forward patches still on
    hold:

    - the retpoline add on of LFENCE which waits for ACKs

    - the RSB fill after context switch

    Both should be ready to go early next week and with that we'll have
    covered the major holes of spectre_v2 and go back to normality"

    * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
    x86,perf: Disable intel_bts when PTI
    security/Kconfig: Correct the Documentation reference for PTI
    x86/pti: Fix !PCID and sanitize defines
    selftests/x86: Add test_vsyscall
    x86/retpoline: Fill return stack buffer on vmexit
    x86/retpoline/irq32: Convert assembler indirect jumps
    x86/retpoline/checksum32: Convert assembler indirect jumps
    x86/retpoline/xen: Convert Xen hypercall indirect jumps
    x86/retpoline/hyperv: Convert assembler indirect jumps
    x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
    x86/retpoline/entry: Convert entry assembler indirect jumps
    x86/retpoline/crypto: Convert crypto assembler indirect jumps
    x86/spectre: Add boot time option to select Spectre v2 mitigation
    x86/retpoline: Add initial retpoline support
    objtool: Allow alternatives to be ignored
    objtool: Detect jumps to retpoline thunks
    x86/pti: Make unpoison of pgd for trusted boot work for real
    x86/alternatives: Fix optimize_nops() checking
    sysfs/cpu: Fix typos in vulnerability documentation
    x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
    ...

    Linus Torvalds
     

08 Jan, 2018

1 commit

  • As the meltdown/spectre problem affects several CPU architectures, it makes
    sense to have a common way to express whether a system is affected by a
    particular vulnerability or not. If affected, the way to express the
    mitigation should be common as well.

    Create /sys/devices/system/cpu/vulnerabilities folder and files for
    meltdown, spectre_v1 and spectre_v2.

    Allow architectures to override the show function.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Greg Kroah-Hartman
    Reviewed-by: Konrad Rzeszutek Wilk
    Cc: Peter Zijlstra
    Cc: Will Deacon
    Cc: Dave Hansen
    Cc: Linus Torvalds
    Cc: Borislav Petkov
    Cc: David Woodhouse
    Link: https://lkml.kernel.org/r/20180107214913.096657732@linutronix.de

    Thomas Gleixner
     

08 Dec, 2017

1 commit

  • It's good to have SPDX identifiers in all files to make it easier to
    audit the kernel tree for correct licenses.

    Update the driver core files with the correct SPDX license
    identifier based on the license text in the file itself. The SPDX
    identifier is a legally binding shorthand, which can be used instead of
    the full boiler plate text.

    This work is based on a script and data from Thomas Gleixner, Philippe
    Ombredanne, and Kate Stewart.

    Cc: Johannes Berg
    Cc: "Luis R. Rodriguez"
    Cc: William Breathitt Gray
    Cc: Thomas Gleixner
    Cc: Kate Stewart
    Cc: Philippe Ombredanne
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

14 Nov, 2017

1 commit

  • Pull power management updates from Rafael Wysocki:
    "There are no real big ticket items here this time.

    The most noticeable change is probably the relocation of the OPP
    (Operating Performance Points) framework to its own directory under
    drivers/ as it has grown big enough for that. Also Viresh is now going
    to maintain it and send pull requests for it to me, so you will see
    this change in the git history going forward (but still not right
    now).

    Another noticeable set of changes is the modifications of the PM core,
    the PCI subsystem and the ACPI PM domain to allow more integration
    between system-wide suspend/resume and runtime PM. For now it's just a
    way to avoid resuming devices from runtime suspend unnecessarily
    during system suspend (if the driver sets a flag to indicate its
    readiness for that) and in the works is an analogous mechanism to
    allow devices to stay suspended after system resume.

    In addition to that, we have some changes related to supporting
    frequency-invariant CPU utilization metrics in the scheduler and in
    the schedutil cpufreq governor on ARM and changes to add support for
    device performance states to the generic power domains (genpd)
    framework.

    The rest is mostly fixes and cleanups of various sorts.

    Specifics:

    - Relocate the OPP (Operating Performance Points) framework to its
    own directory under drivers/ and add support for power domain
    performance states to it (Viresh Kumar).

    - Modify the PM core, the PCI bus type and the ACPI PM domain to
    support power management driver flags allowing device drivers to
    specify their capabilities and preferences regarding the handling
    of devices with enabled runtime PM during system suspend/resume and
    clean up that code somewhat (Rafael Wysocki, Ulf Hansson).

    - Add frequency-invariant accounting support to the task scheduler on
    ARM and ARM64 (Dietmar Eggemann).

    - Fix PM QoS device resume latency framework to prevent "no
    restriction" requests from overriding requests with specific
    requirements and drop the confusing PM_QOS_FLAG_REMOTE_WAKEUP
    device PM QoS flag (Rafael Wysocki).

    - Drop legacy class suspend/resume operations from the PM core and
    drop legacy bus type suspend and resume callbacks from ARM/locomo
    (Rafael Wysocki).

    - Add min/max frequency support to devfreq and clean it up somewhat
    (Chanwoo Choi).

    - Rework wakeup support in the generic power domains (genpd)
    framework and update some of its users accordingly (Geert
    Uytterhoeven).

    - Convert timers in the PM core to use timer_setup() (Kees Cook).

    - Add support for exposing the SLP_S0 (Low Power S0 Idle) residency
    counter based on the LPIT ACPI table on Intel platforms (Srinivas
    Pandruvada).

    - Add per-CPU PM QoS resume latency support to the ladder cpuidle
    governor (Ramesh Thomas).

    - Fix a deadlock between the wakeup notify handler and the notifier
    removal in the ACPI core (Ville Syrjälä).

    - Fix a cpufreq schedutil governor issue causing it to use stale
    cached frequency values sometimes (Viresh Kumar).

    - Fix an issue in the system suspend core support code causing wakeup
    events detection to fail in some cases (Rajat Jain).

    - Fix the generic power domains (genpd) framework to prevent the PM
    core from using the direct-complete optimization with it as that is
    guaranteed to fail (Ulf Hansson).

    - Fix a minor issue in the cpuidle core and clean it up a bit (Gaurav
    Jindal, Nicholas Piggin).

    - Fix and clean up the intel_idle and ARM cpuidle drivers (Jason
    Baron, Len Brown, Leo Yan).

    - Fix a couple of minor issues in the OPP framework and clean it up
    (Arvind Yadav, Fabio Estevam, Sudeep Holla, Tobias Jordan).

    - Fix and clean up some cpufreq drivers and fix a minor issue in the
    cpufreq statistics code (Arvind Yadav, Bhumika Goyal, Fabio
    Estevam, Gautham Shenoy, Gustavo Silva, Marek Szyprowski, Masahiro
    Yamada, Robert Jarzmik, Zumeng Chen).

    - Fix minor issues in the system suspend and hibernation core, in
    power management documentation and in the AVS (Adaptive Voltage
    Scaling) framework (Helge Deller, Himanshu Jha, Joe Perches, Rafael
    Wysocki).

    - Fix some issues in the cpupower utility and document that Shuah
    Khan is going to maintain it going forward (Prarit Bhargava, Shuah
    Khan)"

    * tag 'pm-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (88 commits)
    tools/power/cpupower: add libcpupower.so.0.0.1 to .gitignore
    tools/power/cpupower: Add 64 bit library detection
    intel_idle: Graceful probe failure when MWAIT is disabled
    cpufreq: schedutil: Reset cached_raw_freq when not in sync with next_freq
    freezer: Fix typo in freezable_schedule_timeout() comment
    PM / s2idle: Clear the events_check_enabled flag
    cpufreq: stats: Handle the case when trans_table goes beyond PAGE_SIZE
    cpufreq: arm_big_little: make cpufreq_arm_bL_ops structures const
    cpufreq: arm_big_little: make function arguments and structure pointer const
    cpuidle: Avoid assignment in if () argument
    cpuidle: Clean up cpuidle_enable_device() error handling a bit
    ACPI / PM: Fix acpi_pm_notifier_lock vs flush_workqueue() deadlock
    PM / Domains: Fix genpd to deal with drivers returning 1 from ->prepare()
    cpuidle: ladder: Add per CPU PM QoS resume latency support
    PM / QoS: Fix device resume latency framework
    PM / domains: Rework governor code to be more consistent
    PM / Domains: Remove gpd_dev_ops.active_wakeup() callback
    soc: rockchip: power-domain: Use GENPD_FLAG_ACTIVE_WAKEUP
    soc: mediatek: Use GENPD_FLAG_ACTIVE_WAKEUP
    ARM: shmobile: pm-rmobile: Use GENPD_FLAG_ACTIVE_WAKEUP
    ...

    Linus Torvalds
     

08 Nov, 2017

1 commit

  • The special value of 0 for device resume latency PM QoS means
    "no restriction", but there are two problems with that.

    First, device resume latency PM QoS requests with 0 as the
    value are always put in front of requests with positive
    values in the priority lists used internally by the PM QoS
    framework, causing 0 to be chosen as an effective constraint
    value. However, that 0 is then interpreted as "no restriction"
    effectively overriding the other requests with specific
    restrictions which is incorrect.

    Second, the users of device resume latency PM QoS have no
    way to specify that *any* resume latency at all should be
    avoided, which is an artificial limitation in general.

    To address these issues, modify device resume latency PM QoS to
    use S32_MAX as the "no constraint" value and 0 as the "no
    latency at all" one and rework its users (the cpuidle menu
    governor, the genpd QoS governor and the runtime PM framework)
    to follow these changes.

    Also add a special "n/a" value to the corresponding user space I/F
    to allow user space to indicate that it cannot accept any resume
    latencies at all for the given device.

    Fixes: 85dc0b8a4019 (PM / QoS: Make it possible to expose PM QoS latency constraints)
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=197323
    Reported-by: Reinette Chatre
    Signed-off-by: Rafael J. Wysocki
    Tested-by: Reinette Chatre
    Tested-by: Geert Uytterhoeven
    Tested-by: Tero Kristo
    Reviewed-by: Ramesh Thomas

    Rafael J. Wysocki
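
The semantics described above can be sketched as a minimum over all requests, with S32_MAX as the neutral "no constraint" value and 0 as the strictest request. This is a user-space illustration only; the names are hypothetical and INT32_MAX stands in for the kernel's S32_MAX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NO_CONSTRAINT INT32_MAX	/* "no constraint" after the change */
#define MODEL_NO_LATENCY    0		/* "no latency at all" */

/*
 * Effective constraint: the smallest requested latency wins. With no
 * requests, or only "no constraint" requests, the result stays at
 * MODEL_NO_CONSTRAINT.
 */
int32_t model_effective_latency(const int32_t *req, size_t n)
{
	int32_t min = MODEL_NO_CONSTRAINT;
	size_t i;

	assert(req != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (req[i] < min)
			min = req[i];
	return min;
}
```

Under the old encoding a 0 request would also have sorted first but then been read as "no restriction", overriding the specific requests. With S32_MAX as the neutral value, a plain minimum gives the intended result.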
     

01 Nov, 2017

1 commit

  • This reverts commit 0cc2b4e5a020 (PM / QoS: Fix device resume latency PM
    QoS) as it introduced regressions on multiple systems and the fix-up
    in commit 2a9a86d5c813 (PM / QoS: Fix default runtime_pm device resume
    latency) does not address all of them.

    The original problem that commit 0cc2b4e5a020 was attempting to fix
    will be addressed later.

    Fixes: 0cc2b4e5a020 (PM / QoS: Fix device resume latency PM QoS)
    Reported-by: Geert Uytterhoeven
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

27 Oct, 2017

1 commit

  • We want to centralize the isolation features in the housekeeping
    subsystem, and scheduler domain isolation is a significant part of them.

    No intended behaviour change, we just reuse the housekeeping cpumask
    and core code.

    Signed-off-by: Frederic Weisbecker
    Acked-by: Thomas Gleixner
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Linus Torvalds
    Cc: Luiz Capitulino
    Cc: Mike Galbraith
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1509072159-31808-11-git-send-email-frederic@kernel.org
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

24 Oct, 2017

1 commit

  • The special value of 0 for device resume latency PM QoS means
    "no restriction", but there are two problems with that.

    First, device resume latency PM QoS requests with 0 as the
    value are always put in front of requests with positive
    values in the priority lists used internally by the PM QoS
    framework, causing 0 to be chosen as an effective constraint
    value. However, that 0 is then interpreted as "no restriction"
    effectively overriding the other requests with specific
    restrictions which is incorrect.

    Second, the users of device resume latency PM QoS have no
    way to specify that *any* resume latency at all should be
    avoided, which is an artificial limitation in general.

    To address these issues, modify device resume latency PM QoS to
    use S32_MAX as the "no constraint" value and 0 as the "no
    latency at all" one and rework its users (the cpuidle menu
    governor, the genpd QoS governor and the runtime PM framework)
    to follow these changes.

    Also add a special "n/a" value to the corresponding user space I/F
    to allow user space to indicate that it cannot accept any resume
    latencies at all for the given device.

    Fixes: 85dc0b8a4019 (PM / QoS: Make it possible to expose PM QoS latency constraints)
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=197323
    Reported-by: Reinette Chatre
    Tested-by: Reinette Chatre
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Alex Shi
    Cc: All applicable

    Rafael J. Wysocki
     

09 Sep, 2017

1 commit

  • First, the number of CPUs can't be a negative number.

    Second, different signedness leads to suboptimal code in the following
    cases:

    1)
    kmalloc(nr_cpu_ids * sizeof(X));

    "int" has to be sign extended to size_t.

    2)
    while (loff_t *pos < nr_cpu_ids)

    MOVSXD is 1 byte longer than the same MOV.

    Other cases exist as well. Basically, the compiler is told that
    nr_cpu_ids can't be negative, which it cannot deduce if the variable
    is "int".
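
    The two cases can be sketched as ordinary C functions. The names are
    hypothetical and the counter is passed as a parameter rather than read
    from nr_cpu_ids; which instructions are actually emitted depends on the
    compiler and target, so treat this as an illustration of the
    conversions involved, not of specific generated code.

```c
#include <stddef.h>

/* int -> size_t: the count must be sign-extended before the multiply. */
size_t bytes_signed(int nr_ids, size_t elem_size)
{
    return nr_ids * elem_size;
}

/* unsigned int -> size_t: zero extension is enough. */
size_t bytes_unsigned(unsigned int nr_ids, size_t elem_size)
{
    return nr_ids * elem_size;
}

/* 64-bit position compared against an int: the int is sign-extended. */
int below_signed(long long pos, int nr_ids)
{
    return pos < nr_ids;
}

/* 64-bit position compared against an unsigned int: zero-extended. */
int below_unsigned(long long pos, unsigned int nr_ids)
{
    return pos < nr_ids;
}
```

    For non-negative counts both variants return the same results; the
    difference is only in the widening conversion the compiler has to
    perform.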

    Code savings on allyesconfig kernel: -3KB

    add/remove: 0/0 grow/shrink: 25/264 up/down: 261/-3631 (-3370)
    function old new delta
    coretemp_cpu_online 450 512 +62
    rcu_init_one 1234 1272 +38
    pci_device_probe 374 399 +25

    ...

    pgdat_reclaimable_pages 628 556 -72
    select_fallback_rq 446 369 -77
    task_numa_find_cpu 1923 1807 -116

    Link: http://lkml.kernel.org/r/20170819114959.GA30580@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

30 Jan, 2017

1 commit

  • The cpu-dma PM QoS constraint impacts all the cpus in the system. There
    is no way to let the user choose a PM QoS constraint per cpu.

    The following patch exposes to userspace a per-cpu sysfs file
    that lets userspace change the value of the PM QoS latency
    constraint.

    This change has no effect on its own: the cpuidle governors still have
    to take the per-cpu latency constraint into account, in addition to the
    global cpu-dma latency constraint, in order to operate properly.

    BTW, the pm_qos_resume_latency usage is defined in
    Documentation/ABI/testing/sysfs-devices-power:

    The /sys/devices/.../power/pm_qos_resume_latency_us attribute
    contains the PM QoS resume latency limit for the given device,
    which is the maximum allowed time it can take to resume the
    device, after it has been suspended at run time, from a resume
    request to the moment the device will be ready to process I/O,
    in microseconds. If it is equal to 0, however, this means that
    the PM QoS resume latency may be arbitrary.

    Signed-off-by: Alex Shi
    Signed-off-by: Rafael J. Wysocki

    Alex Shi
     

31 Aug, 2016

1 commit

  • This patch removes one branch from this function and makes the
    code more readable.

    Signed-off-by: Alex Shi
    Acked-by: Daniel Lezcano
    To: linux-kernel@vger.kernel.org
    To: Greg Kroah-Hartman
    Cc: linux-pm@vger.kernel.org
    Cc: Ulf Hansson
    Cc: Daniel Lezcano
    Signed-off-by: Greg Kroah-Hartman

    Alex Shi
     

21 Jan, 2016

1 commit

  • The only user of the lvalue-ness of the cpu_*_mask variables is in
    drivers/base/cpu.c, and that is mostly a work-around for the fact that not
    even const variables can be used in static initialization. Now that the
    underlying struct cpumasks are exposed we can take their address.

    Signed-off-by: Rasmus Villemoes
    Acked-by: Rusty Russell
    Acked-by: Greg Kroah-Hartman
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rasmus Villemoes
     

06 Aug, 2015

1 commit

  • In commit 0db0628d9012 ("kernel: delete __cpuinit usage from all core
    kernel files") cpu_up() lost its __cpuinit annotation, removing the
    need for cpu_subsys_online() to have a __ref annotation. Just drop it
    to be able to catch real section mismatches in the future.

    Signed-off-by: Mathias Krause
    Cc: Paul Gortmaker
    Signed-off-by: Greg Kroah-Hartman

    Mathias Krause
     

20 May, 2015

2 commits

  • Currently there is no way to query which CPUs are in nohz_full
    mode from userspace.

    Export the CPU list running in nohz_full mode in sysfs,
    specifically in the file /sys/devices/system/cpu/nohz_full

    This can be used by system management tools like libvirt,
    openstack, and others to ensure proper task placement.

    Signed-off-by: Rik van Riel
    Acked-by: Mike Galbraith
    Acked-by: Chris Metcalf
    Signed-off-by: Greg Kroah-Hartman

    Rik van Riel
     
  • After system bootup, there is no totally reliable way to see
    which CPUs are isolated, because the kernel may modify the
    CPUs specified on the isolcpus= kernel command line option.

    Export the CPU list that actually got isolated in sysfs,
    specifically in the file /sys/devices/system/cpu/isolated

    This can be used by system management tools like libvirt,
    openstack, and others to ensure proper placement of tasks.
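
    A management tool could consume this file (or the nohz_full file
    described above) roughly as follows. This is a sketch under
    assumptions: it presumes the file holds a comma-separated list of CPU
    numbers and ranges such as "1-3,8", and the function names and the
    MAX_CPUS bound are made up for the example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CPUS 1024 /* arbitrary bound for this sketch */

/*
 * Parse a CPU list such as "1-3,8\n" into set[]. Returns the number of
 * CPUs marked, or -1 on malformed input.
 */
int parse_cpulist(const char *s, unsigned char set[MAX_CPUS])
{
    int count = 0;

    memset(set, 0, MAX_CPUS);
    while (*s && *s != '\n') {
        char *end;
        long lo = strtol(s, &end, 10);
        long hi = lo;

        if (end == s)
            return -1;
        if (*end == '-') {
            const char *p = end + 1;

            hi = strtol(p, &end, 10);
            if (end == p)
                return -1;
        }
        if (lo < 0 || hi < lo || hi >= MAX_CPUS)
            return -1;
        for (long c = lo; c <= hi; c++) {
            if (!set[c])
                count++;
            set[c] = 1;
        }
        if (*end == ',')
            end++;
        else if (*end && *end != '\n')
            return -1;
        s = end;
    }
    return count;
}

/* Read and parse one sysfs CPU list file; -1 if it cannot be opened. */
int read_cpulist_file(const char *path, unsigned char set[MAX_CPUS])
{
    char buf[4096];
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f))
        buf[0] = '\0';
    fclose(f);
    return parse_cpulist(buf, set);
}
```

    Calling read_cpulist_file("/sys/devices/system/cpu/isolated", set)
    fills set[] with the isolated CPUs; an empty list parses to a count
    of 0.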

    Suggested-by: Li Zefan
    Signed-off-by: Rik van Riel
    Acked-by: Mike Galbraith
    Acked-by: Chris Metcalf
    Signed-off-by: Greg Kroah-Hartman

    Rik van Riel
     

14 Feb, 2015

1 commit

  • printk and friends can now format bitmaps using '%*pb[l]'. cpumask
    and nodemask also provide cpumask_pr_args() and nodemask_pr_args()
    respectively which can be used to generate the two printf arguments
    necessary to format the specified cpu/nodemask.

    * Line termination only requires one extra space at the end of the
    buffer. Use PAGE_SIZE - 1 instead of PAGE_SIZE - 2 when formatting.

    Signed-off-by: Tejun Heo
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

08 Nov, 2014

2 commits

  • This patch adds a new function to create per-cpu devices.
    This helps in:
    1. reusing the device infrastructure to create any cpu related
    attributes and corresponding sysfs instead of creating and
    dealing with raw kobjects directly
    2. retaining the legacy path(/sys/devices/system/cpu/..) to support
    existing sysfs ABI
    3. avoiding the creation of links in the bus directory pointing to the
    device, as there would be per-cpu instances of these devices with
    the same name, since dev->bus is not populated to cpu_sysbus on
    purpose

    Signed-off-by: Sudeep Holla
    Tested-by: Stephen Boyd
    Cc: Greg Kroah-Hartman
    Cc: David Herrmann
    Cc: Kay Sievers
    Signed-off-by: Greg Kroah-Hartman

    Sudeep Holla
     
  • Many sysfs *_show functions use cpu{list,mask}_scnprintf to copy the
    cpumap to the buffer aligned to PAGE_SIZE, then append '\n' and '\0' to
    return a null-terminated buffer with a newline.

    This patch creates a new helper function cpumap_print_to_pagebuf in
    cpumask.h using newly added bitmap_print_to_pagebuf and consolidates
    most of those sysfs functions using the new helper function.
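
    The pattern being consolidated can be sketched in user space as
    follows. This is not the kernel helper: bounded_printf() and
    mask_to_listbuf() are hypothetical names, a 64-bit integer stands in
    for the cpumask, and the caller-supplied size stands in for PAGE_SIZE.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Like snprintf(), but returns the number of characters actually stored. */
static size_t bounded_printf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (size == 0)
        return 0;
    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    if (n < 0)
        return 0;
    return (size_t)n >= size ? size - 1 : (size_t)n;
}

/*
 * Format the set bits of mask as a range list ("0-2,5"), then append '\n'
 * and '\0'. Output is truncated so it never exceeds size bytes. Returns the
 * number of characters written, excluding the terminating '\0'.
 */
size_t mask_to_listbuf(uint64_t mask, char *buf, size_t size)
{
    size_t len = 0;
    const char *sep = "";

    if (size < 2)
        return 0; /* no room for "\n" plus '\0' */

    for (unsigned int bit = 0; bit < 64; bit++) {
        unsigned int start = bit;

        if (!((mask >> bit) & 1))
            continue;
        while (bit + 1 < 64 && ((mask >> (bit + 1)) & 1))
            bit++;
        /* Limit is size - 1 so one byte stays free for the newline. */
        if (start == bit)
            len += bounded_printf(buf + len, size - 1 - len,
                                  "%s%u", sep, start);
        else
            len += bounded_printf(buf + len, size - 1 - len,
                                  "%s%u-%u", sep, start, bit);
        sep = ",";
    }
    buf[len++] = '\n';
    buf[len] = '\0';
    return len;
}
```

    Keeping the formatting, the newline and the terminator in one helper
    means each show function only has to pass its mask and buffer, instead
    of repeating the bounds arithmetic.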

    Signed-off-by: Sudeep Holla
    Suggested-by: Stephen Boyd
    Tested-by: Stephen Boyd
    Acked-by: "Rafael J. Wysocki"
    Acked-by: Bjorn Helgaas
    Acked-by: Peter Zijlstra (Intel)
    Cc: Greg Kroah-Hartman
    Cc: x86@kernel.org
    Cc: linux-acpi@vger.kernel.org
    Cc: linux-pci@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Sudeep Holla
     

19 Feb, 2014

2 commits

  • The x86 CPU feature modalias handling existed before it was reimplemented
    generically. This patch aligns the x86 handling so that it
    (a) reuses some more code that is now generic;
    (b) uses the generic format for the modalias module metadata entry, i.e., it
    now uses 'cpu:type:x86,venVVVVfamFFFFmodMMMM:feature:,XXXX,YYYY' instead of
    the 'x86cpu:vendor:VVVV:family:FFFF:model:MMMM:feature:,XXXX,YYYY' that was
    used before.
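
    As an illustration of the new format, the string could be assembled
    like this in user space. The function name is hypothetical, and the
    sketch assumes that each VVVV/FFFF/MMMM/XXXX field is a four-digit
    uppercase hexadecimal number, as the placeholder widths suggest.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build "cpu:type:x86,venVVVVfamFFFFmodMMMM:feature:,XXXX,YYYY" into buf.
 * Returns the string length, or -1 if buf is too small.
 */
int build_x86_cpu_modalias(char *buf, size_t size,
                           unsigned int vendor, unsigned int family,
                           unsigned int model,
                           const unsigned int *features, size_t nfeatures)
{
    int len = snprintf(buf, size,
                       "cpu:type:x86,ven%04Xfam%04Xmod%04X:feature:",
                       vendor, family, model);

    if (len < 0 || (size_t)len >= size)
        return -1;

    /* Each feature number is appended with a leading comma. */
    for (size_t i = 0; i < nfeatures; i++) {
        int n = snprintf(buf + len, size - (size_t)len, ",%04X",
                         features[i]);

        if (n < 0 || (size_t)n >= size - (size_t)len)
            return -1;
        len += n;
    }
    return len;
}
```

    With the arbitrary inputs vendor 0, family 6, model 0x3A and features
    {0x19, 0x99}, this produces
    "cpu:type:x86,ven0000fam0006mod003A:feature:,0019,0099".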

    Signed-off-by: Ard Biesheuvel
    Acked-by: H. Peter Anvin
    Signed-off-by: Greg Kroah-Hartman

    Ard Biesheuvel
     
  • This patch adds support for advertising optional CPU features over udev
    using the modalias, and for declaring compatibility with/dependency upon
    such a feature in a module.

    The mapping between feature numbers and actual features should be provided
    by the architecture in a file called which exports the
    following functions/macros:
    - cpu_feature(FEAT), a preprocessor macro that maps token FEAT to a
    numeric index;
    - bool cpu_have_feature(n), returning whether this CPU has support for
    feature #n;
    - MAX_CPU_FEATURES, an upper bound for 'n' in the previous function.

    The feature can then be enabled by setting CONFIG_GENERIC_CPU_AUTOPROBE
    for the architecture.

    For instance, a module that registers its module init function using

    module_cpu_feature_match(FEAT_X, module_init_function)

    will be probed automatically when the CPU's support for the 'FEAT_X'
    feature is advertised over udev, and will only allow the module to be
    loaded by hand if the 'FEAT_X' feature is supported.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Greg Kroah-Hartman

    Ard Biesheuvel
     

01 Oct, 2013

1 commit

  • cpu_hotplug_driver_lock() serializes CPU online/offline operations
    when ARCH_CPU_PROBE_RELEASE is set. This lock interface is no longer
    necessary for the following reasons:

    - lock_device_hotplug() now protects CPU online/offline operations,
    including the probe & release interfaces enabled by
    ARCH_CPU_PROBE_RELEASE. The use of cpu_hotplug_driver_lock() is
    redundant.
    - cpu_hotplug_driver_lock() is only valid when ARCH_CPU_PROBE_RELEASE
    is defined, which is misleading and is only enabled on powerpc.

    This patch removes the cpu_hotplug_driver_lock() interface. As
    a result, ARCH_CPU_PROBE_RELEASE only enables / disables the cpu
    probe & release interface as intended. There is no functional change
    in this patch.

    Signed-off-by: Toshi Kani
    Reviewed-by: Nathan Fontenot
    Signed-off-by: Rafael J. Wysocki

    Toshi Kani
     

25 Sep, 2013

1 commit

  • lock_device_hotplug[_sysfs]() serializes CPU & Memory online/offline
    and hotplug operations. However, this lock is not held in the debug
    interfaces below that initiate CPU online/offline operations.

    - _debug_hotplug_cpu(), cpu0 hotplug test interface enabled by
    CONFIG_DEBUG_HOTPLUG_CPU0.
    - cpu_probe_store() and cpu_release_store(), cpu hotplug test interface
    enabled by CONFIG_ARCH_CPU_PROBE_RELEASE.

    This patch changes the above interfaces to hold lock_device_hotplug().

    Signed-off-by: Toshi Kani
    Acked-by: Greg Kroah-Hartman
    Acked-by: Yasuaki Ishimatsu
    Signed-off-by: Rafael J. Wysocki

    Toshi Kani