06 Jun, 2018

1 commit

  • Pull power management updates from Rafael Wysocki:
    "These include a significant update of the generic power domains
    (genpd) and Operating Performance Points (OPP) frameworks, mostly
    related to the introduction of power domain performance levels,
    cpufreq updates (new driver for Qualcomm Kryo processors, updates of
    the existing drivers, some core fixes, schedutil governor
    improvements), PCI power management fixes, ACPI workaround for
    EC-based wakeup events handling on resume from suspend-to-idle, and
    major updates of the turbostat and pm-graph utilities.

    Specifics:

    - Introduce power domain performance levels into the generic
    power domains (genpd) and Operating Performance Points (OPP)
    frameworks (Viresh Kumar, Rajendra Nayak, Dan Carpenter).

    - Fix two issues in the runtime PM framework related to the
    initialization and removal of devices using device links (Ulf
    Hansson).

    - Clean up the initialization of drivers for devices in PM domains
    (Ulf Hansson, Geert Uytterhoeven).

    - Fix a cpufreq core issue related to the policy sysfs interface
    causing CPU online to fail for CPUs sharing one cpufreq policy in
    some situations (Tao Wang).

    - Make it possible to use platform-specific suspend/resume hooks in
    the cpufreq-dt driver and make the Armada 37xx DVFS use that
    feature (Viresh Kumar, Miquel Raynal).

    - Optimize policy transition notifications in cpufreq (Viresh Kumar).

    - Improve the iowait boost mechanism in the schedutil cpufreq
    governor (Patrick Bellasi).

    - Improve the handling of deferred frequency updates in the schedutil
    cpufreq governor (Joel Fernandes, Dietmar Eggemann, Rafael Wysocki,
    Viresh Kumar).

    - Add a new cpufreq driver for Qualcomm Kryo (Ilia Lin).

    - Fix and clean up some cpufreq drivers (Colin Ian King, Dmitry
    Osipenko, Doug Smythies, Luc Van Oostenryck, Simon Horman, Viresh
    Kumar).

    - Fix the handling of PCI devices with the DPM_SMART_SUSPEND flag set
    and update stale comments in the PCI core PM code (Rafael Wysocki).

    - Work around an issue related to the handling of EC-based wakeup
    events in the ACPI PM core during resume from suspend-to-idle if
    the EC has been put into the low-power mode (Rafael Wysocki).

    - Improve the handling of wakeup source objects in the PM core (Doug
    Berger, Mahendran Ganesh, Rafael Wysocki).

    - Update the driver core to prevent deferred probe from breaking
    suspend/resume ordering (Feng Kan).

    - Clean up the PM core somewhat (Bjorn Helgaas, Ulf Hansson, Rafael
    Wysocki).

    - Make the core suspend/resume code and cpufreq support the RT patch
    (Sebastian Andrzej Siewior, Thomas Gleixner).

    - Consolidate the PM QoS handling in cpuidle governors (Rafael
    Wysocki).

    - Fix a possible crash in the hibernation core (Tetsuo Handa).

    - Update the rockchip-io Adaptive Voltage Scaling (AVS) driver (David
    Wu).

    - Update the turbostat utility (fixes, cleanups, new CPU IDs, new
    command line options, built-in "Low Power Idle" counters support,
    new POLL and POLL% columns) and add an entry for it to MAINTAINERS
    (Len Brown, Artem Bityutskiy, Chen Yu, Laura Abbott, Matt Turner,
    Prarit Bhargava, Srinivas Pandruvada).

    - Update the pm-graph to version 5.1 (Todd Brandt).

    - Update the intel_pstate_tracer utility (Doug Smythies)"

    * tag 'pm-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (128 commits)
    tools/power turbostat: update version number
    tools/power turbostat: Add Node in output
    tools/power turbostat: add node information into turbostat calculations
    tools/power turbostat: remove num_ from cpu_topology struct
    tools/power turbostat: rename num_cores_per_pkg to num_cores_per_node
    tools/power turbostat: track thread ID in cpu_topology
    tools/power turbostat: Calculate additional node information for a package
    tools/power turbostat: Fix node and siblings lookup data
    tools/power turbostat: set max_num_cpus equal to the cpumask length
    tools/power turbostat: if --num_iterations, print for specific number of iterations
    tools/power turbostat: Add Cannon Lake support
    tools/power turbostat: delete duplicate #defines
    x86: msr-index.h: Correct SNB_C1/C3_AUTO_UNDEMOTE defines
    tools/power turbostat: Correct SNB_C1/C3_AUTO_UNDEMOTE defines
    tools/power turbostat: add POLL and POLL% column
    tools/power turbostat: Fix --hide Pk%pc10
    tools/power turbostat: Build-in "Low Power Idle" counters support
    tools/power turbostat: Don't make man pages executable
    tools/power turbostat: remove blank lines
    tools/power turbostat: a small C-states dump readability immprovement
    ...

    Linus Torvalds
     

05 Jun, 2018

3 commits

  • Pull time/Y2038 updates from Thomas Gleixner:

    - Consolidate SysV IPC UAPI headers

    - Convert SysV IPC to the new COMPAT_32BIT_TIME mechanism

    - Clean up the core interfaces and standardize on the ktime_get_* naming
    convention.

    - Convert the X86 platform ops to timespec64

    - Remove the ugly temporary timespec64 hack

    * 'timers-2038-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
    x86: Convert x86_platform_ops to timespec64
    timekeeping: Add more coarse clocktai/boottime interfaces
    timekeeping: Add ktime_get_coarse_with_offset
    timekeeping: Standardize on ktime_get_*() naming
    timekeeping: Clean up ktime_get_real_ts64
    timekeeping: Remove timespec64 hack
    y2038: ipc: Redirect ipc(SEMTIMEDOP, ...) to compat_ksys_semtimedop
    y2038: ipc: Enable COMPAT_32BIT_TIME
    y2038: ipc: Use __kernel_timespec
    y2038: ipc: Report long times to user space
    y2038: ipc: Use ktime_get_real_seconds consistently
    y2038: xtensa: Extend sysvipc data structures
    y2038: powerpc: Extend sysvipc data structures
    y2038: sparc: Extend sysvipc data structures
    y2038: parisc: Extend sysvipc data structures
    y2038: mips: Extend sysvipc data structures
    y2038: arm64: Extend sysvipc compat data structures
    y2038: s390: Remove unneeded ipc uapi header files
    y2038: ia64: Remove unneeded ipc uapi header files
    y2038: alpha: Remove unneeded ipc uapi header files
    ...

    Linus Torvalds
     
  • Pull timers and timekeeping updates from Thomas Gleixner:

    - Core infrastructure work for Y2038 to address the COMPAT interfaces:

    + Add a new Y2038 safe __kernel_timespec and use it in the core
    code

    + Introduce config switches which allow controlling the various
    compat mechanisms

    + Use the new config switch in the posix timer code to control the
    32bit compat syscall implementation.

    - Prevent bogus selection of CPU local clocksources which causes an
    endless reselection loop

    - Remove the extra kthread in the clocksource code which has no value
    and just adds another level of indirection

    - The usual bunch of trivial updates, cleanups and fixlets all over the
    place

    - More SPDX conversions

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
    clocksource/drivers/mxs_timer: Switch to SPDX identifier
    clocksource/drivers/timer-imx-tpm: Switch to SPDX identifier
    clocksource/drivers/timer-imx-gpt: Switch to SPDX identifier
    clocksource/drivers/timer-imx-gpt: Remove outdated file path
    clocksource/drivers/arc_timer: Add comments about locking while read GFRC
    clocksource/drivers/mips-gic-timer: Add pr_fmt and reword pr_* messages
    clocksource/drivers/sprd: Fix Kconfig dependency
    clocksource: Move inline keyword to the beginning of function declarations
    timer_list: Remove unused function pointer typedef
    timers: Adjust a kernel-doc comment
    tick: Prefer a lower rating device only if it's CPU local device
    clocksource: Remove kthread
    time: Change nanosleep to safe __kernel_* types
    time: Change types to new y2038 safe __kernel_* types
    time: Fix get_timespec64() for y2038 safe compat interfaces
    time: Add new y2038 safe __kernel_timespec
    posix-timers: Make compat syscalls depend on CONFIG_COMPAT_32BIT_TIME
    time: Introduce CONFIG_COMPAT_32BIT_TIME
    time: Introduce CONFIG_64BIT_TIME in architectures
    compat: Enable compat_get/put_timespec64 always
    ...

    Linus Torvalds
     
  • Pull procfs updates from Al Viro:
    "Christoph's proc_create_... cleanups series"

    * 'hch.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (44 commits)
    xfs, proc: hide unused xfs procfs helpers
    isdn/gigaset: add back gigaset_procinfo assignment
    proc: update SIZEOF_PDE_INLINE_NAME for the new pde fields
    tty: replace ->proc_fops with ->proc_show
    ide: replace ->proc_fops with ->proc_show
    ide: remove ide_driver_proc_write
    isdn: replace ->proc_fops with ->proc_show
    atm: switch to proc_create_seq_private
    atm: simplify procfs code
    bluetooth: switch to proc_create_seq_data
    netfilter/x_tables: switch to proc_create_seq_private
    netfilter/xt_hashlimit: switch to proc_create_{seq,single}_data
    neigh: switch to proc_create_seq_data
    hostap: switch to proc_create_{seq,single}_data
    bonding: switch to proc_create_seq_data
    rtc/proc: switch to proc_create_single_data
    drbd: switch to proc_create_single
    resource: switch to proc_create_seq_data
    staging/rtl8192u: simplify procfs code
    jfs: simplify procfs code
    ...

    Linus Torvalds
     

27 May, 2018

1 commit

  • timekeeping suspend/resume calls read_persistent_clock() which takes
    rtc_lock. That results in might sleep warnings because at that point
    we run with interrupts disabled.

    We cannot convert rtc_lock to a raw spinlock as that would trigger
    other might sleep warnings.

    As a workaround we disable the might sleep warnings by setting
    system_state to SYSTEM_SUSPEND before calling sysdev_suspend() and
    restoring it to SYSTEM_RUNNING after sysdev_resume(). There is no lock
    contention because hibernate / suspend to RAM is single-CPU at this
    point.

    In s2idle's case the system_state is set to SYSTEM_SUSPEND before
    timekeeping_suspend() which is invoked by the last CPU. In the resume
    case it is set back to SYSTEM_RUNNING after timekeeping_resume() which is
    invoked by the first CPU in the resume case. The other CPUs will block
    on tick_freeze_lock.

    Signed-off-by: Thomas Gleixner
    [bigeasy: cover s2idle in tick_freeze() / tick_unfreeze()]
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Rafael J. Wysocki

    Thomas Gleixner
     

19 May, 2018

5 commits

  • I have run into a couple of drivers using current_kernel_time()
    suffering from the y2038 problem, and they could be converted
    to using ktime_t, but don't have interfaces that skip the nanosecond
    calculation at the moment.

    This introduces ktime_get_coarse_with_offset() as a simpler
    variant of ktime_get_with_offset(), and adds wrappers for the
    three time domains we support with the existing function.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Thomas Gleixner
    Cc: Stephen Boyd
    Cc: y2038@lists.linaro.org
    Cc: John Stultz
    Link: https://lkml.kernel.org/r/20180427134016.2525989-5-arnd@arndb.de

    Arnd Bergmann
     
  • The current_kernel_time64, get_monotonic_coarse64, getrawmonotonic64,
    get_monotonic_boottime64 and timekeeping_clocktai64 interfaces have
    rather inconsistent naming, and they differ in the calling conventions
    by passing the output either by reference or as a return value.

    Rename them to ktime_get_coarse_real_ts64, ktime_get_coarse_ts64,
    ktime_get_raw_ts64, ktime_get_boottime_ts64 and ktime_get_clocktai_ts64
    respectively, and provide the interfaces with macros or inline
    functions as needed.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Thomas Gleixner
    Cc: Stephen Boyd
    Cc: y2038@lists.linaro.org
    Cc: John Stultz
    Link: https://lkml.kernel.org/r/20180427134016.2525989-4-arnd@arndb.de

    Arnd Bergmann
     
  • In a move to make ktime_get_*() the preferred driver interface into the
    timekeeping code, sanitize ktime_get_real_ts64() to be a proper exported
    symbol rather than an alias for getnstimeofday64().

    The internal __getnstimeofday64() is no longer used, so remove that
    and merge it into ktime_get_real_ts64().

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Thomas Gleixner
    Cc: Stephen Boyd
    Cc: y2038@lists.linaro.org
    Cc: John Stultz
    Link: https://lkml.kernel.org/r/20180427134016.2525989-3-arnd@arndb.de

    Arnd Bergmann
     
  • At this point, we have converted most of the kernel to use timespec64
    consistently in place of timespec, so it seems it's time to make
    timespec64 the native structure and define timespec in terms of that
    one on 64-bit architectures.

    Starting with gcc-5, the compiler can completely optimize away the
    timespec_to_timespec64 and timespec64_to_timespec functions on 64-bit
    architectures. With older compilers, we introduce a couple of extra
    copies of local variables, but those are easily avoided by using
    the timespec64 based interfaces consistently, as we do in most of the
    important code paths already.

    The main upside of removing the hack is that printing the tv_sec
    field of a timespec64 structure can now use the %lld format
    string on all architectures without a cast to time64_t. Without
    this patch, the field is a 'long' type and would have to be printed
    using %ld on 64-bit architectures.
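A small user-space sketch of the printing point above. `struct ts64_sketch` and `format_ts64()` are hypothetical stand-ins, not kernel definitions; the only assumption is that the seconds field is a 64-bit `long long` on every architecture, so `%lld` matches it without a cast.

```c
#include <stdio.h>

/* Hypothetical stand-in for a timespec64-like structure: the seconds
 * field is always a 64-bit signed integer, independent of word size. */
struct ts64_sketch {
	long long tv_sec;
	long tv_nsec;
};

/* Format "sec.nsec" into buf; %lld matches tv_sec without a cast. */
static int format_ts64(char *buf, size_t len, const struct ts64_sketch *ts)
{
	return snprintf(buf, len, "%lld.%09ld", ts->tv_sec, ts->tv_nsec);
}
```

If `tv_sec` were a plain `long` on some architectures, the same format string would need a per-architecture cast, which is the inconvenience the patch describes.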

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Thomas Gleixner
    Cc: Stephen Boyd
    Cc: y2038@lists.linaro.org
    Cc: John Stultz
    Link: https://lkml.kernel.org/r/20180427134016.2525989-2-arnd@arndb.de

    Arnd Bergmann
     
  • Merge upstream to pick up changes on which pending patches depend.

    Thomas Gleixner
     

17 May, 2018

1 commit

  • The inline keyword was not at the beginning of the function declarations.
    Fix the following warnings triggered when using W=1:

    kernel/time/clocksource.c:456:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
    kernel/time/clocksource.c:457:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]

    Signed-off-by: Mathieu Malaterre
    Signed-off-by: Thomas Gleixner
    Cc: Stephen Boyd
    Cc: John Stultz
    Link: https://lkml.kernel.org/r/20180516195943.31924-1-malat@debian.org

    Mathieu Malaterre
     

16 May, 2018

2 commits

  • Variant of proc_create_data that directly takes a struct seq_operations
    argument + a private state size and drastically reduces the boilerplate
    code in the callers.

    All trivial callers converted over.

    Signed-off-by: Christoph Hellwig

    Christoph Hellwig
     
  • for_each_cpu() unintuitively reports CPU0 as set independent of the actual
    cpumask content on UP kernels. This causes an unexpected PIT interrupt
    storm on a UP kernel running in an SMP virtual machine on Hyper-V, and as
    a result, the virtual machine can suffer from a strange random delay of 1~20
    minutes during boot-up, and sometimes it can hang forever.

    Protect it by checking whether the cpumask is empty before entering the
    for_each_cpu() loop.

    [ tglx: Use !IS_ENABLED(CONFIG_SMP) instead of #ifdeffery ]
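The pitfall can be sketched in user space. The macro below is modeled on the uniprocessor shortcut described above, which walks CPU 0 once and never looks at the mask; `struct mask_sketch`, `for_each_cpu_up()` and the two `visit_*()` helpers are hypothetical names for illustration, not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical one-CPU mask: bit 0 is the only possible CPU. */
struct mask_sketch {
	unsigned long bits;
};

/* Modeled on the uniprocessor shortcut: iterate CPU 0 once and
 * ignore the mask content entirely. */
#define for_each_cpu_up(cpu, maskp) \
	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)(maskp))

static bool mask_empty(const struct mask_sketch *m)
{
	return m->bits == 0;
}

/* Count visited CPUs without a guard: an empty mask still yields CPU 0. */
static int visit_unguarded(const struct mask_sketch *m)
{
	int cpu, visited = 0;

	for_each_cpu_up(cpu, m)
		visited++;
	return visited;
}

/* Guarded variant: bail out before the loop when the mask is empty. */
static int visit_guarded(const struct mask_sketch *m)
{
	int cpu, visited = 0;

	if (mask_empty(m))
		return 0;
	for_each_cpu_up(cpu, m)
		visited++;
	return visited;
}
```

With an empty mask, the unguarded loop body still runs once for CPU 0, while the guarded variant skips it.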

    Signed-off-by: Dexuan Cui
    Signed-off-by: Thomas Gleixner
    Cc: Josh Poulson
    Cc: "Michael Kelley (EOSG)"
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: stable@vger.kernel.org
    Cc: Rakib Mullick
    Cc: Jork Loeser
    Cc: Greg Kroah-Hartman
    Cc: Andrew Morton
    Cc: KY Srinivasan
    Cc: Linus Torvalds
    Cc: Alexey Dobriyan
    Cc: Dmitry Vyukov
    Link: https://lkml.kernel.org/r/KL1P15301MB000678289FE55BA365B3279ABF990@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM
    Link: https://lkml.kernel.org/r/KL1P15301MB0006FA63BC22BEB64902EAA0BF930@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM

    Dexuan Cui
     

13 May, 2018

3 commits

  • Remove the 'printf_fn_t' typedef as it is not used.

    Signed-off-by: Chen Lin
    Signed-off-by: Thomas Gleixner
    Cc: sboyd@kernel.org
    Cc: john.stultz@linaro.org
    Link: https://lkml.kernel.org/r/1526053649-24229-1-git-send-email-chen45464546@163.com

    Chen Lin
     
  • Those three warnings can easily be solved by using :: to indicate a
    code block:

    ./kernel/time/timer.c:1259: WARNING: Unexpected indentation.
    ./kernel/time/timer.c:1261: WARNING: Unexpected indentation.
    ./kernel/time/timer.c:1262: WARNING: Block quote ends without a blank line; unexpected unindent.

    While here, align the lines at the block.

    Signed-off-by: Mauro Carvalho Chehab
    Signed-off-by: Thomas Gleixner
    Cc: Jonathan Corbet
    Cc: Stephen Boyd
    Cc: Linux Doc Mailing List
    Cc: Mauro Carvalho Chehab
    Cc: John Stultz
    Link: https://lkml.kernel.org/r/f02e6a0ce27f3b5e33415d92d07a40598904b3ee.1525684985.git.mchehab%2Bsamsung@kernel.org

    Mauro Carvalho Chehab
     
  • Checking the equality of cpumask for both new and old tick device doesn't
    ensure that it's a CPU local device. This will cause an issue if a low rating
    clockevent tick device is registered first followed by the registration
    of a higher rating clockevent tick device.

    In such a case, the clockevents_released list will never get emptied as both
    the devices get selected as the preferred one and we will loop forever in
    clockevents_notify_released.

    Signed-off-by: Sudeep Holla
    Signed-off-by: Thomas Gleixner
    Cc: Frederic Weisbecker
    Link: https://lkml.kernel.org/r/1525881728-4858-1-git-send-email-sudeep.holla@arm.com

    Sudeep Holla
     

02 May, 2018

6 commits

  • The clocksource watchdog uses a work to spawn a kthread to run the
    watchdog. That is about as silly as it sounds; run the watchdog
    directly from the work.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Rafael J. Wysocki
    Cc: len.brown@intel.com
    Cc: rjw@rjwysocki.net
    Cc: diego.viola@gmail.com
    Cc: rui.zhang@intel.com
    Link: https://lkml.kernel.org/r/20180430100344.713862818@infradead.org

    Peter Zijlstra
     
  • Pick up urgent fixes to apply dependent cleanup patch

    Thomas Gleixner
     
  • AFAICS the hotplug code no longer uses this function.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Rafael J. Wysocki
    Cc: len.brown@intel.com
    Cc: rjw@rjwysocki.net
    Cc: diego.viola@gmail.com
    Cc: rui.zhang@intel.com
    Link: https://lkml.kernel.org/r/20180430100344.656525644@infradead.org

    Peter Zijlstra
     
  • When a registered clocksource gets marked unstable the watchdog_kthread
    will de-rate and re-select the clocksource. Ensure it also de-rates
    when getting called on an unregistered clocksource.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Rafael J. Wysocki
    Cc: len.brown@intel.com
    Cc: rjw@rjwysocki.net
    Cc: diego.viola@gmail.com
    Cc: rui.zhang@intel.com
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20180430100344.594904898@infradead.org

    Peter Zijlstra
     
  • A number of places rely on list_empty(&cs->wd_list), however the
    list_head does not get initialized. Do so upon registration, such that
    thereafter it is possible to rely on list_empty() correctly reflecting
    the list membership status.
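Why the initialization matters can be shown with a minimal user-space model of the list head idiom. The names below (`list_head_sketch`, `init_list_head()`, `list_is_empty()`) are hypothetical stand-ins; the assumption is the usual convention that an empty circular list head points at itself.

```c
#include <stdbool.h>

/* Minimal circular doubly linked list head, modeled on the kernel idiom. */
struct list_head_sketch {
	struct list_head_sketch *next, *prev;
};

/* An initialized, empty head points at itself in both directions. */
static void init_list_head(struct list_head_sketch *h)
{
	h->next = h;
	h->prev = h;
}

/* Empty means the head's next pointer refers back to the head. */
static bool list_is_empty(const struct list_head_sketch *h)
{
	return h->next == h;
}
```

A zero-filled head has `next == NULL`, which is not the head itself, so the emptiness test reports "not empty" until the head has been initialized.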

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Tested-by: Diego Viola
    Reviewed-by: Rafael J. Wysocki
    Cc: stable@vger.kernel.org
    Cc: len.brown@intel.com
    Cc: rjw@rjwysocki.net
    Cc: rui.zhang@intel.com
    Link: https://lkml.kernel.org/r/20180430100344.472662715@infradead.org

    Peter Zijlstra
     
  • Because of how the code flips between tsc-early and tsc clocksources
    it might need to mark one or both unstable. The current code in
    mark_tsc_unstable() only worked because previously it registered the
    tsc clocksource once and then never touched it.

    Since it now unregisters the tsc-early clocksource, it needs to know
    if a clocksource got unregistered and the current cs->mult test
    doesn't work for that. Instead use list_empty(&cs->list) to test for
    registration.

    Furthermore, since clocksource_mark_unstable() needs to place the cs
    on the wd_list, it links the cs->list and cs->wd_list serialization.
    It must not see a clocksource registered (!empty cs->list) but already
    past dequeue_watchdog(). So place {en,de}queue{,_watchdog}() under the
    same lock.

    Provided cs->list is initialized to empty, this then allows us to
    unconditionally use clocksource_mark_unstable(), regardless of the
    registration state.

    Fixes: aa83c45762a2 ("x86/tsc: Introduce early tsc clocksource")
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Rafael J. Wysocki
    Tested-by: Diego Viola
    Cc: len.brown@intel.com
    Cc: rjw@rjwysocki.net
    Cc: diego.viola@gmail.com
    Cc: rui.zhang@intel.com
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20180502135312.GS12217@hirez.programming.kicks-ass.net

    Peter Zijlstra
     

26 Apr, 2018

2 commits

  • Revert commits

    92af4dcb4e1c ("tracing: Unify the "boot" and "mono" tracing clocks")
    127bfa5f4342 ("hrtimer: Unify MONOTONIC and BOOTTIME clock behavior")
    7250a4047aa6 ("posix-timers: Unify MONOTONIC and BOOTTIME clock behavior")
    d6c7270e913d ("timekeeping: Remove boot time specific code")
    f2d6fdbfd238 ("Input: Evdev - unify MONOTONIC and BOOTTIME clock behavior")
    d6ed449afdb3 ("timekeeping: Make the MONOTONIC clock behave like the BOOTTIME clock")
    72199320d49d ("timekeeping: Add the new CLOCK_MONOTONIC_ACTIVE clock")

    As stated in the pull request for the unification of CLOCK_MONOTONIC and
    CLOCK_BOOTTIME, it was clear that we might have to revert the change.

    As reported by several folks, systemd and other applications rely on the
    documented behaviour of CLOCK_MONOTONIC on Linux and break with the above
    changes. After resume, daemons time out and other timeout related issues are
    observed. Rafael compiled this list:

    * systemd kills daemons on resume, after >WatchdogSec seconds
    of suspending (Genki Sky). [Verified that that's because systemd uses
    CLOCK_MONOTONIC and expects it to not include the suspend time.]

    * systemd-journald misbehaves after resume:
    systemd-journald[7266]: File /var/log/journal/016627c3c4784cd4812d4b7e96a34226/system.journal
    corrupted or uncleanly shut down, renaming and replacing.
    (Mike Galbraith).

    * NetworkManager reports "networking disabled" and networking is broken
    after resume 50% of the time (Pavel). [May be because of systemd.]

    * MATE desktop dims the display and starts the screensaver right after
    system resume (Pavel).

    * Full system hang during resume (me). [May be due to systemd or NM or both.]

    That happens on Debian and openSUSE systems.

    It's sad that these problems were neither caught in -next nor by those
    folks who expressed interest in this change.
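The documented difference that the revert restores can be observed from user space. The sketch below assumes a Linux system where `CLOCK_BOOTTIME` is available; `read_clock_seconds()` and `suspended_seconds()` are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Read one clock as seconds in a double; returns -1.0 on failure. */
static double read_clock_seconds(clockid_t id)
{
	struct timespec ts;

	if (clock_gettime(id, &ts) != 0)
		return -1.0;
	return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Approximate time spent suspended since boot: CLOCK_BOOTTIME keeps
 * counting across suspend while CLOCK_MONOTONIC does not. */
static double suspended_seconds(void)
{
	double mono = read_clock_seconds(CLOCK_MONOTONIC);
	double boot = read_clock_seconds(CLOCK_BOOTTIME);

	if (mono < 0.0 || boot < 0.0)
		return -1.0;
	return boot - mono;
}
```

A watchdog built on `CLOCK_MONOTONIC` therefore does not see suspend time as elapsed, which is the behaviour the applications listed above depend on.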

    Reported-by: Rafael J. Wysocki
    Reported-by: Genki Sky
    Reported-by: Pavel Machek
    Signed-off-by: Thomas Gleixner
    Cc: Dmitry Torokhov
    Cc: John Stultz
    Cc: Jonathan Corbet
    Cc: Kevin Easton
    Cc: Linus Torvalds
    Cc: Mark Salyzyn
    Cc: Michael Kerrisk
    Cc: Peter Zijlstra
    Cc: Petr Mladek
    Cc: Prarit Bhargava
    Cc: Sergey Senozhatsky
    Cc: Steven Rostedt

    Thomas Gleixner
     
  • Kaike reported that in tests rdma hrtimers occasionally stopped working. He
    did great debugging, which provided enough context to decode the problem.

    CPU 3                                     CPU 2

    idle
    start sched_timer expires = 712171000000
    queue->next = sched_timer
                                              start rdmavt timer. expires = 712172915662
                                              lock(baseof(CPU3))
    tick_nohz_stop_tick()
    tick = 716767000000                       timerqueue_add(tmr)

    hrtimer_set_expires(sched_timer, tick);
    sched_timer->expires = 716767000000
                                              if (tmr->expires < queue->next->expires)
    hrtimer_start(sched_timer)                    queue->next = tmr;
    lock(baseof(CPU3))
                                              unlock(baseof(CPU3))
    timerqueue_remove()
    timerqueue_add()

    ts->sched_timer is queued and queue->next is pointing to it, but then
    ts->sched_timer.expires is modified.

    This not only corrupts the ordering of the timerqueue RB tree, it also
    makes CPU2 see the new expiry time of timerqueue->next->expires when
    checking whether timerqueue->next needs to be updated. So CPU2 sees that
    the rdma timer is earlier than timerqueue->next and sets the rdma timer as
    new next.

    Depending on whether it had also seen the new time at RB tree enqueue, it
    might have queued the rdma timer at the wrong place and then after removing
    the sched_timer the RB tree is completely hosed.

    The problem was introduced with a commit which tried to solve inconsistency
    between the hrtimer in the tick_sched data and the underlying hardware
    clockevent. It split out hrtimer_set_expires() to store the new tick time
    in both the NOHZ and the NOHZ + HIGHRES case, but missed the fact that in
    the NOHZ + HIGHRES case the hrtimer might still be queued.

    Use hrtimer_start(timer, tick...) for the NOHZ + HIGHRES case which sets
    timer->expires after canceling the timer and move the hrtimer_set_expires()
    invocation into the NOHZ only code path which is not affected as it merely
    uses the hrtimer as next event storage so code paths can be shared with
    the NOHZ + HIGHRES case.
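The rule behind the fix, never change the key of an element while it sits in an ordered queue, can be sketched with a simple sorted list. This is a user-space illustration under stated assumptions: the kernel uses an RB tree with per-base locking, while `timer_sketch`, `queue_sketch` and the helpers below are hypothetical and use a singly linked list without locking.

```c
#include <stddef.h>

struct timer_sketch {
	unsigned long long expires;
	struct timer_sketch *next;	/* link in the ordered queue */
};

struct queue_sketch {
	struct timer_sketch *head;	/* earliest-expiring timer */
};

/* Insert keeping the queue sorted by ascending expiry. */
static void queue_add(struct queue_sketch *q, struct timer_sketch *t)
{
	struct timer_sketch **pos = &q->head;

	while (*pos && (*pos)->expires <= t->expires)
		pos = &(*pos)->next;
	t->next = *pos;
	*pos = t;
}

/* Unlink t from the queue if it is present. */
static void queue_remove(struct queue_sketch *q, struct timer_sketch *t)
{
	struct timer_sketch **pos = &q->head;

	while (*pos && *pos != t)
		pos = &(*pos)->next;
	if (*pos) {
		*pos = t->next;
		t->next = NULL;
	}
}

/* Safe re-arm: dequeue first, then change the key, then enqueue again. */
static void timer_restart(struct queue_sketch *q, struct timer_sketch *t,
			  unsigned long long expires)
{
	queue_remove(q, t);
	t->expires = expires;
	queue_add(q, t);
}
```

Writing `t->expires` directly while `t` is still linked would leave the queue head pointing at a timer that is no longer the earliest, which mirrors the corruption described above.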

    Fixes: d4af6d933ccf ("nohz: Fix spurious warning when hrtimer and clockevent get out of sync")
    Reported-by: "Wan Kaike"
    Signed-off-by: Thomas Gleixner
    Acked-by: Frederic Weisbecker
    Cc: "Marciniszyn Mike"
    Cc: Anna-Maria Gleixner
    Cc: linux-rdma@vger.kernel.org
    Cc: "Dalessandro Dennis"
    Cc: "Fleck John"
    Cc: stable@vger.kernel.org
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: "Weiny Ira"
    Cc: "linux-rdma@vger.kernel.org"
    Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1804241637390.1679@nanos.tec.linutronix.de
    Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1804242119210.1597@nanos.tec.linutronix.de

    Thomas Gleixner
     

19 Apr, 2018

6 commits

  • Change over clock_nanosleep syscalls to use y2038 safe
    __kernel_timespec times. This will enable changing over
    of these syscalls to use new y2038 safe syscalls when
    the architectures define the CONFIG_64BIT_TIME.

    Note that nanosleep syscall is deprecated and does not have a
    plan for making it y2038 safe. But, the syscall should work as
    before on 64 bit machines and on 32 bit machines, the syscall
    works correctly until y2038 as before using the existing compat
    syscall version. There is no new syscall for supporting 64 bit
    time_t on 32 bit architectures.

    Cc: linux-api@vger.kernel.org
    Signed-off-by: Deepa Dinamani
    Signed-off-by: Arnd Bergmann

    Deepa Dinamani
     
  • Change over clock_settime, clock_gettime and clock_getres
    syscalls to use __kernel_timespec times. This will enable
    changing over of these syscalls to use new y2038 safe syscalls
    when the architectures define the CONFIG_64BIT_TIME.

    Cc: linux-api@vger.kernel.org
    Signed-off-by: Deepa Dinamani
    Signed-off-by: Arnd Bergmann

    Deepa Dinamani
     
  • get/put_timespec64() interfaces will eventually be used for
    conversions between the new y2038 safe struct __kernel_timespec
    and struct timespec64.

    The new y2038 safe syscalls have a common entry for native
    and compat interfaces.
    On compat interfaces, the high order bits of nanoseconds
    should be zeroed out. This is because the application code
    or the libc do not guarantee zeroing of these. If used without
    zeroing, kernel might be at risk of using timespec values
    incorrectly.

    Note that clearing of bits is dependent on CONFIG_64BIT_TIME
    for now. This is until COMPAT_USE_64BIT_TIME has been handled
    correctly. x86 will be the first architecture that will use the
    CONFIG_64BIT_TIME.
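The zeroing step can be sketched as follows. `struct kts_sketch` and `sanitize_compat_nsec()` are hypothetical names, and the `compat_caller` flag stands in for whatever check the kernel uses to detect a compat syscall; the assumption is only that a 32-bit caller fills the low 32 bits of a 64-bit nanoseconds field and leaves the rest undefined.

```c
#include <stdint.h>

/* Hypothetical 64-bit layout as seen by the kernel: a 32-bit caller
 * only fills the low 32 bits of tv_nsec; the upper half is padding. */
struct kts_sketch {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* For a compat caller, keep only the low 32 bits of the nanoseconds. */
static void sanitize_compat_nsec(struct kts_sketch *ts, int compat_caller)
{
	if (compat_caller)
		ts->tv_nsec &= 0xFFFFFFFFLL;
}
```

Native 64-bit callers are left untouched, so a bogus nanoseconds value from them can still be rejected by later validation.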

    Signed-off-by: Deepa Dinamani
    Signed-off-by: Arnd Bergmann

    Deepa Dinamani
     
  • clock_gettime, clock_settime, clock_getres and clock_nanosleep
    compat syscalls are also repurposed to provide backward compatibility
    to support 32 bit time_t on 32 bit systems.

    Note that nanosleep compat syscall will also be treated the same way
    as the above syscalls as it shares common handler functions with
    clock_nanosleep. But, there is no plan to provide a y2038 safe solution
    for nanosleep.

    Signed-off-by: Deepa Dinamani
    Signed-off-by: Arnd Bergmann

    Deepa Dinamani
     
  • These functions are used in the repurposed compat syscalls
    to provide backward compatibility for using 32 bit time_t
    on 32 bit systems.

    Signed-off-by: Deepa Dinamani
    Signed-off-by: Arnd Bergmann

    Deepa Dinamani
     
  • Commit a9445e47d897 ("posix-cpu-timers: Make set_process_cpu_timer()
    more robust") moved the check into the 'if' statement. Unfortunately,
    it did so on the right side of an && which means that it may get short
    circuited and never evaluated. This is easily reproduced with:

    $ cat loop.c
    #include <sys/resource.h>

    int main(void) {
        struct rlimit res;
        /* set the CPU time limit */
        getrlimit(RLIMIT_CPU,&res);
        res.rlim_cur = 2;
        res.rlim_max = 2;
        setrlimit(RLIMIT_CPU,&res);

        /* spin; the process is expected to be killed at the limit */
        while (1);
    }

    Which will hang forever instead of being killed. Fix this by pulling the
    evaluation out of the if statement but checking the return value instead.
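The short-circuit pitfall in generic form, as a sketch rather than the kernel code itself: `arm_timer()`, `buggy()` and `fixed()` are hypothetical, and the counter only exists to make the skipped call visible.

```c
#include <stdbool.h>

/* Counts how often the side-effecting helper actually ran. */
static int calls;

/* Hypothetical helper with a side effect that must always happen. */
static bool arm_timer(void)
{
	calls++;
	return true;
}

/* Buggy shape: when 'need_check' is false, && short-circuits and
 * arm_timer() is never called. */
static void buggy(bool need_check)
{
	if (need_check && arm_timer()) {
		/* follow-up work would go here */
	}
}

/* Fixed shape: always perform the call, then test its result. */
static void fixed(bool need_check)
{
	bool armed = arm_timer();

	if (need_check && armed) {
		/* follow-up work would go here */
	}
}
```

Calling `buggy(false)` leaves `calls` unchanged, while `fixed(false)` increments it, which is the difference between the regression and the fix.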

    Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1568337
    Fixes: a9445e47d897 ("posix-cpu-timers: Make set_process_cpu_timer() more robust")
    Signed-off-by: Laura Abbott
    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org
    Cc: "Max R . P . Grossmann"
    Cc: John Stultz
    Link: https://lkml.kernel.org/r/20180417215742.2521-1-labbott@redhat.com

    Laura Abbott
     

17 Apr, 2018

2 commits

  • The __current_kernel_time() function based on 'struct timespec' is no
    longer recommended for new code, and the only user of this function has
    been replaced by commit 6909e29fdefb ("kdb: use __ktime_get_real_seconds
    instead of __current_kernel_time").

    Remove the obsolete interface.

    Signed-off-by: Baolin Wang
    Signed-off-by: Thomas Gleixner
    Cc: arnd@arndb.de
    Cc: sboyd@kernel.org
    Cc: broonie@kernel.org
    Cc: john.stultz@linaro.org
    Link: https://lkml.kernel.org/r/1a9dbea7ee2cda7efe9ed330874075cf17fdbff6.1523596316.git.baolin.wang@linaro.org

    Baolin Wang
     
  • Convert the clockevents driver from old-style printk() to pr_info() and
    pr_cont(), to fix split kernel messages like below:

    Clockevents: could not switch to one-shot mode:
    dummy_timer is not functional.

    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Thomas Gleixner
    Cc: Frederic Weisbecker
    Link: https://lkml.kernel.org/r/1522942018-14471-1-git-send-email-geert%2Brenesas@glider.be

    Geert Uytterhoeven
     

16 Apr, 2018

1 commit

  • Pull x86 fixes from Thomas Gleixner:
    "A set of fixes and updates for x86:

    - Address a swiotlb regression which was caused by the recent DMA
    rework and made drivers fail because dma_direct_supported() returned
    false

    - Fix a signedness bug in the APIC ID validation which caused invalid
    APIC IDs to be detected as valid thereby bloating the CPU possible
    space.

    - Fix inconsistent config dependency/select magic for the MFD_CS5535
    driver.

    - Fix a corruption of the physical address space bits when encryption
    has reduced the address space and late cpuinfo updates overwrite
    the reduced bit information with the original value.

    - Dominik's syscall rework which consolidates the architecture
    specific syscall functions so all syscalls can be wrapped with the
    same macros. This allows switching x86/64 to struct pt_regs based
    syscalls. Extend the clearing of user space controlled registers in
    the entry path to the lower registers"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/apic: Fix signedness bug in APIC ID validity checks
    x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption
    x86/olpc: Fix inconsistent MFD_CS5535 configuration
    swiotlb: Use dma_direct_supported() for swiotlb_ops
    syscalls/x86: Adapt syscall_wrapper.h to the new syscall stub naming convention
    syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*()
    syscalls/core, syscalls/x86: Clean up compat syscall stub naming convention
    syscalls/core, syscalls/x86: Clean up syscall stub naming convention
    syscalls/x86: Extend register clearing on syscall entry to lower registers
    syscalls/x86: Unconditionally enable 'struct pt_regs' based syscalls on x86_64
    syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32
    syscalls/core: Prepare CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y for compat syscalls
    syscalls/x86: Use 'struct pt_regs' based syscall calling convention for 64-bit syscalls
    syscalls/core: Introduce CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y
    x86/syscalls: Don't pointlessly reload the system call number
    x86/mm: Fix documentation of module mapping range with 4-level paging
    x86/cpuid: Switch to 'static const' specifier

    Linus Torvalds
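
    The APIC ID signedness item in the pull message above describes a
    common class of bug. The following is a minimal userspace sketch of
    that class, not the kernel code: MAX_ID and both function names are
    hypothetical, and the limit of 255 is an assumption chosen for
    illustration. It shows how an all-ones 32-bit ID can pass an upper
    bound check once it is converted to a signed type.

```c
#include <stdint.h>

/* Hypothetical upper bound standing in for the largest valid ID. */
#define MAX_ID 255

/*
 * Buggy variant: the parameter is signed. An all-ones 32-bit ID
 * (0xFFFFFFFF) converted to int becomes -1 on common compilers
 * (the conversion is implementation-defined), and -1 < MAX_ID holds,
 * so the invalid ID is reported as valid.
 */
static int id_valid_signed(int id)
{
	return id < MAX_ID;
}

/*
 * Fixed variant: the parameter is unsigned, so 0xFFFFFFFF stays a
 * very large value and fails the upper bound check.
 */
static int id_valid_unsigned(uint32_t id)
{
	return id < MAX_ID;
}
```

    With these definitions, id_valid_signed((int)0xFFFFFFFFu) accepts
    the all-ones ID on typical two's-complement targets, while
    id_valid_unsigned(0xFFFFFFFFu) rejects it.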
     

13 Apr, 2018

1 commit

  • Pull kdb updates from Jason Wessel:

    - fix 2032 time access issues and new compiler warnings

    - minor regression test cleanup

    - formatting fixes for end user use of kdb

    * tag 'for_linus-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb:
    kdb: use memmove instead of overlapping memcpy
    kdb: use ktime_get_mono_fast_ns() instead of ktime_get_ts()
    kdb: bl: don't use tab character in output
    kdb: drop newline in unknown command output
    kdb: make "mdr" command repeat
    kdb: use __ktime_get_real_seconds instead of __current_kernel_time
    misc: kgdbts: Display progress of asynchronous tests

    Linus Torvalds
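
    The "use memmove instead of overlapping memcpy" entry above refers
    to a general C rule: memcpy() has undefined behavior when source and
    destination overlap, while memmove() is defined for that case. The
    sketch below is illustrative only and is not taken from kdb;
    drop_prefix() is a hypothetical helper.

```c
#include <string.h>

/*
 * Hypothetical helper: remove the first n characters of a
 * NUL-terminated string in place.
 */
static void drop_prefix(char *s, size_t n)
{
	size_t len = strlen(s);

	if (n > len)
		n = len;

	/*
	 * The source (s + n) and destination (s) overlap, so memcpy()
	 * would be undefined here; memmove() handles overlap correctly.
	 * The "+ 1" copies the terminating NUL as well.
	 */
	memmove(s, s + n, len - n + 1);
}
```

    For example, calling drop_prefix(buf, 5) on a buffer holding
    "kdb> help" leaves "help" in the buffer.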
     

11 Apr, 2018

1 commit

  • * pm-cpuidle:
    tick-sched: avoid a maybe-uninitialized warning
    cpuidle: Add definition of residency to sysfs documentation
    time: hrtimer: Use timerqueue_iterate_next() to get to the next timer
    nohz: Avoid duplication of code related to got_idle_tick
    nohz: Gather tick_sched booleans under a common flag field
    cpuidle: menu: Avoid selecting shallow states with stopped tick
    cpuidle: menu: Refine idle state selection for running tick
    sched: idle: Select idle state before stopping the tick
    time: hrtimer: Introduce hrtimer_next_event_without()
    time: tick-sched: Split tick_nohz_stop_sched_tick()
    cpuidle: Return nohz hint from cpuidle_select()
    jiffies: Introduce USER_TICK_USEC and redefine TICK_USEC
    sched: idle: Do not stop the tick before cpuidle_idle_call()
    sched: idle: Do not stop the tick upfront in the idle loop
    time: tick-sched: Reorganize idle tick management code

    * pm-qos:
    PM / QoS: mark expected switch fall-throughs

    Rafael J. Wysocki
     

10 Apr, 2018

1 commit

  • The use of bitfields seems to confuse gcc, leading to a false-positive
    warning in all compiler versions:

    kernel/time/tick-sched.c: In function 'tick_nohz_idle_exit':
    kernel/time/tick-sched.c:538:2: error: 'now' may be used uninitialized in this function [-Werror=maybe-uninitialized]

    This introduces temporary variables to track the flags so gcc
    doesn't have to evaluate the bitfields twice, eliminating the code
    path that leads to the warning.

    Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85301
    Fixes: 1cae544d42d2 ("nohz: Gather tick_sched booleans under a common flag field")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Rafael J. Wysocki

    Arnd Bergmann
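
    The pattern described in the commit message above can be sketched in
    userspace as follows. This is an assumption-laden illustration, not
    the kernel patch: struct tick_state, fake_clock() and idle_exit()
    are hypothetical stand-ins. The point is that each bitfield is read
    once into a local, so the condition guarding the assignment of 'now'
    and the conditions guarding its uses are visibly the same values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a structure that packs flags as bitfields. */
struct tick_state {
	unsigned int idle_active : 1;
	unsigned int tick_stopped : 1;
};

/* Hypothetical clock source returning a fixed value for the sketch. */
static uint64_t fake_clock(void)
{
	return 42;
}

static uint64_t idle_exit(const struct tick_state *ts)
{
	uint64_t now;		/* assigned only when at least one flag is set */
	uint64_t total = 0;

	/* Snapshot each bitfield once instead of re-reading it later. */
	bool idle_active = ts->idle_active;
	bool tick_stopped = ts->tick_stopped;

	if (idle_active || tick_stopped)
		now = fake_clock();

	/* Each use of 'now' is guarded by a flag that implies it was set. */
	if (idle_active)
		total += now;
	if (tick_stopped)
		total += now;

	return total;
}
```

    Whether a given gcc version warns on the bitfield form depends on
    the compiler and optimization level; the sketch only shows the shape
    of the workaround.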
     

09 Apr, 2018

4 commits