03 Jul, 2018

10 commits

  • Oleg explains the reason we could hit park+park is that
    smpboot_update_cpumask_percpu_thread()'s

    for_each_cpu_and(cpu, &tmp, cpu_online_mask)
        smpboot_park_kthread();

    turns into:

    for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask, (void)and)
        smpboot_park_kthread();

    on UP, ignoring the mask. But since we just completely removed that
    function, this is no longer relevant.

    So revert commit:

    b1f5b378e126 ("kthread: Allow kthread_park() on a parked kthread")

    Suggested-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
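
To illustrate the UP expansion described in the revert above, here is a minimal user-space sketch. It is not kernel code: the `demo_*` names, the plain `unsigned int` bitmasks, and `DEMO_NR_CPUS` are assumptions made for illustration. The UP-style macro mirrors the expansion quoted in the commit message, which only casts both masks to `void`.

```c
#include <assert.h>

#define DEMO_NR_CPUS 4	/* hypothetical CPU count for the SMP-like variant */

/* SMP-like behaviour: visit only the CPUs set in both (simplified) bitmasks. */
int demo_count_smp(unsigned int tmp, unsigned int online)
{
	int cpu, visited = 0;

	for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		if ((tmp & online) & (1u << cpu))
			visited++;	/* stands in for smpboot_park_kthread() */
	return visited;
}

/* UP-style expansion as quoted in the commit message: masks are only cast to void. */
#define demo_for_each_cpu_and_up(cpu, mask, andmask) \
	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)(mask), (void)(andmask))

int demo_count_up(unsigned int tmp, unsigned int online)
{
	int cpu, visited = 0;

	demo_for_each_cpu_and_up(cpu, tmp, online)
		visited++;	/* runs once for CPU 0, whatever the masks contain */
	assert(visited == 1);
	return visited;
}
```

With an empty `tmp` mask the SMP-like variant visits no CPU, while the UP-style variant still runs the body once for CPU 0, which is how a park request could reach an already parked thread.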
     
  • Now that the sole use of the whole smpboot_*cpumask() API is gone,
    remove it.

    Suggested-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Oleg suggested replacing the "watchdog/%u" threads with
    cpu_stop_work. That removes one thread per CPU while at the same time
    fixing softlockup vs SCHED_DEADLINE.

    But more importantly, it does away with the single
    smpboot_update_cpumask_percpu_thread() user, which allows
    cleanups/shrinkage of the smpboot interface.

    Suggested-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Gaurav reports that commit:

    85f1abe0019f ("kthread, sched/wait: Fix kthread_parkme() completion issue")

    isn't working for him because of the following race:

    > controller Thread                CPUHP Thread
    > takedown_cpu
    > kthread_park
    > kthread_parkme
    > Set KTHREAD_SHOULD_PARK
    >                                  smpboot_thread_fn
    >                                  set Task interruptible
    >
    >
    > wake_up_process
    >  if (!(p->state & state))
    >      goto out;
    >
    >                                  Kthread_parkme
    >                                  SET TASK_PARKED
    >                                  schedule
    >                                  raw_spin_lock(&rq->lock)
    > ttwu_remote
    > waiting for __task_rq_lock
    >                                  context_switch
    >
    >                                  finish_lock_switch
    >
    >
    >
    >                                  Case TASK_PARKED
    >                                  kthread_park_complete
    >
    >
    > SET Running

    Furthermore, Oleg noticed that the whole scheduler TASK_PARKED
    handling is buggered because the TASK_DEAD thing is done with
    preemption disabled, the current code can still complete early on
    preemption :/

    So basically revert that earlier fix and go with a variant of the
    alternative mentioned in the commit. Promote TASK_PARKED to special
    state to avoid the store-store issue on task->state leading to the
    WARN in kthread_unpark() -> __kthread_bind().

    But in addition, add wait_task_inactive() to kthread_park() to ensure
    the task really is PARKED when we return from kthread_park(). This
    avoids the whole "kthread still gets migrated" nonsense -- although it
    would be really good to get this done differently.

    Reported-by: Gaurav Kohli
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: 85f1abe0019f ("kthread, sched/wait: Fix kthread_parkme() completion issue")
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • When a cfs_rq is throttled, parent cfs_rq->nr_running is decreased and
    everything happens at cfs_rq level. Currently util_est stays unchanged
    in such a case and it keeps accounting the utilization of throttled tasks.
    This can somewhat make sense as we don't dequeue the tasks but only the
    throttled cfs_rq.

    If a task of another group is enqueued/dequeued and root cfs_rq becomes
    idle during the dequeue, util_est will be cleared whereas it was
    accounting util_est of throttled tasks before. So the behavior of util_est
    is not always the same regarding throttled tasks and depends on side
    activity. Furthermore, util_est will not be updated when the cfs_rq is
    unthrottled as everything happens at cfs_rq level. The main result is that
    util_est will stay null whereas we now have running tasks. We have to wait
    for the next dequeue/enqueue of the previously throttled tasks to get an
    up-to-date util_est.

    Remove the assumption that cfs_rq's estimated utilization of a CPU is 0
    if there is no running task so the util_est of a task remains until the
    latter is dequeued even if its cfs_rq has been throttled.

    Signed-off-by: Vincent Guittot
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Patrick Bellasi
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: 7f65ea42eb00 ("sched/fair: Add util_est on top of PELT")
    Link: http://lkml.kernel.org/r/1528972380-16268-1-git-send-email-vincent.guittot@linaro.org
    Signed-off-by: Ingo Molnar

    Vincent Guittot
     
  • When the period gets restarted after some idle time, start_cfs_bandwidth()
    doesn't update the expiration information, so expire_cfs_rq_runtime() will
    see cfs_rq->runtime_expires smaller than the rq clock and go to the clock
    drift logic, needlessly wasting CPU cycles on the scheduler hot path.

    Update the global expiration in start_cfs_bandwidth() to avoid frequent
    expire_cfs_rq_runtime() calls once a new period begins.

    Signed-off-by: Xunlei Pang
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Ben Segall
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20180620101834.24455-2-xlpang@linux.alibaba.com
    Signed-off-by: Ingo Molnar

    Xunlei Pang
     
  • I noticed that cgroup task groups constantly get throttled even
    if they have low CPU usage. This causes jitter in the response
    time of some of our business containers when CPU quotas are enabled.

    It's very simple to reproduce:

    mkdir /sys/fs/cgroup/cpu/test
    cd /sys/fs/cgroup/cpu/test
    echo 100000 > cpu.cfs_quota_us
    echo $$ > tasks

    then repeat:

    cat cpu.stat | grep nr_throttled # nr_throttled will increase steadily

    After some analysis, we found that cfs_rq::runtime_remaining will
    be cleared by expire_cfs_rq_runtime() due to two equal but stale
    "cfs_{b|q}->runtime_expires" after the period timer is re-armed.

    The current condition to judge clock drift in expire_cfs_rq_runtime()
    is wrong: the two runtime_expires are actually the same when clock
    drift happens, so this condition can never hit. The original design was
    correctly done by this commit:

    a9cf55b28610 ("sched: Expire invalid runtime")

    ... but was changed to be the current implementation due to its locking bug.

    This patch introduces another way: it adds a new field in both the
    cfs_rq and cfs_bandwidth structures to record the expiration update
    sequence, and uses them to figure out whether clock drift has happened
    (true if they are equal).

    Signed-off-by: Xunlei Pang
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Ben Segall
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: 51f2176d74ac ("sched/fair: Fix unlocked reads of some cfs_b->quota/period")
    Link: http://lkml.kernel.org/r/20180620101834.24455-1-xlpang@linux.alibaba.com
    Signed-off-by: Ingo Molnar

    Xunlei Pang
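
The sequence-number idea described in the commit above can be sketched in user space as follows. This is a simplified model under stated assumptions, not the kernel implementation: the `demo_*` structures and functions, the field name `expires_seq`, and `DEMO_TICK_NS` are hypothetical and only follow the commit message's description (equal sequences mean clock drift, so the local deadline is extended; differing sequences mean the period has really been refreshed, so the stale local runtime is dropped).

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_TICK_NS 1000000ULL	/* hypothetical drift allowance */

struct demo_bandwidth {		/* global pool, stands in for cfs_bandwidth */
	uint64_t runtime_expires;
	int expires_seq;	/* bumped on every expiration update */
};

struct demo_rq {		/* local runqueue, stands in for cfs_rq */
	uint64_t runtime_expires;
	int expires_seq;	/* copy taken when runtime was assigned */
	int64_t runtime_remaining;
};

/* A new period begins: refresh the global deadline and bump the sequence. */
void demo_refill(struct demo_bandwidth *b, uint64_t now, uint64_t period)
{
	assert(period > 0);
	b->runtime_expires = now + period;
	b->expires_seq++;
}

/* Hand runtime to a local runqueue and remember which period it belongs to. */
void demo_assign(struct demo_rq *rq, const struct demo_bandwidth *b, int64_t amount)
{
	rq->runtime_remaining += amount;
	rq->runtime_expires = b->runtime_expires;
	rq->expires_seq = b->expires_seq;
}

/* Expire local runtime unless the apparent expiry is only clock drift. */
void demo_expire(struct demo_rq *rq, const struct demo_bandwidth *b, uint64_t now)
{
	if (now < rq->runtime_expires)
		return;				/* deadline not reached yet */
	if (rq->runtime_remaining < 0)
		return;				/* nothing left to expire */
	if (rq->expires_seq == b->expires_seq)
		rq->runtime_expires += DEMO_TICK_NS;	/* same period: treat as drift */
	else
		rq->runtime_remaining = 0;	/* period moved on: runtime is stale */
}
```

Comparing sequence counters rather than the two `runtime_expires` values avoids the case the commit describes, where both deadlines are equal but stale after the period timer is re-armed.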
     
  • With commit:

    8f111bc357aa ("cpufreq/schedutil: Rewrite CPUFREQ_RT support")

    the schedutil governor uses rq->rt.rt_nr_running to detect whether an
    RT task is currently running on the CPU and to set frequency to max
    if necessary.

    cpufreq_update_util() is called in enqueue/dequeue_top_rt_rq() but
    rq->rt.rt_nr_running has not been updated yet when dequeue_top_rt_rq() is
    called so schedutil still considers that an RT task is running when the
    last task is dequeued. The update of rq->rt.rt_nr_running happens later
    in dequeue_rt_stack().

    In fact, we can take advantage of the fact that rt entities are dequeued
    and then re-enqueued whenever an rt task is enqueued or dequeued.
    As a result, enqueue_top_rt_rq() is always called when a task is
    enqueued or dequeued and also when groups are throttled or unthrottled.
    The only case that does not use enqueue_top_rt_rq() is when the root
    rt_rq is throttled.

    Signed-off-by: Vincent Guittot
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: efault@gmx.de
    Cc: juri.lelli@redhat.com
    Cc: patrick.bellasi@arm.com
    Cc: viresh.kumar@linaro.org
    Fixes: 8f111bc357aa ('cpufreq/schedutil: Rewrite CPUFREQ_RT support')
    Link: http://lkml.kernel.org/r/1530021202-21695-1-git-send-email-vincent.guittot@linaro.org
    Signed-off-by: Ingo Molnar

    Vincent Guittot
     
  • Some people have reported that the warning in sched_tick_remote()
    occasionally triggers, especially under some RCU-torture
    pressure:

    WARNING: CPU: 11 PID: 906 at kernel/sched/core.c:3138 sched_tick_remote+0xb6/0xc0
    Modules linked in:
    CPU: 11 PID: 906 Comm: kworker/u32:3 Not tainted 4.18.0-rc2+ #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
    Workqueue: events_unbound sched_tick_remote
    RIP: 0010:sched_tick_remote+0xb6/0xc0
    Code: e8 0f 06 b8 00 c6 03 00 fb eb 9d 8b 43 04 85 c0 75 8d 48 8b 83 e0 0a 00 00 48 85 c0 75 81 eb 88 48 89 df e8 bc fe ff ff eb aa 0b eb
    +c5 66 0f 1f 44 00 00 bf 17 00 00 00 e8 b6 2e fe ff 0f b6
    Call Trace:
    process_one_work+0x1df/0x3b0
    worker_thread+0x44/0x3d0
    kthread+0xf3/0x130
    ? set_worker_desc+0xb0/0xb0
    ? kthread_create_worker_on_cpu+0x70/0x70
    ret_from_fork+0x35/0x40

    This happens when the remote tick applies on an idle task. Usually the
    idle_cpu() check avoids that, but it is performed before we lock the
    runqueue and it is therefore racy. It was intended to be that way in
    order to avoid useless runqueue locking, since the idle task tick
    callback is a no-op.

    Now if the racy check slips out of our hands and we end up remotely
    ticking an idle task, the empty task_tick_idle() is harmless. Still
    it won't pass the WARN_ON_ONCE() test that ensures rq_clock_task() is
    not too far from curr->se.exec_start because update_curr_idle() doesn't
    update the exec_start value like other scheduler policies. Hence the
    reported false positive.

    So let's have another check, while the rq is locked, to make sure we
    don't remote tick on an idle task. The lockless idle_cpu() still applies
    to avoid unnecessary rq lock contention.

    Reported-by: Jacek Tomaka
    Reported-by: Paul E. McKenney
    Reported-by: Anna-Maria Gleixner
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1530203381-31234-1-git-send-email-frederic@kernel.org
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
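
The "lockless check first, then re-check under the lock" pattern described in the commit above can be sketched as below. This is a single-threaded user-space model, not the scheduler code: `demo_rq`, `demo_tick_remote()`, and the `idle_races_in` parameter (which simulates the CPU going idle between the two checks) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_rq {
	bool lock_held;
	bool curr_is_idle;	/* true when the "CPU" runs its idle task */
	int ticks;		/* remote ticks actually delivered */
	int lock_acquisitions;
};

static void demo_lock(struct demo_rq *rq)
{
	assert(!rq->lock_held);
	rq->lock_held = true;
	rq->lock_acquisitions++;
}

static void demo_unlock(struct demo_rq *rq)
{
	assert(rq->lock_held);
	rq->lock_held = false;
}

/* Returns true if a tick was delivered to a non-idle task. */
bool demo_tick_remote(struct demo_rq *rq, bool idle_races_in)
{
	bool ticked = false;

	/* Cheap lockless check: racy, but skips the lock in the common idle case. */
	if (rq->curr_is_idle)
		return false;

	/* Simulated race window: the CPU may go idle before we take the lock. */
	if (idle_races_in)
		rq->curr_is_idle = true;

	demo_lock(rq);
	/* Authoritative re-check under the lock: never tick an idle task. */
	if (!rq->curr_is_idle) {
		rq->ticks++;
		ticked = true;
	}
	demo_unlock(rq);
	return ticked;
}
```

The first check keeps lock contention low; only the second one, taken while the lock is held, is relied upon for correctness.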
     

02 Jul, 2018

7 commits

  • Linus Torvalds
     
  • Pull btrfs fixes from David Sterba:
    "We have a few regression fixes for qgroup rescan status tracking and
    the vm_fault_t conversion that mixed up the error values"

    * tag 'for-4.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
    Btrfs: fix mount failure when qgroup rescan is in progress
    Btrfs: fix regression in btrfs_page_mkwrite() from vm_fault_t conversion
    btrfs: quota: Set rescan progress to (u64)-1 if we hit last leaf

    Linus Torvalds
     
  • Pull vfs fix from Al Viro:
    "Followup to procfs-seq_file series this window"

    This fixes a memory leak by making sure that proc seq files release any
    private data on close. The 'proc_seq_open' has to be properly paired
    with 'proc_seq_release' that releases the extra private data.

    * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    proc: add proc_seq_release

    Linus Torvalds
     
  • Pull staging/IIO fixes from Greg KH:
    "Here are a few small staging and IIO driver fixes for 4.18-rc3.

    Nothing major or big, all just fixes for reported problems since
    4.18-rc1. All of these have been in linux-next this week with no
    reported problems"

    * tag 'staging-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
    staging: android: ion: Return an ERR_PTR in ion_map_kernel
    staging: comedi: quatech_daqp_cs: fix no-op loop daqp_ao_insn_write()
    iio: imu: inv_mpu6050: Fix probe() failure on older ACPI based machines
    iio: buffer: fix the function signature to match implementation
    iio: mma8452: Fix ignoring MMA8452_INT_DRDY
    iio: tsl2x7x/tsl2772: avoid potential division by zero
    iio: pressure: bmp280: fix relative humidity unit

    Linus Torvalds
     
  • Pull tty/serial fixes from Greg KH:
    "Here are five fixes for the tty core and some serial drivers.

    The tty core ones fix some security and other issues reported by the
    syzbot that I have taken too long in responding to (sorry Tetsuo!).

    The 8250 serial driver fix resolves an issue of devices that used to
    work properly stopping working as they shouldn't have been added to a
    blacklist.

    All of these have been in linux-next for a few days with no reported
    issues"

    * tag 'tty-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
    vt: prevent leaking uninitialized data to userspace via /dev/vcs*
    serdev: fix memleak on module unload
    serial: 8250_pci: Remove stalled entries in blacklist
    n_tty: Access echo_* variables carefully.
    n_tty: Fix stall at n_tty_receive_char_special().

    Linus Torvalds
     
  • Pull USB fixes from Greg KH:
    "Here is a number of USB gadget and other driver fixes for 4.18-rc3.

    There's a bunch of them here, most of them being gadget driver and
    xhci host controller fixes for reported issues (as normal), but there
    are also some new device ids, and some fixes for the typec code.

    There is an acpi core patch in here that was acked by the acpi
    maintainer as it is needed for the typec fixes in order to properly
    solve a problem in that driver.

    All of these have been in linux-next this week with no reported
    issues"

    * tag 'usb-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (33 commits)
    usb: chipidea: host: fix disconnection detect issue
    usb: typec: tcpm: fix logbuffer index is wrong if _tcpm_log is re-entered
    typec: tcpm: Fix a msecs vs jiffies bug
    NFC: pn533: Fix wrong GFP flag usage
    usb: cdc_acm: Add quirk for Uniden UBC125 scanner
    staging/typec: fix tcpci_rt1711h build errors
    usb: typec: ucsi: Fix for incorrect status data issue
    usb: typec: ucsi: acpi: Workaround for cache mode issue
    acpi: Add helper for deactivating memory region
    usb: xhci: increase CRS timeout value
    usb: xhci: tegra: fix runtime PM error handling
    usb: xhci: remove the code build warning
    xhci: Fix kernel oops in trace_xhci_free_virt_device
    xhci: Fix perceived dead host due to runtime suspend race with event handler
    dwc2: gadget: Fix ISOC IN DDMA PID bitfield value calculation
    usb: gadget: dwc2: fix memory leak in gadget_init()
    usb: gadget: composite: fix delayed_status race condition when set_interface
    usb: dwc2: fix isoc split in transfer with no data
    usb: dwc2: alloc dma aligned buffer for isoc split in
    usb: dwc2: fix the incorrect bitmaps for the ports of multi_tt hub
    ...

    Linus Torvalds
     
  • Pull dma mapping fixlet from Christoph Hellwig:
    "Add a missing export required by riscv and unicore"

    * tag 'dma-mapping-4.18-2' of git://git.infradead.org/users/hch/dma-mapping:
    swiotlb: export swiotlb_dma_ops

    Linus Torvalds
     

01 Jul, 2018

7 commits

  • Pull parisc fixes and cleanups from Helge Deller:
    "Nothing exciting in this patchset, just

    - small cleanups of header files

    - default to 4 CPUs when building a SMP kernel

    - mark 16kB and 64kB page sizes broken

    - addition of the new io_pgetevents syscall"

    * 'parisc-4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
    parisc: Build kernel without -ffunction-sections
    parisc: Reduce debug output in unwind code
    parisc: Wire up io_pgetevents syscall
    parisc: Default to 4 SMP CPUs
    parisc: Convert printk(KERN_LEVEL) to pr_lvl()
    parisc: Mark 16kB and 64kB page sizes BROKEN
    parisc: Drop struct sigaction from not exported header file

    Linus Torvalds
     
  • Pull ARM SoC fixes from Olof Johansson:
    "A smaller batch for the end of the week (let's see if I can keep the
    weekly cadence going for once).

    All medium-grade fixes here, nothing worrisome:

    - Fixes for some fairly old bugs around SD card write-protect
    detection and GPIO interrupt assignments on Davinci.

    - Wifi module suspend fix for Hikey.

    - Minor DT tweaks to fix inaccuracies for Amlogic platforms, one
    of which solves booting with third-party u-boot"

    * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
    arm64: dts: hikey960: Define wl1837 power capabilities
    arm64: dts: hikey: Define wl1835 power capabilities
    ARM64: dts: meson-gxl: fix Mali GPU compatible string
    ARM64: dts: meson-axg: fix ethernet stability issue
    ARM64: dts: meson-gx: fix ATF reserved memory region
    ARM64: dts: meson-gxl-s905x-p212: Add phy-supply for usb0
    ARM64: dts: meson: fix register ranges for SD/eMMC
    ARM64: dts: meson: disable sd-uhs modes on the libretech-cc
    ARM: dts: da850: Fix interrups property for gpio
    ARM: davinci: board-da850-evm: fix WP pin polarity for MMC/SD

    Linus Torvalds
     
  • …masahiroy/linux-kbuild

    Pull Kbuild fixes from Masahiro Yamada:

    - introduce __diag_* macros and suppress -Wattribute-alias warnings
    from GCC 8

    - fix stack protector test script for x86_64

    - fix line number handling in Kconfig

    - document that '#' starts a comment in Kconfig

    - handle P_SYMBOL property in dump debugging of Kconfig

    - correct help message of LD_DEAD_CODE_DATA_ELIMINATION

    - fix occasional segmentation faults in Kconfig

    * tag 'kbuild-fixes-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
    kconfig: loop boundary condition fix
    kbuild: reword help of LD_DEAD_CODE_DATA_ELIMINATION
    kconfig: handle P_SYMBOL in print_symbol()
    kconfig: document Kconfig source file comments
    kconfig: fix line numbers for if-entries in menu tree
    stack-protector: Fix test with 32-bit userland and CONFIG_64BIT=y
    powerpc: Remove -Wattribute-alias pragmas
    disable -Wattribute-alias warning for SYSCALL_DEFINEx()
    kbuild: add macro for controlling warnings to linux/compiler.h

    Linus Torvalds
     
  • Pull x86 fixes from Ingo Molnar:
    "The biggest diffstat comes from self-test updates, plus there's entry
    code fixes, 5-level paging related fixes, console debug output fixes,
    and misc fixes"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/mm: Clean up the printk()s in show_fault_oops()
    x86/mm: Drop unneeded __always_inline for p4d page table helpers
    x86/efi: Fix efi_call_phys_epilog() with CONFIG_X86_5LEVEL=y
    selftests/x86/sigreturn: Do minor cleanups
    selftests/x86/sigreturn/64: Fix spurious failures on AMD CPUs
    x86/entry/64/compat: Fix "x86/entry/64/compat: Preserve r8-r11 in int $0x80"
    x86/mm: Don't free P4D table when it is folded at runtime
    x86/entry/32: Add explicit 'l' instruction suffix
    x86/mm: Get rid of KERN_CONT in show_fault_oops()

    Linus Torvalds
     
  • Pull perf fixes from Ingo Molnar:
    "Tooling fixes mostly, plus a build warning fix"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
    perf/core: Move inline keyword at the beginning of declaration
    tools/headers: Pick up latest kernel ABIs
    perf tools: Fix crash caused by accessing feat_ops[HEADER_LAST_FEATURE]
    perf script: Fix crash because of missing evsel->priv
    perf script: Add missing output fields in a hint
    perf bench: Fix numa report output code
    perf stat: Remove duplicate event counting
    perf alias: Rebuild alias expression string to make it comparable
    perf alias: Remove trailing newline when reading sysfs files
    perf tools: Fix a clang 7.0 compilation error
    tools include uapi: Synchronize bpf.h with the kernel
    tools include uapi: Update if_link.h to pick IFLA_{BRPORT_ISOLATED,VXLAN_TTL_INHERIT}
    tools include powerpc: Update arch/powerpc/include/uapi/asm/unistd.h copy to get 'rseq' syscall
    perf tools: Update x86's syscall_64.tbl, adding 'io_pgetevents' and 'rseq'
    tools headers uapi: Synchronize drm/drm.h
    perf intel-pt: Fix packet decoding of CYC packets
    perf tests: Add valid callback for parse-events test
    perf tests: Add event parsing error handling to parse events test
    perf report powerpc: Fix crash if callchain is empty
    perf test session topology: Fix test on s390
    ...

    Linus Torvalds
     
  • Pull selinux fix from Paul Moore:
    "One fairly straightforward patch to fix a longstanding issue where a
    process could stall while accessing files in selinuxfs and block
    everyone else due to a held mutex.

    The patch passes all our tests and looks to apply cleanly to your
    current tree"

    * tag 'selinux-pr-20180629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
    selinux: move user accesses in selinuxfs out of locked regions

    Linus Torvalds
     
  • Pull block fixes from Jens Axboe:
    "Small set of fixes for this series. Mostly just minor fixes, the only
    oddball in here is the sg change.

    The sg change came out of the stall fix for NVMe, where we added a
    mempool and limited us to a single page allocation. CONFIG_SG_DEBUG
    sort-of ruins that, since we'd need to account for that. That's
    actually a generic problem, since lots of drivers need to allocate SG
    lists. So this just removes support for CONFIG_SG_DEBUG, which I added
    back in 2007 and to my knowledge it was never useful.

    Anyway, outside of that, this pull contains:

    - clone of request with special payload fix (Bart)

    - drbd discard handling fix (Bart)

    - SATA blk-mq stall fix (me)

    - chunk size fix (Keith)

    - double free nvme rdma fix (Sagi)"

    * tag 'for-linus-20180629' of git://git.kernel.dk/linux-block:
    sg: remove ->sg_magic member
    drbd: Fix drbd_request_prepare() discard handling
    blk-mq: don't queue more if we get a busy return
    block: Fix cloning of requests with a special payload
    nvme-rdma: fix possible double free of controller async event buffer
    block: Fix transfer when chunk sectors exceeds max

    Linus Torvalds
     

30 Jun, 2018

7 commits

  • Pull powerpc fixes from Michael Ellerman:
    "Two regression fixes, and a new syscall wire-up:

    - A fix for the recent conversion to time64_t in the powermac RTC
    routines, which caused time to go backward.

    - Another fix for fallout from the split PMD PTL conversion.

    - Wire up the new io_pgetevents() syscall.

    Thanks to: Aneesh Kumar K.V, Arnd Bergmann, Breno Leitao, Mathieu
    Malaterre"

    * tag 'powerpc-4.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
    powerpc/powermac: Fix rtc read/write functions
    powerpc/mm/32: Fix pgtable_page_dtor call
    powerpc: Wire up io_pgetevents

    Linus Torvalds
     
  • …/git/nsekhar/linux-davinci into fixes

    This fixes polarity of SD card write-protect pin on DA850 EVM
    and fixes interrupt property for DA850 SoC GPIO as defined in
    device-tree.

    Neither of these was introduced with the v4.18 merge; both
    existed before it.

    * tag 'davinci-fixes-for-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci:
    ARM: dts: da850: Fix interrups property for gpio
    ARM: davinci: board-da850-evm: fix WP pin polarity for MMC/SD

    Signed-off-by: Olof Johansson <olof@lixom.net>

    Olof Johansson
     
  • ARM64: hisi fixes for 4.18

    - Added power capabilities for the mmc host controller on the
    hikey and hikey960 boards to avoid broken wifi.

    * tag 'hisi-fixes-for-4.18' of git://github.com/hisilicon/linux-hisi:
    arm64: dts: hikey960: Define wl1837 power capabilities
    arm64: dts: hikey: Define wl1835 power capabilities

    Signed-off-by: Olof Johansson

    Olof Johansson
     
  • …lman/linux-amlogic into fixes

    Amlogic fixes for v4.18-rc
    - minor 64-bit DT fixes

    * tag 'amlogic-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic:
    ARM64: dts: meson-gxl: fix Mali GPU compatible string
    ARM64: dts: meson-axg: fix ethernet stability issue
    ARM64: dts: meson-gx: fix ATF reserved memory region
    ARM64: dts: meson-gxl-s905x-p212: Add phy-supply for usb0
    ARM64: dts: meson: fix register ranges for SD/eMMC
    ARM64: dts: meson: disable sd-uhs modes on the libretech-cc

    Signed-off-by: Olof Johansson <olof@lixom.net>

    Olof Johansson
     
  • Pull arm64 fixes from Catalin Marinas:

    - The alternatives patching code uses flush_icache_range() which itself
    uses alternatives. Change the code to use an unpatched variant of
    cache maintenance

    - Remove unnecessary ISBs from set_{pte,pmd,pud}

    - perf: xgene_pmu: Fix IOB SLOW PMU parser error

    * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
    arm64: Remove unnecessary ISBs from set_{pte,pmd,pud}
    arm64: Avoid flush_icache_range() in alternatives patching code
    drivers/perf: xgene_pmu: Fix IOB SLOW PMU parser error

    Linus Torvalds
     
  • Pull i2c fixes from Wolfram Sang:

    - a revert because of bugzilla #200045 (and some documentation about
    it)

    - another regression fix in the i2c-gpio driver

    - a leak fix for the i2c core

    * 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
    i2c: gpio: initialize SCL to HIGH again
    i2c: smbus: kill memory leak on emulated and failed DMA SMBus xfers
    i2c: algos: bit: mention our experience about initial states
    Revert "i2c: algo-bit: init the bus to a known state"

    Linus Torvalds
     
  • Pull ceph fix from Ilya Dryomov:
    "A trivial dentry leak fix from Zheng"

    * tag 'ceph-for-4.18-rc3' of git://github.com/ceph/ceph-client:
    ceph: fix dentry leak in splice_dentry()

    Linus Torvalds
     

29 Jun, 2018

9 commits

  • As suggested by Nick Piggin it seems we can drop the -ffunction-sections
    compile flag, now that the kernel uses thin archives. Testing with 32-
    and 64-bit kernel showed no difference in kernel size.

    Suggested-by: Nicholas Piggin
    Signed-off-by: Helge Deller

    Helge Deller
     
  • This was introduced more than a decade ago when sg chaining was
    added, but we never really caught anything with it. The scatterlist
    entry size can be critical, since drivers allocate it, so remove
    the magic member. Recently it's been triggering allocation stalls
    and failures in NVMe.

    Tested-by: Jordan Glover
    Acked-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Pull PCI fixes from Bjorn Helgaas:

    - Fix crash caused by endpoint library initialization order change
    (Alan Douglas)

    - Fix shpchp NULL pointer dereference regression on non-ACPI platforms
    (Bjorn Helgaas)

    - Move PCI_DOMAINS selection to fix build regression (Lorenzo
    Pieralisi)

    * tag 'pci-v4.18-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
    PCI: controller: Move PCI_DOMAINS selection to arch Kconfig
    PCI: Initialize endpoint library before controllers
    PCI: shpchp: Manage SHPC unconditionally on non-ACPI systems

    Linus Torvalds
     
  • Pull power management fixes from Rafael Wysocki:
    "These fix up recently added features (the Kryo cpufreq driver and
    performance states coverage in the generic power domains framework),
    add missing documentation for a recently added sysfs knob in the
    intel_pstate driver and fix an error in its documentation.

    Specifics:

    - Fix the initialization time error handling in the recently added
    Kryo cpufreq driver (Dan Carpenter).

    - Fix up the recently added coverage of performance states in the
    generic power domains (genpd) framework (Viresh Kumar).

    - Add missing documentation of the new hwp_dynamic_boost sysfs knob
    in the intel_pstate driver (Rafael Wysocki).

    - Fix incorrect sysfs path in the intel_pstate driver documentation
    (Rafael Wysocki)"

    * tag 'pm-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    Documentation: intel_pstate: Describe hwp_dynamic_boost sysfs knob
    Documentation: admin-guide: intel_pstate: Fix sysfs path
    PM / Domains: Rename opp_node to np
    PM / Domains: Fix return value of of_genpd_opp_to_performance_state()
    cpufreq: qcom-kryo: Fix error handling in probe()

    Linus Torvalds
     
  • Pull drm fixes from Dave Airlie:
    "Nothing too major this round:

    - small set of mali-dp fixes

    - single meson fix

    - a bunch of amdgpu fixes (one makes non-4k page sizes not be a bad
    experience)"

    * tag 'drm-fixes-2018-06-29' of git://anongit.freedesktop.org/drm/drm:
    drm/amd/display: release spinlock before committing updates to stream
    drm/amdgpu:Support new VCN FW version naming convention
    drm/amdgpu: fix UBSAN: Undefined behaviour for amdgpu_fence.c
    drm/meson: Fix an un-handled error path in 'meson_drv_bind_master()'
    drm/amdgpu: GPU vs CPU page size fixes in amdgpu_vm_bo_split_mapping
    drm/amdgpu: Count disabled CRTCs in commit tail earlier
    drm/mali-dp: Rectify the width and height passed to rotmem_required()
    drm/arm/malidp: Preserve LAYER_FORMAT contents when setting format
    drm: mali-dp: Enable Global SE interrupts mask for DP500
    drm/arm/malidp: Ensure that the crtcs are shutdown before removing any encoder/connector

    Linus Torvalds
     
  • …evice-mapper/linux-dm

    Pull device mapper fixes from Mike Snitzer:

    - Fix dm core to use more efficient bio_split() instead of
    bio_clone_bioset(). Also fixes splitting bio that has integrity
    payload.

    - Three fixes related to properly validating DAX capabilities of a
    stacked DM device that will advertise DAX support.

    - Update DM writecache target to use 2-factor allocator arguments. Kees
    says this is the last related change for 4.18.

    - Fix DM zoned target to use GFP_NOIO to avoid triggering reclaim
    during IO submission (caught by lockdep).

    - Fix DM thinp to gracefully recover from running out of data space
    while a previous async discard completes (thereby freeing space).

    - Fix DM thinp's metadata transaction commit to avoid needless work.

    * tag 'for-4.18/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
    dm: prevent DAX mounts if not supported
    dax: check for QUEUE_FLAG_DAX in bdev_dax_supported()
    pmem: only set QUEUE_FLAG_DAX for fsdax mode
    dm thin: handle running out of data space vs concurrent discard
    dm raid: don't use 'const' in function return
    dm zoned: avoid triggering reclaim from inside dmz_map()
    dm writecache: use 2-factor allocator arguments
    dm thin metadata: remove needless work from __commit_transaction
    dm: use bio_split() when splitting out the already processed bio

    Linus Torvalds
     
  • Pull single NVMe fix from Christoph.

    * 'nvme-4.18' of git://git.infradead.org/nvme:
    nvme-rdma: fix possible double free of controller async event buffer

    Jens Axboe
     
  • Fix the test that verifies whether bio_op(bio) represents a discard
    or write zeroes operation. Compile-tested only.

    Cc: Philipp Reisner
    Cc: Lars Ellenberg
    Fixes: 7435e9018f91 ("drbd: zero-out partial unaligned discards on local backend")
    Signed-off-by: Bart Van Assche
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • Some devices have different queue limits depending on the type of IO. A
    classic case is SATA NCQ, where some commands can queue, but others
    cannot. If we have NCQ commands inflight and encounter a non-queueable
    command, the driver returns busy. Currently we attempt to dispatch more
    from the scheduler, if we were able to queue some commands. But for the
    case where we ended up stopping due to BUSY, we should not attempt to
    retrieve more from the scheduler. If we do, we can get into a situation
    where we attempt to queue a non-queueable command, get BUSY, then
    successfully retrieve more commands from that scheduler and queue those.
    This can repeat forever, starving the non-queueable command indefinitely.

    Fix this by NOT attempting to pull more commands from the scheduler, if
    we get a BUSY return. This should also be more optimal in terms of
    letting requests stay in the scheduler for as long as possible, if we
    get a BUSY due to the regular out-of-tags condition.

    Reviewed-by: Omar Sandoval
    Reviewed-by: Ming Lei
    Signed-off-by: Jens Axboe

    Jens Axboe
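
The dispatch rule described in the commit above ("do not pull more from the scheduler after a BUSY return") can be sketched with a small user-space model. Everything here is hypothetical and simplified: `demo_dev`, `demo_req`, `demo_queue_rq()`, and `demo_dispatch()` are illustrative names, and the device model only captures the NCQ-style constraint that a non-queueable command needs the device to itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_status { DEMO_OK, DEMO_BUSY };

struct demo_dev {
	int inflight;		/* commands currently on the device */
	bool exclusive;		/* a non-queueable command is in flight */
};

struct demo_req {
	bool queueable;		/* false for commands that cannot be queued */
};

/* Model of the driver: a non-queueable command needs an empty device. */
static enum demo_status demo_queue_rq(struct demo_dev *dev, const struct demo_req *rq)
{
	if (dev->exclusive)
		return DEMO_BUSY;
	if (!rq->queueable) {
		if (dev->inflight > 0)
			return DEMO_BUSY;
		dev->exclusive = true;
	}
	dev->inflight++;
	return DEMO_OK;
}

/*
 * Dispatch requests in scheduler order and stop at the first BUSY.
 * Returns how many requests were queued; sched[ret..n) stays in the scheduler.
 */
size_t demo_dispatch(struct demo_dev *dev, const struct demo_req *sched, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (demo_queue_rq(dev, &sched[i]) == DEMO_BUSY)
			break;	/* do not skip ahead to later requests */
	}
	assert(i <= n);
	return i;
}
```

Stopping at the first BUSY keeps later queueable requests from overtaking the blocked one, so the non-queueable command is dispatched as soon as the in-flight commands complete.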