16 May, 2018

4 commits


23 Apr, 2018

12 commits

  • Linus Torvalds
     
  • Pull drm fixes from Dave Airlie:
    "Exynos, i915, vc4, amdgpu fixes.

    i915:
    - an oops fix
    - two race fixes
    - some gvt fixes

    amdgpu:
    - dark screen fix
    - clk/voltage fix
    - vega12 smu fix

    vc4:
    - memory leak fix

    exynos just drops some code"

    * tag 'drm-fixes-for-v4.17-rc2' of git://people.freedesktop.org/~airlied/linux: (23 commits)
    drm/amd/powerplay: header file interface to SMU update
    drm/amd/pp: Fix bug voltage can't be OD separately on VI
    drm/amd/display: Don't program bypass on linear regamma LUT
    drm/i915: Fix LSPCON TMDS output buffer enabling from low-power state
    drm/i915/audio: Fix audio detection issue on GLK
    drm/i915: Call i915_perf_fini() on init_hw error unwind
    drm/i915/bios: filter out invalid DDC pins from VBT child devices
    drm/i915/pmu: Inspect runtime PM state more carefully while estimating RC6
    drm/i915: Do no use kfree() to free a kmem_cache_alloc() return value
    drm/exynos: exynos_drm_fb -> drm_framebuffer
    drm/exynos: Move dma_addr out of exynos_drm_fb
    drm/exynos: Move GEM BOs to drm_framebuffer
    drm: Fix HDCP downstream dev count read
    drm/vc4: Fix memory leak during BO teardown
    drm/i915/execlists: Clear user-active flag on preemption completion
    drm/i915/gvt: Add drm_format_mod update
    drm/i915/gvt: Disable primary/sprite/cursor plane at virtual display initialization
    drm/i915/gvt: Delete redundant error message in fb_decode.c
    drm/i915/gvt: Cancel dma map when resetting ggtt entries
    drm/i915/gvt: Missed to cancel dma map for ggtt entries
    ...

    Linus Torvalds
     
  • - Fix a dark screen issue in DC
    - Fix clk/voltage dependency tracking for wattman
    - Update SMU interface for vega12

    * 'drm-next-4.17' of git://people.freedesktop.org/~agd5f/linux:
    drm/amd/powerplay: header file interface to SMU update
    drm/amd/pp: Fix bug voltage can't be OD separately on VI
    drm/amd/display: Don't program bypass on linear regamma LUT

    Dave Airlie
     
  • …/kernel/git/daeinki/drm-exynos into drm-next

    Remove Exynos specific framebuffer structure and
    relevant functions.
    - it removes exynos_drm_fb structure which is a wrapper of
    drm_framebuffer and unnecessary two exynos specific callback
    functions, exynos_drm_destory() and exynos_drm_fb_create_handle()
    because we can reuse existing drm common callback ones instead.

    * tag 'exynos-drm-fixes-for-v4.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos:
    drm/exynos: exynos_drm_fb -> drm_framebuffer
    drm/exynos: Move dma_addr out of exynos_drm_fb
    drm/exynos: Move GEM BOs to drm_framebuffer
    drm/amdkfd: Deallocate SDMA queues correctly
    drm/amdkfd: Fix scratch memory with HWS enabled

    Dave Airlie
     
  • …/drm-intel into drm-next

    - Fix for FDO #105549: Avoid OOPS on bad VBT (Jani)
    - Fix rare pre-emption race (Chris)
    - Fix RC6 race against PM transitions (Tvrtko)

    * tag 'drm-intel-next-fixes-2018-04-19' of git://anongit.freedesktop.org/drm/drm-intel:
    drm/i915/audio: Fix audio detection issue on GLK
    drm/i915: Call i915_perf_fini() on init_hw error unwind
    drm/i915/bios: filter out invalid DDC pins from VBT child devices
    drm/i915/pmu: Inspect runtime PM state more carefully while estimating RC6
    drm/i915: Do no use kfree() to free a kmem_cache_alloc() return value
    drm/i915/execlists: Clear user-active flag on preemption completion
    drm/i915/gvt: Add drm_format_mod update
    drm/i915/gvt: Disable primary/sprite/cursor plane at virtual display initialization
    drm/i915/gvt: Delete redundant error message in fb_decode.c
    drm/i915/gvt: Cancel dma map when resetting ggtt entries
    drm/i915/gvt: Missed to cancel dma map for ggtt entries
    drm/i915/gvt: Make MI_USER_INTERRUPT nop in cmd parser
    drm/i915/gvt: Mark expected switch fall-through in handle_g2v_notification
    drm/i915/gvt: throw error on unhandled vfio ioctls

    Dave Airlie
     
  • drm-misc-fixes:

    stable: vc4: Fix memory leak during BO teardown (Daniel)
    dp: Add i2c retry for LSPCON adapters (Imre)
    hdcp: Fix device count mask (Ramalingam)

    Cc: Daniel J Blueman
    Cc: Ramalingam C

    * tag 'drm-misc-fixes-2018-04-18-1' of git://anongit.freedesktop.org/drm/drm-misc:
    drm/i915: Fix LSPCON TMDS output buffer enabling from low-power state
    drm: Fix HDCP downstream dev count read
    drm/vc4: Fix memory leak during BO teardown

    Dave Airlie
     
  • Pull cifs fixes from Steve French:
    "Various SMB3/CIFS fixes.

    There are three more security related fixes in progress that are not
    included in this set but they are still being tested and reviewed, so
    sending this unrelated set of smaller fixes now"

    * tag '4.17-rc1-SMB3-CIFS' of git://git.samba.org/sfrench/cifs-2.6:
    CIFS: fix typo in cifs_dbg
    cifs: do not allow creating sockets except with SMB1 posix exensions
    cifs: smbd: Dump SMB packet when configured
    cifs: smbd: Check for iov length on sending the last iov
    fs: cifs: Adding new return type vm_fault_t
    cifs: smb2ops: Fix NULL check in smb2_query_symlink

    Linus Torvalds
     
  • Pull btrfs fixes from David Sterba:
    "This contains a few fixups to the qgroup patches that were merged this
    dev cycle, unaligned access fix, blockgroup removal corner case fix
    and a small debugging output tweak"

    * tag 'for-4.17-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
    btrfs: print-tree: debugging output enhancement
    btrfs: Fix race condition between delayed refs and blockgroup removal
    btrfs: fix unaligned access in readdir
    btrfs: Fix wrong btrfs_delalloc_release_extents parameter
    btrfs: delayed-inode: Remove wrong qgroup meta reservation calls
    btrfs: qgroup: Use independent and accurate per inode qgroup rsv
    btrfs: qgroup: Commit transaction in advance to reduce early EDQUOT

    Linus Torvalds
     
  • Pull x86 fixes from Thomas Gleixner:
    "A small set of fixes for x86:

    - Prevent X2APIC ID 0xFFFFFFFF from being treated as valid, which
    causes the possible CPU count to be wrong.

    - Prevent 32bit truncation in calc_hpet_ref() which causes the TSC
    calibration to fail

    - Fix the page table setup for temporary text mappings in the resume
    code which causes resume failures

    - Make the page table dump code handle HIGHPTE correctly instead of
    oopsing

    - Support for topologies where NUMA nodes share an LLC to prevent an
    invalid topology warning and further malfunction on such systems.

    - Remove the now unused pci-nommu code

    - Remove stale function declarations"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/power/64: Fix page-table setup for temporary text mapping
    x86/mm: Prevent kernel Oops in PTDUMP code with HIGHPTE=y
    x86,sched: Allow topologies where NUMA nodes share an LLC
    x86/processor: Remove two unused function declarations
    x86/acpi: Prevent X2APIC id 0xffffffff from being accounted
    x86/tsc: Prevent 32bit truncation in calc_hpet_ref()
    x86: Remove pci-nommu.c
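
The calc_hpet_ref() item in the pull message above mentions a 32-bit truncation. The message does not say where the truncation happens, so the following is only a generic sketch of the failure class, assuming the truncated value is a divisor. The function names are hypothetical and are not taken from the kernel.

```c
#include <stdint.h>

/* Divides by only the low 32 bits of the divisor, the way a helper with a
 * 32-bit divisor parameter would silently do. */
static uint64_t x_div_truncating(uint64_t n, uint64_t d)
{
    uint32_t d32 = (uint32_t)d;   /* upper 32 bits are lost here */
    return d32 ? n / d32 : 0;
}

/* Divides by the full 64-bit divisor. */
static uint64_t x_div_full(uint64_t n, uint64_t d)
{
    return d ? n / d : 0;
}
```

With a divisor just above 2^32 the two helpers disagree by many orders of magnitude, which is the kind of error that would make a calibration result unusable.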

    Linus Torvalds
     
  • Pull timer fixes from Thomas Gleixner:
    "A small set of timer fixes:

    - Evaluate the -ETIME condition correctly in the imx tpm driver

    - Fix the evaluation order of a condition in posix cpu timers

    - Use pr_cont() in the clockevents code to prevent ugly message
    splitting

    - Remove __current_kernel_time() which is now unused, to prevent
    new users from showing up.

    - Remove a stale forward declaration"

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    clocksource/imx-tpm: Correct -ETIME return condition check
    posix-cpu-timers: Ensure set_process_cpu_timer is always evaluated
    timekeeping: Remove __current_kernel_time()
    timers: Remove stale struct tvec_base forward declaration
    clockevents: Fix kernel messages split across multiple lines

    Linus Torvalds
     
  • Pull perf fixes from Thomas Gleixner:
    "A larger set of updates for perf.

    Kernel:

    - Handle the SBOX uncore monitoring correctly on Broadwell CPUs which
    do not have SBOX.

    - Store context switch out type in PERF_RECORD_SWITCH[_CPU_WIDE]. The
    percentage of preempting and non-preempting context switches helps in
    understanding the nature of workloads (CPU or IO bound) that are
    running on a machine. This adds the kernel facility and userspace
    changes needed to show this information in 'perf script' and 'perf
    report -D' (Alexey Budankov)

    - Remove a WARN_ON() in the trace/kprobes code which is pointless
    because the return error code is already telling the caller what's
    wrong.

    - Revert a fugly workaround for clang BPF targets.

    - Fix sample_max_stack maximum check and do not proceed when an error
    has been detected, return it to avoid misidentifying errors (Jiri
    Olsa)

    - Add SPDX identifiers and get rid of GPL boilerplate.

    Tools:

    - Synchronize kernel ABI headers, v4.17-rc1 (Ingo Molnar)

    - Support MAP_FIXED_NOREPLACE, noticed when updating the
    tools/include/ copies (Arnaldo Carvalho de Melo)

    - Add '\n' at the end of parse-options error messages (Ravi Bangoria)

    - Add s390 support for detailed/verbose PMU event description (Thomas
    Richter)

    - perf annotate fixes and improvements:

    * Allow showing offsets in more than just jump targets, use the
    new 'O' hotkey in the TUI, config ~/.perfconfig
    annotate.offset_level for it and for --stdio2 (Arnaldo Carvalho
    de Melo)

    * Use the resolved variable names from objdump disassembled lines
    to make them more compact, just like was already done for some
    instructions, like "mov", this eventually will be done more
    generally, but lets now add some more to the existing mechanism
    (Arnaldo Carvalho de Melo)

    - perf record fixes:

    * Change warning for missing topology sysfs entry to debug, as not
    all architectures have those files, s390 being one of those
    (Thomas Richter)

    * Remove old error messages about things that are unlikely to be the
    root cause in modern systems (Andi Kleen)

    - perf sched fixes:

    * Fix -g/--call-graph documentation (Takuya Yamamoto)

    - perf stat:

    * Enable 1ms interval for printing event counters values in
    (Alexey Budankov)

    - perf test fixes:

    * Run dwarf unwind on arm32 (Kim Phillips)

    * Remove unused ptrace.h include from LLVM test, sidestepping older
    clang's lack of support for some asm constructs (Arnaldo
    Carvalho de Melo)

    * Fixup BPF test using epoll_pwait syscall function probe, to cope
    with the syscall routines renames performed in this development
    cycle (Arnaldo Carvalho de Melo)

    - perf version fixes:

    * Do not print info about HAVE_LIBAUDIT_SUPPORT in 'perf version
    --build-options' when HAVE_SYSCALL_TABLE_SUPPORT is true, as
    libaudit won't be used in that case, print info about
    syscall_table support instead (Jin Yao)

    - Build system fixes:

    * Use HAVE_..._SUPPORT consistently (Jin Yao)

    * Restore READ_ONCE() C++ compatibility in tools/include (Mark
    Rutland)

    * Give hints about package names needed to build jvmti (Arnaldo
    Carvalho de Melo)"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
    perf/x86/intel/uncore: Fix SBOX support for Broadwell CPUs
    perf/x86/intel/uncore: Revert "Remove SBOX support for Broadwell server"
    coresight: Move to SPDX identifier
    perf test BPF: Fixup BPF test using epoll_pwait syscall function probe
    perf tests mmap: Show which tracepoint is failing
    perf tools: Add '\n' at the end of parse-options error messages
    perf record: Remove suggestion to enable APIC
    perf record: Remove misleading error suggestion
    perf hists browser: Clarify top/report browser help
    perf mem: Allow all record/report options
    perf trace: Support MAP_FIXED_NOREPLACE
    perf: Remove superfluous allocation error check
    perf: Fix sample_max_stack maximum check
    perf: Return proper values for user stack errors
    perf list: Add s390 support for detailed/verbose PMU event description
    perf script: Extend misc field decoding with switch out event type
    perf report: Extend raw dump (-D) out with switch out event type
    perf/core: Store context switch out type in PERF_RECORD_SWITCH[_CPU_WIDE]
    tools/headers: Synchronize kernel ABI headers, v4.17-rc1
    trace_kprobe: Remove warning message "Could not insert probe at..."
    ...

    Linus Torvalds
     
  • Pull objtool fix from Thomas Gleixner:
    "A single fix for objtool so it uses the host C and LD flags and not
    the target ones"

    * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    objtool: Support HOSTCFLAGS and HOSTLDFLAGS

    Linus Torvalds
     

22 Apr, 2018

6 commits

  • Pull /dev/random fixes from Ted Ts'o:
    "Fix some bugs in the /dev/random driver which cause getrandom(2) to
    unblock earlier than designed.

    Thanks to Jann Horn from Google's Project Zero for pointing this out
    to me"

    * tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
    random: add new ioctl RNDRESEEDCRNG
    random: crng_reseed() should lock the crng instance that it is modifying
    random: set up the NUMA crng instances after the CRNG is fully initialized
    random: use a different mixing algorithm for add_device_randomness()
    random: fix crng_ready() test

    Linus Torvalds
     
  • Pull libnvdimm fixes from Dan Williams:
    "A regression fix, new unit test infrastructure and a build fix:

    - Regression fix addressing support for the new NVDIMM label storage
    area access commands (_LSI, _LSR, and _LSW).

    The Intel specific version of these commands communicated the
    "Device Locked" status on the label-storage-information command.

    However, these new commands (standardized in ACPI 6.2) communicate
    the "Device Locked" status on the label-storage-read command, and
    the driver was missing the indication.

    Reading from locked persistent memory is similar to reading
    unmapped PCI memory space: it returns all 1's.

    - Unit test infrastructure is added to regression test the "Device
    Locked" detection failure.

    - A build fix is included to allow the "of_pmem" driver to be built
    as a module and translate an Open Firmware described device to its
    local numa node"

    * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    MAINTAINERS: Add backup maintainers for libnvdimm and DAX
    device-dax: allow MAP_SYNC to succeed
    Revert "libnvdimm, of_pmem: workaround OF_NUMA=n build error"
    libnvdimm, of_pmem: use dev_to_node() instead of of_node_to_nid()
    tools/testing/nvdimm: enable labels for nfit_test.1 dimms
    tools/testing/nvdimm: fix missing newline in nfit_test_dimm 'handle' attribute
    tools/testing/nvdimm: support nfit_test_dimm attributes under nfit_test.1
    tools/testing/nvdimm: allow custom error code injection
    libnvdimm, dimm: handle EACCES failures from label reads

    Linus Torvalds
     
  • Pull sound fixes from Takashi Iwai:
    "A few small fixes:

    - a fix for the NULL-dereference in rawmidi compat ioctls, triggered
    by fuzzer

    - HD-audio Realtek codec quirks, a VIA controller fixup

    - a long-standing bug fix in LINE6 MIDI"

    * tag 'sound-4.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
    ALSA: rawmidi: Fix missing input substream checks in compat ioctls
    ALSA: hda/realtek - adjust the location of one mic
    ALSA: hda/realtek - set PINCFG_HEADSET_MIC to parse_flags
    ALSA: hda - New VIA controller suppor no-snoop path
    ALSA: line6: Use correct endpoint type for midi output

    Linus Torvalds
     
  • Pull watchdog fixes from Wim Van Sebroeck:

    - fall-through fixes

    - MAINTAINER change for hpwdt

    - renesas-wdt: Add support for WDIOF_CARDRESET

    - aspeed: set bootstatus during probe

    * tag 'linux-watchdog-4.17-rc2' of git://www.linux-watchdog.org/linux-watchdog:
    aspeed: watchdog: Set bootstatus during probe
    watchdog: renesas-wdt: Add support for WDIOF_CARDRESET
    watchdog: wafer5823wdt: Mark expected switch fall-through
    watchdog: w83977f_wdt: Mark expected switch fall-through
    watchdog: sch311x_wdt: Mark expected switch fall-through
    watchdog: hpwdt: change maintainer.

    Linus Torvalds
     
  • …l/git/shuah/linux-kselftest

    Pull Kselftest fix from Shuah Khan:
    "A fix from Michael Ellerman to not run dnotify_test by default to
    prevent Kselftest running forever"

    * tag 'linux-kselftest-4.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
    selftests/filesystems: Don't run dnotify_test by default

    Linus Torvalds
     
  • Pull arm64 fixes from Catalin Marinas:

    - kasan: avoid pfn_to_nid() before the page array is initialised

    - Fix typo causing the "upgrade" of known signals to SIGKILL

    * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
    arm64: signal: don't force known signals to SIGKILL
    arm64: kasan: avoid pfn_to_nid() before page array is initialized

    Linus Torvalds
     

21 Apr, 2018

18 commits

  • Merge misc fixes from Andrew Morton:

    - "fork: unconditionally clear stack on fork" is a non-bugfix which got
    lost during the merge window - performance concerns appear to have
    been adequately addressed.

    - and a bunch of fixes

    * emailed patches from Andrew Morton:
    mm/filemap.c: fix NULL pointer in page_cache_tree_insert()
    mm: memcg: add __GFP_NOWARN in __memcg_schedule_kmem_cache_create()
    fs, elf: don't complain MAP_FIXED_NOREPLACE unless -EEXIST error
    kexec_file: do not add extra alignment to efi memmap
    proc: fix /proc/loadavg regression
    proc: revalidate kernel thread inodes to root:root
    autofs: mount point create should honour passed in mode
    MAINTAINERS: add personal addresses for Sascha and Uwe
    kasan: add no_sanitize attribute for clang builds
    rapidio: fix rio_dma_transfer error handling
    mm: enable thp migration for shmem thp
    writeback: safer lock nesting
    mm, pagemap: fix swap offset value for PMD migration entry
    mm: fix do_pages_move status handling
    fork: unconditionally clear stack on fork

    Linus Torvalds
     
  • …linux/kernel/git/acme/linux into perf/urgent

    Pull perf/urgent fixes and improvements from Arnaldo Carvalho de Melo:

    - Store context switch out type in PERF_RECORD_SWITCH[_CPU_WIDE].
    The percentage of preempting and non-preempting context switches helps in
    understanding the nature of workloads (CPU or IO bound) that are running
    on a machine. This adds the kernel facility and userspace changes needed
    to show this information in 'perf script' and 'perf report -D' (Alexey Budankov)

    - Remove old error messages about things that are unlikely to be the root cause
    in modern systems (Andi Kleen)

    - Synchronize kernel ABI headers, v4.17-rc1 (Ingo Molnar)

    - Support MAP_FIXED_NOREPLACE, noticed when updating the tools/include/
    copies (Arnaldo Carvalho de Melo)

    - Fixup BPF test using epoll_pwait syscall function probe, to cope with
    the syscall routines renames performed in this development cycle (Arnaldo Carvalho de Melo)

    - Fix sample_max_stack maximum check and do not proceed when an error
    has been detected, return it to avoid misidentifying errors (Jiri Olsa)

    - Add '\n' at the end of parse-options error messages (Ravi Bangoria)

    - Add s390 support for detailed/verbose PMU event description (Thomas Richter)

    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     
  • f2fs specifies the __GFP_ZERO flag for allocating some of its pages.
    Unfortunately, the page cache also uses the mapping's GFP flags for
    allocating radix tree nodes. It always masked off the __GFP_HIGHMEM
    flag, and masks off __GFP_ZERO in some paths, but not all. That causes
    radix tree nodes to be allocated with a NULL list_head, which causes
    backtraces like:

    __list_del_entry+0x30/0xd0
    list_lru_del+0xac/0x1ac
    page_cache_tree_insert+0xd8/0x110

    The __GFP_DMA and __GFP_DMA32 flags would also be able to sneak through
    if they are ever used. Fix them all by using GFP_RECLAIM_MASK at the
    innermost location, and remove it from earlier in the callchain.
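
The masking idea can be sketched in user-space C. This is only an illustration: the bit values and the X_ names below are hypothetical stand-ins, the kernel's real __GFP_* values and the exact contents of GFP_RECLAIM_MASK differ. The point is that applying one allow-list mask at the innermost allocation site strips every unrelated flag at once, rather than masking individual flags on some paths only.

```c
/* Hypothetical bit assignments; the kernel's real __GFP_* values differ. */
#define X_GFP_DMA      0x01u
#define X_GFP_HIGHMEM  0x02u
#define X_GFP_DMA32    0x04u
#define X_GFP_ZERO     0x08u
#define X_GFP_IO       0x10u
#define X_GFP_FS       0x20u
#define X_GFP_RECLAIM  0x40u

/* Stand-in for GFP_RECLAIM_MASK: an allow-list of reclaim-related flags. */
#define X_RECLAIM_MASK (X_GFP_IO | X_GFP_FS | X_GFP_RECLAIM)

/* Flags to use for a tree node, derived from the mapping's flags. */
static unsigned int x_node_gfp(unsigned int mapping_gfp)
{
    /* Keep only allow-listed bits; ZERO, HIGHMEM, DMA and DMA32 drop out. */
    return mapping_gfp & X_RECLAIM_MASK;
}
```

An allow-list also covers flags that nobody thought of when the call site was written, which is why it is more robust than masking off known-bad flags one by one.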

    Link: http://lkml.kernel.org/r/20180411060320.14458-2-willy@infradead.org
    Fixes: 449dd6984d0e ("mm: keep page cache radix tree nodes in check")
    Signed-off-by: Matthew Wilcox
    Reported-by: Chris Fries
    Debugged-by: Minchan Kim
    Acked-by: Johannes Weiner
    Acked-by: Michal Hocko
    Reviewed-by: Jan Kara
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Under heavy memory pressure, page allocation with __GFP_NOWAIT fails
    easily even for an order-0 request. I got the warning below 9 times
    during a normal boot.

    : page allocation failure: order:0, mode:0x2200000(GFP_NOWAIT|__GFP_NOTRACK)
    .. snip ..
    Call trace:
    dump_backtrace+0x0/0x4
    dump_stack+0xa4/0xc0
    warn_alloc+0xd4/0x15c
    __alloc_pages_nodemask+0xf88/0x10fc
    alloc_slab_page+0x40/0x18c
    new_slab+0x2b8/0x2e0
    ___slab_alloc+0x25c/0x464
    __kmalloc+0x394/0x498
    memcg_kmem_get_cache+0x114/0x2b8
    kmem_cache_alloc+0x98/0x3e8
    mmap_region+0x3bc/0x8c0
    do_mmap+0x40c/0x43c
    vm_mmap_pgoff+0x15c/0x1e4
    sys_mmap+0xb0/0xc8
    el0_svc_naked+0x24/0x28
    Mem-Info:
    active_anon:17124 inactive_anon:193 isolated_anon:0
    active_file:7898 inactive_file:712955 isolated_file:55
    unevictable:0 dirty:27 writeback:18 unstable:0
    slab_reclaimable:12250 slab_unreclaimable:23334
    mapped:19310 shmem:212 pagetables:816 bounce:0
    free:36561 free_pcp:1205 free_cma:35615
    Node 0 active_anon:68496kB inactive_anon:772kB active_file:31592kB inactive_file:2851820kB unevictable:0kB isolated(anon):0kB isolated(file):220kB mapped:77240kB dirty:108kB writeback:72kB shmem:848kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
    DMA free:142188kB min:3056kB low:3820kB high:4584kB active_anon:10052kB inactive_anon:12kB active_file:312kB inactive_file:1412620kB unevictable:0kB writepending:0kB present:1781412kB managed:1604728kB mlocked:0kB slab_reclaimable:3592kB slab_unreclaimable:876kB kernel_stack:400kB pagetables:52kB bounce:0kB free_pcp:1436kB local_pcp:124kB free_cma:142492kB
    lowmem_reserve[]: 0 1842 1842
    Normal free:4056kB min:4172kB low:5212kB high:6252kB active_anon:58376kB inactive_anon:760kB active_file:31348kB inactive_file:1439040kB unevictable:0kB writepending:180kB present:2000636kB managed:1923688kB mlocked:0kB slab_reclaimable:45408kB slab_unreclaimable:92460kB kernel_stack:9680kB pagetables:3212kB bounce:0kB free_pcp:3392kB local_pcp:688kB free_cma:0kB
    lowmem_reserve[]: 0 0 0
    DMA: 0*4kB 0*8kB 1*16kB (C) 0*32kB 0*64kB 0*128kB 1*256kB (C) 1*512kB (C) 0*1024kB 1*2048kB (C) 34*4096kB (C) = 142096kB
    Normal: 228*4kB (UMEH) 172*8kB (UMH) 23*16kB (UH) 24*32kB (H) 5*64kB (H) 1*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3872kB
    721350 total pagecache pages
    0 pages in swap cache
    Swap cache stats: add 0, delete 0, find 0/0
    Free swap = 0kB
    Total swap = 0kB
    945512 pages RAM
    0 pages HighMem/MovableOnly
    63408 pages reserved
    51200 pages cma reserved

    __memcg_schedule_kmem_cache_create() tries to create a shadow slab cache
    and the worker allocation failure is not really critical because we will
    retry on the next kmem charge. We might miss some charges but that
    shouldn't be critical. The excessive allocation failure report is not
    very helpful.

    [mhocko@kernel.org: changelog update]
    Link: http://lkml.kernel.org/r/20180418022912.248417-1-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Acked-by: Johannes Weiner
    Reviewed-by: Andrew Morton
    Cc: Michal Hocko
    Cc: Vladimir Davydov
    Cc: Minchan Kim
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Commit 4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map") is
    printing spurious messages under memory pressure due to map_addr == -ENOMEM.

    9794 (a.out): Uhuuh, elf segment at 00007f2e34738000(fffffffffffffff4) requested but the memory is mapped already
    14104 (a.out): Uhuuh, elf segment at 00007f34fd76c000(fffffffffffffff4) requested but the memory is mapped already
    16843 (a.out): Uhuuh, elf segment at 00007f930ecc7000(fffffffffffffff4) requested but the memory is mapped already

    Complain only if -EEXIST, and use %px for printing the address.
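
The value in parentheses in the messages above, fffffffffffffff4, is -12 (that is, -ENOMEM on Linux) stored in a 64-bit address-sized variable. A minimal sketch of the distinction the fix draws, using hypothetical helper names rather than the kernel's own error-pointer macros:

```c
#include <errno.h>

/* Mimics how an errno is encoded in an address-sized return value. */
static unsigned long x_err_to_addr(int err)
{
    return (unsigned long)(long)-err;
}

/* Complain only when the mapping failed because the range is already taken. */
static int x_should_complain(unsigned long map_addr)
{
    return map_addr == x_err_to_addr(EEXIST);
}
```

Under this check an -ENOMEM result is passed back to the caller silently, while -EEXIST still produces the "memory is mapped already" message.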

    Link: http://lkml.kernel.org/r/201804182307.FAC17665.SFMOFJVFtHOLOQ@I-love.SAKURA.ne.jp
    Fixes: 4ed28639519c7bad ("fs, elf: drop MAP_FIXED usage from elf_map")
    Signed-off-by: Tetsuo Handa
    Acked-by: Michal Hocko
    Cc: Andrei Vagin
    Cc: Khalid Aziz
    Cc: Michael Ellerman
    Cc: Kees Cook
    Cc: Abdul Haleem
    Cc: Joel Stanley
    Cc: Anshuman Khandual
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     
  • Chun-Yi reported a kernel warning message below:

    WARNING: CPU: 0 PID: 0 at ../mm/early_ioremap.c:182 early_iounmap+0x4f/0x12c()
    early_iounmap(ffffffffff200180, 00000118) [0] size not consistent 00000120

    The problem is x86 kexec_file_load adds extra alignment to the efi
    memmap: in bzImage64_load():

    efi_map_sz = efi_get_runtime_map_size();
    efi_map_sz = ALIGN(efi_map_sz, 16);

    And __efi_memmap_init maps with the size including the alignment bytes
    but efi_memmap_unmap use nr_maps * desc_size which does not include the
    extra bytes.

    The alignment in the kexec code is only needed for the kexec buffer's
    internal use. Actually, kexec should pass the exact size of the efi
    memmap to the 2nd kernel.
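
The numbers in the warning fit this explanation: the unmap was requested with 0x118 bytes while the mapping had been created with 0x120 bytes, and rounding 0x118 up to a multiple of 16 gives 0x120. A small sketch of that arithmetic, assuming a simplified ALIGN_UP macro in place of the kernel's ALIGN(); the helper names are hypothetical, and the split of 0x118 into 7 descriptors of 40 bytes used in the note below is an illustrative guess, not taken from the report.

```c
#include <stddef.h>

/* Round x up to a multiple of a (a must be a power of two); a simplified
 * stand-in for the kernel's ALIGN() macro. */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* Size used when unmapping: exactly nr_maps descriptors. */
static size_t x_memmap_exact_size(size_t nr_maps, size_t desc_size)
{
    return nr_maps * desc_size;
}

/* Size that results when the exact size is padded to 16 bytes. */
static size_t x_memmap_padded_size(size_t nr_maps, size_t desc_size)
{
    return ALIGN_UP(x_memmap_exact_size(nr_maps, desc_size), 16);
}
```

For 7 descriptors of 40 bytes the exact size is 0x118 and the padded size is 0x120, the same pair of values that appears in the warning.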

    Link: http://lkml.kernel.org/r/20180417083600.GA1972@dhcp-128-65.nay.redhat.com
    Signed-off-by: Dave Young
    Reported-by: joeyli
    Tested-by: Randy Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Young
     
  • Commit 95846ecf9dac ("pid: replace pid bitmap implementation with IDR
    API") changed last field of /proc/loadavg (last pid allocated) to be off
    by one:

    # unshare -p -f --mount-proc cat /proc/loadavg
    0.00 0.00 0.00 1/60 2
    Cc: "Eric W. Biederman"
    Cc: Gargi Sharma
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • task_dump_owner() has the following code:

    mm = task->mm;
    if (mm) {
            if (get_dumpable(mm) != SUID_DUMP_USER) {
                    uid = ...
            }
    }

    The check for ->mm is buggy: a kernel thread might be borrowing an mm,
    and the inode will then get some random uid:gid pair.
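
A rough user-space sketch of why a non-NULL ->mm is not enough to identify a user task. The structures and the X_PF_KTHREAD flag are hypothetical stand-ins; the message does not show the actual fix, so testing a kernel-thread flag first is an assumption about how the two cases could be told apart.

```c
#include <stddef.h>

#define X_PF_KTHREAD 0x1u   /* hypothetical stand-in for a kernel-thread flag */

struct x_mm { int dumpable; };

struct x_task {
    unsigned int flags;
    struct x_mm *mm;        /* for a kernel thread this may be a borrowed mm */
};

/* Returns nonzero if the mm may be used to decide inode ownership. */
static int x_should_consult_mm(const struct x_task *t)
{
    if (t->flags & X_PF_KTHREAD)
        return 0;           /* a borrowed mm says nothing about this thread */
    return t->mm != NULL;
}
```

With this ordering a kernel thread never reaches the dumpable check, regardless of what ->mm happens to point at when the inode is revalidated.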

    Link: http://lkml.kernel.org/r/20180412220109.GA20978@avx2
    Signed-off-by: Alexey Dobriyan
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • The autofs file system mkdir inode operation blindly sets the created
    directory mode to S_IFDIR | 0555, ignoring the passed in mode, which can
    cause selinux dac_override denials.

    But the function also checks if the caller is the daemon (as no-one else
    should be able to do anything here) so there's no point in not honouring
    the passed in mode, allowing the daemon to set appropriate mode when
    required.

    Link: http://lkml.kernel.org/r/152361593601.8051.14014139124905996173.stgit@pluto.themaw.net
    Signed-off-by: Ian Kent
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ian Kent
     
  • The idea behind using kernel@pengutronix.de (i.e. the mail alias for the
    kernel people at Pengutronix) as email address was to have a backup when
    a given developer is on vacation or run over by a bus. Make this more
    explicit by adding the alias as reviewer and use the personal address
    for Sascha and me.

    Link: http://lkml.kernel.org/r/20180413083312.11213-1-u.kleine-koenig@pengutronix.de
    Signed-off-by: Uwe Kleine-König
    Acked-by: Sascha Hauer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Uwe Kleine-König
     
  • KASAN uses the __no_sanitize_address macro to disable instrumentation of
    particular functions. Right now it's defined only for GCC build, which
    causes false positives when clang is used.

    This patch adds a definition for clang.

    Note that clang revision 329612 or higher is required.

    [andreyknvl@google.com: remove redundant #ifdef CONFIG_KASAN check]
    Link: http://lkml.kernel.org/r/c79aa31a2a2790f6131ed607c58b0dd45dd62a6c.1523967959.git.andreyknvl@google.com
    Link: http://lkml.kernel.org/r/4ad725cc903f8534f8c8a60f0daade5e3d674f8d.1523554166.git.andreyknvl@google.com
    Signed-off-by: Andrey Konovalov
    Acked-by: Andrey Ryabinin
    Cc: Alexander Potapenko
    Cc: Dmitry Vyukov
    Cc: David Rientjes
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: David Woodhouse
    Cc: Andrey Konovalov
    Cc: Will Deacon
    Cc: Greg Kroah-Hartman
    Cc: Paul Lawrence
    Cc: Sandipan Das
    Cc: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Konovalov
     
  • Some of the mport_dma_req structure members were initialized late
    inside the do_dma_request() function, just before submitting the
    request to the dma engine. But we have some error branches before
    that. In case of such an error, the code would return on the error
    path and trigger the calling of dma_req_free() with a req structure
    which is not completely initialized. This causes a NULL pointer
    dereference in dma_req_free().

    This patch fixes these error branches by making sure that all
    necessary mport_dma_req structure members are initialized in
    rio_dma_transfer() immediately after the request structure gets
    allocated.
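
The pattern described here, initializing everything the free path touches before the first error branch, can be sketched as follows. All names are hypothetical and the structure is far simpler than mport_dma_req; the sketch only shows the ordering.

```c
#include <stdlib.h>

struct x_chan { int users; };          /* hypothetical DMA channel */

struct x_req {
    struct x_chan *chan;               /* dereferenced unconditionally on free */
};

static void x_req_free(struct x_req *req)
{
    req->chan->users--;                /* would crash if chan were still NULL */
    free(req);
}

/* Returns 0 on success, -1 on allocation failure or a simulated early error. */
static int x_transfer(struct x_chan *chan, int fail_early)
{
    struct x_req *req = calloc(1, sizeof(*req));

    if (!req)
        return -1;

    /* Initialize everything the free path needs before any error branch. */
    req->chan = chan;
    chan->users++;

    if (fail_early) {
        x_req_free(req);               /* safe: req is fully initialized */
        return -1;
    }

    /* ... the request would be submitted to the DMA engine here ... */
    x_req_free(req);
    return 0;
}
```

If the assignment to req->chan were moved below the fail_early branch, the early error path would call x_req_free() on a request whose chan is still NULL, which mirrors the dereference described above.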

    Link: http://lkml.kernel.org/r/20180412150605.GA31409@nokia.com
    Fixes: bbd876adb8c72 ("rapidio: use a reference count for struct mport_dma_req")
    Signed-off-by: Ioan Nicu
    Tested-by: Alexander Sverdlin
    Acked-by: Alexandre Bounine
    Cc: Barry Wood
    Cc: Matt Porter
    Cc: Christophe JAILLET
    Cc: Logan Gunthorpe
    Cc: Chris Wilson
    Cc: Tvrtko Ursulin
    Cc: Frank Kunz
    Cc: [4.6+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ioan Nicu
     
  • My testing for the latest kernel supporting thp migration showed an
    infinite loop in offlining the memory block that is filled with shmem
    thps. We can get out of the loop with a signal, but kernel should return
    with failure in this case.

    What happens in the loop is that scan_movable_pages() repeats returning
    the same pfn without any progress. That's because page migration always
    fails for shmem thps.

    In memory offline code, memory blocks containing unmovable pages should be
    prevented from being offline targets by has_unmovable_pages() inside
    start_isolate_page_range(). So it's possible to change migratability for
    non-anonymous thps to avoid the issue, but it introduces more complex and
    thp-specific handling in migration code, so it might not be good.

    So this patch fixes the issue by enabling thp migration for shmem thp.
    Both anon and shmem thps are then migratable, so we don't need to
    precheck the type of thp.

    Link: http://lkml.kernel.org/r/20180406030706.GA2434@hori1.linux.bs1.fc.nec.co.jp
    Fixes: commit 72b39cfc4d75 ("mm, memory_hotplug: do not fail offlining too early")
    Signed-off-by: Naoya Horiguchi
    Acked-by: Kirill A. Shutemov
    Cc: Zi Yan
    Cc: Vlastimil Babka
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • lock_page_memcg()/unlock_page_memcg() use spin_lock_irqsave/restore() if
    the page's memcg is undergoing move accounting, which occurs when a
    process leaves its memcg for a new one that has
    memory.move_charge_at_immigrate set.

    unlocked_inode_to_wb_begin,end() use spin_lock_irq/spin_unlock_irq() if
    the given inode is switching writeback domains. Switches occur when
    enough writes are issued from a new domain.

    This existing pattern is thus suspicious:
        lock_page_memcg(page);
        unlocked_inode_to_wb_begin(inode, &locked);
        ...
        unlocked_inode_to_wb_end(inode, locked);
        unlock_page_memcg(page);

    If inode switch and process memcg migration are both in-flight then
    unlocked_inode_to_wb_end() will unconditionally enable interrupts while
    still holding the lock_page_memcg() irq spinlock. This suggests the
    possibility of deadlock if an interrupt occurs before unlock_page_memcg().

    truncate
      __cancel_dirty_page
        lock_page_memcg
        unlocked_inode_to_wb_begin
        unlocked_inode_to_wb_end
          <interrupts mistakenly enabled>
                                        <interrupt>
                                        end_page_writeback
                                          test_clear_page_writeback
                                            lock_page_memcg
                                              <deadlock>
        unlock_page_memcg
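
    The nesting hazard can be modeled in userspace with a single flag
    standing in for the CPU interrupt state. This is a hedged sketch, not
    kernel code: the sim_* helpers and both nested_* functions are
    hypothetical names that only mimic the effect of the irq-disabling
    spinlock variants on the interrupt state.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated per-CPU interrupt state (assumption: single CPU, no real locks). */
static bool irqs_enabled = true;

/* Mimics the irqsave variant: remember the old state, then disable. */
static bool sim_irq_save(void)
{
    bool flags = irqs_enabled;

    irqs_enabled = false;
    return flags;
}

/* Mimics the irqrestore variant: put back whatever was saved. */
static void sim_irq_restore(bool flags)
{
    irqs_enabled = flags;
}

/* Mimic the plain _irq variants: disable and enable unconditionally. */
static void sim_irq_disable(void) { irqs_enabled = false; }
static void sim_irq_enable(void)  { irqs_enabled = true; }

/* Inner section uses the unconditional variants inside an irqsave section. */
static bool nested_with_irq_enable(void)
{
    bool outer = sim_irq_save();  /* outer lock taken, interrupts off */
    bool exposed;

    assert(!irqs_enabled);
    sim_irq_disable();            /* inner lock */
    sim_irq_enable();             /* inner unlock turns interrupts back on */
    exposed = irqs_enabled;       /* outer lock is still held here */
    sim_irq_restore(outer);
    return exposed;
}

/* Inner section saves and restores the state it found instead. */
static bool nested_with_irq_restore(void)
{
    bool outer = sim_irq_save();
    bool inner = sim_irq_save();  /* inner lock remembers "already off" */
    bool exposed;

    sim_irq_restore(inner);       /* inner unlock leaves interrupts off */
    exposed = irqs_enabled;
    sim_irq_restore(outer);
    return exposed;
}
```

    nested_with_irq_enable() reports interrupts enabled while the outer
    section is still held, which is the window in which the interrupt path
    can spin on the same lock. nested_with_irq_restore() keeps them disabled
    until the outer section ends.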

    Due to configuration limitations this deadlock is not currently possible
    because we don't mix cgroup writeback (a cgroupv2 feature) and
    memory.move_charge_at_immigrate (a cgroupv1 feature).

    If the kernel is hacked to always claim inode switching and memcg
    moving_account, then this script triggers lockup in less than a minute:

    cd /mnt/cgroup/memory
    mkdir a b
    echo 1 > a/memory.move_charge_at_immigrate
    echo 1 > b/memory.move_charge_at_immigrate
    (
      echo $BASHPID > a/cgroup.procs
      while true; do
        dd if=/dev/zero of=/mnt/big bs=1M count=256
      done
    ) &
    while true; do
      sync
    done &
    sleep 1h &
    SLEEP=$!
    while true; do
      echo $SLEEP > a/cgroup.procs
      echo $SLEEP > b/cgroup.procs
    done

    The deadlock does not seem possible, so it's debatable if there's any
    reason to modify the kernel. I suggest we should, to prevent future
    surprises. And Wang Long said "this deadlock occurs three times in our
    environment", so there's more reason to apply this, even to stable.
    Stable 4.4 has minor conflicts applying this patch. For a clean 4.4 patch
    see "[PATCH for-4.4] writeback: safer lock nesting"
    https://lkml.org/lkml/2018/4/11/146

    [gthelen@google.com: v4]
    Link: http://lkml.kernel.org/r/20180411084653.254724-1-gthelen@google.com
    [akpm@linux-foundation.org: comment tweaks, struct initialization simplification]
    Change-Id: Ibb773e8045852978f6207074491d262f1b3fb613
    Link: http://lkml.kernel.org/r/20180410005908.167976-1-gthelen@google.com
    Fixes: 682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
    Signed-off-by: Greg Thelen
    Reported-by: Wang Long
    Acked-by: Wang Long
    Acked-by: Michal Hocko
    Reviewed-by: Andrew Morton
    Cc: Johannes Weiner
    Cc: Tejun Heo
    Cc: Nicholas Piggin
    Cc: [v4.2+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Thelen
     
  • The swap offset reported by /proc/<pid>/pagemap may not be correct for
    PMD migration entries. If the addr passed into pagemap_pmd_range() isn't
    aligned with the PMD start address, the swap offset reported doesn't
    reflect this. And in the loop that reports information for each
    sub-page, the swap offset isn't incremented the way the PFN is.

    This may happen after opening /proc/<pid>/pagemap and seeking to a page
    whose address doesn't align with a PMD start address. I have verified
    this with a simple test program.
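
    The arithmetic involved can be sketched as follows. This is an
    illustration only: demo_report_offsets() and the DEMO_* constants are
    hypothetical, and the shifts assume 4 KiB pages with 2 MiB PMDs rather
    than being taken from any particular architecture header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed geometry: 4 KiB pages, 2 MiB PMD-mapped regions. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (UINT64_C(1) << DEMO_PAGE_SHIFT)
#define DEMO_PMD_SIZE   (UINT64_C(1) << 21)
#define DEMO_PMD_MASK   (~(DEMO_PMD_SIZE - 1))

/*
 * Fill out[] with one swap offset per sub-page in [addr, end), where
 * pmd_swap_offset is the offset recorded for the start of the PMD.
 * Returns the number of entries written.
 */
static size_t demo_report_offsets(uint64_t addr, uint64_t end,
                                  uint64_t pmd_swap_offset, uint64_t *out)
{
    size_t n = 0;
    uint64_t offset;

    assert(addr <= end && (addr & (DEMO_PAGE_SIZE - 1)) == 0);

    /* Account for the sub-pages between the PMD start and addr. */
    offset = pmd_swap_offset + ((addr & ~DEMO_PMD_MASK) >> DEMO_PAGE_SHIFT);

    /* Advance the offset together with the address, one sub-page at a time. */
    for (; addr < end; addr += DEMO_PAGE_SIZE)
        out[n++] = offset++;

    return n;
}
```

    With a PMD whose swap offset is 1000, a walk that starts five pages into
    the PMD reports 1005, 1006, ... instead of repeating 1000 for every
    sub-page.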

    BTW: migration swap entries have PFN information, do we need to restrict
    whether to show them?

    [akpm@linux-foundation.org: fix typo, per Huang, Ying]
    Link: http://lkml.kernel.org/r/20180408033737.10897-1-ying.huang@intel.com
    Signed-off-by: "Huang, Ying"
    Cc: Michal Hocko
    Cc: "Kirill A. Shutemov"
    Cc: Andrei Vagin
    Cc: Dan Williams
    Cc: "Jerome Glisse"
    Cc: Daniel Colascione
    Cc: Zi Yan
    Cc: Naoya Horiguchi
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Ying
     
  • Li Wang has reported that LTP move_pages04 test fails with the current
    tree:

    LTP move_pages04:
    TFAIL : move_pages04.c:143: status[1] is EPERM, expected EFAULT

    The test allocates an array of two pages, one is present while the other
    is not (resp. backed by zero page) and it expects EFAULT for the second
    page as the man page suggests. We are reporting EPERM which doesn't make
    any sense and this is a result of a bug from cf5f16b23ec9 ("mm: unclutter
    THP migration").

    do_pages_move tries to handle as many pages in one batch as possible so we
    queue all pages with the same node target together and that corresponds to
    [start, i] range which is then used to update status array.
    add_page_for_migration will correctly notice the zero (resp. !present)
    page and returns with EFAULT which gets written to the status. But if
    this is the last page in the array we do not update start and so the last
    store_status after the loop will overwrite the range of the last batch
    with NUMA_NO_NODE (which corresponds to EPERM).

    Fix this by simply bailing out from the last flush if the pagelist is
    empty as there is clearly nothing more to do.
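
    The batching logic and the stray final flush can be reproduced with a
    small userspace model. This is a simplified sketch, not the kernel
    function: demo_pages_move(), DEMO_NO_NODE and the bail_if_empty switch
    are hypothetical, migration itself is assumed to always succeed, and a
    counter stands in for the pagelist.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NO_NODE (-1)   /* numerically the same as -EPERM */

/* Write value into status[start .. start + nr - 1]. */
static void store_status(int *status, size_t start, int value, size_t nr)
{
    assert(status != NULL);
    while (nr-- > 0)
        status[start++] = value;
}

/*
 * Queue pages with the same target node into one batch covering
 * [start, i) and flush the batch when the node changes or a page fails.
 * bail_if_empty selects whether the final flush is skipped when nothing
 * is queued.
 */
static void demo_pages_move(const bool *present, const int *nodes,
                            size_t nr_pages, int *status, bool bail_if_empty)
{
    size_t i, start = 0, queued = 0;
    int current_node = DEMO_NO_NODE;

    for (i = 0; i < nr_pages; i++) {
        if (current_node == DEMO_NO_NODE) {
            current_node = nodes[i];
            start = i;
        } else if (nodes[i] != current_node) {
            store_status(status, start, current_node, i - start);
            queued = 0;
            start = i;
            current_node = nodes[i];
        }

        if (present[i]) {
            queued++;           /* page joins the current batch */
            continue;
        }

        /* Page is not present: record the error, then flush the batch. */
        status[i] = -EFAULT;
        if (i > start)
            store_status(status, start, current_node, i - start);
        queued = 0;
        current_node = DEMO_NO_NODE;   /* note: start is not updated */
    }

    if (bail_if_empty && queued == 0)
        return;                 /* nothing left to flush */

    /* Final flush; with an empty batch this clobbers [start, i). */
    store_status(status, start, current_node, i - start);
}
```

    For a present page followed by a non-present one, the run without the
    early return ends with both status entries set to -1, while the run with
    it keeps the node for the first page and -EFAULT for the second.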

    Link: http://lkml.kernel.org/r/20180418121255.334-1-mhocko@kernel.org
    Fixes: cf5f16b23ec9 ("mm: unclutter THP migration")
    Signed-off-by: Michal Hocko
    Reported-by: Li Wang
    Tested-by: Li Wang
    Cc: Zi Yan
    Cc: "Kirill A. Shutemov"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • One of the classes of kernel stack content leaks[1] is the exposure of
    prior heap or stack contents when a new process stack is allocated.
    Normally, those stacks are not zeroed, and the old contents remain in
    place. In the face of stack content exposure flaws, those contents can
    leak to userspace.

    Fixing this will make the kernel no longer vulnerable to these flaws, as
    the stack will be wiped each time a stack is assigned to a new process.
    There's not a meaningful change in runtime performance; it almost looks
    like it provides a benefit.

    Performing back-to-back kernel builds before:
    Run times: 157.86 157.09 158.90 160.94 160.80
    Mean: 159.12
    Std Dev: 1.54

    and after:
    Run times: 159.31 157.34 156.71 158.15 160.81
    Mean: 158.46
    Std Dev: 1.46

    Instead of making this a build or runtime config, Andy Lutomirski
    recommended this just be enabled by default.

    [1] A noisy search for many kinds of stack content leaks can be seen here:
    https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=linux+kernel+stack+leak

    I did some more testing with perf and cycle counts on running 100,000
    execs of /bin/true.

    before:
    Cycles: 218858861551 218853036130 214727610969 227656844122 224980542841
    Mean: 221015379122.60
    Std Dev: 4662486552.47

    after:
    Cycles: 213868945060 213119275204 211820169456 224426673259 225489986348
    Mean: 217745009865.40
    Std Dev: 5935559279.99

    It continues to look like it's faster, though the deviation is rather
    wide, but I'm not sure what I could do that would be less noisy. I'm
    open to ideas!

    Link: http://lkml.kernel.org/r/20180221021659.GA37073@beast
    Signed-off-by: Kees Cook
    Acked-by: Michal Hocko
    Reviewed-by: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Laura Abbott
    Cc: Rasmus Villemoes
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Signed-off-by: Aurelien Aptel
    Signed-off-by: Steve French
    Reported-by: Long Li

    Aurelien Aptel