01 Mar, 2016

1 commit


25 Feb, 2016

2 commits

  • The AMD Family 15h Models 30h-3Fh (Kaveri) BIOS and Kernel Developer's
    Guide omitted part of the BIOS IOMMU L2 register setup specification.
    Without this setup the IOMMU L2 does not fully respect write permissions
    when handling an ATS translation request.

    The IOMMU L2 will set the PTE dirty bit when handling an ATS translation
    request with write permission, even when the PTE RW bit is clear. This may
    occur by direct translation (which would cause a PPR) or by a prefetch
    request from the ATC.

    This is observed in practice when the IOMMU L2 modifies a PTE which maps a
    pagecache page. The ext4 filesystem driver BUGs when asked to writeback
    these (non-modified) pages.

    Enable ATS write permission check in the Kaveri IOMMU L2 if BIOS has not.

    Signed-off-by: Jay Cornwall
    Cc: # v3.19+
    Signed-off-by: Joerg Roedel

    Jay Cornwall
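
Illustration for the entry above: a toy model of the behaviour the commit message describes, not the hardware logic or the actual patch. The bit positions and the function name are hypothetical; the only point is that without the write permission check the dirty bit is set even though the RW bit is clear.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_RW    (1u << 1)   /* hypothetical bit position */
#define PTE_DIRTY (1u << 6)   /* hypothetical bit position */

/*
 * Model a translation request that asks for write permission.
 * Returns true if write access is granted. When the check is
 * disabled, the dirty bit is set regardless of PTE_RW, which is
 * the faulty behaviour described in the commit message.
 */
static bool ats_translate_write(uint32_t *pte, bool write_check_enabled)
{
    bool writable = (*pte & PTE_RW) != 0;

    if (writable || !write_check_enabled)
        *pte |= PTE_DIRTY;

    return writable;
}
```

With the check enabled, a read-only PTE stays clean, so a page cache page mapped by it is never reported as modified.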
     
  • The setup code for the performance counters in the AMD IOMMU driver
    tests whether the counters can be written. It tries to set up a counter
    for device 00:00.0, which fails on systems where this particular device
    is not covered by the IOMMU.

    Fix this by not relying on device 00:00.0 but only on the IOMMU being
    present.

    Cc: stable@vger.kernel.org
    Signed-off-by: Suravee Suthikulpanit
    Signed-off-by: Joerg Roedel

    Suravee Suthikulpanit
     

21 Feb, 2016

8 commits

  • Linus Torvalds
     
  • Pull x86 fixes from Ingo Molnar:
    "This is unusually large, partly due to the EFI fixes that prevent
    accidental deletion of EFI variables through efivarfs that may brick
    machines. These fixes are somewhat involved to maintain compatibility
    with existing install methods and other usage modes, while trying to
    turn off the 'rm -rf' bricking vector.

    Other fixes are for large page ioremap()s and for non-temporal
    user-memcpy()s"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/mm: Fix vmalloc_fault() to handle large pages properly
    hpet: Drop stale URLs
    x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache()
    x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable
    lib/ucs2_string: Correct ucs2 -> utf8 conversion
    efi: Add pstore variables to the deletion whitelist
    efi: Make efivarfs entries immutable by default
    efi: Make our variable validation list include the guid
    efi: Do variable name validation tests in utf8
    efi: Use ucs2_as_utf8 in efivarfs instead of open coding a bad version
    lib/ucs2_string: Add ucs2 -> utf8 helper functions

    Linus Torvalds
     
  • Pull perf fixes from Ingo Molnar:
    "A handful of CPU hotplug related fixes"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf/core: Plug potential memory leak in CPU_UP_PREPARE
    perf/core: Remove the bogus and dangerous CPU_DOWN_FAILED hotplug state
    perf/core: Remove bogus UP_CANCELED hotplug state
    perf/x86/amd/uncore: Plug reference leak

    Linus Torvalds
     
  • Pull powerpc fixes from Michael Ellerman:
    - Fix build error on 32-bit with checkpoint restart from Aneesh Kumar
    - Fix dedotify for binutils >= 2.26 from Andreas Schwab
    - Don't trace hcalls on offline CPUs from Denis Kirjanov
    - eeh: Fix stale cached primary bus from Gavin Shan
    - eeh: Fix stale PE primary bus from Gavin Shan
    - mm: Fix Multi hit ERAT cause by recent THP update from Aneesh Kumar K.V
    - ioda: Set "read" permission when "write" is set from Alexey Kardashevskiy

    * tag 'powerpc-4.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
    powerpc/ioda: Set "read" permission when "write" is set
    powerpc/mm: Fix Multi hit ERAT cause by recent THP update
    powerpc/powernv: Fix stale PE primary bus
    powerpc/eeh: Fix stale cached primary bus
    powerpc/pseries: Don't trace hcalls on offline CPUs
    powerpc: Fix dedotify for binutils >= 2.26
    powerpc/book3s_32: Fix build error with checkpoint restart

    Linus Torvalds
     
  • Pull dmaengine fixes from Vinod Koul:
    "A few fixes for drivers, nothing major here.

    Fixes are: ioatdma fix to restart channels, new ID for wildcat PCH,
    residue fix for edma, disable irq for non-cyclic in dw"

    * tag 'dmaengine-fix-4.5-rc5' of git://git.infradead.org/users/vkoul/slave-dma:
    dmaengine: dw: disable BLOCK IRQs for non-cyclic xfer
    dmaengine: edma: fix residue race for cyclic
    dmaengine: dw: pci: add ID for WildcatPoint PCH
    dmaengine: IOATDMA: fix timer code that continues to restart channels during idle

    Linus Torvalds
     
  • Pull clk driver fixes from Stephen Boyd:
    "An assortment of vendor specific clk drivers fixes, most notably
    fallout from adding Tegra210 and rockchip rk3036/rk3368 drivers this
    cycle.

    There's also the random smattering of sparse/checker fixes, a build
    "fix" to get the Tango clk driver to compile because the Kconfig
    symbol was renamed after the fact, and a clk gpio fix for a patch
    mismerge"

    * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (28 commits)
    clk: gpio: Really allow an optional clock= DT property
    Revert "clk: qcom: Specify LE device endianness"
    clk: versatile: mask VCO bits before writing
    clk: tegra: super: Fix sparse warnings for functions not declared as static
    clk: tegra: Fix sparse warnings for functions not declared as static
    clk: tegra: Fix sparse warning for pll_m
    clk: tegra: Use definition for pll_u override bit
    clk: tegra: Fix warning caused by pll_u failing to lock
    clk: tegra: Fix clock sources for Tegra210 EMC
    clk: tegra: Add the APB2APE audio clock on Tegra210
    clk: tegra: Add missing of_node_put()
    clk: tegra: Fix PLLE SS coefficients
    clk: tegra: Fix typos around clearing PLLE bits during enable
    clk: tegra: Do not disable PLLE when under hardware control
    clk: tegra: Fix pllx dyn step calculation
    clk: tegra: pll: Fix potential sleeping-while-atomic
    clk: tegra: Fix the misnaming of nvenc from msenc
    clk: tegra: Fix naming of MISC registers
    clk: tango4: rename ARCH_TANGOX to ARCH_TANGO
    clk: scpi: Fix checking return value of platform_device_register_simple()
    ...

    Linus Torvalds
     
  • Pull more drm fixes from Dave Airlie:
    "Some more fixes trickled in:

    A bunch of VC4 ones since it's a pretty new driver not much chance of
    regressions, and it fixes GPU resets.

    Also one atomic fix, one set of fixes for a common bug in TTM cleanup,
    and one i915 hotplug fix"

    * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
    drm/nouveau: use post-decrement in error handling
    drm/atomic: Allow for holes in connector state, v2.
    drm/i915: Fix hpd live status bits for g4x
    drm/vc4: Use runtime PM to power cycle the device when the GPU hangs.
    drm/vc4: Enable runtime PM.
    drm/vc4: Fix spurious GPU resets due to BO reuse.
    drm/vc4: Drop error message on seqno wait timeouts.
    drm/vc4: Fix -ERESTARTSYS error return from BO waits.
    drm/vc4: Return an ERR_PTR from BO creation instead of NULL.
    drm/vc4: Fix the clear color for the first tile rendered.
    drm/vc4: Validate that WAIT_BO padding is cleared.
    drm/radeon: use post-decrement in error handling
    drm/amdgpu: use post-decrement in error handling

    Linus Torvalds
     
  • In __request_region, if a conflict with a BUSY and MUXED resource is
    detected, then the caller goes to sleep and waits for the resource to be
    released. A pointer on the conflicting resource is kept. At wake-up
    this pointer is used as a parent to retry to request the region.

    A first problem is that this pointer might well be invalid (if for
    example the conflicting resource has already been freed). Another
    problem is that the next call to __request_region() fails to detect a
    remaining conflict. The previously conflicting resource is passed as a
    parameter and __request_region() will look for a conflict among the
    children of this resource and not at the resource itself. It is likely
    to succeed anyway, even if there is still a conflict.

    Instead, the parent of the conflicting resource should be passed to
    __request_region().

    As a fix, this patch doesn't update the parent resource pointer in the
    case we have to wait for a muxed region right after.

    Reported-and-tested-by: Vincent Pelletier
    Signed-off-by: Simon Guinot
    Tested-by: Vincent Donnefort
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Simon Guinot
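
Illustration for the entry above: a minimal sketch, assuming a much simplified resource tree, of why retrying with the conflicting resource as the parent misses the conflict. The struct and function names are hypothetical and are not the kernel's `struct resource` API.

```c
#include <stddef.h>

/* Hypothetical, simplified resource node: a range plus child links. */
struct res {
    unsigned long start, end;   /* inclusive range */
    struct res *child;          /* first child */
    struct res *sibling;        /* next sibling */
};

/*
 * Look for a conflict among the children of parent only, never at
 * parent itself. Returns the overlapping child, or NULL if none.
 */
static struct res *find_conflict(struct res *parent,
                                 unsigned long start, unsigned long end)
{
    struct res *r;

    for (r = parent->child; r; r = r->sibling) {
        if (r->start <= end && start <= r->end)
            return r;
    }
    return NULL;
}
```

Searching under the real parent reports the busy region, while searching under the busy region itself finds nothing because it has no children, so the request would appear to succeed.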
     

20 Feb, 2016

7 commits

  • Pull ext4 bugfixes from Ted Ts'o:
    "Miscellaneous ext4 bug fixes for v4.5"

    * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: fix crashes in dioread_nolock mode
    ext4: fix bh->b_state corruption
    ext4: fix memleak in ext4_readdir()
    ext4: remove unused parameter "newblock" in convert_initialized_extent()
    ext4: don't read blocks from disk after extents being swapped
    ext4: fix potential integer overflow
    ext4: add a line break for proc mb_groups display
    ext4: ioctl: fix erroneous return value
    ext4: fix scheduling in atomic on group checksum failure
    ext4 crypto: move context consistency check to ext4_file_open()
    ext4 crypto: revalidate dentry after adding or removing the key

    Linus Torvalds
     
  • Pull btrfs fix from Chris Mason:
    "My for-linus-4.5 branch has a btrfs DIO error passing fix.

    I know how much you love DIO, so I'm going to suggest against reading
    it. We'll follow up with a patch to drop the error arg from
    dio_end_io in the next merge window."

    * 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: fix direct IO requests not reporting IO error to user space

    Linus Torvalds
     
  • Merge fixes from Andrew Morton:
    "10 fixes"

    * emailed patches from Andrew Morton:
    mm: slab: free kmem_cache_node after destroy sysfs file
    ipc/shm: handle removed segments gracefully in shm_mmap()
    MAINTAINERS: update Kselftest Framework mailing list
    devm_memremap_release(): fix memremap'd addr handling
    mm/hugetlb.c: fix incorrect proc nr_hugepages value
    mm, x86: fix pte_page() crash in gup_pte_range()
    fsnotify: turn fsnotify reaper thread into a workqueue job
    Revert "fsnotify: destroy marks with call_srcu instead of dedicated thread"
    mm: fix regression in remap_file_pages() emulation
    thp, dax: do not try to withdraw pgtable from non-anon VMA

    Linus Torvalds
     
  • Pull arm64 fixes from Will Deacon:
    "Here are some more arm64 fixes for 4.5. This has mostly come from
    Yang Shi, who saw some issues under -rt that also affect mainline.
    The rest of it is pretty small, but still worth having.

    We've got an old issue outstanding with valid_user_regs which will
    likely wait until 4.6 (since it would really benefit from some time in
    -next) and another issue with kasan and idle which should be fixed
    next week.

    Apart from that, pretty quiet here (and still no sign of the THP issue
    reported on s390...)

    Summary:

    - Allow EFI stub to use strnlen(), which is required by recent libfdt

    - Avoid smp_processor_id() in preempt context during unwinding

    - Avoid false Kasan warnings during unwinding

    - Ensure early devices are picked up by the IOMMU DMA ops

    - Avoid rebuilding the kernel for the 'install' target

    - Run fixup handlers for alignment faults on userspace access"

    * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
    arm64: mm: allow the kernel to handle alignment faults on user accesses
    arm64: kbuild: make "make install" not depend on vmlinux
    arm64: dma-mapping: fix handling of devices registered before arch_initcall
    arm64/efi: Make strnlen() available to the EFI namespace
    arm/arm64: crypto: assure that ECB modes don't require an IV
    arm64: make irq_stack_ptr more robust
    arm64: debug: re-enable irqs before sending breakpoint SIGTRAP
    arm64: disable kasan when accessing frame->fp in unwind_frame

    Linus Torvalds
     
  • Pull s390 fixes from Martin Schwidefsky:
    "Several bug fixes:

    - There are four different stack tracers, and three of them have
    bugs. For 4.5 the bugs are fixed and we prepare a cleanup patch
    for the next merge window.

    - Three bug fixes for the dasd driver in regard to parallel access
    volumes and the new max_dev_sectors block device queue limit

    - The irq restore optimization needs a fixup for memcpy_real

    - The diagnose trace code has a conflict with lockdep"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
    s390/dasd: fix performance drop
    s390/maccess: reduce stnsm instructions
    s390/diag: avoid lockdep recursion
    s390/dasd: fix refcount for PAV reassignment
    s390/dasd: prevent incorrect length error under z/VM after PAV changes
    s390: fix DAT off memory access, e.g. on kdump
    s390/oprofile: fix address range for asynchronous stack
    s390/perf_event: fix address range for asynchronous stack
    s390/stacktrace: add save_stack_trace_regs()
    s390/stacktrace: save full stack traces
    s390/stacktrace: add missing end marker
    s390/stacktrace: fix address ranges for asynchronous and panic stack
    s390/stacktrace: fix save_stack_trace_tsk() for current task

    Linus Torvalds
     
  • Pull Pin control fixes from Linus Walleij:
    "Pin control fixes for the v4.5 series, all are individual driver
    fixes:

    - Fix the PXA2xx driver to export its init function so we do not
    break modular compiles.
    - Hide unused functions in the Nomadik driver.
    - Fix up direction control in the Mediatek driver.
    - Toggle the sunxi GPIO lines to input when you read them on the H3
    GPIO controller, lest you only get garbage.
    - Fix up the number of settings in the MVEBU driver.
    - Fix a serious SMP race condition in the Samsung driver"

    * tag 'pinctrl-v4.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
    pinctrl: samsung: fix SMP race condition
    pinctrl: mvebu: fix num_settings in mpp group assignment
    pinctrl: sunxi: H3 requires irq_read_needs_mux
    pinctrl: mediatek: fix direction control issue
    pinctrl: nomadik: hide unused functions
    pinctrl: pxa: export pxa2xx_pinctrl_init()

    Linus Torvalds
     
  • Pull sound fixes from Takashi Iwai:
    "This update contains again a few more fixes for ALSA core stuff
    although it's no longer high flux: two race fixes in sequencer and one
    PCM race fix for non-atomic PCM ops.

    In addition, HD-audio gained a similar fix for race at reloading the
    driver"

    * tag 'sound-4.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
    ALSA: pcm: Fix rwsem deadlock for non-atomic PCM stream
    ALSA: seq: Fix double port list deletion
    ALSA: hda - Cancel probe work instead of flush at remove
    ALSA: seq: Fix leak of pool buffer at concurrent writes

    Linus Torvalds
     

19 Feb, 2016

22 commits

  • Although we don't expect to take alignment faults on access to normal
    memory, misbehaving (i.e. buggy) user code can pass MMIO pointers into
    system calls, leading to things like get_user accessing device memory.

    Rather than OOPS the kernel, allow any exception fixups to run and
    return something like -EFAULT back to userspace. This makes the
    behaviour more consistent with userspace, even though applications with
    access to device mappings can easily cause other issues if they try
    hard enough.

    Acked-by: Catalin Marinas
    Signed-off-by: Eun Taik Lee
    [will: dropped __kprobes annotation and rewrote commit message]
    Signed-off-by: Will Deacon

    EunTaik Lee
     
  • For the same reason as commit 19514fc665ff ("arm, kbuild: make "make
    install" not depend on vmlinux"), the install targets should never
    trigger the rebuild of the kernel.

    Signed-off-by: Masahiro Yamada
    Signed-off-by: Will Deacon

    Masahiro Yamada
     
  • Competing overwrite DIO in dioread_nolock mode will just overwrite
    pointer to io_end in the inode. This may result in data corruption or
    extent conversion happening from IO completion interrupt because we
    don't properly set buffer_defer_completion() when unlocked DIO races
    with locked DIO to unwritten extent.

    Since unlocked DIO doesn't need io_end for anything, just avoid
    allocating it and corrupting pointer from inode for locked DIO.
    A cleaner fix would be to avoid these games with io_end pointer from the
    inode but that requires more intrusive changes so we leave that for
    later.

    Cc: stable@vger.kernel.org
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • ext4 can update bh->b_state non-atomically in _ext4_get_block() and
    ext4_da_get_block_prep(). Usually this is fine since bh is just a
    temporary storage for mapping information on stack but in some cases it
    can be fully living bh attached to a page. In such case non-atomic
    update of bh->b_state can race with an atomic update which then gets
    lost. Usually when we are mapping bh and thus updating bh->b_state
    non-atomically, nobody else touches the bh and so things work out fine
    but there is one case to especially worry about: ext4_finish_bio() uses
    BH_Uptodate_Lock on the first bh in the page to synchronize handling of
    PageWriteback state. So when blocksize < pagesize, we can be atomically
    modifying bh->b_state of a buffer that actually isn't under IO and thus
    can race e.g. with delalloc trying to map that buffer. The result is
    that we can mistakenly set / clear BH_Uptodate_Lock bit resulting in the
    corruption of PageWriteback state or missed unlock of BH_Uptodate_Lock.

    Fix the problem by always updating bh->b_state bits atomically.

    CC: stable@vger.kernel.org
    Reported-by: Nikolay Borisov
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
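
Illustration for the entry above: a userspace sketch, using C11 atomics, of updating only a subset of bits in a shared state word so that a concurrent atomic update of another bit is not lost. The flag values and the function name are assumptions for illustration; this is not the ext4 code.

```c
#include <stdatomic.h>

#define MAP_FLAGS 0x0fUL   /* hypothetical: bits owned by the mapping code */
#define LOCK_BIT  0x10UL   /* hypothetical: bit owned by another code path */

/*
 * Replace the MAP_FLAGS bits of *state with flags, leaving all other
 * bits as they are. The compare-and-exchange loop retries if another
 * thread changed the word between the load and the store.
 */
static void update_map_flags(_Atomic unsigned long *state, unsigned long flags)
{
    unsigned long old = atomic_load(state);
    unsigned long new;

    do {
        new = (old & ~MAP_FLAGS) | (flags & MAP_FLAGS);
        /* On failure, old is refreshed with the current value. */
    } while (!atomic_compare_exchange_weak(state, &old, new));
}
```

A plain read-modify-write of the whole word would instead write back a stale copy of `LOCK_BIT` if it changed in between.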
     
  • We need to use post-decrement to get the dma_map_page undone also for
    i==0, and to avoid some very unpleasant behaviour if dma_map_page
    failed already at i==0.

    Signed-off-by: Rasmus Villemoes
    Reviewed-by: Ben Skeggs
    Signed-off-by: Dave Airlie

    Rasmus Villemoes
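
Illustration for the entry above: a small sketch of the post-decrement unwind idiom the message refers to. `fake_map()` and `fake_unmap()` are hypothetical stand-ins for `dma_map_page()` and its undo; the code is not taken from the driver.

```c
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8

static bool mapped[NPAGES];

/* Pretend to map page i; fails when i equals fail_at. */
static bool fake_map(size_t i, size_t fail_at)
{
    if (i == fail_at)
        return false;
    mapped[i] = true;
    return true;
}

static void fake_unmap(size_t i)
{
    mapped[i] = false;
}

/* Map n pages; on failure undo everything mapped so far. */
static int map_all(size_t n, size_t fail_at)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (!fake_map(i, fail_at))
            goto unwind;
    }
    return 0;

unwind:
    /*
     * Post-decrement tests i first and then decrements it, so index 0
     * is undone and a failure at i == 0 skips the loop. A pre-decrement
     * "while (--i)" would leave index 0 mapped and wrap for i == 0.
     */
    while (i--)
        fake_unmap(i);
    return -1;
}

/* Count how many pages are still marked as mapped. */
static size_t count_mapped(void)
{
    size_t i, n = 0;

    for (i = 0; i < NPAGES; i++)
        n += mapped[i];
    return n;
}
```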
     
  • Because we record connector_mask using 1 << drm_connector_index now
    the connector_mask should stay the same even when other connectors
    are removed. This was not the case with MST, in that case when removing
    a connector all other connectors may change their index.

    This is fixed by waiting until the first get_connector_state to allocate
    connector_state, and force reallocation when state is too small.

    As a side effect connector arrays no longer have to be preallocated,
    and can be allocated on first use which means fewer allocations in
    the page flip only path.

    Changes since v1:
    - Whitespace. (Ville)
    - Call ida_remove when destroying the connector. (Ville)
    - u32 alloc -> int. (Ville)

    Fixes: 14de6c44d149 ("drm/atomic: Remove drm_atomic_connectors_for_crtc.")
    Signed-off-by: Maarten Lankhorst
    Cc: Ville Syrjälä
    Reviewed-by: Lyude
    Reviewed-by: Ville Syrjälä
    Signed-off-by: Dave Airlie

    Maarten Lankhorst
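
Illustration for the entry above: a userspace sketch of allocating a state array on first use and growing it when an index does not fit, with unused slots left as holes. The struct and function names are hypothetical, and `realloc()` stands in for whatever allocator the driver core uses.

```c
#include <stdlib.h>
#include <string.h>

struct state_array {
    void **slots;   /* one slot per connector index; NULL marks a hole */
    int num;        /* number of allocated slots */
};

/*
 * Return a pointer to the slot for index, growing the array if it is
 * too small. New slots are zeroed. Returns NULL on a negative index
 * or on allocation failure.
 */
static void **get_slot(struct state_array *a, int index)
{
    if (index < 0)
        return NULL;

    if (index >= a->num) {
        int alloc = index + 1;
        void **s = realloc(a->slots, alloc * sizeof(*s));

        if (!s)
            return NULL;
        memset(s + a->num, 0, (alloc - a->num) * sizeof(*s));
        a->slots = s;
        a->num = alloc;
    }
    return &a->slots[index];
}
```

Because slots are addressed by a stable index, removing one connector leaves a hole rather than shifting the others.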
     
  • We mis-merged the original patch from Russell here and so the
    patch went almost all the way, except that we still failed to
    probe when there wasn't a clocks property in the DT node. Allow
    that case by making a negative value from
    of_clk_get_parent_count() into "no parents", like the original
    patch did.

    Fixes: 7ed88aa2efa5 ("clk: fix clk-gpio.c with optional clock= DT property")
    Cc: Russell King
    Cc: Michael Turquette
    Signed-off-by: Stephen Boyd

    Stephen Boyd
     
  • This pull request fixes GPU reset (which was disabled shortly after
    V3D integration due to build breakage) and waits for idle in the
    presence of signals (which X likes to do a lot).

    * tag 'drm-vc4-fixes-2016-02-17' of github.com:anholt/linux:
    drm/vc4: Use runtime PM to power cycle the device when the GPU hangs.
    drm/vc4: Enable runtime PM.
    drm/vc4: Fix spurious GPU resets due to BO reuse.
    drm/vc4: Drop error message on seqno wait timeouts.
    drm/vc4: Fix -ERESTARTSYS error return from BO waits.
    drm/vc4: Return an ERR_PTR from BO creation instead of NULL.
    drm/vc4: Fix the clear color for the first tile rendered.
    drm/vc4: Validate that WAIT_BO padding is cleared.

    Dave Airlie
     
  • Just two small fixes in the ttm_tt_populate error handling; one for radeon,
    one for amdgpu.

    * 'drm-fixes-4.5' of git://people.freedesktop.org/~agd5f/linux:
    drm/radeon: use post-decrement in error handling
    drm/amdgpu: use post-decrement in error handling

    Dave Airlie
     
  • Single g4x hpd fix.

    * tag 'drm-intel-fixes-2016-02-18' of git://anongit.freedesktop.org/drm-intel:
    drm/i915: Fix hpd live status bits for g4x

    Dave Airlie
     
  • Pull livepatching fixes from Jiri Kosina:

    - regression (from 4.4) fix for ordering issue, introduced by an
    earlier ftrace change, that broke live patching of modules.

    The fix replaces the ftrace module notifier by direct call in order
    to make the ordering guaranteed and well-defined. The patch, from
    Jessica Yu, has been acked both by Steven and Rusty

    - error message fix from Miroslav Benes

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
    ftrace/module: remove ftrace module notifier
    livepatch: change the error message in asm/livepatch.h header files

    Linus Torvalds
     
  • Pull SCSI fixes from James Bottomley:
    "Two simple fixes.

    One prevents a soft lockup on some target removal scenarios and the
    other prevents us trying to probe the marvell console device, which
    causes it to time out and need the bus resetting"

    * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
    scsi: fix soft lockup in scsi_remove_target() on module removal
    SCSI: Add Marvell configuration device to VPD blacklist

    Linus Torvalds
     
  • When slub_debug alloc_calls_show is enabled, we try to track the
    location and user of slab objects on each online node. The
    kmem_cache_node structure and cpu_cache/cpu_slub therefore must not be
    freed until the last reference to the sysfs file has been dropped.

    This fixes the following panic:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
    IP: list_locations+0x169/0x4e0
    PGD 257304067 PUD 438456067 PMD 0
    Oops: 0000 [#1] SMP
    CPU: 3 PID: 973074 Comm: cat ve: 0 Not tainted 3.10.0-229.7.2.ovz.9.30-00007-japdoll-dirty #2 9.30
    Hardware name: DEPO Computers To Be Filled By O.E.M./H67DE3, BIOS L1.60c 07/14/2011
    task: ffff88042a5dc5b0 ti: ffff88037f8d8000 task.ti: ffff88037f8d8000
    RIP: list_locations+0x169/0x4e0
    Call Trace:
    alloc_calls_show+0x1d/0x30
    slab_attr_show+0x1b/0x30
    sysfs_read_file+0x9a/0x1a0
    vfs_read+0x9c/0x170
    SyS_read+0x58/0xb0
    system_call_fastpath+0x16/0x1b
    Code: 5e 07 12 00 b9 00 04 00 00 3d 00 04 00 00 0f 4f c1 3d 00 04 00 00 89 45 b0 0f 84 c3 00 00 00 48 63 45 b0 49 8b 9c c4 f8 00 00 00 8b 43 20 48 85 c0 74 b6 48 89 df e8 46 37 44 00 48 8b 53 10
    CR2: 0000000000000020

    Separate __kmem_cache_release from __kmem_cache_shutdown;
    __kmem_cache_release is now called from slab_kmem_cache_release (after
    the last reference to the sysfs file object has been dropped).

    Reintroduce locking in free_partial, as the sysfs file might access the
    cache's partial list after shutdown. This is a partial revert of commit
    69cb8e6b7c29 ("slub: free slabs without holding locks"). Zap
    __remove_partial and use remove_partial (w/o underscores), as
    free_partial now takes list_lock, which is a partial revert of commit
    1e4dd9461fab ("slub: do not assert not having lock in removing freed
    partial").

    Signed-off-by: Dmitry Safonov
    Suggested-by: Vladimir Davydov
    Acked-by: Vladimir Davydov
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Safonov
     
  • remap_file_pages(2) emulation can reach a file which represents a removed
    IPC ID as long as a memory segment is mapped. This breaks the
    expectations of the IPC subsystem.

    Test case (rewritten to be more human readable, originally autogenerated
    by syzkaller[1]):

    #define _GNU_SOURCE
    #include <stdlib.h>
    #include <sys/ipc.h>
    #include <sys/mman.h>
    #include <sys/shm.h>

    #define PAGE_SIZE 4096

    int main()
    {
            int id;
            void *p;

            id = shmget(IPC_PRIVATE, 3 * PAGE_SIZE, 0);
            p = shmat(id, NULL, 0);
            shmctl(id, IPC_RMID, NULL);
            remap_file_pages(p, 3 * PAGE_SIZE, 0, 7, 0);

            return 0;
    }

    The patch changes shm_mmap() and code around shm_lock() to propagate
    locking error back to caller of shm_mmap().

    [1] http://github.com/google/syzkaller

    Signed-off-by: Kirill A. Shutemov
    Reported-by: Dmitry Vyukov
    Cc: Davidlohr Bueso
    Cc: Manfred Spraul
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Kselftest Framework now has a dedicated mailing list linux-kselftest.
    Update the entry in MAINTAINERS file.

    Signed-off-by: Shuah Khan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shuah Khan
     
  • The pmem driver calls devm_memremap() to map a persistent memory range.
    When the pmem driver is unloaded, this memremap'd range is not released
    so the kernel will leak a vma.

    Fix devm_memremap_release() to handle a given memremap'd address
    properly.

    Signed-off-by: Toshi Kani
    Acked-by: Dan Williams
    Cc: Christoph Hellwig
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     
  • Currently, an incorrect default hugepage pool size is reported by proc
    nr_hugepages when number of pages for the default huge page size is
    specified twice.

    When multiple huge page sizes are supported, /proc/sys/vm/nr_hugepages
    indicates the current number of pre-allocated huge pages of the default
    size. Basically /proc/sys/vm/nr_hugepages displays default_hstate->
    max_huge_pages and after boot time pre-allocation, max_huge_pages should
    equal the number of pre-allocated pages (nr_hugepages).

    Test case:

    Note that this is specific to x86 architecture.

    Boot the kernel with command line option 'default_hugepagesz=1G
    hugepages=X hugepagesz=2M hugepages=Y hugepagesz=1G hugepages=Z'. After
    boot, 'cat /proc/sys/vm/nr_hugepages' and 'sysctl -a | grep hugepages'
    returns the value X. However, dmesg output shows that Z huge pages were
    pre-allocated.

    So, the root cause of the problem here is that the global variable
    default_hstate_max_huge_pages is set if a default huge page size is
    specified (directly or indirectly) on the command line. After the command
    line processing in hugetlb_init, if default_hstate_max_huge_pages is set,
    the value is assigned to default_hstate.max_huge_pages. However,
    default_hstate.max_huge_pages may have already been set based on the
    number of pre-allocated huge pages of default_hstate size.

    The solution to this problem is if hstate->max_huge_pages is already set
    then it should not be set again from the global max_huge_pages value.
    Basically if the value of the variable hugepages is set multiple times on
    a command line for a specific supported hugepagesize then proc layer
    should consider the last specified value.

    Signed-off-by: Vaishali Thakkar
    Reviewed-by: Naoya Horiguchi
    Cc: Mike Kravetz
    Cc: Hillf Danton
    Cc: Kirill A. Shutemov
    Cc: Dave Hansen
    Cc: Paul Gortmaker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vaishali Thakkar
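
Illustration for the entry above: a minimal model of the rule stated in the message, that the global default must not overwrite a value already derived from pre-allocation. The struct and function names are hypothetical, and zero is assumed to mean "not set yet".

```c
/* Hypothetical model of the per-size counter named in the message. */
struct hstate_model {
    unsigned long max_huge_pages;   /* 0 means "not set yet" */
};

/*
 * Apply the global default only when one was given on the command
 * line and the per-size value has not already been set.
 */
static void apply_default(struct hstate_model *h, unsigned long global_default)
{
    if (global_default && !h->max_huge_pages)
        h->max_huge_pages = global_default;
}
```

In the X/Y/Z example from the message, a state already holding Z keeps Z instead of being reset to X.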
     
  • Commit 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings") has
    moved up the pte_page(pte) in x86's fast gup_pte_range(), for no
    discernible reason: put it back where it belongs, after the pte_flags
    check and the pfn_valid cross-check.

    That may be the cause of the NULL pointer dereference in
    gup_pte_range(), seen when vfio called vaddr_get_pfn() when starting a
    qemu-kvm based VM.

    Signed-off-by: Hugh Dickins
    Reported-by: Michael Long
    Tested-by: Michael Long
    Acked-by: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • We don't require a dedicated thread for fsnotify cleanup. Switch it
    over to a workqueue job instead that runs on the system_unbound_wq.

    In the interest of not thrashing the queued job too often when there are
    a lot of marks being removed, we delay the reaper job slightly when
    queueing it, to allow several to gather on the list.

    Signed-off-by: Jeff Layton
    Tested-by: Eryu Guan
    Reviewed-by: Jan Kara
    Cc: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Layton
     
  • This reverts commit c510eff6beba ("fsnotify: destroy marks with
    call_srcu instead of dedicated thread").

    Eryu reported that he was seeing some OOM kills kick in when running a
    testcase that adds and removes inotify marks on a file in a tight loop.

    The above commit changed the code to use call_srcu to clean up the
    marks. While that does (in principle) work, the srcu callback job is
    limited to cleaning up entries in small batches and only once per jiffy.
    It's easily possible to overwhelm that machinery with too many call_srcu
    callbacks, and Eryu's reproducer did just that.

    There's also another potential problem with using call_srcu here. While
    you can obviously sleep while holding the srcu_read_lock, the callbacks
    run under local_bh_disable, so you can't sleep there.

    It's possible when putting the last reference to the fsnotify_mark that
    we'll end up putting a chain of references including the fsnotify_group,
    uid, and associated keys. While I don't see any obvious way that that
    could occur, it's probably still best to avoid using call_srcu here
    after all.

    This patch reverts the above patch. A later patch will take a different
    approach to eliminate the dedicated thread here.

    Signed-off-by: Jeff Layton
    Reported-by: Eryu Guan
    Tested-by: Eryu Guan
    Cc: Jan Kara
    Cc: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Layton
     
  • Grazvydas Ignotas has reported a regression in remap_file_pages()
    emulation.

    Testcase:
    #define _GNU_SOURCE
    #include <assert.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <sys/mman.h>

    #define SIZE (4096 * 3)

    int main(int argc, char **argv)
    {
            unsigned long *p;
            long i;

            p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
            if (p == MAP_FAILED) {
                    perror("mmap");
                    return -1;
            }

            for (i = 0; i < SIZE / 4096; i++)
                    p[i * 4096 / sizeof(*p)] = i;

            if (remap_file_pages(p, 4096, 0, 1, 0)) {
                    perror("remap_file_pages");
                    return -1;
            }

            if (remap_file_pages(p, 4096 * 2, 0, 1, 0)) {
                    perror("remap_file_pages");
                    return -1;
            }

            assert(p[0] == 1);

            munmap(p, SIZE);

            return 0;
    }

    The second remap_file_pages() fails with -EINVAL.

    The reason is that remap_file_pages() emulation assumes that the target
    vma covers the whole area we want to map over. That assumption is broken
    by the first remap_file_pages() call, which splits the area into two vmas.

    The solution is to check whether the next adjacent vmas map the same file
    with the same flags.

    Fixes: c8d78c1823f4 ("mm: replace remap_file_pages() syscall with emulation")
    Signed-off-by: Kirill A. Shutemov
    Reported-by: Grazvydas Ignotas
    Tested-by: Grazvydas Ignotas
    Cc: [4.0+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
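
Illustration for the entry above: a sketch, assuming a simplified sorted list of mappings, of checking that a range is covered by a first vma plus the adjacent ones that follow it, with no holes and with the same file and flags. The struct and its fields are hypothetical stand-ins, not the kernel's `struct vm_area_struct`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified mapping descriptor in a sorted list. */
struct vma {
    unsigned long start, end;   /* covers [start, end) */
    int file_id;                /* stand-in for the backing file */
    unsigned long flags;        /* stand-in for the mapping flags */
    struct vma *next;
};

/*
 * Return true if [start, start + size) is fully covered by first and
 * the vmas following it, each one adjacent to the previous and
 * sharing the file and flags of first.
 */
static bool range_covered(const struct vma *first,
                          unsigned long start, unsigned long size)
{
    const struct vma *v = first;
    unsigned long end = start + size;

    if (!v || start < v->start || start >= v->end)
        return false;

    while (v->end < end) {
        const struct vma *n = v->next;

        if (!n || n->start != v->end ||
            n->file_id != first->file_id || n->flags != first->flags)
            return false;
        v = n;
    }
    return true;
}
```

This mirrors the test case: after the first call splits the area in two, a two-page request starting at the first vma is still accepted because the second vma is adjacent and compatible.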
     
  • DAX doesn't deposit pgtables when it maps huge pages, so there is nothing
    to withdraw. Trying to withdraw one anyway can lead to a crash.

    Signed-off-by: Kirill A. Shutemov
    Cc: Dan Williams
    Cc: Matthew Wilcox
    Cc: Andrea Arcangeli
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov