07 May, 2016

16 commits

  • There are no callers except through the file_operations struct below
    this, so it should be static like everything else here.

    Signed-off-by: Peter Jones
    Signed-off-by: Matt Fleming
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-efi@vger.kernel.org
    Link: http://lkml.kernel.org/r/1462570771-13324-6-git-send-email-matt@codeblueprint.co.uk
    Signed-off-by: Ingo Molnar

    Peter Jones
     
  • The parameters atomic and duplicates of efivar_init always have opposite
    values. Drop the parameter atomic, replace the uses of !atomic with
    duplicates, and update the call sites accordingly.

    The code using duplicates is slightly reorganized with an 'else', to avoid
    duplicating the lock code.

    Signed-off-by: Julia Lawall
    Signed-off-by: Matt Fleming
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Jeremy Kerr
    Cc: Linus Torvalds
    Cc: Matthew Garrett
    Cc: Peter Zijlstra
    Cc: Saurabh Sengar
    Cc: Thomas Gleixner
    Cc: Vaishali Thakkar
    Cc: linux-efi@vger.kernel.org
    Link: http://lkml.kernel.org/r/1462570771-13324-5-git-send-email-matt@codeblueprint.co.uk
    Signed-off-by: Ingo Molnar

    Julia Lawall
     
  • Dan Carpenter reports that passing the address of the pointer to the
    kmalloc()'d memory for 'capsule' is dangerous:

    "drivers/firmware/efi/capsule.c:109 efi_capsule_supported()
    warn: did you mean to pass the address of 'capsule'

    108
    109 status = efi.query_capsule_caps(&capsule, 1, &max_size, reset);
    ^^^^^^^^
    If we modify capsule inside this function call then at the end of the
    function we aren't freeing the original pointer that we allocated."

    Ard Biesheuvel noted that we don't even need to call kmalloc() since the
    object we allocate isn't very big and doesn't need to persist after the
    function returns.

    Place 'capsule' on the stack instead.
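
    A minimal userspace sketch of the stack-based approach, not the actual
    patch: 'struct capsule_header' and query_caps() are hypothetical
    stand-ins for the EFI types and the firmware call. The callee is handed
    a separate pointer array, so even if it overwrites a slot there is no
    allocation whose address could be lost.

```c
#include <stddef.h>

/* Hypothetical stand-in for a capsule header; not the real EFI type. */
struct capsule_header {
    unsigned int flags;
    unsigned int image_size;
};

/*
 * Hypothetical stand-in for a firmware call that takes an array of header
 * pointers. In the worst case it overwrites the array slots.
 */
static int query_caps(struct capsule_header **list, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++)
        list[i] = NULL;
    return 0;
}

/*
 * The header lives on the stack and only a local pointer array is passed
 * down, so nothing needs to be freed and nothing can leak.
 * Returns the image size on success, -1 on failure.
 */
static int capsule_supported(unsigned int flags, unsigned int image_size)
{
    struct capsule_header capsule = { flags, image_size };
    struct capsule_header *list[] = { &capsule };

    if (query_caps(list, 1) != 0)
        return -1;
    return (int)capsule.image_size; /* object is intact despite the clobber */
}
```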

    Suggested-by: Ard Biesheuvel
    Reported-by: Dan Carpenter
    Signed-off-by: Matt Fleming
    Acked-by: Ard Biesheuvel
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Bryan O'Donoghue
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Kweh Hock Leong
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: joeyli
    Cc: linux-efi@vger.kernel.org
    Link: http://lkml.kernel.org/r/1462570771-13324-4-git-send-email-matt@codeblueprint.co.uk
    Signed-off-by: Ingo Molnar

    Matt Fleming
     
  • GCC complains about a newly added file for the EFI Bootloader Control:

    drivers/firmware/efi/efibc.c: In function 'efibc_set_variable':
    drivers/firmware/efi/efibc.c:53:1: error: the frame size of 2272 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

    The problem is the declaration of a local variable of type struct
    efivar_entry, which is by itself larger than the warning limit of 1024
    bytes.

    Use dynamic memory allocation instead of stack memory for the entry
    object.
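
    A minimal userspace sketch of the same idea, assuming a hypothetical
    'struct big_entry' that is larger than a 1024-byte frame limit; it is
    not the efibc code. The length checks are only an illustration of
    bounded copying into such an object.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NAME_SLOTS 1024 /* hypothetical: 16-bit characters in the name */
#define DATA_BYTES 128  /* hypothetical payload size */

/* Hypothetical large record: about 2 KiB, too big for a small stack frame. */
struct big_entry {
    uint16_t name[NAME_SLOTS];
    uint8_t  data[DATA_BYTES];
};

/* Returns 0 on success, -1 if an input is too long or allocation fails. */
static int set_entry(const char *name, const char *value)
{
    size_t name_len = strlen(name);
    size_t value_len = strlen(value);
    struct big_entry *entry;
    size_t i;

    /* Reject inputs that would not fit, leaving room for a terminator. */
    if (name_len >= NAME_SLOTS || value_len >= DATA_BYTES)
        return -1;

    /* Heap allocation keeps the large object out of the stack frame. */
    entry = calloc(1, sizeof(*entry));
    if (!entry)
        return -1;

    for (i = 0; i < name_len; i++)
        entry->name[i] = (unsigned char)name[i]; /* widen to 16 bits */
    memcpy(entry->data, value, value_len);

    /* A real caller would hand 'entry' to its consumer here. */
    free(entry);
    return 0;
}
```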

    This patch also fixes a potential buffer overflow.

    Reported-by: Ingo Molnar
    Reported-by: Arnd Bergmann
    Signed-off-by: Jeremy Compostella
    [ Updated changelog to include GCC error ]
    Signed-off-by: Matt Fleming
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-efi@vger.kernel.org
    Link: http://lkml.kernel.org/r/1462570771-13324-3-git-send-email-matt@codeblueprint.co.uk
    Signed-off-by: Ingo Molnar

    Jeremy Compostella
     
  • Taking a mutex in the reboot path is bogus because we cannot sleep
    with interrupts disabled, such as when rebooting due to panic():

    BUG: sleeping function called from invalid context at kernel/locking/mutex.c:97
    in_atomic(): 0, irqs_disabled(): 1, pid: 7, name: rcu_sched
    Call Trace:
    dump_stack+0x63/0x89
    ___might_sleep+0xd8/0x120
    __might_sleep+0x49/0x80
    mutex_lock+0x20/0x50
    efi_capsule_pending+0x1d/0x60
    native_machine_emergency_restart+0x59/0x280
    machine_emergency_restart+0x19/0x20
    emergency_restart+0x18/0x20
    panic+0x1ba/0x217

    In this case all other CPUs will have been stopped by the time we
    execute the platform reboot code, so 'capsule_pending' cannot change
    under our feet. We wouldn't care even if it could since we cannot wait
    for it to complete.

    Also, instead of relying on the external 'system_state' variable just
    use a reboot notifier, so we can set 'stop_capsules' while holding
    'capsule_mutex', thereby avoiding a race where system_state is updated
    while we're in the middle of efi_capsule_update_locked() (since CPUs
    won't have been stopped at that point).

    Signed-off-by: Matt Fleming
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Bryan O'Donoghue
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Kweh Hock Leong
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: joeyli
    Cc: linux-efi@vger.kernel.org
    Link: http://lkml.kernel.org/r/1462570771-13324-2-git-send-email-matt@codeblueprint.co.uk
    Signed-off-by: Ingo Molnar

    Matt Fleming
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Pull writeback fix from Jens Axboe:
    "Just a single fix for domain aware writeback, fixing a regression that
    can cause balance_dirty_pages() to keep looping while not getting any
    work done"

    * 'for-linus' of git://git.kernel.dk/linux-block:
    writeback: Fix performance regression in wb_over_bg_thresh()

    Linus Torvalds
     
  • Pull x86 fixes from Ingo Molnar:
    "This contains two fixes: a boot fix for older SGI/UV systems, and an
    APIC calibration fix"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/tsc: Read all ratio bits from MSR_PLATFORM_INFO
    x86/platform/UV: Bring back the call to map_low_mmrs in uv_system_init

    Linus Torvalds
     
  • Pull power management and ACPI fixes from Rafael Wysocki:
    "Fixes for problems introduced or discovered recently (intel_pstate,
    sti-cpufreq, ARM64 cpuidle, Operating Performance Points framework,
    generic device properties framework) and one fix for a hotplug-related
    deadlock in ACPICA that's been there forever, but is nasty enough.

    Specifics:

    - Fix for a recent regression in the intel_pstate driver causing it
    to fail to restore the HWP (HW-managed P-states) configuration of
    the boot CPU after suspend-to-RAM (Rafael Wysocki).

    - Fix for two recent regressions in the intel_pstate driver, one that
    can trigger a divide by zero if the driver is accessed via sysfs
    before it manages to take the first sample and one causing it to
    fail to update a structure field used in a trace point, so the
    information coming from it is less useful (Rafael Wysocki).

    - Fix for a problem in the sti-cpufreq driver introduced during the
    4.5 cycle that causes it to break CPU PM in multi-platform kernels
    by registering cpufreq-dt (which subsequently doesn't work)
    unconditionally and preventing the driver that would actually work
    from registering (Sudeep Holla).

    - Stable-candidate fix for an ARM64 cpuidle issue causing idle state
    usage counters to be incorrectly updated for idle states that were
    not entered due to errors (James Morse).

    - Fix for a recently introduced issue in the OPP (Operating
    Performance Points) framework causing it to print bogus error
    messages for missing optional regulators (Viresh Kumar).

    - Fix for a recently introduced issue in the generic device
    properties framework that may cause it to attempt to dereference an
    invalid pointer in some cases (Heikki Krogerus).

    - Fix for a deadlock in the ACPICA core that may be triggered by
    device (eg Thunderbolt) hotplug (Prarit Bhargava)"

    * tag 'pm+acpi-4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    PM / OPP: Remove useless check
    ACPICA: Dispatcher: Update thread ID for recursive method calls
    intel_pstate: Fix intel_pstate_get()
    cpufreq: intel_pstate: Fix HWP on boot CPU after system resume
    cpufreq: st: enable selective initialization based on the platform
    ARM: cpuidle: Pass on arm_cpuidle_suspend()'s return value
    device property: Avoid potential dereferences of invalid pointers

    Linus Torvalds
     
  • Pull scheduler fix from Ingo Molnar:
    "This contains a single fix that fixes a nohz tick stopping bug when
    mixed-policy SCHED_FIFO and SCHED_RR tasks are present on a runqueue"

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    nohz/full, sched/rt: Fix missed tick-reenabling bug in sched_can_stop_tick()

    Linus Torvalds
     
  • Pull perf fixes from Ingo Molnar:
    "This tree contains two fixes: new Intel CPU model numbers and an
    AMD/iommu uncore PMU driver fix"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf/x86/amd/iommu: Do not register a task ctx for uncore like PMUs
    perf/x86: Add model numbers for Kabylake CPUs

    Linus Torvalds
     
  • Pull EFI fixes from Ingo Molnar:
    "This tree contains three fixes: a console spam fix, a file pattern fix
    and a sysfb_efi fix for a bug that triggered on older ThinkPads"

    * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/sysfb_efi: Fix valid BAR address range check
    x86/efi-bgrt: Switch all pr_err() to pr_notice() for invalid BGRT
    MAINTAINERS: Remove asterisk from EFI directory names

    Linus Torvalds
     
  • Pull parisc fix from Helge Deller:
    "Patch from Dmitry V Levin to fix a kernel crash when a straced process
    calls the (invalid) syscall which is equal to value of __NR_Linux_syscalls"

    * 'parisc-4.6-5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
    parisc: fix a bug when syscall number of tracee is __NR_Linux_syscalls

    Linus Torvalds
     
  • Pull ARC fixes from Vineet Gupta:
    "Late in the cycle, but this has fixes for a couple of issues: a PAE40
    boot crash and Arnd spotting lack of barriers in BE io-accessors.

    The 3rd patch for enabling highmem in low physical mem ;-) honestly is
    more than a "fix" but it's been in the works for some time, seems to be
    stable in testing and enables 2 of our customers to go forward with
    4.6 kernel.

    - Fix for PTE truncation in PAE40 builds
    - Fix for big endian IO accessors lacking IO barrier
    - Allow HIGHMEM to work with low physical addresses"

    * tag 'arc-4.6-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
    ARC: support HIGHMEM even without PAE40
    ARC: Fix PAE40 boot failures due to PTE truncation
    ARC: Add missing io barriers to io{read,write}{16,32}be()

    Linus Torvalds
     
  • Pull powerpc fix from Michael Ellerman:
    "Fix bad inline asm constraint in create_zero_mask() from Anton
    Blanchard"

    * tag 'powerpc-4.6-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
    powerpc: Fix bad inline asm constraint in create_zero_mask()

    Linus Torvalds
     
  • Pull drm fixes from Dave Airlie:
    "Fixes for i915, amdgpu/radeon and imx.

    The IMX fix is for an autoloading regression found in Fedora. The
    radeon fixes are the same fix to amdgpu/radeon to avoid a hardware
    lockup in some circumstances with a bad mode, and a double free bug I
    took a few hours chasing down the other morning.

    The i915 fixes are across the board, all stable material, and fixing
    some hangs and suspend/resume issues, along with a live status
    regression"

    * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
    gpu: ipu-v3: Fix imx-ipuv3-crtc module autoloading
    drm/amdgpu: make sure vertical front porch is at least 1
    drm/radeon: make sure vertical front porch is at least 1
    drm/amdgpu: set metadata pointer to NULL after freeing.
    drm/i915: Make RPS EI/thresholds multiple of 25 on SNB-BDW
    drm/i915: Fake HDMI live status
    drm/i915: Fix eDP low vswing for Broadwell
    drm/i915/ddi: Fix eDP VDD handling during booting and suspend/resume
    drm/i915: Fix system resume if PCI device remained enabled
    drm/i915: Avoid stalling on pending flips for legacy cursor updates

    Linus Torvalds
     

06 May, 2016

24 commits

  • Do not load one entry beyond the end of the syscall table when the
    syscall number of a traced process equals __NR_Linux_syscalls.
    Similar bug with regular processes was fixed by commit 3bb457af4fa8
    ("[PARISC] Fix bug when syscall nr is __NR_Linux_syscalls").
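
    As a C analogy only (the table size and the comparison forms below are
    assumptions chosen to illustrate an off-by-one, not a transcription of
    the parisc entry code), the difference between the two range checks
    looks like this:

```c
#define NR_SYSCALLS   4      /* hypothetical table size */
#define ENOSYS_RESULT (-38L) /* value returned for an invalid number */

/* Hypothetical syscall table with NR_SYSCALLS entries. */
static const long table[NR_SYSCALLS] = { 100, 101, 102, 103 };

/* Off-by-one check: nr == NR_SYSCALLS is wrongly accepted. */
static int in_range_buggy(unsigned long nr)
{
    return !(nr > NR_SYSCALLS);
}

/* Correct check: valid indices are 0 .. NR_SYSCALLS - 1. */
static int in_range_fixed(unsigned long nr)
{
    return nr < NR_SYSCALLS;
}

/* Dispatch through the table, never indexing past its end. */
static long dispatch(unsigned long nr)
{
    if (!in_range_fixed(nr))
        return ENOSYS_RESULT;
    return table[nr];
}
```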

    This bug was found by strace test suite.

    Cc: stable@vger.kernel.org
    Signed-off-by: Dmitry V. Levin
    Acked-by: Helge Deller
    Signed-off-by: Helge Deller

    Dmitry V. Levin
     
  • * pm-opp-fixes:
    PM / OPP: Remove useless check

    * pm-cpufreq-fixes:
    intel_pstate: Fix intel_pstate_get()
    cpufreq: intel_pstate: Fix HWP on boot CPU after system resume
    cpufreq: st: enable selective initialization based on the platform

    * pm-cpuidle-fixes:
    ARM: cpuidle: Pass on arm_cpuidle_suspend()'s return value

    Rafael J. Wysocki
     
  • * acpica-fixes:
    ACPICA: Dispatcher: Update thread ID for recursive method calls

    * device-properties-fixes:
    device property: Avoid potential dereferences of invalid pointers

    Rafael J. Wysocki
     
  • Currently we read the tsc ratio: ratio = (MSR_PLATFORM_INFO >> 8) & 0x1f;

    Thus we get bits 8-12 of MSR_PLATFORM_INFO, however according to the SDM
    (35.5), the ratio bits are bits 8-15.

    Ignoring the upper bits can result in an incorrect tsc ratio, which causes the
    TSC calibration and the Local APIC timer frequency to be incorrect.

    Fix this problem by masking 0xff instead.
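
    A small userspace sketch of the two masks applied to a raw register
    value. The helper names are hypothetical, and the value is passed in as
    a plain integer rather than read from the MSR.

```c
#include <stdint.h>

/* Old extraction: keeps only bits 8-12 of the register value. */
static unsigned int ratio_5bit(uint64_t platform_info)
{
    return (unsigned int)((platform_info >> 8) & 0x1f);
}

/* Fixed extraction: keeps bits 8-15 of the register value. */
static unsigned int ratio_8bit(uint64_t platform_info)
{
    return (unsigned int)((platform_info >> 8) & 0xff);
}
```

    With a ratio of 36 (0x24) stored in bits 8-15, the 5-bit mask drops
    bit 5 and yields 4, while the 8-bit mask yields 36. Ratios below 32
    come out the same with either mask.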

    [ tglx: Massaged changelog ]

    Fixes: 7da7c1561366 "x86, tsc: Add static (MSR) TSC calibration on Intel Atom SoCs"
    Signed-off-by: Chen Yu
    Cc: "Rafael J. Wysocki"
    Cc: stable@vger.kernel.org
    Cc: Bin Gao
    Cc: Len Brown
    Link: http://lkml.kernel.org/r/1462505619-5516-1-git-send-email-yu.c.chen@intel.com
    Signed-off-by: Thomas Gleixner

    Chen Yu
     
  • Merge fixes from Andrew Morton:
    "14 fixes"

    * emailed patches from Andrew Morton:
    byteswap: try to avoid __builtin_constant_p gcc bug
    lib/stackdepot: avoid to return 0 handle
    mm: fix kcompactd hang during memory offlining
    modpost: fix module autoloading for OF devices with generic compatible property
    proc: prevent accessing /proc/<PID>/environ until it's ready
    mm/zswap: provide unique zpool name
    mm: thp: kvm: fix memory corruption in KVM with THP enabled
    MAINTAINERS: fix Rajendra Nayak's address
    mm, cma: prevent nr_isolated_* counters from going negative
    mm: update min_free_kbytes from khugepaged after core initialization
    huge pagecache: mmap_sem is unlocked when truncation splits pmd
    rapidio/mport_cdev: fix uapi type definitions
    mm: memcontrol: let v2 cgroups follow changes in system swappiness
    mm: thp: correct split_huge_pages file permission

    Linus Torvalds
     
  • Apparently patchwork ended up truncating the full name.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Pull libnvdimm fixes from Dan Williams:

    - a fix for the persistent memory 'struct page' driver. The
    implementation overlooked the fact that pages are allocated in 2MB
    units leading to -ENOMEM when establishing some configurations.

    It's tagged for -stable as the problem was introduced with the
    initial implementation in 4.5.

    - The new "error status translation" routine, introduced with the 4.6
    updates to the nfit driver, missed a necessary path in
    acpi_nfit_ctl().

    The end result is that we are falsely assuming commands complete
    successfully when the embedded status says otherwise.

    * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    nfit: fix translation of command status results
    libnvdimm, pfn: fix memmap reservation sizing

    Linus Torvalds
     
  • This is another attempt to avoid a regression in wwn_to_u64() after that
    started using get_unaligned_be64(), which in turn ran into a bug on
    gcc-4.9 through 6.1.

    The regression got introduced due to the combination of two separate
    workarounds (commits e3bde9568d99: "include/linux/unaligned: force
    inlining of byteswap operations" and ef3fb2422ffe: "scsi: fc: use
    get/put_unaligned64 for wwn access") that each try to sidestep distinct
    problems with gcc behavior (code growth and increased stack usage).

    Unfortunately after both have been applied, a more serious gcc bug has
    been uncovered, leading to incorrect object code that discards part of a
    function and causes undefined behavior.

    As part of this problem is how __builtin_constant_p gets evaluated on an
    argument passed by reference into an inline function, this avoids the
    use of __builtin_constant_p() for all architectures that set
    CONFIG_ARCH_USE_BUILTIN_BSWAP. Most architectures do not set
    ARCH_SUPPORTS_OPTIMIZED_INLINING, which means they probably do not
    suffer from the problem in the qla2xxx driver, but they might still run
    into it elsewhere.

    Both of the original workarounds were only merged in the 4.6 kernel, and
    the bug that is fixed by this patch should only appear if both are
    there, so we probably don't need to backport the fix. On the other
    hand, it works by simplifying the code path and should not have any
    negative effects.

    [arnd@arndb.de: fix older gcc warnings]
    (http://lkml.kernel.org/r/12243652.bxSxEgjgfk@wuerfel)
    Link: https://lkml.org/lkml/headers/2016/4/12/1103
    Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122
    Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70232
    Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
    Fixes: e3bde9568d99 ("include/linux/unaligned: force inlining of byteswap operations")
    Fixes: ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access")
    Link: http://lkml.kernel.org/r/1780465.XdtPJpi8Tt@wuerfel
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Josh Poimboeuf
    Tested-by: Josh Poimboeuf # on gcc-5.3
    Tested-by: Quinn Tran
    Cc: Martin Jambor
    Cc: "Martin K. Petersen"
    Cc: James Bottomley
    Cc: Denys Vlasenko
    Cc: Thomas Graf
    Cc: Peter Zijlstra
    Cc: David Rientjes
    Cc: Ingo Molnar
    Cc: Himanshu Madhani
    Cc: Jan Hubicka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
     
  • Since stack traces whose hash value is 0 can now be saved, stackdepot
    could return 0 even on success, so users of stackdepot cannot tell
    success from failure. This patch adds one bit to the handle and sets it
    on every valid handle. A valid handle is therefore never 0, and a 0
    handle correctly represents failure.
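
    A minimal sketch of the idea, assuming a hypothetical encoding in which
    the lowest bit of a 32-bit handle is the valid bit; the bit position
    and the helper names are illustrative, not the stackdepot layout.

```c
#include <stdint.h>

#define HANDLE_VALID 0x1u /* hypothetical: lowest bit marks a valid handle */

/* Pack a storage location (must fit in 31 bits) into a non-zero handle. */
static uint32_t make_handle(uint32_t location)
{
    return (location << 1) | HANDLE_VALID;
}

/* A handle of 0 can now only mean failure. */
static int handle_is_valid(uint32_t handle)
{
    return (handle & HANDLE_VALID) != 0;
}

/* Recover the storage location from a valid handle. */
static uint32_t handle_location(uint32_t handle)
{
    return handle >> 1;
}
```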

    Fixes: 33334e25769c ("lib/stackdepot.c: allow the stack trace hash to be zero")
    Link: http://lkml.kernel.org/r/1462252403-1106-1-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Joonsoo Kim
    Cc: Alexander Potapenko
    Cc: Andrey Ryabinin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • Assume memory47 is the last online block left in node1. This will hang:

    # echo offline > /sys/devices/system/node/node1/memory47/state

    After a couple of minutes, the following pops up in dmesg:

    INFO: task bash:957 blocked for more than 120 seconds.
    Not tainted 4.6.0-rc6+ #6
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    bash D ffff8800b7adbaf8 0 957 951 0x00000000
    Call Trace:
    schedule+0x35/0x80
    schedule_timeout+0x1ac/0x270
    wait_for_completion+0xe1/0x120
    kthread_stop+0x4f/0x110
    kcompactd_stop+0x26/0x40
    __offline_pages.constprop.28+0x7e6/0x840
    offline_pages+0x11/0x20
    memory_block_action+0x73/0x1d0
    memory_subsys_offline+0x47/0x60
    device_offline+0x86/0xb0
    store_mem_state+0xda/0xf0
    dev_attr_store+0x18/0x30
    sysfs_kf_write+0x37/0x40
    kernfs_fop_write+0x11d/0x170
    __vfs_write+0x37/0x120
    vfs_write+0xa9/0x1a0
    SyS_write+0x55/0xc0
    entry_SYSCALL_64_fastpath+0x1a/0xa4

    kcompactd is waiting for kcompactd_max_order > 0 when it's woken up to
    actually exit. Check kthread_should_stop() to break out of the wait.
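
    The change to the wake-up condition can be sketched in userspace as
    follows. 'struct worker' and both helpers are hypothetical; the
    'should_stop' field stands in for what kthread_should_stop() would
    report.

```c
#include <stdbool.h>

/* Hypothetical model of the state the worker thread waits on. */
struct worker {
    int  max_order;   /* > 0 when compaction work has been requested */
    bool should_stop; /* set when the thread is asked to exit */
};

/* Old condition: a stop request alone never ends the wait. */
static bool wake_condition_old(const struct worker *w)
{
    return w->max_order > 0;
}

/* New condition: a stop request also ends the wait. */
static bool wake_condition_new(const struct worker *w)
{
    return w->max_order > 0 || w->should_stop;
}
```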

    Fixes: 698b1b306 ("mm, compaction: introduce kcompactd").
    Reported-by: Reza Arbab
    Tested-by: Reza Arbab
    Cc: Andrea Arcangeli
    Cc: "Kirill A. Shutemov"
    Cc: Rik van Riel
    Cc: Joonsoo Kim
    Cc: Mel Gorman
    Cc: David Rientjes
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • Since the wildcard at the end of OF module aliases is gone, autoloading
    of modules that don't match a device's last (most generic) compatible
    value fails.

    For example the CODA960 VPU on i.MX6Q has the SoC specific compatible
    "fsl,imx6q-vpu" and the generic compatible "cnm,coda960". Since the
    driver currently only works with knowledge about the SoC specific
    integration, it doesn't list "cnm,coda960" in the module device table.

    This results in the device compatible
    "of:NvpuTCfsl,imx6q-vpuCcnm,coda960" not matching the module alias
    "of:N*T*Cfsl,imx6q-vpu" anymore, whereas before commit 2f632369ab79
    ("modpost: don't add a trailing wildcard for OF module aliases") it
    matched the module alias "of:N*T*Cfsl,imx6q-vpu*".

    This patch adds two module aliases for each compatible, one without the
    wildcard and one with "C*" appended.

    $ modinfo coda | grep imx6q
    alias: of:N*T*Cfsl,imx6q-vpuC*
    alias: of:N*T*Cfsl,imx6q-vpu
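
    The effect of the two aliases can be checked in userspace. This sketch
    uses POSIX fnmatch() as a stand-in for the alias matcher (an
    assumption), and make_aliases() is a hypothetical helper, not modpost
    code.

```c
#include <fnmatch.h>
#include <stdio.h>

/* Build both aliases for one compatible string into caller buffers. */
static void make_aliases(const char *compatible,
                         char *plain, char *wild, size_t len)
{
    snprintf(plain, len, "of:N*T*C%s", compatible);  /* exact last entry */
    snprintf(wild, len, "of:N*T*C%sC*", compatible); /* more entries follow */
}

/* Returns 1 if the device string matches the alias pattern, else 0. */
static int alias_matches(const char *alias, const char *device)
{
    return fnmatch(alias, device, 0) == 0;
}
```

    The plain alias covers a device whose only compatible is the listed
    one; the "C*" alias covers a device that lists further, more generic
    compatibles after it.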

    Fixes: 2f632369ab79 ("modpost: don't add a trailing wildcard for OF module aliases")
    Link: http://lkml.kernel.org/r/1462203339-15340-1-git-send-email-p.zabel@pengutronix.de
    Signed-off-by: Philipp Zabel
    Cc: Javier Martinez Canillas
    Cc: Brian Norris
    Cc: Sjoerd Simons
    Cc: Rusty Russell
    Cc: Greg Kroah-Hartman
    Cc: [4.5+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Philipp Zabel
     
  • If /proc/<PID>/environ gets read before the envp[] array is fully set up
    in create_{aout,elf,elf_fdpic,flat}_tables(), we might end up trying to
    read more bytes than are actually written, as env_start will already be
    set but env_end will still be zero, making the range calculation
    underflow and allowing reads beyond the end of what has been written.

    Fix this as it is done for /proc/<PID>/cmdline by testing env_end for
    zero. It is, apparently, intentionally set last in create_*_tables().

    This bug was found by the PaX size_overflow plugin that detected the
    arithmetic underflow of 'this_len = env_end - (env_start + src)' when
    env_end is still zero.
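
    The underflow is easy to reproduce with plain unsigned arithmetic. In
    this sketch both helpers are hypothetical, and the extra offset check
    in the guarded version is an assumption added to keep the sketch safe
    on its own.

```c
/* Unguarded length calculation, as in the expression quoted above. */
static unsigned long unchecked_len(unsigned long env_start,
                                   unsigned long env_end,
                                   unsigned long src)
{
    return env_end - (env_start + src); /* wraps when env_end is 0 */
}

/* Guarded variant: report nothing readable until env_end is set. */
static unsigned long checked_len(unsigned long env_start,
                                 unsigned long env_end,
                                 unsigned long src)
{
    if (!env_end)
        return 0; /* environment not fully set up yet */
    if (src >= env_end - env_start)
        return 0; /* offset is at or past the end */
    return env_end - (env_start + src);
}
```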

    The expected consequence is that userland trying to access
    /proc/<PID>/environ of a not yet fully set up process may get
    inconsistent data as we're in the middle of copying in the environment
    variables.

    Fixes: https://forums.grsecurity.net/viewtopic.php?f=3&t=4363
    Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=116461
    Signed-off-by: Mathias Krause
    Cc: Emese Revfy
    Cc: Pax Team
    Cc: Al Viro
    Cc: Mateusz Guzik
    Cc: Alexey Dobriyan
    Cc: Cyrill Gorcunov
    Cc: Jarod Wilson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathias Krause
     
  • Instead of using "zswap" as the name for all zpools created, add an
    atomic counter and use "zswap%x" with the counter number for each zpool
    created, to provide a unique name for each new zpool.

    As zsmalloc, one of the zpool implementations, requires/expects a unique
    name for each pool created, zswap should provide a unique name. The
    zsmalloc pool creation does not fail if a new pool with a conflicting
    name is created, unless CONFIG_ZSMALLOC_STAT is enabled; in that case,
    zsmalloc pool creation fails with -ENOMEM. Then zswap will be unable to
    change its compressor parameter if its zpool is zsmalloc; it also will
    be unable to change its zpool parameter back to zsmalloc, if it has any
    existing old zpool using zsmalloc with page(s) in it. Attempts to
    change the parameters will result in failure to create the zpool. This
    changes zswap to provide a unique name for each zpool creation.
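
    A userspace sketch of the naming scheme, using C11 atomics in place of
    the kernel's atomic counter. next_pool_name() and 'pool_count' are
    hypothetical names.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Counts pools created so far; zero-initialized as a static object. */
static atomic_uint pool_count;

/* Write a unique name of the form "zswap<hex counter>" into buf. */
static void next_pool_name(char *buf, size_t len)
{
    /* fetch_add returns the old value, so add 1 to get the new count. */
    unsigned int id = atomic_fetch_add(&pool_count, 1) + 1;

    snprintf(buf, len, "zswap%x", id);
}
```

    Each call yields a different name, so a pool implementation that
    requires unique names never sees a conflict.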

    Fixes: f1c54846ee45 ("zswap: dynamic pool creation")
    Signed-off-by: Dan Streetman
    Reported-by: Sergey Senozhatsky
    Reviewed-by: Sergey Senozhatsky
    Cc: Dan Streetman
    Cc: Minchan Kim
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     
  • After the THP refcounting change, obtaining a compound page from
    get_user_pages() no longer allows us to assume the entire compound page
    is immediately mappable from a secondary MMU.

    A secondary MMU doesn't want to call get_user_pages() more than once for
    each compound page, in order to know if it can map the whole compound
    page. So a secondary MMU needs to know from a single get_user_pages()
    invocation when it can map immediately the entire compound page to avoid
    a flood of unnecessary secondary MMU faults and spurious
    atomic_inc()/atomic_dec() (pages don't have to be pinned by MMU notifier
    users).

    Ideally instead of the page->_mapcount < 1 check, get_user_pages()
    should return the granularity of the "page" mapping in the "mm" passed
    to get_user_pages(). However it's a non-trivial change to pass the "pmd"
    status belonging to the "mm" walked by get_user_pages up the stack (up
    to the caller of get_user_pages). So the fix just checks if there is
    not a single pte mapping on the page returned by get_user_pages, and in
    turn if the caller can assume that the whole compound page is mapped in
    the current "mm" (in a pmd_trans_huge()). In such case the entire
    compound page is safe to map into the secondary MMU without additional
    get_user_pages() calls on the surrounding tail/head pages. In addition
    to being faster, not having to run other get_user_pages() calls also
    reduces the memory footprint of the secondary MMU fault in case the pmd
    split happened as result of memory pressure.

    Without this fix after a MADV_DONTNEED (like invoked by QEMU during
    postcopy live migration or ballooning) or after generic swapping (with a
    failure in split_huge_page() that would only result in pmd splitting and
    not a physical page split), KVM would map the whole compound page into
    the shadow pagetables, even though regular faults or userfaults (like
    UFFDIO_COPY) may map regular pages into the primary MMU as result of the
    pte faults, leading to the guest mode and userland mode going out of
    sync and not working on the same memory at all times.

    Any other secondary MMU notifier manager (KVM is just one of the many
    MMU notifier users) will need the same information if it doesn't want to
    run a flood of get_user_pages_fast and it can support multiple
    granularity in the secondary MMU mappings, so I think it is justified to
    be exposed not just to KVM.

    The other option would be to move transparent_hugepage_adjust to
    mm/huge_memory.c but that currently has all kind of KVM data structures
    in it, so it's definitely not a cut-and-paste work, so I couldn't do a
    fix as clean as this one for 4.6.

    Signed-off-by: Andrea Arcangeli
    Cc: "Dr. David Alan Gilbert"
    Cc: "Kirill A. Shutemov"
    Cc: "Li, Liang Z"
    Cc: Amit Shah
    Cc: Paolo Bonzini
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • Signed-off-by: Eric Engestrom
    Cc: Rajendra Nayak
    Cc: Afzal Mohammed
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Engestrom
     
  • /proc/sys/vm/stat_refresh warns that nr_isolated_anon and nr_isolated_file
    go increasingly negative under compaction, which would add delay when
    there should be none, or no delay when there should be one. The bug in
    compaction was due to a recent mmotm patch, but a much older instance of
    the bug was
    also noticed in isolate_migratepages_range() which is used for CMA and
    gigantic hugepage allocations.

    The bug is caused by putback_movable_pages() in an error path
    decrementing the isolated counters without them being previously
    incremented by acct_isolated(). Fix isolate_migratepages_range() by
    removing the error-path putback, thus reaching acct_isolated() with
    migratepages still isolated, and leaving putback to the caller like most
    other places do.

    Fixes: edc2ca612496 ("mm, compaction: move pageblock checks up from isolate_migratepages_range()")
    [vbabka@suse.cz: expanded the changelog]
    Signed-off-by: Hugh Dickins
    Signed-off-by: Vlastimil Babka
    Acked-by: Joonsoo Kim
    Cc: Michal Hocko
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Khugepaged attempts to raise min_free_kbytes if it's set too low.
    However, on boot khugepaged sets min_free_kbytes first from
    subsys_initcall(), and then the mm 'core' overrides min_free_kbytes
    afterwards from init_per_zone_wmark_min(), via a module_init() call.

    Khugepaged used to use a late_initcall() to set min_free_kbytes (such
    that it occurred after the core initialization), however this was
    removed when the initialization of min_free_kbytes was integrated into
    the starting of the khugepaged thread.

    The fix here is simply to invoke the core initialization using a
    core_initcall() instead of module_init(), such that the previous
    initialization ordering is restored. I didn't restore the
    late_initcall() since start_stop_khugepaged() already sets
    min_free_kbytes via set_recommended_min_free_kbytes().

    This was noticed when we had a number of page allocation failures when
    moving a workload to a kernel with this new initialization ordering. On
    an 8GB system this restores min_free_kbytes back to 67584 from 11365
    when CONFIG_TRANSPARENT_HUGEPAGE=y is set and either
    CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y or
    CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y.

    Fixes: 79553da293d3 ("thp: cleanup khugepaged startup")
    Signed-off-by: Jason Baron
    Acked-by: Kirill A. Shutemov
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Baron
     
  • zap_pmd_range()'s CONFIG_DEBUG_VM !rwsem_is_locked(&mmap_sem) BUG() will
    be invalid with huge pagecache, in whatever way it is implemented:
    truncation of a hugely-mapped file to an unhugely-aligned size would
    easily hit it.

    (Although anon THP could in principle apply khugepaged to private file
    mappings, which are not excluded by the MADV_HUGEPAGE restrictions, in
    practice there's a vm_ops check which excludes them, so it never hits
    this BUG() - there's no interface to "truncate" an anonymous mapping.)

    We could complicate the test, to check i_mmap_rwsem also when there's a
    vm_file; but my inclination was to make zap_pmd_range() more readable by
    simply deleting this check. A search has shown no report of the issue
    in the years since commit e0897d75f0b2 ("mm, thp: print useful
    information when mmap_sem is unlocked in zap_pmd_range") expanded it
    from VM_BUG_ON() - though I cannot point to what commit I would say then
    fixed the issue.

    But there are a couple of other patches now floating around, neither yet
    in the tree: let's agree to retain the check as a VM_BUG_ON_VMA(), as
    Matthew Wilcox has done; but subject to a vma_is_anonymous() check, as
    Kirill Shutemov has done. And let's get this in, without waiting for
    any particular huge pagecache implementation to reach the tree.

    Matthew said "We can reproduce this BUG() in the current Linus tree with
    DAX PMDs".
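
    The agreed form of the check can be sketched as a userspace predicate.
    This is a minimal model under stated assumptions: struct vma_model and
    zap_check_passes() are hypothetical names, and the two booleans stand in
    for vma_is_anonymous() and rwsem_is_locked(&mmap_sem). It covers only the
    locking assertion, not the surrounding huge-PMD handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two facts the debug check looks at. */
struct vma_model {
    bool is_anonymous;    /* models vma_is_anonymous(vma) */
    bool mmap_sem_locked; /* models rwsem_is_locked(&mmap_sem) */
};

/*
 * Returns true when the debug check would pass. File-backed mappings are
 * exempt, since truncation can reach here without mmap_sem held; anonymous
 * mappings must still have mmap_sem held.
 */
static bool zap_check_passes(const struct vma_model *vma)
{
    assert(vma != NULL);
    if (!vma->is_anonymous)
        return true;
    return vma->mmap_sem_locked;
}
```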

    Signed-off-by: Hugh Dickins
    Tested-by: Matthew Wilcox
    Cc: "Kirill A. Shutemov"
    Cc: Andrea Arcangeli
    Cc: Andres Lagar-Cavilla
    Cc: Yang Shi
    Cc: Ning Qu
    Cc: Mel Gorman
    Cc: Andres Lagar-Cavilla
    Cc: Konstantin Khlebnikov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Fix problems in uapi definitions reported by Gabriel Laskar (see
    https://lkml.org/lkml/2016/4/5/205 for details):

    - move public header file rio_mport_cdev.h to include/uapi/linux directory
    - change types in data structures passed as IOCTL parameters
    - improve parameter checking in some IOCTL service routines

    Signed-off-by: Alexandre Bounine
    Reported-by: Gabriel Laskar
    Tested-by: Barry Wood
    Cc: Gabriel Laskar
    Cc: Matt Porter
    Cc: Aurelien Jacquiot
    Cc: Andre van Herk
    Cc: Barry Wood
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandre Bounine
     
  • Cgroup2 currently doesn't have a per-cgroup swappiness setting. We
    might want to add one later - that's a different discussion - but until
    we do, the cgroups should always follow the system setting. Otherwise
    each cgroup's swappiness is stuck at whatever value its ancestor
    inherited from the system setting at the time of cgroup creation.
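
    The intended lookup rule can be sketched in userspace as follows. The
    struct and function names are hypothetical and only model the behaviour
    described here: on cgroup2 the stored per-cgroup value is ignored and the
    current system setting is returned instead. The sketch assumes that on
    the legacy hierarchy the root cgroup also follows the system setting.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a memory cgroup. */
struct cgroup_model {
    const struct cgroup_model *parent; /* NULL for the root cgroup */
    int swappiness;                    /* value copied at creation time */
};

/* Return the swappiness that reclaim should use for this cgroup. */
static int effective_swappiness(const struct cgroup_model *cg,
                                bool on_cgroup2, int system_swappiness)
{
    /* cgroup2 has no per-cgroup knob: always follow the system value. */
    if (on_cgroup2)
        return system_swappiness;

    /* Assumed legacy behaviour: the root follows the system value too. */
    if (cg == NULL || cg->parent == NULL)
        return system_swappiness;

    return cg->swappiness;
}
```

    With this rule, changing the system setting after a cgroup2 group has
    been created takes effect immediately, rather than leaving the group
    with a stale inherited value.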

    Signed-off-by: Johannes Weiner
    Acked-by: Michal Hocko
    Acked-by: Vladimir Davydov
    Cc: [4.5]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • split_huge_pages doesn't support a get method at all, so the read
    permission is confusing. Change the permission to write-only.

    Also add "\n" to the output of the set method to make it more readable.

    Signed-off-by: Yang Shi
    Acked-by: Kirill A. Shutemov
    Cc: Andrea Arcangeli
    Cc: Mel Gorman
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yang Shi
     
  • Pull asm-generic syscall fix from Arnd Bergmann:
    "My last pull request for asm-generic had just one patch that added two
    new system calls to asm/unistd.h, but unfortunately it turned out to
    be wrong, pointing arch/tile compat mode at the native handlers rather
    than the compat ones.

    This was spotted by Yury Norov, who is working on ILP32 mode for
    arch/arm64, which would have the same problem when merged. This fixes
    the table to use the correct compat syscalls, like the other 64-bit
    architectures do.

    I'll try to find the time to come up with a solution that prevents
    this problem from happening again, by allowing all future system calls
    to just get added in a single file for use by all architectures"

    * tag 'asm-generic-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
    asm-generic: use compat version for preadv2 and pwritev2
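
    The mistake and its fix can be pictured with a small userspace model of
    a syscall table in which each entry carries both a native and a compat
    handler. This is a sketch only: SC_COMP, struct syscall_entry and
    resolve_handler() are hypothetical, and the handler names are plain
    strings used for illustration. The bug corresponds to listing the native
    handler in the compat slot of an entry.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: one syscall, two possible handlers. */
struct syscall_entry {
    const char *name;
    const char *native; /* handler for native 64-bit tasks */
    const char *compat; /* handler for 32-bit compat tasks */
};

#define SC_COMP(name, native, compat) { name, native, compat }

/* Corrected table: the compat slot names the compat handler. */
static const struct syscall_entry syscall_table[] = {
    SC_COMP("preadv2", "sys_preadv2", "compat_sys_preadv2"),
    SC_COMP("pwritev2", "sys_pwritev2", "compat_sys_pwritev2"),
};

/* Pick the handler for a task; returns NULL for an unknown syscall. */
static const char *resolve_handler(const char *name, int is_compat_task)
{
    size_t n = sizeof(syscall_table) / sizeof(syscall_table[0]);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(syscall_table[i].name, name) == 0)
            return is_compat_task ? syscall_table[i].compat
                                  : syscall_table[i].native;
    }
    return NULL;
}
```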

    Linus Torvalds
     
  • Pull ARM SoC fixes from Arnd Bergmann:
    "Here are a couple of last-minute fixes for ARM SoCs. Most of them are
    for the OMAP platforms; the rest are spread across other platforms.

    OMAP:
    All dts fixes, mostly affecting voltages and pinctrl for various
    device drivers:

    - Regulator minimum voltage fixes for omap5
    - ISP syscon register offset fix for omap3
    - Fix regulator initial modes for n900
    - Fix omap5 pinctrl wkup instance size

    Allwinner:
    Remove incorrect constraints from a dcdc1 regulator

    Altera SoCFPGA:
    Fix compilation in thumb2 mode

    Samsung exynos:
    Fix a potential oops in the pm-domain error handling

    Davinci:
    Avoid a link error if NVMEM is disabled

    Renesas:
    Do not mark an external uart clock as disabled, to allow probing
    the uarts"

    * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
    ARM: davinci: only use NVMEM when available
    ARM: SoCFPGA: Fix secondary CPU startup in thumb2 kernel
    ARM: dts: omap5: fix range of permitted wakeup pinmux registers
    ARM: dts: omap3-n900: Specify peripherals LDO regulators initial mode
    ARM: dts: omap3: Fix ISP syscon register offset
    ARM: dts: omap5-cm-t54: fix ldo1_reg and ldo4_reg ranges
    ARM: dts: omap5-board-common: fix ldo1_reg and ldo4_reg ranges
    arm64: dts: r8a7795: Don't disable referenced optional scif clock
    ARM: EXYNOS: Properly skip unitialized parent clock in power domain on
    ARM: dts: sun8i-q8-common: Do not set constraints on dc1sw regulator

    Linus Torvalds
     
  • Update my email and web addresses in the kernel maintainers file.

    Signed-off-by: Russell King
    Signed-off-by: Linus Torvalds

    Russell King