08 Jun, 2016

1 commit

  • We need to read a bunch of registers on each compute unit and possibly
    on the current CPU too. Disable preemption around it. Otherwise, you
    get:

    BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/327
    caller is read_registers+0x6a/0x110 [fam15h_power]
    CPU: 3 PID: 327 Comm: systemd-udevd Not tainted 4.7.0-rc1+ #4
    Hardware name: HP HP EliteBook 745 G3/807E, BIOS N73 Ver. 01.08 01/28/2016
    ...
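
    To illustrate why the fix is needed, here is a toy user-space model (not kernel code) of the kernel's get_cpu()/put_cpu() idiom: get_cpu() disables preemption and returns the current CPU id, and put_cpu() re-enables preemption. Every model_* function, preempt_count, this_cpu and bug_reports are hypothetical stand-ins. The sketch does not claim to reproduce exactly how fam15h_power was changed.

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Hypothetical user-space stand-ins; this is NOT the kernel implementation. */
    static int preempt_count;   /* > 0 means "preemption disabled" */
    static int this_cpu = 3;    /* pretend the task runs on CPU 3 */
    static int bug_reports;     /* counts DEBUG_PREEMPT-style complaints */

    static void model_preempt_disable(void) { preempt_count++; }

    static void model_preempt_enable(void)
    {
        assert(preempt_count > 0);
        preempt_count--;
    }

    /* Like smp_processor_id() with CONFIG_DEBUG_PREEMPT: complain if preemptible. */
    static int model_smp_processor_id(void)
    {
        if (preempt_count == 0) {
            bug_reports++;
            fprintf(stderr, "BUG: using smp_processor_id() in preemptible code\n");
        }
        return this_cpu;
    }

    /* get_cpu(): disable preemption, then return the current CPU id. */
    static int model_get_cpu(void)
    {
        model_preempt_disable();
        return model_smp_processor_id();
    }

    /* put_cpu(): re-enable preemption. */
    static void model_put_cpu(void) { model_preempt_enable(); }

    /* The CPU id stays valid for every register read between the two calls. */
    static int read_registers_model(void)
    {
        int cpu = model_get_cpu();
        /* ... read per-compute-unit registers, possibly on 'cpu' itself ... */
        model_put_cpu();
        return cpu;
    }
    ```

    Because the CPU id is only meaningful while the task cannot migrate, the register reads stay between the get/put pair; calling model_smp_processor_id() outside it bumps bug_reports, mirroring the splat above.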

    Suggested-by: Thomas Gleixner
    Signed-off-by: Borislav Petkov
    Cc: Rui Huang
    Cc: Sherry Hurwitz
    Cc: Guenter Roeck
    Acked-by: Huang Rui
    Tested-by: Huang Rui
    Fixes: fa7943449943 ("hwmon: (fam15h_power) Add compute unit accumulated power")
    Signed-off-by: Guenter Roeck

    Borislav Petkov
     

06 Jun, 2016

4 commits

  • Linus Torvalds
     
  • Pull parisc fixes from Helge Deller:

    - Fix printk time stamps on SMP systems, which went wrong due to a patch
    that was added during the merge window

    - Fix two bugs in the stack backtrace code: Races in module unloading
    and possible invalid accesses to memory due to wrong instruction
    decoding (Mikulas Patocka)

    - Fix userspace crash when syscalls access invalid unaligned userspace
    addresses. Those syscalls will now return EFAULT as expected.
    (tagged for stable kernel series)

    * 'parisc-4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
    parisc: Move die_if_kernel() prototype into traps.h header
    parisc: Fix pagefault crash in unaligned __get_user() call
    parisc: Fix printk time during boot
    parisc: Fix backtrace on PA-RISC

    Linus Torvalds
     
  • Pull key handling update from James Morris:
    "This alters a new keyctl function added in the current merge window to
    allow for a future extension planned for the next merge window"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
    KEYS: Add placeholder for KDF usage with DH

    Linus Torvalds
     
  • The /dev/ptmx device node is changed to look up the directory entry "pts"
    in the same directory in which the /dev/ptmx device node was opened. If
    there is a "pts" entry and that entry is a devpts filesystem, /dev/ptmx
    uses that filesystem. Otherwise, the open of /dev/ptmx fails.

    The DEVPTS_MULTIPLE_INSTANCES configuration option is removed, so that
    userspace can now safely depend on each mount of devpts creating a new
    instance of the filesystem.

    Each mount of devpts is now a separate and equal filesystem.

    Reserved ttys are now available to all instances of devpts where the
    mounter is in the initial mount namespace.

    A new vfs helper path_pts is introduced that finds a directory entry
    named "pts" in the directory of the passed in path, and changes the
    passed in path to point to it. The helper path_pts uses a function
    path_parent_directory that was factored out of follow_dotdot.
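
    path_pts() itself operates on a struct path inside the VFS. As a rough user-space analogue, the sketch below derives the "pts" sibling of a given ptmx path and checks with statfs(2) whether it lives on a devpts filesystem (DEVPTS_SUPER_MAGIC from <linux/magic.h>). pts_path_for and is_devpts are hypothetical helpers; unlike the kernel, is_devpts does not verify that the entry is the root of the devpts mount.

    ```c
    #include <assert.h>
    #include <linux/magic.h>   /* DEVPTS_SUPER_MAGIC */
    #include <stdio.h>
    #include <string.h>
    #include <sys/vfs.h>       /* statfs() */

    /* Hypothetical helper: "<dir of ptmx>/pts", e.g. "/dev/ptmx" -> "/dev/pts". */
    static int pts_path_for(const char *ptmx, char *out, size_t outlen)
    {
        const char *slash = strrchr(ptmx, '/');
        int n;

        if (slash)
            n = snprintf(out, outlen, "%.*s/pts", (int)(slash - ptmx), ptmx);
        else
            n = snprintf(out, outlen, "pts");   /* relative name, same directory */
        return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
    }

    /* Hypothetical helper: 1 if 'path' is on a devpts filesystem, else 0. */
    static int is_devpts(const char *path)
    {
        struct statfs sfs;

        if (statfs(path, &sfs) != 0)
            return 0;
        return sfs.f_type == DEVPTS_SUPER_MAGIC;
    }
    ```

    A caller would proceed only if is_devpts() holds for the derived path, mirroring the rule that opening /dev/ptmx fails when no devpts "pts" entry sits next to it (for example inside a chroot's own dev directory).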

    In the implementation of devpts:
    - devpts_mnt is killed as it is no longer meaningful if all mounts of
    devpts are equal.
    - pts_sb_from_inode is replaced by just inode->i_sb as all cached
    inodes in the tty layer are now from the devpts filesystem.
    - devpts_add_ref is rolled into the new function devpts_ptmx. And the
    unnecessary inode hold is removed.
    - devpts_del_ref is renamed devpts_release and reduced to just a
    deactivate_super.
    - The newinstance mount option continues to be accepted but is now
    ignored.

    In devpts_fs.h, the definitions for when !CONFIG_UNIX98_PTYS are removed,
    as they are never used.

    Documentation/filesystems/devices.txt is updated to describe the current
    situation.

    This has been verified to work properly on openwrt-15.05, centos5,
    centos6, centos7, debian-6.0.2, debian-7.9, debian-8.2, ubuntu-14.04.3,
    ubuntu-15.10, fedora23, mageia-5, mint-17.3, opensuse-42.1,
    slackware-14.1, gentoo-20151225 (13.0?), archlinux-2015-12-01, with the
    caveat that on centos6 and slackware-14.1 two instances of the devpts
    filesystem wind up mounted on /dev/pts, and the lower copy does not end
    up getting used.

    Signed-off-by: "Eric W. Biederman"
    Cc: Greg KH
    Cc: Peter Hurley
    Cc: Peter Anvin
    Cc: Andy Lutomirski
    Cc: Al Viro
    Cc: Serge Hallyn
    Cc: Willy Tarreau
    Cc: Aurelien Jarno
    Cc: One Thousand Gnomes
    Cc: Jann Horn
    Cc: Jiri Slaby
    Cc: Florian Weimer
    Cc: Konstantin Khlebnikov
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

05 Jun, 2016

12 commits

  • Signed-off-by: Helge Deller

    Helge Deller
     
  • One of the debian buildd servers had this crash in the syslog without
    any other information:

    Unaligned handler failed, ret = -2
    clock_adjtime (pid 22578): Unaligned data reference (code 28)
    CPU: 1 PID: 22578 Comm: clock_adjtime Tainted: G E 4.5.0-2-parisc64-smp #1 Debian 4.5.4-1
    task: 000000007d9960f8 ti: 00000001bde7c000 task.ti: 00000001bde7c000

    YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
    PSW: 00001000000001001111100000001111 Tainted: G E
    r00-03 000000ff0804f80f 00000001bde7c2b0 00000000402d2be8 00000001bde7c2b0
    r04-07 00000000409e1fd0 00000000fa6f7fff 00000001bde7c148 00000000fa6f7fff
    r08-11 0000000000000000 00000000ffffffff 00000000fac9bb7b 000000000002b4d4
    r12-15 000000000015241c 000000000015242c 000000000000002d 00000000fac9bb7b
    r16-19 0000000000028800 0000000000000001 0000000000000070 00000001bde7c218
    r20-23 0000000000000000 00000001bde7c210 0000000000000002 0000000000000000
    r24-27 0000000000000000 0000000000000000 00000001bde7c148 00000000409e1fd0
    r28-31 0000000000000001 00000001bde7c320 00000001bde7c350 00000001bde7c218
    sr00-03 0000000001200000 0000000001200000 0000000000000000 0000000001200000
    sr04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000

    IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000402d2e84 00000000402d2e88
    IIR: 0ca0d089 ISR: 0000000001200000 IOR: 00000000fa6f7fff
    CPU: 1 CR30: 00000001bde7c000 CR31: ffffffffffffffff
    ORIG_R28: 00000002369fe628
    IAOQ[0]: compat_get_timex+0x2dc/0x3c0
    IAOQ[1]: compat_get_timex+0x2e0/0x3c0
    RP(r2): compat_get_timex+0x40/0x3c0
    Backtrace:
    [] compat_SyS_clock_adjtime+0x40/0xc0
    [] syscall_exit+0x0/0x14

    This means the userspace program clock_adjtime called the clock_adjtime()
    syscall and then crashed inside the compat_get_timex() function.
    Syscalls should never crash programs, but instead return EFAULT.

    The IIR register contains the executed instruction, which disassembles
    into "ldw 0(sr3,r5),r9".
    This load-word instruction is part of __get_user(), which tried to read the word
    at %r5/IOR (0xfa6f7fff). This means the unaligned handler jumped in. The
    unaligned handler is able to emulate all ldw instructions, but it fails if it
    cannot read the source, e.g. because of a page fault.

    The following program reproduces the problem:

    #define _GNU_SOURCE
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void)
    {
        /* allocate 8k */
        char *ptr = mmap(NULL, 2 * 4096, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        /* free second half (upper 4k) and make it invalid. */
        munmap(ptr + 4096, 4096);
        /* syscall where first int is unaligned and clobbers into invalid memory region */
        /* syscall should return EFAULT */
        return syscall(__NR_clock_adjtime, 0, ptr + 4095);
    }

    To fix this issue we simply need to check if the faulting instruction address
    is in the exception fixup table when the unaligned handler failed. If it
    is, call the fixup routine instead of crashing.

    While looking at the unaligned handler I found another issue as well: The
    target register should not be modified if the handler was unsuccessful.
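
    Both points can be sketched in user space as follows. The table contents, struct ex_entry, emulate_ldw and handle_unaligned_failure are all hypothetical; the kernel has its own exception-table search (for example search_exception_tables()) rather than a local array. The sketch shows a failed emulation leaving the target register untouched, and the faulting address being looked up in a sorted fixup table so that the caller can return -EFAULT instead of crashing.

    ```c
    #include <assert.h>
    #include <errno.h>
    #include <stdlib.h>

    /* Hypothetical exception-table entry: faulting insn address -> fixup address. */
    struct ex_entry {
        unsigned long insn;
        unsigned long fixup;
    };

    /* Must stay sorted by insn for bsearch(); fixup addresses are made up. */
    static const struct ex_entry ex_table[] = {
        { 0x402d2e84UL, 0x402d2f00UL },
        { 0x40300010UL, 0x40300080UL },
    };

    static int cmp_ex_entry(const void *key, const void *elt)
    {
        unsigned long addr = *(const unsigned long *)key;
        const struct ex_entry *e = elt;

        return (addr > e->insn) - (addr < e->insn);
    }

    static const struct ex_entry *search_ex_table(unsigned long addr)
    {
        return bsearch(&addr, ex_table, sizeof(ex_table) / sizeof(ex_table[0]),
                       sizeof(ex_table[0]), cmp_ex_entry);
    }

    /* Emulate a big-endian unaligned word load; *target is written only on success. */
    static int emulate_ldw(const unsigned char *src, int readable, unsigned int *target)
    {
        if (!readable)
            return -EFAULT;                 /* e.g. the page is not mapped */
        *target = (unsigned int)src[0] << 24 | (unsigned int)src[1] << 16 |
                  (unsigned int)src[2] << 8 | (unsigned int)src[3];
        return 0;
    }

    /*
     * Called when emulation failed: if the faulting PC has a fixup entry,
     * resume there with -EFAULT in *err; a return of 0 means "no fixup",
     * the case in which the kernel would still kill the task.
     */
    static unsigned long handle_unaligned_failure(unsigned long pc, long *err)
    {
        const struct ex_entry *e = search_ex_table(pc);

        if (!e)
            return 0;
        *err = -EFAULT;
        return e->fixup;
    }
    ```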

    Signed-off-by: Helge Deller
    Cc: stable@vger.kernel.org

    Helge Deller
     
  • Avoid showing invalid printk time stamps during boot.

    Signed-off-by: Helge Deller
    Reviewed-by: Aaro Koskinen

    Helge Deller
     
  • This patch fixes the backtrace on PA-RISC.

    There were several problems:

    1) The code that decodes instructions handles instructions that subtract
    from the stack pointer incorrectly. If the instruction subtracts the
    number X from the stack pointer the code increases the frame size by
    (0x100000000-X). This results in invalid accesses to memory and
    recursive page faults.

    2) Because gcc reorders blocks, handling instructions that subtract from
    the frame pointer is incorrect. For example, this function
    void g(void);
    int f(int a)
    {
        if (__builtin_expect(a, 1))
            return a;
        g();
        return a;
    }
    is compiled in such a way that the code that decreases the stack
    pointer for the first "return a" is placed before the code for the "g" call.
    If we recognize this decrement, we mistakenly believe that the frame
    size for the "g" call is zero.

    To fix problems 1) and 2), the patch doesn't recognize instructions that
    decrease the stack pointer at all. To further safeguard the unwind code
    against nonsense values, we don't allow frame size larger than
    Total_frame_size.
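
    A toy model of problem 1) and of the fix, as a sketch only: each decoded prologue instruction is reduced to a signed stack-pointer adjustment (real PA-RISC instruction decoding is omitted), and total_frame_size stands in for the unwind entry's Total_frame_size. frame_size_buggy and frame_size_fixed are hypothetical names.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Bug model: a decrement of X is added to the frame size as 0x100000000 - X. */
    static uint64_t frame_size_buggy(const int32_t *delta, int n)
    {
        uint64_t size = 0;

        for (int i = 0; i < n; i++)
            size += (uint32_t)delta[i];     /* -X converts to 0x100000000 - X */
        return size;
    }

    /* Fix model: ignore stack-pointer decrements and clamp to the unwind entry. */
    static uint64_t frame_size_fixed(const int32_t *delta, int n, uint64_t total_frame_size)
    {
        uint64_t size = 0;

        for (int i = 0; i < n; i++)
            if (delta[i] > 0)               /* decrements are not recognized at all */
                size += (uint64_t)delta[i];
        /* never trust a frame larger than the unwind table says is possible */
        return size > total_frame_size ? total_frame_size : size;
    }
    ```

    With adjustments {+128, -64}, the buggy version yields 0x100000040, a nonsense frame size that would send the unwinder to invalid memory, while the fixed version yields 128.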

    3) The backtrace is not locked. If a stack dump races with a module unload,
    an invalid table can be accessed.

    This patch adds a spinlock when processing module tables.

    Note that a correct backtrace requires recent binutils.
    Binutils 2.18 from Debian 5 produces garbage unwind tables.
    Binutils 2.21 works better (it sometimes forgets function frames, but at
    least it doesn't generate garbage).

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Helge Deller

    Mikulas Patocka
     
  • Pull drm fixes from Dave Airlie:
    "A bunch of ARM drivers got into the fixes vibe this time around, so
    this contains a bunch of fixes for imx, atmel hlcdc, arm hdlcd (only
    so many combos of hlcd), mediatek and omap drm.

    Other than that there is one mgag200 fix and a few core drm regression
    fixes"

    * tag 'drm-fixes-for-v4.7-rc2' of git://people.freedesktop.org/~airlied/linux: (34 commits)
    drm/omap: fix unused variable warning.
    drm: hdlcd: Add information about the underlying framebuffers in debugfs
    drm: hdlcd: Cleanup the atomic plane operations
    drm/hdlcd: Fix up crtc_state->event handling
    drm: hdlcd: Revamp runtime power management
    drm/mediatek: mtk_dsi: Remove spurious drm_connector_unregister
    drm/mediatek: mtk_dpi: remove invalid error message
    drm: atmel-hlcdc: fix a NULL check
    drm: atmel-hlcdc: fix atmel_hlcdc_crtc_reset() implementation
    drm/mgag200: Black screen fix for G200e rev 4
    drm: Wrap direct calls to driver->gem_free_object from CMA
    drm: fix fb refcount issue with atomic modesetting
    drm: make drm_atomic_set_mode_prop_for_crtc() more reliable
    drm/sti: remove extra mode fixup
    drm: add missing drm_mode_set_crtcinfo call
    drm/omap: include gpio/consumer.h where needed
    drm/omap: include linux/seq_file.h where needed
    Revert "drm/omap: no need to select OMAP2_DSS"
    drm/omap: Remove regulator API abuse
    OMAPDSS: HDMI5: Change DDC timings
    ...

    Linus Torvalds
     
  • Pull VFIO fixes from Alex Williamson:
    "Fix irqfd shutdown ordering, build warning, and VPD short read"

    * tag 'vfio-v4.7-rc2' of git://github.com/awilliam/linux-vfio:
    vfio/pci: Allow VPD short read
    vfio/type1: Fix build warning
    vfio/pci: Fix ordering of eventfd vs virqfd shutdown

    Linus Torvalds
     
  • Pull MMC fixes from Ulf Hansson:
    "MMC core:
    - Fix/restore behaviour when selecting bus width for (e)MMC

    MMC host:
    - sunxi: Fix eMMC HS-DDR modes on Allwinner A80"

    * tag 'mmc-v4.7-rc1-2' of git://git.linaro.org/people/ulf.hansson/mmc:
    mmc: sunxi: Re-enable eMMC HS-DDR modes on Allwinner A80
    mmc: sunxi: Fix DDR MMC timings for A80
    mmc: fix mmc mode selection for HS-DDR and higher

    Linus Torvalds
     
  • Pull btrfs fixes from Chris Mason:
    "The important part of this pull is Filipe's set of fixes for btrfs
    device replacement. Filipe fixed a few issues seen on the list and a
    number he found on his own"

    * 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: deal with duplicates during extent_map insertion in btrfs_get_extent
    Btrfs: fix race between device replace and read repair
    Btrfs: fix race between device replace and discard
    Btrfs: fix race between device replace and chunk allocation
    Btrfs: fix race setting block group back to RW mode during device replace
    Btrfs: fix unprotected assignment of the left cursor for device replace
    Btrfs: fix race setting block group readonly during device replace
    Btrfs: fix race between device replace and block group removal
    Btrfs: fix race between readahead and device replace/removal

    Linus Torvalds
     
  • Pull Ceph fixes from Sage Weil:
    "We have a few follow-up fixes for the libceph refactor from Ilya, and
    then some cephfs + fscache fixes from Zheng.

    The first two FS-Cache patches are acked by David Howells and deemed
    trivial enough to go through our tree. The rest fix some issues with
    the ceph fscache handling (disable cache for inodes opened for write,
    and simplify the revalidation logic accordingly, dropping the
    now-unnecessary work queue)"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    ceph: use i_version to check validity of fscache
    ceph: improve fscache revalidation
    ceph: disable fscache when inode is opened for write
    ceph: avoid unnecessary fscache invalidation/revalidation
    ceph: call __fscache_uncache_page() if readpages fails
    FS-Cache: make check_consistency callback return int
    FS-Cache: wake write waiter after invalidating writes
    libceph: use %s instead of %pE in dout()s
    libceph: put request only if it's done in handle_reply()
    libceph: change ceph_osdmap_flag() to take osdc

    Linus Torvalds
     
  • Pull ACPI fixes from Rafael Wysocki:
    "Two fixes for problems introduced recently (ACPICA and the ACPI
    backlight driver) and one fix for an older issue that prevents at
    least one system from booting.

    Specifics:

    - Fix an incorrect check introduced by recent ACPICA changes which
    causes problems with booting KVM guests to happen, among other
    things (Lv Zheng).

    - Fix a backlight issue introduced by recent changes to the ACPI
    video driver (Aaron Lu).

    - Fix the ACPI processor initialization which attempts to register an
    IO region without checking if that really is necessary and
    sometimes prevents drivers loaded subsequently from registering
    their resources which leads to boot issues (Rafael Wysocki)"

    * tag 'acpi-4.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    ACPI / processor: Avoid reserving IO regions too early
    ACPICA / Hardware: Fix old register check in acpi_hw_get_access_bit_width()
    ACPI / Thermal / video: fix max_level incorrect value

    Linus Torvalds
     
  • Pull power management fixes from Rafael Wysocki:
    "Two fixes for problems introduced recently in the cpufreq core and the
    intel_pstate driver.

    Specifics:

    - Fix a silly mistake related to the clamp_val() usage in a function
    added by a recent commit (Rafael Wysocki).

    - Reduce the log level of an annoying message added to intel_pstate
    during the recent merge window (Srinivas Pandruvada)"

    * tag 'pm-4.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    cpufreq: Fix clamp_val() usage in cpufreq_driver_fast_switch()
    cpufreq: intel_pstate: Downgrade print level for _PPC

    Linus Torvalds
     
  • Merge various fixes from Andrew Morton:
    "10 fixes"

    * emailed patches from Andrew Morton :
    mm, page_alloc: recalculate the preferred zoneref if the context can ignore memory policies
    mm, page_alloc: reset zonelist iterator after resetting fair zone allocation policy
    mm, oom_reaper: do not use siglock in try_oom_reaper()
    mm, page_alloc: prevent infinite loop in buffered_rmqueue()
    checkpatch: reduce git commit description style false positives
    mm/z3fold.c: avoid modifying HEADLESS page and minor cleanup
    memcg: add RCU locking around css_for_each_descendant_pre() in memcg_offline_kmem()
    mm: check the return value of lookup_page_ext for all call sites
    kdump: fix dmesg gdbmacro to work with record based printk
    mm: fix overflow in vm_map_ram()

    Linus Torvalds
     

04 Jun, 2016

18 commits

  • Pull irq fixes from Thomas Gleixner:
    - a few simple fixes for fallout from the recent gic-v3 changes
    - a workaround for a Cavium thunderX erratum
    - a bugfix for the pic32 irqchip to make external interrupts work properly
    - a missing return value in the generic IPI management code

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    irqchip/irq-pic32-evic: Fix bug with external interrupts.
    irqchip/gicv3-its: numa: Enable workaround for Cavium thunderx erratum 23144
    irqchip/gic-v3: Fix quiescence check in gic_enable_redist
    irqchip/gic-v3: Fix copy+paste mistakes in defines
    irqchip/gic-v3: Fix ICC_SGI1R_EL1.INTID decoding mask
    genirq: Fix missing return value in irq_destroy_ipi()

    Linus Torvalds
     
  • The optimistic fast path may use cpuset_current_mems_allowed instead
    of a NULL nodemask supplied by the caller for cpuset allocations. The
    preferred zone is calculated on this basis for statistics purposes and as
    a starting point in the zonelist iterator.

    However, if the context can ignore memory policies due to being atomic
    or being able to ignore watermarks then the starting point in the
    zonelist iterator is no longer correct. This patch resets the zonelist
    iterator in the allocator slowpath if the context can ignore memory
    policies. This will alter the zone used for statistics but only after
    it is known that it makes sense for that context. Resetting it before
    entering the slowpath would potentially allow an ALLOC_CPUSET allocation
    to be accounted for against the wrong zone. Note that while nodemask is
    not explicitly set to the original nodemask, it would only have been
    overwritten if cpuset_enabled() and it was reset before the slowpath was
    entered.

    Link: http://lkml.kernel.org/r/20160602103936.GU2527@techsingularity.net
    Fixes: c33d6c06f60f710 ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
    Signed-off-by: Mel Gorman
    Reported-by: Geert Uytterhoeven
    Tested-by: Geert Uytterhoeven
    Acked-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Geert Uytterhoeven reported the following problem that bisected to
    commit c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone
    in a zonelist twice") on m68k/ARAnyM

    BUG: scheduling while atomic: cron/668/0x10c9a0c0
    Modules linked in:
    CPU: 0 PID: 668 Comm: cron Not tainted 4.6.0-atari-05133-gc33d6c06f60f710f #364
    Call Trace: [] __schedule_bug+0x40/0x54
    __schedule+0x312/0x388
    __schedule+0x0/0x388
    prepare_to_wait+0x0/0x52
    schedule+0x64/0x82
    schedule_timeout+0xda/0x104
    set_next_entity+0x18/0x40
    pick_next_task_fair+0x78/0xda
    io_schedule_timeout+0x36/0x4a
    bit_wait_io+0x0/0x40
    bit_wait_io+0x12/0x40
    __wait_on_bit+0x46/0x76
    wait_on_page_bit_killable+0x64/0x6c
    bit_wait_io+0x0/0x40
    wake_bit_function+0x0/0x4e
    __lock_page_or_retry+0xde/0x124
    do_scan_async+0x114/0x17c
    lookup_swap_cache+0x24/0x4e
    handle_mm_fault+0x626/0x7de
    find_vma+0x0/0x66
    down_read+0x0/0xe
    wait_on_page_bit_killable_timeout+0x77/0x7c
    find_vma+0x16/0x66
    do_page_fault+0xe6/0x23a
    res_func+0xa3c/0x141a
    buserr_c+0x190/0x6d4
    res_func+0xa3c/0x141a
    buserr+0x20/0x28
    res_func+0xa3c/0x141a
    buserr+0x20/0x28

    The relationship is not obvious but it's due to a failure to rescan the
    full zonelist after the fair zone allocation policy exhausts the batch
    count. While this is a functional problem, it's also a performance
    issue. A page allocator microbenchmark showed the following

                              4.7.0-rc1           4.7.0-rc1
                              vanilla             reset-v1r2
    Min alloc-odr0-1      327.00 (  0.00%)    326.00 (  0.31%)
    Min alloc-odr0-2      235.00 (  0.00%)    235.00 (  0.00%)
    Min alloc-odr0-4      198.00 (  0.00%)    198.00 (  0.00%)
    Min alloc-odr0-8      170.00 (  0.00%)    170.00 (  0.00%)
    Min alloc-odr0-16     156.00 (  0.00%)    156.00 (  0.00%)
    Min alloc-odr0-32     150.00 (  0.00%)    150.00 (  0.00%)
    Min alloc-odr0-64     146.00 (  0.00%)    146.00 (  0.00%)
    Min alloc-odr0-128    145.00 (  0.00%)    145.00 (  0.00%)
    Min alloc-odr0-256    155.00 (  0.00%)    155.00 (  0.00%)
    Min alloc-odr0-512    168.00 (  0.00%)    165.00 (  1.79%)
    Min alloc-odr0-1024   175.00 (  0.00%)    174.00 (  0.57%)
    Min alloc-odr0-2048   180.00 (  0.00%)    180.00 (  0.00%)
    Min alloc-odr0-4096   187.00 (  0.00%)    186.00 (  0.53%)
    Min alloc-odr0-8192   190.00 (  0.00%)    190.00 (  0.00%)
    Min alloc-odr0-16384  191.00 (  0.00%)    191.00 (  0.00%)
    Min alloc-odr1-1      736.00 (  0.00%)    445.00 ( 39.54%)
    Min alloc-odr1-2      343.00 (  0.00%)    335.00 (  2.33%)
    Min alloc-odr1-4      277.00 (  0.00%)    270.00 (  2.53%)
    Min alloc-odr1-8      238.00 (  0.00%)    233.00 (  2.10%)
    Min alloc-odr1-16     224.00 (  0.00%)    218.00 (  2.68%)
    Min alloc-odr1-32     210.00 (  0.00%)    208.00 (  0.95%)
    Min alloc-odr1-64     207.00 (  0.00%)    203.00 (  1.93%)
    Min alloc-odr1-128    276.00 (  0.00%)    202.00 ( 26.81%)
    Min alloc-odr1-256    206.00 (  0.00%)    202.00 (  1.94%)
    Min alloc-odr1-512    207.00 (  0.00%)    202.00 (  2.42%)
    Min alloc-odr1-1024   208.00 (  0.00%)    205.00 (  1.44%)
    Min alloc-odr1-2048   213.00 (  0.00%)    212.00 (  0.47%)
    Min alloc-odr1-4096   218.00 (  0.00%)    216.00 (  0.92%)
    Min alloc-odr1-8192   341.00 (  0.00%)    219.00 ( 35.78%)

    Note that order-0 allocations are unaffected but higher orders get a
    small boost from this patch and a large reduction in system CPU usage
    overall as can be seen here:

                  4.7.0-rc1  4.7.0-rc1
                    vanilla reset-v1r2
    User          85.32      86.31
    System      2221.39    2053.36
    Elapsed     2368.89    2202.47

    Fixes: c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
    Link: http://lkml.kernel.org/r/20160531100848.GR2527@techsingularity.net
    Signed-off-by: Mel Gorman
    Reported-by: Geert Uytterhoeven
    Tested-by: Geert Uytterhoeven
    Tested-by: Mikulas Patocka
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Oleg has noted that siglock usage in try_oom_reaper is both pointless
    and dangerous. signal_group_exit can be checked lockless. The problem
    is that sighand becomes NULL in __exit_signal so we can crash.

    Fixes: 3ef22dfff239 ("oom, oom_reaper: try to reap tasks which skip regular OOM killer path")
    Link: http://lkml.kernel.org/r/1464679423-30218-1-git-send-email-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Suggested-by: Oleg Nesterov
    Cc: Tetsuo Handa
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • In a DEBUG_VM kernel, we can hit an infinite loop for order == 0 in
    buffered_rmqueue() when check_new_pcp() returns 1, because the bad page
    is never removed from the pcp list. Fix this by removing the page
    before retrying. Also we don't need to check if page is non-NULL,
    because we simply grab it from the list which was just tested for being
    non-empty.
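
    The retry loop and its fix can be modelled in user space with an array-backed stack standing in for the per-CPU (pcp) list. pcp_list, check_new_page and rmqueue_pcp are hypothetical names, and returning -1 on an empty list replaces the kernel's refill path. The key point is that the page is unlinked before it is checked, so a bad page cannot be picked again on the next iteration.

    ```c
    #include <assert.h>

    /* Toy per-CPU page list: a LIFO stack of page ids. */
    struct pcp_list {
        int pages[8];
        int count;
    };

    /* Hypothetical stand-in for check_new_pcp(): nonzero means "bad page". */
    static int check_new_page(int page)
    {
        return page < 0;
    }

    /* Returns a good page id, or -1 once the list runs dry. */
    static int rmqueue_pcp(struct pcp_list *pcp)
    {
        int page;

        do {
            if (pcp->count == 0)
                return -1;                       /* the kernel would refill here */
            page = pcp->pages[pcp->count - 1];
            pcp->count--;                        /* remove first, then check */
        } while (check_new_page(page));
        return page;
    }
    ```

    If the pcp->count-- line ran only after a successful check, a bad page at the head of the list would be fetched forever, which is the loop described above.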

    Fixes: 479f854a207c ("mm, page_alloc: defer debugging checks of pages allocated from the PCP")
    Link: http://lkml.kernel.org/r/20160530090154.GM2527@techsingularity.net
    Signed-off-by: Vlastimil Babka
    Signed-off-by: Mel Gorman
    Reported-by: Naoya Horiguchi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • Some lines in a commit log appear to be commit SHA1 ids like:

    ERROR: Please use git commit description style 'commit ("")' - ie: 'commit 0123456789ab ("commit description")'
    Link: http://lkml.kernel.org/r/40e03fd7aaf1f55c75d787128d6d17c5a71226c2.1464358556.git.vdavydov@virtuozzo.com

    Reduce the false positives.

    Link: http://lkml.kernel.org/r/eda977eaa8328fef42bb3c87935d97e10ea8ff67.1464384023.git.joe@perches.com
    Signed-off-by: Joe Perches
    Reported-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Fix erroneous z3fold header access in a HEADLESS page in reclaim
    function, and change one remaining direct handle-to-buddy conversion to
    use the appropriate helper.

    Link: http://lkml.kernel.org/r/5748706F.9020208@gmail.com
    Signed-off-by: Vitaly Wool
    Reviewed-by: Dan Streetman
    Cc: Seth Jennings
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     
  • memcg_offline_kmem() may be called from memcg_free_kmem() after a css
    init failure. memcg_free_kmem() is a ->css_free callback which is
    called without cgroup_mutex and memcg_offline_kmem() ends up using
    css_for_each_descendant_pre() without any locking. Fix it by adding rcu
    read locking around it.

    mkdir: cannot create directory `65530': No space left on device
    ===============================
    [ INFO: suspicious RCU usage. ]
    4.6.0-work+ #321 Not tainted
    -------------------------------
    kernel/cgroup.c:4008 cgroup_mutex or RCU read lock required!
    [ 527.243970] other info that might help us debug this:
    [ 527.244715]
    rcu_scheduler_active = 1, debug_locks = 0
    2 locks held by kworker/0:5/1664:
    #0: ("cgroup_destroy"){.+.+..}, at: [] process_one_work+0x165/0x4a0
    #1: ((&css->destroy_work)#3){+.+...}, at: [] process_one_work+0x165/0x4a0
    [ 527.248098] stack backtrace:
    CPU: 0 PID: 1664 Comm: kworker/0:5 Not tainted 4.6.0-work+ #321
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
    Workqueue: cgroup_destroy css_free_work_fn
    Call Trace:
    dump_stack+0x68/0xa1
    lockdep_rcu_suspicious+0xd7/0x110
    css_next_descendant_pre+0x7d/0xb0
    memcg_offline_kmem.part.44+0x4a/0xc0
    mem_cgroup_css_free+0x1ec/0x200
    css_free_work_fn+0x49/0x5e0
    process_one_work+0x1c5/0x4a0
    worker_thread+0x49/0x490
    kthread+0xea/0x100
    ret_from_fork+0x1f/0x40

    Link: http://lkml.kernel.org/r/20160526203018.GG23194@mtj.duckdns.org
    Signed-off-by: Tejun Heo
    Acked-by: Vladimir Davydov
    Acked-by: Johannes Weiner
    Cc: Michal Hocko
    Cc: [4.5+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • Pull timer bugfix from Thomas Gleixner:
    "A single bugfix for the error check wreckage we introduced in the
    merge window"

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    time: Make settimeofday error checking work again

    Linus Torvalds
     
  • Per the discussion with Joonsoo Kim [1], we need to check the return value
    of lookup_page_ext() at all call sites, since it might return NULL in
    some cases, e.g. memory hotplug, although that is unlikely.

    Tested with ltp with "page_owner=0".

    [1] http://lkml.kernel.org/r/20160519002809.GA10245@js1304-P5Q-DELUXE

    [akpm@linux-foundation.org: fix build-breaking typos]
    [arnd@arndb.de: fix build problems from lookup_page_ext]
    Link: http://lkml.kernel.org/r/6285269.2CksypHdYp@wuerfel
    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/1464023768-31025-1-git-send-email-yang.shi@linaro.org
    Signed-off-by: Yang Shi
    Signed-off-by: Arnd Bergmann
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yang Shi
     
  • Commit 7ff9554bb578 ("printk: convert byte-buffer to variable-length
    record buffer") introduced a record based printk buffer. Modify
    gdbmacros.txt to parse this new structure so dmesg will work properly.

    Link: http://lkml.kernel.org/r/1463515794-1599-1-git-send-email-minyard@acm.org
    Signed-off-by: Corey Minyard
    Cc: Dave Young
    Cc: Baoquan He
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Corey Minyard
     
  • When remapping pages accounting for 4G or more memory space, the
    operation 'count << PAGE_SHIFT' overflows as it is performed on an
    integer. Solution: cast before doing the bitshift.
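
    A minimal sketch of the overflow, assuming 4 KiB pages (PAGE_SHIFT = 12), an unsigned int page count, and a 64-bit unsigned long (LP64). size_buggy and size_fixed are hypothetical helpers, not kernel functions.

    ```c
    #include <assert.h>

    #define PAGE_SHIFT 12   /* assumption: 4 KiB pages */

    /* Buggy: the shift happens in 32-bit unsigned int and wraps at 4 GiB. */
    static unsigned long size_buggy(unsigned int count)
    {
        return count << PAGE_SHIFT;
    }

    /* Fixed: widen to unsigned long before shifting, as the patch does with a cast. */
    static unsigned long size_fixed(unsigned int count)
    {
        return (unsigned long)count << PAGE_SHIFT;
    }
    ```

    With count = 1 << 20 pages (exactly 4 GiB), the unsigned int shift wraps to 0, while the widened shift yields 4294967296.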

    [akpm@linux-foundation.org: fix vm_unmap_ram() also]
    [akpm@linux-foundation.org: fix vmap() as well, per Guillermo]
    Link: http://lkml.kernel.org/r/etPan.57175fb3.7a271c6b.2bd@naudit.es
    Signed-off-by: Guillermo Julián Moreno
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Guillermo Julián Moreno
     
  • Pull ARM fix from Russell King:
    "Just one fix to the ptrace code, spotted by Simon Marchi, where if a
    thread migrates to a different CPU and the VFP registers are changed
    through ptrace, the application doesn't see the updated VFP registers"

    * 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
    ARM: fix PTRACE_SETVFPREGS on SMP systems

    Linus Torvalds
     
  • Pull arm64 fixes from Will Deacon:
    "The main thing here is reviving hugetlb support using contiguous ptes,
    which we ended up reverting at the last minute in 4.5 pending a fix
    which went into the core mm/ code during the recent merge window.

    - Revert a previous revert and get hugetlb going with contiguous hints
    - Wire up missing compat syscalls
    - Enable CONFIG_SET_MODULE_RONX by default
    - Add missing line to our compat /proc/cpuinfo output
    - Clarify levels in our page table dumps
    - Fix booting with RANDOMIZE_TEXT_OFFSET enabled
    - Misc fixes to the ARM CPU PMU driver (refcounting, probe failure)
    - Remove some dead code and update a comment"

    * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
    arm64: fix alignment when RANDOMIZE_TEXT_OFFSET is enabled
    arm64: move {PAGE,CONT}_SHIFT into Kconfig
    arm64: mm: dump: log span level
    arm64: update stale PAGE_OFFSET comment
    drivers/perf: arm_pmu: Avoid leaking pmu->irq_affinity on error
    drivers/perf: arm_pmu: Defer the setting of __oprofile_cpu_pmu
    drivers/perf: arm_pmu: Fix reference count of a device_node in of_pmu_irq_cfg
    arm64: report CPU number in bad_mode
    arm64: unistd32.h: wire up missing syscalls for compat tasks
    arm64: Provide "model name" in /proc/cpuinfo for PER_LINUX32 tasks
    arm64: enable CONFIG_SET_MODULE_RONX by default
    arm64: Remove orphaned __addr_ok() definition
    Revert "arm64: hugetlb: partial revert of 66b3923a1a0f"

    Linus Torvalds
     
  • Pull powerpc fixes from Michael Ellerman:
    - Handle RTAS delay requests in configure_bridge from Russell Currey
    - Refactor the configure_bridge RTAS tokens from Russell Currey
    - Fix definition of SIAR and SDAR registers from Thomas Huth
    - Use privileged SPR number for MMCR2 from Thomas Huth
    - Update LPCR only if it is powernv from Aneesh Kumar K.V
    - Fix the reference bit update when handling hash fault from Aneesh
    Kumar K.V
    - Add missing tlb flush from Aneesh Kumar K.V
    - Add POWER8NVL support to ibm,client-architecture-support call from
    Thomas Huth

    * tag 'powerpc-4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
    powerpc/pseries: Add POWER8NVL support to ibm,client-architecture-support call
    powerpc/mm/radix: Add missing tlb flush
    powerpc/mm/hash: Fix the reference bit update when handling hash fault
    powerpc/mm/radix: Update LPCR only if it is powernv
    powerpc: Use privileged SPR number for MMCR2
    powerpc: Fix definition of SIAR and SDAR registers
    powerpc/pseries/eeh: Refactor the configure_bridge RTAS tokens
    powerpc/pseries/eeh: Handle RTAS delay requests in configure_bridge

    Linus Torvalds
     
  • * acpica-fixes:
    ACPICA / Hardware: Fix old register check in acpi_hw_get_access_bit_width()

    * acpi-video:
    ACPI / Thermal / video: fix max_level incorrect value

    * acpi-processor:
    ACPI / processor: Avoid reserving IO regions too early

    Rafael J. Wysocki
     
  • * pm-cpufreq-fixes:
    cpufreq: Fix clamp_val() usage in cpufreq_driver_fast_switch()
    cpufreq: intel_pstate: Downgrade print level for _PPC

    Rafael J. Wysocki
     
  • When dealing with inline extents, btrfs_get_extent will incorrectly try
    to insert a duplicate extent_map. The dup hits -EEXIST from
    add_extent_map, but then we try to merge with the existing one and end
    up trying to insert a zero length extent_map.
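    The failure mode described above can be modeled in userspace C. This is a simplified sketch, not btrfs code. `struct em_sketch`, `em_insert()` and `em_trim_to_gap()` are hypothetical stand-ins for extent_map and add_extent_map(), and a flat array replaces the real rbtree. Inserting an exact duplicate returns -EEXIST. Trimming the new map against the existing one then leaves a zero-length map. The 4096-byte extent in the usage is illustrative only.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct extent_map. */
struct em_sketch {
	uint64_t start;
	uint64_t len;
};

#define EM_MAX 16

/* Hypothetical flat "tree": an array of non-overlapping maps. */
struct em_tree_sketch {
	struct em_sketch maps[EM_MAX];
	size_t count;
};

/* Half-open interval overlap test; zero-length maps never overlap. */
static int em_overlaps(const struct em_sketch *a, const struct em_sketch *b)
{
	return a->start < b->start + b->len && b->start < a->start + a->len;
}

/*
 * Returns 0 on success, or -EEXIST if @em overlaps an existing map (which
 * is stored in *existing). Returns -ENOSPC if the sketch array is full.
 */
static int em_insert(struct em_tree_sketch *tree, const struct em_sketch *em,
		     const struct em_sketch **existing)
{
	for (size_t i = 0; i < tree->count; i++) {
		if (em_overlaps(&tree->maps[i], em)) {
			*existing = &tree->maps[i];
			return -EEXIST;
		}
	}
	if (tree->count == EM_MAX)
		return -ENOSPC;
	tree->maps[tree->count++] = *em;
	return 0;
}

/*
 * Simplified "merge": keep only the part of @em lying after @existing.
 * For an exact duplicate the result has len == 0.
 */
static struct em_sketch em_trim_to_gap(const struct em_sketch *em,
				       const struct em_sketch *existing)
{
	uint64_t em_end = em->start + em->len;
	uint64_t ex_end = existing->start + existing->len;
	struct em_sketch out = { .start = ex_end, .len = 0 };

	if (em_end > ex_end)
		out.len = em_end - ex_end;
	return out;
}
```

    Inserting the same inline map twice reproduces the sequence from the commit message: the first insert succeeds, the second returns -EEXIST, and trimming yields a map of length zero.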

    This actually works most of the time, except when there are extent maps
    past the end of the inline extent. rocksdb will trigger this sometimes
    because it preallocates an extent and then truncates down.

    Josef made a script to trigger it with xfs_io:

    #!/bin/bash

    xfs_io -f -c "pwrite 0 1000" inline
    xfs_io -c "falloc -k 4k 1M" inline
    xfs_io -c "pread 0 1000" -c "fadvise -d 0 1000" -c "pread 0 1000" inline
    xfs_io -c "fadvise -d 0 1000" inline
    cat inline

    You'll get EIOs trying to read inline after this because add_extent_map
    is returning EEXIST.

    Signed-off-by: Chris Mason

    Chris Mason
     

03 Jun, 2016

5 commits

  • …/arm-platforms into irq/urgent

    Merge irqchip updates from Marc Zyngier:

    - A number of embarrassing buglets (GICv3, PIC32)
    - A more substantial errata workaround for Cavium's GICv3 ITS
    (kept for post-rc1 due to its dependency on NUMA)

    Thomas Gleixner
     
  • With ARM64_64K_PAGES and RANDOMIZE_TEXT_OFFSET enabled, we hit the
    following issue on the boot:

    kernel BUG at arch/arm64/mm/mmu.c:480!
    Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
    Modules linked in:
    CPU: 0 PID: 0 Comm: swapper Not tainted 4.6.0 #310
    Hardware name: ARM Juno development board (r2) (DT)
    task: ffff000008d58a80 ti: ffff000008d30000 task.ti: ffff000008d30000
    PC is at map_kernel_segment+0x44/0xb0
    LR is at paging_init+0x84/0x5b0
    pc : [] lr : [] pstate: 600002c5

    Call trace:
    [] map_kernel_segment+0x44/0xb0
    [] paging_init+0x84/0x5b0
    [] setup_arch+0x198/0x534
    [] start_kernel+0x70/0x388
    [] __primary_switched+0x30/0x74

    Commit 7eb90f2ff7e3 ("arm64: cover the .head.text section in the .text
    segment mapping") removed the alignment between the .head.text and .text
    sections, and used the _text rather than the _stext interval for mapping
    the .text segment.

    Prior to this commit _stext was always section aligned and didn't cause
    any issue even when RANDOMIZE_TEXT_OFFSET was enabled. Since that
    alignment has been removed and _text is used to map the .text segment,
    we need to ensure _text is always page aligned when RANDOMIZE_TEXT_OFFSET
    is enabled.

    This patch adds logic to TEXT_OFFSET fuzzing to ensure that the offset
    is always aligned to the kernel page size. To ensure this, we rely on
    the PAGE_SHIFT being available via Kconfig.
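    The alignment rule can be illustrated in C. This is a hedged sketch, not the kernel's actual TEXT_OFFSET code. `fuzz_text_offset()` is a hypothetical helper. It bounds a random value below a hypothetical `max_offset`, then clears the bits below `1 << page_shift`. The result is therefore always a multiple of the page size: 4K (shift 12), 16K (shift 14) or 64K (shift 16).

```c
#include <stdint.h>

/*
 * Hypothetical helper: derive a page-aligned text offset from a random
 * value. @page_shift would come from something like CONFIG_ARM64_PAGE_SHIFT
 * and must be < 64. @max_offset (> 0) is an assumption of this sketch.
 */
static uint64_t fuzz_text_offset(uint64_t rnd, unsigned int page_shift,
				 uint64_t max_offset)
{
	uint64_t page_mask = (UINT64_C(1) << page_shift) - 1;

	/* Keep the value in range, then clear the sub-page bits. */
	return (rnd % max_offset) & ~page_mask;
}
```

    The same random input gives 0x12000 with 4K pages but 0x10000 with 64K pages. This is why the page shift must be known wherever the offset is generated.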

    Signed-off-by: Mark Rutland
    Reported-by: Sudeep Holla
    Cc: Ard Biesheuvel
    Cc: Catalin Marinas
    Cc: Will Deacon
    Fixes: 7eb90f2ff7e3 ("arm64: cover the .head.text section in the .text segment mapping")
    Signed-off-by: Will Deacon

    Mark Rutland
     
  • In some cases (e.g. the awk for CONFIG_RANDOMIZE_TEXT_OFFSET) we would
    like to make use of PAGE_SHIFT outside of code that can include the
    usual header files.

    Add a new CONFIG_ARM64_PAGE_SHIFT for this, likewise with
    ARM64_CONT_SHIFT for consistency.
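    With the shifts available as Kconfig symbols, a C header can derive its page constants from them. The sketch below shows that pattern. The `#ifndef` fallbacks (4K pages and 16-entry contiguous ranges) are assumptions added only so the fragment compiles standalone. In a kernel build the CONFIG_* values come from the generated configuration.

```c
/*
 * Derive page constants from Kconfig symbols. The fallbacks are
 * assumptions for standalone compilation (4K pages, CONT_SHIFT of 4).
 */
#ifndef CONFIG_ARM64_PAGE_SHIFT
#define CONFIG_ARM64_PAGE_SHIFT 12
#endif
#ifndef CONFIG_ARM64_CONT_SHIFT
#define CONFIG_ARM64_CONT_SHIFT 4
#endif

#define PAGE_SHIFT	CONFIG_ARM64_PAGE_SHIFT
#define CONT_SHIFT	CONFIG_ARM64_CONT_SHIFT
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
/* Size covered by one run of contiguous page entries. */
#define CONT_SIZE	(1UL << (CONT_SHIFT + PAGE_SHIFT))
```

    Because the values now originate in Kconfig, non-C consumers such as the TEXT_OFFSET awk can read CONFIG_ARM64_PAGE_SHIFT directly instead of duplicating it.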

    Signed-off-by: Mark Rutland
    Cc: Ard Biesheuvel
    Cc: Catalin Marinas
    Cc: Sudeep Holla
    Cc: Will Deacon
    Signed-off-by: Will Deacon

    Mark Rutland
     
  • The page table dump code logs spans of entries at the same level
    (pgd/pud/pmd/pte) which have the same attributes. While we log the
    (decoded) attributes, we don't log the level, which leaves the output
    ambiguous and/or confusing in some cases.

    For example:

    0xffff800800000000-0xffff800980000000 6G RW NX SHD AF BLK UXN MEM/NORMAL

    If using 4K pages, this may describe a span of six 1G block entries at the
    PGD/PUD level, or 3072 2M block entries at the PMD level.

    This patch adds the page table level to each output line, removing this
    ambiguity. For the example above, this will produce:

    0xffffffc800000000-0xffffffc980000000 6G PUD RW NX SHD AF BLK UXN MEM/NORMAL
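    The entry arithmetic and the level-tagged output line can be sketched in C. `span_entries()` and `format_span()` are hypothetical helpers modeled on the example lines above, not the kernel's dump code. `span_entries()` shows why a 6G span means 6 entries at a 1G (shift 30) level but 3072 at a 2M (shift 21) level. `format_span()` prints the size in the largest exact unit, followed by the level mnemonic.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Entries of size (1 << shift) needed to describe a span of @size bytes. */
static uint64_t span_entries(uint64_t size, unsigned int shift)
{
	return size >> shift;
}

/* Format one dump-style line: range, size, level mnemonic, attributes. */
static int format_span(char *buf, size_t len, uint64_t start, uint64_t end,
		       const char *level, const char *attrs)
{
	static const char units[] = "KMGTP";
	uint64_t size = (end - start) >> 10;	/* size in KiB */
	size_t u = 0;

	/* Use the largest unit that divides the size exactly. */
	while (size >= 1024 && size % 1024 == 0 && u + 1 < sizeof(units) - 1) {
		size >>= 10;
		u++;
	}
	return snprintf(buf, len, "0x%016llx-0x%016llx %llu%c %s %s",
			(unsigned long long)start, (unsigned long long)end,
			(unsigned long long)size, units[u], level, attrs);
}
```

    Formatting the range from the example with level "PUD" reproduces the line shown above.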

    When 3 level tables are in use, and we use the asm-generic/nopud.h
    definitions, the dump code treats each entry in the PGD as a 1 element
    table at the PUD level, and logs spans as being PUDs, which can be
    confusing. To counteract this, the "PUD" mnemonic is replaced with "PGD"
    when CONFIG_PGTABLE_LEVELS
    Cc: Catalin Marinas
    Cc: Huang Shijie
    Cc: Laura Abbott
    Cc: Steve Capper
    Cc: Will Deacon
    Signed-off-by: Will Deacon

    Mark Rutland
     
  • Commit ab893fb9f1b17f02 ("arm64: introduce KIMAGE_VADDR as the virtual
    base of the kernel region") logically split KIMAGE_VADDR from
    PAGE_OFFSET, and since commit f9040773b7bbbd9e ("arm64: move kernel
    image to base of vmalloc area") the two have been distinct values.

    Unfortunately, neither commit updated the comment above these
    definitions, which now erroneously states that PAGE_OFFSET is the start
    of the kernel image rather than the start of the linear mapping.

    This patch fixes said comment, and introduces an explanation of
    KIMAGE_VADDR.
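    A small C sketch of the split described above. It assumes the layout used by arm64 kernels of this period, where PAGE_OFFSET is all-ones shifted left by (VA_BITS - 1), placing the linear mapping in the upper half of the kernel VA range. Treat that formula as our assumption, not something stated in this commit. KIMAGE_VADDR lies below PAGE_OFFSET, near the base of the vmalloc area. Its exact value is not modeled here. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed: start of the linear mapping for a given VA_BITS (1..64). */
static uint64_t page_offset(unsigned int va_bits)
{
	return UINT64_MAX << (va_bits - 1);
}

/* Assumed: start of the kernel VA range for a given VA_BITS (< 64). */
static uint64_t va_start(unsigned int va_bits)
{
	return UINT64_MAX << va_bits;
}

/* True if @addr falls in the linear mapping, i.e. at or above PAGE_OFFSET. */
static bool in_linear_map(uint64_t addr, unsigned int va_bits)
{
	return addr >= page_offset(va_bits);
}
```

    Under this assumption, VA_BITS=39 gives PAGE_OFFSET 0xffffffc000000000 and VA_BITS=48 gives 0xffff800000000000. Both are consistent with the linear-map addresses in the page table dump examples above. The lower half of the kernel range, starting at va_start(), holds the vmalloc area and therefore the kernel image.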

    Signed-off-by: Mark Rutland
    Cc: Will Deacon
    Cc: Catalin Marinas
    Cc: Marc Zyngier
    Signed-off-by: Will Deacon

    Mark Rutland