04 Feb, 2017

1 commit

  • [ Upstream commit 003c941057eaa868ca6fedd29a274c863167230d ]

    Fix up a data alignment issue on sparc by swapping the order
    of the cookie byte array field with the length field in
    struct tcp_fastopen_cookie, and making it a proper union
    to clean up the typecasting.

    This addresses log complaints like these:
    log_unaligned: 113 callbacks suppressed
    Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360
    Kernel unaligned access at TPC[9764ac] tcp_try_fastopen+0x2ec/0x360
    Kernel unaligned access at TPC[9764c8] tcp_try_fastopen+0x308/0x360
    Kernel unaligned access at TPC[9764e4] tcp_try_fastopen+0x324/0x360
    Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360
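
    A minimal userspace sketch of the layout change described above. The
    struct names, the 16-byte cookie size and the uint64_t overlay are
    assumptions made for illustration and are not the kernel's definition.
    The point is only that putting the byte array first, inside a union
    with a wider type, gives it natural alignment without pointer casts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define COOKIE_MAX_SKETCH 16 /* assumed cookie size for this sketch */

/* Before: a one-byte length comes first, so val[] starts at offset 1 and
 * is misaligned for any wider access made through a pointer cast. */
struct cookie_before {
	int8_t  len;
	uint8_t val[COOKIE_MAX_SKETCH];
	bool    exp;
};

/* After: the cookie bytes come first and share storage with a wider type,
 * so the union (and therefore the struct) is aligned for that type. */
struct cookie_after {
	union {
		uint8_t  val[COOKIE_MAX_SKETCH];
		uint64_t words[COOKIE_MAX_SKETCH / sizeof(uint64_t)]; /* hypothetical overlay */
	};
	int8_t len;
	bool   exp;
};

static_assert(offsetof(struct cookie_before, val) == 1, "val starts at an odd offset");
static_assert(offsetof(struct cookie_after, val) == 0, "val starts the struct");
static_assert(_Alignof(struct cookie_after) >= _Alignof(uint64_t),
	      "struct inherits the alignment of the widest union member");
```

    On strict-alignment machines such as sparc, a 64-bit load from the
    "before" layout's val[] traps, which is what the log lines above report.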

    Cc: Eric Dumazet
    Signed-off-by: Shannon Nelson
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Shannon Nelson
     

01 Feb, 2017

5 commits

  • commit 8a1f780e7f28c7c1d640118242cf68d528c456cd upstream.

    online_{kernel|movable} is used to change the memory zone to
    ZONE_{NORMAL|MOVABLE} and online the memory.

    To check that the memory zone can be changed, zone_can_shift() is used.
    Currently the function returns a negative integer, a positive integer,
    or 0. When the function returns a negative or positive integer,
    it means that the memory zone can be changed to ZONE_{NORMAL|MOVABLE}.

    But when the function returns 0, there are two meanings.

    One of the meanings is that the memory zone does not need to be changed.
    For example, when memory is in ZONE_NORMAL and onlined by online_kernel
    the memory zone does not need to be changed.

    Another meaning is that the memory zone cannot be changed. When memory
    is in ZONE_NORMAL and onlined by online_movable, the memory zone may
    not be changed to ZONE_MOVABLE due to a memory online limitation (see
    Documentation/memory-hotplug.txt). In this case, memory must not be
    onlined.

    The patch changes the return type of zone_can_shift() so that memory
    online operation fails when memory zone cannot be changed as follows:

    Before applying patch:
    # grep -A 35 "Node 2" /proc/zoneinfo
    Node 2, zone Normal

    node_scanned 0
    spanned 8388608
    present 7864320
    managed 7864320
    # echo online_movable > memory4097/state
    # grep -A 35 "Node 2" /proc/zoneinfo
    Node 2, zone Normal

    node_scanned 0
    spanned 8388608
    present 8388608
    managed 8388608

    online_movable operation succeeded. But memory is onlined as
    ZONE_NORMAL, not ZONE_MOVABLE.

    After applying patch:
    # grep -A 35 "Node 2" /proc/zoneinfo
    Node 2, zone Normal

    node_scanned 0
    spanned 8388608
    present 7864320
    managed 7864320
    # echo online_movable > memory4097/state
    bash: echo: write error: Invalid argument
    # grep -A 35 "Node 2" /proc/zoneinfo
    Node 2, zone Normal

    node_scanned 0
    spanned 8388608
    present 7864320
    managed 7864320

    The online_movable operation failed because the memory zone could not
    be changed from ZONE_NORMAL to ZONE_MOVABLE.
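
    The ambiguity of the old return value can be sketched in userspace as
    follows. This is a simplified model, not the kernel function: the names
    ending in _sketch and the shift_allowed flag (which stands in for the
    real hotplug constraints) are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum zone_type_sketch { ZONE_DMA, ZONE_NORMAL, ZONE_MOVABLE };

/* Returns true when the block may be onlined into `target`, storing the
 * signed zone distance in *zone_shift (0 means "already in that zone").
 * Returns false when the zone would have to change but cannot. */
bool zone_can_shift_sketch(enum zone_type_sketch current,
			   enum zone_type_sketch target,
			   bool shift_allowed, int *zone_shift)
{
	assert(zone_shift != NULL);
	*zone_shift = 0;
	if (target == current)
		return true;	/* no change needed */
	if (!shift_allowed)
		return false;	/* change needed but not possible */
	*zone_shift = (int)target - (int)current;
	return true;
}

/* Caller: fail the request instead of silently keeping the old zone. */
int online_sketch(enum zone_type_sketch current, enum zone_type_sketch target,
		  bool shift_allowed, enum zone_type_sketch *result)
{
	int zone_shift;

	if (!zone_can_shift_sketch(current, target, shift_allowed, &zone_shift))
		return -EINVAL;	/* surfaces as "Invalid argument" */
	*result = (enum zone_type_sketch)((int)current + zone_shift);
	return 0;
}
```

    With a separate boolean result, "shift of 0" and "cannot shift" are no
    longer conflated, so the caller can return an error for the second case.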

    Fixes: df429ac03936 ("memory-hotplug: more general validation of zone during online")
    Link: http://lkml.kernel.org/r/2f9c3837-33d7-b6e5-59c0-6ca4372b2d84@gmail.com
    Signed-off-by: Yasuaki Ishimatsu
    Reviewed-by: Reza Arbab
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Yasuaki Ishimatsu
     
  • commit c929ea0b910355e1876c64431f3d5802f95b3d75 upstream.

    After removing the sunrpc module, kmemleak reports many leaks such as:
    unreferenced object 0xffff88003316b1e0 (size 544):
    comm "gssproxy", pid 2148, jiffies 4294794465 (age 4200.081s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    backtrace:
    [] kmemleak_alloc+0x4a/0xa0
    [] kmem_cache_alloc+0x15e/0x1f0
    [] ida_pre_get+0xaa/0x150
    [] ida_simple_get+0xad/0x180
    [] nlmsvc_lookup_host+0x4ab/0x7f0 [lockd]
    [] lockd+0x4d/0x270 [lockd]
    [] param_set_timeout+0x55/0x100 [lockd]
    [] svc_defer+0x114/0x3f0 [sunrpc]
    [] svc_defer+0x2d7/0x3f0 [sunrpc]
    [] rpc_show_info+0x8a/0x110 [sunrpc]
    [] proc_reg_write+0x7f/0xc0
    [] __vfs_write+0xdf/0x3c0
    [] vfs_write+0xef/0x240
    [] SyS_write+0xad/0x130
    [] entry_SYSCALL_64_fastpath+0x1a/0xa9
    [] 0xffffffffffffffff

    I found that the ida information (dynamic memory) isn't cleaned up.

    Signed-off-by: Kinglong Mee
    Fixes: 2f048db4680a ("SUNRPC: Add an identifier for struct rpc_clnt")
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Kinglong Mee
     
  • commit 059aa734824165507c65fd30a55ff000afd14983 upstream.

    Xuan Qi reports that the Linux NFSv4 client failed to lock a file
    that was migrated. The steps he observed on the wire:

    1. The client sent a LOCK request to the source server
    2. The source server replied NFS4ERR_MOVED
    3. The client switched to the destination server
    4. The client sent the same LOCK request to the destination
    server with a bumped lock sequence ID
    5. The destination server rejected the LOCK request with
    NFS4ERR_BAD_SEQID

    RFC 3530 section 8.1.5 provides a list of NFS errors which do not
    bump a lock sequence ID.

    However, RFC 3530 is now obsoleted by RFC 7530. In RFC 7530 section
    9.1.7, this list has been updated by the addition of NFS4ERR_MOVED.
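
    A minimal sketch of the rule, assuming the exception list from RFC 7530
    section 9.1.7. The enum below is illustrative only: its values are
    arbitrary and are not the on-the-wire NFSv4 status codes, and the
    function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative subset of NFSv4 statuses; values are NOT the wire values. */
enum nfs4_status_sketch {
	SK_NFS4_OK,
	SK_NFS4ERR_DENIED,
	SK_NFS4ERR_STALE_CLIENTID,
	SK_NFS4ERR_STALE_STATEID,
	SK_NFS4ERR_BAD_STATEID,
	SK_NFS4ERR_BAD_SEQID,
	SK_NFS4ERR_BADXDR,
	SK_NFS4ERR_RESOURCE,
	SK_NFS4ERR_NOFILEHANDLE,
	SK_NFS4ERR_MOVED,
};

/* True when a reply with this status consumes the lock sequence ID. */
bool seqid_mutating_status_sketch(enum nfs4_status_sketch status)
{
	switch (status) {
	case SK_NFS4ERR_STALE_CLIENTID:
	case SK_NFS4ERR_STALE_STATEID:
	case SK_NFS4ERR_BAD_STATEID:
	case SK_NFS4ERR_BAD_SEQID:
	case SK_NFS4ERR_BADXDR:
	case SK_NFS4ERR_RESOURCE:
	case SK_NFS4ERR_NOFILEHANDLE:
	case SK_NFS4ERR_MOVED:	/* the entry added by RFC 7530 */
		return false;
	default:
		return true;
	}
}

/* Sequence ID to use for the next request after receiving `status`. */
uint32_t next_lock_seqid_sketch(uint32_t seqid, enum nfs4_status_sketch status)
{
	return seqid_mutating_status_sketch(status) ? seqid + 1 : seqid;
}
```

    In the reported trace, step 4 would then resend the LOCK with the
    unchanged sequence ID, which the destination server accepts.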

    Reported-by: Xuan Qi
    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     
  • commit b1a27eac7fefff33ccf6acc919fc0725bf9815fb upstream.

    Use CXGB3_... instead of CXBG3_...

    Fixes: a85fb3383340 ("IB/cxgb3: Move user vendor structures")
    Signed-off-by: Nicolas Iooss
    Reviewed-by: Leon Romanovsky
    Acked-by: Steve Wise
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Nicolas Iooss
     
  • commit ea57485af8f4221312a5a95d63c382b45e7840dc upstream.

    Patch series "fix premature OOM regression in 4.7+ due to cpuset races".

    This is v2 of my attempt to fix the recent report based on LTP cpuset
    stress test [1]. The intention is to go to stable 4.9 LTS with this,
    as triggering repeated OOMs is not nice. That's why the patches try to
    be not too intrusive.

    Unfortunately, while investigating I found that modifying the testcase to
    use per-VMA policies instead of per-task policies will bring the OOMs
    back, but that seems to be a much older and harder-to-fix problem. I have
    posted an RFC [2] but I believe that fixing the recent regressions has a
    higher priority.

    Longer-term we might try to think how to fix the cpuset mess in a better
    and less error prone way. I was for example very surprised to learn,
    that cpuset updates change not only task->mems_allowed, but also
    nodemask of mempolicies. Until now I expected the parameter to
    alloc_pages_nodemask() to be stable. I wonder why do we then treat
    cpusets specially in get_page_from_freelist() and distinguish HARDWALL
    etc, when there's unconditional intersection between mempolicy and
    cpuset. I would expect the nodemask adjustment for saving overhead in
    g_p_f(), but that clearly doesn't happen in the current form. So we
    have both crazy complexity and overhead, AFAICS.

    [1] https://lkml.kernel.org/r/CAFpQJXUq-JuEP=QPidy4p_=FN0rkH5Z-kfB4qBvsf6jMS87Edg@mail.gmail.com
    [2] https://lkml.kernel.org/r/7c459f26-13a6-a817-e508-b65b903a8378@suse.cz

    This patch (of 4):

    Since commit c33d6c06f60f ("mm, page_alloc: avoid looking up the first
    zone in a zonelist twice") we have a wrong check for NULL preferred_zone,
    which can theoretically happen due to concurrent cpuset modification. We
    check the zoneref pointer which is never NULL and we should check the zone
    pointer. Also document this in first_zones_zonelist() comment per Michal
    Hocko.
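
    The difference between the two checks can be shown with a small
    userspace model. The struct layout and function names here are
    simplified assumptions, not the kernel's definitions: the list is
    terminated by an entry whose zone pointer is NULL, so the lookup always
    returns a valid entry pointer.

```c
#include <stdbool.h>
#include <stddef.h>

struct zone_sketch { int id; };
struct zoneref_sketch { struct zone_sketch *zone; };

/* Returns the first entry whose zone id is set in allowed_mask. If none
 * matches, returns the terminating entry (zone == NULL), never NULL. */
struct zoneref_sketch *first_allowed_zoneref(struct zoneref_sketch *list,
					     unsigned int allowed_mask)
{
	struct zoneref_sketch *z = list;

	while (z->zone && !(allowed_mask & (1u << z->zone->id)))
		z++;
	return z;
}

bool have_preferred_zone(struct zoneref_sketch *list, unsigned int allowed_mask)
{
	struct zoneref_sketch *preferred = first_allowed_zoneref(list, allowed_mask);

	/* Checking `preferred != NULL` would always be true; the zone
	 * pointer inside the entry is what signals "nothing usable". */
	return preferred->zone != NULL;
}
```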

    Fixes: c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
    Link: http://lkml.kernel.org/r/20170120103843.24587-2-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Acked-by: Mel Gorman
    Acked-by: Hillf Danton
    Cc: Ganapatrao Kulkarni
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Vlastimil Babka
     

26 Jan, 2017

5 commits

  • commit fff5d99225107f5f13fe4a9805adc2a1c4b5fb00 upstream.

    On architectures like arm64, swiotlb is tied intimately to the core
    architecture DMA support. In addition, ZONE_DMA cannot be disabled.

    To aid debugging and catch devices not supporting DMA to memory outside
    the 32-bit address space, add a kernel command line option
    "swiotlb=noforce", which disables the use of bounce buffers.
    If specified, trying to map memory that cannot be used with DMA will
    fail, and a rate-limited warning will be printed.

    Note that io_tlb_nslabs is set to 1, which is the minimal supported
    value.
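
    A rough sketch of how a three-state setting could drive the mapping
    decision. The enumerator and function names are hypothetical, the
    parser handles only the bare words "force" and "noforce", and the
    decision table reflects the behaviour described above rather than the
    actual swiotlb code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum swiotlb_mode_sketch { SK_NORMAL, SK_FORCE, SK_NO_FORCE };
enum map_result_sketch { SK_MAP_DIRECT, SK_MAP_BOUNCE, SK_MAP_FAIL };

/* Parse the value given after "swiotlb=" (simplified). */
enum swiotlb_mode_sketch parse_swiotlb_sketch(const char *arg)
{
	assert(arg != NULL);
	if (strcmp(arg, "force") == 0)
		return SK_FORCE;
	if (strcmp(arg, "noforce") == 0)
		return SK_NO_FORCE;
	return SK_NORMAL;
}

/* Decide what happens to a mapping request. */
enum map_result_sketch map_sketch(enum swiotlb_mode_sketch mode,
				  bool device_can_reach)
{
	if (mode == SK_FORCE)
		return SK_MAP_BOUNCE;	/* always bounce */
	if (device_can_reach)
		return SK_MAP_DIRECT;	/* no bounce buffer needed */
	/* Unreachable memory: bounce normally, fail under "noforce". */
	return mode == SK_NO_FORCE ? SK_MAP_FAIL : SK_MAP_BOUNCE;
}
```

    Failing instead of bouncing is what makes "noforce" useful for
    debugging: a device with a too-narrow DMA mask shows up as an error
    rather than being silently papered over.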

    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Konrad Rzeszutek Wilk
    Signed-off-by: Greg Kroah-Hartman

    Geert Uytterhoeven
     
  • commit ae7871be189cb41184f1e05742b4a99e2c59774d upstream.

    Convert the flag swiotlb_force from an int to an enum, to prepare for
    the advent of more possible values.

    Suggested-by: Konrad Rzeszutek Wilk
    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Konrad Rzeszutek Wilk
    Signed-off-by: Greg Kroah-Hartman

    Geert Uytterhoeven
     
  • commit 546125d1614264d26080817d0c8cddb9b25081fa upstream.

    The inet6addr_chain is an atomic notifier chain, so we can't call
    anything that might sleep (like lock_sock)... instead of closing the
    socket from svc_age_temp_xprts_now (which is called by the notifier
    function), just have the rpc service threads do it instead.

    Fixes: c3d4879e01be "sunrpc: Add a function to close..."
    Signed-off-by: Scott Mayhew
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Scott Mayhew
     
  • commit 52d7e48b86fc108e45a656d8e53e4237993c481d upstream.

    The current preemptible RCU implementation goes through three phases
    during bootup. In the first phase, there is only one CPU that is running
    with preemption disabled, so that a no-op is a synchronous grace period.
    In the second mid-boot phase, the scheduler is running, but RCU has
    not yet gotten its kthreads spawned (and, for expedited grace periods,
    workqueues are not yet running). During this time, any attempt to do
    a synchronous grace period will hang the system (or complain bitterly,
    depending). In the third and final phase, RCU is fully operational and
    everything works normally.

    This has been OK for some time, but there have recently been some
    synchronous grace periods showing up during the second mid-boot phase.
    This code worked "by accident" for a while, but started failing as soon
    as expedited RCU grace periods switched over to workqueues in commit
    8b355e3bc140 ("rcu: Drive expedited grace periods from workqueue").
    Note that the code was buggy even before this commit, as it was subject
    to failure on real-time systems that forced all expedited grace periods
    to run as normal grace periods (for example, using the rcu_normal ksysfs
    parameter). The callchain from the failure case is as follows:

    early_amd_iommu_init()
    |-> acpi_put_table(ivrs_base);
        |-> acpi_tb_put_table(table_desc);
            |-> acpi_tb_invalidate_table(table_desc);
                |-> acpi_tb_release_table(...)
                    |-> acpi_os_unmap_memory
                        |-> acpi_os_unmap_iomem
                            |-> acpi_os_map_cleanup
                                |-> synchronize_rcu_expedited

    The kernel showing this callchain was built with CONFIG_PREEMPT_RCU=y,
    which caused the code to try using workqueues before they were
    initialized, which did not go well.

    This commit therefore reworks RCU to permit synchronous grace periods
    to proceed during this mid-boot phase. This commit is therefore a
    fix to a regression introduced in v4.9, and is therefore being put
    forward post-merge-window in v4.10.

    This commit sets a flag from the existing rcu_scheduler_starting()
    function which causes all synchronous grace periods to take the expedited
    path. The expedited path now checks this flag, using the requesting task
    to drive the expedited grace period forward during the mid-boot phase.
    Finally, this flag is updated by a core_initcall() function named
    rcu_exp_runtime_mode(), which causes the runtime codepaths to be used.

    Note that this arrangement assumes that tasks are not sent POSIX signals
    (or anything similar) from the time that the first task is spawned
    through core_initcall() time.
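
    The three phases can be modelled as a small state machine. This is a
    userspace sketch with hypothetical names, not the RCU implementation;
    it only shows which path a synchronous grace period request takes in
    each phase after this change.

```c
/* Hypothetical model of the boot phases described above. */
enum boot_phase_sketch { PHASE_EARLY_BOOT, PHASE_MID_BOOT, PHASE_RUNTIME };

enum gp_action_sketch {
	GP_NOOP,		/* single CPU, preemption disabled */
	GP_DRIVEN_BY_CALLER,	/* requesting task drives the expedited GP */
	GP_DRIVEN_BY_WORKQUEUE,	/* normal runtime machinery */
};

static enum boot_phase_sketch phase = PHASE_EARLY_BOOT;

/* Stands in for the flag set when the scheduler starts. */
void scheduler_starting_sketch(void)
{
	phase = PHASE_MID_BOOT;
}

/* Stands in for the initcall that switches to the runtime code paths. */
void exp_runtime_mode_sketch(void)
{
	phase = PHASE_RUNTIME;
}

enum gp_action_sketch synchronize_expedited_sketch(void)
{
	switch (phase) {
	case PHASE_EARLY_BOOT:
		return GP_NOOP;
	case PHASE_MID_BOOT:
		return GP_DRIVEN_BY_CALLER;
	default:
		return GP_DRIVEN_BY_WORKQUEUE;
	}
}
```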

    Fixes: 8b355e3bc140 ("rcu: Drive expedited grace periods from workqueue")
    Reported-by: "Zheng, Lv"
    Reported-by: Borislav Petkov
    Signed-off-by: Paul E. McKenney
    Tested-by: Stan Kain
    Tested-by: Ivan
    Tested-by: Emanuel Castelo
    Tested-by: Bruno Pesavento
    Tested-by: Borislav Petkov
    Tested-by: Frederic Bezies
    Signed-off-by: Greg Kroah-Hartman

    Paul E. McKenney
     
  • commit 68cc085a4daaa32f7138de1e918331c05165a484 upstream.

    R8A7794 doesn't have Cortex-A15 CPUs, thus there's no Z clock...

    Fixes: 0dce5454d5c2 ("ARM: shmobile: Initial r8a7794 SoC device tree")
    Signed-off-by: Sergei Shtylyov
    Reviewed-by: Geert Uytterhoeven
    Signed-off-by: Simon Horman
    Signed-off-by: Greg Kroah-Hartman

    Sergei Shtylyov
     

20 Jan, 2017

9 commits

  • commit 3bee9ea1de687925d116670f036599cbed8b66b0 upstream.

    The BQ27510 and BQ27520 use a slightly different register map than the
    BQ27500, add a new type enum and add these gauges to it.

    Fixes: d74534c27775 ("power: bq27xxx_battery: Add support for additional bq27xxx family devices")
    Based-on-patch-by: Kenneth R. Crudup
    Signed-off-by: Andrew F. Davis
    Signed-off-by: Sebastian Reichel
    Signed-off-by: Greg Kroah-Hartman

    Andrew F. Davis
     
  • commit 9a05e7541c39680d28ecf91892338e074738d5fd upstream.

    With compilers which follow the C99 standard (like modern versions of
    gcc and clang), "extern inline" does the opposite thing from older
    versions of gcc (emits code for an externally linkable version of the
    inline function).

    "static inline" does the intended behavior in all cases instead.

    Description taken from commit 6d91857d4826 ("staging, rtl8192e,
    LLVMLinux: Change extern inline to static inline").

    This also fixes the following GCC warning when building with CONFIG_PM
    disabled:

    ./include/linux/blkdev.h:1143:20: warning: no previous prototype for 'blk_set_runtime_active' [-Wmissing-prototypes]
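
    A minimal sketch of the stub pattern this refers to. The type, the
    function name and the CONFIG_PM_SKETCH macro are placeholders for
    illustration, not the contents of blkdev.h.

```c
#include <stdbool.h>

struct request_queue_sketch { bool runtime_active; };

#ifdef CONFIG_PM_SKETCH
/* Real implementation lives in a .c file when the feature is enabled. */
void blk_set_runtime_active_sketch(struct request_queue_sketch *q);
#else
/* "static inline" gives every translation unit its own private copy: no
 * external definition is emitted and no prior prototype is expected, so
 * the behaviour is the same under gnu89 and C99 inline semantics. */
static inline void blk_set_runtime_active_sketch(struct request_queue_sketch *q)
{
	(void)q; /* feature compiled out: nothing to do */
}
#endif
```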

    Fixes: d07ab6d11477 ("block: Add blk_set_runtime_active()")
    Reviewed-by: Mika Westerberg
    Signed-off-by: Tobias Klauser
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Tobias Klauser
     
  • commit 9e4d59ada4d602e78eee9fb5f898ce61fdddb446 upstream.

    This is a fix for Linux 4.10-rc1.

    In C language specification, a bit-field is interpreted as a signed or
    unsigned integer type consisting of the specified number of bits.

    In GCC manual, the range of a signed bit field of N bits is from
    -(2^N) / 2 to ((2^N) / 2) - 1
    https://www.gnu.org/software/gnu-c-manual/gnu-c-manual.html#Bit-Fields

    Therefore, a one-bit bit-field with a signed type can represent only
    -1 and 0.

    The snd-soc-hdmi-codec module includes a structure which has signed type
    members with bit-fields. Code in this module assigns 0 and 1 to the
    members. This seems to result in implementation-dependent behaviour.

    As of the v4.10-rc1 merge window, outside of the sound subsystem, this
    structure is referenced by the following GPU modules.
    - tda998x
    - sti-drm
    - mediatek-drm-hdmi
    - msm

    As far as I can tell from reviewing their code relevant to the structure,
    the structure members are used just for condition statements and printk
    formats. My proposed change is a bit intrusive to the printk formats but
    this may be acceptable.

    Overall, it's reasonable to use an unsigned type for the structure members.
    This bug was detected by Sparse, a static code analyzer, with the warnings
    below.

    ./include/sound/hdmi-codec.h:39:26: error: dubious one-bit signed bitfield
    ./include/sound/hdmi-codec.h:40:28: error: dubious one-bit signed bitfield
    ./include/sound/hdmi-codec.h:41:29: error: dubious one-bit signed bitfield
    ./include/sound/hdmi-codec.h:42:31: error: dubious one-bit signed bitfield
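
    The effect can be reproduced with a small sketch. The struct and
    function names are hypothetical; the value is passed in as a parameter
    so that the compiler does not diagnose a constant conversion. Storing 1
    into the signed field is implementation-defined, and GCC documents
    modulo reduction for such conversions, which reads back as -1.

```c
/* One-bit fields: the signed variant can hold only -1 and 0. */
struct flags_signed_sketch { signed int enabled : 1; };
struct flags_unsigned_sketch { unsigned int enabled : 1; };

/* Store v into a signed one-bit field and read it back. */
int store_signed_sketch(int v)
{
	struct flags_signed_sketch f = { 0 };

	f.enabled = v;	/* v == 1 does not fit: implementation-defined */
	return f.enabled;
}

/* Store v into an unsigned one-bit field and read it back. */
int store_unsigned_sketch(unsigned int v)
{
	struct flags_unsigned_sketch f = { 0 };

	f.enabled = v;	/* well defined: reduced modulo 2 */
	return f.enabled;
}
```

    A test such as "if (flag == 1)" therefore behaves as intended only for
    the unsigned variant.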

    Fixes: 09184118a8ab ("ASoC: hdmi-codec: Add hdmi-codec for external HDMI-encoders")
    Signed-off-by: Takashi Sakamoto
    Acked-by: Arnaud Pouliquen
    Signed-off-by: Mark Brown
    Signed-off-by: Greg Kroah-Hartman

    Takashi Sakamoto
     
  • commit ac0c7cf8be00f269f82964cf7b144ca3edc5dbc4 upstream.

    Enabling btrfs tracepoints leads to instant crash, as reported. The wq
    callbacks could free the memory and the tracepoints started to
    dereference the members to get to fs_info.

    The proposed fix https://marc.info/?l=linux-btrfs&m=148172436722606&w=2
    removed the tracepoints but we could preserve them by passing only the
    required data in a safe way.

    Fixes: bc074524e123 ("btrfs: prefix fsid to all trace events")
    Reported-by: Sebastian Andrzej Siewior
    Reviewed-by: Qu Wenruo
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    David Sterba
     
  • commit 20b1e22d01a4b0b11d3a1066e9feb04be38607ec upstream.

    With the following commit:

    4bc9f92e64c8 ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data")

    ... efi_bgrt_init() calls into the memblock allocator through
    efi_mem_reserve() => efi_arch_mem_reserve() *after* mm_init() has been called.

    Indeed, KASAN reports a bad read access later on in efi_free_boot_services():

    BUG: KASAN: use-after-free in efi_free_boot_services+0xae/0x24c
    at addr ffff88022de12740
    Read of size 4 by task swapper/0/0
    page:ffffea0008b78480 count:0 mapcount:-127
    mapping: (null) index:0x1 flags: 0x5fff8000000000()
    [...]
    Call Trace:
    dump_stack+0x68/0x9f
    kasan_report_error+0x4c8/0x500
    kasan_report+0x58/0x60
    __asan_load4+0x61/0x80
    efi_free_boot_services+0xae/0x24c
    start_kernel+0x527/0x562
    x86_64_start_reservations+0x24/0x26
    x86_64_start_kernel+0x157/0x17a
    start_cpu+0x5/0x14

    The instruction at the given address is the first read from the memmap's
    memory, i.e. the read of md->type in efi_free_boot_services().

    Note that the writes earlier in efi_arch_mem_reserve() don't splat because
    they're done through early_memremap()ed addresses.

    So, after memblock is gone, allocations should be done through the "normal"
    page allocator. Introduce a helper, efi_memmap_alloc() for this. Use
    it from efi_arch_mem_reserve(), efi_free_boot_services() and, for the sake
    of consistency, from efi_fake_memmap() as well.

    Note that for the latter, the memmap allocations cease to be page aligned.
    This isn't needed though.

    Tested-by: Dan Williams
    Signed-off-by: Nicolai Stange
    Reviewed-by: Ard Biesheuvel
    Cc: Dave Young
    Cc: Linus Torvalds
    Cc: Matt Fleming
    Cc: Mika Penttilä
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-efi@vger.kernel.org
    Fixes: 4bc9f92e64c8 ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data")
    Link: http://lkml.kernel.org/r/20170105125130.2815-1-nicstange@gmail.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Nicolai Stange
     
  • commit 0100a3e67a9cef64d72cd3a1da86f3ddbee50363 upstream.

    Some machines, such as the Lenovo ThinkPad W541 with firmware GNET80WW
    (2.28), include memory map entries with phys_addr=0x0 and num_pages=0.

    These machines fail to boot after the following commit,

    commit 8e80632fb23f ("efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()")

    Fix this by removing such bogus entries from the memory map.

    Furthermore, currently the log output for this case (with efi=debug)
    looks like:

    [ 0.000000] efi: mem45: [Reserved | | | | | | | | | | | | ] range=[0x0000000000000000-0xffffffffffffffff] (0MB)

    This is clearly wrong, and also not as informative as it could be. This
    patch changes it so that if we find obviously invalid memory map
    entries, we print an error and skip those entries. It also detects the
    display of the address range calculation overflow, so the new output is:

    [ 0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries:
    [ 0.000000] efi: mem45: [Reserved | | | | | | | | | | | | ] range=[0x0000000000000000-0x0000000000000000] (invalid)

    It also detects memory map sizes that would overflow the physical
    address, for example phys_addr=0xfffffffffffff000 and
    num_pages=0x0200000000000001, and prints:

    [ 0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries:
    [ 0.000000] efi: mem45: [Reserved | | | | | | | | | | | | ] range=[phys_addr=0xfffffffffffff000-0x20ffffffffffffffff] (invalid)

    It then removes these entries from the memory map.
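
    A minimal sketch of the validity test implied by the two cases above
    (zero pages, and a size that overflows the physical address). The
    function and macro names are hypothetical and the 4 KiB page size is an
    assumption of this sketch; the check is deliberately conservative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12	/* assumed 4 KiB pages */
#define PAGES_MAX_SKETCH  (UINT64_MAX >> PAGE_SHIFT_SKETCH)

static_assert(PAGES_MAX_SKETCH == 0x000fffffffffffffULL,
	      "largest page count that fits in a 64-bit byte range");

/* Returns false for entries that should be dropped from the map. */
bool memmap_entry_valid_sketch(uint64_t phys_addr, uint64_t num_pages)
{
	if (num_pages == 0)
		return false;	/* empty entry */
	if (num_pages > PAGES_MAX_SKETCH)
		return false;	/* byte size alone overflows 64 bits */
	/* Compare in units of pages so the sum itself cannot overflow. */
	if (PAGES_MAX_SKETCH - num_pages < (phys_addr >> PAGE_SHIFT_SKETCH))
		return false;	/* start + size runs past the address space */
	return true;
}
```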

    Signed-off-by: Peter Jones
    Signed-off-by: Ard Biesheuvel
    [ardb: refactor for clarity with no functional changes, avoid PAGE_SHIFT]
    Signed-off-by: Matt Fleming
    [Matt: Include bugzilla info in commit log]
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=191121
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Peter Jones
     
  • commit b6416e61012429e0277bd15a229222fd17afc1c1 upstream.

    Modules that use static_key_deferred need a way to synchronize with
    any delayed work that is still pending when the module is unloaded.
    Introduce static_key_deferred_flush() which flushes any pending
    jump label updates.

    Signed-off-by: David Matlack
    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    David Matlack
     
  • commit f05714293a591038304ddae7cb0dd747bb3786cc upstream.

    During development of zram-swap asynchronous writeback, I found strange
    corruption of a compressed page, resulting in:

    Modules linked in: zram(E)
    CPU: 3 PID: 1520 Comm: zramd-1 Tainted: G E 4.8.0-mm1-00320-ge0d4894c9c38-dirty #3274
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
    task: ffff88007620b840 task.stack: ffff880078090000
    RIP: set_freeobj.part.43+0x1c/0x1f
    RSP: 0018:ffff880078093ca8 EFLAGS: 00010246
    RAX: 0000000000000018 RBX: ffff880076798d88 RCX: ffffffff81c408c8
    RDX: 0000000000000018 RSI: 0000000000000000 RDI: 0000000000000246
    RBP: ffff880078093cb0 R08: 0000000000000000 R09: 0000000000000000
    R10: ffff88005bc43030 R11: 0000000000001df3 R12: ffff880076798d88
    R13: 000000000005bc43 R14: ffff88007819d1b8 R15: 0000000000000001
    FS: 0000000000000000(0000) GS:ffff88007e380000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fc934048f20 CR3: 0000000077b01000 CR4: 00000000000406e0
    Call Trace:
    obj_malloc+0x22b/0x260
    zs_malloc+0x1e4/0x580
    zram_bvec_rw+0x4cd/0x830 [zram]
    page_requests_rw+0x9c/0x130 [zram]
    zram_thread+0xe6/0x173 [zram]
    kthread+0xca/0xe0
    ret_from_fork+0x25/0x30

    Investigation revealed that stable pages currently do not cover
    anonymous pages. IOW, reuse_swap_page can reuse the page without waiting
    for writeback completion, so it can overwrite a page zram is compressing.

    Unfortunately, zram has used the per-cpu stream feature since v4.7.
    It aims to increase the cache hit ratio of the scratch buffer used for
    compression. The downside of that approach is that zram must request
    memory for the compressed page in per-cpu context, which requires
    stricter gfp flags, so the allocation can fail. If it does, zram retries
    the allocation outside per-cpu context, where it can succeed, then
    compresses the data again and copies it to the allocated memory.

    In this scenario, zram assumes the data never changes, but that is
    not true unless stable pages are supported. So, if the data
    changes under us, zram can overrun the buffer: the second
    compression can produce a larger result than the first attempt did,
    and zram blindly copies the larger object into the smaller buffer.
    The overrun breaks zsmalloc free object chaining,
    so the system crashes as shown above.

    I think the report below is the same problem.
    https://bugzilla.suse.com/show_bug.cgi?id=997574

    Unfortunately, reuse_swap_page must be atomic, so we cannot wait on
    writeback there. The approach in this patch is simply to return false if
    we find the page needs to be stable. Although this increases the memory
    footprint temporarily, it happens rarely and the memory should be
    reclaimed easily when it does. It is also better than waiting for IO
    completion, which is on the critical path for application latency.

    Fixes: da9556a2367c ("zram: user per-cpu compression streams")
    Link: http://lkml.kernel.org/r/20161120233015.GA14113@bbox
    Link: http://lkml.kernel.org/r/1482366980-3782-2-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Acked-by: Hugh Dickins
    Cc: Sergey Senozhatsky
    Cc: Darrick J. Wong
    Cc: Takashi Iwai
    Cc: Hyeoncheol Lee
    Cc:
    Cc: Sangseok Lee
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Minchan Kim
     
  • commit b4536f0c829c8586544c94735c343f9b5070bd01 upstream.

    Nils Holland and Klaus Ethgen have reported unexpected OOM killer
    invocations with 32b kernel starting with 4.8 kernels

    kworker/u4:5 invoked oom-killer: gfp_mask=0x2400840(GFP_NOFS|__GFP_NOFAIL), nodemask=0, order=0, oom_score_adj=0
    kworker/u4:5 cpuset=/ mems_allowed=0
    CPU: 1 PID: 2603 Comm: kworker/u4:5 Not tainted 4.9.0-gentoo #2
    [...]
    Mem-Info:
    active_anon:58685 inactive_anon:90 isolated_anon:0
    active_file:274324 inactive_file:281962 isolated_file:0
    unevictable:0 dirty:649 writeback:0 unstable:0
    slab_reclaimable:40662 slab_unreclaimable:17754
    mapped:7382 shmem:202 pagetables:351 bounce:0
    free:206736 free_pcp:332 free_cma:0
    Node 0 active_anon:234740kB inactive_anon:360kB active_file:1097296kB inactive_file:1127848kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:29528kB dirty:2596kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 184320kB anon_thp: 808kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
    DMA free:3952kB min:788kB low:984kB high:1180kB active_anon:0kB inactive_anon:0kB active_file:7316kB inactive_file:0kB unevictable:0kB writepending:96kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:3200kB slab_unreclaimable:1408kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
    lowmem_reserve[]: 0 813 3474 3474
    Normal free:41332kB min:41368kB low:51708kB high:62048kB active_anon:0kB inactive_anon:0kB active_file:532748kB inactive_file:44kB unevictable:0kB writepending:24kB present:897016kB managed:836248kB mlocked:0kB slab_reclaimable:159448kB slab_unreclaimable:69608kB kernel_stack:1112kB pagetables:1404kB bounce:0kB free_pcp:528kB local_pcp:340kB free_cma:0kB
    lowmem_reserve[]: 0 0 21292 21292
    HighMem free:781660kB min:512kB low:34356kB high:68200kB active_anon:234740kB inactive_anon:360kB active_file:557232kB inactive_file:1127804kB unevictable:0kB writepending:2592kB present:2725384kB managed:2725384kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:800kB local_pcp:608kB free_cma:0kB

    The OOM killer is clearly premature because there is still a lot
    of page cache in the zone Normal which should satisfy this lowmem
    request. Further debugging has shown that the reclaim cannot make any
    forward progress because the page cache is hidden in the active list
    which doesn't get rotated because inactive_list_is_low is not memcg
    aware.

    The code simply subtracts per-zone highmem counters from the respective
    memcg's lru sizes which doesn't make any sense. We can simply end up
    always seeing the resulting active and inactive counts 0 and return
    false. This issue is not limited to 32b kernels but in practice the
    effect on systems without CONFIG_HIGHMEM would be much harder to notice
    because we do not invoke the OOM killer for allocation requests
    targeting < ZONE_NORMAL.

    Fix the issue by tracking per zone lru page counts in mem_cgroup_per_node
    and subtract per-memcg highmem counts when memcg is enabled. Introduce
    helper lruvec_zone_lru_size which redirects to either zone counters or
    mem_cgroup_get_zone_lru_size when appropriate.

    We are losing empty LRU but non-zero lru size detection introduced by
    ca707239e8a7 ("mm: update_lru_size warn and reset bad lru_size") because
    of the inherent zone vs. node discrepancy.

    Fixes: f8d1a31163fc ("mm: consider whether to decivate based on eligible zones inactive ratio")
    Link: http://lkml.kernel.org/r/20170104100825.3729-1-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Reported-by: Nils Holland
    Tested-by: Nils Holland
    Reported-by: Klaus Ethgen
    Acked-by: Minchan Kim
    Acked-by: Mel Gorman
    Acked-by: Johannes Weiner
    Reviewed-by: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Michal Hocko
     

15 Jan, 2017

1 commit

  • [ Upstream commit 57ea52a865144aedbcd619ee0081155e658b6f7d ]

    The GRO fast path caches the frag0 address. This address becomes
    invalid if frag0 is modified by pskb_may_pull or its variants.
    So whenever that happens we must disable the frag0 optimization.

    This is usually done through the combination of gro_header_hard
    and gro_header_slow, however, the IPv6 extension header path did
    the pulling directly and would continue to use the GRO fast path
    incorrectly.

    This patch fixes it by disabling the fast path when we enter the
    IPv6 extension header path.
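
    The invalidation idea can be sketched as follows. The struct and
    function names are hypothetical stand-ins, not the GRO control block:
    the fast path is usable only while the cached pointer is valid, and any
    code that pulls data directly must clear the cache first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Cached view of the first fragment, as used by a header fast path. */
struct gro_cache_sketch {
	const unsigned char *frag0;	/* NULL when the cache is invalid */
	size_t frag0_len;
};

/* True when a header of hlen bytes can be read through the cache. */
bool fast_path_usable_sketch(const struct gro_cache_sketch *cb, size_t hlen)
{
	assert(cb != NULL);
	return cb->frag0 != NULL && hlen <= cb->frag0_len;
}

/* Call before pulling data in a way that may move or free frag0. */
void frag0_invalidate_sketch(struct gro_cache_sketch *cb)
{
	assert(cb != NULL);
	cb->frag0 = NULL;
	cb->frag0_len = 0;
}
```

    After invalidation every header access falls back to the slow path,
    which reads from the packet's linear data instead of the stale address.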

    Fixes: 78a478d0efd9 ("gro: Inline skb_gro_header and cache frag0 virtual address")
    Reported-by: Slava Shwartsman
    Signed-off-by: Herbert Xu
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Herbert Xu
     

12 Jan, 2017

7 commits

  • commit 9bf11ecce5a2758e5a097c2f3a13d08552d0d6f9 upstream.

    When the dummy timer callback is invoked before the real timer callbacks,
    it tries to install that timer for the starting CPU. If the platform
    does not have a broadcast timer installed, the installation fails with a
    kernel crash. The crash happens due to an unconditional dereference of the
    unavailable broadcast device. This needs to be fixed in the timer core code.

    But even when this is fixed in the core code then installing the dummy
    timer before the real timers is a pointless exercise.

    Move it to the end of the callback list.

    Fixes: 00c1d17aab51 ("clocksource/dummy_timer: Convert to hotplug state machine")
    Reported-and-tested-by: Mason
    Signed-off-by: Thomas Gleixner
    Cc: Mark Rutland
    Cc: Anna-Maria Gleixner
    Cc: Richard Cochran
    Cc: Sebastian Andrzej Siewior
    Cc: Daniel Lezcano
    Cc: Peter Zijlstra
    Cc: Sebastian Frias
    Cc: Thibaud Cornic
    Cc: Robin Murphy
    Link: http://lkml.kernel.org/r/1147ef90-7877-e4d2-bb2b-5c4fa8d3144b@free.fr
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit 7254383341bc6e1a61996accd836009f0c922b21 upstream.

    Add Mellanox device IDs for use by the mlx4 driver and INTx quirks.

    [bhelgaas: sorted and adapted from
    http://lkml.kernel.org/r/1478011644-12080-1-git-send-email-noaos@mellanox.com]
    Signed-off-by: Noa Osherovich
    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Greg Kroah-Hartman

    Noa Osherovich
     
  • commit 7f847dd31736f1284538e54f46cf10e63929eb7f upstream.

    The slp_s0_residency_usec debugfs file currently uses
    DEFINE_DEBUGFS_ATTRIBUTE(), but that macro cannot really be used to
    define files outside of the debugfs code, as it has no reference to
    the get/set functions if CONFIG_DEBUG_FS is not defined:

    drivers/platform/x86/intel_pmc_core.c:80:12: error: ‘pmc_core_dev_state_get’ defined but not used [-Werror=unused-function]

    This fixes the macro to always contain the reference, and instead rely
    on the stubbed-out debugfs_create_file to not actually refer to
    its arguments so the compiler can still drop the reference.
    This works because the attribute definition is always 'static',
    and the dead-code removal silently drops all static symbols
    that are not used.
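
    The pattern can be sketched in plain C. This is an illustration only:
    struct attr_ops, create_file_stub(), state_get() and
    register_state_file() are hypothetical names, not the debugfs API.

```c
#include <stddef.h>

/* Hypothetical attribute table, standing in for the fops the macro defines. */
struct attr_ops {
	int (*get)(void *data, unsigned long long *val);
};

/* Stub used when the feature is compiled out: it accepts its arguments
 * but never reads them, so the optimizer can drop what they point to. */
static inline int create_file_stub(const char *name, const struct attr_ops *ops)
{
	(void)name;
	(void)ops;
	return 0;
}

static int state_get(void *data, unsigned long long *val)
{
	(void)data;
	*val = 42;
	return 0;
}

/* The table always references state_get(), so the compiler sees a use
 * and has no reason to warn about an unused static function. */
static const struct attr_ops state_ops = { .get = state_get };

static int register_state_file(void)
{
	return create_file_stub("state", &state_ops);
}
```

    The source-level reference silences the warning, while the static
    table and getter remain candidates for dead-code removal.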

    Fixes: c64688081490 ("debugfs: add support for self-protecting attribute file fops")
    Fixes: df2294fb6428 ("intel_pmc_core: Convert to DEFINE_DEBUGFS_ATTRIBUTE")
    Signed-off-by: Arnd Bergmann
    [nicstange@gmail.com: Add dummy implementations of debugfs_attr_read() and
    debugfs_attr_write() in order to protect against possibly broken dead
    code elimination and to improve readability.
    Correct CONFIG_DEBUGFS_FS -> CONFIG_DEBUG_FS typo in changelog.]
    Signed-off-by: Nicolai Stange
    Reviewed-by: Andy Shevchenko
    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Greg Kroah-Hartman

    Arnd Bergmann
     
  • commit 2fa436b3a2a7009c11a3bc03fe0ff4c26e80fd87 upstream.

    NL80211_ATTR_MAC was used to set both the specific BSSID to be scanned
    and the random MAC address to be used when privacy is enabled. When both
    the features are enabled, both the BSSID and the local MAC address were
    getting same value causing Probe Request frames to go with unintended
    DA. Hence, this has been fixed by using a different NL80211_ATTR_BSSID
    attribute to set the specific BSSID (which was the more recent addition
    in cfg80211) for a scan.

    Backwards compatibility with old userspace software is maintained to
    some extent by allowing NL80211_ATTR_MAC to be used to set the specific
    BSSID when scanning without enabling random MAC address use.
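
    The selection rule described above could look roughly like the sketch
    below. struct scan_attrs and pick_scan_bssid() are hypothetical, and
    the fallback to the wildcard (broadcast) BSSID is an assumption that
    the text above does not state.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical view of the attributes in a scan request. */
struct scan_attrs {
	const uint8_t *attr_bssid;  /* NL80211_ATTR_BSSID payload, or NULL */
	const uint8_t *attr_mac;    /* NL80211_ATTR_MAC payload, or NULL */
	bool random_addr;           /* random source MAC address requested */
};

static void pick_scan_bssid(const struct scan_attrs *a, uint8_t out[ETH_ALEN])
{
	if (a->attr_bssid)
		memcpy(out, a->attr_bssid, ETH_ALEN);
	else if (!a->random_addr && a->attr_mac)
		memcpy(out, a->attr_mac, ETH_ALEN);  /* legacy userspace */
	else
		memset(out, 0xff, ETH_ALEN);         /* assumed wildcard BSSID */
}
```

    With random_addr set, NL80211_ATTR_MAC is never interpreted as a
    BSSID, so the two features no longer share one value.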

    Scanning with random source MAC address was introduced by commit
    ad2b26abc157 ("cfg80211: allow drivers to support random MAC addresses
    for scan") and the issue was introduced with the addition of the second
    user for the same attribute in commit 818965d39177 ("cfg80211: Allow a
    scan request for a specific BSSID").

    Fixes: 818965d39177 ("cfg80211: Allow a scan request for a specific BSSID")
    Signed-off-by: Vamsi Krishna
    Signed-off-by: Jouni Malinen
    Signed-off-by: Johannes Berg
    Signed-off-by: Greg Kroah-Hartman

    Vamsi Krishna
     
  • commit 65e4345c8ef8811bbb4860fe5f2df10646b7f2e1 upstream.

    The LIS3LV02 has a special bit that needs to be set to get the
    read values left aligned. Before this patch we get gibberish
    like this:

    iio_generic_buffer -a -c10 -n lis3lv02dl_accel
    (...)
    0.000000 -0.010042 -0.642688 19155832931907
    0.000000 -0.010042 -0.642688 19155858751073

    This is because we read a raw value for 1g as 64, which is the
    nominal 1024 for 1g shifted 4 bits to the right, since the data
    is right-aligned rather than left-aligned.
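
    The arithmetic can be checked with a small sketch. It assumes a 12-bit
    sample in a 16-bit register, which is what a 4-bit shift implies; the
    macro and function names are hypothetical, and sign handling is left
    out for brevity.

```c
#include <stdint.h>

#define SAMPLE_BITS   12
#define REGISTER_BITS 16
#define ALIGN_SHIFT   (REGISTER_BITS - SAMPLE_BITS)  /* 4 */

/* What a reader expecting left-aligned data does with the register word:
 * it discards the low padding bits. */
static inline uint16_t decode_left_aligned(uint16_t reg)
{
	return (uint16_t)(reg >> ALIGN_SHIFT);
}

/* Register contents for a sample when the device left-aligns it. */
static inline uint16_t encode_left_aligned(uint16_t sample)
{
	return (uint16_t)(sample << ALIGN_SHIFT);
}
```

    Decoding a left-aligned 1024 gives 1024 back, while decoding a
    right-aligned 1024 (the register holds 1024 as is) gives 64.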

    Since all other sensors are left aligned, add some code to
    set the special DAS (data alignment setting) bit to 1 so that
    the right value is now read like this:

    iio_generic_buffer -a -c10 -n lis3lv02dl_accel
    (...)
    0.000000 -0.147095 -10.120135 24761614364956
    -0.029419 -0.176514 -10.120135 24761631624540

    The scaling was weird as well: we have a gain of 1000 for 1g
    and 3000 for 6g. I don't even remember how I came up with the
    old values but they are wrong.

    Fixes: 3acddf74f807 ("iio: st-sensors: add support for lis3lv02d accelerometer")
    Cc: Lorenzo Bianconi
    Cc: Giuseppe Barba
    Cc: Denis Ciocca
    Signed-off-by: Linus Walleij
    Signed-off-by: Jonathan Cameron
    Signed-off-by: Greg Kroah-Hartman

    Linus Walleij
     
  • commit 982555fc26f9d8bcdbd5f9db0378fe0682eb4188 upstream.

    For an isoc endpoint descriptor, wMaxPacketSize is not the real max packet
    size (see Table 9-13, Standard Endpoint Descriptor, USB 2.0 specification);
    it may also contain the number of packets, so the real max packet size
    should be ep->desc->wMaxPacketSize & 0x7ff.
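
    A short sketch of the field layout, assuming the value has already
    been converted to CPU byte order. The helper and macro names are
    hypothetical. Note that the mask has to be applied with a bitwise
    AND; a logical AND would only ever yield 0 or 1.

```c
#include <stdint.h>

/* Bits 10..0 of wMaxPacketSize: maximum packet size in bytes. */
#define MAXP_SIZE_MASK  0x07ffu

/* Bits 12..11: additional transaction opportunities per microframe
 * (high-speed isochronous and interrupt endpoints). */
#define MAXP_MULT_SHIFT 11
#define MAXP_MULT_MASK  0x3u

/* Hypothetical helper: payload size of a single packet. */
static inline uint16_t ep_max_packet_size(uint16_t w_max_packet_size)
{
	return (uint16_t)(w_max_packet_size & MAXP_SIZE_MASK);
}

/* Hypothetical helper: transactions per microframe (1 to 3). */
static inline unsigned int ep_transactions_per_uframe(uint16_t w_max_packet_size)
{
	return ((w_max_packet_size >> MAXP_MULT_SHIFT) & MAXP_MULT_MASK) + 1;
}
```

    For a descriptor value of 0x1400, the packet size is 1024 bytes with
    three transactions per microframe.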

    Cc: Felipe F. Tonello
    Cc: Felipe Balbi
    Fixes: 16b114a6d797 ("usb: gadget: fix usb_ep_align_maybe
    endianness and new usb_ep_aligna")
    Signed-off-by: Peter Chen
    Signed-off-by: Felipe Balbi
    Signed-off-by: Greg Kroah-Hartman

    Peter Chen
     
  • commit c7858bf16c0b2cc62f475f31e6df28c3a68da1d6 upstream.

    The asm-prototypes.h file is used to provide dummy function declarations
    for genksyms, when processing asm files with EXPORT_SYMBOL. Make sure
    that any architecture defines get out of our way. x86 currently has an
    issue with memcpy on 64bit with CONFIG_KMEMCHECK=y and with
    memset/__memset on 32bit:

    $ cat init/test.c
    #include <asm/asm-prototypes.h>
    $ make -s init/test.o
    In file included from ./arch/x86/include/asm/string.h:4:0,
    from ./include/linux/string.h:18,
    from ./include/linux/bitmap.h:8,
    from ./include/linux/cpumask.h:11,
    from ./arch/x86/include/asm/cpumask.h:4,
    from ./arch/x86/include/asm/msr.h:10,
    from ./arch/x86/include/asm/processor.h:20,
    from ./arch/x86/include/asm/cpufeature.h:4,
    from ./arch/x86/include/asm/thread_info.h:52,
    from ./include/linux/thread_info.h:25,
    from ./arch/x86/include/asm/preempt.h:6,
    from ./include/linux/preempt.h:59,
    from ./include/linux/spinlock.h:50,
    from ./include/linux/seqlock.h:35,
    from ./include/linux/time.h:5,
    from ./include/uapi/linux/timex.h:56,
    from ./include/linux/timex.h:56,
    from ./include/linux/sched.h:19,
    from ./include/linux/uaccess.h:4,
    from ./arch/x86/include/asm/asm-prototypes.h:2,
    from init/test.c:1:
    ./arch/x86/include/asm/string_64.h:52:47: error: expected declaration specifiers or ‘...’ before ‘(’ token
    #define memcpy(dst, src, len) __inline_memcpy((dst), (src), (len))
    ./include/asm-generic/asm-prototypes.h:6:14: note: in expansion of macro ‘memcpy’
    extern void *memcpy(void *, const void *, __kernel_size_t);

    ^
    ...

    During real build, this manifests itself by genksyms segfaulting.
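
    The failure mode and one way around it can be reproduced outside the
    kernel. In this sketch my_memcpy and inline_copy are hypothetical
    names; whether the actual patch uses exactly this approach is not
    stated above.

```c
#include <stddef.h>

/* Stand-in for an architecture helper that replaces the real function. */
static inline void *inline_copy(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (len--)
		*d++ = *s++;
	return dst;
}

/* Stand-in for the architecture's function-like macro. */
#define my_memcpy(dst, src, len) inline_copy((dst), (src), (len))

/* Without this #undef the prototype below would be macro-expanded into
 * something that is not a declaration, and compilation would fail. */
#undef my_memcpy
extern void *my_memcpy(void *, const void *, size_t);
```

    Dropping the #undef line reproduces an "expected declaration
    specifiers" style error like the one in the log above.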

    Fixes: 334bb7738764 ("x86/kbuild: enable modversions for symbols exported from asm")
    Reported-and-tested-by: Borislav Petkov
    Cc: Adam Borowski
    Signed-off-by: Michal Marek
    Signed-off-by: Greg Kroah-Hartman

    Michal Marek
     

09 Jan, 2017

2 commits

  • commit fba332b079029c2f4f7e84c1c1cd8e3867310c90 upstream.

    Code that dereferences the struct net_device ip_ptr member must be
    protected with an in_dev_get() / in_dev_put() pair. Hence insert
    calls to these functions.

    Fixes: commit 7b85627b9f02 ("IB/cma: IBoE (RoCE) IP-based GID addressing")
    Signed-off-by: Bart Van Assche
    Reviewed-by: Moni Shoua
    Cc: Or Gerlitz
    Cc: Roland Dreier
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Bart Van Assche
     
  • commit e6f462df9acd2a3295e5d34eb29e2823220cf129 upstream.

    When mac80211 abandons an association attempt, it may free
    all the data structures, but inform cfg80211 and userspace
    about it only by sending the deauth frame it received, in
    which case cfg80211 has no link to the BSS struct that was
    used and will not cfg80211_unhold_bss() it.

    Fix this by providing a way to inform cfg80211 of this with
    the BSS entry passed, so that it can clean up properly, and
    use this ability in the appropriate places in mac80211.

    This isn't ideal: some code is more or less duplicated and
    tracing is missing. However, it's a fairly small change and
    it's thus easier to backport - cleanups can come later.

    Signed-off-by: Johannes Berg
    Signed-off-by: Greg Kroah-Hartman

    Johannes Berg
     

06 Jan, 2017

7 commits

  • commit 334bb773876403eae3457d81be0b8ea70f8e4ccc upstream.

    Commit 4efca4ed ("kbuild: modversions for EXPORT_SYMBOL() for asm") adds
    modversion support for symbols exported from asm files. Architectures
    must include C-style declarations for those symbols in asm/asm-prototypes.h
    in order for them to be versioned.

    Add these declarations for x86, and an architecture-independent file that
    can be used for common symbols.

    With f27c2f6 reverting 8ab2ae6 ("default exported asm symbols to zero") we
    produce a scary warning on x86; this commit fixes that.

    Signed-off-by: Adam Borowski
    Tested-by: Kalle Valo
    Acked-by: Nicholas Piggin
    Tested-by: Peter Wu
    Tested-by: Oliver Hartkopp
    Signed-off-by: Michal Marek
    Signed-off-by: Greg Kroah-Hartman

    Adam Borowski
     
  • commit 91291d9ad92faa65a56a9a19d658d8049b78d3d4 upstream.

    Joonyoung Shim reported an interesting problem on his ARM octa-core
    Odroid-XU3 platform. During system suspend, dev_pm_opp_put_regulator()
    was failing for a struct device for which dev_pm_opp_set_regulator() is
    called earlier.

    This happened because an earlier call to
    dev_pm_opp_of_cpumask_remove_table() function (from cpufreq-dt.c file)
    removed all the entries from opp_table->dev_list apart from the last CPU
    device in the cpumask of CPUs sharing the OPP.

    But both dev_pm_opp_set_regulator() and dev_pm_opp_put_regulator()
    routines get CPU device for the first CPU in the cpumask. And so the OPP
    core failed to find the OPP table for the struct device.

    This patch attempts to fix this problem by returning a pointer to the
    opp_table from dev_pm_opp_set_regulator() and using that as the
    parameter to dev_pm_opp_put_regulator(). This ensures that the
    dev_pm_opp_put_regulator() doesn't fail to find the opp table.
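
    The difference between looking the table up by device and passing a
    handle around can be modelled as follows. This is a userspace sketch
    with hypothetical names (struct opp_table here is a toy, and
    find_table(), set_regulator(), put_regulator_by_dev() and
    put_regulator() are not the OPP core functions).

```c
#include <stddef.h>

#define MAX_DEVS 4

/* Toy table: the devices currently listed as sharing it. */
struct opp_table {
	const void *devs[MAX_DEVS];
	int regulator_set;
};

static struct opp_table shared_table;

static struct opp_table *find_table(const void *dev)
{
	size_t i;

	for (i = 0; dev != NULL && i < MAX_DEVS; i++)
		if (shared_table.devs[i] == dev)
			return &shared_table;
	return NULL;
}

/* Old style: the put side repeats the lookup and can fail. */
static int put_regulator_by_dev(const void *dev)
{
	struct opp_table *t = find_table(dev);

	if (!t)
		return -1;
	t->regulator_set = 0;
	return 0;
}

/* New style: set returns the table, put takes it back. */
static struct opp_table *set_regulator(const void *dev)
{
	struct opp_table *t = find_table(dev);

	if (t)
		t->regulator_set = 1;
	return t;
}

static void put_regulator(struct opp_table *t)
{
	t->regulator_set = 0;
}
```

    Once the first CPU has been removed from the device list, the
    lookup-based put fails, whereas the handle-based put still works.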

    Note that a similar design problem also exists with other
    dev_pm_opp_put_*() APIs, but those aren't used currently by anyone and
    so we don't need to update them for now.

    Reported-by: Joonyoung Shim
    Signed-off-by: Stephen Boyd
    Signed-off-by: Viresh Kumar
    [ Viresh: Wrote commit log and tested on exynos 5250 ]
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Greg Kroah-Hartman

    Stephen Boyd
     
  • commit 84d77d3f06e7e8dea057d10e8ec77ad71f721be3 upstream.

    It is the reasonable expectation that if an executable file is not
    readable there will be no way for a user without special privileges to
    read the file. This is enforced in ptrace_attach but if ptrace
    is already attached before exec there is no enforcement for read-only
    executables.

    As the only way to read such an mm is through access_process_vm
    spin a variant called ptrace_access_vm that will fail if the
    target process is not being ptraced by the current process, or
    the current process did not have sufficient privileges when ptracing
    began to read the target process's mm.
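
    The rule stated above can be written down as a small predicate. This
    is a model only: struct task and may_access_vm() are hypothetical,
    and the "dumpable" shortcut is an assumption about how readable
    executables are let through.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of a traced task. */
struct task {
	const struct task *tracer;  /* NULL when not being ptraced */
	bool mm_dumpable;           /* false after exec of an unreadable file */
	bool tracer_had_cap;        /* privilege recorded when ptracing began */
};

static bool may_access_vm(const struct task *current_task,
			  const struct task *target)
{
	/* Also covers "not traced at all", since tracer is NULL then. */
	if (current_task == NULL || target->tracer != current_task)
		return false;

	/* Assumed: a dumpable mm is readable; otherwise the tracer must
	 * have been sufficiently privileged when it attached. */
	return target->mm_dumpable || target->tracer_had_cap;
}
```

    An unprivileged tracer that attached before the exec of an
    unreadable binary is refused, because neither condition holds.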

    In the ptrace implementations replace access_process_vm by
    ptrace_access_vm. There remain several ptrace sites that still use
    access_process_vm as they are reading the target executable's
    instructions (for kernel consumption) or register stacks. As such it
    does not appear necessary to add a permission check to those calls.

    This bug has always existed in Linux.

    Fixes: v1.0
    Reported-by: Andy Lutomirski
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • commit 64b875f7ac8a5d60a4e191479299e931ee949b67 upstream.

    When the flag PT_PTRACE_CAP was added the PTRACE_TRACEME path was
    overlooked. This can result in incorrect behavior when an application
    like strace traces an exec of a setuid executable.

    Further PT_PTRACE_CAP does not have enough information for making good
    security decisions as it does not report which user namespace the
    capability is in. This has already allowed one mistake through
    insufficient granularity.

    I found this issue when I was testing another corner case of exec and
    discovered that I could not get strace to set PT_PTRACE_CAP even when
    running strace as root with a full set of caps.

    This change fixes the above issue with strace allowing stracing as
    root a setuid executable without disabling setuid. More fundamentally
    this change allows what is allowable at all times, by using the correct
    information in its decision.

    Fixes: 4214e42f96d4 ("v2.4.9.11 -> v2.4.9.12")
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • commit bfedb589252c01fa505ac9f6f2a3d5d68d707ef4 upstream.

    During exec dumpable is cleared if the file that is being executed is
    not readable by the user executing the file. A bug in
    ptrace_may_access allows reading the file if the executable happens to
    enter into a subordinate user namespace (aka clone(CLONE_NEWUSER),
    unshare(CLONE_NEWUSER), or setns(fd, CLONE_NEWUSER)).

    This problem is fixed with only necessary userspace breakage by adding
    a user namespace owner to mm_struct, captured at the time of exec, so
    it is clear in which user namespace CAP_SYS_PTRACE must be present in
    to be able to safely give read permission to the executable.

    The function ptrace_may_access is modified to verify that the ptracer
    has CAP_SYS_PTRACE in task->mm->user_ns instead of task->cred->user_ns.
    This ensures that if the task changes its cred into a subordinate
    user namespace it does not become ptraceable.

    The function ptrace_attach is modified to only set PT_PTRACE_CAP when
    CAP_SYS_PTRACE is held over task->mm->user_ns. The intent of
    PT_PTRACE_CAP is to be a flag to note that whatever permission changes
    the task might go through the tracer has sufficient permissions for
    it not to be an issue. task->cred->user_ns is always the same
    as or a descendant of mm->user_ns, which guarantees that having
    CAP_SYS_PTRACE over mm->user_ns is the worst case for the task's
    credentials.
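
    Why checking against mm->user_ns is the stricter test can be seen in
    a toy model of the namespace tree. The names below are hypothetical,
    and real capability checks involve more than ancestry (for example
    namespace ownership), so this is only a sketch of the ordering
    argument.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy user namespace: only the parent link matters here. */
struct user_ns {
	const struct user_ns *parent;  /* NULL for the initial namespace */
};

/* True if ns is ancestor itself or lies somewhere below it. */
static bool ns_same_or_descendant(const struct user_ns *ns,
				  const struct user_ns *ancestor)
{
	for (; ns != NULL; ns = ns->parent)
		if (ns == ancestor)
			return true;
	return false;
}

/* Simplified model: a capability held in cap_ns applies to cap_ns and
 * to every namespace below it, never to its ancestors. */
static bool has_cap_over(const struct user_ns *cap_ns,
			 const struct user_ns *target_ns)
{
	return ns_same_or_descendant(target_ns, cap_ns);
}
```

    A tracer privileged only in a child namespace passes a check against
    the task's cred namespace once the task has moved into that child,
    but fails the check against the namespace recorded in the mm at exec.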

    To prevent regressions, mm->dumpable and mm->user_ns are not considered
    when a task has no mm, as simply failing ptrace_may_attach causes
    regressions in privileged applications attempting to read things
    such as /proc/<pid>/stat.

    Acked-by: Kees Cook
    Tested-by: Cyrill Gorcunov
    Fixes: 8409cca70561 ("userns: allow ptrace from non-init user namespaces")
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • commit f84df2a6f268de584a201e8911384a2d244876e3 upstream.

    When the user namespace support was merged the need to prevent
    ptrace from revealing the contents of an unreadable executable
    was overlooked.

    Correct this oversight by ensuring that the executed file
    or files are in mm->user_ns, by adjusting mm->user_ns.

    Use the new function privileged_wrt_inode_uidgid to see if
    the executable is a member of the user namespace, and as such
    if having CAP_SYS_PTRACE in the user namespace should allow
    tracing the executable. If not, update mm->user_ns to
    the parent user namespace until an appropriate parent is found.
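
    The walk up the namespace hierarchy might be sketched like this. The
    id-range mapping and all names are hypothetical simplifications; in
    particular, privileged_wrt() only imitates the idea of the file's
    uid and gid both being mapped in the namespace.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy user namespace that maps one contiguous range of ids. */
struct user_ns {
	const struct user_ns *parent;  /* NULL for the initial namespace */
	unsigned int lo, hi;           /* inclusive range of mapped ids */
};

static bool id_mapped(const struct user_ns *ns, unsigned int id)
{
	return id >= ns->lo && id <= ns->hi;
}

/* Hypothetical stand-in for the membership test described above. */
static bool privileged_wrt(const struct user_ns *ns,
			   unsigned int file_uid, unsigned int file_gid)
{
	return id_mapped(ns, file_uid) && id_mapped(ns, file_gid);
}

/* Climb towards the root until the executable belongs to the namespace;
 * the initial namespace is always accepted. */
static const struct user_ns *pick_mm_user_ns(const struct user_ns *ns,
					     unsigned int file_uid,
					     unsigned int file_gid)
{
	while (ns->parent != NULL && !privileged_wrt(ns, file_uid, file_gid))
		ns = ns->parent;
	return ns;
}
```

    An executable owned by an id that the child namespace does not map
    ends up with the parent namespace as its mm owner.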

    Reported-by: Jann Horn
    Fixes: 9e4a36ece652 ("userns: Fail exec for suid and sgid binaries with ids outside our user namespace.")
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • commit 777c6e0daebb3fcefbbd6f620410a946b07ef6d0 upstream.

    Yu Zhao has noticed that __unregister_cpu_notifier only unregisters its
    notifiers when HOTPLUG_CPU=y while the registration might succeed even
    when HOTPLUG_CPU=n if MODULE is enabled. This means that e.g. zswap
    might keep a stale notifier on the list on the manual clean up during
    the pool tear down and thus corrupt the list, resulting in the following:

    [ 144.964346] BUG: unable to handle kernel paging request at ffff880658a2be78
    [ 144.971337] IP: [] raw_notifier_chain_register+0x1b/0x40

    [ 145.122628] Call Trace:
    [ 145.125086] [] __register_cpu_notifier+0x18/0x20
    [ 145.131350] [] zswap_pool_create+0x273/0x400
    [ 145.137268] [] __zswap_param_set+0x1fc/0x300
    [ 145.143188] [] ? trace_hardirqs_on+0xd/0x10
    [ 145.149018] [] ? kernel_param_lock+0x28/0x30
    [ 145.154940] [] ? __might_fault+0x4f/0xa0
    [ 145.160511] [] zswap_compressor_param_set+0x17/0x20
    [ 145.167035] [] param_attr_store+0x5c/0xb0
    [ 145.172694] [] module_attr_store+0x1d/0x30
    [ 145.178443] [] sysfs_kf_write+0x4f/0x70
    [ 145.183925] [] kernfs_fop_write+0x149/0x180
    [ 145.189761] [] __vfs_write+0x18/0x40
    [ 145.194982] [] vfs_write+0xb2/0x1a0
    [ 145.200122] [] SyS_write+0x52/0xa0
    [ 145.205177] [] entry_SYSCALL_64_fastpath+0x12/0x17

    This can be even triggered manually by changing
    /sys/module/zswap/parameters/compressor multiple times.

    Fix this issue by making unregister APIs symmetric to the register so
    there are no surprises.
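
    The hazard of an asymmetric pair can be shown with a toy notifier
    chain. All names are hypothetical; chain_unregister_noop() plays the
    role of an unregister call that was compiled out while registration
    still worked.

```c
#include <stdbool.h>
#include <stddef.h>

struct notifier {
	struct notifier *next;
};

static struct notifier *chain;

static void chain_register(struct notifier *n)
{
	n->next = chain;
	chain = n;
}

/* Symmetric counterpart: really unlinks the entry. */
static void chain_unregister(struct notifier *n)
{
	struct notifier **pp;

	for (pp = &chain; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == n) {
			*pp = n->next;
			n->next = NULL;
			return;
		}
	}
}

/* Asymmetric counterpart: silently does nothing. */
static void chain_unregister_noop(struct notifier *n)
{
	(void)n;
}

static bool chain_contains(const struct notifier *n)
{
	const struct notifier *p;

	for (p = chain; p != NULL; p = p->next)
		if (p == n)
			return true;
	return false;
}
```

    After the no-op variant the entry is still on the chain, so freeing
    its owner would leave a dangling pointer for the next registration
    to trip over.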

    Fixes: 47e627bc8c9a ("[PATCH] hotplug: Allow modules to use the cpu hotplug notifiers even if !CONFIG_HOTPLUG_CPU")
    Reported-and-tested-by: Yu Zhao
    Signed-off-by: Michal Hocko
    Cc: linux-mm@kvack.org
    Cc: Andrew Morton
    Cc: Dan Streetman
    Link: http://lkml.kernel.org/r/20161207135438.4310-1-mhocko@kernel.org
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Michal Hocko
     

11 Dec, 2016

1 commit

  • Pull networking fixes from David Miller:

    1) Limit the number of can filters to avoid > MAX_ORDER allocations.
    Fix from Marc Kleine-Budde.

    2) Limit GSO max size in netvsc driver to avoid problems with NVGRE
    configurations. From Stephen Hemminger.

    3) Return proper error when memory allocation fails in
    ser_gigaset_init(), from Dan Carpenter.

    4) Missing linkage undo in error paths of ipvlan_link_new(), from Gao
    Feng.

    5) Missing necessary SET_NETDEV_DEV in lantiq and cpmac drivers, from
    Florian Fainelli.

    6) Handle probe deferral properly in smsc911x driver.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
    net: mlx5: Fix Kconfig help text
    net: smsc911x: back out silently on probe deferrals
    ibmveth: set correct gso_size and gso_type
    net: ethernet: cpmac: Call SET_NETDEV_DEV()
    net: ethernet: lantiq_etop: Call SET_NETDEV_DEV()
    vhost-vsock: fix orphan connection reset
    cxgb4/cxgb4vf: Assign netdev->dev_port with port ID
    driver: ipvlan: Unlink the upper dev when ipvlan_link_new failed
    ser_gigaset: return -ENOMEM on error instead of success
    NET: usb: cdc_mbim: add quirk for supporting Telit LE922A
    can: peak: fix bad memory access and free sequence
    phy: Don't increment MDIO bus refcount unless it's a different owner
    netvsc: reduce maximum GSO size
    drivers: net: cpsw-phy-sel: Clear RGMII_IDMODE on "rgmii" links
    can: raw: raw_setsockopt: limit number of can_filter that can be set

    Linus Torvalds
     

10 Dec, 2016

1 commit

  • Pull libnvdimm fixes from Dan Williams:
    "Several fixes to the DSM (ACPI device specific method) marshaling
    implementation.

    I consider these urgent enough to send for 4.9 consideration since
    they fix the kernel's handling of ARS (Address Range Scrub) commands.
    Especially for platforms without machine-check-recovery capabilities,
    successful execution of ARS commands enables the platform to
    potentially break out of an infinite reboot problem if a media error
    is present in the boot path. There is also a one line fix for a
    device-dax read-only mapping regression.

    Commits 9a901f5495e2 ("acpi, nfit: fix extended status translations
    for ACPI DSMs") and 325896ffdf90 ("device-dax: fix private mapping
    restriction, permit read-only") are true regression fixes for changes
    introduced this cycle.

    Commit efda1b5d87cb ("acpi, nfit, libnvdimm: fix / harden ars_status
    output length handling") fixes the kernel's handling of zero-length
    results, this never would have worked in the past, but we only just
    recently discovered a BIOS implementation that emits this arguably
    spec non-compliant result.

    The remaining two commits are additional fall out from thinking
    through the implications of a zero / truncated length result of the
    ARS Status command.

    In order to mitigate the risk that these changes introduce yet more
    regressions they are backstopped by a new unit test in commit
    a7de92dac9f0 ("tools/testing/nvdimm: unit test acpi_nfit_ctl()") that
    mocks up inputs to acpi_nfit_ctl()"

    * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    device-dax: fix private mapping restriction, permit read-only
    tools/testing/nvdimm: unit test acpi_nfit_ctl()
    acpi, nfit: fix bus vs dimm confusion in xlat_status
    acpi, nfit: validate ars_status output buffer size
    acpi, nfit, libnvdimm: fix / harden ars_status output length handling
    acpi, nfit: fix extended status translations for ACPI DSMs

    Linus Torvalds
     

09 Dec, 2016

1 commit

  • Telit LE922A MBIM based composition does not work properly
    with altsetting toggle done in cdc_ncm_bind_common.

    This patch adds CDC_MBIM_FLAG_AVOID_ALTSETTING_TOGGLE quirk
    to avoid this procedure that, instead, is mandatory for
    other modems.

    Signed-off-by: Daniele Palmas
    Reviewed-by: Bjørn Mork
    Signed-off-by: David S. Miller

    Daniele Palmas