09 Feb, 2017

28 commits

  • commit a96dfddbcc04336bbed50dc2b24823e45e09e80c upstream.

    Reading a sysfs "memoryN/valid_zones" file leads to the following oops
    when the first page of a range is not backed by struct page.
    show_valid_zones() assumes that 'start_pfn' is always valid for
    page_zone().

    BUG: unable to handle kernel paging request at ffffea017a000000
    IP: show_valid_zones+0x6f/0x160

    This issue may happen on x86-64 systems with 64GiB or more memory since
    their memory block size is bumped up to 2GiB. [1] An example of such
    systems is described below. 0x3240000000 is only aligned by 1GiB and
    this memory block starts from 0x3200000000, which is not backed by
    struct page.

    BIOS-e820: [mem 0x0000003240000000-0x000000603fffffff] usable

    Since test_pages_in_a_zone() already checks holes, fix this issue by
    extending this function to return 'valid_start' and 'valid_end' for a
    given range. show_valid_zones() then proceeds with the valid range.
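
    The hole-aware range check described above can be modelled in user
    space. This is a minimal sketch, not the kernel's test_pages_in_a_zone().
    The function name find_valid_range() is hypothetical, and so is the
    `backed` array, which holds one entry per PFN and stands in for "is this
    PFN backed by struct page". Zone membership is not checked. The sketch
    reports the first valid PFN and one past the last valid PFN, i.e. a
    half-open [valid_start, valid_end) range.

```c
#include <stdbool.h>

/*
 * Hypothetical model: backed[i] says whether PFN (start + i) has a
 * struct page. Scan [start, end) and report the first valid PFN in
 * *valid_start and one past the last valid PFN in *valid_end.
 * Returns false if no PFN in the range is valid.
 */
bool find_valid_range(unsigned long start, unsigned long end,
                      const bool *backed,
                      unsigned long *valid_start, unsigned long *valid_end)
{
    bool found = false;

    for (unsigned long pfn = start; pfn < end; pfn++) {
        if (!backed[pfn - start])
            continue;               /* skip holes */
        if (!found) {
            *valid_start = pfn;     /* first PFN backed by struct page */
            found = true;
        }
        *valid_end = pfn + 1;       /* half-open upper bound */
    }
    return found;
}
```

    For example, if PFNs 102, 103 and 105 are backed in [100, 108), the
    sketch reports [102, 106). Interior holes stay inside the reported range.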

    [1] 'Commit bdee237c0343 ("x86: mm: Use 2GB memory block size on
    large-memory x86-64 systems")'

    Link: http://lkml.kernel.org/r/20170127222149.30893-3-toshi.kani@hpe.com
    Signed-off-by: Toshi Kani
    Cc: Greg Kroah-Hartman
    Cc: Zhang Zhen
    Cc: Reza Arbab
    Cc: David Rientjes
    Cc: Dan Williams
    Signed-off-by: Greg Kroah-Hartman

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     
  • commit deb88a2a19e85842d79ba96b05031739ec327ff4 upstream.

    Patch series "fix a kernel oops when reading sysfs valid_zones", v2.

    A sysfs memory file is created for each 2GiB memory block on x86-64 when
    the system has 64GiB or more memory. [1] When the start address of a
    memory block is not backed by struct page, i.e. a memory range is not
    aligned by 2GiB, reading its 'valid_zones' attribute file leads to a
    kernel oops. This issue was observed on multiple x86-64 systems with
    more than 64GiB of memory. This patch-set fixes this issue.

    Patch 1 first fixes an issue in test_pages_in_a_zone(), which does not
    test the start section.

    Patch 2 then fixes the kernel oops by extending test_pages_in_a_zone()
    to return valid [start, end).

    Note for stable kernels: The memory block size change was made by commit
    bdee237c0343 ("x86: mm: Use 2GB memory block size on large-memory x86-64
    systems"), which was accepted to 3.9. However, this patch-set depends
    on (and fixes) the change to test_pages_in_a_zone() made by commit
    5f0f2887f4de ("mm/memory_hotplug.c: check for missing sections in
    test_pages_in_a_zone()"), which was accepted to 4.4.

    So, I recommend that we backport it up to 4.4.

    [1] 'Commit bdee237c0343 ("x86: mm: Use 2GB memory block size on
    large-memory x86-64 systems")'

    This patch (of 2):

    test_pages_in_a_zone() does not check 'start_pfn' when it is aligned by
    section since 'sec_end_pfn' is set equal to 'pfn'. Since this function
    is called for testing the range of a sysfs memory file, 'start_pfn' is
    always aligned by section.

    Fix it by properly setting 'sec_end_pfn' to the next section pfn.

    Also make sure that this function returns 1 only when the range belongs
    to a zone.
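
    A simplified model of the section-walking loop makes the skipped PFN
    concrete. Several parts of this sketch are assumptions for illustration:
    PAGES_PER_SECTION (32768, i.e. 128 MiB sections of 4 KiB pages), the
    helper first_pfn_checked(), and the exact shape of the unfixed loop
    (including its `sec_end_pfn + 1` advance). The real function also
    checks section presence and zones.

```c
#include <stdbool.h>

/* Assumed value for illustration: 128 MiB sections of 4 KiB pages. */
#define PAGES_PER_SECTION 32768UL
#define SECTION_ALIGN_UP(pfn) \
    (((pfn) + PAGES_PER_SECTION - 1) & ~(PAGES_PER_SECTION - 1))

/* Hypothetical helper: return the first PFN the inner scan would test. */
unsigned long first_pfn_checked(unsigned long start_pfn, unsigned long end_pfn,
                                bool fixed)
{
    unsigned long pfn = start_pfn;
    /* Unfixed: equals start_pfn itself when start_pfn is section aligned. */
    unsigned long sec_end_pfn = fixed ? SECTION_ALIGN_UP(start_pfn + 1)
                                      : SECTION_ALIGN_UP(start_pfn);

    while (pfn < end_pfn) {
        if (pfn < sec_end_pfn)
            return pfn;          /* first PFN the inner scan would test */
        /* Advance to the next section (assumed unfixed form skips one PFN). */
        pfn = fixed ? sec_end_pfn : sec_end_pfn + 1;
        sec_end_pfn += PAGES_PER_SECTION;
    }
    return end_pfn;              /* nothing examined */
}
```

    With a section-aligned start_pfn of 32768, the unfixed form first
    examines PFN 32769, while the fixed form starts at 32768.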

    Link: http://lkml.kernel.org/r/20170127222149.30893-2-toshi.kani@hpe.com
    Signed-off-by: Toshi Kani
    Cc: Andrew Banman
    Cc: Reza Arbab
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Toshi Kani
     
  • commit 81ddd8c0c5e1cb41184d66567140cb48c53eb3d1 upstream.

    Reviewed-by: Jeff Layton

    file_info_lock is not initialized in initiate_cifs_search(), leading to the
    following splat after a simple "mount.cifs ... dir && ls dir/":

    BUG: spinlock bad magic on CPU#0, ls/486
    lock: 0xffff880009301110, .magic: 00000000, .owner: /-1, .owner_cpu: 0
    CPU: 0 PID: 486 Comm: ls Not tainted 4.9.0 #27
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
    ffffc900042f3db0 ffffffff81327533 0000000000000000 ffff880009301110
    ffffc900042f3dd0 ffffffff810baf75 ffff880009301110 ffffffff817ae077
    ffffc900042f3df0 ffffffff810baff6 ffff880009301110 ffff880008d69900
    Call Trace:
    [] dump_stack+0x65/0x92
    [] spin_dump+0x85/0xe0
    [] spin_bug+0x26/0x30
    [] do_raw_spin_lock+0xe9/0x130
    [] _raw_spin_lock+0x1f/0x30
    [] cifs_closedir+0x4d/0x100
    [] __fput+0x5d/0x160
    [] ____fput+0xe/0x10
    [] task_work_run+0x7e/0xa0
    [] exit_to_usermode_loop+0x92/0xa0
    [] syscall_return_slowpath+0x49/0x50
    [] entry_SYSCALL_64_fastpath+0xa7/0xa9

    Fixes: 3afca265b5f53a0 ("Clarify locking of cifs file and tcon structures and make more granular")
    Signed-off-by: Rabin Vincent
    Signed-off-by: Steve French
    Signed-off-by: Greg Kroah-Hartman

    Rabin Vincent
     
  • commit d7b028f56a971a2e4d8d7887540a144eeefcd4ab upstream.

    Add zswap_init_failed bool that prevents changing any of the module
    params, if init_zswap() fails, and set zswap_enabled to false. Change
    'enabled' param to a callback, and check zswap_init_failed before
    allowing any change to 'enabled', 'zpool', or 'compressor' params.

    Any driver that is built-in to the kernel will not be unloaded if its
    init function returns error, and its module params remain accessible for
    users to change via sysfs. Since zswap uses param callbacks, which
    assume that zswap has been initialized, changing the zswap params after
    a failed initialization will result in a WARNING due to the param
    callbacks expecting a pool to already exist. This prevents that by
    immediately exiting any of the param callbacks if initialization failed.
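
    The guard pattern described above can be sketched as follows. The
    names below (init_zswap_model(), zswap_enabled_param_set_model()) are
    hypothetical stand-ins, not zswap's actual param callbacks. The -ENODEV
    return code is also an assumption for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical module state mirroring the names in the message above. */
static bool zswap_init_failed;
static bool zswap_enabled = true;

/* Called once at boot; on failure, disable zswap and lock out changes. */
void init_zswap_model(bool succeed)
{
    if (!succeed) {
        zswap_init_failed = true;
        zswap_enabled = false;
    }
}

/* Model of the 'enabled' param callback: refuse changes after failed init. */
int zswap_enabled_param_set_model(bool value)
{
    if (zswap_init_failed) {
        fprintf(stderr, "zswap: can't enable, initialization failed\n");
        return -ENODEV;          /* assumed error code */
    }
    zswap_enabled = value;
    return 0;
}
```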

    This was reported here:
    https://marc.info/?l=linux-mm&m=147004228125528&w=4

    And fixes this WARNING:
    [ 429.723476] WARNING: CPU: 0 PID: 5140 at mm/zswap.c:503 __zswap_pool_current+0x56/0x60

    The warning is just noise, and not serious. However, when init fails,
    zswap frees all its percpu dstmem pages and its kmem cache. The kmem
    cache might be serious, if kmem_cache_alloc(NULL, gfp) has problems; but
    the percpu dstmem pages are definitely a problem, as they're used as
    temporary buffers for compressed pages before copying into place in the
    zpool.

    If the user does get zswap enabled after an init failure, then zswap
    will likely Oops on the first page it tries to compress (or worse, start
    corrupting memory).

    Fixes: 90b0fc26d5db ("zswap: change zpool/compressor at runtime")
    Link: http://lkml.kernel.org/r/20170124200259.16191-2-ddstreet@ieee.org
    Signed-off-by: Dan Streetman
    Reported-by: Marcin Miroslaw
    Cc: Seth Jennings
    Cc: Michal Hocko
    Cc: Sergey Senozhatsky
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Dan Streetman
     
  • commit 034dd34ff4916ec1f8f74e39ca3efb04eab2f791 upstream.

    Olga Kornievskaia says: "I ran into this oops in the nfsd (below)
    (4.10-rc3 kernel). To trigger this I had a client (unsuccessfully) try
    to mount the server with krb5 where the server doesn't have the
    rpcsec_gss_krb5 module built."

    The problem is that rsci.cred is copied from a svc_cred structure that
    gss_proxy didn't properly initialize. Fix that.

    [120408.542387] general protection fault: 0000 [#1] SMP
    ...
    [120408.565724] CPU: 0 PID: 3601 Comm: nfsd Not tainted 4.10.0-rc3+ #16
    [120408.567037] Hardware name: VMware, Inc. VMware Virtual =
    Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
    [120408.569225] task: ffff8800776f95c0 task.stack: ffffc90003d58000
    [120408.570483] RIP: 0010:gss_mech_put+0xb/0x20 [auth_rpcgss]
    ...
    [120408.584946] ? rsc_free+0x55/0x90 [auth_rpcgss]
    [120408.585901] gss_proxy_save_rsc+0xb2/0x2a0 [auth_rpcgss]
    [120408.587017] svcauth_gss_proxy_init+0x3cc/0x520 [auth_rpcgss]
    [120408.588257] ? __enqueue_entity+0x6c/0x70
    [120408.589101] svcauth_gss_accept+0x391/0xb90 [auth_rpcgss]
    [120408.590212] ? try_to_wake_up+0x4a/0x360
    [120408.591036] ? wake_up_process+0x15/0x20
    [120408.592093] ? svc_xprt_do_enqueue+0x12e/0x2d0 [sunrpc]
    [120408.593177] svc_authenticate+0xe1/0x100 [sunrpc]
    [120408.594168] svc_process_common+0x203/0x710 [sunrpc]
    [120408.595220] svc_process+0x105/0x1c0 [sunrpc]
    [120408.596278] nfsd+0xe9/0x160 [nfsd]
    [120408.597060] kthread+0x101/0x140
    [120408.597734] ? nfsd_destroy+0x60/0x60 [nfsd]
    [120408.598626] ? kthread_park+0x90/0x90
    [120408.599448] ret_from_fork+0x22/0x30

    Fixes: 1d658336b05f "SUNRPC: Add RPC based upcall mechanism for RPCGSS auth"
    Cc: Simo Sorce
    Reported-by: Olga Kornievskaia
    Tested-by: Olga Kornievskaia
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    J. Bruce Fields
     
  • commit d19fb70dd68c4e960e2ac09b0b9c79dfdeefa726 upstream.

    nfsd assigns the nfs4_free_lock_stateid to .sc_free in init_lock_stateid().

    If nfsd puts the stateid without having gone through init_lock_stateid(),
    .sc_free is still NULL when nfs4_put_stid(ns) tries to call it.

    This patch moves the nfs4_stid.sc_free assignment into nfs4_alloc_stid().
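
    If the destructor is assigned at allocation time, every reference drop
    has a valid callback, whatever initialization path ran later. Below is a
    minimal sketch with hypothetical types and names (struct stid,
    alloc_stid(), put_stid(), lock_stateid_free()). These are not nfsd's
    actual structures.

```c
#include <stdlib.h>

/* Hypothetical stand-in for nfs4_stid: a refcounted object with a destructor. */
struct stid {
    int refcount;
    void (*sc_free)(struct stid *);
};

static int frees;   /* counts destructor calls, for illustration */

void lock_stateid_free(struct stid *s)
{
    frees++;
    free(s);
}

/* Assign the destructor at allocation time, so every stid has one. */
struct stid *alloc_stid(void (*sc_free)(struct stid *))
{
    struct stid *s = calloc(1, sizeof(*s));

    if (!s)
        return NULL;
    s->refcount = 1;
    s->sc_free = sc_free;
    return s;
}

/* Drop a reference; safe even if no later init step ever ran. */
void put_stid(struct stid *s)
{
    if (--s->refcount == 0)
        s->sc_free(s);
}
```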

    Fixes: 356a95ece7aa "nfsd: clean up races in lock stateid searching..."
    Signed-off-by: Kinglong Mee
    Reviewed-by: Jeff Layton
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Kinglong Mee
     
  • commit a0615a16f7d0ceb5804d295203c302d496d8ee91 upstream.

    When setting a 2MB pte, radix__map_kernel_page() is using the address

    ptep = (pte_t *)pudp;

    Fix this conversion to use pmdp instead. Use pmdp_ptep() to do this
    instead of casting the pointer.

    Fixes: 2bfd65e45e87 ("powerpc/mm/radix: Add radix callbacks for early init routines")
    Reviewed-by: Aneesh Kumar K.V
    Signed-off-by: Reza Arbab
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Reza Arbab
     
  • commit b5fa0f7f88edcde37df1807fdf9ff10ec787a60e upstream.

    Anton says: In commit 4db7327194db ("powerpc: Add option to use jump
    label for cpu_has_feature()") and commit c12e6f24d413 ("powerpc: Add
    option to use jump label for mmu_has_feature()") we added:

    BUILD_BUG_ON(!__builtin_constant_p(feature))

    to cpu_has_feature() and mmu_has_feature() in order to catch usage
    issues (such as cpu_has_feature(cpu_has_feature(X)), which has happened
    once in the past). Unfortunately LLVM isn't smart enough to resolve
    this, and it errors out.

    I work around it in my clang/LLVM builds of the kernel, but I have just
    discovered that it causes a lot of issues for the bcc (eBPF) trace tool
    (which uses LLVM).

    For now just #ifdef it away for clang builds.

    Fixes: 4db7327194db ("powerpc: Add option to use jump label for cpu_has_feature()")
    Fixes: c12e6f24d413 ("powerpc: Add option to use jump label for mmu_has_feature()")
    Reported-by: Anton Blanchard
    Tested-by: Naveen N. Rao
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Michael Ellerman
     
  • commit af2b7fa17eb92e52b65f96604448ff7a2a89ee99 upstream.

    prom_init.c calls 'instance-to-package' twice, but the return value
    is not checked in prom_find_boot_cpu(). The result, which could be
    PROM_ERROR, is then passed to prom_getprop(). Add a return check
    to prevent this.

    This was found on a pasemi system, where CFE doesn't have a working
    'instance-to-package' prom call.

    Before commit 5c0484e25ec0 ('powerpc: Endian safe trampoline') the area
    around addr 0 was mostly 0's, so this didn't cause a problem. Once the
    macro 'FIXUP_ENDIAN' was added to head_64.S, the low memory area
    contained non-zero values, which caused the prom_getprop() call
    to hang.

    mpe: Also confirmed that under SLOF if 'instance-to-package' did fail
    with PROM_ERROR we would crash in SLOF. So the bug is not specific to
    CFE, it's just that other open firmwares don't trigger it because they
    have a working 'instance-to-package'.

    Fixes: 5c0484e25ec0 ("powerpc: Endian safe trampoline")
    Signed-off-by: Darren Stevens
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Darren Stevens
     
  • commit f05fea5b3574a5926c53865eea27139bb40b2f2b upstream.

    In __eeh_clear_pe_frozen_state(), we should pass the flag's value
    instead of its address to eeh_unfreeze_pe(). The isolated flag is
    cleared if __eeh_clear_pe_frozen_state() returns no error. We have
    never observed an error from that function, so the isolated flag should
    always have been cleared, and no real issue was caused by the misused
    @flag.

    This fixes the code by passing the value of @flag to eeh_unfreeze_pe().
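
    This class of bug is easy to reproduce in plain C. A non-NULL pointer
    converts to true when it is passed to a bool parameter, regardless of
    the value it points to. The function names below are hypothetical
    stand-ins for eeh_unfreeze_pe() and __eeh_clear_pe_frozen_state():

```c
#include <stdbool.h>

/* Hypothetical stand-in for eeh_unfreeze_pe(): reports the sw_state it got. */
bool unfreeze_pe_model(bool sw_state)
{
    return sw_state;
}

/* Callback signature as described: the flag arrives as a void pointer. */
bool clear_frozen_state_buggy(void *flag)
{
    /* Implicit pointer-to-bool conversion: true for any non-NULL pointer. */
    return unfreeze_pe_model(flag);
}

bool clear_frozen_state_fixed(void *flag)
{
    /* Pass the flag's value, not its address. */
    return unfreeze_pe_model(*(bool *)flag);
}
```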

    Fixes: 5cfb20b96f6 ("powerpc/eeh: Emulate EEH recovery for VFIO devices")
    Signed-off-by: Gavin Shan
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Gavin Shan
     
  • commit 2dae99558e86894e9e5dbf097477baaa5eb70134 upstream.

    For an ATA device supporting the sense data reporting feature set, a
    failed command will trigger the execution of ata_eh_request_sense if
    the result task file of the failed command has the ATA_SENSE bit set
    (sense data available bit). ata_eh_request_sense executes the REQUEST
    SENSE DATA EXT command to retrieve the sense data of the failed
    command. On success of REQUEST SENSE DATA EXT, the ATA_SENSE bit will
    NOT be set (the command succeeded), but ata_eh_request_sense
    nevertheless checks the availability of sense data by testing for that
    bit in the result tf of the REQUEST SENSE DATA EXT command. This
    leads us to falsely assume that the sense data request failed, and to
    the warning message:

    atax.xx: request sense failed stat 50 emask 0

    Upon success of REQUEST SENSE DATA EXT, set the ATA_SENSE bit in the
    result task file so that sense data can be returned by
    ata_eh_request_sense.
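
    Here is a sketch of the status fix-up. ATA_SENSE is assumed to be bit 1
    of the status register, as defined in include/linux/ata.h.
    mark_sense_available() is a hypothetical helper, not the libata code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match include/linux/ata.h: "sense data available" status bit. */
#define ATA_SENSE (1u << 1)

/*
 * Hypothetical helper: after REQUEST SENSE DATA EXT succeeds, mark sense
 * data as available in the result status so an ATA_SENSE test passes.
 */
uint8_t mark_sense_available(uint8_t status, bool request_sense_ok)
{
    if (request_sense_ok)
        status |= ATA_SENSE;
    return status;
}
```

    With the status 0x50 from the warning above, a successful request
    yields 0x52, so a subsequent ATA_SENSE test passes.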

    Signed-off-by: Damien Le Moal
    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Damien Le Moal
     
  • commit e0edc8c546463f268d41d064d855bcff994c52fa upstream.

    Marko reports that CX1-JB512-HP shows the same timeout issues as
    CX1-JB256-HP. Let's apply MAX_SEC_128 to all devices in the series.

    Signed-off-by: Tejun Heo
    Reported-by: Marko Koski-Vähälä
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • commit 064c3db9c564cc5be514ac21fb4aa26cc33db746 upstream.

    If devm_ioremap() fails, it returns NULL, and hpriv->base then
    becomes NULL - 0x20000, so the kernel can run into a NULL pointer
    dereference. Add an error check to avoid this.
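
    The fix amounts to checking the mapping before doing arithmetic on it.
    This is a minimal user-space sketch. set_base() is hypothetical, the
    0x20000 offset is taken from the message above, and using -ENOMEM as
    the error code is an assumption:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define BASE_OFFSET 0x20000u  /* offset from the message above */

/*
 * Hypothetical helper: 'mapped' is what the ioremap call returned.
 * Check for failure before doing any pointer arithmetic on it.
 */
int set_base(uint8_t *mapped, uint8_t **base)
{
    if (!mapped)
        return -ENOMEM;       /* assumed error code */
    *base = mapped - BASE_OFFSET;
    return 0;
}
```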

    Signed-off-by: Arvind Yadav
    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Arvind Yadav
     
  • commit 0b3589be9b98994ce3d5aeca52445d1f5627c4ba upstream.

    Andres reported that MMAP2 records for anonymous memory always have
    their protection field 0.

    Turns out, someone daft put the prot/flags generation code in the file
    branch, leaving them unset for anonymous memory.
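
    The sketch below shows the corrected structure: the protection bits are
    derived for every mapping, not only inside the file-backed branch. The
    VMA_* flags, struct mmap_record and fill_mmap_record() are hypothetical
    stand-ins for the perf code. PROT_* come from <sys/mman.h>:

```c
#include <stdbool.h>
#include <sys/mman.h>

/* Hypothetical VMA permission bits for the sketch. */
#define VMA_READ  0x1u
#define VMA_WRITE 0x2u
#define VMA_EXEC  0x4u

struct mmap_record {
    bool has_file;
    int prot;
};

/* Compute prot for every mapping, not only inside the file-backed branch. */
void fill_mmap_record(struct mmap_record *rec, unsigned vm_flags, bool has_file)
{
    rec->has_file = has_file;
    if (has_file) {
        /* File details (name, inode, ...) would be resolved here only. */
    }

    /* Shared by file-backed and anonymous mappings alike. */
    rec->prot = 0;
    if (vm_flags & VMA_READ)
        rec->prot |= PROT_READ;
    if (vm_flags & VMA_WRITE)
        rec->prot |= PROT_WRITE;
    if (vm_flags & VMA_EXEC)
        rec->prot |= PROT_EXEC;
}
```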

    Reported-by: Andres Freund
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Don Zickus
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: acme@kernel.org
    Cc: anton@ozlabs.org
    Cc: namhyung@kernel.org
    Fixes: f972eb63b100 ("perf: Pass protection and flags bits through mmap2 interface")
    Link: http://lkml.kernel.org/r/20170126221508.GF6536@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit a76a82a3e38c8d3fb6499e3dfaeb0949241ab588 upstream.

    Dmitry reported a KASAN use-after-free on event->group_leader.

    It turns out there's a hole in perf_remove_from_context() due to
    event_function_call() not calling its function when the task
    associated with the event is already dead.

    In this case the event will have been detached from the task, but the
    grouping will have been retained, such that group operations might
    still work properly while there are live child events etc.

    This does, however, mean that we can miss a perf_group_detach() call
    when the group decomposes, which in turn can lead to a
    use-after-free.

    Fix it by explicitly doing the group detach if it's still required.

    Reported-by: Dmitry Vyukov
    Tested-by: Dmitry Vyukov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Mathieu Desnoyers
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: syzkaller
    Fixes: 63b6da39bb38 ("perf: Fix perf_event_exit_task() race")
    Link: http://lkml.kernel.org/r/20170126153955.GD6515@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit 11e3b725cfc282efe9d4a354153e99d86a16af08 upstream.

    Update the ARMv8 Crypto Extensions and the plain NEON AES implementations
    in CBC and CTR modes to return the next IV back to the skcipher API client.
    This is necessary for chaining to work correctly.

    Note that for CTR, this is only done if the request is a round multiple of
    the block size, since otherwise, chaining is impossible anyway.
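
    Below is a sketch of how the next CTR IV can be derived. It assumes the
    whole 16-byte IV is treated as a single big-endian counter that is
    incremented once per block. ctr_advance() and ctr_next_iv() are
    hypothetical helpers, not the arm64 driver code:

```c
#include <stddef.h>
#include <stdint.h>

#define AES_BLOCK_SIZE 16

/* Add 'blocks' to a 128-bit big-endian counter block, propagating carries. */
void ctr_advance(uint8_t iv[AES_BLOCK_SIZE], uint64_t blocks)
{
    for (int i = AES_BLOCK_SIZE - 1; i >= 0 && blocks; i--) {
        uint64_t sum = (uint64_t)iv[i] + (blocks & 0xff);
        iv[i] = (uint8_t)sum;
        blocks = (blocks >> 8) + (sum >> 8);   /* remaining addend + carry */
    }
}

/* Return the next IV to the caller only when chaining is possible. */
int ctr_next_iv(uint8_t iv[AES_BLOCK_SIZE], size_t nbytes)
{
    if (nbytes % AES_BLOCK_SIZE)
        return 0;              /* partial final block: chaining impossible */
    ctr_advance(iv, nbytes / AES_BLOCK_SIZE);
    return 1;
}
```

    For example, a counter ending in 0xff advanced by a 32-byte request (two
    blocks) ends in 0x01 0x01 after the carry.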

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu
    Signed-off-by: Greg Kroah-Hartman

    Ard Biesheuvel
     
  • commit d6040764adcb5cb6de1489422411d701c158bb69 upstream.

    Make sure CRYPTO_ALG_DEAD bit is cleared before proceeding with
    the algorithm registration. This fixes qat-dh registration when the
    driver is restarted.

    Signed-off-by: Salvatore Benedetto
    Signed-off-by: Herbert Xu
    Signed-off-by: Greg Kroah-Hartman

    Salvatore Benedetto
     
  • commit 24bf7ae359b8cca165bb30742d2b1c03a1eb23af upstream.

    Based on the xf86-video-nv code, NFORCE (NV1A) and NFORCE2 (NV1F) have a
    different way of retrieving clocks. See the
    nv_hw.c:nForceUpdateArbitrationSettings function in the original code
    for how these clocks were accessed.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54587
    Signed-off-by: Ilia Mirkin
    Signed-off-by: Ben Skeggs
    Signed-off-by: Greg Kroah-Hartman

    Ilia Mirkin
     
  • commit d347583a39e2df609a9e40c835f72d3614665b53 upstream.

    Store the ELD correctly, not just enough copies of the first byte
    to pad out the given ELD size.
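
    The bug class described here is a copy loop that indexes the source
    with a constant. Below is a minimal illustration with hypothetical
    helpers. It is not the nouveau code itself:

```c
#include <stddef.h>
#include <stdint.h>

/* Buggy shape: every output byte gets data[0]. */
void store_eld_buggy(uint8_t *dst, const uint8_t *data, size_t size)
{
    for (size_t i = 0; i < size; i++)
        dst[i] = data[0];
}

/* Fixed: copy byte i of the ELD into slot i. */
void store_eld_fixed(uint8_t *dst, const uint8_t *data, size_t size)
{
    for (size_t i = 0; i < size; i++)
        dst[i] = data[i];
}
```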

    Signed-off-by: Alastair Bridgewater
    Fixes: 120b0c39c756 ("drm/nv50-/disp: audit and version SOR_HDA_ELD method")
    Reviewed-by: Ilia Mirkin
    Signed-off-by: Ben Skeggs
    Signed-off-by: Greg Kroah-Hartman

    Alastair Bridgewater
     
  • commit 57bcd0a6364cd4eaa362d7ff1777e88ddf501602 upstream.

    Missing check for crtcs present.

    Fixes:
    https://bugzilla.kernel.org/show_bug.cgi?id=193341
    https://bugs.freedesktop.org/show_bug.cgi?id=99387

    Reviewed-by: Christian König
    Signed-off-by: Alex Deucher
    Signed-off-by: Greg Kroah-Hartman

    Alex Deucher
     
  • commit cdca06e4e85974d8a3503ab15709dbbaf90d3dd1 upstream.

    According to VLI64 in the Intel Atom E3800 Specification Update (#329901),
    concurrent read accesses may return 0xffffffff and write
    accesses may be dropped silently.
    To work around this, all accesses must be protected by locks.
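
    The following is a user-space sketch of the workaround, which routes
    every register read and write through a single lock. The pthread mutex
    and struct gpio_regs are stand-ins chosen for illustration. A kernel
    driver would typically use a spinlock around its MMIO accessors instead.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical model of a register bank; all access goes through one lock. */
struct gpio_regs {
    pthread_mutex_t lock;
    volatile uint32_t reg;
};

uint32_t gpio_read(struct gpio_regs *g)
{
    pthread_mutex_lock(&g->lock);
    uint32_t v = g->reg;          /* no concurrent access possible here */
    pthread_mutex_unlock(&g->lock);
    return v;
}

void gpio_write(struct gpio_regs *g, uint32_t v)
{
    pthread_mutex_lock(&g->lock);
    g->reg = v;
    pthread_mutex_unlock(&g->lock);
}
```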

    Signed-off-by: Alexander Stein
    Acked-by: Mika Westerberg
    Signed-off-by: Linus Walleij
    Signed-off-by: Greg Kroah-Hartman

    Alexander Stein
     
  • commit 8e9faa15469ed7c7467423db4c62aeed3ff4cae3 upstream.

    In case of a zero-length report, the gpio direction_input callback would
    currently return success instead of an errno.
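
    Here is a sketch of the corrected return-value handling.
    check_report_len() is hypothetical. Treating any short transfer as -EIO
    is an assumption about the intended errno:

```c
#include <errno.h>

/*
 * Hypothetical helper: 'ret' is a byte count (>= 0) or a negative errno;
 * 'expected' is the report length requested. A short transfer, including
 * a zero-length one, is reported as -EIO instead of success.
 */
int check_report_len(int ret, int expected)
{
    if (ret == expected)
        return 0;
    return ret < 0 ? ret : -EIO;   /* keep real errors, map short reads */
}
```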

    Fixes: 1ffb3c40ffb5 ("HID: cp2112: make transfer buffers DMA capable")
    Signed-off-by: Johan Hovold
    Reviewed-by: Benjamin Tissoires
    Signed-off-by: Jiri Kosina
    Signed-off-by: Greg Kroah-Hartman

    Johan Hovold
     
  • commit 7a7b5df84b6b4e5d599c7289526eed96541a0654 upstream.

    A recent commit fixing DMA-buffers on stack added a shared transfer
    buffer protected by a spinlock. This is broken as the USB HID request
    callbacks can sleep. Fix this up by replacing the spinlock with a mutex.

    Fixes: 1ffb3c40ffb5 ("HID: cp2112: make transfer buffers DMA capable")
    Signed-off-by: Johan Hovold
    Reviewed-by: Benjamin Tissoires
    Signed-off-by: Jiri Kosina
    Signed-off-by: Greg Kroah-Hartman

    Johan Hovold
     
  • commit 4b3e6f2ef3722f1a6a97b6034ed492c1a21fd4ae upstream.

    Commit bf15f86b343ed8 ("xtensa: initialize MMU before jumping to reset
    vector") calls MMU management functions even when CONFIG_MMU is not
    selected. That breaks noMMU build on cores with MMU.

    Don't manage MMU when CONFIG_MMU is not selected.

    Signed-off-by: Max Filippov
    Signed-off-by: Greg Kroah-Hartman

    Max Filippov
     
  • commit c8f325a59cfc718d13a50fbc746ed9b415c25e92 upstream.

    Some AArch64 UEFI implementations disable the MMU in ExitBootServices(),
    after which unaligned accesses to RAM are no longer supported.

    Commit:

    abfb7b686a3e ("efi/libstub/arm*: Pass latest memory map to the kernel")

    fixed an issue in the memory map handling of the stub FDT code, but
    inadvertently created an issue with such firmware, by moving some
    of the FDT manipulation to after the invocation of ExitBootServices().

    Given that the stub's libfdt implementation uses the ordinary, accelerated
    string functions, which rely on hardware handling of unaligned accesses,
    manipulating the FDT with the MMU off may result in alignment faults.

    So fix the situation by moving the update_fdt_memmap() call into the
    callback function invoked by efi_exit_boot_services() right before it
    calls the ExitBootServices() UEFI service (which is arguably a better
    place for it anyway).

    Note that disabling the MMU in ExitBootServices() is not compliant with
    the UEFI spec, and carries great risk due to the fact that switching from
    cached to uncached memory accesses halfway through compiler generated code
    (i.e., involving a stack) can never be done in a way that is architecturally
    safe.

    Fixes: abfb7b686a3e ("efi/libstub/arm*: Pass latest memory map to the kernel")
    Signed-off-by: Ard Biesheuvel
    Tested-by: Riku Voipio
    Cc: mark.rutland@arm.com
    Cc: linux-efi@vger.kernel.org
    Cc: matt@codeblueprint.co.uk
    Cc: leif.lindholm@linaro.org
    Cc: linux-arm-kernel@lists.infradead.org
    Link: http://lkml.kernel.org/r/1485971102-23330-2-git-send-email-ard.biesheuvel@linaro.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Ard Biesheuvel
     
  • commit bf29bddf0417a4783da3b24e8c9e017ac649326f upstream.

    Commit:

    129766708 ("x86/efi: Only map RAM into EFI page tables if in mixed-mode")

    stopped creating 1:1 mappings for all RAM, when running in native 64-bit mode.

    It turns out though that there are 64-bit EFI implementations in the wild
    (this particular problem has been reported on a Lenovo Yoga 710-11IKB),
    which still make use of the first physical page for their own private use,
    even though they explicitly mark it EFI_CONVENTIONAL_MEMORY in the memory
    map.

    In case there is no mapping for this particular frame in the EFI pagetables,
    as soon as firmware tries to make use of it, a triple fault occurs and the
    system reboots (in case of the Yoga 710-11IKB this is very early during bootup).

    Fix that by always mapping the first page of physical memory into the EFI
    pagetables. We're free to hand this page to the BIOS, as trim_bios_range()
    will reserve the first page and isolate it away from memory allocators anyway.

    Note that just reverting 129766708 alone is not enough on v4.9-rc1+ to fix the
    regression on affected hardware, as this commit:

    ab72a27da ("x86/efi: Consolidate region mapping logic")

    later caused the first physical frame not to be mapped anyway.

    Reported-by: Hanka Pavlikova
    Signed-off-by: Jiri Kosina
    Signed-off-by: Matt Fleming
    Cc: Ard Biesheuvel
    Cc: Borislav Petkov
    Cc: Laura Abbott
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Vojtech Pavlik
    Cc: Waiman Long
    Cc: linux-efi@vger.kernel.org
    Fixes: 129766708 ("x86/efi: Only map RAM into EFI page tables if in mixed-mode")
    Link: http://lkml.kernel.org/r/20170127222552.22336-1-matt@codeblueprint.co.uk
    [ Tidied up the changelog and the comment. ]
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Jiri Kosina
     
  • commit 3a4b77cd47bb837b8557595ec7425f281f2ca1fe upstream.

    Ralf Spenneberg reported that he hit a kernel crash when mounting a
    modified ext4 image. It turns out that the kernel crashed when
    calculating the fs overhead (ext4_calculate_overhead()), because
    the image has a very large s_first_meta_bg (debug code shows it is
    842150400), and ext4 overruns the memory in count_overhead() when
    setting bits in the bitmap buffer, which is only PAGE_SIZE.

    ext4_calculate_overhead():
      buf = get_zeroed_page(GFP_NOFS);
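
    The overrun can be avoided by checking bounds before any bits are set.
    This sketch is a generic illustration, not the ext4 patch. Both
    set_bits_checked() and the 4096-byte page size are assumptions:

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL            /* assumed page size */
#define BITMAP_BITS (MODEL_PAGE_SIZE * 8) /* bits a one-page bitmap holds */

/*
 * Hypothetical helper: set 'count' consecutive bits starting at 'first'
 * in a one-page buffer, refusing values that would run past its end.
 */
int set_bits_checked(unsigned char *buf, unsigned long first,
                     unsigned long count)
{
    if (first > BITMAP_BITS || count > BITMAP_BITS - first)
        return -EINVAL;
    for (unsigned long b = first; b < first + count; b++)
        buf[b / 8] |= (unsigned char)(1u << (b % 8));
    return 0;
}
```

    A count of 842150400, the value reported above, is rejected instead of
    running past the 32768 bits that a 4 KiB buffer holds.
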
    Signed-off-by: Eryu Guan
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Andreas Dilger
    Signed-off-by: Greg Kroah-Hartman

    Eryu Guan
     
  • commit 030305d69fc6963c16003f50d7e8d74b02d0a143 upstream.

    In a struct pcie_link_state, link->root points to the pcie_link_state of
    the root of the PCIe hierarchy. For the topmost link, this points to
    itself (link->root = link). For others, we copy the pointer from the
    parent (link->root = link->parent->root).

    Previously we recognized that Root Ports originated PCIe hierarchies, but
    we treated PCI/PCI-X to PCIe Bridges as being in the middle of the
    hierarchy, and when we tried to copy the pointer from link->parent->root,
    there was no parent, and we dereferenced a NULL pointer:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
    IP: [] pcie_aspm_init_link_state+0x170/0x820

    Recognize that PCI/PCI-X to PCIe Bridges originate PCIe hierarchies just
    like Root Ports do, so link->root for these devices should also point to
    itself.
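
    The root selection rule described above can be sketched minimally as
    follows. struct link_state and set_link_root() are simplified,
    hypothetical stand-ins for struct pcie_link_state and the ASPM code:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified stand-in for struct pcie_link_state. */
struct link_state {
    struct link_state *parent;
    struct link_state *root;
    bool originates_hierarchy;  /* Root Port or PCI/PCI-X to PCIe bridge */
};

/*
 * Links that originate a PCIe hierarchy point to themselves; all other
 * links (which are assumed to have a parent) inherit the parent's root.
 */
void set_link_root(struct link_state *link)
{
    if (link->originates_hierarchy)
        link->root = link;
    else
        link->root = link->parent->root;
}
```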

    Fixes: 51ebfc92b72b ("PCI: Enumerate switches below PCI-to-PCIe bridges")
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=193411
    Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1022181
    Tested-by: lists@ssl-mail.com
    Tested-by: Jayachandran C.
    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Greg Kroah-Hartman

    Bjorn Helgaas
     

04 Feb, 2017

12 commits

  • Greg Kroah-Hartman
     
  • commit c364b6d0b6cda1cd5d9ab689489adda3e82529aa upstream.

    In a bmapx call, bmv_count is the total size of the array, including the
    zeroth element that userspace uses to supply the search key. The output
    array starts at offset 1 so that we can set up the user for the next
    invocation. Since we now can split an extent into multiple bmap records
    due to shared/unshared status, we have to be careful that we don't
    overflow the output array.

    In the original patch f86f403794b ("xfs: teach get_bmapx about shared
    extents and the CoW fork") I used cur_ext (the output index) to check
    for overflows, albeit with an off-by-one error. Since nexleft no longer
    describes the number of unfilled slots in the output, we can rip all
    that out and use cur_ext for the overflow check directly.

    Failure to do this causes heap corruption in bmapx callers such as
    xfs_io and xfs_scrub. xfs/328 can reproduce this problem.
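
    Below is a sketch of the overflow check on the output array, with
    element 0 reserved for the search key as described above. struct
    bmap_rec and fill_bmap() are hypothetical simplifications of the XFS
    getbmap code:

```c
#include <stddef.h>

/* Hypothetical simplified bmap record. */
struct bmap_rec {
    long long offset;
    long long length;
};

/*
 * recs[0] holds the caller's search key, so output starts at index 1 and
 * at most bmv_count - 1 records fit. Returns the number of records stored.
 */
size_t fill_bmap(struct bmap_rec *recs, size_t bmv_count,
                 const struct bmap_rec *extents, size_t nextents)
{
    size_t cur_ext = 0;   /* output records written so far */

    for (size_t i = 0; i < nextents; i++) {
        if (cur_ext + 1 >= bmv_count)
            break;        /* next slot would be past the end of the array */
        recs[cur_ext + 1] = extents[i];
        cur_ext++;
    }
    return cur_ext;
}
```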

    Reviewed-by: Eric Sandeen
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Greg Kroah-Hartman

    Darrick J. Wong
     
  • commit 2aa6ba7b5ad3189cc27f14540aa2f57f0ed8df4b upstream.

    If we try to allocate memory pages to back an xfs_buf that we're trying
    to read, it's possible that we'll be so short on memory that the page
    allocation fails. For a blocking read we'll just wait, but for
    readahead we simply dump all the pages we've collected so far.

    Unfortunately, after dumping the pages we neglect to clear the
    _XBF_PAGES state, which means that the subsequent call to xfs_buf_free
    thinks that b_pages still points to pages we own. It then double-frees
    the b_pages pages.

    This results in screaming about negative page refcounts from the memory
    manager, which xfs oughtn't be triggering. To reproduce this case,
    mount a filesystem where the size of the inodes far outweighs the
    available memory (a ~500M inode filesystem on a VM with 300MB memory
    did the trick here) and run bulkstat in parallel with other memory
    eating processes to put a huge load on the system. The "check summary"
    phase of xfs_scrub also works for this purpose.
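
    The following sketch shows the ownership rule: whoever releases the
    pages must also clear the flag that claims them, so that a later free
    does nothing. struct buf_model and XBF_PAGES_MODEL are hypothetical
    stand-ins for struct xfs_buf and _XBF_PAGES:

```c
#include <stdlib.h>

#define XBF_PAGES_MODEL 0x1u  /* hypothetical stand-in for _XBF_PAGES */

struct buf_model {
    unsigned flags;
    void **pages;
    int page_count;
};

/* Release pages and clear the ownership flag so a later free is a no-op. */
void buf_drop_pages(struct buf_model *bp)
{
    for (int i = 0; i < bp->page_count; i++)
        free(bp->pages[i]);
    free(bp->pages);
    bp->pages = NULL;
    bp->page_count = 0;
    bp->flags &= ~XBF_PAGES_MODEL;   /* the step the message says was missing */
}

/* Free path: only touches pages the buffer still claims to own. */
void buf_free(struct buf_model *bp)
{
    if (bp->flags & XBF_PAGES_MODEL)
        buf_drop_pages(bp);
}
```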

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Eric Sandeen
    Signed-off-by: Greg Kroah-Hartman

    Darrick J. Wong
     
  • commit 493611ebd62673f39e2f52c2561182c558a21cb6 upstream.

    With COW files they are the hotpath, just like for files with the
    extent size hint attribute. We really shouldn't micro-manage anything
    but failure cases with unlikely.

    Additionally Arnd Bergmann recently reported that one of these two
    unlikely annotations causes link failures together with an upcoming
    kernel instrumentation patch, so let's get rid of it ASAP.

    Signed-off-by: Christoph Hellwig
    Reported-by: Arnd Bergmann
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Greg Kroah-Hartman

    Christoph Hellwig
     
  • commit 5a93790d4e2df73e30c965ec6e49be82fc3ccfce upstream.

    xfs_attr_[get|remove]() have unlocked attribute fork checks to optimize
    away a lock cycle in cases where the fork does not exist or is otherwise
    empty. This check is not safe, however, because an attribute fork short
    form to extent format conversion includes a transient state that causes
    the xfs_inode_hasattr() check to fail. Specifically,
    xfs_attr_shortform_to_leaf() creates an empty extent format attribute
    fork and then adds the existing shortform attributes to it.

    This means that lookup of an existing xattr can spuriously return
    -ENOATTR when racing against a setxattr that causes the associated
    format conversion. This was originally reproduced by an untar on a
    particularly configured glusterfs volume, but can also be reproduced on
    demand with properly crafted xattr requests.

    The format conversion occurs under the exclusive ilock. xfs_attr_get()
    and xfs_attr_remove() already have the proper locking and checks further
    down in the functions to handle this situation correctly. Drop the
    unlocked checks to avoid the spurious failure and rely on the existing
    logic.

    Signed-off-by: Brian Foster
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Greg Kroah-Hartman

    Brian Foster
     
  • commit 83d230eb5c638949350f4761acdfc0af5cb1bc00 upstream.

    sb_dirblklog is added to sb_blocklog to compute the directory block size
    in bytes. Therefore, we must compare the sum of both those values
    against XFS_MAX_BLOCKSIZE_LOG, not just dirblklog.
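
    The corrected validation reduces to a one-line sketch. dirblk_size_ok()
    is hypothetical, and XFS_MAX_BLOCKSIZE_LOG is taken to be 16 (64 KiB
    blocks):

```c
#include <stdbool.h>

#define XFS_MAX_BLOCKSIZE_LOG 16   /* assumed: 64 KiB maximum block size */

/* Directory block size is 1 << (sb_blocklog + sb_dirblklog) bytes. */
bool dirblk_size_ok(unsigned sb_blocklog, unsigned sb_dirblklog)
{
    return sb_blocklog + sb_dirblklog <= XFS_MAX_BLOCKSIZE_LOG;
}
```

    With 4 KiB blocks (blocklog 12), a dirblklog of 4 gives 64 KiB directory
    blocks and passes, while 5 would give 128 KiB and is rejected.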

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Eric Sandeen
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Greg Kroah-Hartman

    Darrick J. Wong
     
  • commit d2b3964a0780d2d2994eba57f950d6c9fe489ed8 upstream.

    Due to the way xfs_iomap_write_allocate tries to convert the whole
    found extents from delalloc to real space, we can run into a race
    condition with multiple threads doing writes to this same extent.
    For the non-COW case that is harmless as the only thing that can happen
    is that we call xfs_bmapi_write on an extent that has already been
    converted to a real allocation. For COW writes where we move the extent
    from the COW to the data fork after I/O completion the race is, however,
    not quite as harmless. In the worst case we are now calling
    xfs_bmapi_write on a region that contains a hole in the COW fork, which
    will trip up an assert in debug builds or lead to file system corruption
    in non-debug builds. This seems to be reproducible with workloads of
    small O_DSYNC writes, although so far I've not managed to come up with
    an isolated reproducer.

    The fix for the issue is relatively simple: tell xfs_bmapi_write
    that it is only being asked to convert delayed allocations and
    should skip holes in that case.
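
    A toy model of the "convert delalloc only" behaviour, assuming a flat
    array of extent states in place of the real extent tree. The enum,
    toy_bmapi_write() and the TOY_CONVERT_DELALLOC_ONLY flag are
    hypothetical; the sketch only shows the per-extent decision: convert
    delayed allocations, ignore extents a racing thread already converted,
    and skip holes instead of failing on them when the flag is set.

```c
#include <stddef.h>

/* Hypothetical extent states for a toy mapping. */
enum toy_ext_state { TOY_HOLE, TOY_DELALLOC, TOY_REAL };

/* Hypothetical flag: "only convert delayed allocations". */
#define TOY_CONVERT_DELALLOC_ONLY 0x1u

/*
 * Walk n extents. Returns the number converted, or -1 when a hole is
 * hit without the flag (standing in for the debug assert / corruption).
 */
int toy_bmapi_write(enum toy_ext_state *ext, size_t n, unsigned int flags)
{
	int converted = 0;

	for (size_t i = 0; i < n; i++) {
		switch (ext[i]) {
		case TOY_DELALLOC:
			ext[i] = TOY_REAL;
			converted++;
			break;
		case TOY_REAL:
			break;  /* already converted by a racing thread */
		case TOY_HOLE:
			if (!(flags & TOY_CONVERT_DELALLOC_ONLY))
				return -1;
			break;  /* e.g. blocks already moved out of the COW fork */
		}
	}
	return converted;
}
```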

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Greg Kroah-Hartman

    Christoph Hellwig
     
  • commit fd29f7af75b7adf250beccffa63746c6a88e2b74 upstream.

    A harmless warning just got introduced:

    fs/xfs/libxfs/xfs_dir2.h:40:8: error: type qualifiers ignored on function return type [-Werror=ignored-qualifiers]

    Removing the 'const' modifier avoids the warning and has no
    other effect.
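
    For illustration, a hypothetical helper showing the pattern GCC flags
    with -Wignored-qualifiers (also enabled by -Wextra): a const on a
    scalar returned by value has no effect, so dropping it changes nothing
    except silencing the warning.

```c
#include <stdint.h>

/*
 * Before (would warn under -Wignored-qualifiers, and fail with -Werror):
 *
 *     static inline const uint8_t toy_get_ftype(uint8_t v);
 */

/* After: identical behaviour, no warning. */
static inline uint8_t toy_get_ftype(uint8_t v)
{
	return v & 0x0f;  /* toy body: keep the low four bits */
}
```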

    Fixes: 1fc4d33fed12 ("xfs: replace xfs_mode_to_ftype table with switch statement")
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Greg Kroah-Hartman

    Arnd Bergmann
     
  • commit 657bdfb7f5e68ca5e2ed009ab473c429b0d6af85 upstream.

    The GETNEXTQUOTA ioctl takes whatever ID is sent in,
    and looks for the next active quota for a user
    with an ID equal to or higher than that ID.

    But if we are at the maximum ID and then ask for the "next"
    one, we may wrap back to zero. In this case, userspace
    may loop forever, because it will start querying again
    at zero.

    We'll fix this in userspace as well, but for the kernel,
    return -ENOENT if we ask for the next quota ID
    past UINT_MAX so the caller knows to stop.
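
    A sketch of the wrap check, assuming 32-bit quota IDs and a
    hypothetical toy_next_quota_id() that only advances the ID (the real
    ioctl also skips ahead to the next active quota, which is omitted
    here). Unsigned addition wraps to 0 at UINT32_MAX, so detecting
    next < id is enough to stop.

```c
#include <errno.h>
#include <stdint.h>

/* Advance to the next candidate ID, refusing to wrap past the maximum. */
int toy_next_quota_id(uint32_t id, uint32_t *next)
{
	uint32_t n = id + 1;  /* wraps to 0 when id == UINT32_MAX */

	if (n < id)
		return -ENOENT;
	*next = n;
	return 0;
}

/*
 * Caller-side loop: visits start, start + 1, ... and stops at the
 * maximum ID instead of wrapping to 0 and starting over forever.
 */
unsigned int toy_walk_ids(uint32_t start)
{
	unsigned int visited = 1;  /* count start itself */
	uint32_t id = start;

	while (toy_next_quota_id(id, &id) == 0)
		visited++;
	return visited;
}
```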

    Signed-off-by: Eric Sandeen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Greg Kroah-Hartman

    Eric Sandeen
     
  • commit a324cbf10a3c67aaa10c9f47f7b5801562925bc2 upstream.

    Check for invalid file type in xfs_dinode_verify()
    and fail to load the inode structure from disk.

    Reviewed-by: Darrick J. Wong
    Signed-off-by: Amir Goldstein
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Greg Kroah-Hartman

    Amir Goldstein
     
  • commit fab8eef86c814c3dd46bc5d760b6e4a53d5fc5a6 upstream.

    The helper xfs_dentry_to_name() is used by two different
    classes of callers: callers that pass a zero mode and don't care
    about the returned name.type field, and callers that pass a
    non-zero mode and do care about the name.type field.

    Change xfs_dentry_to_name() to not take the mode argument and
    change the call sites of the first class to not pass the mode
    argument.

    Create a new helper xfs_dentry_mode_to_name() which does take
    the mode argument and returns -EFSCORRUPTED if the mode is invalid.
    Callers that translate a non-zero mode to an on-disk file type now
    check the return value and return the error to the user instead
    of staging an invalid file type to be written to a directory entry.

    Signed-off-by: Amir Goldstein
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Greg Kroah-Hartman

    Amir Goldstein
     
  • commit 1fc4d33fed124fb182e8e6c214e973a29389ae83.

    The size of the xfs_mode_to_ftype[] conversion table
    was too small to handle an invalid value of mode=S_IFMT.

    Instead of fixing the table size, replace the conversion table
    with a conversion helper that uses a switch statement.

    Suggested-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Amir Goldstein
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Greg Kroah-Hartman

    Amir Goldstein