30 May, 2018

40 commits

  • [ Upstream commit e1ebd0e5b9d0a10ba65e63a3514b6da8c6a5a819 ]

    Current code in power_pmu_disable() does not clear the sampling
    registers like Sampling Instruction Address Register (SIAR) and
    Sampling Data Address Register (SDAR) after disabling the PMU. Since
    these are userspace readable and could contain kernel addresses, add
    code to explicitly clear the content of these registers.

    Also add a "context synchronizing instruction" to enforce no further
    updates to these registers as suggested by Power ISA v3.0B. From
    section 9.4, on page 1108:

    "If an mtspr instruction is executed that changes the value of a
    Performance Monitor register other than SIAR, SDAR, and SIER, the
    change is not guaranteed to have taken effect until after a
    subsequent context synchronizing instruction has been executed (see
    Chapter 11. "Synchronization Requirements for Context Alterations"
    on page 1133)."

    Signed-off-by: Madhavan Srinivasan
    [mpe: Massage change log and add ISA reference]
    Signed-off-by: Michael Ellerman
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Michael Ellerman
     
  • [ Upstream commit bb19af816025d495376bd76bf6fbcf4244f9a06d ]

    The current Branch History Rolling Buffer (BHRB) code does not check
    for any privilege levels before updating the data from BHRB. This
    could leak kernel addresses to userspace even when profiling only with
    userspace privileges. Add proper checks to prevent it.

    Acked-by: Balbir Singh
    Signed-off-by: Madhavan Srinivasan
    Signed-off-by: Michael Ellerman
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Madhavan Srinivasan
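
    The commit text does not show the new check. The sketch below is a rough userspace model of the idea: it drops any branch record whose source or target looks like a kernel address when the event may not see kernel branches. KERNEL_BASE_SKETCH (assumed to be the 64-bit powerpc kernel base, 0xc000000000000000), struct branch_entry and filter_branches() are assumptions for illustration, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed boundary: addresses at or above this are treated as kernel addresses. */
#define KERNEL_BASE_SKETCH 0xc000000000000000ULL

/* Hypothetical branch record, loosely modelled on a from/to pair. */
struct branch_entry {
	uint64_t from;
	uint64_t to;
};

/*
 * Copy records to 'out', skipping any record that would reveal a kernel
 * address unless kernel branches are allowed. Returns the number kept.
 */
static size_t filter_branches(const struct branch_entry *in, size_t n,
			      struct branch_entry *out, bool allow_kernel)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		bool kernel = in[i].from >= KERNEL_BASE_SKETCH ||
			      in[i].to >= KERNEL_BASE_SKETCH;

		if (kernel && !allow_kernel)
			continue; /* would leak a kernel address to userspace */
		out[kept++] = in[i];
	}
	return kept;
}
```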
     
  • [ Upstream commit 415eb2a1aaa4881cf85bd86c683356fdd8094a23 ]

    pwmX_mode is defined in the ABI as 0=DC mode, 1=pwm mode. The chip
    register bit is set to 1 for DC mode. This got mixed up, and writing
    1 into pwmX_mode resulted in DC mode being enabled. Fix it up by using
    the ABI definition throughout the driver for consistency.

    Fixes: 77eb5b3703d99 ("hwmon: (nct6775) Add support for pwm, pwm_mode, ... ")
    Signed-off-by: Guenter Roeck
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Guenter Roeck
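
    A minimal sketch of keeping the ABI and register encodings apart, assuming a single mode bit at position 0 in a hypothetical register (the real nct6775 register layout is not shown here). Converting only at the register boundary keeps the rest of the driver in ABI terms.

```c
#include <assert.h>
#include <stdint.h>

/* sysfs ABI: pwmX_mode 0 = DC, 1 = PWM */
#define PWM_MODE_ABI_DC  0
#define PWM_MODE_ABI_PWM 1

/* Hypothetical register bit: per the commit text, the chip sets it to 1 for DC mode. */
#define CHIP_MODE_DC_BIT 0x01u

/* Translate an ABI value into the register value, preserving the other bits. */
static uint8_t pwm_mode_to_reg(uint8_t reg, int abi_mode)
{
	if (abi_mode == PWM_MODE_ABI_DC)
		return (uint8_t)(reg | CHIP_MODE_DC_BIT);       /* DC: set the bit */
	return (uint8_t)(reg & (uint8_t)~CHIP_MODE_DC_BIT);   /* PWM: clear it */
}

/* Translate the register back into the ABI value reported through sysfs. */
static int reg_to_pwm_mode(uint8_t reg)
{
	return (reg & CHIP_MODE_DC_BIT) ? PWM_MODE_ABI_DC : PWM_MODE_ABI_PWM;
}
```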
     
  • [ Upstream commit b845f66f78bf42a4ce98e5cfe0e94fab41dd0742 ]

    Carlo Pisani noticed that his C3600 workstation became unstable during heavy
    I/O on the PCI bus with a VIA VT6421 IDE/SATA PCI card.

    To avoid such instability, this patch switches the LBA PCI bus from Hard Fail
    mode into Soft Fail mode. In this mode the bus will return -1UL for timed out
    MMIO transactions, which is exactly how the x86 (and most other architectures)
    PCI busses behave.

    This patch is based on a proposal by Grant Grundler and Kyle McMartin 10
    years ago:
    https://www.spinics.net/lists/linux-parisc/msg01027.html

    Cc: Carlo Pisani
    Cc: Kyle McMartin
    Reviewed-by: Grant Grundler
    Signed-off-by: Helge Deller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Helge Deller
     
  • [ Upstream commit 9a233bb8025105db9a60b5d761005cc5a6c77f3d ]

    Sometimes iwl_mvm_disable_txq() may be called with mac80211_queue ==
    IEEE80211_INVAL_HW_QUEUE, and this would cause us to use BIT(0xFF)
    which is way too large for the u16 we used to store it in
    hw_queue_to_mac80211. If this happens, the following UBSAN warning
    will be generated:

    [ 167.185167] UBSAN: Undefined behaviour in drivers/net/wireless/intel/iwlwifi/mvm/utils.c:838:5
    [ 167.185171] shift exponent 255 is too large for 64-bit type 'long unsigned int'

    Fix that by checking that it is not IEEE80211_INVAL_HW_QUEUE and,
    while at it, add a warning if the queue number is larger than
    IEEE80211_MAX_QUEUES.

    Fixes: 34e10860ae8d ("iwlwifi: mvm: remove references to queue_info in new TX path")
    Reported-by: Paul Menzel
    Signed-off-by: Luca Coelho
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Luca Coelho
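
    The guard described above can be sketched in userspace as follows. The constant values mirror mac80211's IEEE80211_INVAL_HW_QUEUE (0xff) and IEEE80211_MAX_QUEUES (16); the helper name and the fprintf() standing in for a kernel WARN are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define IEEE80211_INVAL_HW_QUEUE 0xff
#define IEEE80211_MAX_QUEUES     16

/*
 * Hypothetical sketch: clear one mac80211 queue from a u16 bitmap, but never
 * shift by 0xff (or anything >= 16). Returns true if a bit was cleared.
 */
static bool clear_mac80211_queue(uint16_t *hw_queue_to_mac80211,
				 uint8_t mac80211_queue)
{
	if (mac80211_queue == IEEE80211_INVAL_HW_QUEUE)
		return false; /* no mac80211 queue mapped: nothing to clear */
	if (mac80211_queue >= IEEE80211_MAX_QUEUES) {
		/* stands in for the warning the fix adds */
		fprintf(stderr, "unexpected mac80211 queue %u\n", mac80211_queue);
		return false;
	}
	*hw_queue_to_mac80211 &= (uint16_t)~(1u << mac80211_queue);
	return true;
}
```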
     
  • [ Upstream commit f61e64310b75733d782e930d1fb404b84699eed6 ]

    As of commit 205e1b7f51e4 ("dma-mapping: warn when there is no
    coherent_dma_mask") the Freescale FEC driver is issuing the following
    warning on driver initialization on ColdFire systems:

    WARNING: CPU: 0 PID: 1 at ./include/linux/dma-mapping.h:516 0x40159e20
    Modules linked in:
    CPU: 0 PID: 1 Comm: swapper Not tainted 4.16.0-rc7-dirty #4
    Stack from 41833dd8:
    41833dd8 40259c53 40025534 40279e26 00000003 00000000 4004e514 41827000
    400255de 40244e42 00000204 40159e20 00000009 00000000 00000000 4024531d
    40159e20 40244e42 00000204 00000000 00000000 00000000 00000007 00000000
    00000000 40279e26 4028d040 40226576 4003ae88 40279e26 418273f6 41833ef8
    7fffffff 418273f2 41867028 4003c9a2 4180ac6c 00000004 41833f8c 4013e71c
    40279e1c 40279e26 40226c16 4013ced2 40279e26 40279e58 4028d040 00000000
    Call Trace:
    0x40025534
    0x4004e514
    0x400255de
    0x40159e20
    0x40159e20

    It is not fatal; the driver and the system continue to function normally.

    As the warning indicates, coherent_dma_mask is not set on this device.
    There is nothing special about the DMA memory coherency on this hardware,
    so we can just set the mask to 32 bits in the platform data for the FEC
    ethernet devices.

    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Greg Ungerer
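
    DMA_BIT_MASK(32) is the usual way to express a 32-bit DMA mask. The sketch below reuses the kernel's definition of that macro; struct fake_device and addr_fits_coherent_mask() are hypothetical stand-ins showing what a 32-bit coherent mask permits.

```c
#include <assert.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK() */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical stand-in for the relevant field of struct device. */
struct fake_device {
	uint64_t coherent_dma_mask;
};

/* Hypothetical check: can coherent memory live at this bus address? */
static int addr_fits_coherent_mask(const struct fake_device *dev, uint64_t addr)
{
	return (addr & ~dev->coherent_dma_mask) == 0;
}
```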
     
  • [ Upstream commit 9ad577087165478c9d9be82b15ed9bf2db5835f5 ]

    Since commit 8edc514b01e9 ("intel_th: Make SOURCE devices children of the
    root device") the hub is not the parent of SOURCE devices any more, so the
    new helper function should be used for that instead of always using the
    parent. The intel_th_set_output() path, however, still uses the old
    logic, leading to the hub driver structure being aliased with something
    else, like struct pci_driver or struct acpi_driver, and an incorrect call
    to an address inferred from that, potentially resulting in a crash.

    Fixes: 8edc514b01e9 ("intel_th: Make SOURCE devices children of the root device")
    Signed-off-by: Alexander Shishkin
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Alexander Shishkin
     
  • [ Upstream commit 39ffe39545cd5cb5b8cee9f0469165cf24dc62c2 ]

    find_dev_data() does not check whether the return value of alloc_dev_data()
    is NULL. This used to be fine because the pointer was simply returned as-is.
    Since commit df3f7a6e8e85 ("iommu/amd: Use is_attach_deferred
    call-back") the pointer may be used within find_dev_data(), so a NULL
    check is required.

    Cc: Baoquan He
    Fixes: df3f7a6e8e85 ("iommu/amd: Use is_attach_deferred call-back")
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Joerg Roedel
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Sebastian Andrzej Siewior
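
    A simplified sketch of the pattern, with hypothetical stand-ins for the AMD IOMMU structures (the real find_dev_data() also searches for an existing entry first, which is omitted here): once the function dereferences the freshly allocated object itself, it has to check the allocation first.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified device data. */
struct dev_data {
	unsigned short devid;
	int defer_attach;
};

static struct dev_data *alloc_dev_data(unsigned short devid)
{
	struct dev_data *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	d->devid = devid;
	return d;
}

static struct dev_data *find_dev_data(unsigned short devid, int defer)
{
	struct dev_data *d = alloc_dev_data(devid);

	if (d == NULL)
		return NULL;         /* the added check */
	d->defer_attach = defer;     /* would dereference NULL without it */
	return d;
}
```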
     
  • [ Upstream commit 8ebee73b574ad3dd1f14d461f65ceaffbd637650 ]

    This patch fixes a regression caused by 0c317a02ca98
    ("cfg80211: support virtual interfaces with different beacon intervals"),
    after which cfg80211 expects the driver to advertise
    'beacon_int_min_gcd' to support different beacon intervals in multivap
    scenarios. This support is added for QCA988X/QCA99X0/QCA9984/QCA4019.

    Verified AP + mesh bring-up on QCA9984 with beacon intervals of 100 msec
    and 1000 msec respectively.
    Firmware: firmware-5.bin_10.4-3.5.3-00053

    Fixes: 0c317a02ca98 ("cfg80211: support virtual interfaces with different beacon intervals")
    Signed-off-by: Anilkumar Kolli
    Signed-off-by: Kalle Valo
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Anilkumar Kolli
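
    A hedged sketch of the kind of check involved, based on our reading of commit 0c317a02ca98 rather than on its code: differing beacon intervals are accepted only if the driver advertises a non-zero beacon_int_min_gcd and the GCD of all intervals is at least that value. beacon_intervals_allowed() is hypothetical, and the real check is done per interface combination.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Euclid's algorithm; gcd_u32(0, x) == x, so it can seed a running GCD. */
static uint32_t gcd_u32(uint32_t a, uint32_t b)
{
	while (b != 0) {
		uint32_t t = a % b;
		a = b;
		b = t;
	}
	return a;
}

/* Hypothetical simplification of the beacon interval combination check. */
static bool beacon_intervals_allowed(const uint32_t *intervals, size_t n,
				     uint32_t beacon_int_min_gcd)
{
	uint32_t gcd = 0;
	bool different = false;

	for (size_t i = 0; i < n; i++) {
		if (i > 0 && intervals[i] != intervals[0])
			different = true;
		gcd = gcd_u32(gcd, intervals[i]);
	}
	if (!different)
		return true; /* identical intervals are always fine */
	return beacon_int_min_gcd != 0 && gcd >= beacon_int_min_gcd;
}
```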
     
  • [ Upstream commit 86674a97f5055f4c7f406563408096e8cf9364ff ]

    In ca8210_test_int_user_write() a user can request the transfer of a
    frame with a length field (command.length) that is longer than the
    actual buffer provided (len). In this scenario the driver will copy
    the buffer contents into the uninitialised command[] buffer, then
    transfer command.length bytes over the SPI even though only len bytes
    had been populated, potentially leaking sensitive kernel memory.

    Also the first 6 bytes of the command buffer must be initialised in case
    a malformed, short packet is written and the uninitialised bytes are
    read in ca8210_test_check_upstream.

    Reported-by: Domen Puncer Kugler
    Signed-off-by: Harry Morris
    Tested-by: Harry Morris
    Signed-off-by: Stefan Schmidt
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Harry Morris
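
    A userspace sketch of the validation, assuming for illustration a two-byte header (command id, then payload length) and a hypothetical 256-byte buffer; the driver's real layout and limits may differ. The buffer is zeroed first, and a frame whose header claims more data than the user actually wrote is rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CMD_BUF_SIZE 256 /* hypothetical buffer size */
#define CMD_HDR_LEN  2   /* assumed layout: [0] = command id, [1] = payload length */

/*
 * Copy a user-supplied frame into 'cmd' and return how many bytes may be
 * sent, or -1 if the frame is too short, too long, or claims more payload
 * than was supplied.
 */
static int prepare_command(uint8_t cmd[CMD_BUF_SIZE], const uint8_t *user, size_t len)
{
	memset(cmd, 0, CMD_BUF_SIZE); /* never leave bytes uninitialised */
	if (len < CMD_HDR_LEN || len > CMD_BUF_SIZE)
		return -1;
	memcpy(cmd, user, len);
	if ((size_t)cmd[1] + CMD_HDR_LEN > len)
		return -1; /* header claims more data than was written */
	return cmd[1] + CMD_HDR_LEN;
}
```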
     
  • [ Upstream commit 0834d627fbea00c1444075eb3e448e1974da452d ]

    In mpic_physmask() we loop over all CPUs up to 32, then get the hard
    SMP processor id of that CPU.

    Currently that's possibly walking off the end of the paca array, but
    in a future patch we will change the paca array to be an array of
    pointers, and in that case we will get a NULL for missing CPUs and
    oops, e.g.:

    Unable to handle kernel paging request for data at address 0x88888888888888b8
    Faulting instruction address: 0xc00000000004e380
    Oops: Kernel access of bad area, sig: 11 [#1]
    ...
    NIP .mpic_set_affinity+0x60/0x1a0
    LR .irq_do_set_affinity+0x48/0x100

    Fix it by checking that the CPU is possible. This also fixes the code if
    there are gaps in the CPU numbering, which probably never happens on
    mpic systems, but who knows.

    Debugged-by: Nicholas Piggin
    Signed-off-by: Michael Ellerman
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Michael Ellerman
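
    A sketch of the fix's idea in userspace, modelling cpu_possible() with a NULL entry in a hypothetical per-CPU pointer table (the message says this is how missing CPUs will look once the paca array holds pointers). This version skips impossible CPUs; the message does not say whether the real loop skips them or stops at the first one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CPUS 32

/* Hypothetical per-CPU data: just the hard SMP processor id. */
struct paca {
	int hw_cpu_id;
};

/*
 * Translate a logical CPU mask into a mask of hardware processor ids,
 * never dereferencing the entry of a CPU that is not possible.
 * Assumes every hw_cpu_id is below 32.
 */
static uint32_t physmask(uint32_t cpumask, struct paca *pacas[MAX_CPUS])
{
	uint32_t mask = 0;

	for (int i = 0; i < MAX_CPUS; i++, cpumask >>= 1) {
		if (pacas[i] == NULL) /* stands in for !cpu_possible(i) */
			continue;
		mask |= (cpumask & 1u) << pacas[i]->hw_cpu_id;
	}
	return mask;
}
```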
     
  • [ Upstream commit 8b29d29abc484d638213dd79a18a95ae7e5bb402 ]

    Fix a once-per-second (round_robin_time) memory leak of about 1 KB in
    each acpi_pad kernel idling thread that is activated.

    Found by testing with kmemleak.

    Signed-off-by: Lenny Szubowicz
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Lenny Szubowicz
     
  • [ Upstream commit e283655b5abe26462d53d5196f186c5e8863af3b ]

    We should zero an array using sizeof instead of number of elements.

    Fixes the following compiler (GCC 7.3.0) warnings:

    drivers/macintosh/rack-meter.c: In function 'rackmeter_do_pause':
    drivers/macintosh/rack-meter.c:157:2: warning: 'memset' used with length equal to number of elements without multiplication by element size [-Wmemset-elt-size]
    drivers/macintosh/rack-meter.c:158:2: warning: 'memset' used with length equal to number of elements without multiplication by element size [-Wmemset-elt-size]

    Fixes: 4f7bef7a9f69 ("drivers: macintosh: rack-meter: fix bogus memsets")
    Reported-by: Stephen Rothwell
    Signed-off-by: Aaro Koskinen
    Signed-off-by: Michael Ellerman
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Aaro Koskinen
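
    memset() takes a length in bytes, so an element count is only correct for byte arrays. The sketch uses a hypothetical struct with two u32 sample buffers, not rack-meter's actual definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same shape as the kernel's ARRAY_SIZE(), minus its array type check. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

#define SAMPLE_COUNT 16 /* hypothetical */

/* Hypothetical stand-in for a DMA area holding two u32 sample buffers. */
struct sample_bufs {
	uint32_t buf1[SAMPLE_COUNT];
	uint32_t buf2[SAMPLE_COUNT];
};

/*
 * ARRAY_SIZE(buf1) is 16 (elements) and would clear only 16 of the 64
 * bytes; sizeof(buf1) is the full 64-byte size of the array.
 */
static void clear_samples(struct sample_bufs *b)
{
	memset(b->buf1, 0, sizeof(b->buf1));
	memset(b->buf2, 0, sizeof(b->buf2));
}
```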
     
  • [ Upstream commit c37a3c94775855567b90f91775b9691e10bd2806 ]

    If acpi_id == nr_acpi_bits, then we access one element beyond the end
    of the acpi_psd[] array or we set one bit beyond the end of the bit map
    when we do __set_bit(acpi_id, acpi_id_present);

    Fixes: 59a568029181 ("xen/acpi-processor: C and P-state driver that uploads said data to hypervisor.")
    Signed-off-by: Dan Carpenter
    Reviewed-by: Joao Martins
    Reviewed-by: Juergen Gross
    Signed-off-by: Boris Ostrovsky
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Dan Carpenter
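
    For an array or bitmap of nr_acpi_bits entries the valid indices are 0 to nr_acpi_bits - 1, so the guard must use >= rather than >. A userspace sketch with a hypothetical 64-bit bitmap and helper name:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define NR_ACPI_BITS  64 /* hypothetical size */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

static unsigned long acpi_id_present[(NR_ACPI_BITS + BITS_PER_WORD - 1) / BITS_PER_WORD];

/* Set the bit for acpi_id, refusing any index outside 0 .. nr_acpi_bits - 1. */
static bool mark_acpi_id(unsigned int acpi_id, unsigned int nr_acpi_bits)
{
	if (acpi_id >= nr_acpi_bits) /* ">" would let acpi_id == nr_acpi_bits through */
		return false;
	acpi_id_present[acpi_id / BITS_PER_WORD] |= 1UL << (acpi_id % BITS_PER_WORD);
	return true;
}
```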
     
  • [ Upstream commit 57b0c9d49b94bbeb53649b7fbd264603c1ebd585 ]

    If a call-level abort is received for the previous call to complete on a
    connection channel, then that abort is queued for the connection processor
    to handle. Unfortunately, the connection processor then assumes without
    checking that the abort is connection-level (i.e. callNumber is 0) and
    distributes it over all active calls on that connection, thereby
    incorrectly aborting them.

    Fix this by discarding aborts aimed at a completed call.

    Further, discard all packets aimed at a call that's complete if there's
    currently an active call on a channel, since the DATA packets associated
    with the new call automatically terminate the old call.

    Fixes: 18bfeba50dfd ("rxrpc: Perform terminal call ACK/ABORT retransmission from conn processor")
    Reported-by: Marc Dionne
    Signed-off-by: David Howells
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    David Howells
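
    A hedged sketch of the classification described above; struct channel, the enum and classify_abort() are hypothetical simplifications of the rxrpc state, and treating any non-active call number as a completed call is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical outcomes for an ABORT handled by the connection processor. */
enum abort_action {
	ABORT_WHOLE_CONNECTION, /* connection-level abort: callNumber == 0 */
	ABORT_ACTIVE_CALL,      /* aimed at the call currently on the channel */
	ABORT_DISCARD,          /* aimed at a call that has already completed */
};

/* Hypothetical per-channel state: 0 means no call is active. */
struct channel {
	uint32_t active_call;
};

static enum abort_action classify_abort(const struct channel *chan,
					uint32_t call_number)
{
	if (call_number == 0)
		return ABORT_WHOLE_CONNECTION;
	if (chan->active_call != 0 && call_number == chan->active_call)
		return ABORT_ACTIVE_CALL;
	/* A completed call: never fan its abort out to the other calls. */
	return ABORT_DISCARD;
}
```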
     
  • [ Upstream commit 03877bf6a30cca7d4bc3ffabd3c3e9464a7a1a19 ]

    rxrpc calls have a ring of packets that are awaiting ACK or retransmission
    and a parallel ring of annotations that tracks the state of those packets.
    If the initial transmission of a packet on the underlying UDP socket fails
    then the packet annotation is marked for resend - but the setting of this
    mark accidentally erases the last-packet mark also stored in the same
    annotation slot. If this happens, a call won't switch out of the Tx phase
    when all the packets have been transmitted.

    Fix this by retaining the last-packet mark and only altering the packet
    state.

    Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
    Signed-off-by: David Howells
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    David Howells
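
    The bug is a read-modify-write mistake on a packed byte. The sketch assumes a hypothetical layout (two state bits plus a last-packet flag); the real rxrpc annotation constants may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical annotation layout: low two bits = state, bit 2 = last packet. */
#define ANNO_STATE_MASK 0x03u
#define ANNO_UNACK      0x01u
#define ANNO_RETRANS    0x03u
#define ANNO_LAST       0x04u

/* Buggy pattern: overwrites the whole slot and loses ANNO_LAST. */
static uint8_t mark_resend_buggy(uint8_t anno)
{
	(void)anno;
	return ANNO_RETRANS;
}

/* Fixed pattern: change only the state bits, keep the last-packet flag. */
static uint8_t mark_resend(uint8_t anno)
{
	return (uint8_t)((anno & ~ANNO_STATE_MASK) | ANNO_RETRANS);
}
```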
     
  • …created with quota enabled

    [ Upstream commit 4d31778aa2fa342f5f92ca4025b293a1729161d1 ]

    When multiple pending snapshots referring to the same source subvolume
    are executed with quota enabled, root items get corrupted: they end up
    using an old bytenr (with no backref in the extent tree).

    This can be triggered by fstests btrfs/152.

    The cause is that, when the source subvolume is still dirty, the extra
    commit (a simplified transaction commit) in qgroup_account_snapshot()
    can skip dirty roots not recorded in the current transaction, so the
    root item of the source subvolume is not updated.

    Fix it by forcing the source subvolume to be recorded in the current
    transaction before the qgroup sub-transaction commit.

    Reported-by: Justin Maggard <jmaggard@netgear.com>
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    Reviewed-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

    Qu Wenruo
     
  • [ Upstream commit 8a5a916d9a35e13576d79cc16e24611821b13e34 ]

    While running btrfs/011, I hit the following lockdep splat.

    This is the important bit:
    pcpu_alloc+0x1ac/0x5e0
    __percpu_counter_init+0x4e/0xb0
    btrfs_init_fs_root+0x99/0x1c0 [btrfs]
    btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
    resolve_indirect_refs+0x130/0x830 [btrfs]
    find_parent_nodes+0x69e/0xff0 [btrfs]
    btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
    btrfs_find_all_roots+0x50/0x70 [btrfs]
    btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
    btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]

    The percpu_counter_init call in btrfs_alloc_subvolume_writers
    uses GFP_KERNEL, which we can't do during transaction commit.

    This switches it to GFP_NOFS.

    ========================================================
    WARNING: possible irq lock inversion dependency detected
    4.12.14-kvmsmall #8 Tainted: G W
    --------------------------------------------------------
    kswapd0/50 just changed the state of lock:
    (&delayed_node->mutex){+.+.-.}, at: __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
    but this lock took another, RECLAIM_FS-unsafe lock in the past:
    (pcpu_alloc_mutex){+.+.+.}

    and interrupts could create inverse lock ordering between them.

    other info that might help us debug this:
    Chain exists of:
    &delayed_node->mutex --> &found->groups_sem --> pcpu_alloc_mutex

    Possible interrupt unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(pcpu_alloc_mutex);
                                   local_irq_disable();
                                   lock(&delayed_node->mutex);
                                   lock(&found->groups_sem);
      <Interrupt>
        lock(&delayed_node->mutex);

    *** DEADLOCK ***

    2 locks held by kswapd0/50:
    #0: (shrinker_rwsem){++++..}, at: shrink_slab+0x7f/0x5b0
    #1: (&type->s_umount_key#30){+++++.}, at: trylock_super+0x16/0x50

    the shortest dependencies between 2nd lock and 1st lock:
    -> (pcpu_alloc_mutex){+.+.+.} ops: 4904 {
    HARDIRQ-ON-W at:
    __mutex_lock+0x4e/0x8c0
    pcpu_alloc+0x1ac/0x5e0
    alloc_kmem_cache_cpus.isra.70+0x25/0xa0
    __do_tune_cpucache+0x2c/0x220
    do_tune_cpucache+0x26/0xc0
    enable_cpucache+0x6d/0xf0
    kmem_cache_init_late+0x42/0x75
    start_kernel+0x343/0x4cb
    x86_64_start_kernel+0x127/0x134
    secondary_startup_64+0xa5/0xb0
    SOFTIRQ-ON-W at:
    __mutex_lock+0x4e/0x8c0
    pcpu_alloc+0x1ac/0x5e0
    alloc_kmem_cache_cpus.isra.70+0x25/0xa0
    __do_tune_cpucache+0x2c/0x220
    do_tune_cpucache+0x26/0xc0
    enable_cpucache+0x6d/0xf0
    kmem_cache_init_late+0x42/0x75
    start_kernel+0x343/0x4cb
    x86_64_start_kernel+0x127/0x134
    secondary_startup_64+0xa5/0xb0
    RECLAIM_FS-ON-W at:
    __kmalloc+0x47/0x310
    pcpu_extend_area_map+0x2b/0xc0
    pcpu_alloc+0x3ec/0x5e0
    alloc_kmem_cache_cpus.isra.70+0x25/0xa0
    __do_tune_cpucache+0x2c/0x220
    do_tune_cpucache+0x26/0xc0
    enable_cpucache+0x6d/0xf0
    __kmem_cache_create+0x1bf/0x390
    create_cache+0xba/0x1b0
    kmem_cache_create+0x1f8/0x2b0
    ksm_init+0x6f/0x19d
    do_one_initcall+0x50/0x1b0
    kernel_init_freeable+0x201/0x289
    kernel_init+0xa/0x100
    ret_from_fork+0x3a/0x50
    INITIAL USE at:
    __mutex_lock+0x4e/0x8c0
    pcpu_alloc+0x1ac/0x5e0
    alloc_kmem_cache_cpus.isra.70+0x25/0xa0
    setup_cpu_cache+0x2f/0x1f0
    __kmem_cache_create+0x1bf/0x390
    create_boot_cache+0x8b/0xb1
    kmem_cache_init+0xa1/0x19e
    start_kernel+0x270/0x4cb
    x86_64_start_kernel+0x127/0x134
    secondary_startup_64+0xa5/0xb0
    }
    ... key at: pcpu_alloc_mutex+0x70/0xa0
    ... acquired at:
    pcpu_alloc+0x1ac/0x5e0
    __percpu_counter_init+0x4e/0xb0
    btrfs_init_fs_root+0x99/0x1c0 [btrfs]
    btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
    resolve_indirect_refs+0x130/0x830 [btrfs]
    find_parent_nodes+0x69e/0xff0 [btrfs]
    btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
    btrfs_find_all_roots+0x50/0x70 [btrfs]
    btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
    btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
    transaction_kthread+0x176/0x1b0 [btrfs]
    kthread+0x102/0x140
    ret_from_fork+0x3a/0x50

    -> (&fs_info->commit_root_sem){++++..} ops: 1566382 {
    HARDIRQ-ON-W at:
    down_write+0x3e/0xa0
    cache_block_group+0x287/0x420 [btrfs]
    find_free_extent+0x106c/0x12d0 [btrfs]
    btrfs_reserve_extent+0xd8/0x170 [btrfs]
    cow_file_range.isra.66+0x133/0x470 [btrfs]
    run_delalloc_range+0x121/0x410 [btrfs]
    writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
    __extent_writepage+0x19a/0x360 [btrfs]
    extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
    extent_writepages+0x4d/0x60 [btrfs]
    do_writepages+0x1a/0x70
    __filemap_fdatawrite_range+0xa7/0xe0
    btrfs_rename+0x5ee/0xdb0 [btrfs]
    vfs_rename+0x52a/0x7e0
    SyS_rename+0x351/0x3b0
    do_syscall_64+0x79/0x1e0
    entry_SYSCALL_64_after_hwframe+0x42/0xb7
    HARDIRQ-ON-R at:
    down_read+0x35/0x90
    caching_thread+0x57/0x560 [btrfs]
    normal_work_helper+0x1c0/0x5e0 [btrfs]
    process_one_work+0x1e0/0x5c0
    worker_thread+0x44/0x390
    kthread+0x102/0x140
    ret_from_fork+0x3a/0x50
    SOFTIRQ-ON-W at:
    down_write+0x3e/0xa0
    cache_block_group+0x287/0x420 [btrfs]
    find_free_extent+0x106c/0x12d0 [btrfs]
    btrfs_reserve_extent+0xd8/0x170 [btrfs]
    cow_file_range.isra.66+0x133/0x470 [btrfs]
    run_delalloc_range+0x121/0x410 [btrfs]
    writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
    __extent_writepage+0x19a/0x360 [btrfs]
    extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
    extent_writepages+0x4d/0x60 [btrfs]
    do_writepages+0x1a/0x70
    __filemap_fdatawrite_range+0xa7/0xe0
    btrfs_rename+0x5ee/0xdb0 [btrfs]
    vfs_rename+0x52a/0x7e0
    SyS_rename+0x351/0x3b0
    do_syscall_64+0x79/0x1e0
    entry_SYSCALL_64_after_hwframe+0x42/0xb7
    SOFTIRQ-ON-R at:
    down_read+0x35/0x90
    caching_thread+0x57/0x560 [btrfs]
    normal_work_helper+0x1c0/0x5e0 [btrfs]
    process_one_work+0x1e0/0x5c0
    worker_thread+0x44/0x390
    kthread+0x102/0x140
    ret_from_fork+0x3a/0x50
    INITIAL USE at:
    down_write+0x3e/0xa0
    cache_block_group+0x287/0x420 [btrfs]
    find_free_extent+0x106c/0x12d0 [btrfs]
    btrfs_reserve_extent+0xd8/0x170 [btrfs]
    cow_file_range.isra.66+0x133/0x470 [btrfs]
    run_delalloc_range+0x121/0x410 [btrfs]
    writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
    __extent_writepage+0x19a/0x360 [btrfs]
    extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
    extent_writepages+0x4d/0x60 [btrfs]
    do_writepages+0x1a/0x70
    __filemap_fdatawrite_range+0xa7/0xe0
    btrfs_rename+0x5ee/0xdb0 [btrfs]
    vfs_rename+0x52a/0x7e0
    SyS_rename+0x351/0x3b0
    do_syscall_64+0x79/0x1e0
    entry_SYSCALL_64_after_hwframe+0x42/0xb7
    }
    ... key at: __key.61970+0x0/0xfffffffffff9aa88 [btrfs]
    ... acquired at:
    cache_block_group+0x287/0x420 [btrfs]
    find_free_extent+0x106c/0x12d0 [btrfs]
    btrfs_reserve_extent+0xd8/0x170 [btrfs]
    btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs]
    btrfs_create_tree+0xbb/0x2a0 [btrfs]
    btrfs_create_uuid_tree+0x37/0x140 [btrfs]
    open_ctree+0x23c0/0x2660 [btrfs]
    btrfs_mount+0xd36/0xf90 [btrfs]
    mount_fs+0x3a/0x160
    vfs_kern_mount+0x66/0x150
    btrfs_mount+0x18c/0xf90 [btrfs]
    mount_fs+0x3a/0x160
    vfs_kern_mount+0x66/0x150
    do_mount+0x1c1/0xcc0
    SyS_mount+0x7e/0xd0
    do_syscall_64+0x79/0x1e0
    entry_SYSCALL_64_after_hwframe+0x42/0xb7

    -> (&found->groups_sem){++++..} ops: 2134587 {
    HARDIRQ-ON-W at:
    down_write+0x3e/0xa0
    __link_block_group+0x34/0x130 [btrfs]
    btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
    open_ctree+0x2054/0x2660 [btrfs]
    btrfs_mount+0xd36/0xf90 [btrfs]
    mount_fs+0x3a/0x160
    vfs_kern_mount+0x66/0x150
    btrfs_mount+0x18c/0xf90 [btrfs]
    mount_fs+0x3a/0x160
    vfs_kern_mount+0x66/0x150
    do_mount+0x1c1/0xcc0
    SyS_mount+0x7e/0xd0
    do_syscall_64+0x79/0x1e0
    entry_SYSCALL_64_after_hwframe+0x42/0xb7
    HARDIRQ-ON-R at:
    down_read+0x35/0x90
    btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs]
    open_ctree+0x207b/0x2660 [btrfs]
    btrfs_mount+0xd36/0xf90 [btrfs]
    mount_fs+0x3a/0x160
    vfs_kern_mount+0x66/0x150
    btrfs_mount+0x18c/0xf90 [btrfs]
    mount_fs+0x3a/0x160
    vfs_kern_mount+0x66/0x150
    do_mount+0x1c1/0xcc0
    SyS_mount+0x7e/0xd0
    do_syscall_64+0x79/0x1e0
    entry_SYSCALL_64_after_hwframe+0x42/0xb7
    SOFTIRQ-ON-W at:
    down_write+0x3e/0xa0
    __link_block_group+0x34/0x130 [btrfs]
    btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
    open_ctree+0x2054/0x2660 [btrfs]
    btrfs_mount+0xd36/0xf90 [btrfs]
    mount_fs+0x3a/0x160
    vfs_kern_mount+0x66/0x150
    btrfs_mount+0x18c/0xf90 [btrfs]
    mount_fs+0x3a/0x160
    vfs_kern_mount+0x66/0x150
    do_mount+0x1c1/0xcc0
    SyS_mount+0x7e/0xd0
    do_syscall_64+0x79/0x1e0
    entry_SYSCALL_64_after_hwframe+0x42/0xb7
    SOFTIRQ-ON-R at:
    down_read+0x35/0x90
    btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs]
    open_ctree+0x207b/0x2660 [btrfs]
    btrfs_mount+0xd36/0xf90 [btrfs]
    mount_fs+0x3a/0x160
    vfs_kern_mount+0x66/0x150
    btrfs_mount+0x18c/0xf90 [btrfs]
    mount_fs+0x3a/0x160
    vfs_kern_mount+0x66/0x150
    do_mount+0x1c1/0xcc0
    SyS_mount+0x7e/0xd0
    do_syscall_64+0x79/0x1e0
    entry_SYSCALL_64_after_hwframe+0x42/0xb7
    INITIAL USE at:
    down_write+0x3e/0xa0
    __link_block_group+0x34/0x130 [btrfs]
    btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
    open_ctree+0x2054/0x2660 [btrfs]
    btrfs_mount+0xd36/0xf90 [btrfs]
    mount_fs+0x3a/0x160
    vfs_kern_mount+0x66/0x150
    btrfs_mount+0x18c/0xf90 [btrfs]
    mount_fs+0x3a/0x160
    vfs_kern_mount+0x66/0x150
    do_mount+0x1c1/0xcc0
    SyS_mount+0x7e/0xd0
    do_syscall_64+0x79/0x1e0
    entry_SYSCALL_64_after_hwframe+0x42/0xb7
    }
    ... key at: __key.59101+0x0/0xfffffffffff9ab78 [btrfs]
    ... acquired at:
    find_free_extent+0xcb4/0x12d0 [btrfs]
    btrfs_reserve_extent+0xd8/0x170 [btrfs]
    btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs]
    __btrfs_cow_block+0x110/0x5b0 [btrfs]
    btrfs_cow_block+0xd7/0x290 [btrfs]
    btrfs_search_slot+0x1f6/0x960 [btrfs]
    btrfs_lookup_inode+0x2a/0x90 [btrfs]
    __btrfs_update_delayed_inode+0x65/0x210 [btrfs]
    btrfs_commit_inode_delayed_inode+0x121/0x130 [btrfs]
    btrfs_evict_inode+0x3fe/0x6a0 [btrfs]
    evict+0xc4/0x190
    __dentry_kill+0xbf/0x170
    dput+0x2ae/0x2f0
    SyS_rename+0x2a6/0x3b0
    do_syscall_64+0x79/0x1e0
    entry_SYSCALL_64_after_hwframe+0x42/0xb7

    -> (&delayed_node->mutex){+.+.-.} ops: 5580204 {
    HARDIRQ-ON-W at:
    __mutex_lock+0x4e/0x8c0
    btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
    btrfs_update_inode+0x83/0x110 [btrfs]
    btrfs_dirty_inode+0x62/0xe0 [btrfs]
    touch_atime+0x8c/0xb0
    do_generic_file_read+0x818/0xb10
    __vfs_read+0xdc/0x150
    vfs_read+0x8a/0x130
    SyS_read+0x45/0xa0
    do_syscall_64+0x79/0x1e0
    entry_SYSCALL_64_after_hwframe+0x42/0xb7
    SOFTIRQ-ON-W at:
    __mutex_lock+0x4e/0x8c0
    btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
    btrfs_update_inode+0x83/0x110 [btrfs]
    btrfs_dirty_inode+0x62/0xe0 [btrfs]
    touch_atime+0x8c/0xb0
    do_generic_file_read+0x818/0xb10
    __vfs_read+0xdc/0x150
    vfs_read+0x8a/0x130
    SyS_read+0x45/0xa0
    do_syscall_64+0x79/0x1e0
    entry_SYSCALL_64_after_hwframe+0x42/0xb7
    IN-RECLAIM_FS-W at:
    __mutex_lock+0x4e/0x8c0
    __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
    btrfs_evict_inode+0x22c/0x6a0 [btrfs]
    evict+0xc4/0x190
    dispose_list+0x35/0x50
    prune_icache_sb+0x42/0x50
    super_cache_scan+0x139/0x190
    shrink_slab+0x262/0x5b0
    shrink_node+0x2eb/0x2f0
    kswapd+0x2eb/0x890
    kthread+0x102/0x140
    ret_from_fork+0x3a/0x50
    INITIAL USE at:
    __mutex_lock+0x4e/0x8c0
    btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
    btrfs_update_inode+0x83/0x110 [btrfs]
    btrfs_dirty_inode+0x62/0xe0 [btrfs]
    touch_atime+0x8c/0xb0
    do_generic_file_read+0x818/0xb10
    __vfs_read+0xdc/0x150
    vfs_read+0x8a/0x130
    SyS_read+0x45/0xa0
    do_syscall_64+0x79/0x1e0
    entry_SYSCALL_64_after_hwframe+0x42/0xb7
    }
    ... key at: __key.56935+0x0/0xfffffffffff96b78 [btrfs]
    ... acquired at:
    __lock_acquire+0x264/0x11c0
    lock_acquire+0xbd/0x1e0
    __mutex_lock+0x4e/0x8c0
    __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
    btrfs_evict_inode+0x22c/0x6a0 [btrfs]
    evict+0xc4/0x190
    dispose_list+0x35/0x50
    prune_icache_sb+0x42/0x50
    super_cache_scan+0x139/0x190
    shrink_slab+0x262/0x5b0
    shrink_node+0x2eb/0x2f0
    kswapd+0x2eb/0x890
    kthread+0x102/0x140
    ret_from_fork+0x3a/0x50

    stack backtrace:
    CPU: 1 PID: 50 Comm: kswapd0 Tainted: G W 4.12.14-kvmsmall #8 SLE15 (unreleased)
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
    Call Trace:
    dump_stack+0x78/0xb7
    print_irq_inversion_bug.part.38+0x19f/0x1aa
    check_usage_forwards+0x102/0x120
    ? ret_from_fork+0x3a/0x50
    ? check_usage_backwards+0x110/0x110
    mark_lock+0x16c/0x270
    __lock_acquire+0x264/0x11c0
    ? pagevec_lookup_entries+0x1a/0x30
    ? truncate_inode_pages_range+0x2b3/0x7f0
    lock_acquire+0xbd/0x1e0
    ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
    __mutex_lock+0x4e/0x8c0
    ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
    ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
    ? btrfs_evict_inode+0x1f6/0x6a0 [btrfs]
    __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
    btrfs_evict_inode+0x22c/0x6a0 [btrfs]
    evict+0xc4/0x190
    dispose_list+0x35/0x50
    prune_icache_sb+0x42/0x50
    super_cache_scan+0x139/0x190
    shrink_slab+0x262/0x5b0
    shrink_node+0x2eb/0x2f0
    kswapd+0x2eb/0x890
    kthread+0x102/0x140
    ? mem_cgroup_shrink_node+0x2c0/0x2c0
    ? kthread_create_on_node+0x40/0x40
    ret_from_fork+0x3a/0x50

    Signed-off-by: Jeff Mahoney
    Reviewed-by: Liu Bo
    Signed-off-by: David Sterba

    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jeff Mahoney
     
  • [ Upstream commit 8434ec46c6e3232cebc25a910363b29f5c617820 ]

    When logging an inode, at tree-log.c:copy_items(), if we call
    btrfs_next_leaf() at the loop which checks for the need to log holes, we
    need to make sure copy_items() returns the value 1 to its caller and
    not 0 (on success). This is because the path the caller passed was
    released and is now different from what it was before, and the caller
    expects a return value of 0 to mean both success and that the path
    has not changed, while a return value of 1 means both success and
    signals the caller that it cannot reuse the path and has to perform
    another tree search.

    Even though this case should not be triggered under normal
    circumstances, or should at least be very rare, its consequences can be
    very unpredictable (especially when replaying a log tree).

    Fixes: 16e7549f045d ("Btrfs: incompatible format change to remove hole extents")
    Signed-off-by: Filipe Manana
    Signed-off-by: David Sterba
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Filipe Manana
     
  • [ Upstream commit 3c0efdf03b2d127f0e40e30db4e7aa0429b1b79a ]

    The extent tree of the test fs looks like the following:

    BTRFS info (device (null)): leaf 16327509003777336587 total ptrs 1 free space 3919
    item 0 key (4096 168 4096) itemoff 3944 itemsize 51
    extent refs 1 gen 1 flags 2
    tree block key (68719476736 0 0) level 1
    ^^^^^^^
    ref#0: tree block backref root 5

    The test uses an empty tree as the fs tree, so there is no way that its
    level can be 1.

    For a real fs tree backref (created by mkfs) without skinny metadata, the
    result should look like:

    item 3 key (30408704 EXTENT_ITEM 4096) itemoff 3845 itemsize 51
    refs 1 gen 4 flags TREE_BLOCK
    tree block key (256 INODE_ITEM 0) level 0
    ^^^^^^^
    tree block backref root 5

    Fix the level to 0, so that it does not break the tree level checker
    introduced later.

    Fixes: faa2dbf004e8 ("Btrfs: add sanity tests for new qgroup accounting code")
    Signed-off-by: Qu Wenruo
    Signed-off-by: David Sterba
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Qu Wenruo
     
  • [ Upstream commit d40b6768e45bd9213139b2d91d30c7692b6007b1 ]

    system_reset_exception does most of its own crash handling now,
    invoking the debugger or crash dumps if they are registered. If not,
    then it goes through to die() to print stack traces, and then is
    supposed to panic (according to comments).

    However after die() prints oopses, it does its own handling which
    doesn't allow system_reset_exception to panic (e.g., it may just
    kill the current process). This patch causes sreset exceptions to
    return from die after it prints messages but before acting.

    This also stops die from invoking the debugger on 0x100 crashes.
    system_reset_exception similarly calls the debugger. It had been
    thought this was harmless (because if the debugger was disabled,
    neither call would fire, and if it was enabled the first call
    would return). However in some cases like xmon 'X' command, the
    debugger returns 0, which currently causes it to be entered
    again (first in system_reset_exception, then in die), which is
    confusing.

    Signed-off-by: Nicholas Piggin
    Signed-off-by: Michael Ellerman
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Nicholas Piggin
     
  • [ Upstream commit 16a1c0646e55c3345bce8e4edfc06ad119d27c04 ]

    All the members: base, idm_base and nicpm_base should be annotated with
    __iomem since they are pointers to register space. This fixes a bunch of
    sparse reported warnings.

    Fixes: f6a95a24957a ("net: ethernet: bgmac: Add platform device support")
    Fixes: dd5c5d037f5e ("net: ethernet: bgmac: add NS2 support")
    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Florian Fainelli
     
  • [ Upstream commit 60d6e6f0b9e422dd01aeda39257ee0428e5e2a3f ]

    bgmac_dma_tx_ring_free() accesses the ctl1 word, which is a
    little-endian 32-bit word, without using the proper accessors. Fix
    this, and because a length cannot be negative, use unsigned int while
    at it.
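    A minimal userspace sketch of the idea, assuming a hypothetical
    descriptor whose ctl1 field is stored as raw little-endian bytes and a
    hypothetical length mask passed in by the caller. put_le32() and
    get_le32() play the role that cpu_to_le32() and le32_to_cpu() play in
    the kernel, so the result is the same on big- and little-endian CPUs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical DMA descriptor: hardware expects ctl1 in little-endian
 * byte order regardless of the CPU's native endianness. */
struct dma_desc {
	uint8_t ctl1[4];
};

/* Store a host-order value as little-endian bytes. */
static void put_le32(uint8_t *dst, uint32_t val)
{
	dst[0] = val & 0xff;
	dst[1] = (val >> 8) & 0xff;
	dst[2] = (val >> 16) & 0xff;
	dst[3] = (val >> 24) & 0xff;
}

/* Load little-endian bytes back into a host-order value. */
static uint32_t get_le32(const uint8_t *src)
{
	return (uint32_t)src[0] | (uint32_t)src[1] << 8 |
	       (uint32_t)src[2] << 16 | (uint32_t)src[3] << 24;
}

/* A length can never be negative, so it is returned as unsigned. */
static unsigned int desc_len(const struct dma_desc *d, uint32_t len_mask)
{
	return get_le32(d->ctl1) & len_mask;
}
```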

    Fixes: 9cde94506eac ("bgmac: implement scatter/gather support")
    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Florian Fainelli
     
  • [ Upstream commit d13864b68e41c11e4231de90cf358658f6ecea45 ]

    This avoids a lot of -Wunused warnings such as:

    ====================
    kernel/debug/debug_core.c: In function ‘kgdb_cpu_enter’:
    ./arch/sparc/include/asm/cmpxchg_64.h:55:22: warning: value computed is not used [-Wunused-value]
    #define xchg(ptr,x) ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))

    ./arch/sparc/include/asm/atomic_64.h:86:30: note: in expansion of macro ‘xchg’
    #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
    ^~~~
    kernel/debug/debug_core.c:508:4: note: in expansion of macro ‘atomic_xchg’
    atomic_xchg(&kgdb_active, cpu);
    ^~~~~~~~~~~
    ====================
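    The commit body shows only the warning. One common way to silence it,
    assumed here rather than taken from the patch, is to turn the macro
    into a static inline function: an ignored function return value does
    not trigger -Wunused-value, whereas an ignored cast expression produced
    by a macro does. The sketch uses the GCC/Clang __atomic_exchange_n()
    builtin in place of the sparc __xchg() helper.

```c
#include <assert.h>

typedef struct { int counter; } atomic_t;

/* Returns the old value; callers may ignore it without any warning. */
static inline int atomic_xchg(atomic_t *v, int new_val)
{
	return __atomic_exchange_n(&v->counter, new_val, __ATOMIC_SEQ_CST);
}
```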

    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    David S. Miller
     
  • [ Upstream commit 2c98425720233ae3e135add0c7e869b32913502f ]

    If the fscache asynchronous write operation elects to discard a page that's
    pending storage to the cache because the page would be over the store limit
    then it needs to wake the page as someone may be waiting on completion of
    the write.

    The problem is that the store limit may be updated by a different
    asynchronous operation - and so may miss the write - and that the store
    limit may not even get updated until later by the netfs.

    Fix the kernel hang by making fscache_write_op() mark as written any pages
    that are over the limit.
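    A simplified model of that fix, with hypothetical names: pages are
    entries in an array, the store limit is a page index, and "marking as
    written" is a state change that in the real code would also wake any
    waiter. Pages at or past the limit (an assumed boundary) are not
    stored but still leave the pending state, so nobody waits on them
    forever.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page states for a pending cache write. */
enum page_state { PAGE_PENDING, PAGE_WRITTEN };

/* Returns how many pages were actually stored in the cache. */
static size_t write_op(enum page_state *pages, size_t npages,
		       size_t store_limit)
{
	size_t stored = 0;

	for (size_t i = 0; i < npages; i++) {
		if (pages[i] != PAGE_PENDING)
			continue;
		if (i < store_limit)
			stored++;	/* would write page i to the cache */
		/* Over the limit or not, complete the page so waiters wake. */
		pages[i] = PAGE_WRITTEN;
	}
	return stored;
}
```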

    Signed-off-by: David Howells
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     
  • [ Upstream commit 92571a1aae40d291158d16e7142637908220f470 ]

    When using wicked with a lan78xx device attached to the system, we
    end up with ethtool commands issued on the device before an ifup
    was issued. That led to the following crash:

    Unable to handle kernel NULL pointer dereference at virtual address 0000039c
    pgd = ffff800035b30000
    [0000039c] *pgd=0000000000000000
    Internal error: Oops: 96000004 [#1] SMP
    Modules linked in: [...]
    Supported: Yes
    CPU: 3 PID: 638 Comm: wickedd Tainted: G E 4.12.14-0-default #1
    Hardware name: raspberrypi rpi/rpi, BIOS 2018.03-rc2 02/21/2018
    task: ffff800035e74180 task.stack: ffff800036718000
    PC is at phy_ethtool_ksettings_get+0x20/0x98
    LR is at lan78xx_get_link_ksettings+0x44/0x60 [lan78xx]
    pc : [] lr : [] pstate: 20000005
    sp : ffff80003671bb20
    x29: ffff80003671bb20 x28: ffff800035e74180
    x27: ffff000008912000 x26: 000000000000001d
    x25: 0000000000000124 x24: ffff000008f74d00
    x23: 0000004000114809 x22: 0000000000000000
    x21: ffff80003671bbd0 x20: 0000000000000000
    x19: ffff80003671bbd0 x18: 000000000000040d
    x17: 0000000000000001 x16: 0000000000000000
    x15: 0000000000000000 x14: ffffffffffffffff
    x13: 0000000000000000 x12: 0000000000000020
    x11: 0101010101010101 x10: fefefefefefefeff
    x9 : 7f7f7f7f7f7f7f7f x8 : fefefeff31677364
    x7 : 0000000080808080 x6 : ffff80003671bc9c
    x5 : ffff80003671b9f8 x4 : ffff80002c296190
    x3 : 0000000000000000 x2 : 0000000000000000
    x1 : ffff80003671bbd0 x0 : ffff80003671bc00
    Process wickedd (pid: 638, stack limit = 0xffff800036718000)
    Call trace:
    Exception stack(0xffff80003671b9e0 to 0xffff80003671bb20)
    b9e0: ffff80003671bc00 ffff80003671bbd0 0000000000000000 0000000000000000
    ba00: ffff80002c296190 ffff80003671b9f8 ffff80003671bc9c 0000000080808080
    ba20: fefefeff31677364 7f7f7f7f7f7f7f7f fefefefefefefeff 0101010101010101
    ba40: 0000000000000020 0000000000000000 ffffffffffffffff 0000000000000000
    ba60: 0000000000000000 0000000000000001 000000000000040d ffff80003671bbd0
    ba80: 0000000000000000 ffff80003671bbd0 0000000000000000 0000004000114809
    baa0: ffff000008f74d00 0000000000000124 000000000000001d ffff000008912000
    bac0: ffff800035e74180 ffff80003671bb20 ffff000000dcca84 ffff80003671bb20
    bae0: ffff0000086f7f30 0000000020000005 ffff80002c296000 ffff800035223900
    bb00: 0000ffffffffffff 0000000000000000 ffff80003671bb20 ffff0000086f7f30
    [] phy_ethtool_ksettings_get+0x20/0x98
    [] lan78xx_get_link_ksettings+0x44/0x60 [lan78xx]
    [] ethtool_get_settings+0x68/0x210
    [] dev_ethtool+0x214/0x2180
    [] dev_ioctl+0x400/0x630
    [] sock_do_ioctl+0x70/0x88
    [] sock_ioctl+0x208/0x368
    [] do_vfs_ioctl+0xb0/0x848
    [] SyS_ioctl+0x8c/0xa8
    Exception stack(0xffff80003671bec0 to 0xffff80003671c000)
    bec0: 0000000000000009 0000000000008946 0000fffff4e841d0 0000aa0032687465
    bee0: 0000aaaafa2319d4 0000fffff4e841d4 0000000032687465 0000000032687465
    bf00: 000000000000001d 7f7fff7f7f7f7f7f 72606b622e71ff4c 7f7f7f7f7f7f7f7f
    bf20: 0101010101010101 0000000000000020 ffffffffffffffff 0000ffff7f510c68
    bf40: 0000ffff7f6a9d18 0000ffff7f44ce30 000000000000040d 0000ffff7f6f98f0
    bf60: 0000fffff4e842c0 0000000000000001 0000aaaafa2c2e00 0000ffff7f6ab000
    bf80: 0000fffff4e842c0 0000ffff7f62a000 0000aaaafa2b9f20 0000aaaafa2c2e00
    bfa0: 0000fffff4e84818 0000fffff4e841a0 0000ffff7f5ad0cc 0000fffff4e841a0
    bfc0: 0000ffff7f44ce3c 0000000080000000 0000000000000009 000000000000001d
    bfe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000

    The culprit is quite simple: The driver tries to access the phy left and right,
    but only actually has a working reference to it when the device is up.

    The fix thus is quite simple too: Get a reference to the phy on probe already
    and keep it even when the device is going down.

    With this patch applied, I can successfully run wicked on my system and bring
    the interface up and down as many times as I want, without getting NULL pointer
    dereferences in between.
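    A heavily simplified model of the fix; all structures and functions
    here are hypothetical and are not the lan78xx driver's. The PHY
    pointer is set at probe time and survives stop/open cycles, so an
    ethtool-style query issued while the interface is down still finds a
    valid PHY.

```c
#include <assert.h>
#include <stddef.h>

struct phy_device { int speed; };
struct netdev { struct phy_device *phydev; int up; };

static struct phy_device the_phy = { .speed = 1000 };

/* Take the PHY reference once, at probe time ... */
static void probe(struct netdev *dev)
{
	dev->phydev = &the_phy;
	dev->up = 0;
}

/* ... and keep it across open/stop instead of dropping it on stop. */
static void open_dev(struct netdev *dev) { dev->up = 1; }
static void stop_dev(struct netdev *dev) { dev->up = 0; }

/* May be called before ifup or after ifdown. */
static int get_link_speed(const struct netdev *dev)
{
	assert(dev->phydev != NULL);
	return dev->phydev->speed;
}
```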

    Signed-off-by: Alexander Graf
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Alexander Graf
     
  • [ Upstream commit add5ff7a216ee545a214013f26d1ef2f44a9c9f8 ]

    Exit to userspace with KVM_INTERNAL_ERROR_EMULATION if we encounter
    an exception in Protected Mode while emulating guest due to invalid
    guest state. Unlike Big RM, KVM doesn't support emulating exceptions
    in PM, i.e. PM exceptions are always injected via the VMCS. Because
    we will never do VMRESUME due to emulation_required, the exception is
    never realized and we'll keep emulating the faulting instruction over
    and over until we receive a signal.

    Exit to userspace iff there is a pending exception, i.e. don't exit
    simply on a requested event. The purpose of this check and exit is to
    aid in debugging a guest that is in all likelihood already doomed.
    Invalid guest state in PM is extremely limited in normal operation,
    e.g. it generally only occurs for a few instructions early in BIOS,
    and any exception at this time is all but guaranteed to be fatal.
    Non-vectored interrupts, e.g. INIT, SIPI and SMI, can be cleanly
    handled/emulated, while checking for vectored interrupts, e.g. INTR
    and NMI, without hitting false positives would add a fair amount of
    complexity for almost no benefit (getting hit by lightning seems
    more likely than encountering this specific scenario).

    Add a WARN_ON_ONCE to vmx_queue_exception() if we try to inject an
    exception via the VMCS and emulation_required is true.
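    The exit condition described above can be written as a small
    predicate. This is a sketch with a hypothetical vcpu structure, not
    the VMX code: it requests an exit only when emulating invalid
    Protected Mode state with a vectored exception pending, and
    deliberately ignores merely requested events.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical vCPU state relevant to the decision. */
struct vcpu {
	bool emulation_required;	/* invalid guest state, emulating */
	bool protected_mode;
	bool exception_pending;		/* vectored exception queued */
	bool event_requested;		/* e.g. INIT/SIPI/SMI request */
};

/* In PM the exception could only be injected via the VMCS, which never
 * happens while emulation_required is set, so exit instead of looping. */
static bool must_exit_to_userspace(const struct vcpu *v)
{
	return v->emulation_required && v->protected_mode &&
	       v->exception_pending;
}
```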

    Signed-off-by: Sean Christopherson
    Signed-off-by: Radim Krčmář
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Sean Christopherson
     
  • …cpu_has() in build_cr3_noflush()

    [ Upstream commit 162ee5a8ab49be40d253f90e94aef712470a3a24 ]

    Linus reported the following boot warning:

    WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/tlbflush.h:134 load_new_mm_cr3+0x114/0x170
    [...]
    Call Trace:
    switch_mm_irqs_off+0x267/0x590
    switch_mm+0xe/0x20
    efi_switch_mm+0x3e/0x50
    efi_enter_virtual_mode+0x43f/0x4da
    start_kernel+0x3bf/0x458
    secondary_startup_64+0xa5/0xb0

    ... after merging:

    03781e40890c: x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3

    When the platform supports PCID and if CONFIG_DEBUG_VM=y is enabled,
    build_cr3_noflush() (called via switch_mm()) does a sanity check to see
    if X86_FEATURE_PCID is set.

    Presently, build_cr3_noflush() uses "this_cpu_has(X86_FEATURE_PCID)" to
    perform the check, but this_cpu_has() works only after SMP is initialized
    (i.e. after the per-CPU cpu_info structures are populated), which happens
    very late in the boot process (during rest_init()).

    As efi_runtime_services() are called during (early) kernel boot time
    and run time, modify build_cr3_noflush() to use boot_cpu_has() all the
    time. As suggested by Dave Hansen, this should be OK because all CPUs
    have the same capabilities on x86.
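    A toy model of why the early check fails, assuming a hypothetical
    feature bit and a four-entry per-CPU array: boot CPU capabilities are
    filled in during early boot, while the per-CPU copies only appear once
    SMP setup runs, so a per-CPU query made earlier returns a false
    negative. Copying the boot CPU's capabilities to every CPU reflects
    the assumption stated above that all x86 CPUs are identical.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS		4
#define FEATURE_PCID	0	/* hypothetical bit number */

static unsigned long boot_cpu_caps;		/* detected very early */
static unsigned long per_cpu_caps[NR_CPUS];	/* filled at SMP init */

static bool boot_cpu_has(int bit) { return boot_cpu_caps & (1UL << bit); }

static bool this_cpu_has(int cpu, int bit)
{
	return per_cpu_caps[cpu] & (1UL << bit);
}

static void early_boot(void) { boot_cpu_caps = 1UL << FEATURE_PCID; }

static void smp_init(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		per_cpu_caps[cpu] = boot_cpu_caps;
}
```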

    With this change the warning is fixed.

    ( Dave also suggested that we put a warning in this_cpu_has() if it's used
    early in the boot process. This is still work in progress as it affects
    MCE. )

    Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Dave Hansen <dave.hansen@intel.com>
    Cc: Lee Chun-Yi <jlee@suse.com>
    Cc: Matt Fleming <matt@codeblueprint.co.uk>
    Cc: Michael S. Tsirkin <mst@redhat.com>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Ravi Shankar <ravi.v.shankar@intel.com>
    Cc: Ricardo Neri <ricardo.neri@intel.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: linux-efi@vger.kernel.org
    Link: http://lkml.kernel.org/r/1522870459-7432-1-git-send-email-sai.praneeth.prakhya@intel.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

    Sai Praneeth
     
  • [ Upstream commit d29a20645d5e929aa7e8616f28e5d8e1c49263ec ]

    While running rt-tests' pi_stress program I got the following splat:

    rq->clock_update_flags < RQCF_ACT_SKIP
    WARNING: CPU: 27 PID: 0 at kernel/sched/sched.h:960 assert_clock_updated.isra.38.part.39+0x13/0x20

    [...]


    enqueue_top_rt_rq+0xf4/0x150
    ? cpufreq_dbs_governor_start+0x170/0x170
    sched_rt_rq_enqueue+0x65/0x80
    sched_rt_period_timer+0x156/0x360
    ? sched_rt_rq_enqueue+0x80/0x80
    __hrtimer_run_queues+0xfa/0x260
    hrtimer_interrupt+0xcb/0x220
    smp_apic_timer_interrupt+0x62/0x120
    apic_timer_interrupt+0xf/0x20

    [...]

    do_idle+0x183/0x1e0
    cpu_startup_entry+0x5f/0x70
    start_secondary+0x192/0x1d0
    secondary_startup_64+0xa5/0xb0

    We can get rid of it by the "traditional" means of adding an
    update_rq_clock() call after acquiring the rq->lock in
    do_sched_rt_period_timer().

    The case for the RT task throttling (which this workload also hits)
    can be ignored, since the skip_update call there is actually bogus and
    does quite the contrary (the request bits are removed/reverted).

    By setting RQCF_UPDATED we really don't care whether the skip is
    happening or not, and will therefore make the assert_clock_updated()
    check happy.
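    A minimal model of the clock-update bookkeeping. The flag values and
    the RQCF_ACT_SKIP early return are assumed to follow
    kernel/sched/sched.h and update_rq_clock() of this era; the struct and
    the period_timer() path are simplified, hypothetical stand-ins.
    Because update_rq_clock() either sets RQCF_UPDATED or returns early
    with RQCF_ACT_SKIP already set, the check passes either way.

```c
#include <assert.h>

#define RQCF_REQ_SKIP	0x01
#define RQCF_ACT_SKIP	0x02
#define RQCF_UPDATED	0x04

struct rq { unsigned int clock_update_flags; unsigned long long clock; };

static void update_rq_clock(struct rq *rq, unsigned long long now)
{
	if (rq->clock_update_flags & RQCF_ACT_SKIP)
		return;		/* skip active: flags already >= ACT_SKIP */
	rq->clock_update_flags |= RQCF_UPDATED;
	rq->clock = now;
}

/* The condition behind the splat: warn if flags < RQCF_ACT_SKIP. */
static int clock_ok(const struct rq *rq)
{
	return rq->clock_update_flags >= RQCF_ACT_SKIP;
}

/* Simplified period-timer path: freshly locked rq (flags assumed clear),
 * refresh the clock, then enqueue, which relies on an updated clock. */
static void period_timer(struct rq *rq, unsigned long long now)
{
	rq->clock_update_flags = 0;
	update_rq_clock(rq, now);
	assert(clock_ok(rq));	/* would fire without the update */
}
```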

    Signed-off-by: Davidlohr Bueso
    Reviewed-by: Matt Fleming
    Acked-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Thomas Gleixner
    Cc: dave@stgolabs.net
    Cc: linux-kernel@vger.kernel.org
    Cc: rostedt@goodmis.org
    Link: http://lkml.kernel.org/r/20180402164954.16255-1-dave@stgolabs.net
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • [ Upstream commit c1b25a17d24925b0961c319cfc3fd7e1dc778914 ]

    POWER8 restores AMOR when waking from deep sleep, but POWER9 does not,
    because it does not go through the subcore restore.

    Have POWER9 restore it in core restore.

    Fixes: ee97b6b99f42 ("powerpc/mm/radix: Setup AMOR in HV mode to allow key 0")
    Signed-off-by: Nicholas Piggin
    Signed-off-by: Michael Ellerman
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Nicholas Piggin
     
  • [ Upstream commit bb34f24c7d2c98d0c81838a7700e6068325b17a0 ]

    We should not handle a lockres migration if we are already in
    'DLM_CTXT_IN_SHUTDOWN', as that will leave the lockres behind after
    leaving the dlm domain. Eventually, other nodes will get stuck in an
    infinite loop when requesting the lock from us.

    The problem is caused by concurrent umounts on different nodes. Before
    receiving N1's DLM_BEGIN_EXIT_DOMAIN_MSG, N2 has already picked N1 as
    the migration target, so N2 continues sending lockres to N1 even
    though N1 has left the domain.

    N1                              N2 (owner)
                                    touch file

    access the file,
    and get pr lock

                                    begin leave domain and
                                    pick up N1 as new owner

    begin leave domain and
    migrate all lockres done

                                    begin migrate lockres to N1

    end leave domain, but
    the lockres left
    unexpectedly, because
    migrate task has passed
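    A sketch of the guard described in the first paragraph, with
    hypothetical structures and an assumed -EINVAL return code (the patch
    body does not say what the real handler returns): once the local
    domain is in DLM_CTXT_IN_SHUTDOWN, an incoming migration is refused
    rather than accepted, so no lockres is stranded on a node that is
    leaving.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, simplified domain states. */
enum dlm_ctxt_state {
	DLM_CTXT_NEW,
	DLM_CTXT_JOINED,
	DLM_CTXT_IN_SHUTDOWN,
	DLM_CTXT_LEAVING,
};

struct dlm_ctxt {
	enum dlm_ctxt_state state;
	int nr_lockres;		/* lockres currently mastered here */
};

static int handle_migrate_lockres(struct dlm_ctxt *dlm)
{
	if (dlm->state == DLM_CTXT_IN_SHUTDOWN)
		return -EINVAL;	/* leaving: do not take new lockres */
	dlm->nr_lockres++;
	return 0;
}
```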

    [piaojun@huawei.com: v3]
    Link: http://lkml.kernel.org/r/5A9CBD19.5020107@huawei.com
    Link: http://lkml.kernel.org/r/5A99F028.2090902@huawei.com
    Signed-off-by: Jun Piao
    Reviewed-by: Yiwen Jiang
    Reviewed-by: Joseph Qi
    Reviewed-by: Changwei Ge
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jun Piao
     
  • [ Upstream commit efc365e7290d040fbd43f60b0e97653489a739d4 ]

    On ppc64le, the rxe_add command causes an oops in the kernel log:

    [ 92.495140] Oops: Kernel access of bad area, sig: 11 [#1]
    [ 92.499710] SMP NR_CPUS=2048 NUMA pSeries
    [ 92.499792] Modules linked in: ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) nf_conntrack_netlink(E) nfnetlink(E) xfrm_user(E) iptable_nat(E)
    nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) xt_addrtype(E) iptable_filter(E) ip_tables(E) xt_conntrack(E) x_tables(E)
    nf_nat(E) nf_conntrack(E) br_netfilter(E) bridge(E) stp(E) llc(E) overlay(E) af_packet(E) rpcrdma(E) ib_isert(E) iscsi_target_mod(E) ib_iser(E)
    libiscsi(E) ib_srpt(E) target_core_mod(E) ib_srp(E) ib_ipoib(E) rdma_ucm(E) ib_ucm(E) ib_uverbs(E) ib_umad(E) bochs_drm(E) ttm(E)
    drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) drm(E) agpgart(E) virtio_rng(E) virtio_console(E) rtc_generic(E)
    dm_ec(OEN) ttln_rdma(OEN) rdma_cm(E) configfs(E) iw_cm(E) ib_cm(E) rdma_rxe(E) ip6_udp_tunnel(E) udp_tunnel(E) ib_core(E) qla2xxx(E)
    [ 92.499832] scsi_transport_fc(E) nvme_fc(E) nvme_fabrics(E) nvme_core(E) ipmi_watchdog(E) ipmi_ssif(E) ipmi_poweroff(E) ipmi_powernv(EX) ipmi_devintf(E) ipmi_msghandler(E) dummy(E) ext4(E) crc16(E) jbd2(E) mbcache(E) dm_service_time(E) scsi_transport_iscsi(E) sd_mod(E) sr_mod(E) cdrom(E) hid_generic(E) usbhid(E) virtio_blk(E) virtio_scsi(E) virtio_net(E) ibmvscsi(EX) scsi_transport_srp(E) xhci_pci(E) xhci_hcd(E) usbcore(E) usb_common(E) virtio_pci(E) virtio_ring(E) virtio(E) sunrpc(E) dm_mirror(E) dm_region_hash(E) dm_log(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) autofs4(E)
    [ 92.499834] Supported: No, Unsupported modules are loaded
    [ 92.499839] CPU: 3 PID: 5576 Comm: sh Tainted: G OE NX 4.4.120-ttln.17-default #1
    [ 92.499841] task: c0000000afe8a490 ti: c0000000beba8000 task.ti: c0000000beba8000
    [ 92.499842] NIP: c00000000008ba3c LR: c000000000027644 CTR: c00000000008ba10
    [ 92.499844] REGS: c0000000bebab750 TRAP: 0300 Tainted: G OE NX (4.4.120-ttln.17-default)
    [ 92.499850] MSR: 8000000000009033 CR: 28424428 XER: 20000000
    [ 92.499871] CFAR: 0000000000002424 DAR: 0000000000000208 DSISR: 40000000 SOFTE: 1
    GPR00: c000000000027644 c0000000bebab9d0 c000000000f09700 0000000000000000
    GPR04: d0000000043d7192 0000000000000002 000000000000001a fffffffffffffffe
    GPR08: 000000000000009c c00000000008ba10 d0000000043e5848 d0000000043d3828
    GPR12: c00000000008ba10 c000000007a02400 0000000010062e38 0000010020388860
    GPR16: 0000000000000000 0000000000000000 00000100203885f0 00000000100f6c98
    GPR20: c0000000b3f1fcc0 c0000000b3f1fc48 c0000000b3f1fbd0 c0000000b3f1fb58
    GPR24: c0000000b3f1fae0 c0000000b3f1fa68 00000000000005dc c0000000b3f1f9f0
    GPR28: d0000000043e5848 c0000000b3f1f900 c0000000b3f1f320 c0000000b3f1f000
    [ 92.499881] NIP [c00000000008ba3c] dma_get_required_mask_pSeriesLP+0x2c/0x1a0
    [ 92.499885] LR [c000000000027644] dma_get_required_mask+0x44/0xac
    [ 92.499886] Call Trace:
    [ 92.499891] [c0000000bebab9d0] [c0000000bebaba30] 0xc0000000bebaba30 (unreliable)
    [ 92.499894] [c0000000bebaba10] [c000000000027644] dma_get_required_mask+0x44/0xac
    [ 92.499904] [c0000000bebaba30] [d0000000043cb4b4] rxe_register_device+0xc4/0x430 [rdma_rxe]
    [ 92.499910] [c0000000bebabab0] [d0000000043c06c8] rxe_add+0x448/0x4e0 [rdma_rxe]
    [ 92.499915] [c0000000bebabb30] [d0000000043d28dc] rxe_net_add+0x4c/0xf0 [rdma_rxe]
    [ 92.499921] [c0000000bebabb60] [d0000000043d305c] rxe_param_set_add+0x6c/0x1ac [rdma_rxe]
    [ 92.499924] [c0000000bebabbf0] [c0000000000e78c0] param_attr_store+0xa0/0x180
    [ 92.499927] [c0000000bebabc70] [c0000000000e6448] module_attr_store+0x48/0x70
    [ 92.499932] [c0000000bebabc90] [c000000000391f60] sysfs_kf_write+0x70/0xb0
    [ 92.499935] [c0000000bebabcb0] [c000000000390f1c] kernfs_fop_write+0x18c/0x1e0
    [ 92.499939] [c0000000bebabd00] [c0000000002e22ac] __vfs_write+0x4c/0x1d0
    [ 92.499942] [c0000000bebabd90] [c0000000002e2f94] vfs_write+0xc4/0x200
    [ 92.499945] [c0000000bebabde0] [c0000000002e488c] SyS_write+0x6c/0x110
    [ 92.499948] [c0000000bebabe30] [c000000000009384] system_call+0x38/0xe4
    [ 92.499949] Instruction dump:
    [ 92.499954] 4e800020 3c4c00e8 3842dcf0 7c0802a6 f8010010 60000000 7c0802a6 fba1ffe8
    [ 92.499958] fbc1fff0 fbe1fff8 f8010010 f821ffc1 7c7e1b78 2fa90000 419e0078
    [ 92.499962] ---[ end trace bed077e15eb420cf ]---

    It fails in dma_get_required_mask(), which has a ppc-specific
    implementation that fails if the provided device argument is NULL.
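    The crash pattern is a helper that dereferences its device argument
    unconditionally. A minimal defensive sketch, with hypothetical names
    and not the rxe fix itself, checks for a missing device before calling
    such a helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct device { uint64_t dma_mask; };

/* Hypothetical stand-in for an arch helper that, like the pSeries one in
 * the oops above, dereferences dev without checking it. */
static uint64_t get_required_mask(const struct device *dev)
{
	return dev->dma_mask;
}

/* Caller-side guard: only query the mask when a real device exists. */
static uint64_t safe_required_mask(const struct device *dev,
				   uint64_t fallback)
{
	return dev ? get_required_mask(dev) : fallback;
}
```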

    Signed-off-by: Mikhail Malygin
    Reviewed-by: Yonatan Cohen
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Mikhail Malygin
     
  • [ Upstream commit 1e1c50a929bc9e49bc3f9935b92450d9e69f8158 ]

    do_chunk_alloc implements a loop checking whether there is a pending
    chunk allocation and, if so, causes the caller to loop. Generally this
    loop is executed only once; however, testing with btrfs/072 on a
    single-core VM uncovered an extreme case where the system could loop
    indefinitely. This is due to a missing cond_resched() in the loop,
    which doesn't give the previous chunk allocator a chance to finish its
    job.

    The fix is to simply add the missing cond_resched.
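    A userspace analogue of the retry loop, assuming POSIX threads:
    sched_yield() stands in for cond_resched(), and a second thread plays
    the previous chunk allocator. In userspace the scheduler would preempt
    the spinning thread anyway; the sketch only shows where the yield
    belongs in the loop.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Set while another thread is in the middle of allocating a chunk. */
static atomic_int chunk_alloc_pending = 1;

static void *allocator(void *arg)
{
	(void)arg;
	/* ... allocate the chunk ... */
	atomic_store(&chunk_alloc_pending, 0);
	return NULL;
}

/* Yield on every iteration so the allocating thread can run and finish. */
static int wait_for_pending_alloc(void)
{
	int loops = 0;

	while (atomic_load(&chunk_alloc_pending)) {
		loops++;
		sched_yield();
	}
	return loops;
}
```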

    Fixes: 6d74119f1a3e ("Btrfs: avoid taking the chunk_mutex in do_chunk_alloc")
    Signed-off-by: Nikolay Borisov
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Nikolay Borisov
     
  • [ Upstream commit 80c0b4210a963e31529e15bf90519708ec947596 ]

    0, 1 and nodes[0] could be NULL, log_dir_items lacks such a
    check for
    Signed-off-by: Liu Bo
    Signed-off-by: David Sterba
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Liu Bo
     
  • [ Upstream commit b98def7ca6e152ee55e36863dddf6f41f12d1dc6 ]

    If errors were returned by btrfs_next_leaf(), replay_dir_deletes needs
    to bail out, otherwise @ret would be forced to be 0 after 'break;' and
    the caller won't be aware of it.
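    A sketch of the error-propagation pattern, with a hypothetical
    next_leaf() that follows the same convention as btrfs_next_leaf()
    (0 when it moved to the next leaf, 1 when there are no more leaves,
    a negative errno on failure). A negative return jumps past the code
    that resets ret to 0, so the caller sees the error.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical iterator over nleaves leaves that fails at fail_at. */
static int next_leaf(int *cursor, int nleaves, int fail_at)
{
	if (*cursor == fail_at)
		return -EIO;
	if (++*cursor >= nleaves)
		return 1;
	return 0;
}

static int replay_dir_deletes(int nleaves, int fail_at)
{
	int cursor = 0;
	int ret;

	for (;;) {
		ret = next_leaf(&cursor, nleaves, fail_at);
		if (ret < 0)
			goto out;	/* bail out, keeping the error */
		if (ret > 0)
			break;		/* normal end of iteration */
		/* ... process the leaf ... */
	}
	ret = 0;			/* reached only on normal end */
out:
	return ret;
}
```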

    Fixes: e02119d5a7b4 ("Btrfs: Add a write ahead tree log to optimize synchronous operations")
    Reviewed-by: Nikolay Borisov
    Signed-off-by: Liu Bo
    Signed-off-by: David Sterba
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Liu Bo
     
  • [ Upstream commit f0849ac0b8e072073ec5fcc7fadd05a77434364e ]

    For a PTE-mapped THP, the compound THP has not been split into normal
    4K pages yet, so the whole THP is considered referenced if any one of
    its sub-pages is referenced.

    When walking a PTE-mapped THP with pvmw, all relevant PTEs are checked
    to retrieve the referenced bit. But the current code just returns the
    result of the last PTE: if the last PTE is not referenced, the
    referenced flag will be cleared.

    Just set referenced when ptep_clear_young_notify() (or
    pmdp_clear_young_notify()) returns true.
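    A minimal model of the bug and the fix, assuming one boolean young bit
    per sub-page PTE and a hypothetical clear_young() that, like
    ptep_clear_young_notify(), returns the old bit and clears it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool clear_young(bool *young)
{
	bool was_young = *young;

	*young = false;
	return was_young;
}

/* Buggy: the result for the last PTE overwrites all earlier ones. */
static bool thp_referenced_buggy(bool *ptes, size_t n)
{
	bool referenced = false;

	for (size_t i = 0; i < n; i++)
		referenced = clear_young(&ptes[i]);
	return referenced;
}

/* Fixed: the THP is referenced if any sub-page PTE was young. */
static bool thp_referenced(bool *ptes, size_t n)
{
	bool referenced = false;

	for (size_t i = 0; i < n; i++)
		if (clear_young(&ptes[i]))
			referenced = true;
	return referenced;
}
```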

    Link: http://lkml.kernel.org/r/1518212451-87134-1-git-send-email-yang.shi@linux.alibaba.com
    Signed-off-by: Yang Shi
    Reported-by: Gang Deng
    Suggested-by: Kirill A. Shutemov
    Reviewed-by: Andrew Morton
    Cc: "Kirill A. Shutemov"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Yang Shi
     
  • [ Upstream commit e92bb4dd9673945179b1fc738c9817dd91bfb629 ]

    When page_mapping() is called and the mapping is dereferenced in
    page_evictable() through shrink_active_list(), it is possible for the
    inode to be truncated and the embedded address space to be freed at the
    same time. This may lead to the following race.

    CPU1                                              CPU2

    truncate(inode)                                   shrink_active_list()
    ...                                               page_evictable(page)
    truncate_inode_page(mapping, page);
    delete_from_page_cache(page)
    spin_lock_irqsave(&mapping->tree_lock, flags);
    __delete_from_page_cache(page, NULL)
    page_cache_tree_delete(..)
    ...                                               mapping = page_mapping(page);
    page->mapping = NULL;
    ...
    spin_unlock_irqrestore(&mapping->tree_lock, flags);
    page_cache_free_page(mapping, page)
    put_page(page)
    if (put_page_testzero(page)) -> false
    - inode now has no pages and can be freed including embedded address_space

                                                      mapping_unevictable(mapping)
                                                      test_bit(AS_UNEVICTABLE, &mapping->flags);
                                                      - we've dereferenced mapping which is potentially already free.

    A similar race exists between swap cache freeing and page_evictable()
    too.

    The address_space in the inode and the swap cache will be freed after
    an RCU grace period. So the races are fixed by enclosing the
    page_mapping() and address_space usage in rcu_read_lock/unlock(). Some
    comments are added in code to make it clear what is protected by the
    RCU read lock.

    Link: http://lkml.kernel.org/r/20180212081227.1940-1-ying.huang@intel.com
    Signed-off-by: "Huang, Ying"
    Reviewed-by: Jan Kara
    Reviewed-by: Andrew Morton
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: "Huang, Ying"
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Huang Ying
     
  • [ Upstream commit 77da2ba0648a4fd52e5ff97b8b2b8dd312aec4b0 ]

    This patch fixes a corner case for KSM. When two pages belong or
    belonged to the same transparent hugepage, and they should be merged,
    KSM fails to split the page, and therefore no merging happens.

    This bug can be reproduced by:
    * making sure ksm is running (disabling ksmtuned if necessary)
    * enabling transparent hugepages
    * allocating a THP-aligned 1-THP-sized buffer
    e.g. on amd64: posix_memalign(&p, 1<<<<
    Co-authored-by: Gerald Schaefer
    Reviewed-by: Andrew Morton
    Cc: Andrea Arcangeli
    Cc: Minchan Kim
    Cc: Kirill A. Shutemov
    Cc: Hugh Dickins
    Cc: Christian Borntraeger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Claudio Imbrenda
     
  • [ Upstream commit 41f714672f93608751dbd2fa2291d476a8ff0150 ]

    The counter that tracks used TX descriptors pending completion
    needs to be zeroed as part of a device reset. This change fixes
    a bug causing transmit queues to be stopped unnecessarily and in
    some cases a transmit queue stall and timeout reset. If the counter
    is not reset, the remaining descriptors will not be "removed",
    effectively reducing queue capacity. If the queue is over half full,
    it will cause the queue to stall if stopped.
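    A simplified model with hypothetical field names and an assumed "stop
    when less than half is free" threshold. Without the reset, used would
    keep its pre-reset value forever because those descriptors never
    complete, so the queue would stay stopped.

```c
#include <assert.h>

/* Hypothetical TX queue bookkeeping. */
struct tx_queue {
	unsigned int size;	/* total descriptors */
	unsigned int used;	/* sent, awaiting completion */
	int stopped;
};

static unsigned int tx_free(const struct tx_queue *q)
{
	return q->size - q->used;
}

static void tx_send(struct tx_queue *q, unsigned int n)
{
	q->used += n;
	if (tx_free(q) < q->size / 2)	/* assumed stop threshold */
		q->stopped = 1;
}

/* Outstanding descriptors never complete after a reset, so the
 * counter must be zeroed along with the rest of the queue state. */
static void tx_reset(struct tx_queue *q)
{
	q->used = 0;
	q->stopped = 0;
}
```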

    Signed-off-by: Thomas Falcon
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Thomas Falcon
     
  • [ Upstream commit 76327a35caabd1a932e83d6a42b967aa08584e5d ]

    The datasheet specifies a 3 µs pause after performing a software
    reset. The default implementation of genphy_soft_reset() does not
    provide this, so implement soft_reset with the needed pause.
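    A sketch of such a soft_reset, assuming a hypothetical in-memory
    register file instead of real MDIO accesses. MII_BMCR and BMCR_RESET
    have the values from linux/mii.h; nanosleep() stands in for the
    kernel's delay helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define MII_BMCR	0x00
#define BMCR_RESET	0x8000

/* Hypothetical register file standing in for MDIO accesses. */
static uint16_t phy_regs[32];

static int phy_write(int reg, uint16_t val)
{
	phy_regs[reg] = val;
	return 0;
}

/* Set BMCR_RESET, then wait at least 3 us before touching the PHY. */
static int soft_reset_with_pause(void)
{
	struct timespec settle = { .tv_sec = 0, .tv_nsec = 3000 };
	int ret = phy_write(MII_BMCR, BMCR_RESET);

	if (ret < 0)
		return ret;
	nanosleep(&settle, NULL);	/* kernel code would use e.g. udelay(3) */
	return 0;
}
```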

    Signed-off-by: Esben Haabendal
    Reviewed-by: Andrew Lunn
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Esben Haabendal